Abstract

In this paper, the K-nearest neighbor (KNN) algorithm and the convolutional neural network (CNN) are each used to train a handwritten digit recognition model. With a reasonable model structure and suitable training data, the model learns the features of the ten handwritten digits and outputs, for an input image, the probability that it corresponds to each digit. Taking the learning process of a deep-learning-based handwritten digit recognition algorithm as the thread, the paper moves from deep learning in general to the convolutional neural network, from simple to deep, and works through the basic concepts, model construction, and training process of deep learning. The deep learning framework uses MNIST as the training dataset to train a model with a high recognition rate, which is then combined with OpenCV to recognize handwritten digits in images. The neural network is built with TensorFlow to classify and recognize the handwritten digits, and the resulting model is used to identify the handwritten digits in the test set. CNN and KNN are studied and compared; the MNIST dataset is preprocessed, features are extracted, and the digits are identified. The program completes the training of the neural network and the recognition of digits in images; the recognition results are counted and analyzed, and the recognition rates of the two methods are compared to find ways to optimize them and improve the recognition rate.

1. Introduction

1.1. Research Background

With the rapid development of society, machines have gradually freed people from heavy work. As science and technology progress, people are no longer satisfied with manually operated machines and are moving toward machines capable of artificial intelligence. Artificial intelligence is now getting closer to people's lives. Machine learning, the core of artificial intelligence technology, is a fundamental way to make computers intelligent. Deep learning is a new key technology and research direction in the field of machine learning, with high research and application value. Deep learning [1, 2] enables machines to simulate human audio-visual perception, thinking, and other activities; solves many complex pattern recognition problems; and has brought great progress in artificial intelligence-related technologies [3–5].

1.2. Research Meaning

With the development of computer technology and the advance of the information wave, how to input the massive amount of numerical information on paper into computers has become a major research hotspot. For example, bank bills, invoices, checks, tax forms, and other bills often have to be entered manually; handling so much information by hand inevitably leads to mistakes, along with high labor cost, low efficiency, and a heavy workload. Automatic recognition and input by computer can complete the task with high precision and also free the relevant staff, greatly reducing their workload. According to the source of the digits, current digit recognition problems can be divided into handwritten digit recognition, printed digit recognition, optical digit recognition, and natural scene digit recognition, all of which have great practical value. For example, handwritten digit recognition can be applied to recognizing bank money order numbers, greatly reducing labor costs. Printed digit recognition can be applied to the automatic identification of postal codes. Optical digit recognition and natural scene digit recognition can be applied to license plate recognition in vehicle detection. Handwritten digit recognition therefore has considerable application prospects and value, and how to apply deep learning algorithms to it is a popular research topic.

1.3. Domestic and Foreign Research

In recent years, the popularity of artificial intelligence has greatly increased, and the demands placed on machine vision have grown accordingly. Researchers around the world have devoted themselves to handwritten digit recognition and have made many achievements in this field.

1.3.1. Foreign Research

Liang used 10 structural features, such as profile features, self-structure features, and curvature, combined with eight classifiers, on the test sets of the CENPARMI, CEDAR, and MNIST databases and achieved a test recognition rate of 99.58%. However, the computation and storage costs of this method are high [6–9]. Guangbin et al. applied existing knowledge of biological vision to build a handwritten digit recognition model that extracts linearly separable features and reduces the error rate on the MNIST training set to 0.59% [10]. Guo et al. proposed a method for integrating statistical and structural information in unconstrained handwritten digit recognition. The method improves the modeling of state duration in conventional HMMs by using state-duration-adaptive transition probabilities, and it uses macro states to overcome the difficulty of modeling pattern structure with HMMs [11]. It brings a great improvement in speed and accuracy [12].

1.3.2. Domestic Research

Liu Gang and Zhang Honggang designed a handwritten digit recognition system based on a BP neural network in Visual C++ 6.0 and verified the feasibility of BP neural networks for handwritten digit recognition, obtaining a good recognition rate [13]. Wang proposed a new method combining PCA (principal component analysis) with CNN and conducted experiments on the SVHN dataset to improve the recognition rate of characters in natural scenes. Geng et al. built a handwritten digit recognition model based on the Hopfield neural network, whose error rate and accuracy are better than those of the BP network method [14, 15]. However, because this research depends on the test samples, test images need to be similar to the training images. As the scale of collected data gradually expands, the required similarity between test and training images decreases, but the requirements on writing regularity increase. In short, handwritten digit recognition can currently replace manual entry in highly standardized settings, such as bank checks [16].

1.4. Handwriting Number Identification Difficulties

Distinguishing similar digits: although only ten digits are used and their strokes are very simple, the same digit can be written in many different ways, and writing styles show marked regional characteristics. Because writing habits differ from person to person and from place to place, it is very difficult to build a universal digit recognition model with a high recognition rate [6–8].

Insufficient data: about seven billion people around the world write Arabic numerals, and everyone has different writing habits, so no existing dataset can cover everyone's handwriting style. The number of people using Arabic numerals grows much faster than the content of the datasets, so the data are not large enough to handle this problem, and the gap will become more obvious in the future [9–11].

1.5. Application of the Handwritten Digital Recognition System

Occasion 1: in schools, students and teachers are well educated and write Arabic numerals in a very standard way. Applying a handwritten digit recognition system to summarizing test paper scores can greatly reduce teachers' workload [12].

Occasion 2: in government offices, staff are well educated and write Arabic numerals in a very standard way. Without handwritten digit recognition, office forms have to be entered manually, which requires a great deal of labor. Handwritten digit recognition frees civil servants from this work and improves their efficiency [17].

Occasion 3: in banks, staff handle the handwritten digits on many checks, bills, and forms every day. Processing monotonous handwritten numbers for long periods causes visual fatigue and greatly reduces work efficiency. Handwritten digit recognition can greatly reduce the manpower needed and improve the efficiency and accuracy of bank staff [18].

2. Foundation and Model of Deep Learning

2.1. Introduction to Basic Principles
2.1.1. The Concept of Deep Learning

The concept of deep learning originated from the study of artificial neural networks. As this research progressed, neural networks with multiple hidden layers gradually came into view, and deep learning based on neural networks with a single input layer, multiple hidden layers, and a single output layer began to attract scholars' attention [19–24]. The concept of deep learning was proposed by Hinton et al. in 2006. The simplest deep learning model is shown in Figure 1.

In this model, x1, x2, and x3 represent the inputs; output represents the output; and w1, w2, and w3 represent the weights applied as the inputs pass through the neural network, indicating how strongly each input affects the output. The larger a weight is (it may even exceed 1), the more important the corresponding input is; the less important the input is, the closer its weight tends to 0. In general, a deep learning model is a neural network with multiple hidden layers, typically more than three, and deep learning models often have eight, nine, or more hidden layers. With more hidden layers, there are correspondingly more neuronal connection weight parameters [25, 26]. This means that a deep learning model can automatically extract many complex features, and the more hidden layers there are, the deeper the features that can be passed through the network. A rigorous and mature neural network can realize complex functions and even machine intelligence, namely, artificial intelligence. The deep learning model of a neural network with multiple hidden layers is shown in Figure 2.
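The role of the weights can be illustrated with a single neuron. The sketch below is not taken from the paper: it assumes the common form in which the output is an activation applied to the weighted sum of the inputs, and the sigmoid activation, the `bias` parameter, and the sample values are all illustrative assumptions.

```python
import math

def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs passed through a sigmoid activation (assumed form)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

x = [1.0, 0.5, -1.0]

# A large weight on x1 lets it dominate the output.
dominated_by_x1 = neuron_output(x, [2.0, 0.0, 0.0])

# A weight of 0 on x3 means changing x3 has no effect at all.
same_with_other_x3 = neuron_output([1.0, 0.5, 9.0], [2.0, 0.0, 0.0])

# With all weights at 0, no input matters and the sigmoid returns 0.5.
no_input_matters = neuron_output(x, [0.0, 0.0, 0.0])
```

An input whose weight is 0 is ignored entirely, while raising a weight makes the output follow that input more closely.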

2.1.2. Deep Learning Algorithms—Classification of Neural Networks

Deep learning algorithms can be classified in several ways. By network composition, they are divided into networks containing only the encoder part, networks containing only the decoder part [6], and networks with both encoder and decoder parts. By the way the structure and technique are applied, they are divided into discriminative deep structures, generative deep structures, and hybrid structures. By learning method, they can also be divided into supervised (mentor) networks and unsupervised (no-mentor) networks.

2.1.3. Sample Dataset for the Trained Neural Networks—MNIST Dataset

The flow chart of handwritten digit image recognition based on deep learning is shown in Figure 3.

MNIST is a dataset widely used in the field of handwritten digit recognition. Because it requires little memory and is easy to use, it has become the first choice for those learning handwritten digit recognition, so this graduation project uses the MNIST database as the experimental sample set for training. MNIST is a standardized dataset of images of the Arabic numerals 0 to 9; all images have been normalized to the same size with the digit placed in the center. Each image is a 28 × 28 grayscale image whose pixel values range from 0 to 255. The data are represented as vectors, and in TensorFlow the value of each pixel can be inspected by printing the corresponding array. The MNIST dataset contains 70,000 handwritten digit images, of which 60,000 form the training set and 10,000 form the test set.
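The vector representation can be made concrete with a short sketch. It does not load MNIST itself: the image below is a synthetic 28 × 28 stand-in, and the helper names `flatten_and_scale` and `one_hot` are hypothetical. Scaling the pixels to [0, 1] and one-hot encoding the label are common preprocessing choices assumed here for illustration.

```python
def flatten_and_scale(image):
    """Turn a 28 x 28 grid of 0-255 gray levels into a 784-element vector in [0, 1]."""
    return [pixel / 255.0 for row in image for pixel in row]

def one_hot(label, num_classes=10):
    """Encode a digit label 0-9 as a vector with a single 1."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec

# Synthetic stand-in for an MNIST image: a white vertical bar (a crude "1")
# in columns 13 and 14 on a black background.
image = [[255 if col in (13, 14) else 0 for col in range(28)] for _ in range(28)]

vector = flatten_and_scale(image)   # 28 * 28 = 784 values
label = one_hot(1)                  # [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

print(len(vector), label)
```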

Some images of the MNIST dataset are as shown in Figure 4.

The dataset stores common handwritten digits, covering the ten Arabic numerals 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. Each digit appears in a wide variety of unusual forms. Many are not written in the normal standard form, so many images in the MNIST dataset are hard to identify, for example, the Arabic numeral "9" shown in Figures 5 and 6. The morphological differences among samples of the same digit across the MNIST database are quite large. It is precisely because there are so many different handwritten digit images that a model trained on MNIST can reach good recognition accuracy even on strangely written digits. However, this is also a weakness of such a recognition system: it relies heavily on the data in the dataset, and once a picture departs from the dataset, accuracy is difficult to guarantee.

2.1.4. Working Framework for Deep Learning

With the growing popularity of deep learning, researchers at home and abroad have developed many deep learning frameworks, such as Caffe, Torch, Theano, and TensorFlow. Caffe is used mostly in image recognition, Torch and Theano are slow in programming and import, and TensorFlow has become the most widely used deep learning framework; it is the one used here. A large amount of handwritten digit data is fed into the neural network, the network processes it, and the desired output is obtained. TensorFlow was released by Google in 2015 and attracted the attention of scholars all over the world as soon as it appeared. Just four years later, it had become one of the most popular research and development tools, used by a large share of deep learning practitioners.

Installation of the working framework

Step 1. Install the Anaconda environment: go to the official Anaconda website, download the installation package for the corresponding version, and install it by following the prompts.

Step 2. Open the Windows command line and enter [conda --version] to verify that Anaconda was installed successfully.

Step 3. Activate the TensorFlow environment and enter [pip install --upgrade --ignore-installed tensorflow] in the Windows command line.

Step 4. In the TensorFlow environment, enter [python] and then enter [import tensorflow as tf]. If no error is reported, the TensorFlow module has been imported successfully, which means that the installation succeeded.

2.2. Handwritten Digital Image Recognition Based on the Convolutional Neural Network
2.2.1. Identification Principles of Convolutional Neural Networks

The convolutional neural network (CNN), a class of feed-forward neural network with convolutional computation and a deep structure, performs well in identifying handwritten digits. A convolutional neural network comprises convolution operations, pooling operations (also known as downsampling), and full connection operations. The preprocessed handwritten digit image is fed to a model based on LeNet-5, which extracts the digit features from the picture.

Convolution operation (Figure 7):

Green indicates the pixel values of the original image, red indicates the parameters of the convolution kernel, and yellow indicates the position of the convolution kernel as it slides over the original image. The right-hand graph is the feature map generated by the convolution operation. Each result is the sum of the products of the covered pixel values and the corresponding parameters of the convolution kernel.
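The sliding-window computation can be sketched in a few lines of plain Python. The 5 × 5 image and 3 × 3 kernel below are illustrative values chosen for this sketch, not the contents of Figure 7, and the function name `convolve2d_valid` is hypothetical. As in most CNN frameworks, the kernel is applied without flipping, with stride 1 and no padding.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and, at each
    position, sum the products of the covered pixels and the kernel parameters."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

# Illustrative binary image and kernel (assumed values).
image = [[1, 1, 1, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]

feature_map = convolve2d_valid(image, kernel)
print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

A 5 × 5 input and a 3 × 3 kernel give a 3 × 3 feature map, since each dimension shrinks by the kernel size minus one.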

Pooling (subsampling) (Figure 8):

In pooling, one pixel replaces several adjacent pixels of the original image, shrinking the feature map while retaining its features. Pooling limits the growth of the data, reduces the amount and time of computation, and helps to prevent overfitting.
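A minimal sketch of pooling follows, assuming max pooling over non-overlapping 2 × 2 windows; the window size and the sample feature map are illustrative assumptions, and `max_pool2d` is a hypothetical helper name.

```python
def max_pool2d(feature_map, size=2):
    """Replace each non-overlapping size x size window by its largest value."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled

feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 1],
               [3, 4, 5, 6]]

pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 4], [7, 9]]
```

The 4 × 4 map shrinks to 2 × 2, so the following layers process a quarter of the values while the strongest response in each window is kept.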

Full connection (Figure 9):

The final result is produced from the output of the fully connected layers. Handwritten digit recognition generally uses two fully connected layers.

Activation functions (Figures 10 and 11):

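In the following, k, c, T, and α denote constants.

Linear function is as follows:

$$f(x) = kx + c$$

Ramp function is as follows:

$$f(x) = \begin{cases} T, & x > c \\ kx, & |x| \le c \\ -T, & x < -c \end{cases}$$

Threshold function is as follows:

$$f(x) = \begin{cases} 1, & x \ge c \\ 0, & x < c \end{cases}$$

S-type (sigmoid) function is as follows:

$$f(x) = \frac{1}{1 + e^{-\alpha x}}, \quad 0 < f(x) < 1$$

Bipolar S-type function is as follows:

$$f(x) = \frac{2}{1 + e^{-\alpha x}} - 1, \quad -1 < f(x) < 1$$

ReLU function is as follows:

$$f(x) = \max(0, x)$$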
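The six activation functions listed above can be sketched in Python in their commonly used textbook forms. The parameters `k`, `c`, and `alpha` and their default values are assumptions made for this sketch, and the ramp is assumed to saturate at T = k·c so that it is continuous.

```python
import math

def linear(x, k=1.0, c=0.0):
    return k * x + c

def ramp(x, k=1.0, c=1.0):
    """Linear on [-c, c], saturating at +T and -T outside, with T = k * c."""
    t = k * c
    if x > c:
        return t
    if x < -c:
        return -t
    return k * x

def threshold(x, c=0.0):
    return 1.0 if x >= c else 0.0

def sigmoid(x, alpha=1.0):
    """S-type function, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-alpha * x))

def bipolar_sigmoid(x, alpha=1.0):
    """Bipolar S-type function, output in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-alpha * x)) - 1.0

def relu(x):
    return max(0.0, x)

for f in (linear, ramp, threshold, sigmoid, bipolar_sigmoid, relu):
    print(f.__name__, [round(f(v), 3) for v in (-2.0, 0.0, 2.0)])
```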

Considering the above operations and following the LeNet-5 model, a complete CNN-based handwritten digit recognition workflow is shown in Figure 12.

Step 1. Convert the handwritten digit picture into a pixel matrix.

Step 2. Apply the first convolution layer to the pixel matrix to generate six feature maps.

Step 3. Subsample each feature map to reduce the amount of data while retaining the features. Six small maps are generated, which look similar to the respective feature maps of the previous layer but are reduced in size.

Step 4. Apply the second convolution to the six small maps to generate more feature maps.

Step 5. Subsample the feature maps generated by the second convolution.

Step 6. The first fully connected layer.

Step 7. The second fully connected layer.

Step 8. A Gaussian connection layer generates the output results.
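To see how the data shrinks through these steps, the sketch below traces the feature map sizes. The sizes are those of the original LeNet-5 design (32 × 32 input, 5 × 5 kernels, 2 × 2 subsampling, 16 maps after the second convolution, then 120, 84, and 10 units) and are assumptions here, since the text above states only the six first-layer feature maps.

```python
def conv_size(n, kernel=5):
    """Output width of a stride-1 convolution without padding."""
    return n - kernel + 1

def pool_size(n, window=2):
    """Output width of non-overlapping subsampling."""
    return n // window

def lenet5_shapes(input_size=32):
    """Return (step, number of maps, height, width) for Steps 2 to 8."""
    shapes = []
    n = conv_size(input_size)
    shapes.append(("Step 2: first convolution", 6, n, n))
    n = pool_size(n)
    shapes.append(("Step 3: first subsampling", 6, n, n))
    n = conv_size(n)
    shapes.append(("Step 4: second convolution", 16, n, n))
    n = pool_size(n)
    shapes.append(("Step 5: second subsampling", 16, n, n))
    shapes.append(("Step 6: first full connection", 120, 1, 1))
    shapes.append(("Step 7: second full connection", 84, 1, 1))
    shapes.append(("Step 8: output layer", 10, 1, 1))
    return shapes

shapes = lenet5_shapes()
for name, maps, h, w in shapes:
    print(f"{name}: {maps} x {h} x {w}")
```

Under these assumptions a 28 × 28 MNIST digit would be padded to 32 × 32 before the first convolution, and the ten final outputs correspond to the ten digit classes.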

The CNN internal processing formulas are as follows.

Part 1: convolution formula

$$x_j^{l} = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\right) \tag{8}$$

In formula (8), $x_i^{l-1}$ represents the elements of the layer $l-1$ feature map covered by the $j$th convolution kernel; $k_{ij}^{l}$ is an element of the layer $l$ convolution kernel; $b_j^{l}$ is the offset; $M_j$ is the region covered by the $j$th convolution kernel; and $f$ is the activation function.

Part 2: pooling formula

$$x_j^{l} = \operatorname{down}\left(x_j^{l-1}\right) \tag{9}$$

In formula (9), $x_j^{l}$ is the output obtained by downsampling the layer $l-1$ image block, and $x_j^{l-1}$ is the output of layer $l-1$ and the input of layer $l$.

Part 3: full connection formula

$$y = f\left(\sum_{i} w_i x_i\right)$$

After convolution and pooling, the high-level features of the image are weighted by the fully connected layer, and the final output is obtained through the activation function. Here $x_i$ is the output feature map of the previous layer and $w_i$ is the weight coefficient of the fully connected layer.

2.2.2. Handwritten Digital Image Recognition Based on KNN

The K-nearest neighbor (KNN) classification algorithm is one of the simplest methods in data mining classification and is theoretically mature. The term K-nearest neighbor means that each sample can be represented by its k nearest neighbors. The core idea of KNN is that if most of the k nearest samples of a sample in the feature space belong to a certain category, then the sample also belongs to that category and shares the characteristics of the samples in it. KNN is commonly used for handwritten digit classification. A simple version of the algorithm is easily implemented by calculating the distance from the test example to all stored examples, but this is computationally expensive for large training sets. Even for large datasets, KNN remains computationally tractable when an approximate nearest neighbor search algorithm is used. Many nearest neighbor search algorithms have been proposed over the years, often designed to reduce the number of distance evaluations actually performed. Commonly used distance measures include the Euclidean distance, Manhattan distance, Minkowski distance, and cosine distance; the first three are defined as follows:

Part 1: Euclidean distance

The Euclidean distance is taken as the distance measure here, although it applies only to continuous variables. For two points $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ in n-dimensional space, it is

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Part 2: Manhattan distance

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

Part 3: Minkowski distance

For two points x and y in n-dimensional space, the Minkowski distance between them is

$$d(x, y) = \left(\sum_{i=1}^{n} |x_i - y_i|^{p}\right)^{1/p}$$

where p is the order of the distance. It is the Manhattan distance when p = 1, the Euclidean distance when p = 2, and the Chebyshev distance as p tends to infinity. The Chebyshev distance between two points is the maximum absolute difference between their coordinates:

$$d(x, y) = \max_{i} |x_i - y_i|$$
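The relationship between these distances can be checked numerically. The sketch below uses the standard definitions; the function names and the sample points are illustrative.

```python
import math

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def chebyshev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

def minkowski(x, y, p):
    """Minkowski distance of order p (p >= 1)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y = (0, 0), (3, 4)

print(manhattan(x, y), minkowski(x, y, 1))   # p = 1 gives the Manhattan distance
print(euclidean(x, y), minkowski(x, y, 2))   # p = 2 gives the Euclidean distance
print(chebyshev(x, y), minkowski(x, y, 50))  # large p approaches the Chebyshev distance
```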

3. Design and Implementation

3.1. Install

OpenCV is a library established by Intel in 1999. It has since developed into an open source, cross-platform computer vision library that runs on Linux, Windows, and Mac OS. OpenCV provides a transparent interface to Intel's Integrated Performance Primitives (IPP): if the IPP libraries optimized for a specific processor are present, OpenCV loads them automatically at startup. Once the Anaconda environment is installed, Anaconda Navigator integrates various libraries. OpenCV can be used with Python as the programming language, and after the OpenCV library is installed through Anaconda Navigator, it can be called from the software bundled with Anaconda. We open Spyder or Jupyter Notebook (in the TensorFlow environment) and enter import cv2 as cv. If no error is reported, OpenCV has been installed successfully. Figure 13 shows the Anaconda Navigator interface after installation.

3.2. Programming Language—Python

Python grew out of the ABC language: it addresses ABC's shortcomings and adds functions that ABC lacked, and many ABC users have since turned to Python. Today, Python is one of the most popular programming languages in the world. In some countries it has even entered the primary school curriculum, which shows how popular it is. Python has the following features:

(i) Concise and readable

(ii) Good scalability

(iii) Completely free

3.3. Handwritten Number Recognition System
3.3.1. KNN Handwritten Number Recognition Method

KNN is a form of instance-based learning, also called lazy learning: the function is approximated only locally, and all computation is deferred until classification. The K-nearest neighbor algorithm is one of the simplest machine learning algorithms.

3.3.2. CNN Handwritten Number Recognition Method

CNN is a classical machine learning algorithm that has been studied for about forty years. The classical CNN architectures demonstrated the potential of deep structures for feature extraction and made major breakthroughs in image recognition tasks, setting off a boom in deep learning research. As an existing deep structure with proven applications, the convolutional neural network has returned to researchers' attention for further study and application. Practice has shown that the accuracy of the CNN structure in handwritten digit recognition systems is good and can basically meet practical needs.

3.4. System Design and Implementation
3.4.1. System Construction

We open Anaconda Prompt, create a virtual environment with Python 3.6 by entering conda create -n tensorflow python=3.6, and install the CPU version of TensorFlow 2.0.0 with pip install tensorflow==2.0.0. We then import all the modules required to train the model, wait for the corresponding libraries to download and install, and test TensorFlow with the code shown in Figure 14 to verify that the installation succeeded.

Computational results.

This means that the TensorFlow environment has been successfully built. Code running diagram is shown in Figure 15.

The programming software used is Spyder, a cross-platform integrated development environment for scientific computing in Python. Writing code in this editor has many advantages, and importing libraries is convenient. The task is digit recognition, so there are 10 digits (0 to 9), that is, 10 categories, to be predicted. The prediction error is reported using Python.

3.4.2. Sample Selection

The sample set is MNIST, a simple and practical computer vision dataset containing images of handwritten digits, as shown in Figure 16.

It is loaded with the code mnist = input_data.read_data_sets('MNIST_data', one_hot=True).

The MNIST dataset provides 60,000 training images for training the model and 10,000 test images for testing the recognition accuracy. Each image is a square of 28 × 28 pixels (784 pixels in total), and each pixel is represented by a single grayscale value. The dataset was constructed from a number of scanned document datasets available from the National Institute of Standards and Technology (NIST), which is where the name comes from: MNIST stands for Modified NIST. The digit images were taken from various scanned documents, normalized in size, and centered. This makes MNIST an excellent dataset for evaluating models, allowing developers to focus on machine learning with very little data cleaning or preparation. The standard split of the dataset is used to evaluate and compare the models: the 60,000 training images train the model, and the separate set of 10,000 images tests it.

3.4.3. Model Construction

KNN model

The KNN model classifies a test sample from the MNIST dataset in the following steps:

Step 1. Load the data.

Step 2. Preprocess the data.

Step 3. Calculate the distance between the test sample and each training sample.

Step 4. Sort the training samples by increasing distance.

Step 5. Select the K points with the smallest distances.

Step 6. Determine how often each category occurs among these K points.

Step 7. Return the most frequent category among the K points as the predicted class of the test sample.
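Steps 3 to 7 can be sketched as follows. This is a minimal illustration rather than the program used in the experiments: it uses the Euclidean distance, the function name `knn_predict` is hypothetical, and the tiny two-dimensional training set stands in for the 784-dimensional MNIST vectors.

```python
import math
from collections import Counter

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(train_x, train_y, query, k=3):
    # Step 3: distance from the test sample to every training sample.
    distances = [(euclidean(sample, query), label)
                 for sample, label in zip(train_x, train_y)]
    # Step 4: sort by increasing distance.
    distances.sort(key=lambda pair: pair[0])
    # Step 5: keep the K nearest points.
    nearest = distances[:k]
    # Step 6: count how often each category occurs among them.
    votes = Counter(label for _, label in nearest)
    # Step 7: the most frequent category is the prediction.
    return votes.most_common(1)[0][0]

# Toy stand-in data: two clusters labelled 0 and 1 (assumed values).
train_x = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
           (1.0, 0.9), (0.9, 1.0), (1.1, 1.1)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_x, train_y, (0.1, 0.1)))   # near the first cluster
print(knn_predict(train_x, train_y, (0.95, 1.0)))  # near the second cluster
```

Because every prediction scans the whole training set, the cost grows linearly with the number of stored samples, which is the drawback noted for large training sets.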

CNN model

The model is constructed on the basic architecture of the convolutional neural network. The convolutional layer is responsible for extracting features, the sampling layer for feature selection, and the fully connected layer for classification.

Step 1. Load the data.

Step 2. Data preprocessing: dimension adjustment.

Step 3. Convolutional layer: the convolution kernel is generally initialized as a matrix of small random numbers, and it learns reasonable weights during training.

Step 4. Pooling layer: a single pixel replaces several neighboring pixels by max sampling, which retains the features while greatly simplifying the model and reducing its parameters.

Step 5. Fully connected layer: the distributed features are integrated and output as values, greatly reducing the influence of feature position on classification. Each fully connected input has its own weight, and the weighted sum gives the probability of each class for the original image.

Step 6. Start the training: compute the error between the output value and the target value. When the error is greater than the expected value, propagate it back through the network to obtain, in turn, the errors of the fully connected layer, the subsampling layer, and the convolutional layer. The error of each layer can be understood as the share of the network's total error that the layer should bear. Training ends when the number of iterations reaches the set number of cycles.

Step 7. Return the prediction results.

This chapter first introduced OpenCV, the image processing library, then Python, the language used to call OpenCV, and finally returned to the requirements of this graduation project to describe the design and implementation of the KNN and CNN workflows with TensorFlow.

4. Training Process and Results

4.1. Deep Learning and Training Process

The KNN training process is shown in Figure 17.

The KNN model first reads the test sample data, calculates the distances between the test data and the training data, sorts the distances from small to large, selects the K points with the smallest distances, determines the frequency of each category among these K points, selects the most frequent category as the predicted class, and finally outputs the classification result.

The CNN training process is shown in Figure 18.

The CNN model first reads the sample data, extracts the data features through the convolutional and pooling layers while updating the weights, and finally reaches the fully connected layer. Once the set number of training cycles is reached, it ends the training and outputs the classification results.

The training data for the CNN and KNN models are shown in Table 1.

With only one convolutional layer and one pooling layer, the recognition rate was low, and the program lagged and crashed because of the limited computer configuration. To improve the recognition rate, a convolutional layer and a pooling layer were added, giving two convolutional layers and two pooling layers. The training data are shown in Table 3.

After the convolutional layer and pooling layer were added, the program ran significantly slower: a run of cycle iterations took 20 minutes, and 30 cycles took nearly two hours, but the recognition rate improved significantly.

4.2. Realize the Whole Process
4.2.1. Training and Preserving the Model

After the convolutional neural network framework is built, a saver is defined, and the trained model is saved with it once training is complete.

Definition: saver = tf.train.Saver()  # define the saver

saver.save(sess, '/home/XXX/learning_tensorflow/form/model.ckpt')  # save the model; fill in the save path of the model in the quotes

The model is obtained as shown in Figure 19.

4.2.2. Image Preprocessing
Step 1. Select the image

Open the handwritten digit image on the computer, as shown in Figure 20.

As noted above, handwritten digit pictures input to the convolutional neural network must be 28 × 28 pixels, whereas the pictures to be identified are often larger and do not meet this requirement, so the images to be processed are reduced to the same format as the MNIST dataset.

Step 2. Processing method

The image is read in as a grayscale image with the imread function.

The grayscale image is inverted, reversing black and white by accessing and processing the image pixel by pixel.

The inverted image is binarized with the threshold function.

Row and column scanning is used to find the border of the digit and determine its exact location.

The image is padded, resized to 28 × 28 pixels, and thereby processed into the same format as the images in the MNIST dataset, as shown in Figure 21.

Step 3. Call the model

[saver.restore(sess, "C:/Users/Desktop/demo/model.ckpt")  # call the trained model]
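The processing method of Step 2 can be sketched without any image library. In the paper these operations are carried out with OpenCV (imread and the threshold function); the pure-Python stand-in below only mirrors the logic on nested lists. The helper names, the threshold of 127, the centered square padding, and the nearest-neighbor resize are assumptions made for this sketch.

```python
def invert(gray):
    """Reverse black and white pixel by pixel."""
    return [[255 - p for p in row] for row in gray]

def binarize(gray, thresh=127):
    """Pixels above the (assumed) threshold become 255, all others 0."""
    return [[255 if p > thresh else 0 for p in row] for row in gray]

def bounding_box(binary):
    """Row and column scan for the digit border (assumes at least one stroke pixel)."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    return rows[0], rows[-1], cols[0], cols[-1]

def crop_and_pad_square(binary, box):
    """Cut out the digit and center it on a black square canvas."""
    top, bottom, left, right = box
    crop = [row[left:right + 1] for row in binary[top:bottom + 1]]
    h, w = len(crop), len(crop[0])
    side = max(h, w)
    pad_top, pad_left = (side - h) // 2, (side - w) // 2
    square = [[0] * side for _ in range(side)]
    for r in range(h):
        for c in range(w):
            square[pad_top + r][pad_left + c] = crop[r][c]
    return square

def resize_nearest(square, size=28):
    """Nearest-neighbor resize of a square image to size x size."""
    src = len(square)
    return [[square[r * src // size][c * src // size] for c in range(size)]
            for r in range(size)]

def preprocess(gray):
    binary = binarize(invert(gray))
    return resize_nearest(crop_and_pad_square(binary, bounding_box(binary)))

# Synthetic 60 x 40 "photo": a dark vertical stroke on a white page.
page = [[0 if 10 <= r < 50 and 15 <= c < 25 else 255 for c in range(40)]
        for r in range(60)]
digit = preprocess(page)
print(len(digit), len(digit[0]))  # 28 28
```

After this step the picture has white strokes on a black background at 28 × 28 pixels, the same layout as the MNIST images.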

We import the handwritten digit pictures processed by OpenCV into the neural network model, run the test.py program, and obtain the recognition results. All the test pictures are shown in Table 4.

The analysis shows that only the picture "4" in the above table was recognized incorrectly and all the others were recognized correctly, so the task of this graduation project was successfully completed.

5. Conclusion

The innovations of this graduation project comprise the following three parts:

(i) Part 1: The topic of the graduation project is handwritten digit image recognition based on the convolutional neural network, but both the relatively simple KNN algorithm and the CNN algorithm are implemented, which highlights the advantages of the convolutional neural network. Studying deep learning from different angles gives a more intuitive understanding of the advantages of convolutional neural networks in image recognition. The KNN algorithm was chosen to better understand the idea of the K-nearest neighbor method. Implementing several algorithms also helps in learning the Python programming language.

(ii) Part 2: The project actually recognizes handwritten digit pictures: it does not stop at an accuracy figure but identifies real handwritten digit pictures. This makes the graduation project more practical and more engaging for students, whereas an accuracy figure alone is not intuitive and is too theoretical.

(iii) Part 3: OpenCV is used to process the pictures, so that the pictures to be tested are not restricted to 28 × 28 pixels, and any picture can be imported into the neural network after OpenCV processing [26].

Data Availability

The dataset can be accessed upon request.

Conflicts of Interest

The authors declared that they have no conflicts of interest.