Research Article

PF-ViT: Parallel and Fast Vision Transformer for Offline Handwritten Chinese Character Recognition

Table 1

A summary of related work on offline HCCR.

MCDNN [30]
Methodology: Trains eight networks on different datasets, each with four convolutional layers and two fully connected layers.
Highlights: The first model to successfully apply CNNs to handwritten Chinese character recognition.

R-CNN and ATR-CNN [31]
Methodology: R-CNN consists of relaxation convolution layers, in which neurons within a feature map do not share the same convolution kernel. ATR-CNN further adopts an alternating training strategy, i.e., the weight parameters of a given layer are not updated by backpropagation during a given training epoch.
Highlights: Relaxation convolution can enhance the learning capacity of the network.
Limitations: Simply replacing traditional convolution layers with relaxation convolution layers does not further improve recognition accuracy.

BP-NN [32]
Methodology: Improves a backpropagation neural network through the selection of initial weights, activation function, error function, and so on.
Highlights: Improves both the speed and accuracy of offline handwritten Chinese character recognition.
Limitations: Convergence is slow, and the network easily falls into local minima.

HCCR-IncBN [33]
Methodology: Exploits the sparse connectivity of the Inception module, performing convolutions on the same input feature map at multiple scales and repeatedly compressing channels with 1 × 1 convolution kernels, which increases network depth while reducing computational cost.
Highlights: Fewer training parameters, faster convergence, and only 26 MB of storage for the entire model.
Limitations: Recognition accuracy is relatively low.

SqueezeNet [34]
Methodology: Retains small convolution kernels instead of large ones, and uses an inter-layer feature fusion algorithm together with a softmax function under an L2-norm constraint.
Highlights: Fewer model parameters, faster training, and strong portability.
Limitations: Model accuracy drops.
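Both HCCR-IncBN and SqueezeNet above rely on the same parameter-saving idea: compress channels with 1 × 1 kernels (or keep kernels small) before the expensive spatial convolutions. A minimal sketch of the weight-count arithmetic, using illustrative channel sizes chosen here for the example (not taken from the cited papers):

```python
def conv_params(in_ch, out_ch, k):
    # Weight count of a k x k convolution layer mapping
    # in_ch input channels to out_ch output channels (bias ignored).
    return in_ch * out_ch * k * k

# Direct 3x3 convolution: 256 -> 256 channels.
direct = conv_params(256, 256, 3)

# Bottleneck: 1x1 compression to 64 channels, then a 3x3 back to 256,
# in the spirit of the Inception / SqueezeNet designs summarized above.
bottleneck = conv_params(256, 64, 1) + conv_params(64, 256, 3)

print(direct, bottleneck)  # 589824 163840
```

With these sizes the bottleneck needs roughly 3.6× fewer weights than the direct 3 × 3 layer, which is why such models converge faster and fit in a small storage budget, at some cost in accuracy.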