Abstract

Wearable sensors play a vital role in detecting human motion and offer an alternative, intuitive form of human–computer interaction (HCI). In this study, we present a novel real-time wearable system for finger air-writing recognition in three-dimensional (3D) space based on the Arduino Nano 33 BLE Sense as an edge device, which runs TensorFlow Lite to perform recognition and classification on the device itself. The system gives users the freedom and flexibility to write characters (10 digits and 26 English lower-case letters) in free space by moving a finger; a deep learning algorithm recognizes the 36 characters from motion data captured by the inertial measurement unit (IMU) and processed by the microcontroller, both embedded in the Arduino Nano 33 BLE Sense. We collected 6,300 air-writing stroke samples from 35 subjects (18 males and 17 females) to train a convolutional neural network (CNN) and achieved a high recognition accuracy of 97.95%.

1. Introduction

Living in the digital world, almost every aspect of our lives involves HCI, sometimes even without our noticing. On the output side of HCI, the most advanced technologies are virtual and augmented reality, which enable users to see results through dedicated glasses [1] rather than displays or screens, making the output portable and more convenient. For input, however, breakthroughs are still being sought.

Conventionally, keyboards and touchscreens are used for accurate input. However, they do not fit all situations, such as use by people with visual impairment or use in the dark, and they cannot be carried around at all times. Easy accessibility and fast input speed are why writing on paper with a pen is still preferred by most people [2]. However, pens and paper are an extra burden to carry, and real-time digital character input is required, particularly in HCI environments. Furthermore, converting handwritten pages into a digital format takes considerable effort and time, especially when handwritten numbers and letters must be recognized. To enable natural and user-friendly handwriting input anytime and anywhere, wearable devices with IMUs, which contain accelerometers and gyroscopes, and microcontrollers are employed to collect hand motion data and perform handwriting recognition [3].

With the introduction of low-power microcontrollers equipped with advanced analysis tools, complex data analysis using deep learning algorithms can now be performed on edge devices, greatly improving the user experience. An edge device can be a single-board computer (SBC) such as the Raspberry Pi, Arduino Nano 33 BLE Sense, or SparkFun Edge. Most of them can run deep learning algorithms using TensorFlow Lite, a lightweight framework built on the TensorFlow platform [4] that aims to perform deep learning tasks (classification, recognition, etc.) on edge devices without sending data to central stations or the cloud, an approach called edge computing [5, 6]. Edge computing incurs lower latency and preserves data integrity, since the data are processed on the edge device rather than being sent to the cloud or local servers. Edge AI, or edge deep learning, is an indispensable part of edge computing that moves deep learning models and algorithms from the cloud to edge devices. Some researchers have used the Arduino Nano 33 BLE Sense with TensorFlow Lite as a wearable edge device for real-time handwriting recognition, in some cases achieving high accuracy [2, 5].

In this paper, a novel real-time air-writing character recognition wearable system based on edge computing and deep learning is proposed. Air-writing refers to writing letters and numbers with the fingers in 3D space; it lacks haptic and visual feedback and relies mainly on the user's writing habits and orientation [1]. In this work, the Arduino Nano 33 BLE Sense, which combines an IMU and a microcontroller, is worn on the user's index finger to detect dynamic movements and perform air-writing recognition with TensorFlow Lite, revealing results immediately after the finger motion finishes. A five-layer CNN is presented, which has been converted into the TensorFlow Lite format to run on the Arduino Nano 33 BLE Sense. For finger stroke input, users only need to wear the Arduino Nano 33 BLE Sense on the index finger, without carrying extra dedicated devices. The real-time air-writing recognition system recognizes 10 digits and 26 lower-case letters and lets users make finger strokes at their preferred speed without small-space limitations, achieving a high recognition accuracy.

The rest of this article is organized as follows: Section 2 reviews the current research status and related work on finger stroke recognition. Section 3 describes the finger stroke recognition system, including the hardware and communication implementation, data acquisition, data preprocessing, and the neural network structure. Section 4 presents the experimental results and a detailed analysis of finger stroke recognition. Section 5 discusses the findings, and Section 6 concludes the paper and outlines future work.

2. Related Works

Handwriting recognition (HWR) has received wide interest in the research community in recent years, owing to the growing demand for recognition applications that improve human–computer interaction, and different approaches have been explored for HWR.

One approach, based on computer vision, uses cameras and scanners for finger motion detection and finger stroke recognition [7]. To improve recognition accuracy, deep learning algorithms have been applied with satisfactory performance [8–10]. Recent work in the field has used digital devices to capture the dynamic writing process or finger strokes instead of static images [11, 12]. Similar to our work, optical sensors have been applied to air-writing recognition, with test results for English and numeric characters reaching an average accuracy of more than 90% [13]. However, these methods all share the limitation that lighting conditions strongly influence the recognition results.

Meanwhile, motion sensors for HWR have achieved some success. In this approach, users generally hold or wear motion sensors that capture data for HWR [1]. Shintani et al. introduced a new digital pen for alphabet recognition [2]; the pen used data from force sensors and IMUs for recognition but required a surface to write on. Amma et al. introduced a wearable input system for air-writing recognition [14]; the system deployed motion sensors and IMUs attached to the back of the hand, liberating users from extra writing surfaces and allowing them to write in free space. Yanay and Shmueli also proposed an air-writing recognition system using smart bands [1]. Chen et al. published two companion articles illustrating problems in air-writing recognition [15, 16]. Air-writing has attracted researchers' attention because of its inherent advantage of writing freely. However, the works above did not run online or on-device, requiring time and bandwidth to transmit data from the motion sensors to central stations for analysis.

After data are collected by optical devices or motion sensors, deep learning algorithms are applied for data analysis. Among them, edge deep learning (deep learning on edge devices) has attracted researchers' attention because it enhances efficiency and yields a better user experience. A recurrent neural network (RNN) was employed in a wearable sensor-based system for activity prediction and surpassed other traditional methods [17]. An RNN combined with a 1-dimensional CNN was used to process the overall spatial information and the temporal information of each frequency domain of acoustic features for speech recognition, reaching a high weighted average recall [18]. A lightweight real-time fault detection system adopting a long short-term memory (LSTM) recurrent neural network for edge computing was proposed and showed satisfactory performance [19]. However, research on edge HWR remains scarce, and few works can be found. A digital pen equipped with force sensors and IMUs was introduced for alphabet recognition, with data recognition performed on an SBC [2]. Wehbi et al. presented an online HWR system based on IMUs for digitizing text written on paper, building an end-to-end system that processes sensor recordings and outputs the interpreted digital text on a tablet [3]. Wehbi et al. also introduced an online HWR system for writing on normal paper with a sensor pen [20]. For air-writing recognition on edge devices, only a few related works on finger stroke recognition have been published. Coffen et al. constructed a multilayer long short-term memory model to analyze data collected from a finger-worn ring-profile device and reached per-stroke accuracies from 75% to 95%, but their attempt to convert the model to a compressed TF Lite format to run on-device did not succeed [5]. Jiang et al. proposed a wearable deep learning system capable of processing data locally on the end device [21]. Moreover, air-writing is more complex than plain finger stroke recognition and involves subtler distinctions.

In this work, a novel real-time wearable system for air-writing finger stroke recognition using the Arduino Nano 33 BLE Sense as an edge device is presented. The system enables users to write characters (10 digits and 26 English lower-case letters) freely in 3D space by moving the index finger and achieves a high recognition accuracy of 97.95%.

3. System Description

The proposed system aims to realize a low-power, wearable, online air-writing recognition system based on edge computing. Figure 1 illustrates the overall architecture of the system and the interdependencies among its modules. The system mainly consists of an IMU, an Arm Cortex®-M4F processor, and a Bluetooth module, all integrated on the Arduino Nano 33 BLE Sense board (Figure 2), a single-board computer (SBC) [22]. Users wear the prototype on the index finger and draw characters in the air, and the system performs finger stroke recognition online via edge computing. More specifically, the IMU captures the motion signals and sends them to the processor; TensorFlow Lite running on the Arm Cortex®-M4F processor then recognizes the finger strokes, and the recognition results are transmitted to a terminal for display over Bluetooth Low Energy (BLE) [23]. Figure 3 shows the device prototype and how it is worn.
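
On the terminal side, any BLE-capable host can subscribe to the board's notifications to display the recognition results. The following is a minimal sketch using the Python bleak package; bleak is our choice of client library, not necessarily the software used in the original setup, and the device address and characteristic UUID are hypothetical placeholders.

```python
import asyncio

from bleak import BleakClient

DEVICE_ADDRESS = "XX:XX:XX:XX:XX:XX"  # hypothetical: the board's BLE address
RESULT_CHAR_UUID = "00002a3d-0000-1000-8000-00805f9b34fb"  # hypothetical characteristic

def on_result(_, data: bytearray):
    # Each notification carries one recognized character from the board.
    print("Recognized:", data.decode())

async def main():
    async with BleakClient(DEVICE_ADDRESS) as client:
        # Subscribe to the recognition-result characteristic.
        await client.start_notify(RESULT_CHAR_UUID, on_result)
        await asyncio.sleep(60.0)  # keep listening for a minute

asyncio.run(main())
```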

Table 1 lists low-cost SBCs that support edge computing and provide viable options for implementing deep learning algorithms with TensorFlow Lite. After comprehensively considering the factors in Table 1, we chose the Arduino Nano 33 BLE Sense for this study: its small size, easy integration, and edge computing capability make the board ideal for wearable devices.

(i) Inertial measurement unit (IMU). IMUs comprise accelerometers, gyroscopes, and magnetometers. The accelerometer measures acceleration (the rate of change of velocity), the gyroscope measures rotation (angular velocity), and the magnetometer measures bearing in a magnetic field. An IMU fuses the inputs of these sensors to record movement accurately [24]. In this study, the 3-axis accelerometer and 3-axis gyroscope of the LSM9DS1 IMU are used, providing six degrees of freedom in a single integrated circuit. The LSM9DS1 was chosen for its accuracy (a linear acceleration full scale of ±2/±4/±8/±16 g and an angular rate of ±245/±500/±2000 dps) and its small size, which allows the device to be worn on the index finger [23].

(ii) Microcontroller unit. A microcontroller is needed to read and process the motion data from the IMU and classify them by running a lightweight neural network locally. In this work, the Arm® Cortex®-M4 32-bit processor in the nRF52840 microcontroller, a system-on-chip (SoC), performs inference of the classifier on the device. The nRF52840 has 1 MB of flash memory and 256 KB of RAM and is clocked at 64 MHz. It operates at 3.3 V and has built-in battery charging circuitry [23].

(iii) Communication unit. Bluetooth, with its low power consumption, flexibility, and wide compatibility, is used for communication between the microcontroller and the terminal. Our system uses the BLE and 2.4 GHz protocol stacks of the nRF52840 [23].

3.1. Data Acquisition

As the subjects in this study, 35 students from our institute were recruited to collect data for CNN model training. The sexes were nearly equal, with a slight male majority of 18 (51.4%) students. Regarding dominant hands, 28 (80%) of the 35 students were right-handed and 7 (20%) were left-handed. Figure 4 depicts the detailed distribution of these properties, and Figure 5 provides the age distribution of the subjects. In addition, the subjects' writing speeds and ranges of motion were also taken into account.

First, each subject received instructions on the air-writing approach before data collection. The subjects were asked to wear the air-writing recognition system on the index finger and draw each character, from digits to letters (as shown in Figure 6), vertically in 3D space within 2 seconds; the data for all characters were recorded successively without interruption. In this phase, the data were recorded at a sampling rate of 119 Hz, and the finger Graffiti unistrokes used are shown in Figure 7.

Data recording can be affected by faults during the collection process. To ensure that our model was trained on valid data, all hovering data before and after each finger stroke recording were trimmed out. This was achieved by removing, at the beginning and end of a record, the data whose IMU readings fell below a prespecified threshold of 1.8 g (if the sum of the readings of the three accelerometer axes, x, y, and z, is less than 1.8 g, the sample is not recorded). When air-writing begins and the motion parameters rise, rapidly exceeding the threshold, that sample is recorded as the start point; when air-writing finishes and the motion parameters drop below the threshold, that sample is recorded as the end point. The records between the start and end points are taken as the input signal of a single character. The threshold was determined through experiments that monitored the IMU values while hovering with the air-writing system. This method avoids recording unconscious movements that are not associated with the writing itself and effectively limits zero drift. As another layer of quality control, at the end of each character recording, users were given the option to discard the recording and rerecord the character from scratch. Before the recordings, subjects practiced the air-writing method: they were asked to write the Graffiti unistrokes defined in Figure 7 as if writing on a blackboard, with the index finger acting as chalk. The actual recordings began once they were familiar with this writing style, and each subject provided 5 sets of all 36 characters (digits and letters), resulting in 180 samples per subject and 6,300 samples in total.
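
As an illustration of the trimming rule, the sketch below assumes recordings arrive as NumPy arrays of six IMU channels; summing the absolute accelerometer values is our interpretation of the thresholding described above.

```python
import numpy as np

ACC_THRESHOLD_G = 1.8  # motion threshold determined experimentally (see text)

def trim_hover(samples: np.ndarray):
    """Trim hovering data before and after a finger stroke.

    `samples` is an (N, 6) array of [ax, ay, az, gx, gy, gz] rows, with
    acceleration in g. Leading and trailing rows whose summed accelerometer
    magnitude stays below 1.8 g are removed.
    """
    acc_sum = np.abs(samples[:, :3]).sum(axis=1)  # |ax| + |ay| + |az|
    active = np.flatnonzero(acc_sum >= ACC_THRESHOLD_G)
    if active.size == 0:
        return None  # no significant movement in this recording
    start, end = active[0], active[-1]
    return samples[start:end + 1]  # the input signal of a single character
```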

3.2. Air-Writing Finger Stroke Recognition with CNN
3.2.1. Data Preprocessing

Data preprocessing is one of the most critical steps in the air-writing recognition system and comprises data segmentation and feature extraction. After the data are collected, we split the motion data into datasets that each contain a single character so that the CNN can extract the important features of the finger strokes from the input signal. Figure 8 shows examples of the filtered IMU data of the air-written characters “0,” “2,” “a,” and “c” after segmentation. Comparing the diagrams of these characters, there are clear differences that distinguish the characters even to the naked eye.

After data segmentation, we stored the dataset for each air-written digit and English lower-case letter in CSV format as training samples for air-writing recognition, which is convenient for CNN feature extraction. Of the dataset, 60% was reserved for training, 20% for validation, and the remainder for testing; the split was randomized using the Python random package.
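
A minimal sketch of such a split, assuming the per-character samples have been loaded into a list; the seed is a hypothetical choice for reproducibility.

```python
import random

random.seed(42)  # hypothetical seed so the split can be reproduced

def split_dataset(samples):
    """Shuffle samples and split them 60/20/20 into train/val/test."""
    samples = samples[:]          # copy so the caller's list is untouched
    random.shuffle(samples)
    n = len(samples)
    n_train = int(0.6 * n)
    n_val = int(0.2 * n)
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]  # remaining ~20%
    return train, val, test
```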

3.2.2. CNN

We observed that participants in our study wrote the same character in multiple ways, which makes it difficult to distinguish different styles of the same character. In addition, the angle of the device tip changes depending on where the system is worn on the index finger, which also affects the accuracy of the system.

CNNs are a class of artificial neural networks that have been widely used in human activity recognition [25–30]. In this study, we designed a five-layer CNN for finger stroke recognition using the Keras frontend of TensorFlow. Figure 9 shows the structure of the CNN model employed in this research. In the five convolution layers, features were extracted from the input data through convolutional filters, and the three pooling layers summarized the extracted features. After three of the convolution layers, max pooling was applied to reduce the spatial dimensions of the input, finally producing a vector with fewer dimensions. Dimensionally, after the input data passed through the five convolution layers and three pooling layers, an output with 72 channels was produced.

3.3. The Selection of CNN Architecture

The number and size of the convolution filters, the number of convolution blocks, and the dropout probability are four parameters that significantly affect the performance of a CNN [31]. Through an ablation study of 12 CNNs with different configurations and parameters (Table 2), the optimal network structure was determined. All other parameters were held constant during training: each convolution block used the same nonlinear activation function and a down-sampling pooling layer; each convolution layer was followed by a dropout layer to mitigate the overfitting and loss of generalization caused by the limited training data; and the network ended with fully connected (FC) dense layers of sizes 64 and 40, both with ReLU activation, plus one dense layer of size 36 with softmax activation to provide the desired multiclass output.

The results of this experiment are shown in Table 2, where configurations with five convolution blocks clearly achieved higher accuracy than otherwise similar configurations with only three [31]. Thus, N-10 was selected as the final neural network. Each model in the experiment was trained for up to 2000 epochs of 500 steps each.
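
As a concrete illustration, the following Keras sketch assembles a network matching this description: five convolution blocks with dropout, three max-pooling stages, and the 64/40/36 dense head. The input shape, filter counts, and kernel/pool sizes are our assumptions (the values in Table 2 are not reproduced here); with 119 Hz sampling and a 2 s window, one plausible input is 238 time steps by 6 IMU channels, and the last block's 72 filters match the 72-channel output mentioned above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Assumed input: 238 time steps (2 s at 119 Hz) x 6 IMU channels
# (3-axis accelerometer + 3-axis gyroscope). Filter counts and
# kernel/pool sizes are illustrative placeholders, not Table 2 values.
def build_model(input_shape=(238, 6), num_classes=36, dropout_rate=0.2):
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    # Five convolution blocks; three of them are followed by max pooling.
    for filters, pool in [(8, True), (16, True), (24, True),
                          (48, False), (72, False)]:
        model.add(layers.Conv1D(filters, kernel_size=3,
                                padding="same", activation="relu"))
        model.add(layers.Dropout(dropout_rate))  # mitigate overfitting
        if pool:
            model.add(layers.MaxPooling1D(pool_size=2))
    model.add(layers.Flatten())
    # FC head from the paper: dense 64 and 40 (ReLU), dense 36 (softmax).
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dense(40, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model

model = build_model()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Training regime from the paper: up to 2000 epochs of 500 steps each, e.g.
# model.fit(train_ds, validation_data=val_ds, epochs=2000, steps_per_epoch=500)
```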

4. Results

In our system, the finger strokes of 36 characters, comprising the 10 digits from 0 to 9 and the 26 lower-case letters of the English alphabet, were recognized. The real-time recognition system communicated with the terminal via BLE to display the recognition results. The combination of accelerations and angular velocities served as the feature parameters for air-writing recognition. Automatic segmentation of the finger stroke data into significant movement segments allowed users to provide input in 3D space with the air-writing recognition system, and a confusion matrix was generated to reveal associations between similar finger strokes.

4.1. CNN Model Evaluation

The CNN algorithm was programmed and trained on a computer with an NVIDIA RTX 3080 (10 GB of GDDR6X memory, 8704 CUDA cores), an Intel Core i7-7700HQ at 2.80 GHz, and 32 GB of RAM. With the limited test data, the CNN model achieved a validation accuracy of approximately 95.12%, a training accuracy of 98.23% (Figure 10), and per-character evaluation accuracies from 83% to 100% on the confusion matrix (Table 3). The loss curves versus epochs are shown in Figure 11, where the training and validation losses follow similar trajectories.

Note that the validation loss diverges from the training loss but maintains a slightly negative slope, indicating no major loss of generalization. The gap between the validation and training losses suggests that the generalization of the model is not ideal and that some overfitting may be present. However, the validation loss does not increase but remains relatively constant as the number of epochs grows, indicating that the model does not become overly specialized to the training data.

In addition, the 36-character confusion matrix of the CNN classifier for the test set can be found in Table 3; it was generated with Matplotlib, a portable Python plotting package.
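
A minimal sketch of how such a matrix can be computed and rendered, assuming y_true and y_pred are integer label arrays for the 36 classes; the use of scikit-learn here is our substitution for whatever tabulation produced the paper's table.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Class labels: the 10 digits followed by the 26 lower-case letters.
LABELS = [str(d) for d in range(10)] + [chr(c) for c in range(ord("a"), ord("z") + 1)]

def plot_confusion(y_true, y_pred):
    """Row-normalized 36x36 confusion matrix rendered with Matplotlib."""
    cm = confusion_matrix(y_true, y_pred, labels=list(range(36))).astype(float)
    cm /= cm.sum(axis=1, keepdims=True)  # per-class recognition rates
    fig, ax = plt.subplots(figsize=(10, 10))
    im = ax.imshow(cm, cmap="Blues")
    ax.set_xticks(range(36))
    ax.set_xticklabels(LABELS)
    ax.set_yticks(range(36))
    ax.set_yticklabels(LABELS)
    ax.set_xlabel("Predicted character")
    ax.set_ylabel("True character")
    fig.colorbar(im, ax=ax, fraction=0.046)
    fig.tight_layout()
    plt.show()
```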

4.2. Recognition Test for Characters

We conducted a practical test to evaluate the air-writing recognition system. The experimental equipment consisted of the IMU sensors for obtaining finger motion data, the microcontroller for edge computing, and a terminal for displaying the recognition results [32]. As an edge device, the Arduino Nano 33 BLE Sense cannot load the .tflite model file that the CNN is saved in; thus, the generated .tflite file must be converted into a C byte array so that it can be compiled in the Arduino IDE and stored in the read-only program memory of the device. Sequential inference was then run on the device using the TensorFlow Lite C++ library, and data from the IMU were processed to recognize the finger strokes [4]. Figure 12 shows a user writing characters in free space with the prototype, with the recognized finger strokes displayed on the terminal in real time.
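
The host-side conversion step might look as follows, assuming the trained Keras model from Section 3.3; producing the C array with the xxd utility is a common convention and is our assumption about the exact tooling.

```python
import tensorflow as tf

# Convert the trained Keras model to the TensorFlow Lite flat-buffer format.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # shrink the model for the MCU
tflite_model = converter.convert()

with open("air_writing.tflite", "wb") as f:
    f.write(tflite_model)

# The flat buffer is then rendered as a C byte array so it can be compiled
# into the sketch and stored in read-only program memory, commonly with:
#   xxd -i air_writing.tflite > model_data.h
```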

To evaluate the performance of the CNN in this system, we used four indicators: accuracy, precision, recall, and F1 score [33–35]. Equations (1)–(4) and Table 4 show how the accuracy, precision, recall, and F1 score are derived from the confusion matrix of a two-class classification. These four expressions are among the most frequently used performance indicators for machine learning models.
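
Written out in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), the standard definitions, which we assume match Equations (1)–(4), are:

```latex
\begin{align}
\text{Accuracy}   &= \frac{TP + TN}{TP + TN + FP + FN}, \tag{1}\\
\text{Precision}  &= \frac{TP}{TP + FP}, \tag{2}\\
\text{Recall}     &= \frac{TP}{TP + FN}, \tag{3}\\
F_1\ \text{score} &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}. \tag{4}
\end{align}
```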

The evaluation results of the performance of the CNN model in this system are shown in Table 5.

5. Discussion

The CNN-based classification model in this paper classifies the 36 finger strokes with an average accuracy of 97.95%. Before acquiring the test results, and in consideration of the similarity of trajectories among different finger Graffiti unistrokes, we assumed that several finger strokes might be misrecognized by the classification model. However, our results contradicted this assumption. For example, we assumed that “c” and “o” would be confused, as their finger stroke trajectories are simple and their movement directions are similar, yet the classification model distinguished them with high accuracy in actual testing.

In contrast, we assumed that “e” would not be confused at all owing to its complex stroke, yet the results showed that it is confused with “a” and “q.” As seen in the confusion matrix in Table 3, the highest finger stroke accuracy reached 100%, for characters such as “1” and “3,” and the lowest recognition performance was 83%, for both letters “f” and “l.” The confusion between “l” and “v” can be explained by the similarity of their finger Graffiti unistrokes, both having an angle pointing to the right. Similarly, “f” and “r” are confused (in the recognition of “f,” there is a 17% probability that it is identified as “r”), with the only obvious difference being that the finger returns to complete the circle for “f.” Although the results do not match our original conception exactly, they verify that the air-writing recognition system recognizes each character with a high recognition rate.

6. Conclusion and Future Works

In this study, we presented a wearable air-writing recognition system based on the Arduino Nano 33 BLE Sense as an edge device, which collects finger stroke data from the built-in IMU sensor to recognize 10 digits and 26 English lower-case letters. The TensorFlow Lite-based CNN algorithm, which combines acceleration and angular velocity data to recognize finger strokes in the air, proved effective and efficient in real time for our system and achieved a high recognition accuracy of 97.95%.

Our experimental results showed that the proposed method is practical for real-time finger stroke recognition. Users can effectively input text in a virtual environment beyond vision-based, touchscreen, keyboard, and mouse interactions. This recognition system could also support other natural interaction techniques, such as virtual reality input and real-time user activity recognition.

Although the presented method is effective for recognizing air-written characters, it still has some limitations. The system only recognizes a single character at a time and requires users to split words into individual letters because of the sampling method and data size. We therefore plan to investigate adding the dynamic time warping (DTW) algorithm and to extend the system to recognize sequences of words. We will also collect more data and incorporate the LSTM algorithm to improve the accuracy and robustness of the system [36–40]. In addition, we will quantitatively and qualitatively evaluate the performance of the air-writing recognition system and identify potential problems in different daily life scenarios.

Data Availability

The data used to support this research are available from the corresponding author on request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Authors’ Contributions

Hongyu Zhang and Lichang Chen contributed equally to this work.

Acknowledgments

This research was funded by the National Natural Science Foundation of China (32101616).