Abstract
Recognition and understanding of sign language can aid communication between deaf and nondeaf people. Recently, research groups have developed sign language recognition algorithms using multiple sensors. However, such multisensor systems are cumbersome for everyday use, so the number of sensors should be minimized if communication is not to keep relying on a sign language interpreter. In this study, a sign language classification method was developed using a single accelerometer to recognize the Korean sign language alphabet. The accelerometer is worn on the proximal phalanx of the index finger of the dominant hand. Triaxial accelerometer signals were used to segment the sign gesture (i.e., the time period when a user is performing a sign) and to recognize the 31 Korean sign language letters (corresponding to a chance level of 3.2%). The vector sum of the accelerometer signals was used to segment the sign gesture with 98.9% segmentation accuracy, which is comparable to that of previous multisensor systems (99.49%). The system classified the Korean sign language alphabet with 92.2% accuracy, which is higher than that of a previous work on the same sign language alphabet classification task. The findings demonstrate that a single accelerometer with simple features can be reliably used for Korean sign language alphabet recognition in everyday life.
1. Introduction
Hearing-impaired or deaf people generally use sign language and finger spelling to communicate with others. However, their communication with those who are not familiar with sign language is limited. This limitation can become a social barrier between hearing-disabled and nondisabled people, leading to the social isolation of deaf persons. Various approaches based on computer vision and wearable sensor systems have been developed to understand or recognize sign language or finger spelling [1–4]. For example, Rivera-Acosta et al. proposed an American sign language alphabet translation system using neuromorphic camera sensors [1]. Tao et al. developed an American sign language alphabet recognition system using Microsoft Kinect motion data [2]. Such vision-based systems do not require signers to wear complicated instruments. Although these motion data-based recognition systems achieved reliable recognition accuracy, they have their own limitations in daily life.
Wearable sensor-based sign language systems have also been widely investigated; most of them are glove-type systems. Compared with vision-based systems, their advantages are mobility and comfort. Glove-incorporated strain gauge sensors or inertial measurement units (IMUs) have been used to directly detect finger movements [5–7]. Suri and Gupta developed a wearable IMU device consisting of an accelerometer and a gyroscope for sentence-level sign recognition; the recognition accuracy was 94.0% using a deep neural network model [8]. Mummadi et al. developed a data glove with five IMUs on the fingertips; the recognition accuracy was 93.0% using a random forest model [9]. A combination of IMUs and electromyography (EMG) sensors has also been used to build sign language systems intended for ease of use in daily life, although such systems infer finger movements indirectly from wrist and muscle activity [10]. For example, Yeo and Shin interpreted the Korean sign language alphabet using a four-channel EMG, an accelerometer, and a gyro sensor; the recognition accuracy was 95.3% for 6 consonants and 92.4% for 6 vowels using a Gaussian model [10]. Paudyal et al. developed a real-time sign language recognition model with 95.4% recognition accuracy using an eight-channel EMG and a nine-dimensional IMU sensor [11]. However, EMG-based systems are uncomfortable for users because they require skin contact, and they also have high power consumption due to the high sampling rate.
Furthermore, researchers have used accelerometer sensors on the fingertips to recognize sign language because of their small mass and the low sampling rate required. For instance, Bui and Nguyen developed a glove with 6 accelerometers to distinguish 23 Vietnamese sign language letters with 95% accuracy [12]. To make such systems usable in everyday life, researchers have also detected the onset and offset times of sign gestures from the accelerometer signals [13, 14]. Ibarguren et al. segmented sign gestures with 99% accuracy in real time using an accelerometer sensor on the back of the hand [14].
Despite their high recognition accuracy, the aforementioned multisensor systems have limitations for daily use. First, multisensor systems involve high power consumption because they comprise multiple high-power sensors and high-performance processors [10, 15]. Second, their bulky design makes them infeasible for long-term use [7, 16]. Third, the additional signal processing they require (e.g., normalization and synchronization) can incur high computational costs, and a higher computational cost directly increases the timing delay. Therefore, minimizing the delay between gesturing and classification has a significant impact on how natural the human-machine interface appears to the user [17]. To reduce the computational cost, simple classification algorithms can be used instead of complex algorithms (e.g., neural networks) to extend the battery lifespan, although accuracy may be reduced. For everyday use, a sign language recognition system should balance design, computational cost, and accuracy. Therefore, a single accelerometer-based wireless system with a low-cost computational algorithm could overcome the limitations of multisensor recognition systems.
In this study, we developed a novel approach using a single accelerometer-based sensor worn on the index finger to recognize the Korean sign language alphabet. This study makes the following contributions:
(i) It develops a sign language alphabet classification method that uses a simple machine learning algorithm and a minimal set of features.
(ii) It introduces an effective method of sign language segmentation using a single accelerometer, with low computational time.
(iii) It investigates the feasibility of a single accelerometer-based sign language recognition system.
2. Materials and Methods
2.1. Motion-Sensing Module
The motion-sensing module included a triaxial accelerometer (BMA250, ±2 g; Bosch, Germany) and an nRF51822 system-on-chip (SoC) with an integrated 2.4 GHz RF/Bluetooth 4.0 transceiver (Nordic Semiconductor, Norway) [18], attached to the index finger of the glove as shown in Figure 1. The module was powered by a rechargeable battery and controlled from an Android-based smartphone via Bluetooth. Sampling rates in similar research ranged from 100 Hz to 1 kHz [6, 7, 12, 15]; because a higher sampling rate leads to higher power consumption [19], we adopted a sampling rate of 20 Hz to avoid excessive power use. A custom-made SoC program controlled the acceleration data acquisition. The motion-sensing module was housed in a custom case fabricated with a 3D printer and attached to the index finger of the glove.

2.2. Data Acquisition
Fifteen subjects voluntarily participated in the study (7 males and 8 females, average age 22.5 years); they had no musculoskeletal disorders in their hands and no hearing impairment. The Korean sign language alphabet consists of 31 letters (14 consonants, 10 basic vowels, and 7 double vowels). The participants were asked to wear the glove with the motion-sensing module attached to the index finger and to sign the 31 Korean sign language letters, as shown in Figure 2.

The experiment consisted of two sessions; in each session, subjects were asked to sign each of the 31 Korean sign language letters 10 times in random order, as shown in Figure 3. The subject sat in an armchair during the experiment. During the baseline and rest periods, the subject was asked to rest the arm on the armrest with the wrist at the end of the armrest, without applying force. Throughout the experiment, subjects kept their arm on the armrest and used only their fingers and palm to express each sign gesture. The sign order was displayed on a monitor, and each gesture was maintained for 3 s, with a 6 s interval between gestures and a 10 min break between sessions. The acceleration data therefore contained 620 gestures per subject, i.e., 20 samples for each Korean sign language letter. All study procedures were reviewed and approved by the University of Ulsan Institutional Review Board.

2.3. Preprocessing
The motion data were postprocessed on a PC using a custom-made MATLAB program (MathWorks, USA). Figure 4 summarizes the motion data processing procedure for sign language classification. Baseline drift correction was performed using the mean of the acceleration data from 5 to 7 s of the baseline period. The baseline-corrected signal was then segmented by the automatic segmentation process, and features were extracted from each segment for the classification of the Korean sign language letters.
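As a minimal sketch of this step (with assumed variable names and array layout, not taken from the authors' code), the baseline drift correction can be written in MATLAB as follows:

% Minimal sketch of baseline drift correction (assumed variable names).
% acc is an N-by-3 matrix of raw triaxial acceleration sampled at 20 Hz,
% and the baseline window covers 5-7 s of the baseline period.
fs = 20;                               % sampling rate (Hz)
baseIdx = (5*fs + 1):(7*fs);           % samples between 5 s and 7 s
baseMean = mean(acc(baseIdx, :), 1);   % per-axis mean over the baseline window
accCorrected = acc - baseMean;         % implicit expansion removes the drift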

2.4. Segmentation
Figure 5 shows the segmentation process. The baseline-corrected signals were filtered with a 1st-order 0.1 Hz Butterworth low-pass filter, and the vector sum of the three-axis signals was calculated to detect the gesture period (see Figure 5(b)). The offset of each sign was detected by comparing the vector sum against a threshold set to 0.7 times the mean baseline value over 6 s. The motion data for each sign were then segmented as a 2 s window starting from the detected offset of the gesture, as seen in Figure 5(c). A gesture was counted as detected if both its offset and onset fell within the period in which the user maintained the gesture.
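A minimal MATLAB sketch of this segmentation step is given below; the variable names are assumptions, and the direction of the threshold crossing that marks the gesture offset is likewise assumed, since the text does not fully specify it.

% Sketch of the automatic segmentation step (assumed variable names).
fs = 20;                                    % sampling rate (Hz)
[b, a] = butter(1, 0.1/(fs/2), 'low');      % 1st-order 0.1 Hz Butterworth low-pass
accLP  = filtfilt(b, a, accCorrected);      % zero-phase low-pass filtering
vecSum = sqrt(sum(accLP.^2, 2));            % vector sum of the three axes
thr = 0.7 * mean(vecSum(1:6*fs));           % 0.7 x mean over a 6 s baseline (assumed to be the first 6 s)
below = double(vecSum < thr);               % samples below the threshold
offsetIdx = find(diff(below) == 1, 1) + 1;  % first crossing below the threshold (assumed direction)
segment = accCorrected(offsetIdx:offsetIdx + 2*fs - 1, :);   % 2 s window from the offset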

Table 1 shows the segmentation results. The nondetected signs were manually segmented to extract the features.
2.5. Feature Extraction
Both time domain and frequency domain features can be used to detect motion from an accelerometer signal. Frequency domain features are suited to dynamic motion, which makes them inappropriate here because the motion data are measured for static gestures. Considering this, we focused on time domain features for classification, such as the mean, standard deviation, and sample entropy. The average acceleration on the three axes, the vector sum, and the roll, pitch, and yaw per sign were used as features for classification.
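As an illustration of the feature computation (with assumed variable names), the sketch below computes the per-axis mean, the mean vector sum, and gravity-based roll and pitch estimates for one segmented sign; how yaw was estimated from a single accelerometer is not detailed in the text, so it is omitted here.

% Illustrative feature extraction for one segmented sign (assumed names).
% seg is an M-by-3 acceleration window for the 2 s sign period; the angle
% estimates assume the hand is nearly static, so the mean acceleration is
% dominated by gravity.
meanAcc = mean(seg, 1);                        % mean acceleration per axis
meanVec = mean(sqrt(sum(seg.^2, 2)));          % mean vector sum
roll  = atan2d(meanAcc(2), meanAcc(3));                      % rotation about the x-axis
pitch = atan2d(-meanAcc(1), hypot(meanAcc(2), meanAcc(3)));  % rotation about the y-axis
features = [meanAcc, meanVec, roll, pitch];    % feature vector fed to the classifier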
2.6. Classification Model
A multiclass support vector machine (SVM) was employed to classify the sign language letters. The SVM has few parameters to optimize, which makes it preferable to a neural network model here. The classification model for each subject was built using six different SVM kernels in MATLAB (linear, quadratic, cubic, fine Gaussian, medium Gaussian, and coarse Gaussian), and the kernel was selected by comparing accuracies, as shown in Figure 6. Tenfold cross-validation was employed to overcome the limitation of the available dataset [20, 21]: in each fold, the data were divided into an 18-trial training set (90%) and a 2-trial test set (10%).
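A minimal MATLAB sketch of a quadratic-kernel multiclass SVM with 10-fold cross-validation is shown below; the variable names and the use of feature standardization are assumptions, not details given in the text.

% Sketch of multiclass SVM training and 10-fold cross-validation.
% X is the feature matrix (one row per segmented sign) and y holds the
% 31 letter labels; both are assumed variable names.
tmpl = templateSVM('KernelFunction', 'polynomial', 'PolynomialOrder', 2, ...
                   'Standardize', true);     % quadratic-kernel binary learners
mdl = fitcecoc(X, y, 'Learners', tmpl);      % one-vs-one multiclass SVM
cv  = crossval(mdl, 'KFold', 10);            % 10-fold cross-validation
acc = 1 - kfoldLoss(cv);                     % average cross-validated accuracy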

3. Results
Figure 6 shows the average accuracy in the classification of the 31 Korean sign language letters across the six SVM kernels. All SVMs achieved results better than random chance (one-sample t-test). Since the quadratic kernel SVM was the best classifier (paired t-test), it was employed as the classifier for the single-sensor-based system. Figure 7 summarizes the classification accuracy of the 31 Korean sign language letters obtained from the 15 subjects. The accuracy across sign language letters ranged from 87.1% to 98.6%, as shown in Figure 7(a). The lowest accuracy was observed for the 5th sign language letter, /m/, and the highest for the 24th sign language letter, /i/. As shown in Figure 7(b), the lowest and highest accuracies among subjects were 85.5% (S15) and 98.5% (S14), respectively. The average classification accuracy for the 31 Korean sign language letters was 92.2%, clearly higher than the chance level of 3.2%.

Figure 8 shows the average confusion matrix across all subjects. The correct (actual) sign language letters and the predicted sign language letters are represented on the vertical and horizontal axes, respectively, and the average classification accuracy is color-coded as indicated on the right. Correctly recognized letters lie on the diagonal, whereas misrecognized letters lie off the diagonal. The average diagonal accuracy is 92.2%. In some cases, pairs of sign language letters were interchanged because one letter was mistaken for the other owing to their similar hand postures; such incorrectly predicted pairs tend to be symmetric about the diagonal. Figure 9 shows examples of pairs of sign language letters that were mistaken for each other; each row in Figure 9(a) shows one such pair. The horizontal axis represents the sign language letter index, and the vertical axis represents the misclassification rate. In the first row, the left panel shows the misclassification of the 19th sign language letter, which was most often predicted incorrectly as the 20th sign language letter, at a rate of 5.2%. The algorithm also incorrectly predicted the 4th sign language letter as the 12th, at a rate of 5.1%. Such paired confusions were observed between the 4th and 12th, 7th and 9th, and 21st and 22nd sign language letters. The sign language letter pairs that were often confused with each other have similar hand shapes, as presented in Figure 9(b).


4. Discussion
In this study, a simplified approach using a single motion sensor worn on the index finger was proposed to recognize the Korean sign language alphabet. The SVM-based classification algorithm recognized the sign language alphabet with 92.2% accuracy, far above the 3.2% chance level. Table 2 summarizes the algorithms and accuracies of existing sign language recognition studies.
To validate our approach, we trained our system on the 6 consonant and 6 vowel signs described by Yeo and Shin [10]. Our approach achieved an accuracy of 99.2% for consonant and 98.3% for vowel classification, outperforming their scores of 95.3% and 92.4%, respectively, while requiring only an accelerometer with a low sampling rate. As a comparative approach, Abualola et al. achieved lower accuracy (85.0%) on the American sign language alphabet with a similar chance level (3.9%), even though they used 6 IMU sensors on the fingertips with a high sampling rate [15]. The abovementioned studies used multisensor systems, which face issues in sensor synchronization, computational time, power consumption, and ease of use, whereas single-sensor systems can avoid such problems. These results provide evidence for the potential of a single sensor combined with a simple classification algorithm.
Segmenting a sign language gesture is as critical as high classification accuracy in a real-time system. The system of Yeo and Shin, however, includes no segmentation step [10]. Moreover, a segmentation process whose threshold value is experimentally determined for each user is hard to apply as a general model in real life, whereas our segmentation algorithm uses a value relative to the baseline power and can therefore serve as the basis for a general real-time system. Ibarguren et al. found that accelerometer signals are reliable for detecting sign language gestures; they used an IMU and a genetic algorithm to segment sign language gestures separated by a pause between letters [14]. Although their segmentation accuracy was 99.5%, genetic algorithm-based segmentation requires a high computational cost compared with the proposed automatic segmentation algorithm, owing to the large number of computations required. Because of this high segmentation cost, their approach suffers from timing delays and cannot segment while other signal processing is ongoing. Our results show that a single accelerometer on the index finger is adequate to detect sign language gestures, with 98.9% segmentation accuracy. In other words, the proposed automatic segmentation algorithm can be used in a real-time system to segment sign language gestures at low computational cost.
For a real-time system, computational cost is a crucial factor: systems with a higher computational cost require more time for recognition [17]. Table 3 summarizes the model training and recognition times of representative models. Although Paudyal et al. achieved a recognition delay of only 67 ms, the computational cost was high relative to the hardware capabilities of a smartphone [11]. Mummadi et al. used five IMU devices for classification, with a recognition time of 140 ms [9]; although they used only ten features, the computational cost increased because of the sophisticated classifier. Suri and Gupta used an accelerometer and a gyro sensor for sign language recognition [8]; their system required a long time to train the classification model because it employed a deep neural network, whereas our proposed system requires only approximately 1 s for training. In addition, a timing delay of 100 ms or less is perceived as instantaneous by the user [22]; therefore, our system, with a recognition time of only 1.8 ms, would be perceived as instantaneous. In other words, the proposed system could be used to develop a smartphone-based real-time system without a noticeable delay for the user.
There is still ambiguity about the optimal finger and the optimal position of the accelerometer on the finger for sign language recognition systems; each research group that used accelerometer signals to interpret sign language chose the sensor position independently. The sensor position on the finger can influence system accuracy. In this study, the sensor was placed on the proximal phalanx of the index finger, and the sign language letters misclassified by the algorithm are visually similar, as can be seen in Figure 9. If the sensor were instead placed on the middle part of the index finger, accuracy might improve in a typical case of confusion between “ㅂ” (/p/, 6th) and “ㅍ” (/ph/, 13th). The finger chosen for sensor placement also differs across studies because individual sign languages use different gestures. For example, in American sign language, the index and middle fingers are primarily used [23], whereas Sadek et al. identified the most important fingers (index, ring, middle, and pinky) for Arabic sign language based on statistical analysis [24]. Thus, the choice of finger on which the sensor is worn can also affect system accuracy. Although the proposed system successfully recognized the Korean sign language alphabet using a single motion sensor, it could be improved by determining the optimal finger. These factors need to be investigated in future studies.
In addition, the confusion errors among sign language letters shown in Figure 9 could be overcome by combining the sign language alphabet recognizer with a language model in future work [12, 25–27]. Although implementing a language model would increase the processing time, a simple language model does not require much of it; for example, processing short n-grams (e.g., 2- or 3-grams) takes only a few milliseconds [28, 29]. This could also allow the proposed system to be extended to word- and sentence-level recognition.
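As a purely illustrative sketch (not part of the proposed system), a bigram language model could rescore the classifier output as follows; svmScore, bigram, and prev are hypothetical variables.

% Toy sketch of bigram-based rescoring of classifier outputs.
% svmScore: 1-by-31 classifier scores for the current sign;
% bigram:   31-by-31 matrix with bigram(i, j) = P(letter j | previous letter i);
% prev:     index of the previously recognized letter.
combined = svmScore .* bigram(prev, :);   % weight classifier scores by the bigram prior
[~, letterIdx] = max(combined);           % most likely letter given the context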
Deep neural networks have also been used for precise sign language recognition; however, they require a larger amount of training data than traditional machine learning algorithms. Moreover, traditional machine learning algorithms can outperform neural networks when only a small amount of training data is available, and their computational cost is not as high as that of deep networks. Therefore, the amount of data is vital in deciding between traditional machine learning and deep learning. In this study, the number of samples per sign is 20; therefore, traditional machine learning is preferred. The SVM was considered a state-of-the-art classification model before the revival of deep learning, and it remains a powerful and versatile algorithm well suited to classifying small and medium-sized datasets. Considering these advantages, we employed several SVM kernels to classify the data, and the cubic kernel seemed best fitted to our data, as can be seen in Figure 6. In addition, the performance of classifiers used in previous studies was compared with that of the SVM: linear discriminant analysis and the Gaussian model achieved 91.8% and 53.8%, respectively, both lower than the performance of the SVM optimized in this study. Therefore, the SVM classifier is suitable for sign language alphabet classification using a single accelerometer.
Increasing the number of features might improve the classification accuracy of the system. In this case, however, adding sample entropy, the standard deviation, or a combination of these with the mean feature decreased the recognition accuracy (to 88.0%, 89.6%, and 88.6%, respectively) compared with using only the mean feature (92.2%). These results indicate that the proposed system is not underfitted despite the small number of features.
For future use in everyday life, recognition accuracy should be improved by increasing the number of samples and using more complex classification algorithms. The proposed model was computed offline on a computer; it therefore needs to be adapted for online computation so that it can be implemented on smartphones and receive sign language alphabet data via Bluetooth. In addition, our system could be developed into a smart ring wearable connected wirelessly to a smartphone, which would be convenient for daily use [30, 31]. Future studies should also consider using the proposed system via the Internet of Things.
5. Conclusions
This paper presented an accelerometer-based Korean sign language recognition system that automatically segments sign language alphabet gestures and recognizes the Korean sign language alphabet using only a single accelerometer sensor. The results show that an accelerometer can be employed to build a simple system for interpreting a sign language alphabet. Compared with existing methods, the proposed method not only segments the gestures precisely but also offers comparable recognition of the Korean sign language alphabet. The accelerometer-based system can thus work effectively as an interpreting tool for the Korean sign language alphabet. Furthermore, this lightweight wearable system could be developed into an easy-to-use sign language-interpreting smart ring.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there is no conflict of interest.
Acknowledgments
This work was supported by the 2018 Research Fund of the University of Ulsan.