Abstract

We present a new approach, the Multiscale facial fusion feature (MS3F), for classifying gender from face images. The fusion feature is extracted by combining the Local Binary Pattern (LBP) and Local Phase Quantization (LPQ) descriptors, and a multiscale feature is generated through Multiblock (MB) and Multilevel (ML) methods. A Support Vector Machine (SVM) is employed as the classifier. All experiments are performed on the Images of Groups (IoG) dataset. The results demonstrate that the multiscale fusion feature greatly improves the performance of gender classification, and that our approach outperforms state-of-the-art techniques.

1. Introduction

Gender classification plays an important role in many scenarios. As a demographic attribute, gender is a soft biometric that provides ancillary information about an individual's identity. Moreover, it can improve the performance of face recognition. It is therefore widely used to provide smart services in human-computer interaction applications such as visual surveillance, intelligent interfaces, and intelligent advertising.

A variety of modalities have been used for gender classification, including gait [1], iris [2], hand shape [3], and the human face. The majority of existing work uses the human face, and this paper does as well, since face images provide more useful information than other modalities: they capture distinguishing differences between men and women, such as face contour, hair, and beard.

There are a great number of studies on gender classification. Golomb et al. [4] were the first, training a fully connected three-layer neural network to discriminate gender on a set of 90 face images in the early 1990s. Several similar neural network classifiers were subsequently proposed [5–7]. A few studies [8, 9] conducted on the FERET dataset highlighted the choice of classifier. Gutta et al. [8] proposed a hybrid classification architecture, an ensemble of Radial Basis Function (RBF) networks and Decision Trees (DT). Moghaddam et al. [9] investigated nonlinear Support Vector Machines (SVMs) on low-resolution faces, which proved superior to traditional pattern classifiers such as the Fisher Linear Discriminant and the Nearest Neighbour classifier. Viola et al. [10] developed a visual object detection framework based on AdaBoost, which Shakhnarovich et al. [11] adopted to classify gender at extremely fast speed.

More recently, feature extraction combined with a classifier has been widely used. For example, Yildirim et al. [12] applied a Haar cascade to classify gender from frontal face images. Shan [13] reached an accuracy of 74.9% with boosted Local Binary Pattern (LBP) features combined with an SVM. Bekhouche et al. [14] extracted Multilevel Local Phase Quantization (ML-LPQ) features from normalized face images to predict gender. Other feature types have also been used in classification, such as Gabor [15] and Histogram of Oriented Gradients (HOG) [16]; comparative studies of different features for facial gender classification can be found in [17–19]. Deep feature extraction using pretrained convolutional neural networks (CNNs) is very powerful and has recently been applied successfully in many image domains, but it is rarely applied to IoG for gender classification. Although a CNN architecture can reduce the time required for feature selection, training remains time-consuming, and IoG is not large enough to train a network fully. Whether a CNN is used as a feature extractor (taking the output of a fully connected layer) or as an end-to-end classifier, a sufficiently exhaustive handcrafted feature descriptor can compete with deep learning-based algorithms [20].

However, the above-mentioned methods extract only a single facial feature, from images captured under controlled conditions. Illumination and viewing angle can have a large impact on classification results, and a single feature cannot represent the information in facial images in sufficient detail. Feature fusion is often used to improve feature extraction, and it is also very effective in other computer vision tasks, such as visual tracking [21–25].

In this paper, we propose a new approach, the Multiscale facial fusion feature (MS3F), to classify gender from face images captured under uncontrolled conditions. MS3F is extracted by applying the LBP and LPQ descriptors: each face image is divided into two blocks, LBP is applied to the top block, and LPQ is applied to the bottom block. In addition, the parameters of LBP and LPQ vary with the size of the subblocks. Because each descriptor is computed on only half of the image, the calculation cost is halved. Finally, an SVM is employed as the classifier. The experimental results are presented in Section 3.

2. Our Approach

2.1. Feature Extraction

In gender classification, the dimensionality of the raw data is very high, so feature extraction should be applied before classification, and classification performance largely depends on the choice of feature descriptor. LBP characterizes the spatial structure of local image texture, and LPQ is based on computing the short-term Fourier transform (STFT) over a local image window. Both have been shown to outperform other facial descriptors [13, 19], and they are complementary to each other. Moreover, LBP and LPQ do not require large amounts of data or computational resources. In our approach, the LBP and LPQ descriptors are used together.

LBP is a descriptor proposed by Ojala et al. [26]. It expresses the texture of an image in a local 3 by 3 neighbourhood using a binary code. Each neighbourhood pixel value $g_p$ is compared with the central pixel value $g_c$: if $g_p$ is smaller than $g_c$, the neighbourhood pixel is set to 0; otherwise it is set to 1. The resulting eight-bit binary code is transformed into a decimal value as follows:
$$\mathrm{LBP} = \sum_{p=0}^{7} s(g_p - g_c)\, 2^p, \qquad s(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0. \end{cases}$$
Furthermore, Ojala et al. [27] introduced an extended operator, $\mathrm{LBP}_{P,R}$, where $P$ is the number of pixels in the local neighbourhood and $R$ is the radius of the local neighbourhood. The basic LBP is thus $\mathrm{LBP}_{P,R}$ with $P = 8$ and $R = 1$. An extended $\mathrm{LBP}_{P,R}$ is shown in Figure 1.
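To make the operator concrete, the following Python sketch (our illustration, not code from the paper) computes the basic 3 by 3 LBP code map with NumPy; the clockwise neighbour ordering is one common convention.

```python
import numpy as np

def basic_lbp(img):
    """Basic 3x3 LBP: threshold the 8 neighbours against the centre pixel
    and pack the resulting bits into one decimal code per pixel."""
    img = img.astype(np.int32)
    H, W = img.shape
    c = img[1:-1, 1:-1]                                  # centre pixels g_c
    # 8 neighbours in a fixed clockwise order, taken as shifted views
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for p, (dy, dx) in enumerate(offsets):
        g_p = img[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]  # neighbours g_p
        code |= ((g_p >= c).astype(np.uint8) << p)       # s(g_p - g_c) * 2^p
    return code

# The 256-bin histogram of the code map serves as the texture descriptor:
# hist, _ = np.histogram(basic_lbp(face), bins=256, range=(0, 256))
```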

The LPQ descriptor was proposed by Ojansivu et al. [28] for texture classification, especially for blurred images. It utilizes local phase information extracted by the short-term Fourier transform (STFT), computed over a rectangular $M \times M$ neighbourhood $N_x$ at each pixel position $x$ of the image $f$:
$$F(u, x) = \sum_{y \in N_x} f(x - y)\, e^{-j 2\pi u^{T} y} = w_u^{T} f_x,$$
where $u$ is a frequency, $w_u$ is the basis vector of the STFT at frequency $u$, and $f_x$ is a vector containing all image samples from $N_x$.

In LPQ, only four complex coefficients are considered, corresponding to the frequencies $u_1 = [a, 0]^T$, $u_2 = [0, a]^T$, $u_3 = [a, a]^T$, and $u_4 = [a, -a]^T$, where $a = 1/M$. Stacking the corresponding basis vectors gives the transformation matrix $W = [\operatorname{Re}\{w_{u_1}, w_{u_2}, w_{u_3}, w_{u_4}\},\ \operatorname{Im}\{w_{u_1}, w_{u_2}, w_{u_3}, w_{u_4}\}]^T$. After decorrelation, the signs of the real and imaginary parts of the four coefficients are quantized into an eight-bit binary code, so each pixel is represented as a decimal value between 0 and 255.
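The following Python sketch illustrates this quantization, computing the four STFT coefficients by separable convolution and packing the signs of their real and imaginary parts into an 8-bit code. It omits the optional decorrelation step, and the window size, frequency ordering, and normalization are our assumptions.

```python
import numpy as np
from scipy.signal import convolve2d

def lpq(img, win_size=3):
    """Minimal LPQ: STFT at four low frequencies over an MxM window,
    then quantize the signs of the real/imaginary parts into 8 bits.
    The decorrelation step of the original method is omitted here."""
    img = img.astype(np.float64)
    r = (win_size - 1) // 2
    x = np.arange(-r, r + 1)
    a = 1.0 / win_size
    w0 = np.ones_like(x, dtype=complex)   # DC basis
    w1 = np.exp(-2j * np.pi * a * x)      # frequency a
    w2 = np.conj(w1)                      # frequency -a
    # Separable 2-D STFT: vertical pass with `col`, horizontal pass with `row`
    conv = lambda row, col: convolve2d(
        convolve2d(img, col[:, None], mode='valid'), row[None, :], mode='valid')
    F = [conv(w1, w0),   # u1 = [a, 0]^T
         conv(w0, w1),   # u2 = [0, a]^T
         conv(w1, w1),   # u3 = [a, a]^T
         conv(w1, w2)]   # u4 = [a, -a]^T
    code = np.zeros(F[0].shape, dtype=np.uint8)
    for i, part in enumerate([f.real for f in F] + [f.imag for f in F]):
        code |= ((part > 0).astype(np.uint8) << i)
    # 256-bin histogram of the codewords is the descriptor
    hist, _ = np.histogram(code, bins=256, range=(0, 256))
    return hist / hist.sum()
```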

2.2. Feature Fusion

In most existing work, LBP and LPQ are applied separately to extract facial features. However, LBP is a spatial-domain descriptor operating on pixel gray values, and is therefore sensitive to changes in gray value, whereas LPQ is a frequency-domain descriptor operating on frequency coefficients, reflecting changes in the image gradient. Consequently, using LBP or LPQ alone leaves some useful information unextracted.

In this paper, a fusion feature combining LBP and LPQ is proposed to represent facial information. Considering the calculation cost, the fusion feature is not a direct cascade of the two full-image histograms. Instead, the image is first cut into two subblocks, a different descriptor is applied to each subblock, and the resulting histograms are combined (see Figure 2). In this way, the calculation cost is halved while the feature remains more exhaustive than either descriptor alone.

To improve performance, the face region is divided horizontally rather than vertically: LBP is applied to the top block, while LPQ is applied to the bottom block. Four divisions of the face region are compared to verify that the division adopted in our work achieves the best performance; the experimental results are shown in Section 3.
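As a sketch, the fusion step could look as follows, reusing the lpq helper above together with scikit-image's LBP implementation; the equal half-split point and histogram normalization are our assumptions.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def fusion_feature(face, P=8, R=1, lpq_win=3):
    """Fusion feature: LBP histogram of the top half (hair/eye region),
    LPQ histogram of the bottom half (nose/chin region), concatenated."""
    h = face.shape[0] // 2
    top, bottom = face[:h], face[h:]
    lbp = local_binary_pattern(top, P, R)                # codes in [0, 2^P)
    lbp_hist, _ = np.histogram(lbp, bins=2 ** P, range=(0, 2 ** P))
    lbp_hist = lbp_hist / max(lbp_hist.sum(), 1)
    return np.concatenate([lbp_hist, lpq(bottom, lpq_win)])
```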

2.3. Multiscale Feature

Based on the fusion feature method, Multiblock (MB) and Multilevel (ML) representations are also utilized to improve classifier performance. MB and ML are two kinds of face representation. The main idea of MB is to divide the face into several subblocks and extract features from each subblock, capturing local features. ML combines a series of MBs into a spatial pyramid representation containing both global and local features, as demonstrated in Figure 3.

We use MS3F to represent facial information so that the extracted feature is more precise. The main idea of MS3F is to use different window sizes at different levels, rather than a single size for LBP and LPQ; the window size is determined by the size of the neighbourhood.

In ML, the blocks at different levels have different sizes, and their features express global and local information, respectively. We therefore use a large window size at low levels to extract global features, and a small window size at high levels to extract local features, as shown in Figure 4.
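A possible sketch of the full MS3F pyramid, building on fusion_feature above; the per-level block counts, LBP radii, and LPQ window sizes shown here are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def ms3f(face, blocks=(1, 2, 4, 8), lbp_radii=(3, 2, 1, 1),
         lpq_wins=(7, 5, 3, 3)):
    """MS3F sketch: at each level, split the face into n x n subblocks and
    extract the fusion feature with a level-specific window size
    (large windows at coarse levels, small windows at fine levels)."""
    H, W = face.shape
    feats = []
    for n, R, M in zip(blocks, lbp_radii, lpq_wins):
        bh, bw = H // n, W // n
        for i in range(n):
            for j in range(n):
                block = face[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
                feats.append(fusion_feature(block, P=8, R=R, lpq_win=M))
    return np.concatenate(feats)   # concatenated spatial-pyramid histogram
```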

3. Experimental Results and Discussion

3.1. Experimental Setting

For a fair comparison, the Images of Groups (IoG) dataset is used, the largest database of its kind, collected from Flickr images [29]. The dataset consists of 5050 images with 28231 faces labelled by age and gender (see Table 1).

IoG covers unconstrained facial expressions, different head poses, a wide age range, and close-to-real-world illumination. Some images show people sitting, lying, standing on elevated surfaces, or even wearing dark glasses. In addition, some images in IoG are of low resolution or contain occluded faces or unusual facial expressions (see Figure 5).

With the features extracted from the face images in IoG, we use an SVM to perform classification. The Radial Basis Function (RBF) kernel is utilized, and the parameters are tuned by cross-validated grid search: the penalty term C and the kernel parameter gamma are each searched over a logarithmic grid, multiplying the parameter by a constant factor at each step, and the best-performing pair is selected for the final model. Without loss of generality, 14116 face images are randomly selected for training, while the remaining face images are used for testing. The results obtained with the proposed method are compared with a traditional pattern classifier, Principal Component Analysis (PCA), in Section 3.2.
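A scikit-learn sketch of this training setup follows; the logarithmic grid bounds are common defaults and are our assumptions, while the 14116-image training split matches the text.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

def train_gender_svm(X, y):
    """RBF-kernel SVM with cross-validated grid search over C and gamma.
    X: MS3F feature vectors, y: binary gender labels."""
    param_grid = {'C': 2.0 ** np.arange(-5, 16, 2),      # assumed bounds
                  'gamma': 2.0 ** np.arange(-15, 4, 2)}  # assumed bounds
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=14116,
                                              random_state=0)
    search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5, n_jobs=-1)
    search.fit(X_tr, y_tr)
    return search.best_estimator_, search.score(X_te, y_te)  # test accuracy
```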

3.2. Experimental Results

We evaluated the performance of the proposed method in various experiments. The results show that it achieves higher accuracy than PCA; even a single extracted feature combined with an SVM achieves better accuracy (see Table 2). The performance of gender classification is evaluated as follows:
$$\text{Accuracy} = \frac{N_c}{N_t},$$
where $N_c$ is the number of face images classified correctly and $N_t$ is the total number of face images.

With a satisfactory classifier in hand, the focus of our work turns to feature extraction. As mentioned above, the image is divided before the features are fused. To evaluate the division we adopted, four division experiments are conducted to find the best one (see Table 3): the image is divided horizontally or vertically, with LBP and LPQ each assigned to one of the two blocks.

It is easy to see that the UpLBP-DownLPQ division achieves the highest accuracy. Dividing vertically captures repeated information, because the face is almost symmetric. The main areas of the top block are the hair and eyes, where the gray value changes markedly, while the bottom block contains the chin and nose, where the contour is clear. The proposed division therefore extracts more valuable information and obtains the best accuracy.

Based on the fusion feature, MB and ML are adopted to improve classification performance: the image is divided into subblocks and the feature is extracted from each subblock until most of the local features are captured. At scale $n$, the image is divided into $2^n \times 2^n$ subblocks, where $n$ is an integer from 0 to 3, giving four different sizes of MB in this paper: one subblock in MB1, $2 \times 2$ subblocks in MB2, $4 \times 4$ subblocks in MB3, and $8 \times 8$ subblocks in MB4. ML is obtained by combining MBs, so there are four MLs in total; the last ML concatenates the histograms of all these subblocks (see Table 4).

MS3F is the critical technique proposed in this paper, and its key is the window size of the feature descriptors. To find the most appropriate window sizes for MS3F, we compare two different scales of MS3F against the basic window size (see Figure 6).

It can be clearly observed that MS3F shows the best performance, especially Multiscale 2. The number of pixels involved grows as the window size increases, and only an appropriate window size achieves the best performance. To evaluate the effect of the multiscale approach in isolation, we also compare basic LPQ with multiscale LPQ (see Figure 7); the multiscale LPQ clearly performs better.

All experiments are performed on the Images of Groups dataset for gender classification. A comparison against the state-of-the-art is given in Table 5, demonstrating the advantage of our proposed approach.

4. Conclusion

Gender classification is one of the most important tasks in computer vision. To address the low accuracy of gender classification on images captured under uncontrolled conditions, a new approach, MS3F, is proposed in this paper. The LBP and LPQ feature descriptors are used together to extract features from face images. The proposed approach is tested on the IoG dataset and compared with state-of-the-art methods in terms of accuracy. The experimental results show that it achieves better performance for gender classification on face images captured in daily life.

Data Availability

The previously reported IoG database was used to support this study and is available at http://chenlab.ece.cornell.edu/people/Andy/ImagesOfGroups.html. The prior study (and dataset) is cited at the relevant places within the text as reference [29].

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by National Natural Science Foundation of China (61876112, 61303104, 61601311, and 61603022), Natural Science Foundation of Beijing (4162017), Support Project of High-Level Teachers in Beijing Municipal Universities in the period of 13th Five-Year Plan (CIT&TCD20170322), Project of Beijing Excellent Talents (2016000020124G088), Beijing Municipal Education Research Plan Project (SQKM201810028018), Capacity Building for Sci-Tech Innovation-Fundamental Scientific Research Funds (025185305000/134/187/188/189), and also the Youth Innovative Research Team of Capital Normal University.