Abstract

From standard software to mission-critical applications, face recognition (FR) has been at the core of many notable advances over the past 20 years. Big data refers to rapidly growing collections of data from many sources that are hard to store, analyze, and interpret for subsequent procedures, and it matters in many domains such as medicine, academia, and industry. Hadoop is an open-source big data processing platform that stores and analyzes data on scalable computer clusters. Here, FR technology is built on Hadoop processing to boost the effectiveness of recognition using MapReduce computing. This paper presents a novel adaptive fine-tuned AdaBoost (AFTA) algorithm to enhance FR in Hadoop processing. The collected face data sets are normalized and quality-enhanced in the preprocessing stage through the median filter (MF) and contour-based image enhancement (CIE), respectively. To handle vast quantities of data, the boosted k-means clustering (BKC) approach is used over the Hadoop servers for the MapReduce process. The binary partition tree (BPT) approach is employed in the segmentation stage to split the data into many subgroups. To reduce the dimension of the data, we use the Gabor filter. To select consistent features, threshold-based binary particle swarm optimization (T-BPSO) is applied. Then, our proposed technique is utilized in the FR process. Finally, the performance metrics of the proposed technique are examined and compared with existing techniques to accomplish our research with the greatest effectiveness. The proposed AFTA attained accuracy, precision, recall, and F-score values of 99.67%, 97.13%, 95.21%, and 94.11%, respectively. The outcomes are depicted graphically using the MATLAB tool.

1. Introduction

Face recognition is now commonly employed in everyday life due to technological advancement in China [1]. Currently, image recognition is a contentious issue in machine/image vision and understanding. Face recognition is a subproblem of perceptual recognition. People constantly identify visual patterns and receive sensory information through their eyes, and the brain recognizes these as meaningful concepts. A computer, by contrast, sees an image or a clip as pixels and must figure out what concept each piece of information reflects; this is a graphical pattern recognition problem. Face recognition requires determining whom a face in the data corresponds to, which makes it a classification subproblem.

Big data is a term coined to describe data collected in vast amounts and at growing speed [2]. In China, big data has become a focus for academics, businesses, and authorities. Nevertheless, big data's core characteristics of many sources, vast quantity, and rapid change make it challenging to handle with standard data processing approaches such as data mining. A sustainable big data processing framework is required to manage the computational burden of big data applications. Thus, Hadoop processing is included in this face recognition research; it is depicted in Figure 1 with certain features.

The Apache Software Foundation created Hadoop, a technology platform for big data processing. Hadoop is a one-stop shop for all things related to big data: a massive quantity of data (i.e., big data) can be stored and processed on it. The data are stored and processed in Hadoop using affordable clusters of computers [3]. Nodes in a Hadoop cluster are linked together through a network. Hadoop clusters may keep files ranging from terabytes to petabytes, and unstructured, semistructured, and structured data can all be stored and processed. It is free to use and can save a lot of money. Using Hadoop clusters, petabytes of data may be processed in a matter of minutes.

Face recognition encompasses a wide range of technologies; face detection, face location, identity authentication, and image processing are examples. In a single image, a face detection algorithm finds the coordinates of all faces: the whole image is scanned to determine whether a candidate region is indeed a face, and the resulting face coordinates may be rectangular or square. Face location is the positioning of facial feature coordinates within the detected face region. Current, well-performing location technology uses deep learning models. Face location takes substantially less time to compute than face recognition.

In China, several computer vision strategies can be applied to massive data. In many cases, it is desirable to execute these techniques on data sets larger than a single machine can presently handle. Such workloads are usually divided over features such as method parameters, pictures, or pixels, and a parameter's functions can be computed in near-perfect parallel. Face recognition and location categorization are good instances. Parallelizing such activities enables flexible, effective, resource-intensive implementation, and the MapReduce architecture enables such implementations.

With Hadoop's MapReduce model, the learning curve is steep and the complexity considerable; the time and resources needed to develop and maintain these applications hamper investigators in their work [4]. The Hadoop image processing interface (HIPI) eliminates the complexities of Hadoop's architecture and gives users a familiar image-file view while accessing the advanced resources of a distributed system. With a process familiar from the MapReduce system, the solution gives users easy access to image-based data types, making it straightforward and adaptable to employ for vision techniques.

To summarize, MapReduce is the core of the Hadoop framework. Data processing applications can be written on this platform, and MapReduce distributes the data processing across the Hadoop cluster [5]. Face recognition in big data is useful for surveillance, biometric security, and the IoT throughout China. The typical face recognition model is no longer responsive to public demand as the volume of facial image data expands. We therefore introduce a novel adaptive fine-tuned AdaBoost (AFTA) technique for this investigation of face recognition.

Contributions of this research are as follows:
(i) To preprocess the face image, MF and CIE are utilized to normalize the actual face image and enhance its quality, respectively
(ii) To handle the big data (a large amount of data), the BKC approach is employed in the Hadoop processing
(iii) To fragment the data into several subclasses in the segmentation stage, BPT is provided with face refinement
(iv) To reduce the dimensions of the face data in the feature extraction stage, the Gabor filter is used
(v) To select the reliable feature subsets, the T-BPSO technique is used
(vi) To recognize the face with the greatest effectiveness, the proposed AFTA approach is performed

The remainder of this paper is organized as follows: Section 2 presents related works along with a problem statement, Section 3 presents the proposed work, Section 4 presents the performance analysis, and Section 5 concludes the paper.

2. Related Works

Several investigations have lately been conducted to address face recognition (FR) and related concerns. In [6], a syndrome involving multiple odontogenic keratocysts was reported: a 12-year-old female child had several odontogenic keratocysts, and the studies found no other anomalies indicative of a condition. In [7], personalized medicine employs fine-grained data to identify specific deviations from normal. These developing data-driven healthcare methods were conceptually and ethically investigated using "Digital Twins" from engineering, in which physical artifacts are coupled with digital techniques that continuously represent their state. Moral differences can be observed based on data structures and the interpretations imposed on them. The ethical and sociological ramifications of digital twins are examined: the healthcare system has become increasingly data-driven, and this technique could be a social equalizer by providing efficient equalizing enhancement strategies. The obstacle of the technique is that there is no approach for adapting the care mechanism to the individual patient. In [8], allergic rhinitis, a long-standing worldwide epidemic, is commonly treated by Taiwanese doctors with either traditional Chinese or combined Chinese-Western drugs; outpatient traditional Chinese medicine therapy of respiratory illnesses was dominated by allergic rhinitis. The authors compare traditional Chinese medicine with western medical therapies for treating allergic rhinitis throughout Taiwan; serious drug-drug interactions are possible when Chinese and western drugs are consumed at the same time. In [9], the usage of high-dose-rate (HDR) brachytherapy avoids radioactivity, allows for outpatient therapy, and reduces diagnosis time frames. A single-stepping source can also enhance dosage dispersion by adjusting latency at every dwell location; however, the shorter processing intervals do not permit much error checking, and inaccuracies could injure individuals, so HDR brachytherapy therapies must be performed properly. In [10], a treatment and technology for domestic sewage was presented to improve rural surroundings. In [11], soil samples from chosen vegetable farms throughout Zamfara State, Nigeria, were tested for physicochemical properties and organochlorine pesticides, with the testing procedure and data analyzed using QuEChERS with GC-MS; organochlorine pesticide residues are hazardous to humans and animals, causing immediate and long-term consequences such as immune system and reproductive damage, mutation, and carcinogenic effects. In [12], a big data architecture enabling preprocessing and categorization of image and text analytics was presented using two significant processes: the big data (BD) workflow and the machine learning (ML) workflow. The retrieved tweets from a user may contain geolocation information in some entries, which, if not handled appropriately, might cause errors since different pages of tweet entries can have varied numbers of fields. In [13], a face recognition approach was examined on a cloud platform: to take advantage of MapReduce's parallel processing capacity, the approach was developed on the Hadoop platform using a distributed support vector machine classification algorithm. Traditional data processing platforms, with limited computational capacity and room for growth, are unable to handle large amounts of data for digital statistics and analysis.
In [14], the Gabor wavelet approach and the MapReduce parallel computing approach were used to develop a facial recognition system; the MapReduce paradigm in the SparkContext was used to execute parallel processing at the extraction and recognition steps. In authentication systems, information security is crucial: magnetic cards, passwords, passports, and other forms of identification are typically used to verify identity, but these approaches are vulnerable to data theft. In [15], more people than ever use Internet social media sites, and malicious actors exploit images of people to create fresh fraudulent profiles, harming both social and corporate enterprises. For this challenge, a Spark ML-based approach is offered that can anticipate fraudulent pictures with high clarity of profile detection; the random forest model is one of the techniques included in the Spark ML packages. In [16], a biometric verification-based FR was presented to track student attendance, and a graphical user interface for student attendance tracking was created. In [17], an important aim of the research was to provide a comprehensive survey of face recognition technology for scientists in countries such as Nigeria; human facial recognition competence, on the other hand, is restricted to a few unfamiliar faces, so researchers need to stay abreast of the issues and solutions for a technology that could soon become the standard for international border control and migration. In [18], Russian individuals' perceptions and adoption of face recognition are influenced by sociodemographic aspects such as trust in government, expertise with face recognition, perceived implications, user satisfaction, and perceived dependability; the respondents were inclined not to support the deployment of face recognition technologies in Russia. In [19], Indonesia Labeled Face in the Wild (ILFW) was created, aggregating face pictures of renowned Indonesian people from online platforms in different poses, expressions, illuminations, and style attributes. In [20], an own-race bias, in which people are better able to recognize people of their own race than people from other races, could contribute to mistaken identity and, in certain instances, the conviction of innocent persons; this bias was investigated using Black and White individuals from South Africa and England. TEMP is a state-of-the-art approach that addresses current real-world security concerns [21]. The authors examine several security applications' viewpoints where ML models play a crucial role and compare their accuracy outcomes along different conceivable dimensions. By examining ML algorithms in security applications, it gives a model for a multidisciplinary field of research; requirements also arise to analyze the risk of ML models coping with adversarial assaults during development. In [22], the study offers a contract-based framework called SDN-RMbw (software-defined networking resilience management for bandwidth). The SDN-based system aims at delivering fault resilience and reacting to varied network-state changes; the framework is explored using a Ryu SDN controller on a hardware testbed. To overcome these issues, this paper introduces an adaptive fine-tuned AdaBoost (AFTA).

2.1. Problem Statement

As a tough and trending issue, face recognition holds a lot of promise for future research. Compared with face detection, face recognition includes multiple steps such as face preprocessing, feature extraction and classification, and matching in huge data sets; it is not the same as image detection. Surveillance, human-computer interfacing, biometric safety, the military, and the IoT all benefit from face recognition in big data. As the volume of facial image data grows, the standard face recognition model is no longer responsive to the demands of consumers in a standalone approach.

3. Proposed Work

In this section, the methodology of the proposed work is illustrated, as depicted in Figure 2. The median filter (MF) and contour-based image enhancement (CIE) are used to normalize the images and enhance their quality at the preprocessing stage. BKC is used on the Hadoop servers to handle large amounts of data. With BPT, data are segmented into multiple subgroups. The Gabor filter is used to reduce data dimensions. T-BPSO is used to select consistent characteristics. To enhance the Hadoop processing with FR, this work proposes an adaptive fine-tuned AdaBoost (AFTA).

3.1. Face Data Set

In this research, the "CAS-PEAL" data sets [23] are employed for recognizing faces. The CAS-PEAL face collection has seven variants: pose, expression, lighting, accessory, background, aging (time), and distance. Because every subject is captured by nine cameras concurrently, the pose variants are automatically integrated using nine-position (viewpoint) modifications. Table 1 depicts the CAS-PEAL face data sets with the variants.

A total of 80% of the data are used for training and 20% for testing. Hence, we have 273 images for training and 69 images for testing.
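As a minimal sketch of this split (the file names and shuffling seed are placeholders; only the 80/20 ratio and the 342-image total come from the text above):

import random

def split_dataset(image_paths, train_ratio=0.8, seed=42):
    """Shuffle and split image paths into train/test subsets."""
    rng = random.Random(seed)              # fixed seed so the split is reproducible
    paths = list(image_paths)
    rng.shuffle(paths)
    cut = int(len(paths) * train_ratio)    # 342 * 0.8 -> 273 training images
    return paths[:cut], paths[cut:]

# Hypothetical file names standing in for the CAS-PEAL images
train_set, test_set = split_dataset([f"face_{i:03d}.png" for i in range(342)])
print(len(train_set), len(test_set))       # 273 69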

3.2. Preprocessing

An image’s clarity and accuracy can be enhanced during the preprocessing step. The MF can be used to eliminate noise in the collected face data set, and CIE has been used to boost the brightness/contrast of the face images.

3.2.1. Median Filter (MF)

In digital imaging, the MF is a nonlinear filter that has been employed to eliminate distortion from the face data set. The MF is used frequently since it can preserve edges while eliminating distortion under certain circumstances. Filtering a picture window-by-window with the MF replaces every pixel with the median of its neighboring pixels. The MF is a nonlinear smoothing approach that suppresses certain noises (such as random noise and salt-and-pepper noise) while still preserving edges. For effective recognition, the shape of the edge contains critical data; as a result, the MF is critical in preprocessing since it preserves the edge structure. Figure 3 sketches the functioning of the MF (median filter).
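The window-based median replacement can be sketched as follows (a minimal NumPy illustration, not the paper's implementation; the 3 × 3 window and edge padding are assumptions, since the text does not specify border handling):

import numpy as np

def median_filter(image, ksize=3):
    """Replace each pixel with the median of its ksize x ksize neighborhood."""
    pad = ksize // 2
    padded = np.pad(image, pad, mode="edge")   # border handling is an assumption
    out = np.empty_like(image)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            window = padded[r:r + ksize, c:c + ksize]
            out[r, c] = np.median(window)      # midpoint of the sorted neighborhood
    return out

noisy = np.random.randint(0, 256, (64, 64)).astype(np.uint8)
denoised = median_filter(noisy)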

3.2.2. Contour-Based Image Enhancement (CIE)

CIE is an important key for outlines. By using CIE, the boundary of the face can be retrieved: the face portion of the digital picture is extracted along the contour, and the final output is generated by combining the binary image of the face part with the actual image. The contour can follow directions of movement spatially and temporally. This approach is particularly useful since it helps increase contrast, particularly whenever the ROI and the surroundings have similar contrast values. The contrast augmentation index (CAI) describes the image's contrast as a parameter as follows:

\mathrm{CAI} = \frac{C_p}{C_o},

where C_p = value of the contrast of the processed image and C_o = value of the contrast of the actual image. The contrast of an image is in turn given by

C = \frac{m - s}{m + s},

where m = gray-level value of the "foreground" of the image and s = gray-level value of the "background" of the image.
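A minimal sketch of computing the CAI from the two formulas above, assuming the foreground and background are given as a boolean mask (the masking scheme is an assumption; the paper does not state how m and s are measured):

import numpy as np

def contrast(image, fg_mask):
    """C = (m - s) / (m + s), with m/s the mean foreground/background gray levels."""
    m = image[fg_mask].mean()    # foreground gray level
    s = image[~fg_mask].mean()   # background gray level
    return (m - s) / (m + s)

def cai(processed, original, fg_mask):
    """Contrast augmentation index: contrast ratio of processed to original image."""
    return contrast(processed, fg_mask) / contrast(original, fg_mask)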

Finally, we attain the denoised and superiority-enhanced face images with the MF and CIE techniques throughout the preprocessing stage. To handle these preprocessed images, the BKC approach is employed in the Hadoop processing.

3.3. MapReduce Using Boosted K-Means Clustering (BKC)

Hadoop is a one-stop shop for all things related to big data: a massive quantity of data can be stored and processed on it. The data are stored and processed in Hadoop using affordable clusters of computers. Nodes in a Hadoop cluster are linked together through a network, and Hadoop clusters may keep files ranging from terabytes to petabytes. Hadoop's appeal stems from its ability to retrieve available information.

MapReduce is indeed a Hadoop element that can be used with various applications. MapReduce is the core of the Hadoop framework. Data processing applications can be written using this platform. In the Hadoop cluster, MapReduce distributes data processing.

For example, k-means clustering distributes n data items into k divisions so that each data item corresponds to the closest partition. The intracluster similarity must be as large as possible, while the intercluster similarity must be as small as possible. K-means clustering measures cluster similarity by the mean value of the data items inside a cluster. Using k-means clustering, n data items are assigned to k groups starting from random initial centers. Using random initial centers causes wasteful and unreliable information retrieval outputs: if we run k-means clustering on the same data set many times with random initial centers, we get somewhat inconsistent cluster outcomes each time.
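The run-to-run inconsistency can be observed directly. The following sketch (scikit-learn is used purely as an illustration, which is an assumption, since the paper names no library) runs k-means several times with different random initial centers and prints the differing objective values:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = rng.normal(size=(300, 2))   # stand-in for preprocessed face features

for seed in range(3):
    # n_init=1 keeps a single random initialization so the variance stays visible
    km = KMeans(n_clusters=5, init="random", n_init=1, random_state=seed).fit(data)
    print(f"seed={seed}  inertia={km.inertia_:.2f}")   # objective differs per run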

To solve this issue, we employ k-means clustering with boosted initial centers. The objective behind the BKC technique is to start with the ideal K data points inside the dense areas. The MapReduce process requires the BKC to be implemented in two sections, as depicted in Algorithm 1: first the mapper and then the reducer.

Mapper
Step 1: Let D be an m-point data set
Step 2: For each data point zi, estimate the Euclidean distance to every other data point to create a distance vector V (using equation (3))
Step 3: Compute the average distance R by equation (4)
Step 4: Set neighbor_count to zero
Step 5: For every distance d in V
    If (d < R)
        neighbor_count = neighbor_count + 1
    End if
End for
Step 6: Determine the threshold value T
Step 7: Identify data points in the highly dense area: if a point qualifies, put it in the high-density set; otherwise, put it in the low-density set
    If (neighbor_count ≥ T)
        Use the distance value's index as key and save it in <key, value> structure as HD_set (1, key)
    Else
        Save it as LD_set (2, key)
    End if
Reducer
Step 1: Gather the mapper function's results as HD_set (1, list<values>)
Step 2: Choose the K initial centers (1, list<values>)
Step 3: Initialize S[k] using these points
Step 4: Set min_distance to the maximum value
Step 5: For i = 0 to S.length − 1
    distance_estimate = distance(d, S[i])
    If (distance_estimate < min_distance)
        min_distance = distance_estimate; index = i
    End if
End for
Step 6: Take the index as KEY and the matching values as VAL
Step 7: Emit the cluster outcomes as (KEY, list<VAL>)
Step 8: End

Consider d to be the data dimension:

d_{ij} = \sqrt{(z_{i1} - z_{j1})^2 + (z_{i2} - z_{j2})^2 + \cdots + (z_{id} - z_{jd})^2}, \quad (3)

where z_j = a data point of the input data, d_{ij} = the distance between z_i and z_j, and z_{i1} = the first dimension of z_i.

The average distance R is taken as the mean of the pairwise distances:

R = \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} d_{ij}. \quad (4)
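Outside Hadoop, the density test of the Algorithm 1 mapper and the center pick of the reducer can be sketched as plain Python (a simplified single-machine illustration; the threshold rule for T is an assumption, since the paper does not state how T is chosen):

import numpy as np

def bkc_mapper(points, threshold_ratio=0.5):
    """Split points into high/low-density index sets using the mean pairwise distance R."""
    m = len(points)
    # Pairwise Euclidean distances, equation (3)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Mean pairwise distance R, equation (4)
    iu = np.triu_indices(m, k=1)
    R = dist[iu].mean()
    # neighbor_count: how many other points lie closer than R
    neighbor_count = (dist < R).sum(axis=1) - 1     # minus self
    T = threshold_ratio * m                         # assumed threshold rule
    hd_set = [i for i in range(m) if neighbor_count[i] >= T]
    ld_set = [i for i in range(m) if neighbor_count[i] < T]
    return hd_set, ld_set

def bkc_reducer(points, hd_set, k):
    """Pick k points from the dense set as boosted initial centers (assumes len(hd_set) >= k)."""
    return points[np.array(hd_set[:k])]

pts = np.random.default_rng(2).normal(size=(50, 2))
hd_set, ld_set = bkc_mapper(pts)
centers = bkc_reducer(pts, hd_set, k=3)   # feed these to k-means instead of random centers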

3.4. Segmentation Using Binary Partition Tree (BPT)

Image segmentation divides an image into smaller subsets, simplifying subsequent computation or assessment of the image by lowering its complexity. Allocating labels to individual pixels is a simple way of segmenting an image.

A BPT provides an organized representation of the areas that can be generated from an initial partition; Figure 4 is an illustration of this. The tree's leaves indicate the initial partition's areas. When two children of a node are merged, the result is a new node that represents the new area. The root node represents the support of the entire image. Depending on the tree's size, it encompasses a wide range of areas.

Smaller features could be located at reduced levels, while larger areas could be found closer to the root. A balance regarding accuracy and processing speed must be taken into account when assessing this model. Indeed, the tree does not show all possible mergers of areas belonging to the original partition.

Each node is linked directly to its children in the tree because their merger is a connected component; however, the other links among areas of the initial partition are not depicted in the framework. With a tree representation, complex processing algorithms can be implemented quickly.
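As a structural sketch of this parent-from-children construction (a toy illustration only; the region representation and merging order are placeholders, since the paper merges by the distance criterion described below):

from dataclasses import dataclass

@dataclass
class BPTNode:
    """A node of a binary partition tree: a region plus the two children it merges."""
    pixels: frozenset                  # support (pixel set) of the region
    left: "BPTNode" = None
    right: "BPTNode" = None

def merge(a, b):
    """Merging two sibling regions yields their parent node in the BPT."""
    return BPTNode(pixels=a.pixels | b.pixels, left=a, right=b)

# Leaves are the regions of the initial partition; the root covers the whole image.
leaves = [BPTNode(frozenset({i})) for i in range(4)]
root = merge(merge(leaves[0], leaves[1]), merge(leaves[2], leaves[3]))
print(len(root.pixels))                # 4: the entire (toy) image support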

Nodes of the BPT are calculated using distance as a merging criterion.

To generate a face class (Ω), a collection of standardized face pictures is used. So that faces of every size can be measured, a B-dimensional vector representing the scaled version of the area is created for every area. The class membership of a region vector z can then be modeled with a single unimodal Gaussian density function.

The residual reconstruction error is given by

\epsilon^{2}(z) = \lVert z \rVert^{2} - \sum_{k=1}^{A} y_k^{2} = \sum_{k=A+1}^{B} y_k^{2},

where y_k = the projection of z onto the k-th principal component and A = the number of retained principal components, with A \ll B.
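A minimal sketch of this residual reconstruction error (a PCA-style projection built from a random placeholder matrix; the dimensions B = 64 and A = 8 are assumptions chosen only so that A ≪ B):

import numpy as np

def residual_error(z, components, mean):
    """epsilon^2 = ||z - mean||^2 minus the energy captured by the A components."""
    centered = z - mean
    y = components @ centered            # projections y_k onto the principal axes
    return float(centered @ centered - y @ y)

rng = np.random.default_rng(1)
faces = rng.normal(size=(100, 64))       # toy face class: 100 region vectors, B = 64
mean = faces.mean(axis=0)
_, _, vt = np.linalg.svd(faces - mean, full_matrices=False)
components = vt[:8]                      # A = 8 retained principal components
print(residual_error(faces[0], components, mean))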

The scaled version of a specific area cannot always be computed and evaluated directly. Consequently, an additional image is formed, scaled horizontally and vertically in accordance with the data set.

Nodes in the BPT must be scaled differently; therefore, distances between them cannot be estimated from the distances of their children nodes. Certain nodes and even entire subtrees are typically pruned according to size and color requirements to lessen the computational burden. Figure 4 shows the BPT after pruning the subtree of node 2. A face could, in theory, be represented by the blue region in the image's upper right corner.

According to this examination, node 1 should be selected: a face is most likely to be represented by BPT node 1, which is located in this area.

3.4.1. Face Refinement

The face may be missing from the selected node since the node lacks a few of the areas that make up the face. The face segmentation approach can be simplified by using a chrominance criterion-based merging process. Nevertheless, the optimal area (in terms of probability) is not guaranteed to be a node inside the BPT. Once the basic components are identified, a refining process is used to fully retrieve the face characteristics without significantly increasing the computing burden.

A two-step process is used to achieve this refinement. The first step introduces some basic geometric details about the area being mapped. The second phase is founded on the same merging method that was used to create the primary face element; however, in this case, the BPT is not employed, and the evaluation is limited to merging the identified face element with its neighboring areas. Because the face element is large, its scale for assessing the various mergers can be kept fixed, which speeds up the process since parts of the distance computation can be reused across the various combinations.

3.5. Feature Extraction Using Gabor Filter (GF)

GF is commonly employed to retrieve characteristics in images; it can retrieve spatial/frequency data. Amplitude, phase, and direction are the three kinds of characteristics created by the GF. In such filters, a sine plane wave modulates a Gaussian envelope. In the spatial domain, the Gabor filter is stated as follows:

\psi(x, y) = \frac{f_c^{2}}{\pi t n} \exp\!\left(-\frac{f_c^{2}}{t^{2}} x'^{2} - \frac{f_c^{2}}{n^{2}} y'^{2}\right) \exp\!\left(j 2\pi f_c x'\right),

where

x' = x\cos\theta + y\sin\theta, \quad y' = -x\sin\theta + y\cos\theta, \quad f_c = \frac{f_{max}}{(\sqrt{2})^{c}}, \quad \theta = \frac{d\pi}{8},

where f_c = center frequency, f_{max} = 1/4, and \theta = direction.

The parameters t and n set the ratio of the center frequency to the size of the Gaussian envelope. Here, t = n = \sqrt{2} are the most widely used values.

Throughout this research, we employed a bank of filters with five scales and eight directions to retrieve distinct information from the face pictures, with c = 0 to 4 and d = 0 to 7, respectively.

Let I(x, y) be a gray-scale face image; the feature extraction is the convolution of the image with each filter of the bank:

O_{c,d}(x, y) = I(x, y) \ast \psi_{c,d}(x, y),

where O_{c,d}(x, y) = the complex filtering result.
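A minimal sketch of this filter bank in NumPy/SciPy (the kernel size is an assumption; the formula follows the reconstruction above with t = n = √2 and f_max = 1/4):

import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(c, d, size=31, f_max=0.25, t=np.sqrt(2), n=np.sqrt(2)):
    """Complex Gabor kernel at scale c and direction d (theta = d*pi/8)."""
    fc = f_max / (np.sqrt(2) ** c)
    theta = d * np.pi / 8
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinates x', y'
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(fc**2 / t**2) * xr**2 - (fc**2 / n**2) * yr**2)
    carrier = np.exp(1j * 2 * np.pi * fc * xr)       # complex sinusoidal plane wave
    return (fc**2 / (np.pi * t * n)) * envelope * carrier

def gabor_features(image):
    """Stack the magnitude responses of the 5-scale, 8-direction bank."""
    responses = [np.abs(fftconvolve(image, gabor_kernel(c, d), mode="same"))
                 for c in range(5) for d in range(8)]
    return np.stack(responses)

img = np.random.rand(64, 64)      # stand-in for a gray-scale face image
feats = gabor_features(img)
print(feats.shape)                # (40, 64, 64)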

3.6. Feature Selection Using Threshold-Based Binary Particle Swarm Optimization (T-BPSO)

BPSO, a random optimization technique, may choose the strongest characteristics along with other features that are not as essential when it is performed. Because several random factors are taken into account, the resulting feature vector is not quite the same each time. The standard BPSO selects about half of all extracted features, and the T-BPSO approach brings this selection mechanism nearer to an optimal and strict value. The selected set of features is represented in gbest, a logical array containing 1s and 0s: in essence, "1" signifies selection, while "0" indicates masking. A nonzero threshold value can be set by running the BPSO P times. The frequency with which a dimension is picked indicates its relevance. The BPSO technique could in principle be performed indefinitely; on the last attempt, a threshold value is specified.

The number of times a feature receives a "1" inside the gbest array across attempts determines its importance. The threshold value indicates how restrictive the FR scheme ought to be: providing a threshold level ensures that any feature with a high likelihood of being chosen is treated as significant. The threshold's range is 1 to P. Figure 5 depicts the flow of the T-BPSO algorithm.

While T-BPSO is indeed a stochastic model, it is not a Bernoulli test. The selection procedure generates comparable features, yet optimizing the same cost function on every image may generate certain differences in features. Although every feature is randomly chosen (unlike in a Bernoulli trial), it is chosen based on how well it identifies the image. Therefore, executing T-BPSO for several rounds is not a Bernoulli process because every attempt influences the result of the next. Table 2 shows the types of feature sets.
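The threshold rule itself reduces to counting gbest hits over P runs. A minimal sketch follows, where run_bpso is a random stand-in for one full BPSO optimization (the optimizer itself is described by Figure 5, so the stub, P, and the threshold value here are all placeholders):

import numpy as np

def run_bpso(num_features, rng):
    """Placeholder for one BPSO run: returns a 0/1 gbest mask over the features."""
    return (rng.random(num_features) < 0.5).astype(int)   # stand-in only

def t_bpso_select(num_features, P=20, threshold=12, seed=0):
    """Keep features picked at least `threshold` times across P BPSO attempts."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(num_features, dtype=int)
    for _ in range(P):
        counts += run_bpso(num_features, rng)    # accumulate gbest selections
    return np.flatnonzero(counts >= threshold)   # indices of consistent features

selected = t_bpso_select(num_features=40)        # e.g., over the 40 Gabor responses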

3.6.1. Fitness Function

Searching the feature extraction space for the most relevant feature subset is the objective of BPSO's innovative process. In the technique, every particle represents a potential response (feature subset). The decision is made by minimizing the fitness value, which in this situation is primarily connected to the class dispersion or scatter value. Every optimization problem has its own fitness function.

The particle's coordinates are used to assess the fitness function, which produces a value assigned to the particle's present location. The personal best (pbest) and global best (gbest) positions are updated if the value improves on the relevant personal best. In every generation, all particles are assessed, and the fitness function returns a value for each particle's fitness. The fitness function F acts as a driving force in this process of evolutionary change; its main goal is to help the feature selector narrow the search toward a viable candidate solution. A scatter-based fitness consistent with these quantities is the ratio of within-class to between-class scatter:

F = \frac{\sum_{i=1}^{L} \sum_{j=1}^{N_i} \lVert V_{ij} - H_i \rVert^{2}}{\sum_{i=1}^{L} \lVert H_i - H_0 \rVert^{2}},

where i = 1 to L, j = 1 to N_i, V_{ij} = the j-th image vector of class i (classes ranging from 1 to L), N_i = the number of images in every class (from 1 to 3), H_i = the mean of the relevant class (from 1 to L), and H_0 = the grand mean.

3.7. Face Recognition Using Adaptive Fine-Tuned AdaBoost (AFTA) Algorithm

As a leading boosting model, the AFTA algorithm is an enhancement method. Its underlying concept is to train a large number of anemic (weak) classifiers with generic classifying abilities on the same training data set and afterward combine the anemic classifiers using a combination technique to produce an active (strong) classifier with greater recognition abilities. As per this underlying concept, the AFTA algorithm solves the recognition task as follows. The training set of data is

T_d = \{(a_1, b_1), (a_2, b_2), \ldots, (a_m, b_m)\},

where T_d = training set of data, a = input sample, and b = sample's type.

Step 1: set up the training database's dispersion of weights. Every training sample is given a weight by the AFTA algorithm, which is adjusted in every cycle. To begin, each sample is given a uniform weight:

d_1(i) = \frac{1}{m}, \quad i = 1, 2, \ldots, m.

Step 2: train on the data with the previous weights, and the anemic classifier A_q(a) is obtained, where q = 1, 2, …, r denotes the q-th cycle.

Determine the error rate e_q of the q-th classifier A_q(a) as follows:

e_q = \sum_{i:\, A_q(a_i) \neq b_i} d_q(i),

where e_q is the total weight of the samples miscategorized by A_q(a).

The weight factor f_q of A_q should be updated based on e_q as follows:

f_q = \frac{1}{2} \ln\!\left(\frac{1 - e_q}{e_q}\right).

The significance of A_q(a) in the final active classifier is given by f_q: the greater e_q, the lower the weight factor f_q of A_q(a), according to Ong et al. [16].

Adapt the weight dispersion d_{q+1} of the training samples as per f_q, and then execute the next cycle until the number of iterations exceeds the "r" chosen in advance:

d_{q+1}(i) = \frac{d_q(i) \exp\!\left(-f_q b_i A_q(a_i)\right)}{N_q},

where i = 1, 2, …, m and N_q = normalization component. This component can be denoted as follows:

N_q = \sum_{i=1}^{m} d_q(i) \exp\!\left(-f_q b_i A_q(a_i)\right).

Step 3: merge the anemic classifiers via a combination technique and produce the final active classifier A(a) as follows:

A(a) = T\!\left(f(a)\right), \quad f(a) = \sum_{q=1}^{r} f_q A_q(a),

where f(a) is a linear combination of the individual anemic classifiers. The factor of the q-th cycle is denoted by f_q, and the q-th cycle's anemic classifier is represented by A_q(a).

T(f) is a fine-tuning function utilized to identify the classified outcome with a higher degree of accuracy.

In the AFTA algorithm, the number of k-estimators (iterations/cycles) is a hyperparameter whose default value is set before training the model. Different k-estimator counts mean that the AFTA algorithm has different numbers of anemic classifiers, different weight dispersions for the anemic classifiers, and different degrees of fit for the framework developed with the AFTA algorithm. Small k-estimator counts are susceptible to model under-fitting, whereas large k-estimator counts easily produce over-fitting problems when using the AFTA algorithm in the training phase.
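A minimal sketch of the training loop in Steps 1-3 above (decision stumps stand in for the anemic classifiers and the sign function stands in for the fine-tuning function T; both choices are assumptions, as are the labels b ∈ {−1, +1}):

import numpy as np

def best_stump(X, b, d):
    """Weighted search over one-feature threshold classifiers (the anemic learner)."""
    best, best_err = (0, 0.0, 1), np.inf
    for feat in range(X.shape[1]):
        for thr in np.unique(X[:, feat]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, feat] > thr, 1, -1)
                err = d[pred != b].sum()
                if err < best_err:
                    best_err, best = err, (feat, thr, sign)
    return best

def train_afta(X, b, r=20):
    """AdaBoost-style loop: reweight samples, collect anemic classifiers and factors."""
    m = X.shape[0]
    d = np.full(m, 1.0 / m)                          # Step 1: d_1(i) = 1/m
    stumps, factors = [], []
    for _ in range(r):                               # Step 2: r boosting cycles
        feat, thr, sign = best_stump(X, b, d)        # anemic classifier A_q
        pred = sign * np.where(X[:, feat] > thr, 1, -1)
        e = d[pred != b].sum()                       # error rate e_q
        f = 0.5 * np.log((1 - e) / max(e, 1e-10))    # weight factor f_q
        d *= np.exp(-f * b * pred)                   # weight update d_{q+1}
        d /= d.sum()                                 # normalization by N_q
        stumps.append((feat, thr, sign)); factors.append(f)
    return stumps, factors

def predict_afta(X, stumps, factors):
    """Step 3: combine anemic classifiers; sign() plays the role of T(f)."""
    f = sum(fq * sq * np.where(X[:, ft] > th, 1, -1)
            for (ft, th, sq), fq in zip(stumps, factors))
    return np.sign(f)

# Example: X would be the T-BPSO-selected feature matrix, b the ±1 labels (placeholders).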

4. Performance Analysis

One master and two slaves comprise the Hadoop cluster in this investigation. We run CentOS 7.3, and the Hadoop edition used is 2.6.5. Table 2 contains the details of the cluster's setup. The testable hypotheses of the proposed approach are evaluated throughout this part, and the simulation procedure is carried out using the MATLAB tool. For the gathered CAS-PEAL face data sets, our proposed AFTA approach is applied to accomplish the greatest face recognition (FR). From this investigation, we obtain certain metrics such as accuracy, precision, recall, and F-score. These metrics are assessed through the below-mentioned calculations and specify the accurateness of the proposed technique.

4.1. Metrics for Assessing Performance

FR can indeed be categorized as either a positive or negative event. Using a mix of actual type and predicted type, we divided the face data sets into four trials (i.e., "true positive," "true negative," "false positive," and "false negative"). Our proposed AFTA algorithm identifies the data individually; therefore, we arrange them as per their recognition findings and then use the data as good instances. In this section, certain metrics such as accuracy, precision, recall, and F-score are assessed for the proposed work for FR. These metrics are expressed below.

4.1.1. Accuracy (A)

Accuracy provides recognition with the required face data as follows:

A = \frac{t_p + t_n}{t_p + t_n + f_p + f_n}, \quad (20)

where true positive = t_p = number of right forecasts of a positive sample, true negative = t_n = number of right forecasts of a negative sample, false positive = f_p = number of wrong forecasts of a positive sample, and false negative = f_n = number of wrong forecasts of a negative sample.

4.1.2. Precision (B)

It is the percentage of useful occurrences among those recovered:

B = \frac{t_p}{t_p + f_p}. \quad (21)

4.1.3. Recall (C)

It is the percentage of relevant images that have been recovered:

C = \frac{t_p}{t_p + f_n}. \quad (22)

4.1.4. F-Score (D)

The F-score is a metric for determining how accurate recognition is on given face data:

D = \frac{2 \times B \times C}{B + C}. \quad (23)
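A minimal sketch of equations (20)-(23) computed from raw counts (the counts in the example call are placeholders, not results from the paper):

def recognition_metrics(tp, tn, fp, fn):
    """Accuracy A, precision B, recall C, and F-score D from the four counts."""
    A = (tp + tn) / (tp + tn + fp + fn)   # equation (20)
    B = tp / (tp + fp)                    # equation (21)
    C = tp / (tp + fn)                    # equation (22)
    D = 2 * B * C / (B + C)               # equation (23)
    return A, B, C, D

print(recognition_metrics(tp=60, tn=5, fp=2, fn=2))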

Table 3 and Figure 6 show the F-score of proposed and existing methods.

4.1.5. Expressivity Score (E)

The expressivity score is defined in terms of negative expressivity, positive expressivity, and emotion strength. Table 4 and Figure 7 show the expressivity score of the proposed and existing methods.

4.1.6. Recognition Score (F)

The recognition score is the percentage of favorable choices out of the total number of decisions at each step of the process (i.e., first instance and final on appeal). The sum of positive and negative choices equals the total number of decisions. Table 5 and Figure 8 show the recognition score of the proposed and existing methods.

4.2. Discussion

In this part, we discuss the effectiveness of our proposed technique by assessing the above-mentioned performance metrics regarding FR for the given face data. As per Table 1, the data sets are gathered for FR, and their numerous variants are indicated. The preprocessing stage is carried out through the MF and CIE techniques for removing unwanted outliers/noises and enhancing the quality of the face image, respectively. The MapReduce process is performed through the BKC approach inside the Hadoop clusters to handle the preprocessed data, since the volume of the data is so bulky. The generated data of this stage is passed to further processes such as feature extraction, feature selection, and FR to efficiently recognize the face images. Our proposed technique is performed and matched against other standard techniques (voxel-based 3D [24], ensemble-aided FR [25], coupled mapping [26], and deep-DA [27]). The aforementioned metrics are determined for both the proposed and standard techniques. Figure 9 and Table 6 depict the comparison of accuracy between the proposed and existing techniques for the individual face collections; in this graph, the x-axis denotes the face data sets, and the y-axis denotes the accuracy. By employing (20), we estimated the accuracy rate for the specified data sets. Figure 10 and Table 7 depict the comparison of precision between the proposed and existing techniques for the individual face collections; in this graph, the x-axis denotes the face data sets, and the y-axis denotes the precision. By employing (21), we estimated the precision rate for the specified face data sets. Figure 11 and Table 8 depict the comparison of recall between the proposed and existing techniques for the individual face collections; in this graph, the x-axis denotes the face data sets, and the y-axis denotes the recall. By employing (22), we estimated the proportion of the recall for the specified face data sets. Figure 6 and Table 9 depict the comparison of F-score between the proposed and existing techniques for the individual face collections; in this graph, the x-axis denotes the face data sets, and the y-axis denotes the F-score. By employing (23), we estimated the F-score for the specified face data sets. The existing techniques had certain downsides regarding face recognition, which are explained below.

In [24], video-based FR for security monitoring and 4D FR techniques remain to be implemented to enhance the effectiveness of recognition; as a result, that research attained inconsistent recognition accuracy on face images. In [25], there is no feature selection stage to choose the feature subsets; hence, that research gained less efficiency than our proposed technique. In [26], additional RAM is needed owing to the issue of memory usage in the training set. In [27], cross-domain object recognition tests are limited, which restricts validation of the generality of the research for different implementations. From this assessment, we accomplished the proposed approach with a greater level of face recognition than the existing techniques (voxel-based 3D [24], ensemble-aided FR [25], coupled mapping [26], and deep-DA [27]).

5. Conclusion

This paper presented a novel adaptive fine-tuned AdaBoost (AFTA) technique for face recognition (FR). Here, the CAS-PEAL data sets were employed and collected from the big data source. A median filter (MF) was utilized for denoising the original face data sets, and a contour-based image enhancement (CIE) approach was employed for improving the quality in the preprocessing stage. Due to the huge data, the boosted k-means clustering (BKC) technique was used for MapReduce in the Hadoop clusters. Binary partition tree (BPT) and Gabor filter approaches were provided in the segmentation and extraction stages to segment the data and extract the face features, respectively. For feature subset selection in the feature selection stage, the threshold-based binary particle swarm optimization (T-BPSO) technique was applied. Finally, our proposed strategy was performed for face recognition with maximum accuracy. Moreover, the performance metrics of the proposed technique were examined, and we accomplished our research with accuracy (99.67%), precision (97.13%), recall (95.21%), and F-score (94.11%) over the existing techniques regarding face recognition. The prospect of using facial recognition technology is garnering a lot of interest; however, the proposed method is also contentious with regard to concerns such as privacy, dependability, and a lack of regulatory oversight. We may improve the efficiency and scalability of our proposed work by executing advanced optimization approaches in the coming years.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest or personal relationships that could have appeared to influence the work reported in this paper.