Abstract
Autism spectrum disorder (ASD) is a complex group of neurodevelopmental disorders characterized by a variety of developmental impairments. Most existing computer-aided diagnosis methods treat ASD as a binary classification problem, which does not reflect clinical reality. Furthermore, label noise, high feature dimensionality, and imbalanced data distribution in ASD data have all hampered existing classification algorithms. A new ASD diagnosis method is therefore proposed. It employs label distribution learning (LDL) to deal with label noise, a support vector regression (SVR) model whose kernel maps samples to a feature space to handle high-dimensional features, and a cost-sensitive mechanism to correct sample imbalance, and it diagnoses multiclass ASD. The experimental results show that the proposed method balances the effects of the majority and minority classes on the outcome and can deal with imbalanced data in ASD diagnosis. In most cases it outperforms the compared methods in classification performance, and it mitigates the problem of unbalanced data in ASD diagnosis.
1. Introduction
Autism spectrum disorder (ASD) is a group of complex neurodevelopmental disorders whose main clinical manifestations are impaired social interaction, impaired verbal communication, and stereotyped repetitive movements [1, 2]. Statistics from the US Centers for Disease Control and Prevention show that the prevalence of autism in American children is as high as 1 in 59. Autism has thus become a serious health problem, and there is an urgent need for an effective method of timely diagnosis. However, because the physiological cause of autism is not clear, medical diagnosis can only be based on the patient’s symptoms and feedback, qualitative and quantitative test information, and the physician’s personal experience, which involves great uncertainty [3]. Computer-aided diagnosis of autism is therefore of great significance.
Studies have shown that autism spectrum disorders are related to abnormal brain function. Resting-state functional magnetic resonance imaging (rs-fMRI) uses the blood-oxygen-level-dependent signal to reflect functional changes, such as brain metabolic activity, while the subject is at rest [4, 5]. It has become a powerful tool for quantifying neural activity in the brain and one of the important means of studying brain diseases such as ASD [6, 7]. On this basis, researchers have proposed a variety of computer-aided autism diagnosis algorithms [8, 9]. For example, high-order functional connectivity matrices, multivariate graph learning, and deep learning models that explore the correlation between brain regions have all been used for the auxiliary diagnosis of autism [10]. However, these methods can only handle binary classification, whereas in clinical practice autism spectrum disorder includes several related developmental disorders, such as autism [11], Asperger’s syndrome (Asperger’s disorder), and pervasive developmental disorder not otherwise specified (PDD-NOS). Most existing auxiliary diagnosis models for autism therefore cannot distinguish several related ASD conditions at the same time. In addition, these methods do not deal with label noise in a targeted manner [12]. Label noise is a challenge in the auxiliary diagnosis of multiclass ASD and has serious adverse effects on classifier performance [13]. Label noise refers to the deviation between the target label of a training sample and the true label of the corresponding instance.
Label noise arises from many factors, such as the subjectivity of the labeling process, the low recognizability of the samples to be labeled, and communication or coding problems. It is prevalent in autism diagnosis, where subjectivity in the diagnostic process, inconsistent diagnostic criteria, and blurred boundaries between ASD subcategories all contribute [14].
Class imbalance under high-dimensional features is another challenge in the auxiliary diagnosis of multiclass ASD [15]. The neuroimaging data used for this purpose usually have hundreds or thousands of features, while the number of training samples is very limited, which easily leads to overfitting during classifier training. Moreover, the samples used to build an ASD classifier are class-imbalanced, which biases the predictions towards the majority class [16, 17]. This paper proposes cost-sensitive label distribution support vector regression (CSLDSVR) for the auxiliary diagnosis of ASD [18]. First, multiclass ASD auxiliary diagnosis faces the problem of label noise. A label distribution describes the same sample with several labels at once and expresses the differences between them, which reduces the influence of label noise on the classifier. The degree to which each label describes the sample gives the learning process richer semantic information, distinguishes the relative importance of multiple labels better, and is well suited to the label noise found in the auxiliary diagnosis of ASD [19, 20]. Second, the kernel method is introduced together with support vector regression. Its nonlinear mapping transfers data that are linearly inseparable in the original input space into a feature space in which they may become linearly separable, providing additional discriminative information. Finally, a cost-sensitive mechanism is designed to address class imbalance. By introducing the different misclassification costs that different classes carry in reality, the algorithm can adapt to the demands of actual applications to some degree and treat minority classes fairly.
In the proposed technique, label distribution learning (LDL) copes with label noise, support vector regression (SVR) with a kernel mapping handles high-dimensional features, and a cost-sensitive mechanism corrects sample imbalance. According to the experimental results, the proposed technique balances the effects of the majority and minority classes on the outcome, can handle skewed data in ASD diagnosis, and can assist with multiclass ASD diagnosis. In classification performance, the proposed strategy outperforms the compared methods in most cases and mitigates the problem of unbalanced data in ASD diagnosis.
However, the improved model is still biased towards the majority class to some extent, and the imbalanced data problem should be addressed further in future work, for example by improving the data sampling method or by synthesizing minority samples.
1.1. Organization
The paper is organized as follows. Section 1 introduces the problem, and Section 2 reviews related work. Section 3 presents the proposed cost-sensitive method for ASD-aided diagnosis, and Section 4 evaluates the proposed methodology. Section 5 concludes the paper and discusses the results obtained in the study.
2. Related Work
2.1. Label Distribution Learning
Label distribution learning (LDL) is a machine learning method that has emerged in recent years [21]. It introduces the concept of label distribution on the basis of single-label and multilabel learning [22, 23]. In a multilabel scenario, if a sample is related to multiple labels, the importance of these labels to the sample generally differs, and the label distribution is a label form that describes the importance of different labels to the same sample. LDL takes the label distribution as the learning target and has been applied in many fields. Examples include a deep label distribution learning algorithm that combines a convolutional neural network with LDL to estimate age from faces, a method that uses the wheel of emotions to automatically identify a user’s emotional state from text, and an algorithm based on multivariate label distribution for head pose detection [24, 25]. However, LDL has not yet been reported for the auxiliary diagnosis of brain diseases. Related work has aimed to identify particular features that aid in automating diagnosis and to evaluate and compare various machine learning techniques [26]. A semi-supervised autoencoder built on the functional connectivity structure obtained from resting-state MRI has also been proposed for autism diagnosis [27].
2.2. Label Enhancement
Label distribution learning requires training data that contain label distribution information. In practice, however, samples are usually annotated with a single label or with multiple labels, so label distribution information is difficult to obtain directly. Nonetheless, the labels of such data still contain relevant information about the label distribution. Label enhancement strengthens the supervised information of samples through the implicit correlation between the labels of different samples, thereby achieving better results in label distribution learning [28]. For example, label enhancement has been proposed as an auxiliary algorithm for label distribution learning: it mines the implicit label importance information in the training set, promotes the original logical labels to label distributions, and thus assists label distribution learning. Label-enhanced multilabel learning has also been proposed to reconstruct latent label importance information from logical labels and improve the performance of label distribution learning [29].
3. Cost-Sensitive Label Distribution Learning for ASD-Aided Diagnosis
3.1. Symbolic Representation
The main symbols in this paper are as follows. $x_i \in \mathbb{R}^q$ represents the ith sample, where q is the dimension of the feature vector, and $X = [x_1, x_2, \cdots, x_N] \in \mathbb{R}^{q \times N}$. $l_i \in \{0, 1\}^K$ represents the logical label vector corresponding to $x_i$, where K is the number of possible labels and $l_i^j \in \{0, 1\}$ is its jth entry. Similarly, $d_i \in \mathbb{R}^K$ represents the label distribution of the ith sample, where $d_i^j \in [0, 1]$ is the jth value of the label distribution of the ith sample, satisfying $\sum_{j=1}^{K} d_i^j = 1$, and $D = [d_1, d_2, \cdots, d_N] \in \mathbb{R}^{K \times N}$.
3.2. Proposed Methodology
The label distribution learning algorithm for multiclass autism auxiliary diagnosis proposed in this paper is shown in Figure 1. First, the rs-fMRI images are preprocessed and the functional connectivity matrix is constructed, from which the functional connectivity feature vector of each sample is obtained. Next, the logical labels and the functional connectivity features are combined for label enhancement to obtain the label distribution of each sample. Finally, a cost-sensitive label distribution learning model is trained to obtain a multiclass classification model for the auxiliary diagnosis of autism.

3.3. Label Distribution Mechanism
Label distribution learning describes the degree of correlation between each label and a sample by introducing a description degree, so it can obtain richer semantic information from the data than multilabel learning and can express the relative importance of multiple labels of the same sample more accurately. However, label distribution learning requires datasets annotated with label distributions, a requirement that is often difficult to meet in practice. Label distribution data can instead be obtained by transforming samples given in multilabel form with a label enhancement method. Here, the label enhancement method based on FCM (fuzzy C-means) and fuzzy operations is adopted [30]. The basic idea is as follows:
(1) Use FCM to divide the N samples into p fuzzy clusters and find the center of each cluster, so that the sum of the weighted distances from all training samples to the cluster centers is the smallest. Equation (1) gives the weighted distance:
$$\min \sum_{i=1}^{N} \sum_{k=1}^{p} \left(m_i^k\right)^{\beta} \operatorname{Dist}(x_i, \mu_k) \qquad (1)$$
Here, $m_i^k$ represents the membership degree of the ith sample to the kth cluster center, $\mu_k$ represents the kth cluster center, β is a fuzzy factor greater than 1, and Dist(∗, ∗) represents the distance measure. The membership degree of each sample represents the strength of the association between the sample and the cluster. The clustering result of traditional FCM is greatly affected by the initial value and is not guaranteed to converge to the global optimum, but in label enhancement the clustering result only serves as a transitional bridge. Although the clustering result fluctuates, it has little effect on the result of label enhancement: the differences in Chebyshev distance and in KL divergence (Kullback–Leibler divergence) between the results of repeated label enhancement runs are both below $10^{-6}$.
(2) Construct an association matrix A between labels and clusters. The elements of the matrix represent the degree of association between labels and clusters. The association matrix is calculated as follows:
$$A_j = \sum_{i \in S_j} m_i$$
In the formula, $A_j$ is the jth row of the matrix, $S_j$ is the set of samples of the jth class, and $m_i = [m_i^1, \cdots, m_i^p]$ is the membership degree vector of the ith sample, so $A_j$ is the sum of the membership degree vectors of the samples of the jth class. After the rows are normalized, the association matrix A can be regarded as a fuzzy relationship matrix between clusters and labels.
(3) According to the fuzzy logic reasoning mechanism, perform the fuzzy composition operation on the association matrix and the membership degrees to obtain the membership degree of each sample to each label [31]. After normalization, this is the label distribution.
Label enhancement based on FCM and fuzzy operations introduces cluster analysis as a bridge. Through the composition of the membership degree of the sample to the cluster and the membership degree of the cluster to the label, the membership degree of the sample to the label, that is, the label distribution, is obtained. In this process, the topological relationship of the sample space is mined through fuzzy clustering and projected onto the label space through the association matrix, so that the simple logical labels gain richer semantic information and are transformed into a label distribution.
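The three steps above can be sketched in code. The following is a minimal illustration rather than the authors' implementation: it assumes Euclidean distance, a plain FCM update with β = 2, max-min composition as the fuzzy composition operator, and normalization by the sum, none of which are specified in the text. The function names (`fcm`, `enhance_labels`) and the toy data are hypothetical.

```python
import math
import random


def memberships_for(samples, centers, beta):
    """Membership degree of every sample to every cluster center."""
    power = -2.0 / (beta - 1.0)
    result = []
    for x in samples:
        # Clamp distances so a sample lying on a center does not divide by 0.
        inv = [max(math.dist(x, c), 1e-12) ** power for c in centers]
        total = sum(inv)
        result.append([v / total for v in inv])
    return result


def fcm(samples, p, beta=2.0, iters=100, seed=0):
    """Plain fuzzy C-means with Euclidean distance (an assumption)."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(samples, p)]
    q = len(samples[0])
    for _ in range(iters):
        m = memberships_for(samples, centers, beta)
        for k in range(p):
            w = [row[k] ** beta for row in m]
            centers[k] = [
                sum(wi * x[f] for wi, x in zip(w, samples)) / sum(w)
                for f in range(q)
            ]
    return memberships_for(samples, centers, beta)


def enhance_labels(samples, logical_labels, p, beta=2.0):
    """Turn logical (0/1) labels into label distributions."""
    m = fcm(samples, p, beta)
    num_labels = len(logical_labels[0])
    # Step 2: row j sums the membership vectors of the samples of class j.
    assoc = [[0.0] * p for _ in range(num_labels)]
    for row, labels in zip(m, logical_labels):
        for j in range(num_labels):
            if labels[j] == 1:
                for k in range(p):
                    assoc[j][k] += row[k]
    for j in range(num_labels):
        total = sum(assoc[j])
        if total > 0:
            assoc[j] = [a / total for a in assoc[j]]
    # Step 3: max-min composition (assumed operator), then normalization.
    distributions = []
    for row in m:
        v = [max(min(assoc[j][k], row[k]) for k in range(p))
             for j in range(num_labels)]
        total = sum(v)
        if total == 0:
            distributions.append([1.0 / num_labels] * num_labels)
        else:
            distributions.append([x / total for x in v])
    return distributions


# Toy data: two well-separated groups, one class each.
samples = [[0.0, 0.0], [0.2, 1.0], [1.1, 0.1],
           [10.0, 10.0], [10.3, 11.0], [11.2, 9.8]]
logical = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
distributions = enhance_labels(samples, logical, p=2)
```

Each entry of `distributions` is a nonnegative vector that sums to 1, with most of the mass on the class of the sample's own group. In this sketch the cluster count `p` is independent of the number of labels; the association matrix is what links the two.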
4. Evaluation of Proposed Methodology
4.1. Evaluation Metrics
This paper evaluates the algorithm with both label distribution metrics and multiclass classification metrics. All evaluation metrics and their formulas are shown in Table 1. The first six are metrics for label distribution learning, and the last two are metrics for multiclass classification. “↑” after a metric name means that the larger the value, the better the algorithm; “↓” means that the smaller the value, the better the algorithm.
In Table 1, Pj is the precision of the jth class, xnor is the XNOR (logical equivalence) operation, Dis denotes a distance, Sim denotes a similarity, and mAP is the macro-averaged precision.
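Since Table 1 is not reproduced here, the following sketch illustrates six measures that are commonly used to compare a predicted label distribution with a true one: four distances (Chebyshev, Clark, Canberra, KL divergence) and two similarities (cosine, intersection). That these are exactly the six metrics of Table 1 is an assumption; only Chebyshev, Canberra, and KL divergence are named in the text. The handling of zero entries is also an assumption.

```python
import math


def chebyshev(d, p):
    """Largest absolute difference between two distributions (lower is better)."""
    return max(abs(a - b) for a, b in zip(d, p))


def clark(d, p):
    """Clark distance; terms with a + b == 0 are skipped (assumption)."""
    return math.sqrt(sum((a - b) ** 2 / (a + b) ** 2
                         for a, b in zip(d, p) if a + b > 0))


def canberra(d, p):
    """Canberra distance; terms with a + b == 0 are skipped (assumption)."""
    return sum(abs(a - b) / (a + b) for a, b in zip(d, p) if a + b > 0)


def kl_divergence(d, p, eps=1e-12):
    """KL divergence of the prediction p from the true distribution d."""
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(d, p) if a > 0)


def cosine(d, p):
    """Cosine similarity (higher is better)."""
    dot = sum(a * b for a, b in zip(d, p))
    norm_d = math.sqrt(sum(a * a for a in d))
    norm_p = math.sqrt(sum(b * b for b in p))
    return dot / (norm_d * norm_p)


def intersection(d, p):
    """Intersection similarity (higher is better)."""
    return sum(min(a, b) for a, b in zip(d, p))


true_dist = [0.5, 0.3, 0.2]
pred_dist = [0.4, 0.4, 0.2]
scores = {
    "chebyshev": chebyshev(true_dist, pred_dist),
    "clark": clark(true_dist, pred_dist),
    "canberra": canberra(true_dist, pred_dist),
    "kl": kl_divergence(true_dist, pred_dist),
    "cosine": cosine(true_dist, pred_dist),
    "intersection": intersection(true_dist, pred_dist),
}
```

For identical distributions the four distances are 0 and the two similarities are 1, which matches the “↓” and “↑” reading of the metrics.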
4.2. Dataset Used
All rs-fMRI datasets used in this paper were obtained from the ABIDE website (Autism Brain Imaging Data Exchange, http://fcon_1000.projects.nitrc.org/indi/abide/). Table 2 shows the composition of each type of sample in each dataset. Taking the NYU dataset as an example, the data were collected at New York University; during acquisition, the subjects remained still and did not perform any task. The specific parameters are shown in Table 2.
In Table 2, UM stands for the University of Michigan, KKI for the Kennedy Krieger Institute, Leuven for the University of Leuven, and UCLA for the University of California, Los Angeles.
Although brain regions are spatially separated from each other, their neural activity influences one another. This paper uses the functional connectivity matrix between brain regions as the classification feature [32]. The functional connectivity matrix is calculated in the following preprocessing steps:
(1) From the resting-state functional magnetic resonance imaging data, use the DPARSF (data processing assistant for resting-state fMRI) tool to extract the average time-series signal of each brain region, calculate the Pearson correlation coefficient between each pair of brain regions, and obtain the functional connectivity matrix.
(2) Take each row of the functional connectivity matrix as the feature description of the corresponding brain region, take the upper triangular part of the matrix, and concatenate its rows to obtain the corresponding feature vector.
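The two steps above can be illustrated with a short sketch. It assumes that the regional average time series have already been extracted (the DPARSF step is not shown), and it excludes the diagonal of the matrix from the feature vector because every diagonal entry is 1; whether the paper keeps the diagonal is not stated. The function names and the toy signals are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length, non-constant signals."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    std_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    std_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (std_u * std_v)


def connectivity_matrix(series):
    """series[r] is the average time series of brain region r."""
    regions = len(series)
    return [[pearson(series[i], series[j]) for j in range(regions)]
            for i in range(regions)]


def upper_triangle_features(fc):
    """Concatenate the rows of the strict upper triangle into one vector."""
    regions = len(fc)
    return [fc[i][j] for i in range(regions) for j in range(i + 1, regions)]


# Toy example: 3 regions, 4 time points each.
series = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with region 0
    [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with region 0
]
fc = connectivity_matrix(series)
features = upper_triangle_features(fc)
```

For R brain regions this yields R(R − 1)/2 features per subject, which is why the feature dimension quickly reaches the thousands while the number of subjects stays small. With NumPy available, `numpy.corrcoef` and `numpy.triu_indices` express the same computation more compactly.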
4.3. Proposed Algorithm
The proposed CSLDSVR method is compared with six existing LDL algorithms and two multiclass classification algorithms. The two multiclass classification algorithms are decision tree and K-nearest neighbor (KNN), both of which are classic methods [33, 34]. The six existing LDL algorithms are PT-SVM, PT-BAYES, AA-KNN, AA-BP (back propagation), SA-IIS (improved iterative scaling), and LDSVR, where “PT” stands for problem transformation, “AA” for algorithm adaptation, and “SA” for specialized algorithm [35, 36]. The comparison algorithms are described in Table 3.
The CSLDSVR algorithm proposed in this paper has four parameters: the weight coefficient C, the type of kernel function, the width ε of the insensitive region, and the bandwidth of the Gaussian kernel. The parameter ranges are shown in Table 4. The results were calculated using ten-fold cross-validation, as follows. The dataset is randomly divided into 10 equal parts; in each fold, 1 part is taken as the test set and the remaining 9 parts as the training set. This process is repeated 10 times so that each part serves once as the test set, and the average of the 10 results is taken as the evaluation result.
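The ten-fold procedure described above can be sketched as follows. This is an illustration under assumptions: the `evaluate` callback, which would train a model on the training indices and return a score on the test indices, is hypothetical, and when the sample count is not divisible by 10 the folds here differ in size by at most one sample.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Randomly split the indices 0..n-1 into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n, evaluate, k=10, seed=0):
    """Average evaluate(train_idx, test_idx) over the k folds."""
    folds = k_fold_indices(n, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # The other k - 1 folds together form the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k


# Dummy evaluation that only reports the share of samples held out per fold.
def held_out_share(train_idx, test_idx):
    return len(test_idx) / (len(train_idx) + len(test_idx))


folds = k_fold_indices(48)
mean_share = cross_validate(48, held_out_share)
```

With 48 samples, the size of the KKI dataset, each fold holds 4 or 5 samples, so a single fold's score rests on very few test cases.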
4.4. Comparison of Label Distribution Algorithms
Table 5 summarizes the experimental results of the six label distribution learning algorithms and CSLDSVR on five different datasets, recorded in the form of mean ± standard deviation. Bold marks the best value of each metric among the methods on the current dataset. Compared with the other label distribution learning algorithms, CSLDSVR gives the best results in most cases, most clearly on the UM, UCLA, and KKI datasets. Among the label distribution metrics, KL divergence describes the difference between two distributions, and the LDL algorithms used for comparison take KL divergence as their objective function. The KL divergence of the CSLDSVR predictions is nevertheless the lowest, which shows that the label distribution predicted by the new algorithm is, on the whole, the closest to the real data distribution.
Figure 2 summarizes the multiclass metrics, precision and mAP, of CSLDSVR and the label distribution algorithms; on these two most important multiclass metrics, CSLDSVR performs better. Some algorithms have high accuracy but a low macro average because they do not consider the class imbalance problem, so the classification is biased towards the majority class. CSLDSVR uses the kernel trick to solve the problem in a more discriminative feature space and takes the size of each class into account, which effectively mitigates the problem caused by class imbalance.
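A small worked example shows how high accuracy and a low macro average can coexist. It assumes that the per-class score is the fraction of the samples of that class that are classified correctly and that the macro average is the unweighted mean of these per-class scores; the exact formulas of Table 1 are not reproduced in the text, so this definition is an assumption. The class sizes are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of all samples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_average(y_true, y_pred):
    """Unweighted mean of the per-class correct rates (assumed definition)."""
    per_class = []
    for c in sorted(set(y_true)):
        members = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(y_pred[i] == c for i in members)
        per_class.append(correct / len(members))
    return sum(per_class) / len(per_class)


# Three classes with sizes 8, 1, 1; the classifier always predicts class 0.
y_true = [0] * 8 + [1] + [2]
y_pred = [0] * 10

acc = accuracy(y_true, y_pred)
macro = macro_average(y_true, y_pred)
```

Here the accuracy is 0.8 while the macro average is only 1/3, because the two minority classes each contribute a score of 0. Under this definition, a three-class classifier that assigns every sample to the majority class always has a macro average of 1/3, regardless of how imbalanced the classes are.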

To verify the improvement brought by the cost-sensitive mechanism, CSLDSVR is compared with LDSVR, which lacks this mechanism. As shown in Table 5, CSLDSVR learns better in most cases; in addition, the standard deviation of its results stays at a low level, that is, the stability of the algorithm is improved. By contrast, the results of LDSVR, which has no cost-sensitive mechanism, have a large and fluctuating standard deviation. For example, the standard deviation of its Canberra metric on UCLA and KKI exceeds 0.1.
4.5. Multiclass Comparison Experiment
Table 6 compares the precision and mAP of CSLDSVR with those of two classical multiclass classification algorithms, decision tree and KNN, on five datasets. Bold marks the best value of each metric among the methods on the current dataset. In the KNN results, a mAP of 0.333 appears three times. This is because KNN is strongly biased towards the majority class and, in the extreme case, classifies all samples into the majority class. With high-dimensional, imbalanced autism neuroimaging data, traditional multiclass classification algorithms are prone to the curse of dimensionality or to bias towards the majority class. CSLDSVR addresses these problems with the kernel trick and a cost-sensitive mechanism and obtains a better classification model. The cost-sensitive mechanism reduces the overall misclassification cost by increasing the misclassification cost of the minority classes and reducing that of the majority class, which keeps the model from leaning towards the majority class. In other words, the cost-sensitive mechanism adds weight constraints to the original standard loss function so that the final model pays more attention to the minority classes that matter more in practical applications. This paper assigns different misclassification costs to different classes by introducing the weight $1/N_j$, where $N_j$ is the number of samples of the jth class. In theory, this avoids the tendency of the model towards the majority class and improves the prediction accuracy for the minority classes [37]. The experimental results in Table 6 support this, and in most cases the stability of the algorithm also improves, with a small standard deviation of the results.
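The effect of weighting by the inverse class size can be illustrated with a small sketch. It does not reproduce the CSLDSVR objective, whose exact form is not given in this text; it only shows how a per-class weight equal to one over the class size rescales per-sample losses. The function names and the loss values are hypothetical.

```python
from collections import Counter


def class_weights(labels):
    """Weight of each class: one over its number of samples."""
    counts = Counter(labels)
    return {c: 1.0 / n for c, n in counts.items()}


def weighted_loss(per_sample_losses, labels):
    """Sum of per-sample losses, each scaled by the weight of its class."""
    weights = class_weights(labels)
    return sum(weights[y] * loss for loss, y in zip(per_sample_losses, labels))


# Three classes with sizes 8, 1, 1 and an identical loss on every sample.
labels = [0] * 8 + [1] + [2]
losses = [1.0] * 10

weights = class_weights(labels)
unweighted_total = sum(losses)
weighted_total = weighted_loss(losses, labels)
```

Without weights, the majority class accounts for 8 of the 10 units of loss. With the weights, each class contributes exactly 1 unit, so an error on a minority sample costs eight times as much as an error on a majority sample and the optimizer can no longer reduce the loss by favoring the majority class alone.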
4.6. Effect of Parameters
This section studies the effect of parameter changes on the performance of CSLDSVR. Figure 3 shows how the evaluation metrics precision and KL divergence change when the parameters C and ε take different values on five different datasets. Comparing two plots of the same parameter but different metrics, such as Figures 3(a) and 3(c), the curves of the same dataset follow roughly opposite trends, and the point where precision reaches its maximum generally coincides with the point where KL divergence reaches its minimum. This agrees with the earlier analysis of KL divergence: when the KL divergence is small, the predicted and true label distributions are more similar, and the classification results are more accurate.

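Figure 3: Precision and KL divergence of CSLDSVR for different values of the parameters C and ε on the five datasets, panels (a) to (d).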
The parameter values that give the optimal result differ across datasets, which shows that in the diagnosis of autism the data distributions of different data centers differ and the model parameters should be chosen accordingly. Moreover, for a dataset with fewer samples, the result is more sensitive to parameter changes; for example, the KKI dataset, with only 48 samples, fluctuates the most when the parameter values change.
The parameters of CSLDSVR should therefore be set according to the characteristics of the dataset. With reasonable parameter settings, CSLDSVR can overcome the high dimensionality and class imbalance of the autism datasets. In summary, this section has evaluated ASD detection with the proposed combination of SVR and LDL under different parameter settings. The approach handles high-dimensional features by mapping samples to the feature space before diagnosing multiclass ASD, outperforms the compared methods in classification performance in most cases, and mitigates the problem of unbalanced data in ASD diagnosis.
5. Conclusion
This research presents a cost-sensitive label distribution support vector regression approach for ASD-aided diagnosis, based on functional connectivity features extracted from rs-fMRI. Since the brain function of ASD patients differs from that of healthy persons, rs-fMRI is a useful method for capturing brain activity. The study introduces label distribution learning to address the label noise problem in multiclass ASD diagnosis. Furthermore, a cost-sensitive mechanism is added to label distribution support vector regression to balance the influence of the majority and minority classes on the objective function. In this way the method deals with the imbalanced data problem in ASD diagnosis, and the kernel mapping of samples to the feature space addresses the difficulty of classifying high-dimensional features before diagnosing multiclass ASD. Overall, the method outperforms the compared methods in classification performance in most cases. However, the improved model is still biased towards the majority class to some extent, and the imbalanced data problem should be addressed further in future work, for example by improving the data sampling method or by synthesizing minority samples. In addition, more advanced distance measures could be introduced, but they require more prior knowledge. Because no prior knowledge is used here, the Euclidean distance is adopted instead. Other advanced distances have their own benefits and will be explored in future research.
Data Availability
The data shall be made available on request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.