Abstract

Understanding and solving the psychological health problems of college students have become a focus of social attention. Complex networks have become important tools to study the factors affecting psychological health, and the Gaussian graphical model is often used to estimate psychological networks. However, previous studies leave some gaps to overcome, including the following aspects. (1) When studying networks of subpopulations, the estimation neglects the intrinsic relationships among subpopulations, leading to a large difference between the estimated network and the real network. (2) Because of the high cost, previous psychological surveys often have a small sample size, and the psychological description is insufficient. Here, the intrinsic connections among multiple tasks are used, and multitask machine learning is applied to develop a multitask Gaussian graphical model. The psychological networks of the population and subpopulations are estimated based on psychological questionnaire data. This study is the first to apply a psychological network to such a large-scale college student psychological analysis, and we obtain some interesting results. The model presented here is a dynamic model based on complex networks which predicts individual behavior and provides insight into the intrinsic links among various symptoms.

1. Introduction

With the continuous development of science and technology and the continuous progress of society, the pressure on college students in terms of learning, life, emotions, and employment has increased substantially. Understanding and solving the psychological health problems of college students have become a focus of social attention. A recent review showed that the prevalence of depression in college students was 30.6% [1], much higher than the lifetime prevalence of 16.2% in the general population [2]. In addition, the incidence and severity of depression among college students have increased significantly [3]. A survey [4] of 126,000 college students in China showed that 20.3% have psychological problems and 16%–30% suffer from psychological problems or mental disorders such as depression, anxiety, coercion, interpersonal relationship issues, personality disorder, and mental illness. Studies have shown that suicide accounts for the largest proportion of death among Japanese youth in their twenties [5], and suicide is also the leading cause of teenage death in other countries, including the United States. Depression, anxiety, and suicide rates are rising [6, 7]. Many similar studies indicate that the mental health of college students cannot be ignored: it has become an important factor affecting campus safety and social stability. Psychological research can help to provide targeted psychological counseling to college students, reduce their chances of suffering from psychological diseases, and help improve students’ psychological health.

In this regard, scholars have conducted research on psychological issues to explore factors that affect psychological health. Some scholars have used regression analysis. For example, Masuda et al. [8] used logistic regression to study the effects of social isolation on suicidal behavior in the context of fixed social networks. Taliaferro et al. [9] used linear regression and multiple regression to study the factors affecting mood and suicidal behavior in international college students. In recent years, complex networks have received considerable attention in psychological research [10]. In the field of psychopathology, the network perspective was introduced as a general psychopathology concept, especially a new method of analyzing depression. De Beurs et al. [11] used complex networks, that is, the visualization of factor correlations obtained via statistical analysis, to analyze the factors affecting suicidal ideation in patients. They added L1 regularization to make the network sparse. Van Borkulo et al. [12] used the same method to study the factors that influence the persistence of patients with major depression. Complex networks can be used to analyze psychological health because a complex network is an effective tool for modeling complex social systems. Psychological behavior can be conceptualized as complex interactions between psychology and other components, and psychological networks can portray the underlying structures of interactions among these components. This method has been applied to a variety of different fields of psychology, such as clinical psychology [13], psychiatry [14], personality research [15], and social psychology [16].

The main methods used to estimate psychological networks include the Ising graphical model for binary data [17], the Gaussian graphical model for continuous or ordinal data, and the extended Bayesian information criterion (EBIC) parameter selection method for small samples [18]. In complex network research based on questionnaire data, errors occur due to factors such as carelessness and exhaustion of the respondents. Traditionally, samples that pass the lie-detection items are assumed to be accurate. In fact, however, the lie-detection items can remove only a portion of the erroneous samples, and the samples that pass them still contain errors: the recorded answers may be exactly opposite to the real answers or may fluctuate around them. When processing small, noisy samples, these methods therefore produce networks with large errors. In addition, when the samples are divided into different groups according to the attributes of the students (such as major), the traditional methods ignore the intrinsic connections among the subpopulations. For example, although students have different majors, they belong to the same overall population; that is, there are internal connections among different subgroups. Because this type of connection exists objectively, ignoring it is equivalent to ignoring a precondition and makes the results unreliable. A network estimation method based on a multitask Gaussian graphical model is proposed to solve the above problems. In contrast to existing methods, this method can simultaneously learn multiple tasks with intrinsic links and make full use of the internal links among multiple tasks. Thus, the proposed method is suitable for estimating networks of samples with noise and improving the generalization performance [19].
The advantages of the method proposed in this paper include the following aspects:
(i) This paper applies the psychological network to the psychological health analysis of college students on a large scale for the first time. Based on a large quantity of college psychological questionnaire data, we estimate and analyze the psychological networks of small subpopulations based on big data samples.
(ii) The multitask Gaussian model utilizes the intrinsic links among multiple tasks, thereby reducing the error in estimating networks of small samples with noise and providing a machine learning perspective to study psychological networks.
(iii) The results show that some characteristics, such as “interested in learning,” have a strong correlation with the students’ academic status. The psychological differences between students of different majors provide insight for college psychological counseling.
(iv) The model proposed in this paper is a dynamic model based on complex networks that can predict individual behavior and provide insight into the intrinsic relationships among symptoms.

The remainder of this paper is structured as follows. Section 2 introduces the relevant research on psychological networks, and Section 3 describes the dataset, the Gaussian graphical model, multitask learning, and the derivation of the multitask Gaussian graphical model. Section 4 describes the experimental results, including experiments on a simulated dataset and a real dataset. Section 5 discusses some experimental results and the Gaussian graphical model. Section 6 presents a summary and forecasts.

2. Related Work

Traditional psychological analysis commonly uses logistic regression, linear regression, or multiple regression methods. However, with the increased popularity of network analysis in recent years, complex networks have gradually been applied to psychological analysis [20]. A network is a general way of visualizing and analyzing interactions between nodes. It is widely used in many fields [21–25]. The most notable networks are social networks, which have been used and studied for decades [26]. In a social network, the nodes are people who are connected by some type of relationship, such as friendship. In the network, the focus is on obtaining connections between network nodes [27]. Unlike those in social networks, the nodes of a psychological network are not people, but psychological variables, such as emotional states or symptoms [28, 29]. Psychological behavior is conceptualized as a complex interaction of psychology and other components. Researchers use psychological networks to portray the underlying structure of these component interactions. In social networks, the connections between nodes are observable, whereas the connections between nodes in a psychological network are parameters estimated using existing modeling techniques [30]. One popular network model that is used to estimate psychological networks is the pairwise Markov random field (PMRF) [15]. Two main graph models exist for different types of data: for binary data, the appropriate PMRF model is the Ising model [17]; when the data obey a multivariate normal distribution [15] or are ordered data [31], the Gaussian graphical model is typically used. For continuous data with a nonnormal distribution, a transformation can be applied before estimating the Gaussian model [32]. For small sample problems, researchers introduced the LASSO method, which is a form of regularization whose tuning parameter can be selected by EBIC [33].
The EBIC method of selecting parameters performs well in estimating the Ising model and the Gaussian graphical model [31], but it does not work well when faced with multiple noise-containing datasets. In particular, when the original dataset is split into multiple subdatasets, the sample sizes of the subdatasets are not equal. When the network differences between subdatasets are analyzed, the previous graph models result in large errors between the estimated parameters and the real parameters because of the noise in the dataset.

When a relationship exists between the tasks to be learned, it is better to learn all the tasks simultaneously rather than learning each task independently [34]. Many studies have shown that multitask learning has advantages over single-task learning [35, 36]. Multitask learning can combine multiple related tasks and learn them simultaneously. The tasks help each other by sharing relevant information, complement each other’s domain-related information, promote learning, and enhance the generalization effects. Therefore, we introduce multitask learning into psychological network modeling to reduce the interference of noise in small sample learning and to construct an accurate network model.

3. Materials and Methods

Asif et al. [37] have utilized educational data mining to analyze undergraduate students’ performance. Here, to better understand the psychological status of students and the impact of psychological factors on student achievement, this paper estimates psychological networks from two aspects. In this section, we introduce the dataset and the multitask Gaussian graphical model.

3.1. Data Collection and Data Description

We conducted a questionnaire survey of freshmen enrolled in a university using the Chinese College Students’ Mental Health Scale (CCSMHS) [38]. The scale was compiled using empirical methods based on literature research, consulting case analysis, expert interviews and discussions, and open questionnaires. Norms for Chinese college students were established, and the reliability and validity of the scale were tested. The scale was shown to have good reliability and validity and can be used to assess the psychological health level of Chinese college students. The scale contains 104 items, 4 of which are repeated questions used as a lie-detection test of whether a questionnaire is valid. After removing the lie-detection items, the scale has a total of 100 questions, which are divided into 12 dimensions: somatization, anxiety, depression, inferiority, social withdrawal, social attack, sexual psychological disorder, paranoia, compulsion, dependence, impulse, and psychotic.

For freshmen enrolled in a Chinese university in two consecutive years, the CCSMHS was used to conduct a questionnaire survey and obtain data for two grades. First, questionnaires containing major and achievement information were selected. Then, questionnaires that failed any of the four lie-detection items were removed. Finally, we excluded questionnaires in which the 104 questions were answered in less than 104 seconds. For Grade I, 2065 questionnaires were obtained: 1274 from male students and 791 from female students. For Grade II, 1871 questionnaires were obtained: 939 from male students and 932 from female students. The statistics for the two grades are shown in Table 1, and the remaining statistics are shown in the Appendix.
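
The three screening steps can be sketched as a simple filter. This is a minimal sketch under stated assumptions: the record fields (`answers`, `seconds`, `has_major`, `has_score`) and the item positions in `LIE_PAIRS` are hypothetical, and treating a questionnaire as valid only when each repeated item receives the same answer as its original is an assumed reading of the lie-detection check.

```python
# Hypothetical positions of the 4 repeated (lie-detection) items among the
# 104 questions: (index of the original item, index of its repeat).
LIE_PAIRS = [(3, 100), (27, 101), (55, 102), (81, 103)]
MIN_SECONDS = 104  # questionnaires answered faster than this are dropped


def is_valid(record):
    """Return True if a questionnaire passes all three screening steps."""
    # Step 1: major and achievement information must be present.
    if not (record["has_major"] and record["has_score"]):
        return False
    # Step 2 (assumed rule): every repeated item must match its original.
    answers = record["answers"]
    if any(answers[i] != answers[j] for i, j in LIE_PAIRS):
        return False
    # Step 3: drop questionnaires completed in under MIN_SECONDS.
    return record["seconds"] >= MIN_SECONDS


def screen(records):
    """Keep only the questionnaires that pass the screening."""
    return [r for r in records if is_valid(r)]
```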

3.2. Multitask Gaussian Graphical Models

In this section, we introduce the multitask Gaussian graphical model. We introduce the Gaussian graphical model in Section 3.2.1, multitask learning in Section 3.2.2, and the mathematical derivation of the multitask Gaussian graphical model and its significance in Section 3.2.3.

3.2.1. Gaussian Graphical Models

A Gaussian graphical model consists of a set of variables (nodes) and a set of lines (edges) used to visualize the relationships between variables [39]. The thickness of a line indicates the strength of the relationship. The lack of an edge between two variables indicates that there is no relationship between the two variables or that the relationship is weak enough to be ignored. Note that, in the Gaussian graphical model, the lines represent partial correlations [40]. A partial correlation is the correlation between two variables after controlling for all other variables in the dataset, and this type of correlation avoids spurious correlations. A spurious correlation arises in bivariate analysis when the correlation between two variables is caused by a third variable present in the dataset. Because the relationships estimated by the Gaussian graphical model can be interpreted as partial correlation coefficients, the risk of discovering spurious relationships is reduced.
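
The difference between a marginal and a partial correlation can be illustrated with a small worked example. The sketch below uses the standard identity that the partial correlation between variables i and j equals the negated, rescaled (i, j) entry of the inverse covariance matrix; the three-variable covariance matrix is an illustrative assumption, not data from this study.

```python
import numpy as np


def correlations(cov):
    """Marginal (bivariate) correlation matrix of a covariance matrix."""
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)


def partial_correlations(cov):
    """Partial correlation matrix implied by a covariance matrix."""
    theta = np.linalg.inv(cov)           # precision matrix
    d = np.sqrt(np.diag(theta))
    pcor = -theta / np.outer(d, d)       # rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Illustrative model: x1 = x3 + e1 and x2 = x3 + e2, where x3, e1, e2 are
# independent with unit variance, so x1 and x2 are related only through x3.
cov = np.array([[2.0, 1.0, 1.0],
                [1.0, 2.0, 1.0],
                [1.0, 1.0, 1.0]])

print(correlations(cov)[0, 1])          # marginal correlation of x1 and x2
print(partial_correlations(cov)[0, 1])  # partial correlation given x3
```

In this example the marginal correlation between x1 and x2 is 0.5, while their partial correlation given x3 vanishes up to rounding, so no edge would be drawn between them.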

Although the visual representation of the relationships between variables helps to visually perceive the data, the visualization of Gaussian graphical model becomes worse when the estimated graph contains a large number of dense lines. Therefore, the LASSO method is commonly used to obtain sparse graphs [41]. LASSO uses a regularization technique derived from machine learning to impose additional penalties on model complexity and estimate many parameters of the network to be exactly zero, thus obtaining a sparse graph [31]. The degree of sparsity in the graph is controlled by the tuning parameter. The smaller the tuning parameter is, the denser the resulting graph is. In general, the EBIC is used to select the optimal value of the tuning parameter [18], so that the strongest relationship remains in the graph (maximizing true positives).
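
To make the role of the EBIC concrete, the following sketch scores candidate precision matrices with the commonly used form of the criterion for Gaussian graphical models, EBIC = -2·loglik + E·log(n) + 4·γ·E·log(p), where E is the number of nonzero edges. The function names and the default γ = 0.5 are assumptions for illustration; in practice the candidates would come from running the graphical LASSO over a grid of tuning parameters.

```python
import numpy as np


def ebic(theta, S, n, gamma=0.5):
    """EBIC of a candidate precision matrix theta for sample covariance S."""
    p = S.shape[0]
    _, logdet = np.linalg.slogdet(theta)
    # Gaussian log-likelihood up to an additive constant.
    loglik = 0.5 * n * (logdet - np.trace(S @ theta))
    # Number of edges = nonzero entries above the diagonal.
    n_edges = np.count_nonzero(np.triu(theta, k=1))
    return -2.0 * loglik + n_edges * np.log(n) + 4.0 * gamma * n_edges * np.log(p)


def select_by_ebic(candidates, S, n, gamma=0.5):
    """Return the index of the candidate with the smallest EBIC, and all scores."""
    scores = [ebic(theta, S, n, gamma) for theta in candidates]
    return int(np.argmin(scores)), scores
```

A larger γ penalizes edges more heavily and therefore favors sparser graphs.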

3.2.2. Multitask Learning

In our scenario, the means of different subsamples may vary greatly, but the covariance matrices are similar. When the number of samples is small, the single-task model produces large errors, so we use multitask learning to improve the performance of each model. The schematic diagram of multitask learning is shown in Figure 1. The multitask objective, equation (1), is defined in terms of the number of tasks, a hyperparameter, the loss function of each task, and the average model over all tasks.
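
One common way to write an objective built from these ingredients is mean-regularized multitask learning. The following is a sketch under assumptions and not necessarily the exact form of equation (1): the symbols $T$ (number of tasks), $\lambda$ (hyperparameter), $L_t$ (loss of task $t$), $w_t$ (model of task $t$), and $\bar{w}$ (average model), as well as the squared two-norm penalty, are chosen here for illustration.

```latex
\min_{w_1,\dots,w_T}\; \sum_{t=1}^{T} L_t(w_t)
  \;+\; \lambda \sum_{t=1}^{T} \lVert w_t - \bar{w} \rVert_2^2,
\qquad
\bar{w} = \frac{1}{T} \sum_{t=1}^{T} w_t .
```

With $\lambda = 0$ the tasks are learned independently; as $\lambda$ grows, all task models are pulled toward their common average.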

3.2.3. The Derivation Process of Multitask Gaussian Graphical Models

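In the Gaussian model, the p-dimensional variables $X = (X_1, \dots, X_p)$ follow a multivariate Gaussian distribution with mean $\mu$, covariance matrix $\Sigma$, sample covariance matrix $S$, and precision matrix $\Theta = \Sigma^{-1}$. The precision matrix reflects the conditional independence relationships between the nodes. Specifically, $\Theta_{ij} = 0$ is equivalent to stating that the variables $X_i$ and $X_j$ are conditionally independent given the remaining variables and is also equivalent to the absence of an edge between node $i$ and node $j$. For the Gaussian graphical model, the network structure can therefore be reconstructed by estimating the inverse of the covariance matrix. One way to estimate the Gaussian graphical model is the graphical LASSO.

In 2007, Yuan and Lin proposed penalized likelihood estimation [42]. The density function of the Gaussian distribution is

$$f(x) = (2\pi)^{-p/2}\, |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu) \right\}. \tag{2}$$

The logarithmic density function and the negative log-likelihood function (up to an additive constant and a positive scale factor) of the Gaussian distribution with a mean of 0 and a covariance matrix of $\Sigma$ are

$$\log f(x) = -\frac{p}{2} \log(2\pi) + \frac{1}{2} \log|\Theta| - \frac{1}{2}\, x^{T} \Theta x \tag{3}$$

and

$$\ell(\Theta) = \operatorname{tr}(S\Theta) - \log|\Theta|, \tag{4}$$

where $S$ is the sample covariance matrix, $|\cdot|$ is the determinant of the matrix, and $\operatorname{tr}(\cdot)$ is the trace of the matrix. After introducing the LASSO penalty with tuning parameter $\lambda$, the estimate of $\Theta$ by the regularization model is

$$\hat{\Theta} = \arg\min_{\Theta \succ 0} \left\{ \operatorname{tr}(S\Theta) - \log|\Theta| + \lambda \lVert \Theta \rVert_1 \right\}. \tag{5}$$

To solve the above problem, in 2008, Friedman et al. [41] proposed an efficient coordinate descent method. Let $W$ denote the estimate of $\Sigma$, and partition $W$ and $S$ as follows (after permuting rows and columns so that the column being updated is last):

$$W = \begin{pmatrix} W_{11} & w_{12} \\ w_{12}^{T} & w_{22} \end{pmatrix}, \qquad S = \begin{pmatrix} S_{11} & s_{12} \\ s_{12}^{T} & s_{22} \end{pmatrix}. \tag{6}$$

Equation (5) can be transformed to

$$\min_{\beta} \left\{ \frac{1}{2} \left\lVert W_{11}^{1/2} \beta - b \right\rVert_2^2 + \lambda \lVert \beta \rVert_1 \right\}, \tag{7}$$

where $b = W_{11}^{-1/2} s_{12}$, and equation (7) can be solved to obtain $\hat{\beta}$. Then, $w_{12} = W_{11} \hat{\beta}$. For each $j = 1, 2, \dots, p, 1, 2, \dots, p, \dots$, $W$ is updated column by column with $w_{12}$ until convergence.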
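
The column-by-column update described above can be sketched in a few lines of NumPy. This is a minimal illustration of block coordinate descent for the graphical LASSO, not the implementation used in this study: the function names, iteration limits, and tolerance are assumptions, and the inner LASSO problem is solved by plain coordinate descent with soft-thresholding.

```python
import numpy as np


def soft_threshold(x, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def lasso_cd(W11, s12, lam, beta, n_iter=200):
    """Minimize 0.5 * b'W11 b - b's12 + lam * ||b||_1 by coordinate descent."""
    for _ in range(n_iter):
        for k in range(len(beta)):
            # Partial residual that excludes the k-th coordinate.
            r = s12[k] - W11[k] @ beta + W11[k, k] * beta[k]
            beta[k] = soft_threshold(r, lam) / W11[k, k]
    return beta


def graphical_lasso(S, lam, max_sweeps=100, tol=1e-8):
    """Return (Theta, W): sparse precision estimate and covariance estimate."""
    p = S.shape[0]
    W = S + lam * np.eye(p)          # the diagonal of W stays fixed
    B = np.zeros((p - 1, p))         # LASSO coefficients, one column per node
    for _ in range(max_sweeps):
        W_old = W.copy()
        for j in range(p):
            idx = np.arange(p) != j
            W11 = W[np.ix_(idx, idx)]
            B[:, j] = lasso_cd(W11, S[idx, j], lam, B[:, j])
            w12 = W11 @ B[:, j]      # updated off-diagonal column of W
            W[idx, j] = w12
            W[j, idx] = w12
        if np.abs(W - W_old).max() < tol:
            break
    # Recover the precision matrix from W and the stored coefficients.
    Theta = np.zeros((p, p))
    for j in range(p):
        idx = np.arange(p) != j
        theta_jj = 1.0 / (W[j, j] - W[idx, j] @ B[:, j])
        Theta[j, j] = theta_jj
        Theta[idx, j] = -B[:, j] * theta_jj
    return Theta, W
```

With `lam = 0` the routine reduces to inverting the sample covariance matrix, and once `lam` exceeds the largest absolute off-diagonal entry of `S` every edge is removed, which provides two simple sanity checks.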

Based on the above, the objective function in our study, equation (8), extends the graphical LASSO objective to K tasks: it combines the objectives of the K subgroups with a regularization term that penalizes the deviation of each subgroup’s covariance matrix from the average covariance matrix. Here, K is the number of psychological networks in our experiment, one for each of the K subgroups. As shown previously, there are internal connections among different subgroups, so the covariance matrices of all subpopulations are similar; minimizing the regularization term in the objective function encodes this similarity. Each subnetwork is iterated until convergence, that is, until the change between successive iterations falls below a preset threshold. After all subnetworks have been updated once, the average covariance matrix is updated.

Because any local optimum of a convex optimization problem is also a global optimum, equation (8) can be handled in the same way as equation (5): considering, for each task, the estimate of its covariance matrix and its sample covariance, equation (8) can be transformed into the LASSO-type subproblem of equation (9). For each task k = 1, ..., K and for each column j = 1, ..., p, the solution of equation (9) is used to update the estimated covariance matrix column by column until convergence.
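
For orientation, one plausible way to write an objective with the structure described above is given below. It is a sketch under assumptions rather than a reproduction of equation (8): the superscript notation, the two hyperparameters $\lambda_1$ and $\lambda_2$, the use of the covariance estimates $W^{(k)}$ in the coupling term, and the squared Frobenius norm are all choices made here for illustration.

```latex
\min_{\Theta^{(1)},\dots,\Theta^{(K)}}\;
\sum_{k=1}^{K} \Big[ \operatorname{tr}\!\big(S^{(k)} \Theta^{(k)}\big)
  - \log\big|\Theta^{(k)}\big|
  + \lambda_1 \big\lVert \Theta^{(k)} \big\rVert_1 \Big]
\;+\; \lambda_2 \sum_{k=1}^{K} \big\lVert W^{(k)} - \bar{W} \big\rVert_F^2,
\qquad
\bar{W} = \frac{1}{K} \sum_{k=1}^{K} W^{(k)} .
```

Here $S^{(k)}$ is the sample covariance of subgroup $k$, $\Theta^{(k)}$ its precision matrix, and $W^{(k)}$ the corresponding covariance estimate. Setting $\lambda_2 = 0$ decouples the sum into $K$ independent graphical LASSO problems of the form of equation (5).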

Several methods exist for solving multitask learning models, such as those in [43, 44]. However, because our problem concerns the Gaussian graphical model, we use the method proposed in [41] to solve it. We first transform the problem described by equation (8) into equation (9). Then, we use the convex optimization module CVXPY to obtain the results. The computing process is given in Algorithm 1.

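Input: multivariate samples from K conditions
Output: K networks, one for each condition
(1)for each condition k in [1,K] do
(2) Calculate the generalized correlation matrix of condition k using correlation measures
(3)end for
(4)Use cvxpy module to optimize (9) above
(5) Obtain the solution of the optimization problem in (9)
(6) Using this solution, compute the updated column of the covariance estimate
(7) Using the updated column, update the covariance estimate column by column until convergence for i in [1,K]
(8)Obtain the solution of the optimization problem in (8), that is, the K estimated precision matrices
(9)for each condition k in [1,K] do
(10) Generate network k from the estimated precision matrix of condition k
(11)end for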
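
The alternating structure of the procedure (update every subnetwork, then update the average) can be illustrated with a deliberately simplified stand-in. The sketch below is not the objective of equation (8) and does not use CVXPY: as a labeled assumption, it drops the LASSO penalty and couples the tasks through a squared Frobenius penalty on the precision matrices, minimizing the sum over tasks of tr(S_k Θ_k) - log det(Θ_k) + (ρ/2)·||Θ_k - Θ̄||², which gives each per-task update a closed form through an eigendecomposition. The names `prox_step`, `multitask_precision`, and `rho` are hypothetical.

```python
import numpy as np


def prox_step(S, target, rho):
    """Minimize tr(S @ T) - log det(T) + rho/2 * ||T - target||_F^2 over T."""
    # Stationarity gives rho*T - inv(T) = rho*target - S, which is solved
    # eigenvalue by eigenvalue: rho*t - 1/t = d.
    d, Q = np.linalg.eigh(rho * target - S)
    t = (d + np.sqrt(d ** 2 + 4.0 * rho)) / (2.0 * rho)   # always positive
    return (Q * t) @ Q.T


def multitask_precision(S_list, rho=1.0, max_iter=500, tol=1e-8):
    """Alternate between per-task updates and the update of the average."""
    mean = np.linalg.pinv(np.mean(S_list, axis=0))   # initial average precision
    for _ in range(max_iter):
        thetas = [prox_step(S, mean, rho) for S in S_list]
        new_mean = np.mean(thetas, axis=0)
        converged = np.abs(new_mean - mean).max() < tol
        mean = new_mean
        if converged:
            break
    return thetas, mean


# Usage on three simulated subgroups that share one covariance structure.
rng = np.random.default_rng(0)
true_cov = np.array([[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 1.0]])
samples = [rng.multivariate_normal(np.zeros(3), true_cov, size=n)
           for n in (30, 50, 100)]
S_list = [np.cov(x, rowvar=False) for x in samples]
thetas, mean_theta = multitask_precision(S_list, rho=5.0)
```

A larger `rho` ties the subgroup estimates more tightly to their average, while `rho` close to zero approaches separate single-task estimates. Each returned matrix is positive definite by construction.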

4. Result

4.1. Validation Study

To verify that the multitask Gaussian graphical model proposed in Section 3.2.3 can effectively estimate psychological networks from small, noisy samples, we use the model to estimate the networks of an artificially generated, noise-containing simulation dataset and compare the estimates with the true networks. If the error of the multitask Gaussian graphical model with respect to the true network is much smaller than that of the traditional Gaussian graphical model, then the model is considered valid.

4.1.1. Simulation Dataset Generation

To obtain a representative psychological network structure, this paper starts from the public Big Five dataset, which consists of 13278 samples with 41 items. The Gaussian graphical model is used to obtain the precision matrix, and the covariance matrix is obtained by inverting the precision matrix. Given this covariance matrix and a mean vector, we can generate the dataset of the first task with a chosen sample size. Then, by slightly modifying the covariance matrix and the mean, a new covariance matrix and mean are obtained, which are used to generate the dataset of another task with its own sample size. Repeating this operation yields the datasets of all tasks. The dataset of each task follows a multivariate Gaussian distribution with the mean and covariance of that task.

Considering that real data are noisy and do not strictly obey a multivariate Gaussian distribution, we add a fixed proportion of Gaussian noise samples to the previously generated datasets so that the simulated data resemble real data. The process of adding noise is as follows.

Fix the ratio of noise samples; the number of noise samples is then this ratio multiplied by the sample size of the original noise-free dataset, and the noise is randomly generated from a Gaussian distribution. Adding the noise to the original dataset gives the new noise-containing dataset. When the mean of the noise is a zero vector, the noise is unbiased; otherwise, it is biased.

In this paper, we generated 17 noise-free raw datasets with sample sizes of 30, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, and 800. The ratio of added noise is 0.2, and the noise is a biased noise with a mean of 2.
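
The data-generation procedure of this section can be sketched as follows. The sketch makes several assumptions that the text does not fix: the perturbation size `scale`, the noise standard deviation `noise_sd`, the identity starting covariance, and the choice to add the noise to a randomly selected subset of rows are all illustrative.

```python
import numpy as np


def make_task(rng, mean, cov, n):
    """Draw n noise-free samples from a multivariate Gaussian."""
    return rng.multivariate_normal(mean, cov, size=n)


def perturb(rng, mean, cov, scale=0.05):
    """Slightly modify the mean and covariance to define a related task."""
    p = len(mean)
    A = rng.normal(scale=scale, size=(p, p))
    new_cov = cov + A @ A.T      # adding a PSD matrix keeps cov positive definite
    new_mean = mean + rng.normal(scale=scale, size=p)
    return new_mean, new_cov


def add_noise(rng, X, ratio=0.2, noise_mean=2.0, noise_sd=1.0):
    """Add Gaussian noise to a random subset of rows (biased if noise_mean != 0)."""
    n, p = X.shape
    m = int(round(ratio * n))                      # number of noise samples
    rows = rng.choice(n, size=m, replace=False)
    noisy = X.copy()
    noisy[rows] += rng.normal(loc=noise_mean, scale=noise_sd, size=(m, p))
    return noisy, rows


# Usage: three related tasks with different sample sizes.
rng = np.random.default_rng(0)
mean, cov = np.zeros(4), np.eye(4)
tasks = []
for n in (30, 50, 100):
    X = make_task(rng, mean, cov, n)
    tasks.append(add_noise(rng, X)[0])
    mean, cov = perturb(rng, mean, cov)
```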

4.1.2. Comparison of the Single-Task Learning and Multitask Learning Results

In this paper, the Gaussian graphical model is first used to estimate the precision matrix of each task separately; the deviation of this estimate from the true precision matrix is taken as the error of the task under single-task learning. The multitask Gaussian graphical model is then used to learn all tasks simultaneously, and the deviation of the resulting estimate from the true precision matrix is the error of the task under multitask learning.

The results are shown in Figure 2.

The experimental results show that multitask learning can effectively reduce the error, and as the number of tasks increases, the error decreases. This reduction is more obvious for small sample datasets.

The simulation experiment demonstrates that the error of multitask learning is significantly smaller than that of single-task learning when facing small, noisy samples, which indicates that the multitask Gaussian graphical model can effectively estimate psychological networks based on a small sample dataset with noise. In the subsequent psychological network analysis of the subpopulations, we use this method to estimate the psychological networks of the subpopulations because the sample size of the subpopulations is small.

4.2. Application to Real Data

As mentioned above, to better understand the psychological state of students and the impact of psychological factors on student achievement, based on the data from 3936 valid questionnaires, psychological networks are estimated in two aspects and analyzed. First, we analyze the psychological status of the overall population and the relationship between psychological factors and student achievement. Then, we further divide the population according to student achievement, majors, and dimensions and conduct a more detailed analysis of the psychological networks of the subpopulations to obtain a series of conclusions.

4.2.1. Overall Network Analysis Results

To analyze the psychological status of the overall population, we first use EBICglasso [18] to estimate the overall psychological network. For the sake of analysis, we present the network only after threshold filtering, as shown in Figure 3(a). The most relevant nodes are Q14 “without other people’s arrangement, do not know what to do” and Q15 “if the teacher does not arrange the task, they do not know what to do,” and there is a strong positive correlation between the two nodes.

To study the relationship between psychological factors and student achievement, we include the “score” node in the baseline and filter with the same threshold to obtain the network shown in Figure 3(b). The node that is most relevant to the “score” is Q43 “has no interest in learning,” which is negatively correlated with grades. Interest in learning is the most influential factor on performance.

The centrality indices for the two networks are shown in Figure 4. The most central node is the same in both networks: Q92 “sexual fantasy prevents me from concentrating on learning.”

4.2.2. Analysis of the Psychological Network of Students with Good Grades and Poor Performance

After analyzing the relationship between psychological factors and performance, we analyze the differences between the psychological networks of students with good grades and those with poor performance. In previous network studies of psychopathology, mental illness was conceptualized as a complex network of interacting symptoms, and studies have found that the network of diseased individuals is more densely connected than that of healthy individuals; that is, there is a clear association between symptoms. This article likewise conceptualizes psychological symptoms as a complex network of interacting symptoms. We hypothesize that students with good grades have a more densely connected network of psychological symptoms at baseline than students with poor performance.

To verify this hypothesis, the whole population was divided into two groups, “winners” and “losers,” according to whether they received an award. After estimating the networks, we use the global strength to compare the structural differences between the two networks [12]. Global strength is defined as the weighted sum of absolute connections [45]. We used the network comparison test (NCT) to detect significant differences in global network strength. There are other methods to estimate the difference between two networks, such as differential graphical models (DGM) [46, 47]. However, DGM does not provide differences at the node level, such as which node is the most important in one network. Figures 5(a) and 5(b) show the psychological networks of the “losers” and the “winners.” NCT revealed that the network of winners was tighter and that the difference was significant. Figure 6 compares the centrality indices of the “winners” and the “losers.” For the whole population, “I feel that the spermatorrhea (or menstruation) is dirty” is the most central node and has the highest score.
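
The idea behind comparing global strength with a permutation test can be sketched as follows. This is an illustration of the principle only and not the NCT implementation: as a labeled simplification, networks are estimated with ridge-stabilized partial correlations instead of a regularized Gaussian graphical model, and the function names, `ridge`, and `n_perm` are assumptions.

```python
import numpy as np


def partial_corr_network(X, ridge=1e-3):
    """Weighted network of partial correlations with a zero diagonal."""
    S = np.cov(X, rowvar=False) + ridge * np.eye(X.shape[1])
    theta = np.linalg.inv(S)
    d = np.sqrt(np.diag(theta))
    net = -theta / np.outer(d, d)
    np.fill_diagonal(net, 0.0)
    return net


def global_strength(net):
    """Sum of the absolute edge weights, counting each edge once."""
    return np.abs(np.triu(net, k=1)).sum()


def strength_permutation_test(X_a, X_b, n_perm=200, seed=0):
    """Permutation p-value for the difference in global strength."""
    rng = np.random.default_rng(seed)
    observed = abs(global_strength(partial_corr_network(X_a))
                   - global_strength(partial_corr_network(X_b)))
    pooled = np.vstack([X_a, X_b])
    n_a = len(X_a)
    count = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the difference.
        perm = rng.permutation(len(pooled))
        diff = abs(global_strength(partial_corr_network(pooled[perm[:n_a]]))
                   - global_strength(partial_corr_network(pooled[perm[n_a:]])))
        count += int(diff >= observed)
    return observed, (count + 1) / (n_perm + 1)
```

A small p-value indicates that the observed difference in global strength is unlikely to arise when group membership is assigned at random.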

4.2.3. The Analysis of Different Professional Psychological Networks

The university contains many majors, and students with different majors have different characteristics. Below, we analyze the psychological networks of different professional groups. Because the two grades contain a wide variety of majors, the number of students with some majors is very small due to missing data (see the Appendix for professional information and demographics). The ratios of students in Science and Engineering, Literature and History, and Sports or Art to students in Statistics are . According to this ratio, 18 majors are selected, namely, Chinese language and literature, primary education, English, law, Japanese, small language, mathematics and applied mathematics, software engineering, chemistry, biological science, international economy and trade, computer science and technology, physics, information management and information systems, business management, sports, drama, and painting. The precision matrix is used to define the structural similarity. According to the signs of the edge weights, the precision matrix is divided into a positive matrix and a negative matrix, which are denoted as “pos” and “neg,” respectively, and the similarity between networks A and B is defined in terms of the two-norms of the differences between the corresponding matrices of the two networks.
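
A measure of this kind can be sketched as follows. Because the exact formula is not reproduced here, the way the two parts are combined (a plain sum), the restriction to off-diagonal entries, and splitting by the sign of the matrix entries are assumptions; `np.linalg.norm(M, 2)` computes the matrix two-norm (largest singular value).

```python
import numpy as np


def split_by_sign(theta):
    """Split the off-diagonal entries into positive and negative parts."""
    off = theta - np.diag(np.diag(theta))
    return np.maximum(off, 0.0), np.minimum(off, 0.0)


def structural_gap(theta_a, theta_b):
    """Two-norm gap between the 'pos' parts plus the gap between the 'neg' parts."""
    pos_a, neg_a = split_by_sign(theta_a)
    pos_b, neg_b = split_by_sign(theta_b)
    return (np.linalg.norm(pos_a - pos_b, 2)
            + np.linalg.norm(neg_a - neg_b, 2))


# Two small networks that differ only in the sign of their single edge.
A = np.array([[1.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.5], [-0.5, 1.0]])
print(structural_gap(A, A))  # identical networks have zero gap
print(structural_gap(A, B))
```

Under this reading, a smaller value means that two networks are structurally more similar.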

The multitask Gaussian graphical model is used to estimate the psychological networks of the abovementioned professional population, and the structural similarity between the two networks is calculated (see Tables 2 and 3 for details). In Grade I, the smallest gaps with other majors are concentrated in painting, physics, small languages, drama, and computer science and technology, and the largest gaps are concentrated in Chinese language and literature, information management and information systems, and mathematics. In Grade II, the smallest gaps with other majors are concentrated in painting, computer science and technology, and physics, and the largest gaps are concentrated in Chinese language and literature, drama, film and television literature, international trade, and biological sciences. In short, the gap between painting and other professional structures is the smallest, and the gap between Chinese language and literature and other professional structures is the largest.

Furthermore, we analyze the difference in scores of these professions in 12 dimensions. The average score of each profession in each dimension represents the overall psychological state of the profession in this dimension. We combine the data of two grades and count the scores on 12 dimensions for professions with more than 25 people. Because each professional class has approximately 30 people, the missing data for professions with fewer than 25 people account for more than half.

The results show that mathematics and applied mathematics have the highest scores for obsessive-compulsive disorder; that is, students in this major are most likely to suffer from obsessive-compulsive disorder. The information and computer science majors have the highest paranoid scores, and students in mathematics majors may be more obsessive and paranoid. Students with painting majors are least anxious, least inferior, and most unbiased. Thus, students with painting majors may have a more balanced attitude and a stronger ability to adapt. However, the students with world history majors are most likely to have problems with their psychological status, and they have high scores on six dimensions.

The principle of relationship analysis for different professional classes and psychological factors is the same as that of the overall network analysis. A detailed analysis can be found in the Appendix. The following is the analysis of the psychological network differences between “winners” and “losers” in different professions. The method is different from the overall large-sample data analysis.

4.2.4. Differences between “Winners” and “Losers” in Different Professions

To further analyze the psychological differences between the “winners” and “losers” of different professions, we select the majors with more than 25 “winners” and “losers” according to the profession and award data. We use the multitask Gaussian graphical model to obtain the subpopulations’ psychological networks. Because NCT has a large error in estimating significance when the number of samples is small [10], the similarity method defined above is used to measure the structural differences of these psychological networks, as shown in Table 4.

The physical education major is the most similar to other professions, and the least similar is the Chinese language and literature major. The following is a comparison of the central indicators of these two majors, as shown in Figure 7. In the network of “winners” in the Chinese language and literature major, Q5 “does not allow yourself to make mistakes” node is central. Perhaps the winners do not allow themselves to make mistakes, and this node has a substantial impact on other nodes. In the network of “losers,” Q10 “does not take the initiative,” Q52 “I feel that I am a loser,” and Q68 “frustrated and depressed” are most central. Unsuccessful people may be more prone to the above symptoms. The nodes in the network of “winners” in the physical education major are more central than those in the “losers” network. The network of “winners” has a denser structure, and the relationship between mental states is closer. The “winners” Q2 “doing things repeatedly must be checked” is more central than in the “losers” network. Among the three indicators of “losers,” only Q0 “headache or dizziness” and Q10 “do not act actively” are higher than they are for the “winners.” Therefore, the abovementioned symptoms may be more likely to occur in “losers.”

4.2.5. Analysis of the Psychological Network of Subpopulations with High/Low Scores in Different Dimensions

To explore the psychological differences among people with different likelihoods of a given symptom, we assume that the likelihood of an individual suffering from a symptom, such as depression, is proportional to the individual’s score on the corresponding dimension, such as “depression.” Therefore, we extract people with high and low scores for each dimension and obtain 24 subpopulations. Then, the psychological networks are estimated and analyzed to determine the factors that affect each psychological health symptom. We combine the data of the two grades. To sharpen the contrast between the groups, we take the top 10% and the bottom 10% of the samples on each dimension. According to Liu et al. [48], the two dimensions of anxiety and depression are the most concerning, so we use the multitask Gaussian graphical model to estimate the psychological networks and analyze mainly these two dimensions. The centrality indicators are shown in Figures 8 and 9. In the network of people with high scores on anxiety, Q16 “afraid to bear the consequences” is the most central and may influence other psychological states. The centrality of nodes in the network of people with high scores on depression is higher than that in the network of people with low scores; that is, the mental states of people with high scores on depression are densely linked. Furthermore, Q32 “hate the people around them” shows the largest difference in centrality between the two populations, so in people with high depression scores, this state may make other symptoms more likely.
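The extraction of high-score and low-score subpopulations can be sketched as follows. This is a minimal illustration under stated assumptions. The function names `extreme_groups` and `split_all_dimensions`, the dimension names, and the toy scores are hypothetical. Ties at the 10% cutoff are resolved by sort order here, which may differ from the rule used in the study.

```python
import numpy as np

def extreme_groups(scores, fraction=0.10):
    """Return index arrays for the lowest- and highest-scoring share of respondents."""
    scores = np.asarray(scores, dtype=float)
    k = max(1, int(round(fraction * scores.size)))  # group size, at least one person
    order = np.argsort(scores, kind="stable")       # ascending by score
    return order[:k], order[-k:]

def split_all_dimensions(dimension_scores, fraction=0.10):
    """Map (dimension, 'low' or 'high') to the indices of the selected respondents."""
    groups = {}
    for name, scores in dimension_scores.items():
        low, high = extreme_groups(scores, fraction)
        groups[(name, "low")] = low
        groups[(name, "high")] = high
    return groups

# Toy data: 20 respondents, two dimensions (hypothetical scores).
dimension_scores = {
    "depression": np.arange(20),        # respondent 19 scores highest
    "anxiety": np.arange(20)[::-1],     # respondent 0 scores highest
}
groups = split_all_dimensions(dimension_scores)
for key, members in groups.items():
    print(key, sorted(members.tolist()))
```

Each dimension yields two groups, so 12 dimensions would produce the 24 subpopulations mentioned above. Each group's item responses would then be passed to the network estimation step.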

5. Discussion

Complex networks are used to study the psychological status of college students in two grades, and the structural differences in the psychological networks of different majors are compared. Then, the psychological network differences of people with different mental illness risks and the influence of psychological factors on academic performance are analyzed. The multitask Gaussian graphical model is used to reduce the errors in estimating networks from small, noisy samples, which also provides a machine learning perspective for studying psychological networks. This study can help predict students’ academic performance from psychological test results and can help teachers to provide timely assistance to students with academic difficulties and psychological intervention for students with psychological problems.

Some of the results of this study deserve further discussion. In the preprocessing stage, after screening, we find that there are more “winners” than “losers” in the valid samples. In reality, however, only a small number of students receive awards in colleges and universities. More data from “winners” are retained, suggesting that the winners answer the questionnaire more conscientiously and that conscientious students are more likely to earn better grades. In addition, as shown in Figure 3(c), Q43 is not the only factor related to achievement; the remaining factors are also related to the scores, although their effects are small. Positively related factors, such as Q31 “suspicions that others know my inner thoughts,” Q97 “excessive worry about irrelevant things,” Q74 “I feel that others love the limelight,” and Q36 “easy to be nervous,” do not appear to be the reason for good grades but are common “performance” indicators of students with good grades. These characteristics may not be acknowledged by the students with good grades but emerge from the data analysis. The relationship between psychology and achievement can be studied further.

Finally, the Gaussian graphical model assumes that the data follow a multivariate Gaussian distribution. This implies that the data are continuous, whereas the questionnaire data are discrete ordinal data. According to Bhushan et al. [40], the glasso and EBIC approach commonly used to estimate psychological networks and select the regularization coefficient treats the observed variables as ordered categories of normally distributed latent variables, which may not be reasonable. For example, some psychopathological symptoms, such as suicidal ideation, may have a true zero because the symptom is absent, so they do not follow a normal distribution.
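To make the role of the Gaussian assumption concrete, the sketch below shows how edge weights are read from a covariance matrix in a Gaussian graphical model: the partial correlation between nodes i and j is the negated, rescaled (i, j) entry of the inverse covariance (precision) matrix. Under multivariate normality, a zero partial correlation means conditional independence. The function name `partial_correlations` and the 3-node example are hypothetical. The regularization (glasso) and the multitask coupling used in this study are deliberately omitted.

```python
import numpy as np

def partial_correlations(covariance):
    """Partial correlations from a covariance matrix via its inverse (precision) matrix."""
    precision = np.linalg.inv(np.asarray(covariance, dtype=float))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)  # rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Toy chain network 1 - 2 - 3: nodes 1 and 3 are linked only through node 2.
theta = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])
sigma = np.linalg.inv(theta)       # covariance implied by the precision matrix
pcor = partial_correlations(sigma)
print(np.round(pcor, 3))
```

In this toy example, nodes 1 and 3 are marginally correlated in `sigma`, but their partial correlation is zero, so no edge is drawn between them. When the items are ordinal rather than Gaussian, this reading of zeros as conditional independence is no longer guaranteed, which is the limitation discussed above.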

6. Conclusion

The multitask Gaussian graphical model is used to reduce the errors in estimating networks from small, noisy samples. On the basis of this model, we estimate the psychological networks of many subpopulations in colleges and universities, such as the psychological networks of different professions. We analyze the differences in the psychological network structure of different professions and find that the difference between the painting major and the other professions is the smallest. Further analysis shows that this result may arise because students majoring in painting have lower scores on all dimensions and stronger psychological adaptability. Students are extracted according to high and low scores on each dimension for analysis, and the factors affecting depression and anxiety are determined. Students who easily become anxious tend to be “afraid to bear the consequences,” which has a substantial impact on other nodes in the psychological network. “Hate the people around” has a considerable impact on other symptoms in the psychological network of people who are prone to depression. After analyzing the relationship between performance and psychological status, we find that the factor with the greatest impact on performance, “not interested in learning,” is negatively correlated with the scores; that is, the less interested a student is in learning, the worse the score is. Further analyses of the relationship between the scores of different majors and psychology and of the factors that influence the scores of specific professions are presented in detail in the Appendix.

With the recent development of new techniques for collecting and integrating fine-grained student-related datasets, there is potential to gain a better understanding of the relationship between students’ mental health and campus risks. Several interesting research topics exist, such as (1) mining the psychological factors behind campus risks, for example, suicide, depression, Internet fraud, and campus bullying, (2) integrating environmental data into the model to study the relationship between the campus environment and students’ health and actions, and (3) identifying the process by which a risk event occurs. We leave these topics for future research.

Data Availability

The datasets used to support the results of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors’ Contributions

Qiang Tian and Rui Wang contributed equally to this work.

Acknowledgments

This work was supported by the National Key R&D Program of China (2018YFC0831000, 2018YFC0809800, and 2016QY15Z2502-02) and the Science and Technology Key R&D Program of Tianjin (18YFZCSF01370).

Supplementary Materials

Appendix A. Figure A.1: the psychological networks of the four majors. Table A.1: sample demographics of Grade I (N = 2065). Table A.2: sample demographics of Grade II (N = 1871). Table A.3: the numbers of students in different majors. Table A.4: the questions in the CCSMHS.