Abstract

To meet the need for intelligent assessment of university students’ psychological states, this paper investigates an approach to identifying psychological problems from text information. The approach uses the text of student forums within universities as its data source and introduces a convolutional neural network (CNN) model from deep learning, which contains convolutional layers, pooling layers, and a fully connected layer. After convolution, the result is passed through a nonlinear activation function and then pooled, which improves the network’s ability to fit nonlinear relationships. For data processing, behavioral, attribute, content, and social relationship features are extracted from the text information as the input of the CNN. The LIWC (Linguistic Inquiry and Word Count) psychological lexicon is used to make word frequency statistics more efficient during text content extraction. To evaluate the proposed method, simulations are performed on the open CLPsych 2017 ReachOut forum dataset, with the FastText method as a baseline. The results show that the CNN model achieves an accuracy of 0.71 on the full sample, higher than the 0.64 of the FastText model. In the early warning evaluation of mental states, the CNN also performs better than FastText.

1. Introduction

According to research statistics from the World Health Organization (WHO), mental health disorders have become the fourth most common disease worldwide. Taking depression as an example, about 300 million people worldwide currently suffer from it [1–5]. With the development of China’s economy and society, people have started to pay more attention to their mental health. At the university stage, because the internal and external environment in which students live changes rapidly, many students are unable to adapt in time and are prone to psychological problems. According to research statistics, the psychological problems of college students show obvious stage characteristics, and many students do not notice their own psychological changes in time, which leads to the deterioration of psychological problems and serious consequences [6–12].

To detect the psychological problems of college students and provide timely psychological help, this paper studies an intelligent method for evaluating psychological state. Because psychological problems are difficult for students to detect in themselves and students are generally resistant to psychological counseling and investigation, textual information analysis is used to identify psychological problems. The Internet is an important platform for students’ extracurricular life, and social networks generate a large amount of textual information every day, which can reflect changes in students’ psychological status. Based on the text resources generated by the internal student forums of universities, this paper introduces artificial intelligence algorithms and investigates a model for psychological state evaluation and early warning.

2. Related Work

This section discusses related work on mental state evaluation.

2.1. Mental State Evaluation Based on Textual Information

Campus forums in higher education institutions are important places where students voice their concerns and express personal opinions. Monitoring these forums automatically makes it possible to grasp students’ current psychological status and to predict its future dynamics in a timely manner. The social network-based assessment of college students’ mental health requires the collection of various characteristic indicators reflecting mental health, as shown in Figure 1 [13–15].

As Figure 1 shows, the assessment of mental states based on text content focuses on four kinds of characteristics: behavioral characteristics, attribute characteristics, content characteristics, and social relationship characteristics.

Behavioral characteristics: online behaviors that can portray the psychological characteristics of users. From the perspective of psychology, the frequency with which college students post, comment, and like on the forum, as well as the length of time they are active online, are influenced by the state of their mental health.

Attribute characteristics: the information left by college students on online forums that describes their basic personal details, such as age, gender, place of origin, major, and whether they are single.

Content characteristics: the text messages that college students leave directly on the forum, which can reflect their true inner thoughts.

Social relationship characteristics: in psychology, a social relationship is the interrelationship that forms between students in contexts such as school and society through behaviors such as studying and socializing. In social forums, students follow each other and connect with different users at different levels of intimacy. If a student is treated as a node and his or her active and passive following behaviors are treated as connections, a social network can be mapped for each student, and this network is also important for assessing students’ mental health status.
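The node-and-connection view of social relationships described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `(follower, followed)` pair format, the function names, and the three degree-based features are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_follow_graph(follows):
    """Build a directed graph from (follower, followed) pairs."""
    out_edges = defaultdict(set)  # users each student actively follows
    in_edges = defaultdict(set)   # users who follow each student (passive attention)
    for follower, followed in follows:
        out_edges[follower].add(followed)
        in_edges[followed].add(follower)
    return out_edges, in_edges


def degree_features(user, out_edges, in_edges):
    """Return (out-degree, in-degree, number of mutual connections) for one user."""
    following = out_edges.get(user, set())
    followers = in_edges.get(user, set())
    return len(following), len(followers), len(following & followers)


# Hypothetical follow relations between four students.
follows = [("a", "b"), ("b", "a"), ("c", "a"), ("a", "d")]
out_edges, in_edges = build_follow_graph(follows)
features_a = degree_features("a", out_edges, in_edges)
```

Simple counts such as these are one possible way to turn the per-student network into numeric inputs; richer graph statistics could be substituted.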

Based on the analysis in Figure 1, the task of intelligent mental state assessment is defined here. For a collection of campus forums, the following sets are defined in equation (1), where D denotes the set of campus forums, P denotes the N different posts in the forum, H denotes the L different topics of the posts, and R denotes the coupling between posts.

The mathematical definition of intelligent psychological evaluation and early warning is then as follows. For any element p in the set D, a mapping relation m and its corresponding set of features F are sought according to equation (2), where C is the result of classifying the text into a mental state. Equation (2) indicates that each text message published by each user corresponds to a classification, which can characterize the user’s mental health status and thus alert the mental health teachers in universities to provide timely intervention. The mapping relation m used in this paper is a convolutional neural network (CNN) from deep learning.
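One reading of equations (1) and (2) that is consistent with the symbol definitions above, offered as an assumption rather than as the authors’ exact notation, is the following. The element names p_i and h_j and the notation F(p) for the features extracted from a post are introduced here for illustration only.

```latex
\begin{align}
D &= \{P, H, R\}, \quad P = \{p_1, \dots, p_N\}, \quad H = \{h_1, \dots, h_L\} \tag{1} \\
C &= m\bigl(F(p)\bigr), \quad \forall\, p \in D \tag{2}
\end{align}
```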

Furthermore, the study in [16] focused on a speech monitoring approach for depression recurrence in the Chinese language setting. The speech data were divided into two parts, semantic features and acoustic features. Because foreign speech databases are comparatively complete whereas domestic ones are lacking, that work drew on foreign speech databases and collected normal and depressed speech from domestic medical institutions. It also compared speech-processing algorithms from home and abroad, examined their benefits and drawbacks, and selected the best of them. Finally, a speech monitoring platform for depression recurrence was implemented. In [17], the commonly used Mel Frequency Cepstral Coefficient (MFCC) is presented for speech processing in depression monitoring. Low-frequency MFCCs can be used to recognize patient speech; however, they are affected by noise to a certain degree. In practical studies, smart headsets are usually used to collect audio, and the noise in this process can be reduced effectively by high-band voice activity detection, a conservative speech segment selection strategy, and a personalized normalization algorithm.

For the detection of depression, Afshan et al. [18] obtained clustering data using a Gaussian mixture (GM) model and maximum probability assessment. The i-vector classification method was used to combine speech quality and MFCC features. The study also demonstrated the consistency of speech quality features in detecting depression. In [19], the feasibility of a multilingual database for depression detection was established by comparing Turkish and German speech, and the feasibility of a multilingual fusion algorithm was explored, which provides a good example for building a comprehensive database in China.

Note that most of the above works rely on simple, data-dependent features to detect depression. Compared with these works, the deep learning CNN model proposed here performs better in students’ mental state evaluation and early warning.

3. Convolutional Neural Network

The basic structure of the CNN model used in the paper is shown in Figure 2. It contains an input layer, convolutional layers, pooling layers, and a fully connected layer. The convolutional layers perform the convolution operations and the pooling layers perform the pooling operations. In the input layer, the text content is first processed into a sequence of words of length n with the help of LIWC and then converted into a sequence of word vectors of length n by word embedding (Word2Vec) by using the following equation:

3.1. Convolutional Layer

In the convolution layer, the word vector is first divided by using the following equation:

Convolution is the characteristic operation of a CNN. With convolution kernel windows of different sizes, it extracts local semantic information at different positions in the text for feature detection and extraction. The divided vectors in equation (4) are processed one by one by the convolution operation in equation (5), where f is the convolution kernel function used in the convolution and the output is the feature value obtained after convolution.

After the convolution is completed, a nonlinear activation function is applied to the convolution result, and the activated values are then concatenated. At this point, the feature matrix G output by the convolution layer can be obtained in the following equation:
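To make the sliding-window convolution and activation concrete, here is a minimal pure-Python sketch. It assumes a single kernel spanning h consecutive word vectors and a ReLU activation; the paper does not name its activation function here, so ReLU and the helper names `relu` and `conv1d_text` are assumptions for illustration.

```python
def relu(x):
    """ReLU activation (assumed; the activation is not specified in the text)."""
    return x if x > 0.0 else 0.0


def conv1d_text(embeddings, kernel, bias=0.0):
    """Slide a window of h word vectors over a sentence of n word vectors.

    embeddings: list of n word vectors, each of dimension d
    kernel:     list of h weight vectors, each of dimension d
    Returns the n - h + 1 activated feature values.
    """
    n, h = len(embeddings), len(kernel)
    features = []
    for i in range(n - h + 1):
        total = bias
        for j in range(h):
            # Dot product between one kernel row and one word vector in the window.
            total += sum(w * x for w, x in zip(kernel[j], embeddings[i + j]))
        features.append(relu(total))
    return features


# Toy sentence of four 2-dimensional word vectors and a kernel of width 2.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]
feature_map = conv1d_text(sentence, kernel)
```

Running several kernels of different widths and collecting their feature maps corresponds to the concatenation step that yields the feature matrix G.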

3.2. Pooling Layer

The pooling layer downsamples the features obtained by convolution. This reduces the feature dimensionality and keeps the network from becoming too complex, which would otherwise reduce operational efficiency and cause overfitting. The pooling method used in this paper is maximum pooling, which is represented in the following equation:
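A short sketch of maximum pooling follows. It assumes max-over-time pooling, in which each feature map is reduced to its single largest value; the paper states only that maximum pooling is used, so the pooling window is an assumption, and `max_over_time` is a hypothetical helper name.

```python
def max_over_time(feature_maps):
    """Keep the largest activation of each feature map (one value per kernel)."""
    return [max(feature_map) for feature_map in feature_maps]


# Two feature maps of different lengths, as produced by kernels of different widths.
pooled = max_over_time([[2.0, 1.0, 0.0], [0.5, 3.0]])
```

Because each map collapses to one number, the pooled vector has a fixed length regardless of the length of the post.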

3.3. Fully Connected Layer

The fully connected layer connects all the feature values obtained after the convolution and pooling operations and uses them as the final feature vector that characterizes the text information. The computation in the fully connected layer is performed by the following equation, whose input is the original feature information obtained after full concatenation and whose output y is the final classification result.
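The fully connected step can be illustrated as a linear layer followed by a normalization into class probabilities. The softmax output, the toy weights, and the helper names `dense` and `softmax` are assumptions for this sketch; the paper’s equation may use a different output function.

```python
import math


def dense(features, weights, biases):
    """Linear layer: one weighted sum of the features per output class."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax (assumed output function)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Pooled feature vector and made-up weights for four output classes.
features = [2.0, 3.0]
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0, 0.0]
probs = softmax(dense(features, weights, biases))
predicted = max(range(len(probs)), key=probs.__getitem__)
```

The index with the highest probability is taken as the predicted class y.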

4. Proposed Approach

In this section, different phases of the proposed approach are discussed.

4.1. Data Preprocessing

Since the method is aimed at the internal forums of university students, the textual information of an existing open forum can be used to simulate the model while keeping the application scenario comparable. In this paper, the training set of the CLPsych 2017 ReachOut forum is selected. Table 1 gives the statistics of the target data.

Table 2 shows the structure of each data item. In this dataset, each piece of data consists of 6 parts: time of posting, author, section, number of reads, number of likes, and content.

The collected dataset is labeled with 4 categories:
(i) Crisis indicates a psychological problem that tends toward self-aggression.
(ii) Red indicates a psychological problem involving severe psychological distress.
(iii) Amber indicates a psychological problem that is likely to occur.
(iv) Green indicates a psychological problem that has a low probability of occurring.
The amount of data corresponding to each category is shown in Table 3.

Text data processing in this paper is based on the LIWC dictionary. Equation (9) is used to extract the linguistic feature information.

For a post of a given length in a sample of size |D|, the frequency of occurrence of category l (for example, Crisis) among the categories in Table 3 is calculated by using the following equation:

Based on the word frequency, the standard deviation of words is calculated by equation (11). The larger this indicator, the greater the difference that the words in that category exhibit for such mental issues. The term frequency (TF) used in the standard deviation is calculated by the following equation:
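As an illustration of the word-frequency statistics described above, the sketch below computes a per-post term frequency for one lexicon category and the standard deviation of that frequency across posts. The tiny word set is a hypothetical stand-in for a LIWC category, and the exact normalization in the paper’s equations may differ from the simple hits-over-length ratio assumed here.

```python
import statistics


def category_tf(tokens, category_words):
    """Share of a post's tokens that belong to one lexicon category."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in category_words)
    return hits / len(tokens)


# Hypothetical tokenized posts and a made-up category word list (not the LIWC lexicon).
posts = [
    ["i", "feel", "sad", "and", "alone"],
    ["great", "day", "today"],
    ["so", "sad", "sad", "tired"],
]
negative_words = {"sad", "alone", "tired"}

tfs = [category_tf(post, negative_words) for post in posts]
spread = statistics.pstdev(tfs)  # population standard deviation across posts
```

A larger `spread` means that usage of the category’s words varies more from post to post, which is the property the standard deviation indicator is meant to capture.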

When the model parameters are set, it must be considered that the share of samples belonging to each category varies greatly across the dataset. The classification accuracy of a deep learning network gradually deteriorates as this class imbalance grows, so the samples of different categories are given different weights by using the following equation:
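One common way to weight imbalanced classes is inverse-frequency weighting, sketched below. This is an assumed scheme shown for illustration and is not necessarily the paper’s equation; the class counts are made up and do not come from Table 3.

```python
def inverse_frequency_weights(counts):
    """Weight each class by total / (num_classes * class_count)."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * count) for label, count in counts.items()}


# Made-up class counts for illustration only.
counts = {"crisis": 10, "red": 40, "amber": 150, "green": 200}
weights = inverse_frequency_weights(counts)
```

With this scheme the rarest class receives the largest weight, so errors on it contribute more to the training objective.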

The loss function used in the training is determined by the following equation:
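A class-weighted cross-entropy is a typical loss for this setting and is sketched here under that assumption; the paper’s exact loss may differ. The function name and the normalization by the sum of weights are illustrative choices.

```python
import math


def weighted_cross_entropy(probs, labels, class_weights):
    """Average negative log-likelihood, with each sample scaled by its class weight.

    probs:         list of per-sample probability lists
    labels:        list of true class indices
    class_weights: list of weights indexed by class
    """
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        total += -w * math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        weight_sum += w
    return total / weight_sum


# Two-class toy example with equal weights.
loss_uniform = weighted_cross_entropy([[0.5, 0.5]], [0], [1.0, 1.0])
loss_perfect = weighted_cross_entropy([[1.0, 0.0]], [0], [1.0, 1.0])
```

Passing larger weights for rare classes makes misclassifying them more costly than misclassifying frequent ones.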

When determining the number of training iterations, note that the number of manually labeled samples used in the paper is small. If too many iterations are performed, the CNN will overfit. If too few are performed, the accuracy of the model will not meet the requirement. Figure 3 shows the accuracy of the model on the training and validation sets for different numbers of iterations. When the number of iterations is small, the validation accuracy is consistent with the training accuracy and both are low. When the number of iterations is large, the training accuracy increases, but the gap between the validation accuracy and the training accuracy widens; at this point, the model is overfitted. Therefore, to balance model accuracy against overfitting, about 600 iterations are chosen for the model.

Table 4 lists the final parameters of the proposed neural network model.
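The trade-off between accuracy and overfitting described above can be expressed as a simple selection rule: among checkpoints whose train/validation gap stays below a tolerance, keep the one with the best validation accuracy. The rule, the `max_gap` threshold, and the accuracy values below are all hypothetical; the numbers are made up for illustration and are not read from Figure 3.

```python
def pick_iterations(history, max_gap=0.05):
    """Choose an iteration count from (iterations, train_acc, val_acc) records.

    Records whose train/validation gap exceeds max_gap are treated as overfitted.
    Returns None if every record is overfitted.
    """
    eligible = [record for record in history if record[1] - record[2] <= max_gap]
    if not eligible:
        return None
    return max(eligible, key=lambda record: record[2])[0]


# Made-up accuracy curve, for illustration only.
history = [
    (200, 0.55, 0.54),
    (400, 0.66, 0.64),
    (600, 0.74, 0.70),
    (800, 0.85, 0.69),
    (1000, 0.93, 0.67),
]
chosen = pick_iterations(history)
```

In practice the same idea is often implemented as early stopping on the validation metric.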

4.2. Simulation Results

To better evaluate the effectiveness of the model in identifying the psychological states of college students, the FastText model was introduced for comparison experiments. Before the experiments, the original 4 categories of Crisis, Red, Amber, and Green were regrouped to distinguish the different mental states. Table 5 shows the five regrouped categories. F1 and Acc (classification accuracy), two metrics commonly used in machine learning classification problems, serve as the evaluation metrics of the models. The test results of the two models are shown in Tables 6 and 7.

Note that, among the performance indexes, nongreen F1 is the F1 score averaged over the nongreen categories, which reflects the model’s ability to identify all psychologically unhealthy students in the validation set, while flagged F1 is computed on the split between green and nongreen samples, which reflects the model’s ability to distinguish psychologically healthy samples from psychologically unhealthy ones. The results show that, compared with the FastText model, the CNN model improves nongreen F1 by 0.05 and flagged F1 by 0.06, which indicates that the CNN model is better both at differentiating samples and at recognizing nonhealthy samples.
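The evaluation metrics can be sketched as follows. The groupings are assumptions that follow the usual convention for this dataset: nongreen F1 is the macro-average over amber, red, and crisis; flagged F1 is a binary F1 for nongreen versus green; urgent F1 is a binary F1 for red and crisis versus the rest. The labels and predictions are toy values, not results from the paper.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score of one class treated as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, label) for label in labels) / len(labels)


def binarize(labels, positives):
    """Map each label to True if it belongs to the positive group."""
    return [label in positives for label in labels]


# Toy labels and predictions for illustration only.
y_true = ["green", "amber", "red", "crisis", "green", "amber"]
y_pred = ["green", "amber", "amber", "crisis", "amber", "green"]

non_green_labels = {"amber", "red", "crisis"}
urgent_labels = {"red", "crisis"}

nongreen_f1 = macro_f1(y_true, y_pred, ["amber", "red", "crisis"])
flagged_f1 = f1_for_class(binarize(y_true, non_green_labels),
                          binarize(y_pred, non_green_labels), True)
urgent_f1 = f1_for_class(binarize(y_true, urgent_labels),
                         binarize(y_pred, urgent_labels), True)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Macro-averaging gives each nongreen class equal influence regardless of its size, which matters when crisis posts are rare.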

The CNN model improved by 0.11 over the FastText model for urgent F1, which indicates that the model is more capable of distinguishing between general and urgent psychological problems. This can help students get help quickly for their psychological problems. In terms of the accuracy of the model for each category, the CNN model outperforms the FastText model; in terms of the accuracy of the full sample, the CNN model achieves an accuracy of 0.71, which is higher than the FastText model’s 0.64, an improvement of 0.07. In summary, the CNN model has a better performance in the evaluation and warning of psychological states.

5. Conclusions

To achieve intelligent evaluation and timely warning of psychological problems among students in higher education institutions, this paper uses the textual information generated in students’ daily life to extract features related to psychological problems, from the perspective of monitoring public opinion in campus forums. Compared with traditional psychological questionnaires and psychological counseling, the method can detect students’ psychological problems in study and life in a more discreet, effective, and timely manner. The simulation results show that the CNN-based intelligent psychological state recognition method proposed in this paper achieves better accuracy and differentiation ability in recognizing various psychological problems and can be applied to existing psychological work in universities.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.