Abstract
Contemporary college students are the main force of future national construction, and their ideological and political dynamics bear on the development of the party and the country. Some students have problems with their study concepts and study habits, and because the ideological and political education of university students has long been neglected, their ideological and political dynamics cannot be analyzed accurately. Grasping the ideological and political dynamics of university students in the new era is therefore a top priority of current educational work and an important guarantee for the development of ideological and political education in universities. As the times develop, communication channels are constantly being updated. The focus of this article is the analysis of the ideological and political dynamics and communication channels of university students; traditional analysis methods cannot, to some extent, satisfy current research needs. This paper constructs an analysis model of university students' ideological and political dynamics and communication paths based on reinforcement learning, using the Markov decision process and the Monte Carlo method for the analysis. The results show the following: (1) the highest accuracy of reinforcement learning is 99.7% and the lowest is 96.2%; the highest precision is 99.7% and the lowest is 97.4%; the highest recall is 99.6% and the lowest is 97.6%. (2) The average accuracy of reinforcement learning is 98.16%, the average precision is 98.75%, and the average recall is 98.65%. (3) In the ideological and political dynamics of college students, the score of value orientation is 6.975, the score of learning status is 8.025, the score of consumption concept is 7.7, and the score of employment is 7.45. (4) In the communication path analysis, 12 people use interpersonal communication, 15 organizational communication, 21 mass communication, 28 network communication, and 24 Internet communication.
1. Introduction
As an important part of the youth group, college students have ideological and political dynamics that cannot be ignored. Comprehensively analyzing these dynamics and their communication channels improves the effectiveness of ideological and political education for university students. This paper constructs an analysis model of university students' ideological and political dynamics and communication paths based on reinforcement learning and uses it to analyze those dynamics, drawing substantial support from previous results. Reinforcement learning is a popular model for analyzing such problems [1]: it learns behavior through trial-and-error interactions with a dynamic environment. The literature describes algorithms similar to Q-learning for finding optimal policies [2], and the popular Q-learning algorithm is known to overestimate action values under certain conditions [3]. A common model for reinforcement learning is the standard Markov decision process [4]. Reinforcement learning developed from theories such as animal learning and parameter-perturbation adaptive control [5]; its goal is to adjust parameters dynamically [6]. Grasping ideological trends is an important part of the ideological and political education of university students [7] and an effective way to carry it out. Understanding the ideological dynamics of college students allows the Internet to be applied to ideological and political education [8]. Ideological and political education must conform to changing circumstances and trends and keep innovating [9]; at present, it should be combined with art education in universities [10]. The continuous progress of information technology has broadened the dissemination paths of university students' ideological and political dynamics [11], and ideological and political teachers are one of the important means of optimizing that dissemination [12]. Reinforcement learning acquires learning information and updates parameters by receiving action rewards from the environment [13]; it is mainly expressed through reinforcement signals [14] and focuses on online learning [15].
2. Theoretical Bases
2.1. Reinforcement Learning
2.1.1. Overview
Reinforcement learning (RL) [16] is a type of goal-oriented learning. The reinforcement learning process is the continuous interaction between the agent and the environment. In this process, the agent continuously observes the characteristics of the environment state and takes actions on the current environment according to certain policy rules. The environment gives feedback on actions taken in the form of rewards. The agent updates the policy based on the reward value to get a better reward for the next action it takes.
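To make this loop concrete, the following is a minimal sketch (not the paper's implementation) of one interaction episode, assuming a hypothetical Gym-style environment object with `reset` and `step` methods and a `policy` callable:

```python
def run_episode(env, policy, max_steps=100):
    """Run one agent-environment interaction episode: observe the state,
    act according to the policy, receive a reward, and repeat."""
    state = env.reset()                          # observe the initial state
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # agent acts by its policy rule
        state, reward, done = env.step(action)   # environment feeds back a reward
        total_reward += reward                   # reward signal drives learning
        if done:
            break
    return total_reward
```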
The basic framework of reinforcement learning is shown in Figure 1.

2.1.2. Markov Decision Process
The Markov decision process (MDP) [17] is a mathematical description that can be provided for reinforcement learning, and most reinforcement learning problems can be modeled as an MDP. An MDP adds an action element to the transition probability from one state to another, enriching the Markov property, which can be expressed as
$$P(s_{t+1} \mid s_t, a_t) = P(s_{t+1} \mid s_0, a_0, s_1, a_1, \ldots, s_t, a_t).$$
An MDP consists of 5 basic elements, namely, $(S, A, P, R, \gamma)$. Among them, $S$ is the state space, the set of all states reflecting the complete information of the system, with current state $s \in S$; $A$ is the finite action space composed of all possible actions, with currently taken action $a \in A$; $R$ is the reward function, which represents the expectation of the reward value the agent can get in transitioning from the current state $s$ to the next state $s'$; $P(s' \mid s, a)$ represents the probability of transitioning from state $s$ to state $s'$; and $\gamma$ is the discount factor, a value in the range of 0 to 1 that determines how strongly the total reward is discounted.
To find the optimal strategy, that is, to find the optimal action in each state, the action value is
$$Q(s, a) = \mathbb{E}\left[r_{t+1} \mid s_t = s, a_t = a\right],$$
where the expectation is the average of the instant reward $r_{t+1}$.
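For illustration only, the five elements of a toy MDP can be written out as plain Python dictionaries; all states, actions, probabilities, and rewards below are hypothetical:

```python
# A toy 2-state MDP written as plain dictionaries (hypothetical example).
states = ["s0", "s1"]
actions = ["a0", "a1"]
gamma = 0.9  # discount factor in [0, 1]

# P[(s, a)] -> list of (next_state, probability) pairs
P = {
    ("s0", "a0"): [("s0", 0.7), ("s1", 0.3)],
    ("s0", "a1"): [("s1", 1.0)],
    ("s1", "a0"): [("s0", 1.0)],
    ("s1", "a1"): [("s1", 0.6), ("s0", 0.4)],
}

# R[(s, a, s')] -> expected immediate reward for the transition
R = {
    ("s0", "a0", "s0"): 0.0, ("s0", "a0", "s1"): 1.0,
    ("s0", "a1", "s1"): 0.5,
    ("s1", "a0", "s0"): 0.0,
    ("s1", "a1", "s1"): 2.0, ("s1", "a1", "s0"): -1.0,
}
```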
2.1.3. Exploration and Exploitation
The purpose of reinforcement learning is to obtain the optimal result; that is, the agent seeks the maximum reward. During training, the agent should therefore act according to the behavior that yields the greatest reward value. At the same time, because the agent's trial-and-error experience is not necessarily rich, exploiting existing experience alone may yield only a locally optimal solution; the agent cannot blindly rely on what it already knows but must also explore to find new, potentially better solutions. With limited time, a strategy is needed to balance exploration and exploitation.
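A standard way to strike this balance is the ε-greedy strategy. The sketch below (an illustration, not the paper's method) explores a random action with probability ε and otherwise exploits the highest-valued action, assuming a table `Q` keyed by (state, action) pairs:

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """Q: dict mapping (state, action) -> estimated value.
    With probability epsilon explore a random action,
    otherwise exploit the action with the highest estimated value."""
    if random.random() < epsilon:
        return random.choice(actions)                     # exploration
    return max(actions, key=lambda a: Q[(state, a)])      # exploitation
```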
2.1.4. Strategy
Policy refers to the operating rule of the agent in an MDP: a function that computes the agent's output. In reinforcement learning, policies can be defined as deterministic or stochastic. A deterministic policy means that in the same state, the action output by the agent is fixed and unique; under a stochastic policy, by contrast, the output behavior in the same state is not unique but follows a specific probability distribution, and the probabilities of all possible output behaviors in the same state must sum to 1.
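The following sketch contrasts the two policy types, with hypothetical state and action names; note that the stochastic policy's probabilities for each state sum to 1:

```python
import random

# Deterministic policy: each state maps to exactly one action (hypothetical).
deterministic_policy = {"s0": "a1", "s1": "a0"}

# Stochastic policy: each state maps to a distribution over actions.
stochastic_policy = {
    "s0": {"a0": 0.2, "a1": 0.8},
    "s1": {"a0": 0.5, "a1": 0.5},
}

def act(policy, state):
    rule = policy[state]
    if isinstance(rule, str):          # deterministic: unique action
        return rule
    actions, probs = zip(*rule.items())
    return random.choices(actions, weights=probs)[0]  # sample from distribution
```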
2.1.5. Value Function
During the interaction between the agent and the environment, actions need to be evaluated to ensure that the final action set obtains the maximum reward. There are two evaluation mechanisms here, namely, the value function and the Q function. The value function is the value function of the state, which measures the pros and cons of the agent's state under policy $\pi$. It can be defined as follows:
$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[G_t \mid s_t = s\right].$$
The above formula expresses the expected reward that can be obtained by following policy $\pi$ in state $s$. Here, $G_t$ represents the cumulative reward obtained by the agent from the environment from time $t$ onward, which in the episodic case (with final time step $T$) can be expressed as follows:
$$G_t = r_{t+1} + r_{t+2} + \cdots + r_T.$$
In a continuing situation, there may be no final state, namely,
$$G_t = r_{t+1} + r_{t+2} + r_{t+3} + \cdots.$$
A discount factor $\gamma$ is required to discount the reward, which can be expressed as
$$G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}, \qquad 0 \le \gamma \le 1.$$
If $\gamma$ is 0, the reward reduces to the immediate reward, and if $\gamma$ is 1, the reward is mainly reflected in future rewards. Therefore, the value function can be expressed as
$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s\right].$$
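As a worked illustration of the discounted return, the helper below accumulates $G_t$ backwards over a finite reward sequence; the rewards and $\gamma$ are hypothetical:

```python
def discounted_return(rewards, gamma=0.9):
    """Compute G_t = sum_k gamma^k * r_{t+k+1} for a finite reward sequence."""
    g = 0.0
    for r in reversed(rewards):   # work backwards: G_t = r + gamma * G_{t+1}
        g = r + gamma * g
    return g

# Example: 1 + 0.9 * 0 + 0.81 * 2 = 2.62
print(discounted_return([1.0, 0.0, 2.0], gamma=0.9))  # 2.62
```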
The Q function, also known as the state-action value function, is used to measure the pros and cons of the agent following policy $\pi$ and performing action $a$ in state $s$. The Q function can be defined as follows:
$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[G_t \mid s_t = s, a_t = a\right].$$
The above formula represents the expected reward that can be obtained by following policy $\pi$ and taking action $a$ in state $s$. It can be expanded as
$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s, a_t = a\right].$$
The value function is used to evaluate the state, and the Q function is used to evaluate the action [18]. Further derivation of the value function yields a recursive form:
$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[r_{t+1} + \gamma V^{\pi}(s_{t+1}) \mid s_t = s\right].$$
Similarly, the Q function can be derived as
$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[r_{t+1} + \gamma Q^{\pi}(s_{t+1}, a_{t+1}) \mid s_t = s, a_t = a\right].$$
From the derivations of the value function and the Q function above [19], both can be extended to their Bellman equations:
$$V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\left[R(s, a, s') + \gamma V^{\pi}(s')\right],$$
$$Q^{\pi}(s, a) = \sum_{s'} P(s' \mid s, a)\left[R(s, a, s') + \gamma \sum_{a'} \pi(a' \mid s') Q^{\pi}(s', a')\right].$$
The value function that produces the maximum value should satisfy
$$V^{*}(s) = \max_{\pi} V^{\pi}(s).$$
Likewise, the optimal strategy should be better than or equal to any other strategy, and the optimal policy produces the optimal value function [20]. That is, the maximum of the Q function is the optimal action-value function:
$$Q^{*}(s, a) = \max_{\pi} Q^{\pi}(s, a).$$
Combining the above formulas, the Bellman optimality equations can be obtained:
$$V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\left[R(s, a, s') + \gamma V^{*}(s')\right],$$
$$Q^{*}(s, a) = \sum_{s'} P(s' \mid s, a)\left[R(s, a, s') + \gamma \max_{a'} Q^{*}(s', a')\right].$$
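For illustration, the Bellman optimality equation for $V^{*}$ can be solved on a small, fully known MDP by value iteration. The sketch below reuses the dictionary format of the toy MDP above and is an assumption for demonstration, not the paper's procedure:

```python
def value_iteration(states, actions, P, R, gamma=0.9, theta=1e-6):
    """Iterate V(s) <- max_a sum_s' P(s'|s,a) [R(s,a,s') + gamma * V(s')]
    until the largest update falls below the tolerance theta."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                sum(p * (R[(s, a, s2)] + gamma * V[s2]) for s2, p in P[(s, a)])
                for a in actions if (s, a) in P
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V
```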
2.2. Commonly Used Reinforcement Learning Algorithms
2.2.1. Monte Carlo Method
For the Monte Carlo method [21], a very important advantage is that it does not need to know the environment model; it only needs the experience, represented by Markov quadruples $(s, a, r, s')$, obtained by interacting with the environment, and it then solves the reinforcement learning problem by averaging the returns of the samples. The state value function at this time can be written as
$$V^{\pi}(s) = \frac{1}{N} \sum_{i=1}^{N} G_i(s),$$
where $G_i(s)$ denotes the return of the $i$th trajectory generated by always following strategy $\pi$ from state $s$, that is, the sum of all rewards on that trajectory, and $N$ is the number of sampled trajectories. When updating the value function, an incremental method can be used to implement the Monte Carlo method:
$$V(s_t) \leftarrow V(s_t) + \alpha\left[G_t - V(s_t)\right].$$
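A minimal sketch of this return-averaging idea, under the assumption that episodes are recorded as lists of (state, reward) pairs:

```python
from collections import defaultdict

def mc_evaluate(episodes, gamma=0.9):
    """Every-visit Monte Carlo: average sampled returns per state.
    `episodes` is a list of [(state, reward), ...] trajectories."""
    returns = defaultdict(list)
    for episode in episodes:
        g = 0.0
        for state, reward in reversed(episode):
            g = reward + gamma * g       # return from this step onward
            returns[state].append(g)     # record a sample of G for this state
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}
```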
2.2.2. Temporal Difference Method
Sutton proposed the temporal difference (TD) algorithm, which combines Monte Carlo and dynamic programming methods [22]. It is an important learning algorithm in reinforcement learning and can learn in continuing (nonterminating) settings.
The standard temporal difference method is a model-free algorithm that learns directly from experience and estimates the current state value after one or more steps of action. The most basic one-step update is the TD(0) algorithm [23]. When using a table of values, the iterative formula for the TD(0) algorithm is
$$V(s_t) \leftarrow V(s_t) + \alpha\left[r_{t+1} + \gamma V(s_{t+1}) - V(s_t)\right],$$
where $V(s_t)$ is the value function of state $s_t$ at time $t$ and $\alpha$ is the learning rate. The one-step method is called TD(0) because it updates the value function using the immediately subsequent state after a single step. The general form of the $n$-step return can be defined as
$$G_t^{(n)} = r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{n-1} r_{t+n} + \gamma^{n} V(s_{t+n}).$$
At this time, the update of the value function becomes
$$V(s_t) \leftarrow V(s_t) + \alpha\left[G_t^{(n)} - V(s_t)\right].$$
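A one-line realization of the TD(0) update, assuming a table `V` of state values:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: move V(s) toward the bootstrapped target r + gamma*V(s')."""
    td_target = r + gamma * V[s_next]
    V[s] += alpha * (td_target - V[s])
    return V
```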
2.2.3. Sarsa Learning
The name of the Sarsa algorithm comes from the 5 variables used when the value function is updated: the current state $s_t$, the action $a_t$ taken in the current state, the reward $r_{t+1}$ for the current action, the next state $s_{t+1}$ reached, and the next action $a_{t+1}$, that is, the quintuple $(s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1})$.
In the current state $s_t$, after taking action $a_t$ and transitioning to state $s_{t+1}$, the current action value function must be updated; then, after reaching the next state, the next action value function is updated in turn, and so on until the end. The update is as follows:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)\right],$$
where $\alpha$ is the learning rate and $\gamma$ is the decay factor.
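A sketch of this update, assuming a table `Q` keyed by (state, action):

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy Sarsa update using the quintuple (s, a, r, s', a')."""
    target = r + gamma * Q[(s_next, a_next)]  # next action chosen by the same policy
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q
```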
2.2.4. Q-Learning
Q-learning is a temporal difference algorithm under an off-policy strategy. Off-policy means that the strategy determining the current behavior differs from the strategy used to update the value function: the agent chooses its action in the current state through one strategy and interacts with the environment, but when the value function is updated, it uses another (greedy) strategy. The action-value function update formula for Q-learning is as follows:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\right].$$
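The corresponding sketch, which differs from the Sarsa update above only in taking the maximum over next actions rather than the action the behavior policy actually chose:

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy Q-learning: the update target uses the greedy (max) action
    in s', regardless of which action the behavior policy takes next."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q
```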
3. Analysis of University Students' Ideological and Political Dynamics and Communication Paths
3.1. Analysis of University Students' Ideological and Political Dynamics
Facing a complex and changing social environment, carrying out ideological and political education in universities and grasping the ideological dynamics of university students require analyzing those students' current ideological and political dynamics. This paper analyzes four aspects: value orientation, learning status, consumption concept, and employment, as shown in Table 1.
3.2. Propagation Path Analysis
3.2.1. Original Propagation Path
The original communication paths of college students' ideological and political dynamics fall into three categories: interpersonal communication, organizational communication, and mass communication, as shown in Table 2.
3.2.2. New Propagation Paths
Although the original communication paths of college students' ideological and political dynamics have their own advantages, interpersonal communication is not extensive in its influence, is limited by time and place, and is also restricted to a large extent by the quality of the communicator. Organizational communication remains limited to local areas and struggles to communicate in a timely and effective manner. Mass communication is only one-way, not interactive [24]. Therefore, while adopting and improving the original dissemination paths, new paths for disseminating ideological and political dynamics should also be opened up:
(1) Network communication uses computer communication networks to transmit, exchange, and utilize information so as to achieve social and cultural exchange. On the Internet, people can freely browse almost all available information [25].
(2) Opening up the Internet as a channel is a new way for college students to exchange ideological and political dynamics. It is not simply a matter of publishing information online; the key is to use the advantages of the Internet and computers, through ideological and political dynamics databases applied scientifically in practice, to shift exchange from after-the-fact to beforehand, from qualitative to quantitative communication, and from one-sided to multidirectional propagation.
3.3. Model Construction
This paper builds an analysis model of college students' ideological and political dynamics and propagation paths based on reinforcement learning. The model first collects college students' ideological and political dynamics and then summarizes the dynamics and propagation paths through call requests. If there is no call request, the model keeps requesting until one arrives. The dynamics and propagation paths are analyzed only after a signal is sensed, and the analysis continues until the end; likewise, if no signal is sensed, the propagation path analysis is repeated until a signal is sensed, as shown in Figure 2.

4. Experimental Analysis
4.1. Model Testing
Based on reinforcement learning, this paper constructs an analysis model of university students' ideological and political dynamics and communication paths; the model must first be tested. 100 college students were randomly selected as experimental subjects and divided into 10 groups of 10. Reinforcement learning is compared with deep learning, machine learning, structural equation modeling, and traditional methods. For the model test comparison, this paper uses the most common indicators: accuracy, precision, and recall. The experimental result data are shown in Tables 3–5.
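The paper does not show its evaluation code; the sketch below illustrates how these three indicators are commonly computed, using scikit-learn's standard metric functions on hypothetical labels:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels: 1 = correctly identified dynamic/path, 0 = missed.
y_true = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

print(accuracy_score(y_true, y_pred))    # fraction of all predictions correct
print(precision_score(y_true, y_pred))   # of predicted positives, fraction correct
print(recall_score(y_true, y_pred))      # of true positives, fraction found
```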
It can be seen from the data that reinforcement learning exceeds the other models in accuracy, precision, and recall, with a clear advantage, indicating that reinforcement learning is more suitable for this study.
The highest accuracy of reinforcement learning is 99.7% and the lowest is 96.2%, which is 37.6% higher than the lowest accuracy of the other methods. The highest precision is 99.7% and the lowest is 97.4%, which is 37.8% higher than the lowest precision of the other methods. The highest recall is 99.6% and the lowest is 97.6%, which is 39.3% higher than the lowest recall of the other methods. To see the advantages of this model more intuitively, the results are shown in Figures 3–5.



A comprehensive comparison of the accuracy, precision, and recall of the five methods, using the average value of each index, indicates that the reinforcement learning method has a clear advantage, as shown in Figure 6.

From Figure 6, we know that reinforcement learning has the highest average accuracy, precision, and recall, with an average accuracy of 98.16%, an average precision of 98.75%, and an average recall of 98.65%. Therefore, this model is the most suitable for the research and analysis of this article.
4.2. Analysis of College Students' Ideological and Political Dynamics
After passing the test, the model is applied to the research of this paper, first to analyze the ideological and political dynamics of college students. One hundred randomly selected college students were divided into four groups: freshmen, sophomores, juniors, and seniors. Four aspects were analyzed: value orientation, learning status, consumption concept, and employment. Through a questionnaire survey, the students scored each aspect according to their own situation, out of a total of 10 points. The results are shown in Figure 7.

According to Figure 7, the value orientation score in the ideological and political dynamics of university students is 6.975, the learning status score is 8.025, the consumption concept score is 7.7, and the employment score is 7.45. Freshman students are more concerned with their state of study, while senior students are most concerned with employment, which has the highest score of all the results, reaching 10 points.
4.3. Propagation Path Analysis
This article lists five communication paths: interpersonal communication, organizational communication, mass communication, network communication, and Internet communication. To analyze the ideological and political dynamic communication paths of university students more accurately, this experiment compiled statistics on the propagation paths used by the 100 university students. The results are shown in Figure 8.

The experimental results show that 12 people communicate through interpersonal communication, 15 through organizational communication, 21 through mass communication, 28 through network communication, and 24 through Internet communication. This indicates that the communication of college students' ideological dynamics is mainly based on network communication; the number of first-year students using interpersonal communication and of senior students using organizational communication is the smallest, only 2 each.
5. Conclusion
The ideological and political trends of university students are related to the future and destiny of the country and the nation, and the communication path is also very important. Based on reinforcement learning, this paper constructs an analysis model of university students' ideological and political dynamics and communication paths that improves on the accuracy, precision, and recall of traditional methods, which helps in analyzing those dynamics and paths.
The findings of this article are as follows:
(1) Comparing reinforcement learning with deep learning, machine learning, structural equation modeling, and traditional methods, the highest and lowest accuracies of reinforcement learning are 99.7% and 96.2%, respectively, which is 37.6% higher than the lowest accuracy of the other methods. The highest precision is 99.7% and the lowest is 97.4%, which is 37.8% higher than the lowest precision of the other methods. The highest recall is 99.6% and the lowest is 97.6%, which is 39.3% higher than the lowest recall of the other methods.
(2) Reinforcement learning has the highest average accuracy, precision, and recall: an average accuracy of 98.16%, an average precision of 98.75%, and an average recall of 98.65%.
(3) Freshman students pay more attention to their state of study, while senior students are most concerned with employment, which has the highest score of all the results, reaching 10 points.
(4) The communication of college students' ideological dynamics is mainly based on network communication; the number of first-year students using interpersonal communication and of senior students using organizational communication is the smallest, only 2 each.
Based on the analysis of the experimental results, the following measures are suggested to guide the positive development of the ideological and political dynamics of university students: (1) increase ideological and political education, (2) improve the curriculum and the mental health monitoring mechanism, (3) improve school employment guidance, (4) strengthen the management of online public opinion, and (5) strengthen home-school cooperation. Although the model constructed in this article has clear advantages in accuracy, precision, and recall, it still has limitations: it is restricted to research on the ideological and political dynamics of university students. Future work should study and increase the generality of the model so that it can be applied to a wider range of research.
Data Availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding this work.