Research Article
Early Rumor Detection Based on Deep Recurrent Q-Learning
| Step | Operation |
| --- | --- |
| | Input: network, environment set, experience pool |
| | Output: the trained network |
| (1) | Initialize the current network and the target network |
| (2) | for each epoch do |
| (3) | Select an environment from the environment set |
| (4) | Initialize the environment and get the initial state |
| (5) | while true do |
| (6) | According to the current state, use the ε-greedy strategy to select an action |
| (7) | Perform the action in the environment to get the new state and the reward |
| (8) | if the experience pool is full then |
| (9) | Delete the oldest experience record |
| (10) | end if |
| (11) | Insert the experience record (state, action, reward, new state) into the experience pool |
| (12) | Set the current state to the new state |
| (13) | if the current state is the last state then |
| (14) | break |
| (15) | end if |
| (16) | end while |
| (17) | if the experience pool is full then |
| (18) | Randomly select a batch of records from the experience pool |
| (19) | for each record do |
| (20) | Use the target network to compute the target value |
| (21) | Use the loss function to update the current network |
| (22) | Update the target network from the current network at a fixed interval of epochs |
| (23) | end for |
| (24) | end if |
| (25) | end for |
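The training procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the recurrent Q-network is replaced by a lookup table, `ChainEnv` is a hypothetical toy environment, the `while true` loop is bounded by an assumed `max_steps`, and every hyperparameter (pool size, batch size, discount factor, learning rate, ε, synchronization interval) is chosen only for illustration. The target table is refreshed from the current table once per synchronization interval, as in standard deep Q-learning.

```python
import random
from collections import deque
from copy import deepcopy


class ChainEnv:
    """Hypothetical toy environment; the state is the number of steps left to the goal."""

    def __init__(self, length):
        self.length = length
        self.remaining = length - 1

    def reset(self):
        self.remaining = self.length - 1
        return self.remaining

    def step(self, action):
        # Action 1 moves one step closer to the goal, action 0 stays in place.
        self.remaining = max(self.remaining - action, 0)
        done = self.remaining == 0
        reward = 1.0 if done else -0.01  # assumed reward shape
        return self.remaining, reward, done


def train(envs, n_states, n_actions=2, epochs=300, pool_size=64, batch_size=16,
          gamma=0.9, lr=0.1, epsilon=0.2, sync_every=10, max_steps=50, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]  # stand-in for the current network
    q_target = deepcopy(q)                            # stand-in for the target network
    pool = deque(maxlen=pool_size)                    # a full pool drops its oldest record
    for epoch in range(epochs):
        env = rng.choice(envs)                        # steps (3)-(4)
        state = env.reset()
        for _ in range(max_steps):                    # bounded stand-in for "while true"
            if rng.random() < epsilon:                # step (6): epsilon-greedy selection
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])
            new_state, reward, done = env.step(action)                # step (7)
            pool.append((state, action, reward, new_state, done))    # steps (8)-(11)
            state = new_state                                         # step (12)
            if done:                                                  # steps (13)-(15)
                break
        if len(pool) == pool_size:                                    # step (17)
            for s, a, r, s2, d in rng.sample(list(pool), batch_size):
                # Step (20): target value from the target table.
                y = r if d else r + gamma * max(q_target[s2])
                # Step (21): one gradient step on the loss 0.5 * (y - Q(s, a))^2.
                q[s][a] += lr * (y - q[s][a])
        if (epoch + 1) % sync_every == 0:                             # step (22)
            q_target = deepcopy(q)
    return q
```

A call such as `train([ChainEnv(3), ChainEnv(5)], n_states=5)` returns the learned table, indexed as `q[state][action]`. Using `deque(maxlen=...)` folds steps (8) to (11) into a single `append`, since the oldest record is discarded automatically once the pool is full.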