Research Article
Anti-Attack Scheme for Edge Devices Based on Deep Reinforcement Learning
| 01: Initialize replay memory; |
| 02: Initialize anticipatory parameters; |
| 03: Initialize target function with weight; |
| 04: for … do |
| 05: Set policy; |
| 06: Receive initial observation state and reward; |
| 07: for … do |
| 08: Select action $a_t$ from policy; |
| 09: Execute action $a_t$ and observe reward and new state; |
| 10: Store transition (…) in replay memory; |
| 11: Sample random minibatch of transitions (…) from replay memory; |
| 12: if episode terminates at step … then |
| 13: …; |
| 14: else |
| 15: …; |
| 16: end if |
| 17: Perform a gradient descent step on … with respect to network parameters; |
| 19: Periodically update the target networks; |
| 20: end for |
| 21: end for |
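The control flow of the listing above (replay memory, a separate target function, a terminal/non-terminal branch for the learning target, and periodic target updates) can be illustrated with a small runnable sketch. This is not the authors' implementation: the five-state chain environment, the tabular Q-values standing in for the network, the epsilon-greedy policy, all hyperparameters, and the one-step Q-learning target used in the `if`/`else` branch are assumptions chosen only to make the loop concrete.

```python
import random
from collections import deque

# Hypothetical toy problem: a 5-state chain; action 1 moves right, action 0 moves left.
N_STATES, N_ACTIONS = 5, 2
GAMMA, LR, EPSILON = 0.9, 0.1, 0.2      # assumed hyperparameters
BATCH, SYNC_EVERY = 8, 20               # assumed minibatch size and target sync period


def step(state, action):
    """Toy environment: reward 1 on reaching the right end, which ends the episode."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def td_target(reward, next_state, done, target_q):
    """Steps 12-16 (assumed form): y = r if terminal, else r + gamma * max_a Q_target(s', a)."""
    if done:
        return reward
    return reward + GAMMA * max(target_q[next_state])


def train(episodes=200, max_steps=50, seed=0):
    rng = random.Random(seed)
    memory = deque(maxlen=1000)                           # step 01: replay memory
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]      # step 02: learned parameters (a table here)
    target_q = [row[:] for row in q]                      # step 03: target copy of the parameters
    updates = 0
    for _ in range(episodes):                             # step 04
        state = 0                                         # step 06: initial observation
        for _ in range(max_steps):                        # step 07
            # Steps 05 and 08: epsilon-greedy policy with random tie-breaking.
            if rng.random() < EPSILON:
                action = rng.randrange(N_ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in range(N_ACTIONS) if q[state][a] == best])
            nxt, reward, done = step(state, action)       # step 09
            memory.append((state, action, reward, nxt, done))            # step 10
            batch = rng.sample(list(memory), min(BATCH, len(memory)))    # step 11
            for s, a, r, s2, d in batch:
                y = td_target(r, s2, d, target_q)
                # Step 17: gradient descent step on 0.5 * (y - Q(s, a))^2 for one table entry.
                q[s][a] += LR * (y - q[s][a])
            updates += 1
            if updates % SYNC_EVERY == 0:                 # step 19: periodic target update
                target_q = [row[:] for row in q]
            state = nxt
            if done:
                break
    return q


if __name__ == "__main__":
    for s, row in enumerate(train()):
        print(s, [round(v, 3) for v in row])
```

Keeping a separate `target_q` that is refreshed only every `SYNC_EVERY` updates means the learning target does not shift on every gradient step, which is the usual motivation for the periodic update in step 19. In a deep variant, the table would be replaced by a neural network and the per-entry update by an optimizer step on the minibatch loss.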