Input: jamming action set A, working state set S, initial Q matrix, reward matrix R, discount factor γ, and other parameters.
Output: optimal jamming strategy.
1: Begin.
2: While (episode ≤ N_max) // N_max is the maximum number of iterations.
3: For k = 1 to K // K is the number of restart cycles.
4: Calculate the restart period T_k according to equation (9);
5: For t = 1 to T_k
6: Update the learning rate according to equation (6);
7: Update the temperature according to equation (4);
8: Randomly initialize the working state;
9: While (the current state is not the target working state).
10: Randomly select a candidate action a_rand for the current state from the jamming action set;
11: Choose the optimal action a_opt according to equation (2);
12: Calculate the exploration probability p according to equation (3);
13: Generate a random number ξ in [0, 1];
14: If ξ < p
15: action = a_rand;
16: Else
17: action = a_opt;
18: End if.
19: Execute the current action, update the radar state, and obtain the reward value according to equation (10);
20: Update the Q function according to equation (1);
21: Calculate the difference ΔQ between the updated and previous Q values;
22: If ΔQ < δ // δ is the convergence threshold.
23: Jump out of the loop and terminate the learning process;
24: End if
25: End while
26: End for
27: Update the learning rate range according to equations (7) and (8), and restart the learning rate;
28: Update the restart period based on T_0 // T_0 is the initial restart period.
29: End for
30: End while
31: Output the Q table to obtain the optimal jamming strategy;
32: End.
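For concreteness, the following is a minimal Python sketch of the training loop above (the outer iteration cap of step 2 is omitted for brevity). Because equations (1)–(10) are not reproduced in this excerpt, standard stand-ins are assumed: the classic Q-learning update for equation (1), a Metropolis-style acceptance probability for the exploration rule of equation (3), exponential temperature decay for equation (4), and a cosine-annealed learning rate with warm restarts for equations (6) and (9); the reward matrix, the radar transition model, and every numeric constant are illustrative placeholders, not values from the paper.

```python
import numpy as np

# Stand-ins for the paper's equations, which are not reproduced in this excerpt:
#   eq. (1)      -> classic Q-learning update
#   eq. (3)      -> Metropolis-style exploration probability exp(-(Q_opt - Q_rand)/T)
#   eq. (4)      -> exponential temperature decay
#   eq. (6)/(9)  -> cosine-annealed learning rate with warm restarts (SGDR-like)
#   eq. (7)/(8)  -> multiplicative shrinking of the learning-rate range
#   eq. (10)     -> reward read from a fixed reward matrix R
# State/action sizes, the transition model, and all constants are illustrative.

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 6, 4           # |S| radar working states, |A| jamming actions
TARGET_STATE = N_STATES - 1          # target working state that ends an episode
GAMMA = 0.9                          # discount factor
DELTA = 1e-4                         # convergence threshold on the Q-value change
T0_PERIOD = 50                       # initial restart period T_0 (episodes)
N_RESTARTS = 4                       # number of restart cycles K
alpha_lo, alpha_hi = 0.01, 0.9       # learning-rate range, shrunk at each restart
temp = 10.0                          # initial temperature
TEMP_DECAY = 0.995                   # temperature decay factor

R = rng.uniform(-1.0, 1.0, size=(N_STATES, N_ACTIONS))   # toy reward matrix

def step(state, action):
    """Toy radar-state transition; a real model would come from the environment."""
    return (state + action + 1) % N_STATES

def cosine_lr(t, period, lo, hi):
    """Assumed form of eqs. (6)/(9): cosine annealing within one restart period."""
    return lo + 0.5 * (hi - lo) * (1.0 + np.cos(np.pi * (t % period) / period))

def explore_prob(q_rand, q_opt, t):
    """Assumed form of eq. (3): Metropolis acceptance of the random action."""
    return np.exp(-(q_opt - q_rand) / max(t, 1e-8))

Q = np.zeros((N_STATES, N_ACTIONS))

for k in range(N_RESTARTS):                        # restart cycles (step 3)
    period = T0_PERIOD * (k + 1)                   # restart period, eq. (9) assumed linear in k
    for t in range(period):                        # episodes within one restart cycle
        alpha = cosine_lr(t, period, alpha_lo, alpha_hi)   # step 6, eq. (6)
        temp *= TEMP_DECAY                                  # step 7, eq. (4)
        state = int(rng.integers(N_STATES))                 # step 8: random initial state
        while state != TARGET_STATE:                        # step 9
            a_rand = int(rng.integers(N_ACTIONS))           # step 10: random action
            a_opt = int(np.argmax(Q[state]))                # step 11, eq. (2)
            p = explore_prob(Q[state, a_rand], Q[state, a_opt], temp)  # step 12
            action = a_rand if rng.random() < p else a_opt  # steps 13-18
            nxt = step(state, action)
            reward = R[state, action]                       # step 19, eq. (10)
            old = Q[state, action]
            Q[state, action] += alpha * (reward + GAMMA * Q[nxt].max() - old)  # eq. (1)
            if abs(Q[state, action] - old) < DELTA:         # steps 21-23
                break                                       # end this episode early
            state = nxt
    alpha_hi *= 0.7                                # steps 27-28: shrink the LR range
    alpha_lo *= 0.7

print("optimal jamming strategy:", Q.argmax(axis=1))   # step 31: greedy action per state
```

The warm-restart structure is the point of the sketch: each cycle lengthens the annealing period and narrows the learning-rate range, so early cycles search broadly at high learning rates while later cycles fine-tune the Q table.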