Research Article

Cognitive Electronic Jamming Decision-Making Method Based on Improved Q-Learning Algorithm

Pseudocode 1

Input: jamming action set, working state set, initialization matrix, reward matrix, discount factor, and other parameters.
Output: optimal jamming strategy.
1: Begin.
2: While the iteration count is below the maximum number of iterations
3: For each restart cycle, up to the number of restart cycles
4: Calculate the restart period according to equation (9);
5: For each iteration within the current restart period
6: Update the learning rate according to equation (6);
7: Update the temperature according to equation (4);
8: Randomly initialize the working state;
9: While (the current state is not the target working state).
10: Randomly select an action for the current state from the jamming action set;
11: Choose the optimal action according to equation (2);
12: Calculate the exploration probability according to equation (3);
13: Generate a random number in [0,1];
14: If the random number is less than the exploration probability
15: action = the randomly selected action;
16: Else
17: action = the optimal action;
18: End if
19: Execute the current action, update the radar state, and obtain the reward value according to equation (10);
20: Update the Q function according to equation (1);
21: Calculate the difference between the updated and previous Q values;
22: If the difference is below the convergence threshold
23: Jump out of the loop and terminate the learning process;
24: End if
25: End while
26: End for
27: Update the learning rate range according to equations (7) and (8), and restart the learning rate;
28: // is the initial restart period.
29: End for
30: End while
31: Output the Q table to obtain the optimal jamming strategy;
32: End.
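
The control flow of Pseudocode 1 can be illustrated with a minimal Python sketch. Equations (1) to (10) are not reproduced in this section, so every formula below is an assumption rather than the method's actual definition: a standard Q-learning update stands in for equation (1), a Metropolis-style acceptance rule for equation (3), geometric temperature decay for equation (4), a cosine learning-rate schedule for equation (6), a shrinking upper bound for equations (7) and (8), and period doubling for equation (9). The names `metropolis_action`, `cosine_lr`, `train`, and all parameters are hypothetical. Three simplifications are deliberate: the outer iteration limit is folded into the number of restarts, a per-episode step cap is added as a safety guard, and the convergence test is applied to the largest update seen over a whole restart period so that a single small update cannot end training early.

```python
import math
import random


def metropolis_action(q_row, temperature, rng):
    """Steps 10-18: choose between a random action and the greedy action."""
    a_rand = rng.randrange(len(q_row))
    a_best = max(range(len(q_row)), key=lambda a: q_row[a])
    # Assumed exploration probability: exp((Q_rand - Q_best) / T), always <= 1.
    p_explore = math.exp((q_row[a_rand] - q_row[a_best]) / temperature)
    return a_rand if rng.random() < p_explore else a_best


def cosine_lr(step, period, lr_min, lr_max):
    """Assumed schedule: lr_max at step 0, decaying toward lr_min over the period."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * step / period))


def train(transition, reward, target, gamma=0.9, n_restarts=3, period0=20,
          lr_min=0.05, lr_max=0.9, lr_shrink=0.8, temp0=1.0, temp_decay=0.95,
          tol=1e-4, max_steps=1000, seed=0):
    rng = random.Random(seed)
    n_states, n_actions = len(transition), len(transition[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    starts = [s for s in range(n_states) if s != target]  # never start at the target
    temperature, period = temp0, period0
    for _ in range(n_restarts):
        largest_update = 0.0
        for step in range(period):
            lr = cosine_lr(step, period, lr_min, lr_max)
            temperature = max(temperature * temp_decay, 1e-3)  # floor keeps T > 0
            state, steps = rng.choice(starts), 0
            while state != target and steps < max_steps:
                action = metropolis_action(q[state], temperature, rng)
                nxt = transition[state][action]
                # Standard Q-learning update (assumed form of equation (1)).
                delta = lr * (reward[state][action] + gamma * max(q[nxt]) - q[state][action])
                q[state][action] += delta
                largest_update = max(largest_update, abs(delta))
                state, steps = nxt, steps + 1
        if largest_update < tol:  # no meaningful change during this period
            break
        lr_max = max(lr_min, lr_max * lr_shrink)  # assumed range update
        period *= 2                               # assumed period growth
    return q


# Toy four-state chain (hypothetical data): action 1 advances toward state 3.
transition = [[0, 1], [0, 2], [1, 3], [3, 3]]
reward = [[-1, -1], [-1, -1], [-1, 10], [0, 0]]
q_table = train(transition, reward, target=3)
policy = [max(range(2), key=lambda a: row[a]) for row in q_table]
print(policy[:3])
```

The greedy policy read from the returned table plays the role of the optimal jamming strategy in step 31. In a real setting the deterministic `transition` and `reward` tables would be replaced by the radar state model and the reward of equation (10).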