Abstract
To improve the accuracy and real-time performance of autonomous decision-making by the unmanned combat aerial vehicle (UCAV), a decision-making method combining a dynamic relational weight algorithm with a moving time strategy is proposed, and trajectory prediction is incorporated into maneuver decision-making. Because constant weights in situation assessment fail to reflect the continuity and diversity of the air combat situation, a dynamic relational weight algorithm is proposed to establish an air combat situation system and adjust the weights according to the current situation. Based on the dominance function, this method calculates the degree of relation between each subsituation and the total situation. According to the priority principle and information entropy theory, a hierarchical fitting function is proposed, the association expectation is calculated using if-then rules, and the weights are adjusted dynamically. For trajectory prediction, an online sliding input module is introduced, and a long short-term memory (LSTM) network is used for real-time prediction. To further improve prediction accuracy, the adaptive boosting (Ada) method is used to build the outer framework, and the result is compared with three traditional prediction networks; Ada-LSTM achieves the best prediction accuracy. In the decision-making method, a moving time optimization strategy is adopted. To balance timeliness and optimality, each control variable is divided into 9 gradients, giving 729 control schemes in the control sequence. Contrast pursuit simulation experiments verify that the maneuver decision method combining the dynamic relational weight algorithm and the moving time strategy achieves better accuracy and real-time performance. Adaptive countermeasure simulations against a state-of-the-art Bayesian-inference maneuver decision scheme are carried out both with and without prediction.
The results show that the UCAV maneuvering decision-making ability combined with accurate prediction is better.
1. Introduction
With the continuous development of artificial intelligence technology, the intelligence and autonomy of the unmanned combat aerial vehicle (UCAV), represented by the American “Loyal Wingman,” have been significantly improved, but the existing level of intelligence is far from meeting actual needs [1]. Therefore, the autonomous air combat technology of UCAVs is currently a hot issue studied by various countries, and it has been a persistent research topic for decades [2]. Maneuvering decision-making is a key technology in autonomous air combat: it is the mechanism by which a UCAV selects maneuvers in real time during air combat, with high requirements for both real-time performance and accuracy. Existing maneuver decision-making techniques are mainly divided into two categories: maneuver decision-making methods based on action libraries and maneuver decision-making methods based on self-learning technology. Scholars from various countries have conducted in-depth studies on both.
The maneuver decision-making method based on an action library can also be called the rule-based decision-making method. This method selects the optimal maneuver from the action library according to designed decision rules. References [3, 4] use the influence diagram to study maneuver decisions. This approach treats the multilevel influence diagram as a nonlinear programming problem, which is not suitable for online planning in a highly dynamic combat environment. Reference [5] proposes a maneuver escape decision based on the action library and establishes 13 basic maneuver units that meet real-time requirements; however, in complex situations, the accuracy of decision-making needs to be improved. Reference [6] proposes the Beetle Antennae Search-Tactical Immune Maneuver System (BAS-TIMS) and designs 11 kinds of maneuver units; the decision accuracy is high, but the heuristic algorithm converges slowly and cannot meet real-time requirements. References [7, 8] use the differential game to study maneuver decision-making and establish a scoring matrix to select the best maneuver, but the huge action library increases computational complexity and reduces real-time performance. References [9, 10] expand the basic action library, propose a decision-making method based on statistical principles, and establish a more complex situation assessment model to reflect the battlefield environment; however, the coupling relationship between the evaluation models is ignored, and decision accuracy needs to be improved. Reference [11] proposes an improved symbiotic organisms search (SOS) algorithm and designs 11 common basic maneuvers, which meet accuracy requirements, but it takes too long to compute the optimal value, leading to longer decision times.
The above research shows that it is difficult for maneuver decision methods based on an action library to calibrate the size of the maneuver library. When the maneuver library has too few units, real-time performance meets requirements but decision accuracy is too low; when it has too many units, accuracy improves but timeliness cannot meet requirements.
The maneuver decision-making method based on self-learning technology uses techniques such as machine learning and reinforcement learning to make maneuver decisions. Reference [12] proposes a deep reinforcement learning decision-making method, but it is used only for route planning, with large limitations, low decision accuracy, and poor real-time performance. Reference [13] uses a radial basis function (RBF) network to optimize the rate of change of the control variables for maneuver decision-making; with comprehensive data, its decision accuracy is high, but its real-time performance needs improvement. Reference [14] uses the basic action library as the basic unit of reinforcement learning and realizes a continuous action space by weighting the basic maneuvers, which also leads to long decision times. References [15, 16] use Bayesian theory to select the optimal discrete maneuver, which requires high data accuracy. References [17, 18] use deep reinforcement learning techniques to handle maneuvering decision problems, with poor real-time performance. The above research shows that such methods have strict requirements on offline data: if the data are missing or wrong, the generalization ability of the model decreases, and decision accuracy becomes difficult to guarantee.
To address the difficulty in existing research of achieving both high accuracy and low decision latency in maneuver decision-making, this study makes the following original contributions:
(1) A moving time maneuver decision-making method based on the dynamic relational weight algorithm and trajectory prediction is proposed, which combines trajectory prediction with maneuver decision-making to improve the real-time performance and accuracy of decisions
(2) A more comprehensive situation evaluation function is established on the basis of the UCAV three-degree-of-freedom model, which objectively reflects the air combat situation in real time
(3) A dynamic relational weight algorithm is proposed, which breaks the limitation of constant-weight situation calculation and improves decision accuracy
(4) An adaptive boosting long short-term memory network (Ada-LSTM) trajectory prediction method is proposed; compared with three traditional prediction methods, its prediction accuracy is significantly improved
(5) Simulation experiments covering tracking analysis, comparative tracking, and adaptive countermeasures verify the high accuracy and low decision latency of the proposed method
The rest of this study is organized as follows. The second section introduces a moving time maneuver decision system based on trajectory prediction and the dynamic relational weight algorithm. The third section establishes the UCAV three-degree-of-freedom model and the air combat situation system. The fourth section proposes a dynamic relational weight algorithm. The fifth section proposes an adaptive boosting long short-term memory network online trajectory prediction method. The sixth section optimizes the moving time decision strategy, and the seventh section performs simulation verification. The last section presents the conclusion and future development.
2. Maneuvering Decision-Making System Based on Trajectory Prediction and the Dynamic Relational Weight Algorithm
Considering that the maneuver decision method has a certain lag in its calculation process, maneuver trajectory prediction is added to give the decision method a time compensation, improving its accuracy and real-time performance. The maneuver strategy differs across air combat situations: for example, the UCAV should maintain its angle advantage and increase its distance advantage in a tail pursuit, while in a head-on situation it should increase its angle advantage to reach a more secure advantageous position. Therefore, the dynamic relational weight algorithm is used to adjust the situation weights in real time, and the moving time strategy is used to solve the online decision-making problem effectively [19–22]. Based on the above methods, this section designs a maneuver decision-making system; the process is shown in Figure 1. The specific steps are as follows:
Step 1. Input the historical maneuver trajectory of the enemy aircraft and construct a sliding input matrix and a correction matrix.
Step 2. Use the Ada-LSTM network to predict the maneuvering trajectory and obtain the position information at the next moment.
Step 3. Substitute the predicted enemy aircraft position and the current UCAV position into the subsituation functions to obtain the future angle situation, future distance situation, and future energy situation.
Step 4. Input these into the relational weight algorithm and calculate the dynamic angle weight, dynamic distance weight, and dynamic energy weight.
Step 5. Use the moving time strategy to optimize the control quantity.
Step 6. Update the position information with the optimal control quantity and check whether the termination condition is met; if so, the maneuver ends, otherwise return to Step 1.
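The six-step loop above can be sketched in code. The sketch below is only a structural skeleton: the real Ada-LSTM predictor, situation functions, weight algorithm, and 3-DOF dynamics are replaced by trivial placeholder functions (all names are hypothetical) so that the cycle of predict, assess, weight, optimize, and update is visible.

```python
def predict_trajectory(window):            # Step 2: stands in for Ada-LSTM
    return window[-1]                      # naive "stay put" prediction
def subsituations(own, enemy):             # Step 3: angle/distance/energy
    return 0.5, 0.5, 0.5
def dynamic_weights(s_a, s_r, s_e):        # Step 4: relational weight algorithm
    return 0.5, 0.3, 0.2
def best_control(own, enemy, weights):     # Step 5: moving time optimization
    return (0.0, 1.0, 0.0)                 # placeholder (n_x, n_z, mu)
def step_dynamics(own, control):           # Step 6: 3-DOF position update
    return [p + 1.0 for p in own]          # toy motion

def decision_loop(own_state, enemy_history, max_steps=30):
    """Run the six-step moving-time decision cycle once per iteration."""
    for _ in range(max_steps):
        window = enemy_history[-9:]                 # Step 1: sliding input
        enemy_next = predict_trajectory(window)     # Step 2: predict
        s = subsituations(own_state, enemy_next)    # Step 3: subsituations
        w = dynamic_weights(*s)                     # Step 4: dynamic weights
        u = best_control(own_state, enemy_next, w)  # Step 5: optimize control
        own_state = step_dynamics(own_state, u)     # Step 6: update state
    return own_state
```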

3. UCAV Three-Degree-of-Freedom Model and the Air Combat Situation System
3.1. UCAV Three-Degree-of-Freedom Model
The UCAV three-degree-of-freedom model is used to describe the motion state of a UCAV. The following assumptions are made:
(1) Treat the UCAV as a particle, regardless of its shape
(2) Ignore the sideslip angle
(3) Ignore the effects of the Earth's rotation and curvature and use the ground coordinate system as the inertial coordinate system
(4) Ignore the effects of airflow and gusts
(5) Ignore the effect of altitude, latitude, and longitude on the acceleration of gravity
Based on these assumptions, the following particle model can be established [3] (the standard three-degree-of-freedom form):
$\dot{x} = v\cos\theta\cos\psi$, $\dot{y} = v\cos\theta\sin\psi$, $\dot{z} = v\sin\theta$,
$\dot{v} = g(n_x - \sin\theta)$, $\dot{\theta} = \frac{g}{v}(n_z\cos\mu - \cos\theta)$, $\dot{\psi} = \frac{g\,n_z\sin\mu}{v\cos\theta}$,
where $x$, $y$, and $z$ represent the horizontal and height coordinates of the UCAV, $\theta$ is the pitch angle, $\psi$ is the yaw angle, $v$ is the velocity, $\mu$ is the roll angle, $n_x$ is the tangential overload, $n_z$ is the normal overload, and $g$ is the acceleration of gravity. Among them, $[x, y, z, v, \theta, \psi]$ is the state variable of the UCAV, and $[n_x, n_z, \mu]$ is the control variable.
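A single Euler integration step of the three-degree-of-freedom particle model can be sketched as follows. The equations used here are the standard 3-DOF form (an assumption consistent with the symbols named in the text, not a verbatim copy of [3]); the step size and the simple forward-Euler scheme are also illustrative choices.

```python
import math

def step_3dof(state, control, g=9.81, dt=0.1):
    """One Euler step of the standard 3-DOF particle model.
    state = (x, y, z, v, theta, psi): position, speed, pitch, yaw.
    control = (n_x, n_z, mu): tangential overload, normal overload, roll."""
    x, y, z, v, theta, psi = state
    n_x, n_z, mu = control
    dx = v * math.cos(theta) * math.cos(psi)          # horizontal motion
    dy = v * math.cos(theta) * math.sin(psi)
    dz = v * math.sin(theta)                          # climb rate
    dv = g * (n_x - math.sin(theta))                  # speed change
    dtheta = (g / v) * (n_z * math.cos(mu) - math.cos(theta))
    dpsi = g * n_z * math.sin(mu) / (v * math.cos(theta))
    return (x + dx * dt, y + dy * dt, z + dz * dt,
            v + dv * dt, theta + dtheta * dt, psi + dpsi * dt)
```

For example, level flight at 200 m/s with $n_x = 0$, $n_z = 1$, $\mu = 0$ leaves speed, pitch, yaw, and altitude unchanged while advancing the position.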
3.2. Air Combat Situation System
The main factor affecting the situation in one-on-one close air combat is the real-time space-occupying information of both sides [16]. Therefore, this study builds the angle function, distance function, and energy function and establishes the close air combat situation model shown in Figure 2. The overall function is the weighted sum $S = \omega_A S_A + \omega_R S_R + \omega_E S_E$, where $S$ is the overall situation, $S_A$ is the angle situation, $S_R$ is the distance situation, $S_E$ is the energy situation, and $\omega_A$, $\omega_R$, and $\omega_E$ are the corresponding weights.

3.2.1. Angle Function
In the three-dimensional air combat coordinate system, the angle function is mainly affected by the target entry angle and target direction angle, both of which can be projected into the two-dimensional coordinate system for simplification. Therefore, the model is projected onto the horizontal plane to establish a two-dimensional two-aircraft confrontation model as shown in Figure 3.

The target direction angle is the angle between the target line of sight and the direction of the UCAV's velocity. The target entry angle is the angle between the extension of the target line of sight and the direction of the target's velocity, based on Figure 3, where $v_U$ is the speed of the UCAV and $v_T$ is the speed of the target.
To avoid the singularity problem of the traditional angle function and to keep its range bounded, the improved angle function [23] is as follows:
In the above formula, the improved function is analyzed for the following situations:
(1) the UCAV is following the target
(2) the UCAV is on the target's side
(3) the UCAV is facing the target
(4) the UCAV and target face away from each other
(5) the UCAV is on the target's side
(6) the target is following the UCAV
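The two angles that drive the angle function can be computed directly from positions and velocities in the horizontal plane. The sketch below is illustrative (the variable names `q_u` and `q_t` are not the paper's notation): the direction angle is the angle between the line of sight and the UCAV's velocity, and the entry angle is the angle between the extended line of sight and the target's velocity.

```python
import math

def angles_2d(p_u, v_u, p_t, v_t):
    """Horizontal-plane target direction angle q_u (LOS vs. UCAV velocity)
    and target entry angle q_t (extended LOS vs. target velocity),
    both in radians in [0, pi]."""
    los = (p_t[0] - p_u[0], p_t[1] - p_u[1])          # line of sight U -> T
    def angle_between(a, b):
        dot = a[0] * b[0] + a[1] * b[1]
        na, nb = math.hypot(*a), math.hypot(*b)
        # clamp guards against tiny floating-point overshoot of +/-1
        return math.acos(max(-1.0, min(1.0, dot / (na * nb))))
    q_u = angle_between(los, v_u)   # target direction angle
    q_t = angle_between(los, v_t)   # target entry angle
    return q_u, q_t
```

In a pure tail chase both angles are 0; in a head-on encounter the entry angle approaches $\pi$.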
The function was simulated in MATLAB, and the resulting data are shown in Figure 4, which is consistent with the actual situation.

3.2.2. Distance Function
Since different airborne weapons have different optimal firing ranges, this study proposes the effective attack distance: the superposition of the best attack distances of the different weapons, which forms an interval of best attack effect, as shown in Figure 5.

Set the maximum attack range, minimum attack range, maximum effective attack range, and minimum effective attack range. The distance advantage function is designed as follows:
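The paper's exact distance advantage formula is not reproduced here, but a plausible piecewise shape consistent with the description, full advantage inside the effective interval, falling off linearly toward the attack-range limits, can be sketched as follows (the linear fall-off and all range values are assumptions for illustration):

```python
def distance_advantage(d, d_min, d_emin, d_emax, d_max):
    """Illustrative piecewise distance advantage: 1 inside the effective
    interval [d_emin, d_emax], linear decay toward the attack-range
    limits d_min and d_max, and 0 outside [d_min, d_max]."""
    if d < d_min or d > d_max:
        return 0.0
    if d_emin <= d <= d_emax:
        return 1.0                                  # best attack effect
    if d < d_emin:                                  # between d_min and d_emin
        return (d - d_min) / (d_emin - d_min)
    return (d_max - d) / (d_max - d_emax)           # between d_emax and d_max
```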
3.2.3. Energy Function
In the course of flight, there are two types of energy, kinetic and potential, and they can be converted into each other. Before constructing the energy function, the specific energy formula [24] is determined as $E = h + \frac{v^2}{2g}$, where $h$ is the flight altitude and $v$ is the current velocity.
The energy function is constructed as follows, where $E_U$ is the UCAV energy and $E_T$ represents the energy of the target:
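The specific energy of each aircraft can be computed directly from the formula above. The comparison function below is only an illustrative normalization (the paper's exact energy function comparing $E_U$ and $E_T$ is not reproduced here); it maps the pair of energies to a value in (0, 1), with 0.5 meaning energy parity.

```python
def specific_energy(h, v, g=9.81):
    """Specific energy E = h + v^2 / (2 g): altitude plus kinetic term."""
    return h + v * v / (2.0 * g)

def energy_situation(e_u, e_t):
    """Illustrative normalized energy situation in (0, 1); assumption,
    not the paper's exact formula. 0.5 means equal energy."""
    return e_u / (e_u + e_t)
```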
4. Dynamic Relational Weights Algorithm
In the process of UCAV close air combat, the situation plays a major role in regulating operational changes at the next moment, which affects the final result. Traditional situation assessment systems mostly use constant weights to calculate the overall situation, which often deviates greatly from the actual situation; in severe cases the assessment may even be the opposite, causing one to miss the best time to defeat the enemy. This study follows the priority principle that the angle advantage outranks the distance advantage, which outranks the energy advantage. By combining relational analysis [25] and entropy weight theory [26], a new dynamic relational weight algorithm is proposed. This method has broad application potential in the autonomous decision-making of unmanned intelligent military equipment, such as unmanned vehicles and unmanned submarines, and is suitable for weight problems with a given priority ordering.
4.1. Establishment of Dynamic Correlation Matrix
Grey relational analysis (GRA) [27] essentially provides a method to measure the distance between vectors. A factor with a time series can be viewed as a time curve, and the GRA algorithm measures whether two curves are similar in shape and trend. According to this characteristic, the overall situation and the subsituations of the UCAV close air combat system can be related as shown in Figure 6.

The relational degree directly reflects the effect of each subsituation on the overall situation in the UCAV close air combat system. Therefore, a dynamic relational degree matrix is established to reflect the specific combat situation at each moment. The establishment process is as follows:
Step 1. Discretize the continuous process. According to the moving time strategy [16], the air combat confrontation is decomposed into a maneuvering process over finitely many discrete time periods.
Step 2. Construct the association sequences. The relational matrix is divided into a parent sequence and child sequences, which are consistent in timing. The UCAV air combat process is a Markov process: the change at the next moment is relevant only to the present moment. After discretization, take three step-length processes up to the current moment and construct the parent sequence from the overall situation at these moments. Construct the child sequences from the angle situation, distance situation, and energy situation at the same moments.
Step 3. Normalize the sequences to eliminate differences in magnitude at each moment.
Step 4. Calculate the relational degree and update the relational matrix. A standard grey relational formulation [28] computes the relational coefficient
$\xi_i(k) = \frac{\min_i \min_k |x_0(k) - x_i(k)| + \rho \max_i \max_k |x_0(k) - x_i(k)|}{|x_0(k) - x_i(k)| + \rho \max_i \max_k |x_0(k) - x_i(k)|}$
and the relational degree $r_i = \frac{1}{n}\sum_{k=1}^{n}\xi_i(k)$, where $r_i$ represents the degree of relevance between subsituation $i$ and the overall situation, and $\rho$ is the resolution coefficient.
The relational matrix collects these relational degrees over the three moments: its rows contain, respectively, the correlations between the angle situation and the overall situation at the different moments, the correlations between the distance situation and the overall situation, and the correlations between the energy situation and the overall situation.
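The relational degree of one child sequence with respect to the parent sequence can be sketched as follows. This is a common GRA formulation with resolution coefficient $\rho = 0.5$; the paper's exact variant (and its normalization) may differ.

```python
def grey_relational_degree(parent, child, rho=0.5):
    """Grey relational degree of a child sequence w.r.t. a parent sequence
    (standard single-sequence form; rho is the resolution coefficient)."""
    diffs = [abs(p - c) for p, c in zip(parent, child)]
    d_min, d_max = min(diffs), max(diffs)
    if d_max == 0.0:
        return 1.0                     # identical sequences: full relevance
    coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in diffs]
    return sum(coeffs) / len(coeffs)   # average relational coefficient
```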
4.2. Solve Weights
After the relational matrix has been established, the weights can be solved by referring to the entropy weight method [29]. The entropy method determines the weight of each index in a system through information entropy theory and is used in many disciplines: in group decision-making [30], it calculates attribute weights; in prediction [31], it determines the weights of multiple indicators from the perspective of information content; and it is used in quantitative evaluation to improve TOPSIS [32]. The entropy weight method implies that a smaller degree of variation of an index corresponds to less information and a lower weight. A smaller probability corresponds to a larger amount of information, a relationship fitted by the curve $I(x) = -\ln p(x)$; the essence of information entropy is the expected value of information, $H(X) = -\sum_i p(x_i)\ln p(x_i)$. In UCAV air combat decision-making, however, the weight cannot be reflected by the degree of variation of the indicators: when the angle advantage is optimal, one must maintain it and improve the distance advantage in the future maneuvering process. In that case the variation of the angle advantage is 0, but its corresponding weight should be maximal.
Step 5. Construct the fitting function. Because the entries of the relational matrix differ in size, the fitting function constructed also differs. A smaller relational degree corresponds to a weaker effect of the subsituation on the overall situation. According to the principle that the angle advantage outranks the distance advantage, which outranks the energy advantage, when the relational degree is below the threshold, smaller angle and distance relational degrees correspond to a weaker angle, a shorter distance, and a more urgent need for adjustment; the quantities are inversely proportional, so a corresponding similarity curve is fitted.
When the relational degree is above the threshold, greater angle and distance relational degrees correspond to more advantageous angle and distance situations, which must be maintained; the quantities are proportional, so a corresponding similarity curve is fitted. The specific hierarchical relationship is shown in Figure 7.
Step 6. Calculate the expectation matrix by applying the fitting functions to the relational matrix.
Step 7. Solve the weights using the if-then principle. The pseudocode for solving the weights is given in Table 1.
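The weight-solving steps can be sketched as follows. Because the paper's fitting functions and if-then rules are only partially specified, everything concrete here is an assumption: the 0.5 threshold, the $-\ln r$ fit below it and $e^r$ fit above it, and the final normalization that turns the fitted expectations into weights.

```python
import math

def relational_weights(r_a, r_r, r_e, threshold=0.5):
    """Illustrative dynamic weight solver (assumed fits, not the paper's):
    below the threshold, a small relational degree signals urgent
    adjustment and is fitted inversely with -ln(r); above it, a large
    degree signals an advantage to maintain and is fitted with e^r.
    The fitted expectations are normalized into weights summing to 1."""
    def fit(r):
        return -math.log(r) if r < threshold else math.exp(r)
    expectations = [fit(r) for r in (r_a, r_r, r_e)]
    total = sum(expectations)
    return tuple(e / total for e in expectations)
```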

5. Adaptive Boosting of Online Trajectory Prediction with the Long Short-Term Memory Network
5.1. Long Short-Term Memory Network
The long short-term memory (LSTM) network is currently one of the better network structures for handling timing problems. Through the control of three gates, it alleviates, to a certain extent, the long-term dependence problem of the RNN when processing sequences, and information is transmitted mainly through the unit state. The structure is shown in Figure 8.

In the calculation process, the previous unit state $c_{t-1}$ is first multiplied by the output of the forget gate and then accumulated with the output of the input gate. The essence is to update the information of the last moment and merge it with the information of the current moment, so that information can be stored over long spans. To keep the data simple, the forget gate splices $h_{t-1}$ and $x_t$ into a single vector and normalizes the result between 0 and 1 through the sigmoid activation function, where 1 means "completely retained" and 0 means "completely discarded." This effectively filters the data and avoids useless calculations. The forget gate is calculated as
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$,
where $\sigma$ is the sigmoid activation function.
The input gate determines the input information of the current unit: the tanh function produces the candidate information, while the sigmoid function determines which information is useful; their outputs are multiplied and added into the current unit state. The formulas are
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$, $\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$, $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$,
where $\tanh$ is the hyperbolic tangent function.
The output gate determines the output of the current unit. The current unit state is passed through the tanh function, $h_{t-1}$ and $x_t$ are activated by the sigmoid function, and the two results are multiplied to give the current unit output $h_t$:
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$, $h_t = o_t \odot \tanh(c_t)$.
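The gate mechanics described above can be made concrete with a single-unit (scalar) LSTM step. The weights in the sketch are illustrative scalars, not trained values; a real implementation would use weight matrices over vector inputs.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell(x_t, h_prev, c_prev, W):
    """Single-unit LSTM step with the standard gate equations.
    W maps gate name ("f", "i", "c", "o") to scalar (w_x, w_h, b)."""
    def gate(name, act):
        w_x, w_h, b = W[name]
        return act(w_x * x_t + w_h * h_prev + b)
    f_t = gate("f", sigmoid)              # forget gate: keep/discard c_prev
    i_t = gate("i", sigmoid)              # input gate: admit new information
    c_tilde = gate("c", math.tanh)        # candidate information
    c_t = f_t * c_prev + i_t * c_tilde    # update the unit state
    o_t = gate("o", sigmoid)              # output gate
    h_t = o_t * math.tanh(c_t)            # current unit output
    return h_t, c_t
```

With the forget gate saturated open and the input gate closed, the cell state passes through unchanged, which is exactly the long-term storage behavior the text describes.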
In [33], it was shown that independent prediction of the three coordinates is more accurate than joint prediction, so the coordinates on the X, Y, and Z axes are individually used as inputs to the LSTM network. When using the three-degree-of-freedom model to simulate the trajectory, the data are sampled at an interval of 0.3 s, and ten samples form a group. A 4 × 6 sliding module matrix is constructed from the first nine samples to predict the tenth sample.
The sliding module input matrix:
The sliding module predicts the output correction matrix:
In the online test of the LSTM network, the first three rows of the matrix are used for real-time correction through the output correction matrix, adjusting the internal weights and biases. The network thus has 6 input nodes and 1 output node. Since increasing the number of hidden layers in an LSTM network rapidly increases the time cost, and maneuver prediction has high timeliness requirements, a double hidden layer structure is used [34].
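The sliding module construction can be sketched as follows, assuming the "4 × 6" matrix means four overlapping windows of length six (matching the 6 input nodes): the first three rows pair with the already-known samples 7 to 9 for online correction, and the fourth row is the input that predicts sample 10.

```python
def sliding_matrix(samples):
    """Build the 4 x 6 sliding input matrix from one group of 9 samples.
    Rows 1-3 have known targets (samples 7-9) for online correction;
    row 4 is the window used to predict the 10th sample."""
    assert len(samples) == 9
    windows = [samples[i:i + 6] for i in range(4)]   # 4 rows, 6 columns
    corrections = samples[6:9]                       # targets for rows 1-3
    return windows, corrections
```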
5.2. AdaBoost Algorithm
The AdaBoost algorithm [35] is a boosting method that combines multiple weak predictors into a strong predictor; it is used here to build the overall outer framework, as shown in Figure 9. The LSTM network is trained as the weak predictor. The weights of the samples that the previous network mispredicted are strengthened, and the reweighted samples are used to train the next network. In each round of training, the overall sample set is used to train a new LSTM, generating new sample weights and a weight for the weak predictor. This iterates until the predetermined error rate or the specified maximum number of iterations is reached.

The key step from the boosting idea to the adaptive boosting algorithm is the online allocation algorithm: each strategy in the online allocation algorithm corresponds to a weak predictor in AdaBoost, and its overall weight is determined by the loss incurred by that strategy. This is the theoretical source of the weak predictor weights in AdaBoost. To obtain a strong predictor with high fitting accuracy, multiple weak predictors must be added, but doing so can modify the integration of the existing weak predictors and increase complexity. AdaBoost avoids this problem with a greedy strategy, adding new weak predictors by linear addition. Linear addition greatly improves accuracy and simplifies the algorithm, but computation time becomes a new concern. In maneuvering trajectory prediction, the drastic changes of air combat impose very high real-time requirements, and the number of weak predictors K is the decisive factor for both accuracy and time. Using the LSTM as the weak predictor within the AdaBoost framework, different numbers of weak predictors were tried; for each K, the strong predictor was run multiple times, and the average error and average time were used to determine the most reasonable K. The results are given in Table 2.
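The boosting loop described above can be sketched in a minimal AdaBoost.R2-style form for regression. This is the general scheme, not necessarily the paper's exact variant: the linear loss, the weight update, and the weighted-mean combination (AdaBoost.R2 proper uses a weighted median) are all standard but assumed choices, and `make_learner` stands in for training one LSTM weak predictor on weighted data.

```python
import math

def adaboost_r2(train, make_learner, n_rounds):
    """Minimal AdaBoost.R2-style regression loop. `train` is a list of
    (x, y) pairs; make_learner(train, w) fits a weak predictor on the
    weighted data and returns a callable predict(x)."""
    n = len(train)
    w = [1.0 / n] * n
    ensemble = []                                  # (log(1/beta), predict)
    for _ in range(n_rounds):
        predict = make_learner(train, w)
        errs = [abs(predict(x) - y) for x, y in train]
        e_max = max(errs) or 1.0                   # avoid divide-by-zero
        losses = [e / e_max for e in errs]         # linear loss in [0, 1]
        eps = sum(wi * li for wi, li in zip(w, losses))
        if eps <= 0.0:                             # perfect weak predictor
            ensemble.append((1.0, predict))
            break
        if eps >= 0.5:                             # too weak: stop boosting
            break
        beta = eps / (1.0 - eps)
        # well-predicted samples are down-weighted; hard samples keep weight
        w = [wi * beta ** (1.0 - li) for wi, li in zip(w, losses)]
        s = sum(w)
        w = [wi / s for wi in w]
        ensemble.append((math.log(1.0 / beta), predict))
    def strong(x):
        total = sum(a for a, _ in ensemble)
        return sum(a * p(x) for a, p in ensemble) / total
    return strong
```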
Table 2 shows that for K = 1, 2, 3, the time consumption is small but the error is too large, so these are not considered. For K = 4, the error improves, but when the X-, Y-, and Z-axis errors are combined into an overall error, it cannot meet the high-precision requirements of maneuver prediction. For K = 5, the error drops further, roughly meeting the high-precision requirements, and the prediction time is also low. For K = 6, the accuracy improves somewhat over K = 5, but the time consumption increases considerably; a single-step prediction time of 0.134 s is too long, so it is not considered. In summary, this study forms the strong predictor from five LSTM weak predictors.
5.3. Comparison of Predicted Trajectories
A group of enemy aircraft maneuvering trajectories is generated randomly using the three-degree-of-freedom model and sampled 300 times; a prediction is made for every group of 10 samples, for a total of 30 prediction cycles. To improve prediction accuracy, independent three-dimensional coordinate prediction [33] is used, and the method is compared with traditional RNN, CNN, and LSTM prediction methods. The results are shown in Figure 10.

In the three-dimensional view, the prediction result of the AdaBoost-LSTM network is clearly closest to the actual values, and its overall trajectory has few mutations, while the trajectories predicted by the traditional CNN, RNN, and LSTM networks show many mutations. By calculation, the average prediction error of the AdaBoost-LSTM network is 34.3 m, versus 41.5 m for the CNN, 48 m for the LSTM, and 88.8 m for the RNN. This clearly shows that the prediction accuracy of the AdaBoost-LSTM network is higher than that of the other traditional deep learning networks and can meet the requirements of maneuvering decision-making.
6. Moving Time Strategy for Optimal Maneuvering Decision
6.1. Maneuvering Decision Process Control of UCAV Based on the Rolling Horizon
The process of air combat is a dynamic confrontation. It is difficult to know the endpoint of the confrontation, so a global optimal solution is hard to obtain; only by scientifically designing local optimal solutions can the global optimum be gradually approached. Because the time effectiveness of air combat confrontation is very strong, the air combat process is decomposed, according to the moving time strategy, into maneuver decision processes over finitely many discrete time periods. The moving control process is shown in Figure 11.

In the figure, each discrete process is one of the n periods. Based on the continuity between two state changes and considering the maneuverability of the UCAV, the variable range of the control variables is divided into m control sequences. Within the time allotted to a maneuver decision, the situation of the current moment must be held, which implies that there must be sufficient time to complete the decision. Therefore, the optimal sequence must be selected from the control sequence for the maneuver decision.
6.2. Optimal Design of Maneuver Decision Control Variables
To quickly find the optimal maneuver control variable in the control sequence and satisfy the launch conditions, we first attempted a heuristic search algorithm, but its actual runtime is too long to meet the timeliness of air combat decisions. Therefore, the control variables are divided by gradient to maximize the benefit of the maneuver decision evaluation function, yielding the control quantity at the next moment.
To ensure that the UCAV can quickly and accurately find the optimal control variable, a fine control gradient is constructed. Each control variable is divided into 9 levels within the allowable range of maneuverability, giving 729 control schemes in total. The gradients of the three control variables are set as follows:
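Enumerating the 9 × 9 × 9 = 729 candidate control schemes is a simple Cartesian product over the three gradients. The numeric ranges below are illustrative placeholders, not the paper's maneuverability limits.

```python
import itertools

def control_schemes(nx_range=(-1.0, 2.0), nz_range=(0.0, 8.0),
                    mu_range=(-1.2, 1.2), levels=9):
    """Enumerate the 9 x 9 x 9 = 729 candidate control schemes by dividing
    each control variable (n_x, n_z, mu) into evenly spaced levels.
    Ranges shown are illustrative, not the paper's limits."""
    def grid(lo, hi):
        step = (hi - lo) / (levels - 1)
        return [lo + i * step for i in range(levels)]
    return list(itertools.product(grid(*nx_range), grid(*nz_range),
                                  grid(*mu_range)))
```

At each decision step, every scheme is scored with the weighted situation function and the best one is applied for the next period.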
7. Simulation Analysis
To verify the real-time performance and accuracy of the maneuver decision method combining the dynamic relational weight algorithm and the moving time strategy, a contrast pursuit simulation experiment is conducted first: three relatively advanced decision-making methods are chosen, all pursue the target aircraft simultaneously, and the situation of each is analyzed over the same decision steps. Then, the most competitive of the comparison methods is chosen for adaptive countermeasure simulation experiments against the method proposed in this study, verifying its accuracy again. In the verification simulations, the UCAV and the enemy aircraft use the same platform model and have the same constraints and maneuverability; that is, the hardware conditions of the two parties are identical. The simulation terminates when the angle situation and distance situation reach their threshold values.
7.1. Contrast Pursuit
To verify the real-time performance and accuracy of the algorithm, the robust maneuver decision method [9] is selected from the action-library-based methods, and the DNN decision method [36] and Bayesian decision method [16] are chosen from self-learning decision theory, for comparison with the method proposed in this study. In this section, the target aircraft makes a large-radius serpentine maneuver at an altitude of 5000 m, and the UCAV pursues it with each of the four methods; the situation of each method is analyzed within the same decision steps. The initial position information is given in Table 3, and the initial settings of the different decision-making methods are given in Table 4.
It can be seen in Figure 12 that the DNN decision method has the largest deviation; in the later stage it does not occupy an absolutely advantageous position, and its decision accuracy is poor. The robust decision method does not reveal its shortcomings in the initial stage, but when the target makes a large maneuver, it cannot adjust its position in time, resulting in decreased decision accuracy. The Bayesian decision and the dynamic decision always occupy an advantageous position throughout the pursuit.

The situation of each method is shown in Figure 13. Before 50 decision steps, the angle situations of the four methods are basically the same, but after 60 decision steps, the angle situation of the DNN decision decreases and fluctuates strongly. After 140 steps, its angle situation continues to decrease, indicating that the accuracy of the DNN decision method is poor when the enemy aircraft performs a large maneuver. When the enemy makes a large maneuver between steps 70 and 120, the angle situation of the robust decision drops rapidly and adjusts slowly, indicating inaccurate decision-making; the essential reason is that the division of the maneuver library is still not comprehensive enough. Between steps 60 and 140, the Bayesian decision and the method proposed in this study always maintain a good angle situation, showing that both have high decision accuracy. After 140 steps, the dynamic weight decision has a better angle situation, indicating that this method is more accurate.

First, at the theoretical level, big-O notation is used to describe the computational complexity of the proposed method. Let T denote the steps of the dynamic weight calculation in the fourth section, which has constant-order complexity O(T). In the moving time strategy, N denotes the number of candidate decision-making plans, which has linear-order complexity O(N). These parts run serially, so the overall complexity is O(T + N). Next, computational complexity is analyzed in terms of simulation time. A total of 170 decision steps are taken in the contrast pursuit. The DNN decision method takes 10.45 s in total, with an average single-step decision time of 0.0615 s; the robust decision method takes 25.43 s in total, averaging 0.149 s per step; the Bayesian decision method takes 9.68 s in total, averaging 0.057 s per step; and the method proposed in this study takes 7.68 s in total, averaging 0.045 s per step, which meets the real-time requirement. The average single-step decision times are shown in Figure 14.
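As a quick sanity check on these figures, the per-step averages follow directly from the reported totals, and the moving time strategy's N = 729 candidate schemes come from dividing each of the three control variables into 9 gradients (9³ = 729). A minimal sketch, assuming only what the text states (the use of `itertools.product` and the variable names are illustrative, not the paper's implementation):

```python
from itertools import product

# Candidate control schemes: 3 control variables, 9 gradients each.
# (Variable names and enumeration order are illustrative assumptions.)
GRADIENTS = range(9)
schemes = list(product(GRADIENTS, GRADIENTS, GRADIENTS))
print(len(schemes))  # 729 candidate plans evaluated per decision step, O(N)

# Average single-step decision time from the reported 170-step totals.
totals_s = {"DNN": 10.45, "Robust": 25.43, "Bayesian": 9.68, "Dynamic": 7.68}
avg_s = {name: total / 170 for name, total in totals_s.items()}
print(round(avg_s["Dynamic"], 4))  # 0.0452 s per step (reported as 0.045 s)
```

The averages reproduce the reported values to rounding precision, and the linear scan over the 729 schemes is the O(N) term in the overall O(T + N) complexity.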

7.2. Adaptive Confrontation without Prediction
In the contrast pursuit, the Bayesian decision method performs best among the baselines in both accuracy and real-time performance. Therefore, the enemy aircraft adopts the Bayesian decision-making method, while the UCAV adopts the decision-making method proposed in this study, and the two sides conduct an adaptive confrontation. The simulation terminates when the angle situation and the distance situation reach their threshold values. First, in the mutually safe situation, that is, when both situations are about 0.5, a set of confrontation simulations is conducted and analyzed. Then, under four working conditions, namely, advantage (UCAV situation high, enemy situation low), mutual safety (both about 0.5), mutual disadvantage (both low), and disadvantage (UCAV situation low, enemy situation high), multiple sets of simulation experiments are carried out to verify the accuracy of the proposed decision-making method.
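The four working conditions can be read as a simple partition of the two overall situation values. A hedged sketch, where the 0.4/0.6 thresholds are illustrative assumptions (the paper only says "about 0.5" for the mutually safe case):

```python
def working_condition(ucav_sit: float, enemy_sit: float,
                      lo: float = 0.4, hi: float = 0.6) -> str:
    """Classify an engagement into the four working conditions.

    Situations lie in [0, 1]; lo/hi are assumed thresholds, not the
    paper's exact values.
    """
    if ucav_sit >= hi and enemy_sit <= lo:
        return "advantage"
    if ucav_sit <= lo and enemy_sit >= hi:
        return "disadvantage"
    if ucav_sit <= lo and enemy_sit <= lo:
        return "mutual disadvantage"
    return "mutual safe"

print(working_condition(0.8, 0.3))  # advantage
print(working_condition(0.5, 0.5))  # mutual safe
```

Each group of 20 simulation runs below is initialized inside one of these four regions.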
The initial state of the adaptive countermeasure simulation in the mutual safe situation is given in Table 5.
The simulation result is shown in Figure 15. In the initial stage, the two aircraft approach each other in a mutually safe situation. The Bayesian weights control the UCAV to make a nose-up dive maneuver and obtain a more favourable angle advantage; its overall situation rises and presents more advantages. After the two aircraft meet, the dynamic weight quickly adjusts to control the UCAV to make a large-radius barrel roll maneuver, gaining a height advantage. In the second stage, the two aircraft fight at close range; the dynamic weight is quickly adjusted according to the current situation, and the angle advantage rises rapidly. Through the outer somersault, the dynamically weighted UCAV stays around the Bayesian-controlled aircraft and attempts to complete a tail attack. In the last stage, the turn radius is relatively large; because the Bayesian adjustment is not sufficiently timely, the large somersault maneuver turns into a tail-chase situation, and the aircraft controlled by the dynamic weight finally wins the fight. Without predicting the enemy aircraft's trajectory, the maneuvering decision-making takes 135 s.

As Figure 16(a) shows, the overall situation of the UCAV controlled by the dynamic weight is mutually safe in the initial stage and generally trends upward in the later period. The UCAV controlled by the Bayesian weight tries to escape its disadvantaged position, causing the situation to fluctuate; however, because the dynamic weight fits the actual situation better, the Bayesian-controlled UCAV cannot escape in time. Figure 16(b) describes the changes in the angle situation, the most important quantity in close air combat. In the initial stage, the dynamically weighted UCAV flies at a faster initial speed, overshoots into a tail-chased position, and its angle situation drops sharply. As potential energy and kinetic energy are converted into each other, the angle situation trends upward and finally meets the angle termination condition. Figure 16(c) shows the change in the distance situation: while maneuvering, the dynamically weighted UCAV always tries to keep the enemy aircraft within the best effective attack distance, so the distance situation fluctuates continually and eventually meets the distance termination condition. Figure 16(d) shows the change in the energy situation. In close air combat, the UCAV exchanges energy by adjusting speed and altitude; the figure shows that energy changes are quite frequent during the confrontation, which is consistent with the actual confrontation process.

The adaptive countermeasure experiment was carried out under the four working conditions, and each group was run 20 times; the results are given in Table 6. In the advantage case, the dynamic weight decision method wins 100% of the runs with a stable decision step length. In the mutually safe and mutually disadvantageous cases, its winning rate is not less than 60%, and in the disadvantageous case it is 40%. This shows that the method has good decision accuracy under the different working conditions.
7.3. Adaptive Confrontation with Prediction
On the basis of the nonpredictive adaptive countermeasures, enemy trajectory prediction is added to the dynamically weighted UCAV, and the result is compared with the nonpredictive case to verify that UCAV maneuvering decision-making is better under accurate trajectory prediction. The mutually safe simulation result is shown in Figure 17.
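The online sliding input that feeds the Ada-LSTM predictor can be sketched as a windowing step over the observed enemy track before it reaches the network. A minimal illustration, where the window length of 10 and the synthetic track are assumptions rather than the paper's settings:

```python
import numpy as np

def sliding_windows(track: np.ndarray, window: int = 10):
    """Split an observed trajectory of shape (T, 3) into LSTM inputs:
    X has shape (T - window, window, 3); y has shape (T - window, 3),
    where y[i] is the position immediately following window X[i]."""
    X = np.stack([track[i:i + window] for i in range(len(track) - window)])
    y = track[window:]
    return X, y

rng = np.random.default_rng(0)
track = np.cumsum(rng.standard_normal((50, 3)), axis=0)  # synthetic path
X, y = sliding_windows(track)
print(X.shape, y.shape)  # (40, 10, 3) (40, 3)
```

In the online setting, the newest observation is appended and the oldest dropped each decision step, so the predictor always sees the most recent window of the enemy trajectory.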

In the initial stage, because the dynamically weighted UCAV flies faster, it overshoots, so its overall situation is low; however, after adding trajectory prediction of the enemy aircraft, the duration of this initial disadvantage is significantly reduced. Without trajectory prediction, the dynamically weighted UCAV takes about 60 s to change from a unilateral disadvantage to a mutual disadvantage; with the predicted trajectory, it takes only about 40 s. In the second stage, the close-fighting time of the two aircraft is also significantly reduced, from a double-circle maneuver to a single-circle maneuver. In the last stage, after accumulating advantages in the first two stages, the dynamic weight controls the UCAV to quickly approach the enemy aircraft and use a dive to form a tail chase, meeting the termination conditions. With enemy trajectory prediction, the maneuvering decision-making takes 115 s, saving about 20 s of confrontation time compared with the nonpredictive case.
Comparing Figure 18(a) with Figure 16(a), it can be clearly seen that the disadvantage time in the initial stage is shorter and the overall situation rises more rapidly. In the second stage, the overall situation continues to rise more steadily, and compared with Figure 16(a), the enemy's situation declines more rapidly. In the last stage, Figure 18(a) reaches the termination condition in a shorter time. Figure 18(b) shows that the angle advantage decreases rapidly in the initial stage, but compared with Figure 16(b), it rises earlier and fluctuates less. In the second and third stages, the angle advantage rises faster in Figure 18(b), so the angle termination condition is met more quickly. Comparing Figure 18(c) with Figure 16(c), in the last stage the predictive maneuver decision increases the UCAV's distance advantage and reduces that of the enemy aircraft. The difference between Figures 18(d) and 16(d) is not very significant; both always present a state of alternating fluctuations.

With the prediction module added, the adaptive confrontation experiments were conducted under the same four working conditions, and each group was run 20 times; the results are given in Table 7. Under all four working conditions, after adding the prediction module, the decision step length is reduced and the winning rate is improved compared with the nonpredictive case. The results prove that the prediction module significantly improves the accuracy of the decision system.
8. Conclusion
In this study, the UCAV three-degree-of-freedom model is established, and the angle, distance, and energy functions constitute the air combat situation system. Based on the priority principle of angle, distance, and energy in descending order, the correlation between each subsituation and the total situation is analyzed, the fitting function is constructed hierarchically, and the dynamic relational weight algorithm is established. The moving time strategy is adopted to optimize the control variables, forming 729 decision-making schemes. The AdaBoost-LSTM network predicts the enemy aircraft trajectory in real time, and trajectory prediction is combined with maneuvering decision-making. From the tracking, comparative confrontation, and adaptive confrontation simulation experiments, the following conclusions are drawn: (1) the decision method combining the dynamic relational weight algorithm and the moving time strategy is more accurate than the existing decision methods; (2) the decision-making method proposed in this study meets real-time requirements; (3) compared with traditional deep learning networks, the AdaBoost-LSTM network has higher accuracy in maneuvering trajectory prediction, and its run time can meet the requirements of maneuvering decision-making by adjusting the number of internal weak predictors; (4) UCAV maneuver decision-making combined with accurate prediction performs better.
Future research will perfect the trajectory prediction, improve the prediction accuracy, shorten the time lost to prediction, and further refine the decision-making theory.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.