Abstract
The reliable operation of distribution terminals is a key link in realizing distribution automation, so it is particularly important to evaluate the operation state of the distribution terminal efficiently and accurately. To realize accurate state perception of distribution terminals, a state evaluation method based on deep reinforcement learning is proposed to support the reliable operation of the distribution network. First, the fault causes of terminal equipment and the collected datasets are introduced. On this basis, a multilayer network structure is used to analyze the terminal state: a Q-reinforcement learning network is used to optimize the convolutional neural network, solve the overfitting problem of the deep network model, and continuously extract data features. At the same time, to increase the objectivity and reliability of the evaluation method, membership function optimization is also introduced into the model to further ensure the accuracy of the state analysis method. Simulation results show that the recognition accuracy of the proposed method is 94.23%, which demonstrates excellent evaluation performance.
1. Introduction
With the accelerating pace of distribution network construction, the coverage of distribution terminals is also increasing year by year [1–3]. The distribution terminal is used for monitoring and controlling switching stations, pole-mounted section switches, ring main units, distribution transformers, line voltage regulators, and reactive power compensation capacitors in the medium-voltage distribution network. Its operational stability and reliability directly affect the stability and reliability of the whole distribution network [4].
However, the vast majority of distribution terminals are installed outdoors or in simple shelters, where they are easily affected by external force damage, and various unexplained abnormalities often occur [5]. In addition, because distribution terminals are numerous and widely distributed, the workload of routine manual inspection is huge. In particular, pole-mounted distribution terminals and communication equipment are installed at height, making on-site inspection and maintenance extremely inconvenient. All of these factors pose great challenges to the state detection and fault analysis of distribution terminals [6]. Therefore, it is urgent to propose an efficient and accurate state evaluation method for distribution terminals.
The traditional state identification method for distribution terminals adopts postfault maintenance or regular maintenance [7]. Postfault maintenance means repairing the equipment only after it fails and can no longer operate, which directly reduces the operational reliability of the distribution network. Regular maintenance, which disassembles and overhauls the equipment at fixed time intervals regardless of its actual condition, inevitably produces "excess maintenance" [8].
The emergence of deep learning networks provides a new solution for online detection and evaluation of power equipment status. A deep network can perform continuous feature extraction and model training on the data collected by the distribution terminal, build a reliable sample database of distribution terminal states, and then realize accurate state evaluation. However, because of their deep structures, current multilayer networks tend to overfit large data sample sets, and the state analysis model struggles to remain equally stable over the whole operating cycle, which leaves most current network models with low state recognition accuracy.
To support reliable terminal state assessment, a terminal state analysis method based on deep reinforcement learning is proposed. The innovation of this method consists of two parts:
(1) Considering the time-series characteristics of power grid state data, this study uses a convolutional neural network (CNN) to construct the terminal state evaluation network and introduces a Q-reinforcement learning network to solve the overfitting problem of the multilayer network, which strengthens the network's perception and decision-making ability, achieves high-precision identification of different states, and improves the accuracy and stability of the evaluation network model.
(2) Considering the complexity of distribution terminal data, this study also introduces the membership function to optimize the model, which increases the objectivity and reliability of the terminal state evaluation method and further ensures the accuracy of the output state evaluation results.
2. Related Research
The state evaluation of the distribution terminal can timely and correctly diagnose various abnormal states or alarm information of the terminal, so as to prevent or eliminate faults and ensure the safe operation of the distribution network [9].
In the era of the rapidly developing distribution Internet of Things, the original working mode of distribution operation and maintenance can hardly cope with the pressure brought by large-scale access of distribution terminals [10, 11]. The number of operation and maintenance personnel is limited, their technical levels are uneven, and the construction scale and number of distribution terminals are large. Many different factors cause distribution terminal defects, making operation and maintenance work difficult and complex. At present, operation and maintenance personnel only record and classify a large number of terminal defect data and conduct after-the-fact manual analysis and defect elimination according to their operation and maintenance experience. Such passive operation and maintenance has low efficiency [12].
To keep the system stable after large-scale access of distribution terminals, it is urgent to mine the required laws from the large volume of terminal data generated by distribution network operation, effectively analyze the causes behind the defects, provide ideas for terminal state evaluation, and improve the efficiency and capability of distribution terminal operation and maintenance.
Data mining technology has penetrated many fields of the power system and promoted the development of the distribution network toward intelligence [13, 14]. At present, fuzzy-theory modeling is usually used for equipment state evaluation to handle the fuzziness and uncertainty of the characteristic quantities themselves, in order to accurately measure the influence of each state quantity on the equipment state [15, 16]. Although mathematical modeling can, to a certain extent, support the distribution network in self-detecting the terminal state, the distribution terminal data are too large and the fault causes too complex [17–19], which makes establishing a state model particularly difficult and cannot well support timely perception and identification of terminal equipment states.
Fault identification and analysis methods based on multilayer network models gradually break through the limitations of traditional methods. Through continuous training and learning on sample data, a complete and reliable equipment state database is established, and effective identification of distribution terminal operation states is finally realized [20–22]. It should be noted that there is little research on distribution terminal state perception based on deep learning, but many scholars have effectively perceived the operation states of power or mechanical equipment with the help of the powerful learning ability of multilayer networks. Reference [23] continuously trains and learns on sample data obtained from a wind turbine SCADA system through a convolutional neural network to realize health monitoring and state recognition of the turbine; reference [24] realizes end-to-end abnormal state detection of steam turbine equipment based on a deep convolutional neural network; reference [25] proposes a new defect diagnosis method combining a least-squares support-vector machine and a Bayesian network decision tree to realize state recognition of secondary equipment such as remote terminal units and merging units; and reference [26] uses a stacked denoising autoencoder model to directly extract feature information from microgrid power equipment and introduces a clustering algorithm to determine the state of the electrical equipment. However, for deep networks, although a deeper structure can indeed improve the efficiency of the state identification model, overfitting can reduce identification accuracy, which leaves the stability of the distribution terminal state analysis network model open to question.
In this study, a Q-reinforcement learning network model is used to optimize the CNN while ensuring the recognition accuracy of the state analysis model. By continuously learning from data across successive time steps, the CNN overfitting problem can be solved, which well supports the operation state evaluation and analysis of distribution terminal equipment.
3. Multidimensional Evaluation Method of Distribution Terminal State Based on Deep Reinforcement Learning
3.1. State Perception of Distribution Terminals and Analysis of Influencing Factors
Complete and reliable state data acquisition is the premise for realizing distribution terminal state evaluation. In this study, with the help of the key circuits inside heterogeneous distribution terminal equipment, combined with hardware detection technology and software verification methods, internal operation state monitoring is realized, and the inspection results are sent through the communication channel, so that the key internal operation parameters and self-inspection results of the distribution terminal become "visible."
The main acquisition contents of the power distribution terminal are shown in Figure 1.

As shown in Figure 1, the power distribution terminal mainly comprises the control circuit, board, and program operation status. Monitoring these key components makes it possible to find out in time whether the terminal program, hardware, and power supply are normal, so as to determine whether the collected information is usable and to quickly locate terminal faults.
At the same time, it should be noted that the defects and fault causes of distribution terminals are complex and diverse. This study summarizes and analyzes the key factors causing terminal defects based on the fault types and causes recorded by an actual power supply company, as shown in Table 1.
In this study, distribution terminal faults are divided into internal causes and external causes. Most internal causes are characteristics of the equipment itself, while the external causes are characteristics of environmental impact.
As shown in Figure 2, this study first divides the collected data sample set into a training dataset and a test dataset. Then, based on the training set, the deep Q-network fault diagnosis model is built and trained until the corresponding iteration conditions are met. Finally, the terminal state of the distribution network is accurately evaluated and analyzed based on the deep Q-network.

3.2. Distribution Terminal State Feature Extraction
Unlike primary equipment with dedicated state indicators, the distribution terminal is an electronic device with a complex failure mechanism. The characteristic quantities deeply related to various faults are often hidden in the hardware. However, the terminal's own state sensing means are currently lacking, so the characteristic quantities reflecting the terminal hardware state cannot be obtained directly as with primary equipment.
In view of the above situation, this study selects three operation indexes as the state characteristic quantities, namely, the abnormal report frequency $f_1$, the contradiction report frequency $f_2$, and the terminal offline frequency $f_3$.
An abnormal report is a record describing an abnormality of the terminal body in an alarm event uploaded by the terminal, such as an encryption verification failure or a battery activation abnormality. The abnormal report frequency directly reflects the overall operation status of the terminal, and its calculation method is shown in equation (1):

$$f_1 = \frac{N_1}{T}, \tag{1}$$

where $T$ represents the statistical duration and $N_1$ indicates the number of abnormal reports within the statistical duration.
Some information uploaded by the terminal needs to conform to certain logic; for example, after the terminal power supply line trips on a fault, an AC power loss report must follow. If too many reports that do not conform to the logical relationship appear within the statistical duration $T$, it indirectly indicates that the terminal is trending toward failure. Therefore, the contradiction report frequency is introduced as follows:

$$f_2 = \frac{N_2}{T}, \tag{2}$$

where $N_2$ is the number of contradictory reports within the statistical duration.
A considerable portion of terminal faults develops gradually from short-term disconnection to long-term offline status. Therefore, the terminal offline frequency is also significant for reflecting the terminal state, and its calculation method is shown in equation (3):

$$f_3 = \frac{N_3}{T}, \tag{3}$$

where $N_3$ represents the number of terminal offline events within the statistical duration $T$.
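As a concrete illustration, the three indexes in equations (1) to (3) can be computed directly from terminal event logs. The following minimal Python sketch assumes a hypothetical log format with fields `is_abnormal`, `is_contradictory`, and `is_offline`; these field names and the record structure are assumptions for illustration, not part of the original method.

```python
from dataclasses import dataclass

@dataclass
class TerminalEvent:
    timestamp: float        # seconds since the start of the statistical window
    is_abnormal: bool       # abnormal self-report (e.g., encryption check failure)
    is_contradictory: bool  # report violating the expected logical sequence
    is_offline: bool        # terminal disconnection event

def state_features(events, duration_hours):
    """Compute f1 (abnormal), f2 (contradiction), f3 (offline) report
    frequencies over a statistical duration T, as in equations (1)-(3)."""
    n1 = sum(e.is_abnormal for e in events)
    n2 = sum(e.is_contradictory for e in events)
    n3 = sum(e.is_offline for e in events)
    T = duration_hours
    return n1 / T, n2 / T, n3 / T

# Example: a synthetic 24 h event log
events = [TerminalEvent(3600 * i, i % 8 == 0, i == 5, i % 10 == 0) for i in range(24)]
print(state_features(events, duration_hours=24.0))
```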
3.3. State Evaluation of Distribution Terminal Based on Deep Reinforcement Learning
Deep reinforcement learning is a trial and error algorithm, which makes the network have both perception ability and decision-making ability. This study realizes the state evaluation of distribution terminals based on a deep reinforcement learning network.
3.3.1. Deep Reinforcement Learning
Research shows that the convolutional neural network has strong feature learning and expression ability, good robustness, and fast calculation speed. The sample dataset can be fed into the CNN model for training, and the trained network can then distinguish the feature information of different distribution terminals, so as to achieve high-precision recognition of different states. Therefore, the CNN is adopted as the basic structure of the state evaluation network.
The CNN imitates the cognitive mechanism of biological vision.
The rectangular convolution kernel of the CNN is convolved with the local receptive field of the input signal, and the same kernel weights are used to scan the input data, so that the parameters are shared. The mathematical model of the convolution operation is as follows:

$$y = x \ast w + b, \tag{4}$$

where $x$ is the current input feature vector, $y$ represents the feature vector after the convolution calculation, $\ast$ is the convolution operator, $w$ is the weight of the convolution kernel, and $b$ is the bias.
After the convolution operation is completed, a nonlinear transformation is applied through the activation function to improve the expression ability of the model:

$$a = f(y), \tag{5}$$

where $f(\cdot)$ denotes the activation function and $a$ is the activated feature vector.
The pooling operation can effectively preserve feature information. In this study, the maximum pooling method is used, and its mathematical expression is as follows:

$$p = \max_{j \in R}\left(a_j\right), \tag{6}$$

where $p$ represents the pooled output, $R$ is the local pooling region, $a_j$ is the activation value at position $j$ within the region, and $\max(\cdot)$ takes the maximum value.
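The three operations in equations (4) to (6) can be sketched in PyTorch as follows. The tensor sizes are arbitrary, and ReLU is chosen here only as an illustrative activation function, since the text does not specify which activation is used.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 64)           # one single-channel time-series sample, x in eq. (4)
w = torch.randn(4, 1, 3)            # 4 convolution kernels of width 3, shared weights w
b = torch.zeros(4)                  # bias b

y = F.conv1d(x, w, b, stride=1)     # convolution, eq. (4): y = x * w + b
a = F.relu(y)                       # nonlinear activation, eq. (5) (ReLU assumed)
p = F.max_pool1d(a, kernel_size=2)  # maximum pooling, eq. (6): keep local maxima
print(y.shape, a.shape, p.shape)
```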
At the same time, the data collected by the distribution terminal have time-series characteristics. To enable the CNN to better extract the characteristic information of the sample dataset, a Markov decision process is used to reinforce the network.
The Markov decision process is defined by the four-tuple $(S, A, P, R)$, in which $S$ is the state space and $s \in S$ is a state in the state space; $A$ is the action space and $a \in A$ is an action in the action space; $P$ is the probability that the current state $s$ transitions to the next state $s'$ after the agent executes action $a$; and $R$ is the reward obtained when the current state $s$ transitions to the next state $s'$ after the agent executes action $a$.
In the fault identification task of this study, the state $s$ is the sample data uploaded by the terminal; the action $a$ is the category of the operation state of the distribution terminal; and whether the model recognition result is consistent with the terminal state type is the criterion for awarding the reward $R$. When the sample type is consistent with the recognition result, $R$ is taken as +1; otherwise, $R$ is taken as -1. As for the state transition probability $P$, although there is no correlation between successive states, the training data samples are randomly shuffled and evenly distributed across categories during network operation, so the state transition probability becomes $P = 1/n$, where $n$ is the number of fault categories.
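This MDP formulation can be illustrated with a minimal classification environment. Everything below (the class `TerminalStateEnv` and its array names) is a hypothetical sketch, not code from the paper: the state is a terminal data sample, the action is a predicted state category, the reward is +1 for a correct prediction and -1 otherwise, and the next state is drawn uniformly from the shuffled sample set, so that with balanced categories the transition probability per category is 1/n.

```python
import numpy as np

class TerminalStateEnv:
    """Hypothetical environment for the fault-identification MDP (S, A, P, R)."""
    def __init__(self, samples, labels, seed=0):
        self.samples = samples          # state space S: terminal data samples
        self.labels = labels            # ground-truth state categories
        self.rng = np.random.default_rng(seed)
        self.idx = None

    def reset(self):
        self.idx = self.rng.integers(len(self.samples))  # shuffled, uniform next state
        return self.samples[self.idx]                    # state s

    def step(self, action):
        reward = 1.0 if action == self.labels[self.idx] else -1.0  # R = +1 / -1
        next_state = self.reset()                                  # transition, P = 1/n per category
        return next_state, reward
```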
Because traditional deep neural networks suffer from local minima and overfitting, the trained model has limited accuracy and poor stability. To solve this problem, a Q-learning strategy is added to the deep neural network to enhance recognition stability.
The fault diagnosis process of the deep Q-network is shown in Figure 2. The model realizes intelligent fault diagnosis through interaction between the environment and the agent. First, the environment feeds the initial data to the agent. After CNN fitting, the state-action value function $Q(s, a)$ outputs the action $a$, which is compared with the fault type of the data in the environment; the agent is then given the reward $R$ so that the model achieves the expected effect.
3.3.2. State Evaluation of Distribution Terminals
Through the above analysis, we define the sample dataset collected by the distribution terminal as shown in Table 2. At the same time, this study takes it as the input data of the evaluation model of deep reinforcement learning.
Then, the proposed evaluation model is introduced for learning and training. The learning process of the proposed deep reinforcement learning model is as follows (a minimal code sketch of these steps is given after the list):
(1) Initialize the weight parameter $\theta$ of the fitted Q-value function $Q(s, a; \theta)$;
(2) Repeat for experience trajectories 1 to $M$: initialize the state $s$ (sample data of the distribution terminal);
(3) Observe whether the output action $a$ is consistent with the sample label corresponding to the fault data in the environment. If it is consistent, $R = +1$; otherwise, $R = -1$;
(4) Input the state $s'$ at the next time step and use $\max_{a'} Q(s', a'; \theta)$ to calculate the target Q-value $y$;
(5) Use the gradient descent algorithm to update the network parameters $\theta$.
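Under the assumptions introduced for the environment sketch above, the five learning steps can be written as the following PyTorch training loop. The network `q_net`, the discount factor, the optimizer, and the greedy action choice (exploration omitted) are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn

def train_dqn(q_net, env, episodes=3000, steps=512, gamma=0.9, lr=1e-3):
    """Minimal deep Q-learning loop for terminal state classification (a sketch)."""
    opt = torch.optim.Adam(q_net.parameters(), lr=lr)          # step (1): theta initialized inside q_net
    for _ in range(episodes):                                  # step (2): experience trajectories 1..M
        s = torch.as_tensor(env.reset(), dtype=torch.float32)
        for _ in range(steps):
            q = q_net(s.unsqueeze(0))                          # Q(s, a; theta) for all actions
            a = int(q.argmax(dim=1))                           # greedy action choice
            s_next, r = env.step(a)                            # step (3): R = +1 or -1
            s_next = torch.as_tensor(s_next, dtype=torch.float32)
            with torch.no_grad():                              # step (4): target Q-value y
                y = r + gamma * q_net(s_next.unsqueeze(0)).max()
            loss = nn.functional.mse_loss(q[0, a], y)          # step (5): gradient descent on theta
            opt.zero_grad()
            loss.backward()
            opt.step()
            s = s_next
```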
Figure 3 shows the selected membership function. Considering the complexity of distribution terminal data, the membership function combining a triangle with a semi-trapezoid is relatively simple and accurate, so it is used to calculate the membership degree in this study.

The calculation formulas of the membership functions combining the triangle with the semi-trapezoid are shown in equations (7) to (10), where $x$ represents the terminal deterioration evaluation value obtained from the evaluation indexes and $\mu_i(x)$ is the membership function of each state grade, which determines the state evaluation result of the distribution terminal.
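Because the breakpoints of equations (7) to (10) are not reproduced here, the following sketch only illustrates the general form: triangular membership functions for the intermediate grades and semi-trapezoidal ones at the two ends. The breakpoint values and the four grade names are arbitrary assumptions for illustration.

```python
def tri(x, a, b, c):
    """Triangular membership: rises on [a, b], falls on [b, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def semi_trap_left(x, a, b):
    """Left semi-trapezoid: full membership below a, falling to 0 at b."""
    if x <= a:
        return 1.0
    return max(0.0, (b - x) / (b - a))

def semi_trap_right(x, a, b):
    """Right semi-trapezoid: 0 below a, full membership above b."""
    if x >= b:
        return 1.0
    return max(0.0, (x - a) / (b - a))

# Illustrative deterioration value x and arbitrary breakpoints for four state grades
x = 0.45
memberships = {
    "normal":    semi_trap_left(x, 0.2, 0.4),
    "attention": tri(x, 0.2, 0.4, 0.6),
    "abnormal":  tri(x, 0.4, 0.6, 0.8),
    "severe":    semi_trap_right(x, 0.6, 0.8),
}
print(max(memberships, key=memberships.get), memberships)
```

The grade with the largest membership degree is taken as the evaluation result in this sketch.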
4. Experiment Analysis
To present the experimental simulation analysis most effectively, the experiments are carried out on a machine running Ubuntu 16.04, and the analysis network is built with the open-source PyTorch deep learning framework. The specific experimental platform settings are shown in Table 3.
4.1. Experimental Dataset and Evaluation Index
Using the distribution terminal state analysis method proposed in this study, we take as an example the labeled sample dataset of distribution terminal operation collected by a power grid Electric Power Research Institute (15,000 samples in total, covering 13 terminal-related state quantities). The sample dataset is divided into a training dataset and a test dataset at a ratio of 4 : 1.
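A minimal sketch of the 4 : 1 split, assuming the samples and labels are NumPy arrays; the function name and random seed are illustrative.

```python
import numpy as np

def split_4_to_1(samples, labels, seed=42):
    """Shuffle the sample set and split it 4:1 into training and test subsets."""
    idx = np.random.default_rng(seed).permutation(len(samples))
    cut = int(0.8 * len(samples))                    # 4/5 of the data for training
    train_idx, test_idx = idx[:cut], idx[cut:]
    return (samples[train_idx], labels[train_idx]), (samples[test_idx], labels[test_idx])
```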
To quantify the evaluation efficiency, the root mean square error (RMSE), which indicates the overall reliability of the prediction, is used as the general index of result quality. Its calculation formula is shown in equation (11):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}, \tag{11}$$

where $\hat{y}_i$ is the evaluated value, $y_i$ is the true value, and $n$ is the number of samples.
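A one-function sketch of equation (11); the argument names are illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between evaluated and true values, as in equation (11)."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

print(rmse([1.0, 2.0, 3.0], [1.5, 1.5, 3.5]))  # 0.5
```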
4.2. Training Process Analysis of the Proposed Model
In the proposed network model, the convolution kernel size of the first convolution layer is 128 × 1 with a stride of 8; the larger convolution kernel effectively extracts large-scale features of the time-domain signal. The second and third convolution layers, whose kernel size is 8 × 1, extract the deeper features of the signal. The network output at this position is then pooled, and the pooling layer provides local invariance. By applying maximum pooling to the convolution output features, the network reduces the amount of data while preserving the essence of the signal. The pooling kernel size is 5 × 1 with a stride of 4.
The fully connected layers then combine these features and enhance the network's ability to learn them; they contain 256 and 32 neurons, respectively.
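The layer sizes described above can be assembled into a PyTorch module as sketched below. The number of input channels, the number of filters per convolution layer, the strides of the second and third convolution layers, the activation function, and the number of output state classes are not fully specified in the text, so the values chosen here (1 input channel, 16/32/32 filters, stride 1, ReLU, 4 classes) are assumptions.

```python
import torch
import torch.nn as nn

class TerminalStateCNN(nn.Module):
    """Sketch of the described 1-D CNN backbone for terminal state evaluation."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=128, stride=8),  # first layer: 128x1 kernel, stride 8
            nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=8),             # second layer: 8x1 kernel
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=8),             # third layer: 8x1 kernel
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=5, stride=4),        # max pooling: 5x1 kernel, stride 4
        )
        self.classifier = nn.Sequential(
            nn.LazyLinear(256), nn.ReLU(),                # fully connected layer, 256 neurons
            nn.Linear(256, 32), nn.ReLU(),                # fully connected layer, 32 neurons
            nn.Linear(32, n_classes),                     # output class scores / Q-values
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, start_dim=1))

# Example: a batch of 2 signals with 2048 time steps each
print(TerminalStateCNN()(torch.randn(2, 1, 2048)).shape)  # -> torch.Size([2, 4])
```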
To demonstrate the capability of the improved prediction model, we first compare the traditional CNN with the improved network. The prediction results are given in Tables 4 and 5.
As shown in Tables 4 and 5, under the same experimental background and network parameters, the performance of the proposed deep reinforcement learning network model is significantly improved. When the network parameters are set to a learning rate of 0.150 and a training parameter of 20,000, the improved CNN prediction model achieves its best accuracy, with an RMSE of 232.99.
In each training round, the proposed deep reinforcement learning model executes 512 steps, and a total of 3,000 rounds are trained. The loss values are averaged and plotted, as shown in Figure 4.

Figure 4 shows that, owing to the added Q-learning strategy, the low accuracy and poor stability of the trained model are overcome and the evaluation model gains good antinoise performance. As a result, the loss value of the proposed terminal state analysis model remains at a relatively low level after each round of training, indicating that the model maintains high learning efficiency in every round. The feasibility of this method for distribution terminal state identification and analysis is therefore verified.
4.3. Performance Analysis of the Evaluation Model
To verify that the distribution terminal state analysis method proposed in this study offers the best state identification performance, the methods of references [23] and [26] are used for comparison in simulation experiments under the same operating scenario.
Figure 5 shows the analysis results of test datasets under different methods.


As shown in Figure 5, the state evaluation performance of the method proposed in this study is better than that of the comparison methods. The recognition accuracy of the proposed method is 94.23%, which is 1.1% and 0.9% higher than that of references [23] and [26], respectively. The RMSE of the proposed method is 249.3, which is 7.96 lower than that of reference [26] and 12.66 lower than that of reference [23].
The reason is that reference [23] ignores the time-series characteristics of the distribution terminal equipment data and only roughly extracts and learns the state data with a deep network, while reference [26] ignores the poor stability of the deep network model and cannot achieve stable and efficient state evaluation and analysis throughout the whole cycle.
Thanks to the Q-reinforcement learning network, the proposed method enhances the perception and decision-making ability of the network, gives the state evaluation model better antinoise ability, and effectively improves the state analysis performance of the model. In addition, the membership function increases the objectivity and reliability of the terminal state evaluation method, so the proposed distribution terminal state analysis model has better recognition and evaluation ability.
5. Conclusions
Accurately monitoring and evaluating the operation status of distribution terminals is a basic means of realizing the safe and reliable operation of the distribution network. Therefore, this study proposes a distribution terminal state evaluation method using a deep reinforcement learning model. The proposed terminal state evaluation model combines a Q-reinforcement learning network with a convolutional neural network, which gives the model strong antinoise ability and effectively extracts the characteristic information of the sample dataset. At the same time, triangular and semi-trapezoidal membership functions are introduced to support the reliability of the evaluation. The simulation experiments are based on an actual sample dataset from distribution terminals. The experiments show that the recognition accuracy and root mean square error of the proposed method are 94.23% and 232.99, respectively, so the method can accurately and stably evaluate the operation status of distribution terminals.
Although the proposed method can accurately identify distribution station area data, its parameters are fixed values that are difficult to adjust automatically according to the characteristics of the data. The next research step is to introduce a parameter-adaptive algorithm into the model to enhance its parameter optimization ability and further improve the evaluation efficiency.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.