Abstract
Wireless sensor networks (WSNs) can effectively solve the problems of weak coverage, blind coverage, and low survivability of smart substation communication networks by deploying multiple relay nodes and adopting multihop transmission. However, traditional relay selection strategies for WSNs in substations still face challenges, including incomplete information and selection conflicts among multiple source nodes. In this paper, we propose a matching learning-based relay selection mechanism for WSN-based substation power Internet of things (SPIoT). Firstly, considering the electromagnetic interference caused by the operation of high-voltage equipment, a multihop transmission model of SPIoT is built. Furthermore, based on the upper confidence bound (UCB) algorithm and matching theory, a matching learning-based relay selection (MLRS) algorithm is proposed to minimize the energy consumption of SPIoT devices. Simulation results demonstrate that MLRS outperforms existing algorithms in terms of energy consumption and optimal selection probability.
1. Introduction
Substations are of great significance for ensuring long-distance power transmission and stable power supply [1]. To achieve 24-hour uninterrupted monitoring of a substation, a large number of substation power Internet of things (SPIoT) devices are deployed to collect various types of information including temperature, humidity, and smoke [2, 3]. However, traditional fiber-optical communication cannot meet the on-demand coverage and data transmission requirements of SPIoT due to poor scalability and high construction cost [4, 5]. Wireless sensor networks (WSNs) have the advantages of flexible networking and low deployment cost. A WSN adopts multihop transmission to enhance coverage and increase fault tolerance for on-demand monitoring of substations [6, 7]. It is complementary to fiber-optical communication and acts as an effective enabler for SPIoT.
In multihop transmission of WSN-enabled SPIoT, the source node selects relays for data transmission to shorten the transmission distance and enhance coverage under strong electromagnetic interference [8, 9]. To fully utilize the spectrum and energy resources, relay selection needs to be optimized dynamically according to the network state and service requirements. However, relay selection optimization in SPIoT still faces several critical challenges as below [10].
First, the relay selection strategies are coupled across different devices when multiple devices compete for the same relay. Each device faces an adversarial relay selection problem in which its strategy is affected by the strategies of other devices. Second, the global state information (GSI), including channel state information (CSI) and electromagnetic interference [11], is uncertain due to the limitations of network sensing capability and signaling overhead. The devices are forced to optimize relay selection under incomplete state information. Finally, SPIoT devices impose strict requirements on energy consumption due to limited battery capacity. It is necessary to take the optimization of energy consumption into consideration, thereby improving the sustainability of SPIoT networks.
Several works have addressed relay selection problems in IoT. In [12], Muller and Speidel proposed optimal relay selection strategies aiming at either maximizing the mean mutual information or minimizing the outage probability. In [13], Mousavi et al. proposed a relay subset selection method for two-hop WSN. However, these works ignore the optimization of energy consumption, which makes them infeasible for SPIoT devices with limited battery capacity. In [14], an analysis model for relay selection under the constraint of energy consumption was developed. In [15], Bakhsh et al. proposed a low-energy distributed relay selection algorithm to achieve reliable communication. However, the above studies have neglected the decision coupling among multiple devices.
Matching theory provides a method to solve combinatorial problems and is widely used in relay selection optimization. In [16, 17], Wang et al. and Baidas et al. proposed relay selection methods based on matching theory, but the establishment of the matching preference list requires complete GSI. In substations with a dynamic network environment and complex electromagnetic interference, the preference list cannot be constructed without complete GSI, which makes the traditional matching theory-based relay selection approaches unsuitable.
Reinforcement learning (RL) provides a powerful tool to deal with multistage decision-making problems under incomplete information [18, 19]. In [20], Su et al. proposed a deep RL-based relay selection scheme to achieve lower outage probability and higher utility. In [21], Liang et al. modeled the resource allocation problem in vehicular networks as a semi-Markov decision process and utilized DL algorithms to solve the problem. However, when dealing with problems with high-dimensional spaces, RL suffers from the curse of dimensionality and performs poorly in terms of optimality and convergence [22, 23]. Moreover, the relay selection strategies obtained by RL are unstable, and the influence of electromagnetic interference is ignored.
Motivated by the aforementioned challenges, we propose a matching learning-based relay selection (MLRS) algorithm to minimize the energy consumption of SPIoT devices. First, considering the electromagnetic interference of substations, a two-hop transmission model for SPIoT with electromagnetic interference is established. Second, we transform the relay selection problem into a one-to-one matching problem between multiple source nodes and multiple relay nodes. Based on the upper confidence bound (UCB) algorithm, the SPIoT gateway learns to build the preference lists of source nodes based on the number of selections and empirical performances. Third, the matching conflicts between source nodes and relay nodes are resolved based on matching with rising price. Finally, we compare MLRS with existing relay selection algorithms to validate its performance. The major contributions are presented as follows:
(i) Learning-based matching preference construction without precise GSI: MLRS utilizes UCB to learn to construct preference lists based on historical observations of relay node selection times and empirical energy consumption performances. MLRS enables the implementation of iterative matching without precise GSI and achieves an effective compromise between exploration and exploitation.
(ii) Stable matching between source and relay nodes: MLRS utilizes matching theory to avoid selection conflicts between multiple source nodes and relay nodes, which achieves a stable matching based on the learned preference lists. In MLRS, the conflicts among multiple nodes are resolved by iteratively raising matching prices, and each source node is allocated the most suitable relay node, i.e., the one that ranks highest in its updated preference list.
(iii) Low energy consumption: MLRS can dynamically learn the relay selection preference, i.e., the historical energy consumption, thus effectively reducing transmission energy consumption based on the optimal relay selection strategy.
Extensive simulations are carried out to demonstrate the low energy consumption performances of MLRS compared with existing relay selection algorithms.
The remainder of the work is organized as follows. Section 2 introduces the SPIoT model. Section 3 presents MLRS. Section 4 presents simulation results. Section 5 concludes this article.
2. System Model
A relay selection model of the SPIoT network considering electromagnetic interference in a complex substation environment is shown in Figure 1. The SPIoT network consists of two parts, i.e., SPIoT devices and a gateway. The gateway provides decision-making and computing services for SPIoT devices. The SPIoT network is a many-to-one two-hop transmission network that includes multiple source nodes, multiple relay nodes, and one destination node. Each source node collects various types of information including temperature, humidity, smoke, and gas and transmits the collected data to a relay node. The relay node receives the collected data from the source node and forwards the data to the destination node. Specifically, decode-and-forward relay nodes are adopted to avoid the amplification of noise power. The destination node is the substation interval measurement and control cabinet, which serves as the receiving end of monitoring data. The total optimization period is divided into time slots. At the beginning of each slot, each source node has a certain amount of data to transmit to the destination node, and the slot ends when all the data have arrived at the destination node. Considering the impact of the dynamic CSI and electromagnetic interference on the transmission delay, the slot durations are unequal. In each slot, the gateway learns the optimal relay selection strategy for each source node based on the historical performance and sends the relay selection strategy to the source nodes. The transmission delay and energy consumption models are introduced as follows.

2.1. Transmission Delay Model
In each slot, a source node selects a relay node to transmit its data to the destination node based on the relay selection strategy from the gateway. A binary selection indicator equals 1 if the source node selects the relay node in that slot and 0 otherwise. The transmission process is divided into two hops. The first hop is from the source node to the relay node, and the second hop is from the relay node to the destination node.
The signal-to-noise ratio of the first-hop transmission from the source node to the relay node is given by [24, 25] as the transmission power of the source node multiplied by the channel gain between the source node and the relay node, divided by the sum of the noise power and the electromagnetic interference power during the first-hop transmission. According to Shannon’s formula [26], the first-hop transmission rate is the bandwidth multiplied by the base-2 logarithm of one plus this signal-to-noise ratio. Therefore, the first-hop transmission delay from the source node to the relay node is the amount of data divided by the first-hop transmission rate.
In the same slot, the signal-to-noise ratio of the second-hop transmission from the relay node to the destination node is defined analogously, using the transmission power of the relay node, the channel gain between the relay node and the destination node, and the electromagnetic interference power during the second-hop transmission. The transmission rate from the relay node to the destination node in the second hop follows from Shannon’s formula in the same way.
Therefore, the second-hop transmission delay from the relay node to the destination node is the amount of data divided by the second-hop transmission rate.
Therefore, the total transmission delay from the source node to the destination node through the selected relay node in a given slot, i.e., the length of that slot, is the sum of the first-hop transmission delay and the second-hop transmission delay.
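The two-hop delay model above can be sketched in Python as follows. This is only an illustrative sketch: the function and parameter names (`hop_rate`, `two_hop_delay`, `emi_w`, and so on) are hypothetical and do not come from the paper, and the electromagnetic interference is simply added to the noise power in the denominator, as the definitions above suggest.

```python
import math

def hop_rate(bandwidth_hz, tx_power_w, channel_gain, noise_w, emi_w):
    """Shannon rate (bit/s) of one hop, with EMI power added to the noise power."""
    snr = tx_power_w * channel_gain / (noise_w + emi_w)
    return bandwidth_hz * math.log2(1.0 + snr)

def two_hop_delay(data_bits, bandwidth_hz, noise_w, first_hop, second_hop):
    """Return (first-hop delay, second-hop delay, total delay) in seconds.

    Each hop is a tuple (tx_power_w, channel_gain, emi_w); this layout is an
    assumption made for the sketch.
    """
    delays = []
    for tx_power_w, channel_gain, emi_w in (first_hop, second_hop):
        rate = hop_rate(bandwidth_hz, tx_power_w, channel_gain, noise_w, emi_w)
        delays.append(data_bits / rate)  # delay = data amount / rate
    return delays[0], delays[1], delays[0] + delays[1]
```

Because the total delay defines the slot length, a relay with a weak second-hop channel or strong interference lengthens the slot as well as increasing the energy cost.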
2.2. Energy Consumption Model
The energy consumption of the two-hop data transmission includes transmission energy consumption and reception energy consumption. The transmission energy consumption includes data transmission energy consumption and transmitting circuit energy consumption. The reception energy consumption is receiving circuit energy consumption. The first-hop data transmission energy consumption from the source node to the relay node is modeled following [27].
The second-hop data transmission energy consumption from the relay node to the destination node is modeled in the same way.
The transmission circuit energy consumption of the source node is determined by the energy consumption coefficient of the transmission circuit.
The transmission circuit and reception circuit energy consumption of the relay node is determined by the energy consumption coefficient of the transmission circuit and reception circuit.
The reception circuit energy consumption of the destination node is determined by the energy consumption coefficient of the reception circuit.
The total energy consumption is the sum of transmission energy consumption and reception energy consumption. It contains five terms: the first and second terms represent the energy consumption of the first-hop transmission from the source node to the relay node, the third term represents the forwarding energy consumption of the relay node, and the fourth and fifth terms represent the energy consumption of the second-hop transmission from the relay node to the destination node.
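As a rough illustration of the five-term total, the sketch below adds up the components in the order described above. Two modeling choices are assumptions of this sketch rather than statements of the paper: data transmission energy is taken as transmission power multiplied by hop delay, and each circuit energy is taken as a per-bit coefficient multiplied by the amount of data. All names are hypothetical.

```python
def two_hop_energy(data_bits, src_power_w, relay_power_w,
                   delay1_s, delay2_s, c_tx, c_relay, c_rx):
    """Total two-hop energy (J) as a sum of five terms.

    Assumptions of this sketch: transmission energy = power * delay, and
    circuit energy = coefficient (J/bit) * data amount.
    """
    e_tx_first = src_power_w * delay1_s     # term 1: first-hop data transmission
    e_src_circuit = c_tx * data_bits        # term 2: source transmission circuit
    e_relay_circuit = c_relay * data_bits   # term 3: relay forwarding circuits
    e_tx_second = relay_power_w * delay2_s  # term 4: second-hop data transmission
    e_dst_circuit = c_rx * data_bits        # term 5: destination reception circuit
    return (e_tx_first + e_src_circuit + e_relay_circuit
            + e_tx_second + e_dst_circuit)
```

Under these assumptions, only the first and fourth terms depend on the selected relay through the hop delays, so relay selection affects energy mainly through the channel and interference conditions.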
2.3. Problem Formulation
We address the relay selection problem for SPIoT under dynamic CSI and electromagnetic interference. The relay selection problem is formulated as minimizing the energy consumption of SPIoT devices subject to three constraints. The first constraint defines the value of the selection indicator variable, and the second and third constraints ensure that there is no selection conflict in each time slot. In other words, a source node selects at most one relay node for data forwarding in each time slot, and a relay node is selected by at most one source node.
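For concreteness, one way to write such a problem is sketched below. The notation is assumed for illustration only and is not taken from the paper: $x_{m,n}(t)$ is the selection indicator of source node $m$ and relay node $n$ in slot $t$, $E_{m,n}(t)$ is the corresponding two-hop energy consumption, and $M$, $N$, and $T$ are the numbers of source nodes, relay nodes, and time slots.

```latex
\begin{aligned}
\min_{\{x_{m,n}(t)\}} \quad & \sum_{t=1}^{T} \sum_{m=1}^{M} \sum_{n=1}^{N} x_{m,n}(t)\, E_{m,n}(t) \\
\text{s.t.} \quad & x_{m,n}(t) \in \{0, 1\}, \quad \forall m, n, t, \\
& \sum_{n=1}^{N} x_{m,n}(t) \le 1, \quad \forall m, t, \\
& \sum_{m=1}^{M} x_{m,n}(t) \le 1, \quad \forall n, t.
\end{aligned}
```

The second and third constraints together express the one-to-one structure that the matching formulation in Section 3 relies on.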
3. Matching Learning-Based Relay Selection for SPIoT
In this section, the implementation process of the proposed MLRS algorithm for two-hop data transmission in SPIoT is elaborated.
3.1. Problem Transformation
The formulated problem is intractable because the relay selection strategies of all source nodes are coupled. Based on matching theory [28, 29], the selection problem between multiple source nodes and multiple relay nodes can be transformed into a one-to-one matching problem between source nodes and relay nodes. To solve the problem of incomplete global information in the preference list construction of matching theory, MLRS utilizes UCB to enable the implementation of iterative matching without precise GSI and utilizes matching theory to avoid selection conflicts, which achieves a stable matching based on the learned preference lists. Since the relay node is unconditionally selected by the source node for data forwarding, the unilateral matching theory is adopted [30]. The definition of matching is given below.
Definition 1. (matching). A matching is a one-to-one correspondence of the set of source nodes and relay nodes to itself. If the matching maps a source node to a relay node, the two nodes are matched. Otherwise, the source node is not matched with any relay node.
However, the traditional matching theory needs to construct a preference list for each source node based on global information, which is not suitable for real-world SPIoT scenarios due to the dynamic changes of electromagnetic interference and the uncertain global information. Reinforcement learning can solve decision-making problems with incomplete information through continuous interaction with the environment, thereby enabling a high-precision and low-complexity preference list construction. It has the advantages of fast convergence and strong adaptability.
3.2. The Proposed MLRS Algorithm
Based on the multiarmed bandit (MAB) framework in reinforcement learning [31, 32], the source nodes and the relay nodes are modeled as players and arms, respectively. In each slot, a source node selects a relay node to transmit data, and the performance of the selected relay node can only be observed afterwards. The UCB algorithm is an effective method to solve the MAB problem. It is employed by MLRS to construct the preference list. MLRS is mainly divided into two steps: UCB-based preference list construction and iterative matching with rising price.
3.2.1. UCB-Based Preference List Construction
For each source node and relay node, the gateway maintains two quantities. The first is the historical energy consumption up to the current slot, i.e., the empirical estimate of the two-hop transmission energy consumption. The second is the total number of times that the source node has selected the relay node up to the current slot, i.e., the sum of the corresponding selection indicators over the elapsed slots.
Based on the UCB algorithm and the optimization objective, the matching preference value of a source node towards a relay node is constructed to minimize the total energy consumption of the two-hop transmission. The first term represents the empirical performance of the relay node up to the current slot: the matching preference value decreases as the historical energy consumption increases. The second term is the confidence bound, which represents the estimation uncertainty and is scaled by a weight for exploration. It decreases as the number of selections increases, indicating that the estimated performance gradually approaches the actual expected value. The preference value also accounts for the matching cost (price) of the relay node, whose initial value is set to zero.
The preference list of each source node toward all the relay nodes is constructed by sorting its matching preference values in descending order.
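The preference construction described above can be sketched as follows. The exact form of the preference value is an assumption of this sketch: negative empirical energy, plus a standard UCB bonus `weight * sqrt(ln(slot) / count)`, minus the relay price. Treating never-selected relays as infinitely attractive and updating the estimate as a running mean are likewise assumptions, and all names are hypothetical.

```python
import math

def preference_value(avg_energy, count, slot, weight, price):
    """Assumed UCB-style preference of one source node towards one relay node."""
    if count == 0:
        return math.inf  # assumption: untried relays are explored first
    bonus = weight * math.sqrt(math.log(slot) / count)  # confidence bound
    return -avg_energy + bonus - price

def preference_list(avg_energy, counts, slot, weight, prices):
    """Relay indices sorted by descending preference value."""
    values = [preference_value(e, k, slot, weight, p)
              for e, k, p in zip(avg_energy, counts, prices)]
    return sorted(range(len(values)), key=lambda n: values[n], reverse=True)

def update_estimate(avg_energy, count, observed_energy):
    """Running-mean update after observing one transmission."""
    new_count = count + 1
    new_avg = avg_energy + (observed_energy - avg_energy) / new_count
    return new_avg, new_count
```

A larger `weight` makes rarely selected relays rank higher (more exploration), while a larger price pushes a contested relay down the list.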
3.2.2. Iterative Matching with Rising Price
The SPIoT gateway makes matching decisions based on the following steps. The implementation procedure is provided in Algorithm 1, which includes three steps.
Step 1: initialization. Initialize the algorithm variables, including the conflict set, i.e., the set of relay nodes that have received more than one matching request.
Step 2: iterative matching. Each source node makes a request to its most preferred relay node according to its preference list. A relay node requested by a single source node is matched with that source node. A relay node requested by more than one source node is added to the conflict set. While the conflict set is not empty, each relay node in it raises its price by the rising step, and each source node that requested it updates its preference value based on the new price and renews its request. The price rising process is repeated until the relay node is no longer requested by more than one source node, and the relay node is then removed from the conflict set. A source node that remains unmatched suspends its data transmission and waits for the next iteration. The iteration ends when the conflict set is empty. Finally, the gateway sends the matching result to the source nodes, and each source node transmits data through its selected relay node.
Step 3: learning. Observe the transmission performance and obtain the energy consumption. Update the number of selections and the empirical energy consumption accordingly.
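The conflict resolution idea can be illustrated with the simplified sketch below. It is not Algorithm 1 itself: it processes source nodes one at a time instead of in parallel, assumes finite preference values and at least as many relay nodes as source nodes, and keeps the current holder when both competitors give up at the same price. The function name and arguments are hypothetical.

```python
def match_with_rising_price(pref, step=0.05, max_raises=100_000):
    """Sequential one-to-one matching; contested relays raise their price.

    pref[m][n] is the preference of source m towards relay n at zero price.
    Returns (matching, prices) with matching[m] = relay assigned to source m.
    """
    n_src, n_rel = len(pref), len(pref[0])
    if n_src > n_rel:
        raise ValueError("this sketch assumes at least as many relays as sources")
    prices = [0.0] * n_rel
    holder = [None] * n_rel        # holder[n] = source currently matched to relay n
    waiting = list(range(n_src))   # sources that still need a relay
    raises = 0

    def best_relay(m):
        return max(range(n_rel), key=lambda k: pref[m][k] - prices[k])

    while waiting:
        m = waiting.pop()
        n = best_relay(m)
        if holder[n] is None:      # no conflict: match directly
            holder[n] = m
            continue
        h = holder[n]              # conflict between newcomer m and holder h
        while best_relay(m) == n and best_relay(h) == n:
            prices[n] += step      # raise the price until one side gives up
            raises += 1
            if raises > max_raises:
                raise RuntimeError("price rising did not settle")
        if best_relay(m) == n:     # the holder gave up, the newcomer takes over
            holder[n] = m
            waiting.append(h)
        else:                      # the newcomer gave up (ties keep the holder)
            waiting.append(m)
    matching = {m: n for n, m in enumerate(holder) if m is not None}
    return matching, prices
```

For example, with `pref = [[1.0, 0.5, 0.1], [0.9, 0.83, 0.1]]` both source nodes initially want relay 0; its price rises until source 1 prefers relay 1, so source 0 keeps relay 0 and source 1 moves to relay 1.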
3.3. Performance Analysis
Theorem 1 (stability). For any , there is no situation, where prefers than and is stable.
The detailed proof is in [33, 34].
Theorem 2 (convergence). Due to the stability derived in Theorem 1, MLRS is convergent.
Proof. Based on proof by contradiction, we assume MLRS is not convergent. Hence, there exist preferring when and . Due to the higher preference value of towards , should be requested by before is requested by . However, is matched with , i.e., refused , which is in contradiction with the assumption. Therefore, MLRS is convergent.
Complexity of MLRS: the computational complexity of MLRS consists of three parts. In the first step, the computational complexity is . In the second step, we assume that MLRS takes iterations to resolve matching conflicts. When the number of source nodes is greater than or equal to the number of relay nodes, i.e., , the computational complexity is , while the computational complexity of the enumeration method is . In the third step, the computational complexity is . Therefore, the overall computational complexity of MLRS is .
4. Simulation Results
In this section, we validate the performance of MLRS by simulations. We consider a scenario with 10 source nodes, 15 relay nodes, and 1 destination node. The number of time slots is 500. The transmission power and are set as W. The channel gains from the source node to the relay node and from the relay node to the destination node all satisfy the normal distribution [35], where is the distance. The simulation parameters are summarized in Table 1 [36–38]. We consider two state-of-the-art algorithms for comparison. The first one is the traditional UCB algorithm [39], where each source node calculates its preference value for each relay node based on historical performance and sends a data transmission request to the favorite relay node. Considering the case of matching conflicts, the source nodes with conflicts’ preferences towards a relay node will be randomly matched with the remaining unmatched relay nodes. The second one is the energy efficiency performance selection (EEPS) algorithm [40], where each source node sends a transmission request to the relay node with the best historical performance, and the rest process is the same as the traditional UCB algorithm.
The α-stable distribution is employed to describe the electromagnetic interference. The characteristic function of the electromagnetic interference power is parameterized by the characteristic exponent α, the skew parameter β (β = 0 indicates a symmetric α-stable distribution), the location parameter, and the scale parameter [41].
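To illustrate how such interference samples could be generated, the sketch below draws symmetric α-stable variates (β = 0) with the Chambers-Mallows-Stuck method using only the Python standard library. The function names and the way location and scale are applied are assumptions of this sketch; the paper does not state how its samples were generated, nor how a sample is mapped to an interference power.

```python
import math
import random

def sas_transform(alpha, v, w):
    """Chambers-Mallows-Stuck map for a standard symmetric alpha-stable variate.

    v is uniform on (-pi/2, pi/2) and w is exponential with unit mean.
    """
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    return (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
            * (math.cos(v - alpha * v) / w) ** ((1.0 - alpha) / alpha))

def sample_sas(alpha, scale=1.0, location=0.0, rng=random):
    """Draw one symmetric alpha-stable sample, shifted and scaled."""
    v = rng.uniform(-math.pi / 2.0, math.pi / 2.0)
    w = rng.expovariate(1.0)
    return location + scale * sas_transform(alpha, v, w)
```

Two sanity checks follow from the formula: for α = 1 the transform reduces to tan(v), i.e., a Cauchy variate, and for α = 2 it reduces to 2 sin(v) √w, which is Gaussian. Smaller α gives heavier tails, i.e., more impulsive interference.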
Figure 2 shows the average energy consumption versus time slot. When , compared with UCB and EEPS, the average energy consumption of MLRS is reduced by 17.49% and 24.22%, respectively. The reason is that the random selection mechanism in UCB and EEPS leads to unmatched source nodes and suspended data transmission when multiple source nodes compete for the same relay node. Moreover, UCB and EEPS randomly assign unmatched source nodes to relay nodes. As a result, these source nodes may be matched with some relay nodes with poor historical performance, leading to higher energy consumption. MLRS can effectively resolve matching conflicts between multisource nodes and multirelay nodes based on dynamically learned relay selection preference.

Figure 3 shows the optimal relay node selection probability versus time slot. In the initial stage, the optimal relay node selection probability is low for all the algorithms. The probability increases gradually with the number of selections and finally converges to 68%, 54%, and 34% for MLRS, UCB, and EEPS, respectively. MLRS always outperforms UCB and EEPS. The reason is that MLRS adopts a tendency exploration scheme instead of random selection, which can achieve an effective compromise between exploration and exploitation.

Figure 4 shows the optimal relay node selection probability versus the number of relay nodes. The optimal relay node selection probability of MLRS converges around 67% and outperforms UCB and EEPS by 18.08% and 18.20%, respectively. When the number of relay nodes increases, i.e., the network topology becomes more complex, MLRS can better adapt to the topology and learn the most appropriate relay selection strategy by combining learning-based construction of matching preference lists with matching-based conflict resolution. In contrast, as the number of relay nodes increases, the problem of matching conflict becomes prominent in UCB and EEPS, which leads to a sharp performance decrease.

Figure 5 shows the average energy consumption versus the weight for exploration. As the weight increases from small values, the performance of MLRS improves until it reaches an optimum. A weight that is too small biases the preference towards exploitation and prevents the exploration of potentially better options, while a weight that is too large prevents the exploitation of the existing optimal option.

Figure 6 shows the average energy consumption versus the electromagnetic interference intensity. We divide the electromagnetic interference into five levels by changing the location parameter of the α-stable distribution, which are summarized in Table 2. At every electromagnetic interference intensity level, MLRS has the lowest average energy consumption compared with UCB and EEPS. The reason is that MLRS can learn the optimal relay selection strategy and resolve the matching conflict regardless of the electromagnetic interference level, which verifies that MLRS adapts to various wireless environments.

5. Conclusion
In this paper, a novel relay selection algorithm named MLRS was proposed for SPIoT. MLRS can minimize the energy consumption of SPIoT devices under complex electromagnetic interference through relay selection optimization without GSI. Simulation results indicate that, compared with UCB and EEPS, MLRS reduces the energy consumption by 17.49% and 24.22%, respectively. In the future, we will focus on the joint optimization of multiple quality of service (QoS) performance metrics including delay and throughput, considering the differentiated QoS requirements of SPIoT.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partially supported by the State Key Laboratory of Alternate Electrical Power System with Renewable Energy Sources, under Grant LAPS202125, and supported by the Open Research Fund of National Mobile Communications Research Laboratory, Southeast University, under Grant no. 2021D12.