Abstract
Reliable data transfer is more challenging in underwater sensor networks (UWSNs) than in terrestrial wireless sensor networks because of the peculiar attributes of UWSN communication. Therefore, K-means and ant colony optimization-based routing (KACO) is proposed in this paper. In KACO, the underwater network area is divided into layers according to depth, and the nodes of each layer are divided into clusters by an optimized K-means algorithm. To address the shortcoming of random initialization in K-means clustering, an improved procedure is used to select the initial cluster centers. In the cluster head selection stage, the residual energy of a node and its distance from the sink node are used to calculate the competition factor of the node, and the cluster heads are then selected according to the competition factors. For intercluster routing, ant colony optimization (ACO) is improved by introducing the Gini coefficient, and an intercluster routing scheme based on the improved ACO is proposed. Simulation results show that the proposed KACO routing can effectively reduce the energy consumption of nodes and improve the efficiency of packet transmission.
1. Introduction
With the development of modern science and technology, human beings have become more aware of the value of using and developing the ocean, whose rich resource reserves and research value have prompted continuous exploration of the underwater space [1]. A group of sensor nodes interconnected through acoustic channels forms an underwater sensor network (UWSN) [2]. UWSNs have been attracting attention from the scientific and industrial communities [3]. Underwater sensor nodes equipped with wireless communication capabilities make real-time underwater monitoring possible.
In UWSNs, sensor nodes are deployed in rivers or seas to detect characteristics of the water environment, such as temperature, pressure, and water quality [4]. They then forward their data to a surface sink [5]. After receiving the data from all nodes, the sink node preprocesses the data and forwards them to an offshore data center.
Figure 1 shows the typical structure of a UWSN, which consists of underwater sensor nodes, sink nodes, and an offshore data center. There may be a single sink node or multiple sink nodes, depending on the application.

Considering the characteristics of the underwater communication environment [6], underwater sensor nodes are equipped with acoustic modems to communicate with each other wirelessly and to transmit data toward the sink node. Each sink node, in contrast, has both an acoustic modem and a radio modem. The acoustic modem is used to receive data from the underwater sensor nodes, and the radio modem is used to transmit data to the offshore data center [7].
Efficient data collection is difficult in UWSNs. Routing protocols designed for terrestrial wireless sensor networks (WSNs) cannot be applied directly to UWSNs because the underwater transmission medium differs from that of terrestrial WSNs. In addition, acoustic communication itself has significant limitations, for example, high propagation delay, low data rates, and absorption losses. Moreover, sensor nodes are battery-powered, and their batteries are hard to replace. Once the energy of a node is exhausted, the node fails, which creates coverage holes and greatly degrades network performance.
These challenges motivate researchers to design reliable and energy-efficient UWSN routing protocols. Clustering protocols have been widely used in terrestrial WSNs. The basic idea of a clustering protocol is to divide the sensor nodes into clusters. Each node in a cluster has a chance to act as the cluster head (CH). The CH takes charge of collecting data from the other nodes in its cluster. Introducing CHs avoids long-distance communication and thus saves considerable energy.
Therefore, K-means and ant colony optimization-based routing (KACO) is proposed in this paper. In KACO, the K-means algorithm is used to form clusters. Because the traditional K-means algorithm generates the initial cluster centers at random, node degree-based selection of the initial cluster centers is introduced. In each cluster, the nodes compete for the cluster head role in a distributed way based on their residual energy and their distance from the sink node.
In addition, the improved ACO is used to construct the intercluster routing, and an energy Gini coefficient is applied in the transition probability in order to balance the energy consumption of clusters.
The rest of this paper is organized as follows: Section 2 presents related works. Section 3 describes the background. The proposed routing is described in Section 4. Section 5 deals with the simulation setup and discussion of the proposed routing. Finally, we conclude the paper in Section 6.
2. Related Works
Reference [8] discussed the application of clustering routing in UWSNs and confirmed that clustering routing is also suitable for the underwater environment. For example, reference [9] proposed layers and unequal clusters-based energy-efficient clustering routing (LUER). In LUER, routing decisions are made based on both link quality and the residual energy of nodes. Unfortunately, the location of nodes is not taken into account in the routing process.
Besides clustering routing, researchers have also proposed other routing protocols. For example, reference [10] proposed the depth-based routing (DBR) protocol. In DBR, routing decisions are made based on the depth of nodes. However, DBR does not exploit the advantages of clustering, such as more efficient data transmission and lower energy consumption of nodes.
Reference [11] proposed an energy-efficient multilevel adaptive clustering routing algorithm (ACUN). The algorithm adopts a multilevel hierarchical network structure to determine the size of the competition radius. The node with larger residual energy is selected as the CH, which avoids the early death of CHs. However, ACUN does not discuss intercluster routing, although the energy consumed by a CH in the routing phase is not negligible. In addition, reference [12] proposed a sparsity-aware and energy-efficient clustering algorithm in which a power control mechanism is used to improve energy efficiency. The algorithm balances energy consumption through the power control mechanism rather than through the routing mechanism.
Reference [13] proposed an efficient metaheuristic-based clustering with routing protocol. The goal of the protocol is to elect an efficient set of CHs and to route data to the destination. The protocol forms clusters with a clustering technique based on a cultural emperor penguin optimizer. However, this clustering technique is complex, and the uniformity of the cluster distribution is not guaranteed.
3. Background
3.1. Network Model
$N$ underwater sensor nodes (hereafter, nodes) are deployed randomly in a three-dimensional network field. These nodes form a set $S = \{s_1, s_2, \ldots, s_N\}$. They are equipped with acoustic modems in order to communicate with other nodes in the underwater environment.
Without loss of generality, we assume that our network scenario is similar to the networks in [14–17]. The assumptions about the network are as follows: (1) each node knows its location upon first deployment [17]; (2) each node acquires its current depth with the help of a pressure sensor; and (3) all nodes have the same limited initial energy, except the sink node, which has an unlimited power supply.
A sink node at the surface has both acoustic and radio modems. The sink node first collects data from the underwater nodes through the acoustic link and then transmits the collected data to the offshore data center through the radio link, as shown in Figure 2.

To improve the efficiency of data transmission, the deployment depth is divided into layers. The number of layers is given by $N_L = \lceil D / R \rceil$, where $D$ is the depth of the network deployment, $R$ is the transmission range of a node, and $\lceil \cdot \rceil$ is the ceiling function.
In the initialization phase, the sink node loads the layering information into a hello message and broadcasts it through the whole network. On receiving the hello message, a node extracts this information and computes its own layer as $L_i = \lceil d_i / R \rceil$, where $d_i$ is the depth of node $s_i$ and $L_i$ is the layer number of node $s_i$.
The sink node is located in the first layer because it is at the surface. The nodes in each layer are divided into clusters, and each cluster consists of a CH and cluster members, as shown in Figure 2.
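As a minimal sketch of the layering rule described above, the following Python snippet computes the number of layers and the layer index of a node from its depth. The function names, the clamping of surface nodes to layer 1, and the example values (a 1000 m deep field and a 200 m transmission range) are illustrative assumptions, not values taken from the simulation.

```python
import math


def num_layers(deploy_depth_m: float, tx_range_m: float) -> int:
    """Number of depth layers: ceiling of deployment depth over transmission range."""
    return math.ceil(deploy_depth_m / tx_range_m)


def node_layer(node_depth_m: float, tx_range_m: float) -> int:
    """Layer index of a node, counted from 1 at the surface.

    Assumption: a node at depth 0 is placed in layer 1, like the sink node.
    """
    return max(1, math.ceil(node_depth_m / tx_range_m))


# Example with assumed values: 1000 m deep field, 200 m transmission range.
print(num_layers(1000.0, 200.0))   # 5
print(node_layer(201.0, 200.0))    # 2
```

With these definitions a node exactly at the layer boundary (for example, 200 m) still belongs to the upper layer, which matches the ceiling-based rule.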
3.2. Energy Consumption Model
The energy consumption model used in terrestrial WSNs cannot be applied directly to UWSNs because the characteristics of acoustic waves in the underwater environment differ from those of radio waves. The energy consumption model used here is similar to the models in [18, 19].
Energy consumption of node is denoted as when it transmits bits of data over distance where ““represents multiplication operation, is the energy consumption to transmit a bit data, and is data rate. is the power of transmitting data, which is defined as where is the minimum power required at the receiver and is the distance between transmitter and receiver. is the absorption coefficient, which is defined as where is frequency.
Energy consumption of node is denoted as when it has received bits of data, which is defined as follows: where is energy consumption parameter of received device, which is a constant dependent on the device.
The energy consumed by a CH or sink node when it fuses $l$ bits of data is denoted as $E_{DA}(l)$ and is defined as $E_{DA}(l) = l \cdot E_f$, where $E_f$ is the energy consumed by fusing a single bit of data.
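The absorption coefficient in underwater acoustic energy models is commonly computed with Thorp's empirical formula. The sketch below assumes that form (frequency in kHz, result in dB/km); it is given only for orientation and may differ in detail from the expression used in [18, 19]. The function name is hypothetical.

```python
def thorp_absorption_db_per_km(f_khz: float) -> float:
    """Thorp's empirical absorption coefficient (assumed form), in dB/km.

    f_khz is the acoustic carrier frequency in kHz.
    """
    f2 = f_khz ** 2
    return (
        0.11 * f2 / (1.0 + f2)
        + 44.0 * f2 / (4100.0 + f2)
        + 2.75e-4 * f2
        + 0.003
    )


# Absorption grows with frequency, which is one reason long acoustic
# links are costly in energy.
for f in (5.0, 10.0, 20.0):
    print(f, round(thorp_absorption_db_per_km(f), 4))
```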
4. KACO Routing
KACO routing is mainly composed of four stages: initialization phase, cluster formation phase, CH selection phase, and intercluster routing establishment phase.
4.1. Initialization Phase
Initially, the sink node broadcasts a hello message that includes the serial number of the message and the location of the sink node, given as its position vector, as shown in Figure 3.

On receiving the hello message from the sink node, a sensor node checks whether this is the first time it has received the message. If so, the sensor node extracts the position vector of the sink node from the message and stores it. Subsequently, the node loads its own position vector, ID, and layer level into the hello message. Otherwise, the hello message is discarded. Figure 3 summarizes the hello message structures used.
Afterward, the node transmits the hello message to its neighbor nodes. A node discards the hello message if it is not receiving the message for the first time.
On receiving a hello message forwarded by another sensor node, sensor node $s_i$ extracts the sender's information from it and constructs its one-hop neighbor node set, which records the position vector, ID, and layer level of each neighbor. The same-layer neighbor node set of node $s_i$ is denoted as $NB_i$. In other words, every node in $NB_i$ lies in the same layer as node $s_i$.
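The construction of the same-layer neighbor set can be sketched as follows. This is a simplified, centralized illustration rather than the distributed message exchange itself: the `HelloInfo` record, the function name, and the range check by Euclidean distance are assumptions introduced for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HelloInfo:
    """Fields a node learns about a sender from a forwarded hello message."""
    node_id: int
    position: tuple  # (x, y, z) in metres
    layer: int


def same_layer_neighbors(me: HelloInfo, heard: list, tx_range: float) -> set:
    """IDs of one-hop neighbours that sit in the same layer as `me`."""
    return {
        other.node_id
        for other in heard
        if other.node_id != me.node_id
        and other.layer == me.layer
        and math.dist(me.position, other.position) <= tx_range
    }


me = HelloInfo(1, (0.0, 0.0, 50.0), 1)
heard = [
    HelloInfo(2, (100.0, 0.0, 60.0), 1),   # in range, same layer
    HelloInfo(3, (0.0, 150.0, 250.0), 2),  # different layer
    HelloInfo(4, (400.0, 0.0, 50.0), 1),   # same layer, out of range
]
print(same_layer_neighbors(me, heard, tx_range=200.0))  # {2}
```

The size of this set is the node degree used later when the initial cluster centers are chosen.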
4.2. Clustering Process
The clustering process is the method used to divide nodes in the same layer into groups such that energy consumption is more balanced.
Number of clusters is vital key in clustering process, which should be assigned prior to the clustering process. Therefore, the number of nodes in each layer is used to compute the accurate optimal number of clusters. Let represents the node set in the layer, where . The optimal number of clusters is given by : where is the number of sensor nodes in .
4.2.1. Node Degree-Based K-Means Clustering Algorithm
The basic principle of the K-means algorithm [20] is to divide a set of objects into a given number of groups. It is iterative: the iteration continues until the square error criterion value reaches its minimum or the maximum number of iterations is reached.
Because of its simplicity and efficiency, the K-means algorithm is used to divide the nodes in each layer into clusters. The specific implementation process is shown in Algorithm 1.
The inputs of Algorithm 1 are (1) the node set of the $k$th layer, denoted as $S_k$, and (2) the optimal number of clusters in the $k$th layer, denoted as $K_k$. Initially, $K_k$ random centroids are selected from $S_k$ and placed into the initial centroid set, denoted as $C^0$.
Each remaining node computes its distance to these centroids and joins the centroid at minimum distance, which forms the cluster sets $C_1, C_2, \ldots, C_{K_k}$.
Afterward, the centroid position of each cluster is recalculated as
$$\mathbf{c}_j = \frac{1}{|C_j|} \sum_{s_i \in C_j} \mathbf{p}_i ,$$
where $\mathbf{p}_i$ is the position of node $s_i$ in $C_j$ and $|C_j|$ is the number of nodes in $C_j$.
The square error criterion function is defined as
$$J = \sum_{j=1}^{K_k} \sum_{s_i \in C_j} \lVert \mathbf{p}_i - \mathbf{c}_j \rVert^2 .$$
The iteration stops when the square error criterion value reaches its minimum, as judged against the threshold value $\varepsilon$, or when the maximum number of iterations is reached. If neither condition is met, Step 2 of Algorithm 1 is repeated.
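The iteration described above can be sketched in plain Python as follows. This is a generic K-means loop under stated assumptions, not a transcription of Algorithm 1: the function name and parameters are hypothetical, and convergence is tested by comparing the change in the square error between two successive iterations with a threshold `eps`.

```python
import math


def kmeans(points, init_centroids, eps=1e-6, max_iter=100):
    """Cluster 3D points; returns (centroids, labels, square_error)."""
    centroids = [tuple(c) for c in init_centroids]
    labels, sse, prev_sse = [], math.inf, math.inf
    for _ in range(max_iter):
        # Assignment step: each node joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean position of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        # Square error criterion: sum of squared node-to-centroid distances.
        sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        if abs(prev_sse - sse) < eps:  # assumed convergence test
            break
        prev_sse = sse
    return centroids, labels, sse


points = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (11, 0, 0)]
centroids, labels, sse = kmeans(points, [(0, 0, 0), (10, 0, 0)])
print(labels, centroids, sse)
```

Passing the initial centroids in explicitly makes it easy to compare random initialization with a degree-based one.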
Although the K-means algorithm is simple and efficient, it has a problem: the initial centroid set is selected at random. The selected points may be isolated or remote points, or they may lie too close to one another, so that the resulting clusters are distributed very unevenly. The clustering result can then be poor, and the algorithm may take longer to converge.
Therefore, Algorithm 1 is improved: in Step 1 of Algorithm 1, the initial centroid set is selected according to node degree rather than at random. The specific process is as follows.
First, each node in layer $k$ computes its number of neighbor nodes, which is called its node degree. Then, the node set $S_k$ of layer $k$ is sorted by node degree in descending order. Let $S'_k$ denote the sorted set. The node with the largest node degree, that is, the first node in $S'_k$, is selected as the first point of the initial centroid set.
For simplicity, let $u$ denote the first node in $S'_k$. The one-hop, same-layer neighbor nodes of $u$ are removed from $S'_k$, namely, $S'_k = S'_k \setminus NB_u$, where $NB_u$ is the one-hop, same-layer neighbor node set of $u$.
Then, the first node in the updated $S'_k$ is added to the initial centroid set $C^0$. The above process is repeated until $C^0$ contains as many nodes as the optimal number of clusters, $K_k$. Algorithm 2 shows the process of constructing the initial centroid set $C^0$.
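The degree-based selection can be sketched as below. The sketch is an interpretation, not a transcription of Algorithm 2: the function name is hypothetical, ties in node degree are broken by node ID, and a selected node is removed from the candidate list together with its neighbors so that it is not picked twice. Both of the latter choices are assumptions.

```python
def degree_based_centroids(neighbors: dict, k: int) -> list:
    """Pick up to k initial cluster centers by descending node degree.

    neighbors maps a node ID to the set of its one-hop, same-layer neighbour IDs.
    """
    # Sort candidates by degree (largest first); ties broken by node ID (assumption).
    candidates = sorted(neighbors, key=lambda n: (-len(neighbors[n]), n))
    centroids = []
    while candidates and len(centroids) < k:
        head = candidates[0]
        centroids.append(head)
        # Drop the chosen node and its neighbours so centers are spread out.
        excluded = neighbors[head] | {head}
        candidates = [n for n in candidates if n not in excluded]
    return centroids


neighbors = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5}, 5: {4}, 6: set()}
print(degree_based_centroids(neighbors, 3))  # [1, 4, 6]
```

If the candidate list is exhausted before `k` centers are found, the sketch returns fewer than `k` nodes; how that case is handled in practice is left open here.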
4.2.2. Cluster Head Selection Process
Next, each node in cluster calculates its competition factor, which is defined as where is the competition factor, is the distance between node and sink node, is residual energy of node , is the initial energy of each node in network, and and denote the coefficient such that .
By this definition, the competition factor grows with the distance from the sink node and with the residual energy of the node. Therefore, nodes with a large competition factor are preferentially selected as CHs.
In KACO, the CHs are selected using a back-off timer [21]. The timer value is inversely proportional to competition factor of node. For instance, the back-off timer value will be low if the competition factor is high, and vice versa.
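Therefore, the timer value of node $s_i$ is set as $T_i = T_{\min} / CF_i$, where $CF_i$ is the competition factor of node $s_i$ and $T_{\min}$ is the shortest wait time. $T_i > T_{\min}$ because $CF_i$ is less than 1.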
Consequently, the back-off timer expires sooner for nodes with a large competition factor. Once its back-off timer reaches zero, a node broadcasts an ADV_CH message within its cluster and declares itself the cluster head. If a node in the cluster receives an ADV_CH message before its own timer expires, it gives up competing for cluster head in this round.
The entire process is depicted in Figure 4. First, the competition factor is calculated using Equation (10). Then, the back-off timer value is set using Equation (11). Each node listens to determine whether an ADV_CH message has been transmitted by a neighbor node. The node gives up competing for CH in this round if an ADV_CH message has been transmitted. Otherwise, the node waits and sends an ADV_CH message when its timer expires.
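The back-off election can be illustrated with the short simulation below. It is a sketch under assumptions: the competition factor is taken to be a weighted sum of normalized distance to the sink and normalized residual energy with weights summing to 1, and the timer is the shortest wait time divided by the competition factor. The function names, the normalization by `d_max`, and the example values are hypothetical.

```python
def competition_factor(dist_to_sink, d_max, e_res, e_init, a=0.5, b=0.5):
    """Assumed form: a * (d / d_max) + b * (E_res / E_init), with a + b = 1."""
    return a * dist_to_sink / d_max + b * e_res / e_init


def backoff_timer(cf, t_min):
    """Timer inversely proportional to the competition factor (assumed form)."""
    return t_min / cf


def elect_cluster_head(cluster, d_max, e_init, t_min=0.01):
    """cluster maps node ID -> (distance to sink, residual energy).

    The node whose timer expires first broadcasts ADV_CH and becomes CH;
    every other node hears it before its own timer expires and withdraws.
    """
    timers = {
        nid: backoff_timer(competition_factor(d, d_max, e, e_init), t_min)
        for nid, (d, e) in cluster.items()
    }
    return min(timers, key=timers.get), timers


cluster = {1: (100.0, 5.0), 2: (200.0, 4.0), 3: (50.0, 2.0)}
head, timers = elect_cluster_head(cluster, d_max=200.0, e_init=5.0)
print(head, timers)
```

In this example node 2 has the largest competition factor (0.9) and therefore the shortest timer, so it announces itself first.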

Once a cluster has been formed, the CH schedules the transmissions of all cluster members using time division multiple access (TDMA).
4.3. Multihop Routing
When a CH has collected data from all nodes in its cluster, it needs to transmit the data to the sink node. The CH transmits the data directly if the distance between the CH and the sink node is less than its transmission range. Otherwise, the CH uses multihop routing, in which case it needs to select a relay node (next-hop forwarding node).
In the intercluster routing, the ant colony optimization (ACO) was improved by introducing the Gini coefficient [22], and the intercluster routing based on improved ACO is proposed.
4.3.1. Gini Coefficient-Based State Transition Optimization
Balancing energy consumption among CHs is the key to intercluster routing. The Gini coefficient is a statistical index used in economics to measure how evenly income is distributed in a region; it compactly represents the differences in income among individuals. Therefore, the Gini coefficient is introduced into intercluster routing, and an energy Gini coefficient is used to estimate the degree of energy balance among clusters.
Firstly, energy Gini coefficient is defined as where is the CH. Assume that a relay node is selected by . The relay node is charge of forwarding data of toward sink node.
is the neighbor CHs set, namely, relay nodes set. and are the number of nodes in and , respectively. is the residual energy of in . is the residual energy of in . is the average energy of node in .
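For orientation, the snippet below computes the standard Gini coefficient of a list of residual energies using the mean absolute difference form. It is an assumption that this textbook form is close to the energy Gini coefficient used here; the exact definition over the CH's own cluster and the candidate relay's cluster is not reproduced. The function name is hypothetical.

```python
def gini(values):
    """Standard Gini coefficient: 0 means perfectly equal, larger means less equal."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0  # all values are zero, treat as perfectly equal
    # Sum of absolute differences over all ordered pairs.
    abs_diff = sum(abs(x - y) for x in values for y in values)
    return abs_diff / (2 * n * n * mean)


print(gini([3.0, 3.0, 3.0, 3.0]))  # 0.0, energy evenly spread
print(gini([0.0, 0.0, 0.0, 4.0]))  # 0.75, energy concentrated in one node
```

A relay choice that leaves the residual energies more evenly spread yields a smaller coefficient, which is the property the routing metric exploits.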
At time , forward ant calculates the transition probability from to , which is defined as where is the amount of pheromone trail on path from to at time and and are parameters that determine the relative influence of the pheromone trails. represents visibility value, which expression is given by where is the residual energy of , is the distance between and , and is the distance between and sink node. As known from Equation (14), both residual energy of candidate CHs and position of candidate CHs are used in the definition. Balance energy consumption of nodes in the selected CHs is ensured by introducing energy Gini coefficient in state transition function.
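The general shape of an ACO state transition rule can be sketched as follows. This is the classic rule in which each candidate is weighted by pheromone raised to one exponent times visibility raised to another, then normalized. How the energy Gini coefficient enters the visibility term is not reproduced here, so the visibility values are simply passed in; all names and example numbers are hypothetical.

```python
def transition_probabilities(pheromone, visibility, alpha=1.0, beta=2.0):
    """Classic ACO rule: p_j is proportional to tau_j**alpha * eta_j**beta.

    pheromone and visibility map each candidate relay CH to tau_j and eta_j.
    """
    weights = {j: (pheromone[j] ** alpha) * (visibility[j] ** beta) for j in pheromone}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


# Two candidate relay CHs with equal pheromone; "ch_b" has twice the visibility.
probs = transition_probabilities(
    pheromone={"ch_a": 1.0, "ch_b": 1.0},
    visibility={"ch_a": 1.0, "ch_b": 2.0},
    alpha=1.0,
    beta=1.0,
)
print(probs)
```

With equal pheromone, the candidate with the higher visibility is chosen with proportionally higher probability, here one third versus two thirds.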
4.3.2. Updating Pheromone
In order to improve the efficiency of the ACO algorithm, the pheromone updating process is divided into a local pheromone updating process and a global pheromone updating process [23].
The rule of updating pheromone is defined as follows: where is the evaporation rate of pheromone and is the quantity of pheromone laid on path from to.
When the forward ant has moved from to , the local pheromone updating rule is applied: where is the local pheromone concentration.
When the forward ant has reached the sink node, it is removed automatically and the corresponding backward ant is generated. The backward ant returns to the source along the reverse path, and the global pheromone updating rule is applied: where is the global pheromone concentration, is the length of the ant’s path, and is the minimum energy of all cluster heads in the path.
The flowchart of intercluster route discovery is shown in Figure 5. The ant at the source selects the next-hop CH according to the transition probability in Equation (13) and updates the local pheromone. When the ant has reached the sink node, is computed. Then, the global pheromone is updated. This completes one round of intercluster route discovery, and a new round is started. The process is repeated until the maximum number of iterations is reached. In Figure 5, is the iteration number, and is the maximum number of iterations.
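The evaporate-and-deposit pattern behind both updating rules can be sketched as follows. The evaporation step follows the usual ACO form, in which the old pheromone is scaled by one minus the evaporation rate before the new deposit is added. The global deposit used here, a constant `q` times the minimum CH energy on the path divided by the path length, is only an assumed form chosen to reward short paths through energy-rich CHs; it is not the expression from the equations above, and all names are hypothetical.

```python
def evaporate_and_deposit(tau: float, rho: float, deposit: float) -> float:
    """Generic ACO update: old pheromone evaporates at rate rho, then deposit is added."""
    return (1.0 - rho) * tau + deposit


def global_update(pheromone, path, rho, q, path_length, e_min):
    """Apply the update to every edge on the backward ant's path.

    Assumed deposit: q * e_min / path_length (illustrative only).
    """
    deposit = q * e_min / path_length
    for edge in zip(path, path[1:]):
        pheromone[edge] = evaporate_and_deposit(pheromone.get(edge, 0.0), rho, deposit)
    return pheromone


pheromone = {("ch1", "ch2"): 1.0}
global_update(pheromone, ["ch1", "ch2", "sink"], rho=0.5, q=1.0,
              path_length=100.0, e_min=2.0)
print(pheromone)
```

Edges that are not reinforced keep losing pheromone through evaporation in later rounds, so poor paths fade while good ones are strengthened.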

5. Performance Evaluation
5.1. Simulation Environment
Simulations are conducted to evaluate the performance of KACO routing using MATLAB 2016a on a local PC with an 8th-generation Intel i7 3.20 GHz processor, 16 GB of RAM, and the Windows 10 platform.
underwater sensor nodes are distributed randomly in area. A sink node is located at the center of the surface, as shown in Figure 2. Table 1 summarizes the simulation parameters [17, 24] used in simulation.
The simulation results of the proposed work are compared with two existing state-of-the-art schemes: DBR and LUER. The performance is evaluated in terms of the number of dead nodes, the residual energy, and the received packets at the sink node (RPSN).
5.2. Simulation Results and Discussions
5.2.1. Number of Dead Nodes
First, the number of dead nodes in KACO, DBR, and LUER is analyzed as the number of nodes varies from 50 to 450. A dead node is a node that has consumed 95% of its energy.
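The dead-node metric can be computed as in the short sketch below, which counts the nodes that have consumed at least 95% of their initial energy. The function name and the example energies are illustrative assumptions.

```python
def count_dead_nodes(residual_energies, e_init, dead_fraction=0.95):
    """Count nodes that have consumed at least dead_fraction of their initial energy."""
    return sum(
        1 for e in residual_energies
        if (e_init - e) >= dead_fraction * e_init
    )


# Assumed example: initial energy 5 J, four nodes with different residual energy.
print(count_dead_nodes([5.0, 0.2, 0.0, 1.0], e_init=5.0))  # 2
```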
Figures 6 and 7 show the influence of the number of nodes on the number of dead nodes, where the transmission radius of a node is 200 m in Figure 6 and 150 m in Figure 7.


As can be seen from Figures 6 and 7, the number of dead nodes in KACO routing is lower than in DBR routing and LUER routing. The reason is as follows: in KACO routing, the K-means algorithm is used to form clusters so that the clusters are distributed more evenly.
In addition, the improved ant colony algorithm is applied in the intercluster route discovery process, and the energy Gini coefficient is used to make the energy consumption among clusters more balanced. These factors make the energy consumption of nodes in the network more balanced, so the number of dead nodes is reduced.
In DBR routing, clustering is not considered, and the next-hop forwarding node is selected based only on the depth of nodes. As a result, the data transmission path is longer, which increases the energy consumption of nodes. In LUER, routing decisions are made based on both link quality and the residual energy of nodes. Unfortunately, the location of nodes is not taken into account in the routing process.
In addition, a comparison of Figure 6 with Figure 7 shows that increasing the transmission radius helps reduce the number of dead nodes. The reason is as follows: the larger the transmission radius of a node is, the larger its transmission range is and the fewer hops a path needs, which reduces the energy consumed in transmitting data.
5.2.2. Residual Energy of Nodes
In this section, the average residual energy of nodes is analyzed as the number of nodes varies from 50 to 450. Figures 8 and 9 show the influence of the number of nodes on the average residual energy of nodes, where the transmission radius of a node is 200 m in Figure 8 and 150 m in Figure 9.


As Figures 8 and 9 show, the average residual energy of nodes decreases as the number of nodes increases. The reason is that more nodes generate more packets, so nodes have more packets to transmit, which increases their energy consumption. Compared with DBR and LUER routing, KACO routing can effectively reduce the energy consumption of nodes.
In addition, a comparison of Figure 8 with Figure 9 shows that increasing the transmission radius helps reduce the energy consumption of nodes. For example, with 450 nodes, the average residual energy of nodes in KACO increases to 3 J when the transmission radius of a node grows from 150 m to 200 m.
5.2.3. Received Packets at the Sink Node (RPSN)
Finally, the RPSN of KACO, DBR, and LUER routing is analyzed. The more packets the sink node receives, the better the routing performance is. In Figures 10 and 11, the total received packets at the sink node are evaluated in two cases. In Figure 10, transmission radius of sensor node is 200 m, and in Figure 11, transmission radius of sensor node is 150 m.


As Figure 10 shows, the more nodes there are, the more packets the sink node receives, as expected. More nodes generate more packets, and more nodes also improve network connectivity, which helps the sink node receive packets. However, as the number of nodes increases, the RPSN rises more slowly. The reason is that, beyond a certain number of nodes, the network load becomes heavier and nodes consume more energy, which increases the number of dead nodes (as shown in Figures 6 and 7).
As Figure 11 shows, compared with DBR and LUER routing, the proposed KACO routing enables the sink node to receive more packets. This is because establishing intercluster routing with the improved ant colony optimization algorithm reduces both the energy consumption of nodes and the number of dead nodes, so that more packets are successfully delivered to the sink node. In addition, when the transmission range increases, the communication range of nodes is extended and the number of packets received at the sink increases.
6. Conclusions
To address the routing problem in UWSNs, K-means and ant colony optimization-based routing (KACO) has been proposed. In KACO, clustering and the depth of nodes are used to construct energy-efficient cluster routing. As a result, the energy consumption among nodes is balanced and the efficiency of data transmission is improved.
This work is limited to homogeneous networks (all nodes are the same). In practice, different types of nodes may have different data priorities; this issue will be addressed in future work. In addition, data aggregation techniques can be designed for UWSNs in the future.
Data Availability
If you need data, please contact me by email.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partly supported by the Natural Science Foundation of China (No. 61772234).