Abstract

In the distributed infrastructure of fog computing, fog nodes (FNs) can process user requests locally. To reduce the delay and response time of users' requests, incoming requests must be evenly distributed among FNs. For this purpose, in this paper, we propose a blind load-balancing algorithm (BLBA) to improve load distribution in the fog environment. In the proposed algorithm, a mobile device sends a task to a FN, and the FN then decides how to process that task using the Double Q-learning algorithm. One of the critical advantages of BLBA is that decisions about tasks are made without any knowledge of the state of neighbor nodes. The proposed system consists of four layers: (i) the IoT layer, (ii) the fog layer, (iii) the proxy server layer, and (iv) the cloud layer. The experimental results show that the proposed algorithm, through the proper distribution of tasks between nodes, significantly reduces the delay and user response time compared to existing methods.

1. Introduction

Fog computing is a distributed computing model that extends cloud services to the edge of the network to facilitate the management and scheduling of computing, networking, and storage services between data centers and end devices. Both fog computing and cloud computing provide computing, storage, and networking services to end-users, but fog is closer to the end-user and thus provides minimal delay for Internet of Things (IoT) applications. FNs are located in a layer between the IoT and the cloud data center. FNs can process data streams and user requests in real time, reducing network delay and congestion [1-3].

IoT devices typically assign processing tasks to the nearest neighbor node. In this case, some FNs may receive more tasks than others and become overloaded over time. To avoid this situation, load-balancing methods are used to distribute loads over the nodes. Load-balancing in FNs refers to the even distribution of input tasks across a group of processing FNs so that the capacity of the FNs is fairly utilized and task processing speed is increased [4-8]. Through a load-balancing approach, FNs can allocate their tasks to underloaded neighbor nodes or the cloud and reduce overload and processing delay as much as possible. Load-balancing approaches in the fog environment can be categorized as static and dynamic. Static load-balancing algorithms apply fixed rules to distribute task requests. In dynamic load-balancing, on the other hand, tasks are assigned to FNs dynamically based on long-term knowledge of the load distribution; that is, dynamic load-balancing approaches update their load-balancing rules frequently based on new knowledge of the traffic loads [9, 10]. Dynamic load-balancing algorithms can be divided into two categories: (1) sender-initiated techniques, where congested nodes look for lightly loaded nodes and offload their tasks to them, and (2) receiver-initiated strategies, where underloaded nodes search for overloaded nodes and steal their tasks [11, 12]. In this paper, we propose a dynamic load-balancing method based on sender-initiated strategies for task distribution over FNs.

Nature-inspired load-balancing algorithms can be classified into three types: heuristic, metaheuristic, and hybrid. Heuristics are designed to achieve an optimal response within a specified period [13-15]. Metaheuristic algorithms require more execution time to achieve the optimal response and have a more extensive response space than heuristics [16-19]. Hybrid algorithms combine heuristic or metaheuristic algorithms, reducing execution time and cost and providing more efficient results than the other algorithms [20-24]. Typical load-balancing methods increase resource utilization and resource savings and reduce delay and response time. However, these algorithms may lose their efficiency because of the time-varying dynamics of the traffic load in fog computing. Therefore, we need an algorithm that can adapt to dynamic environmental conditions. For this purpose, we introduce a decision-making process based on the Double Q-learning algorithm to evenly distribute processing tasks among FNs. The main contributions of the proposed approach are summarized below:

(i) Architecture. The proposed system considers a four-layer architecture to handle load-balancing problems in the fog environment. Because of this architecture, tasks are processed locally at FNs, and there is no need to transfer data to the cloud.

(ii) Algorithm. This work proposes a decision-making process based on the Double Q-learning algorithm to find a low-load FN. Using the Double Q-learning algorithm, the FN selects an available neighbor FN or the cloud to assign the task to. This algorithm can be trained to maximize the long-term reward. In this algorithm, the agent makes decisions about task processing without knowledge of the fog environment, based only on observations and rewards. The results show that the load-balancing method based on the Double Q-learning algorithm significantly reduces delay and response time compared to the other approaches.

(iii) Mechanism. This work proposes a load-balancing method based on the Double Q-learning algorithm to distribute tasks evenly among FNs with the goal of reducing processing time. To our knowledge, most of the decision-making methods presented in previous research require knowledge of the capacity of neighbor FNs and the cloud, which creates a traffic load on the network and also delays the decision-making process. In the BLBA, the FN decides based only on the delays and rewards observed during the learning period under its own conditions and has no knowledge of the status of neighbor nodes. In this method, the FN learns to assign a received task to a low-load node for faster processing.

Our algorithm operates in such a way that the nodes have no initial knowledge of the position of other nodes in the fog environment; during the learning period, they act only on the basis of experiences related to their own conditions and have no knowledge of the neighbor nodes. We call such an algorithm blind to stress the fact that tasks are distributed properly between FNs without any prior knowledge of the status of neighbor nodes. This algorithm can be implemented on other networks and run on the fly.

We organized this paper as follows: In Section 2, we review the related works. In Section 3, we describe the proposed system architecture, the reinforcement learning algorithm, and how to compute the delay in this system. In Section 4, we introduce the proposed load-balancing method. In Section 5, the simulation results and our analysis of these results are presented. Finally, Section 6 offers a conclusion and suggestions for future work.

2. Related Works

In this section, we review previous works on load-balancing using reinforcement learning. To minimize overload and reduce delay, it is critical to use an optimal load-balancing algorithm. In fog computing, IoT devices and mobile users typically assign their tasks to the nearest FN. Since these devices are often mobile, different FNs may carry different loads depending on their position in the network. This causes an imbalance in the distribution of tasks between FNs: some FNs may be overloaded while others are idle or lightly loaded. In distributed environments, reinforcement learning can be used to design load-balancing algorithms that learn traffic patterns and automatically distribute the load evenly among the nodes. Several authors have exploited reinforcement learning algorithms to solve the load-balancing problem. The existing studies can be classified from many perspectives; here, we review several previous works that are aware of the capacity and load of the nodes.

2.1. Literature Review

Many of the previous studies are founded on the assumption of knowledge of node capacity. Baek et al. [10] proposed a decision-making process based on reinforcement learning to find the optimal offloading decision with unknown reward and transition functions. In this method, FNs can send some tasks to an available neighbor FN; the purpose is to minimize the overload probability and processing time. Xu et al. [25] introduced a dynamic resource allocation method for load-balancing in the fog environment. Technically, they presented a system framework for fog computing and a load-balancing analysis for various types of computing nodes. They then designed a corresponding resource allocation method in the fog environment through static resource allocation and dynamic service migration to achieve load-balancing for fog computing systems. Moon et al. [26] defined a computational task migration problem for balancing the loads of vehicular edge computing servers (VECSs) and minimizing migration costs. To solve this problem, they adopted a reinforcement learning algorithm in a cooperative VECS group environment in which the VECSs of a group can collaborate. The objective of this study is to optimize load-balancing and migration cost while satisfying the delay constraints of the vehicles' computation tasks. Wu et al. [27] proposed a reinforcement learning-based dynamic metadata load-balancing mechanism. This method can control the load dynamically according to the performance of the metadata servers and adapts well to sudden changes in data volume.

Other studies are based on the assumption of knowledge of node loads or future load predictions. Razaq et al. [28] proposed a Q-learning-based algorithm for load-balancing in the fog environment, in which a task is divided into several pieces based on security requirements to help preserve privacy. In this algorithm, the agent assigns a task piece to a node whose security reputation is equal to or higher than the security level of that task piece, to avoid overload on the nodes. Xu et al. [29] proposed a work donation algorithm based on reinforcement learning for distributed-memory systems to optimize load-balancing with minimized communication costs and to dynamically adapt to flow behaviors and available network bandwidth. They then designed a high-order load estimation model to predict blockwise particle advection loads and used a linear transmission model to estimate interprocess communication costs. Mai et al. [30] suggested a reinforcement learning-based method that uses evolution strategies to assign tasks between fog servers to minimize processing latency in the long term. Talaat et al. [31] introduced a load-balancing and optimization strategy using a dynamic resource allocation method based on reinforcement learning and a genetic algorithm. This method collects the load information for each server, handles the incoming requests, and distributes them evenly between the servers. Divya and Sri [32] proposed a reinforcement learning-based load-balancing method by combining software-defined networks and fog computing. The proposed method understands the network behavior and balances the loads to provide the maximum possible availability of the resources. Lu et al. [33] used improved deep reinforcement learning based on LSTM and candidate networks to solve task offloading in mobile edge computing. Li et al. [34] suggested a load-balancing method using an online reinforcement learning algorithm for load distribution in vehicular networks. This algorithm achieves a suitable association solution through continuous learning from the dynamic vehicular environment. Lin et al. [35] introduced a reinforcement learning-based approach aimed at load-balancing for data center networks. This approach employs reinforcement learning to learn a network and control it based on the learned experience. Li et al. [36] suggested an algorithm based on machine learning aimed at generating intelligent adaptive strategies for load-balancing of collaborative servers and dynamic scheduling of sequential tasks. Based on the proposed algorithm and software-defined networking technology, the tasks can be executed cooperatively by the user device and the servers in the mobile fog computing network. Rikhtegar et al. [37] proposed a load-balancing method based on deep reinforcement learning for software-defined networking-based data center networks. This method uses the deep deterministic policy gradient algorithm to adaptively learn the link-weight values by observing the traffic flow characteristics. Kim and Kim [38] proposed an agent that uses a deep reinforcement learning algorithm to distribute requests between gaming servers. The agent does this by measuring network loads and analyzing a large amount of user data.

2.2. Research Gap and Motivation

To our knowledge, most of the decision-making methods presented in previous research require knowledge of the capacity of neighbor FNs and the cloud (e.g., [10, 26, 27]), which creates a traffic load on the network and also delays the decision-making process. Our work differs from previous works in that the FN decides based solely on the delays and rewards observed during the learning period under its own conditions and has no knowledge of the status of neighbor nodes. Our work enables load-balancing in a dynamic fog environment where the nodes have no information about each other. The application of the proposed scheme is not limited to a specific scenario; it addresses a general subclass of load-distribution problems.

3. System Model

In this section, we describe the proposed system architecture, the reinforcement learning algorithm, and how to compute the delay in this system.

3.1. Proposed System Architecture

In this paper, as shown in Figure 1, a four-layer architecture is considered for the proposed system. The first layer includes IoT devices that connect directly to FNs and send data to these nodes locally. The second layer is the fog layer. Fog servers can be located in different geographical locations and process data received from IoT devices in real time. The third layer comprises a proxy server that receives data from FNs and then sends this data to the cloud. The last layer in this structure is the cloud data center layer, which includes several servers and data centers. Because of this structure, data and information are processed locally at FNs, and there is no need to transfer data to the cloud.

FNs can allocate their tasks to low-load neighbor nodes or the cloud through the load-balancing method provided for the dynamic fog environment in this paper and reduce overload and processing delay as much as possible. Because of the dynamics of the fog environment, a variable number of mobile devices may be connected to each FN at any moment. A FN to which more mobile devices are connected receives more tasks than the other FNs and will become overloaded. Load-balancing methods are used to evenly distribute tasks among FNs to avoid overload. The primary purpose of the load-balancing algorithm in the fog computing environment is to improve the response time so that the system operates optimally even under dynamic conditions. For this purpose, in this paper, the Double Q-learning algorithm is applied in FNs to reduce delay, response time, and resource loss in the network. After receiving a task, each FN uses the Double Q-learning algorithm to decide whether to process it locally or send it to a neighbor FN or the cloud for faster processing.

3.2. Preliminaries on the Reinforcement Learning Algorithm

In this part, we review the background on reinforcement learning and the Double Q-learning algorithm:

(i) Reinforcement Learning Algorithm. In this paper, we formulate the load-balancing approach with the reinforcement learning algorithm. Specifically, a reinforcement learning algorithm maximizes the cumulative reward by selecting the optimal action in each state of the environment [39, 40]. The proposed method formulates the load-balancing problem as a Markov decision process (MDP) for the dynamic fog environment. The MDP comprises a decision-making agent that continuously observes the current state $s$ of the system, selects an action $a$ from the allowed actions in that state, and then transitions to a new state $s'$ and receives a reward for that action, which influences future decisions [41].

(ii) Off-Policy Learning. A policy is a mapping from states to actions, which determines how to make a decision in each of the different situations, and comes in two forms: On-policy and Off-policy. In On-policy learning, the same policy is used for both optimization and action selection. In Off-policy learning, two separate policies are used for optimization and action selection [42]. Among the reinforcement learning algorithms, this paper uses the Double Q-learning algorithm, which is an Off-policy algorithm.

(iii) Value Function. In reinforcement learning, the value function is defined as the expected long-term cumulative reward; for each state, a value is determined as follows (see the sketch after this list):

$$V(s) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_{0} = s\right], \quad (1)$$

where $\gamma \in [0, 1)$ is called the discount factor, which determines the importance of future rewards: the smaller it is, the more the current decision is valued over future decisions.

(iv) Model. The environment model of the Double Q-learning algorithm is stochastic, and its state transitions are nondeterministic.
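To make the value function concrete, the following is a minimal Python sketch of the discounted return that $V(s)$ estimates for one sampled trajectory; the reward values and discount factor used here are illustrative assumptions, not the paper's settings:

```python
# A minimal sketch of the discounted return behind V(s); the rewards and
# gamma below are illustrative assumptions, not the paper's settings.
def discounted_return(rewards, gamma=0.9):
    """Compute sum over t of gamma^t * r_t for one sampled trajectory."""
    g = 0.0
    for r in reversed(rewards):  # fold from the end: g_t = r_t + gamma * g_{t+1}
        g = r + gamma * g
    return g

# Rewards here are negative delays (see Section 3.3.2): smaller delays
# yield a larger (less negative) return.
print(discounted_return([-0.2, -0.5, -0.1]))
```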

In a reinforcement learning problem, the agent explores the environment and learns to select the optimal action to maximize long-term reward. Hence, reinforcement learning in dynamic environments has many applications for optimization. In addition, it can be an excellent way to evenly distribute tasks between FNs.

3.3. Problem Formulation

We have formulated the proposed load-balancing problem as an MDP to achieve the desired performance. An MDP usually consists of the tuple $(S, A, P, R)$, whose elements are defined for the proposed load-balancing problem as follows (a minimal code sketch of these elements follows the list):

(i) $S$. The state space, where a state $s = (c, q, n)$ comprises the capacity $c$ of the FN, the forward queue size $q$ in the FN, and the number $n$ of mobile devices connected to the FN. Decision-making in the Double Q-learning algorithm is based on the current state of the system. Most previous methods require knowledge about the capacity of neighbor FNs to define the state space. In the BLBA, however, the state of the system is defined only based on the status of the decision-maker FN, so decisions are made without any knowledge of the status of the neighbor nodes.

(ii) $A$. The action space, where an action $a$ represents the FN or cloud selected to assign the task to.

(iii) $P$. The transition probability, a value in $[0, 1]$; $P(s' \mid s, a)$ is the probability distribution of transitioning to the next state $s'$ by selecting action $a$ in state $s$.

(iv) $R(s, a)$. The reward for selecting action $a$ in the current state $s$. The primary goal is to select the optimal action in each state so that the long-term value is maximized and the processing time and overload probability are minimized.
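As a purely illustrative rendering of this tuple, the following Python sketch encodes the state, action, and reward described above; all field and function names are assumptions of this sketch, not the authors' notation:

```python
from dataclasses import dataclass

# Hedged sketch of the MDP elements; field names are illustrative.
@dataclass(frozen=True)
class FogState:
    capacity: float     # c: remaining processing capacity of this FN
    queue_size: int     # q: forward queue length of this FN
    num_devices: int    # n: mobile devices currently connected to this FN

# An action is the index of the node chosen to run the task:
# 0 for the FN itself, 1..k for its k neighbors, k + 1 for the cloud.
Action = int

def reward(processing_delay: float) -> float:
    """R(s, a) = -d_proc: longer processing delays earn smaller rewards."""
    return -processing_delay
```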

The task processing time is equal to the sum of the transmission delay and processing delay of that task in different devices. These delays are calculated as follows.

3.3.1. Task Transmission Delay

The transmission delay between two nodes is obtained from the sum of the waiting delay in the forward queue of the source node and the send delay on the communication channel between the two nodes:

$$d_{trans} = d_{wait} + d_{send}, \quad (2)$$

where $d_{wait}$ is the waiting delay in the forward queue of the source node, calculated as

$$d_{wait} = t_{out} - t_{in}, \quad (3)$$

where $t_{in}$ represents the arrival time of the task in the queue of a node and $t_{out}$ represents the exit time of the task from that node. The parameters needed to calculate the delay are given in Table 1.

In (2), $d_{send}$ is the send delay on the communication channel between the two nodes and is calculated as

$$d_{send} = \frac{dist}{v}, \quad (4)$$

where $dist$ represents the distance between two nodes and $v$ is the propagation speed of the communication channel.
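Under the notation reconstructed above, the transmission-delay computation of (2)-(4) could be sketched as follows; the propagation speed constant is an assumption of this sketch:

```python
PROPAGATION_SPEED = 2.0e8  # m/s; assumed channel propagation speed

def waiting_delay(t_in: float, t_out: float) -> float:
    """d_wait = t_out - t_in: time the task spends in the forward queue (3)."""
    return t_out - t_in

def send_delay(distance: float) -> float:
    """d_send = dist / v: propagation delay over the channel (4)."""
    return distance / PROPAGATION_SPEED

def transmission_delay(t_in: float, t_out: float, distance: float) -> float:
    """d_trans = d_wait + d_send (2)."""
    return waiting_delay(t_in, t_out) + send_delay(distance)
```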

3.3.2. Reward Function

In the BLBA, the Double Q-learning algorithm runs on the FNs, and we define the reward function as the negative of the processing delay: the longer the task processing delay, the smaller the received reward. In this paper, the reward $R$ is calculated as

$$R = -d_{proc}, \quad (5)$$

where $d_{proc}$ represents the processing delay of the task assigned to the FN. $d_{proc}$ is calculated in one of the following two ways (a code sketch of both cases follows):

(i) If the node itself (FN-I) that received the task from the mobile device processes it, $d_{proc}$ is calculated as

$$d_{proc} = d_{exe}^{I}, \quad (6)$$

where $d_{exe}^{I}$ represents the task execution time in FN-I.

(ii) If the task is assigned to a neighbor node (FN-J) or the cloud for processing, $d_{proc}$ is calculated as

$$d_{proc} = d_{trans}^{I \to J} + d_{exe}^{J} + d_{trans}^{J \to I}, \quad (7)$$

where $d_{trans}^{I \to J}$ represents the task transmission delay from FN-I to FN-J or the cloud, $d_{exe}^{J}$ represents the task execution time in FN-J or the cloud, and $d_{trans}^{J \to I}$ represents the transmission delay of the task result from FN-J or the cloud back to FN-I.
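The two cases of $d_{proc}$ can be sketched as follows, reusing the transmission-delay helper above; node names and signatures are illustrative:

```python
def proc_delay_local(exe_time_i: float) -> float:
    """Case (i): FN-I processes the task itself, d_proc = d_exe^I (6)."""
    return exe_time_i

def proc_delay_remote(trans_i_to_j: float, exe_time_j: float,
                      trans_j_to_i: float) -> float:
    """Case (ii), eq. (7): forwarding delay to FN-J or the cloud, plus
    execution there, plus returning the result to FN-I."""
    return trans_i_to_j + exe_time_j + trans_j_to_i
```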

3.3.3. Task Execution Time

After a node receives a task, it allocates part of its capacity to executing that task. The task execution time in FN-I, FN-J, or the cloud is calculated as follows, where $n_{ins}$ represents the number of task instructions and $c_{alloc}$ the capacity allocated to the task:

$$d_{exe} = \frac{n_{ins}}{c_{alloc}}. \quad (8)$$
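A one-line sketch of (8), under the assumption that execution time is the instruction count divided by the allocated capacity (in instructions per second):

```python
def execution_time(n_instructions: int, allocated_capacity: float) -> float:
    """d_exe = n_ins / c_alloc: execution time on the chosen node (8)."""
    return n_instructions / allocated_capacity
```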

3.3.4. Total Delay

Depending on which node processes the task, the total task processing time is calculated as

$$d_{total} = d_{trans}^{dev \to I} + d_{proc} + d_{trans}^{I \to dev}, \quad (9)$$

where $d_{trans}^{dev \to I}$ represents the task transmission delay from the mobile device to FN-I and $d_{trans}^{I \to dev}$ represents the transmission delay of the task result from FN-I back to the mobile device.
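Combining the pieces above, the end-to-end delay of (9) could be computed as in this short sketch (the numbers in the example are illustrative only):

```python
def total_delay(trans_dev_to_i: float, proc_delay: float,
                trans_i_to_dev: float) -> float:
    """d_total = d_trans(dev->I) + d_proc + d_trans(I->dev), eq. (9)."""
    return trans_dev_to_i + proc_delay + trans_i_to_dev

# Example: device -> FN-I takes 0.01 s, the task's processing delay is
# 0.05 s, and returning the result takes 0.01 s.
print(total_delay(0.01, 0.05, 0.01))  # 0.07 s
```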

Because the proxy server only forwards the task to the cloud and does not perform any processing, it is assumed that the send delay on the communication channel from FN-I to the cloud, and vice versa, is calculated directly, without considering the proxy server.

Each FN is an agent that is learning in the network. Any new task in the system causes the FN to perform an action in the environment and select one node to assign the new task to.

The reward of the selected action is determined when the state of the environment is updated. If the current state of the system is closer to load-balancing and the tasks are processed faster, a reward is given to the agent; otherwise, no reward is awarded. Through the received rewards, each node learns to make the best decision for processing a task.

4. Blind Load-Balancing Algorithm (BLBA)

In this section, a Double Q-learning-based load-balancing algorithm for the proper distribution of load between the FNs is presented to solve the problems of previous methods. The Double Q-learning algorithm is used to find the optimal state-action pair with the least computational cost, obtaining enough information through experience. The environment model of the Double Q-learning algorithm is stochastic, and its state transitions are nondeterministic. In a learning problem, the agent explores the environment and learns to select the optimal action that maximizes the long-term reward. Hence, the Double Q-learning algorithm has many applications for optimization in dynamic environments. In addition, it can be an excellent way to evenly distribute tasks between FNs.

The Double Q-learning algorithm uses two estimation functions instead of one: $Q^{A}$ and $Q^{B}$. It thus maintains two Q-tables that store the value estimates of all actions. The difference between the two tables is that when we update the value of one of them, we use the maximum value present in the other. Assume that $a^{*}$ is the most valuable action in state $s'$ according to the value function $Q^{A}$; we then use the value $Q^{B}(s', a^{*})$ to update $Q^{A}$. In a similar way, if $b^{*}$ is the most valuable action in state $s'$ according to the value function $Q^{B}$, we use $Q^{A}(s', b^{*})$ to update $Q^{B}$. Each time an update is performed in the Double Q-learning algorithm, it is decided with equal probability which table is updated and which table is used to obtain the maximum value. In this algorithm, an agent performs an action after receiving the state of the environment and then transitions to the next state and receives a reward from the environment in return. The value function for state $s$ and action $a$ in the Double Q-learning algorithm is estimated as follows:

$$Q^{A}(s, a) \leftarrow Q^{A}(s, a) + \alpha \left[ r + \gamma \, Q^{B}\left(s', \arg\max_{a'} Q^{A}(s', a')\right) - Q^{A}(s, a) \right], \quad (10)$$

where $\alpha$ is the learning rate, which balances new observations against what has already been learned. The Double Q-learning algorithm uses the $\epsilon$-greedy policy to maximize the long-term value: the next action is either selected at random (with a constant probability of $\epsilon$) or selected as the best action in the Q-table (with probability $1 - \epsilon$). At first, the algorithm has no information about the network, so it explores the network. Once enough information has been obtained from the network, load-balancing is performed optimally. In the $\epsilon$-greedy policy, if we observe each action infinitely often, we can be sure that $Q$ converges to the optimal value. The FN therefore learns, through the Double Q-learning algorithm, to select the most suitable node to assign the task to.
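As a concrete illustration, the update rule (10) and the $\epsilon$-greedy selection could be sketched in Python as follows; the table layout, hyperparameter values, and function names are assumptions of this sketch, not the authors' implementation:

```python
import random
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9        # assumed learning rate and discount factor

q_a = defaultdict(float)       # Q^A(s, a), keyed by (state, action)
q_b = defaultdict(float)       # Q^B(s, a)

def epsilon_greedy(state, actions, epsilon):
    """Pick a random action with probability epsilon; otherwise pick the
    action that maximizes the combined estimate Q^A + Q^B."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_a[(state, a)] + q_b[(state, a)])

def double_q_update(s, a, r, s_next, actions):
    """Update rule (10): with equal probability, update Q^A using Q^B's
    estimate of Q^A's best next action, or do the symmetric update of Q^B."""
    if random.random() < 0.5:
        a_star = max(actions, key=lambda x: q_a[(s_next, x)])
        q_a[(s, a)] += ALPHA * (r + GAMMA * q_b[(s_next, a_star)] - q_a[(s, a)])
    else:
        b_star = max(actions, key=lambda x: q_b[(s_next, x)])
        q_b[(s, a)] += ALPHA * (r + GAMMA * q_a[(s_next, b_star)] - q_b[(s, a)])
```

Decoupling action selection from value estimation in this way is what makes Double Q-learning less prone to the overestimation bias of ordinary Q-learning.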

By applying load-balancing on the FNs, the load is evenly distributed between these nodes. The FN is considered an agent that continuously learns in the network. After the FN receives a new task from a mobile device, it observes the current state of the environment. Then, to maximize the long-term reward, it decides how the task is processed based on the experiences and rewards it has received so far, without any knowledge of the capacity of the other nodes and only according to its own capacity. If the FN has enough capacity, it processes the task itself; otherwise, it assigns the task to a neighbor FN or the cloud. If the task processing delay in the FN itself and the neighbor FNs is greater than the processing delay of that task in the cloud, the FN assigns the task to the cloud for faster processing, reducing the load on the other nodes. Figure 2 shows the flowchart of the proposed load-balancing method.
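Continuing the sketch above (it reuses `epsilon_greedy` and `double_q_update`), the following illustrates how a FN could realize the flowchart's outcome without querying any neighbor: it acts $\epsilon$-greedily on its own Q-tables, so nodes that historically yielded lower delays are chosen more often. The action indices and callback hooks are assumptions of this sketch:

```python
# Actions: 0 = process locally, 1..3 = neighbor FNs, 4 = cloud (illustrative).
LOCAL, CLOUD = 0, 4
ACTIONS = [LOCAL, 1, 2, 3, CLOUD]

def fn_handle_task(state, observe_delay, observe_next_state, eps=0.3):
    """One decision step of a FN agent: choose a node, observe the delay the
    task actually incurred, turn it into a reward, and learn from it.
    `observe_delay` and `observe_next_state` are assumed hooks that return
    the measured task delay and the FN's own local state, respectively."""
    action = epsilon_greedy(state, ACTIONS, eps)
    delay = observe_delay(action)        # measured, never queried from neighbors
    next_state = observe_next_state()    # the FN's own local state only
    double_q_update(state, action, -delay, next_state, ACTIONS)
    return action
```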

The state comprises the capacity of the FN, its forward queue size, and the number of mobile devices connected to it; the action is the selection of a FN or the cloud to assign the task to; and the reward is a function that penalizes task processing delay.

5. Performance Evaluation

The performance of the proposed load-balancing method based on the Double Q-learning algorithm is evaluated using the iFogSim simulation environment [43]. We ran the program on an Asus computer with an Intel Core i7 processor and 8 GB of RAM. The proposed system includes FNs and a variable number of mobile devices. Mobile devices randomly connect to their neighbor FNs and assign their task processing to these nodes. Initially, all Q-table values in the Double Q-learning algorithm are zero, and the FN has no information about the network. For learning, the $\epsilon$-greedy method is used, in which the $\epsilon$ value is initially set to 1 so that the algorithm explores the network. Afterwards, as the FN's confidence in its Q-value estimates increases, the $\epsilon$ value is changed to 0.3.

The received reward is equal to the negative of the task processing delay in one of the FNs or the cloud. In the simulation, it is assumed that mobile devices send tasks to the FNs and that the Double Q-learning algorithm is executed as the tasks are received by the FNs. This prevents overload in the nodes as much as possible. Each node using the Double Q-learning algorithm selects a node to assign the task to after examining the current state of the environment and receives a reward from the environment in return. Over time, the FNs' experience of the network increases, and each node learns to assign tasks to a low-load node that can process them faster. In the other methods, by contrast, the load-balancing algorithm is executed only after an overload has occurred in the FN, which reduces performance and increases delay in those systems. The parameters used for system evaluation are given in Table 2.

In the following, the performance of BLBA is compared with the SSLB [9], random, and proportional [44] load-balancing methods. In the random method, a node offloads tasks to a randomly picked neighbor; that is, when a FN is overloaded, it randomly selects a neighbor node and sends its load to it for faster processing. In the proportional method, the FN receives the capacity information of its neighbors and selects the optimal one to offload a task to. In the SSLB method, after a FN is overloaded, it compares the capacities of the other neighbor nodes and sends the task to the node that has at least 40% of its capacity free and has the highest capacity.

In this section, we first set the number of FNs to 4. Then, we increase the number of nodes to 10 and check the performance of the proposed algorithm under both conditions. Figure 3 shows the increase in cumulative reward at each time iteration of the proposed algorithm. In this paper, the reward is equal to the negative of the processing delay of the task assigned to the FN, so whichever node processes a task, reducing its processing delay increases the reward received for processing it. Increasing the number of tasks assigned to the nodes reduces the cumulative reward, because less processing capacity is allocated to each task and each task therefore incurs more delay. As shown in this figure, task assignment based on the Double Q-learning algorithm, with a suitable distribution of tasks between nodes, has gradually increased the cumulative reward.

It is expected that using the Double Q-learning algorithm will significantly improve load-balancing performance in the network. As shown in Figure 4, the SSLB method has partly reduced the average processing time compared to the random and proportional methods. However, task assignment based on the Double Q-learning algorithm, with its proper distribution of tasks among the FNs, has enabled the nodes to process tasks faster, and as can be seen, the average processing time of the BLBA has improved dramatically compared to the other methods.

In Figure 5, the run time of all the tasks that enter the system during the simulation is compared between the BLBA and the other methods. As shown in this figure, the proposed BLBA significantly reduces the run time of these tasks compared to the other methods, which makes the proposed system perform better than the compared methods.

Then, in Figure 6, the total delay of all four methods is reviewed. The total delay is obtained from the average processing time of the input tasks from the first to the last iteration of the algorithm. As can be seen, the SSLB method has a lower total delay than the random and proportional methods. However, the proposed BLBA achieves a lower total delay than the other three methods, which means that the proposed method works better.

Finally, the standard deviation of the load on the nodes is compared across all four methods. According to Figure 7, at first the standard deviation of the load on the nodes in the SSLB method is lower than in the other methods. However, as the agents learn, the standard deviation of the load on the nodes in the proposed BLBA is significantly reduced. This indicates that in the proposed BLBA, tasks are evenly distributed in the network and the possibility of overload and underload in the nodes is reduced. In addition, the aim of this paper is to improve load-balancing and reduce delay, and in the proposed method both the delay and the load-balancing are optimized. In the other methods, load-balancing may be improved, but that does not mean that delay is minimized.

Next, we set the number of nodes to 10 and check the performance of the algorithms under these conditions. As shown in Figure 8, as the number of FNs and mobile devices increases, the proposed algorithm spends more time learning. However, even in this situation, task assignment based on the Double Q-learning algorithm, with its proper distribution of tasks among the nodes, has enabled the nodes to process tasks faster, and as can be seen, the average processing time of tasks in the proposed method is still improved over the other methods.

In Figure 9, we can see that as the number of nodes increases, although the BLBA algorithm spends more time learning, the run time of all tasks that enter the system during the simulation is still significantly reduced under the Double Q-learning-based load-balancing method compared to the other methods. The standard deviation of the load on the nodes in all four methods is then compared for the case of 10 nodes. According to Figure 10, as the agents learn, the standard deviation of the load on the nodes in the proposed method is significantly reduced. This indicates that in the proposed method, tasks are evenly distributed among the nodes.

Finally, Figure 11 shows the approximate number of iterations needed to converge to the optimal run time for different numbers of nodes. As can be seen, as the number of FNs increases, the number of possible actions increases, and thus the response space becomes larger. Therefore, the time required to learn and converge to the optimal policy also increases. However, even as the number of nodes increases, in all implementations the proposed algorithm eventually converges to the minimum point.

The results show that when a node decides to assign the load using the Double Q-learning algorithm, it considers only its own forward queue state, capacity, and the number of mobile devices connected to it, with no information from other nodes. From the above evaluation, we conclude that the proposed BLBA is more stable than the other load-balancing methods and significantly reduces network delay and response time. In addition, as the number of FNs increases, although the nodes spend more time learning, the proposed method still outperforms the other methods over time, indicating that the algorithm performs well in all tested situations.

6. Conclusion and Future Work

The purpose of this paper is to provide a method to improve load-balancing among FNs. In this paper, the Double Q-learning algorithm is used for load-balancing in the fog environment. The Double Q-learning algorithm achieves an optimal policy using the experience that the agent gains from interacting with the environment. In the proposed BLBA, each FN acts as an agent that explores the fog environment and seeks a low-load node to assign tasks to, in order to minimize the processing time and the possibility of overload. In this paper, the system state is defined only based on the state of the decision-maker FN, and decision-making is done without any knowledge of the state of neighbor FNs. The BLBA has been tested for different numbers of FNs and mobile devices within the network and has shown good efficiency. The simulation results show that our proposed method significantly reduces processing time and response time compared to existing methods. Given the network structure, utilizing the Double Q-learning algorithm in any IoT device to further improve load-balancing and reduce delay is one of the future research directions this paper opens for researchers. In addition, in the future, we intend to examine the performance of the proposed load-balancing algorithm in mobile edge computing.

Data Availability

The data used to support the findings of this article can be accessed upon request.

Disclosure

A preliminary version of this manuscript was published in the proceedings of the 2021 11th International Conference on Computer and Knowledge Engineering (ICCKE), https://ieeexplore.ieee.org/document/9721449.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

(i) Niloofar Tahmasebi Pouya worked on validation, formal analysis, software, data curation, and writing—original draft. (ii) Seyedakbar Mostafavi worked on methodology, proofing—original draft, and project administration. (iii) Mehdi Agha Sarram worked on review and editing, data curation, and supervision.