Abstract
To solve the dynamic, real-time problem of multirobot task allocation in an intelligent warehouse system operating in parts-to-picker mode, this paper presents a combined solution based on an adaptive task pool strategy and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm. In the first stage of the solution, a variable task pool stores dynamically added tasks, dividing the continuous, large-scale task allocation problem into small-scale subproblems so that dynamic requirements can be met. An adaptive control strategy automatically adjusts the total number of tasks in the task pool to achieve a trade-off among throughput, energy consumption, and waiting time, which adapts better than manually adjusting the size of the task pool. In the second stage of the solution, when the task pool is full, the tasks in it are assigned to robots. This paper treats the task allocation problem as an optimization problem and uses the CMA-ES algorithm to search for the optimal task assignment for all the robots. In a comparison with the fixed threshold method under 56 different task pool sizes, the experimental results show that, with the adaptive threshold method, the throughput comes close to the optimal level and the average distance traveled by robots to handle each unit is lower; the adaptive task pool solution therefore has better adaptability and can find the optimal task pool size by itself. This method can satisfy the dynamic and real-time requirements and can be effectively applied to the intelligent warehouse system.
1. Introduction
In recent years, orders on e-commerce platforms have soared and distribution centers have grown increasingly large, which has brought great challenges to the traditional logistics industry [1]. In a traditional warehouse, 60% to 70% of the workers' time is spent on picking goods [2], and efficiency is extremely low. Therefore, more and more automatic machines and equipment have been applied in warehousing [2]. Many companies have started to adopt a new kind of parts-to-picker intelligent warehouse system, such as the Kiva system [3]. In such a system, as shown in Figure 1, robots transport the shelves from storage areas to workstations, and workers wait at the stations. When the shelves reach the workstations, the workers take the needed goods from the shelves or store bundles into the shelves. It has been shown that this kind of intelligent warehouse system greatly saves labor cost and improves the efficiency of warehouse operation [4].

Cooperative control of multiple mobile robots is the key to realizing intelligent warehousing. In a warehouse such as the one shown in Figure 1, there are often numerous tasks, such as replenishment and picking, as well as numerous robots to perform them. In addition, the cost of performing a given task differs from robot to robot. Therefore, the efficiency of the warehouse is determined by selecting suitable robots to perform specific tasks. This is a typical multirobot task allocation (MRTA) problem [5]. As the warehouse operates, the tasks and the warehouse environment constantly change. How to find a better allocation scheme for pick tasks and replenishment tasks in such a highly dynamic environment [3, 4] is the focus of this paper.
MRTA is one of the most challenging problems in multirobot systems [6]. Market-based methods are currently the most studied, such as the single-task auction algorithm proposed in ref. [7]. Because the single-task auction algorithm has difficulty reaching the optimal solution, a combined auction algorithm that considers the correlation between tasks was proposed in ref. [8]. When the number of robots and tasks is small, MRTA can be regarded as a zero-one integer linear programming problem and solved by the simplex method, the branch and bound method, the Hungarian algorithm [9], etc. For example, the Hungarian algorithm was adopted in ref. [10] to solve the role assignment problem in a robot soccer game. There are also some threshold-based methods, such as ALLIANCE [11] and Broadcast of Local Eligibility (BLE) [12], which offer good real-time performance, fault tolerance, and robustness, but usually obtain only a locally optimal solution.
For large-scale problems, heuristic algorithms can effectively reduce the solution space and improve search efficiency. For example, in ref. [13], a heuristic algorithm was adopted to solve the task assignment problem in a multi-core processor. Evolutionary algorithms are mature global optimization methods with high robustness and wide applicability, which can effectively deal with complex problems that are difficult to solve with traditional optimization algorithms. Various metaheuristics, such as the genetic algorithm and the simulated annealing algorithm, have been widely used for the MRTA problem. In ref. [14], the genetic algorithm was used to solve the time-extended multirobot task allocation problem in disaster scenarios. A hybrid genetic and ant colony algorithm was proposed in ref. [15] to improve the solving accuracy of the genetic algorithm. In ref. [16], the genetic algorithm was used to solve the MRTA problem in the intelligent warehouse. Ref. [17] designed an improved quantum evolutionary algorithm based on the niche coevolution strategy and enhanced particle swarm optimization (IPOQEA) to solve the airport gate allocation problem. In ref. [18], an improved quantum-inspired cooperative coevolution algorithm with multiple strategies was used to solve the knapsack problem and a real airport gate allocation problem. Refs. [17–20] use the cooperative coevolution framework to divide a complex optimization problem into several subproblems, which are solved by independent searches in order to improve solution efficiency. Similarly, the situation in which the number of tasks in an intelligent warehouse varies can be studied using the divide-and-conquer idea of refs. [17–20].
Therefore, we use a task pool to store dynamically added tasks and propose an adaptive control strategy to automatically adjust the task pool size according to the current environment. When the task pool is full, the tasks in the pool will be assigned to the robots. Then, the task allocation problem is regarded as an optimization problem and solved by the CMA-ES algorithm [21].
2. Problem Formulation
The intelligent warehouse system consists of many movable shelves and robots as well as some workstations. The robots transport the needed shelves from the storage area to the workstations, so the workers can complete replenishment and picking without moving. A typical intelligent warehouse layout (a screenshot from the open source software RAWSim-O [22]) is shown in Figure 2. In the figure, the four squares on the left represent the replenishment stations, where replenished bundles are temporarily stored while waiting for shelves. The four squares on the right represent the picking stations. After receiving orders, the system uses a dedicated algorithm to assign them to different stations. There is an upper limit on the number of orders at each station [23]. The squares in the middle area are the shelves, in which the goods in the warehouse are stored. Shelves can be lifted and moved by robots. The circles in the figure are robots. A robot can carry one shelf at a time. When a robot is not carrying a shelf, it can move freely underneath the shelves.

To facilitate problem analysis, we make the following assumptions:
(1) Robots are all homogeneous and travel at exactly the same speed. They can only move forward, backward, left, and right.
(2) The time for a robot to lift a shelf and to stay at a workstation is very short and can be ignored.
(3) Every robot carries the required shelf from the position of the shelf to the designated station and then carries the shelf back to its original location.
The shelf selection algorithm selects shelves for each workstation according to requirements. The selected shelves need to be transported from the shelf storage area to the appropriate station for picking or replenishing goods and then transported back to their original positions; this is the task of the robots. If a robot is not assigned a task, it moves to a dedicated resting area. How to reasonably assign tasks to robots is the problem studied in this paper.
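Following ref. [16], suppose that there are $n$ tasks (all tasks from the beginning to the end of the warehouse operation) and $m$ robots in the warehouse. The set of tasks is $T = \{t_1, t_2, \ldots, t_n\}$, and the set of robots is $R = \{r_1, r_2, \ldots, r_m\}$. The set of tasks assigned to robot $r_i$ is $T_i$, which is a subset of $T$, with $\bigcup_{i=1}^{m} T_i = T$ and $T_i \cap T_j = \emptyset$ for $i \neq j$. Let $|T_i| = k_i$ and let $T_i$ be ordered; then the sequence of tasks to be completed by robot $r_i$ is $(t_{i,1}, t_{i,2}, \ldots, t_{i,k_i})$. The cost of robot $r_i$ to complete its task sequence can be expressed as
$$C_i = c_{\mathrm{init}}(r_i, t_{i,1}) + \sum_{j=1}^{k_i} c_{\mathrm{task}}(t_{i,j}) + \sum_{j=1}^{k_i - 1} c_{\mathrm{move}}(t_{i,j}, t_{i,j+1}), \quad (1)$$
where $C_i$ represents the cost of robot $r_i$ to complete all of its tasks. Since all robots travel at the same speed, the cost can be expressed as the distance traveled by the robot. The robot can only move forward, backward, left, and right; so, the distance traveled between two points can be expressed as the Manhattan distance.
$c_{\mathrm{init}}(r_i, t_{i,1})$ represents the cost for robot $r_i$ to get from its initial position to the position of the required shelf for the first task $t_{i,1}$. Let the initial coordinate of the robot be $(x_{r_i}, y_{r_i})$ and the coordinate of the required shelf for the first task be $(x_{i,1}, y_{i,1})$; then
$$c_{\mathrm{init}}(r_i, t_{i,1}) = |x_{r_i} - x_{i,1}| + |y_{r_i} - y_{i,1}|. \quad (2)$$
$c_{\mathrm{task}}(t_{i,j})$ represents the cost for the robot to complete task $t_{i,j}$, which depends only on the task itself. It is the distance the robot travels, while carrying the required shelf, from the position of the shelf to the designated station and back from the station to the shelf's original position. Let the coordinate of the required shelf for task $t_{i,j}$ be $(x_{i,j}, y_{i,j})$ and the coordinate of the target station be $(x'_{i,j}, y'_{i,j})$; then
$$c_{\mathrm{task}}(t_{i,j}) = 2\left(|x_{i,j} - x'_{i,j}| + |y_{i,j} - y'_{i,j}|\right). \quad (3)$$
$c_{\mathrm{move}}(t_{i,j}, t_{i,j+1})$ represents the cost for the robot to reach the starting position of the next task $t_{i,j+1}$ after completing task $t_{i,j}$. Since the robot transports the shelf back to its original position after completing task $t_{i,j}$, this cost is the Manhattan distance from the position of the required shelf for task $t_{i,j}$ to the position of the required shelf for task $t_{i,j+1}$. Let the coordinate of the required shelf for task $t_{i,j}$ be $(x_{i,j}, y_{i,j})$ and the coordinate of the required shelf for task $t_{i,j+1}$ be $(x_{i,j+1}, y_{i,j+1})$; then
$$c_{\mathrm{move}}(t_{i,j}, t_{i,j+1}) = |x_{i,j} - x_{i,j+1}| + |y_{i,j} - y_{i,j+1}|. \quad (4)$$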
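The cost of one robot's task sequence, as described above, can be sketched in a few lines of Python. This is a minimal illustration under the stated assumptions (grid coordinates, Manhattan distance, negligible lift and station time); the names `manhattan`, `task_cost`, and `sequence_cost`, and the representation of a task as a (shelf position, station position) pair, are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Task = Tuple[Point, Point]  # hypothetical layout: (shelf position, station position)


def manhattan(a: Point, b: Point) -> int:
    """Manhattan distance between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def task_cost(task: Task) -> int:
    """Shelf -> station -> back to the shelf's original position."""
    shelf, station = task
    return 2 * manhattan(shelf, station)


def sequence_cost(robot_pos: Point, tasks: List[Task]) -> int:
    """Total travel distance for one robot to finish an ordered task list."""
    if not tasks:
        return 0
    # Travel from the robot's initial position to the first shelf.
    cost = manhattan(robot_pos, tasks[0][0])
    for k, task in enumerate(tasks):
        cost += task_cost(task)
        # After returning the shelf, move on to the next task's shelf.
        if k + 1 < len(tasks):
            cost += manhattan(task[0], tasks[k + 1][0])
    return cost


# A robot at (0, 0) with two tasks.
example = [((1, 2), (5, 2)), ((3, 4), (5, 4))]
print(sequence_cost((0, 0), example))
```

For this example the four contributions are 3 (initial move), 8 (first task), 4 (move between shelves), and 4 (second task).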
To make the overall allocation scheme as good as possible, we consider the following two optimization objectives:
(1) The maximum time taken by all robots to complete all tasks ($T_{\max}$)
(2) The mean distance traveled by all robots ($D_{\mathrm{mean}}$)
where
$$T_{\max} = \max_{1 \le i \le m} C_i, \qquad D_{\mathrm{mean}} = \frac{1}{m} \sum_{i=1}^{m} C_i. \quad (5)$$
$T_{\max}$ describes the efficiency of the robots in completing tasks. The smaller $T_{\max}$ is, the less time the robots take to complete all tasks, and the higher the efficiency is. $D_{\mathrm{mean}}$ describes the power consumption of the multirobot system. The smaller $D_{\mathrm{mean}}$ is, the shorter the total travel distance of all robots is, and the lower the power consumption is. The goal of the method studied in this paper is to assign all tasks in the system to the robots so that these two values are as small as possible.
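As a small illustration of the two objectives, the sketch below computes the maximum and the mean of a list of per-robot costs. It assumes, as in the text, that cost equals travel distance and that all robots move at the same speed, so the largest cost also stands for the longest completion time. The function names and the sample numbers are hypothetical.

```python
from typing import List


def max_completion_cost(costs: List[float]) -> float:
    """Objective 1: cost of the slowest robot (proxy for total completion time)."""
    return max(costs)


def mean_travel_distance(costs: List[float]) -> float:
    """Objective 2: mean distance over all robots (proxy for energy use)."""
    return sum(costs) / len(costs)


# Two hypothetical allocations of the same workload to three robots.
unbalanced = [30.0, 3.0, 3.0]
balanced = [13.0, 13.0, 13.0]

for name, costs in (("unbalanced", unbalanced), ("balanced", balanced)):
    print(name, max_completion_cost(costs), mean_travel_distance(costs))
```

The unbalanced allocation has the lower mean distance but the higher maximum, which is why the two objectives have to be traded off rather than minimized separately.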
3. Method
3.1. Architecture
As new orders arrive, new tasks are constantly generated and must be completed as soon as possible; so, the warehouse system is a highly dynamic and real-time system. In such a system, it is difficult to find the global optimal solution; so, the problem is divided into many subproblems. Specifically, we create a task pool P. When a new task is generated, it is immediately added to P. When the number of tasks in the task pool reaches the threshold value (automatic adjustment of the threshold is described in Section 3.3), the CMA-ES method in Section 3.2 is used to allocate the tasks in the task pool to robots. Each robot appends its newly allocated task sequence to the end of its unfinished task sequence, and then the task pool is emptied. The robots execute tasks according to their own task sequences, and executed tasks are deleted from the sequence. As new tasks are generated, they are again added to P. This loop continues until the warehouse stops running. The specific steps, shown in Figure 3, are as follows:
Step 1. Initialize the task pool size and set the task pool P to be empty. For all robots, initialize the task sequence of every robot.
Step 2. The threshold of the task pool size is automatically adjusted using the adaptive control strategy in Section 3.3.
Step 3. New tasks are constantly added to P. Jump to step 4 when the number of tasks in the task pool reaches the threshold.
Step 4. The tasks in the task pool are assigned to the robots using the CMA-ES method in Section 3.2, and for every robot, the new task sequence assigned to it is appended to the end of its current task sequence.
Step 5. Clear the task pool and jump to step 2.
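The pooling loop in Steps 1 to 5 can be sketched as a small controller class. This is only an illustration of the control flow: the class name `TaskPoolController`, its methods, and the `round_robin` allocator are hypothetical, and `round_robin` is a deliberately trivial placeholder standing in for the CMA-ES allocation of Section 3.2. Threshold adaptation (Step 2) is left out here.

```python
from typing import Callable, Dict, List

Allocator = Callable[[List[str], Dict[str, List[str]]], Dict[str, List[str]]]


def round_robin(pool: List[str], sequences: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Placeholder allocator (NOT CMA-ES): deal pooled tasks out in turn."""
    robots = list(sequences)
    return {r: pool[i::len(robots)] for i, r in enumerate(robots)}


class TaskPoolController:
    def __init__(self, robot_ids: List[str], threshold: int,
                 allocate: Allocator = round_robin) -> None:
        self.pool: List[str] = []              # Step 1: empty task pool
        self.threshold = threshold             # Step 1: initial pool size
        self.allocate = allocate
        self.sequences: Dict[str, List[str]] = {r: [] for r in robot_ids}

    def add_task(self, task: str) -> bool:
        """Step 3: pool the task; returns True if an allocation was triggered."""
        self.pool.append(task)
        if len(self.pool) < self.threshold:
            return False
        # Step 4: allocate pooled tasks and append them to each robot's queue.
        for robot, new_tasks in self.allocate(self.pool, self.sequences).items():
            self.sequences[robot].extend(new_tasks)
        # Step 5: clear the pool for the next round.
        self.pool.clear()
        return True


controller = TaskPoolController(["r1", "r2"], threshold=3)
for t in ["a", "b", "c", "d"]:
    controller.add_task(t)
print(controller.sequences, controller.pool)
```

Passing the allocator in as a function keeps the pooling logic independent of the optimizer, so the placeholder can be swapped for any solver with the same signature.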

The above solution in Figure 3 is executed by the central controller, and each robot only needs to execute tasks according to its assigned task sequence. Because the two parts run in parallel, the robots can stay busy all the time, which saves time and meets the real-time requirement of the storage system.
3.2. CMA-ES Algorithm
As mentioned in Section 3.1, tasks are assigned to robots when the number of tasks in the task pool reaches the threshold. This problem is regarded as an optimization problem in a static environment. It is an NP-hard problem, and the CMA-ES algorithm is used to search for the optimal solution. Its successful application in many fields [24–26] shows that the CMA-ES algorithm is a good search algorithm.
3.2.1. Representation of Solutions
Following ref. [27], for the task allocation problem with $n$ tasks and $m$ robots, a candidate representing a task assignment scheme is $X = (x_1, x_2, \ldots, x_n)$. $X$ contains $n$ real numbers, and each real number $x_i$ satisfies $1 \le x_i < m + 1$, where $\lfloor x_i \rfloor = j$ means that task $t_i$ is performed by robot $r_j$, and $\lfloor x \rfloor$ denotes the integer part of the real number $x$. If $\lfloor x_i \rfloor = \lfloor x_j \rfloor$, tasks $t_i$ and $t_j$ are both assigned to the same robot, and the task with the smaller of $x_i$ and $x_j$ is executed first. If $x_i = x_j$, the execution order of these two tasks is determined randomly.
For example, suppose there are 8 tasks (represented by numbers 1, 2, 3,..., 8) and 3 robots (represented by numbers 1, 2, 3), and an individual [1.7, 3.8, 2.2, 1.3, 2.8, 1.5, 3.3, 3.7] is generated. Then, the task sequence assigned to robot 1 is (4, 6, 1). The task sequence assigned to robot 2 is (3, 5). The task sequence assigned to robot 3 is (7, 8, 2).
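The decoding rule can be checked with a short sketch. It assumes the encoding described above: the integer part of each real number selects the robot, and within a robot, tasks run in increasing order of their real numbers. The function name `decode` is hypothetical. One simplification is labeled in the code: exact ties are broken by task index here, whereas the text breaks them randomly.

```python
import math
from typing import Dict, List


def decode(candidate: List[float], num_robots: int) -> Dict[int, List[int]]:
    """Turn a real-valued candidate into one ordered task list per robot.

    Tasks and robots are numbered from 1, as in the worked example.
    """
    for x in candidate:
        if not 1 <= x < num_robots + 1:
            raise ValueError(f"gene {x} outside [1, {num_robots + 1})")
    sequences: Dict[int, List[int]] = {r: [] for r in range(1, num_robots + 1)}
    # Visit tasks in increasing gene order so each robot's list is already
    # sorted. Simplification: sorted() is stable, so ties fall back to task
    # index instead of a random order.
    for i in sorted(range(len(candidate)), key=lambda k: candidate[k]):
        robot = math.floor(candidate[i])
        sequences[robot].append(i + 1)
    return sequences


individual = [1.7, 3.8, 2.2, 1.3, 2.8, 1.5, 3.3, 3.7]
print(decode(individual, num_robots=3))
# {1: [4, 6, 1], 2: [3, 5], 3: [7, 8, 2]}
```

Because every point in the continuous search space decodes to a valid assignment, a real-valued optimizer can search it without repair operators.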
3.2.2. Fitness Function
The fitness function is used to evaluate candidates. For the CMA-ES algorithm, individuals with a lower fitness value are better. In Section 2, two optimization goals are proposed for the whole system: the first is the time for the robots to complete all tasks; the second is the mean driving distance of all robots. Each planning round can be regarded as a subproblem of the whole. For each subproblem, these two goals are still considered in order to achieve the best overall performance; so, following [16], the fitness function combines the maximum time taken by the robots and the mean distance traveled by all robots, using a constant that can be adjusted according to the actual demand. If more attention is paid to the completion time of a single order, the constant can be increased. If more attention is paid to the energy consumption of all robots, the constant can be reduced. Both terms are computed from $C_i$, the cost of robot $r_i$ to execute the tasks in its current task sequence first and then the tasks given by the candidate. At the current moment, there may be unfinished tasks in the task sequence, and the robot must complete these tasks before performing the newly assigned ones. Therefore, $C_i$ is divided into two parts:
$$C_i = C_i^{\mathrm{cur}} + C_i^{\mathrm{new}},$$
where $C_i^{\mathrm{cur}}$ is the cost for the robot to complete the tasks in the current task sequence, and $C_i^{\mathrm{new}}$ is the cost for the robot to execute the tasks according to the candidate. Both are represented by the distance traveled by the robot and calculated using the method described in Equation (1).
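A fitness evaluation along these lines might look like the sketch below. The exact formula is an assumption: the text only establishes that the fitness combines the maximum time and the mean distance through an adjustable constant and that lower is better, so the form used here, `weight * max + mean`, is one plausible reading rather than the paper's equation. The function name `fitness` and the parameter `weight` are hypothetical.

```python
from typing import List


def fitness(current_costs: List[float], candidate_costs: List[float],
            weight: float = 1.0) -> float:
    """Lower is better. ASSUMED form: weight * max(C_i) + mean(C_i).

    current_costs[i]   : cost of robot i's unfinished (backlog) tasks
    candidate_costs[i] : cost of the tasks the candidate gives robot i
    """
    # Each robot must finish its backlog before the newly assigned tasks.
    totals = [cur + new for cur, new in zip(current_costs, candidate_costs)]
    max_cost = max(totals)                 # proxy for completion time
    mean_cost = sum(totals) / len(totals)  # proxy for energy consumption
    return weight * max_cost + mean_cost


backlog = [10.0, 0.0, 4.0]
# Candidate A piles work on the already busy robot; candidate B avoids it.
candidate_a = [12.0, 5.0, 6.0]
candidate_b = [5.0, 12.0, 6.0]
print(fitness(backlog, candidate_a, weight=2.0))
print(fitness(backlog, candidate_b, weight=2.0))
```

Both candidates have the same mean cost here, so the difference in fitness comes entirely from the maximum term, which is what the backlog is meant to influence.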
With this fitness function, we try to find the optimal solution at that moment in each optimization and try to approximate the global optimal solution by this method.
3.3. Automatic Adjustment of Task Pool
When the number of tasks in the task pool reaches the threshold, the tasks in the task pool are assigned to the robots. The threshold plays a decisive role in the efficiency of assignment. The larger the threshold is, the more tasks are involved in each optimization, and the closer the planned scheme is to the global optimal solution. If an optimization contained all the tasks in the system, the optimal solution it found would be the optimal solution of the whole system. But orders in the warehouse are added dynamically over time, so tasks are also generated dynamically. As the threshold increases, the time required to fill the task pool also increases, and the following situation can occur: a robot has finished all the tasks assigned to it, but the number of tasks in the task pool has not reached the threshold; so, the next optimization cannot start, and the robot can only wait. This wastes time and fails to meet the real-time requirement of the warehouse system. Moreover, because each workstation has an order capacity limit, there is also an upper limit on the total number of tasks in the system; if the task pool size exceeds this upper limit, the number of tasks in the task pool will never reach the threshold, and the system will stall. Therefore, it is very important to set a threshold of appropriate size.
Obviously, for different warehouses, the threshold should be set differently depending on the actual situation. Even for the same warehouse, the number of robots may be adjusted, and the rate of order generation may vary at different times; so, it is not appropriate to set the threshold to a fixed value. Therefore, we design an adaptive control strategy to dynamically adjust the task pool, as shown in Algorithm 1.
Algorithm 1: Adaptive control strategy for adjusting the task pool threshold.
First, the setting of the initial threshold is important because it determines how quickly the optimal threshold is found. We believe that the size of the initial threshold should be related to the number of robots and the upper limit on the number of tasks in the warehouse, which in turn is related to the number of workstations and the capacity of each workstation. So, we propose a heuristic formula, Equation (8), that calculates the initial threshold from three quantities: a constant representing the average number of tasks per workstation in unit time, which is set according to the actual situation; the number of stations; and the number of robots. We set a time interval (a constant that can be set according to actual requirements), and the threshold is adjusted once every interval (line 1). A flag records the last adjustment. We count the total number of tasks completed by the robots from the last adjustment to the current moment, and the total number of tasks completed from the penultimate adjustment to the last adjustment; we refer to these as the current count and the previous count, respectively. If the current count is 0, indicating that the threshold has been set so high that the number of tasks has not reached it, the threshold is simply cut in half and the flag is updated (line 2, line 3, and line 4). If the current count is greater than or equal to the previous count, the last adjustment has had a positive effect on the system, and the same adjustment is performed again (line 5 and line 6). If the current count is less than the previous count, the last adjustment had a negative effect on the system, and the reverse adjustment is performed (line 7 and line 8). In addition, the flag is reversed (line 9).
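The adjustment rule described above can be sketched as follows. Several details are assumptions, since the text does not fix them: the flag is modeled as a direction of +1 (grow) or -1 (shrink), the adjustment is a fixed `step` of one task, halving sets the direction to shrink, and the threshold is kept at or above a `min_threshold` of 1. The names `ThresholdState` and `adjust` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThresholdState:
    threshold: int
    direction: int = 1  # assumed flag: +1 = last adjustment grew the pool, -1 = shrank it
    step: int = 1       # assumed adjustment size


def adjust(state: ThresholdState, completed_now: int, completed_before: int,
           min_threshold: int = 1) -> int:
    """Run one adjustment; call once per time interval.

    completed_now    : tasks completed since the last adjustment
    completed_before : tasks completed between the two previous adjustments
    """
    if completed_now == 0:
        # Pool never filled: threshold is far too high, so halve it.
        state.threshold //= 2
        state.direction = -1  # assumption: halving counts as a shrink
    elif completed_now >= completed_before:
        # Last adjustment helped: repeat it.
        state.threshold += state.direction * state.step
    else:
        # Last adjustment hurt: reverse the flag and adjust the other way.
        state.direction = -state.direction
        state.threshold += state.direction * state.step
    state.threshold = max(min_threshold, state.threshold)
    return state.threshold


state = ThresholdState(threshold=32)
history = [adjust(state, now, before)
           for now, before in [(100, 90), (80, 100), (85, 80), (0, 85)]]
print(history)  # [33, 32, 31, 15]
```

This is a simple hill-climbing rule: it keeps moving the threshold in whichever direction last improved the completed-task count and turns around when the count drops.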
4. Experiments
We used RAWSim-O [22], an open source framework developed by Merschformann et al., as the experimental platform. RAWSim-O is a simulation framework that simulates the operation of an intelligent warehouse system and allows us to test our own methods.
We used the warehouse layout shown in Figure 2. In this layout, there are 32 robots and 550 shelves. The storage positions of the shelves are in the middle area of the layout. There are four replenishment stations on the left and four picking stations on the right. To simplify the problem, we set the duration of a robot's stay at a workstation to a very small value of 0.1.
To assess performance, we take as handled units the sum of the stock keeping units (SKUs) in the item bundles stored at the replenishment stations and in the orders picked at the picking stations. This represents the throughput of the warehouse, and the higher the better. We also look at the average distance traveled by robots to handle each unit, which represents the power consumption of the multirobot system.
To test the impact of the task pool threshold on the allocation, we ran 56 experiments, each with a different pool size. Each experiment was simulated for 24 hours with 10 repetitions.
Under different task pool sizes, the number of units handled by robots is shown by the blue solid line in Figure 4, and the average distance traveled by robots to handle each unit is shown by the blue solid line in Figure 5. The comparison of handled units and travel distance per unit among the different fixed thresholds is shown in Table 1. The maximum number of handled units is 207583, reached when the fixed threshold is set to 18. The minimum travel distance per unit is 10.73, reached when the fixed threshold is set to 36, 45, or 47. According to Figures 4 and 5 and Table 1, setting the threshold too large or too small is harmful, which is consistent with our conjecture. If the threshold is set too small, the solution is too far from the global optimal solution; therefore, the number of handled units is small, and the travel distance per unit is large. If the threshold is set too large, the solution is closer to the global optimal solution, so the travel distance per unit is small; but the robots wait for a long time, and therefore the number of handled units is small.


To sum up, a bad threshold can be very inefficient; so, setting the threshold manually is very risky. Therefore, a method of automatically adjusting the threshold is necessary. We repeated the experiment using the proposed adaptive control strategy, with all conditions identical except the threshold. The constant in Equation (8) was set according to the workstation capacity, which gave an initial threshold of 32. The results are shown in Table 1. We compared the results with the fixed threshold approach, as shown in Figures 4 and 5. The red dotted line is the adaptive threshold method, and the blue solid line is the fixed threshold method. Compared with the fixed threshold of 18, the adaptive threshold method handles fewer units but achieves a shorter travel distance per unit. Compared with the fixed thresholds of 36, 45, and 47, the adaptive threshold method handles more units but has a longer travel distance per unit. Taken together, the two figures show that, in both indexes, the adaptive threshold method comes close to the level reached when the threshold is set to its optimal value. The experimental results show that the proposed adaptive control strategy works well in application.
5. Conclusion
To solve the dynamic, real-time problem of multirobot task allocation in the intelligent warehouse system, this paper proposes a combined solution based on an adaptive task pool strategy and the CMA-ES algorithm. In the first stage of the solution, the divide-and-conquer idea is used to design a variable task pool that stores dynamically added tasks. The variable task pool dynamically divides the continuous, large-scale task allocation problem into small-scale subproblems so that dynamic requirements can be met. An adaptive control strategy automatically adjusts the threshold of the task pool size in real time to achieve a trade-off among throughput, energy consumption, and waiting time, which adapts better than manually adjusting the size of the task pool. In the second stage of the solution, when the task pool is full, the tasks in the task pool are assigned to robots using the CMA-ES algorithm, which searches for the optimal task assignment for all the robots according to a fitness function that includes the maximum time and the mean travel distance required by all robots to complete all the tasks. In a comparison with the fixed threshold method under 56 different task pool sizes, the experimental results show that, with the adaptive threshold method, the number of handled units comes close to the optimal level and the average travel distance per unit is lower; so, the adaptive threshold solution indeed has better adaptability. This method can satisfy the dynamic and real-time requirements and can be effectively applied to the intelligent warehouse system.
However, because of the complexity and dynamics of the warehouse environment, measuring the cost by Manhattan distance may not be accurate. Therefore, introducing an accurate robot motion model to evaluate the cost is the next step of this work. Furthermore, the relationships among handled units, travel distance per unit, the maximum time taken by all robots to complete all tasks, and the mean distance traveled by all robots need further study. In addition, the effect of communication quality on allocation is not taken into account here and will be studied in depth.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Acknowledgments
This work is supported by the Science and Technology Program of Yantai, China (Grant No. 2019XDHZ085), Major Basic Research Project of Natural Science Foundation of Shandong Province, China (Grant No. ZR2018ZC0438), National Natural Science Foundation of China (Grant No. 61673200), and Laboratory of Robotics in Ludong University.