Abstract

With the advent of the post-epidemic era, the development of digital logistics operations management is imminent. Among the various logistics delivery methods, same-city delivery is chosen by the vast majority of customers for its timeliness and safety. Online ordering and delivery methods for same-city delivery are also gaining increasing attention from enterprises, which need to know the inventory balance of all same-city warehouses in time for early deployment and response. In practice, however, the inventory balance of each warehouse can be affected by other warehouses in the same city, and data are often missing from the inventory management system due to equipment and other issues, leaving companies poorly placed to handle emergencies. To address these issues, an improved matrix decomposition model was designed to interpolate the missing data by taking into account the spatiotemporal correlation between warehouses. The L-curve criterion was used to select hyperparameter values, a spatiotemporal regularizer was used to capture the time dependence of the time series, and model performance was evaluated using the root mean square error (RMSE) and mean absolute percentage error (MAPE). Comparisons with classical interpolation techniques were made to validate the improved performance of the proposed method.

1. Introduction

The emergence of the metaverse concept in recent years has promoted digital development. With the advent of digital logistics, customers are increasingly demanding of platform-based e-commerce companies [1]. Because platform-based e-commerce enterprises operate many physical warehouses, orders are fragmented, statistics are untimely, and equipment errors cannot be transmitted in real time, making it difficult for companies to know the inventory balance of same-city physical warehouses in time. This easily leads to problems such as low sensitivity of inventory scheduling and the inability of companies to respond promptly after customers adjust their delivery times. Integrating same-city physical warehouse data with enterprise systems is therefore urgent [2]. The completeness of the same-city physical warehouse data has a direct impact on integration performance, so interpolating missing data in same-city physical warehouses is important [3]. However, current research on the interpolation of missing data in logistics is less developed than in fields such as transportation [4] and meteorology [5]. The main reasons are that logistics data are redundant and are influenced by synergies across regions, so existing approaches impose an additional computational burden on the model. Regularized decomposition is a fast way to handle missing data: different regularization terms are applied to the low-rank matrix decomposition problem, dramatically decreasing the time cost, and its flexible use clearly achieves better performance [6]. To improve interpolation performance, researchers at home and abroad have carried out research on regularized data interpolation models.

In the logistics field, new technology, new markets, new business models, and new customer expectations are causing the logistics industry to face immense change, which brings both risks and opportunities. Big data technology makes the logistics transportation process easier to manage, so that goods can be quickly transported to the designated location, ensuring the efficiency of logistics transportation [7].

For example, a time-series analysis of the dynamic variables and their interrelated short- and long-run influence on the relationship between the modern logistics industry and economic growth implies that logistics industrial development is comparatively quicker in geographical areas where economic growth is higher than in areas where it is low [8]. Logistics 4.0 details the strategic direction of future logistics in the industrial sector [9]. For logistics inventory, BP neural networks can be used to explore the complex relationship between inventory demand and various influencing factors to obtain effective measures for inventory control [10]. Long Short-Term Memory (LSTM), as a variant of the RNN, has been applied to logistics data processing because it is ideally suited to problems that are highly correlated with time series; for example, LSTM has been used to build a prediction model for stock price forecasting in the logistics industry [11]. To improve the overall efficiency of the logistics system and the level of customer service, the coordinated development of urban intelligent transportation data systems and supply chain management has been proposed, with the aim of studying the importance and advantages of such coordination [12]. Unfortunately, few of these analyses address missing data in logistics.

In machine learning, different issues arise during the statistical or data-driven control of interpolation methods, resulting in discrepancies between the interpolated and true values. The most popular imputation methods rely on discriminative models such as decision trees, regression models, and neighborhood-based methods, which are relatively interpretable. Unfortunately, systematically missing data may yield biased sample statistics, which causes these methods to generalize poorly. To address this limitation, other imputation methods turn to generative models to improve imputation accuracy, since the structural assumptions encoded in such models can lead to better generalization. A multilayer perceptron-multivariate imputation by chained equations regression method, optimized with the limited-memory BFGS algorithm, has been proposed that considers the temporal and spatial characteristics of traffic volume [13]. Vehicle dispatching is one of the most critical problems for online ride-hailing platforms, which must adapt their operation and management strategies to the dynamics of demand and supply; a single-agent deep reinforcement learning approach called deep dispatching reallocates vacant vehicles in advance to regions with a large demand gap [14]. A Bayesian interpolation model has been proposed to characterize the data generation process and learn the underlying statistical patterns in traffic data [15]. Considering the spatiotemporal nature of IoT data and the uncertainty of the data collected by sensors, Bayesian maximum entropy (BME) offers a convenient means of estimating missing values in IoT applications [16]. On the other hand, state-of-the-art imputation methods are based on the graph Laplacian (GL) transform for semisupervised feature extraction, where scatter-matrix-fused features update the output weights online in real time [17]. Extreme learning machines, with their ability to extract features, realize multivariate real-time prediction of time series; the model is completed according to the general matrix and the low-rank decomposition by regularized matrix factorization (MF). The decomposed matrix factors are weighted by a common symmetric matrix to obtain a new matrix completion model with regularized weighting functions, from which the optimal completion matrix for data completion is obtained [18].

Regarding software implementations: SPSS uses mean filling, which is not suitable for nonrandomly missing datasets; Amos uses the full-information maximum likelihood method, which is suitable for randomly missing datasets; Solas uses multiple regression filling, which is not suitable for nonrandomly missing datasets; MATLAB is suitable when fast execution and step-by-step display of results are needed; and Python is suitable when data packages can be called directly.

These studies highlight that most models rely heavily on mathematics, and there is little research on models for interpolating missing datasets in logistics. Modeling errors due to different data types can affect the actual results. To address the problem of large-scale fine-grained traffic state prediction, a deep learning architecture has been proposed to handle the challenges of data scale, granularity, and sparsity; based on domain knowledge of traffic engineering, the propagation of traffic state is analyzed, and four modules are specially designed for the temporal propagation of traffic state, the spatial propagation of traffic state, and related aspects [19]. To achieve higher performance for interpolating missing data in logistics, this paper designs an improved LSTM and Graph Laplacian Regularized Matrix Factorization (LSTM-GL-ReMF) system [20]. The L-curve, the temporal regularizer LSTM, and the spatial regularizer GL are combined to build an interpolation model for the inventory balance dataset of physical warehouses in each region of the same city. The L-curve is used to select the hyperparameter values [21]. The temporal regularizer LSTM captures temporal dependencies in the time-series data, and the spatial regularizer GL captures spatial correlations between different warehouses, solving the problem that interpolation results considering only individual warehouses are far from reality. The model is evaluated using two metrics, RMSE and MAPE, and compared with other state-of-the-art decomposition models to demonstrate the superior interpolation performance of the improved LSTM-GL-ReMF model on missing logistics data. The main contributions of this paper are summarized as follows:

(i) The L-curve criterion is used to select the values of the regularized hyperparameters, increasing the interpretability of the hyperparameter values.
(ii) The L-curve criterion is fused with the LSTM and GL algorithms to construct the model.
(iii) The improved model is applied to the domain of logistics operations management.

2. Fundamentals

This section presents the basic theory and modeling principles of this thesis, including the definitions of the matrix operations, the matrix decomposition steps, and the theory of the LSTM-GL-ReMF model. The matrix operations are the basis of matrix decomposition, and the improved model is optimized on the basis of these principles.

2.1. Matrix Decomposition

Definition 1. Given two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{p \times q}$, the Kronecker product $A \otimes B \in \mathbb{R}^{mp \times nq}$ is as follows:

$$A \otimes B = \begin{bmatrix} a_{11} B & a_{12} B & \cdots & a_{1n} B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} B & a_{m2} B & \cdots & a_{mn} B \end{bmatrix}. \tag{1}$$

Definition 2. Given two matrices $A = \left[ a_1, \ldots, a_n \right] \in \mathbb{R}^{m \times n}$ and $B = \left[ b_1, \ldots, b_n \right] \in \mathbb{R}^{p \times n}$, which have the same number of columns, the Khatri–Rao (KR) product is as follows [22]:

$$A \odot B = \left[ a_1 \otimes b_1, \ a_2 \otimes b_2, \ \ldots, \ a_n \otimes b_n \right] \in \mathbb{R}^{mp \times n}. \tag{2}$$

The matrix decomposition steps (Algorithm 1) are as follows:
There are three matrices: the scoring matrix $R \in \mathbb{R}^{m \times n}$ and its two factors, the user matrix $P \in \mathbb{R}^{m \times K}$ and the item matrix $Q \in \mathbb{R}^{K \times n}$, which are trained using either stochastic or batch gradients combined with a gradient descent method with repeated sampling. The specific formulas are as follows: first calculate the approximation and its minimal difference as shown in equations (3) and (4), where K is the number of latent features and $e_{ij}$ is the minimum value of the difference [23]:

$$\hat{r}_{ij} = \sum_{k=1}^{K} p_{ik} q_{kj}, \tag{3}$$

$$e_{ij}^2 = \left( r_{ij} - \hat{r}_{ij} \right)^2. \tag{4}$$

The iterative matrix elements $p_{ik}$ and $q_{kj}$ are then obtained with learning rate $\alpha$ and regularization weight $\beta$:

$$p_{ik} \leftarrow p_{ik} + \alpha \left( 2 e_{ij} q_{kj} - \beta p_{ik} \right), \qquad q_{kj} \leftarrow q_{kj} + \alpha \left( 2 e_{ij} p_{ik} - \beta q_{kj} \right). \tag{5}$$

Finally, a threshold value is set and the error is reduced by the error function until it falls below the threshold.

Input: scoring matrix R
Step 01: draw data from the initial sample using a sampling-with-replacement mechanism
Step 02: optimize the objective function on the drawn data and update the parameters
Step 03: repeat the data extraction until convergence
Output: user matrix P and item matrix Q
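For concreteness, the following is a minimal Python sketch of this procedure, assuming that missing entries of the scoring matrix are marked with NaN and that the rank K, learning rate, regularization weight, and stopping threshold take illustrative values:

```python
import numpy as np

def mf_sgd(R, K=3, lr=0.01, reg=0.02, epochs=200, tol=1e-4, seed=0):
    """Factorize scoring matrix R (NaN = missing) into a user matrix P
    (m x K) and an item matrix Q (K x n) by stochastic gradient descent."""
    m, n = R.shape
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((m, K))
    Q = 0.1 * rng.standard_normal((K, n))
    obs = [(i, j) for i in range(m) for j in range(n) if not np.isnan(R[i, j])]
    for _ in range(epochs):
        rng.shuffle(obs)                        # repeated sampling of observed entries
        sq_err = 0.0
        for i, j in obs:
            e = R[i, j] - P[i] @ Q[:, j]        # residual e_ij, equations (3)-(4)
            gp = 2 * e * Q[:, j] - reg * P[i]   # gradient steps, equation (5)
            gq = 2 * e * P[i] - reg * Q[:, j]
            P[i] += lr * gp
            Q[:, j] += lr * gq
            sq_err += e ** 2
        if sq_err / len(obs) < tol:             # stop once the error function
            break                               # falls below the threshold
    return P, Q                                 # completed matrix: P @ Q
```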
2.2. LSTM-GL-ReMF

In the LSTM-GL-ReMF model, the temporal regularizer introduces the recurrent neural network LSTM, whose extremely strong learning performance is used to discover hidden correlations in the samples. Hochreiter and Schmidhuber proposed the LSTM, which overcomes long-term dependencies and determines the best time window automatically [24]. The hidden layer sequence and the output sequence are calculated by equations (6) and (7), where $\sigma$ is the activation function, $x$ is the given sequence, $W$ is the weight matrix, $b$ is the bias vector, and $t$ is the time step [25]:

$$h_t = \sigma\left( W_{xh} x_t + W_{hh} h_{t-1} + b_h \right), \tag{6}$$

$$y_t = W_{hy} h_t + b_y. \tag{7}$$

The gate structure of the LSTM cell is

$$\begin{aligned} i_t &= \sigma\left( W_{xi} x_t + W_{hi} h_{t-1} + b_i \right), \\ f_t &= \sigma\left( W_{xf} x_t + W_{hf} h_{t-1} + b_f \right), \\ c_t &= f_t \circ c_{t-1} + i_t \circ \tanh\left( W_{xc} x_t + W_{hc} h_{t-1} + b_c \right), \\ o_t &= \sigma\left( W_{xo} x_t + W_{ho} h_{t-1} + b_o \right), \\ h_t &= o_t \circ \tanh\left( c_t \right), \end{aligned} \tag{8}$$

where $i$ is the input gate, $f$ is the forgetting gate, $c$ is the cell state, and $o$ is the output gate; $\sigma$ is the sigmoid activation function and $\tanh$ is the hyperbolic tangent activation function. The spatial regularizer is based on GL spatial regularization [26], which is able to incorporate complex spatial and temporal dependencies into the decomposition and thereby increase interpolation accuracy. Data smoothness is measured using

$$S(f) = \int_{M} \left\| \nabla_M f \right\|^2 \, dx, \tag{9}$$

where $\nabla_M f$ is used to denote the gradient of $f$ along the manifold $M$. Define the edge weight matrix as follows:

$$w_{ij} = \begin{cases} \exp\left( -\dfrac{\left\| x_i - x_j \right\|^2}{2\sigma^2} \right), & x_j \in N\left( x_i \right), \\ 0, & \text{otherwise}, \end{cases} \tag{10}$$

where $N(x_i)$ denotes the set of nearest neighbors of $x_i$. Subsequently, the discrete approximation is calculated as follows:

$$S(f) \approx \frac{1}{2} \sum_{i,j} w_{ij} \left( f_i - f_j \right)^2 = f^{\top} L f, \tag{11}$$

where $L = D - W$ is the graph Laplacian and $f$ is the mapping function evaluated at the data points. The objective function is then

$$\min_{U,V} \left\| P_{\Omega}\left( Y - U V^{\top} \right) \right\|_F^2 + \lambda \, \mathrm{tr}\left( U^{\top} L U \right), \tag{12}$$

where $\mathrm{tr}(\cdot)$ is the matrix trace and $\lambda$ is the regularization parameter [27].
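As a concrete illustration of the spatial term, the sketch below builds the Laplacian L = D − W from the Gaussian-weighted nearest-neighbor graph of equation (10) and evaluates the penalty tr(UᵀLU) of equations (11) and (12); the warehouse coordinates, bandwidth σ, and neighborhood size k are illustrative assumptions, not values from the experiments.

```python
import numpy as np

def graph_laplacian(X, k=2, sigma=1.0):
    """Graph Laplacian L = D - W of a Gaussian-weighted k-nearest-neighbor
    graph over point coordinates X (n points x d dimensions)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)    # pairwise squared distances
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]                  # k nearest neighbors of x_i
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma**2)) # equation (10)
    W = np.maximum(W, W.T)                                 # symmetrize edge weights
    return np.diag(W.sum(axis=1)) - W

def gl_penalty(U, L):
    """Spatial regularizer tr(U^T L U) = (1/2) sum_ij w_ij ||u_i - u_j||^2."""
    return np.trace(U.T @ L @ U)

# Example: four warehouses on a ring, three clustered on one side, one opposite.
X = np.array([[0.0, 0.0], [1.0, 0.3], [0.5, 1.0], [6.0, 4.0]])
L = graph_laplacian(X, k=2, sigma=2.0)
```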

3. Model Development and Refinement

This section describes the improvements made to the model. The improved model is more applicable to the type of dataset presented in this paper.

3.1. Problem Definition

With the development of e-commerce, same-city delivery has been favored by more users, for example, scheduled delivery within the same city. However, due to various external factors, customers' scheduled times change easily, and at such times enterprises need overall control of same-city inventory and timely, reasonable scheduling. At present, when most enterprises use a Warehouse Management System (WMS) for data integration, some data are often missing, leaving enterprises unable to monitor the overall inventory in the same city. Based on the close spatiotemporal connection among same-city warehouse inventories, a modified LSTM-GL-ReMF algorithm was designed to interpolate the missing data.

3.2. L-Curve Guidelines

Lemma 1. Consider the operator equation on Banach spaces X and Y:

$$K x = y, \qquad x \in X, \ y \in Y, \tag{13}$$

where K is a bounded linear compact operator from X to Y, y is the measured data item, $y^{\delta}$ is the data after being perturbed by the error, and $\left\| y - y^{\delta} \right\| \leq \delta$ with error level $\delta$.

Lemma 2. In the plane, the points

$$\left( \log\left\| K x_{\lambda}^{\delta} - y^{\delta} \right\|, \ \log\left\| x_{\lambda}^{\delta} \right\| \right) \tag{14}$$

form a monotonically decreasing L-curve, the point of greatest curvature being the corner of the L-curve [28], where $x_{\lambda}^{\delta} = R_{\lambda} y^{\delta}$ and $\left\{ R_{\lambda} \right\}$ is a family of uniformly bounded regularization operators. The curvature function of the L-curve with $\lambda$ as a parameter is [29]

$$\kappa(\lambda) = \frac{\rho'(\lambda)\, \eta''(\lambda) - \eta'(\lambda)\, \rho''(\lambda)}{\left( \rho'(\lambda)^2 + \eta'(\lambda)^2 \right)^{3/2}}, \tag{15}$$

where $\rho(\lambda) = \log\left\| K x_{\lambda}^{\delta} - y^{\delta} \right\|$ and $\eta(\lambda) = \log\left\| x_{\lambda}^{\delta} \right\|$. The cubic spline interpolation method can be used to approximate the corner and thus select the regularization parameter. The main idea is to form an increasingly accurate approximation of the L-curve by gradually increasing the number of nodes and to take the parameter corresponding to the point of maximum curvature on the final spline as the regularization parameter. The steps are as follows [30], with a sketch given after the list:

Step 1: give the initial regularization parameters $\lambda_i$ and their corresponding point pairs $\left( \rho_i, \eta_i \right)$.
Step 2: fit cubic spline interpolants for $\rho$ and $\eta$ with respect to $\lambda$, denoting the interpolation functions by $\tilde{\rho}$ and $\tilde{\eta}$.
Step 3: calculate the point on the curve with the greatest curvature and let the corresponding parameter be $\lambda^{*}$.
Step 4: solve the regularization problem corresponding to $\lambda^{*}$ and denote the regularized solution by $x_{\lambda^{*}}^{\delta}$; from this solution, obtain the new point pair $\left( \rho^{*}, \eta^{*} \right)$.
Step 5: add the new point pair and parameter to the previous point pairs and parameters.
Step 6: repeat steps 2 to 5 with the point pairs and parameters obtained in step 5 until convergence.
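A minimal sketch of this procedure for a Tikhonov-regularized least-squares problem is given below; the λ grid, the dense evaluation grid, and the use of SciPy's CubicSpline are assumptions of the sketch rather than the exact implementation used in the experiments.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def lcurve_corner(K, y, lambdas):
    """Select the regularization parameter at the L-curve corner: cubic-spline
    rho = log||Kx - y|| and eta = log||x|| over log(lambda), then return the
    lambda maximizing the curvature of equation (15)."""
    rho, eta = [], []
    for lam in lambdas:  # Tikhonov solution x = (K^T K + lam I)^{-1} K^T y
        x = np.linalg.solve(K.T @ K + lam * np.eye(K.shape[1]), K.T @ y)
        rho.append(np.log(np.linalg.norm(K @ x - y)))
        eta.append(np.log(np.linalg.norm(x)))
    t = np.log(lambdas)                          # lambdas must be increasing
    r, e = CubicSpline(t, rho), CubicSpline(t, eta)
    tt = np.linspace(t[0], t[-1], 1000)          # dense grid between the nodes
    r1, r2, e1, e2 = r(tt, 1), r(tt, 2), e(tt, 1), e(tt, 2)
    kappa = (r1 * e2 - e1 * r2) / (r1**2 + e1**2) ** 1.5
    return np.exp(tt[np.argmax(np.abs(kappa))])

# Example usage: lam = lcurve_corner(K, y, np.logspace(-6, 2, 30))
```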

3.3. Model Construction

The LSTM is applied to the time dimension of the training set, the GL is applied to the spatial dimension, and the L-curve criterion is introduced to select the values of the regularization hyperparameters. On this basis, the improved model is used to interpolate the data for the test set of time series. The schematic diagram of the improved model is shown in Figure 1.

As shown in Figure 1, the powerful learning capability of the LSTM is used to discover concealed correlations in the data. The GL spatial regularizer computes discrete approximations from the nearest-neighbor graph of the scattered data points, and the corner of the L-curve is determined by cubic spline interpolation, which yields the value of the regularization hyperparameter.

The regularizers are able to incorporate complex spatial and temporal dependencies into the decomposition process by setting up an error calculator, embedding the current time, and interpolating the missing data in the current observation matrix. A complete time-series dataset is used as the training set to obtain the model parameters, and the dataset containing the missing data is then used as input to the model. The resulting data interpolation takes the spatiotemporal correlation into account.

4. Simulation Analysis

This section presents the experiments conducted in order to evaluate the algorithm proposed in this work.

4.1. Simulation Preparation
4.1.1. Dataset

The training set selected for this thesis was obtained by collation from the WMS intelligent warehouse management system. The software part of the WMS supports the operation of the whole system, including pick operation, shelf management, receive processing, replenishment management, matrix charging, platform management, warehouse operation, cycle counting, overstock operation, RF operation, and process management [31]. The WMS process is shown in Figure 2.

The data are in the form of a 4 × 744 matrix. The number 4 denotes the four warehouses, and 744 denotes a record of stock balances for each hour of the day over one month (31 days × 24 hours). The four warehouses are located in different areas of the chosen city in a circular configuration: three are close to each other on one side of the ring, and the remaining warehouse is on the other side of the ring structure. The test set is one month's stock balance data for a certain good in one warehouse, extracted directly from the WMS intelligent warehouse management system.
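For reference, the sketch below shows one hypothetical way to construct such a 4 × 744 observation matrix and mask it to a chosen missing rate for testing (e.g., the 15% and 35% rates used in Section 4.2); the synthetic uniform values stand in for the confidential WMS stock balances.

```python
import numpy as np

rng = np.random.default_rng(42)
# 4 warehouses x 744 hourly stock-balance records (31 days x 24 hours)
Y = rng.uniform(50.0, 500.0, size=(4, 744))

def mask_at_rate(Y, rate):
    """Return a copy of Y with entries replaced by NaN at the given rate."""
    M = Y.copy()
    missing = rng.random(Y.shape) < rate
    M[missing] = np.nan
    return M, missing

Y_test_15, m15 = mask_at_rate(Y, 0.15)   # 15% of entries missing
Y_test_35, m35 = mask_at_rate(Y, 0.35)   # 35% of entries missing
```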

4.1.2. Evaluation Indicators

The evaluation metric commonly used in machine-learning regression problems is the RMSE, which is intuitive as a model evaluation metric in terms of order of magnitude. Because MAPE is undefined when true values are zero and the original data in this paper contain no zeros, MAPE is also chosen as an evaluation metric, even though zeros may exist among the interpolated values [32].

The MAPE was chosen, with n the number of measurements, $y_i$ the true value, and $\hat{y}_i$ the interpolated value. The specific formula is as follows:

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%. \tag{16}$$

The RMSE was also chosen, with the following equation [33]:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}. \tag{17}$$
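Both metrics can be computed directly; in this sketch, y_true is the ground-truth series and y_pred the interpolated series.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, equation (17)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error, equation (16); true values must be nonzero."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))
```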

4.2. Simulation Studies

To validate the performance of the proposed improved model, a dataset of one month's inventory balance of a certain good in a warehouse was selected. The experiments were run on an Intel i5-7200U CPU @ 2.70 GHz in a Jupyter Notebook environment using Python 3. The values of the main parameters are listed in Table 1.

To verify the effectiveness and superiority of the improved model, simulations were conducted and the simulation results are quantitatively profiled in this paper. The interpolation results of the improved LSTM-GL-ReMF model are shown in Figure 3.

The interpolation results of the improved LSTM-GL-ReMF model are shown in the figure, where the blue bars indicate the actual data and the green lines indicate the simulated data, clearly showing how the model interpolates the missing data. The interpolated lines overlap the true values to a high degree. Because of the spatial correlation, the existence of zero values in the interpolated part is in line with the actual situation, as is the large fluctuation of the interpolated stock balance caused by the differing demand for goods by customers in different regions at different times of the year. The interpolated values for the end of September 2021 include a zero value, which may be related to regional user activities such as end-of-month clearance and the delayed start of the school season due to the epidemic.

A comparison is then made with other models. The problem studied in this paper is a regression problem; Bayesian algorithms are widely used for regression, and temporal matrix models are commonly used for time series. Bayesian probabilistic matrix factorization (BPMF), temporal regularized matrix factorization (TRMF), Bayesian temporal matrix factorization (BTMF), Bayesian temporal regularized matrix factorization (BTRMF) [34], LSTM-ReMF, and LSTM-GL-ReMF are chosen for comparative analysis. The simulation results are shown in Figure 4.

The blue bar charts in Figure 4 show the actual data, the vacant parts show the missing data, and the green lines show the model simulation data, clearly showing the interpolation of the missing data by each model. RMSE and MAPE were used as evaluation indicators, and the results are shown in Table 2.

To further investigate the generalization ability of the model, simulation experiments were conducted on data with missing rates of 15% and 35%, respectively. The error evaluation obtained is shown in Figure 5.

From the above results, it can be seen that the interpolation results that include spatiotemporal analysis are more realistic. The improved model in this paper performs better than the original LSTM-GL-ReMF model but is still affected by the missing rate of the data: the larger the missing rate, the larger the model error. The comprehensive simulation results show that the improved LSTM-GL-ReMF model can effectively interpolate missing inventory balance data in the logistics field with high performance, which accords with the practical observation that inventory balances in different regions of the same city are related to interregional user behavior.

5. Conclusion

In this paper, a method for interpolating missing data in logistics based on regularized decomposition is proposed. Regularization techniques are used to interpolate missing logistics data: the LSTM-GL-ReMF model is improved by considering spatiotemporal data characteristics, and the L-curve method is applied to select the regularization parameters. This provides a proof of concept for introducing missing-data interpolation models into the logistics domain. In addition, a rigorous theoretical derivation of the model's operating mechanism has been carried out, which provides theoretical support for the algorithm. The results show that the interpolation model, which takes into account the spatial correlation between the inventory balances of different same-city physical warehouses, is closer to the real situation and has higher interpolation accuracy. Future experiments will consider different types of missing data to find the range of missing rates with the best interpolation capability and thus further improve the model's interpolation performance.

Data Availability

The data used to support the findings of this study have not been made available because they are confidential within the company.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.