Abstract
To improve short-term demand prediction for e-commerce commodities, this paper applies a deep neural network algorithm to the problem and proposes a nonparametric supply chain demand prediction model based on a multilayer Bayesian network. In this model, hidden layer factors describe the internal relationship of customer demand over the time series, and bottom layer factors represent the actual customer demand. In addition, side information is taken into account directly to improve the accuracy of customer demand prediction. Simulation experiments show that the short-term demand predictions of the deep neural network-based method for e-commerce goods are close to the actual demand, indicating that the method can contribute to demand prediction for e-commerce goods.
1. Introduction
With the rapid development of e-commerce, Internet technology, big data technology, and mobile devices, the data that customers generate or save while shopping online has become easier to obtain, which makes it possible to collect full-sample data. By analyzing these data, e-commerce companies can identify the preferences of customer groups for different products and predict the demand and inventory of each product in different time periods. The prediction results provide decision support for production, operation, and management and a basis for inventory management, thereby reducing inventory costs and improving customer satisfaction [1]. At present, there are two main approaches to commodity demand prediction: qualitative prediction and quantitative prediction. Qualitative prediction is susceptible to subjective factors, is unsuitable for predicting a large number of commodities, and its accuracy varies greatly with the forecaster's familiarity with commodity sales [2]. Quantitative prediction methods include the simple moving average method and the weighted moving average method. Both are suitable only for commodities whose demand has been relatively stable and shows no seasonal trend. Moreover, in the weighted moving average method, the assignment of weights remains highly subjective [3].
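The two quantitative methods named above can be illustrated with a minimal Python sketch. It assumes a plain list of past demand values; the function names and the example weights are hypothetical, and the weights stand in for exactly the subjective choice the text criticizes.

```python
def simple_moving_average(demand, window):
    """Forecast the next period as the plain mean of the last `window` observations."""
    if window <= 0 or window > len(demand):
        raise ValueError("window must be between 1 and len(demand)")
    recent = demand[-window:]
    return sum(recent) / window


def weighted_moving_average(demand, weights):
    """Forecast the next period as a weighted mean of the most recent observations.

    weights[-1] applies to the most recent observation; choosing the weights is
    left to the forecaster (hypothetical example values below).
    """
    n = len(weights)
    if n == 0 or n > len(demand):
        raise ValueError("need between 1 and len(demand) weights")
    recent = demand[-n:]
    return sum(w * d for w, d in zip(weights, recent)) / sum(weights)


daily_demand = [120, 130, 125, 140, 135]
print(simple_moving_average(daily_demand, 3))             # mean of 125, 140, 135
print(weighted_moving_average(daily_demand, [1, 2, 3]))   # newest value weighted most
```

Neither forecast reacts to trends or seasonality, which is why both methods are limited to commodities with stable demand.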
It is therefore particularly urgent to mine valuable information from massive user-commodity interaction data, uncover the laws governing changes in commodity demand hidden behind the data, and find a robust prediction method [4]. Various dynamic and intermittent factors make this prediction task more difficult and challenging [5].
Under market economy conditions, competition among commodity sales enterprises is intensifying. To gain a competitive advantage, enterprises must understand customer needs, that is, when customers need which products, and then provide customers with more comprehensive and thoughtful services [6]. By mining historical sales data with time series association rules, we can obtain the correlations between different commodities at different times and thereby understand customer needs. However, current methods for mining temporal correlations among commodities mainly focus on discovering spatial association rules with temporal constraints, spatial correlations between different commodities that appear periodically [7], spatial correlations between commodities that appear periodically at multiple time granularities, and correlations in the demand for a single commodity at different times. Suppose there is the following time series association rule: 80% of customers buy product B a time ΔT after purchasing product A. Such a rule reflects the relevance of different commodities at different times and is therefore more meaningful for companies seeking to understand customer needs. Exploiting the fact that the sales data of commodity sales enterprises are linked to customers, the original sales data model can be converted into a customer data model grouped by customer, and a time series association rule mining algorithm has been proposed for this data model [8].
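A minimal sketch of how the confidence of a rule such as "customers buy B within ΔT after buying A" could be computed once sales data are grouped by customer. It assumes records of the form (customer, day, item) and reads "a time ΔT after" as "at most ΔT after"; the function names are hypothetical, and this is not the mining algorithm of [8].

```python
from collections import defaultdict


def group_by_customer(transactions):
    """Turn (customer, day, item) sales records into per-customer, time-ordered purchase lists."""
    history = defaultdict(list)
    for customer, day, item in transactions:
        history[customer].append((day, item))
    for purchases in history.values():
        purchases.sort()
    return history


def temporal_rule_confidence(history, a, b, delta_t):
    """Share of customers who bought `a` and then bought `b` at most `delta_t` days later."""
    bought_a = followed_by_b = 0
    for purchases in history.values():
        a_days = [day for day, item in purchases if item == a]
        if not a_days:
            continue  # the rule only concerns customers who bought `a`
        bought_a += 1
        b_days = [day for day, item in purchases if item == b]
        if any(0 < db - da <= delta_t for da in a_days for db in b_days):
            followed_by_b += 1
    return followed_by_b / bought_a if bought_a else 0.0


sales = [("c1", 1, "A"), ("c1", 3, "B"), ("c2", 2, "A"), ("c2", 10, "B"),
         ("c3", 5, "A"), ("c3", 6, "B"), ("c4", 1, "B")]
history = group_by_customer(sales)
print(temporal_rule_confidence(history, "A", "B", delta_t=3))  # c1 and c3 qualify, c2 does not
```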
Commodity demand forecasting is a typical time series forecasting problem. Reference [9] applies a seasonal SARIMAX model to forecast daily food retail sales. Reference [10] uses an artificial neural network to predict demand for weather-sensitive products in retail stores. Reference [11] uses a variety of traditional machine learning methods to predict holiday shopping. Reference [12] applies ensemble learning to analyze and predict Amazon's online book sales. Reference [13] conducts a more extensive review of machine learning techniques for commodity sales prediction. The rapid development of the Internet and the Internet of Things has led to rapid growth in business data and has placed new requirements on analysis and prediction in the big data environment. Reference [14] uses LSTM to predict oil production.
Since commodity transaction data are naturally distributed as time series, they can be predicted with the recurrent neural network (RNN) models of deep learning, among which LSTM is widely used because of its good performance [15]. In real e-commerce data, however, the distributions of different product types differ considerably, so processing them all in a unified way inevitably degrades performance to some degree. Reference [16] uses the historical sales trend graphs of commodities as commodity features and clusters them into large product categories, so that more reasonable predictions can be made within each category. It also organizes the inherent and dynamic characteristics of each product into two-dimensional data frames, convolves the dynamic-characteristic frame, and fuses the result with the inherent characteristics. Finally, it combines this representation with LSTM and a mixture density network (MDN) to further improve prediction performance.
Commodity sales forecasting estimates the expected sales in a future period based on the historical sales data of related commodities, combined with changes in market demand and other environmental conditions [17]. The main task of forecasting is to compare historical and current data, select appropriate and effective technical methods grounded in economic and other theories, use them to predict the direction of future changes and development trends, and make predictive decisions. The ability to predict likely sales in the next period can improve customer satisfaction, reduce unsalable products, increase sales revenue, enable more efficient production plans, and improve business planning [18]. Accurate sales forecasts help improve the efficiency and effectiveness of retail enterprises and supply chain operations. How to use more scientific and reasonable forecasting methods to avoid the higher inventory costs caused by purchasing too much or too little is therefore a current research focus. Regarding the definition of sales forecasting [19], some scholars regard it as an estimate of the sales volume or sales revenue of a certain product, serving as the basis on which enterprises purchase materials, arrange production, and make various decisions. Others regard it as an estimate of the sales volume and amount of a product over a specified future period. Yet another view holds that sales forecasting, guided by product purchase and sales, estimates and predicts purchase activities, sales activities, competition, price changes, and the final outcome [20].
This paper applies a deep neural network algorithm to predict the short-term demand for e-commerce commodities, which improves the stability between e-commerce inventory and procurement and promotes the stable operation of the e-commerce supply chain.
2. E-Commerce Product Prediction Algorithm
2.1. Bayesian Network Theory
In graph theory, a directed graph can be represented by $G = (V, E)$, where $V$ is a finite nonempty set whose elements are called nodes, and $E$ is a set of ordered pairs of distinct elements of $V$, whose elements are called directed edges. A directed graph is called a directed acyclic graph (DAG) if there is no node from which one can follow a sequence of directed edges and return to that same node. A typical example is shown in Figure 1.

Figure 1: panels (a), (b), and (c).
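The acyclicity condition can be checked mechanically with Kahn's topological-sort algorithm: repeatedly remove nodes that have no remaining incoming edges; the graph is a DAG exactly when every node gets removed. A short Python sketch (the function name is hypothetical):

```python
from collections import deque


def is_dag(nodes, edges):
    """Return True if the directed graph G = (V, E) contains no directed cycle (Kahn's algorithm)."""
    indegree = {v: 0 for v in nodes}
    successors = {v: [] for v in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(v for v in nodes if indegree[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # Nodes on a cycle never reach indegree 0, so they are never removed.
    return removed == len(nodes)


print(is_dag(["a", "b", "c"], [("a", "b"), ("b", "c")]))              # chain: acyclic
print(is_dag(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")]))  # cycle a -> b -> c -> a
```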
The definition of a Bayesian network can therefore be given as follows. Assume that the set of nodes in a Bayesian network is $X = \{X_1, X_2, \ldots, X_n\}$. The Bayesian network $N$ can be represented as a pair $N = \langle G, P \rangle$, where:
(1) $G = (V, E)$ is the directed acyclic graph describing the relationships between the nodes, called the structure of the Bayesian network.
(2) $P = \{P(X_i \mid \pi(X_i)) : i = 1, \ldots, n\}$, where $\pi(X_i)$ denotes the parent nodes of $X_i$ in $G$, characterizes the probabilistic relationships between the nodes and is called the parameters of the Bayesian network.
Specifically, Bayesian network can be understood from both qualitative and quantitative levels. At the qualitative level, it uses a directed acyclic graph to describe the dependent and independent relationships between variables. At the quantitative level, it uses conditional probability to describe the dependencies between variables.
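The quantitative level can be made concrete: in a Bayesian network the joint distribution factorizes as $P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \pi(X_i))$. The sketch below uses a hypothetical three-node network with made-up, purely illustrative probabilities; it is not a model from this paper.

```python
from itertools import product

# Hypothetical structure: Holiday -> HighDemand <- Promotion (all variables binary).
parents = {"Holiday": [], "Promotion": [], "HighDemand": ["Holiday", "Promotion"]}

# Conditional probability tables: P(node = True | parent values). Illustrative numbers only.
cpt = {
    "Holiday": {(): 0.1},
    "Promotion": {(): 0.3},
    "HighDemand": {
        (False, False): 0.2,
        (False, True): 0.6,
        (True, False): 0.5,
        (True, True): 0.9,
    },
}


def joint_probability(assignment):
    """P(x_1, ..., x_n) as the product over nodes of P(x_i | parents(x_i))."""
    prob = 1.0
    for node, pa in parents.items():
        key = tuple(assignment[p] for p in pa)
        p_true = cpt[node][key]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


# Marginal P(HighDemand = True): sum the joint over the remaining variables.
p_high = sum(
    joint_probability({"Holiday": h, "Promotion": p, "HighDemand": True})
    for h, p in product([False, True], repeat=2)
)
print(round(p_high, 4))
```

Only the local tables are stored, yet any joint or marginal probability can be recovered from them, which is the practical benefit of the DAG structure.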
The probabilistic basis of Bayesian network models is Bayes' theorem. Assume that $p(y \mid \theta)$ is the joint probability density function of a random observation vector $y$, and that $\theta$ is a parameter vector, which can also be regarded as random. The parameter vector may contain the coefficients of the model or the variance, covariance, etc. of the disturbance term. Let $\pi(\theta)$ denote the prior distribution of $\theta$; the posterior distribution $\pi(\theta \mid y)$ is the final distribution of the parameters. According to Bayes' theorem,
$$\pi(\theta \mid y) = \frac{p(y \mid \theta)\,\pi(\theta)}{p(y)}. \tag{1}$$
Because $p(y) = \int p(y \mid \theta)\,\pi(\theta)\,d\theta$ does not depend on $\theta$, it is a constant once the data $y$ have been observed. Formula (1) can therefore be abbreviated as $\pi(\theta \mid y) \propto p(y \mid \theta)\,\pi(\theta)$, a compact statement of Bayes' theorem: the posterior distribution is proportional to the product of the likelihood function and the prior distribution.
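The proportionality can be seen numerically with a grid approximation. The sketch assumes a Gaussian likelihood with known standard deviation 1 and a Gaussian prior $N(5, 2^2)$ for the mean $\theta$, both illustrative choices; dividing by the sum over the grid plays the role of $p(y)$ and does not change which $\theta$ is most probable.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


y = [4.0, 5.0, 6.0]                     # observed data (illustrative)
grid = [i / 10 for i in range(0, 101)]  # candidate values of theta in [0, 10]

prior = [normal_pdf(t, 5.0, 2.0) for t in grid]                           # pi(theta)
likelihood = [math.prod(normal_pdf(v, t, 1.0) for v in y) for t in grid]  # p(y | theta)

unnormalized = [l * p for l, p in zip(likelihood, prior)]  # p(y | theta) * pi(theta)
evidence = sum(unnormalized)  # grid normalizing constant, the same for every theta
posterior = [u / evidence for u in unnormalized]

map_theta = grid[max(range(len(grid)), key=lambda i: posterior[i])]
print(map_theta)
```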
Table 1 lists the probability density functions of common distributions and their corresponding conjugate distributions.
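As one standard conjugate pair, a Beta prior combined with a binomial likelihood yields a Beta posterior, so Bayesian updating reduces to adding counts. A minimal sketch with illustrative numbers (the function name is hypothetical):

```python
def beta_binomial_update(alpha, beta, successes, trials):
    """Posterior Beta parameters after observing `successes` in `trials` Bernoulli trials.

    Prior Beta(alpha, beta) with a binomial likelihood gives the posterior
    Beta(alpha + successes, beta + trials - successes).
    """
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return alpha + successes, beta + trials - successes


# Illustrative: prior belief about the share of visitors who buy an item, then 30 buyers in 100 visits.
a_post, b_post = beta_binomial_update(alpha=2, beta=8, successes=30, trials=100)
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, round(posterior_mean, 4))
```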
At present, there are two common methods for parameter learning: maximum likelihood estimation (MLE) and Bayesian estimation (BE). Both require the sample set to satisfy the independent and identically distributed (i.i.d.) assumption. Suppose the data set over $n$ variables contains $m$ samples $y = \{y_1, y_2, \ldots, y_m\}$. Each sample in $y$ then satisfies the following two conditions:
(1) Given the parameter $\theta$, the samples are mutually independent, that is, $p(y \mid \theta) = \prod_{i=1}^{m} p(y_i \mid \theta)$.
(2) Every sample has the same conditional probability distribution $p(y_i \mid \theta)$.
2.1.1. Maximum Likelihood Estimation
Maximum likelihood estimation follows the classical (frequentist) approach to statistics: it estimates the parameters according to the likelihood of the samples under the parameters, that is, $\hat{\theta} = \arg\max_{\theta} L(\theta \mid y)$, where $L(\theta \mid y) = p(y \mid \theta)$ is the likelihood function of $\theta$. Consider a sampling data set of $m$ i.i.d. samples $y = \{y_1, \ldots, y_m\}$. The maximum likelihood estimate of $\theta$ is derived as follows:
$$\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \prod_{i=1}^{m} p(y_i \mid \theta) = \arg\max_{\theta} \sum_{i=1}^{m} \ln p(y_i \mid \theta).$$
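Furthermore, if the sampling data set is assumed to obey the Gaussian distribution $N(\mu, \sigma^2)$, the likelihood function is
$$L(\mu, \sigma^2 \mid y) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - \mu)^2}{2\sigma^2}\right).$$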
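Therefore, setting the partial derivatives of $\ln L$ with respect to $\mu$ and $\sigma^2$ to zero, the estimates of $(\mu, \sigma^2)$ are
$$\hat{\mu} = \frac{1}{m}\sum_{i=1}^{m} y_i, \qquad \hat{\sigma}^2 = \frac{1}{m}\sum_{i=1}^{m} (y_i - \hat{\mu})^2.$$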
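The closed-form estimates can be checked numerically: the sketch computes them for a small illustrative sample and evaluates the Gaussian log-likelihood, which should not increase when either estimate is perturbed. Note that the MLE of the variance divides by $m$, not $m - 1$. Function names are hypothetical.

```python
import math


def gaussian_mle(samples):
    """Closed-form maximum likelihood estimates (mu_hat, sigma2_hat) for N(mu, sigma^2)."""
    m = len(samples)
    mu_hat = sum(samples) / m
    var_hat = sum((y - mu_hat) ** 2 for y in samples) / m  # divides by m, not m - 1
    return mu_hat, var_hat


def gaussian_log_likelihood(samples, mu, var):
    m = len(samples)
    return -0.5 * m * math.log(2 * math.pi * var) - sum((y - mu) ** 2 for y in samples) / (2 * var)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_hat, var_hat = gaussian_mle(data)
print(mu_hat, var_hat)
```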
2.1.2. Bayesian Estimation
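In Bayesian estimation, the idea of maximum a posteriori (MAP) estimation can be used to estimate parameters. Similarly, if Bayesian parameter learning is performed for $\theta$ on a sampling data set of $m$ samples, then
$$\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} \pi(\theta \mid y) = \arg\max_{\theta} \frac{p(y \mid \theta)\,\pi(\theta)}{p(y)} = \arg\max_{\theta} \left[\sum_{i=1}^{m} \ln p(y_i \mid \theta) + \ln \pi(\theta)\right].$$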
Here, $p(y)$ is a constant that does not depend on $\theta$ and can therefore be omitted.
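As in maximum likelihood estimation, assume that the sampling data set obeys the Gaussian distribution $N(\mu, \sigma^2)$ and that the parameters $\mu$ and $\sigma^2$ obey the prior distributions $\pi(\mu)$ and $\pi(\sigma^2)$, respectively. The optimal estimates $(\hat{\mu}, \hat{\sigma}^2)$ are then
$$(\hat{\mu}, \hat{\sigma}^2) = \arg\max_{\mu,\, \sigma^2} \left[\sum_{i=1}^{m} \ln N(y_i \mid \mu, \sigma^2) + \ln \pi(\mu) + \ln \pi(\sigma^2)\right].$$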
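For one concrete case, assume $\sigma^2$ is known and place a normal prior $\mu \sim N(\mu_0, \tau^2)$ on the mean (an illustrative choice, not the paper's specification). Setting the derivative of the objective with respect to $\mu$ to zero gives a closed form that shrinks the sample mean towards $\mu_0$; the sketch below (function name hypothetical) shows how the prior strength $\tau^2$ controls that shrinkage.

```python
def gaussian_mean_map(samples, sigma2, mu0, tau2):
    """MAP estimate of mu for y_i ~ N(mu, sigma2) with known sigma2 and prior mu ~ N(mu0, tau2).

    Maximizing sum_i log N(y_i | mu, sigma2) + log N(mu | mu0, tau2) gives
    mu_map = (tau2 * sum(y) + sigma2 * mu0) / (m * tau2 + sigma2).
    """
    m = len(samples)
    return (tau2 * sum(samples) + sigma2 * mu0) / (m * tau2 + sigma2)


data = [10.0, 12.0, 14.0]  # sample mean (the MLE) is 12.0
print(gaussian_mean_map(data, sigma2=4.0, mu0=0.0, tau2=1e6))  # weak prior: almost the MLE
print(gaussian_mean_map(data, sigma2=4.0, mu0=0.0, tau2=1.0))  # strong prior: pulled towards 0
```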
2.2. Construction and Solution of Hives Model
Table 2 describes the main parameters and decision variables of the constructed Hives model.
We mainly consider a single-product supply chain consisting of a retailer or manufacturer and multiple consumers. Given the historical sales data of a commodity, let $y_t$ denote the customers' demand for the commodity at the $t$-th time point. We use the probabilistic graphical model in Figure 2 to characterize the connections between the $y_t$ at different times.

The model shown in Figure 2 can be divided into three layers. The bottom layer is the observed variable layer, composed of the actual customer demand $y$ for the commodity. As Figure 2 indicates, in real scenarios there is no significant direct correlation between customer demands at adjacent time points. Therefore, this paper introduces an intermediate layer of hidden variables $\eta$ and reflects the internal relationship among the demand variables in the time series indirectly, by establishing a statistical relationship among the $\eta$. Further, because the parameters of the hidden layer must be estimated, a hyperparameter layer $(\mu, \phi)$ is introduced on top of the hidden layer to smooth the data fitting results.
Specifically, this paper assumes that the variables in Figure 2 satisfy the following relationships:
(1) At any time $t$, the customer demand $y_t$ for the commodity obeys a Gaussian distribution with a given mean and variance. As described above, actual commodity data are often highly volatile, and the changes between adjacent time points are strongly random. Against this background, this paper assumes that the observations at different times are mutually independent, so that the likelihood of the observed series is the product of the Gaussian densities of the individual observations $y_t$. In this formulation, the latent factor $\eta_t$ can be regarded as the customer's demand state for the product at time point $t$, $\beta_t$ as the passenger flow factor at time point $t$, and an additional correction factor adjusts the result; the demand state combined with the passenger flow factor is thus the (uncorrected) quantity of goods converted into real demand.
(2) The hidden layer variable at any time is affected only by the hidden state at the previous time and acts only on the hidden state at the next time. That is, the hidden layer variables have the first-order Markov property on the time axis:
$$p(\eta_t \mid \eta_1, \ldots, \eta_{t-1}) = p(\eta_t \mid \eta_{t-1}). \tag{8}$$
Further, considering the time-series characteristics of customer demand for commodities, we also assume that the hidden layer variables follow the first-order autoregressive model
$$\eta_t = \mu + \phi\,(\eta_{t-1} - \mu) + \varepsilon_t, \tag{9}$$
where $\mu$ is the long-term mean level of the hidden state $\eta_t$, $\phi$ is the autoregression rate, and $\varepsilon_t$ is the noise term of the autoregression.
(3) The correction factor and the hyperparameters independently obey normal distributions.
Meanwhile, there is
Here, the mean parameter in formula (11) is defined as the average customer demand for the target product over all historical time points. Since the autoregression rate $\phi$ in formula (12) takes values in $(-1, 1)$, which is also the range in which the autoregressive process of formula (9) is stationary, the value of the parameter in formula (12) is limited to $[0.5, 1]$ according to the σ and 3σ rules of the normal distribution.
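To see what the hidden layer of formula (9) produces, the sketch below simulates the autoregressive recursion. It assumes Gaussian noise $\varepsilon_t \sim N(0, \sigma^2)$ and a user-chosen starting state, neither of which is specified above; the function name is hypothetical.

```python
import random


def simulate_hidden_states(T, mu, phi, noise_sd, eta0=None, seed=0):
    """Simulate T hidden states from eta_t = mu + phi * (eta_{t-1} - mu) + eps_t.

    eps_t ~ N(0, noise_sd^2) is an assumed Gaussian noise term.
    """
    if T < 1:
        raise ValueError("T must be at least 1")
    if not -1 < phi < 1:
        raise ValueError("|phi| < 1 is required for a stationary AR(1) process")
    rng = random.Random(seed)
    eta = [mu if eta0 is None else eta0]  # starting state; defaults to the long-term mean
    for _ in range(1, T):
        eta.append(mu + phi * (eta[-1] - mu) + rng.gauss(0.0, noise_sd))
    return eta


# With zero noise the state decays geometrically back to mu at rate phi.
states = simulate_hidden_states(T=5, mu=50.0, phi=0.8, noise_sd=0.0, eta0=60.0)
print(states)
```

A larger $\phi$ makes the demand state more persistent, which is what lets the hidden layer carry information between time points even when adjacent observations look uncorrelated.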
Next, we consider the design of the factor $\beta_t$ acting on the demand data $y$. In the model assumption of formula (7), $\beta_t$ is interpreted as the passenger flow factor at time point $t$. In real life, whether in online malls or in physical stores, passenger flow shows certain periodic characteristics over time. For example, purchase activity differs between weekdays and weekends, and because products have a certain use cycle and natural life, customers' purchases of a specific product occur at certain intervals. Based on this consideration, we assume that $\beta_t$ is a periodic factor with period $H$, that is, the factors $\beta_t$ in Figure 3 satisfy
$$\beta_t = \beta_{t+H}, \quad t = 1, 2, \ldots \tag{13}$$

Under this assumption, the passenger flow factor $\beta_t$ at any time affects not only the demand $y_t$ at the same time but also $y_{t+H}, y_{t+2H}, \ldots$. The effect of the factors $\beta$ on $y$ at each time is therefore as shown in Figure 4.

Further, this paper assumes that the $H$ passenger flow factors within one period, $\beta_1, \ldots, \beta_H$, independently obey a normal distribution with mean 0 and variance $\sigma_\beta^2$:
$$\beta_h \sim N(0, \sigma_\beta^2), \quad h = 1, \ldots, H. \tag{14}$$
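The periodicity assumption (13) and the prior (14) can be sketched as drawing $H$ independent factors and tiling them over the time axis. The 0-based indexing and the function name are assumptions for illustration.

```python
import random


def periodic_factors(T, H, sigma_beta, seed=0):
    """Draw beta_1..beta_H ~ N(0, sigma_beta^2) independently and repeat them with period H.

    Returns the length-T factor sequence (0-based time t uses base[t % H]) and the H base factors.
    """
    rng = random.Random(seed)
    base = [rng.gauss(0.0, sigma_beta) for _ in range(H)]
    return [base[t % H] for t in range(T)], base


# Weekly periodicity (H = 7) over three weeks of daily time points.
factors, base = periodic_factors(T=21, H=7, sigma_beta=1.0)
# The factor at time t is shared with t + H, t + 2H, ..., so it affects y_t, y_{t+H}, ...
print(factors[0] == factors[7] == factors[14])
```

Only $H$ factors have to be estimated regardless of the series length $T$, which keeps the number of free parameters small.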
We now discuss how to estimate the parameters of the Hives model given users' historical demand data for a particular commodity.
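In the maximum a posteriori sense, estimating the above parameters reduces to the following problem:
$$\hat{\Theta} = \arg\max_{\Theta} P(\Theta \mid y), \tag{15}$$
where $P$ is treated as a function of $\Theta$, the collection of all unknown variables of the model (the hidden states $\eta$, the passenger flow factors, the correction factor, and the hyperparameters $\mu$ and $\phi$).
Now, given the known demand data $y$ of users for a commodity, with $\Theta$ unknown, Bayes' formula gives
$$P(\Theta \mid y) = \frac{P(y \mid \Theta)\,P(\Theta)}{P(y)}. \tag{16}$$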
Further, according to Figure 4, it can be obtained that the distribution on the right side of formula (16) has the following relationship:
Meanwhile, there is
In addition, the above transformation uses the periodicity assumption (13), so the passenger flow factors satisfy $\beta_t = \beta_{((t-1) \bmod H) + 1}$ for every time point $t$. Substituting formulas (17) and (18) into formula (16), taking the logarithm of both sides, and denoting the result by $l$, we obtain
Here, is a constant. The two formulas are compared, and the value does not affect the result of formula (19). We record , , and we can get
Thus, the solution of formula (15) is equivalent to the following objective:
Here, we treat l as a function of variable .
This section discusses the design of the solution strategy for the model. First, for the convenience of discussion, we set . Therefore, it can be obtained that formula (21) is equivalent to
From this, the expression of can be obtained as
Theorem 1. is a convex function of , respectively.
Proof.
Here, we find the first-order partial derivatives of .
First, we calculate . Without taking into account the cyclical factor, it is easy to obtainFurther, periodicity is taken into account, and we note that is affected by the same factor . We get the final formula for asFor , we haveSince is a hyperparameter that affects the hidden state , it can be seen from formula (9) that the distribution of the hidden state is different when there is t = 1, t = T and 1 < t < T. Therefore, the calculation of and needs to be considered according to the specific time t.
For , when there is t = 1, there isWhen there is 1 < t < T, there isWhen there is t = T, there isFor , when there is t = 1 and when there is 1 < t < T, there isWhen there is t = T, there isFor , when there is t = 1, there isWhen there is 1 < t < T, there isWhen there is 1 < t < T, there isWhen there is t = T, there isThen, each parameter is taken as the second derivative.
First, is calculated, and can be obtained. Since we set in the previous model assumptions, we can get .
Second, is calculated. Similarly, since there is in the aforementioned assumption, can be obtained.
Furthermore, for , when there is t = 1 or t = T, there is . When there is 1<t < T, there is . Due to , can be obtained.
For , when there is t = 1 or t = T, there is . When there is 1 < t < T, there is . Therefore, can be obtained.
For , because of , it can be obtained; when there is t = 1, there is . When there is t = T, there is . When there is , there is
Below, the range of values for is further discussed when there is 1 < t < T.
We can get . When there is 1 < t < T, the value range of is jointly determined by the monotonicity of the two functions of . Obviously, it can be seen that increases monotonically in the range of (−1,0) and decreases monotonically in the range of [0,1). However, is the opposite. Therefore, here we test the values of at three extreme points to verify whether is > 0, and we can getFurther, based on the above formula, we can get the following.
When there is , is > 0. At the same time, since there is , there is . Thus, it can be obtained that when there is , there is . In addition, is known from formula (6). It is brought in so that when there is or , isTherefore, it can be obtained that, for any time point t, the value of at the three extreme points is greater than 0. Thus, can be obtained.
The proof is complete.
Since the objective is convex in each of its variables separately (Theorem 1), this section uses the gradient descent algorithm to solve the above objective.
The core step of Algorithm 1 is the calculation of the gradients with respect to each variable; the specific results are given in formulas (24) and (35).
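Algorithm 1 itself is not reproduced here, so the following is only a simplified sketch of gradient descent on a convex negative log posterior of the same shape. Unlike the full model, it assumes $y_t \sim N(\eta_t, \sigma_y^2)$ with no passenger flow or correction factor, $\eta_1 \sim N(\mu, \sigma_\eta^2)$, Gaussian noise of variance $\sigma_\eta^2$ in formula (9), and fixed $\mu$ and $\phi$. The gradient takes a different form at the first and last time points, mirroring the case split $t = 1$, $1 < t < T$, $t = T$ in the proof. All names are hypothetical.

```python
def objective_grad(eta, y, mu, phi, sigma_y, sigma_eta):
    """Gradient of the simplified negative log posterior f(eta) with respect to each eta_t.

    f(eta) = sum_t (y_t - eta_t)^2 / (2 sigma_y^2) + sum_t r_t^2 / (2 sigma_eta^2), where
    r_1 = eta_1 - mu and r_t = eta_t - mu - phi * (eta_{t-1} - mu) for t > 1 (constants dropped).
    """
    T = len(eta)
    r = [eta[0] - mu] + [eta[t] - mu - phi * (eta[t - 1] - mu) for t in range(1, T)]
    grad = []
    for t in range(T):
        g = -(y[t] - eta[t]) / sigma_y ** 2 + r[t] / sigma_eta ** 2
        if t < T - 1:
            # eta_t also enters the next residual r_{t+1}; this term is absent at the last time point.
            g -= phi * r[t + 1] / sigma_eta ** 2
        grad.append(g)
    return grad


def fit_hidden_states(y, mu, phi, sigma_y=1.0, sigma_eta=1.0, iters=2000):
    """Plain gradient descent on the convex quadratic objective above, with mu and phi held fixed."""
    # Upper bound on the largest Hessian eigenvalue; a step of 1/L guarantees convergence.
    lipschitz = 1.0 / sigma_y ** 2 + (1.0 + abs(phi)) ** 2 / sigma_eta ** 2
    step = 1.0 / lipschitz
    eta = [float(v) for v in y]  # initialize the hidden states at the observed demand
    for _ in range(iters):
        grad = objective_grad(eta, y, mu, phi, sigma_y, sigma_eta)
        eta = [e - step * g for e, g in zip(eta, grad)]
    return eta


demand = [52, 55, 49, 60, 58, 61, 57]
smoothed = fit_hidden_states(demand, mu=56.0, phi=0.8)
print([round(v, 2) for v in smoothed])
```

Because this simplified objective is a strictly convex quadratic, the point where the gradient vanishes is its unique minimizer; the full model adds more variables but follows the same pattern of computing one gradient per variable and stepping against it.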
3. Prediction Method of Short-Term Demand for E-Commerce Goods Based on Deep Neural Network
This paper groups the factors that can affect the demand for goods on an e-commerce platform into three major categories: external environmental factors, consumer behavior factors, and e-commerce factors. Environmental factors mainly include the economy, society, climate, and local festivals. Consumer behavior factors can be analyzed in terms of consumers' preferences and purchase intentions and their final consumption results. Figure 5 shows the factors affecting the demand for e-commerce goods.

Figure 6 is a schematic diagram of an MLP network with two hidden layers. There are no connections between neurons within a layer, and adjacent layers are fully connected. The input layer has n1 nodes, the two hidden layers have n2 and n3 neurons, respectively, and the output layer has n4 neurons. Every layer except the output layer also has a bias node bi.
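A forward pass through a network shaped like Figure 6 can be sketched in plain Python. The ReLU hidden activation, the linear output, the weight initialization, and the layer sizes are assumptions for illustration; the text does not specify them.

```python
import random


def init_layer(n_in, n_out, rng):
    """Fully connected layer: one weight per (input, output) pair and one bias per output neuron.

    The bias node b_i of layer i is represented here as the bias vector of the next layer.
    """
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation=None):
    """Compute activation(W x + b) for one layer."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def relu(v):
    return max(0.0, v)


def mlp_forward(x, layers):
    """n1 inputs -> n2 (ReLU) -> n3 (ReLU) -> n4 linear outputs."""
    h1 = dense(x, layers[0], relu)
    h2 = dense(h1, layers[1], relu)
    return dense(h2, layers[2])


n1, n2, n3, n4 = 8, 16, 8, 1  # hypothetical layer sizes
rng = random.Random(42)
layers = [init_layer(n1, n2, rng), init_layer(n2, n3, rng), init_layer(n3, n4, rng)]
prediction = mlp_forward([1.0] * n1, layers)
print(len(prediction))  # one value per output neuron
```

A linear output layer is used because demand is a continuous regression target.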

The main task of the association layer of the AR-MDN model is to handle the correlated variables that affect the demand for e-commerce goods. Functionally, the association layer can therefore be viewed as a feature extraction process for a regression-based model. In modeling the association layer of AR-MDN, the model is executed independently for each time point, as shown in Figure 7.

In supply chain management for retail and consumer goods, the importance of data cannot be overstated. Data such as weather, demographics, sales, and store information can be used to predict potential purchase demand and to provide more intelligent, faster, and more accurate guidance for supply chain decisions, which makes the supply chain more transparent. The data workflow is shown in Figure 8.

The forecasting objective of this paper is weekly commodity demand. Because training a deep neural network requires a large amount of data, we use the time sliding window method to process commodity demand and features in units of weeks. With a window of 1 week (7 days), the demand for each commodity in each area within the window is taken as the label. The working principle of the sliding window method is shown in Figure 9.
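A minimal sliding-window sketch for a single commodity's daily demand series. It assumes a stride of one day, uses the previous 7 days of demand as the only features, and takes the total demand of the following 7 days as the weekly label; the paper's actual features (Figure 5) and per-area labels are richer, and the function name is hypothetical.

```python
def sliding_windows(daily_demand, window=7, stride=1):
    """Build (features, label) pairs from a daily demand series.

    features: demand on the `window` days before the start of the target week
    label:    total demand over the `window` days starting at that point
    """
    samples = []
    for start in range(window, len(daily_demand) - window + 1, stride):
        features = daily_demand[start - window:start]
        label = sum(daily_demand[start:start + window])
        samples.append((features, label))
    return samples


series = list(range(21))  # three weeks of illustrative daily demand
samples = sliding_windows(series)
print(len(samples), samples[0])
```

Sliding the window by one day instead of one week yields many overlapping training samples from the same history, which is how the method addresses the data requirements of a deep network.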

This paper records the loss during model training; Figure 10 shows how the training loss of the AR-MDN model changes when predicting the short-term demand for e-commerce commodities.
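The paper does not spell out the training loss. A mixture density network is usually trained by minimizing the negative log-likelihood of the target under the predicted Gaussian mixture, so the sketch below computes that quantity, assuming the network has already produced mixture weights, means, and standard deviations for each target. Function names and numbers are hypothetical.

```python
import math


def mixture_nll(y, weights, means, stds):
    """Negative log-likelihood -log sum_k w_k N(y | mean_k, std_k^2) of one target value.

    weights must be positive and sum to 1 (an MDN typically produces them with a softmax);
    log-sum-exp keeps the computation stable when component densities are tiny.
    """
    if abs(sum(weights) - 1.0) > 1e-9 or min(weights) <= 0:
        raise ValueError("weights must be positive and sum to 1")
    log_terms = [
        math.log(w) - math.log(s) - 0.5 * math.log(2 * math.pi) - 0.5 * ((y - m) / s) ** 2
        for w, m, s in zip(weights, means, stds)
    ]
    top = max(log_terms)
    return -(top + math.log(sum(math.exp(v - top) for v in log_terms)))


def batch_loss(targets, params):
    """Average NLL over a batch; params[i] = (weights, means, stds) predicted for targets[i]."""
    return sum(mixture_nll(y, *p) for y, p in zip(targets, params)) / len(targets)


mixture = ([0.7, 0.3], [95.0, 130.0], [10.0, 15.0])
print(batch_loss([100.0, 120.0], [mixture, mixture]))
```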

By using an exponentially decaying learning rate, the loss function of the model does not fluctuate greatly in the later stage of training, and a better training effect is achieved, as shown in Figure 11.
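A common form of exponential decay, and the formula used by TensorFlow's `ExponentialDecay` schedule, is $\mathrm{lr} = \mathrm{lr}_0 \cdot r^{\,\mathrm{step}/\mathrm{decay\_steps}}$, optionally with integer division of the exponent ("staircase"). The paper does not give its decay constants, so the values below are placeholders.

```python
def exponential_decay(initial_lr, decay_rate, decay_steps, step, staircase=False):
    """Learning rate initial_lr * decay_rate ** (step / decay_steps).

    With staircase=True the exponent is floored, so the rate drops in discrete jumps.
    """
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_rate ** exponent


for step in (0, 500, 1000, 1500, 3000):
    print(step, exponential_decay(0.01, 0.5, 1000, step), exponential_decay(0.01, 0.5, 1000, step, True))
```

The steadily shrinking step size is what damps the late-stage loss fluctuations described above.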

This paper uses the deep neural network-based short-term demand prediction model for e-commerce commodities to predict the short-term demand for all commodities in 1-week units. The predicted and actual short-term demand for the commodities are compared in Figure 12.

These results show that the predictions of the deep neural network-based short-term demand prediction method for e-commerce goods are close to the actual demand and that the method can contribute to demand prediction for e-commerce goods.
4. Conclusion
In the automated management of the e-commerce supply chain, high-quality short-term demand forecasting for commodities is a basic and core function of supply chain management and has become a focus of attention for e-commerce companies. Applying big data analysis in the supply chain can effectively integrate resources, assist decision-making and overall coordination, further optimize classification, pricing, and inventory, and generate personalized marketing strategies. As a result, supply chain costs are reduced, operational efficiency is improved, and the core competitiveness of the enterprise can be significantly enhanced. This paper applies a deep neural network algorithm to forecast the short-term demand for e-commerce commodities and improves the stability between e-commerce inventory and procurement. Simulation experiments show that the predictions of the deep neural network-based short-term demand prediction method for e-commerce commodities are close to the actual demand and that the method can contribute to demand prediction for e-commerce commodities.
Data Availability
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.
Conflicts of Interest
The author declares that there are no conflicts of interest.
Acknowledgments
This study was sponsored by Huanghe Jiaotong University.