Abstract

Big data analytics (BDA) is applied widely and deeply in e-commerce and has a positive impact on the global economy, especially in the U.S. and China, which have both performed well. This paper examines the relative influence of theoretical research and practical activities of BDA in e-commerce to explain the differences between the U.S. and China, drawing on two main literature databases, Web of Science and CNKI, respectively, and on other samples that present retail e-commerce sales and the number of data companies founded in the U.S. and China each year. We further determine the reasons for the difference between the U.S. and China in BDA in e-commerce, which can help managers devise appropriate e-commerce business strategies for each country, and we provide evidence of a significant relationship between theoretical research and practical activities in BDA in e-commerce. In addition, the variables related to big data companies show a moderating effect rather than a mediating effect on putting theoretical research into practice in e-commerce in the United States, but they show both moderating and mediating effects in China. The results of this study help clarify doubts regarding the development of China’s e-commerce. Moreover, three orientations in e-commerce using BDA and the use of quantum computing to solve existing e-commerce problems are explored to provide better evidence for decision-making that could be valuable in future research.

1. Introduction

Big data are a frontier topic for researchers [1] and have always influenced academic research [2]. However, the big data literature contains six debates, concerning different approaches to big data analytics (BDA), artificial intelligence, big data capability, big data-driven business organizing, big data access, and the social risks of big data value realization [3]. More severe is the privacy trust crisis in the era of big data, both in enterprise services and in some public services; moreover, there are also doubts about the data themselves, since individuals may have inaccurate information [4]. More precisely, the key point of the debate and crisis regarding big data is how to use big data well and correctly, in a frank and honest manner, agreeing at the outset to focus only on what really matters [5]. For example, two considerations are how to adjust algorithms to ever-changing conditions [6] and how to avoid the negative effect of “big data hubris” [7].

Hence, general statistical techniques and computational algorithms are issues that require different tools to manage big data sets [8], particularly with respect to different qualities of information disclosure for different purposes. BDA requires understanding the relationships among features and the explored data [9]. It has evolved from the statistical techniques for data mining of the 1970s to business intelligence (BI) 3.0 today [10]. Another reason that prompts us to focus on this area is that China accounted for 23.1 percent of total online retail sales in 2017, while the United States ranked behind the UK, South Korea, and Denmark, with a share of only 9 percent. Data come from the Statista website, https://www.statista.com/statistics/255083/online-sales-as-share-of-total-retail-sales-in-selected-countries/. Evidently, e-commerce marketing depends not only on information technology or the population but is also likely to reflect a unity of values and perception that the U.S. lacks. Therefore, the situation of the United States and China regarding research on and practice of BDA in e-commerce is worthy of discussion.

Akter and Wamba [11] fully demonstrate that BDA in e-commerce, an emerging field since 2006, exhibits strategy-led analytics and has sustainable value-driven facets for businesses, involving organization management, goods sales, production management, data quality, IT infrastructure and its security, HRM, overarching values, and so on. Additionally, Manyika et al. [12] propose five major areas in which big data can contribute to businesses: creativity, performance, consumer behavior, decision-making, and innovative business models. In short, a series of new issues in BDA in e-commerce deserves attention, including how to determine the first-rank relationship among a commodity’s dynamic pricing, dynamic subsidizing, and cost to e-commerce parties, and how to enact good BDA-based dynamic pricing policies that depend on a buyer’s past good or bad actions, including comments, sales returns, and sharing. In practice, e-commerce firms have not adjusted goods prices on a per-buyer basis, even though the expenditures incurred in completing the transaction differ. These aspects require determining whether the U.S. and China each have methods and experiences for solving these issues. This paper summarizes the relevant literature and applications as follows.

1.1. U.S.

Similarly, there are also many examples that occur in practice. Data breaches occur daily in American society; Taylor [13] provides a list of 17 breaches, such as the attack on 500 million Yahoo user accounts in 2013, Adult Friend Finder in 2016, eBay in 2014, Equifax in 2017, Heartland Payment Systems in 2008, Target Stores in 2013, TJX Companies in 2006, Uber in 2016, JP Morgan Chase in 2014, US OPM in 2012, Sony’s PlayStation Network in 2011, Anthem in 2015, RSA Security in 2011, Stuxnet in 2010, VeriSign in 2010, Home Depot in 2014, and Adobe in 2013. Of course, all of these occurrences are just the tip of the iceberg, as Facebook’s data crisis has made Americans worried about and disgusted with big data once again. To some extent, the greater attention that Americans pay to the safety and rationality of data use may be due to their cultural and cognitive characteristics [14], which differ somewhat from those in China [15, 16].

1.2. China

Searching the Chinese literature database “CNKI” by title, first for “big data” and then adding “business,” returned only 34 papers, and no paper was retrieved when also searching for “crisis” within these 34 papers. Moreover, some critics state that big data in China are overhyped because companies are more interested in using big data to attract media and investors. Data come from the China government website; “How Baidu, Tencent, and Alibaba are leading the way in China’s big data revolution,” http://www.scmp.com/tech/innovation/article/1852141/how-baidu-tencent-and-alibaba-are-leading-way-chinas-big-data. However, increasing cases of stealing and trafficking of personal information have been reported in China, covering hundreds of millions of items in transportation, logistics, health care, social networking, and banking. For example, a leak of 300 million items of user information from Shunfeng Express and a leak of 500 million items of user data from Huazhu Group or Wanhao Group have occurred.

Regardless of how serious the criticisms are, a notable example shows that Alibaba predicted the dynamic change of income from impoverished people through mining and analyzing their transaction data on its e-commerce platform and then helped the Chinese government to target poverty alleviation. Of course, since this process is not accurate enough to identify each impoverished person’s income, Alibaba has cooperated with the government to develop a platform of “Internet plus targeted poverty alleviation,” which connects several public service data and also clarifies the main reason why certain people are still poor, including illness that leads to poverty or unemployment, disasters, laziness, etc.

This paper is composed of four sections and focuses on two topics: BDA in e-commerce for the U.S. and China in academic research and in practice. The second section presents a discussion of empirical analysis on the connection of BDA in e-commerce in theory and practice for the United States and China. The third section (Conclusions) sums up the comparison of BDA in e-commerce between the U.S. and China in theoretical research and in practice. The last section explores avenues for future research and application. This paper focuses on comparing BDA in e-commerce between the United States and China by examining their level of theoretical research and practical application by quantifying the literature and the market and on explaining the status of e-commerce development and the main factors that influence it in the U.S. and China.

2. Discussion

Several issues should be discussed in depth after the literature review and analysis above: the relationship between academic research and practical activities of BDA in e-commerce, the difference in that relationship between the U.S. and China, and how academic research affects practical activity in BDA in e-commerce for the U.S. and China separately. Statistical analysis is widely adopted for investigating the extent to which theoretical research promotes practical activities in BDA in e-commerce. Specifically, what follows builds regression models of academic research and its application to identify the mechanism by which BDA acts in e-commerce, compares the moderating and mediating effects of BDA between the U.S. and China, and investigates lag in the theoretical research response to practical application in BDA for e-commerce.

2.1. Data Acquisition

The relevant literature on BDA in e-commerce for the U.S. and China was collected from two main literature databases, Web of Science and CNKI, respectively, for the period from 1990 to 2017. As a whole, we present the results of three stages of searching for subject terms classified by title from several literature databases.

Table 1 presents the growth of retail e-commerce sales in the U.S. and China, and it is observed that the speed of development in the U.S. is significantly different from that in China. China’s e-commerce market has seen high growth in the past and present and is expected to continue rising in the future. This will lead to a tremendous market for the e-commerce industry applying big data compared with the U.S.

Table 2 lists a limited number of data companies in the U.S. and China by founding year; the list is limited by the difficulty of obtaining complete information on big data companies. The table collects and organizes data from “OpenData500” for the U.S. and the “Data Technology Industry Innovation Institute” for China and can be used to perform a correlation analysis of how theoretical research in BDA in e-commerce guides practice in the U.S. and China.

2.2. Variable Setting and Data Disposal

Conducting an empirical analysis of the connection of BDA in e-commerce in theory and practice, such as in comparing the United States and China, requires variables to be set up to represent various subjects for literature retrieval and practical activities. The details are shown in Table 3. For instance, searching the subject term of “E-Commerce” in the ProQuest database is denoted by “X01,” and the total retail sales in the U.S. are denoted by “Y01.”

In addition, the time series data for “X01” to “X70” and “Y01” to “Y08” need further disposal because some variables are sparse or have missing data. The disposal method is as follows: (1) the time series data from theoretical research range from 1990 to 2017, and retrieval results without data in the literature database are filled by default with zeroes; (2) only a small amount of data is available for analysis for some variables, such as X06 to X10 and X23 to X56; and (3) missing data on retail sales in the U.S. and China for variables Y01 to Y08 are estimated by the linear trend method at the missing point, as shown in Table 4; additionally, partial simulated values are generated for variables Y01 to Y08, as shown in Table 5. Of course, we do not include all variables in every model; instead, we select and analyze the main regression model results through repeated tests.
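The two filling rules above can be sketched as follows. This is a minimal illustration only: the function names and the sample series are hypothetical, and it assumes that the "linear trend method" means fitting a least-squares line to the observed years and reading the missing years off that line.

```python
def fill_zero(series):
    """Rule (1): replace missing retrieval counts (None) with zero."""
    return [0 if v is None else v for v in series]


def linear_trend_fill(years, values):
    """Rule (3): estimate missing values (None) from a least-squares
    line fitted to the observed (year, value) points."""
    obs = [(x, y) for x, y in zip(years, values) if y is not None]
    n = len(obs)
    mean_x = sum(x for x, _ in obs) / n
    mean_y = sum(y for _, y in obs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in obs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in obs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Keep observed values; use the fitted line only where data are missing.
    return [intercept + slope * x if y is None else y
            for x, y in zip(years, values)]


# Hypothetical series, not taken from the paper's tables.
years = [2010, 2011, 2012, 2013, 2014]
sales = [10.0, None, 14.0, 16.0, None]   # a Y-type variable with gaps
counts = [None, None, 3, 5, 8]           # an X-type retrieval count

filled_sales = linear_trend_fill(years, sales)
filled_counts = fill_zero(counts)
```

Keeping the observed values untouched and filling only the gaps mirrors the description above; other imputation choices would change the downstream regressions.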

2.3. Descriptive Statistics

Table 6 shows that the Cronbach’s alpha value for X06 and X07 is 0.840, which is greater than 0.5 [17] and indicates consistency between the ProQuest literature database and WoS (Core Collection), whereas the Cronbach’s alpha value for X01 and X02 is under 0.5, similar to CNKI (all), CNKI (periodical), and CNKI (master’s and doctoral dissertation). Consequently, choosing objects from the WoS (Core Collection) and CNKI (periodical) for a theoretical comparison of the U.S. and China is sound and representative. Most variables are poorly behaved as data, in that their standard deviations are greater than their means. Except for X01–X05, X12, X21, X22, X57, X63, X64, X68, X69, and X70, the variables show a Kolmogorov-Smirnov significance below 0.05 [18, 19] and are therefore not normally distributed. Several methods can be used to address this, including converting the data to a normal distribution [20], adopting an appropriate regression model with a regression standardized residual test [21], and employing nonparametric tests [22, 23], as discussed in the next section.
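For readers unfamiliar with the consistency statistic used here, the following is a minimal sketch of Cronbach's alpha for a set of yearly series. The function name and the sample series are hypothetical and are not the paper's data; the sketch assumes the usual sample-variance form of the coefficient.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k series of equal length.

    items: list of k lists, one per variable (e.g., yearly counts
    of the same subject term in two literature databases).
    """
    k = len(items)
    # Per-observation totals across the k variables.
    totals = [sum(vals) for vals in zip(*items)]
    sum_item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


# Hypothetical yearly counts from two databases.
db_a = [1, 2, 3, 4, 5]
db_b = [2, 4, 6, 8, 10]
alpha = cronbach_alpha([db_a, db_b])
```

A value near 1 means the two series move together; the 0.5 cutoff applied in the text is the paper's own threshold.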

As Table 7 demonstrates, data for practical activities, such as relevant retail e-commerce sales and the number of data companies founded, exhibit non-normal distributions. Only variables Y01, Y03, Y08, and Y15 show a Kolmogorov-Smirnov significance greater than 0.05, and the others are lower than 0.05. In addition, the other statistics for these variables are similar to those of the theoretical research variables and will be discussed with regard to validation in the regressions that follow.

2.4. Linear Regression Model

Multiple regression analysis is used to determine the relationship between the dependent variable $y$ and the independent variables $\mathbf{x}$; both are time series variables [24]:

$$y_t = \beta_0 + \mathbf{x}_t^{\top}\boldsymbol{\beta} + \varepsilon_t,$$

where $y_t$ denotes the $t$th year observation of the dependent variable, $\mathbf{x}_t$ is a column vector of observations on the independent variables in the $t$th year, $\beta_0$ and $\boldsymbol{\beta}$ are the regression coefficients, and $\varepsilon_t$ is the error term. Four model specification techniques are used to select the variables in a regression model: all possible regression, forward selection, backward elimination, and stepwise regression, as shown by Jomnonkwao et al. [24].
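The multiple regression described above can be estimated by ordinary least squares. The sketch below is illustrative only: it is not the SPSS procedure used in the paper, it omits the variable selection step, and the function name and data are hypothetical.

```python
def ols(X, y):
    """Ordinary least squares with an intercept.

    X: list of rows, each a list of predictor values for one year.
    y: list of dependent-variable observations.
    Returns (coefficients, r_squared), intercept first.
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Normal equations (A^T A) b = A^T y as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(p):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    beta = [aug[i][p] / aug[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Hypothetical data generated as y = 1 + 2*x1 + 3*x2 (no noise).
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [9, 8, 22, 18, 35]
beta, r2 = ols(X, y)
```

Stepwise selection would call such a fit repeatedly, adding or removing predictors according to an entry and removal criterion.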

2.5. Regression Models of the U.S. Putting Theoretical Research into Practice in E-Commerce

2.5.1. Linear Regression Model for Retail Sales with Theoretical Research Variables

First, the promoting effect of theoretical research on retail sales is investigated using the normally distributed variables X02, X21, X57, X63, and X69 by running a stepwise regression [25, 26] in SPSS. The variables X63, X69, and X21 meet the entry criterion of a probability of F to enter ≤0.05 [27]. Four models have an excellent fit for evaluating the dependent variable Y01 against the three independent variables, as shown in Table 8. In Table 9, the dependent variable and independent variables have a linear relation, where all Sig. values of the test statistics [28] are less than 0.01, so models 1, 2, and 4 are available for predictive analysis. In these multiple linear regression models, the tolerance values are the same and the VIF values are less than five, as shown in Table 10, which indicates that there is no collinearity [29] between the independent variables; this can also be confirmed in Table 11 by the eigenvalues and variance proportions of the independent variables. Moreover, the standardized residuals of the regression are normally distributed [30].
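The collinearity check mentioned above can be illustrated for the simplest case of two predictors, where the tolerance equals one minus the squared Pearson correlation and the VIF is its reciprocal. This is a simplified sketch with hypothetical names and data; with three or more predictors, each VIF requires regressing one predictor on all the others.

```python
def pearson_r(a, b):
    """Pearson correlation between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def tolerance_and_vif(a, b):
    """Tolerance and VIF for a model with exactly two predictors."""
    tol = 1 - pearson_r(a, b) ** 2
    return tol, 1 / tol


# Hypothetical predictor series.
x_a = [1, 2, 3, 4, 5]
x_b = [2, 1, 4, 3, 5]
tol, vif = tolerance_and_vif(x_a, x_b)
no_collinearity = vif < 5  # the threshold applied in the text
```

With only two predictors both share the same tolerance, which matches the observation that the tolerance values coincide in a two-variable model.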

Clearly, theoretical research in the U.S. dramatically promotes practical activities in e-commerce. For example, the information provided in Tables 12 and 13 clarifies the role of theoretical research in driving the development of retail e-commerce: the literature on “E-Commerce” and on “E-Commerce” combined with “Big data” has positive effects on retail e-commerce sales in practice.

2.5.2. Investigation of the Moderation and Mediation Effects of Variables of Data Companies Founded in the U.S.

Running several stepwise regressions for the variables on the number of data companies founded and a random selection of the theoretical research variables shows that these variables do not fit very well. For example, multiple linear regression models are constructed by selecting X02, X07, X13, X27, X29, X33, X55, X57, X59, X61, X63, X65, X67, and X69 as independent variables in accordance with the theoretical analysis, with Y09, Y10, Y11, Y12, and Y13 in turn as the dependent variable. The results show that only the independent variable X02 is retained in the regression model for each dependent variable (Y09, Y10, Y11, Y12, or Y13). However, the largest R square in these models is 0.407, which is less than 0.5. Likewise, when X07, X13, X27, X29, X33, X55, X57, X59, X61, X63, X65, X67, and X69 are selected as independent variables, only the dependent variable Y12 is significantly related to X33 and X55 (model 2 in Table 14), with an R square of 0.787. In summary, the variables representing the number of data companies founded do not have a good linear relation to the theoretical research variables. Therefore, their moderation and mediation effects [31, 32] are investigated below.

Here, two regression equations are presented: one is made up of the independent variables X63 and X69, moderator variable Y13, and dependent variable Y02, and the other adds the interaction term X69Y13. Whether the moderator variable affects the relationship between the independent variables and the dependent variable is judged by the significance of the R square change (see Sig. in Table 15), which indicates that the data companies founded (research/consulting company) in the U.S. played a moderating role [31, 32] in the theoretical research work of “e-commerce and information technology” in promoting retail e-commerce sales in practice.
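In generic notation, the two-step moderation test described above can be written as follows. The symbols are assumptions introduced for illustration and are not the paper's own: $X_1$ and $X_2$ stand for X63 and X69, $M$ for the moderator Y13, $Y$ for Y02, $n$ for the number of yearly observations, and $k_1$ and $k_2$ for the number of predictors in each step.

```latex
\begin{align*}
% Step 1: main effects only
Y &= b_0 + b_1 X_1 + b_2 X_2 + b_3 M + e \\
% Step 2: add the interaction term
Y &= b_0' + b_1' X_1 + b_2' X_2 + b_3' M + b_4' (X_2 \times M) + e' \\
% F test of the R-square change between the two steps
F &= \frac{(R_2^2 - R_1^2) / (k_2 - k_1)}{(1 - R_2^2) / (n - k_2 - 1)}
\end{align*}
```

Under this formulation, a significant $F$ for the change from $R_1^2$ to $R_2^2$ (here with $k_1 = 3$ and $k_2 = 4$) is what supports a moderating role for $M$.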

Then, we test the mediation effects [31, 32] of the data companies founded variables in the U.S., taking X02 as the independent variable (shown in Table 12), Y02 as the dependent variable, and one of the data companies founded variables (Y09–Y13) as the mediating variable. The first step is the regression of Y02 on X02, which has a standardized regression coefficient of 0.797. The second step is the linear regression of each of Y09–Y13, as the dependent variable, on the independent variable X02, for which all of the regression coefficients are found to be significant at less than 0.05. The last step is a linear regression of Y02 on X02 together with one of Y09–Y13, in which none of Y09–Y13 is found to be significant. Hence, a further Sobel test should be performed to judge whether variables Y09–Y13 have a mediation effect. The results of the Sobel test shown in Table 16 indicate that the mediation effects of the data company variables are not significant in the given regression between theoretical research and practical activities.
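The Sobel statistic used in the final step can be sketched as below. The function name and the input values are hypothetical and are not taken from Table 16; the sketch assumes the common first-order form of the test with unstandardized coefficients and their standard errors.

```python
import math


def sobel_test(a, se_a, b, se_b):
    """First-order Sobel test for an indirect effect a*b.

    a, se_a: coefficient and standard error of X -> M.
    b, se_b: coefficient and standard error of M -> Y, controlling for X.
    Returns (z, two_sided_p).
    """
    z = (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical coefficients for illustration only.
z, p = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.2)
mediation_significant = p < 0.05
```

In this illustrative case the indirect effect is not significant at the 0.05 level, which is the kind of outcome the text reports for the U.S. data company variables.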

2.6. Regression Models of China Putting Theoretical Research into Practice in E-Commerce

2.6.1. Linear Regression Model for Retail Sales with Theoretical Research Variables

The analysis procedure for China is the same as that performed for the U.S. First, a stepwise regression is run on the normally distributed variables, with Y08 as the dependent variable and X04, X12, X22, X64, X68, and X70 as independent variables. Two models are found to have an excellent fit, as shown in Table 17. The relationship between X68 and Y08 is positive, whereas the relationship involving X04 is negative. Similarly, China’s theoretical research dramatically promotes practical activities in e-commerce. For example, Table 18 shows that the literature on “E-Commerce,” “Business Intelligence Analytics,” “Mobile Technology and E-Commerce,” “Artificial Intelligence and Big Data and E-Commerce,” “Quantum Computing,” etc., all have positive effects on retail e-commerce sales in practice.

2.6.2. Investigation of the Moderating and Mediating Effects of the Variables of Data Companies Founded in China

The same stepwise regressions as for the U.S. are run for the variables on the number of data companies founded (Y14, Y15, Y16, Y17, and Y18) and a random selection of the theoretical research variables (X04, X09, X14, X28, X30, X34, X56, X58, X60, X62, X64, X66, X68, and X70). The results indicate that China’s theoretical research promotes the founding of data companies in an obvious and direct manner, as can be seen in Table 19. This observation is the opposite of that in the U.S.

Next, we investigate the moderation and mediation effects [31, 32] of data companies. First, the moderation of the data company variables is judged with two regression equations: one is composed of the independent variables X04, X12, and X68; moderator variable Y15; and dependent variable Y06, and the other adds an interaction term, X04Y15, X12Y15, or X68Y15. Whether the moderator variable affects the relationship between the independent variables and the dependent variable is then decided according to the significance of the R square change. The significance of the R square change and the regression coefficient of the interaction term are examined for each of the X04Y15, X12Y15, and X68Y15 models; the R square change is valid for the X04Y15 model, and together the results indicate that the data companies founded variable (data/technology) in China has a moderating effect on theoretical research work promoting retail e-commerce sales in practice. We also test the mediation effects of the data companies founded variables in China, using X04 as the independent variable, Y06 as the dependent variable, and one of the data companies founded variables (Y14–Y18) as the mediating variable. The first step is the regression of Y06 on X04, which has a standardized regression coefficient of 0.842. The second step is the linear regression of each of Y14–Y18, as the dependent variable, on the independent variable X04, which shows that all of the regression coefficients are significant at less than 0.05. The last step is a linear regression of Y06 on X04 together with one of Y14–Y18, in which only Y18 and X04 are found to be simultaneously significant (shown in Table 20). As a consequence, data companies in China play a mediating role in putting these fields of theoretical research into practice in e-commerce.

2.7. Comparison of the Moderating and Mediating Effects of Data Companies Founded between the U.S. and China

Figures 1 and 2 show that, in both the U.S. and China, the moderator variable “founded number of data companies” significantly moderates the relationship between theoretical research and practice in e-commerce using BDA. In the U.S., the variable “founded number of data companies” has a moderating effect on the model of the correlation between “searching subject terms ‘Quantum Computing’ in WoS (Core Collection) (X69)” and “retail e-commerce sales (Y02),” where this relationship is negative (X69: -0.381; Y13: 0.423), and it does not moderate the correlation between “searching subject terms ‘Mobile Technology and E-Commerce’ in WoS (Core Collection) (X63)” and “retail e-commerce sales (Y02)” (Y13: 0.309). This finding is similar to that in China, where the variable “founded number of data companies” has a moderating effect on the model of the correlation between “searching subject terms ‘E-Commerce’ in CNKI (periodical) (X04)” and “retail e-commerce sales (Y06),” where the relationship is positive (X04: 0.635; Y15: 1.216), and it has a direct significant negative correlation with “retail e-commerce sales (Y06)” (Y15: -0.782). However, it does not moderate the correlation between “searching subject terms ‘Business Intelligence Analytics’ in CNKI (periodical) (X12)” (X12: -0.293) and “retail e-commerce sales (Y06)” (Y15: -0.063).

The model presented in Figure 3 assumes a three-variable system with a direct and significant relationship between “retail e-commerce sales (Y02)” and “searching subject terms ‘E-Commerce’ in WoS (Core Collection) (X02),” into which the mediator variable “founded number of data companies in the U.S.” is introduced. However, the path between Y02 and X02 becomes nonsignificant, because “the number of existing data companies in the United States” plays an important role in putting the theoretical research of e-commerce into practice. In contrast, the data companies in China shown in Figure 4, which are limited to the “research/consulting” type, have a clear mediating effect on the relationship between theoretical research on BDA in e-commerce and practical production activities in e-commerce.

2.8. Lag Consideration in Evaluating Theoretical Research Response to Practical Application in E-Commerce

In general, putting theoretical research into practice takes a certain lag. Here, the variables X02 and X69 for the U.S. and X04 and X70 for China are selected to test the linear relation between their lags and retail e-commerce sales. First, the lag order of the selected independent variables is determined from Table 21, which shows the six criteria [33]; the selected lag orders are one and two for X02 and X69 and one and two for X04 and X70, respectively. Next, we construct linear regression models involving the lag variables to determine whether including them improves the fit. It does not: for both the U.S. and China, the R square degrades, as shown in Table 22. These findings indicate that we need not consider the effects of lag when evaluating the theoretical research response to practical applications in e-commerce, which is consistent with the results for the nonlag variables in the empirical studies of the previous sections.
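The comparison of lagged and nonlagged fits can be sketched for a single predictor as follows. The names and data are hypothetical, and the sketch covers only the fit comparison, not the lag order selection by the six criteria in Table 21.

```python
def r_squared(x, y):
    """R square of a simple linear regression of y on x
    (equal to the squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def lagged_r_squared(x, y, lag):
    """Regress y_t on x_{t-lag}; the first `lag` years of y are dropped."""
    if lag == 0:
        return r_squared(x, y)
    return r_squared(x[:-lag], y[lag:])


# Hypothetical yearly series in which y responds to x in the same year.
x = [1, 3, 2, 5, 4, 6]
y = [3, 7, 5, 11, 9, 13]

r2_nolag = lagged_r_squared(x, y, 0)
r2_lag1 = lagged_r_squared(x, y, 1)
lag_improves_fit = r2_lag1 > r2_nolag
```

When the lagged model fits worse than the contemporaneous one, as in this toy series, the nonlag specification is retained, which parallels the decision reported in the text.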

3. Conclusions

The rapid growth of e-commerce has benefited from not only the evolution of data science over the past two decades but also the boom of big data from various sources. This is what makes China and the United States the largest e-commerce markets and why China accounts for more than the United States in e-commerce sales. Ultimately, we can determine the reasons for the difference between the U.S. and China on this point. One reason is institutional differences and commercial value, which make Chinese society more accepting of BDA in e-commerce than U.S. society. Another reason is that theoretical research on BDA in e-commerce in China has attracted slightly more extensive attention than in the U.S., as shown by a comparison of literature databases, which provides proof of a significant relationship between theoretical research and practical activities in BDA in e-commerce. In addition, in the United States, the data company variables show a moderating but no mediating effect on putting theoretical research into practice. However, in China, mediating effects on this relationship are also found. These results help clarify doubts regarding the development of China’s e-commerce, which today even exceeds that of the U.S., in view of the theoretical and practical comparison of BDA in e-commerce between them.

3.1. Avenues for Future Research and Practice

Regardless of whether the U.S. or China is considered, the theoretical research work is deeply impressive and has propelled the practical application of BDA in e-commerce. However, big data hubris and algorithm dynamics issues may contribute to analysis mistakes [7] because of human subjective prejudice, objective technological limitations, and the need to enhance artificial intelligence by processing data more efficiently for e-commerce transactions. We expect that the parties to e-commerce activities, such as the seller, buyer, and platform provider, will actively self-learn from their own generated data, and that the critical information extracted by others (such as the buyer and platform provider) will then accompany the commodities to improve the quality of sales and service, particularly by increasing the transparency and credibility of goods to attract purchases. For example, a product on Amazon is currently displayed mostly with its price and functions; however, it is anticipated that product information, manufacturer information, seller information, customer information, and even extra payment information or more will be shown in the future, as shown in Figure 5, so that the desire to buy is reinforced and the right goods are recommended to purchasers. Therefore, there are three future research orientations in e-commerce using BDA. First, data originating from e-commerce activities will be considered valuable and tradable resources after being processed by BDA, and the seller, buyer, or platform provider can each price their own data dynamically for sale. In addition, the data trading market and its pricing mechanisms in e-commerce will be researched and widely put into use. Second, a new rule for dynamic pricing for each customer developed by applying BDA in e-commerce can be envisioned, such that a product would sell for a different price on a per-customer basis, enabling every seller, buyer, and platform provider to meet their own expectations or maximize revenue. Third, the puzzling relationships between purchasing behaviors and consumer habits [34], consumer habits and personalities [35, 36], and consumer personalities and the growth environment [37, 38] can be unraveled by using BDA for deep learning in e-commerce trading.

In addition, it is expected that the combination of large data resources and new technologies will address many existing e-commerce problems and yield better solutions. A series of new issues deserves attention, such as quantum computing in e-commerce [39], for which theoretical research visibly promotes retail sales in both the U.S. and China (as seen in Tables 9 and 17). Ronald [40] considers the potential impact that the nascent technology of quantum computing may have on e-commerce, more specifically, encrypting information in such a way as to ensure that an e-commerce trade is safe, offering significant speed-ups for faster search and optimization in the big data age, and implementing the quantum cheque transaction in a quantum-networked banking system [41]. As a BDA concept, quantum machine learning could enable machine learning that is faster than that of classical computers for calculating and analyzing e-commerce activities in the big data age [42]. In short, quantum computing in e-commerce is a crucial theoretical research topic and has practical application for both the U.S. and China at present and in the future.

In future applications, we should encourage data companies to devote efforts to the big data business issues required for e-commerce because data companies play a moderating or mediating role in putting theoretical research into practice in e-commerce.

Conflicts of Interest

The author declares that they have no conflicts of interest.

Acknowledgments

This work was supported by key projects of the National Social Science Foundation (No. 19AGL017), Humanities and Social Science Research Project of the Ministry of Education (No. 18YJAZH153), Natural Science Foundation of Fujian (No. 2018J01648), and Development Fund of Scientific Research from Fujian University of Technology (No. GY-S18109), all received from the Chinese government.