Abstract

Driven by the demand for safety and convenience in travel, self-driving technology has developed rapidly over the past decades. In this paper, a novel technology forecasting model is developed that combines topic-based text mining and expert judgment to forecast technology trends efficiently and accurately. To improve the reliability of the results, multidimensional information including scientific papers, patents, and industry data is considered. The model is then used to forecast the development trends of self-driving technology in China. Data from 2002 to 2019 are used after appropriate cleaning. Topic clustering is performed for papers and patents, and hierarchical structures are constructed. On this basis, the technology evolution derived from papers and from patents is compared, and the development trends are obtained. Based on these results, we speculate that technology on “Decision” will be the next hotspot in patents. The results of this paper can provide a reference and guidance for Chinese enterprises and the government in decision-making on self-driving technology.

1. Introduction

Roads are the arteries of the economy, through which people, goods, and materials are transported in a timely manner, and efficient road traffic is the foundation of social functioning [1, 2]. However, road traffic problems are worsening and have begun to hinder the sustainable development of the economy and society. Traffic jams and collisions are increasingly frequent, resulting in large numbers of casualties and property losses [3]. Another issue is air pollution from automobile exhaust, which contributes to greenhouse gas emissions and environmental pollution [4]. Self-driving technology has been shown to be an alternative that may alleviate these problems [5, 6]. As a result, countries including China have made strategic arrangements for self-driving technology. The Chinese government put forward “Made in China 2025” on May 19th, 2015. In this plan, energy-efficient and new energy vehicles are among the ten key industries for development [7]. The document explicitly states that by 2025 China will master the complete technology of autonomous driving and the corresponding key technologies, and that an independent Research and Development (R&D) system will be established. In summary, self-driving technology will be one of China's core fields in the next few years.

Unlike traditional technologies, self-driving technology is usually considered a disruptive technology [8]. Disruptive technologies are characterized by high R&D costs, high economic benefits, and rapid technological updates [9]. High costs mean low fault tolerance, and rapid updates mean that a country or company can easily fall behind. Therefore, identifying the future trends of self-driving technology accurately and early is crucial for the R&D strategic planning of governments and vehicle enterprises seeking a first-mover advantage in global competition. Hence, forecasting the development trends of self-driving technology has attracted increasing attention in recent years.

The feasibility of technology forecasting for self-driving technology should be discussed first, because certain conditions must be fulfilled before technology forecasting can be applied to a given technology [10, 11]. In this paper, two assumptions are made about self-driving technology: (a) past data contain all the information necessary for the future, and (b) self-driving technology is a continuously growing technology area whose trend follows a certain pattern that has existed for some time. Under these assumptions, future technological changes can be identified from historical technology data with appropriate technology forecasting methods.

Numerous methods have been proposed to reveal information hidden in technology data [12–19]. Generally, technology forecasting methods can be divided into two categories according to their degree of dependence on data: qualitative methods and quantitative methods [12]. Qualitative methods are usually based on the experience and knowledge of experts, which makes them time-consuming and subjective [13]; they are also better suited to the early stage of a technology. With the advancement of computer technology and related numerical tools, more and more attention has been paid to quantitative methods [14–19]. Statistical tools [14–16] and text mining tools [17–19], which can handle massive raw data, are widely used in technology forecasting. Because text mining can find implicit, previously unknown, and potentially useful patterns in large collections of text documents [20], it has become the mainstream method of technology forecasting. However, text mining alone struggles to eliminate irrelevant and meaningless information, a problem that expert knowledge can solve effectively. To combine the advantages of both, this paper develops a model that integrates text mining and expert knowledge to forecast technology trends efficiently while improving the reliability of the results.

Text documents are necessary for forecasting technology trends with both text mining and expert knowledge, and the quality of the forecasting results depends strongly on the type and quality of the documents. Patents have long been recognized as a crucial data source for studies of technology innovation [15] and technology forecasting [21, 22]. However, patents alone are not enough to give a complete picture of technology development. Scientific papers are also achievements of basic research and the seeds of technology innovation [23], so combining scientific papers and patents reflects both the research level and the acceptance of the technology. In addition, industry information can reveal the market acceptance and potential of the technology, which also affect the trend and direction of its development. Therefore, this paper uses multidimensional information combining patents, papers, and industry data, which reveals the development status of the technology more comprehensively.

In this paper, a technology forecasting model is developed that integrates topic-based text mining and expert judgment to forecast self-driving technology trends from scientific papers, patents, and industry data. The rest of the paper is structured as follows. The specific procedure of technology forecasting is provided in Section 2. In Section 3, self-driving technology in China for 2002∼2019 is analyzed in detail, and its future trends are predicted. The conclusions are presented in Section 4.

2. Methodology

The model extends the framework of Li [24], in which topic-based text mining and expert judgment were integrated to forecast technology trends from scientific papers and patents. In the present model, industry data are considered in addition to scientific papers and patents. Furthermore, an extra preconditioning stage (stage 1 in Figure 1) is added to make a preliminary assessment of the problem, which helps avoid unnecessary work. The framework of the model is illustrated in Figure 1, and its four stages are described as follows.

2.1. Stage 1: Determine the Feasibility and Necessity of Technology Forecasting

In this paper, the Web of Science (WOS) and Derwent Innovations Index (DII) databases are used as the data sources for scientific papers and patents, respectively, and the industry data are obtained from iiMedia (an authoritative economic data agency in China) [25]. Several search queries related to the target technology (self-driving) are constructed, and their results are compared to determine the final queries. The period from 2000 to 2019 is covered at annual resolution. To determine the feasibility and necessity of technology forecasting, growth curves of the patent, paper, and industry data are constructed and compared with the S-curve of the technology life cycle to identify the life-cycle stage of the technology.

2.2. Stage 2: Clustering Topics

The records of papers and patents retrieved from WOS and DII are saved as separate files for each year. To facilitate the analysis, the files are converted into XML format with Citespace [26]. The Lingo algorithm in Carrot2 [27] is then used to generate technology topics for the papers and patents.
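The text does not describe how Lingo works internally. As a rough illustration of its core idea (abstract concepts taken from a singular value decomposition of the term-document matrix, each concept matched to a candidate phrase label, and documents assigned to the chosen labels), the Python/NumPy sketch below may help. It is a simplified stand-in, not Carrot2's implementation: frequent-phrase extraction, stemming, stop words, and Carrot2's own scoring and merging are omitted, and the function name, candidate labels, and similarity threshold are hypothetical.

```python
import numpy as np

def lingo_sketch(docs, candidate_labels, k=2, assign_threshold=0.2):
    """Simplified Lingo-style clustering: SVD concepts -> phrase labels -> documents."""
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokens for w in toks})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-document matrix with tf-idf weights, columns scaled to unit length.
    A = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokens):
        for w in toks:
            A[index[w], j] += 1.0
    df = np.count_nonzero(A, axis=1)
    A *= np.log(len(docs) / df)[:, None]
    norms = np.linalg.norm(A, axis=0)
    A /= np.where(norms == 0, 1.0, norms)

    # Abstract concepts: the k leading left singular vectors of A.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    concepts = U[:, :k]

    # Candidate labels as unit vectors in the same term space (unknown words ignored).
    P = np.zeros((len(vocab), len(candidate_labels)))
    for j, label in enumerate(candidate_labels):
        for w in label.lower().split():
            if w in index:
                P[index[w], j] = 1.0
    label_norms = np.linalg.norm(P, axis=0)
    P /= np.where(label_norms == 0, 1.0, label_norms)

    # Each concept takes its best-matching label (abs value: SVD signs are arbitrary).
    chosen = []
    for c in range(concepts.shape[1]):
        best = int(np.argmax(np.abs(concepts[:, c] @ P)))
        if candidate_labels[best] not in chosen:
            chosen.append(candidate_labels[best])

    # Assign documents to each chosen label by cosine similarity above a threshold.
    clusters = {}
    for label in chosen:
        sims = P[:, candidate_labels.index(label)] @ A
        clusters[label] = [j for j, s in enumerate(sims) if s >= assign_threshold]
    return clusters
```

Because the SVD concepts summarize co-occurring terms, a broader phrase such as "lidar sensor" tends to win over a narrower one such as "fusion" when both are candidates for the same concept.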

2.3. Stage 3: Constructing the Hierarchical Structure and the Evolution Maps of the Technology

Using the clustering results as objective evidence for decision-making, a topic analysis that combines these results with expert knowledge is performed, and the hierarchical structure of self-driving technology is then constructed.

2.4. Stage 4: Forecasting the Development Trend of the Technology

Using the technical topics from papers and patents, a differential analysis is carried out on the topics with high growth rates in scientific papers versus patents. The technology evolution paths derived from papers and patents are then compared to forecast the future development trend of the technology.

3. Analysis of Self-Driving Technology in China

3.1. Data Collection and Preprocessing

In this paper, the paper, patent, and industry data are obtained from Web of Science (WOS), Derwent Innovations Index (DII), and iiMedia, respectively. Candidate search queries related to self-driving are constructed with the help of Wikipedia and expert advice, and their results are compared to determine the final queries. The query “TS = (self-driving OR autonomous car OR driverless car OR robot car OR autonomous vehicle OR driverless vehicle OR robot vehicle) AND CU = (China)” is used to search for scientific papers in WOS, retrieving 2874 papers from 2000 to 2019. The query “TS = (self-driving OR autonomous car OR driverless car OR robot car OR autonomous vehicle OR driverless vehicle OR robot vehicle) AND PN = (CN)” is used to search for patent data in DII, retrieving 30870 issued patents from 2000 to 2019. The searches were performed on August 19th, 2020.
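The two queries share the same topic clause and differ only in the country restriction. A small helper such as the one below (a hypothetical convenience, not part of WOS or DII) keeps both query strings consistent whenever the term list is revised; it reproduces the two queries quoted above exactly.

```python
SELF_DRIVING_TERMS = [
    "self-driving", "autonomous car", "driverless car", "robot car",
    "autonomous vehicle", "driverless vehicle", "robot vehicle",
]

def build_query(terms, field, value):
    """Join the terms with OR inside a TS (topic) clause and add one field restriction."""
    topic_clause = " OR ".join(terms)
    return f"TS = ({topic_clause}) AND {field} = ({value})"

wos_query = build_query(SELF_DRIVING_TERMS, "CU", "China")  # papers from WOS
dii_query = build_query(SELF_DRIVING_TERMS, "PN", "CN")     # patents from DII
```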

The retrieved papers and patents require data cleaning: unrelated papers and patents are removed manually. The cleaned records are then converted into XML format with Citespace so that they can be processed by Carrot2 for text mining.
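Manual screening is easier once exact duplicates are dropped and the records are grouped by publication year, matching the per-year files used in Stage 2. The sketch below assumes each exported record has already been parsed into a dict with hypothetical `title` and `year` keys; it only prepares the data and does not replace the manual relevance check described above.

```python
import re
from collections import defaultdict

def normalize_title(title):
    """Lower-case and collapse punctuation/whitespace so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def dedupe_and_group(records, start=2000, end=2019):
    """Drop duplicate titles and records outside start..end, then group by year."""
    seen = set()
    by_year = defaultdict(list)
    for record in records:
        key = normalize_title(record["title"])
        if key in seen or not start <= record["year"] <= end:
            continue
        seen.add(key)
        by_year[record["year"]].append(record)
    return dict(by_year)
```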

3.2. Analysis from a Holistic Perspective

The annual numbers of scientific papers and patents are shown on the left side of Figure 2. In general, both grew rapidly over time, with exponential growth observed since 2012, showing that research and development on self-driving technology have been very active in recent years. To quantify this growth, the corresponding growth rates of papers and patents are shown on the right side of Figure 2. In the early years, the growth rates oscillate because the annual counts are small; they become stable from 2012 onward. The mean growth rates of papers and patents over 2012∼2019 are 46.9% and 39.0%, respectively, indicating that the related fields are very active.
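The text does not state how the mean growth rate is computed. Assuming the annual growth rate is g(t) = (N(t) - N(t-1)) / N(t-1), where N(t) is the count in year t, two common summaries are the arithmetic mean of the annual rates and the compound annual growth rate (CAGR). They can differ noticeably when rates fluctuate, as the illustrative counts below (not the paper's data) show. Which rates fall inside the window is also an assumption here: the sketch uses the changes from start+1 through end.

```python
def annual_growth_rates(counts):
    """Map each year to (N_t - N_{t-1}) / N_{t-1}; years after a zero count are skipped."""
    years = sorted(counts)
    return {
        year: (counts[year] - counts[prev]) / counts[prev]
        for prev, year in zip(years, years[1:])
        if counts[prev] > 0
    }

def mean_growth(counts, start, end):
    """Return (arithmetic mean of annual rates, CAGR) over the window start..end."""
    rates = annual_growth_rates(counts)
    window = [rates[y] for y in range(start + 1, end + 1) if y in rates]
    arithmetic = sum(window) / len(window)
    cagr = (counts[end] / counts[start]) ** (1 / (end - start)) - 1
    return arithmetic, cagr

# Illustrative counts only: growth of +100% followed by -50%.
print(mean_growth({2010: 10, 2011: 20, 2012: 10}, 2010, 2012))  # (0.25, 0.0)
```

Reporting which definition is used makes figures such as the 46.9% and 39.0% above directly reproducible.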

The annual industry investment is presented in Figure 3. Investment increased significantly from 2016, which is attributed to national policy: the National Intelligent and Connected Vehicle (ICV) Testing Demonstration Base was approved by the Ministry of Industry and Information Technology of China in June 2015 [28], and the ICV base was officially opened in June 2016, after which a large amount of investment entered the market. Investment dropped in 2017, which we attribute to the field entering a period of steady development, and it then increased steadily in 2018 and 2019.

The growth curve method fits the growth curve of a variable to a life cycle curve so that future performance can be estimated by extrapolation [29]. As shown in Figure 4, technology development usually follows a slow-fast-slow pattern. The corresponding life cycle curve is S-shaped and can be divided into four stages: emerging, growth, maturity, and recession.
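The text does not specify which S-curve is fitted. The logistic (Pearl) curve is one common choice; under that assumption, with N(t) the cumulative count, K the saturation level, r the growth rate, and t_0 the inflection year, the curve and its first two derivatives are:

```latex
N(t) = \frac{K}{1 + e^{-r (t - t_0)}}, \qquad
\frac{dN}{dt} = r\,N\left(1 - \frac{N}{K}\right), \qquad
\frac{d^2N}{dt^2} = r^2 N\left(1 - \frac{N}{K}\right)\left(1 - \frac{2N}{K}\right)
```

The second derivative changes sign at N = K/2, that is, at t = t_0: before this point the curve accelerates (emerging and early growth), and after it the curve decelerates toward maturity. Because annual counts correspond to dN/dt, annual counts that are still rising place a technology before the inflection point.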

Comparing the development trends of papers and patents (Figure 2) and of investment (Figure 3) with the life cycle curve, the positions of papers, patents, and investment on the curve can be determined. Papers are in the middle-to-late growth stage, and patents are in the middle of the growth stage. Investment in the self-driving industry has only just started to grow, which means the industry is moving from the emerging stage to the growth stage. Therefore, research on self-driving technology is in an explosive growth stage, while its commercialization is still at an early stage. Because the rapid development of technology research will strongly influence industry development, it is necessary to forecast the technology's trends.
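Reading a life-cycle position off annual counts can be made explicit with a rough heuristic. The sketch below assumes a logistic model, in which annual counts are the derivative of the cumulative S-curve, so counts that are still rising suggest the cumulative curve has not yet reached its inflection point. The function name, window size, and output labels are hypothetical, and the heuristic cannot by itself separate the emerging stage from early growth.

```python
def life_cycle_position(annual_counts, window=3):
    """Rough S-curve position from the last `window` annual (not cumulative) counts."""
    recent = annual_counts[-window:]
    steps = [b - a for a, b in zip(recent, recent[1:])]
    if steps and all(s > 0 for s in steps):
        # Annual output still rising: the cumulative curve is convex, before its inflection point.
        return "before inflection"
    if steps and all(s < 0 for s in steps):
        # Annual output falling: the cumulative curve is concave, past its inflection point.
        return "past inflection"
    return "inconclusive"
```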

3.3. Topic Clustering and Hierarchical Structure Construction
3.3.1. Topic Clustering

The annual data of the scientific papers and patents were analyzed with Carrot2 using the Lingo algorithm, with the control parameters set to minimum cluster size = 2, cluster merging threshold = 0.7, and size-score sorting ratio = 0. The clustering results for papers and patents are shown in Figures 5 and 6. Because the number of documents in the early years did not meet the minimum clustering requirements, clustering was performed for papers from 2004 to 2019 and for patents from 2002 to 2019. The subject terms obtained by clustering are imported into word frequency visualization software to display the clustering results more clearly. As shown in Figures 5 and 6, the hot topics in papers and patents are clear and distinct by year. However, there are also irrelevant and meaningless topics, such as “Model,” “Current,” and “Arm 2 Arm,” which need to be eliminated. As mentioned before, unrelated papers and patents are removed manually.
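To make the three control parameters concrete, the sketch below applies their evident meaning to a list of already-labeled clusters: clusters whose documents overlap by at least the merging threshold are merged, clusters smaller than the minimum size are discarded, and a ratio of 0 sorts by cluster size only (a ratio of 1 would sort by label score only). This is not Carrot2's code; the overlap measure (shared documents relative to the smaller cluster), the greedy single-pass merge, the score normalization, and all names are assumptions made for illustration.

```python
def postprocess_clusters(clusters, min_size=2, merge_threshold=0.7, size_score_ratio=0.0):
    """clusters: list of (label, score, set of document ids); returns merged, filtered, sorted clusters."""
    merged = []
    for label, score, docs in clusters:
        if not docs:
            continue  # an empty cluster cannot overlap anything
        for m in merged:
            # Assumed overlap measure: shared documents relative to the smaller cluster.
            overlap = len(m["docs"] & docs) / min(len(m["docs"]), len(docs))
            if overlap >= merge_threshold:
                m["labels"].append(label)
                m["docs"] |= docs
                m["score"] = max(m["score"], score)
                break
        else:
            merged.append({"labels": [label], "score": score, "docs": set(docs)})

    # Discard clusters smaller than the minimum cluster size.
    kept = [m for m in merged if len(m["docs"]) >= min_size]
    if not kept:
        return []

    # Assumed sort key: ratio 0 ranks by size only, ratio 1 by label score only.
    max_size = max(len(m["docs"]) for m in kept)
    max_score = max(m["score"] for m in kept) or 1.0

    def rank(m):
        return ((1 - size_score_ratio) * len(m["docs"]) / max_size
                + size_score_ratio * m["score"] / max_score)

    return sorted(kept, key=rank, reverse=True)
```

With minimum cluster size = 2, a topic such as “Model” that covers a single document would be dropped, while two labels covering almost the same documents (for example, hypothetical “Lidar” and “Laser radar” clusters) would be merged into one.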

3.3.2. Construction of the Hierarchical Structure of Self-Driving Technology

Based on the clustering results in Section 3.3.1, the hierarchical structure of the technology is constructed as a technology tree to clarify the relationships among the topics of self-driving technology. A technology tree is a branching diagram that represents relationships among technologies [30]. It provides a picture of a technology [31] by representing the relationships among product components, technologies, or technological functions in a specific technology area [29]. The technology hierarchy can also be used to select a technology area of interest for in-depth analysis [32]. Therefore, constructing the hierarchical structure of the technology is an important step.

Before the technology tree is constructed, the subject headings are classified and summarized with the help of experts in self-driving technology, and irrelevant and meaningless topics are removed. The resulting structure is shown in Figure 7. As shown in the figure, the hierarchical structure of self-driving technology consists of two parts, “Hardware” and “Software.” The “Hardware” category contains the tangible components that support self-driving technology, namely “Sensors,” “Vehicle Controller,” and “Electricity”; the “Software” category contains the intangible algorithms, namely “Navigation,” “Positioning,” “Perceive,” “Decision,” and “Control.” This structure makes it possible to assign each topic clustering result to a category and defines the structure of the technology evolution map. On the basis of the two categories, the experts then merge interrelated topics and assign them to the corresponding subcategories.
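The expert classification step can be represented as a lookup from clustered subject terms to the leaves of the two-level tree in Figure 7, with unmapped terms treated as irrelevant. In the sketch below, the tree itself follows the text, but the term-to-leaf mapping is hypothetical; in practice the experts supply it, and its coverage determines which topics survive.

```python
from collections import Counter

# Two-level technology tree as described for Figure 7.
TECH_TREE = {
    "Hardware": ["Sensors", "Vehicle Controller", "Electricity"],
    "Software": ["Navigation", "Positioning", "Perceive", "Decision", "Control"],
}
LEAF_TO_BRANCH = {leaf: branch for branch, leaves in TECH_TREE.items() for leaf in leaves}

# Hypothetical expert mapping from clustered subject terms to leaves.
# Terms missing from the mapping (e.g. "Model") are treated as irrelevant and dropped.
EXPERT_MAP = {
    "lidar": "Sensors",
    "battery management": "Electricity",
    "path planning": "Decision",
    "trajectory tracking": "Control",
    "lane detection": "Perceive",
}

def classify(term):
    """Return (branch, leaf) for a subject term, or None if experts deem it irrelevant."""
    leaf = EXPERT_MAP.get(term.lower())
    return (LEAF_TO_BRANCH[leaf], leaf) if leaf else None

def count_topics(terms):
    """Count classified topics per (branch, leaf) node of the tree."""
    return Counter(node for node in map(classify, terms) if node is not None)
```

Summing the counts over leaves that share a branch gives the yearly “Hardware” and “Software” totals analyzed in the following section.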

3.4. Comparison and Forecasting
3.4.1. Comparison of the Technology’s Evolution Based on Papers and Patents

Based on the hierarchical structure of self-driving technology, we now analyze the evolution trends of the technical topics. Statistical tests [33–35] are considered before the analysis. Topics from different perspectives are considered as follows.

We first consider the “Hardware” and “Software” topics. The numbers of these topics appearing in papers and patents are shown in Figure 8(a). Both are relatively small in papers and patents before 2012. After 2012, both “Hardware” and “Software” grow rapidly in patents and reach the same order of magnitude. “Software” in papers has also grown significantly since 2012. However, “Hardware” remains small in papers throughout the period: although it appears to grow from 2017, it is still far smaller than “Software” in papers. We speculate that this is because hardware-related papers are harder to publish.

The relative proportions of “Hardware” and “Software” in papers and patents are presented in Figure 8(b); they are nondimensionalized so that the proportions sum to 1. For papers, the relative proportion is unstable before 2012 because of the small counts; after 2012, “Software” has a clearly larger proportion. For patents, the amounts of “Hardware” and “Software” are always similar. The growth rates show a similar pattern (Figure 8(c)): for papers, the growth rate of “Software” has exceeded that of “Hardware” since 2012, while for patents the two are similar. We may conclude that applied research on self-driving technology, as reflected in patents, develops software and hardware in a more balanced way.
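The nondimensionalization in Figure 8(b) divides each year's topic counts by that year's total. The minimal sketch below, with illustrative counts rather than the paper's data, also shows why early years oscillate: with a yearly total of 2, the only attainable shares are 0, 0.5, and 1, so a single topic more or less swings the proportion by half. Years with no topics at all are skipped, since a proportion is undefined there.

```python
def relative_proportions(counts_by_year):
    """{year: {topic: count}} -> {year: {topic: share}} with shares summing to 1."""
    shares = {}
    for year, counts in counts_by_year.items():
        total = sum(counts.values())
        if total == 0:
            continue  # proportion is undefined for a year with no topics
        shares[year] = {topic: c / total for topic, c in counts.items()}
    return shares
```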

Based on the above analysis, it is useful to analyze the subtopics of “Hardware” and “Software” in papers and patents. Because “Hardware” topics are rare in papers, they are excluded from this part of the analysis.

The subtopics “Navigation,” “Positioning,” “Perceive,” “Decision,” and “Control” within the “Software” topics of papers are shown in Figure 9, together with their relative proportions to show the variation more clearly. From 2012 onward, when the number of papers is no longer small, “Decision” gradually becomes dominant among the “Software” topics. “Control” has a relatively stable proportion, and the proportions of the other topics are low. We conclude that decision algorithms are a hot topic in related research.

The subtopics “Sensor,” “Electricity,” and “Vehicle Controller” within the “Hardware” topics of patents are shown in Figure 10. In the early years (2006–2012), “Sensor” and “Electricity” dominate the “Hardware” topics of patents. “Vehicle Controller” then increases very quickly, and from 2016 onward its proportion exceeds 80 percent. “Sensor” retains a small proportion, and patents on “Electricity” are rare. We conclude that the technologies related to “Sensor” and “Electricity” are mature.

For the “Software” topics of patents, the five subtopics “Navigation,” “Positioning,” “Perceive,” “Decision,” and “Control” are shown in Figure 11. Since 2013, “Control” has gradually become dominant. “Perceive” still retains a share, while patents on the other topics are rare.

3.4.2. Forecasting of Technological Hotspots

Shibata et al. [36] extracted the commercialization gap between papers and patents and proposed that, in active technical research fields, topics that appear in papers but not in patents can be regarded as technological opportunities. To some extent, technological opportunities determine future technological developments [37]. Therefore, identifying technological opportunities is critical for forecasting technology development trends.

Following the method proposed in [36], we compare the “Software” topics in papers and patents; “Hardware” topics are not compared because of the small number of papers. The comparison is presented in Figure 12. As discussed above, “Decision” has gradually dominated the paper results since 2012, “Control” has a relatively stable proportion, and the other topics have very small proportions. In the patent results, however, “Control” has dominated since 2015, “Perceive” and “Positioning” have small proportions, and the other topics, including “Decision,” are not observed.
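The comparison of paper and patent topics can be stated as a simple filter: a topic counts as a technological opportunity when it holds a substantial share of paper topics but is (nearly) absent from patent topics. The sketch below assumes per-topic shares such as those in Figure 12 are available; the two thresholds are hypothetical choices, not values from [36], and the shares in the usage example are illustrative rather than the paper's measured values.

```python
def technological_opportunities(paper_share, patent_share, min_paper=0.2, max_patent=0.05):
    """Topics prominent in papers but (nearly) absent from patents, following the idea in [36]."""
    return sorted(
        topic for topic, share in paper_share.items()
        if share >= min_paper and patent_share.get(topic, 0.0) <= max_patent
    )

# Illustrative shares only.
papers = {"Decision": 0.6, "Control": 0.3, "Perceive": 0.05, "Navigation": 0.05}
patents = {"Control": 0.8, "Perceive": 0.15, "Positioning": 0.05}
print(technological_opportunities(papers, patents))  # ['Decision']
```

Making the thresholds explicit also makes the result easier to check: a topic such as “Control” is excluded because it is already well represented in patents, not because it is unimportant in papers.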

From these findings, the following conclusions are drawn. (1) Technology on “Control” is basically mature, and research on this topic mainly concerns its realization in products. (2) The proportions of “Perceive” and “Positioning” will remain small in the future. (3) Scientific research on “Decision” is still ongoing, and the topic will become a hotspot once the relevant algorithms mature.

4. Conclusions

In this paper, a technology forecasting model is developed and used to forecast the development trends of self-driving technology in China. To improve efficiency and accuracy, topic-based text mining and expert judgment are combined to forecast technology trends. Multidimensional information including scientific papers, patents, and industry data from 2002 to 2019 is considered to improve the reliability of the results. The findings are as follows:
(1) Research on self-driving technology, as reflected in papers and patents, is in an explosive growth stage, while commercialization of self-driving technology is still at an early stage. The amount of investment is significantly influenced by government policy.
(2) The hierarchical structure of self-driving technology shows that “Software” topics dominate in papers, while patents show a more balanced development of software and hardware.
(3) The topic “Decision” dominates the “Software” topics of papers. For patents, control-related subtopics dominate both the “Software” and “Hardware” topics. A time lag is observed between papers and patents.
(4) We speculate that technology on “Decision” will be the next hotspot in patents.

Data Availability

The data used to support this study are included within the article and are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was funded by the National Natural Science Foundation of China (No. 71704036); the Harbin University of Science and Technology's 2019 “Science and Engineering Talents” Program Outstanding Young Talent Project (2019-KYYWF-0216); and the Harbin University of Science and Technology School of Economics and Management “Double First Class” Discipline Construction High-Level Cultivation Project (KY202013C).