Abstract

With the popularization of mobile devices and the development of wireless networks, crowdsensing is devoted to providing universal Internet of Things services. A reasonable task pricing mechanism can not only motivate more users to participate in sensing tasks but also support the healthy development of crowdsensing platforms, so it has gradually become a research hotspot in the field of crowdsensing. To address the common problems of insufficient analysis of task pricing rules and large deviations in pricing prediction models, a task price prediction method based on clustering and a deep neural network (DNN) is proposed. Using a set of real historical transaction prices as the data source, two-step clustering analysis (TSCA) is used to explore the pricing law of sensing tasks under complex constraint relations, which yields a natural grouping and taxonomic description of task prices. On this basis, a DNN-based price interval prediction model is implemented. The experimental results show that the prediction accuracy of the pricing mechanism is no lower than 82.7%.

1. Introduction

The increasing demand for practical applications of the Internet of Things, the widespread popularity of mobile smart terminals, and the emergence of the crowd computing model [1] have jointly spawned the emerging concept of crowdsensing. Crowdsensing is a new data acquisition mode that combines crowdsourcing ideas and mobile device perception capabilities and is committed to providing universal Internet of Things services for the public. Crowdsensing uses mobile sensing devices carried by nonprofessional field personnel to realize the distribution of sensing tasks and the collection of sensing data through conscious or unconscious collaboration, which breaks through the barriers that rely solely on professional participation [2]. Crowdsensing integrates GPS, cameras, gyroscopes, microphones, and other sensors. Mobile devices rely on human behavior to perform large-scale, complex sensing tasks and provide rich sensing data. This “people-centered” sensing network [3] overcomes the shortcomings of high networking cost, inflexible deployment, and difficult maintenance in the traditional fixed deployment mode.

Due to the above advantages, crowdsensing is widely used in many aspects of real life. For example, in public security application scenarios, Haddawy et al. developed a smart phone disaster warning system based on Mobile4D using crowdsensing, providing real-time crisis warning and detailed situational awareness information [4]. In the application of environmental monitoring, Oscar et al. monitored air pollution in crowded cities with the help of crowdsensing [5]. In addition, crowdsensing also has a wide range of practical applications in social services [6] and other aspects [7, 8], so it has received extensive attention and a lot of research from domestic and foreign researchers. As an emerging research field of the Internet of Things, the research of crowdsensing mainly includes task pricing [9, 10], task allocation [11, 12], data transmission [13, 14], incentive mechanism [15, 16], etc. With the continuous exploration and resolution of these problems by domestic and foreign researchers, crowdsensing will eventually serve the society in a brand-new way.

A typical crowdsensing system is composed of two subjects: a platform and users carrying mobile sensing devices [17], where users include task publishers and task participants, as shown in Figure 1. The platform is mainly responsible for publishing tasks and hosting the task remuneration. The task publisher is mainly responsible for releasing sensing tasks and providing remuneration. The task participant is responsible for completing the task and is paid for doing so. Paying the users who take part in a task is a reasonable and effective way to get the task completed, so pricing is particularly important. Essentially, the sensing task is treated as a commodity that can be bought and sold in a free market.

The main contributions of this paper are as follows:
(i) We use real historical datasets as data sources to construct a task pricing standard dataset, which improves the accuracy and efficiency of task pricing law analysis. The pricing law of sensing tasks with complex constraint relations is explored using two-step clustering analysis (TSCA), which yields a natural grouping and taxonomic description of sensing task prices and thereby reflects the pricing law of sensing tasks.
(ii) Based on the clustered dataset, a DNN is trained in batches, analyzed, and optimized to obtain a price interval prediction model based on a deep neural network. The model predicts the price interval of a sensing task and provides a scientific basis for sensing task price decisions.

Section 1 briefly describes the background, characteristics, architecture, and practical applications of crowdsensing and explains the main contributions of this article. Section 2 briefly reviews the sensing task pricing analysis schemes proposed by domestic and foreign researchers and identifies the existing problems that this work sets out to solve. Section 3 introduces a task price prediction method based on clustering and DNN. Using the real price set of sensing tasks as the data source, TSCA is used to classify and describe the prices of sensing tasks naturally, revealing the intrinsic classification of sensing task prices, and a DNN is then used to perform classification prediction on the prices of the classified sensing tasks. Section 4 presents the prediction results and the comparative experiments. Section 5 summarizes the full text and points out the next research direction.

2. Related Work

Task pricing in crowdsensing mainly explores task pricing rules by analyzing historical sensing data and uses these rules to determine the price of sensing tasks. The methods used in this pricing model mainly include cluster analysis, multiple regression, bivariate models, and Bayesian models. Literature [18] uses a density-based spatial clustering algorithm to cluster the dense task areas in the task price dataset in order to optimize the pricing strategy. Literature [19] uses the same clustering method as literature [18] and introduces a proportional sharing mechanism to establish a sensing pricing optimization model that can evaluate task success rates in advance. Literature [20] combines K-means clustering analysis with multiple nonlinear regression to design the task pricing function and analyze why tasks are not completed. Literature [21] proposes a task pricing mechanism based on a Bayesian model, which transforms a non-submodular optimization problem into a submodular optimization problem. Shao et al. proposed a crowdsensing pricing mechanism based on a bivariate pricing model [22]; by calculating the Pearson correlation coefficient between the bivariate data and the pricing data, they showed that the two variables are related to the pricing problem. Literature [23] uses the same dataset as the previous method, processes the historical dataset through factor analysis, and establishes a sensing task pricing model. Literature [24] studied the use of autoregressive methods with market sentiment indicators to predict US oil prices and concluded that autoregressive methods perform poorly on such problems and that machine learning methods need to be considered.

In summary, the problems in existing research mainly fall into three aspects. First, a multidimensional, large-sample standard dataset is lacking. Second, when exploring the law of task price, the price intervals are defined only according to the price range of the tasks in the dataset, and the ranges or values of the situational factors affecting task pricing are then classified according to these intervals. This processing method can hardly reflect the internal law of task price in the dataset in full. Third, the existing price prediction methods do not use machine learning for pricing, which makes the price prediction deviation larger. Based on the above research status and existing problems, we propose a task price prediction method based on clustering and DNN.

3. Task Price Prediction Based on Clustering and DNN in Crowdsensing

3.1. Price Classification of Perception Tasks Based on TSCA

The multidimensional, large-sample historical transaction data were obtained with authorization from the platform. The data source is the Gaia Open Data Program. Through data preprocessing such as data cleaning, data merging, and data transformation, the task pricing standard dataset is constructed, and the task pricing law of the standard dataset is analyzed through TSCA. Pricing analysis based on TSCA includes two stages: constructing the clustering feature tree and natural grouping based on the agglomerative (condensed) clustering method.

The clustering feature tree is constructed by inserting the sample cases in the task pricing standard dataset into the tree according to their clustering features, so that the tree grows step by step. In this stage, the sample cases in dense regions are first formed into several small clusters.

First, we define the cluster feature that is inserted into the cluster feature tree. For cluster $C_j$, the cluster feature is $CF_j = \langle N_j, S_{Aj}, S_{Aj}^2, N_{Bj} \rangle$, where $N_j$ is the number of sample cases in cluster $C_j$; $S_{Aj}$ denotes the linear summation of the attribute values of the sample cases in cluster $C_j$ under continuous factors such as sensing task moving distance and task time consumption; $S_{Aj}^2$ denotes the sum of squares of the sample case attribute values in cluster $C_j$ under the continuous factors; and $N_{Bj}$ is a vector formed by the number of sample cases taking each possible value under categorical factors such as sensing task area and task type.
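The cluster feature described above is additive, which is what lets the tree absorb a case or merge two subclusters without revisiting the raw data. The following is a minimal sketch of that idea, not the authors' implementation; the class name `ClusterFeature`, its field names, and the example factor values are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ClusterFeature:
    """Summary of one cluster: count, linear sums, squared sums, category counts."""
    n: int = 0
    lin_sum: list = field(default_factory=list)     # one sum per continuous factor
    sq_sum: list = field(default_factory=list)      # one sum of squares per continuous factor
    cat_counts: list = field(default_factory=list)  # one Counter per categorical factor

    def add(self, continuous, categorical):
        """Absorb a single sample case into this cluster feature."""
        if self.n == 0:
            self.lin_sum = [0.0] * len(continuous)
            self.sq_sum = [0.0] * len(continuous)
            self.cat_counts = [Counter() for _ in categorical]
        self.n += 1
        for k, x in enumerate(continuous):
            self.lin_sum[k] += x
            self.sq_sum[k] += x * x
        for k, value in enumerate(categorical):
            self.cat_counts[k][value] += 1

    def merge(self, other):
        """Return the cluster feature of the union of two clusters (component-wise sum)."""
        return ClusterFeature(
            n=self.n + other.n,
            lin_sum=[a + b for a, b in zip(self.lin_sum, other.lin_sum)],
            sq_sum=[a + b for a, b in zip(self.sq_sum, other.sq_sum)],
            cat_counts=[a + b for a, b in zip(self.cat_counts, other.cat_counts)],
        )

    def variance(self, k):
        """Population variance of continuous factor k, recovered from the sums alone."""
        mean = self.lin_sum[k] / self.n
        return self.sq_sum[k] / self.n - mean * mean


# Hypothetical cases: (moving distance, time consumption), (area, task type)
cf_a = ClusterFeature()
cf_a.add([2.0, 10.0], ["north", "photo"])
cf_a.add([4.0, 14.0], ["north", "audio"])
cf_b = ClusterFeature()
cf_b.add([6.0, 18.0], ["south", "photo"])
cf_merged = cf_a.merge(cf_b)
```

Because the variance and the category frequencies can be recovered from these sums, the distance between two clusters can be evaluated from their cluster features alone.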

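Secondly, the distance between clusters is calculated from the cluster features. In order to handle mixed attributes, TSCA uses the log-likelihood distance as its distance measure. The log-likelihood distance between cluster $C_i$ and cluster $C_j$ and its parameters are defined as

$$d(C_i, C_j) = \xi_i + \xi_j - \xi_{\langle i,j \rangle}, \tag{1}$$

$$\xi_v = -N_v \left( \sum_{s=1}^{K^A} \frac{1}{2} \log\left( \hat{\sigma}_s^2 + \hat{\sigma}_{vs}^2 \right) + \sum_{t=1}^{K^B} \hat{E}_{vt} \right), \tag{2}$$

where $\langle i,j \rangle$ denotes the cluster obtained by merging $C_i$ and $C_j$; $K^A$ and $K^B$ are the numbers of continuous and categorical factors; $\hat{\sigma}_{vs}^2$ represents the variance of the $s$th continuous factor estimated from the sample cases in cluster $C_v$; $\hat{\sigma}_s^2$ represents the variance of the $s$th continuous factor estimated from all sample cases in the sensing task price dataset; and $\hat{E}_{vt}$ represents the information entropy of the $t$th categorical factor in cluster $C_v$.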
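As an illustration of how a log-likelihood distance treats continuous and categorical factors together, the sketch below computes it directly from raw sample cases. It assumes the standard two-step clustering form of the distance (the drop in log-likelihood when two clusters are merged); the function names and the toy clusters are hypothetical, and a real implementation would work from cluster features rather than raw cases.

```python
import math
from collections import Counter


def xi(cases, global_var):
    """Log-likelihood term of one cluster.

    cases: list of (continuous_values, categorical_values) tuples.
    global_var: variance of each continuous factor over the whole dataset.
    """
    n = len(cases)
    total = 0.0
    for s in range(len(cases[0][0])):
        values = [case[0][s] for case in cases]
        mean = sum(values) / n
        var = sum((x - mean) ** 2 for x in values) / n
        # The global variance keeps the logarithm finite for single-point clusters.
        total += 0.5 * math.log(global_var[s] + var)
    for t in range(len(cases[0][1])):
        counts = Counter(case[1][t] for case in cases)
        entropy = -sum((m / n) * math.log(m / n) for m in counts.values())
        total += entropy
    return -n * total


def log_likelihood_distance(cluster_i, cluster_j, global_var):
    """Decrease in log-likelihood caused by merging the two clusters."""
    merged = cluster_i + cluster_j
    return xi(cluster_i, global_var) + xi(cluster_j, global_var) - xi(merged, global_var)


# Hypothetical clusters: (moving distance, time consumption), (task area,)
near_a = [((1.0, 10.0), ("north",)), ((2.0, 12.0), ("north",))]
near_b = [((1.5, 11.0), ("north",)), ((2.5, 13.0), ("north",))]
far_c = [((8.0, 30.0), ("south",)), ((9.0, 33.0), ("south",))]
global_var = (10.0, 100.0)

d_near = log_likelihood_distance(near_a, near_b, global_var)
d_far = log_likelihood_distance(near_a, far_c, global_var)
```

In this toy setting `d_far` exceeds `d_near`, since merging `near_a` with `far_c` raises both the within-cluster variance and the entropy of the task area factor.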

Finally, the cluster feature of cluster $C_i$ is inserted into the cluster feature tree according to the distance between clusters, so as to realize the growth of the cluster feature tree. The log-likelihood distance between cluster $C_i$ and cluster $C_j$ is used to determine whether cluster $C_i$ can be absorbed by cluster $C_j$. The subclusters corresponding to the leaf elements in the final cluster feature tree are used for the next stage of clustering.

The clustering stage clusters the subclusters produced in the preclustering stage to obtain the final result. First, the subclusters are merged step by step by agglomerative clustering according to the distance between clusters until a single large cluster is formed. Then, the approximate range of the optimal cluster number is determined by the Bayesian information criterion (BIC), and the final cluster number is determined according to the distance ratio between clusters. The whole process is also called automatic clustering.

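We determine the approximate range of the best cluster number by BIC. The BIC is calculated for each candidate number of clusters, and the solution with the minimum BIC value is the optimal model of the subclusters. Then, the change in BIC between adjacent cluster numbers and the ratio of BIC changes are calculated by formulas (3) and (4), respectively:

$$dBIC(J) = BIC(J) - BIC(J+1), \tag{3}$$

$$R_1(J) = \frac{dBIC(J)}{dBIC(1)}, \tag{4}$$

where $J$ is the number of clusters. If $dBIC(1) < 0$, the optimal number of clusters is 1. Otherwise, the smallest $J$ for which $R_1(J)$ falls below a preset threshold is used as the initial estimate of the optimal number of clusters.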

Next, the optimal number of clusters is determined precisely by the ratio of the nearest-cluster distances of two adjacent clustering solutions. Let $d_{\min}(J)$ denote the distance between the two closest clusters in the solution with $J$ clusters. The distance measurement ratio of the nearest clusters is

$$R_2(J) = \frac{d_{\min}(J)}{d_{\min}(J+1)}. \tag{5}$$

The automatic clustering process based on formulas (1)–(5) is shown in Table 1, which records how the number of clusters is selected in the clustering analysis. The final number of clusters is determined not only by the minimum BIC value but also by the BIC change ratio and the distance measurement ratio. It can be seen from the table that when the number of clusters is 5, the corresponding distance measurement ratio is the largest, at 2.191. Therefore, the five-cluster solution is output as the final automatic clustering result.
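The selection logic can be sketched in a few lines. This is a simplified illustration, assuming that the BIC change is taken between J and J + 1 clusters and that the final choice is the cluster number with the largest nearest-cluster distance ratio; the function names are hypothetical, and the numbers are made up for the example and are not the values in Table 1.

```python
def bic_changes(bic):
    """Change in BIC between J and J + 1 clusters, for every J that has a successor."""
    return {j: bic[j] - bic[j + 1] for j in sorted(bic) if j + 1 in bic}


def bic_change_ratios(bic):
    """Each BIC change divided by the change observed at J = 1."""
    changes = bic_changes(bic)
    return {j: change / changes[1] for j, change in changes.items()}


def distance_ratios(d_min):
    """Nearest-cluster distance at J divided by the one at J + 1."""
    return {j: d_min[j] / d_min[j + 1] for j in sorted(d_min) if j + 1 in d_min}


def select_cluster_count(bic, d_min):
    """Return 1 if BIC does not improve from one to two clusters, else the J with the largest distance ratio."""
    if bic_changes(bic)[1] < 0:
        return 1
    ratios = distance_ratios(d_min)
    return max(ratios, key=ratios.get)


# Made-up values, keyed by the number of clusters J
example_bic = {1: 1000.0, 2: 800.0, 3: 700.0, 4: 650.0, 5: 640.0, 6: 645.0}
example_d_min = {2: 9.0, 3: 6.0, 4: 5.0, 5: 4.5, 6: 2.0, 7: 1.8}
chosen = select_cluster_count(example_bic, example_d_min)
```

A large ratio at J means that merging further, from J to J - 1 clusters, would join clusters that are much farther apart than those joined in the previous step, so J is a natural place to stop.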

By calculating the log-likelihood distance between each sample case in the sensing task dataset and each cluster in the clustering result, each sample case, treated as a single-point cluster, is put into its nearest cluster. The number of sample cases in each category is shown in Table 2. Out of 263,598 sample cases, 101,545 were assigned to the first category (38.5%), with task participants receiving the highest remuneration in this category for completing sensing tasks. A total of 20,441 sample cases were assigned to the second category (7.8%), with the lowest remuneration for completing the sensing task.

The clustering characteristics obtained by TSCA are described in Table 3. With the task price as the clustering object, five categories and the specific attribute description of each category are obtained. In addition, the factors are sorted according to their importance to price changes. It can be seen from the table that the most important factor affecting the price change of a sensing task is the location area of the sensing task, followed by the type of mobile device that collects the sensing data.

Based on the above clustering results, the sensing task price is divided into five intervals (P1, P2, P3, P4, and P5) from low to high. Sensing tasks priced in the P1 interval account for the largest proportion of all tasks, 38.5%. These tasks are characterized by the shortest moving distance and the shortest task time and are therefore regarded as the lowest-price category. Sensing tasks in the P5 interval are characterized by the longest moving distance and the longest task time and are regarded as the highest-price category.

3.2. Task Price Classification Prediction Based on DNN

After TSCA, the task set is divided into different categories whose characteristics are described clearly enough. Next, a DNN is used for classification prediction analysis. The DNN classifier is composed of an input layer, a hidden layer, and an output layer, and the nodes in adjacent layers are fully connected.

The input layer stores the sample cases of the task pricing standard dataset, described by their situational factors, as the columns of a matrix. The task price attribute is excluded from the input, and the class labels formed by TSCA are stored as a row vector in the same order as the data. In addition, in order to accelerate the convergence of the network parameters and make parameter initialization more reasonable, the continuous factors such as task moving distance and task time consumption need to be normalized, and one-hot features need to be extracted from the categorical factors such as sensing task area and task type.
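The two preprocessing steps can be sketched as follows. The text does not say which normalization is used, so min-max scaling to [0, 1] is an assumption here; the function names and sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale a continuous factor to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """Encode a categorical factor as one-hot vectors, with categories in sorted order."""
    categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}
    vectors = [[1 if index[label] == i else 0 for i in range(len(categories))]
               for label in labels]
    return vectors, categories


# Hypothetical factor columns
distance_scaled = min_max_normalize([2.0, 4.0, 6.0])
area_vectors, area_categories = one_hot(["south", "north", "south"])
```

Each input vector of the network is then the concatenation of the scaled continuous factors and the one-hot vectors of the categorical factors for one sample case.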

The hidden layer is composed of a fully connected layer and an activation layer. The activation value of a node is obtained by taking the weighted sum of the outputs of the previous layer with the weights of the current layer and passing it through the nonlinear Tanh activation function.

The output layer uses the softmax activation function to normalize the values of all output nodes and express them as a probability distribution, that is, the posterior probability that each sample case in the task pricing standard dataset belongs to each category. The category with the highest probability is regarded as the most likely classification. An objective function must therefore be defined for optimization, and the cross entropy (CE) criterion is used in the price interval prediction model. The CE criterion measures the closeness between the target classification value and the actual classification value. The smaller the CE value is, the closer the two are and the better the price interval prediction model will be.
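A forward pass through such a classifier can be sketched in plain Python. This is an illustrative sketch only: the helper names and the example scores are hypothetical, and the labels P1 to P5 simply follow the five price intervals named in the text.

```python
import math


def dense_tanh(x, weights, biases):
    """Fully connected layer followed by Tanh: one output per weight row."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn raw output scores into a probability distribution (shifted for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """CE loss for one sample whose true class index is `target`."""
    return -math.log(probs[target])


def predict_interval(probs, labels=("P1", "P2", "P3", "P4", "P5")):
    """Return the label of the class with the highest probability."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return labels[best]


# Hypothetical output scores for one sample case
example_probs = softmax([0.2, 1.0, 3.0, 0.5, -1.0])
example_label = predict_interval(example_probs)
```

The CE loss is small when the probability assigned to the true interval is close to 1 and grows without bound as that probability approaches 0.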

The price prediction model is optimized by the stochastic gradient descent (SGD) algorithm during training. At each step, one sample is selected from the training set for learning, which speeds up convergence and reduces the risk of being trapped in a local optimum.
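The training loop can be illustrated with a very small network. The sketch below assumes one Tanh hidden layer, a softmax output, the CE loss, and one-sample SGD updates; the class `TinyDNN`, its layer sizes, the learning rate, and the two-class toy data are all hypothetical and far smaller than the model described in the paper.

```python
import math
import random


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyDNN:
    """One Tanh hidden layer and a softmax output layer, trained by per-sample SGD."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = softmax([sum(w * v for w, v in zip(row, h)) + b
                     for row, b in zip(self.w2, self.b2)])
        return h, p

    def loss(self, data):
        """Mean cross entropy over (input, class index) pairs."""
        return -sum(math.log(self.forward(x)[1][y]) for x, y in data) / len(data)

    def predict(self, x):
        p = self.forward(x)[1]
        return max(range(len(p)), key=lambda k: p[k])

    def sgd_step(self, x, y, lr):
        h, p = self.forward(x)
        # Gradient of CE with respect to the output scores: p - onehot(y).
        dz2 = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
        # Backpropagate through the Tanh layer (derivative 1 - h^2) before updating w2.
        dz1 = [(1.0 - hj * hj) * sum(dz2[k] * self.w2[k][j] for k in range(len(dz2)))
               for j, hj in enumerate(h)]
        for k, g in enumerate(dz2):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * g * hj
            self.b2[k] -= lr * g
        for j, g in enumerate(dz1):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * g * xi
            self.b1[j] -= lr * g


def train(model, data, epochs=500, lr=0.1, seed=0):
    """Visit the samples one at a time in a freshly shuffled order each epoch."""
    rng = random.Random(seed)
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            model.sgd_step(x, y, lr)


# Toy data: two normalized factors, two price classes
toy_data = [([0.0, 0.1], 0), ([0.1, 0.0], 0), ([0.9, 1.0], 1), ([1.0, 0.9], 1)]
model = TinyDNN(n_in=2, n_hidden=4, n_out=2)
loss_before = model.loss(toy_data)
train(model, toy_data)
loss_after = model.loss(toy_data)
```

Updating after every single sample makes each step cheap and noisy; the noise is what the text credits with helping the optimizer move past poor local optima.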

4. Experiment and Result Analysis

In order to verify the proposed task price prediction method based on clustering and DNN, this section presents the relevant comparative experiments. Two comparisons are made: one among machine learning methods for price prediction and one among methods for exploring the price law of sensing tasks. Accuracy is used as the evaluation index of the price interval prediction model, and it is calculated as

$$\text{Accuracy} = \frac{1}{n} \sum_{i=1}^{n} I(\hat{y}_i = y_i),$$

where $n$ is the number of sample cases in the test set, and $\hat{y}_i$ and $y_i$ are the predicted and actual price intervals of the $i$th sample case. When the predicted interval is consistent with the actual interval, $I(\hat{y}_i = y_i)$ is 1; otherwise, it is 0.
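The accuracy index is simple to compute. A minimal sketch follows, with a hypothetical function name and made-up labels:

```python
def interval_accuracy(predicted, actual):
    """Fraction of sample cases whose predicted price interval equals the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one sample case is required")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


# Made-up labels: three of the four predictions are correct
example_accuracy = interval_accuracy(["P1", "P3", "P5", "P2"], ["P1", "P3", "P4", "P2"])
```

To obtain a per-interval accuracy, the same function can be applied to the subset of test cases whose actual interval is a given one.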

The prediction accuracy of the proposed task price prediction method based on clustering and DNN is shown in Figure 2. The prediction accuracy for the price intervals ranges from 83.6% to 93.9% on the training set and from 82.7% to 93.3% on the test set. Among them, the accuracy of the P3 price interval is the highest, because the factors affecting price change vary in a relatively simple way in this interval. The accuracy of the P5 price interval is the lowest, which may be attributable to the factors that affect the price change in this interval. As shown in Figure 2, the sensing tasks with such characteristics are classified as tasks of the P5 price interval.

Comparison of the price prediction accuracy of machine learning methods: the model is compared with decision tree and KNN (K = 3) classification prediction models. The results are shown in Figure 3. Across the five price intervals, the DNN-based sensing task price prediction model performs best, with a prediction accuracy between 82.7% and 93.3%, while the prediction accuracy of the decision tree and 3-NN is 80.6%–92.9% and 79.4%–91.3%, respectively.

In addition, the methods for exploring the law of sensing task price are compared. The experimental results for TSCA, equal frequency division, and random division are shown in Figures 4–6. In these figures, the abscissa is the price interval generated by TSCA, equal frequency division, or random division, and the ordinate is the predicted probability. It can be seen from the figures that using TSCA to explore the sensing task pricing law and divide the price range according to that law works best. The output probability that a sample case belongs to its correct price interval is roughly higher than 0.8, and the probability for the other, incorrect price intervals is roughly less than 0.2, so the sensing task price range can be divided effectively according to the factors affecting the price change.

The probability of task price interval prediction based on equal frequency division is shown in Figure 5. The predicted probability of the correct price interval is approximately higher than 0.5, and the probability of the other, incorrect price intervals is approximately less than 0.5. This division can basically distinguish the price intervals of sensing tasks, but the effect is significantly worse than that of TSCA, indicating that the influence of the factors on the price of sensing tasks needs to be considered.
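For reference, the equal frequency baseline can be sketched as below. It assigns prices to intervals purely by rank, so each interval receives nearly the same number of tasks regardless of the factors behind the price. The function name and the prices are hypothetical, and tied prices may be split across neighboring intervals in this simple version.

```python
def equal_frequency_labels(prices, k):
    """Assign each price an interval index 0..k-1 so that intervals hold nearly equal counts."""
    order = sorted(range(len(prices)), key=lambda i: prices[i])
    labels = [0] * len(prices)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(prices)
    return labels


# Made-up prices divided into five intervals (0 is the lowest, 4 the highest)
example_prices = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]
example_labels = equal_frequency_labels(example_prices, 5)
```

Because the cut points depend only on the sorted prices, two tasks with very different situational factors can land in the same interval, which is consistent with the weaker separation reported for this division.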

The probability of task price interval prediction based on random division is shown in Figure 6. The output probabilities of the correct price interval and of the other, incorrect price intervals are all roughly between 0.1 and 0.3, so random division cannot reveal the pricing law of sensing tasks.

Through experimental comparison along two dimensions, the machine learning method and the price division method, the results show that the proposed method can effectively analyze the pricing law of sensing tasks according to the factors that affect the price change and divide the price range according to that law, and that the model has high prediction accuracy.

5. Conclusions and Future Work

A task price prediction method based on clustering and DNN is proposed. Firstly, the task pricing standard dataset is constructed to improve the accuracy of the task pricing law analysis and reduce its time consumption. Secondly, the task pricing law is explored by TSCA, and the task price is naturally grouped and classified according to the factors affecting the task price change, so that the price range is divided according to the pricing law. Finally, the price interval prediction model based on DNN is realized. The experimental results show that the prediction accuracy of the pricing mechanism is higher than that of classification prediction methods such as decision tree and KNN, and the analysis results of the pricing law are significantly better than those of the equal frequency division method and the random division method. In the future, given the variety of scenarios and the complex and diverse characteristics of crowdsensing tasks, the pricing prediction analysis of tasks will remain a topic worthy of further research.

Data Availability

The experimental data used in this article were obtained from the public dataset of Didi Travel.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors thank the National Natural Science Foundation of China (grant no. 41761086) and the Natural Science Foundation of Inner Mongolia Autonomous Region (2019MS06030) for the support. The authors also thank the Inner Mongolia Key Laboratory of Wireless Networking and Mobile Computing and the Self-Topic/Open Project of Engineering Research Center of Ecological Big Data, Ministry of Education.