Abstract
Insurance marketing aims to maximize the mutual benefit of policyholders and insurance companies. Combined with insurance promotions, a big-data-driven approach can draw on a wealth of empirical data to win new customers, encourage existing customers to engage more, and retain them. The insurance business covers a wide variety of scopes and types, so relying solely on insurance personnel to process these tedious data is labor-intensive and resource-intensive. A data-driven approach can find correlations and perform automatic prediction and matching according to the characteristics of insurance business data, which saves considerable time and labor. This research uses data mining and neural network methods to mine insurance business data and to predict insurance business. The approach captures factors such as the type of insurance and the insured amount of the policyholder. The results show that data mining and neural network methods are accurate and feasible for predicting insurance business: the prediction error is within 2.38%, and the linear correlation exceeds 0.96. The method is accurate both for new customers and for the retention of old customers.
1. Introduction
In recent years, rapid economic development and changes in national policy have driven steady growth of the insurance business. Insurance is a business model between insurance companies and policyholders. An insurance company needs to attract as many customers as possible while reducing its investment in insurance as much as possible [1–3]. The insured purchases a certain amount and scope of insurance on a regular basis; when an unavoidable accident occurs, the insured can ask the insurance company to pay out according to the contract. Overall, this is a win-win business model. With the continuous progress of society and the improvement of living standards, the scope of insurance has become ever broader and now covers auto insurance, housing insurance, medical insurance, and other aspects of life [4]. Each type of insurance also involves many details; vehicle insurance, for example, includes vehicle loss insurance, third-party compensation insurance, accidental theft, and rescue. An insurance salesperson must master not only the coverage and amount of each type of insurance but also each customer's purchase history. With this professional knowledge, salespeople need to do their best to sell insurance [5–7]. As this description shows, insurance business data are numerous, multisource, and complex, and handling them is a difficult task that consumes substantial human and material resources [8].
Evaluating and selecting a marketing plan for the insurance business relies on numerous insurance cases and data for support and reference [9–11]. To develop new customers, the salesperson needs to grasp, in a timely manner, the needs of potential customers, the degree of demand, the cases of related customers, and other information, and must process these data personally. To retain old customers, salespeople need to understand the customers' future needs and other factors such as their historical risk situation before making a plan [12]. Likewise, insurance marketing involves marketing activities aimed at existing customers, which follow up on customer needs in time and help secure future renewals. These are three important aspects of the insurance marketing business, and each of them generates a large amount of cumbersome data [13–15]. If insurance companies rely only on their personnel to process these data, it is difficult to uncover the information hidden in them. Moreover, processing the data through the professional knowledge of salespeople is not only inaccurate but also consumes considerable financial and human resources. This traditional way of handling insurance data hurts the bottom line of insurance companies, which urgently need a business model and evaluation method with higher accuracy and lower cost [16].
Since the start of the 21st century, computer technology and the related hardware have developed rapidly. Computers can now take over some tedious tasks from humans, and they do so more efficiently and at lower cost than manual methods [17]. The data-driven method is a typical way of using computer technology to process tedious data; it has been widely used in production and daily life and has achieved great success in many fields. Its advantage is that it can process tedious and nonlinear data quickly and efficiently, uncovering latent information that humans relying only on professional knowledge and experience cannot find. Data-driven approaches include data mining techniques and neural network methods, among others, and have been applied in fields such as medical care, education, and transportation [18]. Because the prediction accuracy of a given data-driven method varies with the research object, a method suited to the research field must be found. Data-driven methods perform well on both discrete and continuous data, which makes them applicable in many fields. Moreover, different types of data can be normalized before use, which further broadens their applicability. The evaluation and selection of insurance marketing business involves a great deal of tedious data, so applying data-driven methods to the insurance business is a valuable line of research.
Fusing data-driven methods with insurance marketing evaluation is a valuable and efficient undertaking. The multisource data of the insurance business are collected as the inputs and outputs of the big data technology, which iteratively learns the relationship between them. This fusion can not only maximize the economic benefit of insurance companies but also determine the type of insurance and assess the amount more accurately. It can also help the insured make a more reasonable choice based on historical insurance information. Within this integration, data mining methods can be used to fully mine the latent information of the insurance business, and machine learning methods can be used to predict insurance marketing business, finding more suitable potential customers and suitable insurance plans for them.
This research uses data mining and machine learning methods to evaluate the insurance marketing business and to mine the latent information of the insurance business. The paper is organized into five sections. Section 1 introduces the feasibility and necessity of integrating insurance marketing business with big data technology. Section 2 reviews the research status and methods of insurance marketing evaluation and selection. Section 3 introduces the data mining and machine learning methods used to fuse insurance business with data-driven methods. Section 4 presents the accuracy of data-driven methods in insurance marketing and some important statistical parameters, using linear correlation plots, error distributions, and minimum value distributions to show their feasibility. Section 5 concludes by summarizing the feasibility and urgency of data-driven methods in the insurance business.
2. Related Work
Insurance marketing is an important research topic because it bears on the profit and loss of insurance companies, and many researchers have studied it with different methods. Handel et al. [19] studied a general framework for analyzing the relationship between policy interventions, selective friction in insurance, and market equilibrium. They characterized the underlying distribution of policyholders and strategies in terms such as the gains and losses from risk protection. The results show that the framework captures a certain correlation between policyholders and policy. Born et al. [20] studied the insurance business in European countries, using static and dynamic models to assess the relationship between the Economic Transformation Index and insurance density. The findings demonstrate strong links between insurance density and currency stability, government property, and trade practices. Nourani et al. [21] used data envelopment analysis (DEA) to study the insurance efficiency of multidimensional enterprises. Building on the slacks-based measure (SBM) model and the DEA model, they developed four indicators, including network SBM (NSBM) and dynamic network SBM (DNSBM), to analyze insurance efficiency. The findings show that NSBM outperforms the traditional SBM method in this analysis, which makes it a valuable study for corporate insurance. In response to the fierce market competition among auto insurance companies, Zhang et al. [22] proposed a logistic regression algorithm to evaluate driving risk and thereby provide a reference for insurance business problems. They applied driving scores to automobile insurance pricing models, improving the risk identification capability of the auto insurance business. The model can help such companies improve their market competitiveness. Ismail et al. [23] noted that health insurance fraud has long concerned the health insurance industry and, later, the government. They proposed a point-to-point blockchain system to address it, a framework that allows a more secure, transparent, and rational evaluation of health insurance. Bian et al. [24] observed that usage-based insurance (UBI) has received growing attention in recent years. They evaluate a driver's risk level from on-board sensor data to provide a reference for auto insurance companies, and they developed the BVIP pricing model together with a vehicle premium calculation model. The results indicate that BVIP assesses driving behavior with high accuracy. Dou et al. [25] noted that cyber insurance, a new type of insurance, has attracted considerable attention in industry and academia. The correlation and interdependence of cyber risks increase the risk borne by cyber insurers. Based on classical mathematical models and insurance theory, they proposed an optimal grid and insurance effect maximization model. The results show that the method is valid and efficient for cyber insurance assessment. Yan et al. [26] used the RFM model and incorporated customer risk claim metrics to evaluate the lifetime value of policyholders' property. They also accounted for the uncertainty in many actual insurance claims by using hesitant fuzzy theory for cluster analysis, assessing policyholder risk with quantitative methods. As these references show, many researchers have studied insurance-related data, but data-driven methods such as neural networks and clustering are rarely applied to the insurance marketing business. Current research on insurance marketing mainly follows traditional approaches and seldom uses big data technology to study marketing factors. This paper studies the classification and prediction of insurance marketing based on multisource data, using a big-data-driven approach to evaluate and forecast insurance marketing business efficiently and accurately. A data-driven approach can map the input and output data of an insurance marketing business and find the complex relationships between them.
3. Introduction to Insurance Marketing Evaluation and Data-Driven Approaches
3.1. The Overview of Data-Driven Method
The data-driven method has been developed in recent years to deal with tedious data; it can handle high-dimensional, nonlinear data and find latent connections within them [27]. It can help staff discover the underlying information linking the research object and the problem. Data-driven methods include data mining methods, neural network methods, and optimization methods. In the insurance marketing business, the variety and volume of data are enormous, and a data-driven approach is well suited to handling them. Integrating insurance marketing evaluation with data-driven methods is a valuable research topic for both insurers and policyholders [28, 29]. The data-driven method can handle the nonlinear relationships in the data and predict future trends, and it can also classify the data accurately and carry out preprocessing. The insurance marketing business produces a large amount of nonlinear data, which suits the strengths of data-driven classification and prediction. During training, the hyperparameters must be adjusted continually to arrive at a suitable data-driven model.
3.2. An Introduction to the Application of Clustering Method in Insurance Marketing Evaluation
This article uses historical data from the insurance marketing process, such as amount, item, and type, to classify and predict the future trend of the insurance marketing business; insurance marketers can use the results as a reference. Predicting insurance marketing business first requires an effective classification of the collected, complex insurance data. The study therefore consists of two processes, as shown in Figure 1. New customer data, potential customer data, and old customer insurance data are selected as the multisource inputs for the data-driven approach. First, the collected insurance business data are classified with a clustering method, which is a distance-based classification method. Then, the classified data are used to make predictions about insurance coverage. Before the multisource data are fed to the clustering algorithm, they are preprocessed so that they share the same distribution and the same value range. The insurance marketing business data are output in matrix form and serve as the input to the prediction algorithm.

Clustering is a machine learning algorithm that can support both classification and regression prediction tasks. Machine learning also offers other algorithms, such as decision trees and support vector machines; they rest on different principles of classification and regression and suit different objects. Given the characteristics and demands of the insurance marketing evaluation task, a distance-based clustering algorithm is selected to classify the insurance marketing business. Clustering methods are mainly divided into distance-based and density-based methods. Figure 2 shows a schematic diagram of the classification of insurance data by clustering. The ideal clustering result places strongly correlated data in the same group and keeps different groups as far apart as possible. As Figure 2 shows, this paper aims to divide the insurance data into different customer groups.
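As a minimal sketch of distance-based clustering into three customer groups, the following uses a plain k-means loop in NumPy. The paper does not name its specific clustering algorithm, so k-means, the synthetic two-feature data, and the function name `kmeans` are assumptions made only for illustration.

```python
import numpy as np

def kmeans(X, k=3, n_iter=50, seed=0):
    """Plain k-means with Euclidean distance (an assumed stand-in for the
    paper's distance-based clustering method)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids from k distinct samples.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance of every sample to every centroid: shape (n_samples, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Synthetic, already-normalised features (hypothetical: insured amount, policy age).
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=(0.2, 0.2), scale=0.03, size=(40, 2)),
    rng.normal(loc=(0.5, 0.8), scale=0.03, size=(40, 2)),
    rng.normal(loc=(0.8, 0.3), scale=0.03, size=(40, 2)),
])
labels, centroids = kmeans(X, k=3)

# Share of samples assigned to each of the three groups.
proportions = np.bincount(labels, minlength=3) / len(labels)
print(proportions)
```

The group proportions computed at the end correspond to the kind of class shares reported for the clustering step; the actual values depend on the data and on the initialisation.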

The clustering algorithm needs to operate in the form of input and output. The form of input and output in this paper is shown in (1), where x represents the input and y represents the output.
This paper chooses a distance-based clustering method to classify the data of the insurance marketing business. The Minkowski distance serves as the distance metric of the clustering method, as shown in (2), where x is the input insurance marketing data and y is the output value. Here p is the order of the distance, and different values of p give different metrics. Because the insurance data are to be classified into 3 categories, this study chooses p = 3.
Clustering quality is commonly evaluated with the Euclidean distance, shown in Equation (3), which is the case p = 2. Its form is similar to the root mean square error: the differences between the classified values and the real values are squared. The optimal classification is sought through continued iteration and learning.
The Manhattan distance is the special case p = 1, shown in Equation (4). It is a less common evaluation metric.
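The three distances discussed here are members of one family, which the following short sketch makes concrete. It uses the standard definition of the Minkowski distance between two equal-length vectors; the function name `minkowski_distance` and the sample vectors are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def minkowski_distance(x, y, p):
    """Minkowski distance of order p between two equal-length vectors:
    (sum_i |x_i - y_i|^p)^(1/p)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

a = [0.0, 0.0]
b = [3.0, 4.0]
manhattan = minkowski_distance(a, b, p=1)   # |3| + |4| = 7
euclidean = minkowski_distance(a, b, p=2)   # sqrt(9 + 16) = 5
order3 = minkowski_distance(a, b, p=3)      # (27 + 64)^(1/3)
```

For a fixed pair of points the distance does not increase as p grows, so the choice of p changes how strongly the largest coordinate difference dominates.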
3.3. Neural Network Methods Used in Insurance Marketing Evaluation
The convolutional neural network (CNN) fits the relationship between the inputs and outputs of the research object well, extracts its features well, and reduces the amount of computation, which suits the data of the insurance marketing business. Neural networks are often used for prediction tasks because they can map nonlinear and high-dimensional data. With advances in computer technology and machine learning algorithms, neural networks have appeared in different forms, such as fully connected, convolutional, and adversarial neural networks [30, 31]. CNNs have been widely used in image recognition, automatic driving, and other fields because of their strength in feature recognition [32]. This paper adopts a CNN to map the nonlinear relationships in the insurance marketing business, where the data are linked by complex correlations that marketers have a hard time uncovering. Figure 3 shows how the CNN extracts the features of the insurance marketing business: the relevant insurance data are input and then passed through a series of matrix and convolution operations in the convolution layers, pooling layers, and activation functions. CNN operation involves hyperparameters such as the number of filters, the stride, and the padding, and finding the optimal combination requires repeated trials. In this paper, 64 filters, a learning rate of 0.001, and 4 convolution layers are used.

A CNN is a neural network algorithm whose operation largely follows the backpropagation (BP) neural network workflow: the weights and biases are updated continually through differentiation. Equation (5) shows the operation of the weights and biases between the layers of the network.
The CNN contains an activation function, which maps the nonlinear relationship between input and output. Equation (6) shows the calculation rule for the activation function.
The forward and backward passes of a neural network involve many differentiation operations. Equations (7) and (8) show the derivative rules for the bias and the weight.
Equation (9) shows how the weights and biases operate in the network, mainly the convolution of the weights with the coefficients and the summation of the bias.
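To illustrate how weights and biases are updated from their derivatives, the sketch below trains a single linear layer by gradient descent. The paper's own equations are not reproduced here; the linear layer, the mean squared error loss, the synthetic data, and all variable names are assumptions. Only the learning rate of 0.001 is taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                 # 64 samples, 3 input features
true_W = np.array([[1.5], [-2.0], [0.5]])
y = X @ true_W + 0.3                         # synthetic targets

W = np.zeros((3, 1))                         # weights
b = np.zeros(1)                              # bias
learning_rate = 0.001                        # value stated in the paper

def mse(W, b):
    """Mean squared error of the layer output against the targets."""
    return float(np.mean((X @ W + b - y) ** 2))

initial_loss = mse(W, b)
for _ in range(2000):
    residual = X @ W + b - y                 # forward pass minus targets
    grad_W = 2.0 * X.T @ residual / len(X)   # derivative of the loss w.r.t. W
    grad_b = 2.0 * residual.mean(axis=0)     # derivative of the loss w.r.t. b
    W -= learning_rate * grad_W              # gradient descent update
    b -= learning_rate * grad_b
final_loss = mse(W, b)
```

In a multi-layer network the same update is applied to every layer, with the derivatives obtained by propagating the error backward through the layers.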
Equation (10) shows the convolution operation, which is the main operation of the CNN.
To reduce the computational load of the network, the CNN uses a pooling layer to perform downsampling. Equation (11) shows the computation of the downsampling.
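The convolution and downsampling steps can be sketched in a few lines of NumPy. This is a one-dimensional illustration under stated assumptions: the paper does not specify the input dimensionality, the activation function, or the pooling type, so the 1-D signal, ReLU, max pooling, and all function names here are hypothetical choices.

```python
import numpy as np

def conv1d_valid(signal, kernel, bias=0.0, stride=1):
    """Sliding dot product (cross-correlation, as used in CNN layers), no padding."""
    n, f = len(signal), len(kernel)
    out_len = (n - f) // stride + 1
    return np.array([
        np.dot(signal[i * stride:i * stride + f], kernel) + bias
        for i in range(out_len)
    ])

def relu(z):
    """Assumed activation: negative responses are set to zero."""
    return np.maximum(z, 0.0)

def max_pool1d(z, size=2):
    """Non-overlapping max pooling; trailing values that do not fill a window are dropped."""
    usable = (len(z) // size) * size
    return z[:usable].reshape(-1, size).max(axis=1)

signal = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 1.0, 0.0, 2.0])
kernel = np.array([1.0, 0.0, -1.0])

feature_map = relu(conv1d_valid(signal, kernel))   # [1, 3, 0, 0, 3, 0]
pooled = max_pool1d(feature_map, size=2)           # [3, 0, 3]
```

Pooling halves the length of the feature map here, which is the reduction in computation that downsampling is meant to provide.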
A CNN contains many hyperparameters; Equation (12) shows the rules governing them.
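One common rule that ties the hyperparameters together is the output-length formula of a convolution layer. Whether this is the rule the paper has in mind is an assumption. The sketch below applies it across four layers; the filter count, learning rate, and layer count are the values stated in the paper, while the kernel size, stride, padding, and input length are assumed for illustration.

```python
def conv_output_length(n, kernel_size, stride=1, padding=0):
    """Output length of a convolution:
    floor((n + 2 * padding - kernel_size) / stride) + 1."""
    return (n + 2 * padding - kernel_size) // stride + 1

hyperparameters = {
    "num_filters": 64,        # stated in the paper
    "learning_rate": 0.001,   # stated in the paper
    "num_conv_layers": 4,     # stated in the paper
    "kernel_size": 3,         # assumed for illustration
    "stride": 1,              # assumed for illustration
    "padding": 1,             # assumed for illustration
}

length = 32                   # assumed input length
for _ in range(hyperparameters["num_conv_layers"]):
    length = conv_output_length(
        length,
        hyperparameters["kernel_size"],
        hyperparameters["stride"],
        hyperparameters["padding"],
    )
```

With a kernel size of 3, a stride of 1, and a padding of 1, each layer preserves the input length, so only pooling layers would shrink the feature maps in such a configuration.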
3.4. The Preprocessing Method of Insurance Marketing Evaluation Data
This research classifies and predicts the data of new customer business, old customer business, and potential customer business in the insurance marketing process. The inputs of the clustering or neural network method mainly include the amount of insurance, the type of insurance, and the time, and the output is the customer's insurance plan. These input types differ greatly from one another. If they are fed directly into the clustering or neural network algorithm, training will have difficulty converging: neural networks struggle to perform nonlinear operations on heterogeneous data, and because the features are not of the same order of magnitude, problems such as an uneven distribution of weights arise. Data preprocessing is therefore a common step in clustering and neural network algorithms; here it normalizes the collected insurance business data.
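A minimal sketch of such a normalization step is given below. The paper does not state which normalization it uses, so min-max scaling to the range [0, 1] is an assumption, as are the function name `min_max_normalize` and the three hypothetical feature columns.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Avoid division by zero for columns whose values are all equal.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (X - col_min) / safe_range

# Hypothetical rows: insured amount, encoded insurance type, policy duration in months.
raw = np.array([
    [120000.0, 1.0, 12.0],
    [ 50000.0, 2.0, 36.0],
    [300000.0, 3.0, 24.0],
])
scaled = min_max_normalize(raw)
```

After scaling, the insured amount no longer dominates the other features by several orders of magnitude, which is the convergence problem that the preprocessing is meant to avoid.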
4. Result Analysis and Discussion
In this study, the insurance marketing business has been classified and forecasted. Insurance companies face a large amount of cumbersome insurance-related data in their daily work, and this study classifies and predicts these data. In deep learning prediction, an error of 5% is often considered acceptable, and the same margin is acceptable for assessing the insurance marketing business. Figure 4 shows the prediction error of the neural network for the insurance marketing business. The prediction errors are relatively small: for all groups of insured customers, they are within 3%. The largest error, 2.38%, comes from the potential customer group. The likely reason is that potential customers carry relatively many uncertain factors: the uncertainty in the amount and type of insurance is relatively large, and this error is more strongly related to time and to changes in insurance policies. The smallest error, only 1.23%, comes from the old customer group, whose type of insurance, amount, project, and other factors vary relatively little and are therefore relatively stable. This is small enough for the insurance marketing business. Judged by its errors, the neural network approach is feasible and sufficiently credible for the insurance marketing evaluation task.
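For illustration, a percentage prediction error can be computed and compared with the 5% margin as follows. The paper does not define its error metric, so the mean relative error used here is an assumption, and the sample values are invented for the example rather than taken from the study.

```python
import numpy as np

def mean_relative_error_percent(actual, predicted):
    """Mean of |predicted - actual| / |actual|, expressed in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual) / np.abs(actual)) * 100.0)

# Illustrative values only (not data from the study).
actual = [100.0, 200.0, 400.0]
predicted = [102.0, 196.0, 404.0]

error = mean_relative_error_percent(actual, predicted)  # (2% + 2% + 1%) / 3
acceptable = error < 5.0                                 # 5% margin named in the text
```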

Figure 5 shows the proportions of the insurance marketing business classes produced by the clustering method. It shows intuitively that the clustering method is feasible for the insurance marketing business: the proportions are relatively uniform, with no extreme class, which benefits the training of the neural network. The largest proportion, 38.9%, belongs to the old customer insurance group. The smallest proportion is 30.2%, so the difference between the two is relatively small.

To demonstrate more intuitively the feasibility of neural network methods in the insurance marketing business, this paper shows the differences between the predicted and actual values for three business groups. Figure 6 shows a scatter plot of predicted and actual values for the new insured customer population, for which 15 groups of new customers are displayed. The predicted values of most new insured customers agree well with the actual values, and the differences are relatively small, which meets the goal of insurance marketing evaluation. Only 3 groups show a large difference between predicted and actual values. The red areas in Figure 6 mark where the forecast has a large error, although the error remains acceptable. It may stem from the relatively large variability of new customer groups, for example in the amount of insurance and the choice of insurance items. Overall, the marketing forecasts for new customer groups met expectations. The error is acceptable for long-term predictions of new insured customers and will gradually decrease as the new customer data set grows.

For the forecasting task on old insured customers, this paper uses a scatter plot of the distribution of predicted values to analyze the differences. Figure 7 shows that the neural network predicts the old insured customer group more accurately than the new customer group. The predicted values of most old customers agree well with the actual insurance data. Only some old customers show a large difference between predicted and actual data, and this difference is smaller than that for the new customer group. The reason is that the annual insurance amount and insurance type of old customers are relatively fixed and change little over time. Comparing the two groups, the neural network performs better on old insured customers. Changes in insurance policies and promotional activities may still cause some large errors in the predictions for old customers.

To further demonstrate the feasibility and accuracy of neural networks in insurance marketing tasks, this study uses box plots and linear correlation plots to analyze the predictions for potential customer groups. Figures 8 and 9 show, respectively, a box plot and a linear correlation plot of the insurance predictions for potential insured customers. In Figure 8, the blue line represents the average of the predicted and actual values, and the green line represents their minimum. Figure 8 shows that the neural network predicts potential insured customers less accurately than new and old insured customers, because potential insured customers are more uncertain. In general, however, the neural network also achieves satisfactory results for potential customers. The distribution and size of the predicted box plot data agree well with the actual insurance data, and only a few potential customers show large differences. Figure 9 shows clearly that the predictions for potential customers are good: the predicted values correlate linearly with the actual values, the linear correlation coefficient generally exceeds 0.96, and the data points are distributed on both sides of the linear function.
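The linear correlation coefficient referred to here is, by assumption, the Pearson coefficient. The following sketch computes it from its definition; the function name `pearson_r` and the sample values are illustrative and are not data from the study.

```python
import numpy as np

def pearson_r(actual, predicted):
    """Pearson linear correlation coefficient between two equal-length series."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    a = actual - actual.mean()
    p = predicted - predicted.mean()
    return float(np.sum(a * p) / np.sqrt(np.sum(a ** 2) * np.sum(p ** 2)))

# Illustrative values only (not data from the study).
actual = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
predicted = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

r = pearson_r(actual, predicted)
```

A coefficient close to 1 means that the points in a predicted-versus-actual plot lie close to a straight line, which is what the linear correlation plot visualizes.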


5. Conclusions
With the rapid development of the economy, many new economic models have emerged, and the insurance business is one of them. The forms and projects of insurance business are varied. For insurance companies, profit depends most on attracting as many insured customers as possible and on those customers taking out larger insured amounts. In insurance marketing, the data of policyholders are especially important to marketers, who can use them to find more potential customers and retain more existing ones. However, as the forms of insurance business continue to develop, the data of the insurance marketing business are becoming more diverse and complex, and they are extremely difficult for marketers to process. This study uses a large amount of multisource insurance marketing data to classify and predict customer groups in a data-driven way, which benefits the evaluation and management of the insurance marketing business. The group classification and prediction results show that data-driven methods are feasible and accurate in the insurance marketing business. The clustering method effectively classifies the collected complex insurance data, and the classification results are relatively uniform; the largest category proportion, 38.9%, comes from old customer groups. The maximum prediction error of the neural network method is only 2.38%. It arises in the prediction of potential insured customers, who are highly uncertain about the amount of insurance and the insurance items. For old insured customers, the error is only 1.23%, because their insured amounts and items vary relatively little over time. Although the neural network produces a larger difference when predicting potential insurance customers, the correlation between the predicted and actual insurance data for this group reaches 0.96, which shows that the neural network method is sufficiently trustworthy for insurance marketing business prediction.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This study was supported by the Funding for Key Research Bases of Humanities and Social Sciences in Colleges and Universities in Hebei Province (project: Research on Insurtech to Help Inclusive Insurance Innovation Development; project number: JDKF2022011).