Abstract

Drilling is a common operation in the manufacturing of components. The principal responses in drilling are the geometrical accuracy and surface integrity of the drilled parts. For applications where the geometrical tolerance is very tight, the operation must be carried out very carefully; otherwise, the rejection rate of drilled samples, and consequently the production loss, will be high. The use of a prediction model in this scenario is much more appropriate and cost-effective. This research aimed to apply the extreme gradient boosting (XGBoost) regressor to develop a drilling prediction model. Drilling experiments were conducted after developing a design of experiments with twenty-seven unique sets. Exploratory data analysis was then carried out on the experimental datasets, whose features include speed, feed, angle, hole length, and surface roughness. After correlation analysis, the k-fold cross validation method was applied for parameterisation. The hyperparameters estimated from the k-fold cross validation were then applied to train and test the XGBoost regressor-based machine learning (ML) model. From the model evaluation metric (R2), it is concluded that the XGBoost regressor model achieved R2 = 0.89 before tuning and 0.94 after tuning, which is higher than the polynomial regressor and the support vector regressor.

1. Introduction

Polyether ether ketone (PEEK) is a biomaterial preferred in medical applications including tissue engineering, dentistry, spine surgery, and maxillofacial surgery. PEEK is a semicrystalline material that contains both amorphous and crystalline regions. This two-phase semicrystalline polymer has repeating units of a single ketone bond and double ether bonds in the main chain. It has become an important material in dental applications, and many research publications and review articles have addressed its suitability in dentistry. Different types of functionalization have been attempted to improve its osteoinductive and antimicrobial capabilities. A review article published by Paratelli et al. [1] reports that PEEK has been applied in implant-supported fixed dental prosthesis (IFDP) frameworks, prosthetic implant abutments, implant abutment screws, and retention clips on implant bars. PEEK exhibits a glass transition temperature (Tg) of around 143°C (289°F) and, hence, can be used in high-temperature environments. Due to its electrical properties, it is widely used for electrical components including semiconductor test sockets and electrical connectors. It is resistant to steam and, hence, suitable for reusable medical components subjected to repeated autoclave cycles. PEEK has gained widespread popularity in downhole oil and gas applications due to its resistance to harsh chemicals including H2S. Due to its flame-retardant characteristics and its ability to hold UL 94 V-0 flammability ratings even at thin cross-sections, it can also be applied in electric vehicles.

Verma [2] reported that designed PEEK costal cartilage prostheses have mechanical properties almost identical to those of natural costal cartilage and assist in improved breathing for patients undergoing chest wall reconstruction. PEEK has been applied in replacing rib cartilage, in total knee replacement, and in many other implants. Davani et al. [3] investigated the structural integrity of the PEEK material for a bone application and reported that it is a suitable alternative to stainless steel. He et al. [4] and Wang [5] reported that the bioinertness of virgin PEEK hinders its practical application in bone repair and, hence, modification is required to suit PEEK to bone applications. Kumar et al. [6] also reported, in their study on the application of PEEK to the spine, that PEEK is a highly suitable material and an alternative to titanium alloy. These results are augmented by research conducted by Tan et al. [7], who investigated different polymeric materials through simulations for bone plating applications and reported that modified PEEK is better for this application. The use of the PEEK material in dental applications was discussed in [8].

As the potential of PEEK and PEEK composites makes them increasingly versatile materials in biomedical and industrial applications, drilling these materials with high geometrical accuracy is highly desirable. In general, burrs are produced at the exit of holes and, hence, the surface finish of the hole is affected. Ranjan et al. [9] used artificial intelligence to predict the hole quality (roundness) in microdrilling (0.4 mm). Kayaroganam et al. [10] used a metaheuristic algorithm to optimize the drilling parameters of a mica-reinforced composite. Parasuraman et al. [11] analyzed the drilling of a TiB2-reinforced composite and reported that the cutting force increased with the filler content. Kaviarasan et al. [12] used artificial intelligence to predict the surface roughness of drilled holes in the Delrin material. Elango et al. [13] recently reported, in their study on drilling of the PTFE material, that using ANFIS and RSM together for modeling gave more accurate results. Nguyen et al. [14] developed a machine learning (ML) model using Bayesian optimization for polycarbonate material. The support vector regressor (SVR) was applied to develop prediction models for carbon fiber reinforced polymer (CFRP) [15] and Delrin [16]. In [15], Xu et al. applied SVR to the drilling of CFRP to predict the cutting force. In [16], Elango et al. attempted polynomial and SVR models for turning of the Delrin material and concluded that SVR performs better than polynomials. In the case of machining of the PEEK material, no such research has been conducted so far, despite its being considered a significant material.

This research is aimed at developing a prediction model for drilling of the PEEK material. The experiments were conducted in a CNC machine, and 27 unique datasets were collected. First, exploratory data analysis (EDA) was conducted to understand the correlation between the input parameters and the response variable. Then, the extreme gradient boosting (XGBoost) regressor was applied to train the machine learning (ML) model. To this end, the paper is organized as follows: Section 2 describes the material and experimentation. Section 3 presents the model development, while Section 4 presents the results and validation.

2. Material and Experimentation

2.1. Material

In this research, a biomaterial known as polyether ether ketone (PEEK) was considered. This biomaterial has superior modulus and strength even at elevated temperatures. Its wear properties and durability give it special consideration in biomedical applications and in industrial applications where high frictional stress is involved. Its glass transition temperature (Tg) is about 143°C, and it remains serviceable at temperatures up to about 250°C; hence, it is preferred in the manufacture of valves, bearings, pistons, and seals, and in implants. It is seen as a viable alternative to stainless steel and titanium alloys because it avoids stress shielding [17].

Industrial-grade PEEK granules were purchased from a local supplier in India. Injection molding (Mathman Plastics Molding Machine, China, Model: MPR750R2) was employed in the preparation of PEEK samples. PEEK granules preheated at 80°C were filled into the injection tube and then heated to 200°C to ensure complete melting of the raw material. The molten material at 235°C was then passed through the nozzle to the mold plate. The plate was allowed to cool at room temperature for 24 h, followed by 30 min in a hot oven at 50°C. The injection-molded sheet had dimensions of 300 × 300 × 2.5 mm. Dumbbell samples according to the ASTM D412/ISO 37 standard were then cut from the injection-molded sheet. Tensile tests were conducted using an Instron universal testing machine (Instron 5582, USA) equipped with a 250 mm extensometer and a pneumatic gripper. Three samples were tested at a strain rate of 25 ms−1 until fracture. Figure 1 shows the dumbbell samples used in tensile testing and a sample load-displacement graph obtained from the test. The stiffness of the PEEK material is 1.52 GPa and its tensile strength is 99.48 MPa; these values are commonly used in the design of parts.

2.2. Drilling Experimentation

For the drilling experimentation, a PEEK rod of 20 mm diameter and 1000 mm length was prepared and used. Drilling experiments were conducted in a CNC machine using a Taguchi L27 design of experiments (DoE). Three levels of each control parameter, viz., speed, feed, angle, and hole length, were used in the DoE. A solid TiN-coated carbide drill bit (code DIN6537) of 10 mm diameter was used in all experiments. The surface roughness of the drilled holes was measured with a surface roughness tester (Model: Mitutoyo Surftest SJ-210). The surface roughness was measured at three different locations, and the average of the measurements was considered. Figure 2 shows the experimental setup and drilled samples, while Table 1 lists the input parameters used in the experiments and the respective surface roughness of the drilled samples. In general, a large number of datasets is used in machine learning model development, because the more data available, the better the model can be fitted. However, obtaining experimental data is a costly process; hence, it was planned to use a minimal number of data points (27 datasets) to fit the ML model.
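The idea of a 27-run design can be sketched in a few lines. The levels below are hypothetical placeholders (the actual levels are listed in Table 1), and a full factorial over three 3-level factors is shown for simplicity; the L27 orthogonal array used in the paper extends this idea to four factors.

```python
from itertools import product

# Hypothetical three-level settings for speed (rpm), feed (mm/rev),
# and point angle (deg); the real values are those in Table 1.
speeds = [500, 1000, 1500]
feeds = [0.1, 0.2, 0.3]
angles = [90, 118, 135]

# A full factorial over three 3-level factors yields 27 unique runs,
# matching the number of datasets used to fit the ML model.
runs = list(product(speeds, feeds, angles))
print(len(runs))  # 27
```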

3. Machine Learning Model Development

The reason for developing a machine learning (prediction) model is to avoid wastage in machining and to know the response beforehand [18]. The decision tree approach is a simple method of modeling that interprets the features and concludes the response of the subject. This method has been used in statistics, data mining, and machine learning. The accuracy of a decision tree depends on the size of the dataset: the greater the amount of data available, the higher the accuracy. The gradient boosting algorithm is a machine learning technique that can be used in both regression and classification problems. The term boosting refers to a family of algorithms that convert weak learners into strong learners; a weak learner performs only slightly better than random choice, while a strong learner is nearly perfect. This approach produces an ensemble predictive model from weak predictive models. In the gradient boosting algorithm, gradient descent in function space is used stage-wise to construct the ensemble. The final model is a function F(x) = Σm wm fm(x) taking the input parameters as a vector of attributes x, where fm is a function that models a single tree and wm is the weight associated with the mth tree. The two terms, the function fm and the weight wm, are learned during the training phase. The gradient boosting algorithm is more reliable and easier to use than many other machine learning algorithms. An algorithm such as linear regression has its number of degrees of freedom scaling with the number of features M. This means that its ability to learn from the data plateaus in the regime N ≫ M, where N is the number of samples and M is the number of features. The linear regression algorithm results in low variance but high bias, and in the regime N ≲ M, regularization becomes necessary to learn the relevant features and zero out the noise.
A tree, in its unregularized form, has a low bias and can overfit the data to the extreme, with its depth scaling with the number of samples, but it has a high variance (i.e., deep trees do not generalize well). However, because a tree can reduce its complexity as much as needed, it can work in the N ≲ M regime by simply selecting the necessary features. A random forest is a low-bias algorithm whose ensemble averages away the variance (although deeper trees call for more trees); it does not overfit as the number of trees grows, so it is a lower-variance algorithm. A drawback of this homogeneous learning is that the trees tend to be similar, which limits the ensemble's ability to keep learning from more data. Flores and Keith [19] used a gradient boosting algorithm to predict the surface roughness in high-speed milling of a metal.
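The additive ensemble F(x) = Σm wm fm(x) described above can be illustrated with a toy example. The two stump functions and their weights below are purely hypothetical; in practice the fm are trees and the weights are learned during training.

```python
import numpy as np

# Toy illustration of the additive ensemble F(x) = sum_m w_m * f_m(x).
# f1 and f2 are hypothetical weak learners (threshold stumps); their
# weights w_m would normally be learned during the training phase.

def f1(x):
    # Weak learner 1: predicts 1.0 above a threshold, else 0.0.
    return np.where(x > 0.5, 1.0, 0.0)

def f2(x):
    # Weak learner 2: predicts +0.5 or -0.5 around a lower threshold.
    return np.where(x > 0.2, 0.5, -0.5)

weights = [0.6, 0.4]
learners = [f1, f2]

def F(x):
    # The ensemble prediction is the weighted sum of the weak learners.
    return sum(w * f(x) for w, f in zip(weights, learners))

print(F(np.array([0.1, 0.3, 0.7])))  # weighted combination per input
```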

3.1. Exploratory Data Analysis (EDA)

The data in Table 1 were analyzed to identify general patterns, including any outliers to be capped. Table 2 shows the descriptive statistics of the basic features of the data: the center of the distribution (mean) and the dispersion (standard deviation) for each variable, together with the quartile distribution at the 25%, 50%, 75%, and 100% (max) points. An outlier would greatly exaggerate the range and distort this distribution.
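Descriptive statistics of this kind are commonly produced with pandas. The rows below are hypothetical stand-ins for the 27 runs of Table 1; `describe()` reports the mean, standard deviation, and quartiles discussed above.

```python
import pandas as pd

# Hypothetical subset of the experimental data (the full 27 runs are
# in Table 1). describe() gives count, mean, std, min, the 25%/50%/75%
# quartiles, and max for each column.
df = pd.DataFrame({
    "speed":     [500, 1000, 1500, 500, 1000],
    "feed":      [0.1, 0.2, 0.3, 0.2, 0.1],
    "roughness": [1.8, 2.4, 3.1, 2.0, 1.6],
})
print(df.describe())
```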

Understanding the correlation among variables is important during model development. Correlation analysis is a method for measuring the covariance of two random variables X and Y in a matched dataset. The correlation coefficient is a unitless number that varies from −1 to +1. The magnitude of the correlation coefficient is the standardized degree of association between X and Y, and its sign gives the direction of the association, positive or negative. Figure 3 is drawn to comprehend the data statistically through the predictive features and their relationships. It is evident that each parameter involved in drilling is independent, and there is no correlation between them.
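A correlation matrix like the one visualized in Figure 3 can be computed directly. The data below are hypothetical; near-zero off-diagonal entries between the input parameters would indicate the independence noted in the text.

```python
import pandas as pd

# Pearson correlation matrix for a hypothetical slice of the drilling
# data. Each diagonal entry is 1 (a variable correlates perfectly with
# itself); off-diagonal entries near 0 indicate independent parameters.
df = pd.DataFrame({
    "speed": [500, 1000, 1500, 500, 1500],
    "feed":  [0.1, 0.2, 0.3, 0.3, 0.1],
    "angle": [90, 118, 135, 118, 90],
})
corr = df.corr()
print(corr.round(2))
```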

3.2. Extreme Gradient Boosting (XGBoost) Regressor

Extreme gradient boosting (XGBoost) is a tree algorithm that mathematically formalizes regularization in a tree. It is adapted to large data scales, as it has a low bias and high variance (due to the boosting mechanism). It is a parallelized and carefully optimized version of the gradient boosting algorithm and has improved training time by parallelizing the whole boosting process.

The pseudocode for the gradient boosting algorithm underlying XGBoost is as follows: (1) Input: a training set {(xi, yi)}, i = 1, …, n, a differentiable loss function L(y, F(x)), and the number of iterations M. (2) Initialize the model with a constant value: F0(x) = argminγ Σi L(yi, γ) (i.e., a constant fit to the actual values). (3) For m = 1, …, M, compute the so-called pseudoresiduals rim = −∂L(yi, F(xi))/∂F(xi), evaluated at F = Fm−1, and fit a base learner (e.g., a tree) hm(x) to the pseudoresiduals, that is, train it using the set {(xi, rim)}. (4) Compute the multiplier γm by solving the one-dimensional optimization problem γm = argminγ Σi L(yi, Fm−1(xi) + γhm(xi)), and then update the model: Fm(x) = Fm−1(x) + γmhm(x). (5) Output the final model FM(x).

As explained above, gradient boosting takes the training set and a loss function as inputs, and the final trained model FM(x) is obtained at the end of the algorithm. The workflow of the model development is shown in Figure 4.
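The pseudocode above can be sketched concretely for squared-error loss, where the pseudoresiduals are simply the residuals y − F. This is a minimal illustration on synthetic data, not the paper's implementation (which uses the XGBoost library with its additional regularization).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Minimal gradient boosting for squared-error loss, following the
# pseudocode: F_0 is the mean, then each round fits a shallow tree h_m
# to the residuals and updates F_m = F_{m-1} + nu * h_m. Synthetic data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 2))
y = 3 * X[:, 0] + np.sin(5 * X[:, 1])

nu, M = 0.1, 50                    # learning rate, boosting rounds
F = np.full_like(y, y.mean())      # step (2): constant initial model
trees = []                         # kept for predicting on new inputs
for _ in range(M):
    r = y - F                      # step (3): pseudoresiduals
    h = DecisionTreeRegressor(max_depth=2).fit(X, r)
    F += nu * h.predict(X)         # step (4): model update
    trees.append(h)

# Step (5): F now holds the ensemble's fitted values; the training MSE
# is far below the variance of y (the error of the constant model F_0).
print(float(np.mean((y - F) ** 2)))
```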

The split of training and testing datasets for model development was done using the k-fold cross validation approach, which splits the experimental data into k subsets; this is appropriate since the dataset is small. The following steps were followed in the 5-fold cross validation: (1) training the model using k − 1 folds, (2) validating the model on the remaining fold that was not used for training, and (3) repeating the above two steps for each combination of folds and averaging the model results.
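The three steps above map directly onto scikit-learn's cross-validation utilities. The data below are synthetic placeholders for the 27 experimental runs, and `GradientBoostingRegressor` stands in for the XGBoost regressor here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score

# 5-fold cross validation on a small synthetic dataset shaped like the
# experiments: 27 runs with 4 drilling parameters. cross_val_score
# trains on k-1 folds and scores on the held-out fold, once per fold.
rng = np.random.default_rng(1)
X = rng.uniform(size=(27, 4))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(0, 0.05, 27)

cv = KFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(GradientBoostingRegressor(random_state=1),
                         X, y, cv=cv, scoring="r2")
print(scores.mean())  # step (3): average of the per-fold results
```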

Figure 5 depicts the model parameterisation applied to the current problem. In the process of cross validation, the dataset was split into 80% for training and 20% for testing. Hyperparameterization was carried out on the 80% training dataset in order to detect any imbalance between training and testing accuracy. The optimal hyperparameters and their accuracies are shown in Table 3. It is evident that the accuracy of the grid search increases when the number of trees is increased. Among the three cases considered, the M1 set (200 trees) resulted in accuracies of 98% and 93% for training and testing, respectively. When the number of trees is increased from 200 to 300, the training accuracy does increase, to 99%, but the testing accuracy decreases to 88%. This is because of the high variance caused by the larger number of trees.
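A grid search over the number of trees, on an 80/20 split, can be sketched as follows. The data are synthetic, the candidate tree counts mirror the three cases in Table 3, and `GradientBoostingRegressor` again stands in for XGBoost.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Grid search over the number of trees (100/200/300, as in Table 3)
# using 5-fold CV on the 80% training split; the held-out 20% gives
# the testing accuracy. Synthetic data shaped like the 27 runs.
rng = np.random.default_rng(2)
X = rng.uniform(size=(27, 4))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(0, 0.05, 27)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=2)

grid = GridSearchCV(GradientBoostingRegressor(random_state=2),
                    {"n_estimators": [100, 200, 300]},
                    cv=5, scoring="r2")
grid.fit(X_tr, y_tr)
print(grid.best_params_, round(grid.score(X_te, y_te), 3))
```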

After cross validation, the XGBoost prediction model was tuned iteratively to achieve the maximum performance. To determine the fitness of the XGBoost regression model, the residual sum of squares (RSS, also known as the sum of squared errors of prediction) was used as a statistical validation technique to quantify the dispersion of the data and how well the data fit the trained ML model. Generally, a lower residual sum of squares indicates that the regression model explains the data better, while a higher residual sum of squares indicates that the model explains the data poorly: RSS = Σi=1..n (yi − ŷi)2, where yi is the ith value of the variable to be predicted, ŷi is the predicted value of yi, and n is the number of observations.
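The RSS formula is a one-liner in practice. The observed and predicted values below are hypothetical, chosen only to illustrate the computation.

```python
import numpy as np

# RSS = sum_i (y_i - yhat_i)^2 for hypothetical observed vs predicted
# surface roughness values; a smaller RSS means a better fit.
y = np.array([2.1, 1.8, 3.0, 2.5])
y_hat = np.array([2.0, 1.9, 2.8, 2.6])
rss = round(float(np.sum((y - y_hat) ** 2)), 6)
print(rss)  # 0.07
```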

4. Results and Validation of XGBoost Regressor Model

The prediction model was evaluated by computing the R2 value in each iteration. The coefficient of determination (R2) is a measure of the performance of the model. Tuning the model by changing its hyperparameters improved its R2 value. After several attempts at tuning the model, it was found that R2 = 0.89 for the training data and R2 = 0.94 for the testing dataset.
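The coefficient of determination relates directly to the RSS via R2 = 1 − RSS/TSS, where TSS is the total sum of squares about the mean. The values below are hypothetical; `sklearn.metrics.r2_score` computes the same quantity.

```python
import numpy as np

# R^2 = 1 - RSS/TSS on hypothetical observed vs predicted values.
# TSS measures the spread of y about its mean; R^2 = 1 is a perfect fit.
y = np.array([2.1, 1.8, 3.0, 2.5])
y_hat = np.array([2.0, 1.9, 2.8, 2.6])
rss = np.sum((y - y_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - rss / tss
print(round(float(r2), 3))  # 0.914
```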

To compare the XGBoost model with other regressors used in the literature, polynomial and SVR models were developed with the same data. For a fair comparison, the optimal parametrization of each compared algorithm was carried out using the same strategy as before. The overall performance of the XGBoost model and the other selected models (polynomial and SVR) is shown in Table 4.

Evaluating these three regressor models, XGBoost stands highest in performance, followed by the SVR model. Its R2 is higher than those of the other two models in every iteration, both before and after tuning.

4.1. Experimental Validation of the XGBoost Model

An ML model is intended to predict the responses for unseen independent features. For example, the current XGBoost model can predict the surface roughness of the PEEK material when arbitrarily selected speed, feed, and angle values are given. To validate the developed model, twenty random input values were fed into the model, and the respective predicted outputs were recorded. Validation experiments were then conducted for the same input values, and the actual and predicted results are compared in Table 5.
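A comparison like the one in Table 5 can be summarized numerically. The measured and predicted values below are hypothetical placeholders; the mean absolute percentage error (MAPE) is one convenient summary of agreement between the two.

```python
import numpy as np

# Hypothetical measured vs predicted roughness for three validation
# runs; MAPE summarizes the average relative deviation in percent.
measured = np.array([2.10, 1.75, 3.05])
predicted = np.array([2.05, 1.80, 2.90])
mape = float(np.mean(np.abs(measured - predicted) / measured)) * 100
print(round(mape, 2))  # 3.39
```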

5. Conclusion

This research focused on developing a machine learning model for drilling of the PEEK material. Development of the model started with collecting experimental data from an L27 DoE. Considering speed, feed, and drill tool angle, the surface roughness of the drilled PEEK rod was measured. The collected experimental data were then applied to the XGBoost regressor to develop a machine learning model. Exploratory data analysis revealed that these three independent variables significantly affect the surface roughness of the drilled material; in particular, the combination of speed and feed has the greatest influence on the output. The XGBoost regression model performed better than the polynomial and SVR regression models. In addition, after testing the ML model, twenty unique data points were predicted and further validated. The validation results confirm the effectiveness of the prediction model.

Data Availability

Data associated with this research are available on request from the corresponding author.

Conflicts of Interest

The authors declare that there are no conflicts of interest.