Abstract
One of the important features of the routing protocol for low-power and lossy networks (RPL) is the objective function (OF). The OF influences an IoT network in terms of routing strategies and network topology. Detecting a combination of attacks against OFs is a cutting-edge capability that will become a necessity as next-generation low-power wireless networks continue to be exploited as they grow rapidly. However, the current literature lacks studies on the vulnerability analysis of OFs, particularly in terms of combined attacks. Furthermore, machine learning is a promising solution for the global networks of IoT devices in terms of analysing their ever-growing generated data and predicting cyberattacks against such devices. Therefore, in this paper, we analyse the vulnerability of two popular OFs of RPL and detect combined attacks against them using machine learning algorithms through different simulated scenarios. For this, we created a novel IoT dataset based on power and network metrics, which is deployed as part of an RPL IDS/IPS solution to enhance information security. Our results show that the machine learning approach is successful in detecting combined attacks against two popular OFs of RPL based on the power and network metrics, with the MLP and RF algorithms being the most successful classifiers for single and ensemble models.
1. Introduction
The Internet of Things (IoT) can be described as the ever-growing global network of smart devices with built-in sensing features and communication interfaces such as local area network (LAN) interfaces, sensors, and global positioning devices (GPS). It is expected that, by 2022, we will have around 50 billion IoT devices scattered across the globe, a 140 percent increase compared to 2018. Since 1999, when the IoT was conceived, the concept of these smart devices has evolved into a conceptual framework including augmented physical objects, heterogeneous devices, and interconnection solutions to share information at scale, across the world [1]. The routing protocol for low-power and lossy networks (RPL) is used for IPv6 over low-power wireless personal area networks (6LoWPAN) and IoT networks. RPL link layers operate efficiently using nodes that connect through multihop paths to root devices; these devices are responsible for collating and distributing data. A Destination Oriented Directed Acyclic Graph (DODAG) is produced for each root device that accounts for node attributes, link cost, and objective function (OF). However, from a security point of view, RPL is a vulnerable protocol given that it does not integrate the security mechanisms needed to prevent intruders from gaining unauthorised access to the data travelling across an IoT network. Due to this fact, RPL is exposed to several types of attack; [2] provides a concise table of RPL attacks for consideration. RPL nodes utilise the OF to identify the next-hop node based on power consumption and network metrics [3]. Minimum rank with hysteresis objective function (MRHOF) and objective function zero (OF0) have been defined as the two main OFs of IoT devices and the RPL protocol by the Internet Engineering Task Force (IETF) working group. Detection of and quick response to attacks against MRHOF and OF0 are difficult, and the current research lacks studies on the vulnerability analysis of OFs. 
Additionally, little investigation has been done on automating the detection and response process particularly for the combined attacks against OFs on IoT networks. However, it is possible that machine learning (ML) and data mining can be used for anomaly-based intrusion detection with a focus on identifying attacks based on power consumption and network metrics [4].
There are four research questions (RQs) that will be addressed throughout this paper:
(i) RQ1: is there an available IoT dataset that is suitable to meet the research scope in this paper, or is the development of a novel dataset required?
(ii) RQ2: what is the impact of preprocessing (for example, normalisation, feature selection, and sampling) on classifier performance to detect combined attacks against MRHOF and OF0?
(iii) RQ3: what is the most successful deployment of ML algorithms to detect a combination of attacks against MRHOF and OF0?
(iv) RQ4: are ML algorithms more successful in detecting combined attacks against MRHOF or OF0?
In this paper, we use ML to detect a combination of attacks against MRHOF and OF0 based on power consumption and network metric features. Additionally, to conduct our experiments and due to the lack of suitable IoT datasets, we developed a novel dataset which is focused on IoT features and attack parameters including packet delivery ratio and power consumption of nodes in various combined attack scenarios. Detecting a combination of attacks against OFs is a leading-edge capability that will become a necessity as next-generation low-power IoT networks continue to be exploited every day as they grow quickly. For this, we considered combined attacks such as Rank and Version Attack, Rank and Blackhole Attack, Decreased Path Metric Attack, as well as Rank and Sybil Attack.
The remainder of this paper is organised as follows. In Section 2, we review the related work based on our research questions and a critical understanding of available quality sources, followed by our research methodology in Section 3. This is followed by simulated experiments in Section 4, results and analysis in Section 5, discussions in Section 6, and conclusion, limitations, and recommendations in Section 7, with acknowledgements and references at the end.
2. Literature Review
In this section, relevant academic papers are reviewed and discussed in five groups. IoT methodologies, MRHOF and OF0 attacks, IDS methodologies and feature selection, datasets and ML classifiers, and preprocessor and balancing techniques were identified as the five core topics. Due to page limitation, we have picked two papers from each category to discuss here. However, more related papers from each group, along with gap analysis for our novel approach, can be found in Table 1. Publications, arguments, and literature were selected based on practical and simulation experiments, expert opinion, evaluation, and analysis and contrasting views. Scholarly literature search engines, libraries, and journals were used to identify strengths, weaknesses, and gaps in research.
With regard to IoT methodologies, the authors of [5] deliver a survey of IoT-IDS and discuss the use of ML, anomaly-based approaches, intrusion detection based on power consumption, and analysis of objective function behaviour, all of which are relevant to our research in this paper. The authors discuss power consumption as a parameter that can be used to analyse a normal behaviour profile to detect malicious activity, based on mesh-under and route-over schemes. Each node is required to monitor power consumption at specified sampling rates and report deviations from expected values. Deviations from expected values are deemed malicious activity, and as such the node is removed from the routing table. The paper expands on the concept of node behaviour focused on power consumption and suggests that packet overhead and memory consumption are adequate metrics that can be used for IoT-IDS. Although the paper provides a broad overview of IoT-IDS, its technical depth is limited. Additionally, it is difficult to understand a detailed approach for IoT-IDS using ML from the research alone. Moreover, Rehman et al. [3] discuss Rank Attack as an objective function (RAOF) vulnerability aimed at the RPL protocol. In order for RAOF to be successful, an attacking node corrupts routing metrics so that neighbour nodes’ OF favours the attacker as a preferred parent node. The results of their simulation show the impact of the attack when considering a well-positioned attacking node within the RPL network. The paper is interesting since it identifies a relationship between OF, power consumption, hop count, and routing metrics when considering RAOF. However, it does not provide counterarguments to its approach. For example, the relationship between power consumption and hop counts when considering an attack on routing metrics could have been explained in further detail. 
The paper provided a detailed analysis for OF vulnerabilities across RPL networks and the introduction of an attack that had not been identified in other research studies.
With regard to MRHOF and OF0, Airehrour et al. [6] present a trust aware RPL model to detect selective forwarding and blackhole attacks in IoT networks. The trust aware model is compared to MRHOF and OF0 to understand if their proposed solution is successful. According to their results, selective forwarding attacks against the trust aware model were gradually and significantly reduced. This includes the isolation of malicious nodes. However, MRHOF and OF0 were not able to detect or isolate selective forwarding attacks. Additionally, while the trust aware protocol was able to detect and isolate blackhole attacks through analysis of sent packet sequence and received sequence ID, MRHOF and OF0 were not able to do so. They also did not review or discuss in detail the techniques used by MRHOF and OF0 to detect and isolate malicious nodes. The lack of research in this area is significant to our work in this paper. Moreover, Airehrour et al. [7] discuss the Secure Trust Protocol (SecTrust-RPL), designed to detect and isolate Rank and Sybil Attacks through node trust relationships. Performance of SecTrust-RPL is compared to the standard RPL protocol integrated with MRHOF and OF0. MRHOF is identified as a superior RPL protocol over OF0 based on performance metrics when considering network flow and resource. When considering Rank and Sybil Attack alone, MRHOF demonstrates higher vulnerability than SecTrust-RPL. Although SecTrust-RPL has been identified as being a more secure protocol than MRHOF and OF0, there was no discussion around IDS that can be used with OF and the RPL protocol. This is relevant to our research scope in this paper, as RPL and OF may not be required to detect and isolate attacks if ML can be used with IDS. Experiments for MRHOF and OF0 utilising a suitable IDS compared to SecTrust-RPL would have provided a fair evaluation when considering IoT security.
With regard to IDS methodologies and feature selection, Sheikhan and Bostani [9] discuss a security mechanism to detect attacks against IoT networks based on a distributed architecture. Their proposed method is focused on using ML to identify sinkhole and selective forwarding attacks that deviate from normal and abnormal behaviours. Addressing their results, the anomaly detection successfully identifies 80.95% of sinkhole and selective forwarding attacks with a false alarm rate of 5.92%. The misuse-based detection was able to identify 97.88% of sinkhole and selective forwarding attacks with a false alarm rate of 1.96%. Although misuse-based detection was able to identify a higher rate of sinkhole and selective forwarding attacks with a low false alarm rate, this method is only able to detect known attacks. Although the article highlights the significance of selecting noteworthy behaviour features including packet drop rate, packet receive rate, maximum hop count, and average latency, further analysis will be required to select features relevant to OF attacks. Napiah et al. [14] discuss compression header analysis intrusion detection system (CHA-IDS) coalesced with ML to detect 6LoWPAN and RPL combination attacks: Hello Flood, Wormhole, and Sinkhole. CHA-IDS is focused on identifying anomaly- and signature-based features for 6LoWPAN intrusion detection through raw data collection and analysis by means of six ML algorithms (MLP, SVM, J48, Naïve Bayes, Logistic, and Random Forest). They provide quantitative data suggesting CHA-IDS performs better than other 6LoWPAN IDS models for combined attack detection. CHA-IDS applies compression header data for 6LoWPAN as a detection feature in contrast to SVELTE and PONGLE that utilise rank and received signal strength indicators. Destination port, context identifier, destination context identifier, next header, and pattern identified abnormal routing activities were used with ML algorithms to successfully detect attacks. 
Addressing their results, J48 was the most successful ML algorithm across a combined dataset, while Random Forest ranked second. The strengths of the paper include research on 6LoWPAN and RPL vulnerabilities and flaws in current IDS methods.
With regard to datasets and ML classifiers, Buczak and Guven [17] discuss a range of ML and data mining algorithms and classification techniques based on public datasets for intrusion detection. An overview of publicly available datasets is provided comprising: DARPA 1998/1999/2000, KDD 1999, NetFlow, tcpdump, and DNS and SSH datasets. It is highlighted that, during a research phase, it will be important to ensure ML methods are trained using the same dataset to ensure comparison with other research studies is reliable. However, despite KDD 1999 being the best available labelled dataset, it is limited by attacks that have occurred since the dataset was produced. This may be an issue when considering a reference dataset for use with MRHOF, OF, RPL, and IoT. Moreover, Alam et al. [18] present a paper on eight data mining algorithms: ANNs, deep learning ANNs (DLANNs), C4.5, C5.0, SVM, naïve Bayes (NB), K-nearest neighbours, and linear discriminant analysis (LDA) for use with IoT. Their research aim is to understand if conventional data mining algorithms work for IoT datasets and if not, new algorithms are required. For their research, three sensor datasets from University of California Irvine (UCI) data repository were provided. The results conclude DLANN, ANN, C4.5, and C5.0 performed better than LDA, NB, NN, and SVM when considering accuracy and elapsed time for IoT datasets. C4.5 and C5.0 were identified to provide high accuracy and processing speed whilst remaining memory efficient. DLANNs and ANNs memory efficiency were poor and computationally expensive although identified as having the highest rates of accuracy. The paper discusses an area of research that was difficult to identify during literature review given that most research into ML and IoT utilise DARPA or KDD datasets. Although the paper includes novel research into an area not commonly explored, the paper would have benefited from a detailed discussion around the three datasets provided by UCI.
With regard to preprocessor and balancing techniques, Yin and Gai [19] discuss the challenges of data mining and ML relating to new and enormous data types introduced to solve complex problems. The paper discusses classification methods, preprocessing, feature selection, and data sampling. The publication explains that there are many classification algorithms available that are mostly based on balanced high-quality datasets. Preprocessing is explained as a common method used to improve the accuracy of a dataset by reducing the number of features selected and by sampling well. Their experimental activity is designed to understand how to achieve and improve preprocessing techniques to deliver high-quality datasets. The C4.5 classifier was the only algorithm used during experimentation, to eliminate conflicting results across a range of 12 datasets. The results conclude that the accuracy of a classifier is more reliable when feature selection is conducted prior to sampling data. In the event that data are largely imbalanced, experimental results conclude that it is better to undersample data rather than oversample when considering the minority class. The paper could have benefited from the inclusion of other preprocessor stages in the experimental activity to improve the accuracy of a dataset further.
Critical analysis of the current literature identified the following key areas of interest to address in this paper: power consumption and network-related metrics, combination of IoT attacks, MRHOF and OF0 vulnerability analysis, feature selection, and the development of a novel dataset based on IoT attacks. For instance, unlike [5, 14], our work in this paper includes the combination of MRHOF and OF0 attacks considering power consumption and network-related metrics as part of an ML-IDS. Furthermore, unlike [17, 19], a novel IoT dataset has been developed focusing on MRHOF and OF0 attacks including preprocessing techniques, feature reduction, sampling, and normalisation. Additionally, as far as we are aware, no one else has successfully employed time series ML classifiers alongside a novel IoT dataset, whilst detecting a combination of attacks against multiple objective functions (e.g., OF0 and MRHOF) based on network and power consumption metrics. Table 1 provides a summary of our gap analysis based on the most relevant reviewed quality papers.
3. Research Methodology
The aim of this paper is to use ML to detect a combination of attacks against OF0 and MRHOF, as two popular OFs for the RPL protocol, based on power consumption and network metrics using a novel dataset. This aim builds on previous research in the field in order to produce novel research. Therefore, the considerations include the following:
(i) Identification and development of a novel dataset focused on IoT features and attacks
(ii) Identifying the most successful deployment of ML algorithms and classifiers
(iii) The impact of preprocessing, normalisation, feature selection, and sampling on performance of ML algorithms and classifiers
(iv) Employing and assessing the success rate of detecting a combination of attacks against OF0 and MRHOF
In this paper, eight experiments have been developed and presented based on the remarks identified during the gap analysis presented in Table 1. Further detail will be provided during the simulated experiments. Our research methodology follows CRISP-DM [20], which provides a structured approach for ML-based projects. We employed the six phases of CRISP-DM as follows:
(i) Business Understanding (literature review, development of research questions, project methodology, gap analysis, and research aims)
(ii) Data Understanding (exploring datasets)
(iii) Data Preparation (data preprocessing)
(iv) Modelling (ML algorithms and classifiers)
(v) Evaluation (performance evaluation)
(vi) Deployment (this phase is out of the scope of this paper and will be discussed during future recommendations)
4. Simulated Experiments
This section outlines the simulated experiments conducted following the CRISP-DM process based on our research methodology. It includes data exploration, preprocessing, ML classifiers, classifier ensemble, and feature selection. The experiments are run a number of times in a consistent manner to ensure the integrity of results. They are designed as follows:
(i) Experiment 1 (preprocessed dataset): considering all attributes and metrics
(ii) Experiment 2 (normalisation): considering all attributes and metrics
(iii) Experiment 3 (normalisation): considering network attributes and metrics
(iv) Experiment 4 (normalisation): considering power attributes and metrics
(v) Experiment 5: considering feature selection
(vi) Experiment 6: considering classifier ensemble
(vii) Experiment 7: considering detecting attacks against MRHOF and OF0
(viii) Experiment 8: considering detecting attacks against MRHOF and OF0 with balanced class
4.1. Exploring Datasets
A range of approaches were considered during dataset exploration including publicly available datasets, privately owned datasets, and implementation of an IoT lab to capture relevant data and IoT simulation.
Publicly available datasets including DARPA 1998/1999/2000 and KDD 1999 were considered since they are used in 71% of ML research experiments [17]. Although these datasets are available and well labelled, they do not contain examples of attacks that have occurred since the datasets were produced. This presents an issue when considering attacks against OF0, MRHOF, RPL, and IoT. Leading research professionals, universities, and corporate organisations were approached and asked to provide raw datasets for research purposes. For reasons of security, privacy, and the protection of intellectual property, each of these organisations was unwilling to share its privately owned datasets. The implementation of an IoT sensor lab was considered but was unfeasible due to the limited budget and the geographic location of available resources.
The identification and development of a novel dataset focused on IoT features and attacks was therefore conducted using simulation. Alam et al. [18] acknowledge this approach as difficult and time-consuming requiring significant work to collect, label, and preprocess an IoT dataset to ensure accuracy of results. A simulation dataset was produced to meet project scope using Contiki and Cooja. The raw dataset was provided as a project resource to understand, analyse, and evaluate prior to data preprocessing and modelling ML algorithms to detect attacks against OF0 and MRHOF.
The dataset contained 24 attributes based on network and power metrics, all of which are presented and detailed in Table 2, and 418 instances. Attribute and incident metrics were captured in a Contiki and Cooja simulation environment from various sensors during normal activity and every 30 seconds whilst under attack. The simulation was configured with eleven network nodes and one malicious node. The malicious node can be seen in Figure 1, labelled as number twelve.

Benign activity and four combined malicious attacks were monitored during the simulation scenarios. The malicious attacks include Rank and Version, Rank and Blackhole, Rank and Sybil, and Decreased Path Metric against OF0 and MRHOF. Malicious and benign activities are identified by attribute 23 in our dataset, which is the selected class for our experiments. The class is imbalanced; this will be rectified during the data preprocessing phase.
4.2. Data Preprocessing
The data preprocessing phase is designed to prepare the raw dataset for our eight experiments. Preprocessing and data reduction were essential phases of the project. Therefore, sufficient time was spent ensuring that a suitably labelled dataset was created to provide high-quality results for analysis. Data preprocessing is aimed at reducing the complexity of a dataset, so that ML models can process the data more accurately and faster than they could process the raw dataset. When implementing a data mining process, which is CRISP-DM in this paper, preprocessing often requires more effort and time than the rest of the data analysis process, in excess of 50% of total effort [21]. The dataset in Table 3 shows a representative example of some of the complex attributes and instances that we captured over the simulation scenarios in this paper. For the data preprocessing phase of this research, we have picked data cleansing, transformation and feature reduction, normalisation and data analysis, sampling, as well as training, testing, and cross-validation stages. They are applied to our raw dataset as follows.
4.2.1. Data Cleaning, Transformation, and Feature Reduction
The raw dataset seen in Table 3 was reviewed and issues were identified such as missing, incomplete, and inconsistent values. Additionally, the irrelevant data and errors were identified. To reduce the complexity of the 10,032 entries within the raw dataset, data cleaning and transformation was conducted to represent all data in a standard numeric form. Table 4 describes the steps taken for each feature.
Features that were of no benefit to the ML model, or that did not contain relevant data, were removed from the dataset. Similarly, instances were reviewed and the entries containing no predictive power were removed, reducing total instances from 418 to 338. Entries that contained null values or errors were replaced with the mean value for that entry. At this stage, the initial preprocessed dataset was complete; it was used later in the project, after normalisation and sampling, for comparison. It was important to use this dataset for initial assessment and comparison against the final preprocessed dataset to understand the effect that normalisation has on overall performance.
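The replacement of null values can be illustrated with a minimal sketch. It assumes that "the mean value for that entry" refers to the mean of the non-missing values of the same attribute (column); the function name `impute_column_means` and the toy rows are hypothetical and are not taken from the dataset. The final line only checks the entry count implied by the reported dataset size.

```python
from statistics import mean

def impute_column_means(rows):
    """Replace None entries with the mean of the non-missing values in the same column."""
    n_cols = len(rows[0])
    col_means = []
    for c in range(n_cols):
        present = [row[c] for row in rows if row[c] is not None]
        # Assumption: a column with no usable values falls back to 0.0.
        col_means.append(mean(present) if present else 0.0)
    return [[col_means[c] if v is None else v for c, v in enumerate(row)]
            for row in rows]

# Toy rows with hypothetical values (two attributes, three instances).
raw = [[1.0, 10.0], [None, 20.0], [3.0, None]]
clean = impute_column_means(raw)

# Entry count implied by the reported raw dataset: 24 attributes x 418 instances.
raw_entries = 24 * 418
```

Under the same reading, the 7,098 entries reported after feature reduction would correspond to 21 remaining attributes across the 338 retained instances.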
4.2.2. Normalisation and Data Analysis
Normalisation is a scaling technique that is used to provide a new range of data from an existing range. Min-Max normalisation can be used to fit data from one range into a predefined boundary in another. Because the dataset contained complex numeric values, statistical analysis was conducted. Additionally, standard deviation was used to categorise data in a nominal format to achieve the aim of predefined boundaries [22]. The average value of each attribute was taken, and the standard deviation was calculated. A boundary range between 1 and 14 was determined based on the diagram presented in Figure 2. For example, numbers 7 and 8 represent the most normal behaviour, or behaviour closest to the mean, and numbers 1 and 14 represent the most abnormal behaviour, or behaviour furthest from the mean (Figure 2). Although feature reduction condensed the dataset to 7,098 entries, an automated system was developed to produce a new range of data from the existing range based on standard deviation and boundary selection, to ensure that data normalisation was accurate and timely when dealing with large datasets.
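A standard-deviation-based mapping of this kind can be sketched as follows. The exact boundaries are defined by Figure 2 and are not reproduced here, so the bin width of half a standard deviation (`width=0.5`) is purely an assumption, as are the function name `sd_bin` and the toy values. The sketch only preserves the stated property that bins 7 and 8 sit on either side of the mean and bins 1 and 14 collect the values furthest from it.

```python
import math
from statistics import mean, pstdev

def sd_bin(value, mu, sigma, n_bins=14, width=0.5):
    """Map a value to a bin in 1..n_bins by its signed distance from the mean.

    Distance is measured in units of width * sigma (assumed bin width).
    Bins n_bins/2 and n_bins/2 + 1 straddle the mean; outer bins are clipped.
    """
    if sigma == 0:
        return n_bins // 2  # constant column: treat every value as "normal"
    z = (value - mu) / (width * sigma)
    b = n_bins // 2 + 1 + math.floor(z)
    return max(1, min(n_bins, b))

# Toy attribute column (hypothetical values) with one clear outlier.
values = [4.0, 5.0, 6.0, 5.0, 20.0]
mu, sigma = mean(values), pstdev(values)
bins = [sd_bin(v, mu, sigma) for v in values]
```

With a different assumed width the bin labels change, but the output always remains within the 1 to 14 boundary range described above.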

The dataset in Table 5 shows a representative example of the attributes and instances that have undergone the data cleansing, transformation, feature reduction, and normalisation phases of the data preprocessing stage. As can be seen, all of the values are numeric and fall within the boundary range of 1–14. The dataset is now in a format that is suitable to be loaded into WEKA [23] and converted to an .arff file prior to sampling and creating training, testing, and cross-validation datasets. WEKA is a popular and powerful tool for data mining and machine learning.
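For readers unfamiliar with the target format, the following sketch renders a few normalised rows as an ARFF document with nominal attributes. It is only an illustration of the file layout (`@relation`, `@attribute`, `@data`); in this work the conversion was carried out inside WEKA. The relation name, attribute names, and rows are hypothetical, and the last column of each row is assumed to hold the class.

```python
def to_arff(relation, attribute_names, rows, class_name, levels=range(1, 15)):
    """Render rows of integer bins as ARFF text with nominal attributes.

    Each row holds one value per attribute followed by the class label.
    """
    nominal = "{" + ",".join(str(v) for v in levels) + "}"
    class_values = sorted({row[-1] for row in rows})
    lines = [f"@relation {relation}", ""]
    for name in attribute_names:
        lines.append(f"@attribute {name} {nominal}")
    lines.append("@attribute " + class_name + " {"
                 + ",".join(str(c) for c in class_values) + "}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines) + "\n"

# Hypothetical attribute names and two toy instances (bins 1..14, class last).
doc = to_arff("rpl_attacks",
              ["cpu_power", "packet_delivery_ratio"],
              [[7, 8, 2], [3, 14, 1]],
              "event")
print(doc)
```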
4.2.3. Sampling
Prior to conducting sampling for the final preprocessed dataset, some housekeeping was performed within WEKA. Package Manager was used to load the numeric to nominal, randomisation, and SMOTE filters. The .csv dataset was loaded into WEKA and converted to an .arff file, and the numeric to nominal filter was applied to ensure that all entries were stored in nominal format. A sampling strategy was then considered. It was identified that there was a class imbalance between the malicious and benign classes (64, 140, 65, 33, and 36 malicious/benign events distributed across the dataset) (Figure 3). Guo et al. [24] acknowledge that ML algorithms are typically sensitive to detecting the majority class and not the minority class. Therefore, a balanced dataset was used. Class imbalance can be rectified by oversampling the minority class or by undersampling the majority class depending on which event is to be identified. Oversampling can reduce performance for large datasets and introduces the possibility of overfitting if data are not randomised. Undersampling introduces the possibility of removing important data when a minority class is particularly small, reducing the amount of data available for training, testing, and cross-validation. In this paper, both over- and undersampling techniques are evaluated to ensure that benign activity and IoT attacks have the best opportunity of being detected. We used the SMOTE filter to oversample events 1, 3, 4, and 5 (Rank and Version, Rank and Blackhole, Decreased Path Metric, and Rank and Sybil Attacks), so that each event had approximately 135 instances (Figure 4). Each time the SMOTE filter sampled, it placed new instances at the bottom of the dataset, introducing possible overfitting. Therefore, at the end of the filtering process, data were randomised with a separate filter. Oversampling with SMOTE increased total instances from 338 to 674. 
We also used spread subsample filter to undersample events 1, 2, 3, and 5 (Rank and Version, benign Events, Rank and Blackhole, and Rank and Sybil Attacks) so that each event had 33 instances (Figure 5). To adjust filter settings, random seed was set to 1 to ensure that each sample was randomised. Undersampling decreased total instances from 338 to 165.
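The effect of the two balancing strategies on the class counts can be checked with a short sketch. The per-event counts are those reported above; the `undersample` function is a simplified stand-in for the spread subsample filter (random selection down to the size of the smallest class) and is not WEKA's implementation. The SMOTE step itself is not reimplemented; only the average class size implied by the reported total of 674 is computed.

```python
import random
from collections import Counter

# Reported class counts for the 338 preprocessed instances (events 1-5).
COUNTS = {1: 64, 2: 140, 3: 65, 4: 33, 5: 36}

def undersample(labels, seed=1):
    """Randomly keep as many instances per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    target = min(len(ix) for ix in by_class.values())
    kept = []
    for ix in by_class.values():
        kept.extend(rng.sample(ix, target))
    rng.shuffle(kept)  # avoid leaving the instances grouped by class
    return kept

labels = [label for label, n in COUNTS.items() for _ in range(n)]
kept = undersample(labels)
balanced = Counter(labels[i] for i in kept)

# Benign (event 2) is left at 140 by the oversampling step, so the four
# attack classes share the remaining instances of the reported total of 674.
avg_attack_class_after_smote = (674 - COUNTS[2]) / 4
```

The undersampled set has 5 x 33 = 165 instances, matching the reported total, and the oversampled attack classes average 133.5 instances, consistent with "approximately 135".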



4.2.4. Training, Testing, and Cross-Validation
To produce a classifier, data are processed through a ML algorithm. Once a classifier is produced, data are processed through the classifier generating results for evaluation. It is important that data processed through the ML algorithm and classifier are not from the same dataset. This project is resourced with one simulation dataset; therefore, the dataset must be split into training, testing, and cross-validation sets. Training data contain 70% of total instances with the remaining 30% split equally between testing and cross-validation sets. WEKA does not have direct functionality to do this, so the resample filter was exploited to achieve the aim for both SMOTE and spread subsample datasets.
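The 70/15/15 partition can be sketched as a shuffled index split. This is an illustration of the proportions only and does not reproduce the behaviour of WEKA's resample filter; the function name `split_70_15_15`, the seed, and the rounding rule (integer division, with any odd leftover instance going to the cross-validation set) are assumptions.

```python
import random

def split_70_15_15(n_instances, seed=1):
    """Return disjoint index lists for training, testing, and cross-validation."""
    idx = list(range(n_instances))
    random.Random(seed).shuffle(idx)
    n_train = (n_instances * 70) // 100      # 70% for training
    n_test = (n_instances - n_train) // 2    # half of the remainder for testing
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    validation = idx[n_train + n_test:]      # the rest for cross-validation
    return train, test, validation

# Sizes for the two balanced datasets described above.
smote_split = split_70_15_15(674)
subsample_split = split_70_15_15(165)
print([len(part) for part in smote_split])
print([len(part) for part in subsample_split])
```

Because the three index lists are disjoint, no instance used to build a classifier is also used to evaluate it.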
4.3. Machine Learning Algorithms and Classifiers
WEKA was used to train, test, and cross-validate our selected five classifiers: Naïve Bayes (NB), support vector machines (SVMs), multilayer perceptron (MLP), Random Forest (RF), and ZeroR classifiers. ZeroR was used to determine a performance baseline.
NB was the first model to be built in WEKA and was completed using default settings to take the product of probabilities providing a forecast ratio to identify likely outcomes. NB’s default options were not altered since the dataset was in nominal format. Therefore, “useKernelEstimator” and “useSupervisedDiscretization” were not required to be changed.
When developing SVM parameters, it is important to understand that the model is designed to separate classes using a boundary. When using WEKA’s LibSVM classifier, setting a suitable boundary allowed the generalisation from the training dataset to be more accurate. This was achieved by optimising the cost parameter (“C”) and the kernel parameter (“gamma”), relating to the X and Y axes, respectively.
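One common way to optimise these two parameters is an exhaustive search over an exponential grid, with C on one axis and gamma on the other. The sketch below shows that idea only; the exponent ranges are an assumption (a widely used default grid), the search procedure actually used in this work is not specified, and `toy_score` is a hypothetical stand-in for the cross-validated accuracy that a real run would obtain from the classifier.

```python
import math
from itertools import product

def grid_search(score, c_exponents=range(-5, 16, 2), gamma_exponents=range(-15, 4, 2)):
    """Evaluate score(C, gamma) on a grid of powers of two and return the best triple."""
    best = None
    for ce, ge in product(c_exponents, gamma_exponents):
        c, gamma = 2.0 ** ce, 2.0 ** ge
        s = score(c, gamma)
        if best is None or s > best[0]:
            best = (s, c, gamma)
    return best  # (score, C, gamma)

def toy_score(c, gamma):
    """Hypothetical accuracy surface that peaks at C = 2**3 and gamma = 2**-3."""
    return -((math.log2(c) - 3) ** 2 + (math.log2(gamma) + 3) ** 2)

best_score, best_c, best_gamma = grid_search(toy_score)
print(best_c, best_gamma)
```

In practice `toy_score` would be replaced by a function that trains the SVM with the given parameters and returns its cross-validated accuracy.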
Default MLP settings were applied for the initial classifier model and subsequently tuned to enhance results. A critical parameter that was evaluated was hidden layers. The hidden layer parameter within WEKA can be used to train data on attributes, classes, or combinations of attributes and classes. This parameter can also be adjusted to determine the number of layers within the MLP model. The MLP model used for experimentation consisted of three layers trained on both attributes and classes. Increasing training time from 500 to 2000 epochs improved results, allowing backpropagation and a multilayer approach more time to train each MLP layer.
When developing the RF algorithm, consideration was given to the depth of the tree and the number of features to be randomly selected. The default settings were applied, with the depth of the tree and the number of features both set to 0 (unlimited depth). When processing large datasets, these values can be set to reduce the depth of the tree and the number of features to enhance the performance of the classifier. Since the dataset is relatively small, the RF classifier was able to process data with unlimited depth and features, providing the best results overall.
4.4. Classifier Ensemble
Classifier ensemble techniques were used to gain highly accurate ML classifications through the combination of multiple ML algorithms. The parameters of AdaBoost, Bagging, and Stacking were researched, and experiments were conducted with the ML algorithms, but these techniques were unable to improve results beyond single classifiers. Voting was used successfully. Additionally, the classifier and combination rule parameters were amended. Optimum results were obtained by setting the classifiers to “MLP and RF” with the combination rule set to “Average of Probabilities.” For all classifier models, excluding NB since the option is not available, the seed was fixed at zero so that runs were repeatable and results could be compared consistently.
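The "Average of Probabilities" combination rule can be illustrated with a minimal sketch: each base classifier outputs a probability for every class, the probabilities are averaged per class, and the class with the highest average is predicted. The function name `vote_average` and the two probability vectors (standing in for MLP and RF outputs over the five events) are hypothetical values chosen for illustration.

```python
def vote_average(prob_lists):
    """Average per-class probabilities from several classifiers and pick the top class."""
    n = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[k] for p in prob_lists) / n for k in range(n_classes)]
    predicted = max(range(n_classes), key=lambda k: avg[k])
    return predicted, avg

# Hypothetical class probabilities for one instance (indices 0-4 = events 1-5).
mlp_probs = [0.10, 0.50, 0.20, 0.10, 0.10]
rf_probs = [0.05, 0.30, 0.45, 0.10, 0.10]

predicted, avg = vote_average([mlp_probs, rf_probs])
print(predicted, avg)
```

In this toy case the two classifiers disagree on their own, and the averaged distribution resolves the disagreement in favour of the class on which they are jointly most confident.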
4.5. Feature Selection
For the purpose of practical investigation, feature selection was split into the following areas: all dataset metrics, network metrics, and power metrics. Each of these areas was evaluated during Experiments 2–4 to understand which features provided the greatest detection. Feature selection was developed further during Experiment 5 by removing attacks that were not detectable based on power consumption metrics.
This section outlined the implementation of the CRISP-DM process based on the steps described during research methodology. The application of data exploration, preprocessing, ML classifiers, classifier ensemble, and feature selection has been described in detail, and the results will be presented in the next section.
5. Results and Analysis
There were eight simulated experiments conducted to capture results for discussion. The experiments were designed to understand how ML can be used to detect a combination of attacks against OF0 and MRHOF based on power consumption and network metrics. The experiments were also designed to understand the impact normalisation, sampling, feature selection, and classifier ensemble techniques have on results. The classifier ZeroR was used to determine a performance baseline as a reference to consider when comparing NB, SVM, MLP, and RF algorithms. The baseline confirmed malicious and benign classifier prediction at 20.59% which is reasonable since there is a single benign behaviour and four attacks. Each experiment becomes more complex during investigation to deliver results and understand whether the aims of this study have been achieved.
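As a reminder of what the ZeroR baseline measures, the sketch below predicts the most frequent class for every instance and reports the resulting accuracy. With one benign class and four attack classes of roughly equal size, this lands near 20%, which is consistent with the reported baseline. The function name and the class counts in the usage note are hypothetical and are not the study's actual distribution.

```python
from collections import Counter


def zero_r_accuracy(labels):
    """Return the majority class and the accuracy of always predicting it."""
    majority_label, majority_count = Counter(labels).most_common(1)[0]
    return majority_label, majority_count / len(labels)
```

For five perfectly balanced classes the baseline is exactly 0.20; a slightly larger majority class pushes it just above that figure.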
5.1. Experiment 1: Preprocessed Dataset—All Attributes and Metrics
The aim of Experiment 1 is to create classifiers for each ML algorithm based on a preprocessed dataset considering all attributes and metrics. This means considering both power and network parameters from our novel dataset. The results are captured in Table 6 using 10-fold cross-validation techniques for both SMOTE and spread subsample balancing methods.
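The core idea behind SMOTE oversampling can be sketched as follows: a synthetic minority instance is placed at a random point on the line segment between an existing minority instance and one of its nearest minority neighbours. This is a simplified illustration, not the WEKA filter used in the experiments; the function name, the neighbour count `k`, and the seed are assumptions.

```python
import random


def smote_oversample(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority samples by interpolating towards near neighbours.

    minority: list of numeric tuples (at least two samples).
    Returns a list of n_synthetic new tuples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic
```

Spread subsampling takes the opposite approach and discards majority instances, which may help explain why it leaves fewer training examples on a relatively small dataset.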
The overall aim of Experiment 1 is as follows:
(i) Compare each classifier against ZeroR
(ii) Compare results from balancing techniques
(iii) Carry results forward to Experiment 2 for comparison against a normalised dataset
The percentage of correctly classified instances alone can give a misleading picture when evaluating ML algorithms and classifiers. However, for Experiment 1, it was considered a suitable evaluation measure for comparison against the ZeroR classifier baseline of 20.59%.
As can be seen in Table 6, for SMOTE and spread subsample balancing techniques, each classifier improved on the ZeroR baseline with the exception of spread subsample–SVM. Additionally, SMOTE outperformed subsampling for this experiment. These results will be considered against normalised preprocessed datasets for all attributes and metrics in Experiment 2.
5.2. Experiment 2: Normalisation—All Attributes and Metrics
The aim of Experiment 2 is to create classifiers for each ML algorithm based on a normalised preprocessed dataset for all attributes and metrics. This means considering both power and network parameters from our novel dataset. The results are captured in Table 7 using 10-fold cross-validation techniques for both SMOTE and spread subsample balancing techniques.
The overall aim of Experiment 2 is as follows:
(i) Compare each classifier against Experiment 1 results to understand the impact of normalisation
(ii) Compare balancing techniques on the overall performance of classifiers
(iii) Identify highly efficient algorithms for detecting specific attacks
(iv) Carry results forward to Experiments 3 and 4 to compare against network and power metrics
Correctly classified instances will be used initially to assess the value normalisation adds within ML. We considered root mean square error (RMSE), mean absolute percentage error (MAPE), receiver operating characteristic (ROC), correctly classified instances, and confusion matrix as evaluation metrics for Experiments 2–6.
As can be seen in Table 7, for both SMOTE and spread subsample balancing, each classifier improved on its Experiment 1 performance after preprocessing and normalisation, with the exception of SMOTE–NB. Additionally, in Experiment 2, SMOTE outperformed subsampling and SMOTE–MLP outperformed all other classifiers.
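For reference, a common form of normalisation is min-max scaling of each numeric attribute to the range [0, 1]. The sketch below assumes that form; the text does not state which normalisation method was applied, so this is illustrative only, and the function name is hypothetical.

```python
def min_max_normalise(column):
    """Scale a list of numeric values to [0, 1] using min-max normalisation."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant attribute carries no information; map it to 0.0.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]
```

Scaling of this kind keeps attributes with large raw ranges from dominating distance-based or gradient-based learners such as SVM and MLP, which is one plausible reason normalisation helped most classifiers here.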
We have also observed the confusion matrix and accuracy by class which resulted in the recognition of highly efficient algorithms for detecting specific attacks for each classifier.
The following highly efficient algorithms were identified: SMOTE–NB for Decreased Path Metric attacks with a ROC of 0.99, SMOTE–MLP for Rank and Blackhole Attacks with a ROC of 1.00 (Rank and Version and Decreased Path Metric attacks scored ROC 0.99), and SMOTE–RF for detecting all attacks with a ROC in excess of 0.99 for each.
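The per-class figures read from a confusion matrix can be sketched as below: the true positive rate is the share of a class's instances that were labelled correctly, and precision is the share of predictions for a class that were correct. The function names and the labels in the usage note are hypothetical.

```python
from collections import defaultdict


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs as a nested mapping."""
    matrix = defaultdict(lambda: defaultdict(int))
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def per_class_rates(actual, predicted):
    """Return the true positive rate and precision for each class label."""
    matrix = confusion_matrix(actual, predicted)
    labels = sorted(set(actual) | set(predicted))
    rates = {}
    for label in labels:
        true_positives = matrix[label][label]
        actual_total = sum(matrix[label][p] for p in labels)     # row sum
        predicted_total = sum(matrix[a][label] for a in labels)  # column sum
        rates[label] = {
            "tp_rate": true_positives / actual_total if actual_total else 0.0,
            "precision": true_positives / predicted_total if predicted_total else 0.0,
        }
    return rates
```

A class can have a high true positive rate and still a low precision when instances of other classes, such as benign traffic, are misclassified into it.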
These results will be considered against network and power metrics in Experiments 3 and 4, with a focus on the average ROC to determine the performance of a classifier. Overall, in Experiment 2, MLP and RF algorithms performed best when the dataset had been normalised and balanced by oversampling (Table 7).
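The ROC figures quoted per attack can be read as a one-vs-rest area under the ROC curve: the probability that a randomly chosen instance of the attack receives a higher score than a randomly chosen instance of any other class. The sketch below computes that quantity directly from pairwise comparisons; it is an illustrative calculation, not the evaluation code used in the study, and the function name is hypothetical.

```python
def roc_auc(scores, is_positive):
    """Area under the ROC curve for one class versus the rest.

    scores: predicted probability of the class for each instance.
    is_positive: booleans, True where the instance belongs to the class.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, is_positive) if y]
    negatives = [s for s, y in zip(scores, is_positive) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A value of 1.00 means every instance of the attack was scored above every other instance, and 0.50 corresponds to random ranking.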
5.3. Experiment 3: Normalisation—Network Attributes and Metrics
The aim of Experiment 3 is to create classifiers for each ML algorithm based on a normalised preprocessed dataset for network attributes and metrics. The results are captured in Table 8 using 10-fold cross-validation techniques for both SMOTE and spread subsample balancing techniques.
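The 10-fold cross-validation used throughout these experiments can be sketched as a stratified split, in which each fold keeps roughly the same class proportions as the full dataset. Stratification, the function name, and the seed are assumptions in this sketch; the text specifies only that 10 folds were used.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign instance indices to n_folds folds while preserving class balance."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin so each fold receives a similar share per class.
        for index in indices:
            folds[position % n_folds].append(index)
            position += 1
    return folds
```

Each fold is held out once for testing while the remaining nine are used for training, and the reported figures are aggregated over the ten runs.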
The overall aim of Experiment 3 is as follows:
(i) Compare each classifier against Experiment 2 results to understand the significance of network attributes and metrics on a classifier performance
(ii) Compare balancing techniques
(iii) Identify highly efficient algorithms for detecting specific attacks
(iv) Carry results forward to Experiment 4 to compare against power metrics
As seen in Table 8, reducing the dataset to only include network metrics improved performance for all classifiers during Experiment 3 with the exception of SMOTE–RF. Additionally, SMOTE outperformed the spread subsampling technique. Moreover, highly efficient algorithms for detecting specific attacks using the confusion matrix were SMOTE–NB, SMOTE–MLP, and SMOTE–RF, which detected each attack with a ROC in excess of 0.99. SMOTE–MLP successfully identified Rank and Blackhole Attacks 100% of the time with no errors. These results will be considered against power metrics in Experiment 4.
5.4. Experiment 4: Normalisation—Power Attributes and Metrics
The aim of Experiment 4 is to create classifiers for each ML algorithm based on a normalised, preprocessed dataset for power attributes and metrics. The results are captured in Table 9 using 10-fold cross-validation techniques for both SMOTE and spread subsample balancing techniques.
The overall aim of Experiment 4 is as follows:
(i) Compare each classifier against Experiments 2 and 3 results to understand the significance of power attributes and metrics on the classifier performance
(ii) Compare balancing techniques
(iii) Identify highly efficient algorithms for detecting specific attacks for classifier ensemble
(iv) Carry results forward to Experiment 5 to compare against classifier ensemble techniques to improve performance for power metrics
As can be seen in Table 9, reducing the dataset to only include power metrics significantly decreased performance for all classifiers during Experiment 4. It is worth noting that, despite this significant reduction, each classifier performed better than the ZeroR baseline of 20.59%, demonstrating that power metrics retain some predictive power. Additionally, SMOTE outperformed subsampling techniques.
Given the confusion matrix, there were no highly efficient classifiers for detecting specific attacks. However, there were some moderately efficient classifiers that should be considered for evaluation during Experiments 5 and 6.
SMOTE–NB detected the Decreased Path Metric attack with a ROC of 0.90. SMOTE–MLP detected Decreased Path Metric and Rank and Version Attacks with a ROC of 0.91 and 0.94, respectively. SMOTE–RF detected Decreased Path Metric and Rank and Version Attacks with a ROC of 0.93 and 0.96, respectively.
In conclusion, Decreased Path Metric and Rank and Version Attacks were detected based on power consumption metrics during Experiment 4.
5.5. Experiment 5: Feature Selection
The aim of Experiment 5 is to use feature selection based on the results of Experiment 4. NB, MLP, and RF algorithms will be considered with SMOTE sampling. It was clear that Decreased Path Metric and Rank and Version Attacks were detectable based on power consumption metrics. Given the results for Rank and Blackhole and Rank and Sybil Attacks, it is unlikely that ML can detect these specific attacks against OF0 and MRHOF based on power consumption.
The overall aim of Experiment 5 is as follows:
(i) Remove Rank and Blackhole and Rank and Sybil attack instances from the dataset
(ii) Identify highly efficient algorithms for detecting Decreased Path Metric and Rank and Version Attacks based on power consumption for use in Experiment 6
As can be seen in Table 10 and after considering evaluation metrics, MLP performed best based on power consumption metrics after feature selection. It is noted that RF performed similarly to MLP but with a larger RMSE.
Given the confusion matrix, the following highly efficient algorithms for detecting Decreased Path Metric and Rank and Version Attacks were identified: SMOTE–NB detected Decreased Path Metric attack with a ROC of 0.93. SMOTE–MLP detected Decreased Path Metric and Rank and Version Attacks with a ROC of 0.89 and 0.90, respectively. SMOTE–RF detected Decreased Path Metric and Rank and Version Attacks with a ROC of 0.89 and 0.91, respectively.
Overall, Decreased Path Metric and Rank and Version detection improved significantly based on power consumption metrics after feature selection, and MLP performed best overall.
5.6. Experiment 6: Classifier Ensemble
The aim of Experiment 6 is to exploit AdaBoost, Bagging, Stacking, and Voting classifier ensemble methods to increase the likelihood of detecting Decreased Path Metric and Rank and Version Attacks based on power consumption metrics.
The overall aim of this experiment is as follows:
(i) Use AdaBoost and Bagging classifier techniques to increase the likelihood of detection
(ii) Use Stacking and Voting classifiers for classifier ensemble
As can be seen in Table 11, AdaBoost, Bagging, and Stacking while utilising NB, MLP, and RF classifiers were unable to increase performance beyond what had already been captured during Experiment 5. Voting was established using combinations of NB, MLP, RF, and SVM.
A combination of MLP and RF with minimum probability selected as the combination rule provided the best results. Based on the confusion matrix, Voting with MLP and RF classifiers detected Decreased Path Metric and Rank and Version Attacks with a ROC of 0.86 and 0.96, respectively. Voting with SVM and NB classifiers detected Decreased Path Metric and Rank and Version Attacks with a ROC of 0.81 and 0.91, respectively. Overall, Decreased Path Metric and Rank and Version detection improved significantly based on power consumption metrics after applying the Voting classifier ensemble with MLP and RF.
5.7. Experiment 7: Detecting Attacks against MRHOF and OF0
The aim of Experiment 7 is to understand the success rates of detecting attacks against MRHOF and OF0. As the most successful classifier identified during experimentation, MLP will be used to assess if there is any difference in detecting attacks against the two objective functions.
The overall aim of this experiment is as follows:
(i) Assess success rate of detection for MRHOF and OF0 for network and power metrics
(ii) Assess success rate of detection for MRHOF and OF0 for network metrics
(iii) Assess success rate of detection for MRHOF and OF0 for power metrics after feature selection
Experiment 7 was designed to understand the success rate of detecting attacks against MRHOF and OF0. For each experiment, MRHOF and OF0 instances were removed independently of one another allowing the MLP classifier to train, test, and cross-validate results. MLP was used to assess if there was any variance in detecting attacks between the two objective functions. All network and power metrics were considered initially, removing MRHOF and OF0 instances independently of one another, with malicious and benign activity remaining as the selected class.
Removing MRHOF reduced instances to 178. Removing OF0 reduced instances to 293. As can be seen in Table 12, the ML model was better at detecting attacks against MRHOF than it was against OF0.
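The per-objective-function split can be sketched as a filter over labelled instances. The instance counts in the usage note come from the text above; the field name `objective_function`, the row layout, and the assumption that every instance belongs to exactly one of the two objective functions are hypothetical.

```python
def remove_objective_function(rows, objective_function):
    """Return the instances that do not belong to the given objective function."""
    return [row for row in rows if row["objective_function"] != objective_function]
```

Under the stated assumption, a dataset built as 178 OF0 rows plus 293 MRHOF rows leaves 178 instances after removing MRHOF and 293 after removing OF0, matching the reported counts. The smaller OF0 subset is worth keeping in mind when comparing detection rates, which is the concern Experiment 8 addresses through balancing.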
5.8. Experiment 8: Detecting Attacks against MRHOF and OF0 with a Balanced Class
The aim of Experiment 8 is to understand the success rates of detecting attacks against MRHOF and OF0 with a balanced class. As the most successful classifier identified during Experiment 7, MLP using network metrics will be used to assess if there is any difference in detecting attacks against the two objective functions. The overall aim of this experiment is as follows:
(i) Assess success rate of detection for MRHOF and OF0 with a balanced class for network metrics
Before each experiment, MRHOF and OF0 were balanced using SMOTE oversampling technique. Instances were then removed independently of one another allowing the MLP classifier to train, test, and cross-validate results.
In general, Experiment 8 was designed to understand the success rate of detecting attacks against balanced MRHOF and balanced OF0 objective functions. This experiment builds on results captured during Experiment 7 to understand if the MLP classifier is better at detecting combined attacks against MRHOF than OF0. As can be seen in Table 13, the ML model was better at detecting attacks against MRHOF than it was against OF0 despite objective function being balanced. This supports findings identified during Experiment 7.
6. Discussions
In this section, the research questions stated at the beginning of this paper are addressed as follows:
(i) RQ1. Is there an available IoT dataset that is suitable to meet the research scope, or is the development of a novel dataset required? Buczak and Alam [17, 18] discuss research into ML, IDS, and IoT, identifying that DARPA and KDD datasets are often used since collecting, labelling, and preprocessing IoT data are difficult and time-consuming. A novel approach was taken to identify and develop a dataset focused on IoT features and attacks. The raw dataset that we provided in this paper included IoT features and attacks. Preprocessing, normalisation, and sampling of raw data were time-consuming but worthwhile. Furthermore, the novel dataset can be shared for further research in this field since correctly labelled IoT datasets are a scarce resource within the research community. The dataset contained a number of limitations that could be improved upon in future to enhance performance. Dataset limitations are discussed in the next section. RQ1 answer summary: in this paper, a novel dataset was developed focused on IoT features and attacks.
(ii) RQ2. What is the impact of preprocessing, normalisation, feature selection, and sampling on classifier performance? Yin and Gai [19] discuss challenges to be considered when developing an imbalanced dataset with a focus on preprocessing, normalisation, sampling, and feature selection. Experiments 1, 2, and 5 were designed to understand the impact that preprocessing, normalisation, and feature selection have on performance. Experiments 1–4 were designed to understand sampling strategy. Experiments 1 and 2 concluded that each classifier post-preprocessing and normalisation improved performance by 19.29% on average based on balancing techniques for each ML algorithm. Experiments 1–4 concluded that SMOTE performed better than spread subsample by 46.32% on average based on balancing techniques. 
Experiment 5 concluded that feature selection can be used to remove IoT attacks that were not relevant to detection through power consumption metrics. After feature selection, including the removal of Rank and Blackhole and Rank and Sybil Attacks, attack detection increased by 29.67% based on correctly classified instances. RQ2 answer summary: preprocessing, normalisation, feature selection, and sampling techniques are critical processes that have a significant impact on overall ML performance.
(iii) RQ3. What is the most successful deployment of ML algorithms and classifiers? Haq et al. [15] reviewed 49 related studies and discussed classifier deployments including single and ensemble methodologies. SVM is identified as the most common algorithm for IDS. When considering classifier ensemble techniques, neural network and fuzzy logic combinations are most common. The results from Experiments 2, 3, and 5 concluded that MLP, followed closely by RF, was the most successful ML model for time series events, with SVM performing worst, in contrast to [15]. Table 14 presents average results from Experiments 2, 3, and 5 for RMSE, MAPE, ROC, and correctly classified instances displaying overall performance. Overall performance of ML algorithms appeared less accurate than [14] at 99.44% and in some instances [16] at 86.78% since power statistics have been included, lowering average results significantly. Including the power metric results was important since they provide an honest evaluation of the project, and the results can be built upon in future research that resolves these limitations. RQ3 answer summary: classifier ensemble voting technique, using the top two performing models MLP and RF, was the most successful deployment of ML algorithms and classifiers with a ROC of 0.97.
(iv) RQ4. Are ML algorithms more successful in detecting combined attacks against MRHOF or OF0? The authors in [7, 8] identify a gap in research regarding IDS for combination of IoT attacks using ML. 
Confusion matrices for MLP and RF were reviewed to understand what attacks were successfully detected based on network and power metrics. Reviewing network metrics, it was identified that Rank and Blackhole Attacks were detected 100% of the time with no errors. Other attacks were detected successfully based on network metrics with a ROC score of 0.99 or above. Overall performance was reduced as benign activity was often incorrectly classified as an attack with a ROC score of 0.96 and precision rate of 78.15%. As indicated during the conclusion of Experiment 4, Rank and Blackhole and Rank and Sybil Attacks were not successfully detected based on power metrics with true positive rates of 33.33% and 50.00%, respectively. Reviewing confusion matrices for MLP and RF in Experiment 6, it was clear that the ensemble techniques significantly enhanced performance beyond results captured during Experiment 4 taking power metrics into consideration. Overall performance was improved from 57.84% correctly classified instances with a ROC of 0.86 to 84.21% and 0.93, respectively, demonstrating that power metrics can be used to detect a combination of IoT attacks. Decreased Path Metric and Rank and Version Attacks were detected with true positive rates of 70.00% and 81.31%, respectively. RQ4 answer summary: the ML algorithms were better at detecting attacks against equally balanced MRHOF than OF0.
7. Conclusion, Limitations, and Recommendations
This paper aims to detect the combined IoT attacks of Rank and Version, Rank and Blackhole, Decreased Path Metric, and Rank and Sybil against two popular IoT objective functions, OF0 and MRHOF, using machine learning algorithms. This aim was established based on a comprehensive gap analysis across high-quality research papers in the field. In order to achieve this aim, and due to the lack of suitable IoT datasets, a novel dataset was developed focused on IoT network and power features as well as combined IoT attacks. Preprocessing, normalisation, feature selection, and sampling were identified as critical processes significantly impacting performance. Voting as a classifier ensemble technique, using top performing models MLP and RF, was identified as the most successful deployment of ML classifiers. Specific attacks were detected successfully based on network and power metrics; benign activity was also detected successfully and could be employed to prevent zero-day IoT attacks. The ML model was better at detecting attacks against equally balanced MRHOF than OF0. Based on our captured results, our machine learning approach was successful in detecting all combined attacks against OF0 and MRHOF based on the network and power metrics, with MLP and RF algorithms being the most successful classifier deployment for single and ensemble models.
Although our initial aims were achieved, there were limitations in research and simulated experiments that present opportunities for future researchers to consider. Areas of the project that provide opportunities in future include the continued development of an IoT dataset, ML algorithms and classifiers, sampling, feature selection, and novel MRHOF, OF0, and RPL attacks. For instance, our dataset contained four implemented combined attacks that were successfully identified using network metrics. Only two combined attacks could be detected using power metrics. It is recommended that further research is conducted to understand attacks that can be identified by power metrics, for instance, distributed denial of service (DDoS) attacks. We recommend the implementation of an IoT sensor lab in order to produce a larger IoT dataset that addresses these project limitations. Additionally, the literature review acknowledged a large range of MRHOF and OF0 attacks that could be used to meet project scope. The selected attacks were useful for detection based on network metrics but provided limited success based on power metrics. It is recommended that a wider range of MRHOF and OF0 attacks are included in future datasets with a focus on those attacks that impact power metrics.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was funded by the School of Computing at Edinburgh Napier University.