Abstract
Machine learning and data analytics are two of the most popular subdisciplines of modern computer science, with applications in industries ranging from hospitals to hotels, manufacturing to pharmaceuticals, and mining to banking. Mining and hospitals are two of the most critical of these industries: because of the heavy involvement of people and machines, security, accuracy, and cost effectiveness are major concerns whenever applications are deployed. This paper focuses on the problem of locating people and machines in the event of an accident during the mining process. The primary aim of the research is to ensure that the predicted position is close to the actual position, which is how the performance of the trained model is assessed. The solution is implemented in two steps. First, MLAELD (Machine Learning Architecture for Excavators’ Location Detection) is proposed, in which Bluetooth Low Energy (BLE) beacons are used to track the live locations of excavators, after first collecting a mapping of the signal strengths received from multiple beacons at each specific point in a closed area. Second, machine learning techniques are used to develop and train multioutput regression models based on linear regression, K-nearest neighbor regression, decision tree regression, and random forest regression. These models can predict the live locations of the required persons and machines with a high level of precision from the most recently received beacon signal strengths.
1. Introduction
The mining industry plays a vital role in the economic growth of a country because of its close link with energy resources. Consequently, the adoption of modern technology and its applications in this field has grown considerably. Machine learning has a broad scope in almost all fields. It can be used to explore new data and make predictions, which allows corporations to develop effective business plans based on the forecasts of ML algorithms. The primary goal of this research is to ensure that the predicted position is close to the actual position, which is how the performance of the trained model is assessed. One of the remarkable achievements of this century is the deployment of geolocation services, which make it possible to navigate and to track and locate a person or an object. However, these services have limitations; e.g., GPS has an error radius of about 10 meters and loses signal strength at height or deep underground.
This error becomes even more significant indoors, so GPS cannot be used to track people or objects in areas without GPS visibility, such as indoor or underground areas. Where satellite signals are available, however, GPS delivers highly accurate surveying and mapping data, and GPS-based data collection is faster than traditional surveying and mapping procedures, requiring less equipment and fewer personnel.
In addition to geolocation services like maps, navigation, and tracking applications, there is also a demand for indoor tracking and navigation systems. This includes navigation inside a large building, detecting the presence and movement of employees inside office spaces, tracking the movement of people inside stores to place various products strategically, and crowd control in hospitals and healthcare institutions. Therefore, there is an urgent need for indoor positioning systems, and this need is especially acute in workplaces like mines, where the risk of a serious accident is always present. These excavation sites contain numerous hazardous zones. Because of the environmental conditions, ensuring the safety and security of the excavators is a problem for the staff overseeing the system. Even mild negligence in security and safety can result in worker fatalities and damage to costly equipment.
In this research work, a combination of Bluetooth technology and machine learning has been used to predict the locations of excavators (people and machines deployed on the mining site) accurately. The limitation of GPS in underground areas has been overcome by using Bluetooth Low Energy (BLE) beacons to exchange signals at the working site. BLE operates with low power consumption and is designed to transmit a limited volume of data, which suits positioning applications. It plays an indispensable role in supporting IoT (Internet of Things) applications for wireless communication. One of the major limitations of Bluetooth Low Energy is that it cannot be used for the higher data rates provided by other wireless and cellular technologies. BLE beacons can be installed at the positions where people or machines are expected to be at work, and the strength of the signal received from each beacon can be measured using RSSI (Received Signal Strength Indicator). RSSI is a measurement of the relative level of power that an RF client receives from a transmitter such as an access point (AP) or a router. The signal strength is lower when the distance between the AP and the receiver is greater, and the rate of wireless data transfer falls as the distance increases. The RSSI therefore indicates how well a remote client can hear a specific AP.
In this case, the beacons are placed inside a large indoor hall and act as reference points. The received signal strength from all these beacons is recorded at specific locations inside the hall, and a database called the Beacons Database (BDB) is created from these signal records. In the next step, popular machine learning algorithms, namely linear regression, K-nearest neighbor regression, decision tree regression, and random forest regression, are run on the BDB to predict the locations of excavators. For this purpose, the BDB is divided into train and test datasets; the train dataset is used to train the models, and the test dataset is used to check the accuracy of the predicted outcome. The performance of these models is compared based on the Root Mean Square Error (RMSE) and R-Squared (R2) values. Subsequently, the model with the lowest error is chosen as the most suitable model.
2. Related Works
Mineral exploration is a very important undertaking for a country and is highly rewarding when it produces the expected outcome. However, it is so sensitive that a single point of failure can halt the process temporarily and sometimes end the project, resulting in a huge loss of human and machine assets. Hence, companies try to provide the best resources to ensure the successful completion of a mining project. The mining process has improved with advances in technology; e.g., efforts to deploy wireless communication technologies date from the early 1970s, when the first VHF radio systems [1–5] were deployed; thereafter, UHF, WLAN, and RFID have been used. Applications based on UHF, WLAN, and RFID boost productivity and mining efficiency by providing better automation capabilities for machines, clear communication among the deployed workforce, and an easily accessible management information system [6, 7]. The increasing demands on the mining industry result in greater involvement of costly machines and a large workforce. Therefore, the need for reliable and accurate monitoring devices for underground lines [8], overhead lines, and WAMS [9] has increased tremendously.
Communication in underground mining can be done via three mechanisms, i.e., TTE (through the earth), TTW (through the wire), and TTA (through the air) transmission [10]. Because of the limitations of the first two methods, the third method, i.e., TTA, is the most popular one, in which ultrahigh frequency and super high frequency are used for wireless communication. In the evolution of wireless technologies, one of the most popular technologies, known for low-power consumption, reliability, security, and ease of operation, is ZigBee [11].
Considerable research has been presented on location detection in indoor environments using beacons, ZigBee, and other technologies, with a focus on the progress of wireless data transmission in underground mines [12]. ZigBee is a wireless technology that was created as an open worldwide standard to meet the special requirements of low-cost, low-power wireless IoT networks. That work assesses applications and technologies; digital, systematic, and metric-dependent propagation modeling strategies; and wireless system designs that take into account the immediate physical environment, antenna positioning, and radiation patterns. Furthermore, a magnetic induction (MI) based transmission technique [13] has been introduced to resolve the issues caused by the conditions of the soil environment. This study demonstrated the feasibility of MI-WUSNs (MI-based Wireless Underground Sensor Networks) and of wireless communication systems, including voice and data transmission, for underground mines [14], and also addressed the development of wired, semi-cellular, and wireless networking services.
Additionally, the digital communication protocols for MI-WUSNs were proposed [15]; the effects of data communication parameters such as symbol rate and modulation schemes have been evaluated for oil reservoirs. Suitable ranges for propagation linking nodes for specific water, crude oil, and soil formulations were explored.
The Wireless Network Sensor System was reported to be a suitable optimization technique. Later, a new communication channel model was introduced [16]; it characterizes the transmission of terahertz EM waves in the dynamic surroundings of underground mines [17] and supports staff placement strategies in hazardous locations. It examines an economical and continuous monitoring strategy for the safety of excavators, which would help in the efficient and precise positioning of people and machines. In some other models, a quantitative approach for mines [18] was used to calculate data and machinery (node) availability, utilizing a Self-Encryption Program (SES) that encrypts data before it is submitted to the cloud. As a continuation [19], a smart helmet capable of detecting dangerous situations during the mining process was designed. A miner removing the helmet was flagged as being in a hazardous situation. Air temperature, heart rate, and the level of toxic gases (e.g., carbon monoxide, hydrogen sulfide, and methane) are the factors that are often used to classify the health situation of workers.
In some other cases [20], integrating on-channel signal booster strategies with the “daisy chain” repeater system was developed by utilizing wide-band linear amplifiers and selective filters to broaden the signals transmitted from base stations into subways across the ground. This technique satisfies the criteria for delivering radio communications that are multichannel, not just for subway stations, but also for paramedical, fire, police, and paging services etc., which are done at a much lower cost. In [21], different radio frequency communication strategies are used in underground mines through medium wave frequency (MF), very high frequency (VHF), and ultrahigh frequency (UHF) for electromagnetic transmission. Here, induction methods were also implemented to satisfy various types of mining conditions in both the laboratory and coal mines located underground. Another hybrid multimode model [22] for wireless communication in underground coal mines was proposed and evaluated for important parameters such as the size of the mine tunnel, operating frequency, and position of the transmitter/receiver. In this era of IoT, the development of Smart SAGES by utilizing the potential of IoT technologies was proposed [23]. As a result, a reliable and robust communication system would be set up for SAGES. This system ensures the confidentiality and durability of the SAGES data transmitted to the cloud, and details can be retrieved efficiently using a mobile application.
3. MLAELD (Machine Learning Architecture for Excavators’ Location Detection)
The problem of finding out the location of excavators by analyzing the database of RSSI values received from BLE beacons can be explained with the help of MLAELD as shown in Figure 1.

The MLAELD can be demonstrated by the following key terms.
3.1. Bluetooth Beacons
Bluetooth beacons can be used to determine the precise locations of mobile devices using specific applications. A beacon transmits a Bluetooth Low Energy (BLE) signal that compatible devices can detect over a line-of-sight (LOS) distance of about 50 meters. The signal is brief and does not change significantly, which allows beacons to be very small and battery-powered. Bluetooth uses radio technology to carry the beacon broadcast, and the hardware is relatively inexpensive to mass-produce. More specifically, Bluetooth Low Energy (BLE) beacons operate on low power by transmitting a signal that compatible applications can receive and detect. Effectively, it is a one-way broadcast in which beacons transmit the signal and applications receive it. In other words, an application can detect the beacon and use the signal to determine the location of a mobile device (here, the excavators). Some beacon technologies used in India are given in Figure 2.

3.2. Bluetooth-Based Localization
BLE beacon generators, typically referred to as beacons, are compact, affordable, battery-operated wireless transmitters that support several protocol modes. A beacon transmits its identifier to nearby electronic devices, such as smartphones or single-board computers, that can detect BLE signals. Beacons can send data packets to the receiver at a regular interval of between 20 milliseconds and 10 seconds. The Bluetooth Special Interest Group (SIG) [25] adopted BLE as a Bluetooth subsystem that supports device discovery with low power consumption, and it is engineered for applications that do not require large volumes of data to be exchanged. The main difference between Bluetooth Low Energy (BLE) and classic Bluetooth is BLE’s low power consumption, which means that devices can run for years on a small battery. BLE is used in applications where small amounts of data are exchanged periodically; e.g., BLE 4.0 has a physical-layer data rate of 1 Mbps and a reach of about 60 meters. These beacons are prevalent among IoT devices because of their affordability and low power demands, which make them one of the most promising technologies for localized location tracking while limiting interference with Wi-Fi devices that share the same 2.4 GHz band.
3.3. Triangulation
The geometrical triangulation approach is the most widely employed positioning technique. Unlike the trilateration [26] approach, which relies on calculated distances, the geometrical triangulation process uses more than three sensors to perform the positioning operation, which is achieved by measuring the strength of the transmitted signal or the signal’s propagation time. The approach is quick, simple, and easy to implement as a positioning algorithm. It works well in the absence of interference and barriers. However, in an indoor environment the signal is reflected from the walls, floor, roof, and other obstacles in the room, so the triangulation method does not yield good results.
3.4. Android Support for BLE
Android is one of the most widely used operating systems, so it has been selected to test the full solution, and Android apps with BLE support are available on the Google Play Store. BLE support is also available on other operating systems, e.g., through libraries distributed via CocoaPods for iOS. In this research, an Android smartphone running a BLE app is used to take RSSI readings from the different beacons at different locations.
3.5. Received Signal Strength Indicator (RSSI)
RSSI measures the intensity of a received radio signal, and a higher value indicates a stronger signal. In Bluetooth, RSSI is used to determine whether the transmitted signal is within the Golden Receiver Power Range (GRPR), which describes the optimal range of incoming signal strengths. RSSI is expressed in dB relative to the GRPR: it is positive when the signal strength is above the GRPR, negative when it is below, and zero when it is within the range.
3.6. Data Collection and Preprocessing
In this research work, the training data is collected in an indoor hall as given in Figure 1. The carpet area of the hall is divided into squares of 1 square foot each. Thereafter, 13 BLE beacons are placed at different locations inside the hall to send data packets to the receiver. The Bluetooth strength from these 13 beacons is measured at a few locations inside the hall to create a dataset. The position of the receiver and the RSSI for all the 13 beacons get recorded.
First, the data is transformed into a format compatible with the machine learning algorithms used to predict the location. Second, the location of each record is split into two columns representing its x and y coordinates, which are then used as the targets to train and develop regression models. Regression models predict a dependent variable from independent variables and are frequently used to determine the relationship between variables and to make forecasts. It is observed that the beacon signal strength ranges from −40 to −200. A value of −40 indicates the strongest possible signal, and −200 indicates the weakest possible signal.
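The preprocessing step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout and the "x-y" location label format are assumptions made for the example, since the raw format of the dataset is not specified here, and the function name split_record is hypothetical.

```python
from typing import List, Tuple

def split_record(record: List[str]) -> Tuple[List[float], Tuple[float, float]]:
    """Split one raw record into RSSI features and an (x, y) target.

    Assumption (hypothetical format): record[0] is a location label such
    as "3-7", and the remaining fields are RSSI readings, one per beacon.
    """
    x_str, y_str = record[0].split("-")
    features = [float(v) for v in record[1:]]
    return features, (float(x_str), float(y_str))

# Two made-up records with three beacons each (the study uses 13).
raw = [
    ["3-7", "-70", "-200", "-65"],
    ["4-7", "-72", "-81", "-200"],
]

X, Y = [], []
for rec in raw:
    feats, target = split_record(rec)
    X.append(feats)   # independent variables: RSSI per beacon
    Y.append(target)  # dependent variables: x and y coordinates
```

Each row of X then holds the signal strengths, and the matching entry of Y holds the two coordinates that a multioutput regression model is trained to predict.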
In Figure 3, a correlation heatmap is given for the values of beacons B1 to B13. A correlation map shows how closely the values of the different features are related. For example, the correlation coefficient between b3001 (i.e., B1) and b3008 (i.e., B8) is 0.33, meaning that the signal strengths of the two beacons tend to rise and fall together as the receiver moves. This suggests that both beacons lie in the same direction from the receiver. On the other hand, the correlation coefficient between b3004 (i.e., B4) and b3002 (i.e., B2) is −0.41, meaning that the signal of one beacon tends to weaken as the other strengthens. This suggests that the receiver lies between the two beacons.
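Each cell of such a heatmap is a Pearson correlation coefficient between two beacon columns. A minimal sketch of that calculation is given below; the RSSI readings are made up for illustration and are not taken from the BDB.

```python
import math
from typing import Sequence

def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long columns."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical readings from two beacons along a walk across the hall:
# one signal weakens while the other strengthens.
beacon_a = [-60.0, -65.0, -70.0, -75.0]
beacon_b = [-80.0, -76.0, -71.0, -66.0]

r = pearson(beacon_a, beacon_b)  # strongly negative for these values
```

A coefficient near +1 means the two signals strengthen and weaken together, while a coefficient near −1 means one strengthens as the other weakens.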

3.7. Model Training
The next step in predicting the excavator’s specific location is model training, in which supervised machine learning [27] techniques are used. Supervised learning is the process of learning a function that maps inputs to outputs from example input/output pairs. The function is inferred from training data that is labeled, i.e., has an assigned target variable. In supervised machine learning, every data point is a pair consisting of an input value and a corresponding output value. After analyzing the training data points, the learning algorithm produces an estimated function, which can then be used to predict the outputs for new inputs. In an optimal situation, the algorithm can determine the dependent variable or the class labels of data points to which it has never been exposed. For this, the algorithm must generalize from the training data to unseen cases. A wide range of supervised learning techniques can be used, and an algorithm that works well in one situation might not work as well in another. In this paper, different supervised learning algorithms are used, and their performance and precision are compared.
3.7.1. Multioutput Regression Techniques
Regression analysis is a type of predictive modeling method that predicts a value from a set of predictors. It explores the relationship between a dependent (target) variable and one or more independent (predictor) variables. Traditional machine learning predominantly uses just one output/target variable. In multioutput regression, the outputs depend not only upon the inputs but often also upon one another, so the task may require a model that predicts all outputs together or predicts each output conditional on the others. Some regression algorithms can solve multioutput regression problems directly, such as linear regression, K-nearest neighbor regression, and decision tree regression.
3.7.2. Linear Regression
Linear regression is a linear approach used to model the relationship between one or more independent variables and a dependent variable (a scalar response). The case with a single explanatory variable is called simple linear regression. Regression models predict a dependent variable from independent variables and are often used to determine the relationship between variables and to make forecasts. Specific regression models differ in the form of relationship that is assumed between the dependent and independent variables and in the number of independent variables used. For regression models, R-Squared (R2) and Root Mean Squared Error (RMSE) are the two accuracy metrics used here to measure how well a model performs compared to other models.
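As a minimal sketch of the simple linear regression case, the closed-form least-squares fit can be written as follows. This is a deliberate simplification and an assumption of the example: it fits each coordinate from the RSSI of a single hypothetical beacon, whereas the models in this work use the readings of all 13 beacons. The function name and data are illustrative only.

```python
from typing import Sequence, Tuple

def fit_simple_linear(u: Sequence[float], v: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for v ~ slope * u + intercept."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    slope = (sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
             / sum((a - mean_u) ** 2 for a in u))
    intercept = mean_v - slope * mean_u
    return slope, intercept

# Made-up data: RSSI of one beacon and the receiver's coordinates.
rssi = [-50.0, -60.0, -70.0, -80.0]
x_coord = [1.0, 2.0, 3.0, 4.0]
y_coord = [2.0, 4.0, 6.0, 8.0]

# Multioutput handled naively here: one independent fit per coordinate.
slope_x, intercept_x = fit_simple_linear(rssi, x_coord)
slope_y, intercept_y = fit_simple_linear(rssi, y_coord)

new_rssi = -65.0
predicted = (slope_x * new_rssi + intercept_x,
             slope_y * new_rssi + intercept_y)
```

Because the made-up data is exactly linear, the fitted lines pass through every training point; real RSSI data is noisy, so the fit would only approximate it.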
R-Squared measures the proportion of the variation in a dependent variable that the model explains. In simple linear regression, it equals the square of the correlation coefficient (R). For a least-squares model evaluated on its own training data, R-Squared lies between 0 and 1, and a higher value implies a closer match between the predicted values and the actual ones; on held-out data it can be negative when the model predicts worse than the mean of the actual values. It is a fair, relative measure of how well the model fits the dependent variable. However, it does not take issues such as overfitting into consideration.
Mean Square Error (MSE) measures how closely the model’s predictions match the actual values. It is computed as the mean of the squared prediction errors. Root Mean Square Error (RMSE) is the square root of MSE. It is used more often than MSE for two reasons: firstly, MSE values may become too large for simple comparisons; secondly, because MSE is computed from squared errors, taking the square root brings the value back to the same units as the prediction error, making it easier to interpret.
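The two metrics can be sketched as follows for a single coordinate. The actual and predicted values are made up for illustration; in a multioutput setting, the same functions would be applied to the x and y coordinates separately or to their combined errors.

```python
import math
from typing import Sequence

def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root of the mean of the squared prediction errors."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)

def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """1 minus the ratio of residual to total sum of squares."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Made-up x coordinates: three exact predictions and one that is off by 2.
actual_x = [1.0, 2.0, 3.0, 4.0]
predicted_x = [1.0, 2.0, 3.0, 6.0]

error = rmse(actual_x, predicted_x)      # squared errors 0, 0, 0, 4 -> MSE 1
fit = r_squared(actual_x, predicted_x)   # 1 - 4/5
```

Here the RMSE is expressed in the same units as the coordinates, which is what makes it easier to interpret than the MSE.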
3.7.3. K-Nearest Neighbors [27]
K-NN assumes that a new case is similar to the available cases and assigns it to the category it most closely resembles. The algorithm stores all the available data and classifies a new data point according to its similarity to the stored points, so new data can be assigned to an appropriate group as soon as it arrives. When K-NN is used for regression, as in this work, the prediction is typically the average of the target values of the K nearest stored points. It is a nonparametric algorithm that makes no assumptions about the underlying data. In K-NN, the neighbors of a given point are selected using a distance measure. Several measures can be used to calculate the distance between the given point and its neighbors, including the Euclidean, Manhattan, and Hamming distances. The Euclidean distance is used by many machine learning algorithms, including K-Means, to assess the similarity of data. In this paper, the Euclidean distance has been used.
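A minimal sketch of K-NN used for multioutput regression with the Euclidean distance is shown below. The training points, the value of k, and the uniform averaging of neighbor targets are assumptions of the example, not settings reported in this work.

```python
import math
from typing import List, Sequence, Tuple

def knn_predict(train_X: List[Sequence[float]],
                train_Y: List[Tuple[float, float]],
                query: Sequence[float],
                k: int = 3) -> Tuple[float, float]:
    """Average the (x, y) targets of the k training rows nearest to query."""
    # Rank training rows by Euclidean distance to the query RSSI vector.
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    x = sum(train_Y[i][0] for i in nearest) / len(nearest)
    y = sum(train_Y[i][1] for i in nearest) / len(nearest)
    return (x, y)

# Made-up RSSI vectors from two beacons and the positions where they
# were recorded.
train_X = [[-60.0, -90.0], [-62.0, -88.0], [-90.0, -60.0], [-88.0, -62.0]]
train_Y = [(0.0, 0.0), (1.0, 0.0), (9.0, 9.0), (10.0, 9.0)]

# The query is closest to the first two rows, so their targets are averaged.
position = knn_predict(train_X, train_Y, [-61.0, -89.0], k=2)
```

Both coordinates are predicted at once from the same set of neighbors, which is why K-NN handles multioutput regression directly.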
3.7.4. Decision Tree
A decision tree [27] (Figure 4) breaks a dataset down into smaller and smaller subsets while incrementally building the associated tree. The result is a tree with decision nodes and leaf nodes. A decision node has two or more branches, each representing values of the evaluated attribute. A leaf node represents a decision on the numerical target. The topmost decision node of the tree, which corresponds to the best predictor, is called the root node.
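The splitting process described above can be sketched as a small multioutput regression tree. This is an illustrative sketch under stated assumptions, not the implementation used in this work: splits are binary, chosen to minimize the summed squared error of the (x, y) targets, and the depth limit and data are made up.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def mean_point(targets: List[Point]) -> Point:
    n = len(targets)
    return (sum(t[0] for t in targets) / n, sum(t[1] for t in targets) / n)

def sse(targets: List[Point]) -> float:
    """Summed squared error of the targets around their mean point."""
    mx, my = mean_point(targets)
    return sum((t[0] - mx) ** 2 + (t[1] - my) ** 2 for t in targets)

def build_tree(X, Y, depth=0, max_depth=3):
    """Return a leaf (x, y) or a decision node (feature, threshold, left, right)."""
    if depth >= max_depth or len(Y) < 2 or sse(Y) == 0.0:
        return mean_point(Y)  # leaf node: mean of the remaining targets
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # candidate threshold between two values
            left = [i for i, row in enumerate(X) if row[f] <= thr]
            right = [i for i, row in enumerate(X) if row[f] > thr]
            cost = sse([Y[i] for i in left]) + sse([Y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, thr, left, right)
    if best is None:  # no feature has two distinct values
        return mean_point(Y)
    _, f, thr, left, right = best
    return (f, thr,
            build_tree([X[i] for i in left], [Y[i] for i in left],
                       depth + 1, max_depth),
            build_tree([X[i] for i in right], [Y[i] for i in right],
                       depth + 1, max_depth))

def predict(node, row: Sequence[float]) -> Point:
    while len(node) == 4:  # decision nodes have 4 fields, leaves have 2
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node

X = [[-60.0, -90.0], [-62.0, -88.0], [-90.0, -60.0], [-88.0, -62.0]]
Y = [(0.0, 0.0), (1.0, 0.0), (9.0, 9.0), (10.0, 9.0)]
tree = build_tree(X, Y)
```

The first split made by build_tree is the root node; on this small made-up dataset the tree grows until every leaf holds a single training point.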

3.7.5. Random Forest
A random forest [27] (Figure 5) is an ensemble method that can perform both regression and classification tasks using several decision trees and a strategy called bootstrap aggregation, widely known as bagging. The underlying principle is to combine several decision trees to determine the final output rather than depending on an individual decision tree. Random forest has several decision trees as its base learning models. The bootstrap method performs random row sampling and feature sampling on the dataset to generate a sample dataset for each model. Each decision tree on its own has a large variance, but when the trees are combined, the resulting variance is small. Since each decision tree is trained on its own sample, the output depends not on one decision tree but on several. For regression, the final result is the mean of the outputs of all the trees; this combining step is the aggregation.
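The bootstrap and aggregation steps can be sketched as follows. To keep the example short, the base learner is a one-split regression stump rather than a full decision tree, so this illustrates bagging with row and feature sampling and is not a complete random forest. The function names, the number of models, and the data are assumptions of the example.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]

def mean_point(targets: List[Point]) -> Point:
    n = len(targets)
    return (sum(t[0] for t in targets) / n, sum(t[1] for t in targets) / n)

def sse(targets: List[Point]) -> float:
    mx, my = mean_point(targets)
    return sum((t[0] - mx) ** 2 + (t[1] - my) ** 2 for t in targets)

def fit_stump(X, Y, features):
    """Fit the best single split over the given features; return a predictor."""
    best = None
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for row, y in zip(X, Y) if row[f] <= thr]
            right = [y for row, y in zip(X, Y) if row[f] > thr]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, thr, mean_point(left), mean_point(right))
    if best is None:  # sampled rows are identical on the sampled features
        constant = mean_point(Y)
        return lambda row: constant
    _, f, thr, left_mean, right_mean = best
    return lambda row: left_mean if row[f] <= thr else right_mean

def fit_bagged(X, Y, n_models=25, n_features=1, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        rows = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap rows
        feats = rng.sample(range(len(X[0])), n_features)       # feature sample
        models.append(fit_stump([X[i] for i in rows],
                                [Y[i] for i in rows], feats))
    return models

def predict_bagged(models, row) -> Point:
    # Aggregation: the final output is the mean of all model outputs.
    return mean_point([m(row) for m in models])

X = [[-60.0, -90.0], [-62.0, -88.0], [-90.0, -60.0], [-88.0, -62.0]]
Y = [(0.0, 0.0), (1.0, 0.0), (9.0, 9.0), (10.0, 9.0)]
models = fit_bagged(X, Y)
estimate = predict_bagged(models, [-61.0, -89.0])
```

Each stump sees a different resampled dataset and feature, so individual predictions vary, and averaging them is what reduces the variance of the final estimate.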

4. Result Analysis
In this research, thirteen Qualcomm CSR102x BLE modules are used as Bluetooth beacons. The data collected at different points are used to train multioutput machine learning algorithms to develop models that can make precise predictions about the location of the Bluetooth signal receiver based on signal strength. The test results differ marginally from the training results because of the random fluctuation of RSSI in indoor wireless networks, which degrades the efficiency of the position estimator. Android has offered built-in BLE support since version 4.3 (API level 18), with further BLE features added in version 5.0 (API level 21).
The MLAELD has been used for implementing the machine learning algorithms (linear regression, K-NN, decision tree, and random forest), and the performances of all these algorithms have been compared using R2 and RMSE values. This step is divided into two different categories as follows.
4.1. Comparison of Actual and Predicted Location of Beacons
The main focus of the research work is to ensure that the predicted location is close to the actual location, so that the efficiency of the trained model can be measured.
The Beacons Database (BDB) [28] provides the x, y coordinates of each location along with the signal strengths recorded there. The models have been trained on this BDB by dividing it into train and test datasets in an 80:20 ratio. The training dataset is the portion of the data used to fit a machine learning model. The test dataset is a held-out portion used to evaluate the model after it has been trained on the training dataset. The results for actual and predicted locations are presented below for each selected machine learning algorithm.
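An 80:20 split of this kind can be sketched as follows. The shuffling, the fixed seed, and the function name split_bdb are assumptions made for the example; the exact splitting procedure used in this work is not stated, and the data below is made up.

```python
import random
from typing import List, Tuple

def split_bdb(X: List, Y: List, test_fraction: float = 0.2,
              seed: int = 42) -> Tuple[List, List, List, List]:
    """Shuffle row indices and hold out test_fraction of rows for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [Y[i] for i in train_idx], [Y[i] for i in test_idx])

# Ten made-up rows: one RSSI feature and an (x, y) target per row.
X = [[float(-40 - i)] for i in range(10)]
Y = [(float(i), 0.0) for i in range(10)]

X_train, X_test, Y_train, Y_test = split_bdb(X, Y)
```

Shuffling before splitting keeps the test rows from all coming from one corner of the hall, and the fixed seed makes the split repeatable.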
4.1.1. Actual and Predicted Values by Linear Regression
The actual and predicted values for the x and y coordinates are depicted in Figure 6. The blue dots represent the actual values, and the red dots represent the predicted values. Dots that lie far from each other indicate a difference between the actual and predicted values, whereas overlapping dots show that the actual and predicted values are very close to each other. In Figure 6, the values of the x coordinates are scattered, showing that the errors are high compared with the other methods.

[Figure 6: actual and predicted values by linear regression, panels (a) and (b).]
4.1.2. Actual and Predicted Values by K-NN
The actual and predicted values of x and y coordinates by the K-nearest neighbor algorithms are shown in Figure 7.

[Figure 7: actual and predicted values by K-NN, panels (a) and (b).]
In this case, the predicted values are not concentrated in one place as they are in Figure 6. Here, the values are spread out, and most of the predicted values overlap the actual values. Therefore, the accuracy is higher in this case than with linear regression.
4.1.3. Actual and Predicted Values by Decision Tree
The actual and predicted values of the x and y coordinates by decision tree are given in Figure 8. In this case, the results are better than those of linear regression and very similar to those of K-NN.

[Figure 8: actual and predicted values by decision tree, panels (a) and (b).]
4.1.4. Actual and Predicted Values by Random Forest
The actual and predicted values of the x and y coordinates by random forest are given in Figure 9. In this model, the results are better than those of the other three methods. The additional white space in the figure indicates that most of the predicted values overlap the actual values, which reflects a higher prediction accuracy.

[Figure 9: actual and predicted values by random forest, panels (a) and (b).]
4.2. Performance Comparison Using R2 and RMSE Values
In this section, the numerical difference between the actual and predicted values is discussed using the R2 and RMSE values for the training and testing data. When the main goal of the model is prediction, the RMSE is the most important criterion because it indicates how accurately the model predicts the response; a lower RMSE indicates a better fit.
R2, in contrast, indicates how closely the predicted values follow the actual values. A higher value of R2, i.e., a value closer to +1, indicates that the model fits the data well.
Table 1 compares the RMSE and R2 values of the four algorithms on the training dataset. The decision tree regression and the random forest regression clearly provide the best performance among the four models.
Table 2 compares the RMSE and R2 values of the four algorithms on the test dataset. The random forest regression clearly provides the best performance among the four models.
5. Conclusion
In this work, a machine learning-based model has been designed to predict the location based on the RSSI values handled by a Bluetooth receiver. The model can be used for precisely locating trapped excavators or machines underground. Apart from that, it can be used at another place where similar requirements are found; e.g., it can be used in supermarkets where it can assist customers in locating shops and track the movement of customers, which can help in product placement. This study is limited to detecting the location of a receiver using a fixed number of BLE beacons inside an indoor hall. In the future, a similar study can be conducted inside a long tunnel with hundreds of beacons to detect precise locations by further leveraging a Wi-Fi setup in the area. This can help in developing a model which can be used to identify the location using Bluetooth, not just in a small indoor area but in a much larger space. Location detection and tracking using Bluetooth can also be used in monitoring the movement and flow of a crowd in a busy street or inside a busy supermarket.
As this technology can be used to track and detect a person’s location in real time, it can be used in crowded fairs and shops to trace lost people or children with ease. It can also be used to detect the last known location of a person stuck in a building during a fire or an earthquake, which can help fire fighters or disaster response teams track and rescue the person swiftly. Using indoor positioning in museums can be the best way to reduce expenditure on hiring staff and guides. It can assist tourists in navigating through the museum and in exploring various artifacts.
Data Availability
The data shall be made available on request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.