Abstract
To address complex event pattern detection in big data and to advance research on key Internet of Things (IoT) technologies and computer event matching algorithms, this paper studies the problem using a Hadoop-based clustering algorithm. First, each event is weighted according to the subtype attributes of its event type, and the maximum weight is selected as the final attribute value of the event. Second, cluster analysis is performed on an IoT flow dataset using the relationships between complex events. Finally, simulation tests are carried out on a simulated dataset of complex event relationships, together with a comparative simulation test of clustering algorithms. The experimental results show that the clustering accuracy on the different datasets is above 85% and that the clustering accuracy for causality reaches 96.07% when the dataset is 5G. The proposed algorithm is therefore feasible, stable, fast, and effective for complex event processing. This study has practical significance for solving the problem of complex event patterns in big data.
1. Introduction
With the rapid development of the Internet, the concept of the Internet of Things (IoT) was proposed. The IoT refers to the real-time collection of information about any object or process that needs to be monitored, connected, or interacted with, through devices and technologies such as information sensors, radio frequency identification, global positioning systems, infrared sensors, and laser scanners. These devices collect the required information, such as sound, light, heat, electricity, mechanics, chemistry, biology, and location, and, through various possible forms of network access, realize ubiquitous connection between objects and between objects and people, enabling intelligent perception, identification, and management of items and processes. With its vigorous development, the IoT quickly became widely known and is regarded as a major opportunity for development and change in the information field. It has been widely used in security, facility management, health care, supply, production, public utilities, and transportation. In fact, the IoT is not a new network or technology; it is an extension of the Internet. Middleware is a common service layer between platforms (hardware and operating systems) and applications. For different operating systems and hardware platforms, it can have multiple implementations that conform to the same interface and protocol specifications. IoT middleware is a core technology of the IoT. However, existing middleware is typically built for a specific IoT application or service type and does not cover most IoT application scenarios within a unified theoretical framework [1]. Research on key technologies of IoT middleware is mainly restricted by two factors.
On the one hand, research on IoT middleware mainly focuses on perception and interconnection at the bottom layer, and it still falls well short of the demanding requirements of the IoT, including high flexibility, reusability, and reliability. Middleware is a key software component in IoT applications and a bridge connecting hardware devices and business applications. Its main functions are to shield heterogeneity, realize interoperability, and preprocess information. On the other hand, while research still concentrates on realization and interoperability at the bottom layer, the complexity and application range of IoT technology determine its development direction. The IoT is still in its infancy. Many problems remain in supporting large-scale IoT applications, such as heterogeneous physical devices, complex event processing, quality of service control, security, and access control, and these key technologies are also the major challenges faced by current IoT applications.
Cruz et al. [2] systematically reviewed the relevant literature and discussed the differences between the current Internet and IoT-based systems. Their study discussed in depth the challenges and future prospects of IoT middleware and concluded that middleware plays a vital role in IoT solutions. The architecture they had previously proposed can serve as a reference model for IoT middleware, and they emphasized the difficulty of defining and implementing universal standards. Farahzadi et al. [3] introduced the main middleware solutions in the field of IoT and compared them across different architectural design choices. Several middleware systems were introduced and studied in terms of architecture, service areas, and applications. In the IoT, the information collected by one information collection device may be supplied to multiple application systems, and data must also be shared and exchanged between different application systems. However, because of heterogeneity, the data produced by different application systems depend on the computing environment, which makes it difficult or impossible to port software between platforms. Moreover, because network protocols and communication mechanisms differ, these systems cannot be effectively integrated with one another. Middleware makes it possible to establish a common platform that realizes interoperability between application systems and application platforms. Overall, the basic system of the IoT is still in its infancy. Experts and scholars in different disciplines hold different basic views on IoT research, so there is no uniform definition of the concepts related to the IoT.
Radio Frequency Identification (RFID) technology has brought convenience to researchers because it can penetrate many kinds of obstruction and read information remotely and automatically [4]. However, RFID also produces large volumes of complex data that may be redundant, missing, or erroneous. To make full use of these massive data, researchers proposed data cleaning and filtering [5–7], complex event processing, and middleware technology, which soon became hotspots and core issues of Internet research.
Sensor sampling data management in the IoT faces many challenges, including the dynamic flow characteristics of the data, its heterogeneity, and its spatiotemporal correlations. There are still no good solutions to these challenges, so this study examines them specifically. Clustering algorithms can help. Given many objects with many attributes, a clustering algorithm uses a similarity function designed over some of those attributes to measure the similarity between objects and can thereby effectively obtain useful event pattern relations [8, 9]. This study mainly investigates how to cluster effective event relationships in the big data generated by the IoT by selecting appropriate attribute features, using a Hadoop clustering algorithm on a cloud computing platform; in other words, how to effectively recover the natural structure hidden in the data.
2. Literature Review
At present, a large body of literature has studied the key technologies of the IoT, cloud computing, and middleware. For example, Ferreira and de Sousa Junior [10] argued that a move to the IoT was necessary and proposed a unified and transparent security approach for IoT middleware. The proposed architecture was deployable and included protection measures based on existing Internet security technologies, support for the special security needs of the IoT, and help in implementing a more efficient intelligent environment. Its main concerns were data protection, privacy, and consumer law. Although the IoT can improve system performance through cloud services, exchanging a large number of control or data packets may harm the system and reduce its efficiency. In some cases, data exchange between the IoT and the cloud is not reasonable. Therefore, Yoo and Kim [11] proposed an intelligent gateway that processes and analyses requests and decides whether to handle them locally or send them to the cloud.
Research on IoT middleware and its key technologies is comparatively scarce. Kulathunga et al. [12] proposed a middleware framework whose focus was to reduce the learning curve that developers face when developing classification-based Internet applications. It supports developers through a developer-friendly Application Programming Interface (API), so that a classification-based website with core functionality can be built effectively. Kertiou et al. [13] used the advantages of dynamic skyline operators in the field of multicriteria decision-making to reduce the search space, improve the efficiency of context awareness, and select the best sensor according to user requirements. Chen et al. [14] proposed a new power supply strategy for RFID tags by integrating wearable nanogenerators based on a hybrid friction-electric and electromagnetic mechanism. The nanogenerator can effectively convert biomechanical energy into electrical energy and provide sustainable power for RFID tags. Xiao et al. [15] proposed a new parallelization model for distributed complex event processing systems and three parallel processing strategies. The model accounts for the influence of time constraints (such as sliding windows) so that downstream machines can share the overlapping processing load between batch windows caused by each input event, avoiding events that may lead to wrong decisions. The proposed parallel strategies can ensure that the complex event processing system works stably and continuously over time. In recent years, with the continuous development of RFID technology, a wide variety of RFID-based systems have emerged, and researchers have achieved remarkable results on RFID tag filtering methods [16], RFID protocols [17], RFID reader redundancy algorithms [18], and RFID data compression algorithms [19].
Although RFID technology has gradually matured, many problems remain, including inconsistent parameters, complex use, high cost, signals that are vulnerable to interference, complex frequency control, and poor security performance, all of which need further research.
In the application, event matching is the process of logically evaluating and forwarding two kinds of information: event publishing information, which describes the characteristics and state of a virtual object, and ordering (subscription) information, which describes the object's range of interest. Because the ordering and publishing information of objects in the virtual environment is relatively complex (including not only an object's position and its own characteristics but also its premonition area, interest area, and so on), it is usually composed of multiple predicates joined by a logical AND. Matching between pieces of information is therefore essentially a matching operation between predicates.
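The predicate view of matching described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the `Predicate` class, the range-style predicates, and the attribute names are hypothetical and are not taken from the paper; "subscription" is used here for what the text calls ordering information.

```python
from dataclasses import dataclass
from typing import Dict, List

# A published event is modelled as a mapping from attribute name to value (assumption).
Event = Dict[str, float]


@dataclass
class Predicate:
    """Hypothetical range predicate: low <= event[attribute] <= high."""
    attribute: str
    low: float
    high: float

    def holds(self, event: Event) -> bool:
        value = event.get(self.attribute)
        return value is not None and self.low <= value <= self.high


def matches(subscription: List[Predicate], event: Event) -> bool:
    """A subscription matches when all of its predicates hold (logical AND)."""
    return all(p.holds(event) for p in subscription)


def forward(subscriptions: Dict[str, List[Predicate]], event: Event) -> List[str]:
    """Return the names of the subscribers the event should be forwarded to."""
    return [name for name, preds in subscriptions.items() if matches(preds, event)]
```

Because the predicates are joined by AND, a single failing or missing attribute is enough to reject a match, which is why the cost of matching is dominated by predicate evaluation.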
However, in the era of big data, traditional RFID and Complex Event Processing (CEP) methods and technologies can no longer meet the demand. Big data is very large in volume, very diverse in structure, low in value density, and must be processed at very high speed, so better methods and technologies are required. The main problem to be solved is therefore how to quickly and effectively obtain the required event pattern relationships in a big data environment. Clustering analysis is one of the most commonly used methods in research directions such as pattern recognition and data mining. It is applied in image processing, pattern recognition, and data analysis, as well as in other fields such as biology, which shows how important clustering analysis is. The focus of this study is therefore on a complex event clustering algorithm.
3. Method
3.1. Architecture of the IoT
The IoT architecture generally comprises three layers: the perception layer, the network layer, and the application layer. The perception layer mainly includes RFID, sensors, and two-dimensional codes, and it identifies objects and collects data through sensors [20]. The architecture of the IoT is shown in Figure 1.

3.2. Key Technologies in the IoT System
3.2.1. RFID Technology
RFID stands for radio frequency identification. It is a technology that communicates using the reflected energy of electromagnetic waves. Because RFID has been used to identify objects since its inception, it is usually regarded simply as an automatic identification technology, a view that is ambiguous and limited. RFID uses radio frequency signals to identify targets automatically without contact, enabling remote data access and storage. Its components include RFID tags, RFID readers, and application software systems, and it relies on radio frequency signals for communication between tags and readers [21]. The RFID reader identifies and reads the data contained in the RFID tag, and an antenna carries the RF signal between the reader and the tag. RFID tags contain microelectronic chips and coupling components and can be divided into active and passive tags according to whether they have an internal power supply. A tag is placed on the surface of an object to identify it. At present, most RFID tags contain a unique electronic identity. In the IoT, RFID is commonly used to identify materials: a reader obtains the electronic identity through radio frequency signals and transmits it to the application software for processing. RFID technology has developed rapidly and offers strong information access, tolerance of poor working environments, remote reading and writing, large data storage, and fast reading. It is widely used in the retail industry, asset security management, commodity control, production lines, the IoT, and other fields. RFID technology and sensor technology are the core technologies of the IoT, and their combination will greatly promote the development of IoT technology [22, 23].
3.2.2. CEP Technology
CEP (Complex Event Processing) is a data stream-oriented, real-time data analysis technology based on in-memory computing. It is mainly used for high-speed recognition of complex data patterns and is widely used for quantitative trading and risk control in foreign capital markets. CEP is an integrated technology framework consisting of event filtering, an event architecture model, event relationship discovery, event abstraction, and pattern detection. It also includes event-driven processing engines and methods and techniques such as event aggregation and transmission [24, 25]. The main function of complex event processing is to process a large number of events and extract useful information. By continuously monitoring the message flow, CEP records a series of single events, identifies complex events from these records, and finally responds to and processes them. Complex event detection has four main steps. First, the original events are extracted from massive data. Second, an event summary is generated based on a series of association rules. Third, temporal, hierarchical, causal, and semantic relationships are extracted and processed for the complex events or original events. Finally, the response is sent to the event subscription server.
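As a rough illustration of the pattern detection step, the following Python sketch detects one simple temporal pattern: an event of one type followed by an event of another type for the same tag within a time window. The tuple layout, the event type names, and the function `detect_sequences` are assumptions made for this example; they do not describe the CEP engine used in this study, and the sketch assumes the stream arrives in timestamp order.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

# Hypothetical primitive event: (timestamp, event_type, tag_id).
PrimitiveEvent = Tuple[float, str, str]


def detect_sequences(stream: Iterable[PrimitiveEvent], first: str, second: str,
                     window: float) -> List[Tuple[str, float, float]]:
    """Return (tag_id, t_first, t_second) for each `first` event that is followed
    by a `second` event with the same tag within `window` time units."""
    pending: Deque[PrimitiveEvent] = deque()   # unmatched `first` events
    complex_events: List[Tuple[str, float, float]] = []
    for ts, etype, tag in stream:
        # Expire `first` events that fell out of the sliding window.
        while pending and ts - pending[0][0] > window:
            pending.popleft()
        if etype == first:
            pending.append((ts, etype, tag))
        elif etype == second:
            # Pair with the oldest pending `first` event of the same tag.
            for candidate in list(pending):
                if candidate[2] == tag:
                    complex_events.append((tag, candidate[0], ts))
                    pending.remove(candidate)
                    break
    return complex_events
```

For instance, with a window of 5 time units, a `read` of tag `t1` at time 0 followed by a `ship` of `t1` at time 2 yields one complex event, whereas a `ship` that arrives 9 time units after its `read` is ignored.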
3.3. Relationship Structure Model of Complex Events
3.3.1. Types of Complex Events
A complex event is obtained by using the associations between event attributes to match a defined event sequence from the event stream according to a pattern. The EPCIS (Electronic Product Code Information Services) specification defines five types of events related to the IoT: a common event and four corresponding subevents, namely the object event, aggregation event, quantity event, and transaction event. The events that occur in the supply chain activities of many industries can all be represented by these five types. The characteristics of the five events are shown in Table 1:
3.3.2. Attribute Values and Complex Relationships of Events
An event can belong to more than one event type. For example, the same event may belong to two or three event types, depending on its event type attribute. The corresponding subevent types are weighted, and the maximum weight is then selected as the final attribute value of the event [26]. The complex relationships between events are shown in Table 2.
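The maximum-weight rule can be written in a few lines of Python. This is a minimal sketch: the function name and the example weights are hypothetical, since the weights actually used in the study are not given in the text.

```python
from typing import Dict, Tuple


def event_attribute_value(subtype_weights: Dict[str, float]) -> Tuple[str, float]:
    """Pick the subevent type with the largest weight; that weight becomes
    the final attribute value of the event."""
    if not subtype_weights:
        raise ValueError("an event needs at least one weighted subevent type")
    subtype = max(subtype_weights, key=subtype_weights.get)
    return subtype, subtype_weights[subtype]


# Hypothetical weights for an event that belongs to three subevent types.
example = {"object": 0.3, "aggregation": 0.5, "transaction": 0.9}
```

With the hypothetical weights above, the event would be treated as a transaction event with attribute value 0.9.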
3.4. Complex Event Clustering Algorithm
The complex event clustering algorithm is based on the K-means algorithm, the most representative partition-based clustering method. In the datasets used in cluster analysis, the densities of the classes are often different, sometimes very different. Most existing clustering algorithms focus on how to discover classes of arbitrary shapes and sizes but struggle to handle datasets with widely varying densities efficiently. The similarity measure of the K-means algorithm is the Euclidean distance, and the optimal classification corresponding to the initial clustering center vector V is the one that minimizes the evaluation index J. The clustering criterion function of the algorithm is the sum of squared errors:
$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert x_n - \mu_k \rVert^2$$
where $N$ is the number of samples, $K$ is the number of categories into which the samples are divided, $x_n$ is the $n$th sample point, and $r_{nk}$ denotes whether the $n$th sample point belongs to class $k$: $r_{nk} = 1$ if it belongs to class $k$ and $r_{nk} = 0$ otherwise; $\mu_k$ denotes the center of class $k$.
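To make the criterion concrete, the sketch below evaluates the sum-of-squared-errors index J for a given assignment. The function name `sse` and the use of a label list (instead of an explicit 0/1 membership matrix) are choices made for this example only.

```python
from typing import List, Sequence


def sse(points: Sequence[Sequence[float]], centers: Sequence[Sequence[float]],
        labels: List[int]) -> float:
    """Sum of squared Euclidean distances between each point and the center
    of the cluster it is assigned to; labels[n] is the cluster index of point n."""
    total = 0.0
    for x, k in zip(points, labels):
        total += sum((a - b) ** 2 for a, b in zip(x, centers[k]))
    return total
```

For the points (0, 0), (2, 0), and (10, 0) with centers (1, 0) and (10, 0) and labels [0, 0, 1], the contributions are 1, 1, and 0, so J equals 2.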
The basic idea of the K-means algorithm is to take K points in the space as cluster centers and assign each object to its nearest center. The values of the cluster centers are then updated iteratively until the clustering converges [27]. The algorithm flowchart is shown in Figure 2.
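The iteration described above can be sketched in plain Python as follows. This is a single-machine illustration of the standard assign-and-update loop, not the Hadoop implementation used in the experiments; the function name `kmeans`, the caller-supplied initial centers, and the `max_iter` cap are assumptions of the example.

```python
from math import dist  # Euclidean distance, available from Python 3.8
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], centers: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Alternate between assigning points to the nearest center and moving each
    center to the mean of its members, until the centers stop changing."""
    centers = [tuple(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
                  for p in points]
        # Update step: mean of the members; an empty cluster keeps its old center.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

Starting from different initial centers can lead to different results, since the loop only reaches a local optimum.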

The K-means algorithm has many advantages: few parameters need to be tuned, the clustering effect is good, the algorithm is easy to interpret and implement, and it converges quickly. It also has shortcomings: a suitable value of K is hard to choose, it is difficult to converge on datasets that are not convex, the result of the iterative method is only a local optimum, and it is sensitive to noise and outliers.
3.5. Quantization and Normalization of Complex Event Attribute Values
This experiment uses an RFID reader to identify and record items in the warehouse, forming an event flow that constitutes the actual dataset. In this experiment, 20 EPC (Electronic Product Code) read points are selected based on a certain data rate to form a vertical data flow. The event attributes are quantized as follows:
(a) Commodity category: in the 96-bit binary EPC coding standard SGTIN-96, the 24–4 bit field represents the value range of commodities; after conversion from binary to decimal, the commodity category takes values 1–255.
(b) Event type (binary): 0.9 if the event is a trading event and 0.1 if it is a nontrading event.
(c) Location: based on the distance from the warehouse, the attribute values of the 19 selected data collection points are quantified as 0–19.
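The normalized mapping equation is as follows:
$$x'_{ij} = \frac{x_{ij} - x_j^{\min}}{x_j^{\max} - x_j^{\min}}$$
where $x_{ij}$ is the value of attribute $j$ for element $i$, $x_j^{\max}$ denotes the maximum value of attribute $j$ over all elements, and $x_j^{\min}$ denotes the minimum value of attribute $j$ over all elements.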
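The quantization rules and the min-max normalization can be combined in a short Python sketch. The function names are hypothetical, and representing the commodity category as an 8-bit binary string is an assumption suggested by the 1–255 range rather than something stated in the text.

```python
from typing import List, Sequence


def quantize_event(category_bits: str, is_trading: bool, location_index: int) -> List[float]:
    """Turn one event into a numeric vector: commodity category (binary to decimal),
    event type (0.9 for trading, 0.1 otherwise), and location index."""
    return [float(int(category_bits, 2)), 0.9 if is_trading else 0.1, float(location_index)]


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale every attribute (column) to [0, 1] using its minimum and maximum.
    A constant column is mapped to 0.0 to avoid division by zero (assumption)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

Normalizing before clustering keeps the commodity category, whose raw values reach 255, from dominating the Euclidean distance over the event type and location attributes.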
The experimental environment parameters, as shown in Figure 3, are used to cluster the datasets of different sizes.

4. Results and Discussion
To verify the correctness and effectiveness of the method, simulation tests are conducted in the experimental environment shown in Figure 3 on the simulation dataset shown in Figure 4, together with comparative simulation tests of the clustering algorithm.

Because causality is itself a complex and uncertain problem, causality is determined only among a group of events that belong to the same subfield of the problem. Once the k value has been selected, the dataset is grouped by co-clustering, and each group is then clustered according to the distance between event types and attribute values, so that the causal relationships between different clusters can be obtained from the clustering results. The time consumption of the different pattern types for the 0.5G, 5G, and 10G datasets is shown in Figures 5–7, respectively. The accuracy on the different datasets (0.5G, 5G, 10G) is shown in Figure 8.




Figures 5–7 show that when the dataset is 0.5G, the collaborative relationship has the lowest time consumption, 19.224. When the dataset is 5G, the peer relationship has the lowest time consumption, 298.227. When the dataset is 10G, the collaborative relationship again takes the least time. Figure 8 shows that the clustering accuracy on the different datasets is above 85% and that the clustering accuracy for causality reaches 96.07% when the dataset is 5G. The proposed algorithm can therefore quickly and effectively cluster the event pattern relationships hidden in different datasets. The algorithm also shows good stability, which supports its feasibility and its effectiveness for complex event processing.
5. Conclusion
This study examines the key technologies of IoT middleware and computer event matching algorithms. Using a Hadoop clustering algorithm on a cloud computing platform, it investigates how to select appropriate attribute features in the big data generated by the IoT in order to cluster effective event relationships; in other words, how to effectively extract the natural structure hidden in the data. The study uses the Canopy algorithm to predict the cluster center points and the k value, which greatly improves the stability and quality of the clustering results. By analysing the clustering results, the association model that people need for complex events can be found in a large amount of data. The experimental results show that the model has theoretical significance and practical value. However, the work still has shortcomings, which stem from the limits of the hardware environment. Testing showed that if the dataset is too large, the experimental environment crashes, the time required for clustering becomes uncontrollable, and the expected experimental results cannot be obtained. The experiment therefore needs further improvement. Cloud computing represents a new tipping point in the value of network computing, offering greater efficiency, better scalability, and an easier application delivery model. Cloud computing virtualizes not only hardware resources but also services, data, and software delivery modes through service platforms. The combination of IoT middleware and cloud computing can solve not only the problem of filtering, integrating, and storing massive information in the IoT but also the problem of interoperability between different application systems in the IoT.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare no conflicts of interest.