Abstract
Recognizing the users of devices (or clusters of devices) identified by IP addresses on the Internet can enable numerous security applications. In light of new security threats, fast and accurate user recognition is critical for supervisors to find the affected organizations connected to their networks. Information about many users is scattered across the multisource data of IP addresses. So far, user recognition of devices has faced two main problems. On the one hand, existing methods cannot fully exploit the multisource data of IP addresses and waste the valuable information carried by labels. On the other hand, only a tiny portion of devices can be manually tagged with known users at high confidence, so there is an urgent need to infer the unknown users of the remaining devices. The problem of user recognition of devices is therefore to infer unknown users from multisource data and from the existing devices with known users. To this end, this paper proposes a multiview fusion method that handles the multisource data of devices with a small number of manually labelled samples. The paper uses GraphSAGE to obtain an effective representation of IP addresses and designs a label encoder to make full use of the small number of devices with known users. The paper then builds a unified transformer to determine whether two devices have the same user. Real-world experiments show that the proposed method achieves 0.9158 accuracy and 0.6131 F1 in finding devices with the same users on the constructed dataset.
1. Introduction
An Internet protocol (IP) address is a unique identifier of a device or a cluster of devices on the Internet. Thus, user recognition of devices can be simplified to user recognition of IP addresses. Recognizing the users of devices enables numerous security applications. For example, when a new vulnerability is disclosed, network supervisors can quickly identify the affected users connected or related to the networks they are responsible for and warn them to reduce potential losses. Many studies [1–3] use the traffic between devices to trace the source of attacks, that is, to recognize the user of the attacking device. However, few studies address user recognition of devices across the Internet. Querying public databases, such as WHOIS of IANA [4], ASO with the querying protocol BGP [5], and rDNS (reverse domain name system) with the querying protocol DNS [6], is a common way to identify the users of large-scale devices. However, this approach has several limitations: (1) many organizations recorded in these databases are the registrants or operators of IP addresses, and most IP addresses are owned by Internet service providers or cloud service providers rather than by their actual users; (2) IP addresses can be sublet or sold, so the registered organizations may not be the actual users, which makes it difficult to learn who actually uses public devices on the Internet.
It may seem easier to find device users through public databases such as WHOIS and DNS [7]. However, these methods can only obtain the registrants of devices, who are often not their actual users. A few related methods do attempt to identify the users of devices. AIWEN [8], a commercial company, links devices with users by analyzing companies' official websites and the corresponding IP addresses. This method is referenced by commercial IoT search engines such as Zoomeye (https://www.zoomeye.org/). However, it relies only on websites rather than on the multisource data of IP addresses, which results in low accuracy and coverage. Other studies address user recognition in a different sense, such as person identification [9] in camera footage, but these differ from user recognition of devices on the Internet.
By analyzing the public attributes of devices on the Internet, we observed that information about many users is scattered across the multisource data of IP addresses, as shown in Table 1. For example, "cptdc" in the rDNS of the first user is the abbreviation of "China Petroleum Technology and Development Corporation," which is not recorded in the ASO. The corresponding ASO is "China Unicom Beijing Province Network," which is the device's operator or Internet service provider. Because manual labelling is costly, only a tiny portion of device users can be labelled by experts. Therefore, an effective way to realize user recognition of devices is to calculate the similarity between a device with an unknown user and a device with a known user and judge whether the same user uses both. However, devices are deployed in various environments, so their data are multisource and sparse, which makes recognizing their users challenging.
Knowledge graphs manage large-scale multisource data effectively, which makes them suitable for organizing the data of devices on the Internet. Graph representation learning fuses and utilizes multisource data and expresses the graph as vectors of nodes and links to facilitate downstream tasks. In the past decade, graph neural networks (GNNs) have been applied successfully to representation learning on homogeneous and heterogeneous graphs. Typical models for homogeneous graphs, such as Node2Vec [11] and Struct2Vec [12], use random walks [10] to generate node sequences and vectorize them. One classical paradigm for heterogeneous graphs is to define and use meta paths, as in Metapath2Vec [13]. However, these graph representation methods face several issues. First, methods for homogeneous graphs consider relations between only one kind of node and transform the other kinds into node features, which makes it hard to learn fine-grained node representations when the node features are very sparse. Second, most methods for heterogeneous graphs need domain knowledge to design meta paths for each type of graph. Third, most existing methods cannot learn representations thoroughly and efficiently from web-scale Internet data of devices. Finally, most of these methods do not exploit the partially known labels of specific tasks, which limits their effectiveness in recognizing users.
To improve the performance of user recognition of devices, this paper proposes MVEPL (Multiview Embedding with Partial Labels), a method based on a unified transformer that recognizes the users of devices on the Internet. To integrate the multisource data of devices, MVEPL constructs multiview graphs of devices and uses GraphSAGE to obtain node embeddings as the representations of devices. To make full use of the devices with known users, MVEPL computes a label embedding with a label encoder as an extra feature of devices. To judge precisely whether two devices are used by the same user, MVEPL trains the unified transformer on the features of devices based on the known labels. Experiments show that MVEPL achieves 0.9158 accuracy and 0.6131 F1 in finding devices with the same users on the constructed real-world dataset.
Our contributions are summarized as follows:
(1) This paper adopts a multiview method to effectively integrate the multisource data of devices, such as AS and DNS, with IP addresses as identities, and obtains their representations inductively with GraphSAGE.
(2) To make full use of the small number of devices with known users, this paper proposes a label encoder that exploits these labels early to obtain fine-grained embeddings of devices.
(3) To realize user recognition of devices on the Internet, this paper designs a unified transformer that improves the calculation of similarity between the embeddings of two devices to judge whether the same user uses them.
2. Motivation
2.1. Problem Statement
Our work is motivated by several well-known and influential cybersecurity incidents, such as the Apache Log4j vulnerability [14]. Log4j is a widely used logging library that is deployed on many websites. The Log4j vulnerability allows attackers to execute code remotely and take control of the websites. According to media reports, the vulnerability affected many companies that provide Internet services, including Apple, Amazon, IBM, Microsoft, and Twitter. After regulators and security researchers detect the affected websites, the users should be notified promptly so that they can protect themselves against potential attacks.
However, neither academia nor industry yet has adequate methods to determine the actual users of large-scale devices worldwide. Motivated by the observation that much information about users is embedded in the multisource data of devices, we identify users by fusing this scattered information.
2.2. Challenge
It is very challenging to recognize the users of devices on the Internet due to the following reasons.
The multisource data of devices on the Internet call for a method that can embed devices well. The device data we use include rDNS [15], AS [16], subnet [17], location [18], and hardware/software [19]. The coverage of each kind of data also differs and is sparse, so existing methods do not perform well.
For example, an rDNS name is a dot-separated string, such as .crawl.baidu.com. AS numbers and subnets are identifiers, such as AS8075 and 104.245.188.0/24. Hardware/software data are also texts, such as Firewall and WebContainer. In practice, the coverage of rDNS is much lower than that of AS, which makes it difficult to integrate and use the multisource data of devices.
Currently, most existing graph representation methods use label information only when constructing task-specific models. However, because of the difficulty of measuring the users of devices, very few devices have known users, which makes the task a semisupervised problem. To use label information as much as possible and improve user recognition, this paper uses it as an additional embedding of devices. However, if the labels of devices are used both as an additional embedding and for calculating the loss in testing, the resulting performance improvement is meaningless; we call this "evaluation crossing." Therefore, how to use label information early while avoiding "evaluation crossing" is also a major challenge.
3. Related Work
GNNs learn representations from homogeneous and heterogeneous graphs. In recent years, GNNs have achieved many successes on tasks with relational data, such as node classification and link prediction on the Open Academic Graph [20]. The main idea of GNNs is to learn the representations of nodes or links from the graph structure and other graph attributes using a neural network. Node2Vec [11] and Struct2Vec [12] use random walks to generate node sequences and vectorize them to obtain node representations. However, Node2Vec and Struct2Vec can only extract the structural features of the graph. GCN [21] and GAT [22] aggregate the features of neighbour nodes using the Laplacian matrix and attention scores, respectively. However, GCN and GAT are typically trained transductively, which makes it hard to vectorize nodes newly added to the graph. GraphSAGE [23] is inductive because it learns how to aggregate features to obtain node representations instead of learning the node representations directly. UniMP [24] uses the transformer [25] to extract deep abstract features of nodes together with the partial labels of nodes to predict the labels of other nodes. All the GNN methods above are designed for graphs with only one kind of node and link, namely homogeneous graphs, so they cannot model graphs with more than one kind of node or link, namely heterogeneous graphs. Metapath2Vec [13] defines meta paths and considers the types of nodes and links when generating and vectorizing node sequences by random walks. However, the performance of Metapath2Vec depends on the design of the meta paths. RGCN [26] and RGAT [27] also consider the types of nodes and links when aggregating the features of neighbour nodes. Furthermore, HAN [28] assigns weights to different link types and to neighbour nodes connected by the same link type.
However, RGCN, RGAT, and HAN cannot capture the implicit relations between different types of nodes and links, which makes it hard to obtain fine-grained representations of nodes or links. HGT [29] imitates the structure of the transformer to learn representations of nodes of different types. However, HGT does not exploit nodes with known labels to recognize nodes with unknown labels. Therefore, existing GNN methods either cannot be applied directly or do not perform well in obtaining fine-grained embeddings of devices for user recognition on the Internet.
3.1. Text Similarity Model
The text similarity model (TSM), which calculates the similarity of two input texts, can easily be adapted to judge whether the same user uses two devices. The main idea of a TSM is to obtain representations of the two input texts and learn their similarity in a supervised manner. Typical state-of-the-art TSMs in recent years include ABCNN [30], SiaGRU [31], ESIM [32], BIMPM [33], and RE2 [34]. ABCNN and SiaGRU fully learn the context information inside texts with CNN and GRU, respectively, but rarely learn the interaction information between texts. ESIM and BIMPM consider the interaction information between texts at the input stage of the models but are limited in processing long input sequences. RE2 achieves fast and accurate text matching with enhanced residual blocks, but it still depends mainly on the context information of the input texts.
Therefore, existing TSMs either cannot be applied directly or do not perform well in judging whether the same user uses two devices, that is, in inferring the user of one device from the known user of another.
4. Methodology
This section introduces MVEPL, as shown in Figure 1. First, the paper constructs multiview graphs on the base graph built from the CAIDA topology, using the separate embeddings of rDNS, AS/subnet, location, and hardware/software as node features. Second, by applying the label encoder and training GraphSAGE on these graphs, the representations of devices, including the embeddings of the partial devices with known users, are obtained inductively. Finally, a unified transformer calculates the similarity between the embeddings of two devices to judge whether the same user uses them. In addition, the paper randomly masks the labels of some devices and predicts them to ensure that no label information is leaked in advance.

4.1. Separate Embeddings
The separate embeddings layer performs a preliminary vectorization of the multisource data, using a different vectorization method for each data source. MVEPL uses four kinds of device data that may relate to users: rDNS, AS/subnet, location, and hardware/software. Partial analyses of the chosen data follow:
4.1.1. rDNS
Reverse DNS (rDNS) maps an IP address back to a domain name. The first-level label (suffix) and the second-level label (domain) of each rDNS name are extracted, counted, and sorted to build the embeddings. We then visualized the top 10 suffixes and domains for analysis, as shown in Figure 2.

Figure 2 shows that most suffixes represent countries and regions, such as de and fr, or the type of organization to which the domain name belongs, such as .net for network providers and .com for commercial enterprises. Figure 2 also shows that the domains may contain information about their users, such as Amazon and Comcast. However, some domains, such as rr, do not directly reveal the relevant users.
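The suffix and domain extraction described above can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes rDNS names are plain dot-separated strings and simply takes the last two labels, so multi-label public suffixes such as co.uk would need a public-suffix list. The function names and sample names are hypothetical.

```python
from collections import Counter


def suffix_and_domain(rdns_name):
    """Return (suffix, domain): the first-level and second-level labels.

    Naive sketch: 'a.crawl.baidu.com' -> ('com', 'baidu').
    Multi-label suffixes such as 'co.uk' are not handled.
    """
    labels = [part for part in rdns_name.lower().strip(".").split(".") if part]
    if not labels:
        return None, None
    if len(labels) == 1:
        return labels[0], None
    return labels[-1], labels[-2]


def top_suffixes_and_domains(rdns_names, k=10):
    """Count suffixes and domains over many rDNS names and keep the k most common."""
    suffixes, domains = Counter(), Counter()
    for name in rdns_names:
        suffix, domain = suffix_and_domain(name)
        if suffix:
            suffixes[suffix] += 1
        if domain:
            domains[domain] += 1
    return suffixes.most_common(k), domains.most_common(k)


# Hypothetical rDNS names for illustration only.
names = ["host1.rr.com", "host2.rr.com", "ec2-1.amazonaws.com", "dsl.example.de"]
top_s, top_d = top_suffixes_and_domains(names, k=2)
print(top_s)     # [('com', 3), ('de', 1)]
print(top_d[0])  # ('rr', 2)
```

The resulting counts could then feed a vocabulary for embedding the suffix and domain tokens; how MVEPL turns them into vectors is not specified here.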
4.1.2. AS
An autonomous system (AS) is a network, or a group of networks, under the control of one or more administrators that decides autonomously which routing protocol is used within it; examples include a university, an enterprise, or an individual company. Each AS is assigned a globally unique number, usually called the ASN, and the organization that administers an AS is called its ASO.
This paper collects about 100,000 ASNs and their corresponding ASOs. To analyze the usefulness of AS data for recognizing users, the 10 largest ASes are listed in Table 2.
The number of IP addresses covered by each AS differs and is distributed in the power of , which is relatively large for many users. Table 2 also shows that some ASOs are organizations that may themselves use the IP addresses in their AS. For example, the ASO of AS16509 is AMAZON-02, so Amazon probably uses its IP addresses. By contrast, the ASO of AS4134 is "CHINANET-BACKBONE No. 31, Jin-Rong Street, CN," which is China's public Internet backbone, so the IP addresses in AS4134 are likely used by many users. In short, AS data can help mine users but cannot determine them completely.
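Using AS and subnet data as features requires mapping each IP address to the AS that announces it. A common way to do this is a longest-prefix match against a prefix-to-origin-AS table; the sketch below shows that technique with Python's standard `ipaddress` module. The table is entirely hypothetical (illustrative prefixes and documentation ASNs from RFC 5398), and whether MVEPL derives its AS/subnet features this way is an assumption.

```python
import ipaddress

# Hypothetical prefix-to-ASN table (illustrative prefixes, documentation ASNs);
# a real table would be built from BGP routing data.
PREFIX_TO_ASN = {
    "192.0.2.0/24": "AS64496",
    "192.0.0.0/16": "AS64497",
    "198.51.100.0/24": "AS64498",
}


def lookup_asn(ip, table=PREFIX_TO_ASN):
    """Longest-prefix match: return (prefix, asn) of the most specific covering prefix."""
    addr = ipaddress.ip_address(ip)
    best_net, best_asn = None, None
    for prefix, asn in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_asn = net, asn
    if best_net is None:
        return None, None
    return str(best_net), best_asn


def subnet_24(ip):
    """The /24 subnet containing an IPv4 address, e.g. '192.0.2.55' -> '192.0.2.0/24'."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


print(lookup_asn("192.0.2.55"))   # ('192.0.2.0/24', 'AS64496')
print(lookup_asn("192.0.9.1"))    # ('192.0.0.0/16', 'AS64497')
print(subnet_24("198.51.100.7"))  # 198.51.100.0/24
```

A linear scan is enough for illustration; at the scale of about 100,000 ASNs, a radix tree or a sorted-interval index would be the usual choice.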
4.1.3. Label Encoder
Using label information early makes it easier to determine whether the users of devices are the same [35, 36]. For example, once the user of one device is known, we can determine more precisely whether another device has the same user. In this way, the problem is divided into two easier cases: judging devices with the same user when some of their users are known, and judging them when all of their users are unknown.
The paper embeds the partially observed labels into the same space as the node features: labelled devices receive a label embedding vector, and unlabelled devices receive a zero vector. Unlike UniMP, which works on homogeneous graphs, the graph here contains various types of nodes, and only devices, identified by their IP addresses, can be labelled with users. Therefore, the paper uses a randomly initialized linear projection to propagate the label embedding vectors from IP nodes to other nodes.
In this propagation, for node $v$, $\tau(v)$ represents the node type of $v$; $W_{\tau(v)}$ is the linear projection for node type $\tau(v)$, which is trained in the subsequent classification; $N(v)$ are the neighbour devices of $v$; $x_v$ is the original feature vector of $v$; and $\hat{y}_v$ is the label embedding vector of node $v$. The paper concatenates $x_v$ and $\hat{y}_v$ as the final features $h_v$. Only the devices with unknown users obtain their label embedding vectors by propagation, and the linear projections $W_{\tau(v)}$ act as attention scores over the label embeddings of different users.
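A label encoder of this kind can be sketched as follows. This is a hypothetical illustration, not the paper's exact formulation: it assumes that a labelled device uses its own user embedding, that an unlabelled node averages the projected label embeddings of its labelled neighbour devices through the type-specific matrix $W_{\tau(v)}$, and that the result is concatenated with the original features. The class name, the mean aggregation, and the initialization scale are all assumptions.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class LabelEncoderSketch:
    """Hypothetical label encoder; mean aggregation is an assumption."""

    def __init__(self, num_users, dim, node_types, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # One embedding per known user (randomly initialised; learned in training).
        self.user_emb = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_users)]
        # One randomly initialised linear projection W_tau per node type tau.
        self.proj = {
            t: [[rng.gauss(0.0, dim ** -0.5) for _ in range(dim)] for _ in range(dim)]
            for t in node_types
        }

    def own_label(self, user):
        """Label embedding of a labelled device; zero vector if unlabelled."""
        return list(self.user_emb[user]) if user is not None else [0.0] * self.dim

    def encode(self, v, node_type, features, labels, neighbours):
        """Return h_v = [x_v || y_v] for node v."""
        if labels.get(v) is not None:
            y = self.own_label(labels[v])
        else:
            # Unlabelled node: average projected labels of labelled neighbours.
            known = [u for u in neighbours.get(v, []) if labels.get(u) is not None]
            y = [0.0] * self.dim
            W = self.proj[node_type[v]]
            for u in known:
                projected = matvec(W, self.own_label(labels[u]))
                y = [acc + p / len(known) for acc, p in zip(y, projected)]
        return list(features[v]) + y


enc = LabelEncoderSketch(num_users=2, dim=3, node_types=["ip", "rdns"])
node_type = {"ip1": "ip", "ip2": "ip", "r1": "rdns"}
features = {"ip1": [1.0, 0.0], "ip2": [0.0, 1.0], "r1": [0.5, 0.5]}
labels = {"ip1": 0}  # only ip1 has a known user
neighbours = {"ip2": ["ip1"], "r1": ["ip1", "ip2"]}
h_ip2 = enc.encode("ip2", node_type, features, labels, neighbours)
print(len(h_ip2))  # 5 = 2 original features + 3 label-embedding dims
```

Nodes with no labelled neighbours keep a zero label embedding, which matches the zero-vector convention for unlabelled devices described above.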
4.2. Multiview Embeddings
Multiview embeddings use GraphSAGE to obtain representations of graphs derived from the base graph, which is built from the CAIDA topology [37], with the separate embeddings of the multisource data of IPs as node features.
4.2.1. Base Graph
MVEPL constructs the base graph from the CAIDA topology dataset provided by CAIDA (https://www.caida.org/catalog/datasets/ipv4_routed_24_topology_dataset/). This dataset is designed for studying the topology of the Internet and is collected by a globally distributed set of monitors (https://www.caida.org/projects/ark/). The base graph is therefore constructed by analyzing the CAIDA data to obtain the links between IP addresses.
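One plausible way to turn topology measurements into an IP-level base graph is to link consecutive hops of each traceroute path. The sketch below assumes the CAIDA traceroutes have already been parsed into ordered lists of hop addresses (parsing the raw data files is omitted), with `None` marking a hop that did not respond. The function name and the sample paths are hypothetical.

```python
from collections import defaultdict


def build_base_graph(traceroutes):
    """Build an undirected IP-level graph from parsed traceroute paths.

    Each path is a list of hop addresses in order; None marks a hop that did
    not respond. Consecutive responding hops are linked; a silent hop breaks
    the link because the true neighbour is unknown.
    """
    adjacency = defaultdict(set)
    for path in traceroutes:
        for a, b in zip(path, path[1:]):
            if a is None or b is None or a == b:
                continue
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


# Hypothetical parsed paths using documentation address ranges.
paths = [
    ["192.0.2.1", "192.0.2.9", None, "198.51.100.4"],
    ["192.0.2.9", "203.0.113.20"],
]
graph = build_base_graph(paths)
print(sorted(graph["192.0.2.9"]))  # ['192.0.2.1', '203.0.113.20']
print("198.51.100.4" in graph)     # False: its only preceding hop was silent
```

Dropping links across silent hops is a conservative choice; bridging them would add edges between addresses that are not actually adjacent.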
4.2.2. GraphSAGE
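Most existing graph embedding methods require all nodes in the graph to be present during training (transductive learning) and cannot naturally generalize to unseen nodes. Therefore, MVEPL uses GraphSAGE, an inductive framework that can efficiently obtain the embeddings of new nodes, to combine the multisource data of IP addresses with the label embeddings of users. For each view graph, the key equations of GraphSAGE are
$$h_{N(v)}^{k} = \mathrm{AGGREGATE}_{k}\left(\left\{h_u^{k-1}, \forall u \in N(v)\right\}\right),$$
$$h_v^{k} = \sigma\left(W^{k} \cdot \mathrm{CONCAT}\left(h_v^{k-1}, h_{N(v)}^{k}\right) + b^{k}\right),$$
where $h_v^{k}$ is the embedding of node $v$ in layer $k$, $k \in \{1, \dots, K\}$; $N(v)$ represents the neighbours of node $v$; $\sigma$ is a nonlinear activation function; and $W^{k}$ and $b^{k}$ represent the weights and bias of GraphSAGE.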
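One GraphSAGE layer from the equations above can be sketched with a mean aggregator, one of the aggregators proposed for GraphSAGE. The sketch assumes ReLU as $\sigma$ and applies the L2 normalization step from the original GraphSAGE algorithm; which aggregator and activation MVEPL uses is not stated, and the toy weights are hypothetical.

```python
import math


def sage_mean_layer(h, neighbours, W, b):
    """One GraphSAGE layer with a mean aggregator.

    h: dict node -> embedding at layer k-1 (length d_in)
    neighbours: dict node -> list of neighbour nodes N(v)
    W: d_out rows of length 2 * d_in; b: bias of length d_out
    """
    out = {}
    for v, hv in h.items():
        nbrs = neighbours.get(v, [])
        if nbrs:
            agg = [sum(h[u][i] for u in nbrs) / len(nbrs) for i in range(len(hv))]
        else:
            agg = [0.0] * len(hv)
        z = hv + agg  # CONCAT(h_v^{k-1}, h_{N(v)}^k)
        pre = [sum(w * zi for w, zi in zip(row, z)) + bi for row, bi in zip(W, b)]
        act = [max(0.0, x) for x in pre]  # sigma = ReLU (assumption)
        norm = math.sqrt(sum(x * x for x in act)) or 1.0
        out[v] = [x / norm for x in act]  # L2 normalisation
    return out


h0 = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
nbrs = {"a": ["b", "c"], "b": ["a"], "c": []}
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # toy weights
b = [0.0, 0.0]
h1 = sage_mean_layer(h0, nbrs, W, b)
print(h1["c"])  # [1.0, 0.0]
```

Because the layer is a function of a node's features and its neighbours, it can embed nodes that were not seen in training, which is the inductive property the text relies on.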
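The graph-based loss function encourages adjacent nodes to have similar representations while making the representations of distant nodes as distinguishable as possible:
$$J(z_u) = -\log \sigma\left(z_u^{\top} z_v\right) - Q \cdot \mathbb{E}_{v_n \sim P_n(v)} \log \sigma\left(-z_u^{\top} z_{v_n}\right),$$
where $z_u = h_u^{K}$ is the final embedding of node $u$; $v$ is a neighbour that appears near $u$; $\sigma$ is the sigmoid function; and $P_n$ and $Q$ are the probability distribution and the number of negative samples, respectively.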
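The loss above can be computed for a single node as sketched below. The expectation over $P_n$ is estimated with the $Q$ sampled negatives, so $Q$ times their mean reduces to a plain sum; drawing the negatives from $P_n$ is left to the caller. The function names are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sage_unsupervised_loss(z_u, z_v, z_negatives):
    """-log s(z_u.z_v) - Q * E_{v_n ~ P_n} log s(-z_u.z_vn).

    The expectation is estimated with the Q = len(z_negatives) sampled
    negatives, so Q * mean(...) reduces to a plain sum.
    """
    positive = -log_sigmoid(dot(z_u, z_v))
    negative = -sum(log_sigmoid(-dot(z_u, z_n)) for z_n in z_negatives)
    return positive + negative


z_u = [1.0, 0.0]
close = sage_unsupervised_loss(z_u, [1.0, 0.0], [[-1.0, 0.0]])
far = sage_unsupervised_loss(z_u, [-1.0, 0.0], [[1.0, 0.0]])
print(close < far)  # True: similar neighbours and dissimilar negatives give lower loss
```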
4.2.3. Unified Transformer
Some studies [38, 39] apply clustering methods on graphs to divide nodes into groups, which may be useful for user recognition of devices. In contrast, MVEPL judges whether the users of two devices are the same from the similarity of their embeddings, in order to improve the accuracy of user recognition. Following information-retrieval terminology, these two devices are called the query and the context. Transformer-based models have achieved state-of-the-art performance in feature extraction and representation in natural language processing. A transformer consists of self-attention and a feed-forward neural network. The self-attention is
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$
where $Q$, $K$, and $V$ are matrices derived from the input features $X$, and $d_k$ is their dimension. Generally, $Q$, $K$, and $V$ are the same matrix. When $K$ and $V$ are the same matrix from the query and $Q$ is the matrix from the context, the attention is called coattention.
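Scaled dot-product attention and the coattention variant described above can be sketched in plain Python. This is a simplified illustration: the learned projections that normally produce $Q$, $K$, and $V$ are omitted, and each device is assumed to be represented as a short sequence of vectors (for example, one per view), which is an assumption rather than something the text specifies.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(A, B):
    """Matrix product of A (n x m) and B (m x p), both lists of rows."""
    columns = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns] for row in A]


def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    scores = matmul(Q, [list(col) for col in zip(*K)])  # Q K^T
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)


def self_attention(X):
    # Q, K and V all come from the same device (no learned projections here).
    return attention(X, X, X)


def coattention(query, context):
    # Q from the context; K and V from the query, as described in the text.
    return attention(context, query, query)


query = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 hypothetical vectors
context = [[0.5, 0.5], [1.0, -1.0]]           # 2 hypothetical vectors
print(len(coattention(query, context)), len(coattention(query, context)[0]))  # 2 2
```

The coattention output has one row per context vector, each a weighted mix of the query vectors, which is how interaction information between the two devices enters the representation.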
Based on the attention defined above, the transformer and cotransformer are built with multiple heads. To better extract the interaction information between the query and the context, MVEPL builds a cotransformer to integrate their embeddings. The unified transformer then produces the transformed embeddings of both devices from $e_q$, the embedding of the query device, and $e_c$, the embedding of the context device.
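The multihead construction can be sketched by splitting the feature dimension into equal slices, attending within each slice, and concatenating the results. This is a simplification for illustration: a real multihead layer learns separate projections for each head, whereas here each head sees an identity slice of the features. The exact structure of MVEPL's unified transformer is not reproduced; the function names are hypothetical.

```python
import math


def attention(Q, K, V):
    """Scaled dot-product attention on lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        total = sum(w)
        w = [x / total for x in w]
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out


def split_heads(X, num_heads):
    """Split every row of X into num_heads equal slices, one list per head."""
    d = len(X[0])
    if d % num_heads:
        raise ValueError("feature size must be divisible by num_heads")
    size = d // num_heads
    return [[row[h * size:(h + 1) * size] for row in X] for h in range(num_heads)]


def multi_head_attention(Q, K, V, num_heads):
    """Attend separately in each head, then concatenate the head outputs."""
    heads = [
        attention(q, k, v)
        for q, k, v in zip(
            split_heads(Q, num_heads), split_heads(K, num_heads), split_heads(V, num_heads)
        )
    ]
    return [sum((head[i] for head in heads), []) for i in range(len(Q))]


X = [[1.0, 0.0, 0.5, -0.5], [0.0, 1.0, -1.0, 2.0], [1.0, 1.0, 0.0, 0.0]]
out = multi_head_attention(X, X, X, num_heads=2)  # transformer-style (self-attention)
print(len(out), len(out[0]))                      # 3 4
```

For the cotransformer, the same function would be called as `multi_head_attention(context, query, query, num_heads)`, so that queries come from the context and keys and values from the query.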
After obtaining the final representations of devices from the unified transformer, MVEPL constructs a Siamese model that is trained and tested on pairs of devices. Following the Siamese model in [31], the loss function is the contrastive loss, which effectively handles the relationship between paired devices:
$$L = \frac{1}{2N}\sum_{n=1}^{N}\left[y_n d_n^{2} + \left(1 - y_n\right)\max\left(\mathrm{margin} - d_n, 0\right)^{2}\right],$$
where $d_n = \lVert a_n - b_n \rVert_2$ is the Euclidean distance between the features $a_n$ and $b_n$ of the two devices in the $n$-th of $N$ pairs, $y_n$ is the label indicating whether the two devices have the same user, and margin is the preset threshold.
MVEPL trains the Siamese model by minimizing the contrastive loss on the final embeddings of the query device and the context device to determine whether they belong to the same user.
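The contrastive loss above can be computed directly from pairs of final embeddings, as in the sketch below. Same-user pairs are penalized by their squared distance, and different-user pairs are penalized only while they are closer than the margin. The margin value and the function names are illustrative assumptions.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(pairs, labels, margin=1.0):
    """Mean contrastive loss over N pairs, with the 1/(2N) factor.

    pairs: list of (query_embedding, context_embedding)
    labels: 1 if the two devices have the same user, else 0
    """
    total = 0.0
    for (a, b), y in zip(pairs, labels):
        d = euclidean(a, b)
        # Same user: pull together (d^2). Different users: push apart until d >= margin.
        total += y * d ** 2 + (1 - y) * max(margin - d, 0.0) ** 2
    return total / (2 * len(pairs))


print(contrastive_loss([([0.0, 0.0], [3.0, 4.0])], [1]))  # 12.5: same user but far apart
print(contrastive_loss([([0.0, 0.0], [3.0, 4.0])], [0]))  # 0.0: different users, beyond margin
```

At inference, a pair can be classified as "same user" when its distance falls below a chosen threshold; that threshold is a tuning choice not given in the text.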
However, the earlier label information is used, the more likely "evaluation crossing" becomes: information from the test dataset leaks into the training dataset and produces a meaningless improvement in the models' performance. Therefore, like UniMP, the paper randomly masks the labels of some devices and predicts them, ensuring that no label information is leaked in advance. The objective function for classification is then
$$\hat{\theta} = \arg\max_{\theta} \log p_{\theta}\left(\hat{Y} \mid X, \tilde{Y}, A\right) = \arg\max_{\theta} \sum_{i \in \hat{V}} \log p_{\theta}\left(\hat{y}_i \mid X, \tilde{Y}, A\right),$$
where $\tilde{Y}$ are the labels used as input in training and $\hat{Y}$ are the labels to be predicted; $\hat{V}$ are the nodes whose labels are predicted (the nodes used in testing); $X$ are the embedding vectors of nodes; $A$ is the graph structure; and $\theta$ are the parameters of MVEPL, including those of the label encoder.
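The masking strategy can be sketched as a per-epoch split of the training labels into encoder inputs ($\tilde{Y}$) and prediction targets ($\hat{Y}$), in the spirit of UniMP's masked label prediction. This is a minimal sketch under the assumption that a fresh random mask is drawn each epoch; only training labels are passed in, so test labels never reach the label encoder. The function name and sample data are hypothetical.

```python
import random


def mask_labels(train_labels, mask_ratio, rng):
    """Split training labels into encoder inputs (Y~) and prediction targets (Y^).

    Only training labels are passed in; test labels never reach the encoder,
    which is what prevents "evaluation crossing".
    """
    if not 0.0 < mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in (0, 1)")
    nodes = sorted(train_labels)
    n_mask = max(1, round(mask_ratio * len(nodes)))
    masked = set(rng.sample(nodes, n_mask))
    inputs = {v: y for v, y in train_labels.items() if v not in masked}
    targets = {v: y for v, y in train_labels.items() if v in masked}
    return inputs, targets


train = {f"ip{i}": i % 3 for i in range(10)}  # hypothetical device -> user ids
rng = random.Random(0)
for epoch in range(3):
    inputs, targets = mask_labels(train, mask_ratio=0.3, rng=rng)
    # The label encoder would see `inputs`; the loss would be computed on `targets`.
    print(epoch, len(inputs), len(targets))  # 7 inputs, 3 targets each epoch
```

The `mask_ratio` parameter corresponds to the masked label ratio varied from 10% to 90% in the experiments.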
5. Experiment
5.1. Dataset
This paper evaluates MVEPL in a real-world environment in a city in China. The user labels of IP addresses were derived in an early stage from public data of the devices, such as SSL certificates, protocol banners, and existing manual labels. Table 3 describes the dataset. The target is to judge whether two devices have the same user.
The paper then constructs the dataset for the subsequent experiments from the original dataset described in Table 3. Each sample is a pair of devices, labelled 1 if the two devices have the same user and 0 otherwise. According to the numbers of users and labelled IPs in Table 3, each user has about 10 IPs. For convenience, the paper therefore randomly selects 10,000 positive and 100,000 negative samples for the subsequent experiments. All experiments are evaluated by the accuracy (ACC) over all classes and the precision (P), recall (R), and F1-score (F1) of the positive class.
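Pair construction and the evaluation metrics can be sketched as follows. The sampling procedure is a hypothetical illustration (the text says only that pairs are selected randomly). The metric code shows why the positive-class F1 matters here: with the 1:10 ratio above, predicting "different users" for every pair already gives an accuracy of 100,000/110,000 ≈ 0.909 with an F1 of 0.

```python
import itertools
import random
from collections import defaultdict


def sample_pairs(ip_to_user, n_pos, n_neg, rng):
    """Sample labelled device pairs: 1 = same user, 0 = different users.

    Assumes enough distinct negative pairs exist (otherwise the loop never ends).
    """
    by_user = defaultdict(list)
    for ip in sorted(ip_to_user):
        by_user[ip_to_user[ip]].append(ip)
    positives = [p for ips in by_user.values() for p in itertools.combinations(ips, 2)]
    pos = rng.sample(positives, min(n_pos, len(positives)))
    ips = sorted(ip_to_user)
    neg = set()
    while len(neg) < n_neg:
        a, b = rng.sample(ips, 2)
        if ip_to_user[a] != ip_to_user[b]:
            neg.add(tuple(sorted((a, b))))
    return [(a, b, 1) for a, b in pos] + [(a, b, 0) for a, b in sorted(neg)]


def binary_metrics(y_true, y_pred):
    """Accuracy over all classes; precision, recall and F1 of the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"ACC": acc, "P": precision, "R": recall, "F1": f1}


# With the 1:10 ratio, always answering "different users" scores high accuracy but zero F1.
y_true = [1] * 10_000 + [0] * 100_000
print(binary_metrics(y_true, [0] * len(y_true)))  # ACC ~0.909, F1 0.0
```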
5.2. Settings
The paper conducts all ablations and comparisons with PyTorch 1.10.0 in the environment listed in Table 4.
The main preset parameters of MVEPL are listed in Table 5.
5.3. Performance
5.3.1. Ablation Studies
The core components of MVEPL for user recognition of devices on the Internet are the label encoder, the multiview embedding, and the unified transformer. To show their effects, the paper conducts ablation studies (see Table 6), in which the three best models in terms of accuracy and F1 are highlighted. The paper also reports the time cost of each model, measured in seconds per training epoch, using the parameters in Table 5 and the dataset in Table 3. The conclusions are as follows:
(1) Models with the label encoder perform much better than those without it: every model achieves its best accuracy and F1 when the label encoder is used. This shows that the early use of label information described in this paper improves the performance of user recognition models while avoiding "evaluation crossing."
(2) Fusing the multisource features of devices through multiple views performs best on the dataset. The unified transformer based on multiview embedding outperforms those using single-view or concatenated features. For example, without the label encoder, the unified transformer based on multiview embedding achieves 0.8543 accuracy and 0.4653 F1, whereas the best single-view model, the unified transformer based on AS/subnet, achieves 0.8387 accuracy and 0.4135 F1. The unified transformer based on concatenated features, namely concat in Table 7, achieves only 0.6123 accuracy and 0.2007 F1.
(3) In Table 6, models based on the cotransformer perform better than those based on the transformer, whether or not the label encoder or multiview embedding is used. This indicates that the interaction information between the input embeddings of two devices plays an essential role in calculating their similarity for user recognition. The unified transformer combines the deep features extracted by the transformer with the interaction information captured by the cotransformer and achieves the best performance, about 0.9158 accuracy and 0.6131 F1.
(4) Table 6 also shows that models on multiview graphs are much slower than those on single-view graphs, because computing node representations on multiview graphs requires more calculation. The training time per epoch of the model using all views (total) is about 50 s, much higher than that of the rDNS, AS/subnet, location, and hardware/software models. Likewise, the unified transformer generally performs best but needs more training time than the transformer or cotransformer: for example, without the label encoder it reaches 0.8543 accuracy but needs more than 20 s longer per epoch than the cotransformer and transformer. Note that the label encoder runs before the node representations of the graphs are computed, so its time cost is not included in Table 6.
The paper also analyzes how the ratio of labels masked in training affects the node representations for user recognition on the dataset. In this experiment, 10%–90% of the labelled data in the training set are masked. The overall performance is shown in Figure 3.

As Figure 3 shows, the more labels are masked in training, the worse MVEPL performs, eventually falling below the models that do not use label embedding. A possible reason is that when fewer labels are used as input in each epoch, less information can be obtained from the labels. In addition, training may fail when the inputs contain too many missing values.

(a)

(b)

(c)
Notably, the performance with a masked label ratio of 10% is worse than with 20%. The labels obtained in the early stage of user recognition from SSL certificates, protocol banners, and existing manual labels may contain a small number of errors, and whether such erroneous labels are masked can easily affect the overall quality of the node representations.
5.3.2. Comparisons with GNNs
To verify the ability of MVEPL to vectorize the multisource data of IP addresses for user recognition of devices, the paper introduces mainstream GNNs to obtain node representations on homogeneous and heterogeneous graphs. For the homogeneous graph, the paper takes IP addresses as nodes and the concatenated separate embeddings as node features. For the heterogeneous graph, the paper takes IP-IP, IP-rDNS, IP-Hardware, IP-Location, and IP-ASN as links and the corresponding separate embeddings as node features. Both graphs are built on the base graph. The paper then feeds the node representations from the GNNs into the unified transformer to evaluate how well each method supports user recognition of devices. The results of the comparisons with GNNs are shown in Table 7.
The results show that the proposed MVEPL substantially and consistently outperforms all baselines on user recognition in terms of all metrics.
Compared with the model using concatenated node features, MVEPL gains about 0.3 in accuracy and 0.4 in F1. This suggests that concatenating node features is a coarse form of feature extraction that ignores the graph structure, so it can hardly achieve satisfactory user recognition.
The best model on the homogeneous graph of devices is UniMP, with about 0.8952 accuracy and 0.5584 F1, and the best model on the heterogeneous graph of devices is HGT, with about 0.8527 accuracy and 0.4839 F1. This suggests that fusing multisource data through multiple views is more fine-grained than encoding it as node features of a homogeneous graph and, unlike heterogeneous-graph methods, does not require designing meta paths. MVEPL therefore outperforms the existing state-of-the-art models for user recognition.
In addition, because MVEPL uses GraphSAGE, it is inductive and easier to apply to web-scale data.
5.3.3. Comparisons with TSMs
To further verify MVEPL for user recognition of devices, the paper introduces mainstream TSMs to calculate the similarity of two devices and judge whether the same user uses them. All methods share the output of the multiview embedding as their input features. The results of the comparisons are shown in Table 8.
As shown in Table 8, MVEPL performs better than the TSMs. A possible reason is that the TSMs are built for text matching and rely on sequence models such as LSTM, whereas there is little or no sequential correlation between the input features of IP addresses.
5.3.4. Visualization
To verify the ability of MVEPL to represent the multisource data of IP addresses for user recognition of devices, the paper takes the output of the unified transformer as the final representation provided by MVEPL. The paper then selects the IP addresses of the first 10, middle 10, and last 10 users, ordered by average address number, as the input of K-means, and visualizes the clustering results with t-SNE in Figure 4.
We found that the representations provided by MVEPL make it easier to cluster IP addresses by their users. This shows that MVEPL obtains satisfactory representations of IP addresses in a "user space," in which devices with the same user form clusters. Thus, the device representations from MVEPL lay a foundation for judging whether two devices have the same user and for inferring the unknown users of devices on the Internet.
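The clustering step can be sketched with Lloyd's algorithm for K-means, alternating nearest-centroid assignment and centroid updates until assignments stop changing. The two-dimensional points below are hypothetical stand-ins for MVEPL embeddings. The t-SNE projection used for display is not reproduced; a library implementation such as scikit-learn's `sklearn.manifold.TSNE` would typically be used for that.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(iters):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical 2-D embeddings of IPs from two users.
emb = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, _ = kmeans(emb, k=2)
print(labels[:3], labels[3:])
```

If the embeddings place each user's devices close together, as the visualization suggests, the recovered clusters align with users.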
6. Conclusion
As Internet-connected devices become more widespread in industry and daily life, operating and protecting them effectively will become a top priority for countries and enterprises. Based on multiview graphs of devices with IP addresses as identities, this paper proposes MVEPL, which obtains fine-grained embeddings of devices with GraphSAGE and a label encoder and judges whether two devices have the same user with a unified transformer. MVEPL not only uses the multisource data of devices quickly and effectively but also exploits a small number of devices with manually labelled users to achieve good user recognition performance. Real-world experiments show that MVEPL obtains better representations of devices and achieves high user recognition performance: 0.9158 accuracy and 0.6131 F1 in finding devices with the same users on the constructed dataset.
User recognition of devices is a meaningful research topic that benefits network security. In the future, we will apply MVEPL to common IoT scenarios to enable more applications. On the one hand, user recognition of devices could support many network security applications, such as intrusion detection and locating IP addresses. On the other hand, the performance of MVEPL with more multisource data needs to be evaluated over a larger area than a single city to verify its effectiveness for user recognition of devices on the Internet.
Data Availability
The data and materials of this study are available from the corresponding author or first author upon reasonable request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
The paper is supported by the National Key R&D Program of China (2020YFB2103803) and the National Natural Science Foundation of China (No. 61931019, No. U1766215).