Abstract
To address the difficulty of locating useful content in information-overloaded web pages, an automatic extraction method for web page text information based on network topology coincidence degree is proposed. Search engines, web crawlers, and hypertext tags are used to classify web page text information, which is then reduced in dimensionality. After this processing, the similarity between different features of the web page text information is calculated and ranked, and similar text information is extracted according to correlation based on segment estimation. The experimental results show that the designed method simplifies the complexity of the associated information in the data set and improves both the volume of collected data and the success rate of information collection.
1. Introduction
After decades of transformation and development [1, 2], computer technology and information technology have brought earth-shaking changes to human society, moving human beings from the industrial age into the information age and involving people in a wave of information extraction, collection, storage, and analysis. In particular, the Internet, as the carrier of information media, has become a clear symbol of this era [3]. With the rapid growth of the Internet, the web has developed into a huge information service network containing a wide variety of information resources and sites all over the world [4, 5]. Network topology refers to the layout of devices interconnected by transmission media: the members of a network are arranged in a specific physical (real) or logical (virtual) pattern, and two networks with the same connection structure have the same network topology. However, the severe overload of text information on web pages has brought great difficulties to information extraction [6]. Because this information is unstructured and disordered, people generally can only use full-text search to find what they need, so a web page containing the required information is filled with advertisements and irrelevant links, useful and useless information are mixed together, and correctly locating information becomes harder [7]. To deal with these problems, an automatic extraction technology is urgently needed to help people quickly find the information they really need from this mass of information. The automatic extraction of web page text information is a good way to solve this problem.
Relevant scholars have put forward numerous studies. Reference [8] proposed an adaptive parameter optimization model for 3D information extraction of infrared small targets based on a particle swarm algorithm. A multiobjective particle swarm optimization algorithm was used to optimize the parameters of the 3D information extraction method, making the detection method adaptive to different detection scenarios. Within the optimization algorithm, an adaptive environment selection strategy was proposed to enhance evolutionary ability and obtain high-quality solution sets, and an inflection point selection strategy was designed to obtain the best parameters of the small target detection method. Experimental results show that, compared with the baseline method, this method can detect small targets in different scenes accurately and stably. Reference [9] proposed accelerated training of a deep information extraction system for cancer pathology reports based on bootstrap aggregation. The machine learning data consisted of free text from electronic cancer pathology reports, and partitioned training was carried out with a multitask convolutional neural network and a multitask hierarchical convolutional attention network classifier. Up to 40,000 models were generated by dividing a large problem into 20 subproblems, resampling training cases 2,000 times, and training a deep learning model for each bootstrap sample and each subproblem; many models were trained simultaneously in a high-performance computing environment. Compared with a single-model approach, model aggregation improved task performance. Although these studies made progress, they are not applicable to complex web page text information. This paper therefore proposes an automatic web page text information extraction method based on network topology coincidence degree: the number of common neighbors between web page text information nodes is used to quantify the topology coincidence degree of the information nodes, and as the common neighbors between two nodes change, the importance of the text on the corresponding web pages changes as well.
2. Web Page Text Information Preprocessing Based on Network Topology Coincidence Degree
2.1. Automatic Extraction and Classification of Web Page Text Information
We classify the automatic extraction technologies for web page text information as follows:
(1) Search engine: a service-oriented website into which the desired information conditions can be entered through a search interface and the matching results extracted [10, 11]. Its principle is to find content consistent with the information entered by the user, process and analyze web page text information with certain algorithms, and store the integrated result in a database [12]. The original data are characterized by strong relevance and uniform rules, so when users search for information, results matching the search conditions can be fed back directly.
(2) Web crawler: a program hidden inside a search engine that can search relevant web pages or download pages [13, 14]. Like a spider, it can move among many web pages arbitrarily, so it is also called a web robot.
(3) Hypertext tagging: a technology for integrating information. According to the user's requirements, the useful information found can be spliced together for processing [15], so users can use hypertext links to find the required text directly. Hypertext can contain many kinds of content, such as Chinese characters, pictures, and videos.
Under the condition of network topology coincidence, the automatic extraction of web page text information requires the help of a network control system. The structure of the control system is shown in Figure 1.

2.2. Dimensionality Reduction of Web Page Text Information
In the process of automatically extracting web page text information under the network topology coincidence degree, the relevant features of the web page text information must be extracted to ensure accuracy. These features usually consist of multiple dimensions, which makes the extraction process overly complex and thus reduces extraction accuracy [16, 17]. Because the dimensions of web page text information contain a large amount of redundant data, it is necessary to reduce the dimensionality, retain the main feature information, and eliminate the impact of the redundant data [18]. The specific method is as follows.
Step 1: the web page text information collected under the network topology coincidence degree forms an $n \times m$ web page text information matrix, where $n$ is the number of web page text items and $m$ is the number of feature dimensions. The matrix can therefore be described as a collection of $m$-dimensional vectors $x_1, x_2, \ldots, x_n$ whose mean is

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{1} \]

In formula (1), $\bar{x}$ represents the mean vector of the web page text information, and $n$ represents the specific quantity of web page text information items.
Step 2: map the web page text information data from the high-dimensional space to a low-dimensional space to reduce its characteristic dimension. The formula is as follows:

\[ y_i = W^{\mathsf{T}} x_i \tag{2} \]

In formula (2), $W$ represents the characteristic dimension coefficient matrix of the web page text information.
Step 3: with $\bar{x}$ set as the feature mean vector of the web page text information, the feature covariance matrix is

\[ C = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\mathsf{T}} \tag{3} \]

In formula (3), $(x_i - \bar{x})^{\mathsf{T}}$ represents the transposition of the mean-centered feature vector of the web page text information, and $\bar{x}$ represents the average value over the feature dimensions of the web page text information.
Step 4: delete the eigenvectors of $C$ with smaller eigenvalues from the eigenvector matrix of the web page text information [19, 20]. Keeping the $k$ eigenvectors $w_1, \ldots, w_k$ with the largest eigenvalues, each web page text information vector can be approximated as

\[ x_i \approx \bar{x} + \sum_{j=1}^{k} a_{ij} w_j \tag{4} \]

In formula (4), $a_{ij}$ represents the feature mean vector coefficient of the web page text information.
Through the above methods, the feature dimension of web page text information can be reduced, redundant data can be deleted, and the main web page text information features can be retained, which provides an accurate basis for the extraction of web page text information.
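To make Steps 1 to 4 concrete, the following is a minimal sketch of this kind of dimensionality reduction in Python. It assumes a standard PCA-style projection; the matrix sizes, the random data, and the function name reduce_dimensions are illustrative assumptions rather than the implementation used in this paper.

```python
import numpy as np

def reduce_dimensions(X, k):
    """Reduce an n x m web page text feature matrix to n x k.

    Sketch of Steps 1-4: center the data around the mean (formula (1)),
    build the covariance matrix (formula (3)), keep the k eigenvectors
    with the largest eigenvalues, and project (formulas (2) and (4)).
    """
    mean = X.mean(axis=0)                   # feature mean vector
    centered = X - mean                     # remove the mean
    cov = centered.T @ centered / len(X)    # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric matrix
    order = np.argsort(eigvals)[::-1][:k]   # indices of k largest eigenvalues
    W = eigvecs[:, order]                   # projection matrix
    return centered @ W, mean, W            # low-dimensional features

# Hypothetical data: 100 pages, 50 raw features, reduced to 5
X = np.random.rand(100, 50)
Y, mean, W = reduce_dimensions(X, k=5)
print(Y.shape)  # (100, 5)
```

The columns of W dropped here correspond to the deleted eigenvectors of formula (4); removing them is what discards the redundant dimensions.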
3. Automatic Extraction of Web Page Text Information Based on Network Topology Coincidence Degree
3.1. Similarity Search of Web Page Text Information Based on PageRank
At present, the most widely used search engine is Google, whose success is mainly due to its high-quality search results. Google describes its goal as clearly understanding the user's meaning and meeting the user's needs, and it developed the breakthrough PageRank, an excellent web page relevance ranking algorithm [21]. The PageRank algorithm measures the value of a website mainly through the number and quality of its internal and external links; it models the web as a directed graph of hyperlink topology and searches the relevance of web pages [22]. Similarly, relevance among web page text information can be computed through citing and cited relationships: a link is regarded as a citation, and similar web page text information is searched for through the correlation between items. The PageRank algorithm has the following advantages for relevance search over web page text information:
(1) Two articles in the web page text information cannot cite each other, so the PageRank algorithm avoids cyclic calculation and requires only simple nested calculation [23, 24].
(2) Most research aims to solve practical problems in life, such as scattered topics and speech, which must be carefully screened by experts in the corresponding field, promoting high quality and authority in the web page text information.
(3) The number of citations of web page text information shows the importance of the corresponding subject in its research field, whereas a web page hyperlink only plays a navigational role, guiding users to browse pages in the order intended by the designer [25].
The PageRank algorithm is mainly used to analyze the link structure of web pages and mine the information of the web graph itself, so it is also called "hyperlink analysis." The basic idea of the whole algorithm is as follows: if web page $u$ contains a link to web page $v$, the owner of $u$ considers $v$ important, so part of the importance score of $u$ is given to $v$. The importance score can be expressed as

\[ PR(v) = \sum_{u \in B(v)} \frac{PR(u)}{N(u)} \tag{5} \]

In formula (5), $PR(v)$ represents the PageRank value of page $v$, $B(v)$ is the set of pages linking to $v$, and $N(u)$ is the number of outgoing links of page $u$; the PageRank value of $v$ is the accumulation of a series of page importance scores of the form $PR(u)/N(u)$.
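For illustration, the following is a minimal power-iteration sketch of PageRank in Python. It adds the commonly published damping factor (0.85); with damping set to 1, each update reduces to the summation of formula (5). The four-page link graph is a made-up example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Minimal power-iteration PageRank over a dict {page: [outgoing links]}."""
    pages = list(links)
    pr = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_pr = {p: (1.0 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            if not outs:  # dangling page: spread its score evenly
                for q in pages:
                    new_pr[q] += damping * pr[p] / len(pages)
            else:         # each outgoing link passes PR(p)/N(p), as in (5)
                for q in outs:
                    new_pr[q] += damping * pr[p] / len(outs)
        pr = new_pr
    return pr

# Hypothetical 4-page link graph
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(pagerank(links))  # page C accumulates the highest score
```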
The data model applied to information filtering and indexing is the vector space model, which plays a very important role in the text similarity calculation of search engines [26, 27]. The main similarity comparison method in the vector space model is to compute the cosine between vectors. The model rests on a key assumption: the order in which terms appear in an article is not important, and each term contributes independently to the article's theme. A piece of web page text information can therefore be regarded as a collection of unordered terms [28]. In the vector space model, each index word is regarded as one component, forming a multielement vector space of index words; a phrase in which an index word appears at least once in a file is called a keyword. During search, the extracted input words can likewise be converted, after word segmentation and other operations, into the multielement vector space of the file [29]. The correlation between documents and search words is then obtained by comparing the angular deviation between the document vectors and the extracted word vector.
The vector space model takes feature items as the coordinates of a document and represents web page text information as points in a multidimensional space in vector form, with each component of a vector representing one feature. Calculating the cosine value between vectors judges the similarity between pieces of web page text information. The vector space model is shown in Figure 2.

In the vector space model, $D$ is set to represent a text set containing $n$ pieces of web page text information, that is,

\[ D = \{d_1, d_2, \ldots, d_n\} \tag{6} \]

Each $d_i$ in the set can be expressed as a vector, as shown in

\[ d_i = (w_{i1}, w_{i2}, \ldots, w_{im}) \tag{7} \]

In formula (7), $w_{ij}$ represents the weight of the $j$th feature item in the web page text information $d_i$. The vector space model represents both the web page text information and the query as vectors constructed with words as elements, each word weighted by term frequency and inverse document frequency. The similarity between the web page text information and the query is then obtained by calculating the cosine of the angle between the vectors.
In a search engine, the vector space model is mainly used to calculate the similarity between different pieces of web page text information. In actual extraction, the search engine classifies the search content, mainly taking the PageRank value as the criterion for a preliminary classification of web page text information. It then searches the inverted file table in the system and selects the web page text information that contains the keywords. Finally, the similarity between each piece of web page text information and the extracted content is calculated with the vector space model [30].
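The following sketch illustrates this weighting and comparison step with term frequency-inverse document frequency weights (formula (7)) and cosine similarity, assuming the pages have already been tokenized; the toy documents and query are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Weight each term by term frequency x inverse document frequency."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # document frequency of each term
    n = len(docs)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = math.sqrt(sum(w * w for w in u.values())) * \
           math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

docs = [["web", "text", "extraction"],     # hypothetical tokenized pages
        ["web", "page", "ranking"],
        ["topology", "coincidence", "degree"]]
vecs = tf_idf_vectors(docs)
query = {"web": 1.0, "text": 1.0}          # tokenized query as a vector
ranked = sorted(range(len(vecs)),
                key=lambda i: cosine(query, vecs[i]), reverse=True)
print(ranked)  # page indices ranked by similarity to the query
```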
Based on the above analysis, the architecture and workflow of the search engine are as follows: before users submit extracted content to the query module, all web pages are analyzed and scored by the search engine. When the vector space model based on the PageRank value is used for preliminary classification, the web page text information must be taken into account, and the similarity between the different search results and the search content is calculated and sorted [31, 32]. The specific operation flow of the web page text information similarity search is shown in Figure 3.

In Figure 3, the search content refers to the extracted information input by the user, and the preliminary search refers to the preliminary extraction of web page text information [33]. When querying a large amount of web page text information, some components of the search engine must be adjusted and extended. The similarities of the web page text information are sorted, and the similarity-based search of web page text information is realized according to the correlation between items.
3.2. Feature Point Extraction of Web Page Text Information Based on Segment Estimation
On the basis of the above analysis and according to the information retrieval results, we extract the feature points of web page text information based on segment estimation. A series of observation values of web page text information is set as

\[ X = \{x_1, x_2, \ldots, x_t\} \tag{8} \]

Formula (8) is called a web page text information sequence, where $x_i$ represents the $i$th observation value at any time point and $x_t$ represents the observation value at the last stage of the observation time. There are many definitions of segmented estimation, such as local extreme points and edge points, but these are extracted from one-dimensional web page text information; research on multidimensional web page text information is still scarce and, at this stage, in its infancy, requiring further exploration [34]. The web page text information contains local extremum feature points and nonlocal extremum feature points, as shown in Figure 4:

(a) Local extremum characteristic point

(b) Nonlocal extremum characteristic point
As shown in Figure 4, each region around a local extremum feature point presents a symmetrical state, while nonlocal extremum feature points present an asymmetric state. Locally important extreme points, points with large fluctuations within a short time, and the starting and end points of the web page text information sequence are collectively referred to as feature points [35].
Given a web page text information sequence, the following definitions can be obtained from the perspective of extreme points:
(1) Extremum feature point. Given a segmentation method, the neighborhood of a point $x_i$ is $N(x_i)$, whose left radius is taken as the average of the adjacent segment lengths before $x_i$ (the number of rows of the segment matrix) and whose right radius as the average of the adjacent segment lengths after $x_i$ (the number of columns of the segment matrix). Assuming $x_i$ is the minimum point in the neighborhood $N(x_i)$, it is called a local minimum point; local maximum points are defined symmetrically.
(2) Nonextremum feature point. In the neighborhood $N(x_i)$, define

\[ r = \frac{\left| x_i - x_{i-1} \right|}{\left| x_{i+1} - x_i \right|} \tag{9} \]

When $r > \theta$ or $r < 1/\theta$, assuming a threshold $\theta > 1$, point $x_i$ fluctuates greatly within a short time.
After selecting the local extremum feature points and nonlocal extremum feature points, the specific steps of web page text information feature extraction based on segment estimation are as follows (a code sketch follows this list):
(1) Based on segmentation estimation, given the web page text information sequence $X = \{x_1, x_2, \ldots, x_t\}$, divide it into segments of equal length $L$, where a segment starting at $x_j$ has center point $x_{j + \lfloor L/2 \rfloor}$.
(2) Add the points given by the segmentation method to the feature point set.
(3) Select the locally important extremum points within the different segments through the definitions above and add them to the feature point set.
(4) Select the points with large fluctuation within a short time in the different segments and add them to the feature point set as well.
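Below is a minimal sketch of the four steps above, assuming fixed-length segments. The large-fluctuation test used here is a simplified analogue of formula (9) (a jump larger than theta times the segment's mean jump); the series and parameter values are made up.

```python
def feature_points(series, seg_len=10, theta=2.0):
    """Pick local-extremum points and short-term large-fluctuation points.

    The sequence is cut into segments of length seg_len; in each segment
    the minimum and maximum (local extremum candidates) and any point whose
    jump exceeds theta times the segment's mean jump are kept. The start
    and end of the sequence are always feature points.
    """
    points = {0, len(series) - 1}                    # start and end points
    for s in range(0, len(series), seg_len):
        seg = series[s:s + seg_len]
        if len(seg) < 3:
            continue
        points.add(s + seg.index(min(seg)))          # local minimum
        points.add(s + seg.index(max(seg)))          # local maximum
        jumps = [abs(seg[i + 1] - seg[i]) for i in range(len(seg) - 1)]
        mean_jump = sum(jumps) / len(jumps)
        for i, j in enumerate(jumps):
            if mean_jump and j > theta * mean_jump:  # large fluctuation
                points.add(s + i + 1)
    return sorted(points)

series = [1, 2, 1, 9, 2, 2, 3, 2, 8, 2, 1, 2]        # hypothetical data
print(feature_points(series, seg_len=6))
```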
3.3. Web Page Text Information Extraction Method under Network Topology Coincidence Degree
After the feature point extraction based on segment estimation is completed, web page text information extraction is carried out under the network topology coincidence degree. Traditional methods can only extract web page text information with the same or similar character expression forms and cannot adapt to changes in web page text information characteristics under the network topology coincidence degree, which reduces extraction accuracy. Therefore, the web page text information is extracted under the network topology coincidence degree.
In the process of web page text information extraction, the distance between feature vectors in the feature space must be calculated to measure the similarity between web page text information features [36]; the Euclidean distance method can be used to extract web page text information. Set the first web page text information as $T_1$ and the second as $T_2$, with feature vectors $v_1$ and $v_2$, respectively. The text extraction formula is then

\[ d(T_1, T_2) = \sqrt{\sum_{k=1}^{m} (v_{1k} - v_{2k})^2} \tag{10} \]

In formula (10), $d(T_1, T_2)$ is the Euclidean distance between the two web page text information features, and each component $v_{ik}$ is the result of normalization.
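A short sketch of formula (10) follows, assuming min-max normalization of each feature column over the whole feature matrix (the paper does not specify the normalization; this is one common choice). The feature matrix is hypothetical.

```python
import numpy as np

def normalize(features):
    """Min-max normalize each feature column to [0, 1] (inputs to (10))."""
    features = np.asarray(features, float)
    lo, hi = features.min(axis=0), features.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero
    return (features - lo) / span

def text_distance(v1, v2):
    """Euclidean distance between two normalized feature vectors."""
    return float(np.sqrt(np.sum((np.asarray(v1) - np.asarray(v2)) ** 2)))

# Hypothetical feature matrix: rows are web page texts
F = normalize([[3.0, 5.0, 1.0],
               [2.0, 8.0, 1.0],
               [9.0, 1.0, 4.0]])
print(text_distance(F[0], F[1]))  # smaller value = more similar texts
```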
3.3.1. Web Text Information Mining
The basic idea of web page text information extraction is to cluster the features of web page text information according to similarity, with each cluster center representing the main features of one piece of web page text information, and to match the features by the cross-line method, thereby realizing extraction of web page text information under the network topology coincidence degree.
The implementation method for massive web page text information mining is to extract features according to the text features and use feature fusion to carry out text classification and recognition on massive network data text. The mature approach to text feature extraction assigns different weights to different words to represent the importance of each word in the document. The simplest weighting method is Boolean weighting: when a word appears in the document, its weight is 1; otherwise, its weight is 0. The definition is expressed by the following formula:

\[ w_{ij} = \begin{cases} 1, & tf_{ij} > 0 \\ 0, & tf_{ij} = 0 \end{cases} \tag{11} \]

In formula (11), $tf_{ij}$ represents the frequency of word $j$ in document $i$, and $w_{ij}$ represents the weighted result of the word. In the recognition stage of text data mining, digital normalization must be applied on the computer; through normalization, the keywords in a document can be well classified and measured and the important properties of each keyword characterized, realizing keyword recognition. The actual mining is realized from the following aspects:
(1) The training subject editing function displays the web page text in the form of parameters according to the requirements of the training subject, including web page text feature parameters, quantity parameters, property parameters, category parameters, and capacity parameters. The training plan refers to the technical measures, expressed in a specific form, that adopt optimization techniques for the massive-data mining environment according to the characteristics of the web page text. The key to the simulation training system is that the training plan can be designed freely to artificially increase the intensity and difficulty of training and thus achieve a better training effect.
(2) Combined with the performance of web page text mining and taking the massive data as the simulation object, deep association mining technology is used to mine web page text data in real time to meet the planning requirements of the training subjects.
(3) According to the training plan, web page text mining is simulated and optimization training is carried out by optimizing and adjusting technical measures; these measures should be combined with the expected mining performance to fully reflect the authenticity of the optimized simulated mining.
(4) Assessment and evaluation: according to the effect information of the mined data, the operation steps are recorded, the rate of change and other characteristic parameters in the mining results are calculated, and the whole operation process is evaluated qualitatively or quantitatively. The key is to establish a reasonable evaluation system and evaluate the results of optimization training scientifically, making the training more targeted.
The process of extracting web page text information under the network topology coincidence degree is as follows (a minimal sketch of these steps appears below):
(1) Set the type of web page text information to be extracted, that is, the number of clustering centers $c$ and the weight coefficient $m$ of the web page text information features, and determine the weight matrix $U$ of the feature attributes and the number of iterations.
(2) Calculate the objective function according to the weights of the web page text information eigenvalues.
(3) Set a threshold for the expansion and change of the web page text information features.
(4) Update the clustering centers of the web page text information features and update the membership function.
(5) Update the weights of the web page text information features.
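The following is a minimal fuzzy c-means style sketch of steps (1) to (5). The paper does not name its clustering algorithm, so this is one standard way to realize weighted feature clustering with membership updates; the cluster count, fuzzifier, and random feature matrix are illustrative assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iterations=100, eps=1e-5):
    """c cluster centers, fuzzifier m, membership matrix U; iterate center
    and membership updates until the objective change falls below eps."""
    n = len(X)
    rng = np.random.default_rng(0)
    U = rng.random((c, n))
    U /= U.sum(axis=0)                            # memberships sum to 1
    prev_obj = np.inf
    for _ in range(iterations):
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)  # step (4)
        dist = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        dist = np.fmax(dist, 1e-12)               # avoid division by zero
        obj = float((Um * dist ** 2).sum())       # objective, step (2)
        if abs(prev_obj - obj) < eps:             # threshold, step (3)
            break
        prev_obj = obj
        U = 1.0 / (dist ** (2 / (m - 1)))         # membership update
        U /= U.sum(axis=0)                        # weight update, step (5)
    return centers, U

X = np.random.rand(60, 5)     # hypothetical reduced text features
centers, U = fuzzy_c_means(X, c=3)
print(U.argmax(axis=0)[:10])  # cluster assignment of the first 10 texts
```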
According to the mining process of web page text information, the association framework for multitask learning is constructed, as shown in Figure 5:

According to the association framework structure constructed in Figure 5, we sort out the text features of the association structure output by the final framework. The relevance extraction process of web page text information is simulated according to the actual output relevance structure text features.
According to the method described above, the characteristics of the web page text information are expressed in vector form, features are selected by an evaluation function, dimensionality reduction is performed, and the redundant data in the web page text information are deleted; the features are then clustered according to feature similarity, the objective function of web page text information extraction is determined, and constraints are applied. The key to web page text information extraction under the network topology coincidence degree is to minimize the value of the objective function.
3.3.2. Feature Extraction of Web Page Text Information
With the rapid expansion of human knowledge, mankind has entered the civilized stage of information explosion, and how to accurately extract web page text information under the network topology coincidence degree has become an urgent problem. To realize accurate extraction under the network topology coincidence degree, the characteristics of the web page text information must themselves be extracted accurately. These characteristics can be described in vector form as follows:

\[ d = \{(t_1, w_1), (t_2, w_2), \ldots, (t_n, w_n)\} \tag{12} \]

In formula (12), $t_k$ represents a feature value of the web page text information, $w_k$ represents the corresponding feature weight, and $n$ represents the number of features.
The content of web page text information can be described by a spatial vector model. If the web page text is long, the number of features will be large and the extraction process extremely complex; therefore, the main features must be selected to represent the web page text information and reduce its feature dimension. In the extraction process, an evaluation function is usually used to select the features. The commonly used feature evaluation functions include information gain, mutual information, and the $\chi^2$ statistic. Among them, the $\chi^2$ statistic can represent both positive and negative correlation between web page text information features and feature categories. Its expression is as follows:

\[ \chi^2(t, c) = \frac{N (AD - BC)^2}{(A + B)(C + D)(A + C)(B + D)} \tag{13} \]

In formula (13), $N$ represents the total amount of web page text information, and $A$, $B$, $C$, and $D$ represent the co-occurrence counts of feature $t$ and category $c$: documents of category $c$ containing $t$, documents of other categories containing $t$, documents of category $c$ not containing $t$, and documents of other categories not containing $t$, respectively. Through the $\chi^2$ statistic, appropriate web page text information features can be selected and the feature dimension reduced, providing a basis for the extraction of web page text information.
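The following sketch computes the $\chi^2$ statistic of formula (13) for one feature and one category from document counts; the toy documents and labels are hypothetical.

```python
def chi_square(docs, labels, term, category):
    """Chi-square statistic of formula (13) for one term and one category.

    A, B, C, D are the document counts of the four term/category
    combinations; N is the total number of documents.
    """
    N = len(docs)
    A = sum(1 for d, l in zip(docs, labels) if term in d and l == category)
    B = sum(1 for d, l in zip(docs, labels) if term in d and l != category)
    C = sum(1 for d, l in zip(docs, labels) if term not in d and l == category)
    D = N - A - B - C
    denom = (A + B) * (C + D) * (A + C) * (B + D)
    return N * (A * D - B * C) ** 2 / denom if denom else 0.0

# Hypothetical tokenized pages and their categories
docs = [{"web", "text"}, {"web", "rank"}, {"sport", "news"}, {"sport", "text"}]
labels = ["tech", "tech", "sport", "sport"]
print(chi_square(docs, labels, "web", "tech"))  # high score: useful feature
```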
In the process of extraction, the clustering center of web page text information features and the weight of features are adaptively adjusted, and finally, the accurate extraction of web page text information is realized.
4. Experimental Analysis
To verify the overall effectiveness of the automatic extraction method of web page text information based on network topology coincidence degree, experiments were carried out in a hardware environment with an Intel Celeron (Tualatin) 1 GHz CPU and 385 MB of SDRAM and a MATLAB 6.0 software environment. The designed method is compared with the methods of reference [8] and reference [9] to verify the effect of automatic extraction of web page text information.
The automatic extraction tasks are divided into new web page text information extraction, complex extraction, and simple extraction, mainly to avoid overly uniform extraction content. The experimental test verifies the effect of the extraction method; increasing the complexity of the extracted content database during the experiment better reflects the completeness of the experimental data. The specific extracted content database is shown in Table 1.
The experimental environment consists of two computers connected to the Internet through network equipment: one serves as a web server providing data, and the other extracts user information. The connection mode is set to routing mode. The computer CPU is an Intel F4600, the hard disk is a 500 GB SATA drive, the memory is 2 GB, the main frequency is 5 GHz, and the download speed of the access network is 700 KB/s. The experimental data set consists of real mobile phone 2G/3G/4G/5G network traffic data of a province, lasting 30 days from November 1 to November 30, 2021; each network traffic record includes the user's mobile phone number. Nine relevant fields of the data set are selected for the experiment, with the field format shown in Table 2:
The Internet businesses concerned in the experiment are Baidu, Sina Weibo, Taobao, and QQ. The extraction parameter settings are shown in Table 3.
The 30-day user information in the data set is divided into three consecutive subsets by time period, each 10 days long, and a real user is identified by mobile phone number. The three groups of experiments each deduplicate the data set and screen out users' repeated digital identities. We count the user deduplication numbers of the three methods and take the average value. The experimental results are shown in Figure 6:

It can be seen from Figure 6 that the number of repeated QQ digital identities in this province is the largest: the method in this paper, the method of reference [8], and the method of reference [9] find 48%, 45%, and 42%, respectively. Sina Weibo follows, with 35%, 30%, and 27% for the three methods, respectively; Baidu and Taobao are the lowest, and about half of users have at least two digital identities. For the different Internet services, the deduplication count of this method is much larger than those of reference [8] and reference [9]; it reduces the number of digital identities of the same user, simplifies the complexity of the associated information in the data set, and greatly reduces the difficulty of information extraction.
After the user digital identities are deduplicated, the user information of the data set is extracted, and the information collection effects of the three groups of experiments are compared through the collection success rate, defined as

\[ \eta = \frac{N_s}{N_a - N_i} \times 100\% \tag{14} \]

In formula (14), $N_a$ represents the number of link addresses accessed, $N_i$ represents the number of invalid link addresses, and $N_s$ represents the number of link addresses from which information was successfully extracted. When a complete piece of data information is extracted, the information data are stored uniformly. We count the numbers of accessed addresses, invalid links, and successful collections for the three methods, calculate the collection success rate of the different services in the three periods of the data set, and take the average value. The comparison results are shown in Figure 7:
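As a small worked example of formula (14), with hypothetical counts from one collection run:

```python
def collection_success_rate(accessed, invalid, succeeded):
    """Acquisition success rate of formula (14): successful extractions
    as a share of the valid link addresses actually accessed."""
    valid = accessed - invalid
    return succeeded / valid if valid else 0.0

# Hypothetical counts: 1000 addresses accessed, 100 invalid, 810 successful
print(f"{collection_success_rate(1000, 100, 810):.1%}")  # 90.0%
```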

As can be seen from Figure 7, because of the high digital identity repetition rate of the QQ service, the collection success rates of the methods of reference [8] and reference [9] decline to varying degrees, while the collection effect of the method in this paper remains stable. Moreover, the information collection effect of this method is significantly better than those of reference [8] and reference [9], with an average success rate of 94.6%, against averages of 77.9% and 73.8% for reference [8] and reference [9], respectively. Compared with the two traditional methods, the success rate of this method is thus higher by 16.7 and 20.8 percentage points, respectively, and the automatically extracted web page text information is also more abundant.
To sum up, the designed method performs well: it reduces the difficulty of information extraction and improves both the amount of data collected and the success rate of information collection.
5. Conclusion and Prospect
5.1. Conclusion
(1) The automatic extraction method of web page text information based on network topology coincidence degree simplifies the complexity of the associated information in the data set.
(2) The method reduces the number of digital identities of the same user and greatly reduces the difficulty of information extraction.
(3) The designed method improves the amount of data collected and the success rate of information collection and enriches the automatically extracted web page text information.
5.2. Prospect
As there are still many deficiencies in the study of time relationships, in-depth research and analysis around the following aspects is needed in future work:
(1) Further improve the web page preprocessing method applied before web page text information extraction, especially by designing a good denoising algorithm, so as to reduce the interference caused by web page noise and improve extraction accuracy.
(2) Use rules to build the set to be predicted, deeply mine web page text information and browsing preferences, extract information more intelligently, and improve the recall rate of digital identities.
(3) Improve the text extraction algorithm for multimedia resources. The structure of web pages is complex and changeable, and extracting the text surrounding a multimedia resource is difficult; a better algorithm is needed to judge the relationship between the surrounding web page text and multimedia resources and improve the accuracy of web page text information extraction.
Data Availability
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding this work.