Abstract

With the rapid development of information technology, people increasingly rely on images to obtain tourism and other information. To address the low efficiency of traditional image retrieval methods in processing massive image data, an image retrieval method based on wireless communication network is proposed. Local CNN features are extracted based on salient regions and the wireless communication network combined with the hash method, experiments are run on the INRIA Holidays Dataset and the Oxford Buildings Dataset, and the accuracy and recall rate of the search results are calculated from the returned pictures of tourist attractions. This article designs an experiment to verify the accuracy and recall rate of search results. Hash codes are generated from the image features by a hash function, and the query is answered by comparing the Hamming distance between the hash code of the query image and each hash code in the image library. The final result is that the more searches are performed, the lower the accuracy and recall rate. To a certain extent, this also shows that CNN feature extraction technology can be used for travel image search, improving the search accuracy by 20%. The wireless communication network is still of great significance to future social development. In-depth research is needed, not only on the image retrieval of tourist attractions proposed in this article but also on the potential value of wireless communication networks, from multiple angles and more comprehensively.

1. Introduction

Since the beginning of the 21st century, electronic information technology has developed rapidly, and almost every household now owns intelligent terminals such as computers and mobile phones. The capacity of storage devices has also grown again and again, and these changes have gradually altered the way people live. The content on the Internet is varied and, while bringing convenience to people, also brings a lot of bad information, including a large number of advertisements, pornography, and information that contradicts social values. The amount of image content is also growing explosively, which makes it difficult to quickly locate the desired image [1]. At the same time, thanks to the rapid development of hardware devices in recent years, wireless communication network technology has received increasing attention from researchers, especially in the field of image processing. As tourism has become an increasingly popular way to meet people's spiritual and entertainment needs, more and more people are willing to travel. However, finding suitable tourist locations has become a major obstacle to the development of tourism. With massive amounts of information, how to accurately retrieve images has become a major research hotspot. Therefore, research on image retrieval systems based on wireless communication network still has important theoretical and practical significance [2].

In recent years, domestic scholars and experts have also conducted useful research on content-based image retrieval (CBIR) technology. Compared with foreign research results, domestic theoretical research and application lag behind, but with the growing emphasis on research in China, the gap is getting smaller and smaller. As far as the current application status is concerned, domestic content-based image retrieval technology is mainly used to retrieve images from professional image libraries and is being applied in industry, agriculture, medicine, national defense, and entertainment. In general, CBIR technology is far from intelligent and fully automated, and the technology is still immature. Many key technologies still need improvement. For example, feature extraction, semantic gap description, and performance optimization still have many issues that require more in-depth research [3].

An image retrieval system usually includes an image retrieval part, an image retrieval library creation part, a model creation part, and a system maintenance part. Among them, image retrieval is the core function, the image retrieval database and retrieval model are the foundation of the entire system, and system maintenance is the guarantee [4]. Image retrieval means that the computer automatically finds, in the image library, the images most similar to the image to be retrieved. Therefore, we first need to create the image library to be searched, extract the features of all images in the library through the neural network model we build, and save them. Then the image to be retrieved is input, its high-dimensional features are extracted through the neural network model and matched against the image features in the library, and similar images are output in order of decreasing similarity, thereby completing image-based retrieval on the wireless communication network. In order to achieve image retrieval of tourist attractions based on wireless communication networks [5], this paper proposes a local CNN feature extraction algorithm based on salient regions from the perspective of image understanding.

This article studies an image retrieval system based on the wireless communication network. Its advantages are: (1) a convolutional neural network model is constructed to extract image features; (2) a hash function is used to generate hash codes from the image features, and the Hamming distance between the hash codes of the experimental image and the standard image is compared; (3) the accuracy and recall rate are calculated to measure the image search results.

2. Method

2.1. Wireless Communication Network

In the field of information communication in recent years, wireless communication technology has been the fastest growing and most widely used [6]. Common wireless communication application equipment is shown in Figure 1.

The wireless communication realized on the move is also commonly referred to as mobile communication [7, 8]. People call the two together as wireless mobile communication. The application field of wireless communication technology is shown in Figure 2.

2.1.1. Cooperative Communication Technology

Cooperative communication technology originated from research in the 1970s on the information-theoretic characteristics of relay channels [9]. The relay channel model analyzed in that work is shown in Figure 3. It includes three nodes: source node S, relay node R, and destination node D. That work also pointed out that this model can be decomposed into a broadcast channel and a multiple access channel [10, 11].

2.1.2. Cognitive Wireless Network

A cognitive wireless network is the networking of cognitive radio (CR) [12], in which cognitive characteristics and the wireless communication network are studied as a whole. The working mechanism is shown in Figure 4. A cognitive wireless network has cognitive functions [13]: it can distinguish the current network status and then adaptively learn, make decisions, and respond according to that status. Its ultimate goal is to achieve end-to-end performance.

Dynamic spectrum management in cognitive wireless networks, also known as dynamic spectrum allocation, mainly uses information obtained from spectrum detection to perform spectrum analysis and spectrum decision-making. Its goal is to combine power control to provide methods and strategies for the effective and adaptive use of wireless spectrum resources [14, 15]. The wireless cognitive network is composed of a group of primary users (PU) and a group of secondary users (SU). A PU is an authorized user in the network and has a higher priority than an SU, and an SU cannot interfere with a PU's communication. At present, dynamic spectrum management research mainly involves unlicensed models, secondary utilization models of licensed spectrum, and market models, as shown in Figure 5. The unlicensed model requires the SU to use idle frequency bands in accordance with a certain spectrum etiquette; the secondary utilization model of licensed spectrum allows the SU and PU to share the licensed spectrum as long as the SU does not cause harmful interference to the PU [16]; the market model allows the trading of licensed spectrum, so that the right to part of the authorized spectrum can be transferred according to market rules formulated in advance.

QoS (quality of service) refers to the ability of a network to use various basic technologies to provide better service capabilities for specified network communications. It is a security mechanism for the network and a technology used to solve problems such as network delay and congestion. In the process of spectrum decision-making, first extract the characteristics of each frequency band according to the statistical information of the authorized network and the local observations of the SU [17], and then select the best frequency band according to the user's QoS [18] requirements and frequency band characteristics. Figure 6 shows the rule-based dynamic spectrum decision-making.

The cooperative relationship takes place between SU and PU, as shown in Figure 7. As a cooperative relay of PU, SU actively participates in the communication process of PU. While ensuring the communication quality of PU, it may obtain the opportunity to transmit its own information [19].

Cooperative diversity technology is applied in cognitive wireless networks [20] so that two or more SUs cooperate with each other, as shown in Figure 8. In the overlay-type [21] spectrum sharing system, an SU can only opportunistically access idle frequency bands.

2.1.3. Wireless Communication Technology

Bluetooth [22, 23], ZigBee, and Wi-Fi are three popular short-range wireless communication protocol standards. The network standards adopted by Bluetooth, ZigBee, and Wi-Fi are IEEE 802.15.1, IEEE 802.15.4, and IEEE 802.11, respectively, and their main features are shown in Table 1.

As a layered reference model, the TCP/IP protocol is different from OSI's layering. Figure 9 shows the comparison between the OSI reference model and the TCP/IP protocol model.

2.2. Wireless Communication Network Convolutional Neural Network AlexNet Model

Wireless communication network mainly includes convolutional neural network (CNN), recurrent neural network (RNN) [24, 25], sparse self-coding network, deep belief network, and other models. A convolutional neural network (CNN) is a type of feedforward neural network that includes convolution calculations and has a deep structure. A recurrent neural network (RNN) is a type of neural network that takes sequence data as input and recurses along the evolution direction of the sequence, with all nodes connected in a chain. It has certain advantages when learning the nonlinear characteristics of a sequence. The sparse autoencoder is an unsupervised machine learning algorithm that trains the model by continuously adjusting the parameters of the autoencoder according to the error between the output of the autoencoder and the original input. A deep belief network is a probability-based graphical model for knowledge representation and reasoning, composed of multiple layers of neurons. Among the different types of wireless communication network models, the convolutional neural network (CNN) is the most thoroughly researched and widely applied. A convolutional network refers to a neural network that uses convolution operations instead of general matrix multiplication operations in the network structure [26, 27].

Convolutional neural networks make full use of the local features of the data itself by combining local receptive fields, shared weights, and spatial or temporal pooling (downsampling). Weight sharing means that an input picture is scanned with a convolution kernel; the numbers in the convolution kernel are called weights. Each position in the picture is swept by the same convolution kernel, so the weights are the same, that is, shared. Sampling is divided into downsampling and upsampling. Downsampling reduces the image so that it fits the size of the display area or yields a thumbnail of the image. Upsampling enlarges the original image so that it can be displayed on a higher-resolution display device. Owing to this structure, convolutional neural networks are particularly suitable for machine learning on large image data, which can reduce the number of image recognition problems.
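As a minimal sketch of weight sharing and downsampling, the following Python example slides one kernel over a small image and then applies 2 × 2 maximum pooling. The image values, the kernel, and the function names `conv2d` and `max_pool` are illustrative assumptions, not part of the model used in this paper.

```python
def conv2d(image, kernel):
    """Valid 2-D convolution in the cross-correlation form commonly used in CNNs."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # The same kernel weights are reused at every position (weight sharing).
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out


def max_pool(feature_map, size=2, stride=2):
    """Downsample by keeping the maximum of each size x size window."""
    out = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            row.append(max(feature_map[i + a][j + b]
                           for a in range(size) for b in range(size)))
        out.append(row)
    return out


# Hypothetical 5 x 5 image and 2 x 2 kernel, chosen only for illustration.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [1, 3, 2, 0, 1],
]
kernel = [[1, 0], [0, -1]]

features = conv2d(image, kernel)  # 4 x 4 feature map
pooled = max_pool(features)       # 2 x 2 map after downsampling
print(features)
print(pooled)
```

The kernel holds only four weights regardless of the image size, which is the parameter saving that weight sharing provides.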

The main structure of a convolutional neural network includes a convolution part and a fully connected part. The convolution part includes convolutional layers, activation layers, and downsampling layers. Features are extracted by stacking the convolution part; the fully connected part connects feature extraction to the output and loss calculation to complete the recognition and classification functions.

The convolutional neural network is the first robust wireless communication network method that successfully adopts a multilayer hierarchical network. Convolutional neural networks are highly adaptable and good at mining the local features of data, which has made them a research hotspot in many scientific fields.

This paper uses the 8-layer convolutional neural network AlexNet model of Alex et al. The AlexNet model, winner of the 2012 ImageNet competition, was designed by Hinton and his student Alex Krizhevsky. Its innovation lies in the successful application of the ReLU activation function, the Dropout mechanism, GPU-accelerated training, and data enhancement strategies, and in the proposed LRN (local response normalization). It also uses overlapping maximum pooling instead of average pooling, successfully avoiding the blurring effect for image retrieval. The AlexNet network structure parameters are shown in Table 2. Assuming that the weight and offset are represented by $W$ and $b$, respectively, and that $x$ is the input feature vector, the linear prediction of the ith category can be expressed as

$$z_i = W_i^{T} x + b_i.$$
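The linear prediction for each category can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the weight matrix `W`, offset vector `b`, and input `x` hold made-up values, and the helper names `linear_scores` and `relu` are hypothetical rather than taken from the AlexNet implementation.

```python
def linear_scores(W, b, x):
    """Return z_i = sum_j W[i][j] * x[j] + b[i] for every category i."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def relu(values):
    """ReLU activation: negative values are clipped to zero."""
    return [max(0.0, v) for v in values]


# Hypothetical parameters for 2 categories and a 3-dimensional input.
W = [[1.0, -1.0, 0.5],
     [0.0, 2.0, -1.0]]
b = [0.5, -1.0]
x = [2.0, 1.0, 4.0]

scores = linear_scores(W, b, x)
predicted = max(range(len(scores)), key=scores.__getitem__)  # index of the largest score
print(scores, relu(scores), predicted)
```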

2.3. Local CNN Feature Extraction Based on Saliency Regions

The SIFT feature is a very common local descriptor [28]. It condenses image information into 128-dimensional feature vectors by constructing a scale space, extracting key points, and generating descriptors. Using image understanding theories and models related to wireless communication networks, the salient areas of the image are extracted and characterized, and local features similar to SIFT are generated. This is local CNN feature extraction based on salient regions (LCF-SR) [29]. The SIFT feature extraction operation can be divided into five steps: the first is the generation of the scale space; the second is the detection of extreme points in the scale space; the third is the accurate localization of the extreme points; the fourth is the assignment of direction parameters to each key point; and the fifth is the generation of the key point descriptor.

2.3.1. Image-Based Understanding Training CNN + RPN + LSTM Model

This model is used to extract areas of interest in the image. The network structure consists of CNN network, RPN positioning layer, simple identification network, and LSTM language model.

(1) CNN Network. The CNN uses the VGG-16 structure, which consists of 13 layers of 3 × 3 convolutions and 5 layers of 2 × 2 maximum pooling [30]. In this paper, the final fully connected and classification layers are all deleted. Thus, after passing through the CNN network, an input image of size 3 × W × H becomes a set of feature maps of size C × W′ × H′, where C is the number of channels and W′ × H′ is the spatial size of each feature map.

(2) Positioning Layer. After receiving the input feature map, the positioning layer locates the regions of interest according to the feature map and extracts a representation of fixed length from each region. The input of the positioning layer is generated by the CNN network, and the layer outputs three kinds of information: (1) a B × 4 matrix containing the boundary information of the candidate regions; (2) region scores, where the higher the score, the more likely the region is selected as a saliency region; and (3) region features: the output of this layer is a feature stream in which each region is represented as a feature of size C × X × Y.

(3) Identifying the Network. The identification network is a fully connected neural network that processes the region features from the localization layer. Each C × X × Y region feature is first flattened into a vector and then passed through two fully connected layers. The fully connected layers contain ReLU (rectified linear unit) activation functions and dropout regularization [31]. ReLU, also known as the modified linear unit, is an activation function commonly used in artificial neural networks and usually refers to the nonlinear function represented by a ramp function and its variants. Dropout reduces the effective size of the neural network by randomly dropping some neurons during training to prevent overfitting. During forward propagation, the activation value of a neuron is set to stop working with a certain probability p, which makes the model generalize better because it will not rely too much on particular local features. Finally, each region feature is encoded into a feature vector of D = 4096 dimensions. The feature vectors of the selected salient regions form a matrix of B × D dimensions. This feature matrix is the input to the LSTM language model and is also the feature matrix to be encoded in image retrieval tasks.
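The ReLU and dropout operations described above can be sketched as follows. This is a minimal illustration, not the recognition network itself: the activation values are made up, and rescaling the surviving activations by 1/(1 − p) ("inverted dropout") is an assumption that the text does not specify.

```python
import random


def relu(values):
    """Ramp function: max(0, v) applied element-wise."""
    return [max(0.0, v) for v in values]


def dropout(values, p, rng, training=True):
    """Zero each activation with probability p during training.

    Surviving activations are divided by (1 - p) so that the expected value
    is unchanged (inverted dropout, assumed here). At test time the input
    is returned unchanged.
    """
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
activations = relu([-1.0, 0.5, 2.0, -0.3, 1.5, 3.0])
train_out = dropout(activations, p=0.5, rng=rng)
test_out = dropout(activations, p=0.5, rng=rng, training=False)
print(activations, train_out, test_out)
```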

(4) The LSTM Language Model. Model training and the selection of salient regions are based on image understanding [32]. The label of an image consists of many sentences described in natural language, so end-to-end training based on image understanding can be achieved. The LSTM language model follows the recognition network and takes the B × D feature matrix as input. Specifically, for a labeled training sequence $s_1, \ldots, s_T$, we use T + 2 word vectors $x_{-1}, x_0, x_1, \ldots, x_T$ as input to the LSTM. Among them, $x_{-1}$ is the region code, obtained by a linear layer connected to a nonlinear ReLU layer, $x_0$ is a special start mark, and $x_t$ encodes each training token $s_t$, $t = 1, \ldots, T$. The LSTM calculates the hidden state sequence $h_t$ and the output vectors $y_t$ by a recursive formula of the form $(h_t, y_t) = f(h_{t-1}, x_t)$. The length of $y_t$ is V + 1, where V is the dimension of the marker sequence and the extra length 1 is for the end symbol.

2.3.2. Significant Region Selection and Coding

(1) Bounding Box Regression. Each point in the W′ × H′ convolution feature map is mapped back to the original image of dimension W × H. Assuming that there are k target regions in the original image, k frames of different aspect ratios are selected around each projection point. The positioning layer gives each of the k regions a confidence score and 4-dimensional position information: the center point coordinates (x, y) and the width and height (w, h). This produces an output of dimension 5k × W′ × H′, containing the confidence scores and position information of all regions.

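The ground truth of a target window G is generally represented by a four-dimensional vector (x, y, w, h) that gives the coordinates of the center point and the width and height of the window. Assuming that the original window obtained by the prediction is P, we need a mapping during training such that the original input window P is mapped to a regression window G′ that is closer to the real window G. That is, given $(P_x, P_y, P_w, P_h)$, a mapping f is sought such that

$$f(P_x, P_y, P_w, P_h) = (G'_x, G'_y, G'_w, G'_h)$$

and there is

$$(G'_x, G'_y, G'_w, G'_h) \approx (G_x, G_y, G_w, G_h).$$

The idea of getting the transformation is as follows.

(1) First translate the input window by $(\Delta x, \Delta y)$, with $\Delta x = P_w d_x(P)$ and $\Delta y = P_h d_y(P)$, to get:

$$G'_x = P_w d_x(P) + P_x, \qquad G'_y = P_h d_y(P) + P_y.$$

(2) Scale the input window by $(S_w, S_h)$, with $S_w = \exp(d_w(P))$ and $S_h = \exp(d_h(P))$.

The corresponding transformation is

$$G'_w = P_w \exp(d_w(P)), \qquad G'_h = P_h \exp(d_h(P)).$$

The input of the border regression algorithm is the four-dimensional coordinate $P = (P_x, P_y, P_w, P_h)$ of the original window together with the window coordinate $G = (G_x, G_y, G_w, G_h)$ of the ground truth, and the prediction window G′ is obtained by the four transformations $d_x(P)$, $d_y(P)$, $d_w(P)$, $d_h(P)$.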
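The translate-then-scale idea can be sketched in Python as follows. The sketch assumes the conventional parameterization in which the translation is proportional to the window size and the scaling is exponential; the names `apply_regression` and `regression_targets`, the symbols `dx, dy, dw, dh`, and the window values are illustrative assumptions.

```python
import math


def apply_regression(P, d):
    """Map a window P = (x, y, w, h) to G' using offsets d = (dx, dy, dw, dh)."""
    px, py, pw, ph = P
    dx, dy, dw, dh = d
    gx = pw * dx + px       # translate the center, scaled by the window width
    gy = ph * dy + py       # translate the center, scaled by the window height
    gw = pw * math.exp(dw)  # scale the width
    gh = ph * math.exp(dh)  # scale the height
    return (gx, gy, gw, gh)


def regression_targets(P, G):
    """Offsets that carry P exactly onto the ground-truth window G (inverse mapping)."""
    px, py, pw, ph = P
    gx, gy, gw, gh = G
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))


# Hypothetical predicted window and ground-truth window.
P = (50.0, 50.0, 20.0, 40.0)
G = (54.0, 46.0, 30.0, 20.0)

targets = regression_targets(P, G)
recovered = apply_regression(P, targets)
print(targets, recovered)
```

During training, the regressor would be fitted so that its predicted offsets approach these targets; applying zero offsets leaves the window unchanged.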

(2) Screening Area. Next, we need to select B significant areas from the k areas. Nonmaximum suppression (NMS) is an effective method for obtaining local maxima. During training, if the same target area is surrounded by multiple candidate frames, a nonmaximum suppression algorithm is needed to remove the lower-scoring candidate frames and reduce overlapping boxes. The candidate areas are divided into positive areas and negative areas. IoU (intersection over union) is defined as the overlap ratio of two candidate boxes A and B, IoU = |A∩B|/|A∪B|. If an area has IoU > 0.7 with a target area, it is a positive area; if its IoU with all target areas is less than 0.3, it is a negative area. Based on the confidence score of each region, the nonmaximum suppression algorithm is used to screen out B significant regions. Nonmaximum suppression is widely used in object detection. Its main purpose is to eliminate redundant candidate frames with low confidence so that the best detection position is found.
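A minimal sketch of IoU and greedy nonmaximum suppression is given below. It assumes boxes in corner format (x1, y1, x2, y2); the suppression threshold of 0.5, the `max_keep` parameter standing in for B, and the example boxes are illustrative assumptions rather than the settings used in the experiments.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5, max_keep=None):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
            if max_keep is not None and len(keep) == max_keep:
                break
    return keep


# Hypothetical candidate frames: the first two cover the same target.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]

kept = nms(boxes, scores, iou_threshold=0.5)
print(iou(boxes[0], boxes[1]), kept)
```

The second frame overlaps the first with an IoU of 81/119, so it is suppressed and only the first and third frames survive.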

(3) Bilinear Interpolation. In training with an RoI pooling layer, gradients can propagate from output features to input features, but not to the coordinates of the saliency regions. Therefore, bilinear interpolation is used instead of the RoI pooling layer. Bilinear means performing linear interpolation in two directions: linear interpolation is first applied in one direction and then in the other. Although each step is linear in the sample values and in the position, the interpolation as a whole is not linear but quadratic in the sampling position. Feature representations of the same size are extracted for each significant region, which can then be fed to the subsequent recognition network and LSTM language model.

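Given an input feature map U and real-valued sampling coordinates $(x_{i,j}, y_{i,j})$ for each position (i, j) of the output feature V, the output is obtained by convolving with a kernel k:

$$V_{c,i,j} = \sum_{i'} \sum_{j'} U_{c,i',j'} \, k(i' - x_{i,j}) \, k(j' - y_{i,j}).$$

In bilinear interpolation, the kernel function k has the following representation:

$$k(d) = \max(0, 1 - |d|).$$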

Thus, the sampling grid becomes a linear function of the saliency region coordinates, and the candidate salient regions can be predicted and trained by backpropagating gradients. Bilinear interpolation extracts a fixed-size feature of C × X × Y for every candidate region as input to the recognition network.
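As a minimal sketch, bilinear sampling with the standard bilinear kernel k(d) = max(0, 1 − |d|) can be written as below for a single-channel map. The evenly spaced sampling grid inside the box, the helper names, and the 3 × 3 map are illustrative assumptions; the positioning layer itself would apply this per channel.

```python
def k(d):
    """Bilinear kernel: the weight falls linearly to zero at distance 1."""
    return max(0.0, 1.0 - abs(d))


def bilinear_sample(U, x, y):
    """Interpolate the 2-D map U (rows indexed by y, columns by x) at real (x, y)."""
    total = 0.0
    for row_idx, row in enumerate(U):
        for col_idx, value in enumerate(row):
            total += value * k(row_idx - y) * k(col_idx - x)
    return total


def linspace(start, stop, n):
    """n evenly spaced coordinates from start to stop (midpoint if n == 1)."""
    if n == 1:
        return [(start + stop) / 2.0]
    step = (stop - start) / (n - 1)
    return [start + step * i for i in range(n)]


def sample_region(U, box, out_h, out_w):
    """Resample the region box = (x0, y0, x1, y1) of U to a fixed out_h x out_w grid."""
    x0, y0, x1, y1 = box
    ys = linspace(y0, y1, out_h)
    xs = linspace(x0, x1, out_w)
    return [[bilinear_sample(U, x, y) for x in xs] for y in ys]


# Hypothetical single-channel feature map.
U = [[0.0, 1.0, 2.0],
     [3.0, 4.0, 5.0],
     [6.0, 7.0, 8.0]]

center = bilinear_sample(U, 0.5, 0.5)               # average of the four neighbors
patch = sample_region(U, (0.0, 0.0, 2.0, 2.0), 2, 2)
print(center, patch)
```

Because the output is a weighted sum whose weights depend continuously on the sampling coordinates, regions of any size map to the same fixed output size.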

2.3.3. Area Coding, Extracting Feature Vectors

After the features of dimension B × C × X × Y are obtained, they need to be encoded. The identification network is a fully connected network that performs a preliminary coding of the features to obtain features of B × 4096 dimensions. Encoding is then performed using sum pooling.

The principle of the sum pooling algorithm is as follows. For the feature C of dimension B × 4096, B represents the number of significant regions and 4096 represents the feature dimension of each significant region. First, for each feature dimension j, calculate the sum of its values over all significant regions:

$$S_j = \sum_{i=1}^{B} C_{i,j}, \qquad j = 1, \ldots, 4096.$$

The feature is then encoded as the ratio of the sum of each feature dimension, i.e.,

The dimension of the feature vector F is 4096; we then reduce its dimension by PCA and adjust it to get the final feature vector. Finally, according to the above method, feature vectors are extracted from all sample images to construct a search library.
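The sum pooling step can be sketched in Python as follows. This is a hedged illustration: the region feature matrix holds made-up values with D = 3 instead of 4096, and dividing each dimension's sum by the total over all dimensions is only one possible reading of "the ratio of the sum of each feature dimension", so `normalize_by_total` should be treated as an assumption. The PCA step is omitted.

```python
def sum_pool(C):
    """Sum each feature dimension over all B regions; returns a vector of length D."""
    return [sum(column) for column in zip(*C)]


def normalize_by_total(S):
    """Assumed encoding: each dimension's sum divided by the sum over all dimensions."""
    total = sum(S)
    return [s / total for s in S] if total else list(S)


# Hypothetical B x D feature matrix with B = 2 regions and D = 3 dimensions.
C = [[1.0, 0.0, 3.0],
     [2.0, 2.0, 0.0]]

S = sum_pool(C)
F = normalize_by_total(S)
print(S, F)
```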

2.4. Signal Reconstruction Algorithm

Signal reconstruction in compressed sensing theory refers to reconstructing the original signal a, or its equivalent sparse representation d, from the observation vector b according to the known observation matrix ϕ and the transformation basis matrix ψ, that is, finding a solution that satisfies

$$b = \phi a = \phi \psi d = \delta d, \qquad \delta = \phi \psi.$$

The reconstruction algorithms that have appeared so far can be divided into three categories: algorithms based on convex optimization, greedy pursuit algorithms, and combinatorial algorithms.

Reconstruction algorithms based on convex optimization obtain the sparsest solution by adding constraints, and norm constraints are the most commonly used. First give the definition of the p-norm:

The p-norm of the vector $a = (a_1, \ldots, a_N)^{T}$ is

$$\|a\|_p = \left( \sum_{i=1}^{N} |a_i|^p \right)^{1/p}.$$

In the traditional processing method, the method of solving the minimum $\ell_2$-norm is generally used to solve this recovery problem, namely:

$$\hat{d} = \arg\min \|d\|_2 \quad \text{s.t.} \quad b = \delta d.$$

This optimization has a closed-form solution:

$$\hat{d} = \delta^{T} (\delta \delta^{T})^{-1} b.$$

However, the solution obtained by the minimum $\ell_2$-norm optimization method is not N-sparse; it is a vector containing many nonzero elements.

When p = 0, the $\ell_0$-norm is obtained, which counts the number of nonzero elements in a. On the premise that the signal a is sparse or compressible, the problem of finding the definite solution of the underdetermined equations b = δd can be transformed into the problem of finding the minimum $\ell_0$-norm:

$$\hat{d} = \arg\min \|d\|_0 \quad \text{s.t.} \quad b = \delta d.$$

However, the solution process needs to enumerate all possible combinations of the positions of the nonzero terms in the signal a in order to obtain the optimal solution. Therefore, the numerical calculation of (20) and (21) is extremely unstable and NP-hard.

Solving a simpler optimization problem, the minimum $\ell_1$-norm problem, can produce equivalent solutions:

$$\hat{d} = \arg\min \|d\|_1 \quad \text{s.t.} \quad b = \delta d.$$
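The difference between the norms can be seen on a toy underdetermined system. The sketch below is an illustration under stated assumptions: the single-row sensing matrix `delta = [1, 2]` and observation `b = 2` are made up, the minimum 2-norm solution uses the standard closed form for one measurement row, and the sparse candidate is supplied by hand rather than found by a solver.

```python
def p_norm(v, p):
    """p-norm of a vector; p = 0 counts the nonzero entries."""
    if p == 0:
        return sum(1 for x in v if x != 0)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def min_l2_solution_single_row(delta, b):
    """Minimum 2-norm solution of delta . d = b for a single measurement row.

    This is d = delta^T (delta delta^T)^{-1} b, where delta delta^T is a scalar.
    """
    gram = sum(x * x for x in delta)
    return [x * b / gram for x in delta]


def measure(delta, d):
    """Observation b = delta . d."""
    return sum(x * y for x, y in zip(delta, d))


# Hypothetical system: one observation, two unknowns.
delta = [1.0, 2.0]
b = 2.0

d_l2 = min_l2_solution_single_row(delta, b)  # dense: both entries nonzero
d_sparse = [0.0, 1.0]                        # also satisfies delta . d = b

for name, d in (("min l2", d_l2), ("sparse", d_sparse)):
    print(name, d, measure(delta, d),
          p_norm(d, 0), p_norm(d, 1), p_norm(d, 2))
```

Both vectors reproduce the observation, but the minimum 2-norm solution has two nonzero entries, whereas the sparse vector has the smaller 0-norm and the smaller 1-norm, which is why 1-norm minimization is used as a tractable stand-in for the 0-norm problem.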

2.4.1. Wireless Communication Network Combined with Hash Method

In order to solve the problem of high-dimensional retrieval in large image libraries, image hashing methods have emerged that balance retrieval speed and storage space. The hashing technique uses a hash function to extract from the image data a sequence of binary bits representing the image features, also called a hash code. Since the hash code matches the binary representation used by the computer, retrieval efficiency and storage usage are greatly improved. In image processing, various transformations are usually applied to the image, and traditional hash functions are very sensitive to such transformations, which is not conducive to image processing.

In order to apply hash coding in the content-based image retrieval process, we require the hash function to satisfy the following as far as possible: (1) the hash codes of similar images are as similar as possible; (2) the hash codes of images with different content are as different as possible. In the retrieval process, the search results are obtained by calculating the Hamming distance between the query image and each image in the image library and sorting by distance. It can be seen that the hashing technique is very convenient.

The structure of image retrieval using the hash method is shown in Figure 10. The image feature extraction technique extracts features from the query image and from the images in the image library, a hash function generates hash codes from these features, and the Hamming distance between the hash code of the query image and the hash code of each image in the image library is then compared to obtain the final search result.
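The retrieval flow described above can be sketched as follows. The hash function here, thresholding each feature at zero, is a simple stand-in chosen for illustration and is not the hash function used in this paper; the image names and feature values are likewise hypothetical.

```python
def hash_code(features, thresholds=None):
    """Binarize a feature vector: bit = 1 where the feature exceeds its threshold.

    Thresholding at zero is an assumed, simplified hash function.
    """
    if thresholds is None:
        thresholds = [0.0] * len(features)
    return tuple(1 if f > t else 0 for f, t in zip(features, thresholds))


def hamming(a, b):
    """Number of bit positions in which two hash codes differ."""
    return sum(x != y for x, y in zip(a, b))


def search(query_code, library_codes, top_k=None):
    """Rank library images by increasing Hamming distance to the query code."""
    ranked = sorted(library_codes.items(),
                    key=lambda item: hamming(query_code, item[1]))
    names = [name for name, _ in ranked]
    return names if top_k is None else names[:top_k]


# Hypothetical 4-dimensional features for three library images.
library_features = {
    "church_1": [0.9, -0.2, 0.4, -0.7],
    "church_2": [0.8, -0.1, -0.3, -0.6],
    "bridge_1": [-0.5, 0.6, -0.4, 0.2],
}
library_codes = {name: hash_code(f) for name, f in library_features.items()}

query_code = hash_code([0.7, -0.3, 0.5, -0.9])
results = search(query_code, library_codes)
print(query_code, results)
```

Comparing short binary codes needs only bit comparisons, which is what makes the Hamming ranking cheap compared with distances between 4096-dimensional real vectors.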

3. Data Sources

In order to evaluate the performance of the proposed method on images of tourist attractions, we conducted experiments on two image data sets: the INRIA Holidays Dataset and the Oxford Buildings Dataset. The INRIA Holidays Dataset, used by Herve Jegou et al., is a collection of photographs taken during regular holidays (landscape dominated). It contains 1491 images in total: 500 query images (one per image group) and 991 corresponding relevant images. The Oxford Buildings Dataset consists of 5062 images of building attractions collected from Flickr by the VGG group at Oxford. These images can be applied to research on image retrieval technology for tourist attractions based on wireless communication network.

4. Results and Discussion

4.1. Result 1: INRIA Holidays Data Set

Select a church main entrance tourist attraction image in the data set as the query image, as shown in Figure 11.

Among the pictures belonging to this attraction, there are the following 4 similar pictures (excluding Figure 11).

The image query is carried out with Figure 12 as input, and the image retrieval result for the tourist attraction is obtained based on the local CNN features of the saliency regions. Because the number of search results is large, only some of the query results are shown as examples in Figure 13 (not including the query image, Figure 11).

4.2. Result 2: The Oxford Buildings Data set

The data selected this time comes from The Oxford Buildings. In the same way, select a picture of a tourist attraction as a query picture, as shown in Figure 14.

Features are extracted from the query image and from the images in the image library by the image feature extraction technique, and these features are hashed by the hash function. The Hamming distance between the hash code of the query image and the hash code of each image in the image library is then compared to obtain the final search result. Part of the image retrieval results for the tourist attraction are shown in Figure 15.

4.3. Result 3: Precision and Recall Results

(1) Over many experiments with different numbers of scenic spot recognition pictures (3, 5, 8, and 10), this article calculates the accuracy and recall rate; the results are shown in Table 3.

(2) In the image retrieval of tourist attractions, the recall and precision rates obtained for different scenic spots are needed to calculate the average recall rate and average precision rate over multiple scenic spots, which indicate the overall performance of the search results. Under the premise that the same number of pictures is returned after each attraction search, multiple tourist attractions are retrieved separately, with the number of searches set to 50, 100, 200, 400, and 800. The average recall rate and average precision of tourist attractions are calculated separately. The results obtained are shown in Table 4 and Figure 16.
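As a minimal sketch of how these measures are computed, precision is the fraction of returned pictures that are relevant and recall is the fraction of relevant pictures that are returned. The picture identifiers below are hypothetical and do not reproduce the values in Table 3 or Table 4.

```python
def precision_recall(returned, relevant):
    """Precision and recall of one query; `returned` is assumed to hold no duplicates."""
    returned = list(returned)
    relevant = set(relevant)
    hits = sum(1 for item in returned if item in relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision_recall(queries):
    """Mean precision and mean recall over several (returned, relevant) pairs."""
    pairs = [precision_recall(returned, relevant) for returned, relevant in queries]
    n = len(pairs)
    return (sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n)


# Hypothetical search results for two attractions.
queries = [
    (["a", "b", "x", "c", "y"], {"a", "b", "c", "d"}),  # 3 hits out of 5 returned
    (["p", "q"], {"p"}),                                 # 1 hit out of 2 returned
]

single = precision_recall(*queries[0])
mean_precision, mean_recall = average_precision_recall(queries)
print(single, mean_precision, mean_recall)
```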

It can be seen from the results in Table 3 that as the number of retrieved tourist attractions increases, the average accuracy rate and average recall rate both decrease. This shows that the tourism image retrieval technology based on wireless communication network is stable and feasible. However, it also reveals a shortcoming of this technique: the accuracy of the results is not high. While reducing the number of image retrievals, we should also optimize the image retrieval technology to further improve the accuracy rate and meet people's needs for accurate image retrieval.

5. Conclusions

Based on the above background, this paper studies the application of wireless communication network in image retrieval of tourist attractions. First, based on the related theories of image retrieval and wireless communication network, this paper proposes a local CNN feature extraction algorithm based on saliency regions from the perspective of image understanding. It mainly studies the extraction of local CNN features and how to use the local detail information contained in the image to improve the accuracy of image retrieval, and it realizes the localization and description of the key areas in the image. The algorithm was verified and analyzed in image retrieval tasks of the same category. Second, the wireless communication network is combined with hashing. The experiment first calculates the classification effect of the hash code and then calculates the average precision from the Hamming distance between the hash code of the query image feature and the hash codes of the image features in the image library. By comparing the results, the precision and recall of the algorithm are analyzed. The experiments show that the above method achieves better retrieval results in the image retrieval task, which can further improve the accuracy of image retrieval and has strong practicability. Wireless communication network will still be of great significance to future social development. However, there are also certain problems: the samples selected in the experiment are large and may not be representative; the experiment builds on previous experiments, and its scientific soundness needs to be evaluated; and the experimental data have not been verified, so the data may be unreliable. In-depth research is needed, not only on the image retrieval of tourist attractions proposed in this paper but also on developing the potential value of wireless communication network from multiple angles and more comprehensively.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the major project of Philosophy and Social Sciences Research of Colleges and Universities in Hubei Province: “Research on environmental construction strategies and methods of villages and towns in Hubei from the perspective of antiurbanization” (20zd061).