Abstract

With the gradual improvement of the quality of life, taste, and ecological and environmental awareness of urban residents in China, the environmental landscape of residential areas has become a focus of attention. At present, the quality of the residential environmental landscape has become an essential means for real estate developers to publicize products and improve economic benefits. Although many residential areas have invested heavily in constructing environmental landscapes, deficiencies and defects remain in the design and implementation of these landscapes for various reasons. Therefore, to address the low efficiency and high cost of manual processing of landscape images, a Fully Convolutional Network (FCN) model based on the traditional Convolutional Neural Network (CNN) is designed for semantic segmentation of landscape images, which copes with the large number of landscape elements involved in landscape image processing. The deconvolution method is used to realize pixel-level semantic segmentation. Besides, image preprocessing augments the data to prevent the overfitting that commonly occurs in FCN. Moreover, a two-stage training method mitigates the long training time and difficult convergence of deep learning. Finally, three upsampling network structures, i.e., FCN-32s, FCN-16s, and FCN-8s, are compared experimentally to determine the most suitable network. The experimental results demonstrate that the FCN-8s upsampling network structure performs best; it attains a pixel accuracy of more than 90%, an average accuracy of 88%, and an average Intersection over Union (IU) of 75%. These three values are the highest among the three upsampling structures, indicating that FCN-8s can realize accurate landscape image processing. Besides, the recognition accuracy of FCN for landscape elements reaches 90%, 25% higher than that of CNN. The method classifies landscape elements effectively and accurately, improves the classification accuracy intelligently, and significantly reduces the cost of landscape element classification, which shows that it is feasible.

1. Introduction

With the application of computer science to landscape architecture, landscape design methods have become efficient, fast, accurate, varied, and easy to modify, bringing the science of landscape architecture into a new era [1]. Natural landscape photos are among the most common and widely used image data in daily life, and the number of landscape images on the Internet keeps increasing. In addition to their role in appreciation and adjustment, such data are widely used in the planning, design, and landscape classification of landscape architecture. Therefore, it is essential to correctly classify the landscape elements in natural landscape images to improve landscape architects’ reading and search speed [2]. So far, mainstream image classification technology has been limited to the underlying visual features of images [3].

Nowadays, the continuous innovation of science and technology, especially computer technology, has brought hope for large-scale image data processing by computers [4]. Digital image recognition technology is an innovative technology that uses a computer to identify the content under test on the basis of digital image processing [5]. Image recognition technology appeared in the middle of the 20th century and was first applied in aerospace exploration. It has mainly gone through three growth stages: text recognition, digital image processing and recognition, and object recognition [6]. In the late 1980s, this kind of technology spread to different fields. Meanwhile, the landscape design of many residential areas in China does not accord with the people-oriented view in many respects. For example, the road network design of some communities excessively pursues artistry, leading to winding roads, loss of convenience, and longer walking distances. Residents often fall on rainy and snowy days because the pavement materials are polished granite or floor tiles that are not slip-resistant. As another example, some communities have no special access or facilities for people with disabilities. Some residential areas do not provide dedicated activity venues for the elderly and young children, who sometimes have to awkwardly watch young couples’ public displays of affection. Some residential areas have no sheltered outdoor space, so residents cannot go out on rainy or hot days. To achieve a people-oriented landscape, designers should carry out a thoughtful and humane design by fully understanding the residents’ age structure, occupations, lives, work habits, and physical requirements. In this way, residential landscaping and leisure facilities can respect and consider every detail of human activities and enable the residents to feel the comfort of a humane space.

Performance evaluation indicators, also known as evaluation factors or evaluation items, are the aspects from which the work performance of the evaluation object is measured; they determine what to evaluate. The evaluation indicators cover all aspects of the evaluated person’s performance in the performance evaluation process. They measure whether, or to what extent, the actual results of the evaluated person’s behavior have reached the performance objectives. Establishing relevant evaluation indicators for a specific performance goal is essential, including quantity, quality, timeliness, cost, and outcome. Therefore, the performance evaluation indicator is not a single index but an evaluation index system composed of multiple relevant evaluation indexes. The performance evaluation indicator is the critical factor affecting the objectivity and accuracy of evaluation results. It is necessary to establish a scientific and comprehensive performance evaluation indicator system for accurately evaluating the performance of the public sector. In recent years, deep learning (DL) technology has become popular in the context of big data. The Convolutional Neural Network (CNN) is a representative DL network, which has its own processing methods and has been applied in many fields and scenarios. Here, the Fully Convolutional Network (FCN) model based on the traditional CNN is adopted to classify the landscape elements in the landscape image, and a comparative experiment is performed to prove the model’s validity. The FCN model for landscape element recognition reported here has the advantages of a high degree of automation, fast speed, and a wide application range, which is conducive to the digitalization of landscape architecture and can facilitate the promotion of the digital landscape. Most previous studies have examined and improved the garden design concept from the perspective of actual situations or psychology.
This study intends to combine neural network technology with the garden design method and use the design concept to address garden design defects at their root. Therefore, the research results can provide a new perspective and basis for garden design criteria. Firstly, this paper explains image recognition technology theoretically, introduces some key techniques for reducing noise and removing redundant information, and presents the related concepts of image processing. Then, the operation mode of FCN is described to illustrate its advantages over the traditional CNN. Next, the segmentation technology process is introduced. Finally, the FCN model is trained and tested, and the research conclusion is drawn from the results.

2. Literature Review

Zhang et al. [7] studied wavelet analysis and applied this method to image decomposition and reconstruction, indicating that the image analysis method has made a significant breakthrough in mathematics. Setti and Wanto [8] proposed a backpropagation algorithm to predict the most significant number of Internet users globally, meaning that the research on neural networks is gradually maturing. At present, image processing technology is developing rapidly. For example, Agrawal and Jayaswal [9] used a support vector machine to learn network diagnosis and classify bearing faults.

Landscape element classification belongs to image recognition. As a classical recognition application for high-dimensional big data sets, image recognition regards the image to be tested as a high-dimensional random vector, maps the obtained data to a low-dimensional feature space through linear or nonlinear methods, and finds the low-dimensional structure hidden in the high-dimensional data. Ding et al. [10] collected 1,257 questionnaires on gardeners’ perception of social capital. The authors applied factor analysis and regression analysis in the statistical analysis stage to analyze the role and relative importance of different factors for social capital. They found that the social capital level was significantly influenced by the integration with green infrastructure, accessibility, scale, visual openness, planting form, barren landscape, agricultural infrastructure, and intelligent infrastructure. From the psychology perspective, Zhang and Li [11] identified the factors influencing residents’ sense of security and comfort in the public garden landscape and put forward some countermeasures to improve public garden landscape design. They suggested that landscape designers should focus on increasing the attraction of public landscapes and comprehensively consider the ornamental, living, and leisure functions of the landscape. These previous studies lay a theoretical foundation for improving public landscape design and residents’ quality of life.

3. Research Method

3.1. Image Recognition Technology

Human beings primarily obtain external information from images. With the development of computer science, digital image processing technology has been widely used in various fields such as industrial production, physical health, and traffic safety [12]. The information in the landscape image is input into the landscape element recognition system through computer programming to automatically recognize landscape elements [13]. The core part of digital image processing is image recognition. The specific image recognition process contains feature extraction of landscape images after preprocessing, some standards (such as Bag-of-Words models and neural networks) used to classify landscape elements on images, and completing the task of identifying landscape elements [14]. Figure 1 illustrates the general composition of the landscape image recognition system.

In Figure 1, image preprocessing aims to show image information more clearly, reduce noise, remove redundant information, and provide high-quality images for subsequent operations [15]. The function of image feature extraction is to study how to extract more efficient features from the original image and realize a precise mapping relationship between images in a particular feature vector or spatial vector. The advantages, disadvantages, and stability of feature extraction directly impact the recognition system’s performance. Therefore, choosing images with features such as concentration and robustness is essential to better adapt to changes in environment, space, and scale [16]. Classifier design and training map an image feature to a feature vector or space and then use some decision rules to classify the low-dimensional feature space to obtain the most accurate classification results [17]. The trained classifier predicts the landscape image and identifies the categories of landscape elements in the landscape image to get the recognition results [18].

3.2. Image Preprocessing

The purpose of image preprocessing is to augment the data available to the whole model during training. Data augmentation (called improving data intensity in some texts) refers to using methods or techniques that increase the number of training samples and enhance their richness [19]. Especially for DL, data augmentation is an indispensable technical means. Its advantage is that, in model training, the growth in data amount and data richness significantly reduces overfitting [20].

There are many ways to augment data, among which translation, rotation, scaling, and flipping are commonly used [21]. These four methods are used when training the network to classify landscape elements in images. The label of the input landscape image is the category of the landscape element. Augmentation does not change this label, so the order of the operations does not affect it. The actual operation order in the experiment is rotation, translation, scaling, and flipping. After augmentation, the landscape images to be fed to the model are randomized. Specifically, an experimental sample is randomly selected from the training set as the input. The following section designs the structure of the FCN reported here.
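The preprocessing order described above can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions: images are small nested lists, the rotation angle (90 degrees), the shift (one pixel), and the scale factor (2) are hypothetical values, and the function names are invented for this sketch rather than taken from the experiment.

```python
import random

def rotate90(img):
    """Rotate a 2D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def translate(img, dx, fill=0):
    """Shift every row dx >= 0 pixels to the right, padding with `fill`."""
    width = len(img[0])
    return [([fill] * dx + list(row))[:width] for row in img]

def scale2x(img):
    """Nearest-neighbour upscaling by a factor of 2 (assumed factor)."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def augment(img):
    """Apply rotation, translation, scaling, and flipping in that order."""
    return flip_horizontal(scale2x(translate(rotate90(img), dx=1)))

def random_sample(dataset, rng):
    """Randomly pick one (image, label) pair as the next training input."""
    return rng.choice(dataset)

image = [[1, 2],
         [3, 4]]
augmented = augment(image)
# The label is unchanged by augmentation, so both samples share it.
dataset = [(image, "vegetation"), (augmented, "vegetation")]
sample = random_sample(dataset, random.Random(0))
```

Each augmented image keeps the label of its source image, which is why the augmented copy can simply be appended to the training set before random sampling.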

3.3. Structure of FCN
3.3.1. Convolution

Unlike the traditional CNN, including AlexNet [22] and LeNet [23], which require input images to meet their set conditions, FCN has no strict requirements on the size of the input image. Figure 2 reveals the traditional CNN structure. The first layer to the fifth layer is the convolution layer, and the last three layers are one-dimensional vectors. The length of the sixth and seventh layers is 4,096, and the length of the eighth layer is 1,000, representing the category’s probability value.

The difference between FCN and CNN is that FCN aims to classify the individual pixels in landscape images [24]. CNN and FCN also differ in structure, mainly in the network’s last layers. In CNN, the convolutional layers are followed by fully connected layers, so a one-dimensional vector with a fixed size is finally obtained and normalized by softmax [25]. In contrast, the last layer of FCN is a convolutional layer, so the final output is an image with labels. The network structure shown in Figure 2 is modified by replacing the last three layers with convolution layers whose convolution kernels are (4096, 1, 1), (4096, 1, 1), and (1000, 1, 1), respectively, to constitute the corresponding FCN, as presented in Figure 3.

The process of replacing the last three fully connected layers with convolution layers is the convolution process referred to in this section. The structure of the fully connected layer is similar to that of the convolution layer. The difference is that each neuron in the convolution layer receives data only from a specific region of the previous layer and that the neurons in the same layer share the same parameters [26]. The similarity is that both layers compute dot products and therefore have the same functional form. In this way, fully connected and convolution layers can be converted into each other, and a corresponding fully connected layer can be found for each convolution layer. In practical applications, however, converting the fully connected layer into a convolution layer is the more useful direction: applying the converted convolution layer to the image is equivalent to applying the fully connected layer at every position, which makes the calculation more convenient.
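The equivalence between the two layer types can be checked on a toy case. The sketch below is an illustration only, with hypothetical weights and a single channel: a fully connected unit applied to a flattened 2 × 2 patch gives the same number as a 2 × 2 convolution kernel holding the same weights, and the same kernel can then slide over a larger input to produce a map of scores.

```python
def fully_connected(x_flat, weights):
    """Dot product of a flattened input with one weight vector per output unit."""
    return [sum(w * x for w, x in zip(row, x_flat)) for row in weights]

def conv2d_valid(image, kernel):
    """Single-channel 'valid' convolution (cross-correlation, as in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

patch = [[1, 2],
         [3, 4]]
w = [0.5, -1.0, 2.0, 1.0]          # hypothetical weights of one FC unit

# Fully connected view: flatten the patch and take one dot product.
fc_out = fully_connected([p for row in patch for p in row], [w])[0]

# Convolution view: the same weights reshaped into a 2 x 2 kernel.
kernel = [w[0:2], w[2:4]]
conv_out = conv2d_valid(patch, kernel)

# On a larger input the kernel yields one score per position instead of
# a single number, which is what removes the fixed input size.
large = [[1, 2, 0],
         [3, 4, 1],
         [0, 1, 2]]
score_map = conv2d_valid(large, kernel)
```

The top-left entry of `score_map` equals `fc_out`, because that position covers exactly the original patch.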

3.3.2. Upsampling

Upsampling is also called deconvolution. Deconvolution is calculated by addition and multiplication, the same as the convolution network. The difference is that deconvolution derives multiple pixels from one pixel, while the convolution network is the opposite [27]. The forward propagation and backward propagation of deconvolution are opposite to the propagation direction of the convolution network, but they have the same optimization method [28]. Figure 4 displays the comparison between the convolution network and the deconvolution network.
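A minimal sketch of how deconvolution (transposed convolution) derives several output pixels from one input pixel, using only additions and multiplications. The kernel values and the stride are assumptions chosen for readability; in a trained network the kernel would be learned.

```python
def transposed_conv2d(feature, kernel, stride):
    """Each input pixel scatters a scaled copy of the kernel into the output."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(feature), len(feature[0])
    out_h = (h - 1) * stride + kh
    out_w = (w - 1) * stride + kw
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(h):
        for c in range(w):
            for i in range(kh):
                for j in range(kw):
                    # Overlapping contributions are summed.
                    out[r * stride + i][c * stride + j] += feature[r][c] * kernel[i][j]
    return out

feature = [[1.0, 2.0],
           [3.0, 4.0]]
kernel = [[1.0, 1.0],
          [1.0, 1.0]]            # assumed all-ones kernel for illustration
upsampled = transposed_conv2d(feature, kernel, stride=2)
```

With a 2 × 2 kernel and stride 2 the copies do not overlap, so every input pixel becomes a 2 × 2 block and the 2 × 2 feature map grows to 4 × 4.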

As mentioned above, FCN does not impose restrictions on the size of the input image. However, after several convolution and pooling operations in the network, the landscape image shrinks and its resolution gradually decreases. For example, suppose the size of the input image is h × w. After several convolution and pooling operations, an image with the size of h/32 × w/32 may be generated, which is also the smallest layer. This layer is called the heatmap (thermal image), a high-dimensional feature map [29]. Figure 5 shows the convolution process of FCN.
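The size arithmetic can be made concrete with a short sketch. It assumes that each of five pooling stages halves the height and width (with integer division) and uses a hypothetical 512 × 384 input; neither value is stated in the text.

```python
def feature_map_sizes(h, w, num_pools=5, factor=2):
    """Return the (height, width) of the feature map after each pooling stage."""
    sizes = []
    for _ in range(num_pools):
        h, w = h // factor, w // factor
        sizes.append((h, w))
    return sizes

# Hypothetical input size chosen so that every division is exact.
sizes = feature_map_sizes(512, 384)
for stage, (fh, fw) in enumerate(sizes, start=1):
    print(f"after pooling {stage}: {fh} x {fw}")
```

Under these assumptions the third, fourth, and fifth stages correspond to 1/8, 1/16, and 1/32 of the input size.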

Then, FCN adopts upsampling, namely, deconvolution, to enlarge the feature map step by step to the same size as the original image and thereby classify the pixels in the landscape image. The final output is the upsampled image, in which a prediction can be made for each pixel. For each pixel position, the method takes the maximum value at that position across all the obtained score images. The class with the maximum value at this position is taken as the category of the pixel, and the value serves as its probability.
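The per-pixel decision can be sketched as an argmax over the class score maps. The scores below are invented for illustration, and the class order (waterscape, landscape, vegetation) is an assumption made for this example.

```python
def predict_labels(score_maps):
    """score_maps[k][r][c] is the score of class k at pixel (r, c).

    Returns a label map holding, for every pixel, the index of the class
    with the maximum score at that position.
    """
    num_classes = len(score_maps)
    h, w = len(score_maps[0]), len(score_maps[0][0])
    return [[max(range(num_classes), key=lambda k: score_maps[k][r][c])
             for c in range(w)]
            for r in range(h)]

classes = ["waterscape", "landscape", "vegetation"]   # assumed class order
scores = [
    [[0.7, 0.2], [0.1, 0.3]],   # waterscape scores
    [[0.2, 0.5], [0.1, 0.3]],   # landscape scores
    [[0.1, 0.3], [0.8, 0.4]],   # vegetation scores
]
label_map = predict_labels(scores)
named = [[classes[k] for k in row] for row in label_map]
```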

3.3.3. Jumping Structure

The semantic segmentation of landscape images under FCN is basically completed after the convolution and upsampling. However, a result upsampled directly from the smallest feature map has lost much of the resolution of the landscape image, which makes the final result coarse. Therefore, the segmentation results obtained in this way cannot meet the standard requirements.

As shown in Figure 6, the original input image becomes very small after the convolution and pooling operations. For example, after the third convolution and pooling stage, the image is 1/8 of the original size. After the fourth stage, it is 1/16 of the original size. After the fifth stage, it is 1/32 of the original size. After the sixth and seventh stages, the image does not become smaller, but the number of feature maps is reduced. At this point, the image is called the heatmap (thermal image).

After the fifth convolution and pooling stage, a deconvolution operation is performed on the 1/32-size heatmap (thermal image) to restore it to the original size. However, because of its limited precision, the heatmap alone cannot accurately express the characteristics of the original image. Therefore, the forward iteration method is employed here: the convolution kernel of the fourth layer is used in the deconvolution to refill the details of the image, which is equivalent to an interpolation operation. The convolution kernel of the third layer can further supplement the details to restore the whole image. The above process is named the jumping structure (skip structure), which aims to optimize the semantic segmentation results [30].
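The fusion idea behind the jumping structure can be sketched as follows. This is a simplified illustration, not the trained network: nearest-neighbour upsampling stands in for the learned deconvolution, the score maps are tiny invented examples, and the function names are hypothetical.

```python
def upsample2x(score_map):
    """Nearest-neighbour 2x upsampling (stand-in for learned deconvolution)."""
    out = []
    for row in score_map:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def add_maps(a, b):
    """Element-wise sum of two score maps of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def fuse_fcn8s(score32, score16, score8):
    """Fuse the 1/32 scores with the 1/16 scores, then with the 1/8 scores."""
    fused16 = add_maps(upsample2x(score32), score16)
    return add_maps(upsample2x(fused16), score8)

score32 = [[1.0]]                          # coarsest prediction (1/32 scale)
score16 = [[0.0, 1.0],
           [1.0, 0.0]]                     # detail from the fourth stage
score8 = [[0.5] * 4 for _ in range(4)]     # detail from the third stage
fused8 = fuse_fcn8s(score32, score16, score8)
```

In this sketch the fused 1/8-scale map would still need one final 8x upsampling to reach the input size; stopping the fusion after `fused16` corresponds to the FCN-16s variant, and using `score32` alone corresponds to FCN-32s.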

3.4. Process Framework of the Segmentation Technology

Figure 7 shows the process framework of the FCN-based semantic segmentation model for landscape elements in landscape images, whose functions are tested in the experiments.

From Figure 7, the core part of the simple model is the FCN of DL; it is also the core part of the algorithm reported here. Firstly, the model directly reads the landscape image and then preprocesses the image. Finally, it classifies the landscape elements in the image through FCN and outputs semantic segmentation results. The image reading steps are implemented using Python’s third-party library function [31].

3.5. Training of the Semantic Segmentation Model Based on FCN
3.5.1. Pretraining

Because the FCN is a complex DL model, it takes a long time to train. Moreover, FCN has deep network layers. On the one hand, in specific experiments, the model may remain near a local optimum rather than reach the optimal solution, which may affect the experimental results. On the other hand, the whole model may take a long time to converge.

Therefore, in the concrete training of the FCN model, the parameters of a model that has already converged well are usually used to initialize the parameters of the new model [32]. The pretraining method has been widely used in deep belief networks and autoencoder networks. Firstly, relatively few training samples are used to train the parameters of each layer in the network. Then, these parameters are used to initialize the training model. Finally, the formal model is trained. Random initialization may lead to a local minimum rather than the global minimum [33]. In contrast, pretraining helps the model obtain better performance, which is why existing CNN models adopt it. Inspired by pretraining, the two-stage training method is also used to train the model here.

3.5.2. Two-Stage Training of the Model

For the two-stage training of the model, a pretrained model is used to initialize the model that needs to be trained, so that the parameters of the convolution layers are shared and the result improves on that of random initialization alone. The optimization steps for training in the experiment are as follows.

Step 1: 500 simple images are manually selected, and the model is first trained on these 500 images. After the model converges, its parameters are saved. Because the selected images are simple, the model converges quickly.

Step 2: the second training of the model is executed on the whole training set. The parameters obtained in Step 1 are used to initialize the model parameters, and all network weights and parameters are then updated. This step significantly reduces the model’s training time, improves the model’s performance, and speeds up convergence. Finally, an improved model is obtained.
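The parameter handoff between the two steps can be illustrated with a deliberately tiny stand-in model. The sketch below is not the FCN: it fits a one-parameter linear model by gradient descent on synthetic data, and the subset size, learning rate, and step counts are all assumptions. It only shows the mechanics of training on a small simple subset, saving the parameter, and reusing it to initialize training on the full set.

```python
def mse_loss(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(w, data, steps, lr=0.01):
    """Plain gradient descent on the mean squared error."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# Synthetic data with true parameter 3.0 (illustrative only).
full_set = [(float(x), 3.0 * x) for x in range(1, 9)]
simple_subset = full_set[:2]          # stands in for the 500 simple images

# Step 1: train on the simple subset and keep the resulting parameter.
w_stage1 = train(0.0, simple_subset, steps=50)

# Step 2: initialize from Step 1 and train on the full training set.
w_two_stage = train(w_stage1, full_set, steps=5)

# Baseline: the same number of full-set steps from a fixed initial value.
w_scratch = train(0.0, full_set, steps=5)

loss_two_stage = mse_loss(w_two_stage, full_set)
loss_scratch = mse_loss(w_scratch, full_set)
```

In this toy setting the warm-started run ends with a lower loss than the baseline after the same number of full-set steps, which mirrors the qualitative claim above without saying anything about the size of the effect on the real model.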

After training, these models will generate multiple Caffemodel files. These model files represent the parameters of the FCN model saved under different iterations. Ultimately, these models are used to segment the landscape elements in the landscape image to gain the final result map. Figure 8 denotes the two-stage training process of the model.

3.6. Experimental Data Source

In the first experimental stage, 500 training set images are selected as the pretraining data set, following the two-stage training process in Figure 8. Then, the second-stage training is performed. The loss and iteration behavior of the model are evaluated by comparing the results obtained with and without two-stage training.

In the second experimental stage, landscape images dominated by a single scene element are used to train the model, so that the performance of the three upsampling structures can be compared across different landscape element classes during training. The landscape element that forms the central scene of each landscape image must occupy 60% of the image, and 100 pictures are prepared for each scene element. In the experiment, the garden landscape images are divided into three categories: waterscape, landscape, and vegetation.
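The selection rule can be expressed as a simple check. The sketch assumes that a per-pixel label mask is available for each candidate image and interprets the 60% requirement as a lower bound; the text does not state how the occupied area was measured, so both points are assumptions.

```python
def element_fraction(label_map, element):
    """Fraction of pixels in the label map that belong to `element`."""
    total = sum(len(row) for row in label_map)
    hits = sum(row.count(element) for row in label_map)
    return hits / total

def is_main_scene(label_map, element, threshold=0.6):
    """True if `element` covers at least `threshold` of the image (assumed rule)."""
    return element_fraction(label_map, element) >= threshold

# Hypothetical 2 x 5 label mask: 7 of 10 pixels are waterscape.
mask = [
    ["waterscape", "waterscape", "waterscape", "waterscape", "vegetation"],
    ["waterscape", "waterscape", "waterscape", "vegetation", "vegetation"],
]
keep_as_waterscape = is_main_scene(mask, "waterscape")
keep_as_vegetation = is_main_scene(mask, "vegetation")
```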

4. Experimental Results of Two-Stage Training and Different Upsampling Methods

4.1. Results of Two-Stage Training of the Model

After repeated experiments, the learning rate of the weight parameters is set to 10⁻¹⁰, and the weight decay coefficient is set to 0.005. Figure 8 displays the model after the second-stage training. The models with and without two-stage training are compared. In two-stage training, 500 images of the training set are selected as the pretraining data set, and then the second stage of training is carried out. Figure 9 illustrates the relationship between loss and iteration for the model with and without two-stage training.

In Figure 9, the vertical axis represents the value of the loss function, which measures the probability that the test data belong to a specific category; the faster the loss falls, the faster the network converges. From Figure 9, the loss values of the models with and without two-stage training both decrease rapidly between 300 and 500 iterations. After 500 iterations, the loss of the model with two-stage training reaches 0.43 at 700 iterations, whereas the loss of the model without two-stage training only reaches 0.47. The model with two-stage training therefore converges faster than the model without it, indicating a better effect. As the iterations continue, both models eventually reach a relatively stable state. The loss of the model with two-stage training stabilizes at about 0.41, while the loss of the model without two-stage training stabilizes at about 0.45. This result indicates that the loss curve of the model with two-stage training gradually flattens at a lower value. The experimental results show that the two-stage training method can effectively accelerate convergence in the training of the landscape image semantic segmentation model, and the data support the feasibility of the method. Therefore, the two-stage training method is adopted to train the model in the subsequent experiments.

4.2. Experimental Results of Three Kinds of Upsampling on Images

From Figure 10, on the three evaluation metrics, the FCN-8s network achieves higher pixel accuracy, average accuracy, and average IU than the other two networks. The sampling accuracy of pixels in waterscape images reaches 92%, the average accuracy reaches 89%, and the average IU attains 75%. The pixel sampling accuracy of FCN-16s in landscape images is 90%, the average accuracy is 89%, and the average IU is 76%. The pixel sampling accuracy of FCN-32s in vegetation images is 94%, the average accuracy is 92%, and the average IU is 76%. The accuracy of the other two neural networks is relatively lower. Figures 10-12 compare FCN-8s, FCN-16s, and FCN-32s in terms of pixel accuracy, average accuracy, and average IU value. Among these three upsampling structures, the FCN-8s structure achieves the highest pixel accuracy, average accuracy, and average IU value. The pixel accuracy is more than 90%, the average accuracy is more than 88%, and the average IU value is more than 75%. This model has good adaptability and high accuracy for various landscape elements in landscape images. Its average accuracy is lower than its pixel accuracy because the average accuracy is calculated by combining the information of all categories in the image. At the same time, too much test set data can also lead to low average accuracy and average IU value.
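The three metrics can be computed from a confusion matrix. The sketch below assumes the definitions commonly used for semantic segmentation (pixel accuracy over all pixels, per-class accuracy averaged over classes, and per-class intersection over union averaged over classes); the text does not spell out its formulas, so these definitions and the tiny example masks are assumptions. It also assumes that every class occurs in the ground truth.

```python
def confusion_matrix(truth, pred, num_classes):
    """n[i][j] counts pixels of true class i that were predicted as class j."""
    n = [[0] * num_classes for _ in range(num_classes)]
    for t_row, p_row in zip(truth, pred):
        for t, p in zip(t_row, p_row):
            n[t][p] += 1
    return n

def segmentation_metrics(n):
    """Return (pixel accuracy, mean accuracy, mean IU) from a confusion matrix."""
    k = len(n)
    total = sum(sum(row) for row in n)
    correct = [n[i][i] for i in range(k)]
    true_count = [sum(n[i]) for i in range(k)]
    pred_count = [sum(n[j][i] for j in range(k)) for i in range(k)]
    pixel_acc = sum(correct) / total
    mean_acc = sum(c / t for c, t in zip(correct, true_count)) / k
    mean_iu = sum(c / (t + p - c)
                  for c, t, p in zip(correct, true_count, pred_count)) / k
    return pixel_acc, mean_acc, mean_iu

# Tiny two-class example: class 1 is small and half of it is missed.
truth = [[0, 0, 0, 0],
         [0, 0, 1, 1]]
pred = [[0, 0, 0, 0],
        [0, 0, 1, 0]]
pixel_acc, mean_acc, mean_iu = segmentation_metrics(
    confusion_matrix(truth, pred, num_classes=2))
```

In this example the pixel accuracy is 0.875 while the mean accuracy is 0.75: a class that covers few pixels counts as much as a large one in the class average, which is one way the average accuracy ends up below the pixel accuracy.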

Figures 10-12 show that the FCN-8s structure performs better than the other two upsampling structures. The disadvantage of the final output image of FCN-32s is that the edge segmentation is poor and some detailed information is missing, so fine details are not processed well. Therefore, the jump layer method is adopted to reduce the step size of the shallow upsampling and fuse its results with the results obtained at a higher level. Then, the upsampled output is produced. Finally, deconvolution is used to complete the semantic segmentation of the whole landscape image. It can be seen that the FCN-16s structure improves significantly on FCN-32s, and the FCN-8s structure processes images in more detail than FCN-16s. Therefore, the FCN-8s structure is selected as the optimal upsampling structure for classifying landscape elements in landscape images.

5. Discussion

Given the low efficiency and high cost of manual processing of landscape images, this paper uses the FCN technology under DL to construct an element classification model for landscape images. Besides, semantic segmentation technology is adopted to build and train the model, which verifies the effectiveness of the FCN method. Two-stage training is carried out on the FCN model to reduce training time. Three upsampling structures, namely, FCN-32s, FCN-16s, and FCN-8s, are used in a comparative experiment to verify the effect of the whole model on image processing. Finally, the FCN-8s upsampling structure with the best pixel accuracy, average accuracy, and average IU value is selected, and these three values reach 90%, 88%, and 75%, respectively. At the same time, the pixel accuracy of each object is more than 86%, which shows that the model is very suitable for the classification of landscape elements in garden landscape images. Yang [34] analyzed the application of computer simulation in urban landscape design and value analysis and established computer-aided design of green space. The author provided suggestions on the practical application of technical measures to save square: innovation of space design and application of newly constructed wetland system and garden rainwater in the design of regulation and storage system. The relationship between the design of the new square rainwater storage system and the urban landscape environment is consistent with the research results reported here, which can effectively optimize the concept of garden design [35].

6. Conclusion

The pixel classification of landscape images is only a small step in landscape architecture evaluation. After classifying the landscape elements in a landscape image, this method can evaluate the weight of the different elements and ultimately complete an intelligent evaluation of the whole landscape image, which makes it worth applying and promoting. This scientific approach improves the objectivity, rigor, and persuasiveness of landscape evaluation results and promotes future landscape evaluation. This paper achieves satisfactory results in the semantic segmentation of landscape elements and realizes a semantic segmentation model based on FCN. For landscape images containing multiple landscape element categories, this method has the advantages of high accuracy, low time consumption, and good robustness. It can also process multiple complex landscapes simultaneously, dramatically reducing image processing time and providing technical support for the automatic classification of landscape images. The research also lays a good foundation for future landscape evaluation based on intelligent processing of landscape images. However, there are still some problems. The model is still not accurate enough in detail processing. Some element categories in the garden landscape image occupy a small area. In this case, the model may ignore them, resulting in incorrect classification. The follow-up study will try different deconvolution algorithms to extract more accurate details.

Data Availability

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors. Informed consent was obtained from all individual participants included in the study.

Conflicts of Interest

All authors declare that they have no conflicts of interest.

Authors’ Contributions

All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

Acknowledgments

The authors acknowledge the help from the university colleagues.