Abstract
Aboveground biomass (AGB) models based on field-measured and remotely sensed data can help to understand and monitor ecosystems and to evaluate the impacts of human activity. To create improved forest AGB models for an ecological rehabilitation area of Huainan, China, a suite of methods was used to evaluate a combination of synthetic aperture radar (SAR) data from the Chinese GaoFen-3 (GF-3) satellite and vegetation indices derived from the WorldView-3 satellite. Using vegetation indices and radar backscatter coefficients, six modelling methods (multivariable linear regression and linear, exponential, power, logarithmic, and growth functions) were applied to generate three AGB models. The results indicate that the root mean square errors (RMSEs) of the best models, namely the exponential functions based on the normalized difference vegetation index (NDVI) and on the HV backscatter coefficient and the multivariable linear regression combining the two, were 43.74 Mg/ha, 30.87 Mg/ha, and 26.72 Mg/ha, respectively. The best model overall was the multivariable linear regression with combined SAR and NDVI data (R2 = 0.861). The RMSEs were lowest for mixed forest, moderate for coniferous forest, and highest for broad-leaved forest. The results indicate that a combination of optical and microwave remote-sensing images can effectively improve AGB estimation accuracy.
1. Introduction
Wetlands are important ecosystems that exist between terrestrial and aquatic systems [1, 2]. They provide many ecosystem services, such as regulation of the hydrological cycle, maintenance of water quality, conservation of biological diversity, and other natural and socially beneficial services. As an important part of urban ecosystems, urban wetlands can improve the urban climate, enhance environmental quality, increase biodiversity, and conserve water [3–5]. However, they are strongly disturbed by human activities, such as excessive urban sprawl and air and water pollution [6], particularly in developing countries, where basic human needs have not been met. Aboveground biomass (AGB) is a parameter that can be used to evaluate the patterns, processes, and dynamics of carbon cycling in ecosystems at local, regional, and global scales. Therefore, AGB estimation at the regional and global scales plays an important role in promoting international wetland protection and other initiatives under the framework of the Convention on Biological Diversity [7]. To assess the value of wetlands in providing ecosystem services, it is essential to build a model that can estimate AGB. Such a model can also help to determine the status of an ecosystem and inform scientific forest management.
In the development of new approaches to estimate AGB, efficiency and cost are critical factors. Line transect and distance sampling methods are commonly used; these build statistical models based on field observations of tree height and diameter at breast height (DBH) [8]. As an alternative to gathering the large number of samples required for interpolation at the regional scale, remote-sensing data have been increasingly used to estimate wetland AGB in recent decades [9, 10]. Biomass estimation and long-term biomass monitoring can be achieved using optical remote sensing, which records signals reflected from the forest canopy to extract vegetation parameters that respond significantly to biomass [11, 12].
Various vegetation indices derived from optical satellite images have been used to determine vegetation chlorophyll and canopy structure parameters [13–17]. However, limitations remain due to cloud coverage and the structural heterogeneity of the vegetation canopy [18]. Moreover, because optical remote sensing cannot resolve the vertical distribution of vegetation and its signal saturates, the sensitivity of vegetation indices to biomass change is low. Another effective way to estimate AGB is to apply light detection and ranging (LiDAR). Mounted on an aircraft, a LiDAR sensor can provide detailed vegetation structure measurements as a point cloud, providing accurate AGB estimates without saturation at high biomass levels [19, 20]. However, it cannot satisfy the needs of large-scale applications due to its low efficiency and high cost.
Synthetic aperture radar (SAR) is an active remote-sensing technology that has developed rapidly in recent years. It has a multipolarization ability that can describe the scattering mechanism of vegetation. Many studies have shown that SAR data provide unique and valuable information on wetland biophysical parameters by exploiting the particular sensitivity of radar backscatter signals to AGB [21–23]. The primary SAR-based AGB estimation method commonly involves a regression analysis between polarized microwave backscatter data and ground data obtained from field plots. However, a few problems in estimating AGB using the relationship between observed biomass and SAR backscattering remain because of saturation at high biomass levels and sensitivity to soil conditions [24].
To overcome the drawbacks associated with using a single data type, and to assess the suitability of data from the new Chinese GF-3 satellite for AGB estimation, biomass estimates based on a combination of SAR and optical data have begun to be implemented. A few studies have also explored the potential of combining SAR, optical, and/or LiDAR data for estimating and mapping wetland ecosystems [25–29]. The accuracy increases when the model is further improved using several datasets from different sensors simultaneously [30–33]. With the development of high-resolution optical remote-sensing research and applications, the combination of SAR and high-resolution optical images has been used for AGB estimation [31, 34].
China's wetlands covered an area of approximately 66 million hectares in 2016, which places the country among the world's leaders in wetland area [35]. Developing improved models to accurately estimate wetland biomass has become a focus of research. Thus, an enhanced approach to biomass estimation in urban wetlands was developed by combining high-spatial-resolution data from the WorldView-3 satellite with SAR data from the new Chinese GF-3 satellite. It was applied to the Datong Wetland, which is a typical area of postmining ecological restoration. The objectives of the study were as follows:
(1) To apply improved forest AGB estimation models to an ecological rehabilitation area using remote-sensing-derived data and ground-measured data
(2) To evaluate the model's mathematical performance by comparison with models based on other tested variables
(3) To evaluate biomass distributions based on tree species and to provide a reference for ecological restoration assessment
2. Materials and Methods
2.1. Materials
2.1.1. Study Area
The study area was Datong National Wetland Park, which is located to the east of Huainan City in central Anhui Province, China (32° 36’–32° 37’ N, 117° 01’–117° 03’ E; Figure 1). The park covers an area of approximately 14.85 ha and is a constructed wetland and scenic site. Datong wetland was built on an abandoned coal mine as an ecological restoration project between 2004 and 2010 [36]. The elevation ranges from 30 m to 200 m, and the annual precipitation ranges from 471.9 to 1428.3 mm. The annual mean temperature is about 15.3°C, with the highest temperature reaching 41.2°C in summer and the lowest reaching −22.2°C in winter. The main forest communities include Metasequoia glyptostroboides, Populus adenopoda Maxim, and Quercus acutissima Caruth.

2.1.2. Field Sampling
To build a model based on the relationship between field-measured and remote-sensing data and to test its performance, a field study was required. Hence, a total of 60 plots with dimensions of 10 m × 10 m were established in September 2016 and May 2017 (Figure 2).

The samples were distributed evenly across the region, taking into account forest types, topographic features, and transportation accessibility. For each sample, we recorded the geographic coordinates and, for trees with DBH > 5 cm, the DBH, height, volume, and species in September 2015 and May 2016. Each tree was tagged and assessed as either alive or dead. The central points of all 60 samples were measured by a global positioning system (GPS; Garmin MAP 62CS; accuracy: ±3 m); 70% of the samples were used to build the models and 30% were used for accuracy assessment. According to appearance, the forests were classified as coniferous forest (C), broad-leaved forest (B), or coniferous and deciduous broad-leaved mixed forest (M). Then, we calculated the biomass indexes on the basis of the observed data, including average DBH, tree height, and stem density in each sample (Table 1).
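The 70%/30% division of the 60 samples into modelling and verification sets can be sketched as follows. The text does not state how the samples were assigned, so the random shuffle, the seed, and the function name `split_plots` are assumptions made only for illustration.

```python
import numpy as np

def split_plots(plot_ids, train_fraction=0.7, seed=0):
    """Randomly split plot identifiers into modelling and verification sets.

    Assumption: a simple random split. The study may instead have
    stratified by forest type or assigned plots in some other way.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(plot_ids))
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]

# 60 plots numbered 1..60 (hypothetical identifiers)
train_ids, test_ids = split_plots(range(1, 61))
```

With 60 plots this gives 42 plots for model building and 18 for accuracy assessment.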
On the basis of the DBH and tree height of individuals in the sample, AGB was calculated using the continuous biomass expansion factor (BEF) method [37–40]. Using the recorded DBH and tree height, we calculated the volumes of all individual trees with a volume table system and summed the values for each sample. Then, a series of mathematical models were built by regression analysis to describe the relationship between biomass (B) and total volume (V). The model can be expressed as
$$B = aV + b,$$
where B and V are the stand biomass and stand volume of the sample, respectively, and a and b are constants that have been tabulated for specific forest types.
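As a minimal sketch of the continuous BEF calculation, the snippet below sums individual tree volumes per sample, scales them to a per-hectare stand volume, and applies the linear biomass–volume relation B = aV + b. The coefficient values, the dictionary `BEF_COEFFS`, and the function name are hypothetical placeholders; real values of a and b must be taken from the published tables for each forest type.

```python
# Hypothetical coefficients (a, b) per forest type, for illustration only.
# They are NOT the values used in the study.
BEF_COEFFS = {"C": (0.5, 30.0), "B": (0.6, 20.0), "M": (0.8, 10.0)}

def stand_biomass(tree_volumes_m3, forest_type, plot_area_ha):
    """Stand biomass (Mg/ha) from individual tree volumes (m^3) in one plot."""
    a, b = BEF_COEFFS[forest_type]
    stand_volume = sum(tree_volumes_m3) / plot_area_ha  # m^3/ha
    return a * stand_volume + b                         # B = aV + b

# A 10 m x 10 m plot (0.01 ha) holding two trees of 0.5 m^3 each
example_agb = stand_biomass([0.5, 0.5], "C", 0.01)
```

The units (m^3/ha for V, Mg/ha for B) are an assumption consistent with the Mg/ha values reported elsewhere in the text.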
2.1.3. SAR Data
The SAR data were sourced from the Gaofen-3 satellite, which is China’s first sun-synchronous C-band multipolarization SAR satellite. It has a maximum spatial resolution of 1 m [41]. It has 12 imaging modes, including focusing beam, fully polarized stripe, and wave mode, which is more than any other SAR satellite (Table 2). The data used were dual-polarization Gaofen-3 data acquired on 28 December 2017, consisting of single-look complex (SLC) HH and HV images with a spatial resolution of 5 m in slant range and incidence angles between 20° and 50°.
2.1.4. Multispectral Images
Multispectral images were captured by the WorldView-3 satellite on 1 May 2016 during good weather and clear skies. In addition to panchromatic images with 0.31 m resolution and 8-band multispectral images, WorldView-3 also provides 8-band short-wave infrared images with 1.24 m resolution and 12 CAVIS (clouds, aerosols, vapour, ice, and snow) band images [42]. In this study, a WorldView-3 image in UTM projection with the WGS 84 coordinate system was used to derive vegetation indices and, furthermore, to model AGB with observed forest biomass data. The image consists of 4 bands (red, green, blue, and near infrared), and its size is 495 lines × 415 pixels at nadir, with 16-bit data.
2.2. Methodology
Figure 3 provides an overview of the methodology, including data preprocessing, parameter extraction, and model building.

2.2.1. Data Preprocessing
The first step of the research was to preprocess the WorldView-3 and GF-3 images. Preprocessing of GF-3 data was implemented using PIE-SAR software. The GF-3 images were first multilooked, using factors of 1 and 3, to reduce speckle and generate square pixels. Then, they were speckle-filtered using the refined Lee filter. Geometry-induced distortions, which are caused by interactions between the terrain and the SAR imaging parameters, were removed using ASTER digital elevation model (DEM) data. Sloped terrain can disperse the radar backscatter [24]. Therefore, it is essential to correct this through radiometric calibration so that the image represents the backscattering characteristics of the ground target.
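The multilooking step can be illustrated with a small NumPy sketch that converts an SLC array to intensity and averages it over non-overlapping blocks. This is not the PIE-SAR implementation; the function name `multilook` and the assignment of the factor 3 to azimuth and 1 to range are assumptions, since the text does not say which factor applies to which direction.

```python
import numpy as np

def multilook(slc, looks_azimuth, looks_range):
    """Average SLC intensity over blocks of looks_azimuth x looks_range pixels."""
    intensity = np.abs(slc) ** 2
    # Trim so that the image divides evenly into blocks
    rows = (intensity.shape[0] // looks_azimuth) * looks_azimuth
    cols = (intensity.shape[1] // looks_range) * looks_range
    trimmed = intensity[:rows, :cols]
    blocks = trimmed.reshape(rows // looks_azimuth, looks_azimuth,
                             cols // looks_range, looks_range)
    return blocks.mean(axis=(1, 3))

# Synthetic 6 x 4 complex image with constant value 1 + 1j (intensity 2)
slc_demo = np.full((6, 4), 1 + 1j)
ml_demo = multilook(slc_demo, looks_azimuth=3, looks_range=1)
```

Averaging in the intensity domain, rather than on amplitude or decibel values, is the usual choice because speckle statistics are defined on intensity.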
The first step for preprocessing the WorldView-3 images was the implementation of geometric correction by using ground control points obtained from topographic maps. Then, radiometric correction, which can change the digital number (DN) values to the apparent reflectance, was conducted with the 6S model. In order to make the spatial resolution of WorldView-3 and GF-3 uniform, WorldView-3 was resampled before parameter extraction to a pixel size of 8 m × 8 m.
2.2.2. Parameter Extraction and Principal Component Analysis
To improve the AGB estimation models, vegetation indexes and their principal components obtained by principal component analysis (PCA) were selected. These are commonly used to characterize vegetation and distinguish between tree canopy and cultivated land/grassland [43–46]. These vegetation indexes include the normalized difference vegetation index (NDVI), the relative vigour index (RVI), the difference vegetation index (DVI), and the renormalized difference vegetation index (RDVI) [47]. Their equations are
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - R}{\mathrm{NIR} + R}, \qquad \mathrm{RVI} = \frac{\mathrm{NIR}}{R},$$
$$\mathrm{DVI} = \mathrm{NIR} - R, \qquad \mathrm{RDVI} = \frac{\mathrm{NIR} - R}{\sqrt{\mathrm{NIR} + R}},$$
where R and NIR are the reflectances of the red and near-infrared bands, respectively.
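A short sketch of how the four indices might be computed from red and near-infrared reflectance arrays is given below, using their standard definitions (NDVI = (NIR − R)/(NIR + R), RVI = NIR/R, DVI = NIR − R, RDVI = (NIR − R)/√(NIR + R)). The function name and the handling of zero denominators are assumptions for illustration.

```python
import numpy as np

def vegetation_indices(red, nir, eps=1e-12):
    """Return NDVI, RVI, DVI, and RDVI from red and NIR reflectance."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    diff = nir - red
    total = nir + red
    # Mask pixels whose denominators are (near) zero instead of dividing by zero
    safe_total = np.where(total > eps, total, np.nan)
    safe_red = np.where(red > eps, red, np.nan)
    return {
        "NDVI": diff / safe_total,
        "RVI": nir / safe_red,
        "DVI": diff,
        "RDVI": diff / np.sqrt(safe_total),
    }

# Two example pixels: dense vegetation and sparse vegetation
vi_demo = vegetation_indices(red=[0.1, 0.2], nir=[0.5, 0.3])
```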
Since several vegetation indexes were calculated, PCA was used to reduce the dimensionality and simplify the data structure. We regard these vegetation indexes as p random variables, which are denoted as X1, X2, …, Xp. The PCA transforms these vegetation indexes into m new indicators named F1, F2, …, Fm (m < p), which are uncorrelated with each other and retain the main information of the original indicators. Each new indicator Fi is a linear combination of the original indicators:
$$F_i = a_{i1}X_1 + a_{i2}X_2 + \cdots + a_{ip}X_p, \quad i = 1, 2, \ldots, m,$$
where a_{ij} is the transform coefficient of variable X_j in component F_i.
In the PCA results, PC1 had the largest contribution rate, indicating that it concentrates most of the characteristics of the four indicators. The index with the largest coefficient in PC1 was then selected as the modelling parameter.
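The selection of a modelling parameter from the first principal component can be sketched as below: the indices are standardized, the eigenvectors of their covariance matrix are computed, and the index with the largest absolute coefficient in PC1 is returned. Standardizing the variables and using the absolute value of the coefficient are assumptions; the text does not specify either choice.

```python
import numpy as np

def first_component(X, names):
    """Return (PC1 contribution rate, PC1 coefficients, index with largest |coefficient|)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each column
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                 # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = eigvals / eigvals.sum()
    loadings = eigvecs[:, 0]
    top = names[int(np.argmax(np.abs(loadings)))]
    return contribution[0], loadings, top

# Synthetic example: 30 samples of four correlated indices (not study data)
rng = np.random.default_rng(1)
base = rng.normal(size=30)
X_demo = np.column_stack([base + 0.1 * rng.normal(size=30) for _ in range(4)])
names_demo = ["NDVI", "RVI", "DVI", "RDVI"]
pc1_share, pc1_loadings, selected = first_component(X_demo, names_demo)
```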
2.2.3. Model Building
We constructed regression models from the optical vegetation indexes, backscattering coefficients, and observed biomass data, using six common functional forms: (1) linear function, (2) multivariable linear regression, (3) exponential function, (4) power function, (5) logarithmic function, and (6) growth function.
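Four of the single-variable forms can be fitted by ordinary least squares after a suitable transformation, as sketched below. The parameterizations (y = a + bx, y = a + b ln x, y = a·exp(bx), y = a·x^b) are common conventions assumed here; the growth function is omitted because its exact form is not given in the text. Fitting in log space is itself a simplification, and the logarithmic and power forms require x > 0, which would not hold for backscatter expressed in negative decibel values.

```python
import numpy as np

def fit_curves(x, y):
    """Fit linear, logarithmic, exponential, and power curves; return a, b, R2 for each."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # name -> (transformed predictor, transformed response, response is log-transformed)
    designs = {
        "linear": (x, y, False),
        "logarithmic": (np.log(x), y, False),
        "exponential": (x, np.log(y), True),
        "power": (np.log(x), np.log(y), True),
    }
    ss_tot = np.sum((y - y.mean()) ** 2)
    results = {}
    for name, (u, v, log_y) in designs.items():
        slope, intercept = np.polyfit(u, v, 1)
        fitted = slope * u + intercept
        pred = np.exp(fitted) if log_y else fitted
        r2 = 1.0 - np.sum((y - pred) ** 2) / ss_tot  # R2 on the original scale
        a = np.exp(intercept) if log_y else intercept
        results[name] = {"a": a, "b": slope, "r2": r2}
    return results

# Synthetic data that follow y = 10 * exp(3x) exactly (not study data)
x_demo = np.array([0.2, 0.4, 0.6, 0.8])
y_demo = 10.0 * np.exp(3.0 * x_demo)
curves_demo = fit_curves(x_demo, y_demo)
best_demo = max(curves_demo, key=lambda k: curves_demo[k]["r2"])
```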
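In addition, the absolute and relative root mean square errors (RMSEs) were calculated to assess the precision of the different models. They can be expressed as
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(B_{o,i} - B_{e,i}\right)^{2}}, \qquad \mathrm{RMSE}_{r} = \frac{\mathrm{RMSE}}{\bar{B}_{o}} \times 100\%,$$
where Bo is the observed biomass, Be is the estimated biomass, \bar{B}_o is the mean observed biomass, and n is the number of samples.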
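The two error measures can be computed as in the following sketch. Normalizing the relative RMSE by the mean observed biomass is a common convention and is assumed here; the function names are illustrative.

```python
import numpy as np

def rmse(observed, estimated):
    """Absolute root mean square error, in the units of the inputs (e.g. Mg/ha)."""
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return float(np.sqrt(np.mean((observed - estimated) ** 2)))

def relative_rmse(observed, estimated):
    """RMSE as a percentage of the mean observed value (assumed normalization)."""
    return 100.0 * rmse(observed, estimated) / float(np.mean(observed))

# Two hypothetical plots with observed and estimated AGB in Mg/ha
rmse_demo = rmse([100.0, 200.0], [110.0, 190.0])
rel_demo = relative_rmse([100.0, 200.0], [110.0, 190.0])
```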
3. Results
3.1. Vegetation Index-Based Model
The variables of the models include each type of derived vegetation index. All the models were summarized, and their precision was assessed by R2-values (coefficient of determination) and F-tests (Table 3). Then, the best-fitting equation for each variable, together with its F-value and R2-value, was used to test the model.
Apart from NDVI, most of the variables had a poor fit and explained <53% of the variance. The AGB regression models derived from vegetation indices that met an F threshold of 7.09 at a significance level of 0.05 are shown in Table 3. The results indicate that RVI, DVI, and RDVI produced relative RMSEs of 34.80%, 31.9%, and 32.73%, respectively. The NDVI-based model was more closely related to AGB than the others. NDVI was moderately correlated with the observed biomass, with R2 of 0.76, whereas the highest correlation was observed in the NDVI-based model, which explained about 43.74% of the variance and resulted in a relative RMSE of 30.12%. To remove data redundancy, PCA was applied to the vegetation indices; it transforms the data to a new coordinate system that maximizes the variance along each axis and compresses the dimensions. The regression models fitted using other vegetation indices showed a lower correlation with the surveyed AGB than the NDVI-based model (Table 3). However, in contrast to the single-index models, the combined vegetation model using NDVI and DVI explained the variance poorly (RMSE ≥ 75%), possibly because the significant correlation between vegetation indices interferes with the linear relationships.
Thus, the best vegetation index model for AGB estimation using NDVI as a variable was the following:where AGB is predicted biomass and NDVI is the pixel value of the parameter maps.
The variance analysis showed that the NDVI-based model offered a significant enhancement in the estimates and led to a lower relative RMSE compared to the application of other single vegetation indices or combinations.
3.2. Backscatter Coefficient-Based Model
We used similar methods to build the models on the basis of the backscatter coefficient. The variables used were HV or HH polarized microwave backscatter data. The regression coefficients of these models were 0.823, 0.836, and 0.81, respectively, which can be used to explain the fitting accuracy. Table 3 indicates that the best-fitting linear correlations were observed with the HV variables extracted from GF-3 data when fitting by using the HV backscatter coefficient-based functions. The model can be expressed as follows:where AGB is predicted biomass and HV is radar backscatter coefficient data derived from Gaofen-3 data.
There was a remarkable correlation between the estimates based on the backscatter coefficient and the observed biomass; the model could explain about 89% of the variance and showed an RMSE lower than 30.87 Mg/ha (relative RMSE = 26.72%).
3.3. Combined Models Using Vegetation Indexes and the Backscatter Coefficient
Although the AGB models based on the backscatter coefficient and on the vegetation indexes each provided a fitting accuracy of >70%, combined models were built to improve the accuracy further. PCA was also performed on these parameters, and the results show that NDVI and the HV backscatter coefficient made the biggest contributions. Therefore, they were selected for multivariate regression, and one model was established to improve the estimation accuracy. The model can be expressed as follows:where AGB is predicted biomass, HV is the radar backscatter coefficient derived from Gaofen-3 data, and NDVI is the pixel value of the parameter maps.
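A multivariable linear regression of AGB on NDVI and the HV backscatter coefficient can be sketched with a least-squares solve, as below. The model form AGB = b0 + b1·NDVI + b2·HV, the function name, and the synthetic numbers are assumptions for illustration; the coefficients are not those of the study.

```python
import numpy as np

def fit_mlr(ndvi, hv, agb):
    """Least-squares fit of AGB = b0 + b1*NDVI + b2*HV; returns coefficients and R2."""
    ndvi = np.asarray(ndvi, dtype=float)
    hv = np.asarray(hv, dtype=float)
    agb = np.asarray(agb, dtype=float)
    X = np.column_stack([np.ones(len(ndvi)), ndvi, hv])  # design matrix with intercept
    coef, *_ = np.linalg.lstsq(X, agb, rcond=None)
    pred = X @ coef
    r2 = 1.0 - np.sum((agb - pred) ** 2) / np.sum((agb - agb.mean()) ** 2)
    return coef, float(r2)

# Synthetic plots generated from known coefficients (50, 100, 2); HV in dB
ndvi_demo = np.array([0.3, 0.5, 0.6, 0.8, 0.7])
hv_demo = np.array([-20.0, -18.0, -15.0, -12.0, -16.0])
agb_demo = 50.0 + 100.0 * ndvi_demo + 2.0 * hv_demo
coef_demo, r2_demo = fit_mlr(ndvi_demo, hv_demo, agb_demo)
```

Because NDVI and HV are measured on very different scales, comparing their raw coefficients says little about relative importance; standardized coefficients would be needed for that.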
As mentioned above, the model that combined the vegetation indices and backscatter coefficient provided a standardized regression coefficient of 0.861, which can explain 74% of the variance and led to a relative RMSE of 19.36%. The results show that the model combining the vegetation index with the backscatter coefficient fitted the observed data best when using the combination of NDVI and HV backscatter coefficient.
3.4. Selecting the Best Model and Mapping the AGB
The evaluation of the models (Table 3) shows the following results. For the four vegetation indices, the NDVI-based models explained approximately 68% of the variance and led to a relative RMSE of 30.12%; the RDVI model was correlated with the observed biomass with an R2 of 0.73 and relative RMSE of 32.73% (Figure 4). The other two vegetation-based models (DVI and RVI) had poor fits, with low R2-values of <0.5. The variance analysis shows that the NDVI-based model offered a significant enhancement in the predicted values compared to models using other single vegetation indices or combinations. Among all the VI-based models, the NDVI-based model showed the maximum F- and R2-values and the minimum RMSE. The AGB values estimated by these models are plotted against the observed AGB values as scatter diagrams with linear correlations in Figure 4.

Figure 4: Scatter diagrams of estimated versus observed AGB with linear correlations, panels (a)–(f).
Figure 4 also suggests that the backscatter coefficient models yielded a significant improvement in biomass estimation compared to those using vegetation index parameters. As for AGB estimation models derived from GF-3 data, the HH- and HV-based models were strongly related to the observed values, with R2-values of 0.823 and 0.836, respectively. Meanwhile, the combined model showed a lower correlation with observed AGB than the single variables; it could explain about 75% of the variance and resulted in a relative RMSE of 25.58% (Figure 4). The probable cause is that the optical sensor mainly collects surface reflections, while SAR backscatter provides vegetation structure information and is more sensitive to biomass.
Accuracy of 0.66–0.81 was achieved using the mathematical model based on single variables derived from the GF-3 and WorldView-3 data (Table 3). Thus, it is possible to combine a few variables to obtain improved accuracy. Based on this consideration, we attempted to verify whether the combined variables could enhance the modelling precision. Regarding the combined models, using NDVI and the HV backscatter coefficients of the GF-3 data as variables provided the best correlation with observed values. The combined model was significantly correlated with the observed values with an R2 of 0.861, which made it the highest correlation with AGB and had a relative RMSE of 19.36% (Figure 4). In order to show the performance of each model, the errors between the estimated and observed values were calculated based on the verification dataset (Figure 5). The results indicate that the largest median was observed in model 3, which meant it had the most significant average error. Meanwhile, model 1 had the narrowest error range of 2.36–8.62 Mg/ha, while model 5 had the widest range (0.55–25.53 Mg/ha), which meant their error distributions were concentrated and scattered, respectively.

Finally, the model with the highest R2- and F-values was chosen as the best model derived from the aforementioned tests. Then, AGB prediction maps were generated for Datong wetland (Figure 6).

4. Discussion
4.1. Revelations and Limitations of the AGB Estimation Method
Although empirical models are limited by their reliance on experience, instability, lack of generality, and poor adaptability, they remain one of the main approaches to biomass modelling because of the difficulties in obtaining physical parameters such as aerosol optical depth and leaf area density. Accordingly, we proposed a collaborative observation approach based on WorldView-3 high-resolution optical imagery and new SAR data from the Gaofen-3 satellite. This was possible due to improvements in the ability to acquire high-resolution remote-sensing data and the expanded demand for refined observations. The proposed approach can improve the precision of biomass estimation and provide references for similar studies.
Of the four models based on different vegetation indices, the most significant correlation with measured AGB was provided by the NDVI-based model. However, although NDVI-based models showed higher accuracy than the others (lower RMSE and greater F- and R2-values), their accuracy was lower than that of the backscatter and combined models. This result suggests that NDVI saturates and its sensitivity is reduced in densely vegetated areas. In fact, the red band is strongly absorbed, particularly in high-density forests, yet the near-infrared reflectance progressively increases because of multiple scattering effects [48]. Although vegetation index-based MLR is a common approach to building models, it has some limitations. Significant correlations among vegetation indices cause interference that obscures linear relationships, and not all vegetation indices are linearly correlated with biomass. Thus, adding variables does not always improve accuracy. Furthermore, indices based on longer wavelengths correlate more significantly with biomass in dense vegetation than the standard NDVI does [49].
The application of backscattering coefficients is a common and effective way to build biomass models [50–52], as confirmed by our results. To reduce the influences of soil and surface moisture, selecting the seasons with little rainfall can help enhance data quality [53]. As for AGB regression models based on SAR data, they all performed better than the models based on vegetation indices in terms of greater R2- and F-values and lower RMSEs (Table 3 and Figure 4). Compared with the HH backscatter coefficient-based model, the regression model with the HV backscatter coefficient showed higher accuracy (higher R2- and F-values and lower RMSE). The multivariate linear regression model with the combination of HH and HV data provided greater accuracy than the model based on single-polarized data (Table 3).
Although the single variables of NDVI and HV polarized were highly correlated with field-measured biomass, the accuracy of the multivariate linear regression model could be increased to 83% by including HV polarized data in the PCA. The combined biomass model had dramatically better performance than those based on single-polarized data or vegetation indices. This improvement is mostly due to the WorldView-3 data containing surface information on tree crowns and the GaoFen-3 data containing information on forests based on backscattering from the forest structure.
The modelling accuracy was highly dependent on the amount and quality of remote-sensing and field data used. The results of the accuracy assessment show that there was uncertainty in the AGB estimation accuracy and AGB distribution, such that the estimated results do not entirely correspond with the data from the 60 field sampling sites. Thus, using more samples may improve estimation accuracy. Furthermore, this study was conducted in an area characterized by its spatial distribution. Since the study was conducted using empirical models, the high accuracy of AGB estimation obtained in this study may require further validation in other areas and in different seasons.
In future, biomass remote-sensing estimation could be further improved by adjusting the data sources and methods used. From the perspective of data sources, because LiDAR technology can accurately describe the three-dimensional structure of forests and is highly correlated with observed biophysical parameters of vegetation, such as biomass, it has profound application prospects for mapping and estimating continuous changes in regional biomass. However, since LiDAR signals are often affected by noise or interference, it is essential to enhance the point cloud when modelling based on LiDAR data [54]. From the perspective of methods, machine learning methods are an important development trend that has the potential to effectively combine data from different sensors with different modelling algorithms to improve the accuracy of biomass estimation.
4.2. Possible Influences on AGB Distribution
In this research, a total of 60 samples from broad-leaved, coniferous, and mixed forests (Table 1) were used to analyse the impacts of forest type and structure on AGB estimates. The accuracy of AGB estimation depends not only on spectral reflectance differences but also on the forest structure.
Most of the broad-leaved stands were secondary forest planted for the restoration of coal mining wasteland in the study area. The RMSEs of these stands were higher than those of other stands due to their disordered structures and lower AGBs. On the contrary, artificial coniferous forest had a regular spatial structure that reduced the random scattering from the canopy and strengthened the single reflections of SAR. Large broad-leaved trees dominate mixed forests, while coniferous trees are distributed in the understory. Because of the complex spatial structure and high canopy density of these trees, scattering was characterized by a high degree of randomness and, thus, the lowest estimated error.
5. Conclusions
This study explored the potential of using WorldView-3 and Chinese Gaofen-3 data to model AGB in an urban wetland. The main conclusions are as follows:
(1) The new data obtainable from Gaofen-3 images are valuable for estimating and mapping AGB. The correlation between the backscattering coefficient of HV polarization and biomass was higher than that of HH polarization, indicating that HV polarization is more suitable for biomass estimation.
(2) Of the simple one-factor models, the exponential model had the best fit, whether it was based on a vegetation index or the SAR backscattering coefficient.
(3) The reliability of the linear combined model based on HV and NDVI data was demonstrated by its high overall accuracy (F = 7.9, R2 = 0.861) at fine scales using small forest plots with considerable biomass densities based on Gaofen-3 and WorldView-3 data. This indicates that a combination of optical and microwave remote-sensing images can effectively improve the accuracy of AGB estimation.
Data Availability
The result data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare no conflict of interest.
Acknowledgments
This study was supported by the innovative project of National Natural Science Foundation of China, under Grant no. 41501294, Universities Natural Science Research Project of Anhui Province, under Grant no. KJ2019A0136, Anhui Provincial Returned Talents Foundation of China 2016, and Youth Fund of Anhui University of Science and Technology under Grant no. QN201939. The authors gratefully acknowledge several students of Anhui University of Science and Technology for their support in the field and the plot survey.