Abstract
This paper presented a new method for automatic plant point cloud acquisition and leaves segmentation. Quasi-dense point cloud of the plant is obtained from multiview stereo reconstruction based on surface expansion. In order to overcome the negative effects from complex natural light changes and to obtain a more accurate plant point cloud, the Adaptive Normalized Cross-Correlation algorithm is used in calculating the matching cost between two images, which is robust to radiometric factors and can reduce the fattening effect around boundaries. In the stage of segmentation for each single leaf, an improved region growing method based on fully connected conditional random field (CRF) is proposed to separate the connected leaves with similar color. The method has three steps: boundary erosion, initial segmentation, and segmentation refinement. First, the edge of each leaf point cloud is eroded to remove the connectivity between leaves. Then leaves will be initially segmented by region growing algorithm based on local surface normal and curvature. Finally an efficient CRF inference method based on mean field approximation is applied to remove small isolated regions. Experimental results show that our multiview stereo reconstruction method is robust to illumination changes and can obtain accurate color point clouds. And the improved region growing method based on CRF can effectively separate the connected leaves in obtained point cloud.
1. Introduction
In greenhouse crop cultivation, the three-dimensional structures of plants convey the real-time continuous information of the whole process of crop growth in a most direct way. Also, it is the most intuitive feature for plant phenotype analysis [1]. Therefore acquiring the 3D structure of plants by an automated, high-throughput, accurate, and rapid method not only helps to monitor crop growth, but also provides visual basis for optimizing the intelligent greenhouse environment control system; and finally, it has a significant meaning for plant phenotypic and genome-environment-phenotype synergistic analysis [2–6]. Owing to noninvasive and high-throughput characteristics, automated 3D-digitization technology has greatly promoted the accurate modeling of 3D structures of plants.
To date, there are several approaches for 3D structure reconstruction. For example, Microsoft Kinect provides depth images by utilizing coded infrared light patterns. However, Kinect has limited range and can only be used indoors. Laser Scanning can obtain highly accurate and complete 3D structure but the devices are quite expensive for agricultural usage. Therefore the passive stereo vision technology, which puts low requirements on equipment and environment [7–10], has received wide attention. Biskup et al. [8] used a binocular stereo vision system to measure the structural parameters of plant canopies.
Binocular stereo vision system needs to manually calibrate the camera and its imaging angle is relatively fixed. Hence the method is sometimes inefficient and it is difficult for it to reveal the complex 3D structure of plants. Compared to binocular stereo, the multiview stereo reconstruction based on an image sequence is used more widely because it can perform automatic calibration of the camera and generate more intact 3D structure information. In the literature, multiview stereo algorithms are roughly categorized into three classes. The first class operates by first computing a cost function on a 3D volume and then extracting a surface from this volume [11–14]. The second class is an image-space methodology that operates by fusing multiple depth maps [15–18]. The third class is surface expansion method which builds a quasi-dense point cloud from a set of matches [19–22]. The accuracy of volume-based method depends on the density of the grid, and therefore the computational and memory costs of volumetric grids are very high. The following two methods are both able to produce good results. In this paper, we propose a multiview stereo reconstruction method based on surface expansion which iteratively expands a sparse set of initial matches into a quasi-dense point cloud that represents surfaces of the scene. In order to overcome the negative effects from complex natural light changes and obtain more accurate plant point clouds, the Adaptive Normalized Cross-Correlation algorithm (ANCC) [23] is used in calculating the matching cost between points of interest in two images, which is robust to radiometric factors such as illumination direction, illuminant color, and imaging device changes. The ANCC method also can reduce the fattening effect that object boundaries are not reconstructed correctly; and this effect clearly deteriorates results of the Zero-mean Normalized Cross-Correlation (ZNCC)—the method used in comparison.
We then segment the plant leaves after getting the plant point cloud from the above multiview stereo reconstruction method. Generally, the segmentation of 3D point cloud can be roughly categorized into five classes [24]: edge-based methods, region-based methods, attributes-based methods, model-based growing methods, and graph-based methods. References [25–27] segmented a plant into two sets, one for the stems and the other for the leaves according to surface feature histograms, followed by a classification or a clustering method. Yin et al. [28] proposed an approach to accurately segment and reconstruct plant. However their approach requires manually cutting the plant leaves and scanning each leaf individually. Hétroy-Wheeler et al. [29] used spectral clustering to realize the segmentation of leaf point cloud, but it needed to manually specify the number of leaves before segmentation.
Based on the characteristics of the subject plants, an improved region growing method based on conditional random field (CRF) is proposed to segment the leaf point cloud. In a canopy point cloud, there may exist some connections between leaves, which easily disturb the segmentation. Therefore we perform some erosion operations around the leaf edge before segmentation. In the initial stage of segmentation, considering that different leaves usually have different surface normals and the points on the same leaf surface usually have similar normals, we use region growing algorithm based on local surface normal and curvature to segment leaves [30]. Usually the segmentation result contains some small isolated regions; therefore a fully connected CRF model defined with the complete set of points is built to further classify these regions to where they ought to belong. A highly efficient approximate inference algorithm based on mean field approximation is applied to the CRF model in which the pairwise edge potentials are defined by Gaussian kernel [31].
2. Multiview Stereo Reconstruction
The proposed multiview stereo approach consists of two steps: structure from motion (SFM: from images to sparse points) and dense matching (from sparse points to quasi-dense points). The SFM step is to obtain 3D sparse feature points location of the scene and camera poses (location and orientation). In the step of dense matching, we use the ANCC algorithm as a robust and accurate correspondence measure to match points of interest, which is robust to radiometric factors and can reduce the fattening effect.
2.1. Structure from Motion
An overview of the SFM process is presented in Figure 1. The first step is to find feature points in each image, where the SIFT key point detector [32] is used because of its invariance to image transformations. Other feature detectors could also be used instead. For instance, several detectors were evaluated in [33]. In addition to the key point locations themselves, SIFT provides a local descriptor for each key point. For each pair of images, key point descriptors are matched by using the approximate nearest neighbors package [34]. Then we robustly estimate a fundamental matrix for the pair using RANSAC. During each iteration of RANSAC, a candidate fundamental matrix is computed using the eight-point algorithm [35], followed by nonlinear refinement based on Levenberg–Marquardt algorithm. Afterwards, we convert the fundamental matrix to the essential matrix and then obtain camera poses [35]. At last, the 3D positions of features points can be acquired through the triangulation calculation. (The scale factor is not considered during triangulation calculation because in the following stage leaf segmentation can be performed without the knowledge of a scale factor.) [35].

After generation of the point cloud of features points between each pair of images, we should merge these point clouds because they are not in the same coordinate system, as shown in Figure 2. Finally, we get 3D positions of all the feature points in a single coordinate system with respect to all point clouds.

2.2. Dense Matching
In this stage, we expand a sparse set of initial matches into a quasi-dense point cloud representing the surfaces of the scene based on the best-first expansion strategy by Lhuillier and Quan [36], which is robust to initial seed match outliers and is efficient in terms of spatial-temporal complexity. In order to overcome the negative effects from complex natural light environments and to obtain a more accurate plant point cloud, the ANCC algorithm is utilized in the process as the matching cost to match points of interest instead of ZNCC. It is noteworthy that the ANCC method is robust to radiometric factors such as illumination direction, illuminant color, and imaging device changes. Moreover, it can reduce the fattening effect from which the Zero-mean Normalized Cross-Correlation (ZNCC) suffers. The best-first expansion strategy is presented in the table of Algorithm 1.
|
In Algorithm 1, is a difference-based confidence measure to avoid propagating too uniform areas. The definition is means the pixel grayscale value at . means the neighbors of the seed. Let , denote all neighboring pixels of pixels and , respectively. The possible matches limited by the discrete 2D disparity gradient are given asFigure 3 illustrates a case for , .

In Algorithm 1, is similarity measure to match points of interest in two images. Generally, the color image formation model at pixel can be represented as follows:where the brightness factor depends on the angle between the direction of light and the direction of surface normal at that point. The global scale factors depend on illumination color and depends on camera. The well-known similarity measure ZNCC that uses raw stereo images does not generate satisfactory result for two reasons. One is that various radiometric changes caused by , and are not taken into consideration. The other is that the object boundaries as the supporting windows from the left and right images do not appear exactly because of the view changes.
To keep the color values away from the impact of radiometric factors, we transform the nonlinear relationship between the corresponding pixels to the affine-transformed relationship that can be solved effectively by the NCC framework. First, we take logarithm to transform this nonlinear relationship into a linear one. Then, the color value can be represented by (use the red channel to describe) To delete the item , we subtract the average of the transformed color values in , , channels. Then each color value becomesWe can see that and are not dependent on , and .
In order to reduce the fattening effect caused by outliers in the matching windows, each pixel in window around the center pixel is allocated with different weights. The weight is computed by using the bilateral filter as follows:The weighted sum using the bilateral filter for the center pixel is defined by the following equation, in which is the normalizing constant:Then we subtract for each channel, thereby resulting in the removal of .
The similarity between and is defined by the following equation that is close to NCC’s framework:Because the intensity information of each channel is normalized in log-chromaticity, the discriminability is lower than the original color. We therefore combine log-chromaticity and original color for keeping the discriminability. The final similarity equation is defined as follows:where . is a balancing factor between the log-chromaticity color and the original RGB color.
To obtain the disparity map of the two images in different lighting conditions, we define stereo matching as a minimization problem of the following energy in the MAP-MRF framework:where is the neighborhood of pixel and is the disparity value we are trying to find for each pixel . is the data cost function which is defined by . is the smoothness cost function based on truncated quadratic model which is defined by . The total energy was optimized using the Graph Cuts (GC) algorithm [37] in order to find the disparity map . Under different lighting conditions, it can be seen from Figure 4 that the algorithm using ANCC as the matching cost function is very robust to illumination changes compared with results of two classic algorithms SGBM and Graph Cuts by OpenCV library.

(a)

(b)

(c)

(d)

(e)

(f)
The point clouds from multiview stereo reconstruction in different matching method are illustrated in Figures 5 and 6. From the results we can see that many points are not reconstructed correctly (marked in red circles) based on the ZNCC matching method because of the fattening effect. And more accurate points are acquired with our reconstruction method based on ANCC in Figure 6.

(a)

(b)

(c)

(a)

(b)

(c)
3. Leaves Segmentation from Point Clouds
To segment each individual leaf from point clouds, an improved region growing method based on fully connected CRF is applied in this section. The method can be divided into three steps: boundary erosion, initial segmentation, and refinement. At the beginning, the edge of each leaf point cloud is eroded to remove the connectivity between leaves. Then leaves will be initially segmented by region growing algorithm based on local surface normal and curvature. Finally an efficient CRF inference method based on mean field approximation is used to remove small isolated regions. The details are further explained as follows.
3.1. Boundary Erosion and Initial Segmentation
Because of the similarity of the colors of leaves, the color information is not reliable for segmentation. Considering the fact that the points on the same leaf surface usually have similar normals, and different leaves usually have different mean normals, the region growing algorithm is used. This method can find smoothly connected areas in point cloud based on point normals. First, the normal for each point is estimated by fitting a plane to some neighboring points. The neighborhood can be specified in two different methods, namely, nearest neighbors (KNN) and fixed distance neighbors (FDN). The next is the region growing algorithm based on normal and curvature, which is presented in Algorithm 2.
|
This algorithm is based on two constraints—local connectivity and surface smoothness. For local connectivity, the points in a segment should be locally connected. This can be realized by using only the neighboring points (through KNN or FDN) during region growing. For surface smoothness, the points in a segment should create a local smooth surface whose normals do not vary much. This constraint can be realized with a threshold on the angles between the current seed point and the points added to the region. Additionally, a threshold on curvature values can ensure that smooth areas are broken on the edges. The result of region growing segmentation without boundary erosion for a canopy point cloud is shown in Figure 7.

As it can be seen from Figure 7 some leaves are segmented into one part, which indicates relatively poor performance because some leaves are connected (marked by red circles) and these leaves have similar normals. To solve the problem, some erosion operations around the leaf edge are performed before segmentation. First, the boundary of the point cloud is extracted by detecting the boundary of the corresponding depth map. The boundary of the point cloud is illustrated in Figure 8. Then we construct k-d tree based on all points including the boundary ones. For every boundary point, we find the nearest points nearby and then delete them. After erosion, we carry out the region growing algorithm for the point cloud and the new result is shown in Figure 9. It can be seen that the result is much better than previous result without boundary erosion and also leaves that were originally connected are now separated.


3.2. Segmentation Refinement
After initial segmentation, there still exist some small isolated regions (small red points region). Therefore we need to classify these regions and determine the actual regions they belong to. A common solution is to transform this problem as maximum a posteriori (MAP) inference in a conditional random field (CRF) defined over pixels or image patches [38–41]. The CRF model contains pairwise potentials, which can simultaneously remove the noise and resolve ambiguous classification results.
In this subsection, a fully connected CRF is built, establishing pairwise potentials on all pairs of points from the canopy point cloud. Considering a random field defined over a set of variables , the domain of each variable is a set of labels . Also a random field is defined over variables . In our setting, ranges over possible input point cloud of size and ranges over possible point-level labelings. is the position vector of point and is the label assigned to point . Hence, the corresponding Gibbs energy of the fully connected pairwise CRF model is formulated aswhere and both range from 1 to . A conditional random field is characterized by a Gibbs distribution . The maximum a posteriori (MAP) labeling of the random field is . The unary potential is computed independently for each point with a classifier that produces a distribution over the label assignment given point cloud features. The following is the function of the pairwise potentials in our model:where is a kernel function and is a label compatibility function. A simple label compatibility function is given by the Potts model . We limit the choice of the kernel function to Gaussian kernel:where is a linear combination weight and is a symmetric, positive-definite precision matrix, defining the shape of the kernel. The vectors and are feature vectors for points and in an arbitrary feature space, respectively. For our points labeling problem, the smoothness kernel is used, defined in terms of point positions and to remove isolated regions.
Usually the high complexity of inference in fully connected models will limit its application. Hence, a highly efficient inference algorithm based on a mean field approximation to the CRF distribution [31] is adopted. This approximation yields an iterative message passing algorithm for approximate inference. As message passing in the presented model can be performed using Gaussian filtering in feature space, highly efficient approximations for high-dimensional filtering are utilized to reduce the complexity of message passing from quadratic to linear.
4. Results
The proposed method was applied to four sets of point clouds acquired from Pachira macrocarpa, Scindapsus, strawberry, and tomato plant.
4.1. Initial Segmentation
Results of the initial segmentation are shown in Figure 10. Some erosion operations were performed around the leaf edges beforehand in case that two leaves are divided into one part.

(a)

(b)

(c)

(d)
When a large of angle threshold is set, the algorithm tends to segment the point cloud into connected subsets. The smaller , the higher probability that a leaf is segmented into several clusters (see Figure 11).

(a)

(b)

(c)

(d)
4.2. Final Segmentation Results and Comparison with the Original Region Growing Algorithm
The results after inference of fully connected CRF are shown in the second row of Figure 12. It can be seen that all the small isolated regions are classified into its nearest leaves and all leaves are individually segmented.

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)
For the original region growing algorithm, no matter how we adjust the values of and , there always exists the case that two or more leaves are segmented into one subset, which is shown in the first row (highlighted by red circles) of Figure 12. Meanwhile segmentation results contain many noise sections that are not classified. The results of our method are shown in the second row of Figure 12, and the results are superior than the first row.
5. Discussion and Conclusion
We proposed an automatic method to acquire and segment the point cloud of plant leaves. A multiview stereo reconstruction method based on surface expansion is first applied to obtain the 3D structure of plants. We use the Adaptive Normalized Cross-Correlation to match points of interest in two images, and it is robust to radiometric factors and can reduce the fattening effect. In the stage of leaf segmentation on point clouds, an improved region growing method based on fully connected conditional random field (CRF) is proposed to separate connected leaves with similar color. Our method requires no prior botanical knowledge; therefore it can be applied in a wide variety of cases. Qualitative results on several kinds of plants show the effectiveness of our method. To prevent two leaves from being segmented into only one subset, some erosion operations around the leaf edge are performed before carrying out the region growing segmentation based on normal and curvature. A fully connected CRF model is built in which the pairwise edge potentials are defined by Gaussian kernel to remove small isolated regions after segmentation.
In the future, we will focus on developing advanced multiview stereo reconstruction algorithms to obtain more accurate point clouds for plant phenotype analysis. In addition, we will also study advanced point cloud segmentation algorithms to obtain other organs such as stems and fruits from plants, for realizing the final goal of accurate measurement on plant organs.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
The authors would like to express their appreciation for the proofreading work done by Dr. Dawei Li. This research was supported in part by the National High-Tech R&D Program of China under Grant 2013AA102305, the National Natural Science Foundation of China under Grants 61573258, 61374094, and 61603089, Shanghai Sailing Program under Grant 16YF1400100, and the U.S. National Science Foundation’s BEACON Center for the Study of Evolution in Action, under cooperative agreement DBI-0939454.