Abstract

We present a semiglobal stereo matching technique that combines an enhanced census transform with unidirectional dynamic programming optimization. The method improves matching accuracy and substantially reduces noise interference by replacing the centre pixel used in the standard census transform with the median of its 8-neighborhood pixels and by optimizing the initial matching cost with an 8-path aggregation algorithm. The experimental results show that the proposed method effectively suppresses the interference produced by different types of noise and significantly reduces the false matching rate.

1. Introduction

Binocular stereo vision is one of the key approaches to obtaining the depth information of objects. It is applied to three-dimensional measurement, robot SLAM, and driverless vehicles. Binocular stereo matching, a core subject in the field of stereo vision, aims to obtain the horizontal offset of each pixel between the two images of a binocular pair, that is, the parallax (disparity), which directly affects the accuracy of depth information acquisition in the 3D scene [1].

Research on stereo matching began with Marr's vision theory. In 1977, Marr et al. proposed a computational vision theory, which directly accelerated the development of computer binocular stereo matching theory and algorithms. With the rapid development of computer vision, stereo matching technology has gradually matured, and the taxonomy of the current mainstream stereo matching algorithms was established by Scharstein et al. in 2002. After systematic collation and generalization, stereo matching algorithms are divided into two categories [2]: global stereo matching algorithms, such as those based on dynamic programming, and local stereo matching algorithms based on a similarity metric. Global stereo matching algorithms mainly include dynamic programming (DP) [3], belief propagation (BP) [4], and graph cuts [5]. Although a global algorithm can obtain a high-quality disparity map in most cases, its complexity is very high, which leads to low computational efficiency, makes it hard to implement in hardware, and makes real-time performance difficult to achieve. Common local stereo matching algorithms include normalized cross correlation (NCC) [6], sum of squared differences (SSD) [7], and sum of absolute differences (SAD) [8]. Local matching algorithms compare only the neighborhoods of the points to be matched in the left and right images, so the size of the matching window is difficult to choose and the result is sensitive to interference factors such as illumination. They easily produce false matches in weakly textured regions, regions with repeated texture, and regions with parallax discontinuities [9].

With the widespread use of binocular stereo matching algorithms in various industries and the complexity and diversity of real-world scenes, it is critical to improve the accuracy and robustness of stereo matching algorithms, as well as the simplicity and timeliness of their implementation, in order to eliminate the interference of the many factors that influence matching results. Hirschmüller proposed the SGM (semiglobal matching) algorithm [10], which reduces computational complexity while maintaining matching accuracy and, after continued research, is now regarded as a fast and accurate matching method. However, its matching quality in textureless and weakly textured regions depends on the number of aggregation paths, and the computed parallax map is heavily influenced by noise. Keselman [11], Pea [12], and others have used the census transform to calculate matching costs. Mei et al. [13] presented the AD-census algorithm, which combines the absolute difference (AD) of brightness or gray values with the census transform. Despite these advances, the basic local stereo matching technique, the census transform, still suffers from low noise immunity and excessive dependence on the center pixel.

To solve the aforementioned problem, a noise-resistant semiglobal stereo matching algorithm is proposed, which combines the benefits of global and local stereo matching algorithms while avoiding their drawbacks. Firstly, the pixels in the census transform window are reordered, and their median value is used to calculate the Hamming distance, which solves the problem of over-reliance on the center pixel of the census transform window in traditional algorithms. Then, to improve the matching accuracy, a path aggregation algorithm based on unidirectional dynamic programming is applied to optimize the initial cost, which reduces abnormal matching points and improves the parallax reconstruction of weakly textured parts. Finally, a winner-take-all strategy is adopted to select, for each pixel, the parallax corresponding to the minimum aggregated cost, and wrong parallax values are eliminated by left-right consistency detection in the parallax optimization stage. The optimized parallax map is comparable in quality to that of a global algorithm, while the algorithm's efficiency is improved.

The rest of this paper is organized as follows. Section 2 describes the algorithm flow of this study: it briefly discusses the principle of the classic census transform algorithm and its matching results, and then presents the reordering census transform fused with path aggregation optimization and its implementation. In Section 3, two experiments are constructed to test the accuracy of the initial matching cost of the optimized algorithm and the overall noise immunity of the process. Finally, Section 4 summarizes the experimental results, which show that the modified algorithm has a lower mismatching rate and better noise immunity.

2. Algorithm Principle

The principle of the algorithm is shown in Figure 1. Firstly, this paper improves the traditional census transform of the local algorithm, calculates the initial cost, and introduces the SGM algorithm based on unidirectional dynamic programming for cost aggregation optimization. Then, the WTA (winner-take-all) algorithm [14] is used to select the parallax value corresponding to the minimum aggregated cost to obtain the initial parallax map. Finally, the initial disparity map is further optimized to obtain a high-quality disparity map.

2.1. Traditional Census Transform Algorithm

The census transform in the traditional local algorithm calculates the matching cost by converting the local gray-level differences in a pixel's neighborhood into a bit string. The gray value of each pixel in the neighborhood window (the window size is $n \times m$, and $n$ and $m$ are both odd numbers) is compared with the gray value of the central pixel of the window, and the Boolean values obtained from the comparisons are mapped into a bit string. Finally, the bit string is used as the census transform value of the central pixel, and the formula is as follows:

$$C_s(u,v) = \bigotimes_{i=-n'}^{n'} \bigotimes_{j=-m'}^{m'} \xi\big(I(u,v),\, I(u+i,v+j)\big) \tag{1}$$

Among them, $n'$ and $m'$ are the maximum integers which are not greater than half of $n$ and $m$, $\otimes$ is the bit-by-bit concatenation operation, $I(u,v)$ indicates the gray value of the image at the to-be-matched point $(u,v)$, and $I(u+i,v+j)$ represents the gray values of the other points in the local window centered on the to-be-matched point. The operation $\xi$ is defined by

$$\xi(x,y) = \begin{cases} 0, & x \le y, \\ 1, & x > y. \end{cases} \tag{2}$$

After the census transform, the pixel values in the window are replaced by a bit string containing only 0 and 1. The content of the bit string depends only on the center pixel value and the neighborhood pixel values in the window. The final cost is calculated from the bit strings $C_{sl}$ and $C_{sr}$ of the left and right images by

$$C(u,v,d) = \mathrm{Hamming}\big(C_{sl}(u,v),\, C_{sr}(u-d,v)\big) \tag{3}$$

The calculation process is as follows:
(1) Calculate the census transform value (bit string) of each pixel in the left and right images
(2) Calculate the Hamming distance between the bit strings of corresponding pixels in the left and right images
(3) Calculate the parallax map

The matching cost calculation method based on the census transform is shown in Figure 2.

Among them, the Hamming distance is the number of positions at which the corresponding bits of two bit strings differ. It is calculated by performing an XOR operation on the two bit strings and then counting the number of bits equal to 1 in the result. The larger the Hamming distance, the less similar the two pixels and the less likely they are a correct match; the smaller the distance, the more likely the match is correct.
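The census transform and the Hamming-distance cost described above can be sketched in a few lines. This is a minimal illustration and not the authors' C++ implementation: the function names, the list-of-lists image layout, the convention that a bit is 1 when the center is greater than the neighbor, and the restriction to pixels whose window lies fully inside the image are all assumptions made for the example.

```python
def census_transform(img, row, col, n=3, m=3):
    """Census bit string (as an int) of the n x m window centered on (row, col).

    Assumed convention: a bit is 1 when the center is greater than the
    neighbor, 0 otherwise. The window must lie fully inside the image.
    """
    half_n, half_m = n // 2, m // 2
    center = img[row][col]
    bits = 0
    for i in range(-half_n, half_n + 1):
        for j in range(-half_m, half_m + 1):
            if i == 0 and j == 0:
                continue  # the center is not compared with itself
            bits = (bits << 1) | (1 if center > img[row + i][col + j] else 0)
    return bits


def hamming(a, b):
    """Number of differing bits: XOR, then count the bits equal to 1."""
    return bin(a ^ b).count("1")


def census_cost(left, right, row, col, d, n=3, m=3):
    """Matching cost of left pixel (row, col) against right pixel (row, col - d)."""
    return hamming(census_transform(left, row, col, n, m),
                   census_transform(right, row, col - d, n, m))


left = [[10, 30, 20, 70, 11],
        [40, 25, 5, 90, 12],
        [25, 60, 24, 33, 13]]
# Synthetic right image: the left image shifted by one pixel (true disparity 1).
right = [r[1:] + [0] for r in left]

print(format(census_transform(left, 1, 1), "08b"))
print(census_cost(left, right, 1, 2, 1))
```

For the correct disparity the two windows are identical, so the cost is zero; a wrong disparity generally yields a larger Hamming distance.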

However, the matching accuracy of the parallax maps calculated only by the local census transform algorithm is very low, and the optimal parallax cannot be calculated. Figure 3 is the parallax map calculated by the census algorithm.

As can be seen from Figure 3(b), the disparity map calculated by the traditional algorithm contains considerable noise, has a high mismatch rate for corresponding points, and reconstructs some weakly textured regions incompletely. The traditional census algorithm is not robust to noise, and it is very difficult to obtain the depth image of the scene from this type of disparity map.

2.2. Improved Reordering Transformation and Path Aggregation Optimization Algorithm

To solve the above problems, this paper improves the traditional census transform and introduces a path aggregation algorithm to optimize the initial cost. The traditional census transform compares the central pixel with the other pixels in the window when calculating the census transform value, so it is highly dependent on the central pixel. If the central pixel changes, the calculation result (the bit string) also changes, so the algorithm is not robust to noise in complex cases. For this reason, this paper proposes a reordering census transform, which sorts the neighborhood pixels (excluding the central pixel) in the census window in ascending order of gray value and takes the median of the sorted array to replace the central pixel. The median value calculation process is shown in Figure 4.

Among them, $I(p)$ denotes the pixel value at the center position $p$, and $I(q_k)$, $k = 1, \ldots, 8$, represents the 8-neighborhood pixels of $p$. $\mathrm{median}(\cdot)$ represents the median value of the ascending array; $\mathrm{round}(\cdot)$ indicates rounding. If the size of the array is even, the median is the average of the two middle elements. If the average is not an integer, it is rounded.

As shown in Figure 5, when the center pixel is 20, the census transform value calculated by the traditional algorithm is 011111101. When the center pixel suddenly changes to 26, the transform value becomes 00100100, and the bit string changes significantly. If the reordering census transform is used instead, the window calculation no longer depends on the central pixel, so the result remains 00100100 even though the central pixel changes suddenly.

In the traditional method, when the central pixel is affected by noise, the census transform value changes, and the resulting Hamming distance becomes too large. The improved method no longer depends on the central pixel of the window: when the central pixel is affected by noise, the census transform value remains unchanged, and so does the Hamming distance.
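The following sketch contrasts the traditional and the reordering census transform on a 3 x 3 window whose center is corrupted by noise. It is an illustration only: the window values are made up (they are not the values of Figure 5), the function names are hypothetical, and rounding half up for the even-length median is an assumption, since the text only states that the average is rounded.

```python
def neighbors8(img, row, col):
    """The 8 neighbors of (row, col) in row-major order, center excluded."""
    return [img[row + i][col + j]
            for i in (-1, 0, 1) for j in (-1, 0, 1) if (i, j) != (0, 0)]


def rounded_median(values):
    """Median of the ascending array; an even length averages the two
    middle elements and rounds half up (assumed rounding rule)."""
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2:
        return s[mid]
    return (s[mid - 1] + s[mid] + 1) // 2  # valid for non-negative integers


def census_bits(reference, values):
    """Bit is 1 when the reference is greater than the neighbor (assumed convention)."""
    bits = 0
    for v in values:
        bits = (bits << 1) | (1 if reference > v else 0)
    return bits


def traditional_census(img, row, col):
    return census_bits(img[row][col], neighbors8(img, row, col))


def reordering_census(img, row, col):
    nb = neighbors8(img, row, col)
    return census_bits(rounded_median(nb), nb)  # the center pixel is never read


clean = [[10, 30, 20], [40, 25, 5], [25, 60, 24]]
noisy = [[10, 30, 20], [40, 200, 5], [25, 60, 24]]  # center hit by impulse noise

for name, f in (("traditional", traditional_census), ("reordering", reordering_census)):
    print(name, format(f(clean, 1, 1), "08b"), format(f(noisy, 1, 1), "08b"))
```

Because the reference value is computed from the neighbors alone, corrupting the center leaves the reordering bit string untouched, whereas the traditional bit string changes in several positions.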

Based on the initial cost space obtained from the cost computation, cost aggregation computes, for each pixel, a weighted sum of the initial costs of all pixels within a defined window centered on that pixel. To further improve the accuracy of the matching cost calculated above, a path aggregation optimization algorithm based on one-way dynamic programming is introduced to optimize the initial matching cost.

The optimization algorithm in this paper adopts the idea of the global stereo matching algorithm [15], which finds the optimal parallax of the image by minimizing a global energy function. The energy function is defined as

$$E(d) = E_{\mathrm{data}}(d) + E_{\mathrm{smooth}}(d) \tag{4}$$

where $E_{\mathrm{data}}$ is the data term and $E_{\mathrm{smooth}}$ is the smoothness term.

To efficiently solve the two-dimensional optimization problem in equation (4), a more specific energy function is proposed:

$$E(D) = \sum_{p} \Big( C(p, D_p) + \sum_{q \in N_p} P_1\, T\big[\,|D_p - D_q| = 1\,\big] + \sum_{q \in N_p} P_2\, T\big[\,|D_p - D_q| > 1\,\big] \Big) \tag{5}$$

where the first term is the sum of the matching costs of all pixels for the parallax map $D$, $C$ is the matching cost, and $T[\cdot]$ equals 1 if its argument is true and 0 otherwise. The second term imposes a fixed penalty $P_1$ on all pixels $q$ in the neighborhood $N_p$ of pixel $p$ whose parallax differs from that of $p$ by 1. When the parallax of a neighborhood pixel differs by more than 1, the third term adds a larger constant penalty $P_2$, with $P_2 > P_1$. The third term allows the method to lower the mismatching rate while preserving the image's discontinuities in regions where the parallax is discontinuous.
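To make the three terms of the energy function concrete, the sketch below evaluates it for a given parallax map. It is a worked illustration under stated assumptions: the neighborhood of a pixel is taken to be its 4-connected neighbors (the text does not fix the neighborhood), the small penalty applies to parallax differences of exactly 1 and the large penalty to larger differences, and the names `energy`, `p1`, and `p2` are hypothetical.

```python
def energy(disp, cost, p1, p2):
    """Energy of parallax map disp[y][x] given the cost volume cost[y][x][d].

    Assumptions: 4-connected neighborhood; penalty p1 for a parallax
    difference of exactly 1, penalty p2 (> p1) for larger differences.
    """
    height, width = len(disp), len(disp[0])
    total = 0
    for y in range(height):
        for x in range(width):
            d = disp[y][x]
            total += cost[y][x][d]  # data term
            for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                qy, qx = y + dy, x + dx
                if 0 <= qy < height and 0 <= qx < width:
                    diff = abs(d - disp[qy][qx])
                    if diff == 1:
                        total += p1  # small parallax change
                    elif diff > 1:
                        total += p2  # parallax discontinuity
    return total


cost = [[[1, 9, 9, 9], [9, 2, 9, 9], [9, 9, 9, 3]]]  # one row, three pixels
disp = [[0, 1, 3]]
print(energy(disp, cost, p1=1, p2=4))  # data 1+2+3, smoothness 1+1+4+4
```

Each neighboring pair is visited from both sides, so a single parallax step contributes its penalty twice; this follows directly from summing over every pixel and every neighbor.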

As minimizing the specific energy function (5) is still an NP-complete problem [16], a path cost aggregation optimization strategy based on one-way dynamic programming is adopted, and the problem of finding a two-dimensional optimal solution is approximated by multiple one-dimensional path aggregations. The matching cost of each pixel under every parallax is aggregated in one dimension along each path leading to the pixel, and the costs of all paths are then added to obtain the aggregated matching cost of the pixel. The path cost of a pixel along a path is calculated as follows:

$$L_r(p,d) = C(p,d) + \min\Big( L_r(p-r,d),\; L_r(p-r,d-1) + P_1,\; L_r(p-r,d+1) + P_1,\; \min_i L_r(p-r,i) + P_2 \Big) - \min_i L_r(p-r,i) \tag{6}$$

where $p$ represents the pixel and $r$ represents the path; $L_r$ is the aggregated cost and $C$ is the initial cost.

$L_r(p-r,d)$ represents the aggregated cost when the disparity of the previous pixel in the path is $d$; $L_r(p-r,d-1)$ represents the aggregated cost when the disparity of the previous pixel in the path is $d-1$; $L_r(p-r,d+1)$ represents the aggregated cost when the disparity of the previous pixel in the path is $d+1$; $\min_i L_r(p-r,i)$ represents the minimum of all cost values of the previous pixel in the path. The path cost aggregation algorithm is shown in Figure 6.
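As an illustration of the one-dimensional recursion, the sketch below aggregates costs along a single left-to-right path on one image row. It assumes the standard semiglobal form: the aggregated cost is the pixel's own cost plus the cheapest of four options from the previous pixel (same parallax, parallax minus or plus 1 with penalty `p1`, or any parallax with penalty `p2`), minus the previous pixel's minimum so that values stay bounded. The function name and the toy costs are hypothetical.

```python
def aggregate_scanline(costs, p1, p2):
    """Aggregate costs[x][d] along one left-to-right path.

    Returns agg[x][d]; the first pixel has no predecessor, so its
    aggregated cost equals its initial cost.
    """
    levels = len(costs[0])
    agg = [list(costs[0])]
    for x in range(1, len(costs)):
        prev = agg[-1]
        prev_min = min(prev)  # minimum over all parallaxes of the previous pixel
        row = []
        for d in range(levels):
            best = min(prev[d], prev_min + p2)
            if d > 0:
                best = min(best, prev[d - 1] + p1)
            if d < levels - 1:
                best = min(best, prev[d + 1] + p1)
            row.append(costs[x][d] + best - prev_min)
        agg.append(row)
    return agg


# Three pixels, three parallax levels; the cheapest level moves 0 -> 1 -> 2.
costs = [[0, 5, 5], [5, 0, 5], [5, 5, 0]]
for row in aggregate_scanline(costs, p1=1, p2=3):
    print(row)
```

In this toy example the parallax changes by one level per pixel, so each step pays only the small penalty and the minimum of every aggregated row still sits at the correct level.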

In Figure 6, the aggregation methods can be divided into three types according to the numbers and letters in the squares. The 16 direction arrows with numbers 1 ∼ 16 represent 16 paths, the 8 direction arrows with letters A ∼ H represent 8 paths, and the 4 direction arrows with letters B, D, F, and H represent 4 paths. Generally speaking, the greater the number of aggregation paths, the better the result, but the lower the efficiency. The total path cost of a pixel is calculated by

$$S(p,d) = \sum_{r} L_r(p,d) \tag{7}$$

In order to maintain the efficiency of the algorithm, an 8-path aggregation optimization algorithm is used to calculate the aggregated cost $S(p,d)$ of pixel $p$ under parallax $d$.
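A minimal sketch of the multi-path summation follows: the one-dimensional recursion is run once per direction and the results are added per pixel and parallax. The direction set (horizontal, vertical, and diagonal unit steps), the `cost[y][x][d]` layout, and all names are assumptions for the example; it is written for clarity and makes no attempt at the efficiency a real implementation needs.

```python
EIGHT_PATHS = [(0, 1), (0, -1), (1, 0), (-1, 0),
               (1, 1), (-1, -1), (1, -1), (-1, 1)]  # (dy, dx) steps


def aggregate_path(cost, direction, p1, p2):
    """Aggregate cost[y][x][d] along one direction r = (dy, dx)."""
    dy, dx = direction
    height, width, levels = len(cost), len(cost[0]), len(cost[0][0])
    agg = [[None] * width for _ in range(height)]
    # Visit pixels so that the predecessor p - r is always processed first.
    ys = range(height) if dy >= 0 else range(height - 1, -1, -1)
    xs = range(width) if dx >= 0 else range(width - 1, -1, -1)
    for y in ys:
        for x in xs:
            py, px = y - dy, x - dx
            if not (0 <= py < height and 0 <= px < width):
                agg[y][x] = list(cost[y][x])  # the path starts at the border
                continue
            prev = agg[py][px]
            prev_min = min(prev)
            row = []
            for d in range(levels):
                best = min(prev[d], prev_min + p2)
                if d > 0:
                    best = min(best, prev[d - 1] + p1)
                if d < levels - 1:
                    best = min(best, prev[d + 1] + p1)
                row.append(cost[y][x][d] + best - prev_min)
            agg[y][x] = row
    return agg


def aggregate_all(cost, p1, p2, directions=EIGHT_PATHS):
    """Sum of the path costs over all directions for every pixel and parallax."""
    height, width, levels = len(cost), len(cost[0]), len(cost[0][0])
    total = [[[0] * levels for _ in range(width)] for _ in range(height)]
    for direction in directions:
        agg = aggregate_path(cost, direction, p1, p2)
        for y in range(height):
            for x in range(width):
                for d in range(levels):
                    total[y][x][d] += agg[y][x][d]
    return total


cost = [[[0, 5, 5], [5, 0, 5], [5, 5, 0]]]  # one row, three pixels
print(aggregate_all(cost, p1=1, p2=3)[0][1])
```

Passing a shorter `directions` list gives the 4-path variant, which illustrates the trade-off noted above: each additional path adds one full pass over the cost volume.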

The path cost aggregation optimization algorithm is an approximate optimization problem based on dynamic programming, which decomposes the optimal solution of the current pixel into the optimal solutions of subproblems in N directions. The improved reordering census transform algorithm is combined with the path aggregation optimization algorithm to strengthen the relationship between pixels. Compared with the cost of matching a single pixel, the calculated cost value can effectively weaken the impact of noise on the image, further improve the matching accuracy, and retain the edge information of the image at the same time.

2.3. Disparity Refinement

After the initial cost has been aggregated and optimized, the disparity is calculated with the WTA (winner-take-all) algorithm. In the aggregated and optimized cost space, for each pixel $p$ to be matched, the parallax value corresponding to the minimum aggregated matching cost is used as the initial parallax of the pixel; the calculation formula is

$$d_p = \arg\min_{d} S(p,d) \tag{8}$$

where $d$ is the parallax value and $S(p,d)$ is the aggregated matching cost.
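The winner-take-all step amounts to an arg-min over the parallax axis of the aggregated cost volume. A minimal sketch, assuming the volume is stored as `agg[y][x][d]` and that ties are resolved in favor of the smallest parallax (the text does not specify a tie rule):

```python
def winner_take_all(agg):
    """For every pixel, the parallax index with the minimum aggregated cost.

    Ties go to the smallest parallax (assumed tie rule).
    """
    return [[min(range(len(costs)), key=costs.__getitem__) for costs in row]
            for row in agg]


agg = [[[43, 2, 43], [1, 5, 6], [7, 7, 3]]]
print(winner_take_all(agg))
```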

Finally, the initial parallax map is optimized in three steps: (1) the left-right consistency strategy is used to find the wrong matching points in the initial parallax map; (2) very small connected regions in the parallax map are eliminated; (3) reliable parallax values are selected to fill in the parallax of the mismatched points.
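Steps (1) and (3) can be sketched as follows; step (2), the removal of small connected regions, is omitted for brevity. The details are assumptions, since the text does not give them: a left pixel at column x with parallax d is compared with the right-image parallax at column x - d, a difference of more than one level marks a mismatch, and a mismatched pixel is filled with the smaller of the nearest valid parallaxes to its left and right on the same row. All names are hypothetical.

```python
INVALID = -1


def left_right_check(disp_left, disp_right, max_diff=1):
    """Mark left-image parallaxes that disagree with the right-image map."""
    checked = []
    for row_l, row_r in zip(disp_left, disp_right):
        out = []
        for x, d in enumerate(row_l):
            xr = x - d  # matching column in the right image
            ok = 0 <= xr < len(row_r) and abs(d - row_r[xr]) <= max_diff
            out.append(d if ok else INVALID)
        checked.append(out)
    return checked


def fill_invalid(disp):
    """Fill mismatches with the smaller nearest valid parallax on the row."""
    filled = []
    for row in disp:
        out = list(row)
        for x, d in enumerate(row):
            if d != INVALID:
                continue
            left = next((v for v in reversed(row[:x]) if v != INVALID), None)
            right = next((v for v in row[x + 1:] if v != INVALID), None)
            candidates = [v for v in (left, right) if v is not None]
            if candidates:
                out[x] = min(candidates)  # prefer the background parallax
        filled.append(out)
    return filled


disp_left = [[1, 1, 1, 3, 1]]
disp_right = [[1, 1, 1, 1, 1]]
checked = left_right_check(disp_left, disp_right)
print(checked)
print(fill_invalid(checked))
```

Choosing the smaller neighbor is a common heuristic because pixels that fail the check are often occluded and therefore belong to the background.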

2.4. Calculation of Mismatch Rate

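The mismatch percentage of a parallax map is a common and effective measure of its quality. The calculation formula is

$$B = \frac{1}{N} \sum_{(x,y)} \Big( \big| d_c(x,y) - d_T(x,y) \big| > \delta_d \Big) \times 100\% \tag{9}$$

where $N$ is the total number of pixels in the parallax map; $d_c(x,y)$ is the parallax value of pixel $(x,y)$ obtained by the algorithm; $d_T(x,y)$ is the parallax value of the corresponding pixel in the standard parallax map provided with the Middlebury [17] dataset; and $\delta_d$ is the threshold. The comparison in parentheses counts as 1 when it is true and as 0 otherwise.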
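The mismatch rate can be computed directly from its definition: count the pixels whose parallax differs from the standard parallax map by more than the threshold and divide by the number of evaluated pixels. In the sketch below, the default threshold of 1 and the use of `None` to mark pixels without a reference value (which are then excluded from the count) are assumptions for the example.

```python
def mismatch_rate(disp, truth, delta=1.0):
    """Percentage of pixels with |disp - truth| > delta.

    Pixels whose reference value is None are skipped (assumed convention
    for unknown or excluded regions).
    """
    bad = 0
    total = 0
    for row_d, row_t in zip(disp, truth):
        for d, t in zip(row_d, row_t):
            if t is None:
                continue
            total += 1
            if abs(d - t) > delta:
                bad += 1
    return 100.0 * bad / total if total else 0.0


disp = [[1, 2, 3, 4]]
truth = [[1, 2, 5, 8]]
print(mismatch_rate(disp, truth))  # two of four pixels exceed the threshold
```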

3. Analysis of Experimental Results

In order to verify the reliability of the algorithm in this paper, the algorithm is written in C++ under Visual Studio 2019, and images are read, displayed, and saved through OpenCV, an open-source library. The experimental hardware is an Intel(R) Core(TM) i5-4200H CPU @ 2.80 GHz with 12 GB of RAM. The experimental images are taken from the standard dataset on the Middlebury website. Four groups of images, Cones, Reindeer, Wood2, and Cloth3, are used. Table 1 lists the sizes in pixels of the four groups of standard test images and the corresponding parallax search ranges.

The obtained disparity map is submitted to the Middlebury website for evaluation. The calculation method of the mismatch rate is described in Section 2.4 of this paper.

3.1. Initial Matching Cost Evaluation

In order to verify the effectiveness of the reordering census algorithm in the initial cost calculation stage, we compared it with the traditional census algorithm. We use the Cones and Reindeer image pairs from the Middlebury dataset and compare the disparity maps of the two algorithms with the standard disparity map. The average mismatch rate in the nonoccluded area is analyzed. The experimental results are shown in Figure 7, and the average error rates of the two algorithms are shown in Table 2.

Comparing the results with the standard parallax map in Figures 7(b) to 7(d), both algorithms have certain defects, but the improved reordering census algorithm is better than the traditional algorithm. The contour of the scene reproduced by the traditional census algorithm is not clear, and there is more noise in the generated parallax map. The improved reordering algorithm in Figure 7(d) reproduces a more complete scene contour and reduces the noise in the parallax map.

As shown in Table 2, the improved reordering algorithm reduces the average mismatching rate by 8.22 percent compared with the traditional census algorithm, demonstrating that it effectively reduces the false matching rate and improves noise resistance in the stage of calculating the initial cost.

3.2. Antinoise Analysis

To further verify the antinoise performance of this algorithm, it is compared with SAD, NCC, and AD-census [18]. The Cones, Reindeer, Wood2, and Cloth3 images in the Middlebury dataset were used.

The experimental process is as follows: salt-and-pepper noise with densities of 0.04, 0.06, 0.09, and 0.12 and Gaussian noise with standard deviations of 2, 4, 6, and 8 are added to the four groups of test images, respectively. The four algorithms are used to calculate the parallax maps of the four images under each noise setting, the obtained parallax maps are compared with the standard parallax map, and the average mismatch rate of each algorithm in the nonoccluded area of the four images is calculated and analyzed for each noise setting. The results are shown in Tables 3 and 4.
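The noise settings above could be reproduced along the following lines. This sketch is not the procedure used in the experiments, which is not described in detail: it assumes 8-bit gray images stored as lists of lists, interprets the salt-and-pepper density as the probability that a pixel is replaced by 0 or 255 with equal chance, and clips Gaussian-perturbed values to the range 0 to 255. The function names are hypothetical.

```python
import random


def add_salt_and_pepper(img, density, rng):
    """Replace each pixel with 0 or 255 with probability `density` (assumed meaning)."""
    noisy = []
    for row in img:
        new_row = []
        for value in row:
            if rng.random() < density:
                new_row.append(0 if rng.random() < 0.5 else 255)
            else:
                new_row.append(value)
        noisy.append(new_row)
    return noisy


def add_gaussian(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma, clipped to 0..255."""
    return [[min(255, max(0, round(v + rng.gauss(0.0, sigma)))) for v in row]
            for row in img]


img = [[10, 128, 250], [0, 64, 255]]
rng = random.Random(0)  # fixed seed so that a run can be repeated

for density in (0.04, 0.06, 0.09, 0.12):
    print(density, add_salt_and_pepper(img, density, rng))
for sigma in (2, 4, 6, 8):
    print(sigma, add_gaussian(img, sigma, rng))
```

Both functions return a new image and leave the input untouched, so the same clean image can be reused for every noise level.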

Tables 3 and 4 compare the average mismatched-pixel rates of the parallax maps generated by SAD, NCC, AD-census, and this paper's approach at various salt-and-pepper noise and Gaussian noise intensities. Figures 8 and 9 show the corresponding histograms of each group of data, which present the experimental comparison of the four algorithms more intuitively.

According to the results in Table 3 and the histogram in Figure 8, the average false match rate of this technique in the nonoccluded region remains the lowest when the salt-and-pepper noise density is between 0 and 0.12. When the noise density range is between 0 and 0.04, the average false match rates of SAD, NCC, AD-census, and this technique are only 2.33%, 2.07%, and 1.39%, respectively. When the noise density is gradually increased to 0.06 and 0.09, the false match rate of this algorithm changes less, rising by 0.59 percent on average each time, whereas the other algorithms climb by 1.1 percent on average each time. Compared with the better-performing AD-census algorithm, this algorithm improves the matching rate by 2.72 percent.

It can be seen from Table 4 and the histogram in Figure 9 that, for Gaussian noise under any standard deviation, the average false match rate of this paper's approach is the lowest in the nonoccluded region, and the advantage is obvious. When the Gaussian standard deviation is between 0 and 4, the matching rate of the SAD algorithm is 2% lower on average than that of this paper's algorithm, and the differences among NCC, AD-census, and this paper's algorithm in dealing with noise are smaller, with an average difference of 1% for the NCC algorithm and 0.83% for the AD-census algorithm. When the Gaussian standard deviation is increased to 6, the matching rate of this paper's algorithm is 2.08% and 1.89% higher than those of the NCC and AD-census algorithms, respectively. When the Gaussian standard deviation is increased to 8, the figures become 2.56% and 2.34%, respectively. It is evident that the gap between the other three methods and this paper's algorithm grows as the Gaussian noise increases; the SAD algorithm performs worst among the four algorithms, while this paper's algorithm is the most robust to Gaussian noise.

Based on the above experimental results, it can be concluded that the optimization algorithm based on reordering and one-way dynamic programming proposed in this paper effectively avoids the interference caused by different noises in the nonoccluded area, and its matching results are better than those of the other three traditional algorithms and are robust to both types of noise.

According to the data in Table 4, the AD-census algorithm is second only to this algorithm in many cases. Therefore, the final parallax maps generated by these two algorithms are compared. Figure 10 shows the parallax maps obtained by the algorithm in this paper and by the AD-census algorithm for the Cones, Reindeer, Wood2, and Cloth3 images with Gaussian noise added.

In Figure 10, Figure 10(c) is the test image with Gaussian noise added, Figure 10(d) is the parallax map obtained by the AD-census algorithm, and Figure 10(e) is the final parallax map obtained by this algorithm. Comparing the parallax maps in Figures 10(d) and 10(e), the parallax map obtained by the algorithm in this paper is better, and the AD-census algorithm matches poorly when dealing with noisy images. In the Cones, Reindeer, and Cloth3 parallax maps in Figure 10(d) (parts marked with ellipses), there are many white mismatching points and even some white mismatching areas, and the reconstruction of image edges is defective. In the Wood2 and Reindeer parallax maps in Figure 10(d) (parts marked with ellipses), the reconstruction of weakly textured parts is incomplete. Compared with the AD-census algorithm, the proposed algorithm greatly improves the matching accuracy in the nonoccluded area, effectively weakens the noise interference points in the image, and is more robust. It achieves high-precision reconstruction in weakly textured areas, better matching in texture-rich and multitexture images, and high-precision dense parallax maps.

4. Conclusion

We propose an improved reordering census transform, which reorders the pixels in the census window, to compute the initial cost. Meanwhile, in the cost aggregation stage, the path aggregation optimization algorithm is introduced to aggregate the initial costs of pixels from 8 directions, and the dense parallax map is obtained by using the winner-take-all strategy and the left-right consistency detection method. The experimental results demonstrate that the modified algorithm reduces the average mismatch rate of the initial parallax map by 8.22%, and the average mismatch rate under different noises is less than 8%, thus improving noise immunity.

This research has limitations. First, while the improved semiglobal stereo matching algorithm improves matching accuracy compared with the traditional algorithm, the algorithm's running speed is insufficient for real-time operation. Second, the algorithm's accuracy fluctuates on complex images, which is caused by the algorithm's lack of stability as well as by the different characteristics of the images themselves.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Fundamental Research Funds for the Henan Provincial Colleges and Universities in Henan University of Technology (no. 2016QNJH03) and the National Natural Science Foundation of China (no. 62173127).