Abstract
Camera calibration in monocular vision establishes the relationship between the pixel units obtained from a camera and objects in the real world. As an essential procedure, camera calibration recovers three-dimensional geometric information from the captured two-dimensional images. This study proposes a modified camera calibration method based on polynomial regression that simplifies the calibration process. In this method, a parameter vector is obtained from the pixel coordinates of obstacles and the corresponding distance values using polynomial regression. With this parameter vector, the distance between the camera and a ground object in the field of vision can be measured for a given camera posture and position. The experimental results show that the lowest ranging accuracy of this focal length calibration method is 97.09% and the average accuracy is 99.02%.
1. Introduction
Measuring the distance between oneself and an obstacle is a crucial task in many fields. Methods that measure distance with cameras are called vision-based ranging methods; they promote the development of automatic measurement and have great research value [1, 2]. Vision-based ranging methods include monocular vision-based ranging methods and stereo vision-based ranging methods. Stereo vision-based ranging methods use the parallax between cameras to measure distance, which requires matching multiple images taken by multiple cameras. Monocular vision-based ranging methods achieve higher performance than stereo vision-based ranging methods because they do not need to match images in the data preprocessing stage [3].
Monocular vision-based ranging methods can be divided into three categories: proportion-based methods, machine learning-based methods, and coordinate transformation-based methods. Proportion-based methods rely on the principle that the distance to a target is inversely proportional to the target's size in the image plane [4, 5]. For example, the ranging model proposed by Bao and Wang [5] first assumes that all vehicles have a fixed width and then uses this value to learn the model parameters from images of vehicles at different distances. The ranging accuracy of this model is unstable because, if the measured vehicle is not directly in front of the camera, the assumed width is no longer correct. Machine learning-based methods let the computer learn the parameters of the ranging model. For example, Meng et al. [6] proposed a ranging model based on the R-CNN [7] and nonlinear regression. This model first utilizes the R-CNN to detect the position of the preceding vehicle, and the distance is then obtained by a nonlinear regression model; the accuracy of the objective decision is more than 90%. Coordinate transformation-based methods have strong interpretability. This type of method uses the relationship among world coordinates, camera coordinates, pixel coordinates, and the camera parameter matrix to calculate the distance between the camera and obstacles [8, 9]. For example, the model proposed by Lu et al. [9], which derives the ranging model from the principle of visual imaging, increased the ranging accuracy to 95.43%. Although the ranging accuracy of this model is higher than that of previous methods, camera calibration, a key technology in this model, still needs to be improved.
High-precision camera calibration aims to determine a set of geometric parameters. It is a prerequisite for extracting 3D information from captured images using the object's projections in the image plane. Camera calibration methods can be divided into two categories: traditional calibration and self-calibration. Traditional calibration utilizes multiple reference points on an object to establish the relationship between 2D pixel coordinates and 3D world coordinates. Yakimovsky and Cunningham [10] in 1978 proposed a camera calibration method to calculate the transformation matrix of stereo cameras. They used a highly linear lens and ignored distortion, achieving an accuracy of 5 mm at a distance of 2 m. Because of the narrow field of view and the neglect of lens distortion, this method may cause larger errors in a wide field. Moreover, the unknown parameters computed from linear equations in traditional camera calibration may not be linearly independent, which increases the calibration error. Martins et al. [11] in 1981 proposed the biplane calibration method using points on a double calibration plane. This method avoids any restrictions on the extrinsic camera parameters. On the premise that there is no deflection in the coordinate system, the average error is about 4 mils at a distance of 25 inches; however, the nonlinear lens distortion is not corrected. Tsai [12] proposed a camera calibration method that considers camera distortion. First, the camera parameters are solved using the direct linear transformation method or the perspective projection transformation matrix. Then, the obtained parameters are taken as initial values, and a nonlinear optimization method is used to improve the precision of the calibration. This method requires high-precision calibration targets and a field of vision of similar size. Chen et al. [13] proposed a calibration method based on a bundle adjustment system, which reduces the requirements on calibration targets. The world coordinate system is kept stationary relative to the steam hammer to decrease the calibration error. Experiments show that the calibration system measures the ram speed of a steam hammer accurately.
Traditional camera calibration depends strongly on calibration targets and places high requirements on their precision. It is difficult to transfer to unknown scenes because it is usually limited to a specific field of vision and distance. Self-calibration was therefore proposed for its higher operability: it obtains an image sequence by controlling the camera motion and then calculates the parameters by matching the image sequence. Luong and Faugeras [14] proposed a self-calibration method based on point correspondences and fundamental matrices. They used point correspondences between three images to estimate the perspective projection matrices and the parameters of the camera. In contrast to traditional camera calibration methods, this method does not require calibration targets with a known 3D shape. It has the disadvantages of the high cost of polynomial calculation and sensitivity to noise, since the continuation work is carried out in the complex plane. On this basis, Zhang [15] proposed a calibration method between the traditional calibration method and the self-calibration method. This method utilizes two-dimensional measurement information for the camera's calibration, which reduces the equipment requirements compared with traditional calibration. In this method, many images of a two-dimensional planar calibration target (checkerboard) are taken from different angles to detect the feature points and calculate the parameter matrix of the camera. Nonlinear refinement based on the maximum likelihood criterion is used to optimize the calibration results. Although this method simplifies the calibration process compared with previous methods, it still has some requirements on the standardization of operation. For example, the images taken should avoid noise; otherwise, inaccurate corner extraction will increase the calibration error. Dong and Isler [16] proposed a method for calibrating the external parameters of a camera that obtains point-to-plane constraints from two noncoplanar triangles. This method reduces the dependence on the camera's initial state estimation and, at the same time, reduces the number of observations without reducing the accuracy of the calibration. Xu [17] proposed a mirror-based camera calibration method, which reflects two groups of orthogonal phase-shifting sinusoidal patterns with a mirror and calculates the relationship between cameras from constraints between the phases.
To sum up, monocular vision-based ranging methods based on coordinate transformation have higher accuracy than other methods. This type of method [18, 19] offers not only model efficiency but also interpretability and stability. Such methods are generally based on the linear imaging model constructed from the pinhole imaging principle, which simplifies the derivation of the ranging model because the pinhole imaging principle directly establishes the geometric correspondence between world coordinates and pixel coordinates. At the same time, most vision-based ranging methods are used in the automatic measurement of robots or intelligent vehicles, where the camera is fixed in a certain attitude and most measurement targets are ground objects. Therefore, this study proposes a ranging model different from previous vision-based ranging methods, which measures the distance between the camera and a ground object when the camera has an inclination angle in three dimensions. Because the world coordinate system is two-dimensional, the computational complexity of the model is lower, and there are few requirements on the three-dimensional structure of the object.
Since the focal length is the only internal camera parameter that needs to be calibrated in the model, the accuracy of the focal length calibration directly affects the ranging accuracy. In the linear imaging model, the object is projected onto the image plane through a small hole according to the principle of straight-line propagation of light. In reality, however, the camera images through a convex lens, which introduces distortion and defocusing. Distortion leads to inconsistency between the theoretical imaging position and the actual imaging position, and defocusing means that the image at some positions is not presented exactly on the image plane. Previous references [20, 21] proposed nonlinear camera calibration, which calibrates the focal length and distortion parameters and then restores the theoretical imaging position from the distortion parameters and the actual imaging position. Although radial distortion, centrifugal distortion, and thin prism distortion are considered in these methods, defocusing and other possible factors are not well accounted for. Therefore, the accuracy of the ranging model can still be improved.
In this study, all imaging points are treated as if they were obtained by pinhole imaging, so there is no need to restore the pixel coordinates. The effects of distortion and defocusing are absorbed into the value of the focal length; that is, each point at each position has its own corresponding focal length, which includes the various factors that may affect the imaging position. This study uses the actual distances and pixel coordinates to calculate the focal length corresponding to different pixel positions and then analyses the distribution of the focal length values relative to the pixel coordinates. It is found that this distribution is not a simple linear one. Therefore, to improve the ranging accuracy, this study learns the distribution of the focal length over pixel positions by nonlinear regression (polynomial regression). The results show that the method can effectively improve the accuracy of the ranging model.
In a word, this study focuses on proposing a simple and high-accuracy focal length calibration method to improve the accuracy of the monocular vision-based ranging model. The main idea is to use a simple linear imaging model to reduce the complexity of the ranging model and then fold the distortion and defocusing caused by the nonlinear imaging of the camera into the focal length calibration process. The main innovations of this study are as follows:
(1) When the camera has an inclination angle in three dimensions, a ranging model for ground objects based on the linear imaging model and geometric coordinate transformation is proposed.
(2) The distortion caused by convex lens imaging and the influence of defocusing are reflected in the focal length of the linear imaging model.
(3) The calibration process requires neither a calibration target nor camera motion.
(4) The focal length values containing the effects of distortion and defocusing are calculated by the ranging model, and their nonlinear distribution is learned by polynomial regression.
2. Proposed Methods
2.1. Monocular Vision-Based Model
In real scenes, most obstacles, such as pedestrians or other vehicles, are on the ground. The model in this study focuses on ground feature points; that is, the 3D world coordinate system is reduced to a 2D coordinate system. The posture of the camera depends on where it is installed. The camera has no inclination angle when it faces the front and its optical axis is parallel to the ground. The optical axis is defined as the line perpendicular to the image plane and passing through the optical center. Based on the pinhole camera model and the principle of straight-line propagation of light [22], the ranging model for ground objects without a camera inclination angle is shown in Figure 1.
As shown in Figure 1, O is the camera optical center and O' is the projection of O on the ground plane. The optical axis passes through O and is parallel to the ground plane; the optical axis is perpendicular to the image plane. The world coordinate system is established with O' as the origin, the line passing through O' and parallel to the optical axis as the Y-axis, and the line passing through O' and perpendicular to the Y-axis as the X-axis. P is the measured point on the ground plane, P_Y is the projection of P on the Y-axis, P_X is the projection of P on the X-axis, and O_1 is the intersection of the optical axis with the image plane. The height from the optical center to the ground is H (the length of OO'), the image on the image plane of the point P is the point P', and the physical coordinates of P' on the image plane are (x, y). The length of OO_1 is recorded as the focal length f. The length of O'P (d) is the distance which we will calculate. In a real scene, the camera may have an inclination angle in three different dimensions. The details of the three dimensions are shown in Figure 2.
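Before turning to the inclined cases, the noninclination model of Figure 1 can be stated compactly. This is our reconstruction in the notation just defined; it is consistent with the general equations (8)-(10) derived below when the inclination angle is zero:

```latex
% Noninclination ranging model of Figure 1 (reconstruction; follows from
% equations (8)-(10) below with a = 0):
d_Y = \frac{Hf}{y}, \qquad
d_X = \frac{x\sqrt{H^{2} + d_Y^{2}}}{\sqrt{f^{2} + y^{2}}}, \qquad
d = \sqrt{d_X^{2} + d_Y^{2}}
```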
As shown in Figure 2(a), the field of vision of the camera changes after the camera rotates, but the ranging model is the same as in the noninclination case. As shown in Figure 2(b), the position of the optical center does not change, and the optical axis is still parallel to the ground. The difference from the noninclination case is that the image plane rotates around its center in the two-dimensional plane, so the image can be restored to the noninclination state by a reverse rotation in that plane. The inclination angle produced by this rotation is called the left or right inclination angle. Since it does not require reconstructing the ranging model, the details of restoring a left or right inclination angle to a noninclination angle are described in the discussion. As shown in Figure 2(c), the optical axis of the camera is no longer parallel to the ground after rotation, and the ranging model needs to be reconstructed. The inclination angle produced by this rotation is called the up or down inclination angle. Taking the down inclination angle as an example, the ranging model is illustrated in Figure 3.
The meanings of O, O', O_1, P, P', H, and f in Figure 3 are the same as in Figure 1. The difference between Figures 1 and 3 is that the optical axis now has an inclination angle a, that is, the angle between the horizontal line through O and the optical axis is a. When the camera has a down inclination angle, we stipulate a > 0, and when the camera has an up inclination angle, a < 0. M is the intersection point of the optical axis and a horizontal plane π, N is the intersection point of the ray through the measured image point and the plane π, and the plane π is parallel to the ground plane; the optical axis is perpendicular to the image plane. The length of OO_1 is defined as the focal length f, and the length of O'P (d) is the distance which we will calculate.
Let d_Y = |O'P_Y| denote the depth of the measured point along the Y-axis and d_X = |PP_Y| its lateral offset, and let P'_v be the projection of P' on the vertical centerline of the image plane (the line through O_1 in the vertical direction of the image), so that |O_1P'_v| = y and |P'P'_v| = x. The ray OP'_v extended meets the plane π at N and the ground plane at P_Y, so the three points O, N, and P_Y are collinear. Because the plane π is parallel to the ground plane, according to the property of parallel planes, the angle between a line and its projections on two parallel planes is equal; the depression angle θ of the ray OP'_v below the horizontal is therefore the same whether it is measured at the plane π or at the ground plane, and, since the optical axis and OP'_v lie in the same vertical plane,

θ = a + ∠O_1OP'_v. (1)

The optical axis is perpendicular to the image plane and O_1P'_v is in the image plane. According to the perpendicularity of lines and planes, a straight line perpendicular to a plane is perpendicular to any straight line in the plane, so the triangle OO_1P'_v has a right angle at O_1 and ∠O_1OP'_v is

∠O_1OP'_v = arctan(y/f). (2)

Following Pythagoras' theorem, |OP'_v| is

|OP'_v| = √(f² + y²). (3)

P'P'_v is horizontal, and, according to the properties of a parallelogram (the opposite sides are parallel and equal), PP_Y is parallel to P'P'_v; since O, P', and P are collinear and O, P'_v, and P_Y are collinear, the triangles OP'P'_v and OPP_Y are similar, and

d_X / x = |OP_Y| / |OP'_v|. (4)

OO' is perpendicular to the ground plane, so in the right triangle OO'P_Y,

d_Y = H / tan θ, (5)

|OP_Y| = √(H² + d_Y²). (6)

Combining equations (1), (2), and (5), it can be obtained that

d_Y = H / tan(a + arctan(y/f)). (7)

According to the arctangent addition theorem, we get the following result:

tan(a + arctan(y/f)) = (y + f·tan a) / (f − y·tan a).

Namely,

d_Y = H·(f − y·tan a) / (y + f·tan a). (8)

From equations (3), (4), and (6), the lateral offset is

d_X = x·√(H² + d_Y²) / √(f² + y²). (9)

From equation (9), since P_X and P_Y are the projections of P on the X-axis and Y-axis, it concludes that the distance d = |O'P| is

d = √(d_X² + d_Y²) = √(d_Y² + x²·(H² + d_Y²)/(f² + y²)). (10)
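The model is compact enough to transcribe directly. The following Python sketch implements equations (8)-(10) as reconstructed above; the function name and the convention that the angle is given in radians are ours, not the authors':

```python
import math

def ground_distance(x, y, f, H, a):
    """Equations (8)-(10): distance d between the camera's ground projection
    O' and the measured ground point P, from the physical image coordinates
    (x, y) (mm), the focal length f (mm), the camera height H (mm), and the
    inclination angle a (radians; a > 0 for a down inclination).  A sketch
    of the reconstructed model above."""
    ta = math.tan(a)
    d_y = H * (f - y * ta) / (y + f * ta)                        # equation (8)
    d_x = x * math.sqrt(H**2 + d_y**2) / math.sqrt(f**2 + y**2)  # equation (9)
    return math.sqrt(d_x**2 + d_y**2)                            # equation (10)
```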
From equation (10), the coordinates of the measured point on the image plane are physical coordinates, whose unit should be millimeters. However, the coordinates extracted directly from the image are pixel coordinates; the pixel coordinate system is established with the upper left corner of the picture as the origin, and its unit is pixels. It is therefore necessary to translate the pixel coordinates into the physical coordinates used in equation (10). The transformation formulas are shown in equation (11):

x = (u − m/2)·dx,  y = (v − n/2)·dy. (11)
In equation (11), x and y denote the physical coordinates of the point P', u and v denote the pixel coordinates of P', and dx and dy denote the size of a pixel unit in the x and y directions. m and n denote the row resolution and the column resolution of the image plane, so the image center (m/2, n/2) corresponds to the principal point O_1.
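A minimal sketch of the transformation of equation (11), under the sign convention used here (v and y grow toward the bottom of the image); the function name is illustrative:

```python
def pixel_to_physical(u, v, dx, dy, m, n):
    """Equation (11): convert pixel coordinates (u, v) to physical image
    coordinates (x, y) in millimeters.  dx, dy are the sizes of one pixel
    and m, n the pixel resolutions, so the image center (m/2, n/2) maps to
    the physical origin at the principal point."""
    return (u - m / 2) * dx, (v - n / 2) * dy
```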
2.2. The Proposed Camera Calibration Model
According to equation (10), the constraint equation for the focal length is shown in equation (12), where the distance d in equation (10) is replaced by the accurate distance D measured by the rangefinder:

D = √(d_Y² + x²·(H² + d_Y²)/(f² + y²)),  with d_Y = H·(f − y·tan a)/(y + f·tan a). (12)
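Equation (12) has a single unknown, f, and can be solved numerically for each training point. A sketch using SciPy's bracketing root finder; the bracketing interval is a placeholder that must be chosen so it brackets a sign change for the camera at hand, and ground_distance() is the equation-(10) sketch above:

```python
from scipy.optimize import brentq

def calibrate_focal_length(x, y, D, H, a, f_min=0.1, f_max=100.0):
    """Solve equation (12) for the focal length: find f such that the
    distance predicted by equation (10) equals the rangefinder value D.
    The interval [f_min, f_max] (mm) is an assumption."""
    return brentq(lambda f: ground_distance(x, y, f, H, a) - D, f_min, f_max)
```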
When the camera's height above the ground and the camera's inclination angle remain unchanged, targets at different positions have different pixel coordinates, and, according to the experimental data, the focal lengths calculated by equation (12) for these targets differ. This phenomenon can be explained by the defocus phenomenon [23] of convex lens imaging. For a camera, the image plane is fixed according to the clearest image at the center; therefore, images away from the center of the image plane do not fall exactly at their theoretical positions, and points with different object distances correspond to different focal lengths. To obtain the focal length corresponding to different physical coordinates (x, y), this study utilizes polynomial regression to learn the relationship between the focal length and the physical coordinates, with the physical coordinates as the independent variables and the focal lengths as the dependent variables. The polynomial takes the form of equation (13), whose coefficient vector ω is obtained by polynomial regression:

f(x, y) = Σ_{i,j} ω_{ij}·x^i·y^j. (13)
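One plausible realization of this regression step, assuming a full total-degree monomial basis; the paper's converged 12-term basis (Section 4) is not spelled out here, so the degree parameter is an assumption:

```python
import numpy as np

def fit_focal_surface(xs, ys, fs, degree=3):
    """Fit the polynomial of equation (13), f(x, y) = sum_ij w_ij x^i y^j,
    by least squares.  xs, ys are physical image coordinates and fs the
    focal lengths obtained from equation (12)."""
    xs, ys, fs = (np.asarray(v, dtype=float) for v in (xs, ys, fs))
    terms = [(i, j) for i in range(degree + 1)
                    for j in range(degree + 1 - i)]            # monomials x^i y^j
    A = np.column_stack([xs**i * ys**j for i, j in terms])     # design matrix
    w, *_ = np.linalg.lstsq(A, fs, rcond=None)                 # coefficient vector
    predict = lambda x, y: sum(c * x**i * y**j for c, (i, j) in zip(w, terms))
    return w, predict
```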
2.3. Error Compensation Algorithm
To improve the ranging accuracy, we propose an algorithm that compensates for the ranging error produced by the coordinate extraction of the feature point and by the measurement of the camera inclination angle, by adjusting the pixel coordinates (u, v). According to the experimental data, the generalization ability of the model is strongest and most stable when the ranging error threshold is set to 0.5%. The procedure is outlined in Algorithm 1, where E(d) is the error between the calculated distance d and the accurate distance D.
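A rough Python sketch of such a compensation procedure, under the stated 0.5% threshold; the one-pixel neighborhood search and the iteration cap are our assumptions rather than the authors' listing:

```python
def compensate(u, v, D, H, a, predict_f, dx, dy, m, n, tol=0.005, max_iter=50):
    """Nudge the pixel coordinates (u, v) until the relative ranging error
    E(d) = |d - D| / D drops below the threshold.  predict_f is the
    regression model of equation (13); pixel_to_physical() and
    ground_distance() are the earlier sketches."""
    def error(uu, vv):
        x, y = pixel_to_physical(uu, vv, dx, dy, m, n)
        d = ground_distance(x, y, predict_f(x, y), H, a)
        return abs(d - D) / D
    for _ in range(max_iter):
        if error(u, v) < tol:
            break
        # Move to the neighboring pixel (or stay) that most reduces the error.
        u, v = min(((u + du, v + dv) for du in (-1, 0, 1) for dv in (-1, 0, 1)),
                   key=lambda p: error(*p))
    return u, v
```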
3. Experiments
The equipment used in our experiment is shown in Figure 4. The camera used in this experiment has a fixed pixel resolution. 100 targets are placed in the range from 2 to 20 meters, and the spatial location of the camera remains the same throughout. The accurate distances between the targets and the projection of the optical center on the ground are measured by a laser rangefinder.
As shown in Figure 5, the pixel coordinates of the contact point between each target and the ground are extracted using MATLAB. From the 100 pieces of data, 14 pieces were randomly selected as the training dataset and 6 pieces as the test dataset. The focal lengths for the training dataset are calculated by equations (11) and (12), and the regression vector ω in equation (13) is obtained using the polynomial regression function in the curve fitting toolbox of MATLAB. The focal lengths for the test dataset are then calculated by equations (11) and (13), and equation (10) is used to calculate the distance d for the test dataset.
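Under the assumptions of the earlier sketches, the whole training pass can be summarized as follows (an illustrative Python pipeline, not the authors' MATLAB code):

```python
import numpy as np

def run_calibration(pixels, distances, H, a, dx, dy, m, n):
    """pixels: (N, 2) sequence of extracted (u, v) contact points;
    distances: matching rangefinder values D.  Combines the sketches of
    equations (11), (12), (13), and (10) above."""
    distances = np.asarray(distances, dtype=float)
    xy = np.array([pixel_to_physical(u, v, dx, dy, m, n) for u, v in pixels])
    # Equation (12): per-point focal lengths from the known distances.
    fs = np.array([calibrate_focal_length(x, y, D, H, a)
                   for (x, y), D in zip(xy, distances)])
    # Equation (13): regression vector over the physical coordinates.
    w, predict_f = fit_focal_surface(xy[:, 0], xy[:, 1], fs)
    # Equation (10): re-predict the distances and report E(d) per point.
    d_hat = np.array([ground_distance(x, y, predict_f(x, y), H, a)
                      for x, y in xy])
    return w, np.abs(d_hat - distances) / distances
```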
4. Results
The focal length calculation results for a training set are given in Table 1. H is the height of the camera, a is the inclination angle of the camera, and D is the distance between a target and the camera, measured by the laser rangefinder. (x, y) are the physical coordinates calculated according to equation (11) from the pixel sizes dx and dy of the camera. The focal length f is then calculated according to equation (12). All variables in Table 1 are in millimeters (mm), except the variable a.
Taking (x, y) and f in Table 1 for polynomial regression, the number of parameters in the regression vector ω is called the length of the vector. Before the regression, the value at which the length of the regression vector converges needs to be determined. The length of the vector depends on the regression accuracy and the size of the dataset: when the regression accuracy is held constant and the dataset is large enough, the coefficient vector ω and its length should converge. With the regression accuracy fixed above 99.5%, Figure 6 shows the length of the vector for different dataset sizes. As shown in Figure 6, when the size of the dataset is less than 70, the length of the vector gradually increases as the dataset grows; when the size of the dataset reaches 70, the length of the vector stabilizes at 12. Therefore, the length of the vector converges to 12. The calibration equation of the focal length is shown in equation (14), which contains 12 regression parameters. After the convergence value of the length of the regression vector is determined, Figure 7 shows the parameters of the regression vector obtained by polynomial regression of the data in Table 1, and Figure 8 shows the regression model.
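The convergence check described here can be automated. A rough sketch, interpreting the 99.5% regression accuracy as the coefficient of determination R² and reusing fit_focal_surface() from Section 2.2 (both interpretations are assumptions, as is the total-degree basis):

```python
import numpy as np

def converged_vector_length(xs, ys, fs, r2_target=0.995, max_degree=6):
    """Grow the polynomial until the fit reaches the target accuracy and
    report the number of regression parameters (the 'length of the
    vector')."""
    fs = np.asarray(fs, dtype=float)
    for degree in range(1, max_degree + 1):
        w, predict = fit_focal_surface(xs, ys, fs, degree)
        resid = fs - np.array([predict(x, y) for x, y in zip(xs, ys)])
        r2 = 1.0 - resid.var() / fs.var()
        if r2 >= r2_target:
            break
    return len(w), degree
```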
The regression focal length and the ranging error E(d) are recalculated based on equation (14) and the values in Table 1, and the results are given in Table 2, where the regression focal length is calculated from the vector in Figure 7. "Before" E(d) and "after" E(d) represent the ranging error before and after applying Algorithm 1, respectively. To verify the effectiveness of the proposed method, a number of experiments under different camera heights and inclination angles were carried out. The maximum and average ranging errors are given in Table 3: the maximum ranging error is 2.91%, and the maximum average ranging error is 0.98%.
5. Discussion
5.1. Ranging Model for the Camera with a Left or Right Inclination Angle
Compared with the noninclination case, when the camera has a left or right inclination angle as shown in Figure 2(b), it is as if the pixel coordinate system were rotated by the same angle in the same direction; equivalently, in the same pixel coordinate system, the imaging points are rotated by the same angle in the opposite direction. The diagram of the coordinate transformation is shown in Figure 9.
P_1 is the image point in the fourth quadrant when the camera has a right inclination angle β, and P is the image point restored to the noninclination state by rotating P_1 back about the image center O_c, where |O_cP_1| = |O_cP| = r. Writing (u'_1, v'_1) for the coordinates of P_1 relative to the image center, in the right triangle formed by O_c, P_1, and the projection of P_1 on the horizontal axis of the image, according to the arctangent function,

φ_1 = arctan(v'_1 / u'_1). (15)

Clearly,

φ = φ_1 − β. (16)

In the corresponding right triangle for P,

u' = r·cos φ,  v' = r·sin φ,  with r = √(u'_1² + v'_1²). (17)
(u', v') in equation (17) are the center-relative pixel coordinates of the measured point without a left or right inclination angle; adding back the image center offsets gives the pixel coordinates used in equation (11). When the pixel of the target lies in another quadrant, the formula of the coordinate transformation differs from that of the fourth quadrant, but the principle is the same.
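A compact, quadrant-independent form of this restoration (equations (15)-(17)) can be written as a rotation about the image center; the sign convention for beta is an assumption:

```python
import math

def restore_rotation(u1, v1, beta, m, n):
    """Restore a pixel (u1, v1) taken with a left/right inclination angle
    beta (radians) to its noninclination position by rotating it through
    the same angle in the opposite direction about the image center
    (m/2, n/2).  A sketch of equations (15)-(17)."""
    cu, cv = m / 2, n / 2
    du, dv = u1 - cu, v1 - cv
    r = math.hypot(du, dv)                     # |O_c P_1| = |O_c P| = r
    phi = math.atan2(dv, du) - beta            # equations (15) and (16)
    return cu + r * math.cos(phi), cv + r * math.sin(phi)  # equation (17)
```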
5.2. Robustness of the Vector
To study the robustness of the vector ω, the camera height and the choice of camera were analyzed in this study. Table 4 provides the value of the vector ω when the height of the camera is 229 millimeters and 239 millimeters.
Table 5 provides the values of the vector ω for two different cameras at the same height. As given in Tables 4 and 5, both the height and the choice of camera affect the value of the regression vector ω. Therefore, in applications, the same regression vector ω is only suitable for the same height and the same camera.
5.3. Compare with Other Methods
To verify that the calibration method proposed in this study is suitable for the monocular vision ranging model and can effectively compensate for distortion and defocus, the same camera was calibrated with the proposed method and with the methods of Zhang [15] and Li [20]; the calibration results are given in Table 6.
Table 6 provides the intrinsic matrices calibrated by the Zhang and Li methods; these matrices contain the focal length information of the camera. The radial distortion and tangential distortion entries in Table 6 represent the distortion information of the camera, from which the pixel coordinates can be corrected. The focal length (regression model) entry in Table 6 is the regression model of the focal length obtained in this study, where x and y are the pixel coordinates.
To verify the performance of the three calibration methods in the monocular vision ranging model, a large number of measured points were randomly distributed on the ground; their distribution on the image plane is shown in Figure 10. These measured points cover all areas of the picture, and four groups of test sets were randomly selected, with 30 feature points in each group. The Zhang and Li methods use their distortion parameters to correct the pixel coordinates before ranging. The focal lengths obtained by the three calibration methods and the corrected pixel coordinates are substituted into the ranging model to calculate the distances. The ranging error results are shown in Figure 11.
As shown in Figure 11, the proposed calibration method is more stable and accurate than the methods of Zhang and Li for the monocular vision-based distance measurement model. Table 7 compares the accuracy of the ranging model in this study with that of other ranging models.
Combining Figure 11 and Table 7, it can be seen that the calibration method proposed in this study is better suited to the ranging model in this study than the other calibration methods, and that the monocular vision-based ranging model in this study has higher accuracy than the other ranging methods.
6. Conclusions
In this study, a camera calibration method for a monocular vision-based ranging model, based on polynomial regression, was proposed to establish the relationship between world coordinates and pixel coordinates. To handle the possible distortion of the captured images, an error compensation algorithm is used to revise the pixel coordinates. Compared with other methods, our method achieves higher ranging accuracy: the process of regression and error compensation can effectively compensate for the errors caused by distortion without requiring the distortion parameters of the convex lens. The experimental results show that the ranging accuracy in this study is more than 97%. In the future, we will combine our method with image recognition to improve safety in fully autonomous driving.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that there are no conflicts of interest.
Acknowledgments
This project was supported by the National Natural Science Foundation of China “Research on accurate ranging method based on vehicle monocular vision” (61741207), the Inner Mongolia Natural Science Foundation “Agricultural Machinery Obstacle Avoidance Algorithms Based on CNN and Vehicle Monocular Vision Dynamic Ranging Model” (2019MS06016), and the Research and Innovation Funding Projects for Postgraduates in Inner Mongolia “Vehicle camera self-calibration technology based on target recognition and geometric ranging model.”