Abstract
Cameras with telephoto lenses are usually used to recover details of an object that is either small or located far away from the camera. However, the calibration of such cameras is not as accurate as that of cameras with short focal lengths, which are commonly used in many vision applications. This paper makes two contributions. First, we present a first-order error analysis that shows the relation between focal length and the estimation uncertainties of camera parameters. To our knowledge, this error analysis with respect to focal length has not been studied in the area of camera calibration. Second, we propose a robust algorithm to calibrate a camera with a long focal length without using additional devices. By adding a regularization term, our algorithm makes the estimation of the image of the absolute conic well posed. As a consequence, the covariance of the camera parameters can be reduced greatly. We further use simulations and real data to verify the proposed algorithm and obtain very stable results.
1. Introduction
In various vision-based applications, a camera with a telephoto lens is often useful for acquiring detailed information about objects. It can capture high-resolution face images for recognition and reconstruction even when a user is at a distance [1]. It can also obtain eye images with rich iris textures when a user is several meters away from the camera [2, 3]. In [4], a telephoto lens is used to observe objects under the influence of optical turbulence. A robotic vision system that combines a telephoto lens with a wide-angle camera is presented in [5]; it is suitable for remote surveillance or for minimally invasive surgical interventions, where it could offer a higher resolution than typical commercial endoscopes. As the field of view of a telephoto lens may be only a few degrees (e.g., around 8 degrees for a 300 mm telephoto lens), an accurate estimation of camera parameters is required in order to either track objects or reconstruct complete views.
In the photogrammetry community, camera calibration is usually done by computing the projection matrix from accurate 3D points and the corresponding 2D observations [6, 7]. However, in practice, it could be difficult or expensive to build an object with accurate coordinates, especially in a large working space. In the area of computer vision, the calibration technique [8] that requires only a planar pattern (e.g., a checkerboard pattern) is widely used. In this technique, a planar pattern is placed with different orientations and at different distances from the camera. Homographies are estimated between the planar pattern and its observations. These homographies form a homogeneous system that is solved for the image of the absolute conic. The intrinsic and extrinsic parameters are then computed from the estimated homographies and the image of the absolute conic. In the final step, maximum likelihood estimation (MLE) is applied to estimate the radial distortion and to refine the intrinsic and extrinsic parameters by minimizing geometric errors. This technique has further been evaluated with respect to image noise level, number of planes, and orientation of the model plane. Various autocalibration techniques [9, 10] have also been proposed to estimate fixed or varying intrinsic parameters without predefined calibration patterns. The basic idea is that the absolute conic is fixed when a camera is moving rigidly.
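The homography estimation step mentioned above can be sketched with the direct linear transform (DLT). This is a minimal illustration, not the implementation used in [8]: the function name `estimate_homography` is hypothetical, and the coordinate normalization and nonlinear refinement that practical implementations usually add are omitted.

```python
import numpy as np

def estimate_homography(plane_xy, image_uv):
    """Estimate H (3x3) with image ~ H @ [X, Y, 1]^T from >= 4 point pairs (DLT)."""
    plane_xy = np.asarray(plane_xy, dtype=float)
    image_uv = np.asarray(image_uv, dtype=float)
    rows = []
    for (X, Y), (u, v) in zip(plane_xy, image_uv):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([X, Y, 1.0, 0.0, 0.0, 0.0, -u * X, -u * Y, -u])
        rows.append([0.0, 0.0, 0.0, X, Y, 1.0, -v * X, -v * Y, -v])
    A = np.array(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    # Fix the arbitrary scale (assumes H[2, 2] is not zero).
    return H / H[2, 2]
```

One homography is estimated per image of the planar pattern; the set of homographies is the input to the closed-form calibration step.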
Since the focal lengths of cameras in many vision applications are relatively short, most existing algorithms consider image noise as the major source of estimation uncertainties, and limited research has been conducted on the uncertainties caused by focal length [11, 12]. In [11], Strobl et al. found that a narrow field of view makes the calibration more difficult due to the lack of required evidence on perspectivity. In order to improve the calibration accuracy, the camera with a narrow field of view is mounted on a robotic manipulator from which the rigid motions can be read. These rigid motions provide more constraints for solving the intrinsic parameters and the relative geometric relation between the camera and the robotic manipulator. Similarly, a pan-tilt unit could be used during the calibration, as shown in [12].
This paper makes two main contributions. First, we present a first-order error analysis that shows the relation between estimation uncertainties and focal length. Although the authors of [11, 12] briefly described the calibration problem caused by a long focal length, the error analysis with respect to focal length has not been studied so far. Second, we propose a robust algorithm that requires no additional devices and is based on a regularization term defined by a prior of the image of the absolute conic.
The remainder of this paper is organized as follows. Section 2 introduces the necessary notation and the background of the existing algorithm that uses estimated homographies. Section 3 gives the error propagation from image noise to camera parameters. Our calibration algorithm is proposed in Section 4. Section 5 presents the experiments on simulated and real data. The conclusion is given in Section 6.
2. Notation and Background
In this section, we start with the notation and then briefly introduce the calibration technique proposed in [8].
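The homography between the planar pattern and the image plane is denoted by $H$. The intrinsic matrix is given by
$$K = \begin{bmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix}, \tag{1}$$
where $(u_0, v_0)$ are the coordinates of the principal point, $\gamma$ is the skew factor, $\alpha = f k_u$ and $\beta = f k_v$ are scale factors, and $k_u$ and $k_v$ are the numbers of pixels per unit distance in the image along the $x$ and $y$ directions. The image of the absolute conic is $\omega = K^{-T} K^{-1}$.
Given an image of a planar pattern, two constraints can be imposed on the intrinsic parameters, $h_1^T \omega h_2 = 0$ and $h_1^T \omega h_1 = h_2^T \omega h_2$, where $h_1$ and $h_2$ are the first two columns of $H$. Therefore, a constrained optimization can be formed by
$$\min_{b} \|V b\|^2 \quad \text{subject to} \quad \|b\| = 1, \tag{2}$$
where $b$ is a vector extracted from $\omega$ and $V$ is a matrix constructed from entries of $H$. The intrinsic matrix $K$ is then computed from $\omega$ by the Cholesky factorization. Since the closed-form solution is obtained by minimizing algebraic errors, maximum likelihood estimation is further applied to refine the results by minimizing geometric errors.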
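The closed-form step can be sketched as follows, assuming the standard formulation of [8]: each homography contributes two rows to a stacked matrix, the unit-norm null vector holds the six distinct entries of the image of the absolute conic, and the intrinsic matrix follows from a Cholesky factorization. The function names and the ordering of the six entries (11, 12, 22, 13, 23, 33) are assumptions of this sketch, and no data normalization is included.

```python
import numpy as np

def _constraint_row(H, i, j):
    """Row v_ij such that v_ij @ b equals h_i^T * omega * h_j (h_i = column i of H)."""
    hi, hj = H[:, i], H[:, j]
    return np.array([
        hi[0] * hj[0],
        hi[0] * hj[1] + hi[1] * hj[0],
        hi[1] * hj[1],
        hi[2] * hj[0] + hi[0] * hj[2],
        hi[2] * hj[1] + hi[1] * hj[2],
        hi[2] * hj[2],
    ])

def intrinsics_from_homographies(homographies):
    """Closed-form intrinsic matrix from >= 3 plane-to-image homographies."""
    rows = []
    for H in homographies:
        H = np.asarray(H, dtype=float)
        rows.append(_constraint_row(H, 0, 1))                            # h1^T w h2 = 0
        rows.append(_constraint_row(H, 0, 0) - _constraint_row(H, 1, 1))  # h1^T w h1 = h2^T w h2
    V = np.array(rows)
    # Minimize ||V b|| subject to ||b|| = 1: smallest right singular vector.
    _, _, vt = np.linalg.svd(V)
    b = vt[-1]
    if b[0] < 0:  # the sign of b is arbitrary; omega must be positive definite
        b = -b
    omega = np.array([[b[0], b[1], b[3]],
                      [b[1], b[2], b[4]],
                      [b[3], b[4], b[5]]])
    # omega = s * K^{-T} K^{-1}; its lower Cholesky factor is sqrt(s) * K^{-T}.
    L = np.linalg.cholesky(omega)
    K = np.linalg.inv(L).T
    return K / K[2, 2]
```

With noisy data the estimated conic may fail to be positive definite, in which case `np.linalg.cholesky` raises an error. In pixel units the columns of the stacked matrix differ by several orders of magnitude, so implementations commonly rescale the image coordinates before this step.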
The calibration performance with respect to image noise level, the number of planes, and the orientation of the model plane is also evaluated in [8]. Based on computer simulations, the errors increase linearly with the image noise level and decrease when more images are used. The best orientation of the model plane is around 45 degrees.
3. Covariance of the Estimated Intrinsic Parameters
In order to find out the relation between focal length and uncertainties of camera parameters, it is not sufficient to only have a point estimate of the parameters. In this section, we present a first-order approximation to compute covariance of estimated parameters.
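As a generic illustration of first-order error propagation, the covariance of an output vector can be approximated by mapping the input covariance through the Jacobian of the estimator. The sketch below uses a finite-difference Jacobian; the function names are hypothetical, and this is not the analytic derivation developed in this section.

```python
import numpy as np

def numerical_jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian of a vector-valued func at x."""
    x = np.asarray(x, dtype=float)
    y0 = np.asarray(func(x), dtype=float).ravel()
    J = np.zeros((y0.size, x.size))
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = eps * max(1.0, abs(x[k]))
        y_plus = np.asarray(func(x + step), dtype=float).ravel()
        y_minus = np.asarray(func(x - step), dtype=float).ravel()
        J[:, k] = (y_plus - y_minus) / (2.0 * step[k])
    return J

def propagate_covariance(func, x, cov_x):
    """First-order approximation: cov_y ~= J cov_x J^T with J evaluated at x."""
    J = numerical_jacobian(func, x)
    return J @ np.asarray(cov_x, dtype=float) @ J.T
```

For a chain of estimators (image points to homography, homography to the image of the absolute conic, conic to intrinsic matrix), the same rule is applied with the product of the individual Jacobians.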
Let us consider two cameras with different focal lengths and (assume ) that share the same image plane. The origin is located at the center of projection of camera 1. Through a single point on the sensor, camera 1 observes a 3D point and camera 2 observes a 3D point , while and are located on the same planar pattern. We also define a transformation that transforms the 3D points and to a coordinate system on the planar pattern such that new depths and are equal to 0. Let us denote and . The configuration is shown in Figure 1.

Because this configuration uses the same image plane, the same 2D observations, and the same orientations and locations of the planar pattern, the major difference between the two cameras is the focal length. For simplicity, we assume that all pixels are square, so that we have Since the center of projection of camera 2 is at , through the same point , we have Putting this together with (3) leads to the formula
Let be the image coordinate of the point on the sensor. Assuming that the two cameras have the same image resolution and principal point, is the same for both cameras. Let us further assume that the noise is limited to the observed image with covariance , the covariance of the intrinsic parameters is where , , and are the Jacobian matrices evaluated at , , and , respectively. , , and are the vectors made up of the entries of the intrinsic matrix , the image of the absolute conic , and 2D homography . As , , and only depend on and and are the same for both cameras, we only need to analyze the relation between of the two cameras. The Jacobian matrix for the th observed point is given by Based on (5) and (7), it is not difficult to find that where and are vectors made up of the entries of the homographies between the planar pattern and cameras 1 and 2, and are the depths of the th 3D points observed by cameras 1 and 2, respectively. is the covariance matrix of the th measured image point. From this equation, we can see that the covariance of the 2D homography is also affected by the focal length and by the orientation and depth of the planar pattern.
Notice that is usually far less than depths and ; we could approximate by . Since image resolution and focal length are fixed for two cameras, in order to reduce the uncertainties of the estimated homography, one possible direction is to increase the pan and tilt angles of the planar pattern so that is very close to 0. However, as mentioned in [8], the best orientation is around 45 degrees, which means that this ratio cannot be very small. This direction is also not feasible in practice due to the limited depth of field. When the region of the planar pattern is outside the depth of field, the sharpness of the region decreases and image noise modeled by increases. Moreover, as the planar pattern could be considered as being uniformly distributed within the field of view of a camera, expectations of both and are close to the depth shown in Figure 1. Therefore, for simplicity, it is reasonable to approximate as . As a result, (8) could be simplified to , and the relation between covariance matrices of intrinsic matrices of two cameras is given by Therefore, uncertainties of intrinsic parameters increase when focal length increases. Since extrinsic parameters for each image can be determined by intrinsic parameters and the corresponding homography, it is easy to find out that the uncertainties of extrinsic parameters also depend on focal length.
One might think that it is possible to reduce the uncertainties by choosing the affine camera model. The intrinsic matrix in the affine camera model contains fewer parameters (i.e., it does not have a principal point), and one way to avoid the overfitting problem is to choose a simpler model. However, the scale factors still exist in the intrinsic matrix of the affine camera model. The derivations shown in this section can be easily extended to the affine camera model. Therefore, it can be shown that estimation uncertainties using the affine camera model also increase when the focal length increases.
4. Calibration Using Regularized Least Squares
When a long focal length is used, the matrix in (2) that is used to estimate the image of the absolute conic is ill-conditioned. As a result, large perturbations of the intrinsic parameters cause only small changes in the error sum of squares. Since it is often difficult to obtain other data points outside the scope of the sensor, which has a limited physical dimension, in this section we apply a simple and effective prior of the image of the absolute conic to reduce the uncertainties.
First, the focal length is set to the one provided by the camera. Although this value is different from the focal length in the pinhole camera model, the two are usually of the same order of magnitude. The number of pixels and can be computed from the sensor size and image resolution. The skew factor is close to 0. The principal point is located around the middle of the image. This location is a close approximation according to [13], which shows that the principal point varies around the image center with some nonlinear patterns when zoom and focus factors vary. Thus, the prior knowledge of the intrinsic parameters can be denoted as .
One possible solution is to apply this prior directly for the estimation of . However, it could require a nonlinear optimization due to the Cholesky decomposition. In order to obtain a closed-form solution, we transform it to the prior of the image of the absolute conic based on . Hence, the prior used in our algorithm is defined by
Notice that we normalize such that the first entry of is 1. This is different from the original constraint in (2). The reason is that some entries are very close to 0 when a long focal length is used, which could make solving for the intrinsic parameters numerically unstable. For example, for a 300 mm lens and around image resolution, some entries of are on the order of and some entries during the Cholesky factorization could be on the order of when is applied.
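The construction of the prior can be sketched as follows: the prior intrinsic matrix is assembled from the nominal lens focal length, the sensor size, and the image resolution, and it is then converted to a prior image of the absolute conic whose first entry is normalized to 1. The function names, the 36 mm by 24 mm sensor, and the 3000 by 2000 pixel resolution in the usage note are hypothetical values chosen only for illustration; they are not the settings used in the experiments.

```python
import numpy as np

def prior_intrinsics(focal_mm, sensor_mm, resolution_px, skew=0.0):
    """Prior intrinsic matrix from nominal lens and sensor data (illustrative)."""
    width_mm, height_mm = sensor_mm
    width_px, height_px = resolution_px
    ku = width_px / width_mm    # pixels per mm along x
    kv = height_px / height_mm  # pixels per mm along y
    return np.array([[focal_mm * ku, skew, width_px / 2.0],
                     [0.0, focal_mm * kv, height_px / 2.0],
                     [0.0, 0.0, 1.0]])

def prior_iac(K0):
    """Prior image of the absolute conic, scaled so that its first entry is 1."""
    K_inv = np.linalg.inv(K0)
    omega0 = K_inv.T @ K_inv
    return omega0 / omega0[0, 0]
```

For example, `prior_iac(prior_intrinsics(300.0, (36.0, 24.0), (3000, 2000)))` yields a matrix whose entries range from 1 to roughly 6e8, whereas without the normalization the smallest entries would be on the order of 1e-9. This illustrates why the scaling of the conic matters for a long focal length.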
The original homogeneous system in (2) is then converted to an inhomogeneous system by applying the prior from (10): for an appropriate value of , where ( is the first column of ) and and are elements 2–6 of and , respectively. The estimate can be obtained by solving the corresponding unconstrained regularized least squares problem for some positive constant . The expectation of can be computed by Thus the estimator from (12) is biased after introducing the prior . The second term of this equation is the bias. As increases, the bias increases, and the expectation of eventually converges to . In order to evaluate the covariance of , let us define the function: Based on the implicit function theorem in [14, 15], the Jacobian can be approximated by where and can be computed by where is the vector of the th row of . The covariance of is given by Since and are independent from and only depends on the , we can see that the covariance decreases as increases. The larger the , the closer the is to . If we consider the mean squared error, it is possible to select an optimal value of at which the mean squared error on the testing set is minimized. In practice, we could divide the 2D points on a planar pattern into training and testing sets and apply cross-validation to choose the optimal .
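A minimal sketch of the regularized estimate follows. It assumes that the penalty is the squared Euclidean distance between elements 2 to 6 of the unknown vector and the corresponding elements of the prior, weighted by a positive constant `lam`; the exact form of the objective in the original equations may differ. The names `solve_regularized` and `select_lambda` are hypothetical, and the hold-out selection scores the algebraic residual on held-out rows of the linear system, which is a simplification of the point-based cross-validation described above.

```python
import numpy as np

def solve_regularized(V, b_prior, lam):
    """Solve min ||A x + v1||^2 + lam * ||x - x0||^2, with the first entry of b fixed to 1.

    V is the (m, 6) matrix of the homogeneous system V b = 0, v1 its first
    column, A its remaining columns, and x0 = b_prior[1:].
    """
    V = np.asarray(V, dtype=float)
    b_prior = np.asarray(b_prior, dtype=float)
    v1, A = V[:, 0], V[:, 1:]
    x0 = b_prior[1:]
    # Normal equations of the regularized problem.
    lhs = A.T @ A + lam * np.eye(A.shape[1])
    rhs = -A.T @ v1 + lam * x0
    x = np.linalg.solve(lhs, rhs)
    return np.concatenate(([1.0], x))

def select_lambda(V_train, V_test, b_prior, candidates):
    """Pick the candidate with the smallest mean squared residual on held-out rows."""
    best = None
    for lam in candidates:
        b = solve_regularized(V_train, b_prior, lam)
        err = float(np.mean((np.asarray(V_test, dtype=float) @ b) ** 2))
        if best is None or err < best[1]:
            best = (lam, err, b)
    return best
```

With `lam = 0` the function returns the ordinary least squares solution of the inhomogeneous system, and as `lam` grows the estimate approaches the prior, which matches the bias and variance behavior described in the text.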
5. Experiments
We tested our proposed algorithm on simulated data and real data over a large range of settings of focal lengths and image noise.
5.1. Simulations
In our simulations, image resolution is set to . Sensor size is mm and focal lengths are 50 mm, 100 mm, 200 mm, 300 mm, 400 mm, and 500 mm. The skew factor is set to 0.009. The principal points are set to the image center. Table 1 gives the focal lengths and corresponding scale factors used in the experiments. Gaussian noise with and is added to the 2D observations. Since the depth of field is limited when a long focal length is used, the observed points could be easily blurred when pan and tilt angles are large. Thus, we use a large standard deviation () of image noise to further test the robustness of our algorithm. The planar pattern is generated randomly with different pan/tilt angles and at different depths from the camera. Angles are uniformly distributed between and degrees. Foreshortening effects are not considered in the simulations. Depths are also uniformly distributed within 6 meters. As the calibration technique in [8] is widely used in the area of computer vision, we implemented this algorithm as a baseline in order to compare calibration performance between existing algorithms and our algorithm. In our simulations, we add 5% offsets to the priors of both the focal length and the principal point. We conducted 20 trials for each configuration.
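The data generation for one simulated view can be sketched as follows. This is an illustrative setup, not the code used for the reported experiments: the function names, the pan/tilt convention, the angle bound, and the depth range passed by the caller are assumptions, and the pattern is placed on the optical axis for simplicity.

```python
import numpy as np

def rotation_pan_tilt(pan_deg, tilt_deg):
    """Rotation for a pattern panned about the y axis and tilted about the x axis."""
    pan, tilt = np.deg2rad(pan_deg), np.deg2rad(tilt_deg)
    ry = np.array([[np.cos(pan), 0.0, np.sin(pan)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(pan), 0.0, np.cos(pan)]])
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(tilt), -np.sin(tilt)],
                   [0.0, np.sin(tilt), np.cos(tilt)]])
    return ry @ rx

def sample_pose(rng, max_angle_deg, min_depth, max_depth):
    """Draw pan, tilt, and depth uniformly (bounds are chosen by the caller)."""
    pan, tilt = rng.uniform(-max_angle_deg, max_angle_deg, size=2)
    depth = rng.uniform(min_depth, max_depth)
    return pan, tilt, depth

def simulate_view(K, grid_xy, pan_deg, tilt_deg, depth, sigma, rng):
    """Project planar points (Z = 0) and add isotropic Gaussian pixel noise."""
    grid_xy = np.asarray(grid_xy, dtype=float)
    R = rotation_pan_tilt(pan_deg, tilt_deg)
    t = np.array([0.0, 0.0, depth])
    # Plane-to-image homography H = K [r1 r2 t].
    H = np.asarray(K, dtype=float) @ np.column_stack([R[:, 0], R[:, 1], t])
    pts = np.column_stack([grid_xy, np.ones(len(grid_xy))]) @ H.T
    uv = pts[:, :2] / pts[:, 2:3]
    return uv + rng.normal(0.0, sigma, size=uv.shape), H
```

Repeating `sample_pose` and `simulate_view` for several views, and the whole procedure for 20 trials per configuration, gives the samples from which the spread of the estimated parameters can be measured.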
Figure 2 shows a comparison of uncertainties between the closed-form solutions and the solutions from MLE when mm is used. Figure 2(c) shows that RMS reprojection errors are reduced by minimizing the geometric errors. However, as the cost function is not convex and the initial guess from the closed-form solution could be far away from the global minimum, the uncertainties of the intrinsic parameters cannot be reduced by the nonlinear refinement, as shown in Figures 2(a) and 2(b). The results are similar for the other settings in Table 1. This experiment shows that MLE can reduce the RMS errors for the training data points but cannot reduce the uncertainties of the camera parameters.
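For reference, the RMS reprojection error discussed here can be computed as in the following sketch. The function name is hypothetical, and the planar model is parameterized by a single homography per view, which ignores lens distortion.

```python
import numpy as np

def rms_reprojection_error(H, plane_xy, observed_uv):
    """Root mean square distance between projected and observed image points."""
    plane_xy = np.asarray(plane_xy, dtype=float)
    observed_uv = np.asarray(observed_uv, dtype=float)
    pts = np.column_stack([plane_xy, np.ones(len(plane_xy))]) @ np.asarray(H, dtype=float).T
    predicted = pts[:, :2] / pts[:, 2:3]
    residuals = predicted - observed_uv
    return float(np.sqrt(np.mean(np.sum(residuals ** 2, axis=1))))
```

A small value of this error measures only how well the training points are fitted; it does not by itself indicate that the estimated camera parameters have low variance.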

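[Figure 2: panels (a) and (b) show the uncertainties of the intrinsic parameters; panel (c) shows the RMS reprojection errors.]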
Figures 3 and 4 show the relation between focal length and the uncertainties of the intrinsic parameters. The uncertainties increase as the focal length increases. The absolute errors of the principal point could be very large, which indicates that the estimated principal point could be very far away from the image plane for long focal lengths. Figures 3(c), 3(d), 4(c), and 4(d) show the results obtained with our algorithm. The uncertainties of both the focal length and the principal point are reduced to a few percent. The values estimated by our algorithm converge to the bias (i.e., 5%), which is consistent with (13). This further means that our algorithm should be mainly used for cameras with a long focal length (e.g., mm as shown in Figures 3 and 4). When the focal length is short, we need to choose either the algorithm in [8] or a very small .

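[Figure 3: uncertainties of the intrinsic parameters versus focal length, panels (a) to (d); panels (c) and (d) show the results of the proposed algorithm.]

[Figure 4: uncertainties of the intrinsic parameters versus focal length, panels (a) to (d); panels (c) and (d) show the results of the proposed algorithm.]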
5.2. Real Data
We also test our algorithm on real data. The camera to be calibrated is a Canon EOS 450D. The sensor size is mm. Image resolution used in the experiments is . We use a 300 mm telephoto lens in the experiments. The prior of the skew factor is set to 0. The prior of the principal point is set to the image center. The scale factors are computed from the sensor size, image resolution, and focal length. Table 2 shows the priors for the intrinsic parameters.
Eighteen images of a planar pattern with different orientations and at different depths are captured within 6 meters. Nine of them are randomly selected each time for calibration, and the same calibration procedure is repeated 20 times. Figure 5 shows the calibration results using Zhang’s algorithm. We can see that the uncertainties of the intrinsic parameters are very large when a 300 mm telephoto lens is used. The estimated principal point could be far away from the image plane, which is not reasonable in practice.
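The repeated random-subset protocol can be sketched as follows. The name `resample_calibration` and the `calibrate` callable (any routine that maps a list of views to a vector of estimated parameters) are hypothetical placeholders for this illustration; the defaults of 9 views and 20 repetitions follow the description above.

```python
import numpy as np

def resample_calibration(views, calibrate, subset_size=9, repeats=20, seed=0):
    """Calibrate on random subsets of views and summarize the spread of the estimates."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(repeats):
        # Draw a subset of distinct views for this repetition.
        idx = rng.choice(len(views), size=subset_size, replace=False)
        subset = [views[i] for i in idx]
        estimates.append(np.asarray(calibrate(subset), dtype=float))
    estimates = np.array(estimates)
    # Mean and sample standard deviation of each estimated parameter.
    return estimates.mean(axis=0), estimates.std(axis=0, ddof=1)
```

The standard deviation returned for each parameter is a simple empirical measure of the uncertainty compared across algorithms.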

[Figure 5: calibration results using Zhang’s algorithm, panels (a) and (b).]
Figure 6 shows our calibration results for different values of the regularization constant. Similar to the simulation results, the uncertainties are reduced greatly and the estimated intrinsic parameters converge to the priors as the regularization constant increases.

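[Figure 6: calibration results of the proposed algorithm for different values of the regularization constant, panels (a) to (d).]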
6. Conclusion
As a camera with a telephoto lens can be used in various vision-based systems, it is necessary to calibrate such a camera accurately. Many existing algorithms that are designed for cameras with relatively short focal lengths could produce large uncertainties in the estimated parameters even though the RMS reprojection errors of the training data are small after a nonlinear optimization. In this paper, we first give a detailed error analysis that shows the relation between uncertainties and focal length. We then propose a robust calibration algorithm based on regularized least squares to reduce the uncertainties. In future work, we will apply our approach to camera networks that contain a camera with a telephoto lens, in the areas of remote surveillance and scene reconstruction.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This work is supported in part by US National Science Foundation Award HRD 0833184.