Abstract

Estimating the real-time pose of a freely flying aircraft in a complex wind tunnel environment is extremely difficult. Because of the highly dynamic testing environment, complicated illumination conditions, and the unpredictable motion of the target, most general pose estimation methods fail. In this paper, we introduce a cross-field-of-view (FOV) real-time pose estimation system that provides high-precision pose estimation of a free-flight aircraft in the wind tunnel environment. Multiview live RGB-D streams are used as input to ensure that the measurement area is fully covered. First, a multimodal initialization method is developed to measure the spatial relationship between the RGB-D camera and the aircraft. Based on all the input multimodal information, a so-called cross-FOV model is proposed to recognize the dominant sensor and accurately extract the foreground region in an automatic manner. Second, we develop an RGB-D-based pose estimation method for a single target, by which the 3D sparse points and the pose of the target can be obtained simultaneously in real time. Extensive experiments have been conducted, and an RGB-D image simulation based on 3D modeling is implemented to verify our algorithm. Experimental results on both real and simulated scenes demonstrate the effectiveness of our method.

1. Introduction

Aircraft attitude estimation plays a crucial role in aircraft control systems for the wind tunnel. During flight, it is essential to adjust the flight parameters according to the real-time attitude of the aircraft [1]. When verifying the flight performance of an aircraft, it is also necessary to check its performance at different attitudes, and attitude estimation is an important part of this [2]. As a computer vision task, aircraft attitude estimation can be regarded as object pose estimation. Vision systems are the most widely used measurement technique in the wind tunnel; they provide crucial data that can be compared with computational fluid dynamics (CFD) predictions to assist in validating design geometries.

However, it is hard to obtain high-precision measurement results in a low-speed wind tunnel when the target is flying freely. The main reason is that the wind tunnel is a highly dynamic testing environment with complex illumination conditions and unpredictable target motion, all of which heavily degrade the accuracy of the estimation system.

The low-speed wind tunnel is eight meters long, six meters wide, and six meters high. Thus, multiview live RGB-D streams are needed as input to ensure that the measurement area is fully covered. Furthermore, a multimodal initialization method is developed to measure the spatial relationship between the RGB-D camera and the aircraft. Based on all the input multimodal information, our cross-FOV model is proposed to recognize the dominant sensor and accurately extract the foreground region in an automatic manner.

The object pose estimation task has been extensively studied [3]. Traditional methods of object attitude measurement are mainly divided into template matching and feature matching [4]. Template matching methods [5] are usually applied to weakly textured scenes; they reconstruct a 3D model of the object and then match the real scene against the model to find the best pose. The classic ICP algorithm [3] and the RANSAC algorithm solve for the current pose by minimizing the distance between corresponding points of the actual scene and the model [6]. Many researchers believe that, in computer vision applications [7–11], the contour of an object is the most reliable information, because feature-based recognition methods [12–15] are likely to fail when recognizing the poses of weakly textured objects.

In this paper, to overcome these problems, we propose a cross-FOV pose estimation framework based on local features that estimates the pose of an aircraft in real time and handles the cross-FOV problem. By acquiring the relative positional relationship between the camera and the aircraft, we transform the relative pose of the camera into the relative pose of the aircraft. According to the experimental results, the pose estimation system with the cross-FOV model obtains accurate measurements in a wind tunnel, and we run the system to measure the pose of an aircraft model. We will apply our system to the low-speed wind tunnel of the China Aerodynamics Research and Development Centre (CARDC).

2. Overview of Our Method

We propose a cross-FOV RGB-D pose estimation system that processes each new frame in real time. While maintaining high-precision pose estimation, our system reconstructs a sparse point cloud of the object in the scene and tracks the target's motion continuously as it moves across different fields of view. Figure 1 illustrates the frame-to-frame operation of our system, and Figure 2 illustrates the structure of our system in a real scene.

2.1. Pose Initialization

In this section, a relative attitude measurement module is used to obtain the relative attitude between the aircraft and the initial camera. We place two tags along the X-axis of the aircraft, as shown in Figure 3; the midpoint of the two tags is the center of the aircraft. Once the tags are detected, the center of the aircraft is localized, and the system can then convert it into the relative attitude between the aircraft and the initial camera.

2.1.1. Tag Recognition

This module detects the positions of the tags shown in Figure 3. We detect tags with a detector following the AprilTag method [5]. The first step applies adaptive thresholding to convert the input grayscale image into a black-and-white image. The next step segments edges based on the black-and-white components from which they arise, in order to find edges that might form the boundary of a tag. Finally, the method computes an approximate partition by searching for a small number of corner points and then iterates through all possible combinations of corner points to find all fitting quads. After this operation, each tag is localized in the image coordinate system, and the midpoint of the two tags represents the center of the aircraft.
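For illustration, this tag detection step can be prototyped with off-the-shelf AprilTag bindings. The sketch below uses the third-party pupil-apriltags package rather than the detector described above; the tag family and the helper name are assumptions made only for this example.

```python
import cv2
import numpy as np
from pupil_apriltags import Detector  # third-party AprilTag bindings (assumed available)

# The tag family is an assumption; the paper does not state which family is printed on the aircraft.
detector = Detector(families="tag36h11")

def locate_aircraft_center_px(bgr_image):
    """Detect the two tags and return the pixel midpoint of their centers,
    which is taken as the aircraft center in the image plane."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    detections = detector.detect(gray)            # thresholding, edge segmentation, quad fitting
    if len(detections) < 2:
        return None                               # both tags must be visible for initialization
    centers = np.array([d.center for d in detections[:2]])  # two (u, v) tag centers
    return centers.mean(axis=0)                   # midpoint of the two tag centers
```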

2.1.2. Aircraft Center Localization

We transform the coordinate $(u, v)$ of the aircraft center in the image coordinate system, together with its corresponding depth value $d$, into a 3D coordinate, so that the relative attitude between the aircraft and the initial camera is obtained:
$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} (u - c_x)\,d / f_x \\ (v - c_y)\,d / f_y \\ d \end{bmatrix}, \tag{1}$$
where $(f_x, f_y)$ is the focal length and $(c_x, c_y)$ is the principal point, all known from calibration, and $(X, Y, Z)$ is the 3D coordinate of the aircraft center.
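A minimal sketch of equation (1), assuming standard pinhole intrinsics $(f_x, f_y, c_x, c_y)$ from calibration; the numeric values in the usage line are placeholders, not our calibration results.

```python
import numpy as np

def backproject(u, v, d, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with depth d into the camera frame,
    following the pinhole model of equation (1)."""
    x = (u - cx) * d / fx
    y = (v - cy) * d / fy
    return np.array([x, y, d])

# Example: aircraft center detected at pixel (320, 240) with 2.5 m depth,
# under assumed (not calibrated) intrinsics.
center_3d = backproject(320.0, 240.0, 2.5, fx=577.0, fy=579.0, cx=320.0, cy=240.0)
```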

2.2. Cross-FOV Model

In the wind tunnel, as shown in Figure 4, cross-field-of-view measurement is needed, so we design a cross-FOV model. For camera $c_i$ in the camera set $C$ at time $t$, we have the color frame $I_{i,t}$ and the depth frame $D_{i,t}$. To choose the best input camera, we backtrack the last $n$ frames and collect, for each frame, the camera with the maximum frame score into the set $C^{*}$. The frame score can be expressed as
$$S_{i,t} = \frac{1}{W H}\sum_{(u,v)} s_{i,t}(u,v),$$
where $s_{i,t}(u,v)$ is the depth score at pixel $(u,v)$ that fits the depth constraint $d_{\min} < D_{i,t}(u,v) < d_{\max}$, and $W$ and $H$ are, respectively, the frame width and height.

The best input camera $c^{*}$ is then chosen from the camera set $C^{*}$ by maximizing the score
$$c^{*} = \arg\max_{c_i} P(c_i, C^{*}),$$
where $P(\cdot)$ is a proportion function that counts how often camera $c_i$ appears in the camera set $C^{*}$.
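The following sketch shows one possible reading of this selection rule: each camera's frame score is the fraction of pixels satisfying the depth constraint, and the best camera is the one that wins that score most often over the backtracked frames. The depth thresholds, data layout, and function names are illustrative assumptions, not the system's actual implementation.

```python
import numpy as np

def frame_score(depth, d_min=0.5, d_max=6.0):
    """Fraction of pixels whose depth lies inside [d_min, d_max];
    the thresholds stand in for the paper's depth constraint."""
    h, w = depth.shape
    valid = np.logical_and(depth > d_min, depth < d_max)
    return valid.sum() / float(w * h)

def best_camera(depth_history, n=10):
    """depth_history: dict camera_id -> list of recent depth frames (newest last).
    The winner is the camera that achieves the highest frame score most often
    over the last n backtracked frames (the 'proportion' in the selection rule)."""
    n = min(n, min(len(frames) for frames in depth_history.values()))
    cams = list(depth_history.keys())
    wins = {c: 0 for c in cams}
    for k in range(1, n + 1):                      # backtrack the last n frames
        scores = {c: frame_score(depth_history[c][-k]) for c in cams}
        wins[max(scores, key=scores.get)] += 1
    return max(wins, key=wins.get)
```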

2.3. Feature Extraction

The system uses a fast binary descriptor, ORB, for the feature extraction task. This descriptor is rotation invariant and resistant to noise and illumination changes. At the same time, it is fast to extract and match, which makes ORB suitable for real-time pose estimation in a complex environment.

Our system handles RGB-D input. We extract ORB features on the RGB image for tracking, and for each feature with pixel coordinates $(u, v)$ and its corresponding depth value $d$, we transform it into the world coordinate system according to (1).
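A compact sketch of this step using OpenCV's ORB implementation; the intrinsics passed in and the depth convention (metric depth, zero meaning invalid) are assumptions for illustration.

```python
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=1000)   # fast binary descriptor used for tracking

def extract_features_3d(bgr, depth, fx, fy, cx, cy):
    """Extract ORB keypoints on the RGB image and lift each one to a 3D point
    from its depth value via equation (1). Keypoints without valid depth are skipped."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    if descriptors is None:
        return np.empty((0, 3)), np.empty((0, 32), dtype=np.uint8)
    points_3d, kept_desc = [], []
    for kp, desc in zip(keypoints, descriptors):
        u, v = kp.pt
        d = float(depth[int(round(v)), int(round(u))])
        if d <= 0.0:                    # missing depth reading
            continue
        points_3d.append([(u - cx) * d / fx, (v - cy) * d / fy, d])
        kept_desc.append(desc)
    return np.array(points_3d), np.array(kept_desc)
```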

2.4. Bundle Adjustment

After the initialization operation, our system performs bundle adjustment to minimize the reprojection error between each 3D point and its corresponding 2D point, estimating the camera's instantaneous pose relative to the previous frame:
$$\{R, t\} = \arg\min_{R, t} \sum_{i} \rho\!\left( \left\| \mathbf{x}_{i} - \pi\!\left(R \mathbf{X}_{i} + t\right) \right\|_{\Sigma}^{2} \right),$$
where $\rho$ is the robust Huber cost function and $\Sigma$ is the covariance matrix associated with the scale of the keypoint. The projection function $\pi$ is defined as follows:
$$\pi\!\left(\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}\right) = \begin{bmatrix} f_x X / Z + c_x \\ f_y Y / Z + c_y \end{bmatrix},$$
where $(f_x, f_y)$ is the focal length and $(c_x, c_y)$ is the principal point, both known from the calibration.
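As a rough stand-in for this per-frame optimization, the sketch below refines the pose by minimizing the Huber-weighted reprojection error with SciPy's robust least squares. It omits the keypoint-scale covariance weighting, and all variable names are assumptions.

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

def project(points_3d, rvec, tvec, fx, fy, cx, cy):
    """Pinhole projection of 3D points after applying the pose (rvec, tvec)."""
    R, _ = cv2.Rodrigues(np.asarray(rvec, dtype=np.float64).reshape(3, 1))
    pc = points_3d @ R.T + tvec          # transform points into the camera frame
    u = fx * pc[:, 0] / pc[:, 2] + cx
    v = fy * pc[:, 1] / pc[:, 2] + cy
    return np.stack([u, v], axis=1)

def refine_pose(points_3d, points_2d, pose0, intrinsics):
    """Minimize the reprojection error with a Huber loss (robust cost),
    starting from the previous frame's pose pose0 = (rvec, tvec)."""
    fx, fy, cx, cy = intrinsics

    def residuals(x):
        pred = project(points_3d, x[:3], x[3:], fx, fy, cx, cy)
        return (pred - points_2d).ravel()

    x0 = np.hstack([np.ravel(pose0[0]), np.ravel(pose0[1])])
    sol = least_squares(residuals, x0, loss="huber", f_scale=1.0)
    return sol.x[:3], sol.x[3:]          # refined rotation (Rodrigues vector) and translation
```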

3. Experiments

In the evaluation stage, we carry out a quantitative evaluation on both synthetic and real sequences with ground truth data. Our synthetic sequences, which imitate the real experimental environment, are specifically designed for this work. In the synthetic scene, we set up ambient light and multiple light sources to simulate the real complex lighting conditions. For the camera settings of the synthetic data, we follow the specifications of an Asus Xtion camera, with a resolution of 640×480 pixels and a field of view of 58° H, 45° V, 70° D.
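For reference, the synthetic camera's focal length follows from the stated resolution and field of view under a pinhole model; the short sketch below assumes the principal point lies at the image center.

```python
import math

# Synthetic camera modeled after the Asus Xtion: 640x480 pixels, 58 deg horizontal / 45 deg vertical FOV.
W, H = 640, 480
fx = (W / 2) / math.tan(math.radians(58) / 2)   # ~577.3 px
fy = (H / 2) / math.tan(math.radians(45) / 2)   # ~579.4 px
cx, cy = W / 2, H / 2                            # principal point assumed at the image center
print(fx, fy, cx, cy)
```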

3.1. Synthetic Experiments

Appropriate synthetic sequences were created specifically for this work. In Figure 5, the left image is a synthetic color image, the middle image is the corresponding depth image, and the right image is the output of our system. The point cloud is sparsely reconstructed from the model, and the coordinate axes at the center of the model indicate the current pose of the target. We set up three translation sequences, three rotation sequences (rotate1–rotate3), and a combined translation-and-rotation sequence (translate&rotate).

For each synthetic scene, we compare the estimated and ground truth trajectories of the aircraft by computing the root-mean-square error (RMSE). Results on the synthetic sequences are shown in Table 1 and Figure 6. We also compare against Co-fusion [3] on the translation sequences. Co-fusion [3] only outputs translation, so its rotation results are not given in Table 1, and it failed on the sequences rotate1, rotate2, rotate3, and translate&rotate.
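The RMSE reported for each sequence can be computed as in the sketch below, assuming the estimated and ground-truth trajectories are already time-aligned and expressed in the same frame.

```python
import numpy as np

def trajectory_rmse(estimated, ground_truth):
    """Root-mean-square error between two time-aligned trajectories,
    each an (N, 3) array of positions (or Euler angles for the rotation error)."""
    err = estimated - ground_truth
    return np.sqrt(np.mean(np.sum(err ** 2, axis=1)))
```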

As shown in Figures 6–8, the trajectories estimated by the proposed method fit the ground truth well in all scenes. In the rotation sequences, the proposed method also tracks the object stably. As Table 1 shows, the proposed method performs better than Co-fusion [3]. In some scenes, Co-fusion [3] fails to track the object, which leads to a large estimation error, whereas our method achieves long-term, effective tracking as long as the first frame is provided for initialization.

3.2. Experimental Verification with Hexapod and Real Scene

For the real sequences, we set up a series of experiments on a high-precision Hexapod, as shown in Figure 9; the accuracy of the Hexapod reaches the micron level (Table 2). A corresponding experiment is set up for each axis. After aligning the cameras with the platform, we move the platform uniformly along each axis in turn to test the accuracy of translation and rotation. Results on the real sequences are shown in Figure 10.

Experiments on the high-precision platform show that our proposed method also performs well on real sequences. In the rotation experiments, the yaw angle estimate is the most accurate, which indicates that the proposed method reaches its highest accuracy when the motion does not change the depth.

We have performed a series of qualitative experiments to demonstrate the capabilities of our method. The comparison with Co-fusion [3] indicates that our method achieves very high accuracy. As mentioned earlier, our method adapts to complex lighting environments and achieves high-precision tracking and pose estimation. In particular, it satisfies the cross-FOV requirement, which means it can provide reliable pose estimation over a wide measurement area, as demonstrated in the experiments. The real scene experiment is shown in Figure 12, and the estimated trajectory of the sequence is shown in Figure 11; the trajectory is accurate and smooth.

4. Conclusions

We introduced a cross-field-of-view (FOV) real-time pose estimation system that provides high-precision pose estimation of a free-flight aircraft in a wind tunnel environment. Multiview live RGB-D streams are used as input to ensure that the measurement area is fully covered. First, a multimodal initialization method is developed to measure the spatial relationship between the RGB-D camera and the aircraft. Based on all the input multimodal information, a so-called cross-FOV model is proposed to recognize the dominant sensor and accurately extract the foreground region in an automatic manner. Second, we develop an RGB-D-based pose estimation method for a single target, by which the 3D sparse points and the pose of the target can be reconstructed simultaneously in real time. Extensive experiments have been conducted, and an RGB-D image simulation based on 3D modeling is implemented to verify our algorithm. Experimental results on both real and simulated scenes demonstrate the effectiveness of our method.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Zhejiang Provincial Natural Science Foundation of China under Grant numbers LY15F020031 and LQ16F030007 and the National Natural Science Foundation of China (NSFC) under Grant numbers 11302195 and 61401397.