Geometric Calibration Of A Stereo Camera
Introduction
Over the past several decades, modern technology has made cameras steadily more capable, offering better image quality and a smoother photo-capturing experience. Stereo cameras have attracted particular attention because they provide the visual experience closest to that of human eyes. In this project, we take a closer look at the geometric calibration steps for stereo cameras, evaluate the results obtained from both simulations and a real camera experiment, and discuss the features and tradeoffs of the calibration process.
Background
JEDEYE
Stereo cameras use two sets of lenses and imaging sensors to capture a pair of images at a time, emulating the binocular visual system of a human being. These images contain 3D depth information as well as the color content found in regular camera pictures. So far, such cameras have mainly been used in the film industry and in advanced research; very few products are available to replace a regular phone camera or a DSLR. The JEDEYE stereo camera from Fengyun Vision is a new product that aims to fill this gap by integrating advanced electronics with a stereo camera. However, a stereo camera must first be geometrically calibrated before it can be used in everyday scenarios.
Geometric calibration
Geometric camera calibration is the process of estimating the extrinsic and intrinsic parameters of the lens and imaging sensor of an image recording device. These parameters are crucial for correcting lens distortion, depth estimation, 3D scene reconstruction, and object measurement. The photo-capturing process can be modeled as a transform from the 3D world coordinate system to the 2D image coordinates [1]:
w [x; y; 1] = P [X; Y; Z; 1]
Where the real-world coordinates [X; Y; Z; 1] are projected onto the image coordinates [x; y; 1]. w is a scale factor, and P is the camera matrix:
P = K [R | t]
[R | t] is the extrinsic matrix; it describes the 3D spatial relationship between the scene and the camera. It is a 3x3 rotation matrix (R) concatenated with a 3x1 translation vector (t), forming a 3x4 matrix. This matrix encodes the location of the camera in the 3D scene as well as the direction it is looking, and it provides the rigid transformation from the 3D real-world coordinates to the 3D camera coordinates. The intrinsic matrix K characterizes the geometric parameters of the camera, including focal length, optical center, and skew coefficient; it projects the 3D camera coordinates onto the 2D image coordinates. The complete image capturing process therefore has two transformation steps, as illustrated in fig. 1.
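The two-step transform can be carried out numerically as a quick sanity check. The sketch below is a minimal pinhole projection; all numbers (focal length, principal point, rotation, translation) are hypothetical, not the JEDEYE's parameters:

```python
import numpy as np

# Hypothetical intrinsics: focal length 1200 px, principal point (640, 480), zero skew.
K = np.array([[1200.0,    0.0, 640.0],
              [   0.0, 1200.0, 480.0],
              [   0.0,    0.0,   1.0]])

# Hypothetical extrinsics: identity rotation, camera shifted 100 mm along x.
R = np.eye(3)
t = np.array([[100.0], [0.0], [0.0]])

P = K @ np.hstack([R, t])                      # 3x4 camera matrix P = K [R | t]

Xw = np.array([[0.0], [0.0], [1000.0], [1.0]]) # world point 1 m in front of the camera

x = P @ Xw                                     # homogeneous image coordinates (w*x, w*y, w)
x = x / x[2, 0]                                # divide out the scale factor w
print(x[:2].ravel())                           # pixel coordinates -> [760. 480.]
```

Note that the rigid transform [R | t] @ Xw is applied first, and K then maps the resulting camera coordinates to pixels, mirroring the two steps described above.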
Part I Calibrating the JEDEYE stereo camera
Calibration steps
A standard 7x9 checkerboard pattern printed on letter-size paper is used as the calibration target. This pattern is widely used because it has high contrast and many sharp corners that are easy for the computer to detect. The checkerboard is taped to the back wall of a light box to prevent the calibration pattern from moving. The cameras are then placed at different distances and angles from the checkerboard to take multiple pairs of images. Alternatively, the same images could be acquired by fixing the camera and moving the calibration pattern; however, moving a checkerboard printed on paper would introduce unwanted variation between setups, such as accidental warping of the paper. All images are taken under D65 lighting conditions. Calibration techniques for both single cameras and stereo cameras are readily available in the MATLAB toolboxes [2]. Both tools follow the same algorithm to calibrate a camera. The stereo camera calibrator additionally computes the geometric parameters between the two cameras of the stereo pair, namely the rotation and translation of the second camera with respect to the first. Both tools are used in this project and their results are compared here; to obtain parameters for both cameras, the single camera calibrator is run twice.
Interestingly, the reprojection error obtained from the stereo camera calibrator is 0.69 pixels per corner, while the reprojection error from the single camera calibrator is 0.4 pixels per corner, even though the same images are used in both cases. This is not too surprising after examining the reprojection error distribution. In the stereo camera calibrator, images must be loaded in pairs, so the left image of a pair may have a very low reprojection error while the right one has a higher error. It is therefore harder to choose a set of images that works best for both cameras.
In the single camera calibrator, by contrast, the images that work best for each camera can be chosen individually, yielding better calibration results. To get the best of both, the extrinsics and intrinsics of each camera are obtained from the single camera calibrator, while the stereo parameters are obtained from the stereo camera calibrator. Fig. 3 shows the 3D reconstruction of the calibration setup, which matches the actual setup well.
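The per-corner reprojection error quoted above is simply the average distance between the detected corner locations and the corners reprojected through the estimated camera model. A minimal sketch (the corner coordinates below are made up for illustration, not taken from the project data):

```python
import numpy as np

def mean_reprojection_error(detected, reprojected):
    """Mean Euclidean distance in pixels between detected corner
    locations and corners reprojected with the estimated parameters.
    Both inputs are (N, 2) arrays of pixel coordinates."""
    distances = np.linalg.norm(detected - reprojected, axis=1)
    return distances.mean()

# Toy example: three corners offset by known amounts (hypothetical values).
detected    = np.array([[10.0, 10.0], [20.0, 10.0], [30.0, 10.0]])
reprojected = np.array([[10.3, 10.4], [20.0, 10.5], [29.4, 10.8]])
print(mean_reprojection_error(detected, reprojected))
```

Averaging this quantity over all corners in all calibration images gives the per-corner figures (0.4 and 0.69 pixels) reported above.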
Results and Evaluation
Rectified images are used to evaluate the calibration results in addition to the reprojection error. To see the undistortion effect, another scene is created in the light box. The original and rectified images are shown side by side in fig. 4. Comparing the two images shows that the undistortion works very well: the distortion is most severe along the line where the back panel of the light box meets the bottom panel, and in the rectified image this distortion is removed.
The defining feature of a stereo camera is that it records depth information. A red-cyan anaglyph (fig. 5) is generated by rectifying and superimposing the stereo image pair. Viewed with color-coded anaglyph glasses, this image gives the perception of a 3D scene. The depth information is encoded in the disparity between corresponding objects in the left and right images of a stereo pair: closer objects have higher disparity, and objects further away have lower disparity. This can be verified in fig. 6, where objects are labeled with the offset between the left-eye and right-eye images.
A coarse disparity map (fig. 7) is generated from this stereo image pair. The map agrees with reality: the mini cat statue, being closest to the camera, has the highest disparity, and the apple has the lowest. There are also many holes and defective spots in the disparity map. This is because the surface textures of the objects are generally too smooth, lacking the uniqueness the computer needs to match them between the two images. The simplicity of the disparity algorithm puts another limit on the result. A better disparity algorithm could be investigated and optimized, but that is beyond the scope of this project; the map is meant simply to show that the calibrated stereo images are very promising for reconstructing 3D scenes.
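The coarse disparity map above was produced in MATLAB; to illustrate the underlying idea, here is a minimal sum-of-squared-differences (SSD) block matcher in Python. This is a sketch, not the project's implementation, and the block size and search range are arbitrary choices:

```python
import numpy as np

def disparity_ssd(left, right, block=3, max_disp=8):
    """Coarse disparity by SSD block matching on rectified images,
    with the left image as reference. For each pixel, compare a small
    patch against patches shifted leftwards in the right image and
    keep the shift (disparity) with the lowest SSD cost."""
    h, w = left.shape
    r = block // 2
    disp = np.zeros((h, w))
    for y in range(r, h - r):
        for x in range(r + max_disp, w - r):
            patch = left[y-r:y+r+1, x-r:x+r+1]
            costs = [np.sum((patch - right[y-r:y+r+1, x-d-r:x-d+r+1])**2)
                     for d in range(max_disp)]
            disp[y, x] = np.argmin(costs)
    return disp
```

Because rectification aligns the epipolar lines with image rows, the search is one-dimensional; the holes seen in fig. 7 correspond to textureless patches where many shifts have nearly identical SSD cost.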
Discussion
In this part we have described the process and results of calibrating the JEDEYE stereo camera. The calibration results are evaluated using both reprojection errors and visual inspection; reprojection errors of less than one pixel are generally considered acceptable [2]. During the process, it is observed that target quality is the deciding factor in the calibration: a flat, high-resolution, high-contrast checkerboard pattern yields accurate camera parameters. When capturing images, varying the orientation of the camera relative to the target is more useful than varying their distance. This is understandable because views at different angles correspond to different projection planes, which provide stronger constraints on the homographies from which the parameters are computed.
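Each calibration view contributes one such homography, which can be estimated from point correspondences with the direct linear transform (DLT). A minimal sketch of DLT, not tied to the MATLAB toolbox internals:

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate the 3x3 homography H mapping src -> dst via the direct
    linear transform: each correspondence contributes two linear
    equations in the entries of H, and the solution is the right
    singular vector of the stacked system with the smallest singular
    value. src and dst are (N, 2) arrays, N >= 4, points in general
    position."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1,  0,  0,  0, u*x, u*y, u])
        A.append([ 0,  0,  0, -x, -y, -1, v*x, v*y, v])
    _, _, Vt = np.linalg.svd(np.array(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]   # fix the arbitrary scale (assumes H[2,2] != 0)
```

Frontal views all produce near-identical homographies and constrain the intrinsics weakly, which is why tilting the target between shots improves the calibration.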
Part II Simulation
Goal
1. Simulate stereo camera images in any given 3D scene.
2. Calculate the ground truth extrinsic and intrinsic parameters of a given stereo camera.
3. Compare the MATLAB generated extrinsic and intrinsic parameters and the ground truth.
4. Analyze and evaluate the calibration accuracy.
Methods
1. Simulation environment setup:
In our implementation, we used two toolboxes (pbrt and ISET) to simulate the stereo images. pbrt is an open-source, physically based ray-tracing renderer in which one can build a 3D scene and specify the lighting conditions, camera location, lens type, and film resolution. Three versions of pbrt are available; we used pbrt-v2 [3], which lacks some of the more advanced features of pbrt-v3 but satisfies all our needs at lower computational cost. To reduce rendering time, we created a very simple scene: a 50 cm x 50 cm checkerboard composed of 8x8 black and white squares, each 62.5 mm x 62.5 mm. Fig. 1 shows the scene rendered by a pinhole camera at the origin with resolution 128x128. The checkerboard is located at (0, -1000, 0) (unit: mm) with respect to the default origin in pbrt. For the calibration renders, the film resolution is 256x256 and the number of rays per pixel is 128. With these settings, the average time to render a stereo image pair is around 1-2 minutes.
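For an ideal pinhole camera, the ground-truth intrinsic matrix follows directly from the field-of-view angle and the film resolution. The sketch below assumes zero skew, a principal point at the image center, and a field-of-view angle spanning the full image width; the exact fov convention should be checked against the pbrt scene file:

```python
import numpy as np

def ground_truth_K(fov_deg, width, height):
    """Ground-truth intrinsic matrix for an ideal pinhole camera.
    Assumes the fov angle spans the full image width, zero skew, and
    the principal point at the image center (both are assumptions
    about the renderer's conventions, not measured values)."""
    f = (width / 2.0) / np.tan(np.radians(fov_deg) / 2.0)  # focal length in pixels
    return np.array([[f,   0.0, width / 2.0],
                     [0.0, f,   height / 2.0],
                     [0.0, 0.0, 1.0]])

# Example: a hypothetical 90-degree fov on the 256x256 film used here.
K = ground_truth_K(90.0, 256, 256)
print(K)
```

Comparing this K (and the known camera placement as ground-truth extrinsics) against the MATLAB calibrator's output is what makes the simulation useful for evaluating calibration accuracy.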