Geometric Calibration Of A Stereo Camera

From Psych 221 Image Systems Engineering

Introduction

Cameras have grown steadily smarter over the past several decades, offering ever better image quality and photo-capturing experiences. Stereo cameras have gained much attention because they provide the visual experience closest to that of human eyes. In this project, we take a closer look at the geometric calibration steps of stereo cameras, evaluate the results obtained from both simulations and a real camera experiment, and discuss the features and tradeoffs in the calibration process.

Background

JEDEYE

Stereo cameras use two sets of lenses and imaging sensors to capture a pair of images at a time, emulating the binocular visual system of a human being. These images contain 3D depth information as well as the color content found in regular camera pictures. So far such cameras have been used mainly in the film industry and advanced research fields; very few products are available to replace a regular phone camera or a DSLR. The JEDEYE stereo camera from Fengyun Vision is a new product that aims to fill this gap by integrating advanced electronics with a stereo camera. However, a stereo camera first needs to be geometrically calibrated before it can be used in everyday scenarios.

Geometric calibration

Geometric camera calibration is the process of estimating the extrinsic and intrinsic parameters of the lens and imaging sensor of an image recording device. These parameters are crucial for correcting lens distortion, depth estimation, 3D scene reconstruction, and object measurements. The photo capturing process can be modeled as a transform from the 3D world coordinate system to the 2D image coordinates [1]:

$$ w \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = P \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}, \qquad P = K \, [\, R \mid t \,] $$

where the real world coordinates (X, Y, Z) are projected onto the image coordinates (x, y), w is a scaling factor for the image, and P is the camera parameter matrix:

[R | t] is the extrinsic matrix; it describes the 3D spatial relationship between the scene and the camera. It is a 3x3 rotation matrix (R) concatenated with a 3x1 translation vector (t), making a 3x4 matrix that operates on homogeneous rather than Euclidean coordinates, a standard representation in computer graphics. This matrix encodes the location of the camera in the 3D scene, as well as the direction it is looking at. It provides a rigid transformation from the 3D real world coordinates to the camera's 3D coordinates. The intrinsic matrix K characterizes the geometric parameters of the camera, including focal length, optical center, and skew coefficient. This matrix then projects the 3D camera coordinates onto the 2D image coordinates. The complete image capturing process thus has two transformation steps, as illustrated in Fig. 1.
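As a concrete illustration of this two-step model, the sketch below projects a world point through P = K [R | t]. All numbers (focal length, principal point, pose) are made up for illustration and are not JEDEYE's actual parameters.

```python
import numpy as np

# Hypothetical intrinsics and pose, chosen only for illustration.
K = np.array([[1000.0, 0.0, 320.0],    # fx, skew, cx
              [0.0, 1000.0, 240.0],    # fy, cy
              [0.0, 0.0, 1.0]])
R = np.eye(3)                          # camera aligned with world axes
t = np.array([[0.0], [0.0], [500.0]])  # world origin 500 mm in front of camera

P = K @ np.hstack([R, t])              # 3x4 camera matrix P = K [R | t]

X_world = np.array([100.0, 50.0, 0.0, 1.0])  # homogeneous world point (mm)
x_h = P @ X_world                      # homogeneous image point (w*x, w*y, w)
x, y = x_h[:2] / x_h[2]                # divide by the scale factor w
print(x, y)                            # -> 520.0 340.0
```

The division by w at the end is what makes the projection perspective rather than affine.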

Fig.1 Illustration of extrinsic and intrinsic transformations of a camera

Part I Calibrating the JEDEYE stereo camera

Calibration steps

A standard 7x9 checkerboard pattern is printed on letter size paper as the calibration target. This pattern is widely used because it has high contrast and many sharp corners that are easy for the computer to detect. The checkerboard is taped to the back wall of a light box to prevent the calibration pattern from moving. The cameras are then placed at locations with different distances and angles to the checkerboard to take multiple pairs of images. Alternatively, the same images can be acquired by fixing the camera and moving the calibration pattern; however, moving a checkerboard printed on paper would introduce unwanted variations between setups, such as accidental warping of the paper. All images are taken under D65 lighting conditions.

Calibration tools for both single cameras and stereo cameras are readily available in the MATLAB toolboxes [2]. Both tools follow the same algorithm to calibrate a camera. The stereo camera calibrator additionally computes the geometric parameters between the two cameras of the stereo pair, namely the rotation and translation of the second camera with respect to the first. Both tools are used in this project and their results are compared here. To get parameters for both cameras, the single camera calibrator is run twice. Interestingly, the reprojection error obtained from the stereo camera calibrator is 0.69 pixels per corner, while the reprojection error from the single camera calibrator is 0.4 pixels per corner, even though the same images are used in both cases. This is not too surprising after examining the reprojection error distribution. The stereo camera calibrator loads an image pair at a time, so for a given pair the left image may have a very low reprojection error while the right image has a higher one; it is therefore harder to choose a set of images that works best for both cameras.
In the single camera calibrator, by contrast, it is possible to individually choose the images that work best for each camera, resulting in better calibration results. To get the best overall calibration, the extrinsics and intrinsics of each camera are obtained from the single camera calibrator, while the stereo parameters are obtained from the stereo camera calibrator. Fig. 2 shows the 3D reconstruction of the calibration setup; it matches the actual setup very well.
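The per-corner reprojection error quoted above is simply the mean Euclidean distance between detected corners and the corners reprojected through the estimated camera model. A minimal sketch, with made-up corner coordinates:

```python
import numpy as np

# Detected corner positions (pixels) and their reprojections from the
# estimated camera model -- the values here are invented for illustration.
detected    = np.array([[100.2, 200.1], [150.0, 200.3], [100.1, 250.0]])
reprojected = np.array([[100.0, 200.0], [150.4, 200.0], [100.1, 250.3]])

# Mean Euclidean distance per corner, the figure the calibrators report.
errors = np.linalg.norm(detected - reprojected, axis=1)
mean_error = errors.mean()
print(round(mean_error, 3))
```

Values well under one pixel, as in both calibrators here, indicate a good fit.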

Fig.2 3D visualization of the extrinsics obtained from JEDEYE calibration

Results and Evaluation

Rectified images are used to evaluate the calibration results in addition to the reprojection error. To see the undistortion effect, another scene is created in the light box and images are taken with JEDEYE under the same lighting conditions. The original and rectified images are shown side by side in Fig. 3. Comparing the two images shows that the undistortion works very well. The distortion is most severe along the line where the light box back panel meets the bottom panel; in the rectified image this distortion is removed.

Fig.3 Comparison of the original (left) and rectified (right) image.


The biggest feature of a stereo camera is that it records depth information. A red-cyan anaglyph (Fig. 4) is generated by rectifying and superimposing the stereo image pair. When viewed with color-coded anaglyph glasses, this image gives the perception of a 3D scene.
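The anaglyph construction amounts to a channel swap on the rectified pair: the red channel comes from the left image, green and blue from the right. A toy sketch with constant arrays standing in for real images:

```python
import numpy as np

# Toy 2x2 RGB "rectified" images standing in for the stereo pair.
left  = np.full((2, 2, 3), 200, dtype=np.uint8)
right = np.full((2, 2, 3), 50,  dtype=np.uint8)

# Red channel from the left image, green and blue from the right:
anaglyph = np.dstack([left[:, :, 0], right[:, :, 1], right[:, :, 2]])
print(anaglyph[0, 0])  # -> [200  50  50]
```

The red filter of the glasses passes only the left image and the cyan filter only the right, so each eye sees its own view.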

Fig.4 Red-cyan anaglyph of the stereo images

The depth information is encoded in the disparity between corresponding objects in the left and right images of a stereo pair. Closer objects have larger disparity, while objects farther away yield smaller disparity. This can be verified in Fig. 5, where objects are labeled with the pixel difference between the left and right eye images.
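For a rectified pair the relation is Z = f·B/d, where f is the focal length in pixels, B the baseline, and d the disparity, so larger disparity means smaller depth. A quick numeric check with assumed values (not JEDEYE's calibrated parameters):

```python
# Depth from disparity in a rectified stereo pair: Z = f * B / d.
# f_px and baseline are assumed values for illustration.
f_px = 4444.0      # focal length in pixel units
baseline = 50.0    # camera separation in mm
depths = {d: f_px * baseline / d for d in (100.0, 50.0)}
print(depths)      # the larger disparity maps to the smaller depth
```

Halving the disparity doubles the estimated depth, which is why distant objects are hard to rank by depth: their disparities differ by fractions of a pixel.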

Fig.5 Anaglyph with disparity labeled on each object

A coarse disparity map (Fig. 6) is generated from this stereo image pair. The disparity map agrees with reality: the mini cat statue, being closest to the camera, has the highest disparity, and the apple has the lowest. It is also observed that there are many holes and defective spots in the disparity map. This is because the surface textures of the objects are generally too smooth, and therefore lack the uniqueness the computer needs to match them between the two images. The simplicity of the disparity algorithm puts another limit on the result. A better disparity algorithm could be investigated and optimized, but that is beyond the scope of this project. The disparity map simply shows that it is very promising to use the calibrated stereo images to reconstruct 3D scenes.
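The holes can be understood from how a simple block-matching disparity algorithm works: it slides a patch from the left image along the corresponding right scanline and keeps the shift with the lowest matching cost; on a textureless surface many shifts cost the same and the match is unreliable. A 1-D sketch with toy scanlines (not the algorithm actually used here):

```python
import numpy as np

# Toy scanlines: a distinctive 3-pixel feature appears shifted between views.
left  = np.array([0, 0, 9, 7, 8, 0, 0, 0, 0, 0], dtype=float)
right = np.array([0, 0, 0, 0, 0, 9, 7, 8, 0, 0], dtype=float)

patch = left[2:5]                       # feature starting at column 2
# Sum-of-absolute-differences cost of matching the patch at each shift:
costs = [np.abs(right[s:s + 3] - patch).sum() for s in range(len(right) - 2)]
disparity = int(np.argmin(costs)) - 2   # best-match column minus original column
print(disparity)                        # -> 3
```

On a uniform patch every entry of `costs` would be near-identical, so `argmin` picks an arbitrary shift; this is exactly the failure mode behind the holes in Fig. 6.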

Fig.6 Disparity map generated from the stereo images

Discussion

In this part we have described the process and results of calibrating the JEDEYE stereo camera. The calibration results are evaluated using both reprojection errors and visual inspection. Reprojection errors of less than one pixel are generally considered acceptable [2]. During the process, it was observed that target quality is the deciding factor in calibration: a flat, high resolution, high contrast checkerboard pattern yields accurate camera parameters. When capturing images, varying the orientation of the camera and target is more useful than varying their distance. This makes sense because images taken at different angles project onto different planes, which helps in computing the homographies and hence the parameters.

Part II Simulation

Goal

1. Simulate stereo camera images in any given 3D scene.

2. Calculate the ground truth extrinsic and intrinsic parameters of a given stereo camera.

3. Compare the MATLAB generated extrinsic and intrinsic parameters and the ground truth.

4. Analyze and evaluate the calibration accuracy.


Methods

1. Simulation environment setup:

In our implementation, we used two toolboxes (pbrt and ISET) to simulate the stereo images. Pbrt (Physically Based Ray Tracer) is an open source renderer based on ray tracing, in which one can build a 3D scene and specify the lighting conditions, camera location, lens type, and film resolution. Three versions of pbrt are available; we used pbrt-v2 (version 2) [3], which lacks some of the more advanced features of pbrt-v3 but satisfies all our needs at a lower computational cost. To reduce rendering time, we created a very simple scene: a 50 cm x 50 cm checkerboard composed of 8x8 black and white squares, each 62.5 mm x 62.5 mm. Fig. 7 shows the scene rendered by a 128x128 pinhole camera at the origin. The checkerboard is located at (0, 1000, 0) (unit: mm) with respect to the default origin in pbrt. The resolution of the film is 256 x 256 and the number of rays per pixel is 128. With these settings, the average time to render a stereo image pair is around 1-2 minutes.

Fig.7 rendered checkerboard


ISET stands for Image Systems Engineering Toolbox. It is a toolbox for running an image processing pipeline and evaluating image quality under specific camera settings. In ISET, we can set sensor parameters such as field of view, sensor size, and exposure time. After setting the sensor parameters, we processed the raw images using bilinear demosaicing of the Bayer pattern.

To connect pbrt and ISET, we used and modified the Pbrt2ISET tool created in Prof. Brian Wandell's lab (courtesy of Prof. Wandell and Trisha Lian). Pbrt2ISET takes a pbrt scene as input, moves the camera around, and renders the scene from each location with a realistic camera model, which made it easy to generate many pairs of stereo images from various locations. After passing a pbrt file to Pbrt2ISET, an optical image is generated, and we finished the sensor setting and image processing within ISET. We used a double Gaussian lens with a focal length of 12.5 mm and an FOV of 22 deg, and set the film size to 0.72 mm x 0.72 mm. Our script outputs a stereo image pair in png format. The main script we created for our implementation is named “checker_board_stereo.m”, with supplementary functions “recipe2oi.m” and “recipe2oi.m”. Fig. 8 shows a stereo image pair generated by running “checker_board_stereo.m”.

Fig.8 stereo image pair of the rendered checkerboard


2. Calculation of ground truth extrinsic and intrinsic parameters

Since we specify the stereo camera locations and the lens type in our simulation, we know the geometry of the scene completely. Hence, we can calculate the ground truth values of the extrinsic and intrinsic parameters. The following elaborates on how these parameters are calculated.

Fig.9 Illustration of extrinsic and intrinsic parameters

Extrinsic parameter calculation:

To recall, the extrinsic parameters describe the mapping from the 3D world coordinates to the 3D camera coordinates. They are composed of the rotation matrix and the translation vector. In our setting, we know the locations of the object and the camera in world coordinates, and the orientation of the camera in terms of the ‘up’ vector. Using these, the extrinsic matrix is computed with a standard look-at construction:

  • Let C be the camera center, O the object location, and u the camera ‘up’ vector.
  • Compute the camera axes: z = (O − C)/‖O − C‖, x = (u × z)/‖u × z‖, y = z × x.
  • The rotation matrix is R = [x; y; z], whose rows are the camera axes.
  • The translation vector is obtained as t = −RC.
  • The extrinsic matrix is the block matrix [R | t].
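The steps above can be sketched as follows (in NumPy, though the report's scripts are in MATLAB; the axis conventions here are one common look-at choice and may differ in sign or ordering from the conventions MATLAB's calibrator uses):

```python
import numpy as np

def lookat_extrinsics(C, O, up):
    """Rotation R and translation t mapping world points into the camera
    frame, built from camera center C, target O, and 'up' vector up."""
    C, O, up = (np.asarray(v, dtype=float) for v in (C, O, up))
    z = O - C
    z = z / np.linalg.norm(z)        # viewing direction
    x = np.cross(up, z)
    x = x / np.linalg.norm(x)        # camera right axis
    y = np.cross(z, x)               # completes the right-handed frame
    R = np.stack([x, y, z])          # rows are the camera axes
    t = -R @ C                       # so that X_cam = R @ X_world + t
    return R, t

# Camera at (0, -650, 0) looking at the checkerboard at (0, 1000, 0):
R, t = lookat_extrinsics(C=[0, -650, 0], O=[0, 1000, 0], up=[0, 0, 1])
print(R)
print(t)
```

A quick consistency check: mapping the target O into camera coordinates should place it on the viewing axis at distance ‖O − C‖ = 1650 mm.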

Intrinsic parameter calculation: As mentioned earlier, the intrinsic matrix maps the 3D camera coordinates to the 2D homogeneous image coordinates. To compute it, we need the focal length of the lens, the physical size of the sensor, and the resolution. It also includes the skew parameter s, which we can take to be zero in our case since the sensor is assumed perfectly parallel to the lens. In MATLAB's (transposed) convention, the general representation is:

$$ K = \begin{bmatrix} f_x & 0 & 0 \\ s & f_y & 0 \\ c_x & c_y & 1 \end{bmatrix} $$

  • Here, f_x and f_y are the focal lengths in pixel units along the x and y directions, i.e., f_x = F_x · W / w and f_y = F_y · H / h, where F_x and F_y are the respective focal lengths in physical units, w and h are the width and height of the sensor, and W, H indicate the resolution.
  • (c_x, c_y) is the principal point offset, which indicates the coordinates of the center of the lens with respect to the sensor plane when the origin is taken to be at the bottom left.
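Plugging in the simulation's settings (12.5 mm lens, 0.72 mm film, 256-pixel resolution) reproduces the ground-truth intrinsics reported in the Results:

```python
# Focal length in pixel units from the simulation settings:
# f_px = f_physical * resolution / sensor_size.
f_mm = 12.5    # double Gaussian lens focal length (mm)
w_mm = 0.72    # film width (mm)
W_px = 256     # horizontal resolution (pixels)

fx = f_mm * W_px / w_mm   # ~4.44e+03 pixels
cx = W_px / 2             # principal point at the sensor center: 128
print(round(fx), cx)
```

Because the film and resolution are square, f_y comes out identical to f_x.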


By moving the stereo camera around the scene, we generated several pairs of images. For each setting, we calculated the extrinsic parameters and the intrinsic parameters (which are invariant to changes in position). We then used the MATLAB stereo camera calibration toolbox to estimate these parameters from the generated image pairs. The app accepted 8 pairs of images and calibrated the camera with a small average reprojection error of 0.64 pixels. We compared the estimated parameters with the ground truth and analyzed the differences. The following figures show the reprojection errors and the reconstructed view of the scene.

Fig.10 Reprojection errors
Fig.11 Camera centric view of the scene

Results

A sample pair of images after calibration using MATLAB is shown below.

Fig.12 Left and right images of checkerboard with detected corners and reprojected corners

Following are the extrinsic parameters and the corresponding ground truth for a particular camera setting, namely when the camera center is located at (0, -650, 0).

From MATLAB:

$$ R = \begin{bmatrix} -0.0007 & 0.9998 & 0.0223 \\ -0.9984 & 0.0006 & -0.0572 \\ -0.0572 & -0.0223 & -0.9981 \end{bmatrix} \qquad t = \begin{bmatrix} 98.0535 \\ -84.2576 \\ 3.3\times 10^{3} \end{bmatrix} $$

Ground truth:

$$ R = \begin{bmatrix} 0.9988 & 0.0499 & 0 \\ 0 & 0 & 1 \\ 0.0499 & -0.9988 & 0 \end{bmatrix} \qquad t = \begin{bmatrix} 0 \\ 0 \\ -650.812 \end{bmatrix} $$

As we can see, the rotation matrices do not match. The columns indicate the orientation of the camera with respect to the world coordinate axes. When the order of the columns of the estimated R matrix is changed, it becomes similar to the ground truth. This indicates that MATLAB assumes a different set of axes, which is possible because the checkerboard target is invariant to such rotations. However, the translation vector obtained from calibration also does not match the ground truth; this could likewise be accounted for by the different choice of axes.

Similarly, the intrinsic parameters from calibration and the ground truth are as follows:

From MATLAB:

$$ K = \begin{bmatrix} 3.68\times 10^{3} & 0 & 0 \\ 0 & 3.69\times 10^{3} & 0 \\ 125.0785 & 139.2502 & 1 \end{bmatrix} $$

Ground truth:

$$ K = \begin{bmatrix} 4.44\times 10^{3} & 0 & 0 \\ 0 & 4.44\times 10^{3} & 0 \\ 128 & 128 & 1 \end{bmatrix} $$

The intrinsic matrices are similar. The principal point offset obtained from MATLAB is (125.0785, 139.2502), which is very close to the actual value of (128, 128). The obtained focal length, when converted to physical units, is 10.5 mm as opposed to the actual 12.5 mm.
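That unit conversion is the reverse of the intrinsic calculation above; a quick sanity check with the rounded f_x (the small difference from the quoted 10.5 mm comes from rounding f_x to three digits):

```python
# Physical focal length back from pixel units: f = f_px * w / W.
fx_px = 3.68e3   # calibrated focal length in pixels (rounded)
w_mm = 0.72      # film width (mm)
W_px = 256       # horizontal resolution (pixels)

f_mm = fx_px * w_mm / W_px
print(round(f_mm, 2))   # roughly 10.4 mm, well short of the actual 12.5 mm
```
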

Conclusions

We successfully created an environment to simulate stereo camera images. We used this environment to generate checkerboard images which can be used for geometric calibration.

On top of this environment, we created two scripts to calculate the ground truth extrinsic and intrinsic parameters from a given geometry and camera lens profile.

We compared the ground truth parameters with the MATLAB-generated calibration parameters. We found that, for our particular scene, the error in the intrinsic parameters is relatively small while the error in the extrinsic parameters is high. We identified several likely reasons.

Our scene is too simple. The background is just uniform white, and the checkerboard is square, which makes the camera's views along the x axis and z axis exactly the same. This symmetry causes the calibration to fail to recover the exact rotation matrix. To solve this problem, we need to create asymmetrical scenes; for example, backgrounds with a different color or texture in each direction would help.

As we found, the translation vector, which specifies the location of the camera in world coordinates, is not accurate either. We believe this is because our checkerboard is too large (50 cm x 50 cm) with too few cells. To get a full view of the checkerboard, we had to move the camera very far along the y axis (78 cm), while the distance between the left and right cameras is fixed at 5 cm, making the disparity between the two cameras too small. We rendered 14 pairs of stereo images, but MATLAB accepted only 8 of them because many were too similar to each other. To improve this, we suggest using a much smaller checkerboard (20 cm x 20 cm) with more checks (10x10) and rendering more images (30 pairs).

Another potential reason is that the lens we chose is a double Gaussian lens, which has very little distortion. Since distortion provides information about the scene-camera geometry, we had only a very weak signal here.

Appendix I

The MATLAB script for JEDEYE calibration and rectification: https://drive.google.com/open?id=1T0dQ4HWxzLDTTUTDbHyhRNvG8_MAtTFF (The uploaded file is not complete, and updating it with new versions has failed.)

Our code is available on GitHub (https://github.com/VitaminAJ/Simulation-of-stereo-camera-calibration-parameters.git). To run the code, you need to install Docker, ISET (and add it to the MATLAB path), and pbrt-v2.

Appendix II

Work breakdown

The JEDEYE calibration was led by Bryce; Anqi and Varsha also contributed to the initial calibration steps and image capture. For the simulation of stereo camera parameters, Anqi created the simulation environment, Varsha created the scripts for calculating the intrinsic and extrinsic parameters, and Anqi and Varsha tuned the settings and calibrated the images together.