Spectral and Geometric Calibration of a Dual Camera Mobile Phone
Rose Rustowicz


Introduction

Fig. 1. The mathematical transformation of world coordinates to image coordinates using intrinsic and extrinsic camera parameters [1]
Fig. 2. A pictorial description of the transformation of world coordinates to image coordinates using intrinsic and extrinsic camera parameters [2]
Fig. 3. Spectral sensitivity curves vary from one manufacturer to the next and from one system to the next. Calibration of spectral sensitivity curves gives an indication of how the system will interpret and reconstruct color in a scene [9]
Fig. 4. Setup for measuring the spectral sensitivity curves. The monochromator (green) emits a narrow wavelength bandwidth onto the diffuse scattering plate (blue), while the spectrophotometer (red) and phone sensor (orange) simultaneously measure the signal.
Fig. 5. The spectral sensitivity measurement process - For each wavelength measurement (we measure 390 nm to 730 nm in steps of 10 nm), the monochromator emits the narrow wavelength bandwidth onto the diffuse scattering plate, which is simultaneously measured by the spectrophotometer and the mobile phone's camera sensor.
Fig. 6. An outline of processing steps for disparity estimation in stereo-images
Fig. 7. A visualization of disparity calculation and its relation to depth, Z. Disparity is defined as the distance between corresponding points in two images, and is shown to be inversely proportional to depth.
Fig. 8. The spectral sensitivity curves for each color channel in the Huawei Mate 9 Pro cameras. The red, green, and blue lines in the plot correspond to the red, green, and blue channel spectral sensitivities, respectively.
Fig. 9. A qualitative comparison of measured versus predicted RGB values for a MacBeth Color Checker chart. Measured values were taken from a raw color camera image as the mean of the R, G, and B values over each patch. Predicted RGB values were calculated by a matrix multiplication of the spectrophotometer measurements of each color patch times the transpose of the spectral sensitivity curves for the R, G, and B color channels.
Fig. 10. A plot of measured versus predicted normalized RGB values over the patches in the MacBeth Color Checker chart. The plot shows the correspondence between the measured and predicted color values from the measurement of the spectral sensitivity curves.
Fig. 11. A plot of the extrinsic positions of the checkerboard with respect to the camera sensors, used in the geometric calibration procedure. As seen in the plot, a variety of positions and orientations of the checkerboard were used throughout the field of view of the sensors.
Fig. 12. Mean re-projection error for each of the checkerboard images used during geometric calibration
Fig. 13. Disparity map results from the geometric calibration and disparity estimation procedure

The human visual system uses information from each eye to reveal depth and color information in its surroundings. Imaging technology can recreate this ability, most often with dual camera systems. Stereo-imaging uses data taken from slightly different perspectives to extract depth information. The addition of a second sensor also enables capabilities such as image fusion and noise reduction for improved image quality. Dual camera systems are of interest in consumer electronics, surveillance and security, computer vision, robotics and autonomous driving, and in military and medical applications. Dual cameras have also gained recent popularity in mobile devices, as manufacturers such as Huawei (Mate 9 Pro) and Apple (iPhone X) have brought the technology to the consumer market.

In order to take advantage of the added perspective in dual cameras to extract information about scene depth, the system must be calibrated. Calibration is important in many regards. For example, calibration is crucial for accurate modeling and simulation: given an accurate camera model in ISET, a user can test any number of modeling scenarios, and the accuracy of the simulation depends both on the simulation itself and on the accuracy of the camera model used. Geometric calibration specifically is useful for correcting image imperfections such as lens distortion. In this case, a camera's intrinsic parameters are found to correct for image warping caused by lens abnormalities (or in some cases by lens design, as in a fisheye lens). Geometric calibration for dual cameras provides added information about the scene, such as depth and the extraction of real-world coordinates. Camera calibration is also important for those who wish to process a camera's raw image data. Known calibration parameters can be useful in manipulating raw images to obtain a higher quality output. For example, if the spectral sensitivities and intrinsic parameters are known, the raw image can be undistorted and sampled in a way that produces an accurate geometric and color representation of a scene.

In this report, we explain and demonstrate geometric calibration and spectral sensitivity measurement for a Huawei Mate 9 Pro Dual Camera mobile phone. Geometric calibration yields information on the intrinsic and extrinsic parameters of the system, while spectral sensitivity measurement yields the relative response of each color channel in the system over a range of wavelengths.

Background

Geometric Calibration - Finding the Intrinsic and Extrinsic Camera Parameters

The intrinsic and extrinsic parameters of a system are used to transform the world coordinates of a scene into image coordinates on a sensor. The three-dimensional world coordinates, $X_w$, are related to the three-dimensional camera coordinates, $X_c$, through a rigid transformation with the extrinsic parameter matrix. The extrinsic parameter matrix is defined as $[R \mid t]$, where $R$ is a 3x3 rotation matrix and $t$ is a 3x1 translation vector. Thus, the extrinsic matrix is a 3x4 matrix of rotation and translation coefficients that relates three-dimensional world coordinates to three-dimensional camera coordinates. These parameters describe the location and orientation of a camera with respect to objects in the world. When two cameras are jointly calibrated in stereo, the extrinsic parameters also provide the translation and rotation of the two cameras relative to one another, as well as their positions in a world coordinate system. The three-dimensional camera coordinates, $X_c$, are related to the two-dimensional image coordinates through a projective mapping defined by the intrinsic parameters of the system. The intrinsic parameter matrix, $K$, is made up of the focal length in x and y, the principal point in x and y, and the skew.

This math is summarized in Fig. 1, where in the first line $x$ are the image coordinates, $P$ is the camera parameter matrix, and $X_w$ are the world coordinates. The next equality shows that the camera parameter matrix is equivalent to $P = K\,[R \mid t]$, where $K$ is the intrinsic parameter matrix and $[R \mid t]$ is the extrinsic parameter matrix, defined by a rotation matrix $R$ and a translation vector $t$. In the line below, $f_x$ and $f_y$ represent the x and y components of the focal length, $c_x$ and $c_y$ represent the x and y components of the principal point, and $s$ is the skew, which represents the offset from parallel between the sensor and the lens.
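For reference, the standard pinhole projection written out in homogeneous coordinates (using the symbols defined above; $\lambda$ is an arbitrary projective scale factor and $(u, v)$ are the components of the image coordinates $x$) is:

\[
\lambda
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\underbrace{\begin{bmatrix}
f_x & s   & c_x \\
0   & f_y & c_y \\
0   & 0   & 1
\end{bmatrix}}_{K}
\;
\underbrace{\begin{bmatrix}
r_{11} & r_{12} & r_{13} & t_x \\
r_{21} & r_{22} & r_{23} & t_y \\
r_{31} & r_{32} & r_{33} & t_z
\end{bmatrix}}_{[R \mid t]}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
\]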

Fig. 2 shows a pictorial description of the transformation of world coordinates into image coordinates using the intrinsic and extrinsic camera parameters.

Depth Estimation

The first instrumental stereoscopic imaging device, the stereo-comparator, was introduced in a 1901 lecture by Dr. Carl Pulfrich, as described by [4]. The device captured a scene on two plates of film for measuring coordinates between the two images. These ideas were extended throughout the 20th century with analytic approaches that use parallax to measure image capture orientation and to estimate the depth or topography of a scene. Scharstein and Szeliski [12] provide a summary and evaluation of stereo algorithms published between 1974 and 2002. Depth estimation with dual sensors often relies on stereo-photogrammetry principles, in which the geometry of the two sensors and parallax in the scene are used to find common points and estimate depth. More recent approaches have adopted machine learning techniques to estimate depth [8, 11, 13, 15], although we do not explore these approaches here.

Spectral Sensitivity

Spectral sensitivity curves give a relative measure of the response of each of the system's color channels over the range of wavelengths to which they are sensitive. The spectral sensitivity of a color channel impacts the way the system interprets and reconstructs color in a scene. As shown in Fig. 3, from Professor Wandell's lecture notes [14], manufacturers have different preferences in how they engineer the spectral sensitivities of their sensors.

Instrumentation

Huawei Mate 9 Pro Mobile Phone

The Huawei Mate 9 Pro mobile phone was used for all experiments. The phone has two cameras. The first uses an RGB Bayer array with a 'BGGR' pattern and a spatial resolution of 3968 x 2976 pixels. The second is a monochromatic (panchromatic) sensor that measures intensity over a wider bandwidth of the visible spectrum, with a higher spatial resolution of 5120 x 3840 pixels. The sensors have the same aspect ratio and image roughly the same scene. Huawei provided tools to read the raw sensor data from each camera, which allowed us to perform the calibration procedures.

Spectrophotometer

A SpectraScan PR-715 spectrophotometer was used throughout the spectral experiments. During the initial calibration, it measured the spectral power distributions (SPDs) of the monochromatic signals from the monochromator, which were used to calculate the spectral sensitivity curves. The spectrophotometer was also used during spectral validation to measure the SPDs of each of the MacBeth Color Checker patches under constant illumination.

Monochromator

A Cornerstone 130 monochromator from Oriel Instruments was used during the spectral measurements to emit narrow wavelength bands onto a diffuse scattering plate, which were then measured by the spectrophotometer and mobile phone sensors.

Light Booth

An X-Rite SpectraLight III light booth was used during spectral validation to take an image under constant light. A T10 illuminant was used for this measurement.

Methods

Spectral Sensitivity Measurement

Spectral sensitivity gives a measure of how sensitive each color channel is across a range of wavelengths. We describe the method used to calculate the spectral sensitivity curves, and show validation of the result.

Measurement Setup and Calculation

We directly measure the spectral sensitivities of the mobile device using a monochromator and a spectrophotometer. The monochromator emits a light beam of narrow wavelength bandwidth onto a diffuse scattering plate, while the spectrophotometer and mobile phone simultaneously measure this monochromatic signal. We measure this signal from 390 nm to 730 nm in 10 nm intervals to obtain a matrix $S$ of spectral power distributions from the spectrophotometer. $S$ is an $N_\lambda \times M$ matrix, where $M$ is the number of individual measurements from 390 nm to 730 nm and $N_\lambda$ is the number of wavelength bins in each spectrophotometer measurement. We also obtain $C$, a $3 \times M$ matrix of RGB intensities measured with the phone's Bayer array sensor. In $S$, each column corresponds to one spectrophotometer measurement at a monochromator setting, defined over the wavelength measurement range of the spectrophotometer. Each column of $C$ corresponds to the camera's response at a monochromator setting, with rows given by the measurements in the R, G, and B channels. To calculate the spectral sensitivity curves, we first find a weight matrix, $A = (S^\top S)^{-1} C^\top$. The spectral sensitivity curves, $Q$, are then defined as $Q = S A$, so that the camera responses satisfy $C \approx Q^\top S$. The setup for the measurement procedure is shown in Fig. 4, where the spectrophotometer is circled in red, the monochromator in green, the phone in orange, and the diffuse scattering plate in blue. A visual diagram of the measurement process is shown in Fig. 5.
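A minimal Matlab sketch of this calculation is shown below. It assumes the spectrophotometer SPDs have already been collected into a matrix S (wavelength bins by measurements), the corresponding mean raw camera responses into C (3 by measurements), and the wavelength samples into a vector wave; the variable names are illustrative, not taken from the project code.

  % S:    N_lambda x M matrix of monochromator SPDs (spectrophotometer)
  % C:    3 x M matrix of mean R, G, B responses from the raw Bayer images
  % wave: N_lambda x 1 vector of wavelength samples (nm)
  A = (S' * S) \ C';          % M x 3 weight matrix (least-squares fit)
  Q = S * A;                  % N_lambda x 3 spectral sensitivity curves
  Q = Q ./ max(Q(:));         % normalize to a relative sensitivity
  plot(wave, Q(:,1), 'r', wave, Q(:,2), 'g', wave, Q(:,3), 'b');
  xlabel('Wavelength (nm)'); ylabel('Relative sensitivity');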

Geometric Calibration and Disparity Estimation

A system outline for disparity estimation is shown in Fig. 6. After capturing stereo-image pairs, we perform geometric calibration to obtain the intrinsic and extrinsic parameters of the system, described in more detail above. These parameters allow us to undistort and rectify any stereo-pair taken with the camera settings used during calibration (e.g. the focal length must remain constant). From the rectified, co-planar images, we then estimate disparity.

Geometric Camera Calibration

Through geometric calibration, we find the intrinsic and extrinsic parameters of a stereo-pair of cameras. Intrinsic parameters describe internal properties of the camera, such as the focal length, principal point, and lens distortion coefficients. Extrinsic parameters describe the camera's position and rotation with respect to the checkerboard captured in the calibration images, in terms of a real-world coordinate system. When two cameras are jointly calibrated, the extrinsic information from each camera is used to relate the translation and rotation of the cameras to one another.

We utilize the techniques of [16] and [9], which are implemented in Bouguet's camera calibration toolbox [3] and in the Matlab Computer Vision System Toolbox. [9] extend the pinhole camera calibration model to account for tangential and radial lens distortion coefficients. This non-linear camera model is optimized using the Levenberg-Marquardt iterative algorithm. [16] further develop a radial lens distortion model that can be solved with only a few images of a planar checkerboard placed throughout a scene.
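A rough sketch of how this calibration can be run with the Matlab Computer Vision System Toolbox is shown below. The file lists, the checkerboard square size, and the use of estimateCameraParameters here are illustrative assumptions, not a transcript of the project code.

  % leftFiles / rightFiles: cell arrays of corresponding checkerboard image files
  [imagePoints, boardSize] = detectCheckerboardPoints(leftFiles, rightFiles);
  squareSize = 25;                                   % square size in millimeters (assumed)
  worldPoints = generateCheckerboardPoints(boardSize, squareSize);
  stereoParams = estimateCameraParameters(imagePoints, worldPoints);
  showExtrinsics(stereoParams);                      % checkerboard poses, as in Fig. 11
  showReprojectionErrors(stereoParams);              % per-image errors, as in Fig. 12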

Undistort and Rectify Images

Using the intrinsic parameters from camera calibration, lens distortion is corrected in each of the individual images. From extrinsic calibration, the procedure yields both the fundamental and essential matrices, which define the mapping from points in one image to epipolar lines in the other. These matrices define the transformation to epipolar-constrained geometry for the two images, which allows a faster search for point correspondences. Using the Matlab Computer Vision Toolbox, undistortion can be applied to each image individually with the undistortImage() function, while rectifyStereoImages() uses both the intrinsic and extrinsic parameters to undistort and rectify the stereo-pair so that the images are co-planar.
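A sketch of these steps using the toolbox functions named above; the file names are placeholders, and the mono frame is resampled to the RGB camera's resolution as discussed in the Results section.

  I1 = rgb2gray(imread('rgb_cam.png'));              % color camera frame (illustrative name)
  I2 = imresize(imread('mono_cam.png'), size(I1));   % mono frame resampled to match
  % Correct lens distortion in a single image using its intrinsics:
  J1 = undistortImage(I1, stereoParams.CameraParameters1);
  % Undistort and rectify the pair so that corresponding rows align:
  [R1, R2] = rectifyStereoImages(I1, I2, stereoParams);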

Calculate Disparity

Disparity is defined as $d = x - x' = \frac{Bf}{Z}$, where $x$ and $x'$ are corresponding points in a stereo-pair, $B$ is the baseline distance between the two camera centers, $f$ is the focal length, and $Z$ is the depth of the corresponding point. From this equation, we see that disparity is the distance between corresponding points in the two images and is inversely proportional to depth. In order to compute the disparity map, we must locate these correspondences.
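As a rough sanity check, using the approximately 11 mm baseline and the camera 1 focal length of roughly 3130 pixels reported in the Results, the largest disparity in our search range corresponds to a depth of about half a meter:

\[
Z = \frac{Bf}{d} \approx \frac{0.011\,\text{m} \times 3130\,\text{px}}{64\,\text{px}} \approx 0.54\,\text{m},
\qquad
d = 1\,\text{px} \;\Rightarrow\; Z \approx 34\,\text{m}.
\]

In other words, with this baseline and a disparity range of [0, 64], the system covers depths from roughly half a meter out to tens of meters.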

In the Matlab toolbox, we use the disparity() function, which takes in two rectified images and produces a disparity map. By default, the Matlab toolbox utilizes the Semi-Global Block Matching Method, outlined in [10], which performs pixel-wise matching based on Mutual Information and adds a constraint on global smoothness. Disparity is computed by comparing the sum of absolute differences (SAD) of each block of pixels in the image. The Semi-Global method incorporates the additional constraint on global smoothness, which forces disparity to be similar in neighboring blocks. This constraint results in more complete, coherent disparity maps. A comparison of stereo-matching methods to the Semi-Global Matching method with the smoothness constraint can be seen in [10].
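With the rectified pair from the sketch above, a minimal disparity computation in the toolbox version used here might look like the following; the parameter values match those reported in the Results, and the figure call is only for inspection.

  disparityMap = disparity(R1, R2, ...
      'Method', 'SemiGlobal', 'BlockSize', 15, 'DisparityRange', [0 64]);
  figure; imshow(disparityMap, [0 64]); colormap jet; colorbar;   % view as in Fig. 13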

The disparity function documentation also outlines how depth, $Z$, can be calculated from disparity values. This is visualized in Fig. 7, a diagram adapted from an OpenCV tutorial on depth estimation from stereo images. In Matlab, triangulation can also be performed with the reconstructScene() function, which uses an input disparity map and the parameters of the stereo pair to compute real-world coordinates of each point in the corresponding images.
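A short continuation of the sketch above, following the reconstructScene() interface described in the text and assuming the checkerboard square size was given in millimeters so that the reconstructed coordinates come back in the same units:

  xyzPoints = reconstructScene(disparityMap, stereoParams);   % 3-D point for every pixel
  Z = xyzPoints(:, :, 3);                                     % depth map (same units as squareSize)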

Results

We show results for the measured spectral sensitivity curves as well as for geometric calibration and estimation of disparity.

Spectral Sensitivity Results and Validation

The resulting spectral sensitivity curves are shown in Fig. 8, where the red, green, and blue lines in the plot correspond to the red, green, and blue channel spectral sensitivities, respectively. To validate the color channel RGB curves, we follow [7] and show a comparison between measured and predicted color values under a constant illumination source. We capture an image of a MacBeth Color Checker in a light booth with the RGB camera, and also measure each color checker patch with a spectrophotometer. The measured RGB values are taken directly from the Bayer array image, computed as the mean R, G, and B values of each color patch in the raw camera image. The predicted RGB values are calculated by multiplying the transpose of the spectral sensitivity curves, $Q^\top$, by the $N_\lambda \times 24$ matrix $S_{\text{patch}}$ of spectral power distributions of the color patches from the spectrophotometer, where $N_\lambda$ is the number of wavelength bins measured by the spectrophotometer: $C_{\text{pred}} = Q^\top S_{\text{patch}}$. Fig. 9 shows the measured RGB values versus the predicted RGB values for each color patch in the MacBeth Color Checker. In Fig. 10, we normalize and plot the measured versus predicted RGB values. Figures 9 and 10 show the correspondence between the measured values and the values predicted with the spectral sensitivity curves.
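A small Matlab sketch of this validation step, reusing the Q computed earlier; Spatch and measuredRGB are illustrative variable names for the patch SPDs and the mean raw patch values.

  % Spatch:      N_lambda x 24 SPDs of the color checker patches (spectrophotometer)
  % measuredRGB: 24 x 3 mean R, G, B values of each patch from the raw camera image
  predictedRGB = (Q' * Spatch)';                         % 24 x 3 predicted responses
  predictedRGB = predictedRGB ./ max(predictedRGB(:));   % normalize both for comparison
  measuredRGB  = measuredRGB  ./ max(measuredRGB(:));
  scatter(measuredRGB(:), predictedRGB(:));              % compare, as in Fig. 10
  xlabel('Measured (normalized)'); ylabel('Predicted (normalized)');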

By observation, the spectral curves seem a bit noisy. In future work, we will investigate the addition of a weight regularizer that enforces smoothness on the curves. The RGB values in Fig. 9 appear relatively similar, although predicted values seem to be more vibrant and saturated compared to the measured RGB values. Additionally, the axis for the predicted values in Fig. 10 starts around 0.2 rather than at zero. This discrepancy will also need to be investigated. Overall, the measurement process for the spectral sensitivities seems to have gone well. The measured vs. predicted values in all R, G, and B values are strongly correlated, as seen in Fig. 10.

Geometric Calibration and Disparity Estimation Results

Fig. 11 shows a plot of the extrinsic positions of each checkerboard image that was used in the calibration procedure. The checkerboard was held in many locations and orientations to cover a variety of depth and orientation combinations within the scene of both cameras. The mean re-projection errors of the checkerboard corner locations after calibration are shown in Fig. 12, with an overall mean error of 0.77 pixels. Disparity maps were computed for several different scenes, and qualitative results are shown in Fig. 13. For this implementation, we use the default block size of fifteen pixels and a disparity range of [0, 64]. Looking at the qualitative results in Fig. 13, the calibration appears to be successful, especially given the small baseline between the stereo-pair (~11 mm). There is noise in the results, most likely due to both calibration error and illumination and spectral factors in the scene. The central disparity map in Fig. 13 shows a case where the algorithm works especially well: areas of high texture. The image contains a highly textured surface, a cat scratching post, that is clearly visible in the disparity map.

Specific intrinsic and extrinsic parameters found from geometric calibration are reported here, in units of pixels. These quantities can be related to world coordinates given a physical measurement of the pixel size.

  • Camera 1 (RGB, Bayer array):
    • focal length, x: 3129.89
    • focal length, y: 3136.20
    • principal point, x: 1481.86
    • principal point, y: 1965.84
  • Camera 2 (MONO, panchromatic):
    • focal length, x: 3904.95
    • focal length, y: 3912.60
    • principal point, x: 1906.70
    • principal point, y: 2542.23
  • Essential Matrix:

 2.91652373639221e-10   -2.46483695433993e-08    0.00351574737902143
 1.94389597138534e-08   -1.56039333504368e-10   -2.30398345692344e-05
-0.00339197334512210     5.69222715312095e-05   -0.216011231276415

  • Translation of Camera 2:

-0.0166337253329474 10.4954769171060 0.184487679165534

  • Rotation of Camera 2:

 0.999970885314548       0.00762041883862111     0.000397164906899218
-0.00761846315496286     0.999959873973824      -0.00471269152518661
-0.000433061653528867    0.00470952853044557     0.999988816336775

Because the two sensors have different sizes, the images must be resampled in order to be rectified with one another onto the same plane. As a result, the focal length and principal point of camera 2 must be scaled along with the image size. A change from 5120 x 3840 to 3968 x 2976 scales the image by 0.775 in both the x and y dimensions, so the focal length and principal point must also be scaled by 0.775. The camera 2 values reported above are at the sensor's native resolution; after scaling by 0.775, the focal length and principal point used for rectification become the following (a short arithmetic check follows the list below):

  • focal length, x: 3026.34
  • focal length, y: 3032.27
  • principal point, x: 1477.70
  • principal point, y: 1970.23
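As a quick check of the numbers above, the scale factor and the rescaled values follow directly from the change in image size:

\[
\frac{3968}{5120} = \frac{2976}{3840} = 0.775,
\qquad 3904.95 \times 0.775 \approx 3026.3,
\qquad 1906.70 \times 0.775 \approx 1477.7 .
\]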

Conclusions

In this report, we explored depth estimation and spectral sensitivity measurement for a dual camera mobile phone. We calibrated the intrinsic and extrinsic parameters of a Huawei Mate 9 Pro and showed disparity estimation results. We also measured the spectral sensitivities of each of the sensor's color channels and showed a comparison of measured versus predicted RGB values. In the future, we aim to calibrate more aspects of the system, such as the spatially-varying point spread functions (PSFs) across each lens, and to further verify that the current calibration parameters were measured correctly. This will also include adding a regularization term to the weights for spectral smoothing and investigating smaller details such as the axis offset in Fig. 10.

Acknowledgements

I would like to thank Professor Gordon Wetzstein for his advisement throughout this project, Joyce Farrell and Zhenyi Liu for assistance with the spectral calibration setup and procedure, and Don Dansereau for guidance regarding stereo-calibration. Thank you to Professor Brian Wandell for a wonderful quarter of lectures in imaging! Finally, thank you to Huawei for funding this work and providing the mobile device for use in these experiments.

References

[1] https://www.slideshare.net/charmie11/camera-calibration-15702015.

[2] https://fradelg.gitbooks.io/real-time-3d-reconstruction-from-monocular-video/content/index.html.

[3] J.-Y. Bouguet. Camera Calibration Toolbox for Matlab, 2015.

[4] F. Doyle. The historical development of analytical photogrammetry. Photogrammetric Engineering, (2):259-265, 1964.

[5] J. Farrell, M. Okincha and M. Parmar, “Sensor calibration and simulation,” Proc. SPIE 6817, 68170R (2008).

[6] CMOS solid state SONY IMX123LQT-C image sensor datasheet.

[7] J. Farrell, P. B. Catrysse, and B. Wandell, “Digital camera simulation,” Applied Optics, Vol. 51, Issue 4, pp. A80-A90 (2012).

[8] P. Guerrero, H. Winnemoeller, W. Li, and N. J. Mitra. DepthCut: Improved depth edge estimation using multiple unreliable channels (2017).

[9] J. Heikkila and O. Silven, A four-step camera calibration procedure with implicit image correction, Computer Vision and Pattern Recognition (CVPR), (1997).

[10] H. Hirschmuller, Accurate and efficient stereo processing by semi-global matching and mutual information, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2005).

[11] R. Memisevic and C. Conrad, Stereopsis via deep learning, (2013).

[12] D. Scharstein and R. Szeliski, A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, International Journal of Computer Vision, 47 (1-3):7-42, 4 (2002).

[13] F. Sinz, J. Candela, G. Bakur, C. Rasmussen, and M. Franz, Learning depth from stereo, Joint Pattern Recognition Symposium, pages 245-252, (2004).

[14] B. Wandell, “Lecture #5: Image Processing”. PSYCH 221:Image Systems Engineering. Stanford University (2017).

[15] J. Zbontar and Y. LeCun, Computing the stereo matching cost with a convolutional neural network, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6 (2015).

[16] Z. Zhang, A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), (2000).

Appendix

Presentation slides can be found here: slides

Code for the project can be found here: File:Rustowicz fall2017 code.zip