Depth Mapping Algorithm Performance Analysis
Introduction
We implement several disparity estimation algorithms and compare their performance.
Background
Disparity and Depth
Depth information about a scene can be captured using a stereo camera (two cameras that are horizontally separated but vertically aligned). The stereo image pair taken by such a camera contains depth information in the horizontal differences between the two views: when comparing the pair, objects closer to the camera are more horizontally displaced. These differences (also called disparities) can be used to determine the relative distance of different objects in the scene from the camera. In Figure 1, these differences are visible on the left, where the red and blue do not line up.
Disparity and depth are related by the following equation, where x - x' is the disparity, z is the depth, f is the focal length, and B is the interocular (baseline) distance:

x - x' = fB / z

Equivalently, z = fB / (x - x'), so objects closer to the camera (smaller z) produce larger disparities.
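As a minimal illustration (not part of the original write-up), converting a disparity map to a depth map is a single element-wise operation, assuming the disparity and focal length are both expressed in pixels and the baseline in metres:

    import numpy as np

    def disparity_to_depth(disparity, focal_px, baseline_m):
        # disparity : 2-D array of disparities x - x' in pixels
        # focal_px  : focal length f in pixels
        # baseline_m: interocular distance B in metres
        # Pixels with zero or negative disparity are marked invalid (NaN).
        depth = np.full(disparity.shape, np.nan, dtype=np.float64)
        valid = disparity > 0
        depth[valid] = focal_px * baseline_m / disparity[valid]
        return depth

    # Example: a 120-pixel disparity with f = 1200 px and B = 0.1 m gives z = 1 m.
    print(disparity_to_depth(np.array([[120.0]]), 1200.0, 0.1))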
Image Rectification
In order to extract depth information, the stereo image pair must first be rectified (i.e. the images must be transformed in some way such that the only differences that remain are horizontal differences corresponding to the distance of the object from the camera). Rectification can be accomplished both with and without camera calibration. If the corresponding camera intrinsics and extrinsics are given for the stereo image pair, then calibration is not necessary. If they are not given, but photos of a checkerboard or some other calibration object are provided, then the calibration parameters can be calculated.
If not enough camera parameters are given and there are no checkerboard images to be used for calibration, then the following four steps can be used to rectify a given stereo image pair (a sketch of this pipeline follows the list).
- First, we detect SURF keypoints (a scale- and rotation-invariant feature detector) in each stereo image and extract a feature vector for each keypoint.
- We then find matching keypoints between the images using a similarity metric. MATLAB's uncalibrated rectification uses the sum of absolute differences metric.
- We then remove outliers (incorrect matches) using an epipolar constraint. Referring to Figure 3, this means that for a keypoint x in the left image, the matching keypoint in the right image must lie on the corresponding epipolar line, defined by the intersection of the epipolar plane with the image plane.
- Finally, using the remaining correct matches, we compute a 2D projective transformation to apply to one or both of the stereo images. After this transformation, the stereo images should no longer have any vertical displacement.
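Below is a minimal OpenCV sketch of this uncalibrated rectification pipeline. It is an illustration rather than the MATLAB implementation used in the project: it substitutes ORB features for SURF (SURF is only available in OpenCV's contrib module), and the filenames left.png and right.png are placeholders.

    import cv2
    import numpy as np

    # Placeholder filenames; replace with an actual stereo pair.
    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    # 1. Detect keypoints and extract descriptors (ORB here instead of SURF).
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(left, None)
    kp2, des2 = orb.detectAndCompute(right, None)

    # 2. Match descriptors between the two images.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # 3. Remove outliers with an epipolar (fundamental-matrix) constraint via RANSAC.
    F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
    pts1 = pts1[inlier_mask.ravel() == 1]
    pts2 = pts2[inlier_mask.ravel() == 1]

    # 4. Estimate projective transforms that remove vertical disparity, then warp.
    h, w = left.shape
    _, H1, H2 = cv2.stereoRectifyUncalibrated(pts1, pts2, F, (w, h))
    left_rect = cv2.warpPerspective(left, H1, (w, h))
    right_rect = cv2.warpPerspective(right, H2, (w, h))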
Datasets
SYNS Dataset
The Southampton-York Natural Scenes (SYNS) dataset contains image and 3D range data measured from different rural and urban locations. Each sample contains the following data:
- LiDAR depth information (360 x 135 degree field of view)
- Panoramic HDR image captured with a SpheroCam (Nikkor fish-eye lens) (360 x 180 degree image)
- Stereo image pairs captured with 2 Nikon DSLR cameras (each image pair was captured at a different rotation of the camera such that all of them covered a 360 degree view of the surroundings)
Since no intrinsics/extrinsics or calibration data are provided, we apply uncalibrated rectification to the stereo images and then compute the depth map. To compare the computed depth map against the LiDAR data (ground truth), we proceed as follows:
- Use the SYNS script to scale the panoramic LiDAR data to the panoramic HDR image.
- Use a least-mean-squares algorithm to map the rectified stereo images onto the panoramic HDR image and find the region of interest.
- Extract the corresponding LiDAR data and compare it to the computed depth map.
Middlebury Dataset
The Middlebury stereo datasets provide rectified stereo image pairs together with dense ground-truth disparity maps. Because the pairs are already rectified, no rectification step is needed, and computed disparity maps can be compared directly against the ground truth.
Methods: Similarity Metrics
Each metric below compares a window centered on a pixel in the left image against candidate windows in the right image; the disparity whose window scores best is selected.
SSD
The sum of squared differences metric sums the squared intensity differences between corresponding pixels in the two windows; the disparity with the lowest cost is chosen.
SAD
The sum of absolute differences metric sums the absolute intensity differences between corresponding pixels in the two windows; it is cheaper to compute and less sensitive to large individual differences than SSD.
NCC
Normalized cross-correlation subtracts each window's mean and divides by its standard deviation before correlating, making it robust to brightness and contrast differences between the two images; here the disparity with the highest correlation is chosen.
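As an illustration only (a minimal NumPy sketch, not the code used in the experiments, assuming wl and wr are same-sized grayscale windows from the left and right images), the three window costs can be written as:

    import numpy as np

    def ssd(wl, wr):
        # Sum of squared intensity differences (lower is better).
        return np.sum((wl.astype(float) - wr.astype(float)) ** 2)

    def sad(wl, wr):
        # Sum of absolute intensity differences (lower is better).
        return np.sum(np.abs(wl.astype(float) - wr.astype(float)))

    def ncc(wl, wr):
        # Normalized cross-correlation (higher is better); robust to
        # brightness and contrast differences between the two windows.
        a = wl.astype(float) - wl.mean()
        b = wr.astype(float) - wr.mean()
        denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
        return np.sum(a * b) / denom if denom > 0 else 0.0

    # Example: identical windows give SSD = SAD = 0 and NCC = 1.
    w = np.arange(9, dtype=float).reshape(3, 3)
    print(ssd(w, w), sad(w, w), ncc(w, w))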
CT
The census transform encodes the relative brightness of each pixel with respect to its neighbors and compares the resulting binary encodings between windows from the left and right images. The algorithm is as follows (a sketch is given after the list):
- Compute a bit-string for pixel p, based on the intensities of its neighboring pixels
- Compare the left-image bit-string with bit-strings from a range of candidate windows in the right image
- Choose the disparity d with the lowest Hamming distance
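The following is a minimal NumPy sketch of this procedure (an illustration under simplifying assumptions, not the evaluated implementation): it uses a 3x3 census window, wraps at image borders for simplicity, and encodes a neighbor as 1 when it is darker than the center pixel.

    import numpy as np

    def census_transform(img, w=3):
        # Encode each pixel as a bit-string that compares it with every other
        # pixel in a w x w window (borders wrap around, for simplicity).
        h, width = img.shape
        r = w // 2
        bits = np.zeros((h, width), dtype=np.int64)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                if dy == 0 and dx == 0:
                    continue
                neighbor = np.roll(np.roll(img, -dy, axis=0), -dx, axis=1)
                bits = (bits << 1) | (neighbor < img).astype(np.int64)
        return bits

    def hamming(a, b):
        # Per-pixel Hamming distance between two census bit-strings.
        x = np.bitwise_xor(a, b)
        count = np.zeros(x.shape, dtype=np.int64)
        while np.any(x):
            count += x & 1
            x >>= 1
        return count

    def census_disparity(left, right, max_disp=64):
        # For each pixel, choose the disparity d with the lowest Hamming distance.
        cl, cr = census_transform(left), census_transform(right)
        h, w = left.shape
        big = np.iinfo(np.int64).max
        best_cost = np.full((h, w), big, dtype=np.int64)
        disparity = np.zeros((h, w), dtype=np.int64)
        for d in range(max_disp + 1):
            # Shift the right census image by d so that column x in the left
            # image lines up with column x - d in the right image.
            cost = hamming(cl, np.roll(cr, d, axis=1))
            cost[:, :d] = big  # these columns have no valid match at disparity d
            better = cost < best_cost
            best_cost[better] = cost[better]
            disparity[better] = d
        return disparity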
Methods: Algorithm Evaluation
Talk about
- Tried these 3 algorithms, x pictures for each
- How error rate is calculated and/or averaged
- Other stuff
Results
Sum of Squared Differences
Sum of Absolute Differences
Performance with Default Parameters
Effect of Block Size and Smoothing
For block matching with and without the semi-global smoothing, we tested the effect of altering the block size of the Block Matching + SAD algorithm. The chart below shows the resulting average error rates over 15 images, along with the corresponding graph.
Overall, the semi-global smoothing had a significantly lower error rate than regular SAD. Furthermore, the block size affects the performance of each differently: semi-global SAD produces lower error rates with smaller block sizes, while regular SAD has a local optimum around a block size of 11.
Comparing the computational time of each algorithm over all block sizes tested, SAD with semi-global smoothing took an average of 0.9598 seconds per image, while SAD took an average of 0.3016 seconds. Thus, regular SAD is more than three times as fast.
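The write-up does not state the exact error definition; the sketch below assumes a common "bad-pixel" error rate (the fraction of valid ground-truth pixels whose estimated disparity differs from the ground truth by more than a fixed threshold), averaged over the test images:

    import numpy as np

    def bad_pixel_rate(d_est, d_gt, threshold=1.0):
        # Fraction of valid ground-truth pixels where |d_est - d_gt| > threshold.
        # Ground-truth pixels that are NaN or <= 0 are treated as invalid and
        # excluded (an assumption about how missing ground truth is encoded).
        valid = np.isfinite(d_gt) & (d_gt > 0)
        bad = np.abs(d_est - d_gt) > threshold
        return np.count_nonzero(bad & valid) / np.count_nonzero(valid)

    def average_error(pairs, threshold=1.0):
        # Average the bad-pixel rate over (estimated, ground-truth) map pairs.
        return float(np.mean([bad_pixel_rate(est, gt, threshold) for est, gt in pairs]))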
Census Transform
Conclusions
Available Datasets
Smoothing in Disparity Algorithms
Which Algorithm Performs 'Best'?
References
Appendix I
Appendix II
We divided the work as follows:
- Oscar Guerrero:
- Deepti Mahajan:
- Shalini Ranmuthu: