Stereo Depth Estimation Algorithms: Difference between revisions

Revision as of 03:39, 15 December 2017

Introduction

The brain is a fantastic supercomputer with unmatched 3D object recognition and depth estimation. The ocular system has over 120 million rods and 6 million cones which send signals that are funneled into of our visual cortex resulting in our capability to see. The massive amounts of data and processing power of our brain, make it a fascinating phenomenon. Stereo depth is a major part of this machinery and in this paper, we delve into the topic by investigating existing computer algorithms. Depth extraction from stereo estimation boils down to Triangulation. The paper seeks to explain three primary approaches and details surrounding stereo depth extraction.

Background

Computer vision has seen explosive growth during the turn of the century; stereo depth in particular has seen its fair share of research due to its application in panorama mosaic, improved face tracking, facial recognition, autonomous driving, high resolution geometric sampling and many others. The result of this high interest is a plethora of algorithms and papers describing various ways of estimating depth from two images.

A typical stereo algorithm consists of four steps: matching cost computation, cost aggregation, optimization and disparity refinement [1]. Stereo correspondence, the process of matching pixels between stereo images, is essentially an optimization problem. The local or global categories for stereo depth estimation are determined by the way an algorithm solves the stereo correspondence problem.

Methods

"The wide range of modern dense matching algorithms can be assigned to two big categories: local methods, which evaluate the correspondences one point at a time, not considering neighboring points/measures, and global methods where some constraint on the regularity of results are enforced in the estimation"[2]. However, after the advent of the MC-CNN paper[3] a whole host of learning network algorithms have dominated the KITTI data set. Below we will go over the types of depth estimation methods assuming a rectified image has been provided.

Local or Block Method

One Image is picked as a reference and then a block size is searched across the epipolar plane to identify a corresponding pixel on the paired image. Different cost functions can be used to determine the similarity between pixels. A winner takes all strategy is used for finding the location of a corresponding pixel in the search space.

Examples of costing functions

Sum of absolutes distance

Sum of squared distances

Maximize Cross-correlation

Normalized cross-correlation

This can be very efficient and accurate making it particularly suitable for real-time applications [2] Consider having a time comparison from the data sets These algorithms are susceptible to images with : • Sudden depth variations • Higher degree of occlusions • very noisy results in low contrasted textured regions. The local solutions also do not take advantage of smoothing functions as described in global optimization solutions. Most algorithm methods also use a "left right consistency" disparity refinement step to improve on the captured images. This step helps mitigate some of the occlusion issues.

Locally find a cost minima for a pixel across the scan line after selecting a template.

Global Method

Semi Global Matching (SGM)

Energy Minimization

Simulated Annealing

Markov Random Fields

Markov Random Fields have enable a statistical method of building out the disparity method

Belief Propagation

Graph cuts

Other Methods

Plane sweeping

CNN

MC_CNN is an algorithm that started the movement The latest algorithms have implemented various approaches to improving on the idea of a learned stereo matching. The cost optimization designs, whether global or local, seek to gain an answer of whether a reference pixel is similar or not to the pixel under comparison. The machine learning networks seek to improve on this comparison step by adding apriori knowledge.

Difficult geometry and occlusion of disparate images are considered costs that are to be minimized in previously mentioned implementations. The learned neural nets attempt to be trained with various geometric image snippets with both positive and negative matches to be able to answer the question of similarity with a priori knowledge.

Features are first extracted from input images and then these features are combined together to determine a similarity score

Dynamic Programming

Scan line based traversal with a 2D dp matrix of pixel and disparity

Plane Sweeping

Take an image and warp to different disparity arrangements until a plan match is found and then iterate and solve for remaining items.

Results

Local based Algorithms perform well in smooth images with low occlusion <insert mask disparity> <insert bust and light bulb disparity>

Poor performance with increased occlusion <pipes performace> <Jade disparity map>

The costing function makes small changes within the same image <side by side and difference image of cones >

Global optimization does a much better job of handling occlusion and depth discontinuity <Block SSD vs ICM vs Graph cut comparison image here>

The implementation function selected can make a big difference < iterated conditional modes (ICM) vs graph cuts >

However, the latest KITT benchmarks are dominated by CNNs < insert KITT table >

Conclusions

In this paper, we have given a wide range of topics related to the available solutions for computer depth. The paper has attempted to describe the mathematical and computational components of a small piece of the computer vision, namely stereo depth estimation. Surely, this is not an exhaustive review but the paper seeks to introduce the various approaches and their fundamental ideas.

The state of the art with this problem set is wide and varied but with more stereo images with ground truths, the learned approaches are quickly becoming the norm. Although we take our innate capability of understanding the 3-dimensional geometry of the objects we see, the capability was enabled by a huge trove of a priori knowledge on top of the available planar intersection, shadow and ray tracing, relative size, convergence, defocus and accommodation cues. The world of computer vision is barely scratching the surface when compared to what nature has already perfected. The pace of innovation holds much promise that one day we will be able to outperform the human brain not only in depth perception but also in 3D reconstruction and recognition.

Appendix

[1] Scharstein, D., & Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47.
[2] Dall'Asta, E. and Roncella, R., 2014. A comparison of semiglobal and local dense matching algorithms for surface reconstruction. International Archives of
Photogrammetry, Remote Sensing and Spatial Information Sciences, 40(5): 187–194.
[3] Zbontar, Jure, and Yann LeCun. "Stereo matching by training a convolutional neural network to compare image patches." Journal of Machine Learning Research 17.1-32 (2016): 2.
[4] Van Meerbergen, Geert, et al. "A hierarchical symmetric stereo algorithm using dynamic programming." International Journal of Computer Vision 47.1 (2002): 275-285.
[5] Boykov, Yuri, Olga Veksler, and Ramin Zabih. "Fast approximate energy minimization via graph cuts." IEEE Transactions on pattern analysis and machine intelligence23.11 (2001): 1222-1239.
[6] Sun, Jian, Nan-Ning Zheng, and Heung-Yeung Shum. "Stereo matching using belief propagation." IEEE Transactions on pattern analysis and machine intelligence 25.7 (2003): 787-800
[7]Szeliski, Richard, et al. "A comparative study of energy minimization methods for markov random fields with smoothness-based priors." IEEE transactions on pattern analysis and machine intelligence 30.6 (2008): 1068-1080.

@@ Line 8: / Line 8: @@
 == Methods ==
-"The wide range of modern dense matching algorithms can be assigned to two big categories: local methods, which evaluate the correspondences one point at a time, not considering neighboring points/measures, and global methods where some constraint on the regularity of results are enforced in the estimation"[2]. However, after the advent of the MC-CNN paper[3] a whole host of learning network algorithms have dominated the KITT data set.
+"The wide range of modern dense matching algorithms can be assigned to two big categories: local methods, which evaluate the correspondences one point at a time, not considering neighboring points/measures, and global methods where some constraint on the regularity of results are enforced in the estimation"[2]. However, after the advent of the MC-CNN paper[3] a whole host of learning network algorithms have dominated the KITTI data set. Below we will go over the types of depth estimation methods assuming a rectified image has been provided.
 === Local or Block Method ===
+One Image is picked as a reference and then a block size is searched across the epipolar plane to identify a corresponding pixel on the paired image. Different cost functions can be used to determine the similarity between pixels. A winner takes all strategy is used for finding the location of a corresponding pixel in the search space.
+Examples of costing functions
+*Sum of absolutes distance
+[[File:sad.png]]
+* Sum of squared distances
+[[File:ssd.png]]
+* Maximize Cross-correlation
+[[File:correlation.png]]
+* Normalized cross-correlation
+[[File:ncorrelation.png]]
+This can be very efficient and accurate making it particularly suitable for real-time applications [2]
+	Consider having a time comparison from the data sets
+These algorithms are susceptible to images with :
+	• Sudden depth variations
+	• Higher degree of occlusions
+	• very noisy results in low contrasted textured regions.
+The local solutions also do not take advantage of smoothing functions as described in global optimization solutions. Most algorithm methods also use a "left right consistency" disparity refinement step to improve on the captured images. This step helps mitigate some of the occlusion issues.
 Locally find a cost minima for a pixel across the scan line after selecting a template.

Stereo Depth Estimation Algorithms: Difference between revisions

Revision as of 03:39, 15 December 2017

Contents

Introduction

Background

Methods

Local or Block Method

Global Method

Semi Global Matching (SGM)

Energy Minimization

Simulated Annealing

Markov Random Fields

Belief Propagation

Graph cuts

Other Methods

Plane sweeping

CNN

Dynamic Programming

Plane Sweeping

Results

Conclusions

Appendix

Navigation menu

Stereo Depth Estimation Algorithms: Difference between revisions

Revision as of 03:39, 15 December 2017

Introduction

Background

Methods

Local or Block Method

Global Method

Semi Global Matching (SGM)

Energy Minimization

Simulated Annealing

Markov Random Fields

Belief Propagation

Graph cuts

Other Methods

Plane sweeping

CNN

Dynamic Programming

Plane Sweeping

Results

Conclusions

Appendix

Navigation menu

Search