Comparison between Large Displacement Optical Flow Algorithms
Background
Large Displacement
Optical flow was long considered an almost solved problem after the algorithm suggested by Lucas and Kanade [1]. However, because the conventional algorithms [1,2] rest on a brightness constancy assumption among neighboring pixels, they often fail when large displacements arise from fast object motion, camera motion, or occlusions.
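The brightness constancy assumption behind [1] can be illustrated with a minimal sketch (a hypothetical helper, not the authors' code): within a small window, each pixel contributes one linear constraint Ix·u + Iy·v + It = 0, and the flow (u, v) is the least-squares solution over the window.

```python
import numpy as np

def lucas_kanade_window(Ix, Iy, It):
    """Estimate one flow vector (u, v) for a window from image gradients,
    using the brightness constancy constraint Ix*u + Iy*v + It = 0
    solved in the least-squares sense over all pixels in the window."""
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # N x 2 gradient matrix
    b = -It.ravel()                                 # N temporal derivatives
    # Solve A [u, v]^T = b in the least-squares sense
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow  # array [u, v]
```

When the displacement is large, the linearization above is no longer valid, which is exactly the failure mode motivating the methods discussed next.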
Large Displacement Optical Flow Algorithm
Brox and Malik proposed an optical flow algorithm that adopts a variational approach and combines it with descriptor matching [3]. The idea of their work is to find corresponding regions using sparse SIFT [4] descriptors. Their method can be formulated as follows.
Energy
The energy suggested by Brox and Malik [3] mainly consists of three parts: a data term (color and gradient constancy), a smoothness term, and a descriptor matching term.
![](/psych221wiki/images/d/d4/Energy2.png)
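Schematically, the energy in [3] combines these parts roughly as follows (notation paraphrased from the paper, so treat the exact symbols as an approximation: w is the flow field, w1 the matching field, and γ, α, β are weights):

```latex
E(\mathbf{w}) = E_{\mathrm{color}}(\mathbf{w})
  + \gamma\, E_{\mathrm{gradient}}(\mathbf{w})
  + \alpha\, E_{\mathrm{smooth}}(\mathbf{w})
  + \beta\, E_{\mathrm{match}}(\mathbf{w}, \mathbf{w}_1)
  + E_{\mathrm{desc}}(\mathbf{w}_1)
```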
Methods - Deep Flow and Deep Matching
Deep Flow
In [5], DeepFlow was proposed, which uses Deep Matching as its descriptor matching step. In spirit, DeepFlow is similar to Brox and Malik's work [3]: it also minimizes a variational energy and incorporates descriptor matching to find corresponding regions. The difference is that Brox and Malik [3] perform sparse descriptor matching, whereas DeepFlow computes dense correspondences [5], which the authors call Deep Matching. The following sections explain DeepFlow's properties, structure, and formulation in more detail.
Deep Matching (1). independently movable subpatches
As the figure below shows, with conventional rigid HoG template matching (as used in SIFT descriptor matching), the second configuration is the best matching result. However, rather than keeping the subpatches in a rigid configuration, Deep Matching allows each subpatch of the reference image to move independently to find its best fit in the target image [5].
![](/psych221wiki/images/2/2b/Subpatches.png)
Deep Matching (2). convolution
The reference (first) image is divided into non-overlapping patches of 4 by 4 pixels, and each patch is convolved with the target (second) image [5]. The result of each convolution is a response map [5].
![](/psych221wiki/images/1/1f/Convolution.png)
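The patch-wise correlation step can be sketched as follows. This is illustrative only: the real DeepMatching responses are computed on pixel descriptors rather than raw intensities, and here patch and window are simply normalized so the response peaks where the patch matches best.

```python
import numpy as np

def response_map(patch, target):
    """Slide one reference patch over the target image and record the
    normalized correlation at each position; high response = good match."""
    ph, pw = patch.shape
    H, W = target.shape
    p = patch - patch.mean()
    p /= (np.linalg.norm(p) + 1e-8)  # zero-mean, unit-norm template
    out = np.zeros((H - ph + 1, W - pw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            win = target[i:i + ph, j:j + pw]
            w = win - win.mean()
            w /= (np.linalg.norm(w) + 1e-8)
            out[i, j] = np.sum(p * w)  # normalized cross-correlation
    return out
```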
Deep Matching (3). aggregating into larger patches
After the response maps are generated by convolution, they are max-pooled and subsampled to half their size, and neighboring patches are aggregated so that 8-by-8 patches are built from the 4-by-4 response maps [5]. This process is repeated until 16-by-16 and 32-by-32 patches are obtained [5]. This multi-layer structure resembles deep convolutional networks [6], which is where the names Deep Matching and DeepFlow come from [5].
![](/psych221wiki/images/0/0a/Aggregation.png)
![](/psych221wiki/images/0/07/Aggregationdetail.png)
Deep Matching (4). quasi-dense correspondences
In the pyramid of multi-scale response maps, local maxima are extracted from all matched patches, even in areas with little texture or pattern [5]. This is what makes Deep Matching and DeepFlow stronger than Brox and Malik's method [3], whose sparse descriptor matching tends to miss points in textureless areas.
![](/psych221wiki/images/4/46/Multiscalepyramid.png)
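Extracting match candidates from a single response map can be sketched as a simple local-maximum search. Note this shows only the single-map idea; DeepMatching actually backtracks maxima through the whole multi-scale pyramid to recover the quasi-dense correspondences.

```python
import numpy as np

def local_maxima(r, threshold=0.0):
    """Return positions where the response strictly exceeds all 8
    neighbors and a threshold; each position is one match candidate."""
    H, W = r.shape
    peaks = []
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            win = r[i - 1:i + 2, j - 1:j + 2]
            is_unique_max = (win == r[i, j]).sum() == 1
            if r[i, j] > threshold and r[i, j] == win.max() and is_unique_max:
                peaks.append((i, j))
    return peaks
```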
Results
Figure 6 below shows one example from the MPI-Sintel dataset [7]. The result of the optical flow computation is shown in the last column, where the movement of the arm and head is visualized.
![](/psych221wiki/images/f/fd/Sintelresult.png)
We also ran the code from [8] and compared its optical flow result, visualized below. Xu et al. proposed their motion-detail-preserving algorithm [8] to capture both small-displacement and large-displacement optical flow, especially in the presence of camera motion, where the large displacement of the background can cause the fine structure or small displacement of the foreground to be ignored. In this example, DeepFlow also recovered the motion of the arms and head reasonably well, but the result from [8] appears to give a clearer holistic shape of the person on the right and more detailed information about the motion of that person's arm.
![](/psych221wiki/images/f/fe/Mdpofresultsmall.png)
Conclusions and Future work
We ran the DeepFlow algorithm, one of the current state-of-the-art optical flow algorithms, which ranks within the top 5 on the MPI-Sintel Dataset Evaluation Competition [9]. Measuring optical flow is a challenging task because of fast object motion, camera motion, blurring, occlusions, and texture. In our experiment DeepFlow gave a reasonable flow estimate, but [8] seems to provide more detailed information. Given the variation in difficulty across image settings and constraints, it would be interesting to run the experiment under many different controlled conditions, comparing the results of different optical flow algorithms on: 1) textureless vs. textured objects; 2) small vs. large displacement of the foreground object; 3) blur vs. no blur. In addition, a more quantitative analysis using metrics such as Average Angular Error (AAE) and End Point Error (EPE) would allow a better comparison of the algorithms.
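The two metrics mentioned above can be computed directly from estimated and ground-truth flow fields. A minimal sketch follows; the AAE here uses the common convention of measuring the angle between the 3-D vectors (u, v, 1).

```python
import numpy as np

def endpoint_error(u, v, u_gt, v_gt):
    """Average End Point Error (EPE): mean Euclidean distance between
    estimated and ground-truth flow vectors."""
    return np.mean(np.sqrt((u - u_gt) ** 2 + (v - v_gt) ** 2))

def angular_error(u, v, u_gt, v_gt):
    """Average Angular Error (AAE) in degrees, computed between the
    homogeneous 3-D flow vectors (u, v, 1) and (u_gt, v_gt, 1)."""
    num = u * u_gt + v * v_gt + 1.0
    den = np.sqrt(u ** 2 + v ** 2 + 1.0) * np.sqrt(u_gt ** 2 + v_gt ** 2 + 1.0)
    # Clip to guard against floating-point values just outside [-1, 1]
    ang = np.arccos(np.clip(num / den, -1.0, 1.0))
    return np.degrees(np.mean(ang))
```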
References
1. B. D. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence, 1981.
2. B. Horn and B. Schunck. Determining optical flow. Artificial Intelligence, 17:185-203, 1981.
3. T. Brox, C. Bregler, and J. Malik. Large displacement optical flow. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
4. D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.
5. P. Weinzaepfel, J. Revaud, Z. Harchaoui, and C. Schmid. DeepFlow: Large displacement optical flow with deep matching. In IEEE International Conference on Computer Vision (ICCV), December 2013.
6. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, November 1998.
7. D. J. Butler, J. Wulff, G. B. Stanley, and M. J. Black. A naturalistic open source movie for optical flow evaluation. In ECCV, 2012.
8. L. Xu, J. Jia, and Y. Matsushita. Motion detail preserving optical flow estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
9. http://sintel.is.tue.mpg.de/results