Comparison between Large Displacement Optical Flow Algorithms


Background

Large Displacement


Optical flow was long considered a nearly solved problem after the algorithm proposed by Lucas and Kanade[1]. However, because the conventional algorithms[1,2] rely on the assumption of constant brightness within a small neighborhood of pixels, they often fail when there are large displacements caused by fast object movement, camera motion, or occlusions.
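To see why this assumption breaks down, recall the brightness constancy constraint underlying [1,2] (a standard textbook form, reproduced here for context, with image $I$, flow $(u, v)$, and partial derivatives $I_x, I_y, I_t$):

\[
I(x+u,\, y+v,\, t+1) = I(x,\, y,\, t)
\quad\Longrightarrow\quad
I_x u + I_y v + I_t = 0 ,
\]

where the right-hand constraint comes from a first-order Taylor expansion and is therefore only valid for small $(u, v)$; this is precisely what fails under large displacements.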

Large Displacement Optical Flow Algorithm

Brox and Malik proposed an optical flow algorithm that adopts a variational approach and combines it with descriptor matching[3]. The idea of their work is to find corresponding regions using sparse SIFT[4] descriptors. Their method can be summarized by the energy formulation below.

Energy

The algorithm suggested by Brox and Malik[3] mainly consists of three parts: a data term (color and gradient constancy), a smoothness term, and a descriptor matching term, combined into a single objective.

Equation 1. Energy optimization objective function (for more details on each term, please refer to [3])
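In LaTeX form, the objective sketched in Equation 1 combines these terms roughly as follows (a sketch based on [3]; $\Psi(s^2) = \sqrt{s^2 + \epsilon^2}$ is the robust penalty and $\gamma, \alpha, \beta$ are weights, all defined precisely in [3]):

\[
E(\mathbf{w}) =
\int_\Omega \Psi\!\big(|I_2(\mathbf{x}+\mathbf{w}) - I_1(\mathbf{x})|^2\big)\,d\mathbf{x}
+ \gamma \int_\Omega \Psi\!\big(|\nabla I_2(\mathbf{x}+\mathbf{w}) - \nabla I_1(\mathbf{x})|^2\big)\,d\mathbf{x}
+ \alpha \int_\Omega \Psi\!\big(|\nabla u|^2 + |\nabla v|^2\big)\,d\mathbf{x}
+ \beta\, E_{\text{match}}(\mathbf{w}, \mathbf{w}_1)
\]

The first two integrals are the data term (color and gradient constancy), the third is the smoothness term, and $E_{\text{match}}$ is the descriptor matching term, which penalizes deviations of the flow $\mathbf{w}$ from the precomputed sparse matches $\mathbf{w}_1$.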

Methods - Deep Flow and Deep Matching

Deep Flow


In [5], Deep Flow is proposed, which uses Deep Matching for the descriptor matching step. In spirit, Deep Flow is similar to Brox and Malik's work[3] in several aspects: it also uses a variational formulation for energy minimization and incorporates descriptor matching to find corresponding regions. The difference is that Brox and Malik[3] perform sparse descriptor matching, whereas Deep Flow computes dense correspondences[5], which the authors call Deep Matching. In the following sections we explain more about Deep Flow's properties, structure, and formulation.

Deep Matching (1). independently movable subpatches

As we can see from the figure below, with the conventional rigid HoG template matching used in SIFT descriptor matching, the configuration shown in the center is the best matching result. However, rather than keeping a rigid configuration of subpatches, Deep Matching allows each subpatch to move independently to find its best fit from the reference image to the target image[5] (a small sketch of this idea follows the figure).

Figure 1. Movable cells in SIFT descriptor. Left: The reference image, Center: optimal matching in target image with fixed configuration of SIFT, Right: optimal matching on the target image with movable subpatches (This figure is excerpted from [5])
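To make the idea concrete, here is a minimal Python sketch of the difference between rigid and movable subpatch matching. The function names and the SSD similarity are our own illustrative choices, not the actual implementation of [5]: with radius=0 the score reduces to rigid template matching, while radius > 0 lets each of the four subpatches shift independently.

    import numpy as np

    def cell_score(ref_cell, target, y, x):
        """Similarity of one subpatch placed at (y, x) in the target
        (negative sum of squared differences; higher is better)."""
        h, w = ref_cell.shape
        if y < 0 or x < 0 or y + h > target.shape[0] or x + w > target.shape[1]:
            return -np.inf
        window = target[y:y + h, x:x + w].astype(float)
        return -float(np.sum((window - ref_cell) ** 2))

    def match_score(ref_patch, target, y, x, radius=0):
        """Score of a 2x2 grid of subpatches placed at (y, x).
        radius=0 is rigid template matching; radius > 0 mimics the
        independently movable cells of Deep Matching."""
        h, w = ref_patch.shape
        ch, cw = h // 2, w // 2
        total = 0.0
        for dy, dx in [(0, 0), (0, cw), (ch, 0), (ch, cw)]:
            cell = ref_patch[dy:dy + ch, dx:dx + cw].astype(float)
            total += max(
                cell_score(cell, target, y + dy + sy, x + dx + sx)
                for sy in range(-radius, radius + 1)
                for sx in range(-radius, radius + 1))
        return total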

Deep Matching (2). convolution

The reference (first) image is divided into non-overlapping patches of 4 by 4 pixels, and each patch is convolved with the target (second) image[5]. The result of each convolution is a response map[5] (a small sketch follows the figure).

Figure 2. Convolution of reference image patches with the target image (This figure is excerpted from [5])
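Here is a minimal Python sketch of this bottom level. It is simplified relative to [5], which works with normalized patch similarities and more careful sampling; here each 4-by-4 patch is just L2-normalized and correlated with the whole target.

    import numpy as np
    from scipy.signal import correlate2d

    def bottom_level_responses(ref, target, patch=4):
        """One response map per non-overlapping 4x4 reference patch,
        obtained by correlating the patch with the target image.
        Simplified sketch, not the implementation of [5]."""
        maps = {}
        for y in range(0, ref.shape[0] - patch + 1, patch):
            for x in range(0, ref.shape[1] - patch + 1, patch):
                p = ref[y:y + patch, x:x + patch].astype(float)
                p /= np.linalg.norm(p) + 1e-8
                maps[(y, x)] = correlate2d(target.astype(float), p, mode="same")
        return maps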

Deep Matching (3). aggregating into larger patches

After the response maps are generated by convolution, they are max-pooled and subsampled to halve their size, and the patches are aggregated so that 8-by-8 patch responses are created from the 4-by-4 response maps[5]. This process is repeated until response maps for 16-by-16 and 32-by-32 patches are obtained[5]. This multi-layer structure is similar to deep convolutional networks[6], which is where the names Deep Matching and Deep Flow come from[5]. A sketch of one aggregation step follows the figures below.

Figure 3. Procedure to make multi-scale response maps (This figure is excerpted from [5])


Figure 4. More detailed procedure on aggregation step (This figure is excerpted from [5])
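A rough Python sketch of a single aggregation step (simplified: the shift of each child map by its offset within the parent patch, as well as the nonlinear rectification used in [5], are omitted):

    import numpy as np
    from scipy.ndimage import maximum_filter

    def max_pool_2x(response):
        """3x3 max filter followed by 2x subsampling: lets each subpatch
        move slightly while halving the resolution of the map."""
        return maximum_filter(response, size=3)[::2, ::2]

    def aggregate(children):
        """Average the pooled response maps of the four child subpatches
        (top-left, top-right, bottom-left, bottom-right) to obtain the
        response map of the twice-larger parent patch. Child shifts are
        omitted for brevity; see [5] for the full procedure."""
        return sum(max_pool_2x(c) for c in children) / 4.0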

Deep Matching (4). quasi-dense correspondences

In the pyramid of multi-scale response maps, local maxima are obtained for all the matched patches, even in areas with little texture or pattern[5]. This is the aspect that makes Deep Matching and Deep Flow stronger than Brox and Malik's method[3], which uses sparse descriptor matching and tends to miss points in textureless areas. A sketch of the local-maxima extraction follows the figure.

Figure 5. Multi-scale response map pyramid and quasi-dense correspondences (This figure is also excerpted from [5])
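A small Python sketch of extracting such local maxima from one response map (the backtracking from the top of the pyramid down to pixel-level correspondences described in [5] is not shown):

    import numpy as np
    from scipy.ndimage import maximum_filter

    def local_maxima(response, threshold=0.0):
        """Return the (y, x) positions where the response map attains a
        local maximum above `threshold`."""
        peaks = (response == maximum_filter(response, size=3)) & (response > threshold)
        return np.argwhere(peaks)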

Results

Figure 6 below shows one example from the MPI-Sintel dataset[7]. The result of the optical flow computation is shown in the last column, where the movement of the arm and the head is visualized.

Figure 6. Two consecutive frames from ambush_5 in the training images of the MPI-Sintel dataset[7]; the last image is the optical flow visualization computed with Deep Flow


We also ran the code from [8] and compared the optical flow results; the resulting flow is visualized below. Xu et al. proposed their motion-detail-preserving algorithm[8] to capture both small-displacement and large-displacement optical flow, especially in the presence of camera motion, where the large displacement of the background often causes the fine structure or small displacement of the foreground to be ignored. In this example, Deep Flow also gave a fairly good result in recovering the motion of the arms and head, but the result from [8] appears to give a clearer holistic shape of the person on the right and more detailed information about the motion of that person's arm.

Figure 7. Optical flow visualization computed with the motion detail preserving method[8] on the same two consecutive frames as in Figure 6.

Conclusions and Future work

We ran the DeepFlow algorithm, one of the current state-of-the-art optical flow algorithms, which ranks within the top 5 of the MPI-Sintel dataset evaluation competition[9]. Because of fast object motion, camera motion, blurring, occlusions, and texture variation, measuring optical flow is a challenging task. In our experiment DeepFlow gave a reasonable flow of the motion, but [8] seems to give more detailed information. Given the variation in difficulty across image settings and constraints, it would be interesting to run the experiment under many different controlled conditions, comparing the results of different optical flow algorithms on settings such as 1) textureless vs. textured objects, 2) small vs. large displacement of the foreground object, and 3) blur vs. no blur. In addition, a more quantitative analysis would allow a better comparison of the different algorithms using metrics such as Average Angular Error (AAE) and End Point Error (EPE), as sketched below.
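As a minimal sketch of these two metrics (standard definitions, with flows stored as H x W x 2 arrays of (u, v)):

    import numpy as np

    def endpoint_error(flow, gt):
        """Average End Point Error: mean Euclidean distance between
        estimated and ground-truth flow vectors."""
        return float(np.mean(np.linalg.norm(flow - gt, axis=2)))

    def angular_error(flow, gt):
        """Average Angular Error in degrees: mean angle between the
        3D vectors (u, v, 1), following the standard formulation."""
        num = flow[..., 0] * gt[..., 0] + flow[..., 1] * gt[..., 1] + 1.0
        den = (np.sqrt(flow[..., 0] ** 2 + flow[..., 1] ** 2 + 1.0)
               * np.sqrt(gt[..., 0] ** 2 + gt[..., 1] ** 2 + 1.0))
        return float(np.degrees(np.mean(np.arccos(np.clip(num / den, -1.0, 1.0)))))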

References

1. B. D. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence, 1981.
2. B. Horn and B. Schunck. Determining optical flow. Artificial Intelligence, 17:185-203, 1981.
3. T. Brox, C. Bregler, and J. Malik. Large displacement optical flow. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
4. D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.
5. P. Weinzaepfel, J. Revaud, Z. Harchaoui, and C. Schmid. DeepFlow: Large displacement optical flow with deep matching. In IEEE International Conference on Computer Vision (ICCV), December 2013.
6. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, November 1998.
7. D. J. Butler, J. Wulff, G. B. Stanley, and M. J. Black. A naturalistic open source movie for optical flow evaluation. In ECCV, 2012.
8. L. Xu, J. Jia, and Y. Matsushita. Motion detail preserving optical flow estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
9. http://sintel.is.tue.mpg.de/results