LindaWu


Introduction

Image alignment is the technique of warping one image (or sometimes both images) so that the features in the two images line up perfectly. In many applications, we have two images of the same scene, but they are not aligned: if you pick a feature (say a corner) in one image, the coordinates of the same corner in the other image are very different.


Basic Theory

At the heart of image alignment techniques is a 3×3 matrix called a homography. Two images of a scene are related by a homography under either of two conditions [3]:

1. The two images are views of a plane.

2. The two images were acquired by rotating the camera about its center of projection (a pure rotation, with no translation).

If we knew the homography, we could apply it to all the pixels of one image to obtain a warped image that is aligned with the second image.
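
As a minimal sketch, the following MATLAB code applies a known homography to warp an image. The matrix values below are hypothetical, not taken from this project; note that MATLAB's projective2d expects the transpose of the usual column-vector homography.

 % Warp an image with a known 3x3 homography (values are hypothetical).
 % MATLAB's projective2d uses the row-vector convention [x y 1]*T,
 % i.e., the transpose of the usual column-vector homography.
 H = [ 1.00  0.02  0;
      -0.01  1.00  0;
       5.00 -8.00  1 ];
 tform  = projective2d(H);
 moving = imread('cameraman.tif');
 warped = imwarp(moving, tform, 'OutputView', imref2d(size(moving)));
 imshowpair(moving, warped, 'montage');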

A homography can be estimated from a set of corresponding point pairs in the two images, which raises the question:

● How do we find corresponding points automatically?

In many Computer Vision applications, we often need to identify interesting stable points in an image. These points are called keypoints or feature points.

A feature point detector has two parts [3]:

  • Feature Detector
    • The detector identifies points in the image that are stable under image transformations like translation (shift), scale (increase/decrease in size), and rotation. The detector finds the x, y coordinates of such points.
  • Feature Descriptor
    • The detector only tells us where the interesting points are. The second part is the descriptor, which encodes the appearance of the point so that we can tell one feature point from another. The descriptor evaluated at a feature point is simply an array of numbers. Ideally, the same physical point in two images should have the same descriptor.
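
A minimal sketch of the two parts in MATLAB, using SURF from the toolbox as one example: the detector supplies the locations and the descriptor supplies the appearance vectors.

 % Part 1 (detector): find stable (x, y) locations.
 I = imread('cameraman.tif');
 points = detectSURFFeatures(I);
 % Part 2 (descriptor): encode the appearance around each point as a
 % vector, so points can be told apart and matched across images.
 [descriptors, validPoints] = extractFeatures(I, points);
 imshow(I); hold on;
 plot(validPoints.selectStrongest(50));   % overlay the 50 strongest keypoints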

Task Definition

Iset3D [6] produces (a) image data, and (b) a template with pixel RGB values that define the object location in each image (ground truth).

1. Alignment Algorithms

Investigate image alignment algorithms to generate the optical flow for image alignment. The image alignment algorithm aligns (a) the image data to (b) the template, then generates (c) the aligned image.

2. Evaluation

Implement and apply metric(s) to evaluate the alignment performance. To evaluate an algorithm, compare (b) the template with (c) the aligned image generated by the alignment algorithm.

Experiments & Results

MATLAB R2019b was used for performing the image alignment in this project. Table 1 shows the image alignment algorithms from MATLAB's Computer Vision Toolbox™ used for the feature-detector-descriptors. All remaining parameters are left at their defaults [4].

Dataset

  • Dataset-A

The distorted images are prepared by scaling and/or rotating the original image. The cameraman image (256×256, grayscale) shown in Fig. 1 is selected from MATLAB's Computer Vision Toolbox™.


Fig. 1 Cameraman.tif

  • Dataset-B

Driving scenes generated by Iset3D [6] with the camera shifted to multiple positions (see Fig. 2). Only translation is involved at present.


Fig. 2 Iset3D driving scenes

Ground truths

Ground-truth values for the image transformations are used to calculate and demonstrate the error in the recovered results for each feature detector and descriptor. For evaluating scale and rotation invariance, ground truths are synthetically generated for each image in Dataset-A by resizing and rotating it by known values of scale (50% to 200%) and rotation (1° to 359°). For evaluating translation invariance, the first image in Dataset-B is taken as the ground truth, and the remaining images are aligned to it.
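
A minimal sketch of generating such a synthetic ground truth in MATLAB (the scale and angle below are arbitrary examples, not values from the experiment):

 % Generate a distorted image with a known (ground-truth) scale and rotation.
 I = imread('cameraman.tif');
 gtScale = 1.5;    % 150%
 gtAngle = 30;     % degrees
 distorted = imrotate(imresize(I, gtScale), gtAngle);
 % The recovered transformation can later be compared against gtScale/gtAngle.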

Generic image alignment phases

An image alignment algorithm generally involves five phases [1][3]; a code sketch follows the list:

  1. Feature Detection & Description
  2. Feature Matching
  3. Outlier Rejection
  4. Derivation of Transformation Function
  5. Image Warping
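
Below is a minimal end-to-end sketch of the five phases using the toolbox, assuming a grayscale template, a synthetic distortion, and ORB features; any of the detectors compared in this project could be substituted.

 template = imread('cameraman.tif');
 moving   = imrotate(imresize(template, 0.75), 25);   % synthetic distortion
 % 1. Feature detection & description
 ptsT = detectORBFeatures(template);
 [featT, validT] = extractFeatures(template, ptsT);
 ptsM = detectORBFeatures(moving);
 [featM, validM] = extractFeatures(moving, ptsM);
 % 2. Feature matching
 idxPairs = matchFeatures(featM, featT);
 matchedM = validM(idxPairs(:, 1));
 matchedT = validT(idxPairs(:, 2));
 % 3. Outlier rejection (RANSAC) and 4. derivation of the transformation
 [tform, inlierM, inlierT] = estimateGeometricTransform(matchedM, matchedT, 'similarity');
 % 5. Image warping into the template's coordinate frame
 aligned = imwarp(moving, tform, 'OutputView', imref2d(size(template)));
 imshowpair(template, aligned);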

This project focuses on applying image alignment algorithms to Dataset-A (Fig. 1) and Dataset-B (Fig. 2), then comparing the image alignment algorithms among ORB, BRISK, SURF, FAST, Harris, and MSER.

Matching strategy based on MATLAB Computer Vision Toolbox™

Local features and their descriptors are the building blocks of many computer vision algorithms. Applications include image registration, object detection and classification, tracking, and motion estimation. These algorithms use local features to better handle scale changes, rotation, and occlusion. Computer Vision Toolbox™ algorithms [4] include both corner detectors and blob detectors, and the toolbox also includes the descriptors. The detectors and the descriptors can be mixed and matched depending on the requirements of the application, as the sketch below shows.
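
For example, a minimal sketch mixing a corner detector with a different binary descriptor (Harris corners described with FREAK) via the toolbox's extractFeatures 'Method' option:

 % Mix-and-match: Harris corner detector + FREAK binary descriptor.
 I = imread('cameraman.tif');
 corners = detectHarrisFeatures(I);
 [freakDescriptors, validCorners] = extractFeatures(I, corners, 'Method', 'FREAK');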

Demonstration of Results

The visualized results, including matched feature points, aligned images, and error visualizations, are shown in Fig. 3 for Dataset-A and Fig. 4 for Dataset-B.



Fig 3. Feature-detection, alignment, and error visualization with ORB (Scale=75%, Rotation=25 degrees on Dataset-A)



Fig 4. Feature-detection, alignment, and error visualization with ORB (Dataset-B)



Evaluation

Error in Recovered Rotations (Dataset-A)

The results rank the algorithms by their ability to recover the synthetic rotations (best to worst):

ORB>FAST>Harris>BRISK>MSER>SURF

Error in Recovered Scale changes (Dataset-A)

The results rank the algorithms by their ability to recover the synthetic scale changes (best to worst):

ORB>MSER>SURF>BRISK>Harris>FAST

Inlier Percentage

The inlier percentage of a feature detector is the percentage of detected features that survive photometric or geometric transformations of an image (a.k.a. repeatability [1]). The inlier percentage is independent of the descriptor and depends only on the performance of the feature detector. The results comparing the alignment algorithms are shown in Fig. 6-1. The inlier percentage is calculated as:

$$\text{Inlier Percentage} = \frac{\text{Number of Correct Matches}}{\text{Keypoints}_1 + \text{Keypoints}_2} \times 100\%$$
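
An illustrative MATLAB helper for this formula (the counts would come from the detector outputs and the RANSAC inliers of the pipeline sketched earlier):

 function pct = inlierPercentage(numCorrectMatches, numKeypoints1, numKeypoints2)
 % Inlier percentage: correct (RANSAC-surviving) matches over the total
 % number of keypoints detected in the two images.
 pct = numCorrectMatches / (numKeypoints1 + numKeypoints2) * 100;
 end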


  • Percentage of inliers w.r.t synthetic rotations

FAST and Harris detectors outperform BRISK, while SURF and ORB have the best performance; SURF shows quantization effects at 45-degree angles due to its Haar-wavelet composition [2]. The inlier percentage w.r.t rotations can be rated as:

SURF>ORB>MSER>FAST>Harris>BRISK


  • Percentage of inliers w.r.t synthetic scale changes

SURF outperforms ORB, MSER, and BRISK with regard to the synthetic scale changes. The FAST and Harris feature detectors are not scale invariant, which this experiment also confirms: they are unable to locate sufficient keypoints for alignment once the scale changes by more than 40 percent on Dataset-A, so they are kept out of the comparison. The inlier percentage w.r.t scale changes can be rated as:

SURF>ORB>MSER>BRISK


  • Percentage of inliers w.r.t synthetic translations

SURF and ORB detectors outperform FAST and BRISK, while Harris performs the best, with over 90% inliers, on Dataset-B w.r.t translations. The inlier percentage w.r.t translations can be rated as:

Harris>SURF>ORB>FAST>BRISK>MSER 

Feature Matching Accuracy

The accuracy of a descriptor is the number of correctly matched regions with respect to the total number of matches between the template image and an input image of the same scene [7]. The feature matching accuracy is calculated as:


$$\text{Feature Matching Accuracy} = \frac{\text{Number of Correct Matches}}{\text{Number of Matches}} \times 100\%$$
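
And an illustrative counterpart for the matching-accuracy formula:

 function acc = matchingAccuracy(numCorrectMatches, numMatches)
 % Feature matching accuracy: correct matches over all putative matches.
 acc = numCorrectMatches / numMatches * 100;
 end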


  • Feature Matching Accuracy w.r.t synthetic rotations
FAST>Harris>ORB>BRISK>MSER>SURF 


  • Feature Matching Accuracy w.r.t synthetic scale changes
MSER>ORB>BRISK>SURF 


  • Feature Matching Accuracy w.r.t synthetic translations
MSER>ORB>BRISK>Harris>SURF>FAST


In summary, the MSER and ORB descriptors perform the best at extracting the correct features, while SURF and FAST perform the worst in this experiment.

Total Image Matching Time

Total image matching time refers to the total computational time of feature detection, feature extraction, feature matching, outlier rejection and transformations.

  • Total Matching Time w.r.t synthetic rotations
 SURF>ORB>FAST>Harris>MSER>BRISK


  • Total Matching Time w.r.t synthetic scale changes
 SURF>ORB>MSER>BRISK


  • Total Matching Time w.r.t synthetic translations
 FAST>ORB>SURF>Harris>MSER>BRISK


SURF and ORB are the fastest and most stable image alignment algorithms. FAST performs the best when only translation is involved. BRISK is the slowest across all the datasets.

Root-Mean-Square errors (RMSE)

Comparing restoration results requires a measure of image quality. RMSE measures how spread out the residuals are; in other words, it tells you how concentrated the data are around the line of best fit [12]. Here the RMSE is computed between the template and the aligned image, so lower values indicate better restoration.
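
A minimal sketch of the RMSE computation between a template and an aligned image (a noisy copy of the template stands in for a real aligned image here):

 template = im2double(imread('cameraman.tif'));
 aligned  = template + 0.01 * randn(size(template));   % stand-in for a real aligned image
 rmse = sqrt(mean((aligned(:) - template(:)).^2));     % lower is better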


  • RMSE w.r.t synthetic rotations
Harris>MSER>FAST>ORB>BRISK>SURF


  • RMSE w.r.t synthetic scale changes
MSER>ORB>BRISK>SURF


  • RMSE w.r.t synthetic translations
MSER>BRISK>FAST>Harris>SURF>ORB


The results show that the variance of the RMSE is high w.r.t both scale changes and rotations, with SURF performing the worst. Harris shows lower RMSE w.r.t rotations, though it is not scale invariant. The RMSE experiments on Dataset-B with translations show comparable restoration quality across the algorithms.

Quantitative Comparison and Computational Costs of Different Feature-Detector-Descriptors

The quantitative results, including keypoints detected in the template, keypoints detected in the distorted image, number of matched features, and computational times, are shown in Table 2.


The quantitative comparison shows that FAST and Harris lead on feature matching accuracy; however, they are not scale invariant. BRISK turned out to be the most accurate algorithm across all the geometric distortions, although the matching time for its large number of features prolongs the total image matching time. ORB performs the fastest with a lower decomposition level, which minimizes the number of detected features and speeds up the total computational time.



Table 2. Quantitative Comparison and Computational Costs of Different Feature-Detector-Descriptors

Conclusion

This project presents a comparison of the ORB, BRISK, SURF, FAST, Harris, and MSER feature-detector-descriptors. SURF and ORB are found to be the most scale-invariant feature detectors (on the basis of inlier percentage), surviving wide-ranging scale variations. BRISK is found to be the least scale invariant (FAST and Harris are not scale invariant). SURF and ORB are also more rotation invariant than the others, while FAST and Harris have higher accuracy for image rotations than the rest. Although ORB and BRISK are the most efficient algorithms at detecting a large number of features, the matching time for such a large number of features prolongs the total image matching time. In contrast, FAST and SURF perform the fastest image matching, but their accuracy is compromised.

The quantitative comparison (Table 2) has shown that the generic order of the feature-detector-descriptors by their ability to detect a high quantity of stable features (inlier percentage) is:

SURF>Harris>ORB>BRISK>FAST>MSER

● The order of the algorithms by computational efficiency of feature detection and description per feature point is:

ORB>SURF>Harris>FAST>BRISK>MSER

● The order by efficiency of feature matching per feature point is:

Harris>SURF>BRISK>FAST>MSER>ORB

ORB is the most efficient feature-detection-description algorithm, while it is the most inefficient during feature matching.

● The feature-detector-descriptors can be rated for the speed of total image matching as:

ORB>FAST>SURF>Harris>MSER>BRISK

● The image matching accuracy of descriptors can be rated as:

(FAST>Harris>)BRISK>MSER>ORB>SURF 

The overall accuracy of BRISK and MSER is found to be the highest for all types of geometric transformations (as FAST and Harris are not scale invariant), and ORB offers the best trade-off between speed and accuracy.

Experiment of Burst Photography

One application of image alignment is to align a series of images and average them. The goal is to bring all of the radiance images into register so that the average is less noisy than the originals. This computational challenge is connected to the idea of 'burst photography', in which vendors capture a series of brief exposures and then average them to produce a final high-quality image.

Experiment 1: Gray-scale Image

  • Step 1: Pick any image in the carMovingAway dataset as the original (template) image.
  • Step 2: Convert it to grayscale using the MATLAB function rgb2gray(rgbOriginal).
  • Step 3: Align the rest of the images (the other 7 images) to the template using the ORB alignment algorithm, chosen because it was evaluated as the fastest in this project.
  • Step 4: Average the aligned images (see the sketch after this list).
  • Step 5: Add the "averaged aligned data" (Step 4) to the original to produce the result below.
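
A sketch of Steps 1-4 under assumed file names and locations (the carMovingAway directory and the *.png pattern are illustrative, not the project's actual paths):

 % Align a burst of frames to the first frame with ORB, then average.
 files    = dir(fullfile('carMovingAway', '*.png'));   % hypothetical path/pattern
 template = rgb2gray(imread(fullfile(files(1).folder, files(1).name)));
 ptsT = detectORBFeatures(template);
 [featT, validT] = extractFeatures(template, ptsT);
 accum = im2double(template);
 for k = 2:numel(files)
     frame = rgb2gray(imread(fullfile(files(k).folder, files(k).name)));
     ptsF = detectORBFeatures(frame);
     [featF, validF] = extractFeatures(frame, ptsF);
     idx   = matchFeatures(featF, featT);
     tform = estimateGeometricTransform(validF(idx(:,1)), validT(idx(:,2)), 'projective');
     aligned = imwarp(frame, tform, 'OutputView', imref2d(size(template)));
     accum   = accum + im2double(aligned);
 end
 burstAverage = accum / numel(files);   % Step 4: average of the aligned frames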


From the above experiment, averaging the aligned data from the sequence of burst photos (Step 4) does reduce the noise (misalignment from trees and clouds).

To enhance the original photo, I apply the "averaged" aligned data to the original, which brightens the regions that were blurred by motion (Step 5).


Experiment 2: Color Image

  • Step 1: Pick any image in the carMovingAway dataset as the original (template) image.
  • Step 2: Separate the original template into R, G, and B channels.
  • Step 3: Repeat Steps 3 & 4 from Experiment 1 on the R, G, and B channels respectively (see the sketch after this list).
  • Step 4: Add the "averaged aligned data" (Step 3) to the original image to generate the result below.
  • Step 5: Normalize the R, G, and B channels and generate the enhanced-quality image.
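
A structural sketch of the per-channel processing in Steps 2 and 5; a built-in sample image stands in for the dataset, and identity processing stands in for the per-channel align-and-average of Step 3:

 rgb = im2double(imread('peppers.png'));   % stand-in RGB template
 out = zeros(size(rgb));
 for c = 1:3
     channel = rgb(:, :, c);               % Step 2: separate one channel
     out(:, :, c) = channel;               % Step 3: per-channel align-and-average goes here
 end
 enhanced = mat2gray(out);                 % Step 5: normalize to [0, 1] and recombine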


The comparison between the original image and the enhanced image:

References

[1] Shaharyar Ahmed Khan Tareen and Zahra Saleem. “A Comparative Analysis of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK”, in International Conference on Computing, Mathematics and Engineering Technologies, iCoMET, 2018

[2] Rublee, E., V. Rabaud, K. Konolige and G. Bradski. "ORB: An efficient alternative to SIFT or SURF." In Proceedings of the 2011 International Conference on Computer Vision, 2564–2571. Barcelona, Spain, 2011.

[3] Image Alignment (Feature Based) using OpenCV (C++/Python) https://www.learnopencv.com/image-alignment-feature-based-using-opencv-c-python/

[4] Matlab Computer Vision Toolbox™ https://www.mathworks.com/help/vision/feature-detection-and-extraction.html

[5] The Image Systems Engineering Toolbox for Cameras (isetcam) https://github.com/ISET/isetcam

[6] PBRT scene rendering (Iset3D) https://github.com/ISET/iset3d

[7] Siok Yee Tan, Haslina Arshad, and Azizi Abdullah. “Distinctive accuracy measurement of binary descriptors in mobile augmented reality.” January 2019.

[8] Rosten, E., and T. Drummond. “Machine Learning for High-Speed Corner Detection.” 9th European Conference on Computer Vision. Vol. 1, 2006, pp. 430–443.

[9] Bay, H., A. Ess, T. Tuytelaars, and L. Van Gool. “SURF: Speeded Up Robust Features.” Computer Vision and Image Understanding (CVIU). Vol. 110, No. 3, 2008, pp. 346–359.

[10] Leutenegger, S., M. Chli, and R. Siegwart. “BRISK: Binary Robust Invariant Scalable Keypoints.” Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2011.

[11] Matas, J., O. Chum, M. Urban, and T. Pajdla. “Robust wide-baseline stereo from maximally stable extremal regions.” Proceedings of the British Machine Vision Conference, 2002, pp. 384–396.

[12] Barnston, A., (1992). “Correspondence among the Correlation [root mean square error] and Heidke Verification Measures; Refinement of the Heidke Score.” Notes and Correspondence, Climate Analysis Center.