Fast Alignment of Image Bursts for Google's HDR+

From Psych 221 Image Systems Engineering
Revision as of 09:03, 15 December 2017 by Student2017 (Results)

Introduction

Standard imaging systems often lack the dynamic range necessary to properly expose all parts of a given scene. Typically, an exposure value is chosen as a compromise between overexposure, in which bright regions lose detail to saturation, and underexposure, in which dark regions consist predominantly of noise. High dynamic range (HDR) imaging refers to a family of techniques for alleviating this tradeoff. A common approach is to capture the same scene multiple times at multiple exposure values. A short exposure produces good results in bright regions, a long exposure increases the signal-to-noise ratio in dark regions, and intermediate exposure values can properly expose other parts of the scene. All of the resulting images can then be merged into a single image in which all content appears well exposed. This method is effective in certain scenarios, but it necessarily suffers when the scene or camera moves: the frames must be aligned before they can be merged, yet alignment becomes difficult when different features are prominent in each exposure, and the problem worsens when some or all portions of the scene move between frames.

A great deal of photography today is done with the image sensors built into mobile phones; the very existence of platforms like Instagram and Snapchat is evidence of this. These sensors have increasingly high resolution and general performance, but they are inherently limited by their small size, the quality of the optical assemblies attached to them, the computing power available to them for image processing, the likelihood that they will move while capturing images, and the responsiveness and high throughput demanded by their users. The first two limitations suggest that mobile image sensors are good candidates for HDR imaging; the last three suggest that the common HDR method described above is a poor fit for this platform. In response, Google developed a different HDR algorithm optimized for mobile imaging under the constraints discussed above. This project explores a prototype Matlab implementation of frame alignment, a key step in the algorithm, with the goal of understanding which parts of the process most affect the speed of computation.

Background

Google's HDR+ algorithm [1] is designed to operate on bursts of raw frames directly from the image sensor, eventually merging them into a single raw image that can then follow the normal demosaicking and processing steps that a traditional single-exposure image would undergo. The process is roughly broken up into four steps: capture, align, merge, finish. Alignment is the focus of this project, but some information about all four steps is provided below for context. Each step is optimized for speed and user experience.

Capture

Instead of attempting to perform exposure bracketing, HDR+ uses a burst of intentionally underexposed frames. These frames are captured in quick succession, each with the same exposure value. Underexposure increases the chances of preserving detail in the bright parts of each frame and reduces the impact of scene and camera motion, but it has the obvious side effect of decreasing signal in dark areas. The HDR problem is thereby transformed into a denoising problem for most parts of the image. Capturing a short burst of frames aids denoising in two ways: it provides additional samples of the same scene that can be averaged to reduce noise, and it increases the chances that at least one frame was captured with minimal motion.
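The averaging claim can be illustrated numerically (a NumPy sketch, not the project's Matlab code; the patch size, signal level, and noise level below are made-up values): averaging N frames of independent noise reduces its standard deviation by roughly a factor of √N.

```python
import numpy as np

# Simulate an underexposed flat patch captured 8 times with independent
# additive noise (all parameter values here are illustrative).
rng = np.random.default_rng(0)
scene = np.full((64, 64), 0.2)
frames = [scene + rng.normal(0.0, 0.05, scene.shape) for _ in range(8)]

single_noise = np.std(frames[0] - scene)                 # one frame
merged_noise = np.std(np.mean(frames, axis=0) - scene)   # 8-frame average
# Averaging 8 frames should cut the noise by roughly sqrt(8), about 2.8x.
```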

Align

An underlying assumption of the algorithm is that some amount of camera motion and scene motion will be present. Corresponding regions of all frames must therefore be aligned before their information can be combined into one usable image. Alignment proceeds by selecting a reference frame, downsampling each frame several times to create successively smaller versions, and then aligning each tile of an alternate frame with the corresponding reference tile by finding, within a local perturbation window, the offset that minimizes the difference between the tiles. The process follows a pyramid approach, starting with coarse alignment on the smallest downsampled frames and working up to fine alignment on the full-resolution frames, with each level's results providing the initial guess for tile alignment at the next larger level.
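A minimal sketch of this coarse-to-fine tile search (in NumPy for illustration; the project prototype was in Matlab, and the tile size, search radius, two-level depth, and L1 difference metric here are simplifying choices, not the exact settings of HDR+):

```python
import numpy as np

def downsample2(img):
    """One pyramid level: 2x2 box-filter downsampling."""
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])

def best_offset(ref_tile, alt, y, x, radius, init=(0, 0)):
    """Search a (2*radius+1)^2 window around an initial guess for the
    offset minimizing the L1 difference between the alternate-frame
    tile and the reference tile anchored at (y, x)."""
    t = ref_tile.shape[0]
    best, best_err = init, np.inf
    for dy in range(init[0] - radius, init[0] + radius + 1):
        for dx in range(init[1] - radius, init[1] + radius + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + t > alt.shape[0] or xx + t > alt.shape[1]:
                continue  # candidate tile falls outside the frame
            err = np.abs(alt[yy:yy + t, xx:xx + t] - ref_tile).sum()
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def align_two_level(ref, alt, y, x, tile=8, radius=2):
    """Two-level pyramid alignment of one tile: search at half
    resolution first, then double that offset and refine at full
    resolution. A deeper pyramid repeats the same scale-up step."""
    ref_s, alt_s = downsample2(ref), downsample2(alt)
    coarse = best_offset(ref_s[y // 2:y // 2 + tile, x // 2:x // 2 + tile],
                         alt_s, y // 2, x // 2, radius)
    init = (2 * coarse[0], 2 * coarse[1])  # scale the guess up one level
    return best_offset(ref[y:y + tile, x:x + tile], alt, y, x, radius, init)
```

For a pure translation of the whole frame, the recovered per-tile offset matches the true shift even though each level only searches a small window.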

Merge

The merge step uses the offsets from the alignment step to match tiles across frames. The tiles are transformed into the spatial frequency domain, where high-frequency content is largely discarded and corresponding tiles are averaged, significantly reducing noise. The averaging function weights each frame's contribution by how closely its tile matches the reference tile. Merging of the raw data occurs separately in each color channel and blends overlapping tiles rather than treating each tile independently. The output of this stage is a new, high dynamic range, single-exposure raw image that can be processed as if it came directly from the sensor.
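A toy version of the weighted merge (NumPy, for illustration only): the real algorithm filters each spatial frequency separately, while this sketch collapses that to a single per-tile weight and keeps the DFT round trip only to mirror the description above; the `noise` constant is a made-up tuning value.

```python
import numpy as np

def merge_tiles(ref_tile, alt_tiles, noise=0.01):
    """Toy frequency-domain merge of already-aligned tiles: frames
    whose tiles differ more from the reference get smaller weights,
    which suppresses ghosting where alignment failed locally."""
    acc = np.fft.fft2(ref_tile)
    wsum = 1.0
    for tile in alt_tiles:
        d = np.mean(np.abs(tile - ref_tile))   # misalignment proxy
        w = noise / (noise + d)                # ~1 if consistent, ~0 if not
        acc += w * np.fft.fft2(tile)
        wsum += w
    return np.real(np.fft.ifft2(acc / wsum))
```

With this weighting, a badly mismatched frame barely perturbs the output, while well-aligned frames are averaged almost uniformly.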

Finish

The finishing stage can be thought of as the part of the processing pipeline that would usually occur after raw frame capture regardless of whether HDR techniques are being used. Standard sensor correction steps like black subtraction and white balancing, optics correction to compensate for vignetting and chromatic aberration, multiple forms of color correction and adjustment, and dehazing are performed, albeit sometimes in nonstandard ways. Dynamic range compression and tone mapping are also performed and are even more important than usual in this context since the image contains a higher dynamic range than typical file representations and displays can support.

Methods

My implementation of the alignment step in Matlab followed the outline above. The major components are finding a reference frame, downsampling each raw frame to form an image pyramid, and searching for the best tile offsets at each pyramid level.

Results

Reference frame

Downsampling to green
Method             Time (s/MP)
blockproc()        2.175
For loop (pixel)   0.0746
For loop (line)    0.0121
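For reference, the operation being timed above — collapsing each Bayer quad to a single value from its two green samples — takes only a few lines when vectorized (a NumPy sketch; the project's Matlab code may differ, and the RGGB layout assumed here is an illustrative choice):

```python
import numpy as np

def green_downsample(raw):
    """Average the two green samples in each 2x2 Bayer quad, producing
    a half-resolution grayscale image for alignment. Assumes an
    R G / G B layout; other Bayer orders just move the slices."""
    g1 = raw[0::2, 1::2]   # green pixels on the red rows
    g2 = raw[1::2, 0::2]   # green pixels on the blue rows
    return 0.5 * (g1 + g2)
```

The large gap between `blockproc()` and the loop variants in the table reflects per-block call overhead rather than arithmetic cost; sliced array expressions like the one above avoid per-pixel dispatch entirely.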


Gradient
Operator       Time (s/MP)
Sobel          0.0171
Prewitt        0.0177
Central        0.0204
Intermediate   0.0122
Roberts        0.0208
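The operators in the table can be compared on equal footing with a plain 2-D convolution. The sketch below (NumPy for illustration) applies the Sobel pair and reduces the result to a single sharpness score; mean gradient magnitude is one plausible criterion for picking the reference frame, though the exact metric and implementation used in the project are not reproduced here.

```python
import numpy as np

# Horizontal-gradient kernels; the vertical kernel is the transpose.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
PREWITT_X = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], float)

def conv2_valid(img, k):
    """Plain 2-D 'valid' convolution: flip the kernel, slide, sum."""
    kh, kw = k.shape
    kf = k[::-1, ::-1]
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kf)
    return out

def gradient_energy(img, kx=SOBEL_X):
    """Mean gradient magnitude: a simple sharpness score that a
    reference-frame selector could rank burst frames by."""
    gx = conv2_valid(img, kx)
    gy = conv2_valid(img, kx.T)
    return np.mean(np.hypot(gx, gy))
```

A flat frame scores zero, and a frame with a hard edge scores higher than one with the same intensity range spread into a smooth ramp, which is the ordering a sharpness-based selector needs.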

Conclusions

Appendix

References

[1] S. W. Hasinoff, D. Sharlet, R. Geiss, A. Adams, J. T. Barron, F. Kainz, J. Chen, and M. Levoy, "Burst photography for high dynamic range and low-light imaging on mobile cameras," ACM Transactions on Graphics, vol. 35, no. 6, 2016.