Analysis of Google's HDR+ Burst Image Processing Pipeline


Introduction: HDR capture on cell phones

Cell phone cameras have improved rapidly over the years thanks to advances in software and hardware. Due to lens size limitations, however, cell phone cameras tend to perform poorly in low-light conditions, resulting in more noise and loss of detail in the final image. The camera sensors are also limited when capturing high-dynamic-range (HDR) scenes, where attempts to capture detail in the shadows often lead to overexposure of bright regions, or vice versa.

Excessive denoising due to low light [1]
Ghosting due to misalignment [1]

Exposure bracketing is one solution for producing better low-light and HDR images on a cell phone. In this method, the camera captures multiple shots at different exposure times, and then aligns and merges them into one final image. This allows for details in dark and bright areas to be preserved in the final image. Due to the different exposure times and possible movement in the scene, alignment and merging of the images can be difficult and can lead to artifacts such as ghosting.

Another proposed approach is to take a burst of images at the same exposure time, all slightly underexposed to avoid saturation, and then align and merge them together [1]. A constant exposure time makes aligning the images easier and reduces the effect of ghosting. Merging the multiple frames also helps reduce the increased noise that comes with a shorter exposure time.

Here, we explore Google's HDR+ image processing pipeline and show one approach for using MATLAB to recreate part of the image alignment step. We also discuss other image alignment techniques in the literature and highlight their similarities and differences.

Background: Google's HDR+ Pipeline

We present a brief overview of the image processing pipeline described in greater detail in the paper by Hasinoff et al. [1]. The pipeline runs from the raw sensor data to the final output on the phone. The three major steps are 1) aligning the burst images, 2) merging them, and 3) finishing the image. Unlike many HDR phone apps, which use the camera's image signal processor (ISP) to process each image first and then create an HDR composite, Google uses the Camera2 API to work with Bayer raw images up until the demosaicking step of the pipeline [2]. The increased bit depth of a raw image allows for a larger dynamic range compared to JPEG images, but processing such large sets of data on a phone can be challenging.

HDR+ image processing pipeline. The live viewfinder is used to determine exposure and gain. Once the shutter is pressed, a burst of raw frames is collected and follows the pipeline in the bottom row.[1]

We start with a burst of 2-8 raw images. The optimal exposure is determined through the camera ISP. In order to avoid oversaturation, the images are underexposed. The reference frame is selected as the sharpest image using a lucky imaging method.
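As a simple illustration, one stand-in for the lucky imaging metric is to score each frame by its total gradient energy and pick the maximum. The sketch below is our own approximation, not the paper's exact metric; it assumes the burst is stored as an H x W x N grayscale stack named frames:

  % Hypothetical sketch: pick the sharpest frame of a burst as the reference.
  % Sharpness is approximated by total gradient energy, a simple stand-in
  % for the lucky imaging metric used in the paper.
  function refIdx = pickReferenceFrame(frames)
      nFrames = size(frames, 3);
      sharpness = zeros(nFrames, 1);
      for k = 1:nFrames
          [gx, gy] = gradient(double(frames(:, :, k)));
          sharpness(k) = sum(gx(:).^2 + gy(:).^2);  % total gradient energy
      end
      [~, refIdx] = max(sharpness);                 % sharpest frame wins
  end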

Aligning

Most alignment steps described in the literature are aimed at creating high-quality, accurate alignments but require long computation times. To align a series of images rapidly on a phone processor, the images are downsampled multiple times. The Bayer raw image is first downsampled by averaging 2x2 blocks to create a 3 Megapixel grayscale image. This image is further downsampled via a Gaussian pyramid up to 4 levels. The alignment is performed on the coarsest set of images using a tile-based method, with the position of the aligned tiles informing the next level. Each step involves selecting a search radius (up to 4 pixels), tile size (up to 16 x 16 pixels), and alignment algorithm (L1 or L2-based).

Merging

The alignment is performed on all burst images relative to the selected reference frame. Next, the images are merged onto the reference frame to reduce noise. To reduce computational cost, each tile in the reference frame is merged with only one tile from each alternate frame. Merging multiple candidate tiles per frame could improve image quality, but only by a non-noticeable amount. The merge is performed using a set of temporal filters, with a spatial denoising step applied later.
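Since the paper's actual merge is a DFT-domain, Wiener-style temporal filter, the sketch below is only a heavily simplified stand-in: each reference tile is averaged with its single aligned alternate tile when the two are similar enough, and kept as-is otherwise. The similarity threshold tau is an ad-hoc parameter we introduce for illustration:

  % Simplified, hypothetical stand-in for the per-tile temporal merge.
  % refTile and altTile are aligned n x n tiles; tau is an ad-hoc
  % per-pixel L1 similarity threshold (not from the paper).
  function merged = mergeTiles(refTile, altTile, tau)
      d = mean(abs(refTile(:) - altTile(:)));  % mean absolute difference
      if d < tau
          merged = 0.5 * (refTile + altTile);  % average to reduce noise
      else
          merged = refTile;                    % reject a poorly aligned tile
      end
  end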

Finishing

Now, the single Bayer raw image goes through a series of correction, demosaicking, and tone-mapping processes to create a consumer-friendly image. The steps are fairly standard across different systems, e.g. black-level subtraction, white balancing, demosaicking, and denoising. Because the raw frame has a larger dynamic range compared to finished JPEG images, a local tonemapping method is used to preserve local contrast when compressing the dynamic range. Additional touches such as hue-specific color adjustments, dehazing, and chromatic aberration correction are also performed.
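The sketch below illustrates such a conventional finishing chain under stated assumptions: a 10-bit 'rggb' Bayer mosaic stored in a uint16 array rawBayer, a black level of 64, and example white-balance gains. HDR+'s local tone mapping, dehazing, and chromatic aberration correction are omitted, and a simple global gamma stands in for tone mapping:

  % Minimal conventional finishing sketch (assumed 10-bit 'rggb' mosaic).
  raw = max(double(rawBayer) - 64, 0);           % black-level subtraction
  raw = raw / (1023 - 64);                       % normalize to [0, 1]
  rgb = demosaic(uint16(raw * 65535), 'rggb');   % Bayer -> RGB
  rgb = double(rgb) / 65535;
  wb  = [2.0, 1.0, 1.5];                         % example R, G, B gains
  rgb = cat(3, wb(1)*rgb(:,:,1), wb(2)*rgb(:,:,2), wb(3)*rgb(:,:,3));
  srgb = min(rgb, 1) .^ (1 / 2.2);               % simple global gamma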

Methods

The goal of this project was to simulate part of the HDR+ imaging pipeline. The alignment and merge steps were the primary areas of interest and innovation. Due to the complexity of the algorithms and time limitations, only a small portion of the alignment algorithm was attempted here.

Comparing Across Methods

We were also interested in comparing the final output of Google's HDR+ process with conventional bracketing and other techniques in the literature. To do so, we collected a series of images using a Google Pixel 2 camera. The Open Camera app (available free from the Google Play store) was used to capture the following sets of images:

  • Burst of 10 images, raw + JPEG format
  • HDR image using 3 level exposure bracketing

The native Google camera app was also used to capture the same image in HDR+ and non-HDR+ mode.

To compare the final HDR+ photo and its alignment/merge algorithm with other methods, we looked into JPEG burst fusion methods. In these methods, the burst images undergo the finishing step and are converted to JPEG format (in the camera ISP) before the alignment and merging steps. One academic example is a MATLAB program developed by researchers at the Tampere University of Technology [3]. The program, CV-BM3D, denoises RGB videos stored as 4-D matrices (i.e., a burst of images) using a 3D collaborative filtering technique [4].

L1 Alignment

We use the following equation to calculate the L1 distance between the reference image tile and the tile to be aligned:

  D_p(u, v) = \sum_{y=0}^{n-1} \sum_{x=0}^{n-1} \left| T(x, y) - I(x + u + u_0,\; y + v + v_0) \right|^p    (Equation 1) [1]

where T is the tile of the reference image, I is the larger search area of the alternate image, p is 1 (for the L1 norm), n is the size of the tile, and (u_0, v_0) is the initial alignment of the tile from the previous, coarser level. For the initial coarsest alignment, we set (u_0, v_0) = (0, 0). The pair of alignment coordinates (u, v) that minimizes the L1 distance is used for the next level of alignments.

For simplicity, we set our search window to +/- 4 pixels in the x and y directions and use a tile size of 16 x 16 pixels for all calculations.
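A sketch of this per-tile search, using the parameters above, is shown below. The function and variable names are our own: ref and alt are same-size grayscale images, and u0 and v0 hold the per-tile initial offsets from the coarser level (all zeros at the coarsest level):

  % Per-tile L1 alignment search (Equation 1 with p = 1).
  function [u, v] = alignTilesL1(ref, alt, u0, v0)
      n = 16; r = 4;                        % tile size and search radius
      [H, W] = size(ref);
      nty = floor(H / n); ntx = floor(W / n);
      u = u0; v = v0;                       % fall back to initial offsets
      for ty = 1:nty
          for tx = 1:ntx
              y = (ty - 1) * n; x = (tx - 1) * n;
              T = ref(y+1:y+n, x+1:x+n);    % reference tile
              best = inf;
              for dv = -r:r
                  for du = -r:r
                      ys = y + v0(ty, tx) + dv;
                      xs = x + u0(ty, tx) + du;
                      if ys < 0 || xs < 0 || ys + n > H || xs + n > W
                          continue;         % skip out-of-bounds shifts
                      end
                      A = alt(ys+1:ys+n, xs+1:xs+n);
                      d = sum(abs(T(:) - A(:)));   % L1 tile distance
                      if d < best
                          best = d;
                          u(ty, tx) = u0(ty, tx) + du;
                          v(ty, tx) = v0(ty, tx) + dv;
                      end
                  end
              end
          end
      end
  end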

A Gaussian pyramid-type approach was used to perform the alignment step. The following downsampled images were created:

  • Downsample raw Bayer to grayscale (2x2 average)
  • Downsample by factor of 4 (Gaussian pyramid reduction)
  • Downsample by factor of 4 (Gaussian pyramid reduction)
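This chain can be sketched in MATLAB as follows, assuming an even-sized Bayer mosaic rawBayer; since impyramid reduces by a factor of 2, each factor-of-4 level applies it twice:

  % Build the alignment pyramid: 2x2 Bayer average, then two Gaussian
  % pyramid reductions of 4x each (impyramid halves per call).
  g0 = 0.25 * (double(rawBayer(1:2:end, 1:2:end)) + double(rawBayer(2:2:end, 1:2:end)) ...
     +         double(rawBayer(1:2:end, 2:2:end)) + double(rawBayer(2:2:end, 2:2:end)));
  g1 = impyramid(impyramid(g0, 'reduce'), 'reduce');   % factor of 4
  g2 = impyramid(impyramid(g1, 'reduce'), 'reduce');   % factor of 16 total
  pyramid = {g2, g1, g0};                              % coarsest to finest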

During the course of the project, due to difficulty working with the large raw file sizes in MATLAB, we instead chose to test our algorithm on a smaller set of images generously provided by Prof. Wandell.

Results & Discussion

The MATLAB code and image files are attached below. We wrote a MATLAB function that, given a reference image and an image to align, calculates the offset that minimizes the L1 norm in each tile using Equation 1. The following images show the downsampling process.

Raw image, 888x1084
Grayscale average, 444x542
Gaussian_1, 111x136
Gaussian_2, 28x34

The calculation of tile alignment values (u, v) was successful for each of the first three images separately. Unfortunately, we ran out of time to modify the function to align the next upscaled set of images using the offsets from the previous level. There are several improvements that could be made to this work. The first would be to make it faster to import and align multiple sets of images against one reference image, and then to use the coarse alignment as the input to the next level. A better working knowledge of MATLAB would get us most of the way there.
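For reference, the missing coarse-to-fine step might look like the hypothetical sketch below: each level's offsets are scaled by the downsampling factor, replicated onto the finer tile grid, and passed as (u0, v0) to the next search. Here refPyr and altPyr are pyramids ordered coarsest first (as built above), and alignTilesL1 is the tile search sketched in the Methods section:

  % Hypothetical coarse-to-fine driver (our own sketch, not the HDR+ code).
  n = 16;                                            % tile size
  u = zeros(floor(size(refPyr{1}, 1) / n), floor(size(refPyr{1}, 2) / n));
  v = u;
  for level = 1:numel(refPyr)
      [u, v] = alignTilesL1(refPyr{level}, altPyr{level}, u, v);
      if level < numel(refPyr)
          sy = floor(size(refPyr{level + 1}, 1) / n);
          sx = floor(size(refPyr{level + 1}, 2) / n);
          u = 4 * imresize(u, [sy sx], 'nearest');   % seed next finer level
          v = 4 * imresize(v, [sy sx], 'nearest');
      end
  end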

The attempts to compare different JPEG burst fusion methods were only partially successful. The CV-BM3D program could not be run due to MATLAB version incompatibilities and a protected file that required updating on the authors' side (suggesting we should perhaps look to newer methods such as the group's newer V-BM4D program).

Conclusions

We were able to implement part of Google's HDR+ alignment algorithm on a small set of images. This was much more involved than originally anticipated, and most of the time spent using our own images to compare JPEG fusion methods was unsuccessful. While creating a full MATLAB program to align and merge multiple burst images using these algorithms may be possible, other languages may be better suited to reducing computation time. A future line of work could be integrating pre-existing C++/Halide implementations [5] to simulate the pipeline, while using the ISET toolbox to adjust scene illumination, sensor noise, etc., and study the sensitivity of the HDR+ method to these parameters.

References

[1] "Burst photography for high dynamic range and low-light imaging on mobile cameras" S. W. Hasinoff, D. Sharlet, R. Geiss, A. Adams, J. T. Barron, F. Kainz, J. Chen, and M. Levoy. ACM Transactions on Graphics (Proc. SIGGRAPH Asia 2016), 35(6), 12 pp. http://www.hdrplusdata.org/hdrplus.pdf

[2] Android Camera2 API. GOOGLE INC., 2016. http://developer.android.com/reference/android/hardware/camera2/package-summary.html

[3] "Image and video denoising by sparse 3D transform-domain collaborative filtering" Tampere University of Technology, 2014. http://www.cs.tut.fi/~foi/GCF-BM3D/

[4] "Video denoising by sparse 3D transform-domain collaborative filtering" K. Dabov, A. Foi, and K. Egiazarian. Proc. 15th European Signal Processing Conference, EUSIPCO 2007, Poznan, Poland, September 2007.

[5] "HDR+ Implementation" Timothy Brooks, 2016. https://github.com/timothybrooks/hdr-plus/

Appendix

MATLAB code to align the images can be found here [1].

Images captured on Pixel 2 found here [2].

Thanks to Prof. Wandell for the simulated images, and also Prof. Farrell and Trisha for an enjoyable quarter.