Google HDR+ Image Processing Pipeline

From Psych 221 Image Systems Engineering

Introduction and Background

Figure 1: Google's HDR+ Image Pipeline [1]

HDR Overview: High dynamic range, or HDR, is an image processing technique that allows photographers to create vibrant images with color contrasts more representative of what the human eye sees. Traditional HDR methods use a technique called exposure bracketing, where images are taken at different exposures (one underexposed, one at normal exposure, and one overexposed) and the best parts of each are used to produce the final image. Typically, underexposed images capture the bright areas of a scene best, while overexposed images capture the darker parts best. Specifically, parts of each exposure are chosen such that features in both the bright and dim areas of the photo are visible and clear.

HDR+ Pipeline: While exposure bracketing works well for capturing HDR images in professional photography, it is not well suited to mobile photography, where computational resources are limited and the post-processing pipeline is highly automated. To address this, Google AI presented a new pipeline, HDR+, for Google’s Pixel phone that uses a burst of constant-exposure frames together with new alignment and merging algorithms. In HDR+, the frames are captured with a low enough exposure to avoid blowing out the highlights, and the pipeline begins with Bayer raw frames rather than demosaicked RGB frames [1].

The HDR+ pipeline can be broken down into the following steps:

1. Alignment: Alignment is done with a pairwise 4-level image pyramid. That is, each alternate frame in the burst is individually aligned against the selected reference frame.

2. Merge: Merging is done with a temporal Wiener filter, and the result is then spatially denoised with a Gaussian filter. The temporal filter is designed to be robust against alignment failure.

3. White Balance, Demosaic, Chroma Denoise: The classic post-processing pipeline applied to a Bayer raw image. This includes black-level subtraction, lens-shading correction, white balancing, demosaicking, and chroma denoising.

4. Local Tone Mapping (Exposure Fusion): This is the key step for creating an HDR image for display without actually generating a full HDR image. From the demosaicked image, the pipeline generates two synthetic exposures, one long and one short, by applying gain and gamma correction. The two images are then merged with the exposure blending algorithm described in [2]. The idea of exposure fusion is to blend a multi-exposure image sequence by keeping the “best” parts of each image. The fusion is done per pixel, and the weight map is determined by the contrast, saturation, and well-exposedness of the local area (a small sketch of this weight computation is shown after this list).

5. Finishing: The final step fine-tunes the image. It includes dehazing, global tone adjustment, chromatic aberration correction, sharpening, hue-specific color adjustments, and dithering for display.
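
To make the exposure fusion step in item 4 more concrete, below is a minimal MATLAB sketch of the per-pixel weight map from [2]. The variable names, the Laplacian-based contrast measure, and the 0.2 sigma follow the Mertens paper's defaults rather than Google's implementation, and the synthetic exposure img is assumed to be an RGB image scaled to [0, 1].

    % Per-pixel quality weights for one synthetic exposure (Mertens et al. [2]).
    gray        = rgb2gray(img);
    contrast    = abs(imfilter(gray, fspecial('laplacian'), 'replicate'));   % local contrast
    saturation  = std(img, 0, 3);                                            % spread across R, G, B
    wellExposed = prod(exp(-(img - 0.5).^2 / (2 * 0.2^2)), 3);               % closeness to mid-gray
    W = contrast .* saturation .* wellExposed;                               % weight map for this exposure
    % The maps for the long and short exposures are then normalized to sum to 1
    % at every pixel, and the exposures are blended (multiresolution blending in [2]).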

Methods and Implementation

As exposure fusion is now a common technique for generating HDR images for low-dynamic-range displays, we focus on the robustness of the alignment algorithm proposed in the HDR+ paper. For our implementation, we implemented the alignment algorithm, a simplified version of the merge algorithm, and part of the post-processing steps needed to view the generated images. We did this all in MATLAB, and our GitHub repository is referenced in 'Appendix I'.

Figure 2: Our Pipeline Implementation

Alignment

L1 and L2 Brute Force

The alignment portion compares the reference image with an alternate image and consists of a coarse-to-fine 4-level image pyramid:

Figure 3: Four Level Image Pyramid
Level 4: L2 Alignment → 16x16 tile, +/- 1 pixel search radius

Level 3: L2 Alignment → 16x16 tile, +/- 4 pixel search radius

Level 2: L2 Alignment → 16x16 tile, +/- 4 pixel search radius

Level 1: L1 Alignment → 8x8 tile, +/- 4 pixel search radius



At each level the image is downsampled, where level 4 corresponds to the coarsest adjustments (one pixel of adjustment corresponding to a 128-pixel adjustment in the level 1 image) and level 1 is the finest adjustment at ½ resolution. The displacement between the alternate image and the reference image is measured by the following formula:

D_p(u, v) = Σ_(x, y) | T(x, y) − I(x + u, y + v) |^p

where T is a tile of the reference image, I is the corresponding search area in the alternate image, (u, v) is the candidate offset, and either the L1 norm (p = 1) or the L2 norm (p = 2) is used depending on the level.
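
Below is a minimal MATLAB sketch of the brute-force per-tile search described above, under our assumptions: the reference tile T and the alternate frame alt are double-precision grayscale data already at the same pyramid level, (x0, y0) is the tile's top-left corner, r is the search radius, and border handling is ignored. The function name and arguments are ours, not Google's.

    function [best_u, best_v] = align_tile(T, alt, x0, y0, r, p)
    % Brute-force displacement search for one tile (sketch).
    %   T      : n x n reference tile
    %   alt    : alternate frame at the same pyramid level
    %   x0, y0 : top-left corner of the tile in the reference frame
    %   r      : search radius in pixels
    %   p      : 1 for the L1 norm, 2 for the L2 norm
    n = size(T, 1);
    best = inf; best_u = 0; best_v = 0;
    for u = -r:r
        for v = -r:r
            patch = alt(y0 + v : y0 + v + n - 1, x0 + u : x0 + u + n - 1);
            d = sum(abs(T(:) - patch(:)).^p);       % displacement cost D_p(u, v)
            if d < best
                best = d; best_u = u; best_v = v;   % keep the best offset so far
            end
        end
    end
    end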

Fast L2

The brute-force L2 method, while simple and straightforward, is highly inefficient because the double sum needs to be recalculated for every shift position (u, v) in the given search area.

The Google research team devised a more efficient way of calculating this distance by expanding the original brute-force equation:

D_2(u, v) = Σ_(x, y) T(x, y)^2 − 2 Σ_(x, y) T(x, y) I(x + u, y + v) + Σ_(x, y) I(x + u, y + v)^2

This can be further simplified to:

D_2(u, v) = ||T||^2 + Σ_(x, y) I(x + u, y + v)^2 − 2 (T ⋆ I)(u, v)

Here, T and I represent the matrices of the reference tile and the alternate-image search area, respectively, and ⋆ denotes cross-correlation. Since the first term is independent of the variables u and v, it only needs to be computed once. The second term can be computed efficiently using box filtering of I^2. The third term is a cross-correlation that measures how similar the reference tile is to the shifted alternate image at that particular shift position; it can be computed efficiently using Fast Fourier Transforms.

The output D2 is a matrix representing the L2 distance between the reference image and alternate image at different shift amounts.
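
The following MATLAB sketch shows one way to evaluate the three terms, assuming T is an n x n tile and I is the (n + 2r) x (n + 2r) search region, both stored as doubles; the function name and the use of conv2 for the box filter are our choices, not code from the paper.

    function D2 = fast_l2(T, I)
    % Fast L2 tile distance via the three-term expansion (sketch).
    % D2(u+1, v+1) holds the distance for shift (u, v), u, v = 0 .. size(I,1) - size(T,1).
    n = size(T, 1);
    term1 = sum(T(:).^2);                                 % independent of (u, v)
    term2 = conv2(I.^2, ones(n), 'valid');                % box filter of I.^2
    % Cross-correlation via FFT: zero-pad T to the size of I and multiply spectra.
    C = real(ifft2(fft2(I) .* conj(fft2(T, size(I, 1), size(I, 2)))));
    term3 = C(1:size(term2, 1), 1:size(term2, 2));        % keep only non-wrapped shifts
    D2 = term1 + term2 - 2 * term3;
    end

For identical inputs T and I of the same size (the sanity check described below), D2 reduces to a single entry that should be zero, i.e. the best shift is (0, 0).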

We ran some tests of this algorithm in MATLAB using identical 3 x 3 dummy input matrices for both T and I as a sanity check of our implementation, expecting the shift position with the smallest distance to be u = 0, v = 0. However, we did not get the expected shift. We narrowed the error down to the computation of the third term and believe there may be an issue with how the FFT or inverse FFT is computed. We decided not to pursue the Fast L2 implementation further because we already had a working brute-force L2 and wanted to focus on other parts of the pipeline.

Merge

In the HDR+ paper, merging is done by applying a pairwise Wiener filter in the spatial frequency domain. This merge is shown to be robust against alignment failure and results in significant noise reduction. Due to the scope of the project, we simplified the merge into two methods: temporal average and weighted average.

Temporal Average: The simplest way to observe the alignment results is to average across all frames. This reveals whether the image contains any alignment failures, as shown below in 'Testing Temporal and Weighted Average'.

Weighted Average: To safeguard the merged image against alignment failure, we tile the image into 8x8 tiles, inspired by the algorithm proposed in this blog post [3]. The weight for each tile of an alternate frame is the reciprocal of the L1 distance between the corresponding tiles in the reference and alternate frames. A misaligned tile thus receives a relatively low weight when merged with the reference frame.
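
For reference, the temporal average is simply mean(cat(3, frames{:}), 3) over the aligned frames. Below is a minimal MATLAB sketch of our weighted average, assuming the frames are already aligned, grayscale, stored as doubles, and have dimensions divisible by 8; the unit weight given to the reference tile and the small eps guarding against division by zero are our implementation choices rather than details from [3].

    function merged = weighted_merge(ref, alts)
    % Per-tile weighted average of a reference frame and aligned alternates (sketch).
    %   ref  : reference frame (grayscale double)
    %   alts : cell array of aligned alternate frames, same size as ref
    n = 8;                                  % tile size
    [H, W] = size(ref);
    merged = zeros(H, W);
    for y = 1:n:H
        for x = 1:n:W
            refTile = ref(y:y+n-1, x:x+n-1);
            acc  = refTile;                 % reference tile gets weight 1
            wsum = 1;
            for k = 1:numel(alts)
                altTile = alts{k}(y:y+n-1, x:x+n-1);
                w = 1 / (sum(abs(refTile(:) - altTile(:))) + eps);  % reciprocal L1 distance
                acc  = acc + w * altTile;   % misaligned tiles contribute very little
                wsum = wsum + w;
            end
            merged(y:y+n-1, x:x+n-1) = acc / wsum;
        end
    end
    end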

Testing Temporal and Weighted Average

We first captured a 4-6 second burst of images using an iPhone to test our merge algorithm on its own.

Below are the burst image (left), the result after temporal averaging (middle), and the result after weighted averaging (right):

As seen above, using the temporal average alone reveals alignment failures, visible as the red color artifacts on the model's arm. After applying the weighted average algorithm, these color artifacts are gone; this gave us confidence in our weighted average algorithm, and we decided to use it on the raw Bayer images provided in Google's HDR+ dataset.

Post-Processing

Since the HDR+ dataset provides only raw Bayer images, we need to manually post-process the merged image for viewing. We selected the following steps to apply to our raw Bayer image (a short MATLAB sketch of these steps appears after the list):

Black Level Subtraction: The black level for each color channel is obtained from the TIFF BlackLevel tag.

White Balancing: White balancing is done by dividing each Bayer color channel by the corresponding value in the TIFF AsShotNeutral tag.

Demosaic: Demosaicked with the ‘bggr’ color filter pattern. (Although the Google paper mentions ‘rggb’ for demosaicking, our implementation needed ‘bggr’ in MATLAB. A possible reason is that we misinterpreted Google's alignment algorithm, so our raw Bayer images may be shifted differently from Google's implementation.)

Brightness: This was not part of the HDR+ pipeline. However, we brightened our final image (using ‘imadjust’ in MATLAB with gamma = 0.5) to produce a more visually appealing result, since we did not implement the exposure fusion portion of the pipeline.
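
Below is a minimal MATLAB sketch of these steps under our assumptions: the merged Bayer mosaic is in the variable raw (double), and scalar blackLevel and whiteLevel values plus the asShotNeutral vector have already been read from the DNG metadata. The variable names and the whiteLevel normalization are ours, not part of the dataset's documented workflow.

    raw = max(raw - blackLevel, 0);                           % black level subtraction
    raw = raw / (whiteLevel - blackLevel);                    % normalize to [0, 1]
    wb = 1 ./ asShotNeutral;                                  % [R G B] white balance gains
    raw(2:2:end, 2:2:end) = raw(2:2:end, 2:2:end) * wb(1);    % R sites ('bggr' pattern)
    raw(1:2:end, 2:2:end) = raw(1:2:end, 2:2:end) * wb(2);    % G sites
    raw(2:2:end, 1:2:end) = raw(2:2:end, 1:2:end) * wb(2);    % G sites
    raw(1:2:end, 1:2:end) = raw(1:2:end, 1:2:end) * wb(3);    % B sites
    rgb = demosaic(im2uint16(min(raw, 1)), 'bggr');           % Bayer -> RGB
    out = im2double(rgb) .^ 0.5;                              % gamma = 0.5 brightening (as with imadjust)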

Results

Using a Raw Bayer Image

In the images below, we used a raw Bayer image from Google's HDR+ dataset. After running it through the pipeline described in Methods and Implementation, here are our results (left) versus Google's (right):



What Our Algorithm Did Well:

1. Reconstructed the image well; our result is recognizable as the same image as Google’s

2. Colors in the reconstructed image match the original scene (reference image), although the color vibrancy does not

3. The final image appears aligned in the same manner as the original scene


What Our Algorithm Can Improve On:

Color vibrancy: The colors in our image do not look as vivid as Google's result -- they seem a little faded (possibly the image was brightened too much). In addition, the image overall appears a little dark compared to Google's result. This may be because we did not implement the HDR post-processing part of the pipeline, with exposure fusion and the corresponding tone mapping steps. If we had implemented this, our image could have colors as vibrant as those in the Google example image.

Using a Test Grayscale Image

Since ghosting and image movement are often the biggest issues in producing HDR images from burst photography, we wanted to explore the alignment and merge steps of the HDR pipeline specifically and see where they perform well or poorly, using a grayscale 1024 x 1024 pixel test image. The following images use our temporal average merge algorithm, since we are only using grayscale images.

Below is the reference image:

Shifts and Rotations

Below is the original image with the '1' rotated clockwise by 15 degrees and shifted down 300 px and left 100 px.

The images are displayed in the following order: original image (left), after alignment (middle), after temporal average merge (right).

It can be seen that our algorithm does well with translational shifts and rotations. In the final image (after temporal average merge), there is some noise at the bottom of the '1', but it is negligible. In the following sections, we look at where the limitations of our algorithm lie for shifts and rotations.

Testing Limits of Translations
Figure 4: Image after Different Type of Shifts
Testing Limits of Rotations
Figure 5: Rotations with 90 and 135 degrees

Noise

Figure 6: Adding White Gaussian Noise to Entire Image


Looking at all the images, our algorithm does not seem strongly affected by translational shifts. However, it is affected by rotations. In the reconstructed image for the 90 degree rotation, the '1' has a white streak in the middle, and the reconstructed image for the 135 degree rotation has missing pixels at the bottom of the '1'.

In addition, the alignment algorithm cannot compensate for pixel-wise noise well. When reconstructed, there appears to be even more noise in the image. This makes sense, since our algorithm can only align and merge: when aligning, it cannot create a new black pixel where black pixels are missing in the alternate image. Therefore, whenever our algorithm hits a white noise pixel, it may get confused about where it is in the image and choose the wrong pixel during alignment. In an image processing pipeline, random noise like this is typically removed by applying a denoising filter, which is included in the merge step of the HDR+ pipeline.

Conclusions

Our series of tests with the alignment algorithm indicates that the proposed alignment algorithm is robust against most translational movement and a limited degree of rotational movement in the burst. However, it does not handle random white noise well. Although the merge algorithm is robust against alignment failure, we worry that this issue may manifest itself if these steps were deployed in low-light conditions, where the SNR is significantly lower, reducing the noise-reduction benefit of the burst photography technique.

In this project, we investigated the robustness of alignment with only part of the HDR+ pipeline. Most of the testing was done on post-processed JPG or PNG images. Although we can draw a preliminary conclusion about the performance of the alignment algorithm, we were not able to investigate alignment robustness in the context of the full HDR+ pipeline. In particular, the goal of alignment is to reduce noise in the merge and HDR compression stages of the pipeline. Without the proper merge and exposure fusion steps, it is hard to assess the effect of alignment failure on the resulting HDR image. Moving forward, we would like to test the alignment with the complete pipeline. To achieve this, we would like to complete the post-processing pipeline, including the exposure fusion step as well as the full merge algorithm.

References

1. Hasinoff, S. W., et al. (2016). Burst photography for high dynamic range and low-light imaging on mobile cameras. http://static.googleusercontent.com/media/www.hdrplusdata.org/en//hdrplus.pdf

2. Mertens, T., Kautz, J., and Van Reeth, F. (2007). Exposure Fusion.

3. Brooks, T. HDR+ pipeline implementation. http://timothybrooks.com/tech/hdr-plus/


Appendix I

Our implementation is available on GitHub: https://github.com/jenniferlin0902/HDRMerge

Appendix II

Jennifer Lin was in charge of the implementation of the code and analysis of different merge algorithms. Warren worked on the implementation and analysis of Fast L2. Linda Banh worked on the test images and analyzing the post-processing part of the pipeline. We worked together on the write-up.