Google HDR Pipeline Implementation in MATLAB


Introduction

Smartphone cameras have improved dramatically over the past decade, and a large share of photography is now done on smartphones. However, the small apertures and small sensor pixels of smartphone cameras typically lead to noisy images in low light and to limited dynamic range. To address these issues, Google proposes a complete computational photography pipeline that captures, aligns and merges a burst of frames to reduce noise in low-light settings and to increase dynamic range.

Background

In this section, we describe Google's HDR pipeline, as presented in the paper [1], in detail.


As shown in the figure above, the pipeline has several stages: capturing, aligning, merging, demosaicking, local tone mapping and finishing.

Capturing

A common approach to HDR imaging is exposure bracketing, which captures the same scene several times with different exposure lengths and merges the captures into a single image so that both bright and dark regions are well exposed. Although this method works well for stationary scenes, it handles motion poorly. To address this, the Google HDR pipeline captures a burst of frames at constant exposure, and the frames are deliberately underexposed so that fewer pixels saturate.

Aligning

Aligning refers to estimating the offset of each tile across the different frames. The Google HDR pipeline performs alignment hierarchically, from coarse to fine. The alignment algorithm operates on a grayscale image (obtained by averaging the color channels) rather than on the Bayer raw frame. At coarse scales, the search radius is large, so the computation is done in the Fourier domain for speed.

Merging

After obtaining the offset for every tile in the image, a merging phase combines the information from all frames into one high-quality image with fewer artifacts. The merging is done in the Fourier domain. When merging two images, the Google pipeline controls the contribution of each alternate frame according to the quality of its alignment, so that merging is robust to alignment failures.

Demosaicking

After a single Bayer raw image with higher SNR has been created by aligning and merging, the demosaicking step converts it into a full-resolution linear RGB image. The most common method for demosaicking is bilinear interpolation. Google's HDR pipeline uses a more sophisticated technique that combines edge-directed interpolation with weighted averaging and constant-hue-based interpolation.

Local Tone Mapping

For high-dynamic-range scenes, local tone mapping is needed to reduce the contrast between highlights and shadows while preserving local contrast. The Google HDR pipeline uses a variant of exposure fusion [2], in which one original image and one brightened image (which mimics a long-exposure shot) are merged to produce a satisfactory output.

Finishing

The finishing step aims to produce a high-quality, aesthetically pleasing image. The paper introduces several techniques, including color correction, dehazing, global tone adjustment, chromatic aberration correction, sharpening and hue-specific color adjustments.

Methods

Preprocessing

The burst images we acquire from the Google HDR dataset are Bayer raw frames arranged in GRBG order. Pixel values are converted from raw sensor values to doubles in the range 0 to 1. For the aligning step we downsample each Bayer raw frame by averaging each 2x2 GRBG tile, which yields a grayscale image. We then merge the Bayer raw frames and run the rest of the pipeline, as sketched below.
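
The sketch below illustrates this preprocessing step. The variable names and the 10-bit scaling are illustrative assumptions rather than the exact code in our repository.

```matlab
% Preprocessing sketch: `raw` is assumed to hold the sensor values of one
% GRBG Bayer frame; the 10-bit scaling factor is an assumption.
raw = double(raw) / (2^10 - 1);        % normalize sensor values to [0, 1]

% Average each 2x2 GRBG tile to obtain the grayscale image used for aligning.
gray = ( raw(1:2:end, 1:2:end) ...     % G (top-left of each tile)
       + raw(1:2:end, 2:2:end) ...     % R (top-right)
       + raw(2:2:end, 1:2:end) ...     % B (bottom-left)
       + raw(2:2:end, 2:2:end) ) / 4;  % G (bottom-right)
```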

For the ISET3D dataset, we first convert each RGB image back to a Bayer raw frame (twice the width and height). The rest of the pipeline is the same as above.
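
A minimal sketch of this conversion, assuming each RGB pixel is expanded into one 2x2 GRBG tile (so the green channel is written to both green sites); `rgb` is an HxWx3 double image in [0, 1] and the names are illustrative.

```matlab
% Expand an RGB image into a GRBG Bayer mosaic of twice the width and height.
[H, W, ~] = size(rgb);
bayer = zeros(2*H, 2*W);
bayer(1:2:end, 1:2:end) = rgb(:, :, 2);   % G site
bayer(1:2:end, 2:2:end) = rgb(:, :, 1);   % R site
bayer(2:2:end, 1:2:end) = rgb(:, :, 3);   % B site
bayer(2:2:end, 2:2:end) = rgb(:, :, 2);   % G site
```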

Aligning

To align in a hierarchical fashion from coarse to fine, we first construct a Gaussian pyramid for each frame. We start from the original grayscale image (the average of the color channels) and apply a Gaussian blur. We then downsample the blurred image by a factor of 2; this is the second level of the Gaussian pyramid. We blur and downsample again to produce the third level, and repeat this process until we reach the desired depth, as indicated in the figure below.
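
A minimal sketch of the pyramid construction; the pyramid depth and the Gaussian sigma shown here are illustrative choices, not the exact parameters of our implementation.

```matlab
% Build a coarse-to-fine Gaussian pyramid from the grayscale frame `gray`.
depth = 4;                                  % illustrative pyramid depth
pyramid = cell(depth, 1);
pyramid{1} = gray;                          % level 1: original resolution
for level = 2:depth
    blurred = imgaussfilt(pyramid{level-1}, 1.0);   % Gaussian blur
    pyramid{level} = blurred(1:2:end, 1:2:end);     % downsample by 2
end
```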

We then start the alignment from the bottom (coarsest) level. For each tile in the reference image, we look at the alternate frames one by one and, in each frame, search for the best-matching tile within a search radius, where the best match minimizes the sum of absolute differences over the tile. Once we have this initial estimate of each tile's offset, we move one level up (to the blurred image with twice the size of the previous level) and use the previous level's estimate as a starting point. As before, we search for the best match within a search radius for each tile, which refines the offset estimate. We repeat this process until we are back at the original resolution, at which point we have a reasonably accurate offset for each tile. In our implementation, we use a tile size of 16×16 pixels, a stride of 8 pixels and a search radius of 8 pixels at each level of the Gaussian pyramid. A sketch of the per-tile search follows.
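
The sketch below shows the SAD search for a single tile at one pyramid level. `ref` and `alt` are same-size grayscale images, (y0, x0) is the tile's top-left corner, and (dy0, dx0) is the offset inherited from the coarser level, already scaled to this level's resolution; boundary handling is omitted and all names are illustrative.

```matlab
tileSize = 16;  radius = 8;
refTile = ref(y0:y0+tileSize-1, x0:x0+tileSize-1);
bestSAD = inf;  bestDy = dy0;  bestDx = dx0;
for dy = dy0-radius : dy0+radius
    for dx = dx0-radius : dx0+radius
        altTile = alt(y0+dy : y0+dy+tileSize-1, x0+dx : x0+dx+tileSize-1);
        sad = sum(abs(refTile(:) - altTile(:)));    % sum of absolute differences
        if sad < bestSAD
            bestSAD = sad;  bestDy = dy;  bestDx = dx;
        end
    end
end
```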


Merging

Alignment can fail for a variety of reasons, including motion, occlusion, and changes in lighting. To make the merging process robust to alignment failures, we need to control the contribution of each alternate tile. Specifically, if an alternate tile is well aligned, it should contribute strongly to the merged result; if it is poorly aligned, it should contribute little. Following the paper, we use the following equations to control the contribution:

\[ D_z(\omega) = T_0(\omega) - T_z(\omega), \qquad A_z(\omega) = \frac{|D_z(\omega)|^2}{|D_z(\omega)|^2 + c\,\sigma^2} \]

\[ \tilde{T}_0(\omega) = \frac{1}{N} \sum_{z=0}^{N-1} \Bigl[ T_z(\omega) + A_z(\omega)\,\bigl(T_0(\omega) - T_z(\omega)\bigr) \Bigr] \]

In the first equation above, D_z(ω) is the difference between the Fourier spectrum of the reference tile, T_0(ω), and the Fourier spectrum of the incoming tile from frame z, T_z(ω). This difference controls the contribution of the two spectra through the shrinkage factor A_z(ω), where σ² is the noise variance of the image, which we estimate from the variance of the reference tile, and c is a constant. Together, these equations mean that if the difference between the Fourier spectra is small relative to the noise, we let the incoming tile contribute more; if the difference is large relative to the noise, the incoming tile contributes little and the merged tile falls back toward the reference. We then take the inverse Fourier transform of the merged spectrum and place the result in the merged image.

The key is to do this weighting in the Fourier domain: if alignment is poor at one frequency, we reduce the contribution of the incoming tile at that frequency without affecting other frequencies. This is where the robustness comes from.
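
A sketch of this merge for one reference/alternate tile pair; in the full pipeline the shrunk spectra of all alternate frames are averaged together. Here `c` is the tuning constant and `sigma2` the noise variance estimated from the reference tile; the names are illustrative.

```matlab
T0 = fft2(refTile);                  % spectrum of the reference tile
Tz = fft2(altTile);                  % spectrum of the aligned alternate tile
D  = T0 - Tz;                        % per-frequency spectral difference
A  = abs(D).^2 ./ (abs(D).^2 + c * sigma2);   % shrinkage factor A_z(w)
mergedTile = real(ifft2(Tz + A .* D));        % falls back to T0 where A -> 1
```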

To merge Bayer raw frames, we apply the above process to each color channel separately and then recombine the per-channel results into a single Bayer raw frame.

Correcting Sensor Defects

We perform sensor defect detection and correction following the work of El-Yamany [3]. We detect hot and cold pixel failures using two conditions: (1) the intensity of the pixel differs significantly from the average of its neighbors; (2) the local brightness difference differs significantly from those of its neighbors. We then correct the defects by simply replacing each defective pixel with the median of its neighboring pixels. We find this method robust enough to correct all of the sensor defects that we deliberately added to the images.
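
A simplified sketch of this correction applied to one extracted color plane of the Bayer frame; the single fixed threshold shown here stands in for the two statistical tests described above and is an assumption.

```matlab
% Flag pixels that deviate strongly from the median of their same-color
% neighbors and replace them with that median.
thresh = 0.2;                                   % illustrative threshold
med = medfilt2(plane, [3 3], 'symmetric');      % neighborhood median
defect = abs(plane - med) > thresh;             % hot / cold pixel mask
plane(defect) = med(defect);
```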

Demosaicking

For the demosaicking step, we follow the work of Malvar et al. [4]. Instead of interpolating only from the nearest neighbors of the same color, this method also uses information from the other color channels. More specifically, to interpolate the G value at an R location (i, j), we use the formula

\[ \hat{G}(i,j) = \tfrac{1}{8}\Bigl[ 4\,R(i,j) + 2\bigl(G(i-1,j) + G(i+1,j) + G(i,j-1) + G(i,j+1)\bigr) - \bigl(R(i-2,j) + R(i+2,j) + R(i,j-2) + R(i,j+2)\bigr) \Bigr] \]

that is, the bilinear estimate from the four G neighbors corrected by a Laplacian term computed from the co-located R samples.

MATLAB has a built-in demosaicking function that implements this algorithm, so we call it directly.
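
Our call looks roughly like the following; `demosaic` expects an integer image, so the merged Bayer frame is rescaled first (the 16-bit scaling is an assumption).

```matlab
bayer16 = uint16(round(mergedBayer * 65535));            % [0, 1] -> uint16
rgbLinear = double(demosaic(bayer16, 'grbg')) / 65535;   % gradient-corrected interpolation
```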

Exposure Fusion

We follow the work of Mertens et al. [2]. From the output of the merging step, we manually create a brightened image that mimics a long-exposure shot. We then construct a weight map by computing the contrast, saturation and well-exposedness of each pixel in the two images. The weight map indicates how much each pixel of the fused image should take from each of the original images. We then merge the two images hierarchically, using the Laplacian pyramid of each original image and the Gaussian pyramid of its weight map.
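
A sketch of the weight-map computation for one input image; `img` is its RGB version and `gray` its grayscale version, both in [0, 1]. The Laplacian filter, the well-exposedness sigma of 0.2 and the small epsilon follow Mertens et al. in spirit but are assumptions here.

```matlab
contrast    = abs(imfilter(gray, fspecial('laplacian'), 'replicate'));
saturation  = std(img, 0, 3);                              % spread across R, G, B
wellExposed = prod(exp(-((img - 0.5).^2) / (2 * 0.2^2)), 3);
W = contrast .* saturation .* wellExposed + 1e-12;         % avoid all-zero weights
% W is then normalized across the two input images and used with the
% Laplacian / Gaussian pyramid blending described above.
```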

Finishing

We tried several of the techniques mentioned in Google's original paper and found that two of them were helpful for our dataset. We first increase saturation by multiplying the S channel in HSV space by a constant factor (1.6 in our case). We then sharpen the image with unsharp masking, which subtracts a blurred (unsharp) version of the image from the image itself. With these two techniques we achieve a noticeably better visual result than the image straight after merging.
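
A sketch of these two finishing operations; the `imsharpen` parameters are illustrative and `fusedRGB` denotes the exposure-fusion output.

```matlab
hsv = rgb2hsv(fusedRGB);
hsv(:, :, 2) = min(hsv(:, :, 2) * 1.6, 1);                 % boost saturation, clip to 1
boosted = hsv2rgb(hsv);
finished = imsharpen(boosted, 'Radius', 2, 'Amount', 1);   % unsharp masking
```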

Results

We generate the images after each stage of the pipeline to verify our implementation. To evaluate the effectiveness of the alignment and merging steps, we examine a part of the scene that moves across frames. From the figures, we can see that the taxi near the center of the image has moved about 60 pixels horizontally and 10 pixels vertically. The merging result keeps the car at its position in the first frame while making the car's body smoother, which indicates that the alignment was effective for this amount of shift and that merging succeeded in combining information across frames.

Two raw images:

Merging results:

The final output is visually much better than the output from the earlier stages of the pipeline. The sky retains its color, the building is clearly visible, and there is no obvious ghosting or shadowing in the output.

Final output (left) and Google's reference output (right):

We also run the pipeline on the ISET3D scenes generated by Zhenyi, which contain a stationary background with a car moving away from the camera. We convert the RGB images back to Bayer raw frames, which doubles the width and height. The resulting output looks much more visually satisfying than any of the burst frames: the city background is brighter and clearer, the sky retains its clarity, and there is no ghosting or shadowing around the car, which suggests the merging algorithm is robust.

One of the original burst frames:

Final output:

Conclusions

Overall, we are satisfied with what we achieved in this project. Compared with the output of Google's HDR pipeline, however, our output is still inferior in the following respects: (1) our aligning and merging are not as robust as the sample implementation's, and (2) Google's finishing step produces more visually satisfying results than ours; in particular, the dehazing effect and the color contrast are better in the sample image. Since we use only 3 input frames for aligning and merging while Google uses roughly 10, our merging might perform better given the same number of burst frames.

Although there is still a gap between our implementation and Google's, we learned a lot about image processing algorithms from this project. For future work, we could study the details of the pipeline's algorithms in greater depth and generate an output comparable to the sample.

References

[1] "Burst photography for high dynamic range and low-light imaging on mobile cameras" S. W. Hasinoff, D. Sharlet, R. Geiss, A. Adams, J. T. Barron, F. Kainz, J. Chen, and M. Levoy. ACM Transactions on Graphics (Proc. SIGGRAPH Asia 2016), 35(6), 12 pp. http://www.hdrplusdata.org/hdrplus.pdf

[2] Tom Mertens, Jan Kautz, Frank Van Reeth. Exposure Fusion. Pacific Graphics 2007.

[3] Noha El-Yamany (Intel Corporation, Tampere, Finland). "Robust Defect Pixel Detection and Correction for Bayer Imaging Systems." Electronic Imaging 2017. DOI: 10.2352/ISSN.2470-1173.2017.15.DPMI-088

[4] Henrique S. Malvar, Li-wei He, and Ross Cutler. "High-quality linear interpolation for demosaicing of Bayer-patterned color images." Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004), Vol. 3. IEEE, 2004. https://www.computer.org/csdl/proceedings/icassp/2004/8484/03/01326587.pdf

Appendix

Code and Data

The code and data for our whole pipeline can be found here. Please read the README to get started.

The following zip contains only the code for the project. Please use the Google Drive link below if you want to download the code together with the scenes and reference output.

Source code only: Google_HDR_MATLAB_code.zip

Google Drive: https://drive.google.com/open?id=1tUQoqnKr7wSZy3F7hHnUr3pIkL2czvvc

Work breakdown

Zhihan Jiang: HDR data pre-processing, sensor defect correction and exposure fusion.

Yicheng Li: Aligning and Merging.

Wensi Yin: Demosaicking and finishing.