TienDaiIFHDR

From Psych 221 Image Systems Engineering
Revision as of 00:00, 21 March 2012

Project Title

A Versatile Image Fusion Method

Introduction

High Dynamic Range and Exposure Fusion

The dynamic range of a scene is defined as the ratio of its highest to its lowest luminance. Real-world scenes often span a very wide range of luminance, sometimes exceeding 10 orders of magnitude. Fig. 1 shows an HDR scene with a dynamic range of about 167,470:1. Reproducing such scenes is a challenge for conventional digital capture and display devices, which are limited to a dynamic range of only about 2 orders of magnitude.

The most common solution is to take a sequence of low dynamic range (LDR) images of the same scene at different exposures to capture all the radiance information, and then render the captured stack for display. There are generally two pipelines. One is to first estimate the camera response function from the image sequence to recover the true radiance of the original scene (recorded as a 32-bit float radiance map) [1, 2], and then tone-map the radiance map for display on LDR reproduction media (usually 8 bits per channel) [3, 4, 5]. Although this approach gives very satisfying results, it is computationally expensive and time consuming. The other is to fuse the captured images directly, without the intermediate step of creating a radiance map [6, 7], which is usually referred to as "Exposure Fusion (EF)" [7]. EF produces HDR-like images, comparable to tone-mapped results, at a much lower computational cost. Due to its effectiveness and efficiency, EF is adopted by most HDR applications on mobile platforms, which have limited computational power [8, 9].

Fig.1. Multi-exposed image stack of a high dynamic range scene.

All-in-focus Imaging

In fact, EF essentially solves the general problem of merging multiple images, and consequently can easily be extended to imaging and photography challenges beyond HDR. The most direct application is to fuse a multi-focus image stack (Fig. 2) into an all-in-focus image [9].

The size of a camera's aperture trades off the depth of field (DoF) against the amount of light captured during a given exposure time. For an image to be sharp across a large range of depths in the scene, a small aperture is required. However, decreasing the aperture size is not always feasible. On the one hand, most low-end cameras, such as cellphone cameras, have a fixed aperture. On the other hand, small apertures require slower shutter speeds, which can cause blur due to hand shake and motion of objects in the scene. EF successfully addresses this problem by rendering all pixels in focus. It is also worth mentioning that EF can combine a flash/no-flash image pair taken under low-light conditions to combat the artifacts caused by the flash [7].

Fig.2. Multi-focus image stack of a large DoF scene.

Project Content

In this project, we study EF in the following aspects:

1) Analyze and implement the algorithm to create HDR image.

2) Extend the algorithm to all-in-focus imaging.

3) Propose ways to accelerate the algorithm.

Methods

EF computes the desired image by keeping only the "best" parts of the multi-exposure image stack. The final image is obtained by collapsing the stack with weighted blending, guided by simple quality measures, namely contrast, saturation and well-exposedness. The process is carried out in a multi-resolution fashion to avoid undesirable artifacts. It is assumed that the images are perfectly aligned, possibly by a registration algorithm [10]. We first go through the original exposure fusion algorithm and then describe how to extend it to create an all-in-focus image.

Weighting Map

In the multi-exposure image stack, over-exposed and under-exposed regions are flat and colorless, and should receive less weight during fusion, while well-exposed areas contain bright colors and details and should be preserved with more weight. The algorithm uses the following measures to decide the weight of each pixel in the image stack.

Contrast (C)

Under- and over-exposed regions are relatively "flat" or "uniform", with little intensity fluctuation, i.e., low contrast. Moreover, texture and edges are visually important elements. As a result, pixels of high contrast should be assigned large weights. Following [11], the algorithm applies a Laplacian filter to the grayscale version of each image and takes the absolute value of the filter response as a simple contrast indicator C. Fig. 3 shows the contrast maps calculated from the image stack in Fig. 1.
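As a rough sketch of this measure (a pure-NumPy 4-neighbour Laplacian and the common Rec. 601 grayscale weights are assumptions here; the algorithm only specifies "a Laplacian filter on the grayscale version"):

```python
import numpy as np

def contrast(img):
    """C: absolute response of a 4-neighbour Laplacian on the grayscale image.
    img is an H x W x 3 float array in [0, 1]. The Rec. 601 luma weights
    are an assumption -- the source only says 'grayscale version'."""
    gray = 0.299 * img[..., 0] + 0.587 * img[..., 1] + 0.114 * img[..., 2]
    p = np.pad(gray, 1, mode="edge")  # replicate borders before filtering
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * gray
    return np.abs(lap)
```

A flat region produces zero response, while edges and texture produce large values, which is exactly the behavior the weighting needs.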

Fig.3. Contrast maps calculated from image stack in Fig. 1.

Saturation (S)

Saturated colors are desirable and make an image look vivid. As a pixel undergoes a longer exposure, its colors become desaturated and eventually clipped. The algorithm therefore includes a saturation measure S, computed at each pixel as the standard deviation across the R, G and B channels. Fig. 4 shows the saturation maps calculated from the image stack in Fig. 1.
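This measure is a one-liner in NumPy:

```python
import numpy as np

def saturation(img):
    # S: per-pixel standard deviation across the R, G and B channels
    # of an H x W x 3 float image; gray pixels score 0, vivid pixels high.
    return img.std(axis=-1)
```

Gray pixels (equal R, G, B) get zero weight from this measure, while strongly colored pixels are favored.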

Fig.4. Saturation maps calculated from image stack in Fig. 1.

Well-exposedness (E)

According to the camera response curve (see the sample in Fig. 5), over-exposed pixels are clamped to 1 (or 255) and under-exposed pixels are mapped to 0, so the gray level of a pixel reveals how well it is exposed. Specifically, pixel intensities around 0.5 are well exposed and should be trusted more, while those near 0 (under-exposed) or 1 (over-exposed) are poorly exposed and should receive less weight. The algorithm weights each intensity g based on how close it is to 0.5 using a Gauss curve:

w(g) = exp( -(g - 0.5)^2 / (2 σ^2) ),  with σ = 0.2 following [7].

The algorithm applies the Gauss curve to each RGB channel separately and multiplies the results, yielding the measure E.
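A minimal sketch of this measure (σ = 0.2 is the value suggested in [7]):

```python
import numpy as np

def well_exposedness(img, sigma=0.2):
    """E: Gauss curve centered at 0.5, applied to each RGB channel of an
    H x W x 3 float image in [0, 1], then multiplied across channels.
    sigma = 0.2 follows [7]."""
    w = np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2))
    return np.prod(w, axis=-1)
```

A mid-gray pixel scores exactly 1, while pixels near 0 or 1 in any channel are pushed toward zero by the per-channel product.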

Fig.5. A sample camera response curve.
Fig.6. Well-exposedness maps calculated from image stack in Fig. 1.

Map Combination

For each pixel, the algorithm combines the information from the different measures into a scalar weight using multiplication, controlling the influence of each measure with a power function:

W_{ij,k} = ( C_{ij,k} )^{ω_C} · ( S_{ij,k} )^{ω_S} · ( E_{ij,k} )^{ω_E}

where W, C, S and E refer to the final weighting map, contrast map, saturation map and well-exposedness map respectively, and ω_C, ω_S and ω_E are the corresponding exponents that control how much each measure contributes to the final weight. If an exponent equals 0, the corresponding factor is 1 in the multiplication and that measure is not taken into account. The subscript k indicates the k-th image in the stack, while (i, j) is the coordinate of the pixel. The weighting map is normalized across the images at each pixel before being used to guide the fusion:

Ŵ_{ij,k} = W_{ij,k} / Σ_{k'} W_{ij,k'}

Fig. 7 shows the final weighting maps for each image of the stack shown in Fig. 1.
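Putting the three measures together, a compact self-contained sketch of the combined and normalized weighting map (the plain-average grayscale conversion, 4-neighbour Laplacian, σ = 0.2, and the small ε that guards against all-zero weights are all our assumptions, not specified by the source):

```python
import numpy as np

def fusion_weights(stack, wc=1.0, ws=1.0, we=1.0, sigma=0.2, eps=1e-12):
    """Normalized weights W-hat for a K x H x W x 3 float stack in [0, 1].
    wc, ws, we are the exponents on contrast, saturation and
    well-exposedness; an exponent of 0 effectively turns its measure off."""
    maps = []
    for img in stack:
        gray = img.mean(axis=-1)          # assumption: plain channel average
        p = np.pad(gray, 1, mode="edge")
        C = np.abs(p[:-2, 1:-1] + p[2:, 1:-1] +
                   p[1:-1, :-2] + p[1:-1, 2:] - 4 * gray)
        S = img.std(axis=-1)
        E = np.prod(np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2)), axis=-1)
        maps.append((C + eps) ** wc * (S + eps) ** ws * (E + eps) ** we)
    W = np.stack(maps)
    return W / W.sum(axis=0, keepdims=True)  # normalize across the K images
```

At every pixel the returned weights sum to 1 across the stack, so they can be used directly as blending coefficients.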

Fig.7. Weighting maps for images shown in Fig. 1.

Naive Fusion

Having obtained the weighting maps for the image stack, the next step is to fuse the multiple images. The most intuitive way is to compute a weighted average across the images at each pixel:

R_{ij} = Σ_{k} Ŵ_{ij,k} I_{ij,k}

where I_k is the k-th image in the input sequence and Ŵ is the normalized weighting map. The formula is applied to each RGB channel separately. The process is visually demonstrated in Fig. 8, and Fig. 9 shows the resulting image. It is easy to see that the transitions between pixels are not smooth, which makes the result unappealing. This is because naively averaging the image set cannot guarantee seamless blending, especially where the weights vary quickly. For instance, consider two neighboring pixels whose weights differ so dramatically that the first pixel comes entirely from the first image and the second entirely from the second image; no averaging takes place, and the pair looks unnatural. Since the algorithm works directly on image intensities, these seam artifacts are easily observed, and they become even more obvious in flat regions with little texture.
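The weighted average above is a single broadcasted expression in NumPy:

```python
import numpy as np

def naive_fuse(stack, weights):
    """Per-pixel weighted average. stack: K x H x W x 3 images,
    weights: K x H x W maps already normalized to sum to 1 across K.
    The trailing axis is added so one weight scales all three channels."""
    return (weights[..., None] * stack).sum(axis=0)
```

This is the fusion step that produces the seam artifacts discussed above; the pyramid variant replaces it but reuses the same weights.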

Fig.8. Visual demonstration for naive fusion.
Fig.9. Resultant image from naive fusion.

Pyramid Fusion

To solve the seam problem, the algorithm uses a multi-resolution fusion technique. Specifically, it transforms each image into a pyramid representation [12], conducts the fusion at each level of the pyramid, and then reconstructs the final image from the fused pyramid.

Gaussian Pyramid and Laplacian Pyramid

The Laplacian Pyramid was introduced by Burt and Adelson in the context of image compression [12]. The name is something of a misnomer: the value at each node represents the difference between two Gaussian-like functions convolved with the original image, and this difference resembles the "Laplacian" operators commonly used in image enhancement, hence the name. The representation has the advantage that the image is expanded to only 4/3 of its original size and that the same small filter kernel can be used at all pyramid levels.

There are three major operations used to construct the Gaussian and Laplacian pyramids:

(1) Convolve the input signal with a smoothing kernel, then down-sample the result by keeping every other value. Blurring creates a smoother version of the original containing fewer high-frequency components, which makes it possible to represent the blurred data with fewer samples than the original.

(2) Interpolate (up-sample) the blurred and down-sampled image to estimate the original image.

(3) Subtract the estimated image from the original image to get the difference.

Applying the first operation repeatedly, first to the input image and then to each resulting low-pass image, creates a stack of successively smaller images, each pixel of which contains a local average corresponding to a neighborhood of pixels at the level below. This stack is the Gaussian Pyramid, shown in Fig. 9. Applying the second and third operations at each level produces a stack of difference images: the Laplacian Pyramid, shown in Fig. 10.
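The three operations and the level-wise blending can be sketched as follows for grayscale images (a 2x2 block average and nearest-neighbour up-sampling stand in for the proper Gaussian kernel of [12], and image sides are assumed to be powers of two):

```python
import numpy as np

def down(img):
    # operation (1): blur-and-decimate; a 2x2 block average stands in
    # for the Gaussian kernel of Burt & Adelson
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] +
                   img[0::2, 1::2] + img[1::2, 1::2])

def up(img, shape):
    # operation (2): interpolate back to `shape` (nearest-neighbour here)
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)[:shape[0], :shape[1]]

def gaussian_pyramid(img, levels):
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(down(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    gp = gaussian_pyramid(img, levels)
    # operation (3): difference between each level and its up-sampled parent;
    # the coarsest level keeps the residual low-pass image
    return [gp[i] - up(gp[i + 1], gp[i].shape)
            for i in range(levels - 1)] + [gp[-1]]

def collapse(lp):
    # invert the decomposition: start coarse, up-sample and add differences
    img = lp[-1]
    for lap in reversed(lp[:-1]):
        img = lap + up(img, lap.shape)
    return img

def pyramid_fuse(stack, weights, levels):
    # blend each Laplacian level of the images with the matching Gaussian
    # level of the normalized weight maps, then collapse the fused pyramid
    acc = [0.0] * levels
    for img, w in zip(stack, weights):
        lp = laplacian_pyramid(img, levels)
        gw = gaussian_pyramid(w, levels)
        acc = [a + l * g for a, l, g in zip(acc, lp, gw)]
    return collapse(acc)
```

Because the blending happens on band-pass coefficients with smoothed weights, abrupt weight transitions are spread across scales instead of showing up as seams; for color images the same weight pyramid is applied to each channel.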

Fig.9. A sample Gaussian Pyramid.
Fig.10. A sample Laplacian Pyramid.


Fig.11. Resultant image from pyramid fusion.

Results

test

Conclusions

test

References

[1] P. E. Debevec and J. Malik, “Recovering high dynamic range radiance maps from photographs”, Proc. ACM SIGGRAPH’97, pp. 369 – 378, 1997.

[2] T. Mitsunaga and S. K. Nayar, “Radiometric self calibration”, Proc. IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 374–380, 1999.

[3] G. Ward, A contrast-based scalefactor for luminance display, in: Graphics Gems IV, Academic Press, 1994, pp. 415–421.

[4] F. Durand and J. Dorsey, “Fast bilateral filtering for the display of high-dynamic-range images”, ACM Trans. Graph. (special issue SIGGRAPH 2002) 21, 3, 257-266, 2002.

[5] Q. Tian, J. Duan, M. Chen and T. Peng, "Segmentation Based Tone-mapping for High Dynamic Range Images", Advances Concepts for Intelligent Vision Systems, pp.360-371, 2011.

[6] A. Goshtasby. Fusion of multi-exposure images. Image and Vision Computing, 23:611–618, 2005.

[7] T. Mertens, J. Kautz and F. Van Reeth, “Exposure fusion”, Proc. 15th Pacific Conference on Computer Graphics and Applications (PG'07), pp. 382–390, 2007.

[8] Natasha Gelfand, Andrew Adams, Sung Hee Park, and Kari Pulli, “Multiexposure imaging on mobile devices,” in Proc. of the ACM Multimedia, 2010.

[9] Vaquero, D. and Gelfand, N. and Tico, M. and Pulli, K. and Turk, M., “Generalized Autofocus”, Applications of Computer Vision (WACV), 2011 IEEE Workshop on, pp. 511--518, 2011.

[10] G. Ward. Fast, robust image registration for compositing high dynamic range photographs from hand-held exposures. Journal of Graphics Tools: JGT, 8(2):17–30, 2003.

[11] J. M. Ogden, E. H. Adelson, J. R. Bergen, and P. J. Burt. Pyramid-based computer graphics. RCA Engineer, 30(5), 1985.

[12] P. Burt and E. Adelson. The Laplacian Pyramid as a Compact Image Code. IEEE Transactions on Communications, COM-31:532–540, 1983.

Acknowledgement

We would like to sincerely thank the various authors who made their data available on the Internet for experiments. Images used in this project are courtesy of the corresponding authors.

Appendix I - Code and Data

test

Appendix II - Work Partition

test