Arthur Alaniz, Tina Mantaring

From Psych 221 Image Systems Engineering
Revision as of 06:41, 19 March 2011 by imported>Cmgmant (→‎Out-of-the-Box Implementation)
Jump to navigation Jump to search

Introduction

As the use of digital cameras becomes more and more widespread, the imaging algorithms that go into these cameras have also become more sophisticated. The typical digital camera uses a processing pipeline to convert raw sensor values to a final image, and the different parts of the pipeline may involve correcting for hardware defects, interpolating color filter array values, removing noise, and performing color space transformations. As more and more focus is being given to this field, the pipeline algorithms are also becoming more and more complex. However, one must ask the question: do these complex algorithms really outperform their simpler, more straightforward counterparts?

Methods

Denoising and Demosaicking

For our project, the primary algorithm we chose for our imaging pipeline was the one presented in the paper “Denoising and Interpolation of Noisy Bayer Data with Adaptive Cross-Color Filters”, by D. Paliy, A. Foi, R. Bilcu, and V. Katkovnik. Their technique performs simultaneous denoising and demosaicking using directional adaptive filters that are based on the concepts of local polynomial approximation (LPA) and intersection of confidence intervals (ICI). Their paper is summarized in this section.

Local Polynomial Approximation

Local polynomial approximation, or LPA, works on the assumption that the data in some local region can be fitted to a polynomial function. The cross-color filters used in the algorithm are linear combinations of LPA smoothing and difference filters that operate on complimentary color channels.

The above figure shows an example of an LPA smoothing filter (left) and an LPA difference filter (right). The combined LPA cross-color filter is shown below, where the differing colors illustrate that the smoothing and difference components work on different color channels.

Letting denote a directional 1D convolution kernel, where is the polynomial order, is the scale or size of the kernel (kernel width), is the order of the derivative, and is the direction, then the interpolation kernels and the denoising kernels can be written as follows:

The paper used 4 possible directions for the interpolation filters and 8 directions from the denoising kernels. Aside from this, the structures for the interpolation and denoising kernels were very similar. Indeed, the primary difference between them was that their component filters operated on different support channels. This is shown in the figure below.

In the figure, (a) illustrates how the green pixel is interpolated at position g(0). The smoothing components of the interpolation filter operate on the green pixels, while the difference component operates on the red pixel. In (b), we see how denoising of the green pixel at g(0) is done. The smoothing components of the denoising filter operate on the green pixels, while the difference component operate on the red pixels. Similarly, when we are denoising a red pixel, as in (c), the smoothing components operate on the red pixels, while the difference component operates on the green pixels. Finally, since the denoising filters have 8 directions, they can operate along diagonals, which is shown in (d). When denoising green pixels along diagonals, we have only green pixels to deal with, and so there is no difference component. However, when we are denoising red or blue pixels, the smoothing component operates on the same channel while the difference component operates on the complimentary color channel.

Since the filters are directional, the origin of the filter is not in the center, but is located to one side. Obtaining the estimate is done by convolving the filter with the input pixels. That is, if is our noisy CFA data, then for each scale , we obtain the interpolation and denoising estimates and at each pixel position as follows:

Intersection of Confidence Intervals

The interpolation and denoising filters and were created for different scales . The goal of ICI was to choose the largest scale for which the local polynomial approximation is still valid. This was done as follows:

First, the standard deviations of the interpolation and denoising estimates were found using some estimate of the standard deviation of the noise. This was done as follows:

In the above formulas, is the estimated standard deviation of the noise. The paper gave two examples of signal-dependent noise models, and their corresponding standard deviation estimates:

  • Poisson noise: , and
  • Nonstationary Gaussian noise with signal-dependent standard deviation: , and

Next, for the sets of estimates and , they obtained sets of confidence intervals as follows:

where denotes the index of the scale that was used. Note that the threshold parameters are design parameters.

Finally, the adaptive scale for interpolation was found by finding the largest such that the intersection of confidence intervals is nonempty. Using this scale , the interpolation estimate would then be . Similarly, the adaptive scale for denoising was found by finding the largest such that the intersection of confidence intervals is nonempty. This results in the denoising estimate of .

Anisotropic Denoising and Interpolation

Combining the adaptive-scale kernels in all directions gives us an idea of the best local space for which the polynomial model fits the data. The figure below shows the combined denoising kernels for the red pixel in the center.


After applying the ICI criterion, they obtained adaptive-scale estimates from each direction . These directional denoised and interpolated estimates were then linearly combined to produce the final pixel value at a location . The actual formulas used to do this can be found in [1].

The block diagram of the joint denoising and demosaicking algorithm is shown below.

We see here that, using the pre-designed filter kernels, the denoising of the CFA values and the interpolation of green pixels at the red and blue locations happens simultaneously. Then, using the denoised Bayer data and the noisy, fully-interpolated green channel, the full-noise free green channel is produced, as well as the interpolated red and blue pixels at the blue and red locations, respectively. Finally, using the noise-free green channel, the red and blue pixels are interpolated at the green locations, producing the final, 3-channel image.

Noise Estimation

In the previous section, an estimate of the standard deviation of the noise was needed to properly perform demosaicking and denoising. Using the noise models that were used in [1] (and mentioned briefly in the previous section) gave pretty poor results. Thus, we wanted to use a more accurate noise model.

We used the noise estimation algorithm presented in the paper “Practical Poissonian-Gaussian noise modeling and fitting for single-image raw-data” by A. Foi, M. Trimeche, V. Katkovnik, and K. Egiazarian. This was the same noise-estimation algorithm that was used in #LPA-ICI-CFAI to obtain their results using cameraphone images.

In the paper by Foi et. al, they assumed that a noisy pixel observation at a location would be of the form

where is the true noise-free value, is zero-mean independent random noise whose standard deviation is 1, and is a function of y that gives the standard deviation of the overall amount of noise.

Furthermore, they assumed that the noise term was composed of a Poisson component , which models the photon noise of the sensor, and a Gaussian component , which models noise that is independent of the signal (eg. thermal noise). That is,

Using properties of the Poisson and Gaussian distributions, and some elementary algebra (see 3 for the complete derivation), they found that the overall standard deviation of the observed noisy signal would have the form

where and are parameters of the sensor.

The rest of the paper describes a method for estimating and using only a single image. First, they preprocess the data by converting it into the wavelet domain. This allows them to segment the data into non-overlapping regions (level sets), where the value for each set should be smooth. Next, for each of the sets, a (mean, standard deviation) pair is computed. This is done by straight computation of the sample mean and sample standard deviation. However, the paper also mentions a more robust alternative to computing for the standard deviation using the median of absolute deviations (MAD) method. Finally, the maximum-likelihood (ML) approach is used to fit a global model to the set of (mean, standard deviation) pairs.

The paper also adjusts the model for the clipped case; that is, when the noise might be causing the sensor values to go below the minimum or maximum values, and would thus be clipped to the minimum or maximum values, respectively. In such a case, clipping would cause the variance of the noise would be lower, something the model needs to account for.

The figures below show some examples of noise models that were estimated for different cameras with different ISO settings. The x-axis shows the intensity values, ranging from 0 to 1, and the y-axis shows the standard deviation. The red dots represent the sample standard deviations, obtained from each of the level sets, and the solid and dashed lines represent the fitted standard deviation functions using the sample standard deviation calculation and the MAD method, respectively. The falling off of the fitted function as the intensity values approach 0 or 1 are attributed to the effect of clipping. Moreover, the shift along the x-axis can be attributed to the image sensors having some "base" charge, even when there is no light incident on the sensor.

File:Fig cameranoise.png

Adaptive Thresholding

In Denoising and Demosaicking, the thresholds and that were used in ICI were left as design parameters. These thresholds need to be chosen very carefully: if they are too large, then too much smoothing occurs. However, if they are too small, then the noise is not properly removed [1].

The paper's results were obtained by fixing these thresholds to and . However, we found that these values produced too much smoothing in the high-luminance images. This was probably because, for high-luminance images, the signal-to-noise ratio is relatively high, and thus less denoising is needed.

To remedy this behavior, we tried adjusting the thresholds based on the mean value of the input CFA image. Higher means would correspond to higher mean luminances, and consequently, lower thresholds. We fit an exponential function to our ideal threshold behavior, and used this to determine the values of and

Post-Filtering

After performing denoising and demosaicking, we noticed that there was still some noise in the final image, especially for lower luminance values. Thus, we wanted to add another filtering step that would hopefully get rid of this noise.

The filter we chose was the bilateral filter, proposed by C. Tomasi and R. Manduchi [5]. The filter performs smoothing on the image, while at the same time preserving edges. It does this by combining a domain filter, that is, a filter that measures the geometric distance between a pixel and its neighbors, with a range filter, which is a filter that measures the photometric distance between a pixel and its neighbors. The final effect causes a pixel to be replaced by an average of the pixels that are near it and similar to it.

Color Correction

The images had great color overall and we weren't sure if we should do color correction, but we did anyway. Originally, we tried various color consistency algorithms such as, scaling by the max-RGB, Gray-World and Gray-Edge. The algorithms estimates scale values for R, G, and B channels that transform the image to fit their correct color model. Max-RGB estimates the light source's color by taking the maximum RGB value in the image. The Gray-World hypothesis assumes that the average reflectance is gray, or has no color. The Gray-Edge hypothesis is similar, but it assumes the average edge difference is gray. Out of the box, they all did not work well, as shown in the six images below (from left to right, top to bottom: original, no color correction, Gray-World, max-RGB, Gray-Edge, Gray-Edge 2nd order). The max-RGB algorithm should only work if we have something white in the image. Gray-World and Gray-Edge fall apart when the image does not match their assumptions, which not all images do. But, I think the main problem overall is that the image colors weren't that bad in the first place and maybe these algorithms aren't meant for fine tuning.

The next thing we tried was to assume different light sources. We copied a set of the ISET luminants and ran the simple pipeline for each them. The differences are in the figures below. If we chose the correct illuminants, the SNR raised and scielab dropped. While most cameras have an option to choose your light setting, we thought that we might be able to use the max-RGB, Gray-World and Gray-Edge estimates to automatically choose between different light sources instead.

One thought that we had was some of the algorithms may just be adjusting too much, but have a basic idea and direction. Our idea then was to transform the image for all light sources and whichever one needs the least color correction is the winner. The method of measuring distance was by comparing the R, G, and B scaling values (wR, wG, wB) to the identity scale value of (1, 1, 1). Both of these were normalized to 1, and an angle was calculated between them. The hope was that the smaller the angle, or least amount of correction, the better the image was. Below are changes in angle vs SCIELAB values with the best fit line in black. Both angle and SCIELAB had means subtracted out and start at 0.

The results weren't great, but one might be able to see a slight connection between growing angle and SCIELAB scores. Studying the results further, we noticed that fluorescents followed a path of their own and removing them made the plot look cleaner. So instead of trying to decide between all illuminents, we tried just the D50, D55, D60, D75 lights. Their results are below and are a little better.

Results

We present the results we obtained using our chosen algorithm, together with all the other modifications that we made. In our discussion, we compare our results to the Simple Pipeline that we were given as reference. This simple pipeline has no denoising step, uses bilinear interpolation to perform demosaicking, and then uses the linear transformation that converts a Macbeth Color Checker in D65 illumination from the input color space to the output color space.

Out-of-the-Box Implementation

This section discusses the results we obtained when we plugged in the MATLAB code that implemented the LPA-ICI color filter array interpolation algorithm in [1].

We found that, in terms of mean S-CIELAB and mean SNR values, the out-of-the-box implementation only outperforms the simple pipeline in images with very low luminance levels. This was probably due to the fact that the noise models they assumed were very naive and inaccurate. In terms of the MCC metrics, the out-of-the-box implementation outperforms the simple pipeline until the mid-range luminance values.

Using Proper Noise Estimation

Our next step was to use a proper noise estimation algorithm. Using the MATLAB code that implemented the Poisson-Gaussian noise estimation algorithm in [], we were able to properly model our noise. The figure below shows the results from the preprocessing step of the algorithm, using image 3 of our given data at a mean luminance level of 2000 cd/sq m. This step removes the edges of the image, and then divides the smooth areas into non-overlapping level sets.

The algorithm produces the following standard deviation function to estimate the noise. The red dots are the sample standard deviation points obtained from the different level sets, and the black line represents the fitted standard deviation function. In this case, the estimated noise parameters were and . Also, notice that while the estimation function approaches 0 as the intensity approaches 0, it does not behave this way as the intensity approaches 1. This is due to the fact that our dataset only has clipping at the bottom, and so we estimated the noise model as such.

Plugging this segment into our imaging pipeline, we were able to obtain the following results.

File:Fig noiseest scielabs.png

File:Fig noiseest snrs.png

Here, we see that, with proper noise estimation, the LPA-ICI CFAI algorithm outperforms the simple pipeline for the lower luminance images. However, beyond a certain luminance level, the results level off. This is probably due to the fact that, because our thresholds were fixed, oversmoothing happens for the higher luminance levels.

Varying the LPA-ICI Thresholds

Instead of using fixed thresholds, we wanted to adjust the threshold based on the mean intensity value of the raw CFA data. We chose to base the threshold on mean intensity value because the mean intensity value gives us an idea of how much illumination was present in the scene. For high illuminance values, we wanted the LPA-ICI CFAI algorithm to not perform as much smoothing.

To obtain the threshold parameter, we used the following formulas:

The formulas were obtained by calculating the optimal thresholds for the Macbeth Color Checker for all mean luminance levels. Then, we fit a function by hand to these sample points.

The results obtained using adaptive thresholds are shown below.

File:Fig varygamma scielabs.png

File:Fig varygamma snrs.png


Using Color Consistency

Finally, we tried modifying the color transformation step. So far, we have been using the color transformation method of the Simple Pipeline, where a linear transform is calculated for the Macbeth Color Checker, assuming that the illuminant is D65. However, this may not always be the best estimate of the illuminant present in the scene. Using the method described in Color Consistency, we obtained the following results.

File:Fig color scielabs.png

File:Fig color snrs.png

The results weren't amazing and a little unstable. We would have liked to get this working better. There's still a lot of stuff we don't understand, but it was a fun idea =)

Conclusions

References

[1] Paliy, D., A.Foi, R. Bilcu, V. Katkovnik, “Denoising and Interpolation of Noisy Bayer Data with Adaptive Cross-Color Filters”, SPIE-IS&T Electronic Imaging, Visual Communications and Image Processing 2008, vol. 6822, San Jose, CA, January 2008.

[2] Paliy, D., V. Katkovnik, R. Bilcu, S. Alenius, K. Egiazarian, “Spatially Adaptive Color Filter Array Interpolation for Noiseless and Noisy Data”, International Journal of Imaging Systems and Technology (IJISP), Special Issue on Applied Color Image Processing, vol. 17, iss. 3, pp. 105-122, October 2007.

[3] Foi, A., M. Trimeche, V. Katkovnik, and K. Egiazarian, “Practical Poissonian-Gaussian noise modeling and fitting for single-image raw-data”, IEEE Trans. Image Process., vol. 17, no. 10, pp. 1737-1754, October 2008.

[4] Foi, A., S. Alenius, V. Katkovnik, and K. Egiazarian, “Noise measurement for raw data of digital imaging sensors by automatic segmentation of non-uniform targets”, IEEE Sensors Journal, vol. 7, no. 10, pp. 1456-1461, October 2007.

[5] C. Tomasi and R. Manduchi. Bilateral Filtering for Gray and Color Images. In Proceedings of the IEEE International Conference on Computer Vision, 1998.

Appendix

Sources and Results

Work Breakdown