Arthur Alaniz, Tina Mantaring
Introduction
As digital cameras become more widespread, the imaging algorithms inside them have grown more sophisticated. The typical digital camera uses a processing pipeline to convert raw sensor values into a final image, and the stages of this pipeline may include correcting for hardware defects, interpolating color filter array values, removing noise, and performing color space transformations. As the field attracts more attention, the pipeline algorithms continue to grow in complexity. This raises a question: do these complex algorithms really outperform their simpler, more straightforward counterparts?
Methods
Denoising and Demosaicking
For our project, the primary algorithm we chose for our imaging pipeline was the one presented in the paper “Denoising and Interpolation of Noisy Bayer Data with Adaptive Cross-Color Filters”, by D. Paliy, A. Foi, R. Bilcu, and V. Katkovnik [1]. Their technique performs simultaneous denoising and demosaicking using directional adaptive filters that are based on the concepts of local polynomial approximation (LPA) and intersection of confidence intervals (ICI). Their paper is summarized in this section.
Local Polynomial Approximation
Local polynomial approximation, or LPA, works on the assumption that the data in some local region can be fitted to a polynomial function. The cross-color filters used in the algorithm are linear combinations of LPA smoothing and difference filters that operate on complementary color channels.
The above figure shows an example of an LPA smoothing filter (left) and an LPA difference filter (right). The combined LPA cross-color filter is shown below, where the differing colors illustrate that the smoothing and difference components work on different color channels.
Letting $g^{(k)}_{m,h,\theta}$ denote a directional 1D convolution kernel, where $m$ is the polynomial order, $h$ is the scale or size of the kernel (kernel width), $k$ is the order of the derivative, and $\theta$ is the direction, the interpolation kernels and the denoising kernels can be written as linear combinations of these smoothing and difference kernels; the exact expressions are given in [1].
The paper used 4 possible directions for the interpolation filters and 8 directions for the denoising kernels. Aside from this, the structures of the interpolation and denoising kernels were very similar. Indeed, the primary difference between them was that their component filters operated on different support channels. This is shown in the figure below.
In the figure, (a) illustrates how the green pixel is interpolated at position g(0). The smoothing components of the interpolation filter operate on the green pixels, while the difference component operates on the red pixels. In (b), we see how denoising of the green pixel at g(0) is done. The smoothing components of the denoising filter operate on the green pixels, while the difference component operates on the red pixels. Similarly, when we are denoising a red pixel, as in (c), the smoothing components operate on the red pixels, while the difference component operates on the green pixels. Finally, since the denoising filters have 8 directions, they can operate along diagonals, as shown in (d). When denoising green pixels along diagonals, we have only green pixels to deal with, and so there is no difference component. However, when we are denoising red or blue pixels, the smoothing component operates on the same channel while the difference component operates on the complementary color channel.
Since the filters are directional, the origin of each filter is not at its center, but at one side. An estimate is obtained by convolving the filter with the input pixels. That is, if $z$ is our noisy CFA data, then for each scale $h$ and direction $\theta$, the interpolation and denoising estimates are obtained at each pixel position $x$ as

$$\hat{y}_{h,\theta}(x) = (z \circledast g_{h,\theta})(x).$$
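To make the one-sided convolution concrete, here is a small illustrative sketch in Python (not the authors' MATLAB code); the kernel weights and sample values are made up, and a uniform smoothing kernel stands in for a true LPA kernel:

```python
# Sketch: applying a 1D directional kernel whose origin is at its first tap,
# so the estimate at `pos` uses only samples to one side of `pos`.

def directional_estimate(samples, kernel, pos):
    """Correlate `kernel` with the samples starting at `pos`."""
    taps = samples[pos:pos + len(kernel)]
    if len(taps) < len(kernel):
        raise ValueError("kernel extends past the end of the data")
    return sum(w * s for w, s in zip(kernel, taps))

# A uniform smoothing kernel of scale h = 3 (weights sum to 1).
kernel = [1/3, 1/3, 1/3]
row = [10.0, 12.0, 11.0, 30.0, 31.0]
estimate = directional_estimate(row, kernel, 0)  # average of the first 3 samples
```

A real LPA kernel would use polynomial-fit weights rather than a plain average, but the one-sided support is the point being illustrated.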
Intersection of Confidence Intervals
The interpolation and denoising filters were designed for a set of increasing scales $h_1 < h_2 < \dots < h_J$. In the paper, the goal of ICI was to choose, at each pixel, the largest scale for which the local polynomial approximation is still valid. This was done as follows.
First, the standard deviations of the interpolation and denoising estimates were computed from an estimate of the standard deviation of the noise. Since each estimate is a linear combination of the input pixels, its standard deviation is

$$\sigma_{h,\theta}(x) = \sigma(x)\,\|g_{h,\theta}\|_2,$$

where $\sigma(x)$ is the estimated standard deviation of the noise. The paper gave two examples of signal-dependent noise models and their corresponding standard deviation estimates:
- Poisson noise, for which the variance equals the mean, so that $\sigma(x) = \sqrt{y(x)}$ (approximated in practice from the observed values); and
- Nonstationary Gaussian noise whose standard deviation is a given function of the signal value.
Next, for the set of estimates $\{\hat{y}_{h_j,\theta}(x)\}_{j=1}^{J}$ and their standard deviations $\{\sigma_{h_j,\theta}(x)\}_{j=1}^{J}$, they formed the confidence intervals

$$D_j = \left[\hat{y}_{h_j,\theta}(x) - \Gamma\,\sigma_{h_j,\theta}(x),\ \hat{y}_{h_j,\theta}(x) + \Gamma\,\sigma_{h_j,\theta}(x)\right],$$

where $j$ denotes the index of the scale. Note that the threshold parameters $\Gamma$ (one for interpolation, one for denoising) are design parameters.
Finally, the adaptive scale for interpolation was found as the largest index $j^{+}$ such that the intersection of confidence intervals $D_1 \cap D_2 \cap \dots \cap D_{j^{+}}$ is nonempty; using this scale $h_{j^{+}}$, the interpolation estimate is $\hat{y}_{h_{j^{+}},\theta}(x)$. The adaptive scale for denoising was found in the same way, yielding the corresponding denoising estimate.
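The scale-selection rule above can be sketched in a few lines of Python (an illustrative reimplementation, not the authors' code): track the running intersection of the confidence intervals and stop just before it becomes empty.

```python
# Sketch of the ICI rule: given estimates y_hat[j] and standard deviations
# sigma[j] for increasing scales h_1 < h_2 < ..., return the index of the
# largest scale whose interval still intersects all smaller-scale intervals.

def ici_select(y_hat, sigma, gamma):
    lo, hi = float("-inf"), float("inf")
    best = 0
    for j, (y, s) in enumerate(zip(y_hat, sigma)):
        lo = max(lo, y - gamma * s)   # intervals D_j = [y - gamma*s, y + gamma*s]
        hi = min(hi, y + gamma * s)
        if lo > hi:                   # intersection became empty: stop
            break
        best = j
    return best
```

In smooth regions the estimates stay consistent and a large scale is chosen (heavy smoothing); near an edge the large-scale estimates become biased, the intersection empties early, and a small scale is kept.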
Anisotropic Denoising and Interpolation
Combining the adaptive-scale kernels in all directions gives us an idea of the best local space for which the polynomial model fits the data. The figure below shows the combined denoising kernels for the red pixel in the center.
After applying the ICI criterion, they obtained adaptive-scale estimates from each direction $\theta$. These directional denoised and interpolated estimates were then linearly combined to produce the final pixel value at each location $x$. The actual formulas used to do this can be found in [1].
The block diagram of the joint denoising and demosaicking algorithm is shown below.
We see here that, using the pre-designed filter kernels, the denoising of the CFA values and the interpolation of green pixels at the red and blue locations happen simultaneously. Then, using the denoised Bayer data and the noisy, fully interpolated green channel, the full noise-free green channel is produced, as well as the interpolated red and blue pixels at the blue and red locations, respectively. Finally, using the noise-free green channel, the red and blue pixels are interpolated at the green locations, producing the final 3-channel image.
Noise Estimation
In the previous section, an estimate of the standard deviation of the noise was needed to properly perform demosaicking and denoising. Using the noise models that were used in [1] (and mentioned briefly in the previous section) gave poor results in our experiments. Thus, we wanted to use a more accurate noise model.
We used the noise estimation algorithm presented in the paper “Practical Poissonian-Gaussian noise modeling and fitting for single-image raw-data” by A. Foi, M. Trimeche, V. Katkovnik, and K. Egiazarian [3]. This was the same noise-estimation algorithm that was used in [1] to obtain their results using cameraphone images.
In the paper by Foi et al., they assumed that a noisy pixel observation $z(x)$ at a location $x$ would be of the form

$$z(x) = y(x) + \sigma(y(x))\,\xi(x),$$

where $y(x)$ is the true noise-free value, $\xi(x)$ is zero-mean independent random noise with standard deviation 1, and $\sigma(\cdot)$ is a function of $y$ that gives the standard deviation of the overall amount of noise.
Furthermore, they assumed that the noise term was composed of a Poisson component $\eta_p(y(x))$, which modeled the photon noise of the sensor, and a Gaussian component $\eta_g(x)$, which modeled noise that is independent of the signal (e.g. thermal noise). That is,

$$\sigma(y(x))\,\xi(x) = \eta_p(y(x)) + \eta_g(x).$$
Using properties of the Poisson and Gaussian distributions, and some elementary algebra (see [3] for the complete derivation), they found that the overall standard deviation of the observed noisy signal has the form

$$\sigma(y) = \sqrt{a\,y + b},$$

where $a$ and $b$ are parameters of the sensor.
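As a quick illustration of this standard-deviation function, here is a minimal Python sketch; the parameter values below are made up for illustration, not estimated from any sensor:

```python
import math

# Poissonian-Gaussian noise standard deviation, sigma(y) = sqrt(a*y + b):
# `a` scales the signal-dependent (Poisson) part and `b` is the variance
# of the signal-independent (Gaussian) part.

def noise_std(y, a, b):
    """Standard deviation of the noise at noise-free intensity y."""
    return math.sqrt(a * y + b)

a, b = 0.004, 0.0001          # hypothetical sensor parameters
sigma_dark = noise_std(0.0, a, b)    # Gaussian floor dominates in the dark
sigma_bright = noise_std(1.0, a, b)  # Poisson term dominates at full scale
```

Note that the noise floor at $y = 0$ is $\sqrt{b}$, and the curve grows like $\sqrt{a\,y}$ for bright pixels, which matches the shapes seen in the fitted plots.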
The rest of the paper describes a method for estimating $a$ and $b$ using only a single image. First, they preprocessed the data by converting it into the wavelet domain. This allowed them to segment the data into non-overlapping regions (level sets), within each of which the signal should be smooth. Next, for each of the sets, a (mean, standard deviation) pair was computed. This was done by direct computation of the sample mean and sample standard deviation. However, the paper also mentions a more robust alternative for computing the standard deviation using the median of absolute deviations (MAD) method. Finally, the maximum-likelihood (ML) approach was used to fit a global model to the set of (mean, standard deviation) pairs.
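The MAD-based alternative mentioned above can be sketched as follows (an illustrative Python version, not the paper's code); the Gaussian consistency constant 0.6745 converts the median absolute deviation into a standard-deviation estimate:

```python
import statistics

# Robust standard-deviation estimate via the median of absolute deviations.
# For Gaussian data, median(|x - median(x)|) / 0.6745 approximates the
# sample standard deviation while being insensitive to outliers.

def mad_std(values):
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return mad / 0.6745
```

Its robustness is the point: replacing one sample with a gross outlier barely moves the estimate, whereas the plain sample standard deviation would blow up.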
The paper also adjusted the model for the clipped case; that is, when the noise might be causing the sensor values to go below the minimum or maximum values, and would thus be clipped to the minimum or maximum values, respectively. In such a case, clipping would cause the variance of the noise to be lower, something the model needed to account for.
The figures below show some examples of noise models that were estimated for different cameras at different ISO settings. The x-axis shows the intensity values, ranging from 0 to 1, and the y-axis shows the standard deviation. The red dots represent the sample standard deviations obtained from each of the level sets, and the solid and dashed lines represent the fitted standard deviation functions using the sample standard deviation calculation and the MAD method, respectively. The falling off of the fitted function as the intensity values approach 0 or 1 is attributed to the effect of clipping. Moreover, the shift along the x-axis can be attributed to the image sensors having some "base" charge, even when there is no light incident on the sensor.
Adaptive Thresholding
In Denoising and Demosaicking, the thresholds $\Gamma$ that were used in ICI (one for interpolation, one for denoising) were left as design parameters. These thresholds must be chosen carefully: if they are too large, too much smoothing occurs; if they are too small, the noise is not properly removed [1].
The paper's results were obtained by fixing these thresholds to constant values. However, we found that those values produced too much smoothing in the high-luminance images. This is probably because, for high-luminance images, the signal-to-noise ratio is relatively high, and thus less denoising is needed.
To remedy this behavior, we tried adjusting the thresholds based on the mean value of the input CFA image. Higher means correspond to higher mean luminances and, consequently, lower thresholds. We fit an exponential function to our ideal threshold behavior, and used this to determine the values of the two thresholds.
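The mapping from mean CFA value to threshold might be sketched like this in Python; the constants `t_min`, `t_max`, and `k` are hypothetical placeholders, not the values we actually fit:

```python
import math

# Sketch of an adaptive ICI threshold: a decaying exponential of the mean
# CFA value, so that brighter scenes (higher SNR) get a lower threshold
# and therefore less smoothing. All constants are illustrative.

def adaptive_threshold(cfa_mean, t_min=0.5, t_max=2.5, k=8.0):
    """cfa_mean is the mean raw CFA value, normalized to [0, 1]."""
    return t_min + (t_max - t_min) * math.exp(-k * cfa_mean)
```

The exponential form keeps the threshold near `t_max` for dark scenes and lets it decay smoothly toward `t_min` as the scene brightens, which matches the qualitative behavior described above.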
Post-Filtering
After performing denoising and demosaicking, we noticed that there was still some noise in the final image, especially for lower luminance values. Thus, we wanted to add another filtering step that would hopefully get rid of this noise.
The filter we chose was the bilateral filter, proposed by C. Tomasi and R. Manduchi [5]. The filter performs smoothing on the image, while at the same time preserving edges. It does this by combining a domain filter, that is, a filter that measures the geometric distance between a pixel and its neighbors, with a range filter, which is a filter that measures the photometric distance between a pixel and its neighbors. The final effect causes a pixel to be replaced by an average of the pixels that are near it and similar to it.
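A minimal 1D version of the bilateral filter (an illustrative Python sketch, not our actual pipeline code) makes the domain/range combination explicit:

```python
import math

# 1D bilateral filter in the spirit of Tomasi and Manduchi [5]: each sample
# is replaced by a weighted average of its neighbors, where the weight is
# the product of a spatial (domain) Gaussian and a photometric (range)
# Gaussian. Samples across an edge get tiny range weights, so edges survive.

def bilateral_1d(signal, sigma_d=1.0, sigma_r=0.1, radius=2):
    out = []
    for i, center in enumerate(signal):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w = (math.exp(-((i - j) ** 2) / (2 * sigma_d ** 2)) *
                 math.exp(-((center - signal[j]) ** 2) / (2 * sigma_r ** 2)))
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out
```

Running this on a step edge such as `[0, 0, 0, 1, 1, 1]` smooths within each flat region while leaving the step itself essentially intact.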
In the end, we did not use the bilateral filter in our pipeline because it did not improve our results.
Color Correction
The images had good color overall, and we weren't sure whether attempting further color correction was worthwhile, but we tried anyway. Originally, we tried various color constancy algorithms: scaling by the maximum RGB values (max-RGB), the Gray World algorithm, and the Gray Edge algorithm. Each of these estimates scale values for the R, G, and B channels, and these scale values transform the image to fit the algorithm's color model assumptions.
Max-RGB estimates the light source's color by taking the maximum RGB value in the image. The Gray World hypothesis assumes that the average reflectance is gray, or has no color. The Gray Edge hypothesis is similar to Gray World, but it assumes that the average edge difference is gray.
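As a concrete example of how such channel gains are estimated, here is a small Gray World sketch in Python (illustrative only; our actual experiments used ISET/MATLAB):

```python
# Gray World gain estimate: scale each channel so the channel means become
# equal, under the assumption that the average reflectance of the scene is
# gray. Gains are normalized so the green gain is 1, as is conventional.

def gray_world_gains(r, g, b):
    """r, g, b are flat lists of channel values; returns (wR, wG, wB)."""
    means = [sum(c) / len(c) for c in (r, g, b)]
    return tuple(means[1] / m for m in means)
```

Max-RGB would use per-channel maxima instead of means, and Gray Edge would apply the same idea to channel derivatives rather than raw values.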
None of them worked well out of the box, as shown in the six images below (from left to right, top to bottom: original, no color correction, Gray World, max-RGB, Gray Edge, Gray Edge 2nd order). Max-RGB can only be expected to work if there is something white in the image. Gray World and Gray Edge fall apart when the image does not match their assumptions, which not all images do. The main problem overall, however, seemed to be that the image colors weren't bad in the first place; perhaps these algorithms weren't meant for fine-tuning color.
The next thing we tried was assuming different light sources. We copied a set of the ISET illuminants and ran the simple pipeline for each of them. The differences are shown in the figures below. When we chose the correct illuminant, the SNR went up and the S-CIELAB error dropped. While most cameras let the user choose a light setting, we thought we might be able to use the max-RGB, Gray World, and Gray Edge estimates to choose between different light sources automatically.
One thought we had was that some of these algorithms may adjust too much, while still pointing in roughly the right direction. Our idea was to transform the image under every candidate light source; whichever one required the least color correction was probably the best illuminant estimate. We measured the amount of correction by comparing the R, G, and B scaling values (wR, wG, wB) to the identity scaling (1, 1, 1): both vectors were normalized to unit length, and the angle between them was computed. The hope was that the smaller the angle, corresponding to the least amount of correction, the better the image. Below are changes in angle vs. S-CIELAB values with the best-fit line in black. Both angle and S-CIELAB have their means subtracted out, and both start at 0.
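The angle measure described above can be sketched directly (an illustrative Python version of the computation, with hypothetical gain values):

```python
import math

# Angle between the estimated channel gains (wR, wG, wB) and the identity
# gains (1, 1, 1), after normalizing both to unit length. An angle of 0
# means the illuminant estimate requires no correction at all.

def correction_angle(gains):
    norm = math.sqrt(sum(w * w for w in gains))
    unit = [w / norm for w in gains]
    ident = [1 / math.sqrt(3)] * 3
    dot = sum(u * v for u, v in zip(unit, ident))
    return math.acos(max(-1.0, min(1.0, dot)))  # radians

angle = correction_angle((1.2, 1.0, 0.9))  # hypothetical gain triple
```

Clamping the dot product before `acos` guards against tiny floating-point excursions outside [-1, 1].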
The results weren't great, but one can perhaps see a slight connection between growing angle and S-CIELAB scores. Studying the results further, we noticed that the fluorescent illuminants followed a path of their own, and removing them made the plot look cleaner. So instead of trying to decide between all illuminants, we tried just the D50, D55, D60, and D75 lights. Their results, shown below, are a little better.
In the end, this is the block diagram we used.
Results
We present the results we obtained using our chosen algorithm, together with all the other modifications that we made. In our discussion, we compare our results to the Simple Pipeline that we were given as reference. This simple pipeline has no denoising step, uses bilinear interpolation to perform demosaicking, and then uses the linear transformation that converts a Macbeth Color Checker in D65 illumination from the input color space to the output color space.
Out-of-the-Box Implementation
This section discusses the results we obtained when we plugged in the MATLAB code that implemented the LPA-ICI color filter array interpolation algorithm in [1].
We found that, in terms of mean S-CIELAB and mean SNR values, the out-of-the-box implementation only outperforms the simple pipeline on images with very low luminance levels. This was probably because the noise models it assumed were naive and inaccurate. In terms of the MCC metrics, the out-of-the-box implementation outperforms the simple pipeline up to the mid-range luminance values.
Note that in the figure for mean SNR, the SNR for the Simple Pipeline at a mean luminance of 2 cd/sq m was not displayed because it was negative, and we were using a log scale plot.
Using Proper Noise Estimation
Our next step was to use a proper noise estimation algorithm. Using the MATLAB code that implemented the Poisson-Gaussian noise estimation algorithm, we were able to properly model our noise. The figure below shows the results from the preprocessing step of the algorithm, using image 3 of our given data at a mean luminance level of 2000 cd/sq m. This step removes the edges of the image, and then divides the smooth areas into non-overlapping level sets.
The algorithm produced the following standard deviation function to estimate the noise. The red dots are the sample standard deviation points obtained from the different level sets, and the black line represents the fitted standard deviation function $\sigma(y) = \sqrt{a\,y + b}$, from which the noise parameters $a$ and $b$ were estimated. Also, notice that while the fitted function approaches 0 as the intensity approaches 0, it does not behave this way as the intensity approaches 1. This is because our dataset only exhibits clipping at the bottom of the range, and so we estimated the noise model accordingly.
Plugging this stage into our imaging pipeline, we obtained the following results.
Here, we see that, with proper noise estimation, the LPA-ICI CFAI algorithm outperforms the simple pipeline for the lower-luminance images. However, beyond a certain luminance level, the results level off. This is probably because our thresholds were fixed, causing oversmoothing at the higher luminance levels. One positive effect of oversmoothing, however, is that it causes the MCC color noise to go down.
Varying the LPA-ICI Thresholds
Instead of using fixed thresholds, we wanted to adjust the threshold based on the mean intensity value of the raw CFA data. We chose the mean intensity value because it gives us an idea of how much illumination was present in the scene. For high illuminance values, we wanted the LPA-ICI CFAI algorithm to perform less smoothing.
To obtain the threshold parameters, we first calculated the optimal thresholds for the Macbeth Color Checker at each mean luminance level, and then fit a function by hand to these sample points.
The results obtained using adaptive thresholds are shown below.
We see that using adaptive thresholds gives us better SNR and S-CIELAB values. It also brings the MCC color bias down, compared to the color bias produced by the fixed thresholds. However, the MCC color noise rises considerably. Our goal in varying the thresholds was to prevent oversmoothing at the higher luminance levels. Since there isn't as much smoothing, it isn't surprising that the noise metric isn't as good as the value for an oversmoothed image. Visually, however, the difference isn't really noticeable, as shown in our output images in Appendix: Output Images.
Using Color Consistency
Finally, we tried modifying the color transformation step. So far, we had been using the color transformation method of the Simple Pipeline, where a linear transform is calculated for the Macbeth Color Checker, assuming that the illuminant is D65. However, this may not always be the best estimate of the illuminant present in the scene. Using the method described in the Color Correction section, we obtained the following results. (Note: for this section, the demosaicking algorithm used was the bilinear interpolation of the simple pipeline, so that we could see the effect of just our color constancy method, without the effects of the LPA-ICI CFAI algorithm.)
The results weren't amazing and were a little unstable, so in the end we decided not to include our color constancy method in the pipeline. Using all illuminants did not work very well, as the green lines on the graphs show. Using all illuminants except fluorescents looked best on the mean graph, but was also unstable (not well shown here). Using only the D-series lights, however, proved to be a good trade-off between stability and better S-CIELAB values. We would have liked to get this working better; there's still a lot we don't understand, but it was a fun idea =)
Summary of Results
The imaging pipeline we used produced reasonable results. In particular, we were able to at least match the performance of the Simple Pipeline, and in some cases do better. One consequence of the algorithm we used was that, at the lowest luminance value, the images we produced had a strong purple cast (see Appendix: Output Images).
We plotted the histograms of the red, green, and blue channels of the Macbeth Color Checker at a mean luminance of 2 cd/sq m, and we got the following results. The top plots show the histograms of the original image, normalized to have values from 0 to 1, while the bottom plots show the histograms of the denoised and demosaicked images, also normalized to have values from 0 to 1.
While the green channel seems to be properly centered, the red and blue channels both seem to be shifted to the right. This may account for the purple noise phenomenon that we consistently saw. The shift in channels may be a consequence of the fact that the interpolated and denoised green channel is used to interpolate the values of the red and blue channels. Thus, the errors may accumulate, causing the unequal behavior among channels.
The figure below shows the percentage of time spent on each of the sub-blocks of the pipeline.
The denoising/demosaicking step takes a very long time to run, making the pipeline somewhat impractical to use.
Conclusions
In conclusion, the LPA-ICI Color Filter Array Interpolation algorithm performed reasonably well. It produced good results for images with low mean illuminance values. However, for the higher illuminance values, it took quite a bit of extra work to outperform the Simple Pipeline. Considering the computational complexity of the overall algorithm we ended up using, the small improvement probably would not be worth it. Visually, the difference was clear at the lower luminance values; but most photographs are taken in good light, and in good light the output images really could not be differentiated.
The idea of determining the illuminant from the image itself could be very useful. Our results showed that choosing the correct illuminant gave better SNR and S-CIELAB metrics. However, figuring out how to determine the illuminant automatically proved very challenging. While our method somewhat worked, there was still a random component that we didn't quite understand. If image sensors could somehow embed some luminance information into RAW images, that would help our method.
References
Publications
[1] D. Paliy, A. Foi, R. Bilcu, and V. Katkovnik, "Denoising and Interpolation of Noisy Bayer Data with Adaptive Cross-Color Filters."
[3] A. Foi, M. Trimeche, V. Katkovnik, and K. Egiazarian, "Practical Poissonian-Gaussian noise modeling and fitting for single-image raw-data."
[5] C. Tomasi and R. Manduchi, "Bilateral Filtering for Gray and Color Images."
Software
MATLAB Code: Color Filter Array Interpolation Based on LPA-ICI
MATLAB Code: Poisson-Gaussian Noise Estimation for Single-Image Raw Data
Appendix I
Output Images
Source Code
File:AdaptivePipelineProject.zip
Appendix II
Work Breakdown
Arthur Alaniz: Worked on noise estimation and color constancy; worked on the presentation and documentation
Tina Mantaring: Worked on getting the initial LPA-ICI-CFAI algorithm running and on varying the threshold parameters; Worked on the presentation and documentation