Arthur Alaniz, Tina Mantaring: Difference between revisions
imported>Aalaniz |
imported>Cmgmant |
||
Line 102: | Line 102: | ||
== Color Correction == | == Color Correction == | ||
The images had great color overall and we weren't sure if we should do color correction, but we | The images had great color overall, and we weren't sure if we should attempt to do even better color correction, but we tried anyway. Originally, we tried various color consistency algorithms such as scaling by the maximum RGB values, the Gray World Algorithm, and the Gray Edge Algorithm. The algorithms all estimate scale values for the R, G, and B channels, and these scale values transform the image to fit their individual color model assumptions. | ||
Max-RGB estimates the light source's color by taking the maximum RGB value in the image. The Gray World hypothesis assumes that the average reflectance is gray, or has no color. The Gray Edge hypothesis is similar to Gray World, but it assumes that the average edge difference is gray. | |||
Out of the box, they all did not work well, as shown in the six images below (from left to right, top to bottom: original, no color correction, Gray World, max-RGB, Gray Edge, Gray Edge 2nd order). The max-RGB algorithm should only work if we have something white in the image. Gray World and Gray Edge fall apart when the image does not match their assumptions, which not all images do. However, the main problem overall seemed to be that the image colors weren't that bad in the first place and perhaps these algorithms weren't meant for fine tuning colors. | |||
[[File:cc.jpg]] | [[File:cc.jpg]] | ||
The next thing we tried was to assume different light sources. We copied a set of the ISET | The next thing we tried was to assume different light sources. We copied a set of the ISET illuminants and ran the simple pipeline for each them. The differences are shown in the figures below. If we chose the correct illuminants, the SNR went up and S-CIELAB dropped. While most cameras have an option to choose your light setting, we thought that we might be able to use the max-RGB, Gray-World and Gray-Edge estimates to automatically choose between different light sources instead. | ||
[[File:scivslumdifflum1.png]] | [[File:scivslumdifflum1.png]] | ||
Line 116: | Line 120: | ||
[[File:scivslumdifflum7.png]] | [[File:scivslumdifflum7.png]] | ||
One thought that we had was some of the algorithms may just be adjusting too much, but have a basic idea and direction. Our idea then was to transform the image for all light sources and whichever one | One thought that we had was that some of the algorithms may just be adjusting too much, but have a basic idea and direction. Our idea then was to transform the image for all light sources, and whichever one resulted in the least color correction was probably the best illuminant estimate. The method of measuring distance was by comparing the R, G, and B scaling values (wR, wG, wB) to the identity scale value of (1, 1, 1). Both of these were normalized to 1, and an angle was calculated between them. The hope was that the smaller the angle, which corresponded to the least amount of correction, the better the image was. Below are changes in angle vs S-CIELAB values with the best fit line in black. Both angle and S-CIELAB had their means subtracted out, and both start at 0. | ||
[[File:scivsangdifflum1.png]] | [[File:scivsangdifflum1.png]] | ||
Line 126: | Line 130: | ||
[[File:scivsangdifflum7.png]] | [[File:scivsangdifflum7.png]] | ||
The results weren't great, but one might be able to see a slight connection between growing angle and | The results weren't great, but one might be able to see a slight connection between growing angle and S-CIELAB scores. Studying the results further, we noticed that fluorescents followed a path of their own and removing them made the plot look cleaner. So instead of trying to decide between all illuminants, we tried just the D50, D55, D60, D75 lights. Their results are below and are a little better. | ||
[[File:scivsangdifflum1lim.png]] | [[File:scivsangdifflum1lim.png]] |
Revision as of 06:49, 19 March 2011
Introduction
As the use of digital cameras becomes more and more widespread, the imaging algorithms that go into these cameras have also become more sophisticated. The typical digital camera uses a processing pipeline to convert raw sensor values to a final image, and the different parts of the pipeline may involve correcting for hardware defects, interpolating color filter array values, removing noise, and performing color space transformations. As more and more focus is being given to this field, the pipeline algorithms are also becoming more and more complex. However, one must ask the question: do these complex algorithms really outperform their simpler, more straightforward counterparts?
Methods
Denoising and Demosaicking
For our project, the primary algorithm we chose for our imaging pipeline was the one presented in the paper “Denoising and Interpolation of Noisy Bayer Data with Adaptive Cross-Color Filters”, by D. Paliy, A. Foi, R. Bilcu, and V. Katkovnik. Their technique performs simultaneous denoising and demosaicking using directional adaptive filters that are based on the concepts of local polynomial approximation (LPA) and intersection of confidence intervals (ICI). Their paper is summarized in this section.
Local Polynomial Approximation
Local polynomial approximation, or LPA, works on the assumption that the data in some local region can be fitted to a polynomial function. The cross-color filters used in the algorithm are linear combinations of LPA smoothing and difference filters that operate on complimentary color channels.
The above figure shows an example of an LPA smoothing filter (left) and an LPA difference filter (right). The combined LPA cross-color filter is shown below, where the differing colors illustrate that the smoothing and difference components work on different color channels.
Letting denote a directional 1D convolution kernel, where is the polynomial order, is the scale or size of the kernel (kernel width), is the order of the derivative, and is the direction, then the interpolation kernels and the denoising kernels can be written as follows:
The paper used 4 possible directions for the interpolation filters and 8 directions from the denoising kernels. Aside from this, the structures for the interpolation and denoising kernels were very similar. Indeed, the primary difference between them was that their component filters operated on different support channels. This is shown in the figure below.
In the figure, (a) illustrates how the green pixel is interpolated at position g(0). The smoothing components of the interpolation filter operate on the green pixels, while the difference component operates on the red pixel. In (b), we see how denoising of the green pixel at g(0) is done. The smoothing components of the denoising filter operate on the green pixels, while the difference component operate on the red pixels. Similarly, when we are denoising a red pixel, as in (c), the smoothing components operate on the red pixels, while the difference component operates on the green pixels. Finally, since the denoising filters have 8 directions, they can operate along diagonals, which is shown in (d). When denoising green pixels along diagonals, we have only green pixels to deal with, and so there is no difference component. However, when we are denoising red or blue pixels, the smoothing component operates on the same channel while the difference component operates on the complimentary color channel.
Since the filters are directional, the origin of the filter is not in the center, but is located to one side. Obtaining the estimate is done by convolving the filter with the input pixels. That is, if is our noisy CFA data, then for each scale , we obtain the interpolation and denoising estimates and at each pixel position as follows:
Intersection of Confidence Intervals
The interpolation and denoising filters and were created for different scales . The goal of ICI was to choose the largest scale for which the local polynomial approximation is still valid. This was done as follows:
First, the standard deviations of the interpolation and denoising estimates were found using some estimate of the standard deviation of the noise. This was done as follows:
In the above formulas, is the estimated standard deviation of the noise. The paper gave two examples of signal-dependent noise models, and their corresponding standard deviation estimates:
- Poisson noise: , and
- Nonstationary Gaussian noise with signal-dependent standard deviation: , and
Next, for the sets of estimates and , they obtained sets of confidence intervals as follows:
where denotes the index of the scale that was used. Note that the threshold parameters are design parameters.
Finally, the adaptive scale for interpolation was found by finding the largest such that the intersection of confidence intervals is nonempty. Using this scale , the interpolation estimate would then be . Similarly, the adaptive scale for denoising was found by finding the largest such that the intersection of confidence intervals is nonempty. This results in the denoising estimate of .
Anisotropic Denoising and Interpolation
Combining the adaptive-scale kernels in all directions gives us an idea of the best local space for which the polynomial model fits the data. The figure below shows the combined denoising kernels for the red pixel in the center.
After applying the ICI criterion, they obtained adaptive-scale estimates from each direction . These directional denoised and interpolated estimates were then linearly combined to produce the final pixel value at a location . The actual formulas used to do this can be found in [1].
The block diagram of the joint denoising and demosaicking algorithm is shown below.
We see here that, using the pre-designed filter kernels, the denoising of the CFA values and the interpolation of green pixels at the red and blue locations happens simultaneously. Then, using the denoised Bayer data and the noisy, fully-interpolated green channel, the full-noise free green channel is produced, as well as the interpolated red and blue pixels at the blue and red locations, respectively. Finally, using the noise-free green channel, the red and blue pixels are interpolated at the green locations, producing the final, 3-channel image.
Noise Estimation
In the previous section, an estimate of the standard deviation of the noise was needed to properly perform demosaicking and denoising. Using the noise models that were used in [1] (and mentioned briefly in the previous section) gave pretty poor results. Thus, we wanted to use a more accurate noise model.
We used the noise estimation algorithm presented in the paper “Practical Poissonian-Gaussian noise modeling and fitting for single-image raw-data” by A. Foi, M. Trimeche, V. Katkovnik, and K. Egiazarian. This was the same noise-estimation algorithm that was used in #LPA-ICI-CFAI to obtain their results using cameraphone images.
In the paper by Foi et. al, they assumed that a noisy pixel observation at a location would be of the form
where is the true noise-free value, is zero-mean independent random noise whose standard deviation is 1, and is a function of y that gives the standard deviation of the overall amount of noise.
Furthermore, they assumed that the noise term was composed of a Poisson component , which models the photon noise of the sensor, and a Gaussian component , which models noise that is independent of the signal (eg. thermal noise). That is,
Using properties of the Poisson and Gaussian distributions, and some elementary algebra (see 3 for the complete derivation), they found that the overall standard deviation of the observed noisy signal would have the form
where and are parameters of the sensor.
The rest of the paper describes a method for estimating and using only a single image. First, they preprocess the data by converting it into the wavelet domain. This allows them to segment the data into non-overlapping regions (level sets), where the value for each set should be smooth. Next, for each of the sets, a (mean, standard deviation) pair is computed. This is done by straight computation of the sample mean and sample standard deviation. However, the paper also mentions a more robust alternative to computing for the standard deviation using the median of absolute deviations (MAD) method. Finally, the maximum-likelihood (ML) approach is used to fit a global model to the set of (mean, standard deviation) pairs.
The paper also adjusts the model for the clipped case; that is, when the noise might be causing the sensor values to go below the minimum or maximum values, and would thus be clipped to the minimum or maximum values, respectively. In such a case, clipping would cause the variance of the noise would be lower, something the model needs to account for.
The figures below show some examples of noise models that were estimated for different cameras with different ISO settings. The x-axis shows the intensity values, ranging from 0 to 1, and the y-axis shows the standard deviation. The red dots represent the sample standard deviations, obtained from each of the level sets, and the solid and dashed lines represent the fitted standard deviation functions using the sample standard deviation calculation and the MAD method, respectively. The falling off of the fitted function as the intensity values approach 0 or 1 are attributed to the effect of clipping. Moreover, the shift along the x-axis can be attributed to the image sensors having some "base" charge, even when there is no light incident on the sensor.
Adaptive Thresholding
In Denoising and Demosaicking, the thresholds and that were used in ICI were left as design parameters. These thresholds need to be chosen very carefully: if they are too large, then too much smoothing occurs. However, if they are too small, then the noise is not properly removed [1].
The paper's results were obtained by fixing these thresholds to and . However, we found that these values produced too much smoothing in the high-luminance images. This was probably because, for high-luminance images, the signal-to-noise ratio is relatively high, and thus less denoising is needed.
To remedy this behavior, we tried adjusting the thresholds based on the mean value of the input CFA image. Higher means would correspond to higher mean luminances, and consequently, lower thresholds. We fit an exponential function to our ideal threshold behavior, and used this to determine the values of and
Post-Filtering
After performing denoising and demosaicking, we noticed that there was still some noise in the final image, especially for lower luminance values. Thus, we wanted to add another filtering step that would hopefully get rid of this noise.
The filter we chose was the bilateral filter, proposed by C. Tomasi and R. Manduchi [5]. The filter performs smoothing on the image, while at the same time preserving edges. It does this by combining a domain filter, that is, a filter that measures the geometric distance between a pixel and its neighbors, with a range filter, which is a filter that measures the photometric distance between a pixel and its neighbors. The final effect causes a pixel to be replaced by an average of the pixels that are near it and similar to it.
Color Correction
The images had great color overall, and we weren't sure if we should attempt to do even better color correction, but we tried anyway. Originally, we tried various color consistency algorithms such as scaling by the maximum RGB values, the Gray World Algorithm, and the Gray Edge Algorithm. The algorithms all estimate scale values for the R, G, and B channels, and these scale values transform the image to fit their individual color model assumptions.
Max-RGB estimates the light source's color by taking the maximum RGB value in the image. The Gray World hypothesis assumes that the average reflectance is gray, or has no color. The Gray Edge hypothesis is similar to Gray World, but it assumes that the average edge difference is gray.
Out of the box, they all did not work well, as shown in the six images below (from left to right, top to bottom: original, no color correction, Gray World, max-RGB, Gray Edge, Gray Edge 2nd order). The max-RGB algorithm should only work if we have something white in the image. Gray World and Gray Edge fall apart when the image does not match their assumptions, which not all images do. However, the main problem overall seemed to be that the image colors weren't that bad in the first place and perhaps these algorithms weren't meant for fine tuning colors.
The next thing we tried was to assume different light sources. We copied a set of the ISET illuminants and ran the simple pipeline for each them. The differences are shown in the figures below. If we chose the correct illuminants, the SNR went up and S-CIELAB dropped. While most cameras have an option to choose your light setting, we thought that we might be able to use the max-RGB, Gray-World and Gray-Edge estimates to automatically choose between different light sources instead.
One thought that we had was that some of the algorithms may just be adjusting too much, but have a basic idea and direction. Our idea then was to transform the image for all light sources, and whichever one resulted in the least color correction was probably the best illuminant estimate. The method of measuring distance was by comparing the R, G, and B scaling values (wR, wG, wB) to the identity scale value of (1, 1, 1). Both of these were normalized to 1, and an angle was calculated between them. The hope was that the smaller the angle, which corresponded to the least amount of correction, the better the image was. Below are changes in angle vs S-CIELAB values with the best fit line in black. Both angle and S-CIELAB had their means subtracted out, and both start at 0.
The results weren't great, but one might be able to see a slight connection between growing angle and S-CIELAB scores. Studying the results further, we noticed that fluorescents followed a path of their own and removing them made the plot look cleaner. So instead of trying to decide between all illuminants, we tried just the D50, D55, D60, D75 lights. Their results are below and are a little better.
Results
We present the results we obtained using our chosen algorithm, together with all the other modifications that we made. In our discussion, we compare our results to the Simple Pipeline that we were given as reference. This simple pipeline has no denoising step, uses bilinear interpolation to perform demosaicking, and then uses the linear transformation that converts a Macbeth Color Checker in D65 illumination from the input color space to the output color space.
Out-of-the-Box Implementation
This section discusses the results we obtained when we plugged in the MATLAB code that implemented the LPA-ICI color filter array interpolation algorithm in [1].
We found that, in terms of mean S-CIELAB and mean SNR values, the out-of-the-box implementation only outperforms the simple pipeline in images with very low luminance levels. This was probably due to the fact that the noise models they assumed were very naive and inaccurate. In terms of the MCC metrics, the out-of-the-box implementation outperforms the simple pipeline until the mid-range luminance values.
Using Proper Noise Estimation
Our next step was to use a proper noise estimation algorithm. Using the MATLAB code that implemented the Poisson-Gaussian noise estimation algorithm in [], we were able to properly model our noise. The figure below shows the results from the preprocessing step of the algorithm, using image 3 of our given data at a mean luminance level of 2000 cd/sq m. This step removes the edges of the image, and then divides the smooth areas into non-overlapping level sets.
The algorithm produces the following standard deviation function to estimate the noise. The red dots are the sample standard deviation points obtained from the different level sets, and the black line represents the fitted standard deviation function. In this case, the estimated noise parameters were and . Also, notice that while the estimation function approaches 0 as the intensity approaches 0, it does not behave this way as the intensity approaches 1. This is due to the fact that our dataset only has clipping at the bottom, and so we estimated the noise model as such.
Plugging this segment into our imaging pipeline, we were able to obtain the following results.
File:Fig noiseest scielabs.png
Here, we see that, with proper noise estimation, the LPA-ICI CFAI algorithm outperforms the simple pipeline for the lower luminance images. However, beyond a certain luminance level, the results level off. This is probably due to the fact that, because our thresholds were fixed, oversmoothing happens for the higher luminance levels.
Varying the LPA-ICI Thresholds
Instead of using fixed thresholds, we wanted to adjust the threshold based on the mean intensity value of the raw CFA data. We chose to base the threshold on mean intensity value because the mean intensity value gives us an idea of how much illumination was present in the scene. For high illuminance values, we wanted the LPA-ICI CFAI algorithm to not perform as much smoothing.
To obtain the threshold parameter, we used the following formulas:
The formulas were obtained by calculating the optimal thresholds for the Macbeth Color Checker for all mean luminance levels. Then, we fit a function by hand to these sample points.
The results obtained using adaptive thresholds are shown below.
File:Fig varygamma scielabs.png
Using Color Consistency
Finally, we tried modifying the color transformation step. So far, we have been using the color transformation method of the Simple Pipeline, where a linear transform is calculated for the Macbeth Color Checker, assuming that the illuminant is D65. However, this may not always be the best estimate of the illuminant present in the scene. Using the method described in Color Consistency, we obtained the following results.
The results weren't amazing and a little unstable. We would have liked to get this working better. There's still a lot of stuff we don't understand, but it was a fun idea =)