Nitish Padmanaban and Paroma Varma
Introduction
Camera technology has been advancing at a fast pace, from giving everyone access to high-quality cameras in their smartphones to looking beyond 2D images and capturing depth information. However, one persistent problem remains: capturing high-quality images in low-light conditions. As we often see with images captured at night on our smartphones, they are usually poor quality and often corrupted by noise that looks like speckles. Specifically, these images are dominated by Poisson shot noise, which is inherent to the distribution of photons recorded by camera sensors. In daylight or bright-light settings, this photon noise is small relative to the number of photons captured and therefore not noticeable. However, as the photon count drops in dark settings, this noise begins to dominate image formation.
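To make the scaling concrete, here is a small Python sketch (the project code itself is in MATLAB) showing why shot noise dominates only at low photon counts: for a mean count N, the Poisson standard deviation is √N, so the relative noise falls off as 1/√N.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shot noise follows a Poisson distribution: for a mean photon count N,
# the standard deviation is sqrt(N), so relative noise is 1/sqrt(N).
for mean_photons in [10, 1000, 100000]:
    samples = rng.poisson(mean_photons, size=100000)
    relative_noise = samples.std() / samples.mean()
    print(f"mean={mean_photons:>6d}  relative noise ~ {relative_noise:.4f}")
```

At 10 photons per pixel (the peak count used for our dataset) the noise is on the order of 30% of the signal, versus well under 1% at daylight-level counts.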
Image denoising is a popular problem that has been studied extensively in the past as a signal denoising problem. However, one of the major drawbacks of these processes is that they focus almost solely on Gaussian noise in images. Gaussian noise is inherently different from Poisson noise in that it is easier to "average out". As described in the next section, there are some methods that look at removing non-Gaussian noise as well, but these methods are very complex and computationally expensive.
Our objective in this project is to find a simple and inexpensive method to denoise existing and new low-light images. We approach this problem from a machine learning perspective, specifically linear regression, learning a parameter that maps the noisy values of a patch in the image to its center pixel value. Once this parameter is learned, it is easy to apply it to denoise previously captured images. Moreover, since the approach denoises one patch at a time, it can be applied to images of varying sizes. With this formulation, the denoising relies only on vector-vector multiplications, which can be computed efficiently and quickly.
Background
The first attempts at denoising images began with applying techniques from signal processing such as Wiener filtering. However, this relied on the underlying image being smooth, which is not always true for captured images. A more advanced approach looked at transforming the image to the Fourier or Wavelet domain and removing noise by thresholding the coefficients. The two drawbacks of this approach in context of low light image denoising are that it is computationally expensive and it is targeted towards images with Gaussian, not Poisson, noise. A more recent approach applies independent component analysis (ICA) to images in order to denoise them. This procedure works well with non-Gaussian noise but requires multiple frames of the same scene to be able to denoise the image properly. For previously captured images, this is not a viable option.
Another approach uses a single iteration of the expectation-maximization (EM) algorithm to denoise images with high amounts of Gaussian noise. The advantage of this method is that it can adapt the generic prior it learns to the specific noisy image it is being applied to. This means that even if it hasn't seen a particular type of noise or image at "training" time, it can still perform a valid denoising of the image. The disadvantage, again, is that it is meant only for Gaussian noise. A separate deep learning approach uses Convolutional Neural Networks (CNNs) to denoise images once properly trained. Being a neural network approach, it requires a much larger amount of training data, and the training time is considerably higher.
Dataset
For the base images of our dataset, we start with the Berkeley Segmentation Data Set [1]. This dataset consists mostly of outdoor images, but includes people, nature and monuments. This ensured that we had a good variety of images to learn from and the learned parameter was not biased towards certain colors or spatial frequencies.
We processed these daytime images to simulate nighttime low-light captures of the same scenes using the Image Systems Engineering Toolbox. The data set does not provide camera metadata, so we convert each RGB image to a scene using our monitors (Apple LCD) as the display calibration SPDs. Then we use sceneAdjustIlluminant to normalize the illuminant in all the images to D65. Next, we set the peak photon count in the images to 10 photons, because this visually created a reasonable amount of Poisson noise in testing. This is done using the sceneAdjustLuminance function and by querying sceneGet for the photon count. Next, we use scenePhotonNoise to change the photon counts in the scene according to the Poisson distribution, modifying the scenePhotonNoise function to remove its Gaussian approximation. Lastly, when saving the scene as an RGB image using sceneSaveImage, we modify the code so that it produces a dark RGB output (instead of the default, which rescales the XYZ values so that the RGB image is bright). Throughout, we assume that daylight is 6500K blackbody radiation. We justify this choice because it matches a sunny day well, and images taken in outdoor night settings can look almost exactly like daylight photos with a long exposure, which informs our decision to create low-light images by simply scaling down the photon count.
The above procedure details how to create the noisy training examples. The noise-free values corresponding to these will be generated in much the same way, but excluding the Poisson noise addition step. These noise-free values are used as ground truth for the linear regression algorithm.
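The core of this pipeline can be approximated in a few lines of numpy for illustration (this sketch skips the ISET display-calibration and illuminant-normalization steps, and the bright_reference count is a made-up stand-in for a daylight photon level, not a value from the project):

```python
import numpy as np

def simulate_low_light(clean_rgb, peak_photons=10, seed=0):
    """Approximate the dataset-generation pipeline with plain numpy.

    clean_rgb: float array in [0, 1], H x W x 3. This stands in for the
    ISET scene photon data; the real pipeline converts RGB to scene
    radiance and normalizes the illuminant to D65 first.
    """
    rng = np.random.default_rng(seed)
    photons = clean_rgb * peak_photons          # expected photon counts
    noisy = rng.poisson(photons).astype(float)  # Poisson shot noise
    # Keep the dark appearance: normalize both by a bright-image
    # reference count rather than stretching to full range.
    bright_reference = 1000.0  # hypothetical daylight peak count
    return noisy / bright_reference, photons / bright_reference
```

The second return value plays the role of the noise-free ground truth: identical scaling, but with the Poisson sampling step skipped.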



Methods
All the methods described below were implemented in MATLAB.
Baselines
We wanted some computationally efficient standard algorithms to compare our approaches against as baselines, to see if our approach was worth pursuing.
Gaussian Smoothing
We chose the first baseline to be simple smoothing with a Gaussian kernel. The parameters used for the smoothing were a 13x13 window (to match the regression algorithm) and a σ of 0.95. The σ was chosen as a reasonable compromise between blur and noise removal. In this case, some of the chromatic components of the noise are still noticeable, and the edges (i.e., areas in the image where there is a sharp transition in the color values) are slightly blurred, suggesting that this is a reasonable choice in the tradeoff.
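A minimal Python equivalent of this baseline (our implementation was in MATLAB), smoothing each channel with a Gaussian kernel truncated to the 13x13 window:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_denoise(img, sigma=0.95, window=13):
    """Baseline: per-channel Gaussian smoothing with a 13x13 kernel
    and sigma 0.95, matching the parameters in the text."""
    radius = window // 2
    out = np.empty_like(img, dtype=float)
    for c in range(img.shape[2]):  # smooth R, G, B independently
        out[..., c] = gaussian_filter(img[..., c], sigma=sigma,
                                      truncate=radius / sigma)
    return out
```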
Bilateral Filter
As our second baseline, we use a bilateral filter. Since bilateral filters add an additional weighting factor based on the relative pixel values, they preserve edges better than simple smoothing. We chose this weighting factor to visually retain a similar amount of noise reduction in the areas of more uniform color as the Gaussian smoothing, while better preserving some edges. The chosen weighting here was a Gaussian with a σ of 0.1 (for RGB values from 0 to 1); the window remains 13x13, and the spatial σ remains 0.95.
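For reference, a brute-force Python sketch of a bilateral filter with the stated parameters (the choice to compute the range weight on the full RGB difference at each pixel is ours for this sketch; it is not specified in the text):

```python
import numpy as np

def bilateral_denoise(img, window=13, sigma_s=0.95, sigma_r=0.1):
    """Bilateral filter: 13x13 window, spatial sigma 0.95, range
    sigma 0.1 for RGB in [0, 1]. Brute force, so only practical
    for small images; vectorized or library versions are faster."""
    r = window // 2
    H, W, C = img.shape
    pad = np.pad(img, ((r, r), (r, r), (0, 0)), mode='reflect')
    # Precompute the spatial Gaussian weights for the window.
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    w_s = np.exp(-(yy**2 + xx**2) / (2 * sigma_s**2))
    out = np.empty_like(img, dtype=float)
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + window, j:j + window]
            diff = patch - img[i, j]
            # Range weight from the RGB difference to the center pixel.
            w_r = np.exp(-(diff**2).sum(axis=2) / (2 * sigma_r**2))
            w = w_s * w_r
            out[i, j] = (w[..., None] * patch).sum(axis=(0, 1)) / w.sum()
    return out
```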
CIELAB space approach
Since colors are perceptually linear in the Lab color space, we wanted to see if these simple baseline techniques would be improved by applying them to Lab values instead of RGB values. The conversion to Lab was done using ieXYZ2LAB and srgb2xyz functions, with the D65 whitepoint, but darkened to match how we generated the XYZ values for the images.
Gaussian Smoothing
After Lab conversion, we applied the same 13x13 window and σ of 0.95 as with the baseline smoothing, then converted back to RGB.
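Sketched in Python, the Lab-space variant simply wraps the same smoothing between a pair of color-space conversions (in our MATLAB code these were srgb2xyz plus ieXYZ2LAB and their inverses; skimage.color.rgb2lab/lab2rgb would be the usual Python stand-ins). The conversions are passed in as functions here to keep the sketch self-contained:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_in_space(img, to_space, from_space, sigma=0.95, window=13):
    """Gaussian smoothing applied in an alternate color space.
    `to_space`/`from_space` would be the Lab conversions; passing
    identity functions reduces this to the RGB baseline."""
    converted = to_space(img)
    radius = window // 2
    out = np.empty_like(converted, dtype=float)
    for c in range(converted.shape[2]):
        out[..., c] = gaussian_filter(converted[..., c], sigma=sigma,
                                      truncate=radius / sigma)
    return from_space(out)
```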
Bilateral Filter
The bilateral filter was implemented with the same 13x13 window and spatial σ of 0.95, but the σ for the spread of the Lab values is now 30 for the "L" and "b" channels, and 100 for the "a" channel. The "L" and "b" parameters were chosen to balance blur and denoising, but the output image wasn't as sensitive to these values as to the "a" channel. If the "a" channel σ was too small (less blurring), the image was dominated by noise in the form of red and green dots, but if it was too big (more blurring), this manifested as bright red or green regions at many of the edges. As there is still remaining red and green noise, there does not seem to be a perfect tradeoff in this space either.
Linear Regression
We chose linear regression to denoise low-light images, predicting the RGB values of the "denoised" image from the RGB values of other pixels in the noisy image. We minimize the least-squares cost J(θ) = (1/2) Σ_i ||θᵀx(i) − y(i)||² and learn the parameter θ via batch gradient descent. Specifically, we learn this denoising parameter per patch; that is, one example consists of the pixel values in a 13x13 patch, which are used to predict the value of the center pixel/patch. This approach is supported by the assumption that a pixel's color is locally dependent on (highly related to) the colors of the pixels around it.

Our features for a single pixel/patch are the following:
- Raw RGB values of the 13x13 surrounding patch
- Vertical RGB differences across 13x13 patch (resulting in a 13x12 patch)
- Horizontal RGB differences across 13x13 patch (resulting in a 12x13 patch)
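These features can be assembled as follows (a Python sketch of the MATLAB feature construction):

```python
import numpy as np

def patch_features(patch):
    """Build the feature vector from a 13x13x3 RGB patch:
    raw values (13*13*3 = 507), vertical differences (12*13*3 = 468),
    and horizontal differences (13*12*3 = 468), for 1443 total."""
    raw = patch.reshape(-1)
    vert = (patch[1:, :, :] - patch[:-1, :, :]).reshape(-1)
    horiz = (patch[:, 1:, :] - patch[:, :-1, :]).reshape(-1)
    return np.concatenate([raw, vert, horiz])
```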
1 Pixel Interpolation
Our first attempt focused on using the features from a 13x13 patch to predict the RGB values of the center pixel of the patch. In this case, the dimension of the feature vector x(i) is (13x13 + 12x13 + 13x12) x 3 = 1443, and the dimension of the output vector y(i) is 3. The θ that is learned can be visualized as shown in Fig. 1. We can see that the raw pixel values of a specific color channel are highly correlated with values of the same color channel for neighboring pixels. The next two rows, which represent the pixel difference features, show something interesting: each color channel's filter does depend on information from the other two colors. This suggests that training for one RGB pixel instead of each color channel separately will lead to an improvement in denoising compared to learning a separate parameter per color channel. Moreover, this also suggests that the cross-talk in the color channels for pixel differences can be a factor in helping preserve edges in the images.
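A Python sketch of the batch gradient descent used to fit θ (the learning rate and iteration count below are placeholder values for illustration, not the project's settings):

```python
import numpy as np

def train_theta(X, Y, lr=0.01, iters=2000):
    """Batch gradient descent on the least-squares cost
    J(theta) = 0.5 * sum_i ||theta^T x(i) - y(i)||^2.
    X: m x n feature matrix (n = 1443 in the project);
    Y: m x k targets (k = 3 for the center pixel).
    lr and iters are illustrative and must be tuned to the data scale."""
    theta = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(iters):
        grad = X.T @ (X @ theta - Y)  # full-batch gradient of J
        theta -= lr * grad
    return theta
```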

3x3 Pixels Interpolation
We also tried a slightly edited model of the above: instead of predicting the RGB values of a single center pixel, we predicted the RGB values of the center 3x3 patch. This means the dimension of the output vector y(i) is now (3x3)x3 = 27. We learned another θ parameter, but it was difficult to visualize because of its dimensions. A comparison of the two approaches is provided in the Results section.
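Applying a learned θ to a full image then amounts to sliding the 13x13 input window in steps of the 3x3 output patch (a Python sketch; it assumes the image height and width are multiples of 3, and inlines the feature construction):

```python
import numpy as np

def denoise_image(noisy, theta, in_size=13, out_size=3):
    """Denoise an H x W x 3 image one 3x3 output patch at a time,
    predicting each from the surrounding 13x13 input patch.
    theta: 1443 x 27 for the 3x3 variant (1443 x 3 for 1x1)."""
    def features(patch):  # raw values plus vertical/horizontal diffs
        return np.concatenate([
            patch.reshape(-1),
            (patch[1:] - patch[:-1]).reshape(-1),
            (patch[:, 1:] - patch[:, :-1]).reshape(-1)])
    r = (in_size - out_size) // 2
    pad = np.pad(noisy, ((r, r), (r, r), (0, 0)), mode='reflect')
    out = np.empty_like(noisy, dtype=float)
    H, W, _ = noisy.shape
    for i in range(0, H, out_size):
        for j in range(0, W, out_size):
            x = features(pad[i:i + in_size, j:j + in_size])
            out[i:i + out_size, j:j + out_size] = \
                (x @ theta).reshape(out_size, out_size, 3)
    return out
```

Each output patch costs only one vector-matrix product, which is what keeps the method computationally inexpensive at inference time.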
Evaluation
We compared the denoised images using two separate metrics. First, the PSNR (peak signal-to-noise ratio) relative to a noise-free version, since it is a commonly used standard for comparing images. PSNR depends only on the ratio of the maximum pixel value to the mean squared error (MSE). The images in the dataset were all normalized to similar brightnesses, and the regression algorithms optimize for MSE, so PSNR serves as a test of whether the learning algorithms work properly, as well as of image quality.
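PSNR follows directly from the MSE (a short sketch; defaulting the peak to the clean image's maximum is one common convention, and our normalization choice is an assumption here):

```python
import numpy as np

def psnr(denoised, clean, peak=None):
    """PSNR in dB against the noise-free image: 10*log10(peak^2 / MSE).
    `peak` defaults to the clean image's maximum pixel value."""
    mse = np.mean((denoised - clean) ** 2)
    if peak is None:
        peak = clean.max()
    return 10 * np.log10(peak ** 2 / mse)
```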
For a metric not dependent on our optimized parameter, and one that is perceptually more relevant, we adapt the xyz2vSNR function to operate on the difference between a pair of images instead of a single image. This is done by taking the absolute difference between the two images in question. The white point was set to the darkened D65 white point as before. The default spatial parameters are calculated for viewing an image on a monitor from a distance of 50cm. This turns out to roughly match our computers as well (with a viewing distance of 54cm), and so these were left at their default values.
Results


Quantitative Results
First, we looked at a numerical comparison of how the different methods performed on images in the test set. The two metrics we used were the peak signal-to-noise ratio (PSNR) and visual signal-to-noise ratio (VSNR). Note that PSNR compares two images, while VSNR is set up to look at the noise content of only a single image. To account for this difference, we looked at the difference in VSNR between the denoised and original clean image, which led to very small values. This can be interpreted as how much the VSNR improved after the denoising techniques were applied.
Note that the numbers are the averages across entire images in the test set, while the linear regression algorithm was performed over individual patches in the image. Higher PSNR corresponds to a lower MSE, which is what the machine learning algorithm was optimized for. Therefore, as expected, the linear regression performs the best in terms of high PSNR. Moreover, even in terms of VSNR improvement, the algorithms perform the same relative to each other as with PSNR. This means that even though we did not optimize for VSNR, our algorithm performs well even in terms of that metric.
Another interesting observation is that the noisy image has more of a VSNR improvement than the denoised methods. This could mean that the artifacts the denoising methods introduce are more noticeable in the dark images than the noise, therefore leading to a better VSNR value for the noisy dark image than the denoised dark image.

Qualitative Results
We compare the performance of the 5 methods qualitatively by the following criteria: spatial blur, chromatic blur (i.e. blur that keeps sharp edges in brightness, but bleeds color), residual noise in intensity (i.e. noise in the brightness values), and residual chromatic noise (i.e. noise only in the color not brightness). We look at sections of a particular image to see how each of the algorithms we tested perform differently in terms of denoising the images.
Looking at spatial blur, it is clear, especially from the white nailhead in patch C, that the linear regression does well, likely due to the inclusion of the pixel differences. The averaging approaches are the worst, though the bilateral filter could be tweaked to improve this at the expense of more residual noise. Next, chromatic blurring occurs with regression. Looking at the netting in patch C or the red and silver areas of patch A, one can see that the green of the grass in the background bleeds into the brown of the wood, the red becomes more black, and the silver more red at the edges.
The 3x3 performs better than the 1x1, validating our reasoning in using a larger output patch for the regression. The standard approaches don't suffer this issue. Similarly, the standard approaches don't have much in the way of noticeable noise in their intensities whereas the machine learning approaches do. Finally, comparing the residual chromatic noise, the situation flips, with the machine learning approaches having nothing noticeable, whereas the standard approaches still have speckles of color.
Conclusions
The best approaches of the ones we tried were using a bilateral filter in the Lab color space, as well as the 3x3 linear regression. The two methods have almost complementary flaws, which is an interesting coincidence. On the other hand, smoothing in Lab space performed almost identically to the RGB smoothing. This is probably because both spaces are perceptually linear in some way (though Lab has other perceptually relevant features), and so blurring in either perceptually linear space looks the same.
Considering the Lab space bilateral filter, we see from the sample patches that this is actually the best of our methods in terms of retaining the sharpness of the image. This is likely due to the fact that the sharp edges are well represented as luminance changes, and since the noise was generated independently for the channels in our dataset, the averaging across the channels when computing luminance means less of the noise is present in the luminance channel, and we can force the bilateral filter to more strongly prefer nearer luminance values without keeping as much noise. Of course, our model is slightly incorrect in this regard, since the Bayer filter pattern and interpolation to find values means that the noise will not be as independent. Also, while the Lab bilateral filter does the best with sharp edges, it seems to be the worst in terms of remaining chromatic elements of the noise, so finding some way to improve this would be useful – tweaking the 'a' and 'b' channels didn't seem to help as much as we'd liked.
The 3x3 linear regression on the other hand completely removes any of the chromatic elements of the noise, with seemingly only luminance components of noise left over. This makes sense since it averages over all three of the channels to an extent, while preferentially using the channel corresponding to the output channel. It also performs well at edges thanks to the spatial information it receives from the vertical and horizontal differences, so these were good features to select. The 3x3 approach also does better than, though not perfectly, with a problem that the 1x1 shows especially clearly, which is color "bleeding" across edges to other regions. The fact that the 3x3 approach has to optimize for learning sharp color transitions within the 3x3 output area helps improve the output over the 1x1, which has no spatial constraints at the output. At the cost of efficiency, this may be further improved by increasing the output patch size. It would also be informative to investigate the input patch size's effect. Our visualized learning parameter seemed to indicate that the weighted average was largely 0 far from the center of the 13x13 input patch, but with larger output patches to reduce color bleed artifacts, this may change.
Combining the strengths of these two approaches may be a possible approach to try by adding Lab values and differences as features to the machine learning algorithm to learn a better linear regression.
References
Farrell, Joyce, Okincha, Mike, Parmar, Manu, and Wandell, Brian. Using visible SNR (vSNR) to compare the image quality of pixel binning and digital resizing. In IS&T/SPIE Electronic Imaging, pp. 75370C–75370C. International Society for Optics and Photonics, 2010.
Hunt, R.W.G. The Reproduction of Colour. The Wiley-IS&T Series in Imaging Science and Technology. Wiley, 2005. ISBN 9780470024263. [2].
Hyvärinen, Aapo, Hoyer, Patrik, and Oja, Erkki. Image denoising by sparse code shrinkage. In Intelligent Signal Processing. Citeseer, 1999.
Jin, Fu, Fieguth, Paul, Winger, Lowell, and Jernigan, Edward. Adaptive Wiener filtering of noisy images and image sequences. In Image Processing, 2003. ICIP 2003. Proceedings. 2003 International Conference on, volume 3, pp. III–349. IEEE, 2003.
Kingsbury, Nick. Complex wavelets for shift invariant analysis and filtering of signals. Applied and computational harmonic analysis, 10(3):234–253, 2001.
Luo, Enming, Chan, Stanley H, and Nguyen, Truong Q. Adaptive image denoising by mixture adaptation. arXiv preprint arXiv:1601.04770, 2016.
Martin, D., Fowlkes, C., Tal, D., and Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th Int’l Conf. Computer Vision, volume 2, pp. 416–423, July 2001.
Ren, Jimmy SJ and Xu, Li. On vectorization of deep convolutional neural networks for vision tasks. arXiv preprint arXiv:1501.07338, 2015.
Wandell, Brian and Farrell, Joyce. Image Systems Engineering Toolbox (ISET) [3].
Appendix I
The code and datasets used for this project are available here
Appendix II
Nitish Padmanaban: Conversion of dataset to dark and noisy, dark images. Baseline and Lab space methods (RGB and Lab Gaussian Smoothing, RGB and Lab Bilateral Filters).
Paroma Varma: Background Research, Linear regression methods (1x1 and 3x3 versions).