Image Upsampling using L3


Introduction

Image resolution enhancement is an important problem with many practical applications. It allows images to be stored and transmitted at low resolution and reconstructed at high resolution, enabling efficient compression for storage and network transfer. It also makes it possible to enhance an image captured with a lower-resolution camera before displaying it on a larger or higher-resolution screen. With today's ubiquitous cellphone cameras and high-resolution displays, a fast and accurate technique for upsampling a low-resolution image has become quite important.

L3 Method

L3 stands for Linear, Local, and Learned [1]. In a real-world image there is usually a strong correlation between neighboring pixels. L3 exploits this correlation to generate a higher-resolution image from the lower-resolution pixel data, using machine learning to learn this dependence efficiently from the data. L3 consists of two steps: rendering and learning. The rendering step adaptively selects from a stored table of linear transforms to convert the low-resolution pixel data into higher-resolution image pixels. The learning step learns and stores the linear transforms used in the rendering step. The algorithm is illustrated in the figure below [1].

Rendering

In the rendering step, a patch $x$ is selected, centered on a pixel in the low-resolution sensor data. We then classify the pixel into one of the predefined classes $c$, based on its mean intensity level, pixel color, and contrast. Finally, we apply the appropriate linear transform $W_{c,o}$ for class $c$ and output channel $o$ to get the rendered output $y_o = x\,W_{c,o}$.

The computation is repeated independently for each pixel in the low-resolution sensor image. The figure below shows the output of the rendering step for a Bayer-pattern sensor array input. The large RGGB (red, green, green, blue) pixels in the background correspond to the low-resolution sensor data. For each of these input pixels, there are four output pixels (R, G, G, B), shown by the smaller squares in the foreground.
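The per-pixel computation can be summarized in a short sketch. The following MATLAB fragment is illustrative rather than the actual ISET L3 code: the patch size, the bias term, and the classifyPixel helper are assumptions made for the sketch.

 % Illustrative sketch of the rendering loop (not the actual ISET L3 code).
 % Assumes: volts is the low-resolution sensor voltage image; kernels is a
 % cell array with kernels{k} of size (patchSize^2 + 1) x nOut for class k.
 patchSize = 5;                       % assumed patch width/height
 pad  = floor(patchSize/2);
 vPad = padarray(volts, [pad pad], 'replicate');  % Image Processing Toolbox
 [rows, cols] = size(volts);
 nOut = size(kernels{1}, 2);          % output values per input pixel
 out  = zeros(rows, cols, nOut);
 for r = 1:rows
     for c = 1:cols
         % Extract the patch centered on pixel (r, c) as a row vector.
         patch = vPad(r:r+patchSize-1, c:c+patchSize-1);
         x = [1, patch(:)'];          % leading 1 adds a bias term (assumed)
         % Pick a class from mean level, CFA color, and contrast.
         k = classifyPixel(patch, r, c);   % hypothetical helper
         % Apply the stored linear transform for that class.
         out(r, c, :) = x * kernels{k};
     end
 end

Each input pixel thus costs one patch extraction, one class lookup, and one small matrix product, which is why rendering is fast once the kernels are learned.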

Learning

We use the Image Systems Engineering Toolbox (ISET) to produce the training data for our machine-learning model. Using ISET, we generate the low- and high-resolution sensor data and images for the scenes in the training dataset. The purpose of the training step is to learn the transforms for the various classes and output channels. First, we classify all the input pixels into their respective classes; then we compute the transforms for each class independently to minimize a predefined loss function (error) $E$ between the target image and the transformed sensor data.
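As a rough illustration of this data-generation step, the sketch below builds a matched low/high resolution sensor pair in ISET (ISETCam). The function names exist in ISETCam, but the scene, the sensor sizes, and the pixel-size parameter string are assumptions and may differ from what the project actually used.

 % Hedged sketch of generating a paired low/high resolution training sample.
 scene = sceneCreate('macbeth d65');       % stand-in for a training scene
 oi    = oiCompute(oiCreate, scene);       % optical image of the scene
 % Low-resolution sensor.
 sensorLow = sensorCreate('bayer (rggb)');
 sensorLow = sensorSet(sensorLow, 'size', [128 128]);   % assumed size
 sensorLow = sensorCompute(sensorLow, oi);
 voltsLow  = sensorGet(sensorLow, 'volts');
 % High-resolution target: same field of view, half the pixel pitch,
 % hence twice the rows and columns (the 2x upsampling target).
 pixSize    = sensorGet(sensorLow, 'pixel size');
 sensorHigh = sensorSet(sensorLow, 'pixel size constant fill factor', pixSize/2);
 sensorHigh = sensorSet(sensorHigh, 'size', [256 256]);
 sensorHigh = sensorCompute(sensorHigh, oi);
 voltsHigh  = sensorGet(sensorHigh, 'volts');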

Here, $x_i$ is a $d$-dimensional row vector containing the patch data from the sensor for a pixel belonging to class $c$, and $y_i$ is a row vector containing the corresponding output values in the target upsampled image. Let $X$ and $Y$ be the matrices obtained by stacking the corresponding $x_i$ and $y_i$ rows. We define the loss function using a regularized RMSE (root mean square error) as follows:

$$E = \lVert X W - Y \rVert_2^2 + \lambda \lVert W \rVert_2^2,$$

where $\lambda$ is a parameter used to regularize the kernel coefficients and avoid noise magnification. The above error is minimized by the closed-form expression

$$W = \left( X^\top X + \lambda I \right)^{-1} X^\top Y.$$
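In code, the per-class fit is only a few lines of MATLAB; the variable names below are illustrative.

 % Minimal sketch of the per-class kernel fit (ridge regression).
 % X: nSamples x d patch matrix for one class; Y: nSamples x nOut targets;
 % lambda: regularization parameter.
 d = size(X, 2);
 W = (X' * X + lambda * eye(d)) \ (X' * Y);  % the closed-form solution above
 % Rendering for this class then reduces to a matrix product:
 Yhat = X * W;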

Results

We use the following two approaches, both based on the L3 technique described above, to enhance the spatial resolution of an input image.

Approach 1

Here, the rendering step takes the low-resolution sensor data and produces sensor data corresponding to a higher spatial resolution. Then, we use the standard ISET image-processing functionality to convert the upsampled sensor data to the target color space. For illustration, we quadruple the total number of pixels, i.e., the upsampled image has twice the number of rows and columns. We use a fixed-size patch, with classes corresponding to 10 linearly spaced illuminant levels and the 4 color filters in the Bayer array. The algorithm is trained on 5 scenes containing different faces. The figure below shows the upsampled image produced by the trained algorithm on a test scene (not present in the training set). The left panel shows the low-resolution image, the middle panel the rendered upsampled image, and the right panel the target higher-resolution image.

The CIELAB error between the target and rendered upsampled images is plotted below. As expected, the error is larger in regions containing higher spatial variation (e.g., near the flowers on the right). Overall, the rendered image is an improvement over the lower-resolution image.
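A ΔE map of this kind can be computed directly from the target and rendered images once both are in XYZ; the sketch below uses MATLAB's Image Processing Toolbox function xyz2lab, and the variable names are placeholders.

 % Per-pixel CIELAB Delta E between target and rendered XYZ images.
 labTarget   = xyz2lab(xyzTarget);      % rows x cols x 3
 labRendered = xyz2lab(xyzRendered);
 dE = sqrt(sum((labTarget - labRendered).^2, 3));
 imagesc(dE); axis image; colorbar;     % spatial error map
 title(sprintf('Mean \\DeltaE = %.2f', mean(dE(:))));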

A few of the learned kernels are shown in the figure below. The two kernels take an input patch centered on a red pixel (right panel) and produce a corresponding pixel in the red channel. The left panel shows the kernel weights for the lowest-intensity class and the middle panel shows the kernel weights for a higher-intensity class. As can be seen from the plots, the weights vary significantly with the illumination level.
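Plots like these can be produced by reshaping a learned weight column back into the patch layout, as in this hedged sketch (it assumes the kernel storage layout used in the rendering sketch above).

 % Visualize the learned kernel for class k, output channel 1.
 patchSize = 5;                          % assumed patch width/height
 w = kernels{k}(2:end, 1);               % drop the assumed bias row
 imagesc(reshape(w, patchSize, patchSize));
 axis image; colorbar;
 title(sprintf('Class %d kernel, channel 1', k));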

Approach 2

In this approach, we extend the existing ISET L3 package to include upsampling. Here, the rendering step takes the low-resolution sensor data and directly produces the pixel values in the target color space for the higher-resolution image. Unlike the previous approach, image processing is included in the rendering step, i.e., the final pixel values are generated directly from the low-resolution sensor data. This is done by training kernels that multiply the low-resolution sensor data patches to yield the RGB values of a set of pixels in the target image, instead of the RGB value of just the center pixel. A new class, l3DataUpsample(), is introduced with a property upsample, the factor by which the image is upsampled. A new method, l3DataGetUpsampling(), is introduced to obtain high-resolution sensor data from the input scenes. Further, l3ClassifyFast() is modified to provide the correct target data patches when the upsampling factor is not 1.
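A driver for the extended package might look like the sketch below. Only the names l3DataUpsample, l3DataGetUpsampling, and l3ClassifyFast come from the description above; every signature, property access, and return value shown here is an assumption, not the package's documented API.

 % Hypothetical sketch of driving the upsampling extension.
 l3d = l3DataUpsample();                % data object described above
 l3d.upsample = 2;                      % assumed property: upsampling factor
 % Gather low-resolution sensor patches and target color-space values
 % (assumed signature for the new method described above).
 [sensorData, targetData] = l3DataGetUpsampling(l3d);
 % Classify patches so each class can be fit independently, as in the
 % Learning step (assumed signature for the modified classifier).
 classes = l3ClassifyFast(l3d, sensorData);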

When upsample = 2, as in the previous approach, the upsampled image has twice the dimensions of the input image. We use a fixed-size patch, with classes corresponding to 20 illuminant levels, 2 contrast levels, and the 4 color filters in the Bayer array. The algorithm is trained on 5 scenes at 3 exposure times each (15 input images). The figure below shows the upsampled image produced by the trained algorithm on a test scene (not present in the training set). The left panel shows the low-resolution image, the middle panel the rendered upsampled image, and the right panel the target higher-resolution image.

The CIELAB error between the target and rendered upsampled images is plotted below. Again, there is considerable error in regions containing higher spatial variation (e.g., near the flower). This approach performs much better than the previous one, which is expected, since it uses more classes and also includes the image-processing step.

Conclusions & Future Work

L3 is a powerful tool for efficiently performing various image-processing tasks, such as the image resolution enhancement illustrated in this project. We adopted two different approaches to the upsampling problem. In the first approach, the low-resolution sensor data is fed into the algorithm, which then generates high-spatial-resolution sensor data. In the second approach, the algorithm directly outputs the high-resolution image data (XYZ), with the sensor data as input. The second approach needs no further image-processing step and appears to perform better than the first.

Once training is done, the inference (rendering) step is quite fast and could be efficiently parallelized on a GPU. To explore further possibilities with such a scheme, deep neural network architectures could be employed. In such a deep-learning scheme there is no need to predefine the classes, since the network can learn the relevant features itself much more efficiently.

Appendix

Media:Approach_1.zip: MATLAB script files, together with training and test scene data, for the training and rendering steps in Approach 1.

References

1. Haomiao Jiang, Qiyuan Tian, Joyce Farrell, and Brian A. Wandell. "Learning the image processing pipeline." IEEE Transactions on Image Processing 26, no. 10 (2017): 5032-5042.

2. Yaniv Romano, John Isidoro, and Peyman Milanfar. "RAISR: rapid and accurate image super resolution." IEEE Transactions on Computational Imaging 3, no. 1 (2017): 110-125.