Image Upsampling using L3
Introduction
Image resolution enhancement is an important problem with many practical applications. It enables efficient image compression: a low resolution image can be stored or transmitted over a network and then upsampled for display. It also offers the possibility of enhancing an image captured with a lower resolution camera before displaying it on a larger or higher resolution screen. With cellphone cameras and high resolution displays now ubiquitous, a fast and accurate technique for upsampling a low resolution image has become quite important.
Background
L3 stands for Linear, Local, and Learned. In a real world image there is usually a strong correlation between neighboring pixels. L3 exploits this correlation to generate a higher resolution image from the low resolution pixel data, using machine learning to efficiently capture the dependence present in the data. L3 consists of two steps: Rendering and Learning. The rendering step adaptively selects from a stored table of linear transforms to convert the low resolution pixel data into higher resolution image pixels. The learning step computes and stores the linear transforms used in the rendering step. The algorithm is illustrated in the figure shown below.
Rendering
In the rendering step, a patch $p$ is selected, centered on a pixel in the low resolution sensor data. We then classify the pixel into one of a set of predefined classes $c$, based on the mean intensity level, pixel color, and contrast of the patch. Finally, we apply the appropriate linear transform $T_{c,o}$ for the class $c$ and output channel $o$ to get the rendered output $y_o = p \, T_{c,o}$.
The computation is repeated independently for each pixel in the low resolution sensor image. The figure below shows the output of the rendering step for a Bayer pattern sensor array input: the large RGGB (red, green, green, blue) pixels in the background correspond to the low resolution sensor data, and for each of these four input pixels there are four output pixels (R, G, G, B), shown by the smaller squares in the foreground.
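The per-pixel rendering loop can be sketched in Python as below. This is a minimal illustration, not the original ISET/L3 implementation: the function names, the intensity and contrast thresholds in `classify`, and the patch size are all assumptions made for the example.

```python
import numpy as np

def classify(patch, mean_edges=(0.25, 0.5, 0.75), contrast_thresh=0.1):
    """Assign a patch to a class from its mean intensity level and contrast.

    Thresholds are illustrative; the real L3 classifier also uses pixel color.
    """
    level = int(np.searchsorted(mean_edges, patch.mean()))  # intensity bin 0..3
    flat = patch.std() < contrast_thresh                    # low- vs high-contrast
    return 2 * level + int(flat)                            # 8 classes total

def render(sensor, transforms, patch_size=5, n_out=4):
    """Apply the class-specific linear transform at every sensor pixel.

    `transforms[c]` is a (patch_size**2, n_out) matrix for class c;
    `n_out=4` matches the four output pixels (R, G, G, B) per input pixel.
    """
    r = patch_size // 2
    padded = np.pad(sensor, r, mode="reflect")  # handle image borders
    h, w = sensor.shape
    out = np.zeros((h, w, n_out))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + patch_size, j:j + patch_size].ravel()
            out[i, j] = patch @ transforms[classify(patch)]
    return out
```

Because each output pixel depends only on its local patch and a table lookup, the loop is embarrassingly parallel, which is what makes the rendering step fast in practice.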
Learning
We use the Image Systems Engineering Toolbox (ISET) to produce the training data for our machine learning model. Using ISET, we generate the low and high resolution sensor data and images for the scenes in the training dataset. The purpose of the learning step is to learn the transforms for the various classes and output channels. First, we classify all the input pixels into their respective classes; we then compute the transform for each class independently to minimize a predefined loss function (error) $E$ between the target image and the transformed sensor data.
Here, $x$ is a $d$-dimensional row vector containing the patch data from the sensor belonging to class $c$, and $y$ is a row vector containing the corresponding output values in the target upsampled image. Let $X$ and $Y$ be the matrices obtained by stacking the corresponding $x$ and $y$ data, and let $W_c$ be the linear transform for the class. Now, we define the loss function using a regularized RMSE (root mean square error) as follows:

$$E = \| X W_c - Y \|_F^2 + \lambda \| W_c \|_F^2,$$

where $\lambda$ is a parameter used to regularize the kernel coefficients and avoid noise magnification. This error is minimized by the closed form expression

$$W_c = (X^T X + \lambda I)^{-1} X^T Y.$$
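The per-class closed-form solve (ridge regression) can be sketched in Python as follows. This is a minimal illustration, not the original ISET/L3 code; the function name and the default value of the regularization parameter `lam` are assumptions.

```python
import numpy as np

def learn_transform(X, Y, lam=1e-3):
    """Return the transform W minimizing ||X W - Y||^2 + lam * ||W||^2.

    X: (n_patches, d) stacked patch vectors for one class.
    Y: (n_patches, n_out) corresponding target pixel values.
    lam: regularization weight controlling noise magnification.
    """
    d = X.shape[1]
    # Solve (X^T X + lam I) W = X^T Y rather than forming the inverse explicitly.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
```

Using `np.linalg.solve` instead of computing $(X^T X + \lambda I)^{-1}$ directly is numerically more stable and cheaper; the regularizer also guarantees the system matrix is well conditioned even when a class has few training patches.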
Methods
We use the following two L3-based approaches to enhance the spatial resolution of an input image.
Results
Conclusions