LuppescuShah
Latest revision as of 19:59, 16 December 2016
Introduction
From human gesture classification to 3D modeling, the use of depth information for important research and industrial applications is ubiquitous in today’s world. A fundamental problem with depth cameras is the production of holes in the resultant depth image. Holes are formed in depth maps due to the spatial separation between the IR camera and the IR projector, which can lead to object occlusion, temporal inconsistencies, and corruption at object edges [1]. Holes in a depth map can compromise the effectiveness of any algorithm that relies on the depth map for further processing, which is why it is imperative that the holes be filled.
Once the holes are filled, other image processing techniques are more tractable. Image segmentation is a crucial problem in computer vision, image processing, and many other fields that strive to classify objects in an image. While there are segmentation methods based solely on RGB images, adding depth as a new mode of data can make image segmentation much more accurate and robust.
Another technique that is often used in photography is depth of field, which is determined by various camera parameters. There are features in photo editing software like Photoshop that allow manual blurring of photos to simulate depth of field, but this can be a very tedious process when applied to many images. Given depth maps, however, it is possible to simulate depth of field as an automatic post processing effect.
One more effect that is common in photography is the tilt-shift effect, which is used as an artistic application to “miniaturize” objects in a photo. Using depth maps, it is also possible to simulate this effect.
In this project, we implement hole filling algorithms, depth map segmentation algorithms, a depth of field simulator, and a tilt-shift simulator. All of these algorithms are integrated into an interactive GUI in MATLAB for efficient testing and demonstration of the provided algorithms.
Background
Given a pixel with missing information, hole filling can be viewed as an interpolation problem, where neighboring pixel information can be used to infer the value of the missing pixel. In this project, we implemented two basic interpolation methods -- mean filtering and median filtering -- as they worked well for most cases and allowed us to focus on other algorithms as well. There are more complicated hole filling algorithms such as hierarchical hole filling [2] and colorization [3]. Hierarchical hole filling eliminates the need for any smoothing or filtering of the depth map by using a pyramid-like approach to estimate the hole pixels. Colorization using optimization is a well known technique that colors a grayscale image using a few color hints as guides. It can be extended to hole filling for depth maps by using a corresponding RGB image as a guide to fill in the holes based on the known depth values.
Image segmentation is still an active research problem that has been studied extensively. Classical methods like K-means, mean shift, and n-cuts are fairly robust for many types of segmentation problems, but also have many weaknesses when segmenting RGB images. For these methods, problems arise when there is not much color variance, there are complicated shapes, or the image contains multiple occlusions. Depth information can be used alongside RGB images to increase the robustness of these segmentation algorithms.
Simulating depth of field as an automatic post-processing effect is a very difficult problem without depth information. Usually, depth of field has to be manually simulated using professional software like Photoshop. Automatic depth of field simulation without depth information would require elaborate and robust segmentation and object detection algorithms, and accurate estimations of depth solely based on the RGB image. The inclusion of depth maps significantly simplifies this problem, allowing for efficient and simple algorithms to simulate depth of field.
Methods
Image Acquisition
The RGB and depth images were acquired using two cameras. We used librealsense [4] and OpenCV for this purpose. The first camera was an Intel RealSense SR300 camera, which was used to capture short range depth maps. The second camera was an Intel RealSense R200 camera, which was used to capture longer range depth maps.
Hole Filling
The pixels in a depth image with values equal to zero were considered to be holes.
Mean filter-based Hole Filling
We implemented a basic hole-filling algorithm using a mean filter. A mean filter of size NxN updates a pixel with the mean of the NxN neighborhood around it. However, directly applying a mean filter to the entire image updates not only the values of the holes but also the values of the pixels with valid depths. Hence, we first find the pixels that correspond to holes, i.e. have value = 0, and apply the mean filter to update only those pixels. Also, the neighborhood of a hole may contain other holes. For faster convergence, it is useful not to include the holes in the neighborhood while calculating the mean. Thus, for each pixel corresponding to a hole, this can be written mathematically as -

Median filter-based Hole Filling
Hole-filling can also be implemented using the algorithm mentioned above with a median filter instead of a mean filter. A median filter updates the value of a pixel to the median of the values in a neighborhood around the pixel. Again, to speed up convergence, one should only consider the non-zero values when calculating the median.
In summary, the hole filling pipeline is given below -

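The hole filling pipeline above can be sketched in a few lines of Python (our actual implementation was in MATLAB; this is an illustrative translation, and names like `fill_holes` are our own). `depth` is a 2D list of depth values with 0 marking holes:

```python
import statistics

def fill_holes(depth, size=3, iterations=4, stat="mean"):
    """Fill zero-valued holes with the mean (or median) of the non-zero
    neighbors in a size x size window, repeated for several iterations."""
    h, w = len(depth), len(depth[0])
    r = size // 2
    for _ in range(iterations):
        out = [row[:] for row in depth]
        for y in range(h):
            for x in range(w):
                if depth[y][x] != 0:
                    continue  # only holes are updated; valid pixels stay as-is
                # gather only valid (non-zero) neighbors so that nearby
                # holes do not drag the estimate toward zero
                vals = [depth[j][i]
                        for j in range(max(0, y - r), min(h, y + r + 1))
                        for i in range(max(0, x - r), min(w, x + r + 1))
                        if depth[j][i] != 0]
                if vals:
                    out[y][x] = (sum(vals) / len(vals) if stat == "mean"
                                 else statistics.median(vals))
        depth = out
    return depth
```

Each pass fills only the holes that have at least one valid neighbor, so larger holes close from their borders inward over successive iterations -- which is also why more iterations trade sharpness for coverage, as shown in the results.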
Segmentation
K-means Segmentation
K-means is one of the simplest image segmentation algorithms. The main purpose of this algorithm is to cluster data. In our case, we want to cluster pixels by depth such that pixels at similar depths will be in the same cluster. A theoretical treatment of this problem ultimately leads to a two-step algorithm:

In step 1, each data point is assigned to the cluster with the closest centroid. Step 2 then recalculates the cluster centroids given the new assignments from step 1. These two steps are repeated until convergence, i.e. the change in cluster centers is below a certain threshold.
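The two-step algorithm can be sketched for 1D depth values in Python (illustrative only, not our MATLAB implementation; spreading the initial centroids evenly over the data range is our own choice -- K-means is often initialized randomly):

```python
def kmeans_1d(values, k=2, tol=1e-4, max_iter=100):
    """Cluster scalar depth values with K-means: assign each value to the
    nearest centroid, recompute centroids, repeat until centers stop moving."""
    lo, hi = min(values), max(values)
    # spread k initial centroids across the data range (an assumption)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(max_iter):
        # step 1: assign each value to the cluster with the closest centroid
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[idx].append(v)
        # step 2: recompute centroids from the new assignments
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        converged = max(abs(a - b) for a, b in zip(centers, new_centers)) < tol
        centers = new_centers
        if converged:  # change in cluster centers below threshold
            break
    labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
    return centers, labels
```

With k = 2 on a depth map, the two labels directly give the foreground-background mask used later in the results.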
Mean shift Segmentation
As an alternative to K-means, we also implemented the mean shift segmentation algorithm. The mean shift algorithm seeks local maxima of densities in a given distribution, i.e. it seeks local modes in distributions.
Consider the following figure in a feature space of dimension 2:

In the first figure, the blue point is the centroid of the previous search area and the orange point is the new centroid of the current search area. At each iteration, the center of the search area is shifted to the new centroid; this repeats until convergence, i.e. until the difference between the new centroid and the old centroid is below some threshold. In the case of depth map images, the feature space is simply the 1D histogram of the grayscale values.
In general, the pseudo code for the mean shift algorithm is as follows:
- Until all pixels have been “seen”:
  - Choose a starting point that hasn’t been “seen” (in our case, the grayscale value of one of the pixels in our depth image)
  - For a given search radius:
    - Compute the centroid of the data in the search window
    - Mark all pixels in the search radius as “seen”
    - Center the search window at the new centroid location
    - Repeat until convergence
  - Once converged:
    - If the current search radius does not overlap with any existing cluster, declare a new cluster
    - Otherwise, do not declare a new cluster
- Once all pixels have been “seen”, assign all pixels to the closest cluster centers
The main point of mean shift is that instead of specifying a fixed number of clusters (like in K-means), one must simply specify a search radius. Intuitively, the smaller the search radius, the more clusters you will have, as there will be fewer overlapped clusters.
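The pseudo code above can be sketched for 1D depth values in Python (illustrative only; for brevity this version starts a window from every value instead of tracking “seen” pixels, which is slower but finds the same modes):

```python
def mean_shift_1d(values, radius, tol=1e-3, max_iter=100):
    """Mean shift on scalar depth values: from each starting value, move a
    window of the given radius to the centroid of the data inside it until
    convergence; non-overlapping converged windows become cluster modes."""
    modes = []
    for start in values:
        center = start
        for _ in range(max_iter):
            # centroid of the data inside the current search window
            window = [v for v in values if abs(v - center) <= radius]
            new_center = sum(window) / len(window)
            if abs(new_center - center) < tol:
                center = new_center
                break  # converged: centroid shift below threshold
            center = new_center
        # declare a new cluster only if it does not overlap an existing one
        if all(abs(center - m) > radius for m in modes):
            modes.append(center)
    labels = [min(range(len(modes)), key=lambda i: abs(v - modes[i]))
              for v in values]
    return modes, labels
```

Note that only `radius` is specified -- the number of clusters falls out of how many non-overlapping modes the data supports, which is exactly why mean shift suited the long range scenes better than K-means.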
Simulating Depth of Field
Depth of field is a region around the focal plane that appears acceptably sharp in an image. Sometimes it is desirable to have certain parts of an image in focus and others out of focus like, for example, focusing on the foreground while blurring the background. For this purpose, we need a shallow depth of field. For images where almost everything in the scene should be in focus, we need a larger depth of field. Typically, depth of field depends on the type of lens, aperture, and focal length.
Since depth maps give us information about how far objects in a scene are, we figured we could use that information to simulate depth of field. Given a reference point in the depth map, the amount of blur applied to each pixel is determined by the difference between that pixel’s depth and the reference depth: the larger the difference, the more blur is applied. Essentially, we made the variance of a Gaussian blurring kernel proportional to how far a pixel was, in depth, from the reference pixel.
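A minimal sketch of this idea in Python (illustrative and grayscale-only, using a naive per-pixel convolution; the blur factor and window size are our own hypothetical parameters, not values from the project):

```python
import math

def depth_of_field(img, depth, ref_xy, blur_factor=3.0, max_radius=5):
    """Blur each pixel with a Gaussian whose sigma grows with the pixel's
    depth distance from a chosen reference point, so the reference depth
    stays sharp. `img` and `depth` are 2D lists of the same size."""
    h, w = len(depth), len(depth[0])
    ref = depth[ref_xy[1]][ref_xy[0]]  # depth at the clicked (x, y) point
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            # sigma proportional to depth distance from the reference
            sigma = blur_factor * abs(depth[y][x] - ref)
            if sigma < 1e-6:
                continue  # in-focus pixel: leave untouched
            acc = norm = 0.0
            for j in range(max(0, y - max_radius), min(h, y + max_radius + 1)):
                for i in range(max(0, x - max_radius), min(w, x + max_radius + 1)):
                    wgt = math.exp(-((i - x) ** 2 + (j - y) ** 2)
                                   / (2 * sigma ** 2))
                    acc += wgt * img[j][i]
                    norm += wgt
            out[y][x] = acc / norm
    return out
```

A real implementation would use a separable or precomputed kernel per depth band for speed; the per-pixel loop here just keeps the dependence of sigma on depth explicit.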
Simulating Tilt-shift
Tilt-shift is achieved by tilting and shifting the camera such that one can selectively focus a region of an image. This effect can be simulated by taking a wedge-shaped plane to be in focus and by blurring out the other parts of an image relative to the plane. For the blurring aspect of tilt-shift, we incorporate depth information for more realistic blurring like we did for depth of field.
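The wedge-shaped in-focus plane can be expressed as a band around a line through the chosen point; here is a minimal Python sketch of the mask computation (names and conventions are our own -- in particular, the angle is measured from the horizontal, and `width` is the full band width in pixels):

```python
import math

def wedge_mask(h, w, center, width, angle_deg):
    """Boolean mask of the in-focus band: pixels within width/2 of the
    line through `center` inclined at `angle_deg` are True; everything
    else would be blurred in proportion to its distance from the band."""
    cx, cy = center
    theta = math.radians(angle_deg)
    # unit normal to the focus line; the signed distance of a pixel from
    # the line is the projection of (pixel - center) onto this normal
    nx, ny = -math.sin(theta), math.cos(theta)
    return [[abs((x - cx) * nx + (y - cy) * ny) <= width / 2
             for x in range(w)] for y in range(h)]
```

Outside the mask, blurring proceeds exactly as in the depth of field simulation, with depth information making the falloff more realistic.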
Testing GUI
To expedite testing our various algorithms, we implemented a GUI in MATLAB that incorporates all of the aforementioned methods into a single window. An example of the GUI can be seen in figure 3.

The GUI displays an RGB image and its corresponding depth map, and has pushbuttons and parameter inputs for mean and median filtering, K-means segmentation, mean shift segmentation, simulation of DOF, and simulation of the tilt-shift effect. When a user clicks the depth of field pushbutton, a selection cursor appears, where the user must choose a point in the RGB image as a point of reference. When a user clicks on the tilt-shift pushbutton, a selection cursor appears, where the user must choose a point in the RGB image as the center point of the in-focus wedge of the image with specified width and angle.
Results
Hole Filling
Given below are the results we obtained for the mean filter-based hole filling algorithm. The image was taken using the Intel RealSense SR300 camera. Four iterations of hole filling were applied to the depth image, and the neighborhood size used was 3x3. It can be seen that after each iteration more holes are filled, but at the same time, there is a loss of detail and blurring of the edges. Hence, there is a trade-off between the amount of hole filling and the sharpness of the image. It is important to note that warmer colors like red represent larger depth values and cooler colors like blue represent smaller depth values. Thus, the darkest shade of blue (the color that covers the most area in the image) corresponds to zero values and hence, holes. The background is mostly made up of zero values as that part of the scene is out of the range of the short-range camera.

Given below are the results for the median filter-based hole filling algorithm. The results are almost identical to the mean filter-based algorithm -

Segmentation
The results below show how depth maps can be useful to separate the foreground from the background using a simple algorithm like K-means segmentation. Foreground segmentation becomes rather complicated for RGB images without any depth information. For this experiment, we simply segmented the depth map into two clusters using K-means segmentation. On comparing the RGB image and the depth map, it can be clearly seen that the K-means segmentation result is a foreground-background mask (with red being the background and blue being the foreground). The next section shows how segmented depth maps can be used for post-processing effects.

K-means segmentation did not work well for long range depth maps. This is because, for scenes which cover longer ranges of distances, there are more objects of interest and hence, a larger number of potential clusters. K-means segmentation was not effective when the number of clusters was large. Hence, for images taken from the long range depth camera i.e. Intel RealSense R200, we tried the mean shift segmentation algorithm, which resulted in better segmentations of the scene. Given below is the resulting image obtained by segmenting the depth map using the mean shift algorithm. It can be seen that the mean shift algorithm discretizes a continuous depth map and isolates regions of interest, like, for example, the body of the person in the image. Again, the next section shows how segmentation can make depth maps more useful and effective for post-processing effects like depth of field (the difference between colors may not be evident because of the range of values).

Simulating Depth of Field
To simulate depth of field, we need to know the plane in focus. We added functionality to our GUI where the user can click on the object of interest and thus select the focal plane. The depth map can then be used to blur the other objects depending on their distance from the object of interest. The greater the relative distance, the stronger the Gaussian blur. Given below is a simple example demonstrating the depth of field effect. The first image is the RGB image taken by the Intel RealSense SR300 camera. The second image is the corresponding depth map. There are only two objects in the scene - a can and a box - and they are well separated, as seen in the depth map. For the third image, the can was chosen as the object of interest. It can be seen how the rest of the image is blurred, giving a convincing depth of field effect. The fourth image has the box as the object of interest.

The previous example showed how we can leverage depth maps to simulate depth of field very easily, which would otherwise require manual editing or sophisticated algorithms. Here, we demonstrate how one can go further and combine segmentation and depth of field to create different kinds of effects. For this example, we take the same image used for the hole-filling results. We then take the hole-filled depth map and choose the water bottle in the front as the object of interest. It can be seen from the first image how the water bottle is in focus, while the rest of the image is blurred. However, sometimes a user would want the entire foreground in focus, while the background is blurred. Achieving this without depth information is tedious and it would require a complicated foreground-background segmentation algorithm or manual intervention. Using a depth map, this is simply a two-step process where we first segment the depth map into background and foreground using the K-means algorithm with the number of clusters = 2. Then, we use the segmented result as a mask to perform depth of field (with anything in the foreground chosen as an object of interest). The second image is the result after performing the above two steps, and it can be seen how everything in the foreground is in focus and the background is blurred.

This example shows two results for different blurring factors. A larger blurring factor corresponds to a shallower depth of field, which is why the box in the image with a blurring factor of 20 is more out of focus than the box in the image with a blurring factor of 3.

Finally, we use the RGB image and depth map taken with the long range camera, along with the mean shift segmentation result computed earlier, to segment out the person from the scene. There are some artifacts near the edges, caused by the large number of holes that typically occur in depth maps from long range cameras. The resulting image is shown below -

Simulating Tilt-shift
The GUI expects the user to input two parameters - the width of the wedge-shaped plane in pixels and the angle of inclination of the plane. The user can then click on the point of interest, and a mask of the region in the image that will remain in focus is computed. Based on the mask, the wedge-shaped region is kept in focus and the rest of the image is blurred according to the same principle used for depth of field. Given below is an image of a chair taken from a long distance using the long range camera. One can see a "miniaturization" effect in the resultant image, where the chair is perceived as smaller because of the tilt-shift blurring (Note: if you can't see the miniaturization effect, try moving farther from your computer screen).

Conclusion & Future Work
In this project, we explored different hole filling procedures for depth maps, implemented different segmentation algorithms, and simulated depth of field and tilt-shift photography as post processing effects based on depth maps. Hole filling worked very well for the close range images. For future work, an extension of hole filling could be to implement more complex hole filling algorithms like colorization and hierarchical hole filling and to compare the results to the simpler methods presented in this project. We are happy we explored both K-means and mean shift segmentation, as both seemed to work well in different scenarios. For the short range camera, we achieved foreground detection by using K-means with number of clusters = 2. For the longer range depth image, we used mean shift with a radius of 0.04. K-means failed to segment out the body of the person standing in the image from the rest of the clutter in the long range image, but mean shift did reasonably well. The downside to mean shift is that it is much more computationally expensive. For future work, students could explore other types of segmentation algorithms for depth maps like the n-cuts segmentation algorithm.
Depth maps allowed us to automatically simulate depth of field given a point of interest, which would be quite tedious given only RGB information. We noticed that there were edge effects when computing depth of field, creating a “glowing” effect around an object in focus. For future work, students could try to improve this depth of field algorithm by accounting for those edge artifacts to reduce the “glowing.” Also, we chose an arbitrary amount to blur based on distance, so another extension of this project could be to incorporate camera information like focal length and f-number to simulate how images would look with different types of cameras.
Lastly, the tilt-shift algorithm works reasonably well, as there is a miniaturization effect when the algorithm is performed correctly on an image. However, we were not able to incorporate a large amount of depth information because, in order for tilt-shift to work, the object needs to be sufficiently far away -- which is mostly outside of the range of the long-range depth camera.
References
[1] Bapat, Akash, Adit Ravi, and Shanmuganathan Raman. "An iterative, non-local approach for restoring depth maps in RGB-D images." Communications (NCC), 2015 Twenty First National Conference on. IEEE, 2015.
[2] Solh, Mashhour, and Ghassan AlRegib. "Hierarchical hole-filling for depth-based view synthesis in FTV and 3D video." IEEE Journal of Selected Topics in Signal Processing 6.5 (2012): 495-504.
[3] Levin, Anat, Dani Lischinski, and Yair Weiss. "Colorization using optimization." ACM Transactions on Graphics (TOG). Vol. 23. No. 3. ACM, 2004.
[4] https://github.com/IntelRealSense/librealsense
[5] http://web.stanford.edu/class/cs231a/lectures/lecture13_segmentation_scene_understanding.pdf
Appendix
Appendix 1
Source Code: Code
(Note: "main.cpp" and "example.hpp" are the files used for image acquisition and they require librealsense and OpenCV to be installed. All the other files run on MATLAB.)
Appendix 2
Work breakdown:
Raj - Got cameras working so we could get depth images from both the long range and short range cameras.
Greg - Implemented GUI.
Both - All the algorithms were implemented together.