Accelerating Denoising at the Speed of Light

From Psych 221 Image Systems Engineering
Samidhm (talk | contribs)

Revision as of 07:00, 13 December 2024

Introduction

In computer graphics, real-time ray tracing has become widely adopted for generating high-quality visuals in applications like gaming and interactive simulations. A significant challenge in ray tracing is that using a low number of samples per pixel often results in noisy images, limiting their practical use. Achieving high-quality images typically requires ray tracing with a large number of samples per pixel, which demands substantial computational power and makes real-time generation difficult. Consequently, there is a growing need for effective noise reduction techniques for images rendered with fewer samples per pixel. Efficient denoising can produce high-quality images that preserve scene realism while optimizing computational resources.

Background and Problem Setup

Background

While applications such as gaming typically render high-resolution images (e.g., 1080p, 4K), recent advancements in fields like robotics have created a demand for extremely fast, real-time rendering of low-resolution images \cite{7019765}, \cite{8860966}. This project specifically addresses this challenge, focusing on developing high-quality and efficient denoising techniques for low-resolution ray-traced images.

Problem Definition

Given a 64x64 image rendered with one sample per pixel, along with other features that can be obtained using similar computational resources, we propose a denoising framework capable of producing a 64x64 output image that closely matches the quality of a ground-truth image rendered with 512 samples per pixel. Our framework is evaluated primarily based on two key criteria:

Quality

The generated image should closely replicate the realism and quality of the ground-truth image. Quality is assessed using Peak Signal-to-Noise Ratio (PSNR).
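PSNR is computed from the mean squared error between the denoised output and the ground-truth render. As a minimal sketch (pure Python over flat pixel lists; the project's actual evaluation code is not shown in this section):

```python
import math

def psnr(img_a, img_b, max_val=1.0):
    """Peak Signal-to-Noise Ratio between two equally sized images.

    img_a, img_b: flat sequences of pixel intensities in [0, max_val].
    Returns PSNR in dB; higher means the denoised image is closer
    to the ground truth.
    """
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

For example, a uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01 and hence a PSNR of 20 dB.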

Performance

The system should be computationally efficient. Performance is evaluated by the number of frames it can denoise per second, serving as a secondary metric.
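Frames-per-second can be measured by timing the denoiser over a batch of frames; a minimal sketch (the `denoise_fn` callable and warm-up count are illustrative assumptions, not the project's actual benchmark harness):

```python
import time

def frames_per_second(denoise_fn, frames, warmup=1):
    """Measure denoising throughput in frames per second."""
    for f in frames[:warmup]:
        denoise_fn(f)  # warm-up pass (e.g., caches, lazy initialization)
    start = time.perf_counter()
    for f in frames:
        denoise_fn(f)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed
```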

Approach

Dataset Generation

To meet the training and evaluation requirements of our framework, we have developed a comprehensive dataset featuring both low- and high-quality ray-traced images. We selected a scene rich in detail, including various objects, textures, lighting, and colors, using the exterior of the Amazon Lumberyard Bistro scene \cite{ORCAAmazonBistro} as our setting. The dataset consists of 12,600 pairs of 64x64 images, where each pair includes a low-quality image rendered with one sample per pixel and a corresponding high-quality image rendered with 512 samples per pixel.

To generate these images, we strategically chose 35 key points within the scene to ensure broad coverage. At each point, we rendered images at 15 different heights and 24 distinct 3D angles. Additionally, we included auxiliary features such as depth maps, normal maps, albedo, and direct and indirect glossy lighting within the scene, all of which contribute to more accurate denoising.

To perform this rendering, we use Blender's Cycles path tracer. Using Blender's scripting interface, we automatically iterate through the different configurations, place the camera at each position, and render the required data. Blender's compositing nodes are used to automatically save the auxiliary features.
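The camera sweep described above enumerates 35 scene points x 15 heights x 24 angles = 12,600 configurations. A minimal sketch of that iteration (in the actual pipeline, each configuration would be mapped to a camera pose and rendered through Blender's bpy API; the names here are illustrative):

```python
from itertools import product

# Sweep parameters matching the dataset description:
# 35 scene points, 15 camera heights, 24 view angles.
POINTS = range(35)
HEIGHTS = range(15)
ANGLES = range(24)

def camera_configs():
    """Yield every (point, height, angle) rendering configuration."""
    yield from product(POINTS, HEIGHTS, ANGLES)
```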

For training, validation, and testing purposes, the dataset is divided into a 60%-20%-20% ratio. However, instead of randomly splitting the examples, we carefully handpick the points for each subset to ensure a more balanced and representative distribution. The selection process is designed to maintain significant scene diversity across all splits, which is crucial for effective model evaluation.

This approach is necessary because images rendered from the same point in the scene tend to be highly similar. If the same points were included in both the training and test sets, it could lead to an unrealistic assessment of the model’s performance, as the model might memorize specific points rather than learning to generalize. By ensuring that the points in each subset are distinct, we can better evaluate the model’s ability to generalize to new, unseen viewpoints, thus providing a more reliable measure of its denoising capabilities.
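The key property of this split is that it partitions scene *points*, not individual images, so all renders from one viewpoint land in exactly one subset. A minimal sketch (the shuffle stands in for the manual handpicking described above, which additionally balances scene diversity):

```python
import random

def split_points(points, seed=0):
    """Split scene points 60/20/20 into train/val/test subsets.

    Splitting by point (rather than by image) keeps all renders from
    a given viewpoint inside a single subset, preventing leakage of
    near-duplicate images across splits.
    """
    pts = list(points)
    random.Random(seed).shuffle(pts)  # stand-in for manual handpicking
    n_train = round(0.6 * len(pts))
    n_val = round(0.2 * len(pts))
    return (pts[:n_train],
            pts[n_train:n_train + n_val],
            pts[n_train + n_val:])
```

With the 35 points used here, this yields 21 training, 7 validation, and 7 test points.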

Model Architecture

Model Training

Input Feature Design

Loss Function

Training Schedule

Performance Optimization

Results

Conclusions

Appendix
