Simultaneous Color Holographic Display
Introduction
A holographic display is a display system that produces 2D or 3D images by manipulating the wavefront of light. By using a spatial light modulator (SLM), a holographic display can adjust the phase of a coherent wavefront at the pixel level. This capability enables it to reshape the wavefront precisely as it would originally emanate from a real object, thereby creating an image with authentic depth cues. Due to time constraints, this project focuses solely on using the holographic display to show 2D images. As a result, we do not need to address the 'color replicas' problem caused by the simultaneous color scheme [1, 2].

Holographic displays typically use a laser as a light source, producing monochromatic holograms. To achieve color holograms, color holographic displays sequentially switch between RGB lasers at a high frequency, leveraging the human eye's persistence of vision. This process allows the eye to fuse sequential monochromatic holograms into a perceived color hologram. However, this approach sacrifices the refresh rate of the SLM, as displaying a single frame of a color image requires three phase patterns—one for each RGB channel.

A recent paper [1] shows that one potential way to fully utilize the SLM's refresh rate is to activate the three primary laser lights simultaneously and have the SLM modulate the phase of all three wavefronts at once using a single shared phase pattern.

This project aims to evaluate the effectiveness of traditional phase retrieval pipelines in a simultaneous color setup and explore potential improvements in reconstruction quality by employing different loss functions.
Background
How to Derive the Phase of Light from Intensity?
Unlike conventional displays that directly control light intensity, holographic displays use a spatial light modulator (SLM) to modulate the phase of light on a per-pixel basis. The modulated wavefront then propagates through free space from the SLM plane $z = 0$ to the image plane $z = d$. Our objective is to determine the phase pattern on the SLM such that, at the image plane $z = d$, the resulting intensity distribution matches a desired target intensity pattern.

Angular Spectrum Method
The Angular Spectrum Method is a computational technique used to model the propagation of wavefronts through free space. It can be represented as follows:

$$u(x, y, d) = \mathcal{F}^{-1}\left\{ \mathcal{F}\{u(x, y, 0)\} \cdot H(f_x, f_y) \right\}$$

$u(x, y, 0)$ represents the wavefield at the plane $z = 0$. By applying the 2D Fourier transform to it, $U(f_x, f_y) = \mathcal{F}\{u(x, y, 0)\}$, the wavefield is decomposed into a superposition of plane waves traveling in various directions. This continuous distribution of plane waves is referred to as the angular spectrum. The spatial frequencies $f_x$ and $f_y$ determine the propagation direction of each plane wave component.

As each plane wave propagates through free space, it accumulates a distance-dependent phase shift. This phase shift is described by the transfer function

$$H(f_x, f_y) = \exp\left( i 2\pi d \sqrt{\frac{1}{\lambda^2} - f_x^2 - f_y^2} \right)$$

In the Fourier domain, by multiplying the angular spectrum by $H(f_x, f_y)$, we effectively propagate all plane wave components over the distance $d$.
Finally, to reconstruct the propagated wavefield at $z = d$, we apply the inverse Fourier transform: $u(x, y, d) = \mathcal{F}^{-1}\left\{ U(f_x, f_y) \cdot H(f_x, f_y) \right\}$.
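As a concrete illustration, the method above can be implemented in a few lines of NumPy; the function and variable names are mine, not taken from the project codebase:

```python
import numpy as np

def asm_propagate(u0, wavelength, pitch, d):
    """Propagate a complex wavefield u0 over distance d with the angular spectrum method."""
    ny, nx = u0.shape
    fx = np.fft.fftfreq(nx, d=pitch)                 # spatial frequencies f_x
    fy = np.fft.fftfreq(ny, d=pitch)                 # spatial frequencies f_y
    FX, FY = np.meshgrid(fx, fy)
    # Squared z-component of the wave vector; evanescent components are clipped to zero.
    kz_sq = np.maximum(1.0 / wavelength**2 - FX**2 - FY**2, 0.0)
    H = np.exp(1j * 2 * np.pi * d * np.sqrt(kz_sq))  # transfer function H(f_x, f_y)
    return np.fft.ifft2(np.fft.fft2(u0) * H)         # F^-1{ F{u0} . H }

# Sanity check: a plane wave only accumulates a global phase under propagation.
u = np.ones((720, 1280), dtype=complex)
u_d = asm_propagate(u, wavelength=636e-9, pitch=10.8e-6, d=0.1)
```

Because a plane wave maps to a single point in the angular spectrum, its amplitude stays uniform after propagation, which is a quick way to verify the implementation.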
Image Formation Model
In our holographic display setup, the coherent light source illuminating the SLM has a source field $u_{src}(x, y)$.
The phase-only SLM can apply a spatially varying delay $\phi(x, y)$ to the phase of the field, so the wavefield at the SLM becomes:

$$u_{SLM}(x, y) = u_{src}(x, y) \cdot e^{i\phi(x, y)}$$

The SLM is at the plane $z = 0$. We can use the Angular Spectrum Method to model the wavefield at the image plane $z = d$:

$$u(x, y, d) = \mathcal{F}^{-1}\left\{ \mathcal{F}\{u_{SLM}(x, y)\} \cdot H(f_x, f_y) \right\}$$

At the image plane, what people see is the intensity of light, not the wavefield. The light intensity can be obtained by squaring the magnitude of the wavefield:

$$I(x, y) = |u(x, y, d)|^2$$

In combination, the final light intensity distribution is:

$$I(x, y) = \left| \mathcal{F}^{-1}\left\{ \mathcal{F}\{u_{src}(x, y) \cdot e^{i\phi(x, y)}\} \cdot H(f_x, f_y) \right\} \right|^2$$

For notational convenience, we can express the intensity pattern at the image plane as:

$$I(x, y) = f(\phi(x, y))$$

This is how we can obtain the light intensity at the image plane $z = d$ by displaying a phase pattern $\phi(x, y)$ on the SLM.
Iterative Method to Derive the Phase of Light from Intensity
Now we know how to calculate light intensity from the phase pattern. However, to display images on the holographic display, we need to find a way to calculate the phase pattern from the light intensity. Gradient descent is commonly used to solve this type of inverse problem.
At iteration 0, we can generate a random phase pattern $\phi^{(0)}$. Using the image formation model derived in the previous section, we can calculate the intensity pattern at the image plane as $I^{(k)} = f(\phi^{(k)})$. The light wave amplitude is then the square root of the intensity: $A^{(k)} = \sqrt{I^{(k)}}$.
If the target light intensity is $I_{target}$, the target light amplitude is the square root of the intensity: $A_{target} = \sqrt{I_{target}}$. We can compare $A^{(k)}$ and $A_{target}$ using a loss function: $\mathcal{L}(A^{(k)}, A_{target})$.
We can calculate the gradient of the loss function with respect to $\phi$, $\nabla_{\phi} \mathcal{L}$, and iteratively update the value of $\phi$: $\phi^{(k+1)} = \phi^{(k)} - \alpha \nabla_{\phi} \mathcal{L}$.
After sufficient iterations, the value of $\phi$ should converge. In this way, we can obtain a phase pattern that generates the target intensity at the image plane $z = d$.
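This iterative procedure can be sketched with PyTorch's automatic differentiation (the codebase used later in this project [3] is PyTorch-based). The resolution, wavelength, propagation distance, and learning rate below are illustrative assumptions, not values from the project:

```python
import torch

# Unit-amplitude source field assumed; ASM propagation as in the Background section.
def propagate(phi, wavelength, pitch, d):
    ny, nx = phi.shape
    fx = torch.fft.fftfreq(nx, d=pitch)
    fy = torch.fft.fftfreq(ny, d=pitch)
    FY, FX = torch.meshgrid(fy, fx, indexing="ij")
    kz_sq = torch.clamp(1.0 / wavelength**2 - FX**2 - FY**2, min=0.0)
    H = torch.exp(1j * 2 * torch.pi * d * torch.sqrt(kz_sq))
    return torch.fft.ifft2(torch.fft.fft2(torch.exp(1j * phi)) * H)

torch.manual_seed(0)
target_amp = torch.rand(64, 64)                               # sqrt of target intensity
phi = (2 * torch.pi * torch.rand(64, 64)).requires_grad_()    # random phi^(0)
opt = torch.optim.Adam([phi], lr=0.05)

losses = []
for k in range(200):
    opt.zero_grad()
    recon_amp = propagate(phi, 520e-9, 10.8e-6, 0.1).abs()    # A^(k)
    loss = torch.mean((recon_amp - target_amp) ** 2)          # L2 loss on amplitudes
    losses.append(loss.item())
    loss.backward()                                           # gradient w.r.t. phi
    opt.step()                                                # phase update
```

The loss should decrease steadily over the iterations, which is the convergence behavior described above.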

Questions Related to The Simultaneous Color Scheme
Does the laser support turning on all three colors simultaneously?
The laser used in our hardware setup is the FISBA RGBeam. It has three diodes, emitting red, green, and blue light respectively, and all three diodes can be turned on simultaneously. The lasers, with different wavelengths, pass through the same optical fiber. After passing through the collimating optics, the SLM is illuminated by what appears as a white plane wave.
How can we use a single phase pattern to generate three different intensity patterns?
We model the propagation of light waves in free space using the angular spectrum method.
The transfer function $H(f_x, f_y)$ in the Angular Spectrum Method (ASM) is not only distance-dependent but also wavelength-dependent. As a plane wave propagates through free space, the phase accumulation varies depending on the wavelength of the light. As a result, even though the SLM applies the same phase pattern simultaneously to red, green, and blue light, the final intensity distributions of these three lights at the image plane are still different. This phenomenon provides some degrees of freedom to use a single phase pattern to match three different target intensities at the image plane $z = d$.

Methods
In the field sequential color scheme, we can generate three phase patterns to match the three color channels of the RGB target. In the simultaneous color scheme, we aim to match three color channels using just one phase pattern. While the Angular Spectrum Method suggests that it is possible to derive three different intensity patterns from a single phase pattern, there may not be enough degrees of freedom to perfectly match the target intensities. As a result, there may always be some errors between the reconstructed image and the target image. A perceptually driven loss function could be useful in this case. Instead of solely focusing on matching the light intensities, it could prioritize matching visual features that are more significant to human perception.
The codebase for this project, Time-multiplexed Neural Holography: A Flexible Framework for Holographic Near-eye Displays with Fast Heavily-quantized Spatial Light Modulators [3], provides a robust framework for solving the phase retrieval problem. It also allows for flexibility in modifying the loss function used during the gradient descent process. I modified the code pipeline so that, for a color target, only a single phase pattern is initialized. During each iteration of gradient descent, the corresponding light amplitudes of this phase pattern are compared to the RGB channel amplitudes three times, resulting in three updates to the phase per iteration.
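The modified loop can be sketched as follows, assuming PyTorch (which the codebase [3] uses) and a wavelength-dependent ASM transfer function per color channel. The resolution, pixel pitch, propagation distance, and optimizer settings are illustrative stand-ins, not the framework's actual configuration:

```python
import torch

torch.manual_seed(0)
H_px, W_px = 64, 64
pitch, dist = 10.8e-6, 0.1                      # illustrative pixel pitch and distance
wavelengths = [636e-9, 518e-9, 441e-9]          # red, green, blue lasers

def transfer(wl):
    # Wavelength-dependent ASM transfer function: same phase pattern, different H per color.
    fx = torch.fft.fftfreq(W_px, d=pitch)
    fy = torch.fft.fftfreq(H_px, d=pitch)
    FY, FX = torch.meshgrid(fy, fx, indexing="ij")
    kz_sq = torch.clamp(1.0 / wl**2 - FX**2 - FY**2, min=0.0)
    return torch.exp(1j * 2 * torch.pi * dist * torch.sqrt(kz_sq))

Hs = [transfer(wl) for wl in wavelengths]
target_amps = torch.rand(3, H_px, W_px)         # amplitudes of the RGB target channels

def channel_amp(phi, c):
    return torch.fft.ifft2(torch.fft.fft2(torch.exp(1j * phi)) * Hs[c]).abs()

phi = (2 * torch.pi * torch.rand(H_px, W_px)).requires_grad_()  # the single phase pattern
opt = torch.optim.Adam([phi], lr=0.05)

def total_loss(p):
    with torch.no_grad():
        return sum(torch.mean((channel_amp(p, c) - target_amps[c]) ** 2).item()
                   for c in range(3))

loss_start = total_loss(phi)
for it in range(100):
    for c in range(3):                          # three phase updates per iteration
        opt.zero_grad()
        loss = torch.mean((channel_amp(phi, c) - target_amps[c]) ** 2)
        loss.backward()
        opt.step()
loss_end = total_loss(phi)
```

Because the three channels share one phase pattern, the per-channel losses cannot all reach zero; the loop converges to a compromise across the three targets.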
L2 Loss
The mean squared error loss, also called the L2 loss, is the default loss function in the codebase. It prioritizes minimizing the per-pixel error between the reconstructed amplitude and the target amplitude: $\mathcal{L}_{2} = \| A^{(k)} - A_{target} \|_2^2$.
Reconstruction Results
Using the L2 loss, after 5000 iterations of gradient descent, the phase pattern, the corresponding reconstructed images, and the target images are shown below.
From the L2 loss reconstruction, we observe a noticeable color shift between the reconstructed and target images. These results indicate that using only one phase pattern does not provide enough degrees of freedom to match the target intensity.
CIELAB loss
In class, we learned that CIELAB is a perceptually uniform color space. Perhaps we can convert the target intensities and reconstructed intensities into the CIELAB color space and calculate the L2 loss there, allowing us to prioritize color matching.
CIELAB Loss $= \left\| \mathrm{RGB2LAB}(I) - \mathrm{RGB2LAB}(I_{target}) \right\|_2^2$
To construct the RGB2LAB function, we first need to determine the RGB2XYZ matrix for our holographic display. We assume the wavelengths of the RGB lasers are 636 nm, 518 nm, and 441 nm, and the power they can achieve is 0.0035 Watts/sr/nm/m² for each. The following plot shows the spectral power distributions of our holographic display setup:

Using the ieXYZFromEnergy function from ISETCam, we can obtain the RGB2XYZ matrix for our holographic display setup:
RGB2XYZ = ieXYZFromEnergy(primaries', wavelength(:))'
RGB2XYZ =
Once the images are converted from the RGB color space to the XYZ color space, the XYZ2LAB function is constructed by implementing the following equations we learned in class:

$$L^* = 116\, f(Y/Y_n) - 16$$
$$a^* = 500\, \left[ f(X/X_n) - f(Y/Y_n) \right]$$
$$b^* = 200\, \left[ f(Y/Y_n) - f(Z/Z_n) \right]$$

where $f(t) = t^{1/3}$ for $t > (6/29)^3$ and $f(t) = \frac{t}{3(6/29)^2} + \frac{4}{29}$ otherwise.
The white point we selected is D65, with approximate XYZ values $(X_n, Y_n, Z_n) = (0.9505, 1.0000, 1.0888)$.
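A minimal NumPy sketch of this XYZ-to-CIELAB conversion, using the standard CIELAB equations and the approximate D65 white point; the function and variable names are illustrative, not from the project code:

```python
import numpy as np

WHITE_D65 = np.array([0.9505, 1.0000, 1.0888])   # approximate D65 XYZ (Y normalized to 1)

def xyz2lab(xyz, white=WHITE_D65):
    """xyz: (..., 3) array of XYZ values. Returns (..., 3) CIELAB values."""
    t = xyz / white
    delta = 6.0 / 29.0
    # Piecewise cube-root nonlinearity from the CIELAB definition.
    f = np.where(t > delta**3, np.cbrt(t), t / (3 * delta**2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

# Sanity check: the white point itself maps to L* = 100, a* = b* = 0.
print(xyz2lab(WHITE_D65))   # -> approximately [100, 0, 0]
```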
Reconstruction Results
After switching the loss function to CIELAB loss, and following 5000 iterations of optimization, the phase pattern, corresponding reconstruction images, and target images are shown below:
From the CIELAB loss reconstruction results, we can observe that the color matching between the reconstructed and target images is improved. However, there are still some color shifts, and overall, the reconstructed image appears even noisier. It is possible that we simply lack sufficient degrees of freedom to achieve perfect color matching.
S-CIELAB loss
At this point, I feel that the one-phase, three-intensities problem is similar to an image compression problem: how to achieve similar visual quality under limited bandwidth.
In class, we learned that for high spatial frequency patterns, the human visual system is more sensitive to luminance changes than to color changes. We also discussed the Spatial CIELAB metric [4] in class. Spatial CIELAB [4] applies spatial filters based on the human visual system’s sensitivity to spatial frequencies. It might be a good idea to use S-CIELAB as the loss function. By ignoring high spatial frequency color differences, we can allocate more bandwidth to match what is most important to human vision.


Convert Images from the RGB Color Space to the Opponent Color Space
In order to implement the S-CIELAB loss function, we first need to convert the RGB color space into the opponent color space. We calculated the RGB2XYZ matrix in the previous section:
RGB2XYZ =
The XYZ2OPP matrix is copied from the MATLAB implementation of SCIELAB-1996.
XYZ2OPP =
By sequentially multiplying the RGB2XYZ and XYZ2OPP matrices, we can convert the reconstructed and target images from RGB colorspace into the opponent colorspace.
Apply Spatial Filter to Each Color Channel
The opponent color space has three channels: $O_1$, $O_2$, and $O_3$. $O_1$ represents luminance, $O_2$ represents the contrast between red and green, and $O_3$ represents the contrast between blue and yellow. Since humans have different spatial frequency sensitivities on these three channels, we will apply low-pass filters with different cutoff ratios to each channel.
Display Model
To determine the low-pass filter cutoff ratio for each channel, we need to calculate the effective resolution using the following display model.

The resolution of the SLM is 1280 × 720, with a pixel pitch of 10.8 µm × 10.8 µm. The physical dimensions of the SLM are therefore 1.382 cm × 0.78 cm.
In our hardware setup, the eyepiece has a focal length of $f = 50$ mm. We set the distance $d_o$ between the SLM and the eyepiece in the above diagram to 45 mm, so the magnification ratio of the eyepiece is $M = \frac{f}{f - d_o} = \frac{50}{50 - 45} = 10$.
This makes the virtual screen dimensions 13.82cm x 7.8cm, with a display diagonal size of approximately 6.24 inches.
In the diagram, the virtual image distance $d_i$ is calculated as $d_i = \frac{d_o f}{f - d_o} = \frac{45 \times 50}{50 - 45} = 450$ mm $= 45$ cm.
The total viewing distance is $d_i$ plus the distance from the eye to the eyepiece, approximately 50 cm in total.
For a 6.24-inch screen with a resolution of 1280×720 and a viewing distance of 50 cm, the pixels per degree (PPD) is approximately 80. The effective resolution is around 40 cycles per degree (CPD).
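These numbers can be verified with a few lines of arithmetic, using the standard thin-lens magnification and image-distance relations applied to the values stated above:

```python
import math

f = 50.0            # eyepiece focal length, mm
d_o = 45.0          # SLM-to-eyepiece distance, mm
M = f / (f - d_o)                         # eyepiece magnification
d_i = d_o * f / (f - d_o)                 # virtual image distance, mm

w_mm = 1280 * 0.0108 * M                  # virtual screen width, mm
view_mm = 500.0                           # total viewing distance, ~50 cm
fov_deg = 2 * math.degrees(math.atan(w_mm / 2 / view_mm))
ppd = 1280 / fov_deg                      # pixels per degree
print(M, d_i, round(ppd))                 # -> 10.0 450.0 81
```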
Low-Pass Filter Cutoff Ratio
I set the low-pass filter cutoff ratios for each opponent color channel to 0.8, 0.45, and 0.3, so that the highest remaining spatial frequencies roughly match the Space-Time-Color graph we learned in class.

Effective resolution = 40 cpd
Luminance: 0.8 cutoff ≈ 32 cpd
Red-green: 0.45 cutoff ≈ 18 cpd
Blue-yellow: 0.3 cutoff ≈ 12 cpd
After applying the low-pass filter to each channel of the opponent color space, we can use the OPP2XYZ matrix to convert the opponent color space back to the XYZ space. The subsequent steps are the same as those for the CIELAB loss.
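The per-channel filtering step can be sketched as follows, assuming a simple ideal (hard-cutoff) low-pass filter in the FFT domain with the cutoff ratios chosen above; the original S-CIELAB implementation uses smooth spatial kernels, so this is only an approximation:

```python
import numpy as np

CUTOFFS = [0.8, 0.45, 0.3]      # luminance, red-green, blue-yellow cutoff ratios

def lowpass(channel, cutoff_ratio):
    """Zero out spatial frequencies above cutoff_ratio * Nyquist."""
    ny, nx = channel.shape
    fy = np.abs(np.fft.fftfreq(ny)) / 0.5        # normalized so that Nyquist = 1
    fx = np.abs(np.fft.fftfreq(nx)) / 0.5
    FY, FX = np.meshgrid(fy, fx, indexing="ij")
    mask = np.sqrt(FX**2 + FY**2) <= cutoff_ratio
    return np.real(np.fft.ifft2(np.fft.fft2(channel) * mask))

opp = np.random.rand(3, 64, 64)                  # image in opponent space (O1, O2, O3)
filtered = np.stack([lowpass(opp[c], r) for c, r in enumerate(CUTOFFS)])
```

Since the DC component always survives the mask, the mean of each channel is preserved; only high-frequency content is removed, more aggressively for the chromatic channels.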
Reconstruction Results
After switching the loss function to the S-CIELAB loss and completing 5000 iterations of optimization, the phase pattern, corresponding reconstructed images, and target images are shown below:
The reconstruction results using the S-CIELAB loss are better than those from the previous two losses.
ColorVideoVDP + S-CIELAB Loss
The ColorVideoVDP [5] is a new quality metric designed to evaluate the perceptual quality of color images and videos. It models spatial vision, temporal vision, and color vision while accounting for display geometry and photometry. The CSF used in S-CIELAB may be overly simplistic, whereas the ColorVideoVDP employs a novel contrast sensitivity model (castleCSF) that accounts for changes in contrast sensitivity with luminance and incorporates supra-threshold vision effects (e.g., contrast masking and contrast constancy). This metric can also be used as a loss function. By combining the ColorVideoVDP loss with the S-CIELAB loss, I achieved even better reconstruction results.
Reconstruction Results
The reconstruction results are shown below:
Results
The reconstructed images using four different loss functions are evaluated using two metrics: PSNR and ColorVideoVDP (CVVDP).
PSNR is defined as $\mathrm{PSNR} = 10 \log_{10}\left( \mathrm{MAX}_I^2 / \mathrm{MSE} \right)$, where $\mathrm{MAX}_I$ is the maximum possible pixel value and MSE is the mean squared error. It quantifies the amount of distortion or noise present in the reconstructed image by comparing the pixel intensity differences between the reconstructed and target images.
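As a small worked example of this definition, assuming images normalized to [0, 1]:

```python
import numpy as np

def psnr(recon, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((recon - target) ** 2)
    return 10.0 * np.log10(max_val**2 / mse)

target = np.zeros((4, 4))
recon = target + 0.1          # uniform error of 0.1 -> MSE = 0.01 -> PSNR of about 20 dB
print(psnr(recon, target))
```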
ColorVideoVDP reports image and video quality in JOD (Just-Objectionable-Difference) units. The highest quality (no difference) is reported as 10, and lower values indicate increased distortion. A one-point difference in the JOD score means that 75% of people would prefer the image with the higher JOD value.
Test Image 1:
Test Image 2:
Test Image 3:
Why are images with high CVVDP scores low in PSNR?
Images with high CVVDP scores are indeed visually better, but I am surprised that they have very low PSNR values. When I zoom in on the reconstructed images generated using the CVVDP loss, the reason becomes clear: there is a significant amount of high spatial frequency color noise present, which increases the MSE between the reconstructed and target images, thereby lowering the PSNR. However, this high spatial frequency color noise is not visible to the human eye at a normal viewing distance. As a result, the CVVDP scores remain high, since CVVDP accounts for the display geometry.
Why do images with high PSNR not look good?
It is interesting to note that, for all three test images, the reconstructed images optimized using the L2 loss consistently achieve the highest PSNR. This occurs because the L2 loss prioritizes minimizing pixel intensity differences between the target and the reconstructed images. However, these images receive comparatively low ColorVideoVDP scores and do not look visually appealing. This discrepancy serves as an illustrative example that PSNR does not always correlate well with human visual perception.
Conclusions
In this project, we explored the feasibility of using a single phase pattern to generate a color hologram. Traditionally, in a field-sequential color scheme, three separate phase patterns are employed to produce three distinct intensity patterns, one for each color channel, ensuring enough degrees of freedom to perfectly match the color target in simulation. However, with a simultaneous color scheme, only one phase pattern is available, which may prevent us from fully matching the RGB target. This constraint inevitably introduces some discrepancies between the reconstructed and target images. Our results demonstrate that by applying perceptually driven loss functions, such as S-CIELAB [4] and ColorVideoVDP [5], we can redistribute these unavoidable errors into regions where they are less noticeable to the human visual system. Consequently, even though the errors cannot be completely eliminated, the perceived quality of the reconstructed image remains high.
This project focuses on the spatial aspects of human vision. However, the primary reason for adopting a simultaneous color scheme over a sequential color scheme is that it fully utilizes the SLM’s refresh capabilities. With a simultaneous color scheme, a 60 Hz SLM can display 60 Hz color video content, making it worthwhile to also explore the temporal aspects of human vision. Human vision is more sensitive to changes in luminance than in color over time. Therefore, the next step is to design a new loss function that takes both spatial and temporal sensitivity into account, potentially improving video quality on the simultaneous color holographic display.
References
[1] Eric Markley, Nathan Matsuda, Florian Schiffers, Oliver Cossairt, and Grace Kuo. 2023. Simultaneous Color Computer Generated Holography. In SIGGRAPH Asia 2023 Conference Papers (SA '23). Association for Computing Machinery, New York, NY, USA, Article 22, 1–11. https://doi.org/10.1145/3610548.3618250
[2] David Blinder, Fan Wang, Colas Schretter, Takashi Kakue, Tomoyoshi Shimobaba, and Peter Schelkens "Joint color optimization for computer-generated holography without color replicas", Proc. SPIE 12998, Optics, Photonics, and Digital Technologies for Imaging Applications VIII, 129980G (18 June 2024); https://doi.org/10.1117/12.3022244
[3] Suyeon Choi, Manu Gopakumar, Yifan Peng, Jonghyun Kim, Matthew O'Toole, and Gordon Wetzstein. 2022. Time-multiplexed Neural Holography: A Flexible Framework for Holographic Near-eye Displays with Fast Heavily-quantized Spatial Light Modulators. In ACM SIGGRAPH 2022 Conference Proceedings (SIGGRAPH '22). Association for Computing Machinery, New York, NY, USA, Article 32, 1–9. https://doi.org/10.1145/3528233.3530734
[4] Zhang, X. and Wandell, B.A. (1997), A spatial extension of CIELAB for digital color-image reproduction. Journal of the Society for Information Display, 5: 61-63. https://doi.org/10.1889/1.1985127
[5] Rafal K. Mantiuk, Param Hanji, Maliha Ashraf, Yuta Asano, and Alexandre Chapiro. 2024. ColorVideoVDP: A visual difference predictor for image, video and display distortions. ACM Trans. Graph. 43, 4, Article 129 (July 2024), 20 pages. https://doi.org/10.1145/3658144
Appendix
Example target images can be found in the data/ folder.
New loss functions are implemented in the loss_functions.py file.