Simultaneous Color Holographic Display
Introduction
A holographic display is a type of display system that produces 2D or 3D images by manipulating the wavefront of light. By using a spatial light modulator (SLM), a holographic display can adjust the phase of a coherent wavefront at the pixel level. This capability enables it to reshape the wavefront precisely as it would originally emanate from a real object, thereby creating an image with authentic depth cues. Due to time constraints, this project focuses solely on using the holographic display to show 2D images. As a result, we do not need to address the 'color replicas' problem caused by the simultaneous color scheme. [1-2]

Holographic displays typically use a laser as a light source, producing monochromatic holograms. To achieve color holograms, color holographic displays sequentially switch between RGB lasers at a high frequency, leveraging the human eye's persistence of vision. This process allows the eye to fuse sequential monochromatic holograms into a perceived color hologram. However, this approach sacrifices the refresh rate of the SLM, as displaying a single frame of a color image requires three phase patterns—one for each RGB channel.

A recent paper [1] shows one potential solution that fully utilizes the SLM's refresh rate: simultaneously activate the three primary laser lights and have the SLM modulate the phase of all three wavefronts at once using the same phase pattern.

This project aims to evaluate the effectiveness of traditional phase retrieval pipelines in a simultaneous color setup and explore potential improvements in reconstruction quality by employing different loss functions.
Background
How to Derive the Phase of Light from Intensity?
Unlike conventional displays that directly control light intensity, holographic displays use a Spatial Light Modulator (SLM) to modulate the phase of light on a per-pixel basis. The modulated wavefront then propagates through free space from the SLM plane $z = 0$ to the image plane $z = d$. Our objective is to determine the phase pattern on the SLM such that, at the image plane $z = d$, the resulting intensity distribution matches a desired target intensity pattern.

Angular Spectrum Method
The Angular Spectrum Method is a computational technique used to model the propagation of wavefronts through free space. It can be represented as follows:

$u(x, y, d) = \mathcal{F}^{-1}\left\{ \mathcal{F}\{u(x, y, 0)\} \cdot H(f_x, f_y, d) \right\}$

$u(x, y, 0)$ represents the wavefield at the plane $z = 0$. By applying the 2D Fourier transform to it, $U(f_x, f_y) = \mathcal{F}\{u(x, y, 0)\}$, the wavefield is decomposed into a superposition of plane waves traveling in various directions. This continuous distribution of plane waves is referred to as the angular spectrum. The spatial frequencies $f_x$ and $f_y$ determine the propagation direction of each plane wave component.

As each plane wave propagates through free space, it accumulates a distance-dependent phase shift. This phase shift is described by the transfer function

$H(f_x, f_y, d) = \exp\left( i 2\pi d \sqrt{ \tfrac{1}{\lambda^2} - f_x^2 - f_y^2 } \right)$

where $\lambda$ is the wavelength. In the Fourier domain, by multiplying the angular spectrum by $H(f_x, f_y, d)$, we effectively propagate all plane wave components over the distance $d$.
Finally, to reconstruct the propagated wavefield at $z = d$, we apply the inverse Fourier transform: $u(x, y, d) = \mathcal{F}^{-1}\{ U(f_x, f_y) \cdot H(f_x, f_y, d) \}$.
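The propagation step above can be sketched as an FFT-based implementation. This is a minimal NumPy sketch written for illustration, not the project code; the grid size, pixel pitch, wavelength, and propagation distance are assumed values:

```python
import numpy as np

def angular_spectrum_propagate(u0, wavelength, dx, d):
    """Propagate a complex wavefield u0 (sampled at pitch dx) over distance d."""
    ny, nx = u0.shape
    fx = np.fft.fftfreq(nx, dx)                     # spatial frequencies f_x
    fy = np.fft.fftfreq(ny, dx)                     # spatial frequencies f_y
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    H = np.where(arg >= 0, np.exp(1j * kz * d), 0)  # drop evanescent waves
    return np.fft.ifft2(np.fft.fft2(u0) * H)

# Propagate a random phase-only field 5 cm forward, then back again.
# Since |H| = 1 for propagating waves, the round trip recovers the field.
rng = np.random.default_rng(0)
u0 = np.exp(1j * 2 * np.pi * rng.random((128, 128)))
u_fwd = angular_spectrum_propagate(u0, 638e-9, 10.8e-6, 0.05)
u_back = angular_spectrum_propagate(u_fwd, 638e-9, 10.8e-6, -0.05)
```

Because the transfer function has unit magnitude for propagating components, free-space propagation is unitary, which the round-trip check above illustrates.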
Image Formation Model
In our holographic display setup, the coherent light source illuminating the SLM has a source field $u_{src}(x, y)$.
The phase-only SLM applies a spatially varying delay $\phi(x, y)$ to the phase of the field, so the wavefield at the SLM becomes:

$u(x, y, 0) = u_{src}(x, y) \, e^{i\phi(x, y)}$

The SLM is at the plane $z = 0$, so we can use the Angular Spectrum Method to model the wavefield at the image plane $z = d$.
At the image plane, what people see is the intensity of light, not the wavefield. The light intensity is obtained by squaring the magnitude of the wavefield: $I(x, y) = |u(x, y, d)|^2$.
In combination, the final light intensity distribution is:

$I(x, y) = \left| \mathcal{F}^{-1}\left\{ \mathcal{F}\{ u_{src}(x, y) \, e^{i\phi(x, y)} \} \cdot H(f_x, f_y, d) \right\} \right|^2$

For notational convenience, we can express the intensity pattern at the image plane as $I = f(\phi)$.
This is how we obtain the light intensity at the image plane $z = d$ by displaying a phase pattern $\phi$ on the SLM.
Iterative Method to Derive the Phase of Light from Intensity
Now we know how to calculate light intensity from the phase pattern. However, to display images on the holographic display, we need to find a way to calculate the phase pattern from the light intensity. Gradient descent is commonly used to solve this type of inverse problem.
At iteration 0, we generate a random phase pattern $\phi^{(0)}$. Using the image formation model derived in the previous section, we can calculate the intensity pattern at the image plane as $I^{(k)} = f(\phi^{(k)})$. The light wave amplitude is then the square root of the intensity: $a^{(k)} = \sqrt{I^{(k)}}$.
If the target light intensity is $I_{target}$, the target light amplitude is the square root of that intensity: $a_{target} = \sqrt{I_{target}}$. We can compare $a^{(k)}$ and $a_{target}$ using a loss function $\mathcal{L}(a^{(k)}, a_{target})$.
We can calculate the gradient of the loss with respect to the phase, $\nabla_\phi \mathcal{L}$, and iteratively update the value of $\phi$: $\phi^{(k+1)} = \phi^{(k)} - \alpha \nabla_\phi \mathcal{L}$, where $\alpha$ is the step size.
After sufficient iterations, the value of $\phi$ should converge. In this way, we can obtain a phase pattern that generates the target intensity at the image plane $z = d$.
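The loop above can be sketched with PyTorch autograd, which the code base [3] also relies on. This is a toy-scale sketch, not the project pipeline: it uses Adam in place of plain gradient descent, and the target, wavelength, propagation distance, and learning rate are assumed values.

```python
import torch

def propagate(u0, wavelength, dx, d):
    """Angular Spectrum Method propagation (see the Background section)."""
    ny, nx = u0.shape
    FY, FX = torch.meshgrid(torch.fft.fftfreq(ny, dx),
                            torch.fft.fftfreq(nx, dx), indexing="ij")
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * torch.pi * torch.sqrt(torch.clamp(arg, min=0.0))
    H = torch.exp(1j * kz * d) * (arg > 0).float()   # drop evanescent waves
    return torch.fft.ifft2(torch.fft.fft2(u0) * H)

torch.manual_seed(0)
target_amp = torch.rand(64, 64)                      # hypothetical target amplitude
phase = (2 * torch.pi * torch.rand(64, 64)).requires_grad_()  # random phi^(0)
optimizer = torch.optim.Adam([phase], lr=0.05)

losses = []
for k in range(200):
    optimizer.zero_grad()
    recon_amp = propagate(torch.exp(1j * phase), 636e-9, 10.8e-6, 0.05).abs()
    loss = torch.mean((recon_amp - target_amp) ** 2)  # L(a_k, a_target)
    loss.backward()                                   # gradient w.r.t. phi
    optimizer.step()                                  # phase update
    losses.append(loss.item())
```

Autograd differentiates through the FFTs and the magnitude operation, so no hand-derived gradient of the forward model is needed.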

Questions Related to The Simultaneous Color Scheme
Does the laser support turning on all three colors simultaneously?
The laser used in our hardware setup is the FISBA RGBeam. It has three diodes, emitting red, green, and blue light respectively, and it is possible to turn on all three diodes simultaneously. The three wavelengths pass through the same optical fiber, and after the collimating optics, the SLM is illuminated by a white plane wave.
How can we use a single phase pattern to generate three different intensity patterns?
We model the propagation of light waves in free space using the angular spectrum method.
The transfer function in the Angular Spectrum Method (ASM) is not only distance-dependent but also wavelength-dependent. As a plane wave propagates through free space, its phase accumulation varies depending on the wavelength of the light. As a result, even if the SLM applies the same phase pattern simultaneously to red, green, and blue light, the final intensity distributions of the three colors at the image plane are still different. This phenomenon provides a degree of freedom to use a single phase pattern to match three different target intensities at the image plane $z = d$.
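This wavelength dependence is easy to verify numerically: propagating the same phase pattern under red and blue illumination produces different intensity patterns. The NumPy sketch below is an illustration written for this report; the pixel pitch and distance are assumptions, and the wavelengths follow the laser values used later on:

```python
import numpy as np

def intensity_at_image_plane(phase, wavelength, dx=10.8e-6, d=0.05):
    """|ASM-propagated field|^2 for a unit-amplitude source with the given phase."""
    ny, nx = phase.shape
    FX, FY = np.meshgrid(np.fft.fftfreq(nx, dx), np.fft.fftfreq(ny, dx))
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    H = np.where(arg >= 0, np.exp(1j * kz * d), 0)   # wavelength-dependent
    u_img = np.fft.ifft2(np.fft.fft2(np.exp(1j * phase)) * H)
    return np.abs(u_img) ** 2

rng = np.random.default_rng(0)
phase = 2 * np.pi * rng.random((128, 128))           # one shared phase pattern
I_red = intensity_at_image_plane(phase, 636e-9)
I_blue = intensity_at_image_plane(phase, 441e-9)     # differs from I_red
```

Note that propagation is unitary here, so each color carries the same total power; only its spatial distribution changes with wavelength.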

Methods
In the field sequential color scheme, we can generate three phase patterns to match the three color channels of the RGB target. In the simultaneous color scheme, we aim to match three color channels using just one phase pattern. While the Angular Spectrum Method suggests that it is possible to derive three different intensity patterns from a single phase pattern, there may not be enough degrees of freedom to perfectly match the target intensities. As a result, there may always be some errors between the reconstructed image and the target image. A perceptually driven loss function could be useful in this case. Instead of solely focusing on matching the light intensities, it could prioritize matching visual features that are more significant to human perception.
The code base for this project is "Time-multiplexed Neural Holography: A Flexible Framework for Holographic Near-eye Displays with Fast Heavily-quantized Spatial Light Modulators" [3] (https://github.com/computational-imaging/time-multiplexed-neural-holography). It provides a robust framework for solving the phase retrieval problem and offers flexibility to modify the loss function in the gradient descent method. I modified the code pipeline so that, for a color target, only one phase pattern is initialized. During each iteration of the gradient descent method, the corresponding light amplitude of the phase pattern is compared to the amplitudes of the RGB channels three times, resulting in the phase being updated three times per iteration.
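The modification can be sketched as follows. This is a simplified toy-scale sketch of the idea, not the actual modified pipeline; the targets, propagation distance, and learning rate are assumptions. One phase pattern is optimized, and within each iteration the loss is computed and the phase updated once per color channel, using that channel's wavelength in the propagation model:

```python
import torch

def propagate(u0, wavelength, dx=10.8e-6, d=0.05):
    """Angular Spectrum Method propagation at the given wavelength."""
    ny, nx = u0.shape
    FY, FX = torch.meshgrid(torch.fft.fftfreq(ny, dx),
                            torch.fft.fftfreq(nx, dx), indexing="ij")
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * torch.pi * torch.sqrt(torch.clamp(arg, min=0.0))
    H = torch.exp(1j * kz * d) * (arg > 0).float()   # drop evanescent waves
    return torch.fft.ifft2(torch.fft.fft2(u0) * H)

wavelengths = [636e-9, 518e-9, 441e-9]               # red, green, blue lasers
torch.manual_seed(0)
target_amps = torch.rand(3, 64, 64)                  # hypothetical RGB target amplitudes
phase = (2 * torch.pi * torch.rand(64, 64)).requires_grad_()  # single shared phase
optimizer = torch.optim.Adam([phase], lr=0.05)

def total_loss():
    with torch.no_grad():
        return sum(torch.mean((propagate(torch.exp(1j * phase), wl).abs() - t) ** 2).item()
                   for wl, t in zip(wavelengths, target_amps))

loss_before = total_loss()
for k in range(100):
    for wl, target in zip(wavelengths, target_amps):  # three updates per iteration
        optimizer.zero_grad()
        recon_amp = propagate(torch.exp(1j * phase), wl).abs()
        loss = torch.mean((recon_amp - target) ** 2)
        loss.backward()
        optimizer.step()
loss_after = total_loss()
```

Updating per channel rather than summing the three losses matches the three-updates-per-iteration scheme described above; the loss function in the inner loop is the piece that the later sections swap out.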
L2 Loss
The mean squared error loss, also called the L2 loss, is the default loss function in the code base: $\mathcal{L}_{L2} = \| a - a_{target} \|_2^2$. It prioritizes minimizing the pixel-wise error between the reconstructed amplitude and the target amplitude.
Using the L2 loss, after 5000 iterations of gradient descent, the phase pattern, the corresponding reconstructed image, and the target image are shown below.
From the L2 loss reconstruction, we observe a noticeable color shift between the reconstructed and target images. These results indicate that using only one phase pattern does not provide enough degrees of freedom to match the target intensity.
CIELAB Loss
In class, we learned that CIELAB is a perceptually uniform color space. Perhaps we can convert the target intensities and reconstructed intensities into the CIELAB color space and calculate the L2 loss there, allowing us to prioritize color matching.
CIELAB Loss $= \| \mathrm{RGB2LAB}(I) - \mathrm{RGB2LAB}(I_{target}) \|_2^2$
To construct the RGB2LAB function, we first need to determine the RGB2XYZ matrix for our holographic display. We assume the wavelengths of the RGB lasers are 636 nm, 518 nm, and 441 nm, and that each can reach a power of 0.0035 Watts/sr/nm/m². The following plot shows the spectral power distributions of our holographic display setup:

Using the ieXYZFromEnergy function from ISETCam, we can obtain the RGB2XYZ matrix for our holographic display setup:
RGB2XYZ = ieXYZFromEnergy(primaries', wavelength(:))'
RGB2XYZ =
Once the images are converted from the RGB color space to the XYZ color space, the XYZ2LAB function is constructed by implementing the following equations we learned in class:

$L^* = 116 \, f(Y/Y_n) - 16$
$a^* = 500 \left( f(X/X_n) - f(Y/Y_n) \right)$
$b^* = 200 \left( f(Y/Y_n) - f(Z/Z_n) \right)$

where $f(t) = t^{1/3}$ if $t > (6/29)^3$, and $f(t) = \frac{t}{3(6/29)^2} + \frac{4}{29}$ otherwise.

The white point we selected is D65, with approximate XYZ values $(X_n, Y_n, Z_n) \approx (95.05, 100.0, 108.88)$.
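A direct NumPy implementation of these equations is shown below. This is a sketch written for this report rather than the project code (the optimization itself needs a differentiable PyTorch version of the same math):

```python
import numpy as np

D65_WHITE = np.array([95.047, 100.0, 108.883])       # (X_n, Y_n, Z_n)

def xyz2lab(xyz, white=D65_WHITE):
    """Convert XYZ values (last axis = 3) to CIELAB."""
    t = xyz / white
    delta = 6.0 / 29.0
    # Piecewise cube-root function f(t) from the CIELAB definition.
    f = np.where(t > delta**3, np.cbrt(t), t / (3 * delta**2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

lab_white = xyz2lab(D65_WHITE)   # the white point maps to L* = 100, a* = b* = 0
lab_gray = xyz2lab(0.18 * D65_WHITE)   # 18% gray lands near L* = 50
```

The CIELAB loss is then simply the L2 loss between `xyz2lab` applied to the reconstructed and target images.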
After switching the loss function to the CIELAB L2 loss and running 5000 iterations of optimization, the phase pattern, corresponding reconstruction intensity, and target intensity are shown below:
From the CIELAB loss reconstruction results, we can observe that the color matching between the reconstructed and target images is improved. However, there are still some color shifts, and overall, the reconstructed image appears even noisier. It is possible that we simply lack sufficient degrees of freedom to achieve perfect color matching.
Spatial CIELAB Loss
At this point, I feel that the one-phase, three-intensities problem is similar to an image compression problem: how to achieve similar visual quality under limited bandwidth.
In class, we learned that for high spatial frequency patterns, the human visual system is more sensitive to luminance changes than to color changes. We also discussed the Spatial CIELAB (S-CIELAB) metric [4], which applies spatial filters based on the human visual system's sensitivity to different spatial frequencies. It might be a good idea to use S-CIELAB as the loss function: by ignoring high spatial frequency color differences, we can allocate more bandwidth to matching what is most important to human vision.


Convert Images from the RGB Color Space to the Opponent Color Space
In order to implement the S-CIELAB loss function, we first need to convert the RGB color space into the opponent color space, reusing the RGB2XYZ matrix calculated in the previous section:
RGB2XYZ =
The XYZ2OPP matrix is copied from the MATLAB implementation of SCIELAB-1996(https://github.com/wandell/SCIELAB-1996/blob/master/cmatrix.m).
By sequentially multiplying by the RGB2XYZ and XYZ2OPP matrices, we can convert the reconstructed and target images into the opponent color space.
The opponent color space has three channels: $O_1$, $O_2$, and $O_3$. $O_1$ represents luminance, $O_2$ represents the contrast between red and green, and $O_3$ represents the contrast between blue and yellow. Since humans have different spatial frequency sensitivities to these three channels, we apply a low-pass filter with a different cutoff ratio to each channel.
In order to determine the low-pass filter cutoff ratio for each channel, we calculate the effective resolution using the following display model.
The resolution of the SLM is 1280x720, and the pixel pitch is 10.8 µm × 10.8 µm, so the physical dimensions of the SLM are 1.382 cm × 0.78 cm. The eyepiece in our hardware setup has a focal length of 50 mm. We set d' in the figure to 45 mm, so the magnification ratio of the eyepiece is $f/(f - d') = 10\times$. The virtual screen dimensions are then 13.82 cm × 7.8 cm, giving a display diagonal of around 6.24 inches. The virtual image forms 45 cm from the eyepiece, so the total viewing distance d in the figure is approximately 50 cm.
For a 6.24-inch screen with 1280x720 resolution viewed at 50 cm, the angular resolution is around 80 pixels per degree (ppd). Since two pixels are needed per cycle, the effective resolution is around 40 cycles per degree (cpd).
I set the low-pass filter cutoff ratios for the three opponent color channels to 0.8, 0.45, and 0.3, so that the highest remaining spatial frequencies roughly match the Space-Time-Color sensitivity graph we learned in class:
Effective resolution = 40 cpd
Luminance: 0.8 cutoff ~ 32 cpd
Red-green: 0.45 cutoff ~ 18 cpd
Blue-yellow: 0.3 cutoff ~ 12 cpd
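The per-channel filtering can be sketched as an ideal frequency-domain low-pass filter with the cutoff expressed as a fraction of the Nyquist frequency. This is a minimal NumPy sketch operating on an already-converted opponent-space image (the image here is random placeholder data); the original S-CIELAB formulation uses spatial-domain kernels, so this is a simplification:

```python
import numpy as np

def lowpass(channel, cutoff_ratio):
    """Ideal low-pass filter; cutoff_ratio is a fraction of the Nyquist frequency."""
    fy = np.fft.fftfreq(channel.shape[0])[:, None]   # cycles/pixel, Nyquist = 0.5
    fx = np.fft.fftfreq(channel.shape[1])[None, :]
    mask = np.sqrt(fx**2 + fy**2) <= 0.5 * cutoff_ratio
    return np.real(np.fft.ifft2(np.fft.fft2(channel) * mask))

cutoff_ratios = [0.8, 0.45, 0.3]                     # luminance, red-green, blue-yellow
opp = np.random.default_rng(0).random((3, 128, 128)) # hypothetical opponent-space image
filtered = np.stack([lowpass(c, r) for c, r in zip(opp, cutoff_ratios)])
```

With an effective resolution of 40 cpd, these ratios keep roughly 32, 18, and 12 cpd in the three channels, matching the numbers above.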
After applying the low-pass filter to each channel of the opponent color space, we use the OPP2XYZ matrix (the inverse of XYZ2OPP) to convert the opponent color space back to the XYZ space. The remaining steps are the same as for the CIELAB loss: calculating the L2 loss in the CIELAB color space.
After switching the loss function to the S-CIELAB loss and running 5000 iterations of optimization, the phase pattern, corresponding reconstruction intensity, and target intensity are shown below:
ColorVideoVDP + S-CIELAB Loss
The reconstruction results using the S-CIELAB loss are better than those of the previous two losses. ColorVideoVDP is a new quality metric designed to evaluate the perceptual quality of color images and videos. It models spatial vision, temporal vision, and color vision, and accounts for display geometry and photometry. The CSF used in S-CIELAB might be overly simplistic; ColorVideoVDP uses a novel contrast sensitivity model (castleCSF) that accounts for changes in contrast sensitivity with luminance as well as supra-threshold vision (e.g., contrast masking and contrast constancy). The metric can also be used as a loss function. By combining the ColorVideoVDP loss with the S-CIELAB loss, I achieved even better reconstruction results, shown below:
Results
The reconstructed images using the four different loss functions are evaluated using two metrics: PSNR and ColorVideoVDP (CVVDP).
PSNR is defined as $\mathrm{PSNR} = 10 \log_{10}\!\left( \frac{MAX^2}{MSE} \right)$, where $MAX$ is the maximum possible pixel value and $MSE$ is the mean squared error between the two images. It quantifies how much distortion or noise is present in the reconstructed image by comparing the pixel intensity differences between it and the target image.
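The metric can be computed in a few lines. This is a minimal sketch; it assumes images normalized to [0, 1], which is an assumption made here for illustration:

```python
import numpy as np

def psnr(recon, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((recon - target) ** 2)
    return 10.0 * np.log10(max_val**2 / mse)

# Example: a uniform error of 0.1 gives MSE = 0.01, so PSNR = 20 dB.
value = psnr(np.full((4, 4), 0.1), np.zeros((4, 4)))
```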
ColorVideoVDP reports image/video quality in JOD (Just-Objectionable-Difference) units. The highest quality (no difference) is reported as 10, and lower values are reported for distorted content. A one-JOD difference means that 75% of observers will judge the image with the higher score to be better.
Test image 1:
Test image 2:
Test image 3:
Conclusions
In this project, we explored the possibility of using just one phase pattern to generate a color hologram. In the traditional field sequential color scheme, we use three phase patterns to generate three intensity patterns, one per color channel, so we have enough degrees of freedom to perfectly match the color target in simulation. In the simultaneous color scheme, however, we can only manipulate one phase pattern, so we may not be able to fully match the RGB target; there is always some error between the reconstructed images and the target images. This project shows that by using perceptually driven losses such as S-CIELAB and ColorVideoVDP, we can shift these inevitable errors to places that are hard for humans to perceive, so the perceptual quality of the reconstructed image remains good.
It is interesting to see that, for the three test images, the reconstructed image using the L2 loss always has the highest PSNR, because the L2 loss prioritizes minimizing the pixel intensity error between the target and reconstructed images. However, these images have relatively low ColorVideoVDP scores and do not look good visually. This is an interesting example showing that PSNR is not aligned with human visual perception.
This project explores the spatial aspect of human vision, but the motivation for choosing the simultaneous color scheme over the sequential color scheme is to fully utilize the refresh rate of the SLM. The simultaneous color scheme allows a 60 Hz SLM to show 60 Hz color video content, so it is worthwhile to explore the temporal aspect of human vision as well. Human vision is more sensitive to luminance changes than to color changes in the temporal domain, too. The next step is to design a new loss function that accounts for this phenomenon, so that we can achieve better video quality on the simultaneous color holographic display.
References
[1] Eric Markley, Nathan Matsuda, Florian Schiffers, Oliver Cossairt, and Grace Kuo. 2023. Simultaneous Color Computer Generated Holography. In SIGGRAPH Asia 2023 Conference Papers (SA '23). Association for Computing Machinery, New York, NY, USA, Article 22, 1–11. https://doi.org/10.1145/3610548.3618250
[2] David Blinder, Fan Wang, Colas Schretter, Takashi Kakue, Tomoyoshi Shimobaba, and Peter Schelkens "Joint color optimization for computer-generated holography without color replicas", Proc. SPIE 12998, Optics, Photonics, and Digital Technologies for Imaging Applications VIII, 129980G (18 June 2024); https://doi.org/10.1117/12.3022244
[3] Suyeon Choi, Manu Gopakumar, Yifan Peng, Jonghyun Kim, Matthew O'Toole, and Gordon Wetzstein. 2022. Time-multiplexed Neural Holography: A Flexible Framework for Holographic Near-eye Displays with Fast Heavily-quantized Spatial Light Modulators. In ACM SIGGRAPH 2022 Conference Proceedings (SIGGRAPH '22). Association for Computing Machinery, New York, NY, USA, Article 32, 1–9. https://doi.org/10.1145/3528233.3530734
[4] Zhang, X. and Wandell, B.A. (1997), A spatial extension of CIELAB for digital color-image reproduction. Journal of the Society for Information Display, 5: 61-63. https://doi.org/10.1889/1.1985127