<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://vista.su.domains/psych221wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Rainas</id>
	<title>Psych 221 Image Systems Engineering - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="http://vista.su.domains/psych221wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Rainas"/>
	<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Special:Contributions/Rainas"/>
	<updated>2026-04-18T14:35:27Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.45.3</generator>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60870</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60870"/>
		<updated>2024-12-13T13:28:08Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected error is then added back to the original RGB values to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
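The distance computation above is a few lines of code. As a sketch, the copunctal point used here is the commonly cited protanope chromaticity, and the reference points are hypothetical values chosen only for illustration.&lt;br /&gt;

```python
import numpy as np

# Commonly cited protanope copunctal point in CIE 1931 xy chromaticity
# (an illustrative assumption; the deutan and tritan points differ).
COPUNCTAL_PROTAN = (0.7465, 0.2535)

def distance_to_confusion_line(v, copunctal, reference):
    """Perpendicular distance d(v, L) from chromaticity v = (x_v, y_v) to the
    confusion line L through the copunctal point and a reference point."""
    xv, yv = v
    xcp, ycp = copunctal
    x0, y0 = reference
    numerator = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    denominator = np.hypot(xcp - x0, ycp - y0)
    return numerator / denominator
```

Key colors with a near-zero distance to an occupied confusion line are the candidates for shifting.&lt;br /&gt;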
&lt;br /&gt;
Confusing colors, identified as key colors lying on the same (occupied) confusion line, are iteratively shifted to the nearest unoccupied confusion lines, in order of their prominence in the image clusters. This reallocation ensures that these colors are distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
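The three energy terms can be prototyped in a few lines. In this sketch, the dichromat simulation f_D is a crude stand-in (zeroing one channel), not the simulation used in [10], and the cluster data are hypothetical.&lt;br /&gt;

```python
import numpy as np

def f_D(c):
    """Toy dichromat simulation for illustration only: collapse the first
    chromaticity channel as a crude red-green-loss stand-in (not the
    simulation model used in [10])."""
    out = np.asarray(c, dtype=float).copy()
    out[..., 0] = 0.0
    return out

def E1(a, b, a_rec):
    """Mean change in pairwise distances between clusters A and B."""
    d_orig = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    d_sim = np.linalg.norm(f_D(a_rec)[:, None, :] - f_D(b)[None, :, :], axis=2)
    return np.mean(np.abs(d_orig - d_sim))

def E2(a, a_rec):
    """Mean change in pairwise distances within cluster A under simulation."""
    d_orig = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=2)
    sim = f_D(a_rec)
    d_sim = np.linalg.norm(sim[:, None, :] - sim[None, :, :], axis=2)
    return np.mean(np.abs(d_orig - d_sim))

def E3(a, a_rec):
    """Naturalness: mean shift of the key colors from their originals."""
    return np.mean(np.linalg.norm(a - a_rec, axis=1))

def total_loss(a, b, a_rec, lam=0.5):
    """E = (E1 + E2) + lambda * E3."""
    return (E1(a, b, a_rec) + E2(a, a_rec)) + lam * E3(a, a_rec)
```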
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIE L*a*b* color space, which approximates perceptual differences by the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
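For Gaussian components, the symmetric KL divergence has a closed form, which a sketch can use directly. This is the standard multivariate-normal KL formula rather than code from [11].&lt;br /&gt;

```python
import numpy as np

def kl_gaussian(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) ) for d-dim Gaussians."""
    d = len(mu0)
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def symmetric_kl(mu0, cov0, mu1, cov1):
    """D_sKL(G_i, G_j) = D_KL(G_i || G_j) + D_KL(G_j || G_i)."""
    return kl_gaussian(mu0, cov0, mu1, cov1) + kl_gaussian(mu1, cov1, mu0, cov0)
```

Pushing component means apart increases this divergence, which is what the optimization step exploits to keep clusters distinguishable.&lt;br /&gt;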
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
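The interpolation step can be sketched with an explicit posterior computation. The convention that the hue channel is index 0, and the toy GMM parameters in the test below, are assumptions made only for illustration.&lt;br /&gt;

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a d-dimensional normal distribution at x."""
    d = len(mean)
    diff = x - mean
    inv = np.linalg.inv(cov)
    norm_const = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ inv @ diff) / norm_const)

def recolored_hue(x, weights, means, covs, mapped_hues):
    """T(x)_H = x_H + sum_i p(i | x, Theta) * (M_i(mu_i)_H - mu_i_H).
    mapped_hues holds M_i(mu_i)_H for each component; the hue channel is
    taken to be index 0 (an arbitrary convention for this sketch)."""
    # Posterior responsibility p(i | x, Theta) of each Gaussian component.
    resp = np.array([w * gaussian_pdf(x, m, c)
                     for w, m, c in zip(weights, means, covs)])
    resp /= resp.sum()
    # Blend the per-component hue shifts by the responsibilities.
    return x[0] + float(np.sum(resp * (mapped_hues - means[:, 0])))
```

Because the responsibilities sum to one, colors near a single component inherit essentially that component&#039;s shift, while colors between components are blended smoothly.&lt;br /&gt;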
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions and can leave colors in the recolored image looking unnatural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to use in real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, CycleGAN, and BicycleGAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This particular method also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. This design allows efficient handling of high-resolution images and has been applied to various computer vision tasks, including image classification and object detection. Despite its robust performance, the architecture is computationally intensive and does not inherently account for the specific needs of individuals with CVD, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations with a triple-latent structure, the method enables continuous personalization, adjusting images to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications: in our experiment, one epoch (a single pass over the entire dataset) took around 18 days.&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD while personalizing for different severity levels. We start by exploring existing methods and identifying opportunities for improvement. Since mathematical approaches provide a solid, well-documented foundation, we began our experiments by testing the methods described in the background, and later extended our exploration to deep learning based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a Baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
&lt;br /&gt;
The error between the original and simulated values is then mapped into the RGB color space, where a deficiency-specific correction matrix shifts the lost color information toward hues the viewer can perceive, enhancing contrast and recovering lost color differences. The corrected error is added back to the original RGB values, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
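Method 1 condenses into a short script. This sketch handles protanopia on float RGB values in [0, 1] using the matrices above; applying the correction to the error in RGB space is one common Daltonize-style choice, and implementations differ on this point.&lt;br /&gt;

```python
import numpy as np

# RGB -> LMS transform based on the Stockman and Sharpe cone fundamentals,
# as given above; LMS -> RGB is its inverse.
T_RGB2LMS = np.array([
    [0.3904725, 0.54990437, 0.00890159],
    [0.07092586, 0.96310739, 0.00135809],
    [0.02314268, 0.12801221, 0.93605194],
])
T_LMS2RGB = np.linalg.inv(T_RGB2LMS)

# Daltonize-inspired correction matrix: redistributes the lost red
# information toward the green and blue channels.
CORRECTION = np.array([
    [0.0, 0.0, 0.0],
    [0.7, 1.0, 0.0],
    [0.7, 0.0, 1.0],
])

def simulate_protanopia(lms):
    """Replace the L response with the Brettel-style weighted combination
    of the M and S responses."""
    sim = lms.copy()
    sim[..., 0] = 0.90822864 * lms[..., 1] + 0.008192 * lms[..., 2]
    return sim

def daltonize(rgb):
    """Minimal protanopia daltonization: compute the perception error in
    RGB, correct it, and add it back to the original image."""
    lms = rgb @ T_RGB2LMS.T
    sim_rgb = simulate_protanopia(lms) @ T_LMS2RGB.T
    error = rgb - sim_rgb
    corrected = error @ CORRECTION.T
    return np.clip(rgb + corrected, 0.0, 1.0)
```

Neutral colors are nearly invariant under the simulation, so they pass through almost unchanged, while strongly red pixels receive visible shifts in the green and blue channels.&lt;br /&gt;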
&lt;br /&gt;
==== Method 2: Optimizing Objective Function ====&lt;br /&gt;
To improve the results from the Daltonization method, we designed a framework inspired by methods discussed in the background, incorporating dominant color extraction, optimization-based recoloring, and edit propagation. This approach aims to find a balance between the naturalness and contrast while compensating colors that are not visible for corresponding CVD types.&lt;br /&gt;
&lt;br /&gt;
===== 1. Extraction of Dominant Colors =====&lt;br /&gt;
We begin by extracting the dominant colors from the input image by clustering with a K-means algorithm. This step identifies a reduced set of representative colors that capture the primary color information in the image:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{C} = \{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; represents the number of clusters, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i&amp;lt;/math&amp;gt; represents the centroid of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th cluster.&lt;br /&gt;
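As a sketch of this step, the following is a numpy-only mini K-means; a real pipeline would use a library implementation, and the synthetic data in the test are hypothetical.&lt;br /&gt;

```python
import numpy as np

def dominant_colors(pixels, n_clusters=5, n_iter=20, seed=0):
    """Tiny K-means sketch: extract n_clusters dominant colors from an
    (M, 3) array of pixel colors. Illustrative only; a production version
    would use a library implementation."""
    pixels = np.asarray(pixels, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen pixels.
    centers = pixels[rng.choice(len(pixels), n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assign each pixel to its nearest centroid.
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its assigned pixels.
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = pixels[labels == k].mean(axis=0)
    return centers, labels
```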
&lt;br /&gt;
===== 2. Optimization-Based Recoloring =====&lt;br /&gt;
Once the dominant colors are extracted, we apply an optimization process to adjust these colors. The optimization uses the formulas mentioned in [9], and aims to balance two key objectives:&lt;br /&gt;
&lt;br /&gt;
1. Naturalness Preservation: Ensures the recolored image minimally deviates from the original.&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{nat}} = \sum_{i=1}^N \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_i^{\text{original}}) \|^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is the transformation matrix based on the severity and type of CVD, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i^{\text{original}}&amp;lt;/math&amp;gt; is the original color.&lt;br /&gt;
&lt;br /&gt;
2. Contrast Enhancement: Improves the differentiation of colors for individuals with CVD:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{cont}} = \sum_{i=1}^N \sum_{j&amp;gt;i} \left( \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_j) \|^2 - \| \mathbf{c}_i^{\text{original}} - \mathbf{c}_j^{\text{original}} \|^2 \right)^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The total objective function combines these two terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + E_{\text{cont}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; controls the trade-off between naturalness and contrast.&lt;br /&gt;
&lt;br /&gt;
Optimization is performed using the L-BFGS-B algorithm to ensure efficient convergence under bounded constraints.&lt;br /&gt;
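A minimal sketch of this optimization, assuming NumPy and SciPy; the palette and the value of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; are illustrative, and the matrix used is the protanopia matrix quoted later in the text:&lt;br /&gt;

```python
# Sketch of E = beta * E_nat + E_cont over the palette colors, minimized
# with L-BFGS-B under [0, 1] box constraints on each color channel.
import numpy as np
from scipy.optimize import minimize

T = np.array([[0.566, 0.558, 0.0],
              [0.433, 0.442, 0.242],
              [0.0,   0.0,   0.758]])  # protanopia matrix from the text

def energy(flat, orig, T, beta):
    c = flat.reshape(orig.shape)
    e_nat = np.sum(((c - orig) @ T.T) ** 2)          # || T (c_i - c_i^orig) ||^2
    diff = c[:, None, :] - c[None, :, :]
    diff0 = orig[:, None, :] - orig[None, :, :]
    pair = np.sum((diff @ T.T) ** 2, axis=-1) - np.sum(diff0 ** 2, axis=-1)
    iu = np.triu_indices(len(orig), k=1)             # sum over pairs with j > i
    e_cont = np.sum(pair[iu] ** 2)
    return beta * e_nat + e_cont

rng = np.random.default_rng(1)
orig = rng.random((6, 3))                            # illustrative palette
res = minimize(energy, orig.ravel(), args=(orig, T, 0.5),
               method="L-BFGS-B", bounds=[(0.0, 1.0)] * orig.size)
c_opt = res.x.reshape(orig.shape)
```

The box bounds keep every optimized color inside the valid [0, 1] RGB cube, which is exactly the kind of bounded constraint L-BFGS-B handles.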
&lt;br /&gt;
The transformation matrices for each type of CVD are the following, which are based on [12]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Protanopia}} = \begin{bmatrix} 0.566 &amp;amp; 0.558 &amp;amp; 0 \\ 0.433 &amp;amp; 0.442 &amp;amp; 0.242 \\ 0 &amp;amp; 0 &amp;amp; 0.758 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Deuteranopia}} = \begin{bmatrix} 0.625 &amp;amp; 0.7 &amp;amp; 0 \\ 0.375 &amp;amp; 0.3 &amp;amp; 0.3 \\ 0 &amp;amp; 0 &amp;amp; 0.7 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Tritanopia}} = \begin{bmatrix} 0.95 &amp;amp; 0 &amp;amp; 0 \\ 0.05 &amp;amp; 0.433 &amp;amp; 0 \\ 0 &amp;amp; 0.567 &amp;amp; 1 \end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
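Written out as NumPy arrays, these matrices can be applied directly to an RGB vector to simulate dichromat perception (values copied from the display above):&lt;br /&gt;

```python
# The three dichromat transformation matrices; T @ rgb gives the simulated
# CVD response used in the E_nat and E_cont terms.
import numpy as np

T_CVD = {
    "protanopia":   np.array([[0.566, 0.558, 0.0],
                              [0.433, 0.442, 0.242],
                              [0.0,   0.0,   0.758]]),
    "deuteranopia": np.array([[0.625, 0.7, 0.0],
                              [0.375, 0.3, 0.3],
                              [0.0,   0.0, 0.7]]),
    "tritanopia":   np.array([[0.95, 0.0,   0.0],
                              [0.05, 0.433, 0.0],
                              [0.0,  0.567, 1.0]]),
}

pure_red = np.array([1.0, 0.0, 0.0])
simulated = T_CVD["protanopia"] @ pure_red   # red loses its distinct red component
```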
&lt;br /&gt;
===== 3. Edit Propagation =====&lt;br /&gt;
After optimizing the dominant colors, we propagate these edits across the entire image to ensure smooth transitions. This propagation step leverages the CIE-Lab color space, which is perceptually uniform, meaning that the Euclidean distance in this space correlates well with human color perception. The process begins by mapping the original image and the optimized dominant colors into the Lab color space. In this space, the differences between the original and recolored dominant colors are computed to capture the adjustments made during the optimization step:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Delta L^* = \text{griddata}(\mathbf{c}^{\text{original}}, \mathbf{c}^{\text{recolored}} - \mathbf{c}^{\text{original}}, \mathbf{I}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in the Lab color space. Once the interpolated changes are computed, they are applied to the Lab representation of the original image. Finally, the adjusted Lab values are converted back to the RGB color space to reconstruct the recolored image.&lt;br /&gt;
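A sketch of this propagation step, assuming SciPy; nearest-neighbor interpolation is used here so the example works for any palette geometry, whereas the project's setup may differ in interpolation method:&lt;br /&gt;

```python
# Sketch of edit propagation: interpolate the per-palette-color Lab shifts
# onto every pixel with scipy's griddata, one channel at a time.
import numpy as np
from scipy.interpolate import griddata

def propagate_edits(lab_pixels, palette_orig, palette_recolored):
    """lab_pixels: (M, 3) Lab values; palettes: (N, 3) Lab values."""
    delta = palette_recolored - palette_orig
    shifts = np.stack([griddata(palette_orig, delta[:, k], lab_pixels,
                                method="nearest") for k in range(3)], axis=1)
    return lab_pixels + shifts

rng = np.random.default_rng(2)
orig = rng.random((10, 3))          # illustrative palette in Lab
recolored = orig + 0.1              # a uniform shift as a toy "optimization"
pixels = rng.random((50, 3))        # illustrative Lab pixels
adjusted = propagate_edits(pixels, orig, recolored)
```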
&lt;br /&gt;
==== Method 3: Improved with Confusion Line Adjustments ====&lt;br /&gt;
This method builds upon the previous one by introducing enhancements in dominant color extraction, optimization, and edit propagation, while incorporating an additional step, inspired by [10], that adjusts colors near confusion lines in the CIE 1931 xyY color space. These improvements aim to further enhance the contrast and naturalness of the recolored images. Moreover, this method adds flexibility in adjusting for different severity levels for each CVD type.&lt;br /&gt;
&lt;br /&gt;
===== 1. Improvements on Method 2 =====&lt;br /&gt;
To improve the performance of dominant color extraction, we transitioned from traditional K-means to MiniBatch K-means. This algorithm processes data in small batches, significantly reducing computational time while maintaining accuracy in clustering. The number of dominant colors was also reduced from 50 to 30 to focus on key representative colors and further enhance efficiency. The optimization objective is refined to leverage vectorization, improving computational efficiency. The two key terms remain:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + (1 - \beta) E_{\text{cont}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
The optimization objective was refined to significantly improve computational efficiency by replacing the nested loops in the contrast enhancement term with vectorized operations. In the original implementation, the pairwise differences between colors were calculated iteratively using &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;O(N^2)&amp;lt;/math&amp;gt; nested loops. The improved version eliminates this overhead by leveraging array broadcasting to compute all pairwise differences simultaneously, and the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is then applied to all pairwise differences in a single tensor operation:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T}_{\Delta} = \text{tensordot}(\Delta_{ij}, \mathbf{T}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the norms are computed in parallel across the entire array. Additionally, the weighting parameter &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; was adjusted to favor naturalness preservation, ensuring better visual integrity in the recolored image. &lt;br /&gt;
The propagation step was changed to use a k-d tree for fast nearest-neighbor searches, replacing grid-based interpolation. This approach more efficiently matches each pixel in the Lab color space to the closest dominant color:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{I}_{\text{adjusted}} = \mathbf{C}_{\text{recolored}}[\text{k-d tree query}(\mathbf{I})],&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in Lab space.&lt;br /&gt;
These refinements enable faster optimization while improving the balance between naturalness and contrast enhancement.&lt;br /&gt;
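The k-d tree propagation above can be sketched with SciPy's cKDTree; the palettes and pixels below are illustrative:&lt;br /&gt;

```python
# Sketch of the k-d-tree propagation: each Lab pixel is assigned the
# recolored version of its nearest original palette color.
import numpy as np
from scipy.spatial import cKDTree

def propagate_kdtree(lab_pixels, palette_orig, palette_recolored):
    _, idx = cKDTree(palette_orig).query(lab_pixels)  # nearest palette index
    return palette_recolored[idx]

palette = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
recolored = np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9]])
pixels = np.array([[0.05, 0.0, 0.1], [0.95, 1.0, 0.9]])
out = propagate_kdtree(pixels, palette, recolored)
```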
&lt;br /&gt;
===== 2. Confusion Line Adjustments =====&lt;br /&gt;
An additional step adjusts colors near confusion lines in the CIE 1931 xyY color space to enhance distinguishability:&lt;br /&gt;
&lt;br /&gt;
1. Confusion lines are defined for protanopia, deuteranopia, and tritanopia, based on [10]. For example, for protanopia:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{Confusion Line: Start} = (0.735, 0.265), \quad \text{End} = (0.115, 0.885).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. Colors near the confusion line are identified using orthogonal distance:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(\mathbf{xy}, L) = \frac{\| (\mathbf{xy} - \mathbf{p}_1) \times (\mathbf{p}_2 - \mathbf{p}_1) \|}{\|\mathbf{p}_2 - \mathbf{p}_1\|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_1&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_2&amp;lt;/math&amp;gt; are the start and end points of the confusion line.&lt;br /&gt;
&lt;br /&gt;
3. Identified colors are shifted orthogonally away from the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{xy}_{\text{adjusted}} = \mathbf{xy} + \lambda \mathbf{v}_{\perp},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{v}_{\perp}&amp;lt;/math&amp;gt; is a perpendicular vector, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a scaling factor.&lt;br /&gt;
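The distance and shift formulas above can be sketched in 2D chromaticity coordinates; the test point and the value of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; are illustrative, and the endpoints are the protanopia values quoted above:&lt;br /&gt;

```python
# Sketch of the confusion-line step in xy chromaticity: orthogonal distance
# from a chromaticity to the line p1-p2, and a shift away from that line.
import numpy as np

p1 = np.array([0.735, 0.265])   # confusion-line start (protanopia)
p2 = np.array([0.115, 0.885])   # confusion-line end

def line_distance(xy, p1, p2):
    d = p2 - p1
    cross = (xy[0] - p1[0]) * d[1] - (xy[1] - p1[1]) * d[0]
    return abs(cross) / np.linalg.norm(d)

def shift_away(xy, p1, p2, lam=0.05):
    d = p2 - p1
    v_perp = np.array([-d[1], d[0]]) / np.linalg.norm(d)  # unit normal
    side = np.sign(np.dot(xy - p1, v_perp)) or 1.0        # push off the line
    return xy + lam * side * v_perp

xy = np.array([0.40, 0.58])               # a chromaticity near the line
dist = line_distance(xy, p1, p2)
moved = shift_away(xy, p1, p2)            # now farther from the line
```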
&lt;br /&gt;
===== 3. Personalize with Severity Levels =====&lt;br /&gt;
To account for severity levels, the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; linearly interpolates between normal vision and full CVD perception based on severity and type:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T} = (1 - s) \mathbf{I} + s \mathbf{T}_{\text{CVD}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;s&amp;lt;/math&amp;gt; represents the severity of CVD (0-100%), &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; is the identity matrix, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}_{\text{CVD}}&amp;lt;/math&amp;gt; is the full transformation matrix specific to protanopia, deuteranopia, or tritanopia. This interpolation follows the DaltonLens simulator [13].&lt;br /&gt;
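This interpolation is a one-line function; a sketch:&lt;br /&gt;

```python
# Severity interpolation: blend the identity with the full dichromat
# matrix. s = 0 is normal vision, s = 1 is full CVD.
import numpy as np

def severity_matrix(T_cvd, s):
    return (1.0 - s) * np.eye(3) + s * T_cvd

T_prot = np.array([[0.566, 0.558, 0.0],
                   [0.433, 0.442, 0.242],
                   [0.0,   0.0,   0.758]])
T_half = severity_matrix(T_prot, 0.5)     # 50% severity protanopia
```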
&lt;br /&gt;
These improvements significantly enhanced both the effectiveness and efficiency of the recoloring process on top of Method 2.&lt;br /&gt;
&lt;br /&gt;
==== Method 4: Improved with GMM-based Method ====&lt;br /&gt;
The last mathematical method we experimented with enhances recoloring by integrating a Gaussian Mixture Model (GMM)-based global recoloring algorithm. It also applies nonlinear adjustments to colors near confusion lines to ensure improved contrast and naturalness.&lt;br /&gt;
&lt;br /&gt;
===== 1. GMM-Based Global Recoloring =====&lt;br /&gt;
The image is first resized and transformed into the Lab color space. A GMM is applied to cluster the color distribution into &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; components, optimizing the number of clusters using the Bayesian Information Criterion (BIC):&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{BIC} = -2 \cdot \text{log-likelihood} + P \cdot \log(N),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;P&amp;lt;/math&amp;gt; represents the model parameters and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; is the number of pixels.&lt;br /&gt;
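A sketch of the BIC-based model selection, assuming scikit-learn's GaussianMixture (whose bic() method implements the formula above); the synthetic two-blob "pixel" data is illustrative:&lt;br /&gt;

```python
# Pick the number of GMM components by minimizing BIC over candidate K.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Two well-separated color blobs standing in for Lab pixel values.
pixels = np.vstack([rng.normal(0.2, 0.02, (200, 3)),
                    rng.normal(0.8, 0.02, (200, 3))])

bics = {}
for k in (1, 2, 3):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(pixels)
    bics[k] = gmm.bic(pixels)          # -2*log-likelihood + P*log(N)
best_k = min(bics, key=bics.get)       # K with the lowest BIC wins
```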
&lt;br /&gt;
The GMM means are simulated using the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T&amp;lt;/math&amp;gt; with severity levels taken into account, and the symmetric Kullback-Leibler (KL) divergence (&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{\text{sKL}}&amp;lt;/math&amp;gt;) is calculated between pairs of clusters:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{\text{sKL}}(i, j) = D_{\text{KL}}(G_i \| G_j) + D_{\text{KL}}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt; are Gaussian components, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{\text{KL}}&amp;lt;/math&amp;gt; represents the KL divergence. The GMM cluster means are then adjusted by solving a nonlinear least-squares problem to minimize the discrepancy.&lt;br /&gt;
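The symmetric KL term has a closed form for Gaussians; a NumPy sketch, with illustrative example means and covariances:&lt;br /&gt;

```python
# Closed-form KL divergence between multivariate Gaussians, and its
# symmetrized version D_sKL(i, j) used for the pairwise cluster term.
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    k = len(mu0)
    S1_inv = np.linalg.inv(S1)
    dmu = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + dmu @ S1_inv @ dmu - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def sym_kl(mu0, S0, mu1, S1):
    return kl_gauss(mu0, S0, mu1, S1) + kl_gauss(mu1, S1, mu0, S0)

mu_a, S_a = np.zeros(3), np.eye(3)
mu_b, S_b = np.ones(3), 2.0 * np.eye(3)
d = sym_kl(mu_a, S_a, mu_b, S_b)
```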
&lt;br /&gt;
===== 2. Adjusting Near Confusion Lines Improved =====&lt;br /&gt;
Following global recoloring, colors near confusion lines in the CIE 1931 xyY color space are further adjusted based on formulas used in Method 3. Nonlinear scaling is applied to amplify the shifts for pixels closer to the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
w = \left( \frac{\text{threshold} - d}{\text{threshold}} \right)^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w&amp;lt;/math&amp;gt; is the scaling factor. &lt;br /&gt;
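The weighting can be sketched as a small function; the threshold value below is illustrative:&lt;br /&gt;

```python
# Quadratic falloff used in Method 4: pixels right on the confusion line
# get weight 1, pixels at or beyond the threshold distance get weight 0.
def confusion_weight(d, threshold=0.05):
    if d >= threshold:
        return 0.0
    return ((threshold - d) / threshold) ** 2
```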
&lt;br /&gt;
The adjustments from the GMM and confusion line steps are combined to produce the final recolored image. These enhancements make the method more robust and effective for individuals with varying levels of CVD.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through our experimentation with mathematical methods, we gained a deeper understanding of the algorithmic aspects of image recoloring for CVD, particularly in balancing naturalness, contrast, and personalization. Building on these insights, we transitioned to exploring deep learning approaches, applying the lessons learned to guide training, evaluation, and ground truth dataset generation.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user, we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in further sections; an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and able to integrate the user label, they require an existing ground truth example of the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information while training. They are generally better at generating more natural images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from the open-source RGB image dataset of [2]: to improve their model&#039;s ability to enhance contrast between CVD-indistinguishable color pairs, the authors created a dataset of 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground truth recolored images for the supervised methods), we randomly sampled 15,000 images and recolored them by simulating random labels for severity and type of CVD. The ground truth recoloring was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org Scipy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and is passed to three parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of a recolored image from the given ground truth. The outputs of the three networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error (MSE) loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
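The loss is ordinary MSE; a NumPy sketch of the same arithmetic (in training it is computed with PyTorch on batches, e.g. torch.nn.functional.mse_loss, and the example tensors below are illustrative):&lt;br /&gt;

```python
# Pixel-wise mean-squared error between the model output and the ground
# truth recolored image.
import numpy as np

def mse_loss(pred, target):
    return np.mean((pred - target) ** 2)

rng = np.random.default_rng(4)
target = rng.random((256, 256, 3))   # stand-in for a ground truth image
pred = target + 0.1                  # stand-in for a model output
loss = mse_loss(pred, target)
```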
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
In a similar fashion of inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: the recolored (model output) and ground truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: the l-th feature layer of the pre-trained VGG network&lt;br /&gt;
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math&amp;gt;x, y&amp;lt;/math&amp;gt;: Two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: Size of the global (or large) window of the image&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: Size of the local window (neighborhood) around a pixel &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
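A NumPy sketch of CL(x, y) and a sampled version of the global contrast term; the text sums over a window &amp;lt;math&amp;gt;\omega&amp;lt;/math&amp;gt;, and random pixel pairs stand in for it here:&lt;br /&gt;

```python
# CL(x, y) is the change in pairwise color distance between the CVD
# simulations of the recolored and the original image; the global loss
# averages CL over sampled pixel pairs.
import numpy as np

def contrast_loss_global(sim_orig, sim_recolored, n_pairs=1000, seed=0):
    """sim_*: (M, 3) CVD-simulated colors, flattened over pixels."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(sim_orig), n_pairs)
    j = rng.integers(0, len(sim_orig), n_pairs)
    cl = (np.linalg.norm(sim_recolored[i] - sim_recolored[j], axis=1)
          - np.linalg.norm(sim_orig[i] - sim_orig[j], axis=1))
    return np.mean(cl)

rng = np.random.default_rng(5)
orig = rng.random((500, 3))
stretched = 2.0 * orig          # a toy "recoloring" that doubles distances
gain = contrast_loss_global(orig, stretched)
```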
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural color distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{natural}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored images respectively&lt;br /&gt;
&lt;br /&gt;
== Results == &lt;br /&gt;
&lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The qualitative results and key observations from the experiments are summarized below. &lt;br /&gt;
&lt;br /&gt;
The result images presented in Figures 10 through 13 follow this sequence: the original image, the CVD-simulated version of the original image, the recolored image, and the CVD-simulated version of the recolored image. The CVD-simulated images demonstrate how the images are perceived by individuals with the corresponding type of CVD. The examples provided focus on protanopia (first row) and deuteranopia (second row) due to space constraints. Additional results for tritanopia and recolored images at varying severity levels are included in the appendix.&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Method 1: Daltonization Baseline&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method1.png|400px|thumb|right|Figure 10: Method 1 Results]]&lt;br /&gt;
&lt;br /&gt;
The Daltonization method provides a foundational approach for recoloring images to enhance visibility for individuals with CVD. Key takeaways from Figure 10 include:&lt;br /&gt;
&lt;br /&gt;
* The method demonstrates significant improvements for protanopia, as seen in the first row, where the recolored images show clear color differences and high contrast. However, for deuteranopia, as shown in the second row, the recolored images exhibit less visible improvements, with lower contrast. This inconsistency highlights the method&#039;s limited ability to generalize across different types of CVD.&lt;br /&gt;
* The method does not account for severity levels or individual differences in CVD perception, which presents an opportunity for further improvement.&lt;br /&gt;
* While the recolored images achieve high contrast between confusing colors, the overall perception of the original image is not preserved. This reduction in naturalness may impact the aesthetic quality and recognizability of the image.&lt;br /&gt;
* Performance: this method is the fastest among the methods tested, as it relies solely on matrix transformations. This makes it computationally efficient and suitable for real-time applications.&lt;br /&gt;
&lt;br /&gt;
The Daltonization method provides a baseline for recoloring but requires enhancements in flexibility, contrast optimization across CVD types, and personalization for varying severity levels.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Method 2: Optimizing Objective Functions&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method2.png|400px|thumb|right|Figure 11 Method 2 Results]]&lt;br /&gt;
* While this method aims to balance naturalness and contrast, the resulting recolored images are similar to the original ones. A possible reason for this is the sensitivity of the loss function to the beta parameter, which requires careful tuning.&lt;br /&gt;
* The recolored images exhibit some loss of fine details, likely due to the use of the k-means clustering algorithm, which simplifies color representation across the image.&lt;br /&gt;
* This algorithm has a very slow runtime, taking over one minute per image. The primary bottlenecks are the color clustering step and the optimization of the objective function, which can be improved significantly.&lt;br /&gt;
* Despite its limitations, this method introduces a flexible framework for customizing loss functions, enabling further improvements. This flexibility was leveraged to refine the method in subsequent methods.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Method 3: Adjustments Near Confusion Lines with Improved Method 2&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method3.png|400px|thumb|right|Figure 12 Method 3 Results]]&lt;br /&gt;
* This method produces recolored images with reasonable contrasts between confusing colors while preserving the naturalness of the image well. It can also account for varying severity levels for each CVD type, providing more personalized recoloring.&lt;br /&gt;
* The performance of the algorithm was improved significantly, reducing from over one minute to approximately 4 seconds per image.&lt;br /&gt;
* Results with color plates, which are commonly used for diagnosing color vision deficiencies, are included in the appendix. This method shows good results, with numbers becoming more easily visible in the CVD-simulated recolored images.&lt;br /&gt;
* Some limitations include the fact that this method sometimes lacks sufficient contrast, particularly for the deuteranopia type. It is also sensitive to parameters, such as the shift factor for colors near the confusion lines, which requires careful tuning. &lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Method 4: Improved with GMM-based Method&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method4.png|400px|thumb|right|Figure 13 Method 4 Results]]&lt;br /&gt;
* This method creates recolored images with very high contrast, making the colors in the images easily distinguishable, even for individuals with severe CVD.&lt;br /&gt;
* By using GMM-based clustering instead of k-means, this method preserves most of the image details. The more sophisticated clustering allows for a better representation of the original color distribution, reducing the loss of fine details.&lt;br /&gt;
* The runtime for this method is significantly faster than most others, taking only around 1 second per image. This makes it highly practical for real-time applications.&lt;br /&gt;
* While the method performs well in enhancing contrast, some recolored images lose the naturalness of the original images. Additionally, certain colors in the recolored images do not transition smoothly, which might be attributed to the clustering step in the process.&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Below are some quantitative results from six metrics with the performance for each method:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
! Original vs Recolored !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
| SSIM || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| TCC || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| CD ΔE76 || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| CIEDE2000 || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| CIEDE94 || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| D-CIELAB ΔEab || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* SSIM: Measures structural similarity between images, combining luminance, contrast, and structure components. Computed using &amp;lt;code&amp;gt;torchmetrics.StructuralSimilarityIndexMeasure&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
* TCC: Evaluates changes in total color contrast by comparing random pixel pairs in each image and calculating the difference in their color distances. &lt;br /&gt;
&lt;br /&gt;
* D-CIELAB ΔEab [14]: Quantifies perceptual color differences for dichromats under specific CVD types. &lt;br /&gt;
&lt;br /&gt;
* CD ΔE76, CIEDE2000, CIEDE94: Standard perceptual color difference metrics, computed with scikit-image package. ΔE76 is basic Euclidean distance in Lab space, while CIEDE2000 and CIEDE94 include perceptual corrections.&lt;br /&gt;
&lt;br /&gt;
Overall, Method 4 stands out as the best-performing approach, delivering high contrast, preserving image details through GMM-based clustering, and running far faster than Methods 2 and 3, while addressing many limitations of the earlier methods.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the three neural network architectures described above: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), Total Color Contrast (TCC), Chromatic Difference (CD), and inference time, as provided in [1] and [2], were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored image shows compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored versus original images were similar or worse, meaning the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness in the recolored images was seen (possibly due to naturalness factor being more prioritized even though weight coefficients in the loss term favored contrast (alpha = 0.25, beta = 1.0)).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we computed evaluation metrics for comparison with related work using only the Conditional Autoencoder.&lt;br /&gt;
The evaluation metrics are adapted from [1] and [2]; we use the same definitions as those papers. At a high level, the metrics are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual difference in color before and after recoloring, ensuring enhanced distinguishability.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here, lambda is a constant, not a wavelength; l, a, and b are the CIELAB coordinates of the recolored (&#039;) and original images, respectively.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
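As a concrete illustration, here is a minimal numpy sketch of the Chromatic Difference metric defined above; the function name and the default value of lambda are our own assumptions, not taken from [1] or [2].&lt;br /&gt;

```python
import numpy as np

# Hypothetical sketch of the Chromatic Difference (CD) metric described
# above: a weighted Euclidean distance in CIELAB between recolored (')
# and original pixels. `lam` (lambda) is the constant luminance weight.
def chromatic_difference(lab_orig, lab_rec, lam=0.5):
    """lab_orig, lab_rec: (H, W, 3) float arrays of L*, a*, b* values."""
    dl = lab_rec[..., 0] - lab_orig[..., 0]
    da = lab_rec[..., 1] - lab_orig[..., 1]
    db = lab_rec[..., 2] - lab_orig[..., 2]
    cd = np.sqrt(lam * dl**2 + da**2 + db**2)  # per-pixel CD(i)
    return cd.mean()  # average over the image

# Identical images give zero chromatic difference.
same = np.random.rand(4, 4, 3) * 100
print(chromatic_difference(same, same))  # 0.0
```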
&lt;br /&gt;
The key results are shown in Table 2; the takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:40%; margin:auto;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 (vs. ~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 (vs. ~0.963)*&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2] (the larger of the protan/deutan values).&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good, but lower than in paper [2] because they optimize a separate network for each CVD type.&lt;br /&gt;
* Outputs are somewhat blurry (structural similarity is not optimized for strongly enough).&lt;br /&gt;
* Handling multiple CVD types in a single network needs a more sophisticated approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were only moderate (about 2.6 seconds per image), which limits their use in real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from Swin architecture added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. Josa a, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
[12] Color-Blindness.com. (n.d.). COBLIS - Color Blindness Simulator. Retrieved December 13, 2024, from https://www.color-blindness.com/coblis-color-blindness-simulator/&lt;br /&gt;
&lt;br /&gt;
[13] DaltonLens. (n.d.). DaltonLens-Python [Computer software]. GitHub. Retrieved December 13, 2024, from https://github.com/DaltonLens/DaltonLens-Python&lt;br /&gt;
&lt;br /&gt;
[14] Interactive Science and Engineering Tools. (n.d.). D-CIELAB GitHub repository. Retrieved December 13, 2024, from https://github.com/ISET/D-CIELAB&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400px|Wikipedia encyclopedia]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_2.png|400px]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|350px|thumb|Conditional Autoencoder]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_unet.png|350px|thumb|Conditional U-Net]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_mlp.png|350px|thumb|Conditional MLP]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both; text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
Losses: Conditional Autoencoder, Conditional U-Net, and Conditional MLP&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Mathematical method results with color plates&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:Method1-color-plates.png|400px|thumb|Method 1 Color Plates Results]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:Method2-color-plates.png|400px|thumb|Method 2 Color Plates Results]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;gallery mode=&amp;quot;nolines&amp;quot; widths=&amp;quot;400px&amp;quot; heights=&amp;quot;300px&amp;quot; caption=&amp;quot;Method 3 Color Plates Results for Protanopia, Deuteranopia, and Tritanopia with Severity Levels&amp;quot;&amp;gt;&lt;br /&gt;
File:Method3-protan.png|Protanopia&lt;br /&gt;
File:Method3-deutan.png|Deuteranopia&lt;br /&gt;
File:Method3-tritan.png|Tritanopia&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;gallery mode=&amp;quot;nolines&amp;quot; widths=&amp;quot;400px&amp;quot; heights=&amp;quot;300px&amp;quot; caption=&amp;quot;Method 4 Color Plates Results for Protanopia, Deuteranopia, and Tritanopia with Severity Levels&amp;quot;&amp;gt;&lt;br /&gt;
File:Method4-protan.png|Protanopia&lt;br /&gt;
File:Method4-deutan.png|Deuteranopia&lt;br /&gt;
File:Method4-tritan.png|Tritanopia&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;br /&gt;
* Researching, writing, and running scripts for four (and more) mathematical methods: Daltonization, optimization-based, confusion-lines-based, and GMM-based, plus other experiments such as a segmentation-based method that was discarded due to slow performance&lt;br /&gt;
* Results generation and validation for all scripts written&lt;br /&gt;
* Evaluation metrics scripts for mathematical methods&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60869</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60869"/>
		<updated>2024-12-13T13:27:46Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* Quantitative Results */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The correction is then added back to the original image to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
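To make the pipeline concrete, the daltonization step above can be sketched in numpy. This is schematic only: the RGB/LMS conversions and the CVD simulation itself are assumed to happen elsewhere (the simulated image is passed in directly), and the function name is hypothetical; only the correction matrix comes from the text above.&lt;br /&gt;

```python
import numpy as np

# Schematic daltonization step (numpy sketch). The correction matrix is
# the one quoted above from Daltonize [5] / Vischeck [6].
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize(original, simulated):
    """original, simulated: (H, W, 3) float RGB arrays in [0, 1]."""
    error = original - simulated   # information invisible to the dichromat
    shift = error @ CORRECTION.T   # rotate it into visible channels
    return np.clip(original + shift, 0.0, 1.0)
```

For example, a pure red difference (invisible to a protanope) is redistributed into the green and blue channels.&lt;br /&gt;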
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
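The two energy terms can be sketched on a small color palette; this is a minimal numpy sketch, not Zhu et al.&#039;s implementation, and in practice the pairwise contrast term is evaluated on sampled pixel pairs since it is quadratic in the number of pixels.&lt;br /&gt;

```python
import numpy as np

# Minimal sketch of E = beta * E_nat + E_cont from Zhu et al. [8],
# evaluated on a small (N, 3) palette of colors rather than a full image.
def total_energy(c_orig, c_rec, beta=0.5):
    """c_orig, c_rec: (N, 3) arrays of original / recolored colors."""
    # Naturalness: squared distance between each recolored and original color.
    e_nat = np.sum(np.linalg.norm(c_rec - c_orig, axis=1) ** 2)
    # Contrast: deviation of pairwise color differences after recoloring.
    diff_rec = c_rec[:, None, :] - c_rec[None, :, :]     # c_i^+ - c_j^+
    diff_orig = c_orig[:, None, :] - c_orig[None, :, :]  # c_i - c_j
    e_cont = np.sum(np.linalg.norm(diff_rec - diff_orig, axis=2) ** 2)
    return beta * e_nat + e_cont
```

Note that a uniform shift of all colors leaves E_cont at zero (pairwise differences are unchanged) while E_nat grows, which is exactly the trade-off the beta weight controls.&lt;br /&gt;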
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
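The point-to-line distance above translates directly into code; a minimal sketch (the function name is ours):&lt;br /&gt;

```python
import numpy as np

# Perpendicular distance from a chromaticity point v to the confusion
# line L through the copunctal point and a reference point, implementing
# the formula above.
def confusion_line_distance(v, copunctal, ref):
    x_v, y_v = v
    x_cp, y_cp = copunctal
    x_0, y_0 = ref
    num = abs((x_cp - x_0) * (y_0 - y_v) - (x_0 - x_v) * (y_cp - y_0))
    den = np.hypot(x_cp - x_0, y_cp - y_0)
    return num / den
```

A key color is considered to lie on a confusion line when this distance falls below a tolerance; shifting it to a neighboring, unoccupied line increases its distance from the occupied one.&lt;br /&gt;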
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are then iteratively shifted to the nearest unoccupied confusion lines, in order of their prominence in the image clusters. This reallocation makes those colors distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIE L*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
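The symmetric KL divergence between two Gaussians has a closed form. A minimal NumPy sketch (helper names are ours) is:&lt;br /&gt;

```python
import numpy as np

def kl_gaussian(mu_i, cov_i, mu_j, cov_j):
    """Closed-form KL divergence D_KL(G_i || G_j) between two
    multivariate Gaussians with means mu and covariances cov."""
    k = mu_i.shape[0]
    cov_j_inv = np.linalg.inv(cov_j)
    diff = mu_j - mu_i
    return 0.5 * (np.trace(cov_j_inv @ cov_i)
                  + diff @ cov_j_inv @ diff
                  - k
                  + np.log(np.linalg.det(cov_j) / np.linalg.det(cov_i)))

def symmetric_kl(mu_i, cov_i, mu_j, cov_j):
    """D_sKL as defined above: the two directed divergences summed."""
    return (kl_gaussian(mu_i, cov_i, mu_j, cov_j)
            + kl_gaussian(mu_j, cov_j, mu_i, cov_i))
```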
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions. Sometimes the colors in the recolored images are not very natural.&lt;br /&gt;
* The high computational complexity in the optimization step of this algorithm may be difficult for real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbones showing promising results. These models generate creative recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This specific reference also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of individuals with CVD, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations through a triple-latent structure, the method enables continuous personalization, adjusting images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, one epoch (one iteration over the entire dataset) took around 18 days.&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD with the personalization of different severity levels. We start by exploring existing methods and identifying opportunities for improvement. Since mathematical-based approaches provide a solid foundation and are well-documented, we began our experiments by testing these methods, as described in the background. We later extended our exploration to deep learning based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a Baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
&lt;br /&gt;
The error between the original and simulated is then mapped into the RGB color space using a deficiency-specific correction matrix, which adjusts the image to enhance contrast and recover lost color differences. The predefined correction matrix is applied to the error in RGB space, transforming it back into LMS space for final adjustments. The corrected LMS values are added back to the original values, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
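A minimal NumPy sketch of this pipeline for protanopia (helper names are ours; we assume the common variant in which the LMS error is converted back to RGB before the correction matrix is applied):&lt;br /&gt;

```python
import numpy as np

# RGB -> LMS transform (Stockman & Sharpe fundamentals, from the text).
T_RGB_TO_LMS = np.array([
    [0.3904725,  0.54990437, 0.00890159],
    [0.07092586, 0.96310739, 0.00135809],
    [0.02314268, 0.12801221, 0.93605194],
])

# Protanopia: replace the L response with a combination of M and S.
SIM_PROTAN = np.array([
    [0.0, 0.90822864, 0.008192],
    [0.0, 1.0,        0.0],
    [0.0, 0.0,        1.0],
])

# Daltonize-inspired error-redistribution matrix (from the text).
CORRECTION = np.array([
    [0.0, 0.0, 0.0],
    [0.7, 1.0, 0.0],
    [0.7, 0.0, 1.0],
])

def daltonize_protan(rgb):
    """Daltonization sketch for an (H, W, 3) float RGB image in [0, 1]:
    simulate protanopia in LMS, compute the lost information as an RGB
    error, and redistribute it into the visible channels."""
    lms = rgb @ T_RGB_TO_LMS.T
    lms_sim = lms @ SIM_PROTAN.T
    rgb_sim = lms_sim @ np.linalg.inv(T_RGB_TO_LMS).T
    error = rgb - rgb_sim                   # information lost to the viewer
    corrected = rgb + error @ CORRECTION.T  # shift error into M/S channels
    return np.clip(corrected, 0.0, 1.0)
```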
&lt;br /&gt;
==== Method 2: Optimizing Objective Function ====&lt;br /&gt;
To improve on the results of the Daltonization method, we designed a framework inspired by methods discussed in the background, incorporating dominant color extraction, optimization-based recoloring, and edit propagation. This approach aims to strike a balance between naturalness and contrast while compensating for colors that are not visible to the corresponding CVD types.&lt;br /&gt;
&lt;br /&gt;
===== 1. Extraction of Dominant Colors =====&lt;br /&gt;
We begin by extracting the dominant colors from the input image using fuzzy clustering via a K-means algorithm. This step identifies a reduced set of representative colors that capture the primary color information in the image:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{C} = \{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; represents the number of clusters, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i&amp;lt;/math&amp;gt; represents the centroid of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th cluster.&lt;br /&gt;
&lt;br /&gt;
===== 2. Optimization-Based Recoloring =====&lt;br /&gt;
Once the dominant colors are extracted, we apply an optimization process to adjust these colors. The optimization uses the formulas mentioned in [9], and aims to balance two key objectives:&lt;br /&gt;
&lt;br /&gt;
1. Naturalness Preservation: Ensures the recolored image minimally deviates from the original.&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{nat}} = \sum_{i=1}^N \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_i^{\text{original}}) \|^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is the transformation matrix based on the severity and type of CVD, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i^{\text{original}}&amp;lt;/math&amp;gt; is the original color.&lt;br /&gt;
&lt;br /&gt;
2. Contrast Enhancement: Improves the differentiation of colors for individuals with CVD:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{cont}} = \sum_{i=1}^N \sum_{j&amp;gt;i} \left( \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_j) \|^2 - \| \mathbf{c}_i^{\text{original}} - \mathbf{c}_j^{\text{original}} \|^2 \right)^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The total objective function combines these two terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + E_{\text{cont}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; controls the trade-off between naturalness and contrast.&lt;br /&gt;
&lt;br /&gt;
Optimization is performed using the L-BFGS-B algorithm to ensure efficient convergence under bounded constraints.&lt;br /&gt;
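The objective and the L-BFGS-B step can be sketched as follows (function and variable names are ours; the palette size and bounds are illustrative):&lt;br /&gt;

```python
import numpy as np
from scipy.optimize import minimize

def recolor_palette(palette, T, beta=0.5):
    """Optimize an (N, 3) dominant-color palette in [0, 1] so that
    E = beta * E_nat + E_cont is minimized under L-BFGS-B.
    T is the CVD simulation matrix for the target viewer."""
    orig = palette.copy()
    n = len(orig)
    # Pairwise distances of the original palette (normal vision).
    d_orig = np.array([np.linalg.norm(orig[i] - orig[j])
                       for i in range(n) for j in range(i + 1, n)])

    def energy(flat):
        c = flat.reshape(n, 3)
        e_nat = np.sum(((c - orig) @ T.T) ** 2)
        d_new = np.array([np.linalg.norm((c[i] - c[j]) @ T.T)
                          for i in range(n) for j in range(i + 1, n)])
        e_cont = np.sum((d_new ** 2 - d_orig ** 2) ** 2)
        return beta * e_nat + e_cont

    res = minimize(energy, orig.ravel(), method="L-BFGS-B",
                   bounds=[(0.0, 1.0)] * (n * 3))
    return res.x.reshape(n, 3)
```

With the identity matrix as &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; (a viewer with no deficiency), the original palette is already optimal, so the optimizer leaves it unchanged.&lt;br /&gt;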
&lt;br /&gt;
The transformation matrices for each type of CVD are the following, which are based on [12]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Protanopia}} = \begin{bmatrix} 0.566 &amp;amp; 0.558 &amp;amp; 0 \\ 0.433 &amp;amp; 0.442 &amp;amp; 0.242 \\ 0 &amp;amp; 0 &amp;amp; 0.758 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Deuteranopia}} = \begin{bmatrix} 0.625 &amp;amp; 0.7 &amp;amp; 0 \\ 0.375 &amp;amp; 0.3 &amp;amp; 0.3 \\ 0 &amp;amp; 0 &amp;amp; 0.7 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Tritanopia}} = \begin{bmatrix} 0.95 &amp;amp; 0 &amp;amp; 0 \\ 0.05 &amp;amp; 0.433 &amp;amp; 0 \\ 0 &amp;amp; 0.567 &amp;amp; 1 \end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== 3. Edit Propagation =====&lt;br /&gt;
After optimizing the dominant colors, we propagate these edits across the entire image to ensure smooth transitions. This propagation step leverages the CIE-Lab color space, which is perceptually uniform, meaning that the Euclidean distance in this space correlates well with human color perception. The process begins by mapping the original image and the optimized dominant colors into the Lab color space. In this space, the differences between the original and recolored dominant colors are computed to capture the adjustments made during the optimization step:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Delta L^* = \text{griddata}(\mathbf{c}^{\text{original}}, \mathbf{c}^{\text{recolored}} - \mathbf{c}^{\text{original}}, \mathbf{I}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in the Lab color space. Once the interpolated changes are computed, they are applied to the Lab representation of the original image. Finally, the adjusted Lab values are converted back to the RGB color space to reconstruct the recolored image.&lt;br /&gt;
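The interpolation step can be sketched with SciPy's &amp;lt;code&amp;gt;griddata&amp;lt;/code&amp;gt; (helper names are ours; we shift only the chromatic a*, b* channels and fall back to nearest-neighbour outside the palette's convex hull):&lt;br /&gt;

```python
import numpy as np
from scipy.interpolate import griddata

def propagate_edits(pixels_lab, palette_orig, palette_rec):
    """Interpolate the per-palette color change (recolored - original)
    onto every pixel of an (N, 3) Lab pixel array."""
    delta = palette_rec - palette_orig
    out = pixels_lab.copy()
    for k in (1, 2):  # a*, b* channels
        shift = griddata(palette_orig[:, 1:], delta[:, k],
                         pixels_lab[:, 1:], method="linear")
        nearest = griddata(palette_orig[:, 1:], delta[:, k],
                           pixels_lab[:, 1:], method="nearest")
        # Pixels outside the convex hull get NaN from linear interpolation.
        out[:, k] += np.where(np.isnan(shift), nearest, shift)
    return out
```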
&lt;br /&gt;
==== Method 3: Improved with Confusion Line Adjustments ====&lt;br /&gt;
This method builds upon the previous method by introducing enhancements in dominant color extraction, optimization, and edit propagation, while incorporating an additional step to adjust colors near confusion lines in the CIE 1931 xyY color space inspired by [10]. These improvements aim to further enhance contrast and naturalness of the recolored images. Moreover, this method adds flexibility in adjusting for different severity levels for each CVD type.&lt;br /&gt;
&lt;br /&gt;
===== 1. Improvements on Method 2 =====&lt;br /&gt;
To improve the performance of dominant color extraction, we transitioned from traditional K-means to MiniBatch K-means. This algorithm processes data in small batches, significantly reducing computational time while maintaining accuracy in clustering. The number of dominant colors was also reduced from 50 to 30 to focus on key representative colors and further enhance efficiency. The optimization objective is refined to leverage vectorization, improving computational efficiency. The two key terms remain:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + (1 - \beta) E_{\text{cont}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
The optimization objective was refined to significantly improve computational efficiency by replacing the nested loops in the contrast enhancement term with vectorized operations. In the original implementation, the pairwise differences between colors were calculated iteratively using &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;O(N^2)&amp;lt;/math&amp;gt; nested loops. The improved version eliminates this overhead by leveraging array broadcasting to compute all pairwise differences simultaneously, and the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is then applied to all pairwise differences in a single tensor operation:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T}_{\Delta} = \text{tensordot}(\Delta_{ij}, \mathbf{T}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the norms are computed in parallel across the entire array. Additionally, the weighting parameter &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; was adjusted to favor naturalness preservation, ensuring better visual integrity in the recolored image. &lt;br /&gt;
The propagation step was changed to use a k-d tree for fast nearest-neighbor searches, replacing grid-based interpolation. This approach more efficiently matches each pixel in the Lab color space to the closest dominant color:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{I}_{\text{adjusted}} = \mathbf{C}_{\text{recolored}}[\text{k-d tree query}(\mathbf{I})],&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in Lab space.&lt;br /&gt;
These refinements enable faster optimization while improving the balance between naturalness and contrast enhancement.&lt;br /&gt;
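The k-d tree propagation above amounts to a few lines with SciPy (a sketch, with names of our choosing):&lt;br /&gt;

```python
import numpy as np
from scipy.spatial import cKDTree

def propagate_kdtree(pixels_lab, palette_orig, palette_rec):
    """Each pixel takes the color change of its nearest dominant color,
    found with a k-d tree instead of grid interpolation."""
    tree = cKDTree(palette_orig)
    _, idx = tree.query(pixels_lab)  # nearest palette entry per pixel
    return pixels_lab + (palette_rec - palette_orig)[idx]
```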
&lt;br /&gt;
===== 2. Confusion Line Adjustments =====&lt;br /&gt;
An additional step adjusts colors near confusion lines in the CIE 1931 xyY color space to enhance distinguishability:&lt;br /&gt;
&lt;br /&gt;
1. Confusion lines are defined for protanopia, deuteranopia, and tritanopia, based on [10]. For example, for protanopia:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{Confusion Line: Start} = (0.735, 0.265), \quad \text{End} = (0.115, 0.885).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. Colors near the confusion line are identified using orthogonal distance:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(\mathbf{xy}, L) = \frac{\| (\mathbf{xy} - \mathbf{p}_1) \times (\mathbf{p}_2 - \mathbf{p}_1) \|}{\|\mathbf{p}_2 - \mathbf{p}_1\|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_1&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_2&amp;lt;/math&amp;gt; are the start and end points of the confusion line.&lt;br /&gt;
&lt;br /&gt;
3. Identified colors are shifted orthogonally away from the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{xy}_{\text{adjusted}} = \mathbf{xy} + \lambda \mathbf{v}_{\perp},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{v}_{\perp}&amp;lt;/math&amp;gt; is a perpendicular vector, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a scaling factor.&lt;br /&gt;
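Steps 2 and 3 together can be sketched as follows (the protanopia endpoints come from the text; function names and the step size are ours):&lt;br /&gt;

```python
import numpy as np

# Protanopia confusion-line endpoints in CIE xy chromaticity (from the text).
P1 = np.array([0.735, 0.265])
P2 = np.array([0.115, 0.885])

def line_distance(xy, p1=P1, p2=P2):
    """Orthogonal distance from chromaticity xy to the line through p1, p2
    (2D cross product divided by the segment length)."""
    d = p2 - p1
    cross = (xy[0] - p1[0]) * d[1] - (xy[1] - p1[1]) * d[0]
    return abs(cross) / np.linalg.norm(d)

def shift_away(xy, lam, p1=P1, p2=P2):
    """Move xy orthogonally off the confusion line by a step of lam."""
    d = p2 - p1
    v_perp = np.array([-d[1], d[0]]) / np.linalg.norm(d)
    return xy + lam * v_perp
```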
&lt;br /&gt;
===== 3. Personalise with Severity Levels =====&lt;br /&gt;
To take severity levels into account, the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; linearly interpolates between normal vision and full CVD perception based on severity and type:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T} = (1 - s) \mathbf{I} + s \mathbf{T}_{\text{CVD}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;s&amp;lt;/math&amp;gt; represents the severity of CVD (0-100%), &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; is the identity matrix, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}_{\text{CVD}}&amp;lt;/math&amp;gt; is the full transformation matrix specific to protanopia, deuteranopia, or tritanopia. Such a method is based on DaltonLens simulator [13].&lt;br /&gt;
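The interpolation is a one-liner; a sketch using the protanopia matrix given earlier:&lt;br /&gt;

```python
import numpy as np

# Full protanopia transformation matrix (from the text).
T_PROTAN = np.array([[0.566, 0.558, 0.0],
                     [0.433, 0.442, 0.242],
                     [0.0,   0.0,   0.758]])

def severity_matrix(T_cvd, s):
    """T = (1 - s) * I + s * T_cvd, with severity s in [0, 1]."""
    return (1.0 - s) * np.eye(3) + s * np.asarray(T_cvd)
```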
&lt;br /&gt;
These improvements significantly enhanced both the effectiveness and efficiency of the recoloring process on top of Method 2.&lt;br /&gt;
&lt;br /&gt;
==== Method 4: Improved with GMM-based Method ====&lt;br /&gt;
The last mathematical method we experimented with enhances recoloring by integrating a Gaussian Mixture Model (GMM)-based global recoloring algorithm. The method also applies nonlinear adjustments to colors near confusion lines to ensure improved contrast and naturalness.&lt;br /&gt;
&lt;br /&gt;
===== 1. GMM-Based Global Recoloring =====&lt;br /&gt;
The image is first resized and transformed into the Lab color space. A GMM is applied to cluster the color distribution into &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; components, optimizing the number of clusters using the Bayesian Information Criterion (BIC):&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{BIC} = -2 \cdot \text{log-likelihood} + P \cdot \log(N),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;P&amp;lt;/math&amp;gt; represents the model parameters and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; is the number of pixels.&lt;br /&gt;
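BIC-based model selection can be sketched with scikit-learn (function name and the range of candidate cluster counts are our choices):&lt;br /&gt;

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm_bic(lab_pixels, k_range=range(2, 7), seed=0):
    """Fit GMMs with different component counts to an (N, 3) array of
    Lab pixels and keep the model with the lowest BIC."""
    best, best_bic = None, np.inf
    for k in k_range:
        gmm = GaussianMixture(n_components=k, random_state=seed)
        gmm.fit(lab_pixels)
        bic = gmm.bic(lab_pixels)
        if bic < best_bic:
            best, best_bic = gmm, bic
    return best
```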
&lt;br /&gt;
The GMM means are simulated using the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T&amp;lt;/math&amp;gt; with severity levels taken into account, and the symmetric Kullback-Leibler (KL) divergence (&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{\text{sKL}}&amp;lt;/math&amp;gt;) is calculated between pairs of clusters:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{\text{sKL}}(i, j) = D_{\text{KL}}(G_i \| G_j) + D_{\text{KL}}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt; are Gaussian components, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{\text{KL}}&amp;lt;/math&amp;gt; represents the KL divergence. The GMM cluster means are then adjusted by solving a nonlinear least-squares problem to minimize the discrepancy.&lt;br /&gt;
&lt;br /&gt;
===== 2. Adjusting Near Confusion Lines Improved =====&lt;br /&gt;
Following global recoloring, colors near confusion lines in the CIE 1931 xyY color space are further adjusted based on formulas used in Method 3. Nonlinear scaling is applied to amplify the shifts for pixels closer to the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
w = \left( \frac{\text{threshold} - d}{\text{threshold}} \right)^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w&amp;lt;/math&amp;gt; is the scaling factor. &lt;br /&gt;
&lt;br /&gt;
The adjustments from the GMM and confusion line steps are combined to produce the final recolored image. These enhancements make the method more robust and effective for individuals with varying levels of CVD.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through our experimentation with mathematical methods, we gained a deeper understanding of the algorithmic aspects of image recoloring for CVD, particularly in balancing naturalness, contrast, and personalization. Building on these insights, we transitioned to exploring deep learning approaches, applying the lessons learned to guide training, evaluation, and ground truth dataset generation.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in later sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and able to integrate the user label, they require a pre-existing ground-truth example of the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information during training. They are generally better at generating natural-looking images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We first gathered an open-source RGB image dataset from [2]. To improve their model&#039;s ability to enhance the contrast between CVD-indistinguishable color pairs, the authors of that study created a new unlabeled dataset of 141,000 pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for supervised methods), we randomly sampled 15,000 images and recolored them by simulating random labels for severity and type of CVD. The recoloring for ground-truth images was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org Scipy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
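The label encoding is simple enough to express directly (the helper below is illustrative, not from our codebase):&lt;br /&gt;

```python
def cvd_label(cvd_type, severity):
    """Encode the conditioning label as severity * [protan, deutan],
    matching the dataset description above."""
    base = {"protan": [1.0, 0.0], "deutan": [0.0, 1.0]}[cvd_type]
    return [severity * v for v in base]
```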
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and is passed to 3 parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. The outputs of these networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: the recolored (model output) image and the ground truth recolored image respectively&lt;br /&gt;
* p: pixel index&lt;br /&gt;
* N: total number of pixels&lt;br /&gt;
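A PyTorch sketch of this architecture (hidden sizes are illustrative choices, not from the paper):&lt;br /&gt;

```python
import torch
import torch.nn as nn

class ChannelMLP(nn.Module):
    """One per-pixel MLP predicting a single output channel from the
    3 RGB values plus the 2-dim CVD label (in_dim = 5)."""
    def __init__(self, in_dim=5, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, x):
        return self.net(x)

class ParallelRGBMLP(nn.Module):
    """Conditional parallel-MLP sketch: the label is broadcast to every
    pixel, concatenated with RGB, and fed to three independent heads."""
    def __init__(self):
        super().__init__()
        self.heads = nn.ModuleList(ChannelMLP() for _ in range(3))

    def forward(self, rgb, label):
        # rgb: (B, H, W, 3); label: (B, 2)
        b, h, w, _ = rgb.shape
        lab = label[:, None, None, :].expand(b, h, w, 2)
        x = torch.cat([rgb, lab], dim=-1)
        return torch.cat([head(x) for head in self.heads], dim=-1)
```

Training then minimizes the MSE above, e.g. &amp;lt;code&amp;gt;loss = torch.mean((model(rgb, label) - target) ** 2)&amp;lt;/code&amp;gt;.&lt;br /&gt;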
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
In a similar fashion of inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: the recolored (model output) and ground truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: the &amp;lt;math&amp;gt;l&amp;lt;/math&amp;gt;-th feature map of the pre-trained VGG network, with &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; the number of elements in that feature map&lt;br /&gt;
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the choice of loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: Two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: Size of the global (or large) window over the image&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: Size of the local window or neighborhood around pixel x&lt;br /&gt;
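A literal NumPy reading of these contrast losses and the weighted total (function names are ours; the actual implementation in [2] may clip or take absolute values of the CL term):&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

def contrast_diff(sim_rec, sim_orig, x, y):
    """CL(x, y): change in color distance between pixels x and y, measured
    on the CVD-simulated recolored vs. CVD-simulated original image."""
    d_rec = np.linalg.norm(sim_rec[x] - sim_rec[y])
    d_orig = np.linalg.norm(sim_orig[x] - sim_orig[y])
    return d_rec - d_orig

def global_contrast_loss(sim_rec, sim_orig, n_pairs=256):
    """Average CL over random pixel pairs drawn from the whole image (omega)."""
    h, w, _ = sim_orig.shape
    total = 0.0
    for _ in range(n_pairs):
        x = (int(rng.integers(h)), int(rng.integers(w)))
        y = (int(rng.integers(h)), int(rng.integers(w)))
        total += contrast_diff(sim_rec, sim_orig, x, y)
    return total / n_pairs

def local_contrast_loss(sim_rec, sim_orig, radius=1):
    """Average CL over each pixel's neighborhood omega_x."""
    h, w, _ = sim_orig.shape
    total, count = 0.0, 0
    for i in range(h):
        for j in range(w):
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    if (di, dj) != (0, 0) and ni in range(h) and nj in range(w):
                        total += contrast_diff(sim_rec, sim_orig, (i, j), (ni, nj))
                        count += 1
    return total / count

def total_recolor_loss(l_nat, l_global, l_local, alpha=0.25, beta=1.0):
    """L_total = alpha * L_nat + 2 * (1 - alpha) * L_contrast,
    with L_contrast = beta * L_global + (2 - beta) * L_local."""
    l_contrast = beta * l_global + (2 - beta) * l_local
    return alpha * l_nat + 2 * (1 - alpha) * l_contrast
```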
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural color distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{natural}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored images respectively&lt;br /&gt;
&lt;br /&gt;
== Results == &lt;br /&gt;
&lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The qualitative results and key observations from the experiments are summarized below. &lt;br /&gt;
&lt;br /&gt;
The result images presented in Figures 10 through 13 follow this sequence: the original image, the CVD-simulated version of the original image, the recolored image, and the CVD-simulated version of the recolored image. The CVD-simulated images demonstrate how the images are perceived by individuals with the corresponding type of CVD. The examples provided focus on protanopia (first row) and deuteranopia (second row) due to space constraints. Additional results for tritanopia and recolored images at varying severity levels are included in the appendix.&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Method 1: Daltonization Baseline&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method1.png|400px|thumb|right|Figure 10: Method 1 Results]]&lt;br /&gt;
&lt;br /&gt;
The Daltonization method provides a foundational approach for recoloring images to enhance visibility for individuals with CVD. Key takeaways from Figure 10 include:&lt;br /&gt;
&lt;br /&gt;
* The method demonstrates significant improvements for protanopia, as seen in the first row, where the recolored images show clear color differences and high contrast. However, for deuteranopia, as shown in the second row, the recolored images exhibit less visible improvements, with lower contrast. This inconsistency highlights the method&#039;s limited ability to generalize across different types of CVD.&lt;br /&gt;
* The method does not account for severity levels or individual differences in CVD perception, which presents an opportunity for further improvement.&lt;br /&gt;
* While the recolored images achieve high contrast between confusing colors, the overall perception of the original image is not preserved. This reduction in naturalness may impact the aesthetic quality and recognizability of the image.&lt;br /&gt;
* Performance: this method is the fastest among the methods tested, as it relies solely on matrix transformations. This makes it computationally efficient and suitable for real-time applications.&lt;br /&gt;
&lt;br /&gt;
The Daltonization method provides a baseline for recoloring but requires enhancements in flexibility, contrast optimization across CVD types, and personalization for varying severity levels.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Method 2: Optimizing Objective Functions&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method2.png|400px|thumb|right|Figure 11 Method 2 Results]]&lt;br /&gt;
* While this method aims to balance naturalness and contrast, the resulting recolored images are similar to the original ones. A possible reason for this is the sensitivity of the loss function to the beta parameter, which requires careful tuning.&lt;br /&gt;
* The recolored images exhibit some loss of fine details, likely due to the use of the k-means clustering algorithm, which simplifies color representation across the image.&lt;br /&gt;
* This algorithm has a very slow runtime, taking over one minute per image. The primary bottlenecks are the color clustering step and the optimization of the objective function, which can be improved significantly.&lt;br /&gt;
* Despite its limitations, this method introduces a flexible framework for customizing loss functions, enabling further improvements. This flexibility was leveraged to refine the method in subsequent methods.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Method 3: Adjustments Near Confusion Lines with Improved Method 2&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method3.png|400px|thumb|right|Figure 12 Method 3 Results]]&lt;br /&gt;
* This method produces recolored images with reasonable contrasts between confusing colors while preserving the naturalness of the image well. It can also account for varying severity levels for each CVD type, providing more personalized recoloring.&lt;br /&gt;
* The performance of the algorithm was improved significantly, reducing from over one minute to approximately 4 seconds per image.&lt;br /&gt;
* Results with color plates, which are commonly used for diagnosing color vision deficiencies, are included in the appendix. This method shows good results, with numbers becoming more easily visible in the CVD-simulated recolored images.&lt;br /&gt;
* Some limitations include the fact that this method sometimes lacks sufficient contrast, particularly for the deuteranopia type. It is also sensitive to parameters, such as the shift factor for colors near the confusion lines, which requires careful tuning. &lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Method 4: Improved with GMM-based Method&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method4.png|400px|thumb|right|Figure 13 Method 4 Results]]&lt;br /&gt;
* This method creates recolored images with very high contrast, making the colors in the images easily distinguishable, even for individuals with severe CVD.&lt;br /&gt;
* By using GMM-based clustering instead of k-means, this method preserves most of the image details. The more sophisticated clustering allows for a better representation of the original color distribution, reducing the loss of fine details.&lt;br /&gt;
* The runtime for this method is significantly faster than most others, taking only around 1 second per image. This makes it highly practical for real-time applications.&lt;br /&gt;
* While the method performs well in enhancing contrast, some recolored images lose the naturalness of the original images. Additionally, certain colors in the recolored images do not transition smoothly, which might be attributed to the clustering step in the process.&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Below are quantitative results for six metrics, plus runtime, for each method:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
! Original vs Recolored !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
| SSIM || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| TCC || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| CD ΔE76 || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| CIEDE2000 || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| CIEDE94 || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| D-CIELAB ΔEab || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* SSIM: Measures structural similarity between images, combining luminance, contrast, and structure components. Computed using `torchmetrics.StructuralSimilarityIndexMeasure`.&lt;br /&gt;
&lt;br /&gt;
* TCC: Evaluates changes in total color contrast by comparing random pixel pairs in each image and calculating the difference in their color distances. &lt;br /&gt;
&lt;br /&gt;
* D-CIELAB ΔEab [14]: Quantifies perceptual color differences for dichromats under specific CVD types. &lt;br /&gt;
&lt;br /&gt;
* CD ΔE76, CIEDE2000, CIEDE94: Standard perceptual color difference metrics, computed with scikit-image package. ΔE76 is basic Euclidean distance in Lab space, while CIEDE2000 and CIEDE94 include perceptual corrections.&lt;br /&gt;
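Two of these color-difference formulas are simple enough to sketch directly on CIELAB values (CIEDE2000 is considerably more involved and is best computed with scikit-image's deltaE_ciede2000). This illustrative NumPy version assumes inputs are already in Lab space and uses the graphic-arts constants kL = kC = kH = 1 for ΔE94:&lt;br /&gt;

```python
import numpy as np

def delta_e_76(lab1, lab2):
    """CIE76: plain Euclidean distance in Lab space, per pixel."""
    return np.linalg.norm(np.asarray(lab1, float) - np.asarray(lab2, float), axis=-1)

def delta_e_94(lab1, lab2):
    """CIE94 (graphic-arts weights kL = kC = kH = 1), per pixel."""
    lab1 = np.asarray(lab1, float)
    lab2 = np.asarray(lab2, float)
    dL = lab1[..., 0] - lab2[..., 0]
    da = lab1[..., 1] - lab2[..., 1]
    db = lab1[..., 2] - lab2[..., 2]
    c1 = np.hypot(lab1[..., 1], lab1[..., 2])   # chroma of reference color
    c2 = np.hypot(lab2[..., 1], lab2[..., 2])
    dC = c1 - c2
    dH2 = np.maximum(da ** 2 + db ** 2 - dC ** 2, 0.0)  # hue difference squared
    sC = 1.0 + 0.045 * c1
    sH = 1.0 + 0.015 * c1
    return np.sqrt(dL ** 2 + (dC / sC) ** 2 + dH2 / sH ** 2)

lab_a = np.array([[50.0, 0.0, 0.0]])
lab_b = np.array([[53.0, 4.0, 0.0]])
# Both formulas give 5.0 here because the reference chroma C1 is zero
```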
&lt;br /&gt;
Overall, Method 4 stands out as the best-performing approach, delivering high contrast, preserving image details through GMM-based clustering, and achieving a fast runtime, while addressing many limitations of the earlier methods.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: Conditional Parallel RGB MLP, Conditional U-Net, and Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), Total Color Contrast (TCC), Chromatic Difference (CD), and inference time were used to assess the effectiveness of the models provided in [1] and [2].&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the outputs resemble the ground truth recolored images; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored images show compared to the originals.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The outputs looked discretized at the pixel level, suggesting that the disentangled per-channel design was not very useful for this task (especially for naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than those of the originals, indicating that the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was observed in the recolored images, possibly because the naturalness term dominated even though the weight coefficients in the loss favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the qualitative results above, we computed evaluation metrics for comparison with related work using only the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; we use the same definitions as those papers. At a high level, the components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(&amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a constant, not a wavelength; l, a, b are the CIELAB coordinates of the recolored (&#039;) and original images respectively.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
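For illustration, SSIM (in its single-window, global-statistics form; practical implementations such as torchmetrics average it over sliding windows) and CD can be sketched as follows. The function names and the value of lam are our illustrative choices, and TCC is omitted because the pixel sets Omega_1 and Omega_2 are defined in [2]:&lt;br /&gt;

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed once from global image statistics (windowed
    implementations average this quantity over local patches)."""
    x = np.asarray(x, float).ravel()
    y = np.asarray(y, float).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den

def chromatic_difference(lab_rec, lab_orig, lam=0.1):
    """CD(i) = sqrt(lam * dl^2 + da^2 + db^2) per pixel, on CIELAB images.
    lam is an illustrative constant that down-weights lightness changes."""
    d = np.asarray(lab_rec, float) - np.asarray(lab_orig, float)
    return np.sqrt(lam * d[..., 0] ** 2 + d[..., 1] ** 2 + d[..., 2] ** 2)
```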
&lt;br /&gt;
The key results are in Table 2 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:40%; margin:auto;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 (vs. ~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 (vs. ~0.963)*&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2], taking the larger of the protan/deutan values.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but fall short of paper [2], which optimizes a separate network for each CVD type.&lt;br /&gt;
* Outputs are blurry (SSIM is not optimized strongly enough).&lt;br /&gt;
* Mixing CVD types in the same network needs a more sophisticated approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate (a few seconds per image), making them feasible for interactive rather than strictly real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement using local and global losses was effective in improving visibility for CVD individuals. Loss functions adapted from the Swin-transformer-based approach of [2] added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. JOSA A, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
[12] Color-Blindness.com. (n.d.). COBLIS - Color Blindness Simulator. Retrieved December 13, 2024, from https://www.color-blindness.com/coblis-color-blindness-simulator/&lt;br /&gt;
&lt;br /&gt;
[13] DaltonLens. (n.d.). DaltonLens-Python [Computer software]. GitHub. Retrieved December 13, 2024, from https://github.com/DaltonLens/DaltonLens-Python&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400px|Wikipedia encyclopedia]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_2.png|400px]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|350px|thumb|Conditional Autoencoder]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_unet.png|350px|thumb|Conditional U-Net]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_mlp.png|350px|thumb|Conditional MLP]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both; text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
Losses: Conditional Autoencoder, Conditional U-Net, and Conditional MLP&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Mathematical method results with color plates&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:Method1-color-plates.png|400px|thumb|Method 1 Color Plates Results]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:Method2-color-plates.png|400px|thumb|Method 2 Color Plates Results]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;gallery mode=&amp;quot;nolines&amp;quot; widths=&amp;quot;400px&amp;quot; heights=&amp;quot;300px&amp;quot; caption=&amp;quot;Method 3 Color Plates Results for Protanopia, Deuteranopia, and Tritanopia with Severity Levels&amp;quot;&amp;gt;&lt;br /&gt;
File:Method3-protan.png|Protanopia&lt;br /&gt;
File:Method3-deutan.png|Deuteranopia&lt;br /&gt;
File:Method3-tritan.png|Tritanopia&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;gallery mode=&amp;quot;nolines&amp;quot; widths=&amp;quot;400px&amp;quot; heights=&amp;quot;300px&amp;quot; caption=&amp;quot;Method 4 Color Plates Results for Protanopia, Deuteranopia, and Tritanopia with Severity Levels&amp;quot;&amp;gt;&lt;br /&gt;
File:Method4-protan.png|Protanopia&lt;br /&gt;
File:Method4-deutan.png|Deuteranopia&lt;br /&gt;
File:Method4-tritan.png|Tritanopia&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;br /&gt;
* Researching, writing and running scripts for four (and more) mathematical-based methods (Daltonization, Optimization-based, Confusion lines based, GMM based and some other experiments such as a segmentation-based method which was discarded due to slow performance)&lt;br /&gt;
* Results generation and validation for all scripts written&lt;br /&gt;
* Evaluation metrics scripts for mathematical methods&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60868</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60868"/>
		<updated>2024-12-13T13:26:34Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* Methods */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected error values are then added back to the original RGB values to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
&lt;br /&gt;
To extend this formulation to different deficiency types and severity levels, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
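&lt;br /&gt;
To make the formula concrete, it can be evaluated directly as follows (a minimal sketch; the function name is ours, and the example coordinates are illustrative rather than taken from [10]):&lt;br /&gt;

```python
import numpy as np

def distance_to_confusion_line(v, cp, p0):
    """Perpendicular distance from chromaticity v = (x_v, y_v) to the
    confusion line through the copunctal point cp and reference point p0."""
    (xv, yv), (xcp, ycp), (x0, y0) = v, cp, p0
    numerator = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    denominator = np.hypot(xcp - x0, ycp - y0)
    return numerator / denominator
```

A point lying on the confusion line evaluates to zero distance, which is how key colors are assigned to occupied lines.&lt;br /&gt;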
&lt;br /&gt;
Confusing colors, i.e., key colors lying on the same (occupied) confusion line, are iteratively shifted to the nearest unoccupied confusion lines, with high-ranking colors (those most prominent in the image clusters) reassigned first. This reallocation ensures that these colors become distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIE L*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
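&lt;br /&gt;
As a sketch of Step 3, the symmetric KL divergence between two Gaussian components can be evaluated with the closed-form expression for multivariate normals (the function names are ours; covariance matrices are assumed to be full rank):&lt;br /&gt;

```python
import numpy as np

def kl_gaussian(mu0, S0, mu1, S1):
    """Closed-form D_KL(N(mu0, S0) || N(mu1, S1)) for multivariate normals."""
    k = mu0.shape[0]
    S1_inv = np.linalg.inv(S1)
    d = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + d @ S1_inv @ d - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def sym_kl(mu0, S0, mu1, S1):
    """Symmetric KL divergence used to compare two Gaussian color clusters."""
    return kl_gaussian(mu0, S0, mu1, S1) + kl_gaussian(mu1, S1, mu0, S0)
```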
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions and can leave some recolored images looking unnatural.&lt;br /&gt;
* The high computational cost of the optimization step makes the algorithm difficult to use in real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and BicycleGAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This particular method also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, the architecture is computationally intensive and does not inherently account for the specific needs of individuals with CVD, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations with a triple-latent structure, the method supports continuous personalization, adjusting images to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications: in our experiment, one epoch (a single pass over the entire dataset) took around 18 days.&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD, personalized to different severity levels. We started by exploring existing methods and identifying opportunities for improvement. Since mathematical approaches provide a solid, well-documented foundation, we began our experiments by testing the methods described in the background, and later extended our exploration to deep learning based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a Baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
&lt;br /&gt;
The error between the original and simulated images is then mapped back into RGB space, where a deficiency-specific correction matrix redistributes the lost information to channels the viewer can perceive. The corrected error is added back to the original RGB values, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
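&lt;br /&gt;
Putting Method 1 together, a minimal NumPy sketch for protanopia might look as follows, using the matrices above (array and function names are ours; the image is assumed to be float RGB in [0, 1]):&lt;br /&gt;

```python
import numpy as np

# RGB-to-LMS matrix and protan simulation coefficients from the text
RGB2LMS = np.array([[0.3904725,  0.54990437, 0.00890159],
                    [0.07092586, 0.96310739, 0.00135809],
                    [0.02314268, 0.12801221, 0.93605194]])
LMS2RGB = np.linalg.inv(RGB2LMS)

# Protanopia: the L response is replaced by a weighted sum of M and S
SIM_PROTAN = np.array([[0.0, 0.90822864, 0.008192],
                       [0.0, 1.0,        0.0],
                       [0.0, 0.0,        1.0]])

CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize_protan(rgb):
    """Daltonize an (H, W, 3) float RGB image for protanopia (sketch)."""
    lms = rgb @ RGB2LMS.T                   # move into cone space
    lms_sim = lms @ SIM_PROTAN.T            # simulate the deficiency
    rgb_sim = lms_sim @ LMS2RGB.T           # back to RGB
    error = rgb - rgb_sim                   # information the viewer loses
    compensated = rgb + error @ CORRECTION.T
    return np.clip(compensated, 0.0, 1.0)
```

Note that neutral (gray) pixels produce almost no error, so they pass through nearly unchanged, which is the intended behavior.&lt;br /&gt;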
&lt;br /&gt;
==== Method 2: Optimizing Objective Function ====&lt;br /&gt;
To improve the results from the Daltonization method, we designed a framework inspired by methods discussed in the background, incorporating dominant color extraction, optimization-based recoloring, and edit propagation. This approach aims to balance naturalness and contrast while compensating for colors that are not visible to viewers with the corresponding CVD type.&lt;br /&gt;
&lt;br /&gt;
===== 1. Extraction of Dominant Colors =====&lt;br /&gt;
We begin by extracting the dominant colors from the input image using K-means clustering. This step identifies a reduced set of representative colors that capture the primary color information in the image:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{C} = \{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; represents the number of clusters, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i&amp;lt;/math&amp;gt; represents the centroid of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th cluster.&lt;br /&gt;
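&lt;br /&gt;
This extraction step can be sketched with SciPy&#039;s k-means implementation (the helper name, cluster count, and seed are ours):&lt;br /&gt;

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def dominant_colors(image, n_colors=8, seed=0):
    """Cluster all pixels in color space and return the n_colors cluster
    centroids, which serve as the dominant colors C = {c_1, ..., c_N}."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    centroids, _labels = kmeans2(pixels, n_colors, minit='++', seed=seed)
    return centroids
```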
&lt;br /&gt;
===== 2. Optimization-Based Recoloring =====&lt;br /&gt;
Once the dominant colors are extracted, we apply an optimization process to adjust these colors. The optimization uses the formulas mentioned in [9], and aims to balance two key objectives:&lt;br /&gt;
&lt;br /&gt;
1. Naturalness Preservation: Ensures the recolored image minimally deviates from the original.&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{nat}} = \sum_{i=1}^N \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_i^{\text{original}}) \|^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is the transformation matrix based on the severity and type of CVD, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i^{\text{original}}&amp;lt;/math&amp;gt; is the original color.&lt;br /&gt;
&lt;br /&gt;
2. Contrast Enhancement: Improves the differentiation of colors for individuals with CVD:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{cont}} = \sum_{i=1}^N \sum_{j&amp;gt;i} \left( \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_j) \|^2 - \| \mathbf{c}_i^{\text{original}} - \mathbf{c}_j^{\text{original}} \|^2 \right)^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The total objective function combines these two terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + E_{\text{cont}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; controls the trade-off between naturalness and contrast.&lt;br /&gt;
&lt;br /&gt;
Optimization is performed using the L-BFGS-B algorithm to ensure efficient convergence under bounded constraints.&lt;br /&gt;
&lt;br /&gt;
The transformation matrices for each type of CVD are the following, which are based on [12]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Protanopia}} = \begin{bmatrix} 0.566 &amp;amp; 0.558 &amp;amp; 0 \\ 0.433 &amp;amp; 0.442 &amp;amp; 0.242 \\ 0 &amp;amp; 0 &amp;amp; 0.758 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Deuteranopia}} = \begin{bmatrix} 0.625 &amp;amp; 0.7 &amp;amp; 0 \\ 0.375 &amp;amp; 0.3 &amp;amp; 0.3 \\ 0 &amp;amp; 0 &amp;amp; 0.7 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Tritanopia}} = \begin{bmatrix} 0.95 &amp;amp; 0 &amp;amp; 0 \\ 0.05 &amp;amp; 0.433 &amp;amp; 0 \\ 0 &amp;amp; 0.567 &amp;amp; 1 \end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
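&lt;br /&gt;
The optimization above can be sketched with SciPy&#039;s L-BFGS-B solver (a minimal example with three hypothetical dominant colors and the protanopia matrix; the variable names and the weight value are ours):&lt;br /&gt;

```python
import numpy as np
from scipy.optimize import minimize

# Protanopia transformation matrix from the text
T = np.array([[0.566, 0.558, 0.0],
              [0.433, 0.442, 0.242],
              [0.0,   0.0,   0.758]])

def total_energy(flat, originals, T, beta):
    """E = beta * E_nat + E_cont over the set of dominant colors."""
    c = flat.reshape(originals.shape)
    e_nat = np.sum(np.linalg.norm((c - originals) @ T.T, axis=1) ** 2)
    i, j = np.triu_indices(len(c), k=1)          # all pairs with i < j
    d_new = np.linalg.norm((c[i] - c[j]) @ T.T, axis=1) ** 2
    d_old = np.linalg.norm(originals[i] - originals[j], axis=1) ** 2
    e_cont = np.sum((d_new - d_old) ** 2)
    return beta * e_nat + e_cont

originals = np.array([[0.9, 0.2, 0.2],           # three toy dominant colors
                      [0.2, 0.8, 0.2],
                      [0.2, 0.2, 0.9]])
res = minimize(total_energy, originals.ravel(),
               args=(originals, T, 10.0), method='L-BFGS-B',
               bounds=[(0.0, 1.0)] * originals.size)
recolored = res.x.reshape(originals.shape)
```

The bounds keep the recolored dominant colors inside the valid RGB range, and the solver is guaranteed not to return a worse energy than the starting point.&lt;br /&gt;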
&lt;br /&gt;
===== 3. Edit Propagation =====&lt;br /&gt;
After optimizing the dominant colors, we propagate these edits across the entire image to ensure smooth transitions. This propagation step leverages the CIE-Lab color space, which is perceptually uniform, meaning that the Euclidean distance in this space correlates well with human color perception. The process begins by mapping the original image and the optimized dominant colors into the Lab color space. In this space, the differences between the original and recolored dominant colors are computed to capture the adjustments made during the optimization step:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Delta L^* = \text{griddata}(\mathbf{c}^{\text{original}}, \mathbf{c}^{\text{recolored}} - \mathbf{c}^{\text{original}}, \mathbf{I}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in the Lab color space. Once the interpolated changes are computed, they are applied to the Lab representation of the original image. Finally, the adjusted Lab values are converted back to the RGB color space to reconstruct the recolored image.&lt;br /&gt;
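&lt;br /&gt;
The propagation step can be sketched as follows (a toy example; we use nearest-neighbour interpolation so that pixels outside the convex hull of the dominant colors still receive a shift, and all names are ours):&lt;br /&gt;

```python
import numpy as np
from scipy.interpolate import griddata

def propagate_edits(lab_image, originals_lab, recolored_lab):
    """Spread the per-dominant-color shifts over every pixel (in Lab).

    Each Lab channel of the shift field is interpolated separately, then
    the interpolated deltas are added back to the original pixels."""
    shifts = recolored_lab - originals_lab
    pixels = lab_image.reshape(-1, 3)
    delta = np.stack([griddata(originals_lab, shifts[:, k], pixels,
                               method='nearest') for k in range(3)], axis=1)
    return (pixels + delta).reshape(lab_image.shape)
```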
&lt;br /&gt;
==== Method 3: Improved with Confusion Line Adjustments ====&lt;br /&gt;
This method builds upon the previous method by introducing enhancements in dominant color extraction, optimization, and edit propagation, while incorporating an additional step to adjust colors near confusion lines in the CIE 1931 xyY color space inspired by [10]. These improvements aim to further enhance contrast and naturalness of the recolored images. Moreover, this method adds flexibility in adjusting for different severity levels for each CVD type.&lt;br /&gt;
&lt;br /&gt;
===== 1. Improvements on Method 2 =====&lt;br /&gt;
To improve the performance of dominant color extraction, we transitioned from traditional K-means to MiniBatch K-means. This algorithm processes data in small batches, significantly reducing computational time while maintaining clustering accuracy. The number of dominant colors was also reduced from 50 to 30 to focus on key representative colors and further enhance efficiency. The two key energy terms remain, now combined with a convex weighting:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + (1 - \beta) E_{\text{cont}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
The optimization objective was refined to significantly improve computational efficiency by replacing the nested loops in the contrast enhancement term with vectorized operations. In the original implementation, the pairwise differences between colors were calculated iteratively using &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;O(N^2)&amp;lt;/math&amp;gt; nested loops. The improved version eliminates this overhead by leveraging array broadcasting to compute all pairwise differences simultaneously, and the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is then applied to all pairwise differences in a single tensor operation:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T}_{\Delta} = \text{tensordot}(\Delta_{ij}, \mathbf{T}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the norms are computed in parallel across the entire array. Additionally, the weighting parameter &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; was adjusted to favor naturalness preservation, ensuring better visual integrity in the recolored image. &lt;br /&gt;
The propagation step was changed to use a k-d tree for fast nearest neighbor searches, replacing grid-based interpolation. This approach more efficiently matches each pixel in the Lab color space to the closest dominant color:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{I}_{\text{adjusted}} = \mathbf{C}_{\text{recolored}}[\text{k-d tree query}(\mathbf{I})],&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in Lab space.&lt;br /&gt;
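A small sketch of this propagation step, assuming SciPy&#039;s k-d tree and synthetic stand-ins for the dominant colors and their recolored counterparts (not the original implementation):&lt;br /&gt;

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
dominant = rng.random((30, 3)) * 100.0    # dominant Lab colors (synthetic)
recolored = dominant[:, ::-1].copy()      # hypothetical recolored versions
pixels = rng.random((4096, 3)) * 100.0    # flattened 64x64 Lab image

tree = cKDTree(dominant)       # built once over the 30 dominant colors
_, idx = tree.query(pixels)    # nearest dominant color for each pixel
adjusted = recolored[idx]      # I_adjusted = C_recolored[k-d tree query(I)]
```

The tree is built once over the small set of dominant colors, so each per-pixel lookup is logarithmic in the number of colors rather than linear.&lt;br /&gt;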
These refinements enable faster optimization while improving the balance between naturalness and contrast enhancement.&lt;br /&gt;
&lt;br /&gt;
===== 2. Confusion Line Adjustments =====&lt;br /&gt;
An additional step adjusts colors near confusion lines in the CIE 1931 xyY color space to enhance distinguishability:&lt;br /&gt;
&lt;br /&gt;
1. Confusion lines are defined for protanopia, deuteranopia, and tritanopia, based on [10]. For example, for protanopia:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{Confusion Line: Start} = (0.735, 0.265), \quad \text{End} = (0.115, 0.885).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. Colors near the confusion line are identified using orthogonal distance:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(\mathbf{xy}, L) = \frac{\| (\mathbf{xy} - \mathbf{p}_1) \times (\mathbf{p}_2 - \mathbf{p}_1) \|}{\|\mathbf{p}_2 - \mathbf{p}_1\|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_1&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_2&amp;lt;/math&amp;gt; are the start and end points of the confusion line.&lt;br /&gt;
&lt;br /&gt;
3. Identified colors are shifted orthogonally away from the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{xy}_{\text{adjusted}} = \mathbf{xy} + \lambda \mathbf{v}_{\perp},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{v}_{\perp}&amp;lt;/math&amp;gt; is a perpendicular vector, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a scaling factor.&lt;br /&gt;
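Steps 1-3 above can be sketched as follows. The confusion-line endpoints are the protanopia values from the text; the threshold and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; values are hypothetical tuning parameters, and the sign of the shift would in general depend on which side of the line the color lies (fixed here for simplicity):&lt;br /&gt;

```python
import numpy as np

p1 = np.array([0.735, 0.265])   # protanopia confusion line start
p2 = np.array([0.115, 0.885])   # protanopia confusion line end

def orthogonal_distance(xy, p1, p2):
    """Orthogonal distance from chromaticity xy to the line p1-p2."""
    d = p2 - p1
    # Magnitude of the 2-D cross product over the line length.
    cross = (xy[0] - p1[0]) * d[1] - (xy[1] - p1[1]) * d[0]
    return abs(cross) / np.linalg.norm(d)

def shift_away(xy, p1, p2, lam=0.02, threshold=0.2):
    """Shift colors within `threshold` of the line along its perpendicular."""
    d = p2 - p1
    v_perp = np.array([-d[1], d[0]]) / np.linalg.norm(d)  # unit perpendicular
    if orthogonal_distance(xy, p1, p2) < threshold:
        return xy + lam * v_perp
    return xy
```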
&lt;br /&gt;
===== 3. Personalization with Severity Levels =====&lt;br /&gt;
To account for severity levels, the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; linearly interpolates between normal vision and full CVD perception based on severity and type:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T} = (1 - s) \mathbf{I} + s \mathbf{T}_{\text{CVD}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;s&amp;lt;/math&amp;gt; represents the severity of CVD (0-100%), &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; is the identity matrix, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}_{\text{CVD}}&amp;lt;/math&amp;gt; is the full transformation matrix specific to protanopia, deuteranopia, or tritanopia. This method is based on the DaltonLens simulator [13].&lt;br /&gt;
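A minimal sketch of this interpolation; the &lt;code&gt;T_cvd&lt;/code&gt; matrix below is a placeholder, not an actual protanopia or deuteranopia matrix:&lt;br /&gt;

```python
import numpy as np

def severity_matrix(s, T_full):
    """T = (1 - s) * I + s * T_CVD, with severity s in [0, 1]."""
    return (1.0 - s) * np.eye(3) + s * np.asarray(T_full)

# Hypothetical full-severity matrix for illustration only.
T_cvd = np.array([[0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
```

At &lt;code&gt;s = 0&lt;/code&gt; the matrix is the identity (normal vision); at &lt;code&gt;s = 1&lt;/code&gt; it is the full CVD transformation; intermediate severities blend the two linearly.&lt;br /&gt;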
&lt;br /&gt;
These improvements significantly enhanced both the effectiveness and efficiency of the recoloring process on top of Method 2.&lt;br /&gt;
&lt;br /&gt;
==== Method 4: Improved with GMM-based Method ====&lt;br /&gt;
The last mathematical method we experimented with enhances recoloring by integrating a Gaussian Mixture Model (GMM)-based global recoloring algorithm. The method also applies nonlinear adjustments for colors near confusion lines to ensure improved contrast and naturalness.&lt;br /&gt;
&lt;br /&gt;
===== 1. GMM-Based Global Recoloring =====&lt;br /&gt;
The image is first resized and transformed into the Lab color space. A GMM is applied to cluster the color distribution into &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; components, optimizing the number of clusters using the Bayesian Information Criterion (BIC):&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{BIC} = -2 \cdot \text{log-likelihood} + P \cdot \log(N),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;P&amp;lt;/math&amp;gt; is the number of model parameters and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; is the number of pixels.&lt;br /&gt;
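The BIC-driven choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; can be sketched with scikit-learn; the two well-separated synthetic blobs stand in for Lab pixel data:&lt;br /&gt;

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
pixels = np.vstack([rng.normal(0.0, 1.0, (200, 3)),    # two well-separated
                    rng.normal(10.0, 1.0, (200, 3))])  # synthetic color blobs

best_k, best_bic = None, np.inf
for k in range(1, 5):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(pixels)
    bic = gmm.bic(pixels)   # -2 * log-likelihood + P * log(N)
    if bic < best_bic:
        best_k, best_bic = k, bic
```

The BIC penalizes extra components, so the search settles on the smallest mixture that explains the color distribution well.&lt;br /&gt;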
&lt;br /&gt;
The GMM means are simulated using the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T&amp;lt;/math&amp;gt; with severity levels taken into account, and the symmetric Kullback-Leibler (KL) divergence (&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{\text{sKL}}&amp;lt;/math&amp;gt;) is calculated between pairs of clusters:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{\text{sKL}}(i, j) = D_{\text{KL}}(G_i \| G_j) + D_{\text{KL}}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt; are Gaussian components, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{\text{KL}}&amp;lt;/math&amp;gt; represents the KL divergence. The GMM cluster means are then adjusted by solving a nonlinear least-squares problem to minimize the discrepancy.&lt;br /&gt;
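The symmetric KL divergence between two Gaussian components has a closed form for multivariate normals, sketched below (our framing; the project&#039;s code may differ):&lt;br /&gt;

```python
import numpy as np

def kl_gauss(mu0, cov0, mu1, cov1):
    """KL(N0 || N1) for multivariate Gaussians, in closed form."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff
                  - k
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def sym_kl(mu0, cov0, mu1, cov1):
    """D_sKL(i, j) = KL(G_i || G_j) + KL(G_j || G_i)."""
    return kl_gauss(mu0, cov0, mu1, cov1) + kl_gauss(mu1, cov1, mu0, cov0)
```

For identical covariances the symmetric KL reduces to the squared distance between the means (scaled by the inverse covariance), which makes it a natural measure of how distinguishable two color clusters remain after CVD simulation.&lt;br /&gt;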
&lt;br /&gt;
===== 2. Improved Adjustments Near Confusion Lines =====&lt;br /&gt;
Following global recoloring, colors near confusion lines in the CIE 1931 xyY color space are further adjusted based on formulas used in Method 3. Nonlinear scaling is applied to amplify the shifts for pixels closer to the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
w = \left( \frac{\text{threshold} - d}{\text{threshold}} \right)^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w&amp;lt;/math&amp;gt; is the scaling factor. &lt;br /&gt;
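The quadratic weighting can be sketched as below; the threshold value is a hypothetical parameter, not the project&#039;s tuned value:&lt;br /&gt;

```python
def shift_weight(d, threshold=0.05):
    """Quadratic falloff: 1 on the confusion line, 0 at the threshold."""
    if d >= threshold:
        return 0.0
    return ((threshold - d) / threshold) ** 2
```

Pixels closer to the line (smaller &lt;code&gt;d&lt;/code&gt;) thus receive disproportionately larger shifts, while pixels at or beyond the threshold are left untouched.&lt;br /&gt;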
&lt;br /&gt;
The adjustments from the GMM and confusion line steps are combined to produce the final recolored image. These enhancements make the method more robust and effective for individuals with varying levels of CVD.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Through our experimentation with mathematical methods, we gained a deeper understanding of the algorithmic aspects of image recoloring for CVD, particularly in balancing naturalness, contrast, and personalization. Building on these insights, we transitioned to exploring deep learning approaches, applying the lessons learned to guide training, evaluation, and ground truth dataset generation.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in later sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and straightforward to condition on the user label, they require a pre-existing ground truth example of the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information during training. They are generally better at generating more natural images, but they require more compute and sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from the open-source RGB image dataset of [2]: to improve their model&#039;s ability to enhance contrast between CVD-indistinguishable color pairs, the authors created a dataset of 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground truth recolored images for supervised methods), we randomly sampled 15,000 images and recolored them by simulating random labels for CVD severity and type. The recoloring for ground truth images was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org Scipy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
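The label encoding can be sketched as a small helper (our illustration of the scheme described above, not the project&#039;s exact code):&lt;br /&gt;

```python
def encode_label(cvd_type, severity):
    """Encode a CVD condition as severity * [protan, deutan].

    cvd_type is 'protan' or 'deutan'; severity ranges from 0.1 to 1.0.
    """
    base = {'protan': [1.0, 0.0], 'deutan': [0.0, 1.0]}[cvd_type]
    return [severity * v for v in base]
```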
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and is passed to 3 parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. The outputs of the three networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was a pixel-wise mean squared error (MSE) loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
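A framework-agnostic sketch of this loss (the actual training used PyTorch, whose &lt;code&gt;nn.MSELoss&lt;/code&gt; computes the same mean reduction):&lt;br /&gt;

```python
import numpy as np

def mse_loss(pred, target):
    """Mean squared error averaged over all pixels and channels."""
    pred, target = np.asarray(pred), np.asarray(target)
    return np.mean((pred - target) ** 2)
```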
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
With the same input format, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: Recolored (model output) and ground truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: The l-th feature layer of the pre-trained VGG network&lt;br /&gt;
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: Two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: Size of the global (or large) window over the image&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: Size of the local window (neighborhood) around pixel x&lt;br /&gt;
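The shared term &amp;lt;math&amp;gt;\text{CL}(x, y)&amp;lt;/math&amp;gt; can be sketched directly from its definition (synthetic color values for illustration):&lt;br /&gt;

```python
import numpy as np

def contrast_term(c_x, c_y, c_x_rec, c_y_rec):
    """CL(x, y): change in pairwise color distance between the
    CVD-simulated recolored pair and the CVD-simulated original pair."""
    return (np.linalg.norm(np.asarray(c_x_rec) - np.asarray(c_y_rec))
            - np.linalg.norm(np.asarray(c_x) - np.asarray(c_y)))
```

A positive value means the recoloring increased the simulated contrast between the two pixels; the global and local losses average this term over large and small windows respectively.&lt;br /&gt;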
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural color distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored images respectively&lt;br /&gt;
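The naturalness term can be sketched with a single-window SSIM, using the formula given in the Results section; the constants follow the common defaults for images in [0, 1], and the project itself used library implementations:&lt;br /&gt;

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM over the whole image (a simplification)."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return (((2 * mu_x * mu_y + c1) * (2 * cov + c2))
            / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))

def naturalness_loss(original, recolored):
    """L_naturalness = 1 - SSIM(I', I)."""
    return 1.0 - ssim_global(original, recolored)
```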
&lt;br /&gt;
== Results == &lt;br /&gt;
&lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The qualitative results and key observations from the experiments are summarized below. &lt;br /&gt;
&lt;br /&gt;
The result images presented in Figures 10 through 13 follow this sequence: the original image, the CVD-simulated version of the original image, the recolored image, and the CVD-simulated version of the recolored image. The CVD-simulated images demonstrate how the images are perceived by individuals with the corresponding type of CVD. The examples provided focus on protanopia (first row) and deuteranopia (second row) due to space constraints. Additional results for tritanopia and recolored images at varying severity levels are included in the appendix.&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Method 1: Daltonization Baseline&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method1.png|400px|thumb|right|Figure 10: Method 1 Results]]&lt;br /&gt;
&lt;br /&gt;
The Daltonization method provides a foundational approach for recoloring images to enhance visibility for individuals with CVD. Key takeaways from Figure 10 include:&lt;br /&gt;
&lt;br /&gt;
* The method demonstrates significant improvements for protanopia, as seen in the first row, where the recolored images show clear color differences and high contrast. However, for deuteranopia, as shown in the second row, the recolored images exhibit less visible improvements, with lower contrast. This inconsistency highlights the method&#039;s limited ability to generalize across different types of CVD.&lt;br /&gt;
* The method does not account for severity levels or individual differences in CVD perception, which presents an opportunity for further improvement.&lt;br /&gt;
* While the recolored images achieve high contrast between confusing colors, the overall perception of the original image is not preserved. This reduction in naturalness may impact the aesthetic quality and recognizability of the image.&lt;br /&gt;
* Performance: this method is the fastest among the methods tested, as it relies solely on matrix transformations. This makes it computationally efficient and suitable for real-time applications.&lt;br /&gt;
&lt;br /&gt;
The Daltonization method provides a baseline for recoloring but requires enhancements in flexibility, contrast optimization across CVD types, and personalization for varying severity levels.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Method 2: Optimizing Objective Functions&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method2.png|400px|thumb|right|Figure 11 Method 2 Results]]&lt;br /&gt;
* While this method aims to balance naturalness and contrast, the resulting recolored images are similar to the original ones. A possible reason for this is the sensitivity of the loss function to the beta parameter, which requires careful tuning.&lt;br /&gt;
* The recolored images exhibit some loss of fine details, likely due to the use of the k-means clustering algorithm, which simplifies color representation across the image.&lt;br /&gt;
* This algorithm has a very slow runtime, taking over one minute per image. The primary bottlenecks are the color clustering step and the optimization of the objective function, which can be improved significantly.&lt;br /&gt;
* Despite its limitations, this method introduces a flexible framework for customizing loss functions, enabling further improvements. This flexibility was leveraged to refine the method in subsequent methods.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Method 3: Adjustments Near Confusion Lines with Improved Method 2&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method3.png|400px|thumb|right|Figure 12 Method 3 Results]]&lt;br /&gt;
* This method produces recolored images with reasonable contrasts between confusing colors while preserving the naturalness of the image well. It can also account for varying severity levels for each CVD type, providing more personalized recoloring.&lt;br /&gt;
* The performance of the algorithm was improved significantly, reducing from over one minute to approximately 4 seconds per image.&lt;br /&gt;
* Results with color plates, which are commonly used for diagnosing color vision deficiencies, are included in the appendix. This method shows good results, with numbers becoming more easily visible in the CVD-simulated recolored images.&lt;br /&gt;
* Some limitations include the fact that this method sometimes lacks sufficient contrast, particularly for the deuteranopia type. It is also sensitive to parameters, such as the shift factor for colors near the confusion lines, which requires careful tuning. &lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Method 4: Improved with GMM-based Method&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method4.png|400px|thumb|right|Figure 13 Method 4 Results]]&lt;br /&gt;
* This method creates recolored images with very high contrast, making the colors in the images easily distinguishable, even for individuals with severe CVD.&lt;br /&gt;
* By using GMM-based clustering instead of k-means, this method preserves most of the image details. The more sophisticated clustering allows for a better representation of the original color distribution, reducing the loss of fine details.&lt;br /&gt;
* The runtime for this method is significantly faster than most others, taking under two seconds per image (see Table 1). This makes it highly practical for real-time applications.&lt;br /&gt;
* While the method performs well in enhancing contrast, some recolored images lose the naturalness of the original images. Additionally, certain colors in the recolored images do not transition smoothly, which might be attributed to the clustering step in the process.&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Below are quantitative results for six metrics, plus runtime, for each method:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
! Original vs Recolored !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
| SSIM || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| TCC || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| CD ΔE76 || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| CIEDE2000 || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| CIEDE94 || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| D-CIELAB ΔEab || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* SSIM: Measures structural similarity between images, combining luminance, contrast, and structure components. Computed using &amp;lt;code&amp;gt;torchmetrics.StructuralSimilarityIndexMeasure&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
* TCC: Evaluates changes in total color contrast by comparing random pixel pairs in each image and calculating the difference in their color distances. &lt;br /&gt;
&lt;br /&gt;
* D-CIELAB ΔEab: Quantifies perceptual color differences for dichromats under specific CVD types. &lt;br /&gt;
&lt;br /&gt;
* CD ΔE76, CIEDE2000, CIEDE94: Standard perceptual color difference metrics, computed with scikit-image package. ΔE76 is basic Euclidean distance in Lab space, while CIEDE2000 and CIEDE94 include perceptual corrections.&lt;br /&gt;
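For reference, ΔE76 is just the Euclidean distance in Lab space; the table values were computed with scikit-image&#039;s color-difference functions, but the metric itself can be sketched directly:&lt;br /&gt;

```python
import numpy as np

def delta_e76(lab1, lab2):
    """CIE76 color difference: Euclidean distance in Lab space."""
    return float(np.linalg.norm(np.asarray(lab1, float) - np.asarray(lab2, float)))
```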
&lt;br /&gt;
Overall, Method 4 stands out as the best-performing approach, delivering high contrast, preserving image details through GMM-based clustering, and achieving a fast runtime, while addressing many limitations of the earlier methods.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: Conditional Parallel RGB MLP, Conditional U-Net, and Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), Total Color Contrast (TCC), Chromatic Difference (CD), and inference time, adapted from [1] and [2], were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For supervised methods, &#039;expected&#039; means how closely the output resembles the ground truth recolored image; for the unsupervised method, it means how much contrast and naturalness is observed in the CVD-simulated recolored images compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The output pixels appeared discretized, suggesting that channel disentanglement was not helpful for this task (especially for naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than those of the originals, meaning the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images, possibly because the naturalness factor dominated in practice even though the weight coefficients in the loss favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to compute evaluation metrics for comparison with related work using only the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]. Please refer to the definitions in the paper, as we have used the same. On a high level, the three components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weighting constant, not a wavelength; l, a, b are Lab-space coordinates, with primes denoting the recolored image and unprimed values the original.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
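The Chromatic Difference formula can be sketched as below; the value of the weighting constant &lt;code&gt;lam&lt;/code&gt; is an assumption for illustration, not necessarily the value used in [1] or [2]:&lt;br /&gt;

```python
def chromatic_difference(lab, lab_rec, lam=0.5):
    """CD(i) = sqrt(lam*(l'-l)^2 + (a'-a)^2 + (b'-b)^2) for one pixel.
    lam is a hypothetical weighting constant (down-weights lightness)."""
    dl = lab_rec[0] - lab[0]
    da = lab_rec[1] - lab[1]
    db = lab_rec[2] - lab[2]
    return (lam * dl ** 2 + da ** 2 + db ** 2) ** 0.5
```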
&lt;br /&gt;
The key results are in Table 2 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:40%; margin:auto;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 (vs. ~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 (vs. ~0.963)*&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2] for protan/deutan whichever is larger.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but not as high as in paper [2], because they optimize separate networks for each CVD type.&lt;br /&gt;
* Outputs are somewhat blurry (the loss does not optimize strongly enough for SSIM).&lt;br /&gt;
* Mixing CVD types in the same network needs a more sophisticated approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate (a few seconds per image), making them feasible for interactive use. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement using local and global losses was effective in improving visibility for CVD individuals. Loss functions borrowed from the Swin-transformer-based approach of [2] added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. JOSA A, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
[12] Color-Blindness.com. (n.d.). COBLIS - Color Blindness Simulator. Retrieved December 13, 2024, from https://www.color-blindness.com/coblis-color-blindness-simulator/&lt;br /&gt;
&lt;br /&gt;
[13] DaltonLens. (n.d.). DaltonLens-Python [Computer software]. GitHub. Retrieved December 13, 2024, from https://github.com/DaltonLens/DaltonLens-Python&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400px|Wikipedia encyclopedia]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_2.png|400px]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|350px|thumb|Conditional Autoencoder]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_unet.png|350px|thumb|Conditional U-Net]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_mlp.png|350px|thumb|Conditional MLP]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both; text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
Losses: Conditional Autoencoder, Conditional U-Net, and Conditional MLP&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Mathematical method results with color plates&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:Method1-color-plates.png|400px|thumb|Method 1 Color Plates Results]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:Method2-color-plates.png|400px|thumb|Method 2 Color Plates Results]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;gallery mode=&amp;quot;nolines&amp;quot; widths=&amp;quot;400px&amp;quot; heights=&amp;quot;300px&amp;quot; caption=&amp;quot;Method 3 Color Plates Results for Protanopia, Deuteranopia, and Tritanopia with Severity Levels&amp;quot;&amp;gt;&lt;br /&gt;
File:Method3-protan.png|Protanopia&lt;br /&gt;
File:Method3-deutan.png|Deuteranopia&lt;br /&gt;
File:Method3-tritan.png|Tritanopia&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;gallery mode=&amp;quot;nolines&amp;quot; widths=&amp;quot;400px&amp;quot; heights=&amp;quot;300px&amp;quot; caption=&amp;quot;Method 4 Color Plates Results for Protanopia, Deuteranopia, and Tritanopia with Severity Levels&amp;quot;&amp;gt;&lt;br /&gt;
File:Method4-protan.png|Protanopia&lt;br /&gt;
File:Method4-deutan.png|Deuteranopia&lt;br /&gt;
File:Method4-tritan.png|Tritanopia&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;br /&gt;
* Researching, writing and running scripts for the four mathematical methods (Daltonization, optimization-based, confusion-lines-based, and GMM-based), along with further experiments such as a segmentation-based method that was discarded due to slow performance&lt;br /&gt;
* Results generation and validation for all scripts written&lt;br /&gt;
* Evaluation metrics scripts for mathematical methods&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60867</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60867"/>
		<updated>2024-12-13T13:26:00Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* Results */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected error values are then added back to the original image to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
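The error-shifting step above can be sketched in a few lines of NumPy. The correction matrix is the one quoted above; &#039;&#039;simulate_cvd&#039;&#039; is a hypothetical placeholder for a real dichromat simulation such as Brettel et al. [7], not the actual Daltonize implementation.&lt;br /&gt;

```python
import numpy as np

# Correction matrix quoted above (Daltonize/Vischeck style).
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def simulate_cvd(rgb):
    # Placeholder protanopia simulation for illustration only:
    # drop the long-wavelength (red) channel entirely.
    sim = rgb.copy()
    sim[..., 0] = 0.0
    return sim

def daltonize(rgb):
    """rgb: float array in [0, 1] with shape (..., 3)."""
    error = rgb - simulate_cvd(rgb)          # information the dichromat loses
    shifted = error @ CORRECTION.T           # rotate error into visible channels
    return np.clip(rgb + shifted, 0.0, 1.0)  # add correction back to the original

img = np.random.default_rng(0).random((4, 4, 3))
out = daltonize(img)
```

With this particular correction matrix the red channel is left untouched and the lost red information is redistributed into green and blue.&lt;br /&gt;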
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Confusing colors, i.e. key colors lying on the same occupied confusion line, are iteratively shifted to the nearest unoccupied confusion lines, with high-ranking colors (those most prominent in the image clusters) reallocated first. This reallocation ensures that these colors remain distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
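The perpendicular-distance test above is straightforward to compute; a minimal sketch follows, where the line endpoints in the example are arbitrary illustrative coordinates rather than published copunctal points.&lt;br /&gt;

```python
import math

def distance_to_confusion_line(v, copunctal, ref):
    """Perpendicular distance d(v, L) from chromaticity v to the confusion
    line L through the copunctal point and a reference point (formula above)."""
    (xv, yv), (xcp, ycp), (x0, y0) = v, copunctal, ref
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    return num / math.hypot(xcp - x0, ycp - y0)

# A chromaticity on the line is at distance zero; an off-line one is not.
on_line = distance_to_confusion_line((0.5, 0.5), (1.0, 0.0), (0.0, 1.0))
off_line = distance_to_confusion_line((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
```

Key colors whose distance to an occupied line falls below a threshold would be treated as lying on that line and become candidates for shifting.&lt;br /&gt;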
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIE L*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions, so the recolored images can sometimes look unnatural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to apply in real-time settings.&lt;br /&gt;
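The symmetric KL divergence used in Step 3 has a closed form for Gaussians; a minimal NumPy sketch (written for full covariance matrices, unlike the diagonal simplification noted above):&lt;br /&gt;

```python
import numpy as np

def kl_gauss(mu_i, cov_i, mu_j, cov_j):
    """Closed-form KL(G_i || G_j) between two multivariate Gaussians."""
    k = mu_i.shape[0]
    inv_j = np.linalg.inv(cov_j)
    d = mu_j - mu_i
    return 0.5 * (np.trace(inv_j @ cov_i) + d @ inv_j @ d - k
                  + np.log(np.linalg.det(cov_j) / np.linalg.det(cov_i)))

def sym_kl(mu_i, cov_i, mu_j, cov_j):
    """Symmetric divergence D_sKL(G_i, G_j) from the equation above."""
    return kl_gauss(mu_i, cov_i, mu_j, cov_j) + kl_gauss(mu_j, cov_j, mu_i, cov_i)
```

The optimization in Step 3 would relocate the Gaussian means while keeping each pair&#039;s symmetric divergence (and hence distinguishability) preserved.&lt;br /&gt;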
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This particular method also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations with a triple-latent structure, the model supports continuous personalization, adjusting images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, one epoch (one iteration over the entire dataset) took around 18 days.&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD with the personalization of different severity levels. We start by exploring existing methods and identifying opportunities for improvement. Since mathematical-based approaches provide a solid foundation and are well-documented, we began our experiments by testing these methods, as described in the background. We later extended our exploration to deep learning based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a Baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
&lt;br /&gt;
The error between the original and simulated LMS values is then mapped into the RGB color space using a deficiency-specific correction matrix, which adjusts the image to enhance contrast and recover lost color differences. The predefined correction matrix is applied to the error in RGB space, transforming it back into LMS space for final adjustments. The corrected LMS values are added back to the original values, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
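The whole Daltonization pipeline above fits in a few lines. The sketch below is our own simplified NumPy version (function and variable names are ours, and for brevity the error is corrected directly in RGB space rather than round-tripping through LMS a second time):&lt;br /&gt;

```python
import numpy as np

# RGB -> LMS transform (Stockman & Sharpe-based values from the text)
T_RGB2LMS = np.array([[0.3904725,  0.54990437, 0.00890159],
                      [0.07092586, 0.96310739, 0.00135809],
                      [0.02314268, 0.12801221, 0.93605194]])

# Daltonize-inspired correction matrix from the text
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize_protan(rgb):
    """Recolor an (H, W, 3) float image in [0, 1] for protanopia."""
    lms = rgb @ T_RGB2LMS.T
    sim = lms.copy()
    # replace the missing L response with a weighted sum of M and S
    sim[..., 0] = 0.90822864 * lms[..., 1] + 0.008192 * lms[..., 2]
    # the error the viewer cannot perceive, expressed back in RGB
    err = (lms - sim) @ np.linalg.inv(T_RGB2LMS).T
    # shift the invisible error into visible channels and add it back
    return np.clip(rgb + err @ CORRECTION.T, 0.0, 1.0)
```

Note that achromatic (gray) pixels are nearly unchanged, since the simulated L response closely matches the true L response for equal-energy inputs, so the error term vanishes.&lt;br /&gt;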
&lt;br /&gt;
==== Method 2: Optimizing Objective Function ====&lt;br /&gt;
To improve the results from the Daltonization method, we designed a framework inspired by methods discussed in the background, incorporating dominant color extraction, optimization-based recoloring, and edit propagation. This approach aims to find a balance between the naturalness and contrast while compensating colors that are not visible for corresponding CVD types.&lt;br /&gt;
&lt;br /&gt;
===== 1. Extraction of Dominant Colors =====&lt;br /&gt;
We begin by extracting the dominant colors from the input image using a K-means clustering algorithm. This step identifies a reduced set of representative colors that capture the primary color information in the image:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{C} = \{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; represents the number of clusters, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i&amp;lt;/math&amp;gt; represents the centroid of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th cluster.&lt;br /&gt;
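As an illustration, a minimal k-means in NumPy (our own simplified stand-in for the clustering step; farthest-point initialization is used so the sketch is deterministic):&lt;br /&gt;

```python
import numpy as np

def dominant_colors(pixels, n_clusters=5, n_iter=20):
    """Extract n_clusters representative colors from an (M, 3) pixel array
    with plain Lloyd's k-means, seeded by farthest-point initialization."""
    centers = [pixels[0]]
    for _ in range(n_clusters - 1):
        # next seed: the pixel farthest from every chosen center
        d = np.min(((pixels[:, None] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(pixels[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # assign each pixel to its nearest centroid, then recompute centroids
        labels = np.argmin(((pixels[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = pixels[labels == k].mean(axis=0)
    return centers
```

The returned centroids play the role of the dominant color set C fed to the optimization step.&lt;br /&gt;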
&lt;br /&gt;
===== 2. Optimization-Based Recoloring =====&lt;br /&gt;
Once the dominant colors are extracted, we apply an optimization process to adjust these colors. The optimization uses the formulas mentioned in [9], and aims to balance two key objectives:&lt;br /&gt;
&lt;br /&gt;
1. Naturalness Preservation: Ensures the recolored image minimally deviates from the original.&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{nat}} = \sum_{i=1}^N \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_i^{\text{original}}) \|^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is the transformation matrix based on the severity and type of CVD, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i^{\text{original}}&amp;lt;/math&amp;gt; is the original color.&lt;br /&gt;
&lt;br /&gt;
2. Contrast Enhancement: Improves the differentiation of colors for individuals with CVD:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{cont}} = \sum_{i=1}^N \sum_{j&amp;gt;i} \left( \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_j) \|^2 - \| \mathbf{c}_i^{\text{original}} - \mathbf{c}_j^{\text{original}} \|^2 \right)^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The total objective function combines these two terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + E_{\text{cont}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; controls the trade-off between naturalness and contrast.&lt;br /&gt;
&lt;br /&gt;
Optimization is performed using the L-BFGS-B algorithm to ensure efficient convergence under bounded constraints.&lt;br /&gt;
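A compact SciPy sketch of this optimization (our own minimal implementation of the two energy terms; `recolor_palette` and its defaults are illustrative, not the exact code of [9]):&lt;br /&gt;

```python
import numpy as np
from scipy.optimize import minimize

def recolor_palette(C0, T, beta=0.5):
    """Optimize dominant colors C (N, 3) to balance naturalness against
    contrast preservation under a CVD transformation matrix T."""
    N = C0.shape[0]
    # original (untransformed) pairwise distances, as in the E_cont definition
    d0 = np.linalg.norm(C0[:, None] - C0[None], axis=-1)
    iu = np.triu_indices(N, k=1)

    def energy(flat):
        C = flat.reshape(N, 3)
        e_nat = np.sum(((C - C0) @ T.T) ** 2)
        d = np.linalg.norm((C[:, None] - C[None]) @ T.T, axis=-1)
        e_con = np.sum((d[iu] ** 2 - d0[iu] ** 2) ** 2)
        return beta * e_nat + e_con

    res = minimize(energy, C0.ravel(), method="L-BFGS-B",
                   bounds=[(0.0, 1.0)] * (3 * N))
    return res.x.reshape(N, 3)
```

When T is the identity (no deficiency), both energy terms vanish at the original palette, so the optimizer leaves the colors unchanged, which is a useful sanity check.&lt;br /&gt;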
&lt;br /&gt;
The transformation matrices for each type of CVD are the following, which are based on [12]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Protanopia}} = \begin{bmatrix} 0.566 &amp;amp; 0.558 &amp;amp; 0 \\ 0.433 &amp;amp; 0.442 &amp;amp; 0.242 \\ 0 &amp;amp; 0 &amp;amp; 0.758 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Deuteranopia}} = \begin{bmatrix} 0.625 &amp;amp; 0.7 &amp;amp; 0 \\ 0.375 &amp;amp; 0.3 &amp;amp; 0.3 \\ 0 &amp;amp; 0 &amp;amp; 0.7 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Tritanopia}} = \begin{bmatrix} 0.95 &amp;amp; 0 &amp;amp; 0 \\ 0.05 &amp;amp; 0.433 &amp;amp; 0 \\ 0 &amp;amp; 0.567 &amp;amp; 1 \end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== 3. Edit Propagation =====&lt;br /&gt;
After optimizing the dominant colors, we propagate these edits across the entire image to ensure smooth transitions. This propagation step leverages the CIE-Lab color space, which is perceptually uniform, meaning that the Euclidean distance in this space correlates well with human color perception. The process begins by mapping the original image and the optimized dominant colors into the Lab color space. In this space, the differences between the original and recolored dominant colors are computed to capture the adjustments made during the optimization step:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Delta L^* = \text{griddata}(\mathbf{c}^{\text{original}}, \mathbf{c}^{\text{recolored}} - \mathbf{c}^{\text{original}}, \mathbf{I}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in the Lab color space. Once the interpolated changes are computed, they are applied to the Lab representation of the original image. Finally, the adjusted Lab values are converted back to the RGB color space to reconstruct the recolored image.&lt;br /&gt;
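A sketch of this propagation step with SciPy's `griddata` (names are ours; nearest-neighbor interpolation is used here so that every pixel receives an offset even outside the convex hull of the palette):&lt;br /&gt;

```python
import numpy as np
from scipy.interpolate import griddata

def propagate_edits(image_lab, palette_orig, palette_new):
    """Interpolate the per-palette-color Lab offsets at every pixel
    of an (H, W, 3) Lab image."""
    h, w, _ = image_lab.shape
    pix = image_lab.reshape(-1, 3)
    # offsets learned during optimization, interpolated at each pixel
    delta = griddata(palette_orig, palette_new - palette_orig, pix, method="nearest")
    return (pix + delta).reshape(h, w, 3)
```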
&lt;br /&gt;
==== Method 3: Improved with Confusion Line Adjustments ====&lt;br /&gt;
This method builds upon the previous method by introducing enhancements in dominant color extraction, optimization, and edit propagation, while incorporating an additional step to adjust colors near confusion lines in the CIE 1931 xyY color space inspired by [10]. These improvements aim to further enhance contrast and naturalness of the recolored images. Moreover, this method adds flexibility in adjusting for different severity levels for each CVD type.&lt;br /&gt;
&lt;br /&gt;
===== 1. Improvements on Method 2 =====&lt;br /&gt;
To improve the performance of dominant color extraction, we transitioned from traditional K-means to MiniBatch K-means. This algorithm processes data in small batches, significantly reducing computational time while maintaining accuracy in clustering. The number of dominant colors was also reduced from 50 to 30 to focus on key representative colors and further enhance efficiency. The optimization objective is refined to leverage vectorization, improving computational efficiency. The two key terms remain:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + (1 - \beta) E_{\text{cont}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
The optimization objective was refined to significantly improve computational efficiency by replacing the nested loops in the contrast enhancement term with vectorized operations. In the original implementation, the pairwise differences between colors were calculated iteratively using &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;O(N^2)&amp;lt;/math&amp;gt; nested loops. The improved version eliminates this overhead by leveraging array broadcasting to compute all pairwise differences simultaneously, and the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is then applied to all pairwise differences in a single tensor operation:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T}_{\Delta} = \text{tensordot}(\Delta_{ij}, \mathbf{T}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the norms are computed in parallel across the entire array. Additionally, the weighting parameter &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; was adjusted to favor naturalness preservation, ensuring better visual integrity in the recolored image. &lt;br /&gt;
The propagation step changed to use a k-d tree for fast nearest neighbor searches, replacing grid-based interpolation. This approach more efficiently matches each pixel in the Lab color space to the closest dominant color:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{I}_{\text{adjusted}} = \mathbf{C}_{\text{recolored}}[\text{k-d tree query}(\mathbf{I})],&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in Lab space.&lt;br /&gt;
These refinements enable faster optimization while improving the balance between naturalness and contrast enhancement.&lt;br /&gt;
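The k-d-tree variant of the propagation step might look like this (SciPy sketch, names ours; each pixel simply inherits the offset of its nearest dominant color instead of a grid-based interpolation):&lt;br /&gt;

```python
import numpy as np
from scipy.spatial import cKDTree

def propagate_kdtree(image_lab, palette_orig, palette_new):
    """Snap every pixel of an (H, W, 3) Lab image to the Lab offset
    of its nearest dominant color, found via a k-d tree."""
    _, idx = cKDTree(palette_orig).query(image_lab.reshape(-1, 3))
    delta = (palette_new - palette_orig)[idx]
    return image_lab + delta.reshape(image_lab.shape)
```

Querying the tree is logarithmic in the palette size per pixel, which is what makes this variant faster than grid-based interpolation.&lt;br /&gt;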
&lt;br /&gt;
===== 2. Confusion Line Adjustments =====&lt;br /&gt;
An additional step adjusts colors near confusion lines in the CIE 1931 xyY color space to enhance distinguishability:&lt;br /&gt;
&lt;br /&gt;
1. Confusion lines are defined for protanopia, deuteranopia, and tritanopia, based on [10]. For example, for protanopia:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{Confusion Line: Start} = (0.735, 0.265), \quad \text{End} = (0.115, 0.885).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. Colors near the confusion line are identified using orthogonal distance:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(\mathbf{xy}, L) = \frac{\| (\mathbf{xy} - \mathbf{p}_1) \times (\mathbf{p}_2 - \mathbf{p}_1) \|}{\|\mathbf{p}_2 - \mathbf{p}_1\|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_1&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_2&amp;lt;/math&amp;gt; are the start and end points of the confusion line.&lt;br /&gt;
&lt;br /&gt;
3. Identified colors are shifted orthogonally away from the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{xy}_{\text{adjusted}} = \mathbf{xy} + \lambda \mathbf{v}_{\perp},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{v}_{\perp}&amp;lt;/math&amp;gt; is a perpendicular vector, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a scaling factor.&lt;br /&gt;
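Steps 2 and 3 above can be sketched directly from the formulas (NumPy; the protanopia endpoints are from the text, while the helper names and the default shift factor are ours):&lt;br /&gt;

```python
import numpy as np

# protanopia confusion-line endpoints in CIE xy, from the text
P1 = np.array([0.735, 0.265])
P2 = np.array([0.115, 0.885])

def _cross2(a, b):
    """2-D cross product (scalar)."""
    return a[0] * b[1] - a[1] * b[0]

def line_distance(xy, p1=P1, p2=P2):
    """Orthogonal distance from a chromaticity xy to the confusion line."""
    d = p2 - p1
    return abs(_cross2(xy - p1, d)) / np.linalg.norm(d)

def shift_away(xy, lam=0.05, p1=P1, p2=P2):
    """Push xy orthogonally off the line by lam, on the side it already lies."""
    d = p2 - p1
    n = np.array([-d[1], d[0]]) / np.linalg.norm(d)  # unit normal to the line
    side = 1.0 if _cross2(d, xy - p1) >= 0 else -1.0
    return xy + lam * side * n
```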
&lt;br /&gt;
===== 3. Personalize with Severity Levels =====&lt;br /&gt;
To account for severity levels, the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; linearly interpolates between normal vision and full CVD perception based on severity and type:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T} = (1 - s) \mathbf{I} + s \mathbf{T}_{\text{CVD}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;s&amp;lt;/math&amp;gt; represents the severity of CVD (0-100%), &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; is the identity matrix, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}_{\text{CVD}}&amp;lt;/math&amp;gt; is the full transformation matrix specific to protanopia, deuteranopia, or tritanopia. This interpolation follows the DaltonLens simulator [13].&lt;br /&gt;
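This interpolation is essentially one line of code (sketch; `T_cvd` stands for any of the full-deficiency matrices given earlier):&lt;br /&gt;

```python
import numpy as np

def severity_matrix(T_cvd, s):
    """Blend identity (normal vision, s=0) with the full-CVD matrix (s=1)."""
    return (1.0 - s) * np.eye(3) + s * np.asarray(T_cvd)
```

For example, `severity_matrix(T_protanopia, 0.5)` models a 50%-severity protan observer.&lt;br /&gt;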
&lt;br /&gt;
These improvements significantly enhanced both the effectiveness and efficiency of the recoloring process on top of Method 2.&lt;br /&gt;
&lt;br /&gt;
==== Method 4: Improved with GMM-based Method ====&lt;br /&gt;
The last mathematical method we experimented with enhances recoloring by integrating a Gaussian Mixture Model (GMM)-based global recoloring algorithm. The method also applies nonlinear adjustments for colors near confusion lines to ensure improved contrast and naturalness.&lt;br /&gt;
&lt;br /&gt;
===== 1. GMM-Based Global Recoloring =====&lt;br /&gt;
The image is first resized and transformed into the Lab color space. A GMM is applied to cluster the color distribution into &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; components, optimizing the number of clusters using the Bayesian Information Criterion (BIC):&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{BIC} = -2 \cdot \text{log-likelihood} + P \cdot \log(N),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;P&amp;lt;/math&amp;gt; represents the model parameters and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; is the number of pixels.&lt;br /&gt;
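The BIC computation itself is simple. The sketch below is our own toy 1-D version (not the full Lab-space GMM): fitting one Gaussian per known cluster versus one Gaussian overall, BIC prefers the two-component description of clearly bimodal data:&lt;br /&gt;

```python
import numpy as np

def bic(log_likelihood, n_params, n_points):
    """Bayesian Information Criterion: lower is better."""
    return -2.0 * log_likelihood + n_params * np.log(n_points)

def gaussian_loglik(x, mu, var):
    """Total log-likelihood of 1-D samples under a single Gaussian."""
    return float(np.sum(-0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)))
```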
&lt;br /&gt;
The GMM means are simulated using the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T&amp;lt;/math&amp;gt; with severity levels taken into account, and the symmetric Kullback-Leibler (KL) divergence (&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{\text{sKL}}&amp;lt;/math&amp;gt;) is calculated between pairs of clusters:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{\text{sKL}}(i, j) = D_{\text{KL}}(G_i \| G_j) + D_{\text{KL}}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt; are Gaussian components, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{\text{KL}}&amp;lt;/math&amp;gt; represents the KL divergence. The GMM cluster means are then adjusted by solving a nonlinear least-squares problem to minimize the discrepancy.&lt;br /&gt;
&lt;br /&gt;
===== 2. Adjusting Near Confusion Lines Improved =====&lt;br /&gt;
Following global recoloring, colors near confusion lines in the CIE 1931 xyY color space are further adjusted based on formulas used in Method 3. Nonlinear scaling is applied to amplify the shifts for pixels closer to the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
w = \left( \frac{\text{threshold} - d}{\text{threshold}} \right)^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w&amp;lt;/math&amp;gt; is the scaling factor. &lt;br /&gt;
&lt;br /&gt;
The adjustments from the GMM and confusion line steps are combined to produce the final recolored image. These enhancements make the method more robust and effective for individuals with varying levels of CVD.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in further sections but an overview is shown in Figure 1. All of the code was done in Python using a deep learning framework called [https://pytorch.org PyTorch]&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train and integrate the user label, they require an already present ground truth comparison of expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information while training. They are generally better at generating more natural images, but they require more compute and sophisticated model architectures or loss functions for the recoloring task&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We first gathered an open-source RGB image dataset from [2]. To improve the capability of the proposed model to enhance the contrast between CVD-indistinguishable color&lt;br /&gt;
pairs, in their study, they created a new dataset consisting of 141,000 pictures of both natural scenes and artificial images containing&lt;br /&gt;
CVD-confusing colors without labels. To generate labels (and ground truth recolored images for supervised methods), we randomly sampled 15,000 images and recolored by simulating random labels for severity and type of CVD. The recoloring for ground truth images was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: The open-source tools used in the Python version for the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org Scipy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and is passed to 3 parallel MLPs simultaneously. These parallel networks are learned to predicted R, G, B channels of a recolored image based on given ground truth. The outputs from each of these networks are concatenated to produce the recolored RGB image of same spatial dimensions as input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used to train was pixel wise, mean-squared error loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground truth recolored image respectively&lt;br /&gt;
* p: Image index&lt;br /&gt;
* N: Total number of images&lt;br /&gt;
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
In a similar fashion of inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: are recolored (model output) and ground recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt; is the l-th of the pre-trained VGG network&lt;br /&gt;
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the loss functions. The loss functions we used to train this network were inspired from [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{global} = \frac{1}{\|\omega\|} \sum_{&amp;lt;x, y&amp;gt; \in \epsilon \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{l} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x,y: Two distinct pixels in the image.&lt;br /&gt;
* cx and cy: CVD simulated colors of original image&lt;br /&gt;
* c^x′and c^y: CVD simulated colors of recolored image (model output)&lt;br /&gt;
* ||w||: Global (or large) window of image&lt;br /&gt;
* ||wx||: Local window or neighborhood around a pixel x&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives output image to have colors that are visually similar and close to natural distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{natural}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I(i), I&#039;(i): Original and recolored images respectively&lt;br /&gt;
&lt;br /&gt;
== Results == &lt;br /&gt;
&lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The qualitative results and key observations from the experiments are summarized below. &lt;br /&gt;
&lt;br /&gt;
The result images presented in Figures 10 through 13 follow this sequence: the original image, the CVD-simulated version of the original image, the recolored image, and the CVD-simulated version of the recolored image. The CVD-simulated images demonstrate how the images are perceived by individuals with the corresponding type of CVD. The examples provided focus on protanopia (first row) and deuteranopia (second row) due to space constraints. Additional results for tritanopia and recolored images at varying severity levels are included in the appendix.&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Method 1: Daltonization Baseline&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method1.png|400px|thumb|right|Figure 10: Method 1 Results]]&lt;br /&gt;
&lt;br /&gt;
The Daltonization method provides a foundational approach for recoloring images to enhance visibility for individuals with CVD. Key takeaways from Figure 10 include:&lt;br /&gt;
&lt;br /&gt;
* The method demonstrates significant improvements for protanopia, as seen in the first row, where the recolored images show clear color differences and high contrast. However, for deuteranopia, as shown in the second row, the recolored images exhibit less visible improvements, with lower contrast. This inconsistency highlights the method&#039;s limited ability to generalize across different types of CVD.&lt;br /&gt;
* The method does not account for severity levels or individual differences in CVD perception, which presents an opportunity for further improvement.&lt;br /&gt;
* While the recolored images achieve high contrast between confusing colors, the overall perception of the original image is not preserved. This reduction in naturalness may impact the aesthetic quality and recognizability of the image.&lt;br /&gt;
* Performance: this method is the fastest among the methods tested, as it relies solely on matrix transformations. This makes it computationally efficient and suitable for real-time applications.&lt;br /&gt;
&lt;br /&gt;
The Daltonization method provides a baseline for recoloring but requires enhancements in flexibility, contrast optimization across CVD types, and personalization for varying severity levels.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Method 2: Optimizing Objective Functions&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method2.png|400px|thumb|right|Figure 11 Method 2 Results]]&lt;br /&gt;
* While this method aims to balance naturalness and contrast, the resulting recolored images are similar to the original ones. A possible reason for this is the sensitivity of the loss function to the beta parameter, which requires careful tuning.&lt;br /&gt;
* The recolored images exhibit some loss of fine details, likely due to the use of the k-means clustering algorithm, which simplifies color representation across the image.&lt;br /&gt;
* This algorithm has a very slow runtime, taking over one minute per image. The primary bottlenecks are the color clustering step and the optimization of the objective function, which can be improved significantly.&lt;br /&gt;
* Despite its limitations, this method introduces a flexible framework for customizing loss functions, enabling further improvements. This flexibility was leveraged to refine the method in subsequent methods.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Method 3: Adjustments Near Confusion Lines with Improved Method 2&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method3.png|400px|thumb|right|Figure 12 Method 3 Results]]&lt;br /&gt;
* This method produces recolored images with reasonable contrasts between confusing colors while preserving the naturalness of the image well. It can also account for varying severity levels for each CVD type, providing more personalized recoloring.&lt;br /&gt;
* The performance of the algorithm was improved significantly, reducing from over one minute to approximately 4 seconds per image.&lt;br /&gt;
* Results with color plates, which are commonly used for diagnosing color vision deficiencies, are included in the appendix. This method shows good results, with numbers becoming more easily visible in the CVD-simulated recolored images.&lt;br /&gt;
* Some limitations include the fact that this method sometimes lacks sufficient contrast, particularly for the deuteranopia type. It is also sensitive to parameters, such as the shift factor for colors near the confusion lines, which requires careful tuning. &lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Method 4: Improved with GMM-based Method&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method4.png|400px|thumb|right|Figure 13 Method 4 Results]]&lt;br /&gt;
* This method creates recolored images with very high contrast, making the colors in the images easily distinguishable, even for individuals with severe CVD.&lt;br /&gt;
* By using GMM-based clustering instead of k-means, this method preserves most of the image details. The more sophisticated clustering allows for a better representation of the original color distribution, reducing the loss of fine details.&lt;br /&gt;
* The runtime for this method is significantly faster than most others, taking only around 1 second per image. This makes it highly practical for real-time applications.&lt;br /&gt;
* While the method performs well in enhancing contrast, some recolored images lose the naturalness of the original images. Additionally, certain colors in the recolored images do not transition smoothly, which might be attributed to the clustering step in the process.&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Below are quantitative results across six metrics, along with runtime, for each method:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
! Original vs Recolored !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
| SSIM || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| TCC || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| CD ΔE76 || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| CIEDE2000 || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| CIEDE94 || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| D-CIELAB ΔEab || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* SSIM: Measures structural similarity between images, combining luminance, contrast, and structure components. Computed using `torchmetrics.StructuralSimilarityIndexMeasure`.&lt;br /&gt;
&lt;br /&gt;
* TCC: Evaluates changes in total color contrast by comparing random pixel pairs in each image and calculating the difference in their color distances. &lt;br /&gt;
&lt;br /&gt;
* D-CIELAB ΔEab: Quantifies perceptual color differences for dichromats under specific CVD types. &lt;br /&gt;
&lt;br /&gt;
* CD ΔE76, CIEDE2000, CIEDE94: Standard perceptual color difference metrics, computed with the scikit-image package. ΔE76 is the basic Euclidean distance in Lab space, while CIEDE94 and CIEDE2000 add perceptual corrections.&lt;br /&gt;
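Of these, ΔE76 is just the per-pixel Euclidean distance in Lab space averaged over the image; a minimal NumPy sketch (scikit-image&#039;s deltaE_cie76 computes the same per-pixel distance):&lt;br /&gt;

```python
import numpy as np

def mean_delta_e76(lab1, lab2):
    """Mean CIE76 color difference: per-pixel Euclidean distance in Lab,
    averaged over the whole image. Inputs have shape (H, W, 3)."""
    return float(np.mean(np.linalg.norm(lab1 - lab2, axis=-1)))
```

CIEDE94 and CIEDE2000 add hue- and chroma-dependent corrections on top of this distance and are best taken from scikit-image directly.&lt;br /&gt;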
&lt;br /&gt;
Overall, Method 4 stands out as the best-performing approach, delivering high contrast, preserving image details through GMM-based clustering, and achieving a fast runtime, while addressing many limitations of the earlier methods.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: Conditional Parallel RGB MLP, Deep U-Net, and Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), total color contrast (TCC), Chromatic Difference (CD), and inference time were used to assess the effectiveness of the models; these metrics are adapted from [1] and [2].&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated against expected results. For the supervised setting, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised setting, it means how much contrast and naturalness the CVD-simulated recolored images show compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The outputs appeared discretized at the pixel level, suggesting that the disentangled, per-pixel treatment was not helpful for this task (especially for naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to, or worse than, those of the originals, meaning the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was observed in the recolored images, possibly because naturalness ended up being prioritized even though the weight coefficients in the loss favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to score and evaluate metrics for comparison with related work only using the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]. Please refer to the definitions in the paper, as we have used the same. On a high level, the three components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
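Over a single window this reduces to a few image moments; a minimal NumPy sketch (the metric as reported is averaged over local windows, e.g. by torchmetrics, so this global version is only an approximation):&lt;br /&gt;

```python
import numpy as np

def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM for two images scaled to [0, 1].
    c1, c2 are the usual stabilizing constants for unit dynamic range."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```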
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
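A sketch of this term, under the assumption that Ω1 is a set of random global pixel pairs and Ω2 a local neighborhood of each pixel (the exact pair sets in [2] may differ):&lt;br /&gt;

```python
import numpy as np

def total_color_contrast(img, n_global=1000, seed=0):
    """TCC sketch for an (H, W, 3) image: mean color distance over random
    global pixel pairs plus mean distance to the right neighbor of each
    pixel. The choice of pair sets is an assumption, not the one in [2]."""
    rng = np.random.default_rng(seed)
    h, w, _ = img.shape
    flat = img.reshape(-1, 3)
    # Global term: distances between randomly sampled pixel pairs.
    i = rng.integers(0, h * w, n_global)
    j = rng.integers(0, h * w, n_global)
    global_term = np.mean(np.linalg.norm(flat[i] - flat[j], axis=1))
    # Local term: distance of each pixel to its horizontal neighbor.
    local_term = np.mean(np.linalg.norm(img[:, 1:] - img[:, :-1], axis=2))
    return float(global_term + local_term)
```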
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here, lambda is a constant weight, not a wavelength; l, a, and b are the CIELAB coordinates, with primes (&#039;) denoting the recolored image and unprimed values the original.)&lt;br /&gt;
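A direct NumPy transcription of CD(i); the value of lambda below is an arbitrary placeholder, not the constant used in [2]:&lt;br /&gt;

```python
import numpy as np

def chromatic_difference(lab_orig, lab_recolored, lam=0.5):
    """Per-pixel CD(i) between CIELAB images of shape (H, W, 3);
    lam down-weights the lightness channel (placeholder value)."""
    d = lab_recolored - lab_orig
    return np.sqrt(lam * d[..., 0] ** 2 + d[..., 1] ** 2 + d[..., 2] ** 2)
```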
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
&lt;br /&gt;
The key results are in Table 2 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:40%; margin:auto;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 (vs. ~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 (vs. ~0.963)*&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2] for protan/deutan whichever is larger.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but fall short of paper [2], which optimizes a separate network for each CVD type.&lt;br /&gt;
* Outputs are blurry (the SSIM term is not weighted strongly enough in the optimization).&lt;br /&gt;
* Handling multiple CVD types in a single network requires a more sophisticated conditioning approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate, making them feasible for near-real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from Swin architecture added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. JOSA A, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
[12] Color-Blindness.com. (n.d.). COBLIS - Color Blindness Simulator. Retrieved December 13, 2024, from https://www.color-blindness.com/coblis-color-blindness-simulator/&lt;br /&gt;
&lt;br /&gt;
[13] DaltonLens. (n.d.). DaltonLens-Python [Computer software]. GitHub. Retrieved December 13, 2024, from https://github.com/DaltonLens/DaltonLens-Python&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400px|Wikipedia encyclopedia]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_2.png|400px]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|350px|thumb|Conditional Autoencoder]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_unet.png|350px|thumb|Conditional U-Net]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_mlp.png|350px|thumb|Conditional MLP]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both; text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
Losses: Conditional Autoencoder, Conditional U-Net, and Conditional MLP&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Mathematical method results with color plates&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:Method1-color-plates.png|400px|thumb|Method 1 Color Plates Results]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:Method2-color-plates.png|400px|thumb|Method 2 Color Plates Results]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;gallery mode=&amp;quot;nolines&amp;quot; widths=&amp;quot;400px&amp;quot; heights=&amp;quot;300px&amp;quot; caption=&amp;quot;Method 3 Color Plates Results for Protanopia, Deuteranopia, and Tritanopia with Severity Levels&amp;quot;&amp;gt;&lt;br /&gt;
File:Method3-protan.png|Protanopia&lt;br /&gt;
File:Method3-deutan.png|Deuteranopia&lt;br /&gt;
File:Method3-tritan.png|Tritanopia&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;gallery mode=&amp;quot;nolines&amp;quot; widths=&amp;quot;400px&amp;quot; heights=&amp;quot;300px&amp;quot; caption=&amp;quot;Method 4 Color Plates Results for Protanopia, Deuteranopia, and Tritanopia with Severity Levels&amp;quot;&amp;gt;&lt;br /&gt;
File:Method4-protan.png|Protanopia&lt;br /&gt;
File:Method4-deutan.png|Deuteranopia&lt;br /&gt;
File:Method4-tritan.png|Tritanopia&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;br /&gt;
* Researching, writing, and running scripts for four (and more) mathematical methods (Daltonization, optimization-based, confusion-lines-based, GMM-based, plus other experiments such as a segmentation-based method that was discarded due to slow performance)&lt;br /&gt;
* Results generation and validation for all scripts written&lt;br /&gt;
* Evaluation metrics scripts for mathematical methods&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60863</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60863"/>
		<updated>2024-12-13T13:20:16Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* Mathematical based methods */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to RGB space using a correction matrix: the error contains exactly the information that dichromats cannot see, and the matrix rotates it into a part of the spectrum that they can see. For example, the correction matrix implemented in tools like Daltonize [5] and Vischeck [6] is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected error values are then added back to the original RGB values to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
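Since the whole pipeline is a handful of matrix operations, it is very cheap; a minimal sketch in Python, assuming the CVD simulation step (e.g. Brettel et al. [7]) is available separately:&lt;br /&gt;

```python
import numpy as np

# Correction matrix from the Daltonize/Vischeck convention described above.
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize(rgb, rgb_simulated):
    """Shift the information a dichromat cannot see into visible channels.

    rgb, rgb_simulated: float arrays of shape (H, W, 3) in [0, 1], where
    rgb_simulated is the output of a separate CVD simulation step."""
    error = rgb - rgb_simulated        # information lost to the viewer
    shifted = error @ CORRECTION.T     # rotate error into visible channels
    return np.clip(rgb + shifted, 0.0, 1.0)
```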
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
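On a small set of key colors, both terms amount to a few array operations; a sketch, with beta as an arbitrary placeholder weight:&lt;br /&gt;

```python
import numpy as np

def recoloring_loss(c, c_new, beta=0.5):
    """E = beta * E_nat + E_cont over key colors of shape (n, 3).
    beta is a placeholder value; [8] tunes this trade-off weight."""
    e_nat = np.sum((c_new - c) ** 2)                  # naturalness term
    diff_new = c_new[:, None, :] - c_new[None, :, :]  # pairwise contrasts
    diff_old = c[:, None, :] - c[None, :, :]
    e_cont = np.sum((diff_new - diff_old) ** 2)       # contrast deviation
    return float(beta * e_nat + e_cont)
```

Shifting every color by the same vector leaves the contrast term at zero while the naturalness term grows, which is exactly the trade-off that beta controls.&lt;br /&gt;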
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
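This point-to-line distance can be sketched in a few lines of NumPy (our own helper, not the authors&#039; code):&lt;br /&gt;

```python
import numpy as np

def distance_to_confusion_line(v, cp, p0):
    """Perpendicular distance from chromaticity v = (x_v, y_v) to the
    confusion line through the copunctal point cp and reference point p0."""
    (xv, yv), (xcp, ycp), (x0, y0) = v, cp, p0
    numerator = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    return numerator / np.hypot(xcp - x0, ycp - y0)
```

A color lying exactly on the line has distance zero; colors on the same confusion line are the candidates for reallocation described next.&lt;br /&gt;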
&lt;br /&gt;
Confusing colors, identified as key colors lying on the same (occupied) confusion line, are then reallocated: in order of their prominence in the image clusters, they are iteratively shifted to the nearest unoccupied confusion lines. This reallocation makes the colors distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
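To make the interplay of the three terms concrete, here is a toy NumPy evaluation of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E = (E_1 + E_2) + \lambda E_3&amp;lt;/math&amp;gt; for two key-color clusters. The names and the generic &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; argument are our own; averaging over pairs plays the role of the normalization factors.&lt;br /&gt;

```python
import numpy as np

def total_energy(a, b, a_rec, f_D, lam):
    """Toy evaluation of E = (E1 + E2) + lam * E3 for key-color clusters A and B.
    a, b: original key colors; a_rec: recolored colors of cluster A;
    f_D: dichromat simulation function applied row-wise."""
    fa_rec, fb = f_D(a_rec), f_D(b)
    # E1: between-cluster contrast, original vs. CVD-simulated recolored.
    E1 = np.mean(np.abs(
        np.linalg.norm(a[:, None] - b[None, :], axis=2)
        - np.linalg.norm(fa_rec[:, None] - fb[None, :], axis=2)))
    # E2: within-cluster contrast of A under CVD simulation.
    E2 = np.mean(np.abs(
        np.linalg.norm(a[:, None] - a[None, :], axis=2)
        - np.linalg.norm(fa_rec[:, None] - fa_rec[None, :], axis=2)))
    # E3: naturalness -- distance between original and recolored key colors.
    E3 = np.mean(np.linalg.norm(a - a_rec, axis=1))
    return E1 + E2 + lam * E3
```

With an identity simulation and no recoloring the energy is zero, which matches the intuition that a trichromat needs no adjustment.&lt;br /&gt;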
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIE L*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
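The KL divergence between two multivariate Gaussians has a closed form, so the symmetric version is cheap to compute. A sketch (our own helper, not the authors&#039; code):&lt;br /&gt;

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """Closed-form KL divergence D_KL(N(mu0, S0) || N(mu1, S1))."""
    k = len(mu0)
    S1_inv = np.linalg.inv(S1)
    d = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + d @ S1_inv @ d - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def sym_kl(mu0, S0, mu1, S1):
    """Symmetric KL divergence used to compare two Gaussian color clusters."""
    return kl_gauss(mu0, S0, mu1, S1) + kl_gauss(mu1, S1, mu0, S0)
```

Identical components give zero divergence; for identity covariances, a pure mean shift &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d&amp;lt;/math&amp;gt; gives a symmetric divergence of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|d\|^2&amp;lt;/math&amp;gt;.&lt;br /&gt;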
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and significantly enhances the contrast of recolored images, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions and can produce unnatural colors in the recolored image.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to use in real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and BicycleGAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. Because this particular method is adapted from style transfer, it also requires paired datasets, making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is computationally intensive and does not inherently account for the specific needs of individuals with CVD, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations with a triple-latent structure, the method enables continuous personalization, adjusting images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications: in our experiment, it took around 18 days to complete one epoch (one iteration over the entire dataset).&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD, personalized to different severity levels. We started by exploring existing methods and identifying opportunities for improvement. Since mathematical approaches provide a solid, well-documented foundation, we first experimented with those methods, as described in the background, and later extended our exploration to deep learning-based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a Baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
&lt;br /&gt;
The error between the original and simulated LMS values is then mapped into the RGB color space using a deficiency-specific correction matrix, which adjusts the image to enhance contrast and recover lost color differences. The predefined correction matrix is applied to the error in RGB space, and the result is transformed back into LMS space for final adjustments. The corrected values are added back to the originals, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
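Method 1 can be sketched end to end for protanopia using the matrices quoted above, assuming float RGB images in [0, 1]. This is a sketch following the common Daltonize-style variant that redistributes the error and adds it back in RGB space; the exact ordering of the final steps may differ from our implementation in detail.&lt;br /&gt;

```python
import numpy as np

# RGB -> LMS transform quoted above (Stockman and Sharpe based).
RGB2LMS = np.array([[0.3904725,  0.54990437, 0.00890159],
                    [0.07092586, 0.96310739, 0.00135809],
                    [0.02314268, 0.12801221, 0.93605194]])
LMS2RGB = np.linalg.inv(RGB2LMS)

# Protanopia simulation: the L response is replaced by a weighted
# combination of the M and S responses (coefficients quoted above).
SIM_PROTAN = np.array([[0.0, 0.90822864, 0.008192],
                       [0.0, 1.0,        0.0],
                       [0.0, 0.0,        1.0]])

# Daltonize-inspired error-redistribution matrix quoted above.
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize(rgb):
    """Recolor an (H, W, 3) float RGB image in [0, 1] for protanopia."""
    lms = rgb @ RGB2LMS.T
    lms_sim = lms @ SIM_PROTAN.T               # what a protanope perceives
    error_rgb = (lms - lms_sim) @ LMS2RGB.T    # lost information, mapped to RGB
    corrected = rgb + error_rgb @ CORRECTION.T  # redistribute into visible channels
    return np.clip(corrected, 0.0, 1.0)
```

The zero first row of the correction matrix leaves the red channel untouched and shifts the lost red information into the green and blue channels.&lt;br /&gt;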
&lt;br /&gt;
==== Method 2: Optimizing Objective Function ====&lt;br /&gt;
To improve the results from the Daltonization method, we designed a framework inspired by methods discussed in the background, incorporating dominant color extraction, optimization-based recoloring, and edit propagation. This approach aims to find a balance between the naturalness and contrast while compensating colors that are not visible for corresponding CVD types.&lt;br /&gt;
&lt;br /&gt;
===== 1. Extraction of Dominant Colors =====&lt;br /&gt;
We begin by extracting the dominant colors from the input image using a K-means clustering algorithm. This step identifies a reduced set of representative colors that capture the primary color information in the image:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{C} = \{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; represents the number of clusters, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i&amp;lt;/math&amp;gt; represents the centroid of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th cluster.&lt;br /&gt;
&lt;br /&gt;
===== 2. Optimization-Based Recoloring =====&lt;br /&gt;
Once the dominant colors are extracted, we apply an optimization process to adjust these colors. The optimization uses the formulas mentioned in [9], and aims to balance two key objectives:&lt;br /&gt;
&lt;br /&gt;
1. Naturalness Preservation: Ensures the recolored image minimally deviates from the original.&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{nat}} = \sum_{i=1}^N \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_i^{\text{original}}) \|^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is the transformation matrix based on the severity and type of CVD, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i^{\text{original}}&amp;lt;/math&amp;gt; is the original color.&lt;br /&gt;
&lt;br /&gt;
2. Contrast Enhancement: Improves the differentiation of colors for individuals with CVD:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{cont}} = \sum_{i=1}^N \sum_{j&amp;gt;i} \left( \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_j) \|^2 - \| \mathbf{c}_i^{\text{original}} - \mathbf{c}_j^{\text{original}} \|^2 \right)^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The total objective function combines these two terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + E_{\text{cont}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; controls the trade-off between naturalness and contrast.&lt;br /&gt;
&lt;br /&gt;
Optimization is performed using the L-BFGS-B algorithm to ensure efficient convergence under bounded constraints.&lt;br /&gt;
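A minimal sketch of this optimization using SciPy&#039;s L-BFGS-B solver follows; the function names are our own, and the bounds assume colors normalized to [0, 1].&lt;br /&gt;

```python
import numpy as np
from scipy.optimize import minimize

def recolor_palette(c_orig, T, beta=0.5):
    """Optimize dominant colors (N, 3) so that CVD-simulated contrasts
    approach the original ones while staying close to the original palette."""
    N = len(c_orig)
    iu = np.triu_indices(N, k=1)
    # Original pairwise squared contrasts serve as the target.
    target = np.sum((c_orig[:, None] - c_orig[None, :]) ** 2, axis=2)[iu]

    def energy(flat):
        c = flat.reshape(N, 3)
        e_nat = np.sum(((c - c_orig) @ T.T) ** 2)            # naturalness term
        d = (c[:, None] - c[None, :]) @ T.T
        e_cont = np.sum((np.sum(d ** 2, axis=2)[iu] - target) ** 2)  # contrast term
        return beta * e_nat + e_cont

    res = minimize(energy, c_orig.ravel(), method="L-BFGS-B",
                   bounds=[(0.0, 1.0)] * (3 * N))
    return res.x.reshape(N, 3)
```

For normal vision (&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T = I&amp;lt;/math&amp;gt;) the original palette is already optimal, so the solver leaves it essentially unchanged.&lt;br /&gt;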
&lt;br /&gt;
The transformation matrices for each type of CVD are the following, which are based on [12]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Protanopia}} = \begin{bmatrix} 0.566 &amp;amp; 0.558 &amp;amp; 0 \\ 0.433 &amp;amp; 0.442 &amp;amp; 0.242 \\ 0 &amp;amp; 0 &amp;amp; 0.758 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Deuteranopia}} = \begin{bmatrix} 0.625 &amp;amp; 0.7 &amp;amp; 0 \\ 0.375 &amp;amp; 0.3 &amp;amp; 0.3 \\ 0 &amp;amp; 0 &amp;amp; 0.7 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Tritanopia}} = \begin{bmatrix} 0.95 &amp;amp; 0 &amp;amp; 0 \\ 0.05 &amp;amp; 0.433 &amp;amp; 0 \\ 0 &amp;amp; 0.567 &amp;amp; 1 \end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== 3. Edit Propagation =====&lt;br /&gt;
After optimizing the dominant colors, we propagate these edits across the entire image to ensure smooth transitions. This propagation step leverages the CIE-Lab color space, which is perceptually uniform, meaning that the Euclidean distance in this space correlates well with human color perception. The process begins by mapping the original image and the optimized dominant colors into the Lab color space. In this space, the differences between the original and recolored dominant colors are computed to capture the adjustments made during the optimization step:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Delta L^* = \text{griddata}(\mathbf{c}^{\text{original}}, \mathbf{c}^{\text{recolored}} - \mathbf{c}^{\text{original}}, \mathbf{I}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in the Lab color space. Once the interpolated changes are computed, they are applied to the Lab representation of the original image. Finally, the adjusted Lab values are converted back to the RGB color space to reconstruct the recolored image.&lt;br /&gt;
&lt;br /&gt;
==== Method 3: Improved with Confusion Line Adjustments ====&lt;br /&gt;
This method builds upon the previous method by introducing enhancements in dominant color extraction, optimization, and edit propagation, while incorporating an additional step to adjust colors near confusion lines in the CIE 1931 xyY color space inspired by [10]. These improvements aim to further enhance contrast and naturalness of the recolored images. Moreover, this method adds flexibility in adjusting for different severity levels for each CVD type.&lt;br /&gt;
&lt;br /&gt;
===== 1. Improvements on Method 2 =====&lt;br /&gt;
To improve the performance of dominant color extraction, we transitioned from traditional K-means to MiniBatch K-means. This algorithm processes data in small batches, significantly reducing computational time while maintaining clustering accuracy. The number of dominant colors was also reduced from 50 to 30 to focus on key representative colors and further enhance efficiency. The two key terms of the objective remain:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + (1 - \beta) E_{\text{cont}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
The optimization objective was refined to significantly improve computational efficiency by replacing the nested loops in the contrast enhancement term with vectorized operations. In the original implementation, the pairwise differences between colors were calculated iteratively using &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;O(N^2)&amp;lt;/math&amp;gt; nested loops. The improved version eliminates this overhead by leveraging array broadcasting to compute all pairwise differences simultaneously, and the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is then applied to all pairwise differences in a single tensor operation:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T}_{\Delta} = \text{tensordot}(\Delta_{ij}, \mathbf{T}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the norms are computed in parallel across the entire array. Additionally, the weighting parameter &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; was adjusted to favor naturalness preservation, ensuring better visual integrity in the recolored image. &lt;br /&gt;
The propagation step was changed to use a k-d tree for fast nearest-neighbor searches, replacing grid-based interpolation. This approach more efficiently matches each pixel in the Lab color space to the closest dominant color:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{I}_{\text{adjusted}} = \mathbf{C}_{\text{recolored}}[\text{k-d tree query}(\mathbf{I})],&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in Lab space.&lt;br /&gt;
These refinements enable faster optimization while improving the balance between naturalness and contrast enhancement.&lt;br /&gt;
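The k-d tree propagation step can be sketched with SciPy&#039;s &amp;lt;code&amp;gt;cKDTree&amp;lt;/code&amp;gt;; the helper below is our own simplification, applying each dominant color&#039;s recoloring shift to its nearest pixels.&lt;br /&gt;

```python
import numpy as np
from scipy.spatial import cKDTree

def propagate_edits(pixels_lab, palette_orig, palette_recolored):
    """Map each Lab pixel to its nearest original dominant color and apply
    that color's recoloring shift (nearest-neighbor edit propagation)."""
    tree = cKDTree(palette_orig)
    _, idx = tree.query(pixels_lab)              # nearest dominant color per pixel
    shift = palette_recolored - palette_orig     # per-palette-entry adjustment
    return pixels_lab + shift[idx]
```

Each query is logarithmic in the palette size, which is what makes this faster than grid-based interpolation over the full image.&lt;br /&gt;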
&lt;br /&gt;
===== 2. Confusion Line Adjustments =====&lt;br /&gt;
An additional step adjusts colors near confusion lines in the CIE 1931 xyY color space to enhance distinguishability:&lt;br /&gt;
&lt;br /&gt;
1. Confusion lines are defined for protanopia, deuteranopia, and tritanopia, based on [10]. For example, for protanopia:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{Confusion Line: Start} = (0.735, 0.265), \quad \text{End} = (0.115, 0.885).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. Colors near the confusion line are identified using orthogonal distance:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(\mathbf{xy}, L) = \frac{\| (\mathbf{xy} - \mathbf{p}_1) \times (\mathbf{p}_2 - \mathbf{p}_1) \|}{\|\mathbf{p}_2 - \mathbf{p}_1\|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_1&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_2&amp;lt;/math&amp;gt; are the start and end points of the confusion line.&lt;br /&gt;
&lt;br /&gt;
3. Identified colors are shifted orthogonally away from the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{xy}_{\text{adjusted}} = \mathbf{xy} + \lambda \mathbf{v}_{\perp},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{v}_{\perp}&amp;lt;/math&amp;gt; is a perpendicular vector, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a scaling factor.&lt;br /&gt;
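Steps 2 and 3 can be sketched together in NumPy; the helper is our own, with a fixed step size standing in for &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt;.&lt;br /&gt;

```python
import numpy as np

def shift_from_confusion_line(xy, p1, p2, lam=0.05):
    """Shift chromaticity xy orthogonally away from the confusion line
    through p1 and p2 by a step of size lam (fixed-step variant)."""
    xy, p1, p2 = (np.asarray(a, dtype=float) for a in (xy, p1, p2))
    direction = p2 - p1
    normal = np.array([-direction[1], direction[0]]) / np.linalg.norm(direction)
    # Push toward the side of the line the color already lies on.
    side = np.sign(np.dot(xy - p1, normal))
    if side == 0.0:
        side = 1.0
    return xy + lam * side * normal
```

Because the shift is along the unit normal, the orthogonal distance to the confusion line grows by exactly &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt;.&lt;br /&gt;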
&lt;br /&gt;
===== 3. Personalize with Severity Levels =====&lt;br /&gt;
To account for severity levels, the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; linearly interpolates between normal vision and full CVD perception based on severity and type:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T} = (1 - s) \mathbf{I} + s \mathbf{T}_{\text{CVD}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;s \in [0, 1]&amp;lt;/math&amp;gt; represents the severity of CVD (from 0 for normal vision to 1 for complete dichromacy), &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; is the identity matrix, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}_{\text{CVD}}&amp;lt;/math&amp;gt; is the full transformation matrix specific to protanopia, deuteranopia, or tritanopia. This method is based on the DaltonLens simulator [13].&lt;br /&gt;
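The severity interpolation is a one-line computation; the example below uses the protanopia matrix quoted earlier.&lt;br /&gt;

```python
import numpy as np

def severity_matrix(T_cvd, s):
    """Linearly interpolate between normal vision (identity) and the full
    CVD transform: T = (1 - s) * I + s * T_cvd, with severity s in [0, 1]."""
    return (1.0 - s) * np.eye(3) + s * np.asarray(T_cvd)

# Protanopia matrix from the Method 2 section.
T_PROTAN = np.array([[0.566, 0.558, 0.0],
                     [0.433, 0.442, 0.242],
                     [0.0,   0.0,   0.758]])
```

At &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;s = 0&amp;lt;/math&amp;gt; this returns the identity (no recoloring needed); at &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;s = 1&amp;lt;/math&amp;gt; it returns the full dichromat transform.&lt;br /&gt;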
&lt;br /&gt;
These improvements significantly enhanced both the effectiveness and efficiency of the recoloring process on top of Method 2.&lt;br /&gt;
&lt;br /&gt;
==== Method 4: Improved with GMM-based Method ====&lt;br /&gt;
The last mathematical method we experimented with enhances recoloring by integrating a Gaussian Mixture Model (GMM)-based global recoloring algorithm. The method also applies nonlinear adjustments for colors near confusion lines to ensure improved contrast and naturalness.&lt;br /&gt;
&lt;br /&gt;
===== 1. GMM-Based Global Recoloring =====&lt;br /&gt;
The image is first resized and transformed into the Lab color space. A GMM is applied to cluster the color distribution into &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; components, optimizing the number of clusters using the Bayesian Information Criterion (BIC):&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{BIC} = -2 \cdot \text{log-likelihood} + P \cdot \log(N),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;P&amp;lt;/math&amp;gt; is the number of model parameters and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; is the number of pixels.&lt;br /&gt;
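A sketch of how BIC selects &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;: the log-likelihood values below are purely hypothetical, and the parameter count assumes a full-covariance 3-D GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;10K - 1&amp;lt;/math&amp;gt; free parameters.&lt;br /&gt;

```python
import numpy as np

def bic(log_likelihood, n_params, n_pixels):
    """BIC = -2 * log-likelihood + P * log(N); lower is better."""
    return -2.0 * log_likelihood + n_params * np.log(n_pixels)

# Hypothetical (log-likelihood, parameter-count) pairs for K = 2..5:
# the likelihood keeps improving with K, but the penalty term grows.
candidates = {2: (-5200.0, 19), 3: (-5050.0, 29), 4: (-5020.0, 39), 5: (-5015.0, 49)}
n_pixels = 10000
best_K = min(candidates, key=lambda K: bic(*candidates[K], n_pixels))
```

Here the fit improvement from &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K = 3&amp;lt;/math&amp;gt; to &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K = 4&amp;lt;/math&amp;gt; no longer outweighs the complexity penalty, so BIC picks &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K = 3&amp;lt;/math&amp;gt;.&lt;br /&gt;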
&lt;br /&gt;
The GMM means are simulated using the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T&amp;lt;/math&amp;gt; with severity levels taken into account, and the symmetric Kullback-Leibler (KL) divergence (&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{\text{sKL}}&amp;lt;/math&amp;gt;) is calculated between pairs of clusters:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{\text{sKL}}(i, j) = D_{\text{KL}}(G_i \| G_j) + D_{\text{KL}}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt; are Gaussian components, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{\text{KL}}&amp;lt;/math&amp;gt; represents the KL divergence. The GMM cluster means are then adjusted by solving a nonlinear least-squares problem to minimize the discrepancy.&lt;br /&gt;
&lt;br /&gt;
===== 2. Adjusting Near Confusion Lines Improved =====&lt;br /&gt;
Following global recoloring, colors near confusion lines in the CIE 1931 xyY color space are further adjusted based on formulas used in Method 3. Nonlinear scaling is applied to amplify the shifts for pixels closer to the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
w = \left( \frac{\text{threshold} - d}{\text{threshold}} \right)^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w&amp;lt;/math&amp;gt; is the scaling factor and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d&amp;lt;/math&amp;gt; is the pixel&#039;s distance to the confusion line; pixels closer to the line receive quadratically larger shifts. &lt;br /&gt;
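A minimal sketch of this quadratic falloff (how the per-pixel distance d to the confusion line is computed is left abstract here):&lt;br /&gt;

```python
import numpy as np

def confusion_line_weight(d, threshold):
    """Quadratic scaling factor w for a pixel at distance d from a confusion
    line in xy chromaticity; pixels at or beyond the threshold get zero shift."""
    d = np.asarray(d, dtype=float)
    w = ((threshold - d) / threshold) ** 2
    return np.where(d < threshold, w, 0.0)
```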
&lt;br /&gt;
The adjustments from the GMM and confusion line steps are combined to produce the final recolored image. These enhancements make the method more robust and effective for individuals with varying levels of CVD.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label describing the user, we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in later sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and integrate the user label directly, they depend on a pre-existing ground-truth recolored image for every training sample.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information while training. They are generally better at generating natural-looking images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from the open-source RGB image dataset of [2]: to help models enhance the contrast between CVD-indistinguishable color pairs, its authors assembled 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors.&lt;br /&gt;
To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 images and recolored them by simulating random severity levels and types of CVD. The ground-truth recoloring was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org Scipy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
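The label encoding described above can be sketched as follows (the function name and type strings are illustrative, not from the project code):&lt;br /&gt;

```python
import numpy as np

def encode_cvd_label(cvd_type, severity):
    """Encode a CVD condition as severity * [protan, deutan].

    cvd_type: "protan" or "deutan"; severity in [0.1, 1.0].
    e.g. ("protan", 0.6) -> [0.6, 0.0], protanopia at 60% severity.
    """
    assert cvd_type in ("protan", "deutan")
    assert 0.1 <= severity <= 1.0
    base = np.array([1.0, 0.0]) if cvd_type == "protan" else np.array([0.0, 1.0])
    return severity * base
```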
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to three parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. Their outputs are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: recolored (model output) image and ground-truth recolored image, respectively&lt;br /&gt;
* p: pixel index&lt;br /&gt;
* N: total number of pixels&lt;br /&gt;
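Written directly in numpy, the loss is (a sketch; training actually uses the PyTorch equivalent, &amp;lt;code&amp;gt;torch.nn.MSELoss&amp;lt;/code&amp;gt;):&lt;br /&gt;

```python
import numpy as np

def mse_loss(pred, target):
    """Mean-squared error averaged over all pixels (and channels) of two images."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return np.mean((pred - target) ** 2)
```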
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
Using the same inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: the recolored (model output) and ground-truth recolored images, respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: the l-th feature map of a pre-trained VGG network, with &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; elements&lt;br /&gt;
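The layer-wise structure of the perceptual loss can be sketched as below; to keep the sketch dependency-free, the VGG feature maps &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt; are replaced by stand-in multi-scale average-pooling features (the real loss uses pretrained VGG activations):&lt;br /&gt;

```python
import numpy as np

def avg_pool(img, k):
    """Stand-in 'feature map': k x k average pooling of an (H, W, C) image."""
    h, w, c = img.shape
    h, w = h - h % k, w - w % k
    return img[:h, :w].reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

def perceptual_loss(img_a, img_b, scales=(1, 2, 4)):
    """L = sum_l (1/N_l) * ||phi_l(a) - phi_l(b)||_2^2 over feature 'layers'."""
    loss = 0.0
    for k in scales:
        fa, fb = avg_pool(img_a, k), avg_pool(img_b, k)
        loss += np.sum((fa - fb) ** 2) / fa.size
    return loss
```

Swapping the pooling features for real VGG activations recovers the loss in the equation above; the summation structure is identical.&lt;br /&gt;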
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: size of the global set of pixel pairs&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: size of the local neighborhood around pixel x&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives output image to have colors that are visually similar and close to natural distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: original and recolored images, respectively&lt;br /&gt;
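Putting the pieces together, the global contrast term can be estimated over sampled pixel pairs as below (random pair sampling is an illustrative assumption; [2] defines the exact pair sets):&lt;br /&gt;

```python
import numpy as np

def global_contrast_term(sim_orig, sim_recolored, n_pairs=1000, seed=0):
    """Mean CL(x, y) = ||c'_x - c'_y|| - ||c_x - c_y|| over random pixel pairs.

    sim_orig, sim_recolored: (H, W, 3) CVD-simulated original / recolored images.
    Positive values mean the recoloring increased contrast under simulation.
    """
    rng = np.random.default_rng(seed)
    h, w, _ = sim_orig.shape
    flat_o = sim_orig.reshape(-1, 3)
    flat_r = sim_recolored.reshape(-1, 3)
    i = rng.integers(0, h * w, n_pairs)
    j = rng.integers(0, h * w, n_pairs)
    cl = (np.linalg.norm(flat_r[i] - flat_r[j], axis=1)
          - np.linalg.norm(flat_o[i] - flat_o[j], axis=1))
    return cl.mean()
```

The local term has the same shape but restricts y to a window around each x.&lt;br /&gt;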
&lt;br /&gt;
== Results == &lt;br /&gt;
&lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The qualitative results and key observations from the experiments are summarized below. &lt;br /&gt;
&lt;br /&gt;
The result images presented in Figures 10 through 13 follow this sequence: the original image, the CVD-simulated version of the original image, the recolored image, and the CVD-simulated version of the recolored image. The CVD-simulated images demonstrate how the images are perceived by individuals with the corresponding type of CVD. The examples provided focus on protanopia (first row) and deuteranopia (second row) due to space constraints. Additional results for tritanopia and recolored images at varying severity levels are included in the appendix.&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Method 1: Daltonization Baseline&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method1.png|400px|thumb|right|Figure 10: Method 1 Results]]&lt;br /&gt;
&lt;br /&gt;
The Daltonization method provides a foundational approach for recoloring images to enhance visibility for individuals with CVD. Key takeaways from Figure 10 include:&lt;br /&gt;
&lt;br /&gt;
* The method demonstrates significant improvements for protanopia, as seen in the first row, where the recolored images show clear color differences and high contrast. However, for deuteranopia, as shown in the second row, the recolored images exhibit less visible improvements, with lower contrast. This inconsistency highlights the method&#039;s limited ability to generalize across different types of CVD.&lt;br /&gt;
* The method does not account for severity levels or individual differences in CVD perception, which presents an opportunity for further improvement.&lt;br /&gt;
* While the recolored images achieve high contrast between confusing colors, the overall perception of the original image is not preserved. This reduction in naturalness may impact the aesthetic quality and recognizability of the image.&lt;br /&gt;
* Performance: this method is the fastest among the methods tested, as it relies solely on matrix transformations. This makes it computationally efficient and suitable for real-time applications.&lt;br /&gt;
&lt;br /&gt;
The Daltonization method provides a baseline for recoloring but requires enhancements in flexibility, contrast optimization across CVD types, and personalization for varying severity levels.&lt;br /&gt;
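The simulate-and-redistribute structure of Daltonization can be sketched as below. The error-redistribution matrix follows the commonly used values in the daltonize package [5]; the CVD simulation function itself is passed in (e.g. from [5] or [13]), and all constants should be verified against those sources:&lt;br /&gt;

```python
import numpy as np

def daltonize(img, simulate, err_redistribute=None):
    """Daltonization skeleton: recolored = img + M_err @ (img - simulate(img)).

    img: (H, W, 3) RGB in [0, 1]; simulate: a CVD simulation function.
    err_redistribute: 3x3 matrix pushing the information lost to the viewer
    (the simulation error) into channels they can still perceive.
    """
    if err_redistribute is None:
        # Common daltonize choice: shift the red-channel error into G and B.
        err_redistribute = np.array([[0.0, 0.0, 0.0],
                                     [0.7, 1.0, 0.0],
                                     [0.7, 0.0, 1.0]])
    err = img - simulate(img)            # information invisible to the viewer
    return np.clip(img + err @ err_redistribute.T, 0.0, 1.0)
```

With a perfect-vision "simulation" (identity), the error is zero and the image is unchanged, which is a useful sanity check.&lt;br /&gt;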
&lt;br /&gt;
2. &#039;&#039;&#039;Method 2: Optimizing Objective Functions&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method2.png|400px|thumb|right|Figure 11 Method 2 Results]]&lt;br /&gt;
* While this method aims to balance naturalness and contrast, the resulting recolored images are similar to the original ones. A possible reason for this is the sensitivity of the loss function to the beta parameter, which requires careful tuning.&lt;br /&gt;
* The recolored images exhibit some loss of fine details, likely due to the use of the k-means clustering algorithm, which simplifies color representation across the image.&lt;br /&gt;
* This algorithm has a very slow runtime, taking over one minute per image. The primary bottlenecks are the color clustering step and the optimization of the objective function, which can be improved significantly.&lt;br /&gt;
* Despite its limitations, this method introduces a flexible framework for customizing loss functions, enabling further improvements. This flexibility was leveraged to refine the method in subsequent methods.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Method 3: Adjustments Near Confusion Lines with Improved Method 2&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method3.png|400px|thumb|right|Figure 12 Method 3 Results]]&lt;br /&gt;
* This method produces recolored images with reasonable contrasts between confusing colors while preserving the naturalness of the image well. It can also account for varying severity levels for each CVD type, providing more personalized recoloring.&lt;br /&gt;
* The performance of the algorithm was improved significantly, reducing from over one minute to approximately 4 seconds per image.&lt;br /&gt;
* Results with color plates, which are commonly used for diagnosing color vision deficiencies, are included in the appendix. This method shows good results, with numbers becoming more easily visible in the CVD-simulated recolored images.&lt;br /&gt;
* Some limitations include the fact that this method sometimes lacks sufficient contrast, particularly for the deuteranopia type. It is also sensitive to parameters, such as the shift factor for colors near the confusion lines, which requires careful tuning. &lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Method 4: Improved with GMM-based Method&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method4.png|400px|thumb|right|Figure 13 Method 4 Results]]&lt;br /&gt;
* This method creates recolored images with very high contrast, making the colors in the images easily distinguishable, even for individuals with severe CVD.&lt;br /&gt;
* By using GMM-based clustering instead of k-means, this method preserves most of the image details. The more sophisticated clustering allows for a better representation of the original color distribution, reducing the loss of fine details.&lt;br /&gt;
* The runtime for this method is significantly faster than most others, taking only around 1 second per image. This makes it highly practical for real-time applications.&lt;br /&gt;
* While the method performs well in enhancing contrast, some recolored images lose the naturalness of the original images. Additionally, certain colors in the recolored images do not transition smoothly, which might be attributed to the clustering step in the process.&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Below are some quantitative results from six metrics. Method 3 and Method 4 perform the best overall.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
! Original vs Recolored !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
| SSIM || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| TCC || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| CD ΔE76 || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| CIEDE2000 || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| CIEDE94 || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| D-CIELAB ΔEab || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
* SSIM: Measures structural similarity between images, combining luminance, contrast, and structure components. Computed using &amp;lt;code&amp;gt;torchmetrics.StructuralSimilarityIndexMeasure&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
* TCC: Evaluates changes in total color contrast, compares random pixel pairs in each image and calculates the difference in their color distances. &lt;br /&gt;
&lt;br /&gt;
* D-CIELAB ΔEab: Quantifies perceptual color differences for dichromats under specific CVD types. &lt;br /&gt;
&lt;br /&gt;
* CD ΔE76, CIEDE2000, CIEDE94: Standard perceptual color difference metrics, computed with scikit-image package. ΔE76 is basic Euclidean distance in Lab space, while CIEDE2000 and CIEDE94 include perceptual corrections.&lt;br /&gt;
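ΔE76 is simply the Euclidean distance in Lab space; as a dependency-free sketch of the tabulated metric (the project used the scikit-image implementations):&lt;br /&gt;

```python
import numpy as np

def delta_e76(lab_a, lab_b):
    """Mean CIE76 color difference between two (H, W, 3) Lab images."""
    lab_a = np.asarray(lab_a, dtype=float)
    lab_b = np.asarray(lab_b, dtype=float)
    return np.linalg.norm(lab_a - lab_b, axis=-1).mean()
```

CIEDE94 and CIEDE2000 add perceptual weighting terms on top of this distance and are best taken from scikit-image rather than re-derived.&lt;br /&gt;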
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: Conditional Parallel RGB MLP, Conditional U-Net, and Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), Total Color Contrast (TCC), Chromatic Difference (CD), and inference time, as defined in [1] and [2], were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored image shows compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than those of the originals, indicating that the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness in the recolored images was seen, possibly because the naturalness term dominated even though the weight coefficients in the loss favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to score and evaluate metrics for comparison with related work only using the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]. Please refer to the definitions in the paper, as we have used the same. On a high level, the three components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(&amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a constant weight, not a wavelength; l, a, b are the CIELAB coordinates, with primed values (&#039;) from the recolored image and unprimed values from the original.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
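The per-pixel CD can be sketched as below (the value of &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; here is an illustrative assumption; see [1] and [2] for the definition used):&lt;br /&gt;

```python
import numpy as np

def chromatic_difference(lab_orig, lab_recolored, lam=0.5):
    """CD(i) = sqrt(lam*(l'-l)^2 + (a'-a)^2 + (b'-b)^2), averaged over pixels.

    lab_orig, lab_recolored: (H, W, 3) Lab images; lam down-weights the
    lightness channel relative to the chromatic a, b channels.
    """
    d = np.asarray(lab_recolored, dtype=float) - np.asarray(lab_orig, dtype=float)
    per_pixel = np.sqrt(lam * d[..., 0] ** 2 + d[..., 1] ** 2 + d[..., 2] ** 2)
    return per_pixel.mean()
```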
&lt;br /&gt;
The key results are in Table 2 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:40%; margin:auto;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 (vs. ~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 (vs. ~0.963)*&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2], taking the larger of the protan and deutan values.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but fall short of paper [2], which optimizes a separate network for each CVD type.&lt;br /&gt;
* Outputs are blurry (the structural term in the loss is not optimized strongly enough).&lt;br /&gt;
* Handling multiple CVD types in a single network needs a more sophisticated conditioning mechanism.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate (a few seconds per image), making them feasible for near-real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement using local and global losses was effective in improving visibility for CVD individuals. The loss functions borrowed from the Swin-transformer-based method of [2] added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. Josa a, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
[12] Color-Blindness.com. (n.d.). COBLIS - Color Blindness Simulator. Retrieved December 13, 2024, from https://www.color-blindness.com/coblis-color-blindness-simulator/&lt;br /&gt;
&lt;br /&gt;
[13] DaltonLens. (n.d.). DaltonLens-Python [Computer software]. GitHub. Retrieved December 13, 2024, from https://github.com/DaltonLens/DaltonLens-Python&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400px|Wikipedia encyclopedia]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_2.png|400px]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|350px|thumb|Conditional Autoencoder]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_unet.png|350px|thumb|Conditional U-Net]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_mlp.png|350px|thumb|Conditional MLP]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both; text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
Losses: Conditional Autoencoder, Conditional U-Net, and Conditional MLP&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Mathematical method results with color plates&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:Method1-color-plates.png|400px|thumb|Method 1 Color Plates Results]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:Method2-color-plates.png|400px|thumb|Method 2 Color Plates Results]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;gallery mode=&amp;quot;nolines&amp;quot; widths=&amp;quot;400px&amp;quot; heights=&amp;quot;300px&amp;quot; caption=&amp;quot;Method 3 Color Plates Results for Protanopia, Deuteranopia, and Tritanopia with Severity Levels&amp;quot;&amp;gt;&lt;br /&gt;
File:Method3-protan.png|Protanopia&lt;br /&gt;
File:Method3-deutan.png|Deuteranopia&lt;br /&gt;
File:Method3-tritan.png|Tritanopia&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;gallery mode=&amp;quot;nolines&amp;quot; widths=&amp;quot;400px&amp;quot; heights=&amp;quot;300px&amp;quot; caption=&amp;quot;Method 4 Color Plates Results for Protanopia, Deuteranopia, and Tritanopia with Severity Levels&amp;quot;&amp;gt;&lt;br /&gt;
File:Method4-protan.png|Protanopia&lt;br /&gt;
File:Method4-deutan.png|Deuteranopia&lt;br /&gt;
File:Method4-tritan.png|Tritanopia&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;br /&gt;
* Researching, writing and running scripts for four mathematical methods (Daltonization, Optimization-based, Confusion-lines-based, and GMM-based), plus additional experiments such as a segmentation-based method that was discarded due to slow performance&lt;br /&gt;
* Results generation and validation for all scripts written&lt;br /&gt;
* Evaluation metrics scripts for mathematical methods&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60817</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60817"/>
		<updated>2024-12-13T12:21:59Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* Appendix II */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected error is then added back to the original RGB values to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
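As a concrete illustration, the error computation and correction step can be sketched in a few lines of NumPy. This is a minimal sketch rather than the Daltonize [5] implementation itself: for brevity the error is computed and corrected directly in RGB space, using the correction matrix quoted above.

```python
import numpy as np

# Correction matrix quoted above: information lost from the red channel is
# redistributed onto the green and blue channels.
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize(original, simulated):
    """Shift information a dichromat cannot see into visible channels.

    `original` and `simulated` are H x W x 3 float arrays in [0, 1], where
    `simulated` is the output of a CVD-simulation step.
    """
    error = original - simulated       # the part of the image the viewer misses
    shifted = error @ CORRECTION.T     # rotate it into perceivable channels
    return np.clip(original + shifted, 0.0, 1.0)
```

For a pure-red pixel that the simulation maps to black, the lost red information reappears as added green and blue; where the simulation leaves a pixel unchanged, the correction leaves it unchanged too.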
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
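Evaluated over an N x 3 set of colors (in practice a palette or a sample of pixels, since the all-pairs sum is quadratic in N), the combined objective can be sketched as follows. The function name and the palette-level evaluation are our simplification, not Zhu et al.'s implementation:

```python
import numpy as np

def total_energy(recolored, original, beta=0.5):
    """E = beta * E_nat + E_cont over an N x 3 array of colors."""
    # E_nat: squared perceptual deviation of each recolored color
    e_nat = np.sum((recolored - original) ** 2)
    # E_cont: change in pairwise color differences (each i != j pair is
    # counted twice here, which only scales the term by a constant)
    d_rec = recolored[:, None, :] - recolored[None, :, :]
    d_org = original[:, None, :] - original[None, :, :]
    e_cont = np.sum((d_rec - d_org) ** 2)
    return beta * e_nat + e_cont
```

A uniform shift of every color leaves all pairwise differences intact, so only the naturalness term is penalized; this is exactly the trade-off that &lt;math&gt;\beta&lt;/math&gt; controls.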
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
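This point-to-line distance is straightforward to implement. The sketch below is generic; the protanopic copunctal point used in the usage note (approximately (0.747, 0.253)) is a commonly quoted value, not one taken from [10]:

```python
import math

def confusion_line_distance(v, copunctal, ref):
    """Perpendicular distance d(v, L) from chromaticity v = (x_v, y_v) to the
    confusion line L through the copunctal point and a reference point."""
    (xv, yv), (xcp, ycp), (x0, y0) = v, copunctal, ref
    numerator = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    denominator = math.hypot(xcp - x0, ycp - y0)
    return numerator / denominator
```

A color lying exactly on the line has distance zero; key colors can then be grouped by which confusion line they (approximately) occupy, e.g. lines radiating from the protanopic copunctal point.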
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. High-ranking colors, determined by their prominence in image clusters, are shifted to the nearest unoccupied confusion lines. This reallocation ensures that these colors are distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
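Putting the three terms together, a compact sketch of the objective might look as follows. The dichromat simulator &lt;math&gt;f_D&lt;/math&gt; is passed in as a function, the arrays hold key-color chromaticities, and the per-pair averaging is our reading of the formulas above rather than the authors' code:

```python
import numpy as np

def recolor_objective(a, a_rec, b, f_D, lam=0.5):
    """E = (E1 + E2) + lam * E3 for key-color clusters `a`, `b` and a
    dichromatic-vision simulator `f_D` (arrays of chromaticities)."""
    sim_a = f_D(a_rec)
    # E1: cross-cluster contrast, normal view vs. simulated recolored view
    e1 = np.abs(np.linalg.norm(a[:, None] - b[None, :], axis=-1)
                - np.linalg.norm(sim_a[:, None] - f_D(b)[None, :], axis=-1)).mean()
    # E2: within-cluster contrast under the dichromat simulation
    e2 = np.abs(np.linalg.norm(a[:, None] - a[None, :], axis=-1)
                - np.linalg.norm(sim_a[:, None] - sim_a[None, :], axis=-1)).mean()
    # E3: naturalness, how far each key color moved
    e3 = np.linalg.norm(a - a_rec, axis=-1).mean()
    return (e1 + e2) + lam * e3
```

With an identity simulator and no recoloring the objective is zero; any shift of the key colors is charged through E3 and, if it changes pairwise contrasts, through E1 and E2.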
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIE L*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
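The objective relies on the closed-form KL divergence between two Gaussians; a direct NumPy sketch (general covariances, written for clarity rather than speed, and not the authors' implementation):

```python
import numpy as np

def kl_gaussian(mu0, cov0, mu1, cov1):
    """KL(G0 || G1) for multivariate Gaussians, in closed form."""
    d = len(mu0)
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def symmetric_kl(mu0, cov0, mu1, cov1):
    """D_sKL(G_i, G_j) = KL(G_i || G_j) + KL(G_j || G_i), as above."""
    return kl_gaussian(mu0, cov0, mu1, cov1) + kl_gaussian(mu1, cov1, mu0, cov0)
```

For two unit-covariance Gaussians the symmetric divergence reduces to the squared distance between their means, which is why preserving it while moving the means keeps clusters distinguishable.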
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
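The interpolation step amounts to a responsibility-weighted shift of each pixel's hue. A minimal sketch, where `resp` holds the posterior probabilities &lt;math&gt;p(i|x_j, \Theta)&lt;/math&gt; produced by the fitted GMM (how they are computed is omitted here):

```python
import numpy as np

def recolor_hue(x_h, resp, mapped_mu_h, mu_h):
    """T(x_j)_H = x_j^H + sum_i p(i|x_j) * (M_i(mu_i)_H - mu_i^H).

    `x_h`: hue of each pixel (shape N); `resp`: responsibilities (N x K);
    `mapped_mu_h` / `mu_h`: mapped and original mean hues of the K Gaussians.
    """
    return x_h + resp @ (np.asarray(mapped_mu_h) - np.asarray(mu_h))
```

A pixel assigned entirely to one component inherits that component's full hue shift; pixels with mixed responsibilities are shifted smoothly in between, which is what prevents visible seams between recolored regions.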
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions, so the recolored images can occasionally look unnatural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to use in real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and BicycleGAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This specific approach also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, the architecture is computationally intensive and does not inherently account for the specific needs of individuals with CVD, as it lacks mechanisms for personalized color adjustment.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations with a triple-latent structure, the method supports continuous personalization, adjusting images to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications: in our experiment, one epoch (one pass over the entire dataset) took around 18 days.&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD with the personalization of different severity levels. We start by exploring existing methods and identifying opportunities for improvement. Since mathematical-based approaches provide a solid foundation and are well-documented, we began our experiments by testing these methods, as described in the background. We later extended our exploration to deep learning based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a Baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
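A compact NumPy sketch of this simulation, using the matrices and coefficients quoted above. The linear blending by a `severity` factor is our assumption for the personalization discussed in this report, not part of Brettel et al.'s dichromat model:

```python
import numpy as np

# RGB -> LMS matrix quoted above (based on Stockman and Sharpe's fundamentals)
RGB2LMS = np.array([[0.3904725,  0.54990437, 0.00890159],
                    [0.07092586, 0.96310739, 0.00135809],
                    [0.02314268, 0.12801221, 0.93605194]])
LMS2RGB = np.linalg.inv(RGB2LMS)

# Coefficients quoted above: the missing cone's response is rebuilt from a
# weighted combination of the remaining two (Brettel et al. [7], via [5])
SIM_WEIGHTS = {
    "protan": np.array([0.0, 0.90822864, 0.008192]),
    "deutan": np.array([1.10104433, 0.0, -0.00901975]),
    "tritan": np.array([-0.15773032, 1.19465634, 0.0]),
}
MISSING = {"protan": 0, "deutan": 1, "tritan": 2}

def simulate_cvd(rgb, kind="protan", severity=1.0):
    """Simulate dichromatic perception of an H x W x 3 RGB image in [0, 1].

    `severity` linearly blends original and simulated LMS responses; this
    blend is our assumption, not part of the dichromat model itself.
    """
    lms = rgb @ RGB2LMS.T
    sim = lms.copy()
    sim[..., MISSING[kind]] = lms @ SIM_WEIGHTS[kind]
    blended = (1.0 - severity) * lms + severity * sim
    return np.clip(blended @ LMS2RGB.T, 0.0, 1.0)
```

With severity=1.0 this gives the full dichromatic simulation; severity=0.0 returns the original image, and intermediate values approximate anomalous trichromacy.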
&lt;br /&gt;
The error between the original and simulated LMS values is then mapped into the RGB color space, where a deficiency-specific correction matrix adjusts it to enhance contrast and recover lost color differences. The corrected error is added back to the original image, producing a recolored result that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Method 2: Optimizing Objective Function ====&lt;br /&gt;
To improve the results from the Daltonization method, we designed a framework inspired by methods discussed in the background, incorporating dominant color extraction, optimization-based recoloring, and edit propagation. This approach aims to find a balance between the naturalness and contrast while compensating colors that are not visible for corresponding CVD types.&lt;br /&gt;
&lt;br /&gt;
===== 1. Extraction of Dominant Colors =====&lt;br /&gt;
We begin by extracting the dominant colors from the input image via K-means clustering. This step identifies a reduced set of representative colors that capture the primary color information in the image:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{C} = \{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; represents the number of clusters, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i&amp;lt;/math&amp;gt; represents the centroid of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th cluster.&lt;br /&gt;
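A minimal K-means pass over the flattened pixel array is enough to illustrate the extraction step (libraries such as scikit-learn provide equivalent functionality; this self-contained sketch avoids the dependency):

```python
import numpy as np

def dominant_colors(pixels, n_clusters=8, n_iter=20, seed=0):
    """Plain K-means over an N x 3 array of pixel colors; returns the
    centroids c_1, ..., c_N used as the image's dominant colors."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), n_clusters, replace=False)].astype(float)
    for _ in range(n_iter):
        # assign every pixel to its nearest centroid ...
        dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # ... and move each centroid to the mean of its assigned pixels
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = pixels[labels == k].mean(axis=0)
    return centers
```

The returned centroids are then the only colors that need to be optimized; the per-pixel result is recovered afterwards by edit propagation.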
&lt;br /&gt;
===== 2. Optimization-Based Recoloring =====&lt;br /&gt;
Once the dominant colors are extracted, we apply an optimization process to adjust them. The optimization uses the formulas from [9] and aims to balance two key objectives:&lt;br /&gt;
&lt;br /&gt;
1. Naturalness Preservation: Ensures the recolored image minimally deviates from the original.&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{nat}} = \sum_{i=1}^N \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_i^{\text{original}}) \|^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is the transformation matrix based on the severity and type of CVD, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i^{\text{original}}&amp;lt;/math&amp;gt; is the original color.&lt;br /&gt;
&lt;br /&gt;
2. Contrast Enhancement: Improves the differentiation of colors for individuals with CVD:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{cont}} = \sum_{i=1}^N \sum_{j&amp;gt;i} \left( \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_j) \|^2 - \| \mathbf{c}_i^{\text{original}} - \mathbf{c}_j^{\text{original}} \|^2 \right)^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The total objective function combines these two terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + E_{\text{cont}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; controls the trade-off between naturalness and contrast.&lt;br /&gt;
&lt;br /&gt;
Optimization is performed using the L-BFGS-B algorithm to ensure efficient convergence under bounded constraints.&lt;br /&gt;
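Putting the two terms together, a hedged sketch of the objective and its bounded L-BFGS-B minimization (the palette size and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; are illustrative; the deuteranopia matrix is the one from the text):&lt;br /&gt;

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
C_orig = rng.random((6, 3))            # illustrative dominant-color palette
beta = 0.5                             # illustrative naturalness/contrast trade-off

# Deuteranopia transformation matrix from the text.
T = np.array([[0.625, 0.7, 0.0],
              [0.375, 0.3, 0.3],
              [0.0,   0.0, 0.7]])

def objective(c_flat):
    C = c_flat.reshape(-1, 3)
    # E_nat: the recolored palette should stay close to the original one.
    e_nat = np.sum(((C - C_orig) @ T.T) ** 2)
    # E_cont: simulated pairwise distances should match original distances.
    diff = C[:, None, :] - C[None, :, :]
    diff0 = C_orig[:, None, :] - C_orig[None, :, :]
    sim_d2 = np.sum((diff @ T.T) ** 2, axis=-1)
    orig_d2 = np.sum(diff0 ** 2, axis=-1)
    iu = np.triu_indices(len(C), k=1)          # pairs with j > i
    e_cont = np.sum((sim_d2[iu] - orig_d2[iu]) ** 2)
    return beta * e_nat + e_cont

res = minimize(objective, C_orig.ravel(), method="L-BFGS-B",
               bounds=[(0.0, 1.0)] * C_orig.size)
C_recolored = res.x.reshape(-1, 3)
```

The pairwise differences are computed via array broadcasting rather than nested loops, which is the same vectorization trick Method 3 later adopts.&lt;br /&gt;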
&lt;br /&gt;
The transformation matrices for each type of CVD are the following, which are based on [12]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Protanopia}} = \begin{bmatrix} 0.566 &amp;amp; 0.558 &amp;amp; 0 \\ 0.433 &amp;amp; 0.442 &amp;amp; 0.242 \\ 0 &amp;amp; 0 &amp;amp; 0.758 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Deuteranopia}} = \begin{bmatrix} 0.625 &amp;amp; 0.7 &amp;amp; 0 \\ 0.375 &amp;amp; 0.3 &amp;amp; 0.3 \\ 0 &amp;amp; 0 &amp;amp; 0.7 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Tritanopia}} = \begin{bmatrix} 0.95 &amp;amp; 0 &amp;amp; 0 \\ 0.05 &amp;amp; 0.433 &amp;amp; 0 \\ 0 &amp;amp; 0.567 &amp;amp; 1 \end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== 3. Edit Propagation =====&lt;br /&gt;
After optimizing the dominant colors, we propagate these edits across the entire image to ensure smooth transitions. This propagation step leverages the CIE-Lab color space, which is perceptually uniform, meaning that the Euclidean distance in this space correlates well with human color perception. The process begins by mapping the original image and the optimized dominant colors into the Lab color space. In this space, the differences between the original and recolored dominant colors are computed to capture the adjustments made during the optimization step:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Delta L^* = \text{griddata}(\mathbf{c}^{\text{original}}, \mathbf{c}^{\text{recolored}} - \mathbf{c}^{\text{original}}, \mathbf{I}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in the Lab color space. Once the interpolated changes are computed, they are applied to the Lab representation of the original image. Finally, the adjusted Lab values are converted back to the RGB color space to reconstruct the recolored image.&lt;br /&gt;
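The propagation step can be sketched with SciPy's &amp;lt;code&amp;gt;griddata&amp;lt;/code&amp;gt;; nearest-neighbor interpolation is used in this sketch to avoid undefined values outside the convex hull of the palette (the interpolation mode actually used in the project may differ):&lt;br /&gt;

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(2)
palette_orig = rng.random((10, 3))        # original dominant colors (flattened Lab)
palette_new = palette_orig + 0.05         # optimized palette: a uniform toy shift
pixels = rng.random((50, 3))              # image pixels in the same color space

# Interpolate each palette color's shift at every pixel's location in color
# space, then apply it.  'nearest' avoids NaNs outside the palette's hull.
delta = griddata(palette_orig, palette_new - palette_orig, pixels, method="nearest")
recolored_pixels = pixels + delta
```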
&lt;br /&gt;
==== Method 3: Improved with Confusion Line Adjustments ====&lt;br /&gt;
This method builds upon the previous method by introducing enhancements in dominant color extraction, optimization, and edit propagation, while incorporating an additional step to adjust colors near confusion lines in the CIE 1931 xyY color space inspired by [10]. These improvements aim to further enhance contrast and naturalness of the recolored images. Moreover, this method adds flexibility in adjusting for different severity levels for each CVD type.&lt;br /&gt;
&lt;br /&gt;
===== 1. Improvements on Method 2 =====&lt;br /&gt;
To improve the performance of dominant color extraction, we transitioned from traditional K-means to MiniBatch K-means. This algorithm processes data in small batches, significantly reducing computational time while maintaining clustering accuracy. The number of dominant colors was also reduced from 50 to 30 to focus on key representative colors and further enhance efficiency. The two key terms are now combined as a convex combination:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + (1 - \beta) E_{\text{cont}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
The optimization objective was refined to significantly improve computational efficiency by replacing the nested loops in the contrast enhancement term with vectorized operations. In the original implementation, the pairwise differences between colors were calculated iteratively using &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;O(N^2)&amp;lt;/math&amp;gt; nested loops. The improved version eliminates this overhead by leveraging array broadcasting to compute all pairwise differences simultaneously, and the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is then applied to all pairwise differences in a single tensor operation:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T}_{\Delta} = \text{tensordot}(\Delta_{ij}, \mathbf{T}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the norms are computed in parallel across the entire array. Additionally, the weighting parameter &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; was adjusted to favor naturalness preservation, ensuring better visual integrity in the recolored image. &lt;br /&gt;
The propagation step was changed to use a k-d tree for fast nearest-neighbor searches, replacing grid-based interpolation. This approach more efficiently matches each pixel in the Lab color space to its closest dominant color:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{I}_{\text{adjusted}} = \mathbf{C}_{\text{recolored}}[\text{k-d tree query}(\mathbf{I})],&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in Lab space.&lt;br /&gt;
These refinements enable faster optimization while improving the balance between naturalness and contrast enhancement.&lt;br /&gt;
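The k-d-tree propagation can be sketched with &amp;lt;code&amp;gt;scipy.spatial.cKDTree&amp;lt;/code&amp;gt;, following the formula's nearest-dominant-color lookup (array shapes are illustrative):&lt;br /&gt;

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(3)
palette_orig = rng.random((30, 3))     # dominant colors in Lab space
palette_new = palette_orig * 0.9       # optimized (recolored) palette
pixels = rng.random((100, 3))          # flattened image pixels in Lab space

# Each pixel is matched to its nearest dominant color, whose recolored
# value is looked up:  I_adjusted = C_recolored[kdtree_query(I)].
tree = cKDTree(palette_orig)
_, idx = tree.query(pixels)
pixels_adjusted = palette_new[idx]
```

Replacing each pixel outright quantizes the image toward the palette; applying only the per-cluster shift, &amp;lt;code&amp;gt;pixels + (palette_new - palette_orig)[idx]&amp;lt;/code&amp;gt;, is a softer variant of the same lookup.&lt;br /&gt;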
&lt;br /&gt;
===== 2. Confusion Line Adjustments =====&lt;br /&gt;
An additional step adjusts colors near confusion lines in the CIE 1931 xyY color space to enhance distinguishability:&lt;br /&gt;
&lt;br /&gt;
1. Confusion lines are defined for protanopia, deuteranopia, and tritanopia, based on [10]. For example, for protanopia:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{Confusion Line: Start} = (0.735, 0.265), \quad \text{End} = (0.115, 0.885).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. Colors near the confusion line are identified using orthogonal distance:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(\mathbf{xy}, L) = \frac{\| (\mathbf{xy} - \mathbf{p}_1) \times (\mathbf{p}_2 - \mathbf{p}_1) \|}{\|\mathbf{p}_2 - \mathbf{p}_1\|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_1&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_2&amp;lt;/math&amp;gt; are the start and end points of the confusion line.&lt;br /&gt;
&lt;br /&gt;
3. Identified colors are shifted orthogonally away from the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{xy}_{\text{adjusted}} = \mathbf{xy} + \lambda \mathbf{v}_{\perp},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{v}_{\perp}&amp;lt;/math&amp;gt; is a perpendicular vector, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a scaling factor.&lt;br /&gt;
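Steps 2 and 3 can be sketched as follows, using the protanopia confusion-line endpoints from the text; the shift magnitude &amp;lt;code&amp;gt;lam&amp;lt;/code&amp;gt; and the side-selection rule are illustrative assumptions:&lt;br /&gt;

```python
import numpy as np

# Protanopia confusion-line endpoints in CIE xy, from the text.
p1 = np.array([0.735, 0.265])
p2 = np.array([0.115, 0.885])

def cross2(a, b):
    """2-D cross product (scalar z-component)."""
    return a[0] * b[1] - a[1] * b[0]

def line_distance(xy):
    """Orthogonal distance from a chromaticity point to the confusion line."""
    d = p2 - p1
    return abs(cross2(xy - p1, d)) / np.linalg.norm(d)

def shift_away(xy, lam=0.02):
    """Shift a point orthogonally away from the line by lam (illustrative rule)."""
    d = (p2 - p1) / np.linalg.norm(p2 - p1)
    v_perp = np.array([-d[1], d[0]])          # unit normal to the line
    side = np.sign(cross2(p2 - p1, xy - p1))  # which side of the line we are on
    if side == 0:
        side = 1.0                            # points on the line: pick a side
    return xy + lam * side * v_perp

pt = np.array([0.40, 0.45])
d_before = line_distance(pt)
d_after = line_distance(shift_away(pt))
```

The sign of the 2-D cross product tells which side of the line the point lies on, so the shift always moves the point away from the line rather than across it.&lt;br /&gt;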
&lt;br /&gt;
===== 3. Personalize with Severity Levels =====&lt;br /&gt;
To account for severity levels, the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; linearly interpolates between normal vision and full CVD perception based on severity and type:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T} = (1 - s) \mathbf{I} + s \mathbf{T}_{\text{CVD}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;s&amp;lt;/math&amp;gt; represents the severity of CVD (0-100%), &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; is the identity matrix, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}_{\text{CVD}}&amp;lt;/math&amp;gt; is the full transformation matrix specific to protanopia, deuteranopia, or tritanopia. This interpolation follows the DaltonLens simulator [13].&lt;br /&gt;
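The severity interpolation is a one-liner; a sketch with the protanopia matrix from the text, expressing severity as a fraction in [0, 1]:&lt;br /&gt;

```python
import numpy as np

# Full protanopia transformation matrix from the text.
T_PROTAN = np.array([[0.566, 0.558, 0.0],
                     [0.433, 0.442, 0.242],
                     [0.0,   0.0,   0.758]])

def severity_matrix(T_cvd, s):
    """Interpolate between normal vision (identity, s=0) and full CVD (s=1)."""
    return (1.0 - s) * np.eye(3) + s * T_cvd

T_half = severity_matrix(T_PROTAN, 0.5)   # 50% severity
```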
&lt;br /&gt;
These improvements significantly enhanced both the effectiveness and efficiency of the recoloring process on top of Method 2.&lt;br /&gt;
&lt;br /&gt;
==== Method 4: Improved with GMM-based Method ====&lt;br /&gt;
The last mathematical method we experimented with enhances recoloring by integrating a Gaussian Mixture Model (GMM)-based global recoloring algorithm. It also applies nonlinear adjustments to colors near confusion lines to improve contrast and naturalness.&lt;br /&gt;
&lt;br /&gt;
===== 1. GMM-Based Global Recoloring =====&lt;br /&gt;
The image is first resized and transformed into the Lab color space. A GMM is applied to cluster the color distribution into &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; components, optimizing the number of clusters using the Bayesian Information Criterion (BIC):&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{BIC} = -2 \cdot \text{log-likelihood} + P \cdot \log(N),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;P&amp;lt;/math&amp;gt; represents the model parameters and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; is the number of pixels.&lt;br /&gt;
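Model selection by BIC can be sketched with scikit-learn's &amp;lt;code&amp;gt;GaussianMixture&amp;lt;/code&amp;gt;, whose &amp;lt;code&amp;gt;bic()&amp;lt;/code&amp;gt; method implements the criterion above (synthetic two-cluster data stands in for Lab pixels):&lt;br /&gt;

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
# Synthetic "Lab pixels": two well-separated color clusters.
pixels = np.vstack([rng.normal(0.0, 0.05, (200, 3)),
                    rng.normal(1.0, 0.05, (200, 3))])

# Fit GMMs with increasing component counts and keep the lowest BIC.
bics = {}
for k in range(1, 5):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(pixels)
    bics[k] = gmm.bic(pixels)
best_k = min(bics, key=bics.get)
```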
&lt;br /&gt;
The GMM means are simulated using the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T&amp;lt;/math&amp;gt; with severity levels taken into account, and the symmetric Kullback-Leibler (KL) divergence (&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{\text{sKL}}&amp;lt;/math&amp;gt;) is calculated between pairs of clusters:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{\text{sKL}}(i, j) = D_{\text{KL}}(G_i \| G_j) + D_{\text{KL}}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt; are Gaussian components, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{\text{KL}}&amp;lt;/math&amp;gt; represents the KL divergence. The GMM cluster means are then adjusted by solving a nonlinear least-squares problem that minimizes the discrepancy between the pairwise divergences of the original and CVD-simulated cluster distributions.&lt;br /&gt;
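The symmetric KL divergence between two Gaussian components has a closed form; a sketch:&lt;br /&gt;

```python
import numpy as np

def kl_gaussian(mu0, cov0, mu1, cov1):
    """Closed-form KL(G0 || G1) between two multivariate Gaussians."""
    k = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - k
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def symmetric_kl(mu0, cov0, mu1, cov1):
    """D_sKL(i, j) = KL(G_i || G_j) + KL(G_j || G_i)."""
    return (kl_gaussian(mu0, cov0, mu1, cov1)
            + kl_gaussian(mu1, cov1, mu0, cov0))

mu_a, cov_a = np.zeros(3), np.eye(3)
mu_b, cov_b = np.ones(3), 2.0 * np.eye(3)
d_ab = symmetric_kl(mu_a, cov_a, mu_b, cov_b)
```

Symmetrizing cancels the log-determinant terms, so the distance is zero exactly when the two components coincide.&lt;br /&gt;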
&lt;br /&gt;
===== 2. Adjusting Near Confusion Lines Improved =====&lt;br /&gt;
Following global recoloring, colors near confusion lines in the CIE 1931 xyY color space are further adjusted based on formulas used in Method 3. Nonlinear scaling is applied to amplify the shifts for pixels closer to the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
w = \left( \frac{\text{threshold} - d}{\text{threshold}} \right)^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w&amp;lt;/math&amp;gt; is the scaling factor, applied only to pixels whose distance &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d&amp;lt;/math&amp;gt; falls below the threshold. &lt;br /&gt;
&lt;br /&gt;
The adjustments from the GMM and confusion line steps are combined to produce the final recolored image. These enhancements make the method more robust and effective for individuals with varying levels of CVD.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in further sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and integrate the user label directly, they require a pre-existing ground-truth example of the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information during training. They are generally better at generating natural-looking images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from the open-source image dataset of [2]: to improve a model&#039;s ability to enhance contrast between CVD-indistinguishable color pairs, the authors of that study created a dataset of 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 images and recolored them under randomly sampled CVD type and severity labels. The ground-truth recoloring was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org SciPy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
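The label encoding can be sketched directly from the definition above:&lt;br /&gt;

```python
import numpy as np

def encode_label(cvd_type, severity):
    """Encode (type, severity) as severity * [protan, deutan], per the text."""
    base = {"protan": np.array([1.0, 0.0]),
            "deutan": np.array([0.0, 1.0])}
    return severity * base[cvd_type]

label = encode_label("protan", 0.6)   # protanopia at 60% severity
```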
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to three parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. The outputs of the three networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The model was trained with a pixel-wise mean-squared error (MSE) loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: recolored (model output) image and ground-truth recolored image, respectively&lt;br /&gt;
* p: pixel index&lt;br /&gt;
* N: total number of pixels&lt;br /&gt;
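The loss is one line in PyTorch; a sketch with dummy tensors:&lt;br /&gt;

```python
import torch

def pixelwise_mse(output, target):
    """Mean-squared error averaged over all pixels (and channels)."""
    return ((output - target) ** 2).mean()

out = torch.zeros(2, 3, 8, 8)   # dummy model output, (batch, C, H, W)
gt = torch.ones(2, 3, 8, 8)     # dummy ground-truth recolored image
loss = pixelwise_mse(out, gt)
```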
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
In a similar fashion of inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The U-Net was trained with the commonly used VGG perceptual loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: recolored (model output) and ground-truth recolored images, respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: the activation of the &amp;lt;math&amp;gt;l&amp;lt;/math&amp;gt;-th layer of the pre-trained VGG network, with &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; the number of elements in that layer&lt;br /&gt;
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the loss functions. The loss functions we used to train this network were inspired from [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math&amp;gt;x, y&amp;lt;/math&amp;gt;: two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: size of the global (or large) window of the image&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: size of the local window or neighborhood around pixel &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
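A sketch of how CL and the two contrast terms can be estimated. The pair-sampling scheme and neighborhood radius are illustrative assumptions, and the signs follow the definitions above (sign conventions during training may differ, e.g. minimizing the negative of these quantities):&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(5)
orig_sim = rng.random((8, 8, 3))    # CVD simulation of the original image
recol_sim = rng.random((8, 8, 3))   # CVD simulation of the recolored output

def cl(orig, recol, p, q):
    """CL(x, y): change in simulated contrast between pixels p and q."""
    return (np.linalg.norm(recol[p] - recol[q])
            - np.linalg.norm(orig[p] - orig[q]))

def contrast_terms(orig, recol, n_pairs=200, radius=1, seed=0):
    """Monte-Carlo estimate of the global term; exhaustive local term."""
    h, w, _ = orig.shape
    r = np.random.default_rng(seed)
    # Global: random pixel pairs drawn from the whole image (the window omega).
    pairs = r.integers(0, [h, w], size=(n_pairs, 2, 2))
    l_global = np.mean([cl(orig, recol, tuple(a), tuple(b)) for a, b in pairs])
    # Local: every pixel against its neighborhood omega_x.
    vals = []
    for i in range(h):
        for j in range(w):
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    if (di or dj) and ni in range(h) and nj in range(w):
                        vals.append(cl(orig, recol, (i, j), (ni, nj)))
    l_local = np.mean(vals)
    return l_global, l_local

l_global, l_local = contrast_terms(orig_sim, recol_sim)
```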
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives output image to have colors that are visually similar and close to natural distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: original and recolored images, respectively&lt;br /&gt;
&lt;br /&gt;
== Results == &lt;br /&gt;
&lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Method 1: Daltonization Baseline&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method1.png|400px|thumb|right|Figure 10 Method 1 Results]]&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Method 2: Optimizing Objective Functions&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method2.png|400px|thumb|right|Figure 11 Method 2 Results]]&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Method 3: Adjustments Near Confusion Lines with Improved Method 2&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method3.png|400px|thumb|right|Figure 12 Method 3 Results]]&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Method 4: Improved with GMM-based Method&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method4.png|400px|thumb|right|Figure 13 Method 4 Results]]&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
! Original vs Recolored !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
| SSIM || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| TCC || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| CD ΔE76 || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| CIEDE2000 || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| CIEDE94 || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| D-CIELAB ΔEab || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: the Conditional Parallel RGB MLP, Conditional U-Net, and Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), Total Color Contrast (TCC), Chromatic Difference (CD), and inference time, as provided in [1] and [2], were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated for alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored image shows relative to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to, or worse than, those of the originals, indicating that the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images (possibly because the naturalness term dominated in practice, even though the weight coefficients in the loss favored contrast: alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to score and evaluate metrics for comparison with related work only using the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]. Please refer to the definitions in the paper, as we have used the same. On a high level, the three components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here lambda is a constant weight, not a wavelength; l, a, b are the CIELAB coordinates of the recolored (&#039;) and original images, respectively.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
&lt;br /&gt;
The key results are in Table 2 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:40%; margin:auto;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 (vs. ~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 (vs. ~0.963)*&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2] for protan/deutan whichever is larger.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good, but not as high as in paper [2], because that work optimizes a separate network for each CVD type.&lt;br /&gt;
* Outputs remain somewhat blurry (the structure/SSIM term is not optimized strongly enough).&lt;br /&gt;
* Handling multiple CVD types in a single network requires a more sophisticated conditioning scheme.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate (a few seconds per image), making them feasible for interactive use though not yet for strict real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from the Swin architecture added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. JOSA A, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
[12] Color-Blindness.com. (n.d.). COBLIS - Color Blindness Simulator. Retrieved December 13, 2024, from https://www.color-blindness.com/coblis-color-blindness-simulator/&lt;br /&gt;
&lt;br /&gt;
[13] DaltonLens. (n.d.). DaltonLens-Python [Computer software]. GitHub. Retrieved December 13, 2024, from https://github.com/DaltonLens/DaltonLens-Python&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400px|Wikipedia encyclopedia]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_2.png|400px]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|350px|thumb|Conditional Autoencoder]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_unet.png|350px|thumb|Conditional U-Net]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_mlp.png|350px|thumb|Conditional MLP]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both; text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
Losses: Conditional Autoencoder, Conditional U-Net, and Conditional MLP&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Mathematical method results with color plates&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:Method1-color-plates.png|400px|thumb|Method 1 Color Plates Results]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:Method2-color-plates.png|400px|thumb|Method 2 Color Plates Results]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;gallery mode=&amp;quot;nolines&amp;quot; widths=&amp;quot;400px&amp;quot; heights=&amp;quot;300px&amp;quot; caption=&amp;quot;Method 3 Color Plates Results for Protanopia, Deuteranopia, and Tritanopia with Severity Levels&amp;quot;&amp;gt;&lt;br /&gt;
File:Method3-protan.png|Protanopia&lt;br /&gt;
File:Method3-deutan.png|Deuteranopia&lt;br /&gt;
File:Method3-tritan.png|Tritanopia&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;gallery mode=&amp;quot;nolines&amp;quot; widths=&amp;quot;400px&amp;quot; heights=&amp;quot;300px&amp;quot; caption=&amp;quot;Method 4 Color Plates Results for Protanopia, Deuteranopia, and Tritanopia with Severity Levels&amp;quot;&amp;gt;&lt;br /&gt;
File:Method4-protan.png|Protanopia&lt;br /&gt;
File:Method4-deutan.png|Deuteranopia&lt;br /&gt;
File:Method4-tritan.png|Tritanopia&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;br /&gt;
* Researching, writing and running scripts for four (and more) mathematical methods: Daltonization, optimization-based, confusion-lines-based, and GMM-based, plus further experiments such as a segmentation-based method that was discarded due to slow performance&lt;br /&gt;
* Results generation and validation for all scripts written&lt;br /&gt;
* Evaluation metrics scripts for mathematical methods&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60784</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60784"/>
		<updated>2024-12-13T11:44:02Z</updated>

		<summary type="html">&lt;p&gt;Rainas: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix: the error contains the information that dichromats cannot see, and the correction matrix rotates it into a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected RGB values are added back to the original LMS values to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
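As a concrete illustration, this pipeline can be sketched in a few lines. For brevity the sketch applies the error correction directly in RGB rather than converting through LMS, and the toy simulation function is purely illustrative (a real simulation would come from a tool like Daltonize [5] or DaltonLens-Python [13]).&lt;br /&gt;

```python
import numpy as np

# Correction matrix from the text (as used in Daltonize/Vischeck-style tools)
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize(rgb, simulate):
    """Shift information lost to a dichromat into channels they can see.

    rgb      -- float image, shape (H, W, 3), values in [0, 1]
    simulate -- function mapping an image to its simulated CVD appearance
    """
    error = rgb - simulate(rgb)        # the part the dichromat cannot see
    correction = error @ CORRECTION.T  # rotate the error into visible channels
    return np.clip(rgb + correction, 0.0, 1.0)

# Toy stand-in simulation (illustrative only): a deuteranope-like collapse
# that replaces the green channel with the red channel.
def toy_simulate(rgb):
    out = rgb.copy()
    out[..., 1] = rgb[..., 0]
    return out

img = np.random.default_rng(1).random((8, 8, 3))
out = daltonize(img, toy_simulate)
print(out.shape)  # (8, 8, 3)
```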
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
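These two terms translate directly into code. The sketch below is a literal transcription of the definitions above, except that the full pairwise contrast sum is O(N^2) in the number of pixels, so we restrict it to a random sample of pixel pairs (an implementation choice of ours, not part of [8]).&lt;br /&gt;

```python
import numpy as np

def naturalness_loss(c_plus, c):
    """E_nat: sum of squared distances between recolored and original pixels."""
    return np.sum((c_plus - c) ** 2)

def contrast_loss(c_plus, c, n_pairs=1000, seed=0):
    """E_cont over a random sample of pixel pairs (the full sum is O(N^2))."""
    rng = np.random.default_rng(seed)
    N = c.shape[0]
    i = rng.integers(0, N, n_pairs)
    j = rng.integers(0, N, n_pairs)
    d_plus = c_plus[i] - c_plus[j]  # recolored pairwise differences
    d_orig = c[i] - c[j]            # original pairwise differences
    return np.sum((d_plus - d_orig) ** 2)

def total_loss(c_plus, c, beta=0.5):
    """E = beta * E_nat + E_cont (beta trades naturalness vs. contrast)."""
    return beta * naturalness_loss(c_plus, c) + contrast_loss(c_plus, c)

pixels = np.random.default_rng(2).random((500, 3))  # flattened (N, 3) image
print(total_loss(pixels, pixels))  # identical images -> 0.0
```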
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
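This is the standard perpendicular point-to-line distance, and a minimal sketch follows. The protanope copunctal point used below is the commonly quoted value and is our assumption, not a figure taken from [10].&lt;br /&gt;

```python
import numpy as np

def dist_to_confusion_line(v, cp, p0):
    """Perpendicular distance from chromaticity v to the line through cp and p0."""
    (xv, yv), (xcp, ycp), (x0, y0) = v, cp, p0
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    den = np.hypot(xcp - x0, ycp - y0)
    return num / den

# Commonly quoted protanope copunctal point (assumption, not from [10])
CP_PROTAN = (0.7465, 0.2535)

# The midpoint of the two defining points lies on the line, so its distance is ~0.
p0 = (0.3, 0.3)
on_line = (0.5 * (CP_PROTAN[0] + p0[0]), 0.5 * (CP_PROTAN[1] + p0[1]))
print(dist_to_confusion_line(on_line, CP_PROTAN, p0))  # ~0.0
```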
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest unoccupied confusion lines to enhance discriminability for CVD viewers, with high-ranking colors (those most prominent in the image clusters) moved first. This reallocation ensures that these colors are distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
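The three terms translate directly into code. The sketch below is our own transcription, with the dichromat simulation f_D passed in as a placeholder function; E2 is computed over pairs of key colors within cluster A, matching the variable definitions above, though the exact pairing used in [10] may differ.&lt;br /&gt;

```python
import numpy as np

def e1(A, A_rec, B, f_D):
    """Contrast preservation between the trichromat and dichromat views (E1)."""
    acc = sum(abs(np.linalg.norm(a - b) - np.linalg.norm(f_D(ar) - f_D(b)))
              for a, ar in zip(A, A_rec) for b in B)
    return acc / (len(A) * len(B))

def e2(A, A_rec, f_D):
    """Contrast enhancement for dichromats, over pairs of key colors in A (E2)."""
    n = len(A)
    acc = sum(abs(np.linalg.norm(A[i] - A[j])
                  - np.linalg.norm(f_D(A_rec[i]) - f_D(A_rec[j])))
              for i in range(n) for j in range(n))
    return acc / (n * n)

def e3(A, A_rec):
    """Naturalness: mean shift of the recolored key colors (E3)."""
    return np.mean([np.linalg.norm(a - ar) for a, ar in zip(A, A_rec)])

def total(A, A_rec, B, f_D, lam=0.5):
    """E = (E1 + E2) + lambda * E3."""
    return e1(A, A_rec, B, f_D) + e2(A, A_rec, f_D) + lam * e3(A, A_rec)

A = np.array([[0.3, 0.3], [0.4, 0.35]])  # key colors of cluster A (chromaticities)
B = np.array([[0.5, 0.4]])               # key colors of cluster B
identity = lambda c: c                   # placeholder simulation function
print(total(A, A, B, identity))  # unchanged colors, identity simulation -> 0.0
```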
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIE L*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
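For Gaussians, both KL terms have a closed form, so D_sKL can be computed directly. A minimal sketch follows (the diagonal covariances in the example match the simplification the method makes):&lt;br /&gt;

```python
import numpy as np

def kl_gauss(mu1, cov1, mu2, cov2):
    """KL(G1 || G2) for multivariate Gaussians, via the closed-form expression."""
    d = mu1.shape[0]
    inv2 = np.linalg.inv(cov2)
    diff = mu2 - mu1
    return 0.5 * (np.trace(inv2 @ cov1) + diff @ inv2 @ diff - d
                  + np.log(np.linalg.det(cov2) / np.linalg.det(cov1)))

def sym_kl(mu1, cov1, mu2, cov2):
    """Symmetric KL divergence D_sKL(G1, G2) between two Gaussian components."""
    return kl_gauss(mu1, cov1, mu2, cov2) + kl_gauss(mu2, cov2, mu1, cov1)

# Two components with the same diagonal covariance, differing only in mean
mu_a = np.array([50.0, 10.0, 10.0])
mu_b = np.array([50.0, -10.0, 10.0])
cov = np.diag([4.0, 4.0, 4.0])
print(sym_kl(mu_a, cov, mu_a, cov))  # identical Gaussians -> 0.0
print(sym_kl(mu_a, cov, mu_b, cov))  # -> 100.0
```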
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions and can leave some recolored images looking unnatural.&lt;br /&gt;
* The optimization step is computationally expensive, which makes real-time application difficult.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This specific approach also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations using a unique triple-latent structure in their method, continuous personalization was possible to adjust images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, it took around 18 days for one epoch (or one iteration over the entire dataset).&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD with the personalization of different severity levels. We start by exploring existing methods and identifying opportunities for improvement. Since mathematical-based approaches provide a solid foundation and are well-documented, we began our experiments by testing these methods, as described in the background. We later extended our exploration to deep learning based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a Baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
&lt;br /&gt;
The error between the original and simulated images is then mapped into the RGB color space using a deficiency-specific correction matrix, which adjusts the image to enhance contrast and recover lost color differences. The predefined correction matrix is applied to the error in RGB space, and the corrected values are added back to the original image, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
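The Method 1 pipeline above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the RGB-to-LMS, protanopia simulation, and correction matrices listed above; the function name and the final clipping to [0, 1] are our own additions, not the exact project code:&lt;br /&gt;

```python
import numpy as np

# RGB -> LMS transform (Stockman & Sharpe fit quoted in the text)
RGB2LMS = np.array([[0.3904725, 0.54990437, 0.00890159],
                    [0.07092586, 0.96310739, 0.00135809],
                    [0.02314268, 0.12801221, 0.93605194]])
LMS2RGB = np.linalg.inv(RGB2LMS)

# Protanopia: replace the L response with a combination of M and S
SIM_PROTAN = np.array([[0.0, 0.90822864, 0.008192],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])

# Daltonize-inspired error-redistribution matrix (applied in RGB space)
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize_protan(rgb):
    """rgb: float array (..., 3) in [0, 1]; returns the recolored image."""
    lms = rgb @ RGB2LMS.T
    lms_sim = lms @ SIM_PROTAN.T          # what a protanope perceives
    rgb_sim = lms_sim @ LMS2RGB.T
    error = rgb - rgb_sim                 # information lost to the viewer
    rgb_out = rgb + error @ CORRECTION.T  # redistribute into visible channels
    return np.clip(rgb_out, 0.0, 1.0)
```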
&lt;br /&gt;
==== Method 2: Optimizing Objective Function ====&lt;br /&gt;
To improve the results from the Daltonization method, we designed a framework inspired by methods discussed in the background, incorporating dominant color extraction, optimization-based recoloring, and edit propagation. This approach aims to find a balance between the naturalness and contrast while compensating colors that are not visible for corresponding CVD types.&lt;br /&gt;
&lt;br /&gt;
===== 1. Extraction of Dominant Colors =====&lt;br /&gt;
We begin by extracting the dominant colors from the input image using a K-means clustering algorithm. This step identifies a reduced set of representative colors that capture the primary color information in the image:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{C} = \{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; represents the number of clusters, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i&amp;lt;/math&amp;gt; represents the centroid of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th cluster.&lt;br /&gt;
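A minimal sketch of this extraction step, assuming scikit-learn&#039;s (hard) K-means as a stand-in for the clustering described above; the function name and parameter defaults are illustrative:&lt;br /&gt;

```python
import numpy as np
from sklearn.cluster import KMeans

def dominant_colors(image, n_colors=30, seed=0):
    """image: float array (H, W, 3); returns (n_colors, 3) centroids and labels."""
    pixels = image.reshape(-1, 3)          # flatten to one row per pixel
    km = KMeans(n_clusters=n_colors, n_init=4, random_state=seed).fit(pixels)
    return km.cluster_centers_, km.labels_
```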
&lt;br /&gt;
===== 2. Optimization-Based Recoloring =====&lt;br /&gt;
Once the dominant colors are extracted, we apply an optimization process to adjust these colors. The optimization uses the formulas mentioned in [9], and aims to balance two key objectives:&lt;br /&gt;
&lt;br /&gt;
1. Naturalness Preservation: Ensures the recolored image minimally deviates from the original.&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{nat}} = \sum_{i=1}^N \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_i^{\text{original}}) \|^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is the transformation matrix based on the severity and type of CVD, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i^{\text{original}}&amp;lt;/math&amp;gt; is the original color.&lt;br /&gt;
&lt;br /&gt;
2. Contrast Enhancement: Improves the differentiation of colors for individuals with CVD:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{cont}} = \sum_{i=1}^N \sum_{j&amp;gt;i} \left( \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_j) \|^2 - \| \mathbf{c}_i^{\text{original}} - \mathbf{c}_j^{\text{original}} \|^2 \right)^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The total objective function combines these two terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + E_{\text{cont}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; controls the trade-off between naturalness and contrast.&lt;br /&gt;
&lt;br /&gt;
Optimization is performed using the L-BFGS-B algorithm to ensure efficient convergence under bounded constraints.&lt;br /&gt;
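The optimization step can be sketched as follows, assuming SciPy&#039;s L-BFGS-B implementation with colors bounded to [0, 1]; the energy follows the naturalness and contrast definitions above, with the pairwise contrast term vectorized for brevity (the function name is ours):&lt;br /&gt;

```python
import numpy as np
from scipy.optimize import minimize

def recolor_palette(C_orig, T, beta=0.5):
    """Optimize dominant colors C_orig (N, 3) under simulation matrix T (3, 3)."""
    def energy(flat):
        C = flat.reshape(C_orig.shape)
        # Naturalness: simulated deviation from the original palette
        e_nat = np.sum(((C - C_orig) @ T.T) ** 2)
        # Contrast: simulated pairwise distances should match the original ones
        d_sim = np.linalg.norm((C[:, None] - C[None, :]) @ T.T, axis=-1) ** 2
        d_org = np.linalg.norm(C_orig[:, None] - C_orig[None, :], axis=-1) ** 2
        e_cont = np.sum(np.triu((d_sim - d_org) ** 2, k=1))  # pairs j > i only
        return beta * e_nat + e_cont

    res = minimize(energy, C_orig.ravel(), method="L-BFGS-B",
                   bounds=[(0.0, 1.0)] * C_orig.size)
    return res.x.reshape(C_orig.shape)
```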
&lt;br /&gt;
The transformation matrices for each type of CVD are the following, which are based on [12]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Protanopia}} = \begin{bmatrix} 0.566 &amp;amp; 0.558 &amp;amp; 0 \\ 0.433 &amp;amp; 0.442 &amp;amp; 0.242 \\ 0 &amp;amp; 0 &amp;amp; 0.758 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Deuteranopia}} = \begin{bmatrix} 0.625 &amp;amp; 0.7 &amp;amp; 0 \\ 0.375 &amp;amp; 0.3 &amp;amp; 0.3 \\ 0 &amp;amp; 0 &amp;amp; 0.7 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Tritanopia}} = \begin{bmatrix} 0.95 &amp;amp; 0 &amp;amp; 0 \\ 0.05 &amp;amp; 0.433 &amp;amp; 0 \\ 0 &amp;amp; 0.567 &amp;amp; 1 \end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== 3. Edit Propagation =====&lt;br /&gt;
After optimizing the dominant colors, we propagate these edits across the entire image to ensure smooth transitions. This propagation step leverages the CIE-Lab color space, which is perceptually uniform, meaning that the Euclidean distance in this space correlates well with human color perception. The process begins by mapping the original image and the optimized dominant colors into the Lab color space. In this space, the differences between the original and recolored dominant colors are computed to capture the adjustments made during the optimization step:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Delta L^* = \text{griddata}(\mathbf{c}^{\text{original}}, \mathbf{c}^{\text{recolored}} - \mathbf{c}^{\text{original}}, \mathbf{I}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in the Lab color space. Once the interpolated changes are computed, they are applied to the Lab representation of the original image. Finally, the adjusted Lab values are converted back to the RGB color space to reconstruct the recolored image.&lt;br /&gt;
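A sketch of this propagation step, assuming SciPy&#039;s griddata; the text does not specify the interpolation mode, so nearest-neighbour is an assumption chosen to avoid undefined values outside the palette&#039;s convex hull:&lt;br /&gt;

```python
import numpy as np
from scipy.interpolate import griddata

def propagate_edits(lab_image, palette_orig, palette_new):
    """Interpolate per-palette-color shifts over every pixel in Lab space."""
    h, w, _ = lab_image.shape
    pixels = lab_image.reshape(-1, 3)
    delta = palette_new - palette_orig     # adjustment made to each dominant color
    # Interpolate each Lab channel's shift at every pixel's Lab coordinate
    shift = np.stack([griddata(palette_orig, delta[:, k], pixels, method="nearest")
                      for k in range(3)], axis=1)
    return (pixels + shift).reshape(h, w, 3)
```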
&lt;br /&gt;
==== Method 3: Improved with Confusion Line Adjustments ====&lt;br /&gt;
This method builds upon the previous method by introducing enhancements in dominant color extraction, optimization, and edit propagation, while incorporating an additional step to adjust colors near confusion lines in the CIE 1931 xyY color space inspired by [10]. These improvements aim to further enhance contrast and naturalness of the recolored images. Moreover, this method adds flexibility in adjusting for different severity levels for each CVD type.&lt;br /&gt;
&lt;br /&gt;
===== 1. Improvements on Method 2 =====&lt;br /&gt;
To improve the performance of dominant color extraction, we transitioned from traditional K-means to MiniBatch K-means. This algorithm processes data in small batches, significantly reducing computational time while maintaining accuracy in clustering. The number of dominant colors was also reduced from 50 to 30 to focus on key representative colors and further enhance efficiency. The optimization objective is refined to leverage vectorization, improving computational efficiency. The two key terms remain:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + (1 - \beta) E_{\text{cont}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
The optimization objective was refined to significantly improve computational efficiency by replacing the nested loops in the contrast enhancement term with vectorized operations. In the original implementation, the pairwise differences between colors were calculated iteratively using &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;O(N^2)&amp;lt;/math&amp;gt; nested loops. The improved version eliminates this overhead by leveraging array broadcasting to compute all pairwise differences simultaneously, and the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is then applied to all pairwise differences in a single tensor operation:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T}_{\Delta} = \text{tensordot}(\Delta_{ij}, \mathbf{T}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the norms are computed in parallel across the entire array. Additionally, the weighting parameter &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; was adjusted to favor naturalness preservation, ensuring better visual integrity in the recolored image. &lt;br /&gt;
The propagation step changed to use a k-d tree for fast nearest neighbor searches, replacing grid-based interpolation. This approach more efficiently matches each pixel in the Lab color space to the closest dominant color:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{I}_{\text{adjusted}} = \mathbf{C}_{\text{recolored}}[\text{k-d tree query}(\mathbf{I})],&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in Lab space.&lt;br /&gt;
These refinements enable faster optimization while improving the balance between naturalness and contrast enhancement.&lt;br /&gt;
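The k-d tree lookup above can be sketched with SciPy&#039;s cKDTree; the helper name is ours, and it follows the formula literally (each pixel is replaced by the recolored version of its nearest dominant color):&lt;br /&gt;

```python
import numpy as np
from scipy.spatial import cKDTree

def propagate_kdtree(lab_image, palette_orig, palette_new):
    """Map each Lab pixel to the recolored version of its nearest dominant color."""
    h, w, _ = lab_image.shape
    pixels = lab_image.reshape(-1, 3)
    idx = cKDTree(palette_orig).query(pixels)[1]  # index of nearest dominant color
    return palette_new[idx].reshape(h, w, 3)
```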
&lt;br /&gt;
===== 2. Confusion Line Adjustments =====&lt;br /&gt;
An additional step adjusts colors near confusion lines in the CIE 1931 xyY color space to enhance distinguishability:&lt;br /&gt;
&lt;br /&gt;
1. Confusion lines are defined for protanopia, deuteranopia, and tritanopia, based on [10]. For example, for protanopia:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{Confusion Line: Start} = (0.735, 0.265), \quad \text{End} = (0.115, 0.885).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. Colors near the confusion line are identified using orthogonal distance:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(\mathbf{xy}, L) = \frac{\| (\mathbf{xy} - \mathbf{p}_1) \times (\mathbf{p}_2 - \mathbf{p}_1) \|}{\|\mathbf{p}_2 - \mathbf{p}_1\|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_1&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_2&amp;lt;/math&amp;gt; are the start and end points of the confusion line.&lt;br /&gt;
&lt;br /&gt;
3. Identified colors are shifted orthogonally away from the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{xy}_{\text{adjusted}} = \mathbf{xy} + \lambda \mathbf{v}_{\perp},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{v}_{\perp}&amp;lt;/math&amp;gt; is a perpendicular vector, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a scaling factor.&lt;br /&gt;
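Steps 1-3 can be sketched as follows, using the protanopia line endpoints quoted above; the threshold and lambda defaults are illustrative assumptions:&lt;br /&gt;

```python
import numpy as np

P1 = np.array([0.735, 0.265])   # protan confusion-line start (from the text)
P2 = np.array([0.115, 0.885])   # protan confusion-line end

def shift_from_confusion_line(xy, lam=0.05, threshold=0.1):
    """xy: (N, 2) chromaticities; push points near the line away from it."""
    d = (P2 - P1) / np.linalg.norm(P2 - P1)   # unit direction along the line
    perp = np.array([-d[1], d[0]])            # unit perpendicular to the line
    v = xy - P1
    dist = np.abs(v @ perp)                   # orthogonal distance to the line
    side = np.sign(v @ perp)[:, None]         # which side the point lies on
    near = (dist < threshold)[:, None]        # only adjust points near the line
    return xy + near * lam * side * perp
```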
&lt;br /&gt;
===== 3. Personalize with Severity Levels =====&lt;br /&gt;
To take severity levels into account, the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; linearly interpolates between normal vision and full CVD perception based on severity and type:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T} = (1 - s) \mathbf{I} + s \mathbf{T}_{\text{CVD}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;s&amp;lt;/math&amp;gt; represents the severity of CVD (0-100%), &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; is the identity matrix, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}_{\text{CVD}}&amp;lt;/math&amp;gt; is the full transformation matrix specific to protanopia, deuteranopia, or tritanopia. This interpolation is based on the DaltonLens simulator [13].&lt;br /&gt;
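The interpolation itself is a one-liner; a sketch (the function name is ours):&lt;br /&gt;

```python
import numpy as np

def severity_matrix(T_cvd, s):
    """Linearly interpolate between identity (normal vision, s=0)
    and the full-CVD transformation matrix (s=1)."""
    return (1.0 - s) * np.eye(3) + s * np.asarray(T_cvd)
```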
&lt;br /&gt;
These improvements significantly enhanced both the effectiveness and efficiency of the recoloring process on top of Method 2.&lt;br /&gt;
&lt;br /&gt;
==== Method 4: Improved with GMM-based Method ====&lt;br /&gt;
The last mathematical method we experimented with enhances recoloring by integrating a Gaussian Mixture Model (GMM)-based global recoloring algorithm. The method also applies nonlinear adjustments for colors near confusion lines to ensure improved contrast and naturalness.&lt;br /&gt;
&lt;br /&gt;
===== 1. GMM-Based Global Recoloring =====&lt;br /&gt;
The image is first resized and transformed into the Lab color space. A GMM is applied to cluster the color distribution into &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; components, optimizing the number of clusters using the Bayesian Information Criterion (BIC):&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{BIC} = -2 \cdot \text{log-likelihood} + P \cdot \log(N),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;P&amp;lt;/math&amp;gt; represents the model parameters and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; is the number of pixels.&lt;br /&gt;
&lt;br /&gt;
The GMM means are simulated using the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T&amp;lt;/math&amp;gt; with severity levels taken into account, and the symmetric Kullback-Leibler (KL) divergence (&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{\text{sKL}}&amp;lt;/math&amp;gt;) is calculated between pairs of clusters:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{\text{sKL}}(i, j) = D_{\text{KL}}(G_i \| G_j) + D_{\text{KL}}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt; are Gaussian components, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{\text{KL}}&amp;lt;/math&amp;gt; represents the KL divergence. The GMM cluster means are then adjusted by solving a nonlinear least-squares problem to minimize the discrepancy.&lt;br /&gt;
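A sketch of the clustering and divergence computations, assuming scikit-learn&#039;s GaussianMixture with diagonal covariances; the symmetric KL below uses the closed form for diagonal Gaussians, and the candidate range for K is an illustrative assumption:&lt;br /&gt;

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm_bic(pixels, k_range=range(2, 7), seed=0):
    """pixels: (N, 3) Lab values; keep the component count minimizing BIC."""
    best, best_bic = None, np.inf
    for k in k_range:
        gmm = GaussianMixture(n_components=k, covariance_type="diag",
                              random_state=seed).fit(pixels)
        bic = gmm.bic(pixels)          # -2 * log-likelihood + P * log(N)
        if bic < best_bic:
            best, best_bic = gmm, bic
    return best

def skl_diag(mu1, var1, mu2, var2):
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    def kl(m1, v1, m2, v2):
        return 0.5 * np.sum(v1 / v2 + (m2 - m1) ** 2 / v2 - 1.0 + np.log(v2 / v1))
    return kl(mu1, var1, mu2, var2) + kl(mu2, var2, mu1, var1)
```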
&lt;br /&gt;
===== 2. Adjusting Near Confusion Lines Improved =====&lt;br /&gt;
Following global recoloring, colors near confusion lines in the CIE 1931 xyY color space are further adjusted based on formulas used in Method 3. Nonlinear scaling is applied to amplify the shifts for pixels closer to the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
w = \left( \frac{\text{threshold} - d}{\text{threshold}} \right)^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w&amp;lt;/math&amp;gt; is the scaling factor. &lt;br /&gt;
&lt;br /&gt;
The adjustments from the GMM and confusion line steps are combined to produce the final recolored image. These enhancements make the method more robust and effective for individuals with varying levels of CVD.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in later sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and integrate the user label, they require a ground-truth recolored image for every training example.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information while training. They are generally better at generating natural images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We first gathered an open-source RGB image dataset from [2]. In that study, to improve the model&#039;s ability to enhance the contrast between CVD-indistinguishable color pairs, the authors created an unlabeled dataset of 141,000 pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for supervised methods), we randomly sampled 15,000 images and recolored them by simulating random labels for severity and type of CVD. The recoloring for ground-truth images was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: The open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org Scipy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and is passed to 3 parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of a recolored image based on the given ground truth. The outputs from each of these networks are concatenated to produce a recolored RGB image of the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: recolored (model output) image and ground-truth recolored image, respectively&lt;br /&gt;
* p: pixel index&lt;br /&gt;
* N: total number of pixels&lt;br /&gt;
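A toy sketch of the parallel-MLP idea in PyTorch; the layer sizes, sigmoid activation, and small input resolution are illustrative assumptions, not the trained architecture (the project resized images to 256x256):&lt;br /&gt;

```python
import torch
import torch.nn as nn

class ParallelRGBMLP(nn.Module):
    """One MLP head per output channel; each head sees the flattened
    image concatenated with the 2-dim CVD label."""
    def __init__(self, size=32, hidden=64):
        super().__init__()
        in_dim = size * size * 3 + 2
        self.size = size
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, size * size), nn.Sigmoid())
            for _ in range(3)])   # independent R, G, B predictors

    def forward(self, img, label):
        # img: (B, 3, H, W); label: (B, 2), e.g. [0.6, 0] for 60% protanopia
        x = torch.cat([img.flatten(1), label], dim=1)
        chans = [h(x).view(-1, 1, self.size, self.size) for h in self.heads]
        return torch.cat(chans, dim=1)   # recolored RGB image

def mse_loss(pred, target):
    """Pixel-wise MSE against the ground-truth recolored image."""
    return torch.mean((pred - target) ** 2)
```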
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
In a similar fashion of inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: recolored (model output) and ground-truth recolored images, respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt; is the l-th feature layer of the pre-trained VGG network, and &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; is the number of elements in that layer&#039;s feature map&lt;br /&gt;
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: size of the global (or large) window of the image&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: size of the local window (neighborhood) around pixel x&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: original and recolored images, respectively&lt;br /&gt;
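The contrast term CL(x, y) can be sketched on adjacent pixel pairs; restricting pairs to horizontal neighbours is our simplification of the global- and local-window sampling described above:&lt;br /&gt;

```python
import numpy as np

def contrast_loss(sim_orig, sim_new):
    """CL(x, y) averaged over horizontally adjacent pixel pairs.
    sim_orig / sim_new: (H, W, 3) CVD-simulated original / recolored images."""
    d_new = np.linalg.norm(sim_new[:, 1:] - sim_new[:, :-1], axis=-1)
    d_old = np.linalg.norm(sim_orig[:, 1:] - sim_orig[:, :-1], axis=-1)
    cl = d_new - d_old        # positive where recoloring gained contrast
    return -np.mean(cl)       # minimize the negative to maximize contrast
```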
&lt;br /&gt;
== Results == &lt;br /&gt;
&lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Method 1: Daltonization Baseline&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method1.png|400px|thumb|right|Figure 10 Method 1 Results]]&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Method 2: Optimizing Objective Functions&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method2.png|400px|thumb|right|Figure 11 Method 2 Results]]&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Method 3: Adjustments Near Confusion Lines with Improved Method 2&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method3.png|400px|thumb|right|Figure 12 Method 3 Results]]&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Method 4: Improved with GMM-based Method&#039;&#039;&#039;:&lt;br /&gt;
[[File:Method4.png|400px|thumb|right|Figure 13 Method 4 Results]]&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
! Original vs Recolored !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
| SSIM || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| TCC || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| CD ΔE76 || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| CIEDE2000 || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| CIEDE94 || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| D-CIELAB ΔEab || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: Conditional Parallel RGB MLP, Conditional U-Net, and Conditional Autoencoder. Quantitative metrics such as Structural Similarity Index (SSIM), total color contrast (TCC), Chromatic Difference (CD), and inference time, as provided in [1] and [2], were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness is observed in the CVD-simulated recolored images compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially improved towards resembling the ground truth, but over training began &#039;reconstructing&#039; the colors of the original image instead.&lt;br /&gt;
* CVD simulations of the recolored images were similar to, or worse than, those of the originals, indicating that the model performed poorly on this task.&lt;br /&gt;
* It sometimes over-saturated colors, reducing visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases it hurt contrast in the CVD-simulated colors; in others the contrast improvement was only marginal.&lt;br /&gt;
* Some blurriness was visible in the recolored images, possibly because naturalness was effectively prioritized even though the weight coefficients in the loss favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the qualitative results above, we computed quantitative metrics for comparison with related work using only the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; please refer to those papers for the full definitions, as we use the same ones. At a high level, the metrics are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here lambda is a constant weight, not a wavelength; l, a, b are the CIELAB coordinates of the original image and the primed values those of the recolored image.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
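&lt;br /&gt;
As a concrete reference, the SSIM and CD formulas above map directly to a few lines of NumPy. This is an illustrative sketch: the function names are ours, and the lightness weight lam is an assumed value rather than the one used in [2].&lt;br /&gt;

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Global SSIM between two grayscale images (formula above).
    c1, c2 are the usual small stabilizing constants."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def chromatic_difference(lab_orig, lab_recolored, lam=0.5):
    """Per-pixel CD (formula above) on (..., 3) CIELAB arrays.
    lam weights the lightness term; 0.5 is an illustrative choice."""
    d = lab_recolored - lab_orig
    return np.sqrt(lam * d[..., 0] ** 2 + d[..., 1] ** 2 + d[..., 2] ** 2)
```

In practice SSIM is usually averaged over local windows rather than computed globally as here; common image libraries provide that windowed variant.&lt;br /&gt;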
&lt;br /&gt;
The key results are in Table 2 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:40%; margin:auto;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 (vs. ~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 (vs. ~0.963)*&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates the corresponding result from paper [2], taking the larger of the protan and deutan values.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are reasonable but lower than in paper [2], because that work optimizes a separate network for each CVD type.&lt;br /&gt;
* Outputs are somewhat blurry; SSIM is not weighted strongly enough in the loss.&lt;br /&gt;
* Handling multiple CVD types in a single network needs a more sophisticated conditioning scheme.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned a couple of things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate (a few seconds per image), making them feasible for near-real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement using local and global losses was effective in improving visibility for CVD individuals. Loss functions adapted from the Swin-transformer-based approach of [2] added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. JOSA A, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
[12] Color-Blindness.com. (n.d.). COBLIS - Color Blindness Simulator. Retrieved December 13, 2024, from https://www.color-blindness.com/coblis-color-blindness-simulator/&lt;br /&gt;
&lt;br /&gt;
[13] DaltonLens. (n.d.). DaltonLens-Python [Computer software]. GitHub. Retrieved December 13, 2024, from https://github.com/DaltonLens/DaltonLens-Python&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400px|Wikipedia encyclopedia]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_2.png|400px]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|350px|thumb|Conditional Autoencoder]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_unet.png|350px|thumb|Conditional U-Net]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_mlp.png|350px|thumb|Conditional MLP]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;clear: both; text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
Losses: Conditional Autoencoder, Conditional U-Net, and Conditional MLP&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Mathematical method results with color plates&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:Method1-color-plates.png|400px|thumb|Method 1 Color Plates Results]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline-block; vertical-align: middle;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:Method2-color-plates.png|400px|thumb|Method 2 Color Plates Results]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;gallery mode=&amp;quot;nolines&amp;quot; widths=&amp;quot;400px&amp;quot; heights=&amp;quot;300px&amp;quot; caption=&amp;quot;Method 3 Color Plates Results for Protanopia, Deuteranopia, and Tritanopia with Severity Levels&amp;quot;&amp;gt;&lt;br /&gt;
File:Method3-protan.png|Protanopia&lt;br /&gt;
File:Method3-deutan.png|Deuteranopia&lt;br /&gt;
File:Method3-tritan.png|Tritanopia&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;gallery mode=&amp;quot;nolines&amp;quot; widths=&amp;quot;400px&amp;quot; heights=&amp;quot;300px&amp;quot; caption=&amp;quot;Method 4 Color Plates Results for Protanopia, Deuteranopia, and Tritanopia with Severity Levels&amp;quot;&amp;gt;&lt;br /&gt;
File:Method4-protan.png|Protanopia&lt;br /&gt;
File:Method4-deutan.png|Deuteranopia&lt;br /&gt;
File:Method4-tritan.png|Tritanopia&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=File:Method2-color-plates.png&amp;diff=60781</id>
		<title>File:Method2-color-plates.png</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=File:Method2-color-plates.png&amp;diff=60781"/>
		<updated>2024-12-13T11:39:48Z</updated>

		<summary type="html">&lt;p&gt;Rainas: Rainas uploaded a new version of File:Method2-color-plates.png&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=File:Method4-tritan.png&amp;diff=60776</id>
		<title>File:Method4-tritan.png</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=File:Method4-tritan.png&amp;diff=60776"/>
		<updated>2024-12-13T11:29:36Z</updated>

		<summary type="html">&lt;p&gt;Rainas: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=File:Method4-protan.png&amp;diff=60775</id>
		<title>File:Method4-protan.png</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=File:Method4-protan.png&amp;diff=60775"/>
		<updated>2024-12-13T11:29:25Z</updated>

		<summary type="html">&lt;p&gt;Rainas: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=File:Method4-deutan.png&amp;diff=60774</id>
		<title>File:Method4-deutan.png</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=File:Method4-deutan.png&amp;diff=60774"/>
		<updated>2024-12-13T11:29:15Z</updated>

		<summary type="html">&lt;p&gt;Rainas: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=File:Method3-tritan.png&amp;diff=60773</id>
		<title>File:Method3-tritan.png</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=File:Method3-tritan.png&amp;diff=60773"/>
		<updated>2024-12-13T11:29:05Z</updated>

		<summary type="html">&lt;p&gt;Rainas: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=File:Method3-protan.png&amp;diff=60772</id>
		<title>File:Method3-protan.png</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=File:Method3-protan.png&amp;diff=60772"/>
		<updated>2024-12-13T11:28:51Z</updated>

		<summary type="html">&lt;p&gt;Rainas: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=File:Method3-deutan.png&amp;diff=60771</id>
		<title>File:Method3-deutan.png</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=File:Method3-deutan.png&amp;diff=60771"/>
		<updated>2024-12-13T11:28:38Z</updated>

		<summary type="html">&lt;p&gt;Rainas: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=File:Method2-color-plates.png&amp;diff=60770</id>
		<title>File:Method2-color-plates.png</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=File:Method2-color-plates.png&amp;diff=60770"/>
		<updated>2024-12-13T11:28:25Z</updated>

		<summary type="html">&lt;p&gt;Rainas: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=File:Method1-color-plates.png&amp;diff=60769</id>
		<title>File:Method1-color-plates.png</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=File:Method1-color-plates.png&amp;diff=60769"/>
		<updated>2024-12-13T11:28:15Z</updated>

		<summary type="html">&lt;p&gt;Rainas: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=File:Method4.png&amp;diff=60768</id>
		<title>File:Method4.png</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=File:Method4.png&amp;diff=60768"/>
		<updated>2024-12-13T11:25:34Z</updated>

		<summary type="html">&lt;p&gt;Rainas: Method 4 Results&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Summary ==&lt;br /&gt;
Method 4 Results&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=File:Method3.png&amp;diff=60767</id>
		<title>File:Method3.png</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=File:Method3.png&amp;diff=60767"/>
		<updated>2024-12-13T11:25:23Z</updated>

		<summary type="html">&lt;p&gt;Rainas: Method 3 Results&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Summary ==&lt;br /&gt;
Method 3 Results&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=File:Method2.png&amp;diff=60766</id>
		<title>File:Method2.png</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=File:Method2.png&amp;diff=60766"/>
		<updated>2024-12-13T11:25:12Z</updated>

		<summary type="html">&lt;p&gt;Rainas: Method 2 Results&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Summary ==&lt;br /&gt;
Method 2 Results&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=File:Method1.png&amp;diff=60765</id>
		<title>File:Method1.png</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=File:Method1.png&amp;diff=60765"/>
		<updated>2024-12-13T11:22:36Z</updated>

		<summary type="html">&lt;p&gt;Rainas: Method 1 Results&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Summary ==&lt;br /&gt;
Method 1 Results&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60762</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60762"/>
		<updated>2024-12-13T10:59:46Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* Results */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected values are then added back to the original RGB image to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
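&lt;br /&gt;
A minimal NumPy sketch of this pipeline for deuteranopia is shown below. The RGB-to-LMS and simulation matrices are values commonly circulated in open-source daltonization code such as [5] (implementations differ slightly), and the error is formed here in RGB space after mapping the simulation back, an equivalent linear rearrangement of the error step described above.&lt;br /&gt;

```python
import numpy as np

# Matrices commonly used by open-source daltonization code such as [5];
# exact values vary slightly between implementations.
RGB2LMS = np.array([[17.8824, 43.5161, 4.11935],
                    [3.45565, 27.1554, 3.86714],
                    [0.0299566, 0.184309, 1.46709]])
LMS2RGB = np.linalg.inv(RGB2LMS)
# Deuteranope simulation: the M response is rebuilt from L and S.
SIM_DEUTAN = np.array([[1.0, 0.0, 0.0],
                       [0.494207, 0.0, 1.24827],
                       [0.0, 0.0, 1.0]])
# Correction matrix from the text: push the lost red-green
# information into channels the viewer can still see.
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize_deutan(rgb):
    """rgb: (..., 3) float array in [0, 1]. Returns the daltonized image."""
    lms = rgb @ RGB2LMS.T
    lms_sim = lms @ SIM_DEUTAN.T
    rgb_sim = lms_sim @ LMS2RGB.T
    error = rgb - rgb_sim             # information the viewer cannot see
    return np.clip(rgb + error @ CORRECTION.T, 0.0, 1.0)
```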
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
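&lt;br /&gt;
The two energy terms can be sketched on flattened pixel arrays as below. Note one simplification introduced here: the full pairwise contrast sum is quadratic in the number of pixels, so this sketch samples random pixel pairs, which is our approximation and not part of the original framework.&lt;br /&gt;

```python
import numpy as np

def recolor_energy(c_plus, c, beta=0.5, n_pairs=1000, seed=0):
    """Sketch of the objective E = beta * E_nat + E_cont on (N, 3)
    arrays of recolored (c_plus) and original (c) pixel colors."""
    rng = np.random.default_rng(seed)
    # Naturalness: squared distance between recolored and original colors.
    e_nat = np.sum((c_plus - c) ** 2)
    # Contrast: deviation of pairwise color differences, approximated
    # over random pixel pairs instead of the full O(N^2) sum.
    i = rng.integers(0, len(c), n_pairs)
    j = rng.integers(0, len(c), n_pairs)
    diff = (c_plus[i] - c_plus[j]) - (c[i] - c[j])
    e_cont = np.sum(diff ** 2)
    return beta * e_nat + e_cont
```

A uniform color shift leaves the contrast term at zero and is penalized only through the naturalness term, which illustrates how beta trades the two objectives off.&lt;br /&gt;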
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
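This point-to-line distance can be checked with a few lines of Python; the coordinates in the usage example are illustrative, not actual copunctal points:

```python
import numpy as np

def dist_to_confusion_line(v, cp, p0):
    """Perpendicular distance from chromaticity v to the line through
    the copunctal point cp and reference point p0."""
    (xv, yv), (xcp, ycp), (x0, y0) = v, cp, p0
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    den = np.hypot(xcp - x0, ycp - y0)
    return num / den

# Illustrative: distance from (1, 0) to the line y = x is 1/sqrt(2)
d = dist_to_confusion_line((1.0, 0.0), (0.0, 0.0), (1.0, 1.0))  # ≈ 0.7071
```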
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest unoccupied confusion lines, in order of their prominence in the image clusters. This reallocation ensures that these colors become distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
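As a sketch, the three terms can be evaluated on small arrays of key colors. Here &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a caller-supplied placeholder for the dichromatic simulation, and the normalization factors follow the formulas above:

```python
import numpy as np

def recoloring_loss(A, A_rec, B, f_D, lam=0.5):
    """Toy evaluation of E = (E1 + E2) + lam * E3 for key-color arrays:
    A (original cluster A), A_rec (recolored A), B (cluster B).
    f_D simulates dichromatic perception of a single color."""
    nA, nB = len(A), len(B)
    # E1: between-cluster contrast, original vs. recolored-as-seen-by-dichromat
    E1 = sum(abs(np.linalg.norm(a - b) - np.linalg.norm(f_D(ar) - f_D(b)))
             for a, ar in zip(A, A_rec) for b in B) / (nA * nB)
    # E2: within-cluster contrast under dichromatic simulation
    # (normalized by 1/(nA*nB) as in the formula above)
    E2 = sum(abs(np.linalg.norm(A[i] - A[j])
                 - np.linalg.norm(f_D(A_rec[i]) - f_D(A_rec[j])))
             for i in range(nA) for j in range(nA)) / (nA * nB)
    # E3: naturalness, mean shift between original and recolored key colors
    E3 = sum(np.linalg.norm(a - ar) for a, ar in zip(A, A_rec)) / nA
    return (E1 + E2) + lam * E3
```

With an identity `f_D` and no recoloring, all three terms vanish, which makes the loss easy to sanity-check.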
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIE L*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
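The symmetric KL divergence between Gaussian components has a closed form; a minimal sketch, assuming full covariance matrices that are invertible:

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """Closed-form KL(G0 || G1) for two multivariate normals."""
    k = len(mu0)
    S1inv = np.linalg.inv(S1)
    d = mu1 - mu0
    return 0.5 * (np.trace(S1inv @ S0) + d @ S1inv @ d - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def sym_kl(mu0, S0, mu1, S1):
    """Symmetric KL divergence D_sKL used to compare Gaussian components."""
    return kl_gauss(mu0, S0, mu1, S1) + kl_gauss(mu1, S1, mu0, S0)
```

With identity covariances, the symmetric KL reduces to the squared distance between the means, a convenient sanity check.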
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions; as a result, the colors in the recolored images are sometimes unnatural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to use in real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbones showing promising results. These models generate creative recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This specific reference also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of individuals with CVD, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations with a triple-latent structure, the model supports continuous personalization, adjusting images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications: in our experiment, one epoch (one pass over the entire dataset) took around 18 days.&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD with the personalization of different severity levels. We start by exploring existing methods and identifying opportunities for improvement. Since mathematical-based approaches provide a solid foundation and are well-documented, we began our experiments by testing these methods, as described in the background. We later extended our exploration to deep learning based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a Baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
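Applying this matrix to an image is a single matrix multiply per pixel; a minimal sketch:

```python
import numpy as np

# Linear RGB-to-LMS transform (Stockman and Sharpe cone fundamentals)
T_RGB_TO_LMS = np.array([
    [0.3904725,  0.54990437, 0.00890159],
    [0.07092586, 0.96310739, 0.00135809],
    [0.02314268, 0.12801221, 0.93605194],
])

def rgb_to_lms(img):
    """Map an H x W x 3 RGB array (values in [0, 1]) into LMS space."""
    return img @ T_RGB_TO_LMS.T
```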
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
&lt;br /&gt;
The error between the original and simulated images is then mapped into the RGB color space using a deficiency-specific correction matrix, which adjusts the image to enhance contrast and recover lost color differences. The predefined correction matrix is applied to the error in RGB space, and the result is transformed back into LMS space for final adjustments. The corrected LMS values are added back to the original values, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
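A compact sketch of the error-and-compensate step. For brevity it operates entirely in RGB, whereas the full pipeline above round-trips through LMS:

```python
import numpy as np

# Daltonize-inspired correction matrix: redistributes the lost red-channel
# information into the green and blue channels
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize_error(original_rgb, simulated_rgb):
    """Compensate an (..., 3) RGB image for a protan viewer (sketch)."""
    error = original_rgb - simulated_rgb   # what the CVD viewer loses
    compensation = error @ CORRECTION.T    # redistribute the loss
    return np.clip(original_rgb + compensation, 0.0, 1.0)
```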
&lt;br /&gt;
==== Method 2: Optimizing Objective Function ====&lt;br /&gt;
To improve the results from the Daltonization method, we designed a framework inspired by methods discussed in the background, incorporating dominant color extraction, optimization-based recoloring, and edit propagation. This approach aims to find a balance between the naturalness and contrast while compensating colors that are not visible for corresponding CVD types.&lt;br /&gt;
&lt;br /&gt;
===== 1. Extraction of Dominant Colors =====&lt;br /&gt;
We begin by extracting the dominant colors from the input image using fuzzy clustering via a K-means algorithm. This step identifies a reduced set of representative colors that capture the primary color information in the image:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{C} = \{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; represents the number of clusters, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i&amp;lt;/math&amp;gt; represents the centroid of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th cluster.&lt;br /&gt;
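A self-contained stand-in for this clustering step: a tiny K-means written in NumPy. In practice scikit-learn's KMeans or MiniBatchKMeans would be used; the deterministic initialization here is only for reproducibility:

```python
import numpy as np

def dominant_colors(pixels, n_clusters=30, n_iter=20):
    """Tiny K-means over an (N, 3) pixel array; returns cluster centroids."""
    # deterministic init: evenly spaced samples from the pixel array
    idx = np.linspace(0, len(pixels) - 1, n_clusters).astype(int)
    centroids = pixels[idx].astype(float).copy()
    for _ in range(n_iter):
        # assign each pixel to its nearest centroid
        dists = ((pixels[:, None] - centroids[None]) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)
        # update each centroid to the mean of its members
        for k in range(n_clusters):
            members = pixels[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return centroids
```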
&lt;br /&gt;
===== 2. Optimization-Based Recoloring =====&lt;br /&gt;
Once the dominant colors are extracted, we apply an optimization process to adjust these colors. The optimization uses the formulas mentioned in [9], and aims to balance two key objectives:&lt;br /&gt;
&lt;br /&gt;
1. Naturalness Preservation: Ensures the recolored image minimally deviates from the original.&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{nat}} = \sum_{i=1}^N \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_i^{\text{original}}) \|^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is the transformation matrix based on the severity and type of CVD, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i^{\text{original}}&amp;lt;/math&amp;gt; is the original color.&lt;br /&gt;
&lt;br /&gt;
2. Contrast Enhancement: Improves the differentiation of colors for individuals with CVD:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{cont}} = \sum_{i=1}^N \sum_{j&amp;gt;i} \left( \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_j) \|^2 - \| \mathbf{c}_i^{\text{original}} - \mathbf{c}_j^{\text{original}} \|^2 \right)^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The total objective function combines these two terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + E_{\text{cont}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; controls the trade-off between naturalness and contrast.&lt;br /&gt;
&lt;br /&gt;
Optimization is performed using the L-BFGS-B algorithm to ensure efficient convergence under bounded constraints.&lt;br /&gt;
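A sketch of this optimization with SciPy's L-BFGS-B, assuming colors lie in [0, 1] as bound constraints; the contrast term counts each unordered pair once:

```python
import numpy as np
from scipy.optimize import minimize

def recolor_dominant_colors(C_orig, T, beta=0.5):
    """Optimize dominant colors under E = beta * E_nat + E_cont (sketch)."""
    N = len(C_orig)
    orig_dists = np.linalg.norm(C_orig[:, None] - C_orig[None], axis=-1)

    def energy(flat):
        C = flat.reshape(N, 3)
        # naturalness: deviation from the original colors, seen through T
        E_nat = np.sum(np.linalg.norm((C - C_orig) @ T.T, axis=1) ** 2)
        # contrast: pairwise distances under T vs. original pairwise distances
        diffs = (C[:, None] - C[None]) @ T.T
        E_cont = np.sum((np.linalg.norm(diffs, axis=-1) ** 2
                         - orig_dists ** 2) ** 2) / 2  # /2: each pair once
        return beta * E_nat + E_cont

    res = minimize(energy, C_orig.ravel(), method="L-BFGS-B",
                   bounds=[(0.0, 1.0)] * (3 * N))
    return res.x.reshape(N, 3)
```

With &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; set to the identity (normal vision), the original colors already minimize both terms, so the optimizer should leave them essentially unchanged.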
&lt;br /&gt;
The transformation matrices for each type of CVD are the following, which are based on [12]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Protanopia}} = \begin{bmatrix} 0.566 &amp;amp; 0.558 &amp;amp; 0 \\ 0.433 &amp;amp; 0.442 &amp;amp; 0.242 \\ 0 &amp;amp; 0 &amp;amp; 0.758 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Deuteranopia}} = \begin{bmatrix} 0.625 &amp;amp; 0.7 &amp;amp; 0 \\ 0.375 &amp;amp; 0.3 &amp;amp; 0.3 \\ 0 &amp;amp; 0 &amp;amp; 0.7 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Tritanopia}} = \begin{bmatrix} 0.95 &amp;amp; 0 &amp;amp; 0 \\ 0.05 &amp;amp; 0.433 &amp;amp; 0 \\ 0 &amp;amp; 0.567 &amp;amp; 1 \end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== 3. Edit Propagation =====&lt;br /&gt;
After optimizing the dominant colors, we propagate these edits across the entire image to ensure smooth transitions. This propagation step leverages the CIE-Lab color space, which is perceptually uniform, meaning that the Euclidean distance in this space correlates well with human color perception. The process begins by mapping the original image and the optimized dominant colors into the Lab color space. In this space, the differences between the original and recolored dominant colors are computed to capture the adjustments made during the optimization step:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Delta L^* = \text{griddata}(\mathbf{c}^{\text{original}}, \mathbf{c}^{\text{recolored}} - \mathbf{c}^{\text{original}}, \mathbf{I}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in the Lab color space. Once the interpolated changes are computed, they are applied to the Lab representation of the original image. Finally, the adjusted Lab values are converted back to the RGB color space to reconstruct the recolored image.&lt;br /&gt;
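The propagation step can be sketched with SciPy's griddata, interpolating each Lab channel's shift separately; method=&quot;nearest&quot; is chosen here so the sketch stays defined outside the convex hull of the dominant colors:

```python
import numpy as np
from scipy.interpolate import griddata

def propagate_edits(lab_pixels, c_orig, c_rec):
    """Spread per-dominant-color Lab shifts over an (N, 3) pixel array."""
    shifts = c_rec - c_orig
    delta = np.stack([griddata(c_orig, shifts[:, k], lab_pixels,
                               method="nearest")
                      for k in range(3)], axis=-1)
    return lab_pixels + delta
```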
&lt;br /&gt;
==== Method 3: Improved with Confusion Line Adjustments ====&lt;br /&gt;
This method builds upon the previous method by introducing enhancements in dominant color extraction, optimization, and edit propagation, while incorporating an additional step to adjust colors near confusion lines in the CIE 1931 xyY color space inspired by [10]. These improvements aim to further enhance contrast and naturalness of the recolored images. Moreover, this method adds flexibility in adjusting for different severity levels for each CVD type.&lt;br /&gt;
&lt;br /&gt;
===== 1. Improvements on Method 2 =====&lt;br /&gt;
To improve the performance of dominant color extraction, we transitioned from traditional K-means to MiniBatch K-means. This algorithm processes data in small batches, significantly reducing computational time while maintaining accuracy in clustering. The number of dominant colors was also reduced from 50 to 30 to focus on key representative colors and further enhance efficiency. The optimization objective is refined to leverage vectorization, improving computational efficiency. The two key terms remain:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + (1 - \beta) E_{\text{cont}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
The optimization objective was refined to significantly improve computational efficiency by replacing the nested loops in the contrast enhancement term with vectorized operations. In the original implementation, the pairwise differences between colors were calculated iteratively using &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;O(N^2)&amp;lt;/math&amp;gt; nested loops. The improved version eliminates this overhead by leveraging array broadcasting to compute all pairwise differences simultaneously, and the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is then applied to all pairwise differences in a single tensor operation:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T}_{\Delta} = \text{tensordot}(\Delta_{ij}, \mathbf{T}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the norms are computed in parallel across the entire array. Additionally, the weighting parameter &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; was adjusted to favor naturalness preservation, ensuring better visual integrity in the recolored image. &lt;br /&gt;
The propagation step changed to use a k-d tree for fast nearest neighbor searches, replacing grid-based interpolation. This approach more efficiently matches each pixel in the Lab color space to the closest dominant color:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{I}_{\text{adjusted}} = \mathbf{C}_{\text{recolored}}[\text{k-d tree query}(\mathbf{I})],&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in Lab space.&lt;br /&gt;
These refinements enable faster optimization while improving the balance between naturalness and contrast enhancement.&lt;br /&gt;
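The k-d tree variant replaces interpolation with a nearest-dominant-color lookup; a sketch with SciPy's cKDTree:

```python
import numpy as np
from scipy.spatial import cKDTree

def propagate_with_kdtree(lab_pixels, c_orig, c_rec):
    """Map each pixel to the recolored version of its nearest dominant
    color, using a k-d tree built over the original dominant colors."""
    tree = cKDTree(c_orig)
    _, idx = tree.query(lab_pixels)  # nearest dominant color per pixel
    return c_rec[idx]
```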
&lt;br /&gt;
===== 2. Confusion Line Adjustments =====&lt;br /&gt;
An additional step adjusts colors near confusion lines in the CIE 1931 xyY color space to enhance distinguishability:&lt;br /&gt;
&lt;br /&gt;
1. Confusion lines are defined for protanopia, deuteranopia, and tritanopia, based on [10]. For example, for protanopia:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{Confusion Line: Start} = (0.735, 0.265), \quad \text{End} = (0.115, 0.885).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. Colors near the confusion line are identified using orthogonal distance:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(\mathbf{xy}, L) = \frac{\| (\mathbf{xy} - \mathbf{p}_1) \times (\mathbf{p}_2 - \mathbf{p}_1) \|}{\|\mathbf{p}_2 - \mathbf{p}_1\|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_1&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_2&amp;lt;/math&amp;gt; are the start and end points of the confusion line.&lt;br /&gt;
&lt;br /&gt;
3. Identified colors are shifted orthogonally away from the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{xy}_{\text{adjusted}} = \mathbf{xy} + \lambda \mathbf{v}_{\perp},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{v}_{\perp}&amp;lt;/math&amp;gt; is a perpendicular vector, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a scaling factor.&lt;br /&gt;
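A sketch of the orthogonal shift, pushing the point away from whichever side of the line it already lies on; the default &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is an illustrative value, not one from the text:

```python
import numpy as np

def shift_from_line(xy, p1, p2, lam=0.05):
    """Shift chromaticity xy orthogonally away from the line through p1, p2."""
    d = p2 - p1
    n = np.array([-d[1], d[0]]) / np.hypot(d[0], d[1])  # unit normal
    w = xy - p1
    side = d[0] * w[1] - d[1] * w[0]  # sign tells which side xy lies on
    return xy + lam * np.sign(side) * n
```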
&lt;br /&gt;
===== 3. Personalise with Severity Levels =====&lt;br /&gt;
To account for different severity levels, the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; linearly interpolates between normal vision and full CVD perception based on severity and type:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T} = (1 - s) \mathbf{I} + s \mathbf{T}_{\text{CVD}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;s&amp;lt;/math&amp;gt; represents the severity of CVD (0-100%), &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; is the identity matrix, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}_{\text{CVD}}&amp;lt;/math&amp;gt; is the full transformation matrix specific to protanopia, deuteranopia, or tritanopia. Such a method is based on DaltonLens simulator [13].&lt;br /&gt;
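The severity interpolation is a one-line matrix blend:

```python
import numpy as np

def severity_matrix(T_cvd, severity):
    """Blend between normal vision (identity) and the full CVD
    transformation; severity is in [0, 1]."""
    s = float(severity)
    return (1.0 - s) * np.eye(3) + s * np.asarray(T_cvd)
```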
&lt;br /&gt;
These improvements significantly enhanced both the effectiveness and efficiency of the recoloring process on top of Method 2.&lt;br /&gt;
&lt;br /&gt;
==== Method 4: Improved with GMM-based Method ====&lt;br /&gt;
The last mathematical method we experimented with enhances recoloring by integrating a Gaussian Mixture Model (GMM)-based global recoloring algorithm. The method also applies nonlinear adjustments for colors near confusion lines to ensure improved contrast and naturalness.&lt;br /&gt;
&lt;br /&gt;
===== 1. GMM-Based Global Recoloring =====&lt;br /&gt;
The image is first resized and transformed into the Lab color space. A GMM is applied to cluster the color distribution into &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; components, optimizing the number of clusters using the Bayesian Information Criterion (BIC):&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{BIC} = -2 \cdot \text{log-likelihood} + P \cdot \log(N),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;P&amp;lt;/math&amp;gt; represents the model parameters and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; is the number of pixels.&lt;br /&gt;
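The BIC computation is straightforward; the parameter count below assumes a full-covariance GMM (K−1 free weights, K mean vectors, K symmetric covariance matrices):

```python
import numpy as np

def bic(log_likelihood, n_params, n_pixels):
    """Bayesian Information Criterion: lower is better."""
    return -2.0 * log_likelihood + n_params * np.log(n_pixels)

def gmm_n_params(K, dim=3):
    """Free parameters of a K-component full-covariance GMM in `dim` dims."""
    return (K - 1) + K * dim + K * dim * (dim + 1) // 2
```

In practice one would fit GMMs for a range of K values and keep the K with the smallest BIC.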
&lt;br /&gt;
The GMM means are simulated using the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T&amp;lt;/math&amp;gt; with severity levels taken into account, and the symmetric Kullback-Leibler (KL) divergence (&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{\text{sKL}}&amp;lt;/math&amp;gt;) is calculated between pairs of clusters:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{\text{sKL}}(i, j) = D_{\text{KL}}(G_i \| G_j) + D_{\text{KL}}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt; are Gaussian components, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{\text{KL}}&amp;lt;/math&amp;gt; represents the KL divergence. The GMM cluster means are then adjusted by solving a nonlinear least-squares problem to minimize the discrepancy.&lt;br /&gt;
&lt;br /&gt;
===== 2. Adjusting Near Confusion Lines Improved =====&lt;br /&gt;
Following global recoloring, colors near confusion lines in the CIE 1931 xyY color space are further adjusted based on formulas used in Method 3. Nonlinear scaling is applied to amplify the shifts for pixels closer to the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
w = \left( \frac{\text{threshold} - d}{\text{threshold}} \right)^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w&amp;lt;/math&amp;gt; is the scaling factor. &lt;br /&gt;
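The quadratic falloff can be written as a small helper; the threshold value in the example call is a hypothetical chromaticity distance, not a value from our implementation.&lt;br /&gt;

```python
def confusion_line_weight(d, threshold):
    # w = ((threshold - d) / threshold)^2 for pixels inside the band,
    # clamped to 0 for pixels farther than `threshold` from the confusion line
    t = max(0.0, (threshold - d) / threshold)
    return t * t

w_on_line = confusion_line_weight(0.0, 0.05)    # maximal shift right on the line
w_outside = confusion_line_weight(0.05, 0.05)   # no shift at the band edge
```

The squaring makes the shift amplification decay smoothly, so recolored regions blend into untouched ones without visible seams.&lt;br /&gt;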
&lt;br /&gt;
The adjustments from the GMM and confusion line steps are combined to produce the final recolored image. These enhancements make the method more robust and effective for individuals with varying levels of CVD.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in later sections; an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and integrate the user label naturally, they require a pre-existing ground-truth recolored image as the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information while training. They are generally better at generating natural-looking images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from the open-source RGB image dataset of [2]: to improve their model&#039;s ability to enhance contrast between CVD-indistinguishable color pairs, the authors assembled 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for supervised methods), we randomly sampled 15,000 images and recolored each one under a randomly sampled label for the severity and type of CVD. The recoloring for ground-truth images was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org SciPy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
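The label encoding and pixel normalization described above can be sketched as follows; the function names are illustrative, not the identifiers used in our codebase.&lt;br /&gt;

```python
import numpy as np

def encode_cvd_label(cvd_type, severity):
    # Condition label: one-hot [protan, deutan] scaled by severity in [0.1, 1.0]
    base = {"protan": np.array([1.0, 0.0]), "deutan": np.array([0.0, 1.0])}
    return severity * base[cvd_type]

def normalize_image(image_uint8):
    # Map an HxWx3 uint8 image into the [0, 1] float range fed to the models
    return image_uint8.astype(np.float32) / 255.0

label = encode_cvd_label("protan", 0.6)  # [0.6, 0.0] = protanopia at 60% severity
```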
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to 3 parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. The outputs of the three networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error (MSE) loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: recolored (model output) image and ground-truth recolored image, respectively&lt;br /&gt;
* p: pixel index&lt;br /&gt;
* N: total number of pixels&lt;br /&gt;
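A framework-agnostic sketch of this loss in NumPy (our training code used PyTorch, where `torch.nn.MSELoss` computes the same quantity):&lt;br /&gt;

```python
import numpy as np

def mse_loss(pred, target):
    # Pixel-wise mean-squared error between model output and ground truth
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return np.mean(diff * diff)

pred = np.zeros((4, 4, 3))            # hypothetical model output
target = np.full((4, 4, 3), 0.5)      # hypothetical ground-truth recoloring
loss = mse_loss(pred, target)
```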
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
In a similar fashion of inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision and generalize robustly to new tasks. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: recolored (model output) image and ground-truth recolored image, respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: the feature map of the l-th layer of a pre-trained VGG network, with &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; the number of elements in that feature map&lt;br /&gt;
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the choice of loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{(x, y) \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image at pixels x and y&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: size of the global (or large) window over the image&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: size of the local window (neighborhood) around pixel x&lt;br /&gt;
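A literal NumPy sketch of the CL term and the local averaging over a pixel neighborhood (in actual training one penalizes lost contrast, e.g. the negative part of CL, following the conventions of [2]; the neighborhood radius here is an illustrative choice):&lt;br /&gt;

```python
import numpy as np

def contrast_loss_term(c_hat_x, c_hat_y, c_x, c_y):
    # CL(x, y): change in pairwise color distance after recoloring,
    # both distances measured in the CVD-simulated domain
    return np.linalg.norm(c_hat_x - c_hat_y) - np.linalg.norm(c_x - c_y)

def local_contrast_loss(sim_orig, sim_recolored, radius=1):
    # Average CL over a small neighborhood around every pixel of an HxWx3 image
    h, w, _ = sim_orig.shape
    total, count = 0.0, 0
    for x in range(h):
        for y in range(w):
            for dx in range(-radius, radius + 1):
                for dy in range(-radius, radius + 1):
                    nx, ny = x + dx, y + dy
                    if (dx, dy) != (0, 0) and nx in range(h) and ny in range(w):
                        total += contrast_loss_term(
                            sim_recolored[x, y], sim_recolored[nx, ny],
                            sim_orig[x, y], sim_orig[nx, ny])
                        count += 1
    return total / count
```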
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural color distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: original and recolored images, respectively&lt;br /&gt;
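Putting the pieces together, the total objective above can be sketched as a simple weighting function; the defaults are the coefficients we trained with (alpha = 0.25, beta = 1.0), and the component loss values in the example call are hypothetical.&lt;br /&gt;

```python
def total_loss(naturalness, global_contrast, local_contrast, alpha=0.25, beta=1.0):
    # L_total = alpha * L_naturalness + 2 * (1 - alpha) * L_contrast,
    # with L_contrast = beta * L_global + (2 - beta) * L_local
    contrast = beta * global_contrast + (2.0 - beta) * local_contrast
    return alpha * naturalness + 2.0 * (1.0 - alpha) * contrast

loss = total_loss(naturalness=0.1, global_contrast=0.05, local_contrast=0.02)
```

With alpha below 0.5, the combined weight on the contrast terms exceeds the weight on naturalness, which is why the blurriness we observed later (Results) was somewhat surprising.&lt;br /&gt;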
&lt;br /&gt;
== Results == &lt;br /&gt;
&lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Method 1: Daltonization Baseline&#039;&#039;&#039;:&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Method 2: Optimizing Objective Functions&#039;&#039;&#039;:&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Method 3: Adjustments Near Confusion Lines with Improved Method 2&#039;&#039;&#039;:&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Method 4: Improved with GMM-based Method&#039;&#039;&#039;:&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
! Original vs Recolored !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
| SSIM || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| TCC || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| CD ΔE76 || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| CIEDE2000 || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| CIEDE94 || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| D-CIELAB ΔEab || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), Total Color Contrast (TCC), Chromatic Difference (CD), and inference time, as defined in [1] and [2], were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored image shows compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The output looked discretized at the pixel level, suggesting that channel disentanglement was not helpful for this task (especially for naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than those of the originals, meaning the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* The recolored images showed some blurriness, possibly because naturalness was effectively prioritized even though the loss-term weights favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we computed evaluation metrics for comparison with related work only for the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; we use the same definitions, so please refer to those papers for details. At a high level, the components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
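A minimal single-window NumPy sketch of this formula (standard SSIM evaluation, e.g. scikit-image&#039;s structural_similarity, slides a local window over the image; here we compute one global window just to illustrate the terms, with the usual default stabilizers for images in [0, 1]):&lt;br /&gt;

```python
import numpy as np

def global_ssim(x, y, c1=0.01**2, c2=0.03**2):
    # Single-window SSIM over whole images in [0, 1], per the formula above
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return num / den
```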
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a constant, not a wavelength; l, a, b are the LAB-space coordinates of the original image, and the primed values those of the recolored image.)&lt;br /&gt;
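The per-pixel CD computation can be sketched as follows; the value of the lightness weight in the default argument is a hypothetical placeholder, not the constant used in [1].&lt;br /&gt;

```python
import numpy as np

def chromatic_difference(lab_recolored, lab_original, lam=0.2):
    # Per-pixel CD over HxWx3 LAB images, with the lightness term
    # down-weighted by the constant `lam` (hypothetical value)
    dl = lab_recolored[..., 0] - lab_original[..., 0]
    da = lab_recolored[..., 1] - lab_original[..., 1]
    db = lab_recolored[..., 2] - lab_original[..., 2]
    return np.sqrt(lam * dl**2 + da**2 + db**2)
```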
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
&lt;br /&gt;
The key results are in Table 2 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:40%; margin:auto;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
! Metric !! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time || 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;) || 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;) || 0.5771 (vs. ~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;) || 0.3521 (vs. ~0.963)*&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2] for protan/deutan whichever is larger.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but not as good as in paper [2], because they optimize separate networks for each CVD type.&lt;br /&gt;
* Outputs are somewhat blurry (SSIM is not optimized strongly enough).&lt;br /&gt;
* Handling multiple CVD types in the same network needs a more sophisticated approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate, making them feasible for real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Loss functions borrowed from the Swin-transformer-based approach of [2] added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. JOSA A, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
[12] Color-Blindness.com. (n.d.). COBLIS - Color Blindness Simulator. Retrieved December 13, 2024, from https://www.color-blindness.com/coblis-color-blindness-simulator/&lt;br /&gt;
&lt;br /&gt;
[13] DaltonLens. (n.d.). DaltonLens-Python [Computer software]. GitHub. Retrieved December 13, 2024, from https://github.com/DaltonLens/DaltonLens-Python&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 220px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400 px]][[File:eb_2.png|400 px]] &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 800px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|300 px|center|thumb|Losses - Conditional Autoencoder]][[File:loss_unet.png|300 px|thumb|center|Losses - Conditional U-Net]][[File:loss_mlp.png|300 px|center|thumb|Losses - Conditional MLP]]&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60759</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60759"/>
		<updated>2024-12-13T10:49:13Z</updated>

		<summary type="html">&lt;p&gt;Rainas: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected error values are then added back to the original RGB image to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
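The whole pipeline can be sketched in a few lines; here `simulate_cvd` is an assumed callable mapping an RGB image to its dichromat simulation (e.g. via the Brettel et al. model [7]), and only the error-redistribution step from the text is shown.&lt;br /&gt;

```python
import numpy as np

# Correction matrix from the text (as in Daltonize [5] / Vischeck [6]): the
# red-channel error, invisible to red-green dichromats, is redistributed
# into the green and blue channels
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize(rgb, simulate_cvd):
    # rgb: HxWx3 image in [0, 1]; simulate_cvd: assumed dichromat simulator
    error = rgb - simulate_cvd(rgb)      # information the CVD viewer cannot see
    shift = error @ CORRECTION.T         # rotate it into visible channels
    return np.clip(rgb + shift, 0.0, 1.0)
```

For a viewer with normal vision the simulation is the identity, the error vanishes, and the image passes through unchanged.&lt;br /&gt;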
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
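As a concrete illustration of the loss above, the sketch below is our own minimal NumPy translation of the two terms; the pixel array and the choice of beta are illustrative, not taken from [8]:&lt;br /&gt;

```python
import numpy as np

def recoloring_loss(c_plus, c, beta=0.5):
    """Total loss E = beta * E_nat + E_cont over N pixel colors.

    c_plus : (N, 3) array of recolored pixel colors
    c      : (N, 3) array of original pixel colors
    """
    # E_nat: squared Euclidean distance between recolored and original pixels.
    e_nat = np.sum((c_plus - c) ** 2)
    # E_cont: deviation of all pairwise color differences before vs. after.
    d_plus = c_plus[:, None, :] - c_plus[None, :, :]  # (N, N, 3)
    d_orig = c[:, None, :] - c[None, :, :]            # (N, N, 3)
    e_cont = np.sum((d_plus - d_orig) ** 2)
    return beta * e_nat + e_cont

# A recoloring that changes nothing incurs zero loss.
c = np.random.rand(8, 3)
print(recoloring_loss(c, c))  # 0.0
```

Note that a uniform shift of every pixel leaves the contrast term at zero and is penalized only through the naturalness term.&lt;br /&gt;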
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
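The degree-adaptable loss can be sketched in the same way; the simulation matrix T, the per-pixel weights alpha, and beta below are placeholders, not values from [9]:&lt;br /&gt;

```python
import numpy as np

def degree_adaptable_loss(c_plus, c, T, alpha, beta=0.5):
    """Degree-adaptable loss: both terms are measured through the CVD
    simulation matrix T, and each pixel's naturalness term is weighted
    by alpha_i."""
    # Weighted naturalness under the simulated CVD view.
    diff = (c_plus - c) @ T.T                      # T applied to each row
    e_nat = np.sum(alpha * np.sum(diff ** 2, axis=1))
    # Pairwise contrast deviation, also under the simulated view.
    d_plus = (c_plus[:, None, :] - c_plus[None, :, :]) @ T.T
    d_orig = (c[:, None, :] - c[None, :, :]) @ T.T
    e_cont = np.sum((d_plus - d_orig) ** 2)
    return beta * e_nat + e_cont
```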
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
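The point-to-line distance can be computed directly from this formula; the coordinates in the example are arbitrary, not actual copunctal points:&lt;br /&gt;

```python
import math

def distance_to_confusion_line(xv, yv, x_cp, y_cp, x0, y0):
    """Perpendicular distance d(v, L) from chromaticity v = (xv, yv) to the
    line L through the copunctal point (x_cp, y_cp) and (x0, y0)."""
    num = abs((x_cp - x0) * (y0 - yv) - (x0 - xv) * (y_cp - y0))
    den = math.hypot(x_cp - x0, y_cp - y0)
    return num / den

# A chromaticity lying on the line itself is at distance zero.
print(distance_to_confusion_line(0.3, 0.3, 0.7, 0.7, 0.1, 0.1))  # 0.0
```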
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. High-ranking colors, determined by their prominence in image clusters, are shifted to the nearest unoccupied confusion lines. This reallocation ensures that these colors are distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
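The three terms combine into the total loss as follows; this is our own compact sketch, with f_D passed in as a plain function and the E2 normalization taken as 1/n_A^2, since both of its indices run over cluster A:&lt;br /&gt;

```python
import numpy as np

def total_loss(a, a_rec, b, f_D, lam=0.5):
    """E = (E1 + E2) + lam * E3 for key-color clusters A and B.

    a, a_rec : (n_A, 2) original / recolored chromaticities of cluster A
    b        : (n_B, 2) chromaticities of cluster B
    f_D      : function simulating dichromatic vision, (n, 2) -> (n, 2)
    """
    n_a, n_b = len(a), len(b)
    fa_rec, fb = f_D(a_rec), f_D(b)
    # E1: cross-cluster contrast, original view vs. simulated recolored view.
    e1 = np.abs(np.linalg.norm(a[:, None] - b[None, :], axis=2)
                - np.linalg.norm(fa_rec[:, None] - fb[None, :], axis=2)
                ).sum() / (n_a * n_b)
    # E2: within-cluster contrast under dichromatic simulation.
    e2 = np.abs(np.linalg.norm(a[:, None] - a[None, :], axis=2)
                - np.linalg.norm(fa_rec[:, None] - fa_rec[None, :], axis=2)
                ).sum() / (n_a * n_a)
    # E3: naturalness, i.e. how far each key color was moved.
    e3 = np.linalg.norm(a - a_rec, axis=1).mean()
    return (e1 + e2) + lam * e3
```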
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIEL*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
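A minimal evaluation of this mixture density, written directly from the formula (pure NumPy, no fitting):&lt;br /&gt;

```python
import numpy as np

def gmm_density(x, weights, means, covs):
    """Evaluate p(x | Theta) = sum_i w_i * N(x; mu_i, Sigma_i) for a
    K-component GMM over 3-D color features."""
    x = np.atleast_2d(x)
    p = np.zeros(len(x))
    for w, mu, cov in zip(weights, means, covs):
        d = x - mu
        inv = np.linalg.inv(cov)
        norm_const = 1.0 / np.sqrt(((2.0 * np.pi) ** 3) * np.linalg.det(cov))
        # Mahalanobis quadratic form d^T Sigma^{-1} d for every sample at once.
        p += w * norm_const * np.exp(-0.5 * np.einsum('ij,jk,ik->i', d, inv, d))
    return p
```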
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
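The symmetric KL divergence between two Gaussian components has a well-known closed form, which can be sketched as:&lt;br /&gt;

```python
import numpy as np

def kl_gaussian(mu1, cov1, mu2, cov2):
    """Closed-form D_KL(N(mu1, cov1) || N(mu2, cov2))."""
    d = len(mu1)
    inv2 = np.linalg.inv(cov2)
    diff = mu2 - mu1
    return 0.5 * (np.trace(inv2 @ cov1) + diff @ inv2 @ diff - d
                  + np.log(np.linalg.det(cov2) / np.linalg.det(cov1)))

def symmetric_kl(mu1, cov1, mu2, cov2):
    """D_sKL(G_i, G_j) = D_KL(G_i || G_j) + D_KL(G_j || G_i)."""
    return (kl_gaussian(mu1, cov1, mu2, cov2)
            + kl_gaussian(mu2, cov2, mu1, cov1))
```

Unlike plain KL divergence, the symmetrized form treats the two components interchangeably, which is why it is used to compare Gaussian pairs here.&lt;br /&gt;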
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions. Sometimes the colors in the recolored images are not very natural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to deploy in real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This specific reference also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations with a unique triple-latent structure, the method enables continuous personalization, adjusting images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications: in our experiment, one epoch (a single pass over the entire dataset) took around 18 days.&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD with the personalization of different severity levels. We start by exploring existing methods and identifying opportunities for improvement. Since mathematical-based approaches provide a solid foundation and are well-documented, we began our experiments by testing these methods, as described in the background. We later extended our exploration to deep learning based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a Baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, which compensates for color vision deficiencies by simulating how colors appear to individuals with CVD and adjusting the image accordingly. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The resulting error is then corrected and mapped back to the RGB space using a transformation matrix, yielding a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
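The RGB-to-LMS transform and the cone-replacement rules above can be combined into a small simulation routine (a sketch using exactly the matrices and coefficients quoted in this section):&lt;br /&gt;

```python
import numpy as np

# RGB -> LMS transformation from the text (Stockman & Sharpe-based).
T_RGB_TO_LMS = np.array([
    [0.3904725,  0.54990437, 0.00890159],
    [0.07092586, 0.96310739, 0.00135809],
    [0.02314268, 0.12801221, 0.93605194],
])

def simulate_cvd(rgb, cvd_type):
    """Simulate dichromacy in LMS space: the missing cone's response is
    replaced by a weighted combination of the remaining two cones,
    using the coefficients quoted above. Returns the simulated LMS image."""
    lms = rgb @ T_RGB_TO_LMS.T
    L, M, S = lms[..., 0], lms[..., 1], lms[..., 2]
    out = lms.copy()
    if cvd_type == 'protanopia':      # missing L cone
        out[..., 0] = 0.90822864 * M + 0.008192 * S
    elif cvd_type == 'deuteranopia':  # missing M cone
        out[..., 1] = 1.10104433 * L - 0.00901975 * S
    elif cvd_type == 'tritanopia':    # missing S cone
        out[..., 2] = -0.15773032 * L + 1.19465634 * M
    else:
        raise ValueError(f'unknown CVD type: {cvd_type}')
    return out
```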
&lt;br /&gt;
The error between the original and simulated images is then mapped into the RGB color space using a deficiency-specific correction matrix, which adjusts the image to enhance contrast and recover lost color differences. The predefined correction matrix is applied to the error in RGB space, and the corrected values are transformed back into LMS space for final adjustments. The corrected values are then added back to the original image, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
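Putting the pieces together, the core of Method 1 can be sketched as below. This is our own condensed version, which follows the classic daltonize recipe of redistributing the RGB error; the simulate argument is assumed to map an RGB image to its CVD-simulated appearance:&lt;br /&gt;

```python
import numpy as np

# Daltonize-inspired correction matrix from the text: the lost red signal
# is redistributed into the green and blue channels.
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize(rgb, simulate):
    """Daltonization sketch: compute the color information lost to CVD and
    redistribute it into channels the viewer can still perceive.

    rgb      : (..., 3) image with values in [0, 1]
    simulate : function mapping an RGB image to its CVD-simulated appearance
    """
    error = rgb - simulate(rgb)           # information invisible to the viewer
    compensation = error @ CORRECTION.T   # redistribute it across channels
    return np.clip(rgb + compensation, 0.0, 1.0)
```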
&lt;br /&gt;
==== Method 2: Optimizing Objective Function ====&lt;br /&gt;
To improve the results from the Daltonization method, we designed a framework inspired by methods discussed in the background, incorporating dominant color extraction, optimization-based recoloring, and edit propagation. This approach aims to find a balance between the naturalness and contrast while compensating colors that are not visible for corresponding CVD types.&lt;br /&gt;
&lt;br /&gt;
===== 1. Extraction of Dominant Colors =====&lt;br /&gt;
We begin by extracting the dominant colors from the input image using the K-means clustering algorithm. This step identifies a reduced set of representative colors that capture the primary color information in the image:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{C} = \{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; represents the number of clusters, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i&amp;lt;/math&amp;gt; represents the centroid of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th cluster.&lt;br /&gt;
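A minimal sketch of this extraction step, using SciPy's K-means implementation (the cluster count and seed are illustrative):&lt;br /&gt;

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def dominant_colors(image, n_colors=30, seed=0):
    """Extract dominant colors as K-means centroids of the pixel cloud."""
    pixels = image.reshape(-1, 3).astype(float)
    centroids, labels = kmeans2(pixels, n_colors, minit='++', seed=seed)
    return centroids, labels
```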
&lt;br /&gt;
===== 2. Optimization-Based Recoloring =====&lt;br /&gt;
Once the dominant colors are extracted, we apply an optimization process to adjust these colors. The optimization uses the formulas mentioned in [9], and aims to balance two key objectives:&lt;br /&gt;
&lt;br /&gt;
1. Naturalness Preservation: Ensures the recolored image minimally deviates from the original.&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{nat}} = \sum_{i=1}^N \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_i^{\text{original}}) \|^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is the transformation matrix based on the severity and type of CVD, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i^{\text{original}}&amp;lt;/math&amp;gt; is the original color.&lt;br /&gt;
&lt;br /&gt;
2. Contrast Enhancement: Improves the differentiation of colors for individuals with CVD:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{cont}} = \sum_{i=1}^N \sum_{j&amp;gt;i} \left( \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_j) \|^2 - \| \mathbf{c}_i^{\text{original}} - \mathbf{c}_j^{\text{original}} \|^2 \right)^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The total objective function combines these two terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + E_{\text{cont}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; controls the trade-off between naturalness and contrast.&lt;br /&gt;
&lt;br /&gt;
Optimization is performed using the L-BFGS-B algorithm to ensure efficient convergence under bounded constraints.&lt;br /&gt;
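A sketch of this optimization step with SciPy's L-BFGS-B solver, using the E_nat and E_cont definitions above; the [0, 1] box bounds and beta are our own illustrative choices:&lt;br /&gt;

```python
import numpy as np
from scipy.optimize import minimize

def optimize_palette(c_orig, T, beta=0.5):
    """Minimize E = beta * E_nat + E_cont over the dominant colors with
    L-BFGS-B under [0, 1] box constraints."""
    n = len(c_orig)
    d_orig = np.sum((c_orig[:, None] - c_orig[None, :]) ** 2, axis=2)

    def energy(flat):
        c = flat.reshape(n, 3)
        e_nat = np.sum(((c - c_orig) @ T.T) ** 2)
        d_new = np.sum(((c[:, None] - c[None, :]) @ T.T) ** 2, axis=2)
        # Summing over all (i, j) double-counts each unordered pair, which
        # only rescales the term by a constant factor.
        e_cont = np.sum((d_new - d_orig) ** 2)
        return beta * e_nat + e_cont

    res = minimize(energy, c_orig.ravel(), method='L-BFGS-B',
                   bounds=[(0.0, 1.0)] * (3 * n))
    return res.x.reshape(n, 3)
```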
&lt;br /&gt;
The transformation matrices for each type of CVD are the following, which are based on [12]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Protanopia}} = \begin{bmatrix} 0.566 &amp;amp; 0.558 &amp;amp; 0 \\ 0.433 &amp;amp; 0.442 &amp;amp; 0.242 \\ 0 &amp;amp; 0 &amp;amp; 0.758 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Deuteranopia}} = \begin{bmatrix} 0.625 &amp;amp; 0.7 &amp;amp; 0 \\ 0.375 &amp;amp; 0.3 &amp;amp; 0.3 \\ 0 &amp;amp; 0 &amp;amp; 0.7 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Tritanopia}} = \begin{bmatrix} 0.95 &amp;amp; 0 &amp;amp; 0 \\ 0.05 &amp;amp; 0.433 &amp;amp; 0 \\ 0 &amp;amp; 0.567 &amp;amp; 1 \end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== 3. Edit Propagation =====&lt;br /&gt;
After optimizing the dominant colors, we propagate these edits across the entire image to ensure smooth transitions. This propagation step leverages the CIE-Lab color space, which is perceptually uniform, meaning that the Euclidean distance in this space correlates well with human color perception. The process begins by mapping the original image and the optimized dominant colors into the Lab color space. In this space, the differences between the original and recolored dominant colors are computed to capture the adjustments made during the optimization step:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Delta L^* = \text{griddata}(\mathbf{c}^{\text{original}}, \mathbf{c}^{\text{recolored}} - \mathbf{c}^{\text{original}}, \mathbf{I}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in the Lab color space. Once the interpolated changes are computed, they are applied to the Lab representation of the original image. Finally, the adjusted Lab values are converted back to the RGB color space to reconstruct the recolored image.&lt;br /&gt;
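A sketch of the propagation step; nearest-neighbor interpolation is used here for robustness in the 3-D Lab space, which is an implementation choice, not prescribed by the text:&lt;br /&gt;

```python
import numpy as np
from scipy.interpolate import griddata

def propagate_edits(lab_image, c_original, c_recolored):
    """Propagate dominant-color edits to every pixel in Lab space by
    interpolating the per-color shift at each pixel's location."""
    pixels = lab_image.reshape(-1, 3)
    delta = griddata(c_original, c_recolored - c_original, pixels,
                     method='nearest')
    return (pixels + delta).reshape(lab_image.shape)
```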
&lt;br /&gt;
==== Method 3: Improved with Confusion Line Adjustments ====&lt;br /&gt;
This method builds upon the previous method by introducing enhancements in dominant color extraction, optimization, and edit propagation, while incorporating an additional step to adjust colors near confusion lines in the CIE 1931 xyY color space inspired by [10]. These improvements aim to further enhance contrast and naturalness of the recolored images. Moreover, this method adds flexibility in adjusting for different severity levels for each CVD type.&lt;br /&gt;
&lt;br /&gt;
===== 1. Improvements on Method 2 =====&lt;br /&gt;
To improve the performance of dominant color extraction, we transitioned from traditional K-means to MiniBatch K-means. This algorithm processes data in small batches, significantly reducing computational time while maintaining accuracy in clustering. The number of dominant colors was also reduced from 50 to 30 to focus on key representative colors and further enhance efficiency. The optimization objective is refined to leverage vectorization, improving computational efficiency. The two key terms remain:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + (1 - \beta) E_{\text{cont}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
The optimization objective was refined to significantly improve computational efficiency by replacing the nested loops in the contrast enhancement term with vectorized operations. In the original implementation, the pairwise differences between colors were calculated iteratively using &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;O(N^2)&amp;lt;/math&amp;gt; nested loops. The improved version eliminates this overhead by leveraging array broadcasting to compute all pairwise differences simultaneously, and the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is then applied to all pairwise differences in a single tensor operation:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T}_{\Delta} = \text{tensordot}(\Delta_{ij}, \mathbf{T}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the norms are computed in parallel across the entire array. Additionally, the weighting parameter &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; was adjusted to favor naturalness preservation, ensuring better visual integrity in the recolored image. &lt;br /&gt;
The propagation step was changed to use a k-d tree for fast nearest-neighbor searches, replacing grid-based interpolation. This approach more efficiently matches each pixel in the Lab color space to the closest dominant color:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{I}_{\text{adjusted}} = \mathbf{C}_{\text{recolored}}[\text{k-d tree query}(\mathbf{I})],&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in Lab space.&lt;br /&gt;
These refinements enable faster optimization while improving the balance between naturalness and contrast enhancement.&lt;br /&gt;
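The vectorized contrast computation and the k-d tree propagation can be sketched as follows (pure NumPy/SciPy, with illustrative shapes):&lt;br /&gt;

```python
import numpy as np
from scipy.spatial import cKDTree

def contrast_term_vectorized(c, c_orig, T):
    """Contrast term via broadcasting and tensordot instead of O(N^2) loops."""
    delta = c[:, None, :] - c[None, :, :]              # all pairwise differences
    t_delta = np.tensordot(delta, T, axes=([2], [1]))  # T applied to every pair
    d_orig = c_orig[:, None, :] - c_orig[None, :, :]
    t_orig = np.tensordot(d_orig, T, axes=([2], [1]))
    return np.sum((t_delta - t_orig) ** 2)

def propagate_kdtree(lab_pixels, c_original, c_recolored):
    """Map each pixel to the recolored version of its nearest dominant color."""
    _, idx = cKDTree(c_original).query(lab_pixels)
    return c_recolored[idx]
```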
&lt;br /&gt;
===== 2. Confusion Line Adjustments =====&lt;br /&gt;
An additional step adjusts colors near confusion lines in the CIE 1931 xyY color space to enhance distinguishability:&lt;br /&gt;
&lt;br /&gt;
1. Confusion lines are defined for protanopia, deuteranopia, and tritanopia, based on [10]. For example, for protanopia:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{Confusion Line: Start} = (0.735, 0.265), \quad \text{End} = (0.115, 0.885).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. Colors near the confusion line are identified using orthogonal distance:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(\mathbf{xy}, L) = \frac{\| (\mathbf{xy} - \mathbf{p}_1) \times (\mathbf{p}_2 - \mathbf{p}_1) \|}{\|\mathbf{p}_2 - \mathbf{p}_1\|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_1&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_2&amp;lt;/math&amp;gt; are the start and end points of the confusion line.&lt;br /&gt;
&lt;br /&gt;
3. Identified colors are shifted orthogonally away from the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{xy}_{\text{adjusted}} = \mathbf{xy} + \lambda \mathbf{v}_{\perp},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{v}_{\perp}&amp;lt;/math&amp;gt; is a perpendicular vector, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a scaling factor.&lt;br /&gt;
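Steps 1-3 can be sketched together as below; the distance threshold and shift magnitude lambda are illustrative values, not taken from [10]:&lt;br /&gt;

```python
import numpy as np

# Protanopia confusion-line endpoints from the text.
P1 = np.array([0.735, 0.265])
P2 = np.array([0.115, 0.885])

def shift_from_confusion_line(xy, p1=P1, p2=P2, threshold=0.05, lam=0.02):
    """Shift chromaticities (n, 2) lying near the confusion line orthogonally
    away from it; colors farther than `threshold` are left untouched."""
    line = p2 - p1
    # Signed 2-D cross product; |cross| / |line| is the orthogonal distance.
    cross = (xy[:, 0] - p1[0]) * line[1] - (xy[:, 1] - p1[1]) * line[0]
    dist = np.abs(cross) / np.linalg.norm(line)
    # Unit vector perpendicular to the line; the sign of `cross` pushes
    # each point away from (not across) the line.
    perp = np.array([line[1], -line[0]]) / np.linalg.norm(line)
    shift = np.sign(cross)[:, None] * perp[None, :] * lam
    return np.where((dist < threshold)[:, None], xy + shift, xy)
```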
&lt;br /&gt;
===== 3. Personalize with Severity Levels =====&lt;br /&gt;
To account for severity levels, the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; linearly interpolates between normal vision and full CVD perception based on severity and type:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T} = (1 - s) \mathbf{I} + s \mathbf{T}_{\text{CVD}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;s&amp;lt;/math&amp;gt; represents the severity of CVD (0-100%), &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; is the identity matrix, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}_{\text{CVD}}&amp;lt;/math&amp;gt; is the full transformation matrix specific to protanopia, deuteranopia, or tritanopia. This approach follows the DaltonLens simulator [13].&lt;br /&gt;
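As a sketch, the interpolation is a one-liner; the 3x3 matrix passed in would come from a simulator such as DaltonLens [13] (the example matrix in the test below is made up):&lt;br /&gt;

```python
import numpy as np

def severity_matrix(T_cvd, s):
    """Blend identity (normal vision) with the full-deficiency matrix by severity s in [0, 1]."""
    s = float(s)
    assert 0.0 <= s <= 1.0, "severity must be given as a fraction"
    return (1.0 - s) * np.eye(3) + s * np.asarray(T_cvd, dtype=float)
```

At s = 0 the matrix is the identity (no change), and at s = 1 it reproduces the full dichromat simulation.&lt;br /&gt;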
&lt;br /&gt;
These improvements significantly enhanced both the effectiveness and efficiency of the recoloring process on top of Method 2.&lt;br /&gt;
&lt;br /&gt;
==== Method 4: Improved with GMM-based Method ====&lt;br /&gt;
The last mathematical method we experimented with enhances recoloring by integrating a Gaussian Mixture Model (GMM)-based global recoloring algorithm. It also applies nonlinear adjustments to colors near confusion lines to improve contrast while preserving naturalness.&lt;br /&gt;
&lt;br /&gt;
===== 1. GMM-Based Global Recoloring =====&lt;br /&gt;
The image is first resized and transformed into the Lab color space. A GMM is applied to cluster the color distribution into &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; components, optimizing the number of clusters using the Bayesian Information Criterion (BIC):&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{BIC} = -2 \cdot \text{log-likelihood} + P \cdot \log(N),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;P&amp;lt;/math&amp;gt; is the number of model parameters and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; is the number of pixels.&lt;br /&gt;
&lt;br /&gt;
The GMM means are simulated using the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T&amp;lt;/math&amp;gt; with severity levels taken into account, and the symmetric Kullback-Leibler (KL) divergence (&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{\text{sKL}}&amp;lt;/math&amp;gt;) is calculated between pairs of clusters:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{\text{sKL}}(i, j) = D_{\text{KL}}(G_i \| G_j) + D_{\text{KL}}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt; are Gaussian components, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{\text{KL}}&amp;lt;/math&amp;gt; represents the KL divergence. The GMM cluster means are then adjusted by solving a nonlinear least-squares problem to minimize the discrepancy.&lt;br /&gt;
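The symmetric KL divergence between two Gaussian components has a closed form; a NumPy sketch (the means and covariances would come from the fitted GMM and its CVD-simulated counterpart):&lt;br /&gt;

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """Closed-form KL divergence D_KL(N(mu0, S0) || N(mu1, S1))."""
    d = len(mu0)
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def sym_kl(mu0, S0, mu1, S1):
    """Symmetric KL used to score how confusable two clusters are after simulation."""
    return kl_gauss(mu0, S0, mu1, S1) + kl_gauss(mu1, S1, mu0, S0)
```

A small symmetric KL between two simulated cluster means indicates a color pair that a CVD observer would struggle to tell apart, which the least-squares step then pushes apart.&lt;br /&gt;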
&lt;br /&gt;
===== 2. Improved Adjustments Near Confusion Lines =====&lt;br /&gt;
Following global recoloring, colors near confusion lines in the CIE 1931 xyY color space are further adjusted based on formulas used in Method 3. Nonlinear scaling is applied to amplify the shifts for pixels closer to the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
w = \left( \frac{\text{threshold} - d}{\text{threshold}} \right)^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w&amp;lt;/math&amp;gt; is the scaling factor. &lt;br /&gt;
&lt;br /&gt;
The adjustments from the GMM and confusion line steps are combined to produce the final recolored image. These enhancements make the method more robust and effective for individuals with varying levels of CVD.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user, we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in later sections; an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and able to integrate the user label, they require a pre-existing ground-truth example of the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information during training. They generally produce more natural-looking images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from the open-source RGB image dataset of [2]: to improve their model&#039;s ability to enhance contrast between CVD-indistinguishable color pairs, the authors assembled 141,000 unlabeled pictures of natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 images and recolored them by simulating random labels for CVD type and severity. The ground-truth recoloring was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org SciPy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
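The label encoding can be sketched as a small helper; the function name and dictionary are illustrative, not taken from the project code:&lt;br /&gt;

```python
import numpy as np

def cvd_label(cvd_type, severity):
    """Encode the user condition as severity * [protan, deutan], as described above."""
    assert 0.1 <= severity <= 1.0, "severity is sampled in [0.1, 1.0]"
    base = {"protan": np.array([1.0, 0.0]),
            "deutan": np.array([0.0, 1.0])}[cvd_type]
    return severity * base
```

For example, protanopia at 60% severity yields the label [0.6, 0.0], matching the example in the text.&lt;br /&gt;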
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to the three parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. Their outputs are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was the pixel-wise mean-squared error (MSE) loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground-truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
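In PyTorch this is simply torch.nn.MSELoss(); a NumPy equivalent makes the definition explicit:&lt;br /&gt;

```python
import numpy as np

def mse_loss(pred, target):
    """Mean-squared error averaged over all pixels (and channels)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))
```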
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
In a similar fashion of inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: Recolored (model output) and ground-truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: the l-th layer feature map of the pre-trained VGG network, with &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; its number of elements&lt;br /&gt;
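The layer-wise accumulation can be sketched as follows; here the VGG feature maps are passed in as plain arrays, whereas in PyTorch they would come from intermediate activations of a pretrained torchvision VGG:&lt;br /&gt;

```python
import numpy as np

def vgg_perceptual_loss(feats_out, feats_gt):
    """Sum over layers l of (1/N_l) * ||phi_l(I) - phi_l(I')||_2^2."""
    total = 0.0
    for f_out, f_gt in zip(feats_out, feats_gt):
        f_out = np.asarray(f_out, dtype=float)
        f_gt = np.asarray(f_gt, dtype=float)
        total += np.mean((f_out - f_gt) ** 2)   # mean = squared L2 norm over N_l elements
    return float(total)
```

Comparing feature maps rather than raw pixels penalizes perceptual differences, which is why this loss tends to preserve structure better than plain MSE.&lt;br /&gt;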
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: Two distinct pixels in the image&lt;br /&gt;
* c_x, c_y: CVD-simulated colors of the original image&lt;br /&gt;
* ĉ&#039;_x, ĉ&#039;_y: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* ||ω||: Number of pixel pairs in the global window (the whole image)&lt;br /&gt;
* ||ω_x||: Size of the local neighborhood around pixel x&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural color distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{natural}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored images respectively&lt;br /&gt;
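Putting the pieces together, the weighting scheme can be sketched as a small helper; the default coefficients alpha = 0.25 and beta = 1.0 are the values reported in the Results section:&lt;br /&gt;

```python
def total_loss(l_naturalness, l_global, l_local, alpha=0.25, beta=1.0):
    """Combine the loss terms per the formulas above:
    total = alpha * naturalness + 2 * (1 - alpha) * contrast,
    with contrast = beta * global + (2 - beta) * local."""
    l_contrast = beta * l_global + (2.0 - beta) * l_local
    return alpha * l_naturalness + 2.0 * (1.0 - alpha) * l_contrast
```

Lowering alpha shifts the training objective from naturalness toward contrast enhancement, and beta trades global against local contrast.&lt;br /&gt;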
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
!   !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | Performance&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | SSIM Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.9985 || 0.9985 || 0.9985 || 0.9985&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.9565 || 0.9986 || 0.9986 || 0.9968&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | TCC Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0004 || 0.0003 || 0.0003 || 0.0003&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.0380 || 0.0003 || 0.0002 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CD ΔE76 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0462 || 0.0462 || 0.0462 || 0.0462&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 8.4251 || 0.0458 || 0.0435 || 0.0578&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE2000 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0681 || 0.0681 || 0.0681 || 0.0681&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 6.9145 || 0.0671 || 0.0630 || 0.0838&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE94 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0461 || 0.0461 || 0.0461 || 0.0461&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 5.3878 || 0.0457 || 0.0434 || 0.0576&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | D-CIELAB ΔEab Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 1.7209 || 1.7209 || 1.7209 || 1.7209&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 1.5926 || 1.9673 || 1.4363 || 2.4009&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics adapted from [1] and [2], namely the Structural Similarity Index (SSIM), total color contrast (TCC), chromatic difference (CD), and inference time, were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated against expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored image shows compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The output pixels appeared discretized, suggesting that channel disentanglement was not helpful here (especially for naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than those of the originals, indicating that the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness in the recolored images was seen (possibly due to naturalness factor being more prioritized even though weight coefficients in the loss term favored contrast (alpha = 0.25, beta = 1.0)).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to score and evaluate metrics for comparison with related work only using the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; please refer to the definitions in those papers, as we use the same ones. On a high level, the three components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(&amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weighting constant, not a wavelength; l, a, b are the Lab-space coordinates of the original image, and primed values are those of the recolored image.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
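The chromatic difference can be computed directly from the Lab images; the value of the lightness weight lam below is an illustrative assumption:&lt;br /&gt;

```python
import numpy as np

def chromatic_difference(lab_orig, lab_recolored, lam=0.5):
    """Per-pixel CD = sqrt(lam*(l'-l)^2 + (a'-a)^2 + (b'-b)^2) in Lab space."""
    diff = np.asarray(lab_recolored, dtype=float) - np.asarray(lab_orig, dtype=float)
    return np.sqrt(lam * diff[..., 0] ** 2 + diff[..., 1] ** 2 + diff[..., 2] ** 2)
```

Down-weighting the lightness term focuses the metric on chromatic (a, b) changes, which is what recoloring primarily alters.&lt;br /&gt;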
&lt;br /&gt;
The key results are in Table 2 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2] for protan/deutan whichever is larger.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but not as high as in paper [2], because they optimize a separate network for each CVD type.&lt;br /&gt;
* Outputs are somewhat blurry (SSIM is not optimized strongly enough).&lt;br /&gt;
* Mixing CVD types in a single network needs a more sophisticated approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate, making them feasible for real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement using local and global losses was effective in improving visibility for CVD individuals. Loss functions borrowed from the Swin-transformer-based method of [2] added robustness. &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. JOSA A, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
[12] Color-Blindness.com. (n.d.). COBLIS - Color Blindness Simulator. Retrieved December 13, 2024, from https://www.color-blindness.com/coblis-color-blindness-simulator/&lt;br /&gt;
&lt;br /&gt;
[13] DaltonLens. (n.d.). DaltonLens-Python [Computer software]. GitHub. Retrieved December 13, 2024, from https://github.com/DaltonLens/DaltonLens-Python&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 220px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400 px|Wikipedia encyclopedia]][[File:eb_2.png|400 px]] &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 800px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|300 px|center|thumb|Losses - Conditional Autoencoder]][[File:loss_unet.png|300 px|thumb|center|Losses - Conditional U-Net]][[File:loss_mlp.png|300 px|center|thumb|Losses - Conditional MLP]]&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60755</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60755"/>
		<updated>2024-12-13T10:25:28Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* GMM-based Method */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected values are added back to the original image to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
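A minimal sketch of this pipeline in NumPy; the `simulate` argument stands in for a dichromat simulation (such as the Brettel-style simulation described under Method 1 below) and is an assumption here, not part of the correction itself:&lt;br /&gt;

```python
import numpy as np

# Daltonize-style correction matrix from the text: the lost (invisible)
# information in the error is redistributed into the remaining channels.
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize(rgb, simulate):
    """rgb: float array (..., 3) in [0, 1]; simulate: a function returning
    the image as perceived by a dichromat (assumed given)."""
    error = rgb - simulate(rgb)       # information the viewer cannot see
    shifted = error @ CORRECTION.T    # rotate it into visible channels
    return np.clip(rgb + shifted, 0.0, 1.0)
```

Because the correction matrix has a zero first row, an error confined to the red channel is redistributed entirely into the green and blue channels.&lt;br /&gt;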
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
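As an illustration (not the authors' implementation), the two terms above can be evaluated directly with array operations, assuming `c` and `c_plus` are (N, 3) arrays of original and recolored pixel colors:&lt;br /&gt;

```python
import numpy as np

def total_loss(c_plus, c, beta):
    """E = beta * E_nat + E_cont for (N, 3) color arrays."""
    # Naturalness: keep each recolored pixel close to its original.
    e_nat = np.sum((c_plus - c) ** 2)
    # Contrast: keep pairwise color differences close to the originals
    # (the i == j terms are zero, so summing over all pairs is harmless).
    d_plus = c_plus[:, None, :] - c_plus[None, :, :]   # (N, N, 3)
    d_orig = c[:, None, :] - c[None, :, :]
    e_cont = np.sum((d_plus - d_orig) ** 2)
    return beta * e_nat + e_cont
```

Note that a uniform shift of all colors leaves the contrast term at zero and is penalized only through the naturalness term.&lt;br /&gt;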
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
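This is the standard perpendicular point-to-line distance; a direct transcription of the formula above:&lt;br /&gt;

```python
import math

def dist_to_confusion_line(v, cp, p0):
    """Perpendicular distance from chromaticity v = (xv, yv) to the line
    through the copunctal point cp = (xcp, ycp) and reference point p0."""
    (xv, yv), (xcp, ycp), (x0, y0) = v, cp, p0
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    den = math.hypot(xcp - x0, ycp - y0)
    return num / den
```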
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. High-ranking colors, determined by their prominence in image clusters, are shifted to the nearest unoccupied confusion lines. This reallocation ensures that these colors are distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIEL*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
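The KL divergence between two Gaussians has a well-known closed form, so the symmetric divergence can be computed directly; this sketch uses that standard closed form (it is not code from [11]):&lt;br /&gt;

```python
import numpy as np

def kl_gauss(mu_i, cov_i, mu_j, cov_j):
    """Closed-form KL(G_i || G_j) between two multivariate Gaussians."""
    d = mu_i.shape[0]
    inv_j = np.linalg.inv(cov_j)
    diff = mu_j - mu_i
    term_trace = np.trace(inv_j @ cov_i)
    term_mahal = diff @ inv_j @ diff
    term_logdet = np.log(np.linalg.det(cov_j) / np.linalg.det(cov_i))
    return 0.5 * (term_trace + term_mahal - d + term_logdet)

def sym_kl(mu_i, cov_i, mu_j, cov_j):
    """Symmetric KL divergence between two Gaussian components."""
    return (kl_gauss(mu_i, cov_i, mu_j, cov_j)
            + kl_gauss(mu_j, cov_j, mu_i, cov_i))
```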
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions; as a result, the recolored images can sometimes look unnatural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to deploy in real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning-based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other GAN approaches struggle to balance naturalness and contrast. This particular method also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. This design handles high-resolution images efficiently and has been applied to various computer-vision tasks, including image classification and object detection. Despite its strong performance, the architecture remains computationally intensive and does not inherently account for the specific needs of individuals with CVD, as it lacks mechanisms for personalized color adjustment.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations through a triple-latent structure, the model supports continuous personalization, adjusting images to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications: in our experiment, one epoch (a single pass over the entire dataset) took around 18 days.&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD, personalized to different severity levels. We started by exploring existing methods and identifying opportunities for improvement. Since mathematical-based approaches provide a solid foundation and are well documented, we began our experiments by testing these methods, as described in the background, and later extended our exploration to deep learning-based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a Baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
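Put together, the simulation step can be sketched in NumPy using the coefficients listed above; converting the simulated LMS response back to RGB (needed for display) is omitted here:&lt;br /&gt;

```python
import numpy as np

# RGB-to-LMS matrix from the text (Stockman and Sharpe cone fundamentals).
RGB_TO_LMS = np.array([[0.3904725,  0.54990437, 0.00890159],
                       [0.07092586, 0.96310739, 0.00135809],
                       [0.02314268, 0.12801221, 0.93605194]])

# Rows that replace the missing cone response (coefficients from the text).
SIM = {
    "protanopia":   np.array([0.0, 0.90822864, 0.008192]),     # L from M, S
    "deuteranopia": np.array([1.10104433, 0.0, -0.00901975]),  # M from L, S
    "tritanopia":   np.array([-0.15773032, 1.19465634, 0.0]),  # S from L, M
}
ROW = {"protanopia": 0, "deuteranopia": 1, "tritanopia": 2}

def simulate_cvd(rgb, kind):
    """rgb: float array (..., 3) in [0, 1]; returns the simulated LMS
    response, with the missing cone's channel replaced."""
    lms = rgb @ RGB_TO_LMS.T
    lms[..., ROW[kind]] = lms @ SIM[kind]   # own-channel weight is zero
    return lms
```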
&lt;br /&gt;
The error between the original and simulated values is then mapped into the RGB color space using a deficiency-specific correction matrix, which adjusts the image to enhance contrast and recover lost color differences. The correction matrix is applied to the error in RGB space, the result is transformed back into LMS space for final adjustments, and the corrected values are added to the original values, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Method 2: Optimizing Objective Function ====&lt;br /&gt;
To improve the results from the Daltonization method, we designed a framework inspired by methods discussed in the background, incorporating dominant color extraction, optimization-based recoloring, and edit propagation. This approach aims to find a balance between the naturalness and contrast while compensating colors that are not visible for corresponding CVD types.&lt;br /&gt;
&lt;br /&gt;
===== 1. Extraction of Dominant Colors =====&lt;br /&gt;
We begin by extracting the dominant colors from the input image using fuzzy clustering via a K-means algorithm. This step identifies a reduced set of representative colors that capture the primary color information in the image:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{C} = \{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; represents the number of clusters, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i&amp;lt;/math&amp;gt; represents the centroid of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th cluster.&lt;br /&gt;
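As a stand-in for this clustering step, a plain Lloyd's K-means over the image pixels looks as follows (our pipeline uses many more clusters; the small default and the distinct-color initialization here are illustrative assumptions):&lt;br /&gt;

```python
import numpy as np

def dominant_colors(image, n_colors=8, n_iter=20, seed=0):
    """Lloyd's K-means over the pixels of an (H, W, 3) image.
    Returns (centroids, labels)."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    rng = np.random.default_rng(seed)
    # Initialize from distinct pixel colors so no cluster starts empty.
    uniq = np.unique(pixels, axis=0)
    centroids = uniq[rng.choice(len(uniq), n_colors, replace=False)].copy()
    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(n_iter):
        # Assign each pixel to its nearest centroid.
        dists = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned pixels.
        for k in range(n_colors):
            members = pixels[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return centroids, labels
```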
&lt;br /&gt;
===== 2. Optimization-Based Recoloring =====&lt;br /&gt;
Once the dominant colors are extracted, we apply an optimization process to adjust these colors. The optimization uses the formulas mentioned in [9], and aims to balance two key objectives:&lt;br /&gt;
&lt;br /&gt;
1. Naturalness Preservation: Ensures the recolored image minimally deviates from the original.&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{nat}} = \sum_{i=1}^N \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_i^{\text{original}}) \|^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is the transformation matrix based on the severity and type of CVD, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i^{\text{original}}&amp;lt;/math&amp;gt; is the original color.&lt;br /&gt;
&lt;br /&gt;
2. Contrast Enhancement: Improves the differentiation of colors for individuals with CVD:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{cont}} = \sum_{i=1}^N \sum_{j&amp;gt;i} \left( \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_j) \|^2 - \| \mathbf{c}_i^{\text{original}} - \mathbf{c}_j^{\text{original}} \|^2 \right)^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The total objective function combines these two terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + E_{\text{cont}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; controls the trade-off between naturalness and contrast.&lt;br /&gt;
&lt;br /&gt;
Optimization is performed using the L-BFGS-B algorithm to ensure efficient convergence under bounded constraints.&lt;br /&gt;
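The optimization step can be sketched with SciPy's L-BFGS-B; the objective below follows the two formulas above, and the box bounds (assumed here to be [0, 1] per channel) supply the bounded constraints:&lt;br /&gt;

```python
import numpy as np
from scipy.optimize import minimize

def recolor(colors, T, beta=0.5):
    """Minimize E = beta * E_nat + E_cont over N dominant colors.
    colors: (N, 3) in [0, 1]; T: 3x3 CVD transformation matrix."""
    n = colors.shape[0]
    d_orig = colors[:, None, :] - colors[None, :, :]
    norm_orig_sq = np.sum(d_orig ** 2, axis=2)
    iu = np.triu_indices(n, k=1)          # pairs with j greater than i

    def energy(flat):
        c = flat.reshape(n, 3)
        # E_nat: squared norm of T applied to each color's displacement.
        e_nat = np.sum(((c - colors) @ T.T) ** 2)
        # E_cont: squared deviation of transformed pairwise distances.
        d = c[:, None, :] - c[None, :, :]
        norm_t_sq = np.sum((d @ T.T) ** 2, axis=2)
        e_cont = np.sum((norm_t_sq[iu] - norm_orig_sq[iu]) ** 2)
        return beta * e_nat + e_cont

    res = minimize(energy, colors.ravel(), method="L-BFGS-B",
                   bounds=[(0.0, 1.0)] * (3 * n))
    return res.x.reshape(n, 3)
```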
&lt;br /&gt;
The transformation matrices for each type of CVD are the following, which are based on [12]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Protanopia}} = \begin{bmatrix} 0.566 &amp;amp; 0.558 &amp;amp; 0 \\ 0.433 &amp;amp; 0.442 &amp;amp; 0.242 \\ 0 &amp;amp; 0 &amp;amp; 0.758 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Deuteranopia}} = \begin{bmatrix} 0.625 &amp;amp; 0.7 &amp;amp; 0 \\ 0.375 &amp;amp; 0.3 &amp;amp; 0.3 \\ 0 &amp;amp; 0 &amp;amp; 0.7 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Tritanopia}} = \begin{bmatrix} 0.95 &amp;amp; 0 &amp;amp; 0 \\ 0.05 &amp;amp; 0.433 &amp;amp; 0 \\ 0 &amp;amp; 0.567 &amp;amp; 1 \end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== 3. Edit Propagation =====&lt;br /&gt;
After optimizing the dominant colors, we propagate these edits across the entire image to ensure smooth transitions. This propagation step leverages the CIE-Lab color space, which is perceptually uniform, meaning that the Euclidean distance in this space correlates well with human color perception. The process begins by mapping the original image and the optimized dominant colors into the Lab color space. In this space, the differences between the original and recolored dominant colors are computed to capture the adjustments made during the optimization step:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Delta L^* = \text{griddata}(\mathbf{c}^{\text{original}}, \mathbf{c}^{\text{recolored}} - \mathbf{c}^{\text{original}}, \mathbf{I}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in the Lab color space. Once the interpolated changes are computed, they are applied to the Lab representation of the original image. Finally, the adjusted Lab values are converted back to the RGB color space to reconstruct the recolored image.&lt;br /&gt;
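A sketch of this propagation with `scipy.interpolate.griddata`; the nearest-neighbour fallback is an assumption we add here, since linear interpolation returns NaN for pixels outside the convex hull of the dominant colors:&lt;br /&gt;

```python
import numpy as np
from scipy.interpolate import griddata

def propagate_edits(lab_image, lab_orig_colors, lab_recolored_colors):
    """Spread the per-dominant-color shifts over every pixel by
    interpolating in Lab space. lab_image: (H, W, 3)."""
    h, w, _ = lab_image.shape
    pixels = lab_image.reshape(-1, 3)
    shift = lab_recolored_colors - lab_orig_colors
    delta = np.empty_like(pixels)
    for ch in range(3):
        vals = griddata(lab_orig_colors, shift[:, ch], pixels, method="linear")
        # Outside the hull of the sample points, fall back to nearest.
        nan_mask = np.isnan(vals)
        if nan_mask.any():
            vals[nan_mask] = griddata(lab_orig_colors, shift[:, ch],
                                      pixels[nan_mask], method="nearest")
        delta[:, ch] = vals
    return (pixels + delta).reshape(h, w, 3)
```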
&lt;br /&gt;
==== Method 3: Improved with Confusion Line Adjustments ====&lt;br /&gt;
This method builds upon the previous method by introducing enhancements in dominant color extraction, optimization, and edit propagation, while incorporating an additional step to adjust colors near confusion lines in the CIE 1931 xyY color space inspired by [10]. These improvements aim to further enhance contrast and naturalness of the recolored images. Moreover, this method adds flexibility in adjusting for different severity levels for each CVD type.&lt;br /&gt;
&lt;br /&gt;
===== 1. Improvements on Method 2 =====&lt;br /&gt;
To improve the performance of dominant color extraction, we transitioned from traditional K-means to MiniBatch K-means. This algorithm processes data in small batches, significantly reducing computational time while maintaining clustering accuracy. The number of dominant colors was also reduced from 50 to 30 to focus on key representative colors and further enhance efficiency. The optimization objective retains the same two key terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + (1 - \beta) E_{\text{cont}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
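The mini-batch clustering used for dominant color extraction can be sketched in plain NumPy as below (in practice a library implementation such as scikit-learn&#039;s MiniBatchKMeans would be used; the batch size and iteration count here are illustrative):&lt;br /&gt;

```python
import numpy as np

def minibatch_kmeans(pixels, k=30, batch=256, iters=100, seed=0):
    """Minimal mini-batch k-means for dominant-color extraction.

    pixels: (N, 3) flattened color values. Returns (k, 3) cluster
    centers, i.e. the dominant colors.
    """
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), k, replace=False)].astype(float)
    counts = np.zeros(k)
    for _ in range(iters):
        b = pixels[rng.choice(len(pixels), batch)]
        # assign each batch sample to its nearest center
        d = np.linalg.norm(b[:, None, :] - centers[None, :, :], axis=2)
        idx = d.argmin(axis=1)
        for j, x in zip(idx, b):
            counts[j] += 1
            # per-center learning rate 1/counts[j]: a running mean of
            # all samples ever assigned to this center
            centers[j] += (x - centers[j]) / counts[j]
    return centers
```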
The optimization objective was refined to significantly improve computational efficiency by replacing the nested loops in the contrast enhancement term with vectorized operations. In the original implementation, the pairwise differences between colors were calculated iteratively using &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;O(N^2)&amp;lt;/math&amp;gt; nested loops. The improved version eliminates this overhead by leveraging array broadcasting to compute all pairwise differences simultaneously, and the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is then applied to all pairwise differences in a single tensor operation:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T}_{\Delta} = \text{tensordot}(\Delta_{ij}, \mathbf{T}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the norms are computed in parallel across the entire array. Additionally, the weighting parameter &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; was adjusted to favor naturalness preservation, ensuring better visual integrity in the recolored image. &lt;br /&gt;
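The broadcasting-plus-tensordot refinement can be sketched as follows; the helper name is our own, and T stands for the CVD simulation matrix from the earlier sections:&lt;br /&gt;

```python
import numpy as np

def pairwise_contrast_vectorized(colors, T):
    """Norms of T applied to all pairwise color differences at once,
    replacing the O(N^2) Python-level double loop.

    colors: (N, 3) dominant colors; T: (3, 3) CVD simulation matrix.
    Returns an (N, N) array of pairwise simulated-contrast norms.
    """
    diffs = colors[:, None, :] - colors[None, :, :]    # (N, N, 3) via broadcasting
    # contract the color axis of diffs with the column axis of T,
    # i.e. apply T to every pairwise difference in one tensor operation
    t_diffs = np.tensordot(diffs, T, axes=([2], [1]))  # (N, N, 3)
    return np.linalg.norm(t_diffs, axis=2)             # norms in parallel
```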
The propagation step changed to use a k-d tree for fast nearest neighbor searches, replacing grid-based interpolation. This approach more efficiently matches each pixel in the Lab color space to the closest dominant color:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{I}_{\text{adjusted}} = \mathbf{C}_{\text{recolored}}[\text{k-d tree query}(\mathbf{I})],&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in Lab space.&lt;br /&gt;
These refinements enable faster optimization while improving the balance between naturalness and contrast enhancement.&lt;br /&gt;
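The k-d tree propagation formula above can be sketched with SciPy as below; following the formula literally, each pixel is mapped to the recolored version of its nearest dominant color (a softer variant would instead add only the per-cluster edit to each pixel):&lt;br /&gt;

```python
import numpy as np
from scipy.spatial import cKDTree

def propagate_with_kdtree(image_lab, dominant_orig, dominant_recolored):
    """Replace each pixel with the optimized version of its nearest
    dominant color, using a k-d tree for fast nearest-neighbor queries."""
    h, w, _ = image_lab.shape
    pixels = image_lab.reshape(-1, 3)
    tree = cKDTree(dominant_orig)      # built once over the dominant colors
    _, idx = tree.query(pixels)        # nearest dominant color per pixel
    return dominant_recolored[idx].reshape(h, w, 3)
```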
&lt;br /&gt;
===== 2. Confusion Line Adjustments =====&lt;br /&gt;
An additional step adjusts colors near confusion lines in the CIE 1931 xyY color space to enhance distinguishability:&lt;br /&gt;
&lt;br /&gt;
1. Confusion lines are defined for protanopia, deuteranopia, and tritanopia, based on [10]. For example, for protanopia:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{Confusion Line: Start} = (0.735, 0.265), \quad \text{End} = (0.115, 0.885).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. Colors near the confusion line are identified using orthogonal distance:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(\mathbf{xy}, L) = \frac{\| (\mathbf{xy} - \mathbf{p}_1) \times (\mathbf{p}_2 - \mathbf{p}_1) \|}{\|\mathbf{p}_2 - \mathbf{p}_1\|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_1&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_2&amp;lt;/math&amp;gt; are the start and end points of the confusion line.&lt;br /&gt;
&lt;br /&gt;
3. Identified colors are shifted orthogonally away from the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{xy}_{\text{adjusted}} = \mathbf{xy} + \lambda \mathbf{v}_{\perp},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{v}_{\perp}&amp;lt;/math&amp;gt; is a perpendicular vector, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a scaling factor.&lt;br /&gt;
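The three steps above can be sketched in NumPy as follows; the distance threshold and the scaling factor are illustrative values, and points lying exactly on the line (sign 0) are left in place:&lt;br /&gt;

```python
import numpy as np

def adjust_near_confusion_line(xy, p1, p2, threshold=0.05, lam=0.1):
    """Shift chromaticities lying near a confusion line away from it.

    xy: (N, 2) xy chromaticities; p1, p2: line start and end points;
    lam: shift strength along the unit normal to the line.
    """
    line_unit = (p2 - p1) / np.linalg.norm(p2 - p1)
    v_perp = np.array([-line_unit[1], line_unit[0]])   # unit normal to the line
    side = (xy - p1) @ v_perp       # signed orthogonal distance to the line
    near = np.less(np.abs(side), threshold)
    # push each near point further out on the side it already lies on
    shift = lam * np.sign(side)[:, None] * v_perp[None, :]
    return np.where(near[:, None], xy + shift, xy)
```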
&lt;br /&gt;
These improvements significantly enhanced both the effectiveness and efficiency of the recoloring process for individuals with CVD on top of Method 2.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in further sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and integrate the user label, they require an existing ground-truth recolored image as the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information while training. They are generally better at generating more natural images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We first gathered the open-source RGB image dataset from [2]: to improve their model&#039;s ability to enhance contrast between CVD-indistinguishable color pairs, the authors created a dataset of 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 images and recolored them by simulating random labels for the severity and type of CVD. The ground-truth recoloring was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org Scipy], and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
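The label encoding above can be sketched as follows (function names are our own; the two-way vector covers the protan/deutan encoding described here):&lt;br /&gt;

```python
import numpy as np

def make_cvd_label(cvd_type, severity):
    """Encode the user condition as severity * [protan, deutan]:
    e.g. ("protan", 0.6) encodes protanopia at 60% severity."""
    onehot = {"protan": np.array([1.0, 0.0]), "deutan": np.array([0.0, 1.0])}
    return severity * onehot[cvd_type]

def sample_random_label(rng):
    """Draw a random CVD type and a severity in [0.1, 1.0], as done when
    generating the ground-truth recolorings."""
    cvd_type = rng.choice(["protan", "deutan"])
    severity = rng.uniform(0.1, 1.0)
    return make_cvd_label(cvd_type, severity)
```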
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to three parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. The outputs of the three networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground-truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
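A framework-agnostic sketch of this architecture&#039;s forward pass and loss is shown below (the project used PyTorch; the random weights and hidden size here are illustrative stand-ins):&lt;br /&gt;

```python
import numpy as np

def mlp_forward(x, w1, w2):
    """One tiny MLP: input -> hidden (ReLU) -> scalar channel output."""
    h = np.maximum(x @ w1, 0.0)
    return h @ w2

def parallel_rgb_mlp(pixels, label, weights):
    """Predict R, G, B with three independent MLPs on (pixel, label) inputs.

    pixels: (N, 3) RGB values; label: (2,) CVD label; weights: list of
    three (w1, w2) pairs, one per output channel.
    """
    # concatenate the label encoding onto every pixel, as in Figure 2
    x = np.concatenate([pixels, np.tile(label, (len(pixels), 1))], axis=1)
    channels = [mlp_forward(x, w1, w2) for (w1, w2) in weights]
    return np.stack(channels, axis=1)        # (N, 3) recolored pixels

def mse_loss(pred, target):
    """Pixel-wise mean-squared error between output and ground truth."""
    return np.mean((pred - target) ** 2)
```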
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
Using the same inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: The recolored (model output) and ground-truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: The l-th layer feature map of the pre-trained VGG network, with &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; elements&lt;br /&gt;
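A sketch of this loss computation is shown below; the per-layer feature maps phi_l(I) are passed in as precomputed arrays, since loading an actual pretrained VGG is outside the scope of this snippet:&lt;br /&gt;

```python
import numpy as np

def perceptual_loss(feats_pred, feats_target):
    """VGG-style perceptual loss: squared distance between feature maps
    at each chosen layer, normalized by the layer size N_l and summed
    over layers.

    feats_pred / feats_target: lists of same-shaped arrays, one per
    layer l, standing in for phi_l(I) and phi_l(I') respectively.
    """
    total = 0.0
    for fp, ft in zip(feats_pred, feats_target):
        total += np.sum((fp - ft) ** 2) / fp.size   # (1 / N_l) * ||.||^2
    return total
```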
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: Two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: Size of the global (or large) window of pixel pairs&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: Size of the local window or neighborhood around a pixel x&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{natural}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored images respectively&lt;br /&gt;
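A single-window NumPy sketch of the naturalness term is below; real SSIM implementations slide a Gaussian window over the image, and the constants follow the usual SSIM defaults for signals in [0, 1]:&lt;br /&gt;

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed once over whole images in [0, 1] (no sliding window)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den

def naturalness_loss(original, recolored):
    """1 - SSIM between the recolored output and the original image."""
    return 1.0 - ssim_global(recolored, original)
```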
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
!   !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | Performance&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | SSIM Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.9985 || 0.9985 || 0.9985 || 0.9985&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.9565 || 0.9986 || 0.9986 || 0.9968&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | TCC Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0004 || 0.0003 || 0.0003 || 0.0003&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.0380 || 0.0003 || 0.0002 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CD ΔE76 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0462 || 0.0462 || 0.0462 || 0.0462&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 8.4251 || 0.0458 || 0.0435 || 0.0578&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE2000 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0681 || 0.0681 || 0.0681 || 0.0681&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 6.9145 || 0.0671 || 0.0630 || 0.0838&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE94 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0461 || 0.0461 || 0.0461 || 0.0461&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 5.3878 || 0.0457 || 0.0434 || 0.0576&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | D-CIELAB ΔEab Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 1.7209 || 1.7209 || 1.7209 || 1.7209&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 1.5926 || 1.9673 || 1.4363 || 2.4009&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: the Conditional Parallel RGB MLP, Conditional U-Net, and Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), total color contrast (TCC), chromatic difference (CD), and inference time, as defined in [1] and [2], were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness is observed in the CVD-simulated recolored images compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than those of the originals, meaning the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images, possibly because the naturalness factor was effectively prioritized even though the weight coefficients in the loss favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to score and evaluate metrics for comparison with related work only using the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]. Please refer to the definitions in the paper, as we have used the same. On a high level, the three components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a constant, not a wavelength; l, a, b are the LAB-space coordinates of the recolored (&#039;) and original images respectively.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
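The CD formula above can be sketched as follows (the lambda value here is illustrative, not the constant used in the evaluation):&lt;br /&gt;

```python
import numpy as np

def chromatic_difference(lab_orig, lab_rec, lam=0.2):
    """Per-pixel chromatic difference in Lab space, down-weighting the
    lightness term by the constant lambda."""
    dl = lab_rec[..., 0] - lab_orig[..., 0]
    da = lab_rec[..., 1] - lab_orig[..., 1]
    db = lab_rec[..., 2] - lab_orig[..., 2]
    return np.sqrt(lam * dl ** 2 + da ** 2 + db ** 2)
```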
&lt;br /&gt;
The key results are in Table 2 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2] for protan/deutan whichever is larger.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but below those of paper [2], which optimizes a separate network for each CVD type.&lt;br /&gt;
* Outputs are blurry (SSIM is not weighted heavily enough in the loss).&lt;br /&gt;
* Mixing CVD types in the same network needs a more sophisticated approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate, making them feasible for real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from Swin architecture added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. JOSA A, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
[12] Color-Blindness.com. (n.d.). COBLIS - Color Blindness Simulator. Retrieved December 13, 2024, from https://www.color-blindness.com/coblis-color-blindness-simulator/&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 220px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400 px|Wikipedia encyclopedia]][[File:eb_2.png|400 px]] &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 800px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|300 px|center|thumb|Losses - Conditional Autoencoder]][[File:loss_unet.png|300 px|thumb|center|Losses - Conditional U-Net]][[File:loss_mlp.png|300 px|center|thumb|Losses - Conditional MLP]]&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60754</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60754"/>
		<updated>2024-12-13T10:24:19Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* Method 3: Improved with Confusion Line Adjustments */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected RGB values are added back to the original LMS values to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
&lt;br /&gt;
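As a minimal illustration of this rotation idea (the error vector below is made up): the correction matrix zeroes the red error channel, which the dichromat cannot see, and redistributes 70% of it into green and blue.

```python
import numpy as np

# The correction matrix from the text. Applied to a hypothetical
# red-channel error, it moves the invisible information into G and B.
C = np.array([[0.0, 0.0, 0.0],
              [0.7, 1.0, 0.0],
              [0.7, 0.0, 1.0]])

err = np.array([0.2, 0.0, 0.0])   # hypothetical error, all in the red channel
shift = C @ err                    # shift ~ [0.0, 0.14, 0.14]
```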
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
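For a small palette, the degree-adaptable loss above can be sketched directly in NumPy. The simulation matrix &lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T&amp;lt;/math&amp;gt;, the weights, and the function name are illustrative stand-ins, not the values from [9].

```python
import numpy as np

# Sketch of the degree-adaptable loss for N colors; T, alpha, and beta
# are placeholders for the CVD simulation matrix and weights of [9].
def degree_adaptable_loss(c, c_plus, T, alpha, beta):
    """c, c_plus: (N, 3) original and recolored colors; alpha: (N,) weights."""
    # Naturalness: weighted perceptual shift under the CVD simulation T.
    nat = np.sum(alpha * np.sum(((c_plus - c) @ T.T) ** 2, axis=1))
    # Contrast: deviation of all simulated pairwise differences.
    d_plus = (c_plus[:, None, :] - c_plus[None, :, :]) @ T.T
    d_orig = (c[:, None, :] - c[None, :, :]) @ T.T
    cont = np.sum((d_plus - d_orig) ** 2)   # i == j terms vanish
    return beta * nat + cont
```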
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
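The perpendicular-distance formula translates directly to code; the points passed in below are placeholders, not actual copunctal coordinates.

```python
import math

# Distance from a chromaticity point v to the confusion line L through
# the copunctal point and a reference point, per the formula above.
def confusion_line_distance(v, copunctal, ref):
    (xv, yv), (xcp, ycp), (x0, y0) = v, copunctal, ref
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    return num / math.hypot(xcp - x0, ycp - y0)
```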
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to enhance discriminability for CVD viewers: high-ranking colors, determined by their prominence in image clusters, are moved to the nearest unoccupied confusion lines. This reallocation makes these colors distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
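The three terms combine into the total loss as sketched below. Note the hedges: &lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; here is a toy stand-in for a real dichromat simulation, and this sketch normalizes &lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt; over pairs within cluster &lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;.

```python
import numpy as np

# Sketch of E = (E1 + E2) + lambda * E3 for 2-D key-color chromaticities.
# f_D is a toy stand-in for a real dichromat simulation.
def f_D(c):
    return c * np.array([0.5, 1.0])   # toy: compress the x chromaticity

def total_loss(a, a_rec, b, lam):
    nA, nB = len(a), len(b)
    d = lambda u, v: np.linalg.norm(u - v)
    E1 = sum(abs(d(a[i], b[j]) - d(f_D(a_rec[i]), f_D(b[j])))
             for i in range(nA) for j in range(nB)) / (nA * nB)
    E2 = sum(abs(d(a[i], a[j]) - d(f_D(a_rec[i]), f_D(a_rec[j])))
             for i in range(nA) for j in range(nA)) / (nA * nA)
    E3 = sum(d(a[i], a_rec[i]) for i in range(nA)) / nA
    return (E1 + E2) + lam * E3
```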
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIE L*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
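The KL divergence between Gaussians has a closed form, so the symmetric divergence is cheap to evaluate; a NumPy sketch (written for full covariances, though the paper assumes diagonal ones):

```python
import numpy as np

# Closed-form KL divergence between two multivariate Gaussians and its
# symmetrized version, used to keep Gaussian components distinguishable.
def kl_gauss(mu0, S0, mu1, S1):
    k = len(mu0)
    S1_inv = np.linalg.inv(S1)
    d = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + d @ S1_inv @ d - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def sym_kl(mu0, S0, mu1, S1):
    return kl_gauss(mu0, S0, mu1, S1) + kl_gauss(mu1, S1, mu0, S0)
```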
The optimization aims to preserve the contrast perceived by CVD viewers while maintaining naturalness. Weights are assigned to Gaussian components based on the perceptual importance of colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\lambda_i = \frac{\sum_{j=1}^N \alpha_j p(i|x_j, \Theta)}{\sum_{k=1}^K \sum_{j=1}^N \alpha_j p(k|x_j, \Theta)},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha_j = \|x_j - \text{Sim}(x_j)\|&amp;lt;/math&amp;gt; is the perceptual error of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature when simulated for CVD,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{Sim}(\cdot)&amp;lt;/math&amp;gt; is the simulation function for CVD perception.&lt;br /&gt;
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions and can leave some recolored colors looking unnatural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to use in real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This particular approach also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, the architecture is computationally intensive and does not inherently account for the specific needs of individuals with CVD, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations with a triple-latent structure, the method enables continuous personalization, adjusting images to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, it took around 18 days for one epoch (one iteration over the entire dataset).&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD, personalized to different severity levels. We start by exploring existing methods and identifying opportunities for improvement. Since mathematical approaches provide a solid foundation and are well documented, we began our experiments by testing these methods, as described in the background. We later extended our exploration to deep learning based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a Baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
&lt;br /&gt;
The error between the original and simulated is then mapped into the RGB color space using a deficiency-specific correction matrix, which adjusts the image to enhance contrast and recover lost color differences. The predefined correction matrix is applied to the error in RGB space, transforming it back into LMS space for final adjustments. The corrected LMS values are added back to the original values, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
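Putting Method 1 together, a compact NumPy sketch of the protanopia pipeline using the matrices quoted above. The function name is ours, and performing the error correction directly in RGB is a simplification of the pipeline described in the text.

```python
import numpy as np

# Method 1 sketch for protanopia, with RGB-to-LMS and the L-cone
# replacement coefficients quoted above, and the Daltonize-inspired
# correction matrix. Error correction is done in RGB for brevity.
RGB2LMS = np.array([[0.3904725,  0.54990437, 0.00890159],
                    [0.07092586, 0.96310739, 0.00135809],
                    [0.02314268, 0.12801221, 0.93605194]])
SIM_PROTAN = np.array([[0.0, 0.90822864, 0.008192],
                       [0.0, 1.0,        0.0],
                       [0.0, 0.0,        1.0]])
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize_protan(rgb):
    """rgb: (..., 3) array in [0, 1]; returns the recolored image."""
    lms = rgb @ RGB2LMS.T                         # RGB -> LMS
    lms_sim = lms @ SIM_PROTAN.T                  # simulate missing L cone
    rgb_sim = lms_sim @ np.linalg.inv(RGB2LMS).T  # back to RGB
    error = rgb - rgb_sim                         # what the dichromat misses
    return np.clip(rgb + error @ CORRECTION.T, 0.0, 1.0)
```

A quick sanity check: neutral grays produce almost no error, so they pass through nearly unchanged, while saturated reds are shifted toward green and blue.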
==== Method 2: Optimizing Objective Function ====&lt;br /&gt;
To improve on the results of the Daltonization method, we designed a framework inspired by the methods discussed in the background, incorporating dominant color extraction, optimization-based recoloring, and edit propagation. This approach aims to balance naturalness and contrast while compensating for colors that are not visible to viewers with the corresponding CVD type.&lt;br /&gt;
&lt;br /&gt;
===== 1. Extraction of Dominant Colors =====&lt;br /&gt;
We begin by extracting the dominant colors from the input image using K-means clustering. This step identifies a reduced set of representative colors that capture the primary color information in the image:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{C} = \{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; represents the number of clusters, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i&amp;lt;/math&amp;gt; represents the centroid of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th cluster.&lt;br /&gt;
&lt;br /&gt;
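A self-contained sketch of the extraction step: a plain Lloyd-style K-means with deterministic farthest-point initialization. In practice a library implementation (e.g. scikit-learn&#039;s KMeans) would be used instead; the function name is ours.

```python
import numpy as np

# Minimal K-means sketch for extracting N dominant colors from an
# image's pixels, given as a (num_pixels, 3) array.
def dominant_colors(pixels, n_clusters, n_iter=20):
    # Deterministic farthest-point initialization.
    centroids = [pixels[0]]
    for _ in range(n_clusters - 1):
        d = np.min([((pixels - c) ** 2).sum(-1) for c in centroids], axis=0)
        centroids.append(pixels[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        # Assign each pixel to its nearest centroid, then update means.
        labels = np.argmin(((pixels[:, None] - centroids[None]) ** 2).sum(-1),
                           axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = pixels[labels == k].mean(axis=0)
    return centroids
```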
===== 2. Optimization-Based Recoloring =====&lt;br /&gt;
Once the dominant colors are extracted, we apply an optimization process to adjust these colors. The optimization uses the formulas mentioned in [9], and aims to balance two key objectives:&lt;br /&gt;
&lt;br /&gt;
1. Naturalness Preservation: Ensures the recolored image minimally deviates from the original.&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{nat}} = \sum_{i=1}^N \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_i^{\text{original}}) \|^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is the transformation matrix based on the severity and type of CVD, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i^{\text{original}}&amp;lt;/math&amp;gt; is the original color.&lt;br /&gt;
&lt;br /&gt;
2. Contrast Enhancement: Improves the differentiation of colors for individuals with CVD:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{cont}} = \sum_{i=1}^N \sum_{j&amp;gt;i} \left( \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_j) \|^2 - \| \mathbf{c}_i^{\text{original}} - \mathbf{c}_j^{\text{original}} \|^2 \right)^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The total objective function combines these two terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + E_{\text{cont}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; controls the trade-off between naturalness and contrast.&lt;br /&gt;
&lt;br /&gt;
Optimization is performed using the L-BFGS-B algorithm to ensure efficient convergence under bounded constraints.&lt;br /&gt;
&lt;br /&gt;
The transformation matrices for each type of CVD are the following, which are based on [12]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Protanopia}} = \begin{bmatrix} 0.566 &amp;amp; 0.558 &amp;amp; 0 \\ 0.433 &amp;amp; 0.442 &amp;amp; 0.242 \\ 0 &amp;amp; 0 &amp;amp; 0.758 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Deuteranopia}} = \begin{bmatrix} 0.625 &amp;amp; 0.7 &amp;amp; 0 \\ 0.375 &amp;amp; 0.3 &amp;amp; 0.3 \\ 0 &amp;amp; 0 &amp;amp; 0.7 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Tritanopia}} = \begin{bmatrix} 0.95 &amp;amp; 0 &amp;amp; 0 \\ 0.05 &amp;amp; 0.433 &amp;amp; 0 \\ 0 &amp;amp; 0.567 &amp;amp; 1 \end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
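A sketch of the optimization step with SciPy&#039;s L-BFGS-B, using the protanopia matrix quoted above. The function name, the choice of &lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt;, and the [0, 1] bounds are our assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Protanopia transformation matrix from the text.
T_PROTAN = np.array([[0.566, 0.558, 0.0],
                     [0.433, 0.442, 0.242],
                     [0.0,   0.0,   0.758]])

def recolor(colors, T, beta=0.5):
    """Flatten the N dominant colors and minimize E = beta*E_nat + E_cont."""
    orig = colors.reshape(-1, 3)

    def energy(flat):
        c = flat.reshape(-1, 3)
        e_nat = np.sum(((c - orig) @ T.T) ** 2)
        d_new = (c[:, None] - c[None]) @ T.T   # simulated pairwise diffs
        d_old = orig[:, None] - orig[None]     # original pairwise diffs
        e_cont = np.sum((np.sum(d_new ** 2, -1)
                         - np.sum(d_old ** 2, -1)) ** 2)
        return beta * e_nat + e_cont

    res = minimize(energy, orig.ravel(), method="L-BFGS-B",
                   bounds=[(0.0, 1.0)] * orig.size)
    return res.x.reshape(-1, 3)
```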
===== 3. Edit Propagation =====&lt;br /&gt;
After optimizing the dominant colors, we propagate these edits across the entire image to ensure smooth transitions. This propagation step leverages the CIE-Lab color space, which is perceptually uniform, meaning that the Euclidean distance in this space correlates well with human color perception. The process begins by mapping the original image and the optimized dominant colors into the Lab color space. In this space, the differences between the original and recolored dominant colors are computed to capture the adjustments made during the optimization step:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Delta L^* = \text{griddata}(\mathbf{c}^{\text{original}}, \mathbf{c}^{\text{recolored}} - \mathbf{c}^{\text{original}}, \mathbf{I}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in the Lab color space. Once the interpolated changes are computed, they are applied to the Lab representation of the original image. Finally, the adjusted Lab values are converted back to the RGB color space to reconstruct the recolored image.&lt;br /&gt;
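A minimal sketch of this interpolation step with SciPy&#039;s griddata (names are illustrative; the rgb2lab/lab2rgb conversions, e.g. from scikit-image, are omitted):&lt;br /&gt;

```python
import numpy as np
from scipy.interpolate import griddata

def propagate_edits(pixels_lab, dominant_lab, recolored_lab):
    """Spread the optimized shifts of the dominant colors to all pixels.

    pixels_lab:    (N, 3) Lab values of the flattened image.
    dominant_lab:  (K, 3) dominant colors before optimization.
    recolored_lab: (K, 3) dominant colors after optimization.
    """
    # "nearest" avoids the NaNs that "linear" produces outside the
    # convex hull of the dominant colors.
    shifts = griddata(dominant_lab, recolored_lab - dominant_lab,
                      pixels_lab, method="nearest")
    return pixels_lab + shifts
```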
&lt;br /&gt;
==== Method 3: Improved with Confusion Line Adjustments ====&lt;br /&gt;
This method builds upon the previous one by introducing enhancements in dominant color extraction, optimization, and edit propagation, while incorporating an additional step, inspired by [10], that adjusts colors near confusion lines in the CIE 1931 xyY color space. These improvements aim to further enhance the contrast and naturalness of the recolored images. Moreover, this method adds flexibility in adjusting for different severity levels for each CVD type.&lt;br /&gt;
&lt;br /&gt;
===== 1. Improvements on Method 2 =====&lt;br /&gt;
To improve the performance of dominant color extraction, we transitioned from traditional K-means to MiniBatch K-means. This algorithm processes data in small batches, significantly reducing computational time while maintaining clustering accuracy. The number of dominant colors was also reduced from 50 to 30 to focus on key representative colors and further enhance efficiency. The two key terms of the optimization objective remain:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + (1 - \beta) E_{\text{cont}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
The optimization objective was refined to significantly improve computational efficiency by replacing the nested loops in the contrast enhancement term with vectorized operations. In the original implementation, the pairwise differences between colors were calculated iteratively using &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;O(N^2)&amp;lt;/math&amp;gt; nested loops. The improved version eliminates this overhead by leveraging array broadcasting to compute all pairwise differences simultaneously, and the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is then applied to all pairwise differences in a single tensor operation:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T}_{\Delta} = \text{tensordot}(\Delta_{ij}, \mathbf{T}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the norms are computed in parallel across the entire array. Additionally, the weighting parameter &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; was adjusted to favor naturalness preservation, ensuring better visual integrity in the recolored image. &lt;br /&gt;
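The vectorized contrast term can be sketched as below; the exact energy expression is illustrative, assuming the term compares simulated pairwise differences of the optimized palette against the original pairwise differences:&lt;br /&gt;

```python
import numpy as np

def contrast_energy(colors, colors_opt, T):
    """All-pairs contrast term via broadcasting: the (K, K, 3)
    difference tensors replace the original O(K^2) Python loops."""
    d_orig = colors[:, None, :] - colors[None, :, :]         # (K, K, 3)
    d_opt = colors_opt[:, None, :] - colors_opt[None, :, :]  # (K, K, 3)
    # Apply the CVD matrix T to every pairwise difference at once.
    d_cvd = np.tensordot(d_opt, T, axes=([2], [0]))
    return np.sum((np.linalg.norm(d_cvd, axis=2)
                   - np.linalg.norm(d_orig, axis=2)) ** 2)
```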
The propagation step was changed to use a k-d tree for fast nearest-neighbor searches, replacing grid-based interpolation. This approach more efficiently matches each pixel in the Lab color space to the closest dominant color:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{I}_{\text{adjusted}} = \mathbf{C}_{\text{recolored}}[\text{k-d tree query}(\mathbf{I})],&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in Lab space.&lt;br /&gt;
These refinements enable faster optimization while improving the balance between naturalness and contrast enhancement.&lt;br /&gt;
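A sketch of the k-d tree propagation with SciPy (names are illustrative; applying the nearest dominant color&#039;s shift, rather than substituting the color itself, preserves per-pixel variation):&lt;br /&gt;

```python
import numpy as np
from scipy.spatial import cKDTree

def propagate_kdtree(pixels_lab, dominant_lab, recolored_lab):
    """Match every Lab pixel to its nearest dominant color with a k-d
    tree, then apply that color's optimized shift."""
    _, idx = cKDTree(dominant_lab).query(pixels_lab)
    return pixels_lab + (recolored_lab - dominant_lab)[idx]
```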
&lt;br /&gt;
===== 2. Confusion Line Adjustments =====&lt;br /&gt;
An additional step adjusts colors near confusion lines in the CIE 1931 xyY color space to enhance distinguishability:&lt;br /&gt;
&lt;br /&gt;
1. Confusion lines are defined for protanopia, deuteranopia, and tritanopia, based on [10]. For example, for protanopia:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{Confusion Line: Start} = (0.735, 0.265), \quad \text{End} = (0.115, 0.885).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. Colors near the confusion line are identified using orthogonal distance:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(\mathbf{xy}, L) = \frac{\| (\mathbf{xy} - \mathbf{p}_1) \times (\mathbf{p}_2 - \mathbf{p}_1) \|}{\|\mathbf{p}_2 - \mathbf{p}_1\|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_1&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_2&amp;lt;/math&amp;gt; are the start and end points of the confusion line.&lt;br /&gt;
&lt;br /&gt;
3. Identified colors are shifted orthogonally away from the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{xy}_{\text{adjusted}} = \mathbf{xy} + \lambda \mathbf{v}_{\perp},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{v}_{\perp}&amp;lt;/math&amp;gt; is a perpendicular vector, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a scaling factor.&lt;br /&gt;
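The three steps above can be sketched as follows; the distance threshold and shift magnitude are assumed values:&lt;br /&gt;

```python
import numpy as np

# Protanopia confusion line endpoints in xy chromaticity, from the text.
P1 = np.array([0.735, 0.265])
P2 = np.array([0.115, 0.885])

def adjust_near_confusion_line(xy, p1=P1, p2=P2, thresh=0.05, lam=0.1):
    """Push chromaticities lying within `thresh` of the confusion line
    orthogonally away from it (`thresh` and `lam` are assumed values)."""
    d = p2 - p1
    n = np.array([-d[1], d[0]]) / np.linalg.norm(d)  # unit normal to the line
    signed = (xy - p1) @ n                           # signed orthogonal distance
    near = np.abs(signed) < thresh
    out = xy.copy()
    out[near] += lam * np.sign(signed[near])[:, None] * n
    return out
```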
&lt;br /&gt;
These improvements significantly enhanced both the effectiveness and efficiency of the recoloring process for individuals with CVD on top of Method 2.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in further sections, but an overview is shown in Figure 1. All of the code was done in Python using a deep learning framework called [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and integrate the user label directly, they require a preexisting ground truth example of the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information while training. They are generally better at generating more natural images, but they require more compute and sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from the open-source RGB image dataset of [2]: to train their model to enhance the contrast between CVD-indistinguishable color pairs, the authors created a dataset of 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground truth recolored images for the supervised methods), we randomly sampled 15,000 of these images and recolored them by simulating random labels for severity and type of CVD. The recoloring for ground truth images was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: The open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org Scipy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
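A tiny helper for this encoding (the function name is an assumption):&lt;br /&gt;

```python
import numpy as np

def encode_label(cvd_type, severity):
    """Encode the condition as severity * one-hot over [protan, deutan]."""
    base = {"protan": [1.0, 0.0], "deutan": [0.0, 1.0]}[cvd_type]
    return severity * np.array(base)
```

For example, protanopia at 60% severity yields the label [0.6, 0.0], as in the text.&lt;br /&gt;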
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to the 3 parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. The outputs of the three networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was the pixel-wise mean-squared error loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
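In NumPy terms this loss is simply the following (equivalent to the mean-reduced MSE loss in PyTorch):&lt;br /&gt;

```python
import numpy as np

def mse_loss(pred, target):
    """Pixel-wise mean-squared error between model output and ground truth."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    return float(np.mean((pred - target) ** 2))
```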
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
With the same inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: Recolored (model output) and ground truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt; is the feature map of the l-th layer of the pre-trained VGG network&lt;br /&gt;
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: Two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: Size of the global (or large) window of the image&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: Size of the local window or neighborhood around pixel x&lt;br /&gt;
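For a single pixel pair, the CL term can be sketched as (the helper name is illustrative):&lt;br /&gt;

```python
import numpy as np

def contrast_change(sim_rec_x, sim_rec_y, sim_org_x, sim_org_y):
    """CL(x, y): change in the pairwise distance between two pixels,
    both viewed through the CVD simulation, after recoloring."""
    return (np.linalg.norm(np.subtract(sim_rec_x, sim_rec_y))
            - np.linalg.norm(np.subtract(sim_org_x, sim_org_y)))
```

A positive value means the recolored pair is more distinguishable than the original pair under the simulated CVD.&lt;br /&gt;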
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives output image to have colors that are visually similar and close to natural distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{natural}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored images respectively&lt;br /&gt;
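A global (unwindowed) version of this loss can be sketched as below; in practice a windowed SSIM such as skimage.metrics.structural_similarity would be used:&lt;br /&gt;

```python
import numpy as np

def naturalness_loss(original, recolored, c1=0.01**2, c2=0.03**2):
    """1 - SSIM, computed globally over the whole image (a windowed
    SSIM is the usual practical choice)."""
    x, y = np.asarray(original, float), np.asarray(recolored, float)
    mu_x, mu_y = x.mean(), y.mean()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x**2 + mu_y**2 + c1) * (x.var() + y.var() + c2))
    return 1.0 - ssim
```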
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
!   !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | Performance&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | SSIM Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.9985 || 0.9985 || 0.9985 || 0.9985&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.9565 || 0.9986 || 0.9986 || 0.9968&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | TCC Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0004 || 0.0003 || 0.0003 || 0.0003&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.0380 || 0.0003 || 0.0002 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CD ΔE76 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0462 || 0.0462 || 0.0462 || 0.0462&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 8.4251 || 0.0458 || 0.0435 || 0.0578&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE2000 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0681 || 0.0681 || 0.0681 || 0.0681&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 6.9145 || 0.0671 || 0.0630 || 0.0838&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE94 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0461 || 0.0461 || 0.0461 || 0.0461&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 5.3878 || 0.0457 || 0.0434 || 0.0576&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | D-CIELAB ΔEab Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 1.7209 || 1.7209 || 1.7209 || 1.7209&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 1.5926 || 1.9673 || 1.4363 || 2.4009&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), Total Color Contrast (TCC), Chromatic Difference (CD), and inference time, as defined in [1] and [2], were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground truth recolored image; for the unsupervised method, it means how much contrast and naturalness is observed in the CVD-simulated recolored images compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels appeared more discretized, suggesting that channel disentanglement was not very useful for this task (especially for naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling the ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than those of the originals, meaning the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images (possibly because the naturalness term was effectively prioritized, even though the weight coefficients in the loss favored contrast: alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to score and evaluate metrics for comparison with related work using only the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; we use the same definitions as those papers. At a high level, the components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weighting constant, not a wavelength, and l, a, b are the CIELAB coordinates of the recolored (&#039;) and original images respectively.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
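The Chromatic Difference above can be sketched per pixel as below; the value of lambda here is an assumed placeholder:&lt;br /&gt;

```python
import numpy as np

def chromatic_difference(lab_orig, lab_rec, lam=0.5):
    """Per-pixel CD in CIELAB; lam down-weights the lightness term
    (0.5 is an assumed value; the papers fix the actual constant)."""
    diff = np.asarray(lab_rec, float) - np.asarray(lab_orig, float)
    dl, da, db = diff[..., 0], diff[..., 1], diff[..., 2]
    return np.sqrt(lam * dl**2 + da**2 + db**2)
```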
&lt;br /&gt;
The key results are in Table 2 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2] for protan/deutan whichever is larger.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good, but not as high as in paper [2], because they optimize a separate network for each CVD type.&lt;br /&gt;
* Outputs are blurry (SSIM is not weighted strongly enough in the objective).&lt;br /&gt;
* Mixing CVD types in a single network needs a more sophisticated conditioning mechanism.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned a couple of things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate, making them feasible for real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Loss functions borrowed from the Swin-transformer-based approach of [2] added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. Josa a, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
[12] Color-Blindness.com. (n.d.). COBLIS - Color Blindness Simulator. Retrieved December 13, 2024, from https://www.color-blindness.com/coblis-color-blindness-simulator/&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 220px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400 px]][[File:eb_2.png|400 px]] &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 800px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|300 px|center|thumb|Losses - Conditional Autoencoder]][[File:loss_unet.png|300 px|thumb|center|Losses - Conditional U-Net]][[File:loss_mlp.png|300 px|center|thumb|Losses - Conditional MLP]]&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60752</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60752"/>
		<updated>2024-12-13T10:23:41Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* Mathematical based */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected RGB values are added back to the original LMS values to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
&lt;br /&gt;
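The error-and-correction pipeline above can be sketched in a few lines; this is a minimal RGB-space illustration, where ''simulate'' stands in for any CVD simulation function:&lt;br /&gt;

```python
import numpy as np

# Daltonize-style correction matrix from the text: the lost red-channel
# information is redistributed into the green and blue channels.
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize(rgb, simulate):
    """rgb: (H, W, 3) array in [0, 1]; simulate: CVD simulation function."""
    error = rgb - simulate(rgb)        # information the dichromat cannot see
    correction = error @ CORRECTION.T  # rotate it into visible channels
    return np.clip(rgb + correction, 0.0, 1.0)
```

With an identity ''simulate'' (normal vision) the image is returned unchanged, since the error term vanishes.&lt;br /&gt;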
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
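The degree-adaptable loss of Zhu et al. [9] can be written compactly with array broadcasting; a sketch, assuming colors are given as an (N, 3) array and ''T'' is the 3×3 simulation matrix:&lt;br /&gt;

```python
import numpy as np

def degree_adaptable_loss(c_new, c_orig, T, alpha, beta=0.5):
    """Loss of Zhu et al. [9]: c_new, c_orig are (N, 3) color arrays,
    T is the 3x3 CVD simulation matrix, alpha the per-pixel weights."""
    # Naturalness: weighted perceptual difference under CVD simulation.
    nat = np.sum(alpha * np.sum(((c_new - c_orig) @ T.T) ** 2, axis=1))
    # Contrast: deviation of all pairwise differences under simulation
    # (the i = j terms are zero, so summing all pairs is harmless).
    d_new = (c_new[:, None, :] - c_new[None, :, :]) @ T.T
    d_orig = (c_orig[:, None, :] - c_orig[None, :, :]) @ T.T
    cont = np.sum((d_new - d_orig) ** 2)
    return beta * nat + cont
```

Leaving the colors unchanged drives both terms to zero, which is a quick sanity check on an implementation.&lt;br /&gt;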
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
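The point-to-line distance above is straightforward to implement; the copunctal coordinates below are commonly quoted values from the literature and should be treated as illustrative:&lt;br /&gt;

```python
import math

def distance_to_confusion_line(v, copunctal, ref):
    """Perpendicular distance from chromaticity v = (x_v, y_v) to the
    confusion line through the copunctal point and a reference point."""
    (xv, yv), (xcp, ycp), (x0, y0) = v, copunctal, ref
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    den = math.hypot(xcp - x0, ycp - y0)
    return num / den

# Approximate copunctal points in CIE 1931 xy (illustrative values).
COPUNCTAL = {"protan": (0.747, 0.253),
             "deutan": (1.400, -0.400),
             "tritan": (0.171, 0.000)}
```

A point lying on the line yields a distance of zero, so colors can be assigned to their nearest confusion line by minimizing this distance over a fan of lines through the copunctal point.&lt;br /&gt;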
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. High-ranking colors, determined by their prominence in image clusters, are shifted to the nearest unoccupied confusion lines. This reallocation ensures that these colors are distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
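The combined objective &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E = (E_1 + E_2) + \lambda E_3&amp;lt;/math&amp;gt; can be sketched as below; this is an illustrative reading of [10], with ''f_D'' an arbitrary dichromacy simulation function and simple means used as the normalization factors:&lt;br /&gt;

```python
import numpy as np

def total_loss(a, a_rec, b, f_D, lam=0.5):
    """E = (E1 + E2) + lam * E3 over key-color arrays a (n_A, 3), b (n_B, 3);
    a_rec holds the recolored versions of a; f_D simulates dichromacy."""
    # All pairwise Euclidean distances between two color sets.
    d = lambda u, v: np.linalg.norm(u[:, None] - v[None, :], axis=-1)
    # E1: contrast deviation across clusters under simulated perception.
    e1 = np.mean(np.abs(d(a, b) - d(f_D(a_rec), f_D(b))))
    # E2: contrast deviation within cluster A under simulated perception.
    e2 = np.mean(np.abs(d(a, a) - d(f_D(a_rec), f_D(a_rec))))
    # E3: naturalness, distance between original and recolored key colors.
    e3 = np.mean(np.linalg.norm(a - a_rec, axis=-1))
    return (e1 + e2) + lam * e3
```

If the recoloring is the identity and ''f_D'' is the identity (normal vision), all three terms vanish.&lt;br /&gt;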
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIE L*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
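The symmetric KL divergence has a closed form for Gaussians; a sketch assuming full covariance matrices (the paper restricts these to diagonal form for speed):&lt;br /&gt;

```python
import numpy as np

def kl_gaussian(mu_i, cov_i, mu_j, cov_j):
    """Closed-form KL divergence D_KL(G_i || G_j) between two Gaussians."""
    k = mu_i.shape[0]
    cov_j_inv = np.linalg.inv(cov_j)
    diff = mu_j - mu_i
    return 0.5 * (np.trace(cov_j_inv @ cov_i)
                  + diff @ cov_j_inv @ diff - k
                  + np.log(np.linalg.det(cov_j) / np.linalg.det(cov_i)))

def symmetric_kl(mu_i, cov_i, mu_j, cov_j):
    """D_sKL(G_i, G_j) = D_KL(G_i || G_j) + D_KL(G_j || G_i)."""
    return (kl_gaussian(mu_i, cov_i, mu_j, cov_j)
            + kl_gaussian(mu_j, cov_j, mu_i, cov_i))
```

For identical Gaussians the divergence is zero; for unit-covariance Gaussians it reduces to the squared distance between the means.&lt;br /&gt;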
The optimization aims to preserve the contrast perceived by CVD viewers while maintaining naturalness. Weights are assigned to Gaussian components based on the perceptual importance of colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\lambda_i = \frac{\sum_{j=1}^N \alpha_j p(i|x_j, \Theta)}{\sum_{k=1}^K \sum_{j=1}^N \alpha_j p(k|x_j, \Theta)},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha_j = \|x_j - \text{Sim}(x_j)\|&amp;lt;/math&amp;gt; is the perceptual error of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature when simulated for CVD,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{Sim}(\cdot)&amp;lt;/math&amp;gt; is the simulation function for CVD perception.&lt;br /&gt;
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions; as a result, the recolored images can sometimes look unnatural.&lt;br /&gt;
* The high computational cost of the optimization step makes the algorithm difficult to apply in real-time settings.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and BicycleGAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This specific approach also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations using a unique triple-latent structure in their method, continuous personalization was possible to adjust images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, it took around 18 days for one epoch (or one iteration over the entire dataset).&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD, personalized to different severity levels. We started by exploring existing methods and identifying opportunities for improvement. Since mathematical approaches provide a solid, well-documented foundation, we began our experiments by testing these methods, as described in the background. We later extended our exploration to deep learning-based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a Baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
&lt;br /&gt;
The error between the original and simulated values is then mapped into the RGB color space using a deficiency-specific correction matrix, which adjusts the image to enhance contrast and recover lost color differences. The predefined correction matrix is applied to the error, and the result is transformed back into LMS space for final adjustments. The corrected LMS values are added back to the original values, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
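Putting the numbers above together, a minimal simulation sketch (linear RGB is assumed; gamma decoding/encoding is omitted for brevity):&lt;br /&gt;

```python
import numpy as np

# RGB-to-LMS matrix from the text (Stockman & Sharpe cone fundamentals).
RGB2LMS = np.array([[0.3904725,  0.54990437, 0.00890159],
                    [0.07092586, 0.96310739, 0.00135809],
                    [0.02314268, 0.12801221, 0.93605194]])

# Rows that replace the missing cone's response with a weighted
# combination of the remaining two (coefficients from the text / [5]).
REPLACEMENT = {
    "protanopia":   (0, np.array([0.0, 0.90822864, 0.008192])),
    "deuteranopia": (1, np.array([1.10104433, 0.0, -0.00901975])),
    "tritanopia":   (2, np.array([-0.15773032, 1.19465634, 0.0])),
}

def simulate_cvd(rgb, kind):
    """Simulate dichromatic perception; rgb is (..., 3) in linear RGB."""
    lms = rgb @ RGB2LMS.T
    row, weights = REPLACEMENT[kind]
    lms[..., row] = lms @ weights          # replace the missing cone response
    return lms @ np.linalg.inv(RGB2LMS).T  # map back to RGB
```

Note that the replacement coefficients are chosen so that achromatic (gray) colors are essentially unchanged, while saturated reds, greens, or blues shift noticeably.&lt;br /&gt;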
==== Method 2: Optimizing Objective Function ====&lt;br /&gt;
To improve the results from the Daltonization method, we designed a framework inspired by methods discussed in the background, incorporating dominant color extraction, optimization-based recoloring, and edit propagation. This approach aims to find a balance between the naturalness and contrast while compensating colors that are not visible for corresponding CVD types.&lt;br /&gt;
&lt;br /&gt;
===== 1. Extraction of Dominant Colors =====&lt;br /&gt;
We begin by extracting the dominant colors from the input image using fuzzy clustering via a K-means algorithm. This step identifies a reduced set of representative colors that capture the primary color information in the image:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{C} = \{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; represents the number of clusters, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i&amp;lt;/math&amp;gt; represents the centroid of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th cluster.&lt;br /&gt;
&lt;br /&gt;
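A sketch of the extraction step using SciPy's K-means (the ''n_colors'' parameter corresponds to N above; the standard, non-fuzzy variant is shown for simplicity):&lt;br /&gt;

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def dominant_colors(image, n_colors=30):
    """Extract representative colors from an (H, W, 3) image via K-means.
    Returns the (n_colors, 3) centroids and a per-pixel label array."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    centers, labels = kmeans2(pixels, n_colors, minit="++")
    return centers, labels
```

The centroids form the palette that the subsequent optimization step recolors; the labels record which cluster each pixel belongs to.&lt;br /&gt;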
===== 2. Optimization-Based Recoloring =====&lt;br /&gt;
Once the dominant colors are extracted, we apply an optimization process to adjust these colors. The optimization uses the formulas mentioned in [9], and aims to balance two key objectives:&lt;br /&gt;
&lt;br /&gt;
1. Naturalness Preservation: Ensures the recolored image minimally deviates from the original.&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{nat}} = \sum_{i=1}^N \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_i^{\text{original}}) \|^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is the transformation matrix based on the severity and type of CVD, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i^{\text{original}}&amp;lt;/math&amp;gt; is the original color.&lt;br /&gt;
&lt;br /&gt;
2. Contrast Enhancement: Improves the differentiation of colors for individuals with CVD:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{cont}} = \sum_{i=1}^N \sum_{j&amp;gt;i} \left( \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_j) \|^2 - \| \mathbf{c}_i^{\text{original}} - \mathbf{c}_j^{\text{original}} \|^2 \right)^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The total objective function combines these two terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + E_{\text{cont}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; controls the trade-off between naturalness and contrast.&lt;br /&gt;
&lt;br /&gt;
Optimization is performed using the L-BFGS-B algorithm to ensure efficient convergence under bounded constraints.&lt;br /&gt;
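A condensed sketch of the palette optimization with SciPy's L-BFGS-B (colors assumed normalized to [0, 1]; pairwise terms are summed over all ordered pairs, which differs from the j &amp;gt; i convention only by a constant factor):&lt;br /&gt;

```python
import numpy as np
from scipy.optimize import minimize

def recolor_palette(c_orig, T, beta=0.5):
    """Minimize E = beta*E_nat + E_cont over the dominant colors (N, 3)."""
    n = c_orig.shape[0]
    # Squared pairwise distances of the original palette (no T applied).
    d_orig2 = np.sum((c_orig[:, None] - c_orig[None, :]) ** 2, axis=-1)

    def energy(flat):
        c = flat.reshape(n, 3)
        e_nat = np.sum(((c - c_orig) @ T.T) ** 2)
        d_new2 = np.sum(((c[:, None] - c[None, :]) @ T.T) ** 2, axis=-1)
        e_cont = np.sum((d_new2 - d_orig2) ** 2)
        return beta * e_nat + e_cont

    res = minimize(energy, c_orig.ravel(), method="L-BFGS-B",
                   bounds=[(0.0, 1.0)] * (3 * n))
    return res.x.reshape(n, 3)
```

With an identity transformation matrix the original palette is already optimal, so the optimizer returns it unchanged; with a CVD matrix the colors shift to restore the lost pairwise contrast.&lt;br /&gt;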
&lt;br /&gt;
The transformation matrices for each type of CVD are the following, which are based on [12]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Protanopia}} = \begin{bmatrix} 0.566 &amp;amp; 0.558 &amp;amp; 0 \\ 0.433 &amp;amp; 0.442 &amp;amp; 0.242 \\ 0 &amp;amp; 0 &amp;amp; 0.758 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Deuteranopia}} = \begin{bmatrix} 0.625 &amp;amp; 0.7 &amp;amp; 0 \\ 0.375 &amp;amp; 0.3 &amp;amp; 0.3 \\ 0 &amp;amp; 0 &amp;amp; 0.7 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Tritanopia}} = \begin{bmatrix} 0.95 &amp;amp; 0 &amp;amp; 0 \\ 0.05 &amp;amp; 0.433 &amp;amp; 0 \\ 0 &amp;amp; 0.567 &amp;amp; 1 \end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
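For severity personalization, one simple choice (an assumption for illustration, not prescribed by [12]) is to interpolate linearly between the identity and the full-deficiency matrix:&lt;br /&gt;

```python
import numpy as np

# Full-deficiency transformation matrices from the text (based on [12]).
T_FULL = {
    "protanopia": np.array([[0.566, 0.558, 0.0],
                            [0.433, 0.442, 0.242],
                            [0.0,   0.0,   0.758]]),
    "deuteranopia": np.array([[0.625, 0.7, 0.0],
                              [0.375, 0.3, 0.3],
                              [0.0,   0.0, 0.7]]),
    "tritanopia": np.array([[0.95, 0.0,   0.0],
                            [0.05, 0.433, 0.0],
                            [0.0,  0.567, 1.0]]),
}

def severity_matrix(kind, severity):
    """Blend identity with the full-deficiency matrix; severity in [0, 1]."""
    return (1.0 - severity) * np.eye(3) + severity * T_FULL[kind]
```

Severity 0 recovers normal vision (identity) and severity 1 the full dichromatic transform, giving a continuous personalization knob.&lt;br /&gt;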
===== 3. Edit Propagation =====&lt;br /&gt;
After optimizing the dominant colors, we propagate these edits across the entire image to ensure smooth transitions. This propagation step leverages the CIE-Lab color space, which is perceptually uniform, meaning that the Euclidean distance in this space correlates well with human color perception. The process begins by mapping the original image and the optimized dominant colors into the Lab color space. In this space, the differences between the original and recolored dominant colors are computed to capture the adjustments made during the optimization step:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Delta L^* = \text{griddata}(\mathbf{c}^{\text{original}}, \mathbf{c}^{\text{recolored}} - \mathbf{c}^{\text{original}}, \mathbf{I}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in the Lab color space. Once the interpolated changes are computed, they are applied to the Lab representation of the original image. Finally, the adjusted Lab values are converted back to the RGB color space to reconstruct the recolored image.&lt;br /&gt;
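A compact sketch of this propagation step, assuming SciPy&#039;s griddata with nearest-neighbour interpolation so that pixels outside the convex hull of the dominant colors still receive an offset (function and variable names are ours):&lt;br /&gt;

```python
import numpy as np
from scipy.interpolate import griddata

def propagate_edits(image_lab, dominant_orig, dominant_recolored):
    """Spread the per-dominant-color edits over every pixel in Lab space.

    image_lab: H x W x 3 Lab image; dominant_*: N x 3 Lab dominant colors
    before and after the optimization step."""
    deltas = dominant_recolored - dominant_orig     # N x 3 edit vectors
    pixels = image_lab.reshape(-1, 3)
    # Interpolate the Lab offsets at every pixel's color location.
    offsets = griddata(dominant_orig, deltas, pixels, method="nearest")
    return (pixels + offsets).reshape(image_lab.shape)
```

The result is then converted back to RGB (e.g. with skimage&#039;s lab2rgb) to reconstruct the recolored image.&lt;br /&gt;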
&lt;br /&gt;
==== Method 3: Improved with Confusion Line Adjustments ====&lt;br /&gt;
This method builds upon the previous one by introducing enhancements in dominant color extraction, optimization, and edit propagation, while incorporating an additional step, inspired by [], to adjust colors near confusion lines in the CIE 1931 xyY color space. These improvements aim to further enhance the contrast and naturalness of the recolored images. Moreover, this method adds the flexibility to adjust for different severity levels for each CVD type.&lt;br /&gt;
&lt;br /&gt;
===== 1. Improvements on Method 2 =====&lt;br /&gt;
To improve the performance of dominant color extraction, we transitioned from traditional K-means to MiniBatch K-means. This algorithm processes data in small batches, significantly reducing computational time while maintaining clustering accuracy. The number of dominant colors was also reduced from 50 to 30 to focus on key representative colors and further enhance efficiency. The optimization objective retains the same two key terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + (1 - \beta) E_{\text{cont}}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
The optimization objective was refined to significantly improve computational efficiency by replacing the nested loops in the contrast enhancement term with vectorized operations. In the original implementation, the pairwise differences between colors were calculated iteratively using &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;O(N^2)&amp;lt;/math&amp;gt; nested loops. The improved version eliminates this overhead by leveraging array broadcasting to compute all pairwise differences simultaneously, and the transformation matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is then applied to all pairwise differences in a single tensor operation:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{T}_{\Delta} = \text{tensordot}(\Delta_{ij}, \mathbf{T}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and the norms are computed in parallel across the entire array. Additionally, the weighting parameter &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; was adjusted to favor naturalness preservation, ensuring better visual integrity in the recolored image. &lt;br /&gt;
The propagation step changed to use a k-d tree for fast nearest neighbor searches, replacing grid-based interpolation. This approach more efficiently matches each pixel in the Lab color space to the closest dominant color:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{I}_{\text{adjusted}} = \mathbf{C}_{\text{recolored}}[\text{k-d tree query}(\mathbf{I})],&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in Lab space.&lt;br /&gt;
These refinements enable faster optimization while improving the balance between naturalness and contrast enhancement.&lt;br /&gt;
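The two refinements can be sketched as follows; this is our own illustration of the broadcasting/tensordot pattern and the k-d tree lookup, not the exact project code:&lt;br /&gt;

```python
import numpy as np
from scipy.spatial import cKDTree

def contrast_energy(colors, T):
    """Vectorized contrast term: all pairwise color differences at once via
    broadcasting, then the CVD matrix T applied in a single tensordot."""
    diffs = colors[:, None, :] - colors[None, :, :]    # N x N x 3 pairwise diffs
    t_diffs = np.tensordot(diffs, T, axes=([2], [1]))  # T applied to every pair
    return np.linalg.norm(t_diffs, axis=2).sum()       # all norms in parallel

def propagate_kdtree(image_lab, dominant_orig, dominant_recolored):
    """k-d tree propagation: each pixel takes the recolored version of its
    nearest dominant color in Lab space."""
    _, idx = cKDTree(dominant_orig).query(image_lab.reshape(-1, 3))
    return dominant_recolored[idx].reshape(image_lab.shape)
```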
&lt;br /&gt;
===== 2. Confusion Line Adjustments =====&lt;br /&gt;
An additional step adjusts colors near confusion lines in the CIE 1931 xyY color space to enhance distinguishability:&lt;br /&gt;
&lt;br /&gt;
1. Confusion lines are defined for protanopia, deuteranopia, and tritanopia, based on []. For example, for protanopia:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{Confusion Line: Start} = (0.735, 0.265), \quad \text{End} = (0.115, 0.885).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. Colors near the confusion line are identified using orthogonal distance:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(\mathbf{xy}, L) = \frac{\| (\mathbf{xy} - \mathbf{p}_1) \times (\mathbf{p}_2 - \mathbf{p}_1) \|}{\|\mathbf{p}_2 - \mathbf{p}_1\|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_1&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{p}_2&amp;lt;/math&amp;gt; are the start and end points of the confusion line.&lt;br /&gt;
&lt;br /&gt;
3. Identified colors are shifted orthogonally away from the line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{xy}_{\text{adjusted}} = \mathbf{xy} + \lambda \mathbf{v}_{\perp},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{v}_{\perp}&amp;lt;/math&amp;gt; is a perpendicular vector, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a scaling factor.&lt;br /&gt;
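Steps 2 and 3 can be sketched as follows, using the protanopia line endpoints given above; the default value of lambda and the sign convention for the shift direction are our assumptions:&lt;br /&gt;

```python
import numpy as np

P1, P2 = np.array([0.735, 0.265]), np.array([0.115, 0.885])  # protanopia line

def distance_to_line(xy, p1=P1, p2=P2):
    """Orthogonal distance from a chromaticity point to the confusion line
    (2-D cross-product magnitude divided by the line length)."""
    d = p2 - p1
    cross = (xy[0] - p1[0]) * d[1] - (xy[1] - p1[1]) * d[0]
    return abs(cross) / np.linalg.norm(d)

def shift_from_line(xy, p1=P1, p2=P2, lam=0.05):
    """Shift a point orthogonally away from the line by lam, on whichever
    side of the line the point already lies."""
    d = (p2 - p1) / np.linalg.norm(p2 - p1)
    v_perp = np.array([-d[1], d[0]])          # unit normal to the line
    s = np.dot(xy - p1, v_perp)               # signed distance to the line
    return xy + lam * (1.0 if s >= 0 else -1.0) * v_perp
```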
&lt;br /&gt;
These improvements significantly enhanced both the effectiveness and efficiency of the recoloring process for individuals with CVD on top of Method 2.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label describing the user&#039;s condition (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in later sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and straightforward to condition on the user label, they require a pre-existing ground-truth recolored image as the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information while training. They are generally better at generating more natural images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from the open-source RGB image dataset of [2]: to improve their model&#039;s ability to enhance contrast between CVD-indistinguishable color pairs, the authors assembled 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 images and recolored them by simulating random labels for severity and type of CVD. The recoloring for ground-truth images was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org SciPy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
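The label encoding above can be sketched as a small helper (function names are hypothetical; only the protan and deutan components appear in the label, per the encoding described):&lt;br /&gt;

```python
import random

def make_cvd_label(cvd_type, severity):
    """Encode a condition as severity * [protan, deutan]; for example
    ('protan', 0.6) -> [0.6, 0.0], i.e. protanopia at 60% severity."""
    assert cvd_type in ("protan", "deutan") and 0.1 <= severity <= 1.0
    return [severity if cvd_type == "protan" else 0.0,
            severity if cvd_type == "deutan" else 0.0]

def sample_random_label(rng=random):
    """Draw a random (type, severity) pair, as done for dataset generation."""
    severity = round(rng.uniform(0.1, 1.0), 1)
    return make_cvd_label(rng.choice(("protan", "deutan")), severity)
```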
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and is passed to 3 parallel MLPs simultaneously. These parallel networks are trained to predict the R, G, and B channels of the recolored image from the given ground truth. The outputs from each of these networks are concatenated to produce a recolored RGB image of the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The model was trained with a pixel-wise mean-squared error (MSE) loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground-truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
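A minimal PyTorch sketch of this idea, with one small per-pixel MLP per output channel; the layer sizes and the per-pixel formulation are our assumptions, and the label is broadcast to every pixel before concatenation:&lt;br /&gt;

```python
import torch
import torch.nn as nn

class ConditionalParallelRGBMLP(nn.Module):
    """Sketch of the parallel-MLP idea: one small MLP per output channel,
    conditioned on the 2-D CVD label (layer sizes are assumptions)."""

    def __init__(self, hidden=64):
        super().__init__()
        in_features = 3 + 2          # per-pixel RGB + broadcast CVD label
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1), nn.Sigmoid())
            for _ in range(3)        # independent R, G, B predictors
        ])

    def forward(self, image, label):
        # image: B x 3 x H x W in [0, 1]; label: B x 2
        b, _, h, w = image.shape
        pix = image.permute(0, 2, 3, 1).reshape(b, h * w, 3)
        lab = label[:, None, :].expand(b, h * w, 2)
        x = torch.cat([pix, lab], dim=-1)
        out = torch.cat([head(x) for head in self.heads], dim=-1)  # B x HW x 3
        return out.reshape(b, h, w, 3).permute(0, 3, 1, 2)
```

Training then minimizes the MSE between the model output and the ground-truth recolored image.&lt;br /&gt;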
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
With the same inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate the full recolored image as output. The conditional inputs here affect both the encoder and the decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision and adapt robustly to new tasks. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: Recolored (model output) and ground-truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: Feature map of the l-th layer of the pre-trained VGG network (&amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; is its number of elements)&lt;br /&gt;
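A generic sketch of this loss: in practice the feature extractors would be consecutive slices of a pre-trained VGG (e.g. torchvision&#039;s vgg16 features), but any chain of modules illustrates the structure; the helper name is ours:&lt;br /&gt;

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def perceptual_loss(phi_layers, output, target):
    """Sum over layers l of the mean squared difference between feature
    maps phi_l(I) and phi_l(I'). Layers are applied sequentially, so each
    entry in phi_layers corresponds to one slice of the backbone."""
    loss = output.new_zeros(())
    f_out, f_tgt = output, target
    for phi in phi_layers:
        f_out, f_tgt = phi(f_out), phi(f_tgt)
        loss = loss + F.mse_loss(f_out, f_tgt)  # mean gives the 1/N_l factor
    return loss
```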
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: Two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: Size of the global (or large) window over the image&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: Size of the local window or neighborhood around a pixel x&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural color distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{natural}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored images respectively&lt;br /&gt;
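The weighting scheme above can be sketched directly; in this sketch of ours, the CL average is computed over a random subset of pixel pairs rather than all O(N&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;) pairs:&lt;br /&gt;

```python
import numpy as np

def total_loss(l_nat, l_global, l_local, alpha=0.25, beta=1.0):
    """Combine the components as defined above (alpha = 0.25 and beta = 1.0
    were the weights used in training)."""
    l_contrast = beta * l_global + (2 - beta) * l_local
    return alpha * l_nat + 2 * (1 - alpha) * l_contrast

def mean_cl(sim_orig, sim_recolored, n_pairs=256, seed=0):
    """Average CL(x, y) over sampled pixel pairs of two CVD-simulated
    H x W x 3 images: recolored-pair distance minus original-pair distance."""
    c = sim_orig.reshape(-1, 3)
    c_hat = sim_recolored.reshape(-1, 3)
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(c), size=n_pairs)
    j = rng.integers(0, len(c), size=n_pairs)
    return float((np.linalg.norm(c_hat[i] - c_hat[j], axis=1)
                  - np.linalg.norm(c[i] - c[j], axis=1)).mean())
```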
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
!   !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | Performance&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | SSIM Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.9985 || 0.9985 || 0.9985 || 0.9985&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.9565 || 0.9986 || 0.9986 || 0.9968&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | TCC Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0004 || 0.0003 || 0.0003 || 0.0003&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.0380 || 0.0003 || 0.0002 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CD ΔE76 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0462 || 0.0462 || 0.0462 || 0.0462&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 8.4251 || 0.0458 || 0.0435 || 0.0578&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE2000 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0681 || 0.0681 || 0.0681 || 0.0681&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 6.9145 || 0.0671 || 0.0630 || 0.0838&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE94 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0461 || 0.0461 || 0.0461 || 0.0461&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 5.3878 || 0.0457 || 0.0434 || 0.0576&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | D-CIELAB ΔEab Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 1.7209 || 1.7209 || 1.7209 || 1.7209&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 1.5926 || 1.9673 || 1.4363 || 2.4009&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), Total Color Contrast (TCC), Chromatic Difference (CD), and inference time were used to assess the effectiveness of the models, following [1] and [2].&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored image shows compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than those of the originals, indicating that the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images, possibly because the naturalness factor was effectively prioritized even though the weight coefficients in the loss favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we computed the evaluation metrics for comparison with related work using only the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]. Please refer to the definitions in those papers, as we used the same formulations. On a high level, the components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weighting constant, not a wavelength; l, a, b are the CIELAB coordinates of the original image, and primed values (&#039;) are those of the recolored image.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
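As an illustration, the CD formula above translates directly into a per-pixel computation; the default value of the constant is an assumption of this sketch:&lt;br /&gt;

```python
import numpy as np

def chromatic_difference(lab_orig, lab_recolored, lam=0.5):
    """Per-pixel CD: weighted lightness difference plus a*, b* differences,
    all in CIELAB. Inputs are H x W x 3 Lab images; returns an H x W map."""
    diff = lab_recolored - lab_orig
    return np.sqrt(lam * diff[..., 0] ** 2
                   + diff[..., 1] ** 2
                   + diff[..., 2] ** 2)
```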
&lt;br /&gt;
The key results are shown in Table 2, and the takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2], taking the larger of the protan and deutan values.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but not as good as in paper [2], because they optimize a separate network for each CVD type.&lt;br /&gt;
* Outputs are blurry (SSIM is not weighted heavily enough in the objective).&lt;br /&gt;
* Mixing CVD types in the same network needs a more sophisticated conditioning approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate (a few seconds per image), making them feasible for near-real-time rather than strictly real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from Swin architecture added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. JOSA A, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
[12] Color-Blindness.com. (n.d.). COBLIS - Color Blindness Simulator. Retrieved December 13, 2024, from https://www.color-blindness.com/coblis-color-blindness-simulator/&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 220px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400 px|Wikipedia encyclopedia]][[File:eb_2.png|400 px]] &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 800px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|300 px|center|thumb|Losses - Conditional Autoencoder]][[File:loss_unet.png|300 px|thumb|center|Losses - Conditional U-Net]][[File:loss_mlp.png|300 px|center|thumb|Losses - Conditional MLP]]&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60741</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60741"/>
		<updated>2024-12-13T10:03:06Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* Method 2: Optimizing Objective Function */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment with two main approaches: mathematical transformations and deep learning techniques. We start by reviewing current advancements in these two domains, then present our experiments and results. We evaluate each method, discuss our findings, and outline potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected RGB values are added back to the original LMS values to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
&lt;br /&gt;
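For a small set of colors this loss can be evaluated directly. The sketch below (our own illustrative code, not the authors' implementation) computes the naturalness and contrast terms with NumPy; the sum over i &amp;#8800; j is taken over ordered pairs, whose diagonal terms are zero:&lt;br /&gt;

```python
import numpy as np

def zhu_loss(c_rec, c_orig, beta=1.0):
    """Total loss E = beta * E_nat + E_cont over an (N, 3) array of colors."""
    c_rec = np.asarray(c_rec, dtype=float)
    c_orig = np.asarray(c_orig, dtype=float)
    # Naturalness: squared distance between recolored and original colors.
    e_nat = np.sum((c_rec - c_orig) ** 2)
    # Contrast: deviation of pairwise color differences before vs. after
    # recoloring, over all ordered pairs (i == j terms vanish).
    d_rec = c_rec[:, None, :] - c_rec[None, :, :]
    d_orig = c_orig[:, None, :] - c_orig[None, :, :]
    e_cont = np.sum((d_rec - d_orig) ** 2)
    return beta * e_nat + e_cont
```

An identity recoloring (c_rec equal to c_orig) gives zero for both terms, which is a quick sanity check.&lt;br /&gt;
&lt;br /&gt;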
One limitation of this formulation is that it treats all viewers identically, regardless of CVD type or severity. To address this, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
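The perpendicular distance above is straightforward to compute from the two points defining the line; a minimal sketch (the copunctal coordinates in the usage comment are approximate, commonly cited values for protanopia):&lt;br /&gt;

```python
import math

def confusion_line_distance(v, copunctal, ref):
    """Perpendicular distance from chromaticity v = (x_v, y_v) to the
    confusion line through the copunctal point (x_cp, y_cp) and a
    reference point (x_0, y_0)."""
    x_v, y_v = v
    x_cp, y_cp = copunctal
    x_0, y_0 = ref
    num = abs((x_cp - x_0) * (y_0 - y_v) - (x_0 - x_v) * (y_cp - y_0))
    den = math.hypot(x_cp - x_0, y_cp - y_0)
    return num / den

# Usage: for protanopia the copunctal point is approximately (0.747, 0.253)
# on the CIE 1931 diagram; a color lying on the line has distance ~0.
```
&lt;br /&gt;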
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. High-ranking colors, determined by their prominence in image clusters, are shifted to the nearest unoccupied confusion lines. This reallocation ensures that these colors are distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
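The three terms can be evaluated directly once a dichromat simulation function f_D is available; below is a hedged sketch (our own code, with illustrative names) where f_D is passed in as a callable operating row-wise on chromaticity arrays and each term is an average over the relevant pairs:&lt;br /&gt;

```python
import numpy as np

def total_loss(A, A_rec, B, f_D, lam=0.5):
    """E = (E1 + E2) + lam * E3 for key-color clusters A and B.

    A, A_rec, B: (n, 2) chromaticity arrays; f_D: dichromat simulation
    applied row-wise to a chromaticity array."""
    A, A_rec, B = (np.asarray(x, float) for x in (A, A_rec, B))
    dist = lambda P, Q: np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    # E1: cross-cluster contrast, original view vs. recolored-as-seen-by-dichromat.
    e1 = np.mean(np.abs(dist(A, B) - dist(f_D(A_rec), f_D(B))))
    # E2: within-cluster-A contrast under the dichromat simulation.
    e2 = np.mean(np.abs(dist(A, A) - dist(f_D(A_rec), f_D(A_rec))))
    # E3: naturalness, average shift of each key color from its recolored version.
    e3 = np.mean(np.linalg.norm(A - A_rec, axis=1))
    return (e1 + e2) + lam * e3
```

With f_D as the identity and no recoloring, every term is zero, which is the expected degenerate case.&lt;br /&gt;
&lt;br /&gt;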
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIE L*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The optimization aims to preserve the contrast perceived by CVD viewers while maintaining naturalness. Weights are assigned to Gaussian components based on the perceptual importance of colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\lambda_i = \frac{\sum_{j=1}^N \alpha_j p(i|x_j, \Theta)}{\sum_{k=1}^K \sum_{j=1}^N \alpha_j p(k|x_j, \Theta)},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha_j = \|x_j - \text{Sim}(x_j)\|&amp;lt;/math&amp;gt; is the perceptual error of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature when simulated for CVD,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{Sim}(\cdot)&amp;lt;/math&amp;gt; is the simulation function for CVD perception.&lt;br /&gt;
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
&lt;br /&gt;
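The symmetric KL divergence between two Gaussian components has a closed form, so Step 3 does not require sampling. A NumPy-only sketch (the GMM fitting itself is omitted; variable names are ours):&lt;br /&gt;

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """Closed-form KL divergence D_KL(N(mu0, S0) || N(mu1, S1))."""
    d = len(mu0)
    S1_inv = np.linalg.inv(S1)
    diff = np.asarray(mu1, float) - np.asarray(mu0, float)
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def symmetric_kl(mu0, S0, mu1, S1):
    """D_sKL(G_i, G_j) = D_KL(G_i || G_j) + D_KL(G_j || G_i)."""
    return kl_gauss(mu0, S0, mu1, S1) + kl_gauss(mu1, S1, mu0, S0)
```

For two unit-covariance Gaussians the symmetric divergence reduces to the squared distance between the means, a convenient sanity check.&lt;br /&gt;
&lt;br /&gt;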
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions and can leave the recolored images looking unnatural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to use in real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This specific approach also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations with a triple-latent structure, the model supports continuous personalization, adjusting images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, it took around 18 days for one epoch (one iteration over the entire dataset).&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD, personalized for different severity levels. We start by exploring existing methods and identifying opportunities for improvement. Since mathematical-based approaches provide a solid, well-documented foundation, we began our experiments by testing these methods, as described in the background. We later extended our exploration to deep learning-based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
&lt;br /&gt;
The error between the original and simulated LMS values is then mapped into the RGB color space, and a deficiency-specific correction matrix shifts this lost color information into channels the viewer can still perceive. The corrected error is added back to the original image, producing a recolored result that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
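The whole Method 1 pipeline can be sketched in a few lines of NumPy, using the matrices quoted above (shown for protanopia; deuteranopia and tritanopia substitute their replacement coefficients). The clipping to [0, 1] and the choice to apply the correction to the error expressed in RGB are our implementation decisions, and real implementations differ in such details:&lt;br /&gt;

```python
import numpy as np

# RGB-to-LMS transform quoted in the text, and its inverse.
RGB2LMS = np.array([[0.3904725,  0.54990437, 0.00890159],
                    [0.07092586, 0.96310739, 0.00135809],
                    [0.02314268, 0.12801221, 0.93605194]])
LMS2RGB = np.linalg.inv(RGB2LMS)

# Protanopia: the L row is replaced by a weighted mix of M and S responses.
SIM_PROTAN = np.array([[0.0, 0.90822864, 0.008192],
                       [0.0, 1.0,        0.0],
                       [0.0, 0.0,        1.0]])

# Daltonize-inspired correction matrix from the text.
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize(rgb, sim=SIM_PROTAN):
    """Daltonize an (H, W, 3) float image with values in [0, 1]."""
    lms = rgb @ RGB2LMS.T
    lms_sim = lms @ sim.T                  # simulated dichromat view
    err_rgb = (lms - lms_sim) @ LMS2RGB.T  # information the viewer misses, in RGB
    corrected = err_rgb @ CORRECTION.T     # rotate it into visible channels
    return np.clip(rgb + corrected, 0.0, 1.0)
```
&lt;br /&gt;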
==== Method 2: Optimizing Objective Function ====&lt;br /&gt;
To improve on the Daltonization results, we designed a framework inspired by the methods discussed in the background, incorporating dominant color extraction, optimization-based recoloring, and edit propagation. This approach aims to balance naturalness and contrast while compensating for colors that are not visible to the corresponding CVD types.&lt;br /&gt;
&lt;br /&gt;
===== 1. Extraction of Dominant Colors =====&lt;br /&gt;
We begin by extracting the dominant colors from the input image using a K-means clustering algorithm. This step identifies a reduced set of representative colors that capture the primary color information in the image:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{C} = \{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; represents the number of clusters, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i&amp;lt;/math&amp;gt; represents the centroid of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th cluster.&lt;br /&gt;
&lt;br /&gt;
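A plain K-means pass over the flattened pixels is enough for this step. A self-contained sketch (NumPy only, with a fixed iteration count rather than a convergence test; names are ours):&lt;br /&gt;

```python
import numpy as np

def dominant_colors(pixels, n_clusters=8, n_iter=20, seed=0):
    """Plain K-means over an (M, 3) array of pixels; returns (N, 3) centroids."""
    rng = np.random.default_rng(seed)
    X = np.asarray(pixels, dtype=float)
    # Initialize centroids from randomly chosen pixels.
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assign each pixel to its nearest centroid.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        # Recompute each centroid as the mean of its assigned pixels.
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return centers
```

On an image dominated by two well-separated colors, two clusters recover those colors almost exactly.&lt;br /&gt;
&lt;br /&gt;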
===== 2. Optimization-Based Recoloring =====&lt;br /&gt;
Once the dominant colors are extracted, we apply an optimization process to adjust these colors. The optimization uses the formulas mentioned in [9], and aims to balance two key objectives:&lt;br /&gt;
&lt;br /&gt;
1. Naturalness Preservation: Ensures the recolored image minimally deviates from the original.&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{nat}} = \sum_{i=1}^N \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_i^{\text{original}}) \|^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is the transformation matrix based on the severity and type of CVD, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i^{\text{original}}&amp;lt;/math&amp;gt; is the original color.&lt;br /&gt;
&lt;br /&gt;
2. Contrast Enhancement: Improves the differentiation of colors for individuals with CVD:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{cont}} = \sum_{i=1}^N \sum_{j&amp;gt;i} \left( \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_j) \|^2 - \| \mathbf{c}_i^{\text{original}} - \mathbf{c}_j^{\text{original}} \|^2 \right)^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The total objective function combines these two terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + E_{\text{cont}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; controls the trade-off between naturalness and contrast.&lt;br /&gt;
&lt;br /&gt;
Optimization is performed using the L-BFGS-B algorithm to ensure efficient convergence under bounded constraints.&lt;br /&gt;
&lt;br /&gt;
The transformation matrices for each type of CVD are the following, which are based on [12]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Protanopia}} = \begin{bmatrix} 0.566 &amp;amp; 0.558 &amp;amp; 0 \\ 0.433 &amp;amp; 0.442 &amp;amp; 0.242 \\ 0 &amp;amp; 0 &amp;amp; 0.758 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Deuteranopia}} = \begin{bmatrix} 0.625 &amp;amp; 0.7 &amp;amp; 0 \\ 0.375 &amp;amp; 0.3 &amp;amp; 0.3 \\ 0 &amp;amp; 0 &amp;amp; 0.7 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Tritanopia}} = \begin{bmatrix} 0.95 &amp;amp; 0 &amp;amp; 0 \\ 0.05 &amp;amp; 0.433 &amp;amp; 0 \\ 0 &amp;amp; 0.567 &amp;amp; 1 \end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
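Steps 1&amp;#8211;2 can be sketched with SciPy's L-BFGS-B solver. The code below is illustrative (our own variable names, a naive O(N&amp;#178;) loop over color pairs) and optimizes a small palette under one of the transformation matrices above:&lt;br /&gt;

```python
import numpy as np
from scipy.optimize import minimize

def recolor_palette(c_orig, T, beta=0.5):
    """Optimize dominant colors with L-BFGS-B: E = beta * E_nat + E_cont."""
    c_orig = np.asarray(c_orig, dtype=float)
    n = len(c_orig)

    def energy(flat):
        c = flat.reshape(n, 3)
        # Naturalness: CVD-perceived deviation from the original palette.
        e_nat = np.sum(((c - c_orig) @ T.T) ** 2)
        # Contrast: keep CVD-perceived pair distances close to the originals.
        e_cont = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                d_cvd = np.sum(((c[i] - c[j]) @ T.T) ** 2)
                d_orig = np.sum((c_orig[i] - c_orig[j]) ** 2)
                e_cont += (d_cvd - d_orig) ** 2
        return beta * e_nat + e_cont

    res = minimize(energy, c_orig.ravel(), method="L-BFGS-B",
                   bounds=[(0.0, 1.0)] * (3 * n))
    return res.x.reshape(n, 3)
```

The bounds keep the recolored palette inside the valid RGB range, so no separate clipping step is needed.&lt;br /&gt;
&lt;br /&gt;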
===== 3. Edit Propagation =====&lt;br /&gt;
After optimizing the dominant colors, we propagate these edits across the entire image to ensure smooth transitions. This propagation step leverages the CIE-Lab color space, which is perceptually uniform, meaning that the Euclidean distance in this space correlates well with human color perception. The process begins by mapping the original image and the optimized dominant colors into the Lab color space. In this space, the differences between the original and recolored dominant colors are computed to capture the adjustments made during the optimization step:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Delta L^* = \text{griddata}(\mathbf{c}^{\text{original}}, \mathbf{c}^{\text{recolored}} - \mathbf{c}^{\text{original}}, \mathbf{I}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in the Lab color space. Once the interpolated changes are computed, they are applied to the Lab representation of the original image. Finally, the adjusted Lab values are converted back to the RGB color space to reconstruct the recolored image.&lt;br /&gt;
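The propagation step above can be sketched as follows; the &#039;nearest&#039; interpolation mode is an assumption chosen so that pixels outside the dominant colors&#039; convex hull still receive a value.&lt;br /&gt;

```python
import numpy as np
from scipy.interpolate import griddata

def propagate_edits(img_lab, dom_orig, dom_recolored):
    """img_lab: HxWx3 image in Lab; dom_*: Kx3 dominant colors in Lab.
    Interpolates the per-dominant-color Lab offsets over all pixels
    and adds them to the image."""
    h, w, _ = img_lab.shape
    pixels = img_lab.reshape(-1, 3)
    delta = dom_recolored - dom_orig          # offsets from the optimization
    # Interpolate each Lab channel of the offsets separately.
    shift = np.stack([griddata(dom_orig, delta[:, k], pixels, method="nearest")
                      for k in range(3)], axis=1)
    return (pixels + shift).reshape(h, w, 3)
```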
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in later sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image from which the network learns the recoloring. While these methods are simple, easy to train, and integrate the user label directly, they require a ground-truth recolored image for every training sample.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information while training. They are generally better at generating natural-looking images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from the open-source RGB image dataset of [2]: to improve their model&#039;s ability to enhance contrast between CVD-indistinguishable color pairs, the authors assembled 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 of these images and recolored them under randomly sampled labels for CVD type and severity. The ground-truth recoloring used a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org SciPy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
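The label encoding above can be sketched as follows (the function name and sampling granularity are illustrative):&lt;br /&gt;

```python
import random

# Sketch of the label encoding described above: a random severity in
# [0.1, 1.0] scales a one-hot [protan, deutan] indicator, so [0.6, 0.0]
# means protanopia at 60% severity.
def sample_cvd_label(rng):
    severity = round(rng.uniform(0.1, 1.0), 1)
    if rng.random() < 0.5:
        return [severity, 0.0]   # protan
    return [0.0, severity]       # deutan
```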
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately, using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to the three parallel MLPs simultaneously. Each network learns to predict one channel of the recolored image from the given ground truth, and the three outputs are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The network was trained with a pixel-wise, mean-squared error (MSE) loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground-truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
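A NumPy sketch of this loss (the project itself used PyTorch, where &amp;lt;code&amp;gt;nn.MSELoss&amp;lt;/code&amp;gt; computes the same quantity):&lt;br /&gt;

```python
import numpy as np

# Pixel-wise mean-squared error between model output and ground truth.
def mse_loss(pred, target):
    return float(np.mean((pred - target) ** 2))
```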
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
Using the same inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate the full recolored image as output. The conditional inputs here affect both the encoder and the decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and transfer robustly to new tasks. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was the commonly used VGG perceptual loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: the recolored (model output) and ground-truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt; is the l-th feature layer of a pre-trained VGG network&lt;br /&gt;
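A framework-agnostic sketch of this loss, with &amp;lt;code&amp;gt;phi_layers&amp;lt;/code&amp;gt; standing in for the frozen VGG feature extractors (torchvision activations in the actual model):&lt;br /&gt;

```python
import numpy as np

# Perceptual loss: sum of normalized squared feature differences, one term
# per layer; each term is divided by that layer's feature count N_l.
def perceptual_loss(img, target, phi_layers):
    total = 0.0
    for phi in phi_layers:
        f, g = phi(img), phi(target)
        total += float(np.sum((f - g) ** 2) / f.size)
    return total
```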
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to aligning this network with the recoloring task was the choice of loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{(x, y) \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math&amp;gt;x, y&amp;lt;/math&amp;gt;: two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: size of the global (or large) window of the image&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: size of the local window (neighborhood) around pixel &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that remain visually close to the original, natural color distribution. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored images respectively&lt;br /&gt;
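The combination of these terms can be sketched as follows; &amp;lt;code&amp;gt;ssim_global&amp;lt;/code&amp;gt; is a whole-image simplification of SSIM (the actual loss uses windowed SSIM), and the default weights mirror those reported later (alpha = 0.25, beta = 1.0).&lt;br /&gt;

```python
import numpy as np

# Whole-image SSIM (single window), a simplification of the usual
# sliding-window SSIM; constants c1, c2 follow the common defaults.
def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2)
    return num / den

# Weighted combination from the text:
#   L_total   = alpha * L_naturalness + 2 * (1 - alpha) * L_contrast
#   L_contrast = beta * L_global + (2 - beta) * L_local
def total_loss(l_global, l_local, l_naturalness, alpha=0.25, beta=1.0):
    l_contrast = beta * l_global + (2 - beta) * l_local
    return alpha * l_naturalness + 2 * (1 - alpha) * l_contrast
```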
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
!   !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | Performance&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | SSIM Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.9985 || 0.9985 || 0.9985 || 0.9985&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.9565 || 0.9986 || 0.9986 || 0.9968&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | TCC Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0004 || 0.0003 || 0.0003 || 0.0003&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.0380 || 0.0003 || 0.0002 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CD ΔE76 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0462 || 0.0462 || 0.0462 || 0.0462&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 8.4251 || 0.0458 || 0.0435 || 0.0578&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE2000 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0681 || 0.0681 || 0.0681 || 0.0681&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 6.9145 || 0.0671 || 0.0630 || 0.0838&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE94 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0461 || 0.0461 || 0.0461 || 0.0461&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 5.3878 || 0.0457 || 0.0434 || 0.0576&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | D-CIELAB ΔEab Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 1.7209 || 1.7209 || 1.7209 || 1.7209&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 1.5926 || 1.9673 || 1.4363 || 2.4009&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the three neural network architectures above: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), total color contrast (TCC), chromatic difference (CD), and inference time, as defined in [1] and [2], were used to assess the models&#039; effectiveness.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored image shows relative to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than those of the originals, indicating the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images (possibly because naturalness was effectively prioritized, even though the weight coefficients in the loss term favored contrast: alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we computed the evaluation metrics for comparison with related work using only the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]. Please refer to the definitions in the paper, as we have used the same. On a high level, the three components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(&amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a constant weight, not a wavelength; &amp;lt;math&amp;gt;l, a, b&amp;lt;/math&amp;gt; are the LAB-space coordinates of the recolored (&#039;) and original images respectively.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
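The chromatic-difference metric, for example, can be computed directly from the Lab channels; the lightness weight &amp;lt;code&amp;gt;lam&amp;lt;/code&amp;gt; below is a placeholder, not the constant used in [1] and [2].&lt;br /&gt;

```python
import numpy as np

# Per-pixel chromatic difference between original and recolored Lab images:
# CD = sqrt(lam * dL^2 + da^2 + db^2), with lam down-weighting lightness.
def chromatic_difference(lab, lab_rec, lam=0.5):
    d = lab_rec - lab
    return np.sqrt(lam * d[..., 0] ** 2 + d[..., 1] ** 2 + d[..., 2] ** 2)
```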
&lt;br /&gt;
The key results are in Table 2 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2] for protan/deutan whichever is larger.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but not as high as in paper [2], because they optimize a separate network for each CVD type.&lt;br /&gt;
* Outputs are blurry (SSIM is not weighted strongly enough in the loss).&lt;br /&gt;
* Mixing CVD types in the same network needs a more sophisticated conditioning scheme.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate, making them feasible for real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement using local and global losses was effective in improving visibility for CVD individuals. Loss functions borrowed from the Swin-transformer-based approach of [2] added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. Josa a, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
[12] Color-Blindness.com. (n.d.). COBLIS - Color Blindness Simulator. Retrieved December 13, 2024, from https://www.color-blindness.com/coblis-color-blindness-simulator/&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 220px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400 px|Wikipedia encyclopedia]][[File:eb_2.png|400 px]] &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 800px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|300 px|center|thumb|Losses - Conditional Autoencoder]][[File:loss_unet.png|300 px|thumb|center|Losses - Conditional U-Net]][[File:loss_mlp.png|300 px|center|thumb|Losses - Conditional MLP]]&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60739</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60739"/>
		<updated>2024-12-13T10:00:06Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* 2. Optimization-Based Recoloring */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected values are then added back to the original RGB image to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
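The pipeline above can be sketched as follows; for brevity this version works directly in RGB, whereas the full method converts through LMS and back.&lt;br /&gt;

```python
import numpy as np

# Correction matrix from the text (as in Daltonize [5] / Vischeck [6]):
# it shifts the lost red-channel information into the green and blue channels.
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize(rgb, simulate):
    """rgb: HxWx3 image in [0, 1]; simulate: callable returning the
    CVD-simulated image. Sketch of the daltonization pipeline in RGB."""
    error = rgb - simulate(rgb)      # information the dichromat misses
    shift = error @ CORRECTION.T     # rotate it into visible channels
    return np.clip(rgb + shift, 0.0, 1.0)
```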
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
&lt;br /&gt;
Because this formulation targets complete dichromacy and cannot adapt to intermediate severities, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
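The degree-adaptable loss translates directly into NumPy (an illustrative sketch; the arrays c, c_plus, alpha and the matrix T stand in for the quantities defined above):&lt;br /&gt;

```python
import numpy as np

def degree_adaptable_loss(c_plus, c, T, alpha, beta):
    """E = beta * sum_i alpha_i * norm(T (c_i_plus - c_i))^2
         + sum_over_pairs norm(T (c_i_plus - c_j_plus) - T (c_i - c_j))^2."""
    # Naturalness: per-pixel perceptual difference under the CVD simulation T,
    # weighted by alpha_i.
    d = (c_plus - c) @ T.T
    e_nat = np.sum(alpha * np.sum(d ** 2, axis=1))
    # Contrast: deviation of all pairwise differences under the simulation.
    # The i == j pairs contribute zero, so summing over every (i, j) is harmless.
    diff_plus = (c_plus[:, None, :] - c_plus[None, :, :]) @ T.T
    diff_orig = (c[:, None, :] - c[None, :, :]) @ T.T
    e_cont = np.sum((diff_plus - diff_orig) ** 2)
    return beta * e_nat + e_cont
```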
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
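The point-to-line distance above is straightforward to compute (a sketch; as an example, the protan copunctal point is commonly quoted near (0.747, 0.253) on the CIE 1931 diagram):&lt;br /&gt;

```python
import numpy as np

def distance_to_confusion_line(v, copunctal, ref):
    """Perpendicular distance d(v, L) from chromaticity v = (x_v, y_v) to the
    confusion line L through the copunctal and reference points."""
    (xv, yv), (xcp, ycp), (x0, y0) = v, copunctal, ref
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    den = np.hypot(xcp - x0, ycp - y0)
    return num / den
```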
&lt;br /&gt;
Confusing colors, identified as key colors lying on already-occupied confusion lines, are iteratively shifted to the nearest unoccupied confusion lines, in order of their prominence in the image clusters. This reallocation makes these colors distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIE L*a*b* color space, where the Euclidean distance between colors approximates their perceptual difference. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
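The symmetric KL divergence between two Gaussians has a closed form, which can be evaluated directly (a sketch of the distance being preserved, not of the full mean-vector optimization):&lt;br /&gt;

```python
import numpy as np

def kl_gaussian(mu_i, cov_i, mu_j, cov_j):
    """Closed-form KL(G_i || G_j) for two multivariate normal distributions."""
    k = mu_i.shape[0]
    cov_j_inv = np.linalg.inv(cov_j)
    dmu = mu_j - mu_i
    return 0.5 * (np.trace(cov_j_inv @ cov_i)
                  + dmu @ cov_j_inv @ dmu
                  - k
                  + np.log(np.linalg.det(cov_j) / np.linalg.det(cov_i)))

def symmetric_kl(mu_i, cov_i, mu_j, cov_j):
    """D_sKL(G_i, G_j) = KL(G_i || G_j) + KL(G_j || G_i)."""
    return (kl_gaussian(mu_i, cov_i, mu_j, cov_j)
            + kl_gaussian(mu_j, cov_j, mu_i, cov_i))
```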
&lt;br /&gt;
The optimization aims to preserve the contrast perceived by CVD viewers while maintaining naturalness. Weights are assigned to Gaussian components based on the perceptual importance of colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\lambda_i = \frac{\sum_{j=1}^N \alpha_j p(i|x_j, \Theta)}{\sum_{k=1}^K \sum_{j=1}^N \alpha_j p(k|x_j, \Theta)},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha_j = \|x_j - \text{Sim}(x_j)\|&amp;lt;/math&amp;gt; is the perceptual error of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature when simulated for CVD,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{Sim}(\cdot)&amp;lt;/math&amp;gt; is the simulation function for CVD perception.&lt;br /&gt;
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions and can leave some recolored colors looking unnatural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to deploy in real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and BicycleGAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This specific approach also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations through a unique triple-latent structure, the method supports continuous personalization, adjusting images to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, it took around 18 days for one epoch (one iteration over the entire dataset).&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD with the personalization of different severity levels. We start by exploring existing methods and identifying opportunities for improvement. Since mathematical-based approaches provide a solid foundation and are well-documented, we began our experiments by testing these methods, as described in the background. We later extended our exploration to deep learning based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
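A minimal NumPy sketch of this simulation step for protanopia, using the matrix and coefficients above (the function name is ours):&lt;br /&gt;

```python
import numpy as np

# RGB-to-LMS matrix from the text (Stockman and Sharpe cone fundamentals).
RGB_TO_LMS = np.array([[0.3904725, 0.54990437, 0.00890159],
                       [0.07092586, 0.96310739, 0.00135809],
                       [0.02314268, 0.12801221, 0.93605194]])

def simulate_protanopia_lms(rgb):
    """Convert RGB (..., 3) to LMS, then replace the missing L response with
    the weighted M and S combination given above."""
    lms = rgb @ RGB_TO_LMS.T
    sim = lms.copy()
    sim[..., 0] = 0.90822864 * lms[..., 1] + 0.008192 * lms[..., 2]
    return sim
```

A useful sanity check is that achromatic colors are essentially unchanged: for a white pixel, the replaced L response agrees with the original to within rounding.&lt;br /&gt;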
&lt;br /&gt;
The error between the original and simulated images is then corrected using a deficiency-specific correction matrix, which redistributes the lost color differences into channels the viewer can still perceive. The correction matrix is applied to the error in RGB space, and the corrected error is added back to the original image, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Method 2: Optimizing Objective Function ====&lt;br /&gt;
To improve on the results of the Daltonization method, we designed a framework inspired by the methods discussed in the background, incorporating dominant color extraction, optimization-based recoloring, and edit propagation. This approach aims to balance naturalness and contrast while compensating for colors that are not visible to viewers with the corresponding CVD type.&lt;br /&gt;
&lt;br /&gt;
===== 1. Extraction of Dominant Colors =====&lt;br /&gt;
We begin by extracting the dominant colors from the input image by clustering with a MiniBatch K-means algorithm. This step identifies a reduced set of representative colors that capture the primary color information in the image:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{C} = \{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; represents the number of clusters, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i&amp;lt;/math&amp;gt; represents the centroid of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th cluster.&lt;br /&gt;
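The MiniBatch K-means step (e.g., the MiniBatchKMeans class in scikit-learn) can be illustrated with a plain NumPy k-means stand-in that produces the same kind of centroid set:&lt;br /&gt;

```python
import numpy as np

def dominant_colors(pixels, n_clusters=4, n_iter=20, seed=0):
    """Plain k-means over an (M, 3) pixel array; returns the centroid set
    C = {c_1, ..., c_N} of shape (n_clusters, 3)."""
    rng = np.random.default_rng(seed)
    centroids = pixels[rng.choice(len(pixels), n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assign every pixel to its nearest centroid, then recompute means.
        d = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = pixels[labels == k].mean(axis=0)
    return centroids
```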
&lt;br /&gt;
===== 2. Optimization-Based Recoloring =====&lt;br /&gt;
Once the dominant colors are extracted, we apply an optimization process to adjust these colors. The optimization uses the formulas mentioned in [9], and aims to balance two key objectives:&lt;br /&gt;
&lt;br /&gt;
1. Naturalness Preservation: Ensures the recolored image minimally deviates from the original.&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{nat}} = \sum_{i=1}^N \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_i^{\text{original}}) \|^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is the transformation matrix based on the severity and type of CVD, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i^{\text{original}}&amp;lt;/math&amp;gt; is the original color.&lt;br /&gt;
&lt;br /&gt;
2. Contrast Enhancement: Improves the differentiation of colors for individuals with CVD:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{cont}} = \sum_{i=1}^N \sum_{j&amp;gt;i} \left( \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_j) \|^2 - \| \mathbf{c}_i^{\text{original}} - \mathbf{c}_j^{\text{original}} \|^2 \right)^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The total objective function combines these two terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + E_{\text{cont}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; controls the trade-off between naturalness and contrast.&lt;br /&gt;
&lt;br /&gt;
Optimization is performed using the L-BFGS-B algorithm to ensure efficient convergence under bounded constraints.&lt;br /&gt;
&lt;br /&gt;
The transformation matrices for each type of CVD are the following, which are based on [12]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Protanopia}} = \begin{bmatrix} 0.566 &amp;amp; 0.558 &amp;amp; 0 \\ 0.433 &amp;amp; 0.442 &amp;amp; 0.242 \\ 0 &amp;amp; 0 &amp;amp; 0.758 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Deuteranopia}} = \begin{bmatrix} 0.625 &amp;amp; 0.7 &amp;amp; 0 \\ 0.375 &amp;amp; 0.3 &amp;amp; 0.3 \\ 0 &amp;amp; 0 &amp;amp; 0.7 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Tritanopia}} = \begin{bmatrix} 0.95 &amp;amp; 0 &amp;amp; 0 \\ 0.05 &amp;amp; 0.433 &amp;amp; 0 \\ 0 &amp;amp; 0.567 &amp;amp; 1 \end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
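Putting the objective and optimizer together, a minimal sketch using the SciPy minimize function with L-BFGS-B and the protanopia matrix above (the two starting colors and the beta value are illustrative):&lt;br /&gt;

```python
import numpy as np
from scipy.optimize import minimize

# Protanopia transformation matrix from the text.
T = np.array([[0.566, 0.558, 0.0],
              [0.433, 0.442, 0.242],
              [0.0, 0.0, 0.758]])

def total_energy(flat, c_orig, beta):
    """E = beta * E_nat + E_cont over the dominant colors."""
    c = flat.reshape(c_orig.shape)
    e_nat = np.sum(((c - c_orig) @ T.T) ** 2)
    # Pairwise contrast term; (i, j) and (j, i) are both counted, which
    # only rescales E_cont by a factor of 2.
    diff_sim_sq = np.sum(((c[:, None] - c[None, :]) @ T.T) ** 2, axis=2)
    diff_orig_sq = np.sum((c_orig[:, None] - c_orig[None, :]) ** 2, axis=2)
    e_cont = np.sum((diff_sim_sq - diff_orig_sq) ** 2)
    return beta * e_nat + e_cont

# Two dominant colors a protanope tends to confuse (illustrative values).
c_orig = np.array([[0.9, 0.1, 0.1],
                   [0.1, 0.7, 0.1]])
res = minimize(total_energy, c_orig.ravel(), args=(c_orig, 0.5),
               method="L-BFGS-B", bounds=[(0.0, 1.0)] * c_orig.size)
c_recolored = res.x.reshape(c_orig.shape)
```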
&lt;br /&gt;
===== 3. Edit Propagation =====&lt;br /&gt;
After optimizing the dominant colors, we propagate these edits across the entire image to ensure smooth transitions. This propagation step leverages the CIE-Lab color space, which is perceptually uniform, meaning that the Euclidean distance in this space correlates well with human color perception. The process begins by mapping the original image and the optimized dominant colors into the Lab color space. In this space, the differences between the original and recolored dominant colors are computed to capture the adjustments made during the optimization step:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Delta \mathbf{c} = \text{griddata}(\mathbf{c}^{\text{original}}, \mathbf{c}^{\text{recolored}} - \mathbf{c}^{\text{original}}, \mathbf{I}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in the Lab color space. Once the interpolated changes are computed, they are applied to the Lab representation of the original image. Finally, the adjusted Lab values are converted back to the RGB color space to reconstruct the recolored image.&lt;br /&gt;
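A sketch of the propagation step using the SciPy griddata function (nearest-neighbour interpolation is used here so that pixels outside the convex hull of the dominant colors remain defined; linear interpolation with a nearest fallback is a common alternative):&lt;br /&gt;

```python
import numpy as np
from scipy.interpolate import griddata

def propagate_edits(image_lab, dom_orig_lab, dom_rec_lab):
    """Spread the dominant-color shifts over every pixel of the Lab image."""
    shifts = dom_rec_lab - dom_orig_lab          # per-dominant-color Lab change
    pixels = image_lab.reshape(-1, 3)
    # Interpolate each Lab channel of the shift field at every pixel color.
    delta = np.stack(
        [griddata(dom_orig_lab, shifts[:, k], pixels, method="nearest")
         for k in range(3)],
        axis=1)
    return (pixels + delta).reshape(image_lab.shape)
```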
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in further sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and able to integrate the user label, they depend on ground-truth recolored images being available for every training sample.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information while training. They are generally better at generating natural-looking images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We first gathered an open-source RGB image dataset from [2]: to improve their model&#039;s ability to enhance the contrast between CVD-indistinguishable color pairs, the authors created a dataset of 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for supervised methods), we randomly sampled 15,000 images and recolored them by simulating random labels for severity and type of CVD. The recoloring for ground-truth images was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org Scipy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
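The label encoding described above can be sketched in a few lines (a minimal illustration; the function name and the exact sampling procedure are our assumptions, not the project's actual code):

```python
import numpy as np

def sample_cvd_label(rng):
    """Sample a random CVD condition label encoded as severity * [protan, deutan].

    Severity is drawn from [0.1, 1.0]; a one-hot choice of CVD type selects
    which channel carries it. (Illustrative only; ranges follow the text above.)
    """
    severity = rng.uniform(0.1, 1.0)
    cvd_type = rng.integers(0, 2)   # 0 -> protan, 1 -> deutan
    one_hot = np.zeros(2)
    one_hot[cvd_type] = 1.0
    return severity * one_hot       # e.g. [0.6, 0.0] = protanopia at 60% severity

rng = np.random.default_rng(0)
label = sample_cvd_label(rng)
```

Under this encoding, severity 0 would mean no deficiency, so the lower bound of 0.1 keeps every sampled label a genuine CVD condition.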
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to the 3 parallel MLPs simultaneously. These parallel networks are trained to predict the R, G, and B channels of the recolored image from the given ground truth. The outputs of the three networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The model was trained with a pixel-wise mean-squared error (MSE) loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Model-output recolored image and ground-truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
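The conditioning step and the MSE objective above can be sketched with numpy (a minimal sketch; the learned MLPs themselves are omitted, and the helper names are ours):

```python
import numpy as np

def make_conditional_input(image, label):
    """Concatenate an HxWx3 image with the 2-dim CVD label broadcast to two
    extra channels, giving the HxWx5 conditional input described above."""
    h, w, _ = image.shape
    label_planes = np.broadcast_to(label, (h, w, 2))
    return np.concatenate([image, label_planes], axis=-1)

def mse_loss(pred, target):
    """Pixel-wise mean-squared error between two images."""
    return np.mean((pred - target) ** 2)

img = np.random.default_rng(1).random((4, 4, 3))
x = make_conditional_input(img, np.array([0.6, 0.0]))   # protanopia, 60% severity
loss = mse_loss(img, img)                               # identical images -> zero loss
```

Broadcasting the label to full spatial planes is one common way to condition per-pixel networks; the project's code may of course concatenate the label differently.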
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
Using the same input conditioning, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and the decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and generalize well to new tasks. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: Model-output recolored and ground-truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: the l-th layer feature map of the pre-trained VGG network, with &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; its number of elements&lt;br /&gt;
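The perceptual loss can be sketched with stand-in feature extractors (a real implementation would use pre-trained VGG activations, e.g. via torchvision; the average-pooling phi below is purely illustrative):

```python
import numpy as np

def phi(image, level):
    """Stand-in for the l-th feature map: average-pool an HxWxC image by a
    factor of 2**level. (Illustrative only, NOT real VGG features.)"""
    f = 2 ** level
    h, w, c = image.shape
    crop = image[: h - h % f, : w - w % f]
    return crop.reshape(h // f, f, w // f, f, c).mean(axis=(1, 3))

def perceptual_loss(img_a, img_b, levels=(0, 1, 2)):
    """Sum over levels of the mean squared feature difference (the 1/N_l
    normalization is the mean)."""
    total = 0.0
    for l in levels:
        fa, fb = phi(img_a, l), phi(img_b, l)
        total += np.mean((fa - fb) ** 2)
    return total

a = np.random.default_rng(2).random((8, 8, 3))
loss_same = perceptual_loss(a, a)   # identical inputs -> zero loss
```

The point of comparing pooled (or, in the real loss, deep VGG) features rather than raw pixels is that small spatial shifts are penalized less than changes in overall color and texture statistics.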
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to aligning this network with the recoloring task was the choice of loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: Two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: Size of the global (or large) window of the image&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: Size of the local window or neighborhood around pixel x&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural color distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{natural}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored (model output) images respectively&lt;br /&gt;
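The contrast terms above can be sketched directly from the CL definition (assuming precomputed CVD simulations of the original and recolored images; pixel-pair lists stand in for the full windows, and SSIM is omitted for brevity):

```python
import numpy as np

def contrast_loss_term(sim_recolored, sim_original, pairs):
    """Mean CL(x, y) over the given pixel-index pairs, where
    CL(x, y) = ||c'_x - c'_y|| - ||c_x - c_y|| as defined above.
    Inputs are (N, 3) arrays of CVD-simulated colors."""
    total = 0.0
    for x, y in pairs:
        d_rec = np.linalg.norm(sim_recolored[x] - sim_recolored[y])
        d_org = np.linalg.norm(sim_original[x] - sim_original[y])
        total += d_rec - d_org
    return total / len(pairs)

def total_contrast_loss(global_cl, local_cl, beta=1.0):
    """L_contrast = beta * L_global + (2 - beta) * L_local."""
    return beta * global_cl + (2 - beta) * local_cl

# Toy example: four "pixels" with 3 color channels, flattened
org = np.array([[0.0, 0.0, 0.0], [1, 1, 1], [0.5, 0.5, 0.5], [0.2, 0.2, 0.2]])
rec = org.copy()
cl = contrast_loss_term(rec, org, pairs=[(0, 1), (2, 3)])   # identical -> 0
```

Note the sign convention: when the recolored simulation has more contrast than the original, CL is positive, so a training objective built on it must reward (not penalize) that term as described in [2].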
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
!   !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | Performance&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | SSIM Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.9985 || 0.9985 || 0.9985 || 0.9985&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.9565 || 0.9986 || 0.9986 || 0.9968&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | TCC Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0004 || 0.0003 || 0.0003 || 0.0003&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.0380 || 0.0003 || 0.0002 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CD ΔE76 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0462 || 0.0462 || 0.0462 || 0.0462&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 8.4251 || 0.0458 || 0.0435 || 0.0578&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE2000 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0681 || 0.0681 || 0.0681 || 0.0681&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 6.9145 || 0.0671 || 0.0630 || 0.0838&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE94 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0461 || 0.0461 || 0.0461 || 0.0461&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 5.3878 || 0.0457 || 0.0434 || 0.0576&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | D-CIELAB ΔEab Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 1.7209 || 1.7209 || 1.7209 || 1.7209&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 1.5926 || 1.9673 || 1.4363 || 2.4009&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: the Conditional Parallel RGB MLP, Conditional U-Net, and Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), Total Color Contrast (TCC), Chromatic Difference (CD), and inference time were used to assess the effectiveness of the models, following [1] and [2].&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored image shows compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The output pixels appeared discretized, suggesting that channel disentanglement was not helpful here (especially for naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to, or worse than, those of the originals, meaning the model was not performing well on this task&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images, possibly because the naturalness term was effectively prioritized even though the weight coefficients in the loss favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to score and evaluate metrics for comparison with related work only using the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; we use the same definitions as those papers. At a high level, the components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a constant weight, not a wavelength, and l, a, b are the CIELAB coordinates of the recolored (&#039;) and original images respectively.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
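The chromatic-difference metric can be computed directly from the formula above (a sketch assuming the images are already converted to CIELAB, e.g. with skimage.color.rgb2lab; the weight lam is left as a parameter):

```python
import numpy as np

def chromatic_difference(lab_original, lab_recolored, lam=1.0):
    """Per-pixel CD(i) = sqrt(lam*(l'-l)^2 + (a'-a)^2 + (b'-b)^2),
    averaged over the image. Inputs are HxWx3 CIELAB arrays."""
    dl, da, db = (lab_recolored - lab_original).transpose(2, 0, 1)
    return np.mean(np.sqrt(lam * dl ** 2 + da ** 2 + db ** 2))

lab = np.zeros((2, 2, 3))
shifted = lab + np.array([0.0, 3.0, 4.0])   # a and b shifted by 3 and 4
cd = chromatic_difference(lab, shifted)     # sqrt(9 + 16) = 5 per pixel
```

SSIM is best taken from an existing implementation (e.g. skimage.metrics.structural_similarity) rather than re-derived, since the windowing and stabilizing constants c1, c2 matter for comparability with prior work.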
&lt;br /&gt;
The key results are in Table 2 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates the corresponding results from paper [2] (the larger of their protan/deutan values).&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but not as high as in paper [2], because they optimize a separate network for each CVD type&lt;br /&gt;
* Outputs are blurry (SSIM is not optimized for strongly enough)&lt;br /&gt;
* Mixing CVD types in the same network needs a more sophisticated approach&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate, making them feasible for real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from the Swin architecture of [2] added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. Josa a, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
[12] Color-Blindness.com. (n.d.). COBLIS - Color Blindness Simulator. Retrieved December 13, 2024, from https://www.color-blindness.com/coblis-color-blindness-simulator/&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 220px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400 px|Wikipedia encyclopedia]][[File:eb_2.png|400 px]] &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 800px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|300 px|center|thumb|Losses - Conditional Autoencoder]][[File:loss_unet.png|300 px|thumb|center|Losses - Conditional U-Net]][[File:loss_mlp.png|300 px|center|thumb|Losses - Conditional MLP]]&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60736</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60736"/>
		<updated>2024-12-13T09:59:23Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* 3. Edit Propagation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected values are then added back to the original RGB image to generate a daltonized image with improved contrast for CVD viewers.&lt;br /&gt;
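The daltonization steps above can be sketched end-to-end (the CVD simulation is left as a caller-supplied function, since the LMS simulation matrices depend on the deficiency type; the correction matrix is the one quoted above):

```python
import numpy as np

# Correction matrix from the text: redistributes the "invisible" error
# into channels the dichromat can still perceive.
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize(rgb, simulate):
    """Daltonize an HxWx3 RGB image in [0, 1] given a CVD simulation function.

    error = original - simulated is the information the dichromat cannot see;
    it is mapped through the correction matrix and added back to the image.
    """
    error = rgb - simulate(rgb)
    corrected = error @ CORRECTION.T     # apply 3x3 matrix per pixel
    return np.clip(rgb + corrected, 0.0, 1.0)

# Placeholder simulation: identity, i.e. a viewer with normal color vision,
# for whom the error (and hence the correction) is zero.
identity_sim = lambda img: img
img = np.random.default_rng(3).random((2, 2, 3))
out = daltonize(img, identity_sim)
```

A real pipeline would plug in a dichromat simulation such as Brettel et al. [7] (as implemented in the Daltonize package [5]) for the `simulate` argument.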
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
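The degree-adaptable objective can be written as a direct numpy translation of the equation above (T, the weights alpha_i, and beta are supplied by the caller; a subsampled list of pixel pairs stands in for the full sum over i ≠ j):

```python
import numpy as np

def degree_adaptable_loss(c, c_plus, T, alpha, beta, pairs):
    """E = beta * sum_i alpha_i ||T (c_i^+ - c_i)||^2
          + sum_{(i,j)} ||T (c_i^+ - c_j^+) - T (c_i - c_j)||^2
    c, c_plus: (N, 3) original and recolored pixel colors;
    T: 3x3 CVD-simulation matrix; alpha: (N,) per-pixel weights."""
    diff = (c_plus - c) @ T.T                       # T applied per pixel
    e_nat = np.sum(alpha * np.sum(diff ** 2, axis=1))
    e_cont = 0.0
    for i, j in pairs:
        dev = T @ (c_plus[i] - c_plus[j]) - T @ (c[i] - c[j])
        e_cont += np.sum(dev ** 2)
    return beta * e_nat + e_cont

c = np.random.default_rng(4).random((5, 3))
loss0 = degree_adaptable_loss(c, c, np.eye(3), np.ones(5), 0.5, [(0, 1), (2, 3)])
```

With `c_plus = c` both terms vanish, which matches the intuition that an unchanged image is perfectly natural and preserves all contrast; an optimizer over `c_plus` trades the two terms off via beta.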
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
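The distance d(v, L) is the standard point-to-line distance on the chromaticity diagram and can be sketched as follows; the copunctal coordinates below are approximate literature values and should be treated as assumptions:&lt;br /&gt;

```python
import math

# Approximate copunctal points on the CIE 1931 diagram (values commonly
# quoted in the literature; treat them as assumptions, not exact constants).
COPUNCTAL = {"protan": (0.747, 0.253), "deutan": (1.400, -0.400)}

def distance_to_confusion_line(v, copunctal, ref):
    """Perpendicular distance d(v, L) from chromaticity v = (x_v, y_v) to the
    confusion line L through the copunctal point and ref = (x_0, y_0)."""
    (x_v, y_v), (x_cp, y_cp), (x_0, y_0) = v, copunctal, ref
    num = abs((x_cp - x_0) * (y_0 - y_v) - (x_0 - x_v) * (y_cp - y_0))
    den = math.hypot(x_cp - x_0, y_cp - y_0)
    return num / den
```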
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. High-ranking colors, determined by their prominence in image clusters, are shifted to the nearest unoccupied confusion lines. This reallocation ensures that these colors are distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
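The three terms can be combined into a small NumPy sketch (our simplified reading, with cluster B held fixed and the dichromat simulation &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; passed in as an abstract function):&lt;br /&gt;

```python
import numpy as np

def recoloring_energy(a, a_rec, b, f_D, lam=0.5):
    """Sketch of E = (E1 + E2) + lambda * E3 for two clusters of key colors.

    a, a_rec : (n_A, 3) original / recolored key colors of cluster A.
    b        : (n_B, 3) key colors of cluster B (left unchanged here).
    f_D      : function mapping colors to their dichromat simulation.
    """
    sim_a_rec, sim_b = f_D(a_rec), f_D(b)

    # E1: contrast between clusters, before vs. after recoloring + simulation.
    E1 = np.mean(np.abs(
        np.linalg.norm(a[:, None] - b[None, :], axis=2)
        - np.linalg.norm(sim_a_rec[:, None] - sim_b[None, :], axis=2)))

    # E2: contrast within cluster A under simulation.
    E2 = np.mean(np.abs(
        np.linalg.norm(a[:, None] - a[None, :], axis=2)
        - np.linalg.norm(sim_a_rec[:, None] - sim_a_rec[None, :], axis=2)))

    # E3: naturalness, i.e. distance between original and recolored key colors.
    E3 = np.mean(np.linalg.norm(a - a_rec, axis=1))
    return (E1 + E2) + lam * E3
```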
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIEL*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
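Evaluating the mixture density for a single color feature is a direct translation of the formula (a sketch only; a real implementation would fit &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; with EM, e.g. scikit-learn&#039;s GaussianMixture):&lt;br /&gt;

```python
import numpy as np

def gmm_density(x, weights, means, covs):
    """Evaluate p(x | Theta) = sum_i w_i G_i(x | mu_i, Sigma_i)
    for a single 3-D color feature x (e.g. a CIELAB pixel)."""
    p = 0.0
    for w, mu, cov in zip(weights, means, covs):
        d = x - mu
        norm = 1.0 / np.sqrt(((2 * np.pi) ** 3) * np.linalg.det(cov))
        p += w * norm * np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)
    return p
```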
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
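For Gaussians, both directions of the KL divergence have a closed form, so the symmetric divergence can be computed without sampling (a generic sketch, not code from [11]):&lt;br /&gt;

```python
import numpy as np

def kl_gauss(mu1, cov1, mu2, cov2):
    """Closed-form KL divergence D_KL(G1 || G2) between two Gaussians."""
    k = len(mu1)
    inv2 = np.linalg.inv(cov2)
    d = mu2 - mu1
    return 0.5 * (np.trace(inv2 @ cov1) + d @ inv2 @ d - k
                  + np.log(np.linalg.det(cov2) / np.linalg.det(cov1)))

def symmetric_kl(mu1, cov1, mu2, cov2):
    """Symmetric KL divergence D_sKL(G1, G2) used by the GMM recoloring step."""
    return kl_gauss(mu1, cov1, mu2, cov2) + kl_gauss(mu2, cov2, mu1, cov1)
```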
&lt;br /&gt;
The optimization aims to preserve the contrast perceived by CVD viewers while maintaining naturalness. Weights are assigned to Gaussian components based on the perceptual importance of colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\lambda_i = \frac{\sum_{j=1}^N \alpha_j p(i|x_j, \Theta)}{\sum_{k=1}^K \sum_{j=1}^N \alpha_j p(k|x_j, \Theta)},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha_j = \|x_j - \text{Sim}(x_j)\|&amp;lt;/math&amp;gt; is the perceptual error of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature when simulated for CVD,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{Sim}(\cdot)&amp;lt;/math&amp;gt; is the simulation function for CVD perception.&lt;br /&gt;
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions, so the recolored images sometimes look unnatural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to use in real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other GAN approaches struggle to balance naturalness and contrast. This specific approach also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, the architecture is still computationally intensive and does not inherently account for the specific needs of individuals with CVD, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations using a unique triple-latent structure in their method, continuous personalization was possible to adjust images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, it took around 18 days for one epoch (or one iteration over the entire dataset).&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD, personalized to different severity levels. We start by exploring existing methods and identifying opportunities for improvement. Since mathematics-based approaches provide a solid foundation and are well documented, we began our experiments by testing these methods, as described in the background. We later extended our exploration to deep learning-based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
&lt;br /&gt;
The error between the original and simulated images is then mapped into the RGB color space using a deficiency-specific correction matrix, which adjusts the image to enhance contrast and recover lost color differences. The predefined correction matrix is applied to the error in RGB space, transforming it back into LMS space for final adjustments. The corrected LMS values are added back to the original values, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
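Putting Method 1 together, the simulation and correction steps can be sketched in NumPy using exactly the matrices above (a minimal per-pixel version; we fold the LMS round trip into the matrix products, so treat the pipeline details as a simplification):&lt;br /&gt;

```python
import numpy as np

# RGB to LMS matrix from the text (Stockman and Sharpe based) and its inverse.
RGB2LMS = np.array([[0.3904725,  0.54990437, 0.00890159],
                    [0.07092586, 0.96310739, 0.00135809],
                    [0.02314268, 0.12801221, 0.93605194]])
LMS2RGB = np.linalg.inv(RGB2LMS)

# Row replacing the missing cone for each deficiency (coefficients from [5]).
SIM_ROWS = {"protanopia":   (0, [0.0, 0.90822864, 0.008192]),
            "deuteranopia": (1, [1.10104433, 0.0, -0.00901975]),
            "tritanopia":   (2, [-0.15773032, 1.19465634, 0.0])}

# Daltonize-inspired correction matrix from the text.
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def simulate_cvd(rgb, kind="protanopia"):
    """Simulate dichromatic perception of an (..., 3) RGB array via LMS."""
    row, coeffs = SIM_ROWS[kind]
    sim = np.eye(3)
    sim[row] = coeffs                      # missing cone = mix of the other two
    return rgb @ RGB2LMS.T @ sim.T @ LMS2RGB.T

def daltonize(rgb, kind="protanopia"):
    """Shift the information lost to the CVD viewer into visible channels."""
    error = rgb - simulate_cvd(rgb, kind)  # difference invisible to the viewer
    return np.clip(rgb + error @ CORRECTION.T, 0.0, 1.0)
```

A useful sanity check is that applying simulate_cvd twice changes nothing, since the simulated image already lies in the dichromat&#039;s reduced color gamut.&lt;br /&gt;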
&lt;br /&gt;
==== Method 2: Optimizing Objective Function ====&lt;br /&gt;
To improve the results from the Daltonization method, we designed a framework inspired by methods discussed in the background, incorporating dominant color extraction, optimization-based recoloring, and edit propagation. This approach aims to find a balance between the naturalness and contrast while compensating colors that are not visible for corresponding CVD types.&lt;br /&gt;
&lt;br /&gt;
===== 1. Extraction of Dominant Colors =====&lt;br /&gt;
We begin by extracting the dominant colors from the input image by clustering with a MiniBatch K-means algorithm. This step identifies a reduced set of representative colors that capture the primary color information in the image:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{C} = \{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; represents the number of clusters, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i&amp;lt;/math&amp;gt; represents the centroid of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th cluster.&lt;br /&gt;
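The text uses a MiniBatch K-means step; a minimal plain k-means stand-in in NumPy shows the idea (cluster count, iteration budget, and initialization here are illustrative assumptions):&lt;br /&gt;

```python
import numpy as np

def dominant_colors(pixels, n_colors=8, n_iter=20, seed=0):
    """Minimal k-means stand-in for the MiniBatch K-means step: returns
    n_colors centroids c_i approximating the image's dominant colors.

    pixels : (N, 3) array of RGB values in [0, 1].
    """
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), n_colors, replace=False)]
    for _ in range(n_iter):
        # Assign each pixel to its nearest centroid.
        labels = np.linalg.norm(pixels[:, None] - centers[None], axis=2).argmin(1)
        for k in range(n_colors):
            if np.any(labels == k):
                centers[k] = pixels[labels == k].mean(axis=0)
    return centers
```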
&lt;br /&gt;
===== 2. Optimization-Based Recoloring =====&lt;br /&gt;
Once the dominant colors are extracted, we apply an optimization process to adjust these colors. The optimization builds on the formulas discussed in the background and aims to balance two key objectives:&lt;br /&gt;
&lt;br /&gt;
1. Naturalness Preservation: Ensures the recolored image minimally deviates from the original.&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{nat}} = \sum_{i=1}^N \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_i^{\text{original}}) \|^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is the transformation matrix based on the severity and type of CVD, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i^{\text{original}}&amp;lt;/math&amp;gt; is the original color.&lt;br /&gt;
&lt;br /&gt;
2. Contrast Enhancement: Improves the differentiation of colors for individuals with CVD:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{cont}} = \sum_{i=1}^N \sum_{j&amp;gt;i} \left( \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_j) \|^2 - \| \mathbf{c}_i^{\text{original}} - \mathbf{c}_j^{\text{original}} \|^2 \right)^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The total objective function combines these two terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + E_{\text{cont}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; controls the trade-off between naturalness and contrast.&lt;br /&gt;
&lt;br /&gt;
Optimization is performed using the L-BFGS-B algorithm to ensure efficient convergence under bounded constraints.&lt;br /&gt;
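The combined objective and its bounded L-BFGS-B minimization can be sketched with SciPy (a simplified reading of the equations above; &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; and the palette are illustrative):&lt;br /&gt;

```python
import numpy as np
from scipy.optimize import minimize

def recolor_palette(palette, T, beta=0.5):
    """Optimize the dominant colors with L-BFGS-B, minimizing
    E = beta * E_nat + E_cont.

    palette : (N, 3) dominant colors in [0, 1].
    T       : (3, 3) CVD transformation matrix.
    """
    orig = palette.copy()

    def energy(flat):
        c = flat.reshape(-1, 3)
        # Naturalness: deviation from the original palette under T.
        e_nat = np.sum(((c - orig) @ T.T) ** 2)
        # Contrast: pairwise distances under T should match the originals.
        e_cont = 0.0
        for i in range(len(c)):
            for j in range(i + 1, len(c)):
                d_cvd = np.sum((T @ (c[i] - c[j])) ** 2)
                d_orig = np.sum((orig[i] - orig[j]) ** 2)
                e_cont += (d_cvd - d_orig) ** 2
        return beta * e_nat + e_cont

    res = minimize(energy, orig.ravel(), method="L-BFGS-B",
                   bounds=[(0.0, 1.0)] * orig.size)
    return res.x.reshape(-1, 3)
```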
&lt;br /&gt;
The transformation matrices for each type of CVD are the following, which are based on [12]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Protanopia}} = \begin{bmatrix} 0.566 &amp;amp; 0.558 &amp;amp; 0 \\ 0.433 &amp;amp; 0.442 &amp;amp; 0.242 \\ 0 &amp;amp; 0 &amp;amp; 0.758 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Deuteranopia}} = \begin{bmatrix} 0.625 &amp;amp; 0.7 &amp;amp; 0 \\ 0.375 &amp;amp; 0.3 &amp;amp; 0.3 \\ 0 &amp;amp; 0 &amp;amp; 0.7 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Tritanopia}} = \begin{bmatrix} 0.95 &amp;amp; 0 &amp;amp; 0 \\ 0.05 &amp;amp; 0.433 &amp;amp; 0 \\ 0 &amp;amp; 0.567 &amp;amp; 1 \end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== 3. Edit Propagation =====&lt;br /&gt;
After optimizing the dominant colors, we propagate these edits across the entire image to ensure smooth transitions. This propagation step leverages the CIE-Lab color space, which is perceptually uniform, meaning that the Euclidean distance in this space correlates well with human color perception. The process begins by mapping the original image and the optimized dominant colors into the Lab color space. In this space, the differences between the original and recolored dominant colors are computed to capture the adjustments made during the optimization step:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Delta L^* = \text{griddata}(\mathbf{c}^{\text{original}}, \mathbf{c}^{\text{recolored}} - \mathbf{c}^{\text{original}}, \mathbf{I}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in the Lab color space. Once the interpolated changes are computed, they are applied to the Lab representation of the original image. Finally, the adjusted Lab values are converted back to the RGB color space to reconstruct the recolored image.&lt;br /&gt;
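The propagation step can be sketched with SciPy&#039;s griddata (per-channel interpolation of the palette shifts; the nearest-neighbour fallback for pixels outside the palette&#039;s convex hull is our assumption):&lt;br /&gt;

```python
import numpy as np
from scipy.interpolate import griddata

def propagate_edits(image_lab, palette_lab, palette_rec_lab):
    """Spread the palette edits over the whole image in Lab space.

    image_lab       : (H, W, 3) image in Lab coordinates.
    palette_lab     : (N, 3) original dominant colors in Lab.
    palette_rec_lab : (N, 3) optimized (recolored) dominant colors in Lab.
    """
    shift = palette_rec_lab - palette_lab          # per-palette-color change
    flat = image_lab.reshape(-1, 3)
    delta = np.empty_like(flat)
    for k in range(3):                             # interpolate each channel
        lin = griddata(palette_lab, shift[:, k], flat, method="linear")
        near = griddata(palette_lab, shift[:, k], flat, method="nearest")
        delta[:, k] = np.where(np.isnan(lin), near, lin)
    return (flat + delta).reshape(image_lab.shape)
```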
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in further sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and integrate the user label naturally, they require a pre-existing ground truth example of the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information while training. They are generally better at generating natural-looking images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from the open-source RGB image dataset of [2]: to improve their model&#039;s ability to enhance the contrast between CVD-indistinguishable color pairs, the authors created a dataset of 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground truth recolored images for supervised methods), we randomly sampled 15,000 images and recolored them by simulating random labels for severity and type of CVD. The recoloring for ground truth images was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: The open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org Scipy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and is passed to 3 parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of a recolored image based on the given ground truth. The outputs from each of these networks are concatenated to produce a recolored RGB image of the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The model was trained with a pixel-wise mean-squared error loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground truth recolored image, respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
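A framework-agnostic sketch of the input conditioning and the loss (the project itself used PyTorch; array shapes here are illustrative):&lt;br /&gt;

```python
import numpy as np

def condition_input(image, label):
    """Concatenate a (H, W, 3) image with its CVD label along the channel
    axis, as fed to the parallel MLPs.

    label : length-2 vector, severity * [protan, deutan].
    """
    h, w, _ = image.shape
    label_planes = np.broadcast_to(label, (h, w, 2))
    return np.concatenate([image, label_planes], axis=2)   # (H, W, 5)

def mse_loss(pred, target):
    """Pixel-wise mean-squared error between output and ground truth."""
    return np.mean((pred - target) ** 2)
```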
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
With the same inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and the decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: the recolored (model output) and ground truth recolored images, respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt; is the feature map of the l-th layer of the pre-trained VGG network, and &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; is the number of elements in that feature map&lt;br /&gt;
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{|\omega|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{|\omega_x|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;|\omega|&amp;lt;/math&amp;gt;: size of the global (or large) window over the image&lt;br /&gt;
* &amp;lt;math&amp;gt;|\omega_x|&amp;lt;/math&amp;gt;: size of the local window (neighborhood) around pixel x&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural color distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: the original and recolored images, respectively&lt;br /&gt;
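As an illustrative sketch of how these terms combine, the contrast term and total loss can be computed as below. The pair sampling and array shapes are our simplifications; the actual network evaluates these terms over full images during training.&lt;br /&gt;

```python
import numpy as np

def contrast_loss_term(c_orig, c_rec, pairs):
    """Average CL(x, y) over the given pixel pairs.

    c_orig, c_rec: (N, 3) arrays of CVD-simulated colors of the original
    and recolored images; pairs: list of (x, y) index pairs (the window).
    """
    total = 0.0
    for x, y in pairs:
        cl = (np.linalg.norm(c_rec[x] - c_rec[y])
              - np.linalg.norm(c_orig[x] - c_orig[y]))
        total += cl
    return total / len(pairs)

def total_loss(l_naturalness, l_global, l_local, alpha=0.25, beta=1.0):
    """L_total = alpha * L_nat + 2 * (1 - alpha) * L_contrast,
    with L_contrast = beta * L_global + (2 - beta) * L_local."""
    l_contrast = beta * l_global + (2.0 - beta) * l_local
    return alpha * l_naturalness + 2.0 * (1.0 - alpha) * l_contrast
```

With alpha = 0.25 and beta = 1.0 (the coefficients used in our experiments), the contrast term receives three times the weight of the naturalness term.&lt;br /&gt;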
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
!   !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | Performance&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | SSIM Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.9985 || 0.9985 || 0.9985 || 0.9985&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.9565 || 0.9986 || 0.9986 || 0.9968&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | TCC Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0004 || 0.0003 || 0.0003 || 0.0003&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.0380 || 0.0003 || 0.0002 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CD ΔE76 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0462 || 0.0462 || 0.0462 || 0.0462&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 8.4251 || 0.0458 || 0.0435 || 0.0578&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE2000 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0681 || 0.0681 || 0.0681 || 0.0681&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 6.9145 || 0.0671 || 0.0630 || 0.0838&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE94 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0461 || 0.0461 || 0.0461 || 0.0461&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 5.3878 || 0.0457 || 0.0434 || 0.0576&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | D-CIELAB ΔEab Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 1.7209 || 1.7209 || 1.7209 || 1.7209&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 1.5926 || 1.9673 || 1.4363 || 2.4009&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), Total Color Contrast (TCC), Chromatic Difference (CD), and inference time, as provided in [1] and [2], were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored image shows compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored versus the original images were similar or worse, meaning the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images, possibly because the naturalness term was effectively prioritized even though the weight coefficients in the loss favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to evaluate metrics for comparison with related work using only the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; we use the same definitions as those papers. At a high level, the components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weighting constant, not a wavelength, and l, a, b are the CIELAB coordinates of the recolored (&#039;) and original images, respectively.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
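The two closed-form metrics above can be sketched directly in NumPy. Note that `ssim_global` evaluates SSIM over a single global window for brevity; the standard metric averages the same expression over sliding windows. Constants and function names here are our own.&lt;br /&gt;

```python
import numpy as np

def chromatic_difference(lab_orig, lab_rec, lam=1.0):
    """Mean per-pixel CD(i) = sqrt(lam*(l'-l)^2 + (a'-a)^2 + (b'-b)^2).

    lab_orig, lab_rec: (N, 3) CIELAB arrays for the original and recolored
    images; lam is the weighting constant from the definition above.
    """
    d = lab_rec - lab_orig
    return np.sqrt(lam * d[:, 0]**2 + d[:, 1]**2 + d[:, 2]**2).mean()

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """SSIM evaluated over one global window (a simplification of the
    usual sliding-window average)."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))
```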
&lt;br /&gt;
The key results are in Table 2 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2] for protan/deutan, whichever is larger.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but fall short of paper [2], which optimizes a separate network for each CVD type.&lt;br /&gt;
* Outputs remain blurry (the loss does not optimize SSIM strongly enough).&lt;br /&gt;
* Handling multiple CVD types in a single network needs a more sophisticated conditioning scheme.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned a couple of things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate, making them feasible for real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from Swin architecture added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. JOSA A, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
[12] Color-Blindness.com. (n.d.). COBLIS - Color Blindness Simulator. Retrieved December 13, 2024, from https://www.color-blindness.com/coblis-color-blindness-simulator/&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 220px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400 px|Wikipedia encyclopedia]][[File:eb_2.png|400 px]] &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 800px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|300 px|center|thumb|Losses - Conditional Autoencoder]][[File:loss_unet.png|300 px|thumb|center|Losses - Conditional U-Net]][[File:loss_mlp.png|300 px|center|thumb|Losses - Conditional MLP]]&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60733</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60733"/>
		<updated>2024-12-13T09:54:14Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* 2. Optimization-Based Recoloring */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected values are then added back to the original RGB image to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
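A minimal NumPy sketch of this pipeline follows. The CVD simulation is passed in by the caller, since the full LMS-based simulation of [7] is not reproduced here; `daltonize` and `simulate` are illustrative names, and the correction matrix is the one quoted above.&lt;br /&gt;

```python
import numpy as np

# Illustrative correction matrix from the text (as in Daltonize/Vischeck)
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize(rgb, simulate):
    """Sketch of the daltonization steps described above.

    rgb: (..., 3) image array with values in [0, 1]; simulate: a
    caller-supplied function returning the CVD-simulated image.
    """
    error = rgb - simulate(rgb)      # information the dichromat cannot see
    shifted = error @ CORRECTION.T   # rotate error into visible channels
    return np.clip(rgb + shifted, 0.0, 1.0)
```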
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
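A brute-force sketch of this energy is given below. In practice the optimization runs over sampled or clustered colors rather than all O(N²) pixel pairs, and the function name is ours.&lt;br /&gt;

```python
import numpy as np

def recoloring_energy(c, c_plus, beta):
    """E = beta * E_nat + E_cont from Zhu et al. [8].

    c, c_plus: (N, 3) arrays of original and recolored pixel colors.
    E_nat penalizes per-pixel color shifts; E_cont penalizes changes in
    pairwise color contrast.
    """
    e_nat = np.sum(np.linalg.norm(c_plus - c, axis=1) ** 2)
    n = len(c)
    e_cont = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                diff = (c_plus[i] - c_plus[j]) - (c[i] - c[j])
                e_cont += np.dot(diff, diff)
    return beta * e_nat + e_cont
```

Note that shifting every color by the same vector incurs only the naturalness penalty, since all pairwise contrasts are preserved.&lt;br /&gt;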
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
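The distance formula above can be computed directly from the three chromaticity points; a small sketch (the function name is ours, and the example coordinates in testing are arbitrary rather than actual copunctal points):&lt;br /&gt;

```python
import numpy as np

def confusion_line_distance(v, copunctal, ref):
    """Perpendicular distance d(v, L) from chromaticity v to the confusion
    line L through the copunctal point and a reference point.

    All arguments are (x, y) chromaticity pairs, matching the formula above.
    """
    (xv, yv), (xcp, ycp), (x0, y0) = v, copunctal, ref
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    den = np.hypot(xcp - x0, ycp - y0)
    return num / den
```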
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. High-ranking colors, determined by their prominence in image clusters, are shifted to the nearest unoccupied confusion lines. This reallocation ensures that these colors are distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
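To make the optimization concrete, here is a minimal NumPy sketch of the combined objective, assuming the key-color chromaticities are given as arrays and that `f_D` is some dichromat-simulation function supplied by the caller (the function name and array conventions are our own stand-ins, not the paper's implementation):

```python
import numpy as np

def total_objective(a, b, a_rec, f_D, lam=0.5):
    """Sketch of E = (E_1 + E_2) + lambda * E_3 over key-color chromaticities.

    a, b  : (n_A, 2) and (n_B, 2) original key colors of clusters A and B
    a_rec : (n_A, 2) recolored key colors of cluster A
    f_D   : callable simulating dichromatic perception, (n, 2) -> (n, 2)
    """
    # All pairwise Euclidean distances between two sets of chromaticities
    dist = lambda u, v: np.linalg.norm(u[:, None, :] - v[None, :, :], axis=-1)

    # E_1: between-cluster contrast, original vs. CVD-perceived recolored
    E1 = np.abs(dist(a, b) - dist(f_D(a_rec), f_D(b))).mean()
    # E_2: within-cluster contrast under dichromatic simulation
    E2 = np.abs(dist(a, a) - dist(f_D(a_rec), f_D(a_rec))).mean()
    # E_3: naturalness -- displacement of each recolored key color
    E3 = np.linalg.norm(a - a_rec, axis=1).mean()
    return (E1 + E2) + lam * E3
```

With the identity as `f_D` and no recoloring, all three terms vanish, which is a quick sanity check for the implementation.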
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIEL*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
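The KL divergence between two Gaussians has a closed form, so the symmetric divergence above can be computed directly; a small NumPy sketch (function names are ours):

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """KL(G0 || G1) between multivariate Gaussians, in closed form."""
    d = mu0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def sym_kl(mu0, S0, mu1, S1):
    """Symmetric divergence D_sKL(G_i, G_j) used as the contrast to preserve."""
    return kl_gauss(mu0, S0, mu1, S1) + kl_gauss(mu1, S1, mu0, S0)
```

For two unit-covariance Gaussians the symmetric divergence reduces to the squared distance between their means, which makes the quantity easy to sanity-check.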
&lt;br /&gt;
The optimization aims to preserve the contrast perceived by CVD viewers while maintaining naturalness. Weights are assigned to Gaussian components based on the perceptual importance of colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\lambda_i = \frac{\sum_{j=1}^N \alpha_j p(i|x_j, \Theta)}{\sum_{k=1}^K \sum_{j=1}^N \alpha_j p(k|x_j, \Theta)},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha_j = \|x_j - \text{Sim}(x_j)\|&amp;lt;/math&amp;gt; is the perceptual error of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature when simulated for CVD,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{Sim}(\cdot)&amp;lt;/math&amp;gt; is the simulation function for CVD perception.&lt;br /&gt;
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions. Sometimes the colors in the recolored images are not very natural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to use in real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other GAN approaches struggle to balance naturalness and contrast. This specific reference also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (SWIN) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations using a unique triple-latent structure in their method, continuous personalization was possible to adjust images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, it took around 18 days for one epoch (or one iteration over the entire dataset).&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD, personalized to different severity levels. We started by exploring existing methods and identifying opportunities for improvement. Since mathematically based approaches provide a solid foundation and are well documented, we began our experiments by testing these methods, as described in the background. We later extended our exploration to deep learning-based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
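In code, the conversion is a single matrix multiply per pixel; a sketch using the matrix above:

```python
import numpy as np

# Stockman-Sharpe-based linear RGB -> LMS matrix quoted above
T_RGB_TO_LMS = np.array([
    [0.3904725,  0.54990437, 0.00890159],
    [0.07092586, 0.96310739, 0.00135809],
    [0.02314268, 0.12801221, 0.93605194],
])

def rgb_to_lms(img):
    """Convert an (H, W, 3) linear RGB image in [0, 1] to LMS cone responses."""
    # (H, W, 3) @ (3, 3)^T applies the matrix to every pixel at once
    return img @ T_RGB_TO_LMS.T
```

A white pixel maps to the row sums of the matrix, which gives a quick check that the multiply is oriented correctly.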
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
&lt;br /&gt;
The error between the original and simulated LMS values is then mapped into the RGB color space, where a deficiency-specific correction matrix redistributes it into channels that remain visible to the viewer, enhancing contrast and recovering lost color differences. The corrected error is added back to the original image, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
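Putting the pieces together, a compact NumPy sketch of this baseline for protanopia (one plausible ordering of the steps; the exact pipeline may differ in detail):

```python
import numpy as np

# RGB -> LMS matrix and protanopia correction matrix quoted above
T = np.array([[0.3904725,  0.54990437, 0.00890159],
              [0.07092586, 0.96310739, 0.00135809],
              [0.02314268, 0.12801221, 0.93605194]])
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize_protan(img):
    """Daltonization sketch for protanopia on an (H, W, 3) RGB image in [0, 1]."""
    lms = img @ T.T
    sim = lms.copy()
    # Replace the missing L response with the weighted M/S combination
    sim[..., 0] = 0.90822864 * lms[..., 1] + 0.008192 * lms[..., 2]
    # Error invisible to the protanope, expressed back in RGB
    err_rgb = (lms - sim) @ np.linalg.inv(T).T
    # Redistribute the error into the channels the viewer can see
    return np.clip(img + err_rgb @ CORRECTION.T, 0.0, 1.0)
```

The clip keeps the corrected image inside the displayable RGB gamut.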
&lt;br /&gt;
==== Method 2: Optimizing Objective Function ====&lt;br /&gt;
To improve the results from the Daltonization method, we designed a framework inspired by methods discussed in the background, incorporating dominant color extraction, optimization-based recoloring, and edit propagation. This approach aims to find a balance between the naturalness and contrast while compensating colors that are not visible for corresponding CVD types.&lt;br /&gt;
&lt;br /&gt;
===== 1. Extraction of Dominant Colors =====&lt;br /&gt;
We begin by extracting the dominant colors from the input image using fuzzy clustering via a MiniBatch K-means algorithm. This step identifies a reduced set of representative colors that capture the primary color information in the image:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{C} = \{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; represents the number of clusters, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i&amp;lt;/math&amp;gt; represents the centroid of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th cluster.&lt;br /&gt;
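A minimal sketch of this extraction step with scikit-learn&#039;s MiniBatchKMeans (the helper name and defaults are our own):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def dominant_colors(img, n_colors=8, seed=0):
    """Cluster the pixels of an (H, W, 3) image into N dominant colors.

    Returns the (n_colors, 3) cluster centroids and an (H, W) label map
    assigning every pixel to its nearest dominant color.
    """
    pixels = img.reshape(-1, 3).astype(np.float64)
    km = MiniBatchKMeans(n_clusters=n_colors, random_state=seed)
    labels = km.fit_predict(pixels)
    return km.cluster_centers_, labels.reshape(img.shape[:2])
```

The label map is what later lets the per-centroid edits be propagated back to individual pixels.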
&lt;br /&gt;
===== 2. Optimization-Based Recoloring =====&lt;br /&gt;
Once the dominant colors are extracted, we apply an optimization process to adjust these colors. The optimization uses the formulas described in the background and aims to balance two key objectives:&lt;br /&gt;
&lt;br /&gt;
1. Naturalness Preservation: Ensures the recolored image minimally deviates from the original.&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{nat}} = \sum_{i=1}^N \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_i^{\text{original}}) \|^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is the transformation matrix based on the severity and type of CVD, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i^{\text{original}}&amp;lt;/math&amp;gt; is the original color.&lt;br /&gt;
&lt;br /&gt;
2. Contrast Enhancement: Improves the differentiation of colors for individuals with CVD:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{cont}} = \sum_{i=1}^N \sum_{j&amp;gt;i} \left( \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_j) \|^2 - \| \mathbf{c}_i^{\text{original}} - \mathbf{c}_j^{\text{original}} \|^2 \right)^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The total objective function combines these two terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + E_{\text{cont}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; controls the trade-off between naturalness and contrast.&lt;br /&gt;
&lt;br /&gt;
Optimization is performed using the L-BFGS-B algorithm to ensure efficient convergence under bounded constraints.&lt;br /&gt;
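Under these definitions, the recoloring step can be sketched with SciPy&#039;s L-BFGS-B. Here we sum the contrast term over all ordered pairs rather than only &lt;math&gt;j &gt; i&lt;/math&gt;, which merely rescales it; the simulation matrix T and the weight beta are inputs:

```python
import numpy as np
from scipy.optimize import minimize

def recolor_dominant(colors, T, beta=0.5):
    """Optimize dominant colors under E = beta * E_nat + E_cont via L-BFGS-B.

    colors : (N, 3) dominant colors in [0, 1];  T : (3, 3) CVD simulation matrix.
    """
    orig = colors.copy()

    def energy(flat):
        c = flat.reshape(orig.shape)
        # Naturalness: penalize simulated deviation from the original colors
        e_nat = np.sum(((c - orig) @ T.T) ** 2)
        # Contrast: keep simulated pairwise distances close to the originals
        d_new = np.sum(((c[:, None, :] - c[None, :, :]) @ T.T) ** 2, axis=-1)
        d_old = np.sum((orig[:, None, :] - orig[None, :, :]) ** 2, axis=-1)
        e_cont = np.sum((d_new - d_old) ** 2)
        return beta * e_nat + e_cont

    res = minimize(energy, orig.ravel(), method="L-BFGS-B",
                   bounds=[(0.0, 1.0)] * orig.size)
    return res.x.reshape(orig.shape)
```

With T equal to the identity (no deficiency), the original colors already minimize the energy, so the optimizer should leave them essentially unchanged.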
&lt;br /&gt;
The transformation matrices for each type of CVD are the following, which are based on [12]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align:center;&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Protanopia}} = \begin{bmatrix} 0.566 &amp;amp; 0.558 &amp;amp; 0 \\ 0.433 &amp;amp; 0.442 &amp;amp; 0.242 \\ 0 &amp;amp; 0 &amp;amp; 0.758 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Deuteranopia}} = \begin{bmatrix} 0.625 &amp;amp; 0.7 &amp;amp; 0 \\ 0.375 &amp;amp; 0.3 &amp;amp; 0.3 \\ 0 &amp;amp; 0 &amp;amp; 0.7 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Tritanopia}} = \begin{bmatrix} 0.95 &amp;amp; 0 &amp;amp; 0 \\ 0.05 &amp;amp; 0.433 &amp;amp; 0 \\ 0 &amp;amp; 0.567 &amp;amp; 1 \end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== 3. Edit Propagation =====&lt;br /&gt;
After optimizing the dominant colors, we propagate these edits across the entire image to ensure smooth transitions. The propagation step uses interpolation in the CIE-Lab color space:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Delta L^* = \text{griddata}(\mathbf{c}^{\text{original}}, \mathbf{c}^{\text{recolored}} - \mathbf{c}^{\text{original}}, \mathbf{I}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in the Lab color space, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Delta L^*&amp;lt;/math&amp;gt; adjusts the luminance values. The recolored image is reconstructed by applying the interpolated changes back to the original image and converting it to RGB.&lt;br /&gt;
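A sketch of the propagation step with SciPy&#039;s griddata, interpolating each channel&#039;s shift across pixel colors and falling back to nearest-neighbor interpolation outside the convex hull of the dominant colors (the fallback is our addition):

```python
import numpy as np
from scipy.interpolate import griddata

def propagate_edits(lab_img, c_orig, c_rec):
    """Spread dominant-color edits to every pixel of an (H, W, 3) Lab image.

    c_orig : (N, 3) original dominant colors (Lab)
    c_rec  : (N, 3) optimized dominant colors (Lab)
    """
    pix = lab_img.reshape(-1, 3)
    delta = c_rec - c_orig               # per-dominant-color shift
    shift = np.empty_like(pix)
    for ch in range(3):
        # Linear interpolation inside the hull, nearest-neighbor outside it
        lin = griddata(c_orig, delta[:, ch], pix, method="linear")
        near = griddata(c_orig, delta[:, ch], pix, method="nearest")
        shift[:, ch] = np.where(np.isnan(lin), near, lin)
    return (pix + shift).reshape(lab_img.shape)
```

If the optimized colors equal the originals, every interpolated shift is zero and the image passes through unchanged, which is a convenient correctness check.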
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in further sections, but an overview is shown in Figure 1. All of the code was written in Python using the [https://pytorch.org PyTorch] deep learning framework.&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and integrate the user label naturally, they require a pre-existing ground-truth recolored image as the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information while training. They are generally better at generating more natural images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from the open-source RGB image dataset of [2]: to improve their model&#039;s ability to enhance contrast between CVD-indistinguishable color pairs, the authors assembled 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 images and recolored them under randomly drawn labels for CVD type and severity. The ground-truth recoloring was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org Scipy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
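For illustration, the label encoding can be written as a tiny helper (the function name is ours):

```python
import numpy as np

def cvd_label(kind, severity):
    """Encode a user's condition as severity * [protan, deutan].

    kind     : "protan" or "deutan"
    severity : float in [0.1, 1.0]
    """
    base = {"protan": np.array([1.0, 0.0]),
            "deutan": np.array([0.0, 1.0])}
    return severity * base[kind]
```

For example, `cvd_label("protan", 0.6)` yields `[0.6, 0.0]`, matching the protanopia-at-60%-severity example above.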
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately, using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to three parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. Their outputs are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
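A minimal PyTorch sketch of this architecture (the layer sizes and the small working resolution are our assumptions for illustration; the trained model used 256x256 inputs):

```python
import torch
import torch.nn as nn

class ConditionalParallelRGBMLP(nn.Module):
    """Sketch: one small MLP per output channel, conditioned on the CVD label.

    The (2,) label is broadcast to two extra channels and concatenated with
    the RGB input, so each per-channel MLP sees image + condition.
    """
    def __init__(self, size=32, hidden=256):
        super().__init__()
        in_dim = size * size * 5              # 3 RGB + 2 label channels
        out_dim = size * size                 # one channel of the output
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, out_dim), nn.Sigmoid())
            for _ in range(3)])

    def forward(self, img, label):
        b, _, h, w = img.shape
        # Broadcast the 2-d label into two constant feature maps
        cond = label.view(b, 2, 1, 1).expand(b, 2, h, w)
        x = torch.cat([img, cond], dim=1).flatten(1)
        # Each head predicts one channel; concatenate into an RGB image
        chans = [head(x).view(b, 1, h, w) for head in self.heads]
        return torch.cat(chans, dim=1)
```

Training minimizes `nn.MSELoss()` between the model output and the ground-truth recolored image.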
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
Using the same inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and adapt robustly to new tasks. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: Recolored (model output) and ground-truth recolored images, respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: Feature map of the l-th layer of the pre-trained VGG network&lt;br /&gt;
* &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt;: Number of elements in the l-th feature map&lt;br /&gt;
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to aligning this network with the recoloring task was the choice of loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x,y: Two distinct pixels in the image.&lt;br /&gt;
* c_x and c_y: CVD-simulated colors of the original image&lt;br /&gt;
* ĉ&#039;_x and ĉ&#039;_y: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* ||ω||: Size of the global (or large) window over the image&lt;br /&gt;
* ||ω_x||: Size of the local window (neighborhood) around a pixel x&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored images respectively&lt;br /&gt;
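To make the contrast terms concrete, a rough PyTorch sketch of both (we take absolute values of CL, sample a pixel subset for the global term, and use a one-pixel shift as the local neighborhood; these are simplifications for illustration, not the exact windows of [2]):

```python
import torch

def contrast_losses(sim_orig, sim_rec, n_samples=256):
    """Sketch of the global and local contrast terms built from CL(x, y).

    sim_orig, sim_rec : (B, 3, H, W) CVD-simulated original / recolored images.
    """
    po = sim_orig.flatten(2).transpose(1, 2)   # (B, N, 3) pixel colors
    pr = sim_rec.flatten(2).transpose(1, 2)
    # Global term: |CL| averaged over a random subset of pixel pairs
    idx = torch.randperm(po.shape[1])[:n_samples]
    po_s, pr_s = po[:, idx], pr[:, idx]
    global_loss = (torch.cdist(pr_s, pr_s) - torch.cdist(po_s, po_s)).abs().mean()
    # Local term: contrast against a one-pixel horizontal shift as neighborhood
    d_rec = (sim_rec[..., :, 1:] - sim_rec[..., :, :-1]).norm(dim=1)
    d_org = (sim_orig[..., :, 1:] - sim_orig[..., :, :-1]).norm(dim=1)
    local_loss = (d_rec - d_org).abs().mean()
    return global_loss, local_loss
```

Both terms vanish when the recolored simulation matches the original simulation exactly, which is the intended fixed point of the contrast objective.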
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
!   !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | Performance&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | SSIM Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.9985 || 0.9985 || 0.9985 || 0.9985&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.9565 || 0.9986 || 0.9986 || 0.9968&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | TCC Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0004 || 0.0003 || 0.0003 || 0.0003&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.0380 || 0.0003 || 0.0002 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CD ΔE76 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0462 || 0.0462 || 0.0462 || 0.0462&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 8.4251 || 0.0458 || 0.0435 || 0.0578&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE2000 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0681 || 0.0681 || 0.0681 || 0.0681&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 6.9145 || 0.0671 || 0.0630 || 0.0838&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE94 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0461 || 0.0461 || 0.0461 || 0.0461&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 5.3878 || 0.0457 || 0.0434 || 0.0576&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | D-CIELAB ΔEab Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 1.7209 || 1.7209 || 1.7209 || 1.7209&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 1.5926 || 1.9673 || 1.4363 || 2.4009&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the three neural network architectures described above: the Conditional Parallel RGB MLP, the Deep U-Net, and the Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), Total Color Contrast (TCC), Chromatic Difference (CD), and inference time, as provided in [1] and [2], were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised method, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored image shows relative to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The output pixels appeared discretized, suggesting that this disentangled per-pixel design was not helpful here (especially for naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than those of the originals, indicating that the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases it hurt the contrast in the CVD-simulated colors; in others the improvement in contrast was only marginal.&lt;br /&gt;
* Blurriness was visible in the recolored images, possibly because the naturalness term dominated in practice even though the weight coefficients in the loss favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
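The naturalness-versus-contrast trade-off discussed above can be sketched as a weighted objective. The following is a minimal numpy illustration, not our actual training loss; the helper functions are hypothetical, and reading alpha as the naturalness weight and beta as the contrast weight is an assumption.&lt;br /&gt;

```python
import numpy as np

def naturalness_loss(recolored, original):
    # Mean squared color shift: keeps the recolored image close to the original.
    return np.mean((recolored - original) ** 2)

def contrast_loss(recolored_sim, original, rng, n_pairs=1000):
    # Penalizes loss of pairwise color contrast in the CVD-simulated recoloring,
    # estimated over randomly sampled pixel pairs.
    orig = original.reshape(-1, 3)
    sim = recolored_sim.reshape(-1, 3)
    idx = rng.integers(0, len(orig), size=(n_pairs, 2))
    d_orig = np.linalg.norm(orig[idx[:, 0]] - orig[idx[:, 1]], axis=1)
    d_sim = np.linalg.norm(sim[idx[:, 0]] - sim[idx[:, 1]], axis=1)
    return np.mean((d_sim - d_orig) ** 2)

def total_loss(recolored, recolored_sim, original, alpha=0.25, beta=1.0, seed=0):
    # Weighted combination: alpha scales naturalness, beta scales contrast.
    rng = np.random.default_rng(seed)
    return (alpha * naturalness_loss(recolored, original)
            + beta * contrast_loss(recolored_sim, original, rng))
```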
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to score and compare against related work using only the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; we use the same definitions as those papers. At a high level, the components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here lambda is a constant weight, not a wavelength; l, a, b are the CIELAB coordinates of the original image, and the primed values those of the recolored image.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
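As a concrete illustration of the Chromatic Difference formula above, a per-pixel computation might look like the following numpy sketch (the lambda value is illustrative, not the setting used in [1] or [2]):&lt;br /&gt;

```python
import numpy as np

def chromatic_difference(lab_recolored, lab_original, lam=0.5):
    # Per-pixel CD in CIELAB: sqrt(lam*(l'-l)^2 + (a'-a)^2 + (b'-b)^2).
    # lam = 0.5 is an illustrative weight, not necessarily the papers' setting.
    d = lab_recolored - lab_original
    return np.sqrt(lam * d[..., 0] ** 2 + d[..., 1] ** 2 + d[..., 2] ** 2)
```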
&lt;br /&gt;
The key results are in Table 2 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2] for protan/deutan whichever is larger.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but not as high as in paper [2], because that work optimizes a separate network for each CVD type.&lt;br /&gt;
* Outputs are somewhat blurry (SSIM is not weighted strongly enough in the loss).&lt;br /&gt;
* Handling multiple CVD types in a single network needs a more sophisticated conditioning approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate, making them feasible for real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from Swin architecture added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. Josa a, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
[12] Color-Blindness.com. (n.d.). COBLIS - Color Blindness Simulator. Retrieved December 13, 2024, from https://www.color-blindness.com/coblis-color-blindness-simulator/&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 220px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400 px|Wikipedia encyclopedia]][[File:eb_2.png|400 px]] &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 800px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|300 px|center|thumb|Losses - Conditional Autoencoder]][[File:loss_unet.png|300 px|thumb|center|Losses - Conditional U-Net]][[File:loss_mlp.png|300 px|center|thumb|Losses - Conditional MLP]]&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60731</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60731"/>
		<updated>2024-12-13T09:52:09Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* Optimization-based method */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected RGB error is then added back to the original RGB values to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
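A minimal sketch of this daltonization pipeline is shown below. The RGB-to-LMS and deuteranopia simulation matrices follow values commonly used in daltonize-style implementations and should be treated as approximate:&lt;br /&gt;

```python
import numpy as np

# Illustrative matrices (values follow common daltonize implementations; approximate).
RGB_TO_LMS = np.array([[17.8824, 43.5161, 4.11935],
                       [3.45565, 27.1554, 3.86714],
                       [0.0299566, 0.184309, 1.46709]])
LMS_TO_RGB = np.linalg.inv(RGB_TO_LMS)
# Deuteranopia: the M response is reconstructed from L and S.
SIM_DEUTAN = np.array([[1.0, 0.0, 0.0],
                       [0.494207, 0.0, 1.24827],
                       [0.0, 0.0, 1.0]])
# Correction matrix from the text: rotates the invisible error into visible channels.
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize(rgb):
    """Shift information lost to a deuteranope into channels they can see."""
    flat = rgb.reshape(-1, 3).T                  # 3 x N column vectors
    lms = RGB_TO_LMS @ flat
    lms_sim = SIM_DEUTAN @ lms                   # what the dichromat perceives
    error_rgb = LMS_TO_RGB @ (lms - lms_sim)     # lost information, back in RGB
    corrected = flat + CORRECTION @ error_rgb    # add rotated error to original RGB
    return np.clip(corrected.T.reshape(rgb.shape), 0.0, 1.0)
```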
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
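A direct numpy transcription of this objective over a small set of key colors might look like the following (a sketch of the formula, not the authors' implementation):&lt;br /&gt;

```python
import numpy as np

def recolor_energy(colors_rec, colors_orig, beta=1.0):
    # E = beta * E_nat + E_cont over N key colors (N x 3 arrays).
    # Naturalness: squared distance between recolored and original colors.
    e_nat = np.sum(np.linalg.norm(colors_rec - colors_orig, axis=1) ** 2)
    # Contrast: deviation of all pairwise color differences (i == j terms are zero).
    d_rec = colors_rec[:, None, :] - colors_rec[None, :, :]
    d_orig = colors_orig[:, None, :] - colors_orig[None, :, :]
    e_cont = np.sum(np.linalg.norm(d_rec - d_orig, axis=2) ** 2)
    return beta * e_nat + e_cont
```

Note that a uniform shift of all colors changes only the naturalness term, since pairwise differences are preserved.&lt;br /&gt;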
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
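The degree-adaptable objective can be sketched in the same way, now with the CVD simulation matrix T and per-pixel weights applied; T and the alpha weights are supplied by the caller, and this is an illustration rather than the paper's code:&lt;br /&gt;

```python
import numpy as np

def degree_adaptable_energy(colors_rec, colors_orig, T, alpha, beta=1.0):
    # E = beta * sum_i alpha_i ||T(c_i+ - c_i)||^2
    #     + sum_{i != j} ||T(c_i+ - c_j+) - T(c_i - c_j)||^2
    diff = (colors_rec - colors_orig) @ T.T          # per-pixel perceptual error
    e_nat = np.sum(alpha * np.sum(diff ** 2, axis=1))
    d_rec = (colors_rec[:, None, :] - colors_rec[None, :, :]) @ T.T
    d_orig = (colors_orig[:, None, :] - colors_orig[None, :, :]) @ T.T
    e_cont = np.sum((d_rec - d_orig) ** 2)           # i == j terms vanish
    return beta * e_nat + e_cont
```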
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
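The perpendicular-distance formula above is a standard point-to-line distance and can be computed directly, as in this small sketch:&lt;br /&gt;

```python
import math

def confusion_line_distance(v, copunctal, ref):
    # Perpendicular distance from chromaticity v = (x_v, y_v) to the confusion
    # line through the copunctal point (x_cp, y_cp) and reference point (x_0, y_0).
    xv, yv = v
    xcp, ycp = copunctal
    x0, y0 = ref
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    den = math.hypot(xcp - x0, ycp - y0)
    return num / den
```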
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. High-ranking colors, determined by their prominence in image clusters, are shifted to the nearest unoccupied confusion lines. This reallocation ensures that these colors are distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
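Putting the three terms together, a compact sketch of the regularized objective follows. The dichromat simulation f_D is passed in as a function, and all three terms are normalized by simple means over the relevant pairs, which simplifies the paper's exact normalization:&lt;br /&gt;

```python
import numpy as np

def total_energy(a, a_rec, b, f_d, lam=0.5):
    # E = (E1 + E2) + lam * E3 over key-color chromaticities (n x 2 arrays).
    # a: cluster A originals, a_rec: cluster A recolored, b: cluster B originals.
    fa_rec, fb = f_d(a_rec), f_d(b)
    # E1: cross-cluster contrast preserved for dichromats vs. trichromats.
    e1 = np.mean(np.abs(
        np.linalg.norm(a[:, None] - b[None, :], axis=2)
        - np.linalg.norm(fa_rec[:, None] - fb[None, :], axis=2)))
    # E2: within-cluster contrast preserved under dichromat simulation.
    e2 = np.mean(np.abs(
        np.linalg.norm(a[:, None] - a[None, :], axis=2)
        - np.linalg.norm(fa_rec[:, None] - fa_rec[None, :], axis=2)))
    # E3: naturalness, i.e. how far recolored chromaticities drift from originals.
    e3 = np.mean(np.linalg.norm(a - a_rec, axis=1))
    return (e1 + e2) + lam * e3
```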
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIEL*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
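For reference, the KL divergence between two multivariate Gaussians has a closed form, so the symmetric version is easy to compute directly. A minimal numpy sketch (ours, not the authors' code):&lt;br /&gt;

```python
import numpy as np

def kl_gaussian(mu_i, cov_i, mu_j, cov_j):
    """Closed-form D_KL(G_i || G_j) for two multivariate Gaussians."""
    d = mu_i.shape[0]
    cov_j_inv = np.linalg.inv(cov_j)
    diff = mu_j - mu_i
    return 0.5 * (np.trace(cov_j_inv @ cov_i)
                  + diff @ cov_j_inv @ diff
                  - d
                  + np.log(np.linalg.det(cov_j) / np.linalg.det(cov_i)))

def symmetric_kl(mu_i, cov_i, mu_j, cov_j):
    """D_sKL(G_i, G_j) = D_KL(G_i || G_j) + D_KL(G_j || G_i)."""
    return (kl_gaussian(mu_i, cov_i, mu_j, cov_j)
            + kl_gaussian(mu_j, cov_j, mu_i, cov_i))
```

The symmetric form is zero only when the two Gaussians coincide and grows as the clusters separate, which is why it serves as the contrast measure between color clusters.&lt;br /&gt;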
&lt;br /&gt;
The optimization aims to preserve the contrast perceived by CVD viewers while maintaining naturalness. Weights are assigned to Gaussian components based on the perceptual importance of colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\lambda_i = \frac{\sum_{j=1}^N \alpha_j p(i|x_j, \Theta)}{\sum_{k=1}^K \sum_{j=1}^N \alpha_j p(k|x_j, \Theta)},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha_j = \|x_j - \text{Sim}(x_j)\|&amp;lt;/math&amp;gt; is the perceptual error of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature when simulated for CVD,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{Sim}(\cdot)&amp;lt;/math&amp;gt; is the simulation function for CVD perception.&lt;br /&gt;
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
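The interpolation above amounts to a posterior-weighted shift of each pixel's hue. A small numpy sketch (the posteriors and hue values here are made up for illustration):&lt;br /&gt;

```python
import numpy as np

def recolor_hue(x_h, posteriors, mu_h, mapped_mu_h):
    """T(x_j)_H = x_j^H + sum_i p(i | x_j, Theta) * (M_i(mu_i)_H - mu_i^H).

    x_h:         hue of pixel j
    posteriors:  p(i | x_j, Theta) for each of the K Gaussians (sums to 1)
    mu_h:        original hue of each Gaussian mean
    mapped_mu_h: hue of each mean after optimization
    """
    return x_h + np.dot(posteriors, mapped_mu_h - mu_h)
```

A pixel assigned almost entirely to one cluster inherits essentially that cluster's hue shift, while pixels between clusters blend the shifts smoothly.&lt;br /&gt;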
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions and can leave the recolored colors looking unnatural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to apply in real-time settings.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This specific approach also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations with a triple-latent structure, the method enables continuous personalization, adjusting images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications: in our experiment, one epoch (one pass over the entire dataset) took around 18 days.&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD, with personalization across different severity levels. We started by exploring existing methods and identifying opportunities for improvement. Since mathematical approaches provide a solid foundation and are well documented, we began our experiments by testing those methods, as described in the background, and later extended our exploration to deep learning-based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
&lt;br /&gt;
The error between the original and simulated LMS values is then mapped into the RGB color space using a deficiency-specific correction matrix, which adjusts the image to enhance contrast and recover lost color differences. The predefined correction matrix is applied to the error in RGB space, which is then transformed back into LMS space for final adjustments. The corrected values are added back to the original, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
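Putting Method 1 together, here is a minimal numpy sketch for deuteranopia, using the matrices given above (a simplified reading of the pipeline; the exact ordering of the LMS/RGB round trips in our script may differ):&lt;br /&gt;

```python
import numpy as np

# RGB -> LMS matrix from the text (Stockman & Sharpe based)
RGB2LMS = np.array([[0.3904725,  0.54990437, 0.00890159],
                    [0.07092586, 0.96310739, 0.00135809],
                    [0.02314268, 0.12801221, 0.93605194]])
LMS2RGB = np.linalg.inv(RGB2LMS)

# Daltonize-inspired correction matrix from the text
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def simulate_deutan(lms):
    """Replace the M response with the L/S combination given in the text."""
    sim = lms.copy()
    sim[..., 1] = 1.10104433 * lms[..., 0] - 0.00901975 * lms[..., 2]
    return sim

def daltonize(rgb):
    """rgb: H x W x 3 floats in [0, 1]; returns the recolored image."""
    lms = rgb @ RGB2LMS.T
    err_rgb = (lms - simulate_deutan(lms)) @ LMS2RGB.T  # lost information, in RGB
    corrected = rgb + err_rgb @ CORRECTION.T            # spread error into visible channels
    return np.clip(corrected, 0.0, 1.0)
```

Colors that survive the simulation unchanged (roughly the achromatic axis) produce near-zero error and are left mostly alone, while confusable reds and greens are shifted toward blue and luminance differences.&lt;br /&gt;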
&lt;br /&gt;
==== Method 2: Optimizing Objective Function ====&lt;br /&gt;
To improve the results from the Daltonization method, we designed a framework inspired by methods discussed in the background, incorporating dominant color extraction, optimization-based recoloring, and edit propagation. This approach aims to find a balance between the naturalness and contrast while compensating colors that are not visible for corresponding CVD types.&lt;br /&gt;
&lt;br /&gt;
===== 1. Extraction of Dominant Colors =====&lt;br /&gt;
We begin by extracting the dominant colors from the input image using fuzzy clustering via a MiniBatch K-means algorithm. This step identifies a reduced set of representative colors that capture the primary color information in the image:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{C} = \{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; represents the number of clusters, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i&amp;lt;/math&amp;gt; represents the centroid of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th cluster.&lt;br /&gt;
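The dominant-color extraction step can be sketched with scikit-learn's MiniBatchKMeans (an illustration; the cluster count here is an assumption):&lt;br /&gt;

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def dominant_colors(image, n_colors=8, seed=0):
    """image: H x W x 3 array in [0, 1]; returns the n_colors centroids C."""
    pixels = image.reshape(-1, 3)
    km = MiniBatchKMeans(n_clusters=n_colors, random_state=seed)
    km.fit(pixels)
    return km.cluster_centers_   # the dominant-color set C = {c_1, ..., c_N}
```

Working on a handful of centroids instead of every pixel is what makes the subsequent optimization tractable.&lt;br /&gt;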
&lt;br /&gt;
===== 2. Optimization-Based Recoloring =====&lt;br /&gt;
Once the dominant colors are extracted, we apply an optimization process to adjust these colors. The optimization uses the formulas introduced in the background section and aims to balance two key objectives:&lt;br /&gt;
&lt;br /&gt;
1. Naturalness Preservation: Ensures the recolored image minimally deviates from the original.&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{nat}} = \sum_{i=1}^N \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_i^{\text{original}}) \|^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is the transformation matrix based on the severity and type of CVD, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i^{\text{original}}&amp;lt;/math&amp;gt; is the original color.&lt;br /&gt;
&lt;br /&gt;
2. Contrast Enhancement: Improves the differentiation of colors for individuals with CVD:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{cont}} = \sum_{i=1}^N \sum_{j&amp;gt;i} \left( \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_j) \|^2 - \| \mathbf{c}_i^{\text{original}} - \mathbf{c}_j^{\text{original}} \|^2 \right)^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The total objective function combines these two terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + E_{\text{cont}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; controls the trade-off between naturalness and contrast.&lt;br /&gt;
&lt;br /&gt;
Optimization is performed using the L-BFGS-B algorithm to ensure efficient convergence under bounded constraints.&lt;br /&gt;
&lt;br /&gt;
The transformation matrices for each type of CVD are the following, which are based on [13]:&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Protanopia}} = \begin{bmatrix} 0.566 &amp;amp; 0.558 &amp;amp; 0 \\ 0.433 &amp;amp; 0.442 &amp;amp; 0.242 \\ 0 &amp;amp; 0 &amp;amp; 0.758 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Deuteranopia}} = \begin{bmatrix} 0.625 &amp;amp; 0.7 &amp;amp; 0 \\ 0.375 &amp;amp; 0.3 &amp;amp; 0.3 \\ 0 &amp;amp; 0 &amp;amp; 0.7 \end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;&lt;br /&gt;
T_{\text{Tritanopia}} = \begin{bmatrix} 0.95 &amp;amp; 0 &amp;amp; 0 \\ 0.05 &amp;amp; 0.433 &amp;amp; 0 \\ 0 &amp;amp; 0.567 &amp;amp; 1 \end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
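The objective and its bounded minimization can be sketched as follows, using the protanopia matrix given above (illustrative only; the value of beta and the example cluster colors are made up):&lt;br /&gt;

```python
import numpy as np
from scipy.optimize import minimize

# Protanopia transformation matrix from the text
T = np.array([[0.566, 0.558, 0.0],
              [0.433, 0.442, 0.242],
              [0.0,   0.0,   0.758]])

def energy(flat, c_orig, beta=0.5):
    c = flat.reshape(c_orig.shape)
    # E_nat: keep each recolored cluster near its original, as seen through T
    e_nat = np.sum(((c - c_orig) @ T.T) ** 2)
    # E_cont: pairwise distances under T should match the original pairwise distances
    e_con = 0.0
    n = len(c)
    for i in range(n):
        for j in range(i + 1, n):
            d_cvd = np.sum(((c[i] - c[j]) @ T.T) ** 2)
            d_orig = np.sum((c_orig[i] - c_orig[j]) ** 2)
            e_con += (d_cvd - d_orig) ** 2
    return beta * e_nat + e_con

c0 = np.array([[0.9, 0.1, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.9]])
res = minimize(energy, c0.ravel(), args=(c0,), method="L-BFGS-B",
               bounds=[(0.0, 1.0)] * c0.size)
c_recolored = res.x.reshape(c0.shape)
```

The bounds keep the optimized colors inside the valid RGB cube, which is why the bounded L-BFGS-B variant is used rather than plain gradient descent.&lt;br /&gt;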
&lt;br /&gt;
===== 3. Edit Propagation =====&lt;br /&gt;
After optimizing the dominant colors, we propagate these edits across the entire image to ensure smooth transitions. The propagation step uses interpolation in the CIE-Lab color space:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Delta L^* = \text{griddata}(\mathbf{c}^{\text{original}}, \mathbf{c}^{\text{recolored}} - \mathbf{c}^{\text{original}}, \mathbf{I}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in the Lab color space, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Delta L^*&amp;lt;/math&amp;gt; adjusts the luminance values. The recolored image is reconstructed by applying the interpolated changes back to the original image and converting it to RGB.&lt;br /&gt;
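The propagation step can be sketched with scipy's griddata (nearest-neighbor interpolation is used here for robustness with scattered 3-D points; our actual script may use a different interpolation method):&lt;br /&gt;

```python
import numpy as np
from scipy.interpolate import griddata

def propagate_edits(pixels_lab, c_orig, c_recolored):
    """Spread the per-cluster Lab shifts to every pixel of the image.

    pixels_lab:  (P, 3) Lab coordinates of the image pixels
    c_orig:      (N, 3) original dominant colors (Lab)
    c_recolored: (N, 3) optimized dominant colors (Lab)
    """
    delta = c_recolored - c_orig
    # Interpolate each Lab channel's shift from the dominant colors to all pixels
    shift = np.stack([griddata(c_orig, delta[:, k], pixels_lab, method="nearest")
                      for k in range(3)], axis=1)
    return pixels_lab + shift
```

Each pixel inherits the shift of the dominant color(s) it is closest to in Lab space, which keeps transitions between recolored regions smooth.&lt;br /&gt;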
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in later sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and integrate the user label directly, they require a pre-existing ground truth of the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user-label information during training. They are generally better at generating natural images, but they require more compute and sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from the open-source RGB image dataset of [2]: to improve their model&#039;s ability to enhance contrast between CVD-indistinguishable color pairs, the authors created a dataset of 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground truth recolored images for the supervised methods), we randomly sampled 15,000 images and recolored them under randomly sampled labels for CVD severity and type. The ground truth recoloring was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org Scipy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
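The label encoding described above can be illustrated with a small helper (the function name is ours):&lt;br /&gt;

```python
import numpy as np

def cvd_label(cvd_type, severity):
    """Encode as severity * [protan, deutan]; e.g. 60% protanopia -> [0.6, 0.0]."""
    one_hot = {"protan": np.array([1.0, 0.0]),
               "deutan": np.array([0.0, 1.0])}[cvd_type]
    return severity * one_hot
```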
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to 3 parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. The outputs of the three networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error (MSE) loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
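The per-channel idea can be sketched in numpy (the real model was implemented in PyTorch; the weights here are random stand-ins, so the output is not a meaningful recoloring):&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 8
D_IN = 3 + 2   # RGB pixel concatenated with the 2-dim CVD label
HID = 16

# One small MLP per output channel (random stand-in weights)
mlps = [(rng.normal(size=(D_IN, HID)), rng.normal(size=(HID, 1))) for _ in range(3)]

def forward(image, label):
    """image: (H, W, 3); label: (2,). Three parallel MLPs, one per channel."""
    x = np.concatenate([image, np.broadcast_to(label, (H, W, 2))], axis=-1)
    x = x.reshape(-1, D_IN)
    chans = []
    for W1, W2 in mlps:
        h = np.maximum(x @ W1, 0.0)                 # ReLU hidden layer
        chans.append(1.0 / (1.0 + np.exp(-(h @ W2))))  # sigmoid -> [0, 1] channel
    return np.concatenate(chans, axis=1).reshape(H, W, 3)

def mse_loss(pred, target):
    """Pixel-wise mean-squared error from the text."""
    return np.mean((pred - target) ** 2)
```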
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
With the same inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and the decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision and transfer robustly to new tasks. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: recolored (model output) and ground truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;l&amp;lt;/math&amp;gt;-th layer activation of the pre-trained VGG network, and &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; is the number of elements in that layer&lt;br /&gt;
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the loss functions. The loss functions we used to train this network were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math&amp;gt;x, y&amp;lt;/math&amp;gt;: two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: size of the global (or large) window over the image&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: size of the local window or neighborhood around a pixel &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives output image to have colors that are visually similar and close to natural distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored (model output) images respectively&lt;br /&gt;
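The naturalness and contrast terms can be sketched as follows (a simplified single-window SSIM and a pairwise CL helper; the stability constants are assumptions):&lt;br /&gt;

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM over a whole image (simplified: no sliding window)."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx**2 + my**2 + c1) * (x.var() + y.var() + c2)
    return num / den

def naturalness_loss(orig, recolored):
    """L_naturalness = 1 - SSIM(I', I): penalize structural drift from the original."""
    return 1.0 - ssim_global(orig, recolored)

def cl(cx_sim, cy_sim, cx_hat_sim, cy_hat_sim):
    """CL(x, y): contrast gained between two pixels after recoloring,
    both pairs measured in the CVD-simulated space."""
    return (np.linalg.norm(cx_hat_sim - cy_hat_sim)
            - np.linalg.norm(cx_sim - cy_sim))
```

Averaging CL over image-wide pixel pairs gives the global contrast term, and averaging it over each pixel's neighborhood gives the local one.&lt;br /&gt;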
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
!   !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | Performance&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | SSIM Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.9985 || 0.9985 || 0.9985 || 0.9985&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.9565 || 0.9986 || 0.9986 || 0.9968&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | TCC Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0004 || 0.0003 || 0.0003 || 0.0003&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.0380 || 0.0003 || 0.0002 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CD ΔE76 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0462 || 0.0462 || 0.0462 || 0.0462&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 8.4251 || 0.0458 || 0.0435 || 0.0578&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE2000 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0681 || 0.0681 || 0.0681 || 0.0681&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 6.9145 || 0.0671 || 0.0630 || 0.0838&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE94 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0461 || 0.0461 || 0.0461 || 0.0461&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 5.3878 || 0.0457 || 0.0434 || 0.0576&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | D-CIELAB ΔEab Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 1.7209 || 1.7209 || 1.7209 || 1.7209&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 1.5926 || 1.9673 || 1.4363 || 2.4009&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: Conditional Parallel RGB MLP, Conditional U-Net, and Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), total color contrast (TCC), Chromatic Difference (CD), and inference time were used to assess the effectiveness of the models, following [1] and [2].&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground truth recolored image; for the unsupervised method, it means how much contrast and naturalness is observed in the CVD-simulated recolored images compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored versus original images were similar or worse, meaning the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images (possibly because naturalness was prioritized, even though the weight coefficients in the loss term favored contrast: alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to compute and compare evaluation metrics against related work only for the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]. Please refer to the definitions in the paper, as we have used the same. On a high level, the three components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(&amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a constant weight, not a wavelength; &amp;lt;math&amp;gt;l, a, b&amp;lt;/math&amp;gt; are the CIELAB coordinates of the original image, and the primed values are those of the recolored image.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
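To make these metrics concrete, here is a minimal NumPy sketch of a single-window SSIM and the mean Chromatic Difference. The constants &amp;lt;math&amp;gt;c_1, c_2&amp;lt;/math&amp;gt; and the weight &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; below are illustrative; our actual evaluation follows the definitions in [1] and [2].&lt;br /&gt;

```python
import numpy as np

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    # Single-window SSIM over grayscale images in [0, 1].
    # c1, c2 are small stabilizing constants (illustrative values).
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))

def chromatic_difference(lab_orig, lab_rec, lam=0.3):
    # Mean CD over an (H, W, 3) CIELAB image pair:
    # sqrt(lam*(L'-L)^2 + (a'-a)^2 + (b'-b)^2), averaged over pixels.
    # lam is an assumed weight; the papers fix their own constant.
    d = lab_rec - lab_orig
    return np.sqrt(lam * d[..., 0]**2 + d[..., 1]**2 + d[..., 2]**2).mean()
```

In practice SSIM is computed over local sliding windows and averaged; the single-window version above captures the same formula.&lt;br /&gt;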
&lt;br /&gt;
The key results are shown in Table 2, and the takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates the corresponding result from paper [2], taking the larger of the protan/deutan values.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but fall short of paper [2], which optimizes a separate network for each CVD type.&lt;br /&gt;
* Outputs remain blurry (the structural/SSIM objective is not weighted strongly enough).&lt;br /&gt;
* Mixing CVD types in a single network needs a more sophisticated conditioning scheme.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate (about 2.6 seconds per image), which is workable for offline use but still short of real-time. Optimizing these models further could make them scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from the Swin architecture added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. JOSA A, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
[12] Color-Blindness.com. (n.d.). COBLIS - Color Blindness Simulator. Retrieved December 13, 2024, from https://www.color-blindness.com/coblis-color-blindness-simulator/&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align:center; width: 220px;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400 px|Wikipedia encyclopedia]][[File:eb_2.png|400 px]] &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;text-align:center; width: 800px;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|300 px|center|thumb|Losses - Conditional Autoencoder]][[File:loss_unet.png|300 px|thumb|center|Losses - Conditional U-Net]][[File:loss_mlp.png|300 px|center|thumb|Losses - Conditional MLP]]&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60729</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60729"/>
		<updated>2024-12-13T09:50:06Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We review existing methods in the field and experiment with two main approaches: mathematical transformations and deep learning techniques. We first survey current advancements in these two domains, then present our experiments and results. Evaluations of each method are provided, leading to a discussion of our findings and potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected RGB values are added back to the original LMS values to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
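The whole pipeline can be sketched in a few lines of NumPy. Here `simulate` stands in for any CVD simulation function (e.g. Brettel et al. [7], as implemented in [5]), and the error is computed and corrected in RGB space, as the daltonize package does; this is a sketch under those assumptions, not the exact implementation of [5].&lt;br /&gt;

```python
import numpy as np

# Correction matrix from the text: it redistributes the lost red-channel
# information into the green and blue channels.
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize(rgb, simulate):
    # rgb: (H, W, 3) floats in [0, 1]; simulate: CVD simulation function.
    err = rgb - simulate(rgb)        # information invisible to the dichromat
    shifted = err @ CORRECTION.T     # rotate the error into visible channels
    return np.clip(rgb + shifted, 0.0, 1.0)
```

With an identity simulation (no deficiency) the error is zero and the image is returned unchanged.&lt;br /&gt;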
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
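A small helper makes the distance computation concrete. The copunctal point in the usage below is a commonly cited approximate value for protanopia in CIE 1931 xy; the exact coordinates depend on the colorimetric data used.&lt;br /&gt;

```python
import numpy as np

def confusion_line_distance(v, copunctal, ref):
    # Perpendicular distance d(v, L) from chromaticity v = (x_v, y_v) to the
    # confusion line L through the copunctal point and a reference point.
    (xv, yv), (xcp, ycp), (x0, y0) = v, copunctal, ref
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    den = np.hypot(xcp - x0, ycp - y0)
    return num / den
```

Any chromaticity lying on the line itself returns a distance of zero; key colors are flagged as confusing when this distance falls below a threshold.&lt;br /&gt;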
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. High-ranking colors, determined by their prominence in image clusters, are shifted to the nearest unoccupied confusion lines. This reallocation ensures that these colors are distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIEL*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
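The symmetric KL divergence between two Gaussian components has a closed form; a minimal sketch follows (written for full covariances, though the paper assumes diagonal ones for efficiency):&lt;br /&gt;

```python
import numpy as np

def kl_gaussian(mu1, cov1, mu2, cov2):
    # Closed-form KL divergence D_KL(N1 || N2) between two
    # multivariate Gaussians with means mu and covariances cov.
    d = mu1.shape[0]
    inv2 = np.linalg.inv(cov2)
    diff = mu2 - mu1
    return 0.5 * (np.trace(inv2 @ cov1) + diff @ inv2 @ diff - d
                  + np.log(np.linalg.det(cov2) / np.linalg.det(cov1)))

def symmetric_kl(mu1, cov1, mu2, cov2):
    # D_sKL(G_i, G_j) = D_KL(G_i || G_j) + D_KL(G_j || G_i)
    return kl_gaussian(mu1, cov1, mu2, cov2) + kl_gaussian(mu2, cov2, mu1, cov1)
```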
&lt;br /&gt;
The optimization aims to preserve the contrast perceived by CVD viewers while maintaining naturalness. Weights are assigned to Gaussian components based on the perceptual importance of colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\lambda_i = \frac{\sum_{j=1}^N \alpha_j p(i|x_j, \Theta)}{\sum_{k=1}^K \sum_{j=1}^N \alpha_j p(k|x_j, \Theta)},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha_j = \|x_j - \text{Sim}(x_j)\|&amp;lt;/math&amp;gt; is the perceptual error of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature when simulated for CVD,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{Sim}(\cdot)&amp;lt;/math&amp;gt; is the simulation function for CVD perception.&lt;br /&gt;
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
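The hue interpolation in Step 4 is a responsibility-weighted shift; a minimal sketch, where `resp` holds the posterior probabilities &amp;lt;math&amp;gt;p(i|x_j, \Theta)&amp;lt;/math&amp;gt; of the K components for one pixel:&lt;br /&gt;

```python
import numpy as np

def recolor_hue(x_h, mu_h, mapped_mu_h, resp):
    # T(x_j)_H = x_j_H + sum_i p(i|x_j, Theta) * (M_i(mu_i)_H - mu_i_H)
    # x_h: original hue of one pixel; mu_h, mapped_mu_h: (K,) arrays of
    # original and relocated mean hues; resp: (K,) posterior responsibilities.
    return x_h + np.sum(resp * (mapped_mu_h - mu_h))
```

Because each pixel blends the shifts of all components by its responsibilities, neighboring colors move smoothly rather than snapping to their nearest cluster.&lt;br /&gt;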
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions; as a result, colors in the recolored images are occasionally unnatural.&lt;br /&gt;
* The high computational complexity of the optimization step makes this algorithm difficult to use in real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This specific method also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (SWIN) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations using a unique triple-latent structure in their method, continuous personalization was possible to adjust images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, it took around 18 days for one epoch (or one iteration over the entire dataset).&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD with the personalization of different severity levels. We start by exploring existing methods and identifying opportunities for improvement. Since mathematical-based approaches provide a solid foundation and are well-documented, we began our experiments by testing these methods, as described in the background. We later extended our exploration to deep learning based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
&lt;br /&gt;
The error between the original and simulated is then mapped into the RGB color space using a deficiency-specific correction matrix, which adjusts the image to enhance contrast and recover lost color differences. The predefined correction matrix is applied to the error in RGB space, transforming it back into LMS space for final adjustments. The corrected LMS values are added back to the original values, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
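Putting the pieces of Method 1 together, the pipeline can be sketched as below for protanopia, using the matrices given above. This is a simplified sketch (here the correction is applied and added back directly in RGB), not the exact implementation:&lt;br /&gt;

```python
import numpy as np

# RGB -> LMS transform based on Stockman and Sharpe's cone fundamentals
RGB2LMS = np.array([[0.3904725, 0.54990437, 0.00890159],
                    [0.07092586, 0.96310739, 0.00135809],
                    [0.02314268, 0.12801221, 0.93605194]])
LMS2RGB = np.linalg.inv(RGB2LMS)

# Protanopia: the L-cone response is replaced by a weighted sum of M and S
SIM_PROTAN = np.array([[0.0, 0.90822864, 0.008192],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])

# Daltonize-inspired correction matrix from the text
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize_protan(rgb):
    """rgb: float array (..., 3) in [0, 1]; returns the recolored image."""
    lms = rgb @ RGB2LMS.T
    lms_sim = lms @ SIM_PROTAN.T          # simulate protan perception
    rgb_sim = lms_sim @ LMS2RGB.T
    error = rgb - rgb_sim                 # information lost to the viewer
    compensation = error @ CORRECTION.T   # shift the error into visible channels
    return np.clip(rgb + compensation, 0.0, 1.0)
```

For deuteranopia or tritanopia, only the simulation matrix changes, using the corresponding coefficients above.&lt;br /&gt;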
==== Optimization-based method ====&lt;br /&gt;
To improve the results from the Daltonization method, we designed a framework inspired by methods discussed in the background, incorporating dominant color extraction, optimization-based recoloring, and edit propagation. This approach aims to find a balance between the naturalness and contrast while compensating colors that are not visible for corresponding CVD types.&lt;br /&gt;
&lt;br /&gt;
===== 1. Extraction of Dominant Colors =====&lt;br /&gt;
We begin by extracting the dominant colors from the input image using fuzzy clustering via a MiniBatch K-means algorithm. This step identifies a reduced set of representative colors that capture the primary color information in the image:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{C} = \{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; represents the number of clusters, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i&amp;lt;/math&amp;gt; represents the centroid of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th cluster.&lt;br /&gt;
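A minimal sketch of this extraction step, assuming scikit-learn's MiniBatchKMeans (the cluster count and helper name are our own choices):&lt;br /&gt;

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def dominant_colors(img, n_colors=8):
    """img: (H, W, 3) array in [0, 1]; returns (n_colors, 3) centroids."""
    pixels = img.reshape(-1, 3).astype(np.float64)
    km = MiniBatchKMeans(n_clusters=n_colors, n_init=3, random_state=0)
    km.fit(pixels)
    return km.cluster_centers_  # the representative palette C
```

The returned centroids form the reduced palette that the subsequent optimization step adjusts.&lt;br /&gt;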
&lt;br /&gt;
===== 2. Optimization-Based Recoloring =====&lt;br /&gt;
Once the dominant colors are extracted, we apply an optimization process to adjust these colors. The optimization uses the formulations discussed in the background and aims to balance two key objectives:&lt;br /&gt;
&lt;br /&gt;
1. Naturalness Preservation: Ensures the recolored image minimally deviates from the original.&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{nat}} = \sum_{i=1}^N \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_i^{\text{original}}) \|^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is the transformation matrix based on the severity and type of CVD, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i^{\text{original}}&amp;lt;/math&amp;gt; is the original color.&lt;br /&gt;
&lt;br /&gt;
2. Contrast Enhancement: Improves the differentiation of colors for individuals with CVD:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{cont}} = \sum_{i=1}^N \sum_{j&amp;gt;i} \left( \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_j) \|^2 - \| \mathbf{c}_i^{\text{original}} - \mathbf{c}_j^{\text{original}} \|^2 \right)^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The total objective function combines these two terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + E_{\text{cont}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; controls the trade-off between naturalness and contrast.&lt;br /&gt;
&lt;br /&gt;
Optimization is performed using the L-BFGS-B algorithm to ensure efficient convergence under bounded constraints.&lt;br /&gt;
&lt;br /&gt;
The transformation matrices for each type of CVD are the following, which are based on [13]:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{Protanopia Transformation Matrix} = \begin{bmatrix}&lt;br /&gt;
0.566 &amp;amp; 0.558 &amp;amp; 0 \\&lt;br /&gt;
0.433 &amp;amp; 0.442 &amp;amp; 0.242 \\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 0.758&lt;br /&gt;
\end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{Deuteranopia Transformation Matrix} = \begin{bmatrix}&lt;br /&gt;
0.625 &amp;amp; 0.7 &amp;amp; 0 \\&lt;br /&gt;
0.375 &amp;amp; 0.3 &amp;amp; 0.3 \\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 0.7&lt;br /&gt;
\end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{Tritanopia Transformation Matrix} = \begin{bmatrix}&lt;br /&gt;
0.95 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt;
0.05 &amp;amp; 0.433 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 0.567 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== 3. Edit Propagation =====&lt;br /&gt;
After optimizing the dominant colors, we propagate these edits across the entire image to ensure smooth transitions. The propagation step uses interpolation in the CIE-Lab color space:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Delta L^* = \text{griddata}(\mathbf{c}^{\text{original}}, \mathbf{c}^{\text{recolored}} - \mathbf{c}^{\text{original}}, \mathbf{I}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in the Lab color space, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Delta L^*&amp;lt;/math&amp;gt; adjusts the luminance values. The recolored image is reconstructed by applying the interpolated changes back to the original image and converting it to RGB.&lt;br /&gt;
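The propagation step can be sketched for the luminance channel using SciPy's griddata. This is a simplified sketch (nearest-neighbor interpolation is used here for robustness outside the palette's convex hull; the function name is our own):&lt;br /&gt;

```python
import numpy as np
from scipy.interpolate import griddata

def propagate_luminance(pixels_lab, palette_orig, palette_new):
    """Spread the palette's L* shifts to every pixel.
    pixels_lab: (P, 3) Lab pixels; palette_orig/new: (N, 3) Lab palettes."""
    delta_l = palette_new[:, 0] - palette_orig[:, 0]
    dl = griddata(palette_orig, delta_l, pixels_lab, method="nearest")
    out = pixels_lab.astype(float).copy()
    out[:, 0] += dl  # apply the interpolated luminance adjustment
    return out
```

The a* and b* channels are handled the same way, after which the adjusted Lab image is converted back to RGB.&lt;br /&gt;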
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in further sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recoloring. While these methods are simple, easy to train, and integrate the user label, they require a pre-existing ground-truth recolored image as the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information during training. They are generally better at generating natural images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from an open-source RGB image dataset from [2]: to improve their model&#039;s ability to enhance the contrast between CVD-indistinguishable color pairs, the authors created a dataset of 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 images and recolored them by simulating random labels for CVD severity and type. The ground-truth recoloring was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org Scipy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
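A minimal sketch of the label encoding and conditioning described above (function names and the channel-tiling layout are our own illustrative choices):&lt;br /&gt;

```python
import numpy as np

def make_cvd_label(cvd_type, severity):
    """Encode the condition as severity * [protan, deutan], per the text."""
    assert cvd_type in ("protan", "deutan") and 0.1 <= severity <= 1.0
    onehot = np.array([1.0, 0.0]) if cvd_type == "protan" else np.array([0.0, 1.0])
    return severity * onehot

def condition_image(img, label):
    """Tile the 2-vector label into two extra channels of an (H, W, 3) image."""
    h, w, _ = img.shape
    planes = np.broadcast_to(label, (h, w, 2))
    return np.concatenate([img, planes], axis=-1)  # (H, W, 5)
```

For example, `make_cvd_label("protan", 0.6)` yields the label `[0.6, 0]` from the text, i.e., protanopia at 60% severity.&lt;br /&gt;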
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and is passed to 3 parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. The outputs of the three networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
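A minimal PyTorch sketch of the per-channel design (the layer sizes and the pixel-flattened input layout are illustrative assumptions, not the exact architecture):&lt;br /&gt;

```python
import torch
import torch.nn as nn

class ParallelRGBMLP(nn.Module):
    """One small MLP per output channel; each head sees its channel's
    pixels concatenated with the 2-dim CVD label."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim + 2, hidden), nn.ReLU(),
                          nn.Linear(hidden, in_dim))
            for _ in range(3))

    def forward(self, img, label):
        # img: (B, 3, N) flattened pixels per channel, label: (B, 2)
        outs = []
        for c, head in enumerate(self.heads):
            x = torch.cat([img[:, c], label], dim=-1)
            outs.append(head(x))
        return torch.stack(outs, dim=1)  # (B, 3, N) recolored channels
```

Because each head only ever sees one channel, spatial and cross-channel correlations are not modeled, which is consistent with the failure modes reported in the results.&lt;br /&gt;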
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
Using the same inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: the recolored (model output) and ground truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt; is the feature map of the l-th layer of the pre-trained VGG network, and &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; is its number of elements&lt;br /&gt;
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to aligning this network with the recoloring task was the loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y\rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x, y&amp;lt;/math&amp;gt;: two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: size of the global (large) window over the image&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: size of the local window (neighborhood) around pixel &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored images respectively&lt;br /&gt;
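The contrast terms and the overall weighting can be sketched as below. This is an illustrative sketch with our own function names; note that in training one would typically minimize the negative of CL, so that larger contrast lowers the loss:&lt;br /&gt;

```python
import numpy as np

def contrast_term(sim_recolored, sim_original, x, y):
    """CL(x, y): change in simulated color distance between pixels x and y."""
    d_new = np.linalg.norm(sim_recolored[x] - sim_recolored[y])
    d_old = np.linalg.norm(sim_original[x] - sim_original[y])
    return d_new - d_old

def global_contrast_loss(sim_recolored, sim_original, pairs):
    """Average CL over a set of sampled pixel pairs (the window omega)."""
    vals = [contrast_term(sim_recolored, sim_original, x, y) for x, y in pairs]
    return sum(vals) / len(vals)

def total_loss(l_natural, l_global, l_local, alpha=0.25, beta=1.0):
    """Combine the terms as in the equations above."""
    l_contrast = beta * l_global + (2 - beta) * l_local
    return alpha * l_natural + 2 * (1 - alpha) * l_contrast
```

With the coefficients we used (alpha = 0.25, beta = 1.0), the combined contrast term receives 1.5x the weight of the naturalness term.&lt;br /&gt;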
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
!   !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | Performance&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | SSIM Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.9985 || 0.9985 || 0.9985 || 0.9985&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.9565 || 0.9986 || 0.9986 || 0.9968&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | TCC Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0004 || 0.0003 || 0.0003 || 0.0003&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.0380 || 0.0003 || 0.0002 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CD ΔE76 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0462 || 0.0462 || 0.0462 || 0.0462&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 8.4251 || 0.0458 || 0.0435 || 0.0578&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE2000 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0681 || 0.0681 || 0.0681 || 0.0681&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 6.9145 || 0.0671 || 0.0630 || 0.0838&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE94 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0461 || 0.0461 || 0.0461 || 0.0461&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 5.3878 || 0.0457 || 0.0434 || 0.0576&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | D-CIELAB ΔEab Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 1.7209 || 1.7209 || 1.7209 || 1.7209&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 1.5926 || 1.9673 || 1.4363 || 2.4009&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), Total Color Contrast (TCC), and Chromatic Difference (CD), along with inference time, as defined in [1] and [2], were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the outputs resemble the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness is observed in the CVD-simulated recolored images compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than those of the originals, meaning the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images (possibly because the naturalness term dominated in practice, even though the weight coefficients in the loss favored contrast: alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the qualitative results above, we decided to score and evaluate metrics for comparison with related work using only the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; please refer to the definitions in those papers, as we have used the same. On a high level, the components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a constant, not a wavelength; l, a, b are the CIE-Lab coordinates, with primed values from the recolored image and unprimed values from the original.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
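The Chromatic Difference formula above is straightforward to compute per pixel; a minimal sketch (the lambda value here is an arbitrary placeholder, not the constant used in [1] or [2]):&lt;br /&gt;

```python
import numpy as np

def chromatic_difference(lab_orig, lab_recolored, lam=0.3):
    """Per-pixel CD in CIE-Lab; lam down-weights the lightness term."""
    dl = lab_recolored[..., 0] - lab_orig[..., 0]
    da = lab_recolored[..., 1] - lab_orig[..., 1]
    db = lab_recolored[..., 2] - lab_orig[..., 2]
    return np.sqrt(lam * dl**2 + da**2 + db**2)
```

Averaging this quantity over all pixels gives the image-level CD score reported in Table 2.&lt;br /&gt;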
&lt;br /&gt;
The key results are in Table 2 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2] for protan/deutan whichever is larger.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but not as high as in paper [2], because they optimize separate networks for each CVD type.&lt;br /&gt;
* Outputs are blurry (SSIM is not weighted heavily enough in the loss).&lt;br /&gt;
* Mixing CVD types in the same network needs a more sophisticated approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned a couple of things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate, making them feasible for real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from the Swin architecture added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. JOSA A, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
[12] Color-Blindness.com. (n.d.). COBLIS - Color Blindness Simulator. Retrieved December 13, 2024, from https://www.color-blindness.com/coblis-color-blindness-simulator/&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 220px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400 px|Wikipedia encyclopedia]][[File:eb_2.png|400 px]] &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 800px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|300 px|center|thumb|Losses - Conditional Autoencoder]][[File:loss_unet.png|300 px|thumb|center|Losses - Conditional U-Net]][[File:loss_mlp.png|300 px|center|thumb|Losses - Conditional MLP]]&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60727</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60727"/>
		<updated>2024-12-13T09:49:41Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* Optimization-based method */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected error is then added back to the original RGB values to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
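&lt;br /&gt;
As a concrete illustration, the three steps above (simulate the deficiency, take the error, rotate it back) can be sketched in a few lines of NumPy. This is a minimal sketch, not the full Daltonize implementation: the generic simulate callable and the final clipping to [0, 1] are our own assumptions, while the correction matrix is the one given above.&lt;br /&gt;

```python
import numpy as np

# Correction matrix given above (as used in Daltonize [5] / Vischeck [6]):
# the red-channel error is redistributed into the green and blue channels.
CORRECTION = np.array([
    [0.0, 0.0, 0.0],
    [0.7, 1.0, 0.0],
    [0.7, 0.0, 1.0],
])

def daltonize(rgb, simulate):
    """Shift the information a dichromat misses into channels they can see.

    rgb      -- float array of shape (..., 3), values in [0, 1]
    simulate -- callable mapping an RGB array to its CVD-simulated version
    """
    error = rgb - simulate(rgb)      # information the dichromat cannot see
    shifted = error @ CORRECTION.T   # rotate the error into visible channels
    return np.clip(rgb + shifted, 0.0, 1.0)

# Sanity check: a perfect "simulation" (identity) produces zero error,
# so the image is returned unchanged.
img = np.array([[0.8, 0.2, 0.1]])
out = daltonize(img, simulate=lambda x: x)
```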
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
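&lt;br /&gt;
The degree-adaptable loss can be evaluated directly from these definitions. Below is a minimal NumPy sketch; the toy pixel set and the identity choice of the simulation matrix are illustrative assumptions, not taken from [9].&lt;br /&gt;

```python
import numpy as np

def total_loss(c_orig, c_rec, T, alpha, beta):
    """E = beta * sum_i alpha_i ||T(c_i_rec - c_i)||^2
           + sum_{i, j} ||T(c_i_rec - c_j_rec) - T(c_i - c_j)||^2"""
    # Naturalness: keep each recolored pixel close to its original under T.
    diff = (c_rec - c_orig) @ T.T
    e_nat = np.sum(alpha * np.sum(diff ** 2, axis=1))
    # Contrast: preserve all pairwise color differences under T
    # (the i == j terms are zero, so summing over all pairs is equivalent).
    d_rec = (c_rec[:, None, :] - c_rec[None, :, :]) @ T.T
    d_org = (c_orig[:, None, :] - c_orig[None, :, :]) @ T.T
    e_cont = np.sum((d_rec - d_org) ** 2)
    return beta * e_nat + e_cont

# An unchanged image incurs zero loss.
c = np.random.rand(4, 3)
loss = total_loss(c, c, np.eye(3), np.ones(4), beta=0.5)
```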
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
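&lt;br /&gt;
This perpendicular distance is the standard point-to-line formula and translates directly into code. The sketch below uses hypothetical coordinate pairs; the function name is our own.&lt;br /&gt;

```python
import math

def dist_to_confusion_line(v, cp, p0):
    """Perpendicular distance from chromaticity v to the confusion line
    through the copunctal point cp and reference point p0 (all (x, y) pairs)."""
    (xv, yv), (xcp, ycp), (x0, y0) = v, cp, p0
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    den = math.hypot(xcp - x0, ycp - y0)
    return num / den

# Example: the point (0, 1) lies at distance 1 from the x-axis,
# here the line through (1, 0) and (0, 0).
d = dist_to_confusion_line((0.0, 1.0), (1.0, 0.0), (0.0, 0.0))
```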
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. High-ranking colors, determined by their prominence in image clusters, are shifted to the nearest unoccupied confusion lines. This reallocation ensures that these colors are distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIE L*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
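&lt;br /&gt;
For multivariate Gaussians the KL divergence has a closed form, so the symmetric version used here can be computed directly. The sketch below uses the standard closed-form expression, which is general and not specific to [11].&lt;br /&gt;

```python
import numpy as np

def kl_gauss(mu1, cov1, mu2, cov2):
    """Closed-form D_KL(N(mu1, cov1) || N(mu2, cov2))."""
    d = mu1.shape[0]
    inv2 = np.linalg.inv(cov2)
    diff = mu2 - mu1
    return 0.5 * (np.trace(inv2 @ cov1) + diff @ inv2 @ diff - d
                  + np.log(np.linalg.det(cov2) / np.linalg.det(cov1)))

def symmetric_kl(mu1, cov1, mu2, cov2):
    """Symmetric KL divergence between two Gaussian components."""
    return kl_gauss(mu1, cov1, mu2, cov2) + kl_gauss(mu2, cov2, mu1, cov1)

# Identical components have zero divergence.
mu, cov = np.zeros(3), np.eye(3)
d0 = symmetric_kl(mu, cov, mu, cov)
```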
&lt;br /&gt;
The optimization aims to preserve the contrast perceived by CVD viewers while maintaining naturalness. Weights are assigned to Gaussian components based on the perceptual importance of colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\lambda_i = \frac{\sum_{j=1}^N \alpha_j p(i|x_j, \Theta)}{\sum_{k=1}^K \sum_{j=1}^N \alpha_j p(k|x_j, \Theta)},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha_j = \|x_j - \text{Sim}(x_j)\|&amp;lt;/math&amp;gt; is the perceptual error of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature when simulated for CVD,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{Sim}(\cdot)&amp;lt;/math&amp;gt; is the simulation function for CVD perception.&lt;br /&gt;
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
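&lt;br /&gt;
The interpolation step amounts to a responsibility-weighted hue shift per pixel, which can be sketched as follows. The toy responsibilities and hue values are hypothetical; the function name is our own.&lt;br /&gt;

```python
import numpy as np

def recolor_hue(x_hue, resp, mapped_mu_hue, mu_hue):
    """Shift a pixel hue smoothly using its GMM responsibilities.

    x_hue         -- original hue of the pixel
    resp          -- (K,) posterior probabilities p(i | x_j, Theta)
    mapped_mu_hue -- (K,) hues of the relocated component means M_i(mu_i)
    mu_hue        -- (K,) hues of the original component means
    """
    return x_hue + np.sum(resp * (mapped_mu_hue - mu_hue))

# A pixel split 70/30 between two components, where only the first
# component mean was shifted (by 0.1 in hue).
new_hue = recolor_hue(0.3, np.array([0.7, 0.3]),
                      np.array([0.5, 0.2]), np.array([0.4, 0.2]))
```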
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions and occasionally yield recolored colors that look unnatural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to use in real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep learning-based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This particular method also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations with a triple-latent structure, the method enables continuous personalization, adjusting images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications: in our experiment, one epoch (a single pass over the entire dataset) took around 18 days.&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD with the personalization of different severity levels. We start by exploring existing methods and identifying opportunities for improvement. Since mathematical-based approaches provide a solid foundation and are well-documented, we began our experiments by testing these methods, as described in the background. We later extended our exploration to deep learning based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
&lt;br /&gt;
The error between the original and simulated images is then mapped into the RGB color space using a deficiency-specific correction matrix, which redistributes the lost color information to enhance contrast. The corrected error is transformed back into LMS space for final adjustments and added to the original values, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
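The pipeline above can be sketched in a few lines of NumPy. This is a minimal illustration using the matrices quoted in this section; the function name and the clipping to [0, 1] are our choices, and a full Daltonizer (e.g. [5]) handles gamma and edge cases more carefully.

```python
import numpy as np

# RGB -> LMS transform (Stockman & Sharpe cone fundamentals, as in the text).
RGB_TO_LMS = np.array([
    [0.3904725,  0.54990437, 0.00890159],
    [0.07092586, 0.96310739, 0.00135809],
    [0.02314268, 0.12801221, 0.93605194],
])
LMS_TO_RGB = np.linalg.inv(RGB_TO_LMS)

# Protanopia: replace the L response with a weighted sum of M and S.
SIM_PROTAN = np.array([
    [0.0, 0.90822864, 0.008192],
    [0.0, 1.0,        0.0],
    [0.0, 0.0,        1.0],
])

# Daltonize-inspired correction matrix from the text: redistribute the
# lost red-channel information into the green and blue channels.
CORRECTION = np.array([
    [0.0, 0.0, 0.0],
    [0.7, 1.0, 0.0],
    [0.7, 0.0, 1.0],
])

def daltonize_protan(rgb):
    """rgb: float array of shape (..., 3) in [0, 1]."""
    lms = rgb @ RGB_TO_LMS.T
    lms_sim = lms @ SIM_PROTAN.T          # simulated protanope perception
    rgb_sim = lms_sim @ LMS_TO_RGB.T
    error = rgb - rgb_sim                 # information invisible to the viewer
    compensation = error @ CORRECTION.T   # shift the error into visible channels
    return np.clip(rgb + compensation, 0.0, 1.0)
```

For deuteranopia or tritanopia, SIM_PROTAN would be replaced by the corresponding row substitution from the equations above, with a matching correction matrix.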
&lt;br /&gt;
==== Optimization-based method ====&lt;br /&gt;
To improve the results from the Daltonization method, we designed a framework inspired by methods discussed in the background, incorporating dominant color extraction, optimization-based recoloring, and edit propagation. This approach aims to find a balance between the naturalness and contrast while compensating colors that are not visible for corresponding CVD types.&lt;br /&gt;
&lt;br /&gt;
===== 1. Extraction of Dominant Colors =====&lt;br /&gt;
We begin by extracting the dominant colors from the input image using fuzzy clustering via a MiniBatch K-means algorithm. This step identifies a reduced set of representative colors that capture the primary color information in the image:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mathbf{C} = \{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N\},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;N&amp;lt;/math&amp;gt; represents the number of clusters, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i&amp;lt;/math&amp;gt; represents the centroid of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th cluster.&lt;br /&gt;
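A hard-assignment sketch of this step with scikit-learn&#039;s MiniBatchKMeans; the cluster count, batch size, and seed are illustrative, and the fuzzy-membership variant used in the project would additionally keep per-pixel membership weights.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def extract_dominant_colors(image, n_clusters=8, seed=0):
    """image: (H, W, 3) float array in [0, 1].

    Returns the (n_clusters, 3) centroids c_1..c_N and an (H, W) label map
    assigning each pixel to its nearest centroid."""
    pixels = image.reshape(-1, 3)
    km = MiniBatchKMeans(n_clusters=n_clusters, random_state=seed,
                         batch_size=256, n_init=3)
    labels = km.fit_predict(pixels)
    return km.cluster_centers_, labels.reshape(image.shape[:2])
```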
&lt;br /&gt;
===== 2. Optimization-Based Recoloring =====&lt;br /&gt;
Once the dominant colors are extracted, we apply an optimization process to adjust them. The optimization aims to balance two key objectives:&lt;br /&gt;
&lt;br /&gt;
1. Naturalness Preservation: Ensures the recolored image minimally deviates from the original.&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{nat}} = \sum_{i=1}^N \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_i^{\text{original}}) \|^2,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; is the transformation matrix based on the severity and type of CVD, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{c}_i^{\text{original}}&amp;lt;/math&amp;gt; is the original color.&lt;br /&gt;
&lt;br /&gt;
2. Contrast Enhancement: Improves the differentiation of colors for individuals with CVD:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_{\text{cont}} = \sum_{i=1}^N \sum_{j&amp;gt;i} \left( \| \mathbf{T} (\mathbf{c}_i - \mathbf{c}_j) \|^2 - \| \mathbf{c}_i^{\text{original}} - \mathbf{c}_j^{\text{original}} \|^2 \right)^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The total objective function combines these two terms:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = \beta E_{\text{nat}} + E_{\text{cont}},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; controls the trade-off between naturalness and contrast.&lt;br /&gt;
&lt;br /&gt;
Optimization is performed using the L-BFGS-B algorithm to ensure efficient convergence under bounded constraints.&lt;br /&gt;
&lt;br /&gt;
The transformation matrices for each type of CVD are the following, which are based on [13]:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{Protanopia Transformation Matrix} = \begin{bmatrix}&lt;br /&gt;
0.566 &amp;amp; 0.558 &amp;amp; 0 \\&lt;br /&gt;
0.433 &amp;amp; 0.442 &amp;amp; 0.242 \\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 0.758&lt;br /&gt;
\end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{Deuteranopia Transformation Matrix} = \begin{bmatrix}&lt;br /&gt;
0.625 &amp;amp; 0.7 &amp;amp; 0 \\&lt;br /&gt;
0.375 &amp;amp; 0.3 &amp;amp; 0.3 \\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 0.7&lt;br /&gt;
\end{bmatrix},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{Tritanopia Transformation Matrix} = \begin{bmatrix}&lt;br /&gt;
0.95 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt;
0.05 &amp;amp; 0.433 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 0.567 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
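A compact sketch of the optimization with SciPy&#039;s L-BFGS-B, using the energy terms E_nat and E_cont defined above. The beta default and the [0, 1] box bounds are our assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def recolor_palette(palette, T, beta=0.5):
    """Optimize an (N, 3) dominant-color palette for a CVD viewer.

    T: 3x3 CVD simulation matrix (e.g. the deuteranopia matrix above);
    beta weighs naturalness vs. contrast, as in E = beta*E_nat + E_cont."""
    orig = palette.copy()
    N = len(orig)
    iu, ju = np.triu_indices(N, k=1)                 # all pairs i < j
    d_orig = np.linalg.norm(orig[iu] - orig[ju], axis=1) ** 2

    def energy(flat):
        c = flat.reshape(N, 3)
        e_nat = np.sum(((c - orig) @ T.T) ** 2)      # stay close to original
        d_sim = np.linalg.norm((c[iu] - c[ju]) @ T.T, axis=1) ** 2
        e_cont = np.sum((d_sim - d_orig) ** 2)       # preserve pairwise contrast
        return beta * e_nat + e_cont

    res = minimize(energy, orig.ravel(), method="L-BFGS-B",
                   bounds=[(0.0, 1.0)] * (3 * N))
    return res.x.reshape(N, 3)
```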
&lt;br /&gt;
===== 3. Edit Propagation =====&lt;br /&gt;
After optimizing the dominant colors, we propagate these edits across the entire image to ensure smooth transitions. The propagation step uses interpolation in the CIE-Lab color space:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Delta L^* = \text{griddata}(\mathbf{c}^{\text{original}}, \mathbf{c}^{\text{recolored}} - \mathbf{c}^{\text{original}}, \mathbf{I}),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mathbf{I}&amp;lt;/math&amp;gt; represents the pixel values in the Lab color space, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Delta L^*&amp;lt;/math&amp;gt; adjusts the luminance values. The recolored image is reconstructed by applying the interpolated changes back to the original image and converting it to RGB.&lt;br /&gt;
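The propagation step can be sketched with scipy.interpolate.griddata. We use nearest-neighbor interpolation here so that pixels outside the palette&#039;s convex hull stay defined; the interpolation method and per-channel loop are our simplifications.

```python
import numpy as np
from scipy.interpolate import griddata

def propagate_edits(image_lab, palette_orig, palette_new):
    """Interpolate per-palette color shifts across all pixels in Lab space.

    image_lab: (H, W, 3) Lab image; palette_orig/palette_new: (N, 3) Lab
    palettes before/after optimization."""
    shifts = palette_new - palette_orig
    pixels = image_lab.reshape(-1, 3)
    # Interpolate each Lab channel's shift separately over the palette points.
    delta = np.stack(
        [griddata(palette_orig, shifts[:, k], pixels, method="nearest")
         for k in range(3)], axis=1)
    return (pixels + delta).reshape(image_lab.shape)
```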
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label describing the user&#039;s CVD condition, we want a deep learning model to output a recolored RGB image specific to that user. Inputs and outputs are discussed in detail in later sections; an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple to train and integrate the user label naturally, they require a pre-existing ground-truth recoloring to compare the output against.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user-label information during training. They are generally better at generating natural-looking images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from the open-source image dataset of [2]: to improve their model&#039;s ability to enhance contrast between CVD-indistinguishable color pairs, the authors created a dataset of 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 images and recolored them under randomly sampled CVD types and severities. The ground-truth recoloring was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org SciPy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
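A tiny sketch of the label encoding described above; the function names are ours.

```python
import random

def make_cvd_label(cvd_type, severity):
    """Encode a CVD condition as severity * [protan, deutan] (as in the text).

    cvd_type: 'protan' or 'deutan'; severity in [0.1, 1.0].
    e.g. ('protan', 0.6) -> [0.6, 0.0], i.e. protanopia at 60% severity."""
    base = {"protan": [1.0, 0.0], "deutan": [0.0, 1.0]}[cvd_type]
    return [severity * v for v in base]

def random_label(rng=random):
    """Sample a random type/severity pair, as done when building the dataset."""
    severity = round(rng.uniform(0.1, 1.0), 1)
    return make_cvd_label(rng.choice(["protan", "deutan"]), severity)
```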
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately, using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to the 3 parallel MLPs simultaneously. Each network learns to predict one channel of the recolored image from the given ground truth, and the three outputs are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground-truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
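A minimal PyTorch sketch of this architecture. The hidden width and per-branch depth are our guesses, but the structure (three parallel per-pixel MLPs conditioned on the 2-dimensional CVD label, trained with pixel-wise MSE) follows the description above.

```python
import torch
import torch.nn as nn

class ConditionalParallelRGBMLP(nn.Module):
    """One small per-pixel MLP per output channel. Each branch sees the
    pixel's RGB values plus the 2-dim CVD label (hidden size is a guess)."""
    def __init__(self, hidden=64):
        super().__init__()
        def branch():
            return nn.Sequential(nn.Linear(3 + 2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1), nn.Sigmoid())
        self.branches = nn.ModuleList([branch() for _ in range(3)])

    def forward(self, img, label):
        # img: (B, 3, H, W); label: (B, 2) severity-scaled [protan, deutan]
        B, _, H, W = img.shape
        x = img.permute(0, 2, 3, 1).reshape(B, H * W, 3)
        lab = label[:, None, :].expand(B, H * W, 2)
        inp = torch.cat([x, lab], dim=-1)
        out = torch.cat([b(inp) for b in self.branches], dim=-1)  # (B, HW, 3)
        return out.reshape(B, H, W, 3).permute(0, 3, 1, 2)

# Training objective: pixel-wise MSE against the ground-truth recoloring.
loss_fn = nn.MSELoss()
```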
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
With the same inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and the decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision and transfer robustly to new tasks. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was the commonly used VGG perceptual loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: recolored (model output) and ground-truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt; is the output of the l-th layer of a pre-trained VGG network, and &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; is the number of elements in that layer&#039;s feature map&lt;br /&gt;
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to aligning this network with the recoloring task was the loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: Two distinct pixels in the image&lt;br /&gt;
* c_x, c_y: CVD-simulated colors of the original image&lt;br /&gt;
* ĉ′_x, ĉ′_y: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* ‖ω‖: Size of the global (whole-image) window&lt;br /&gt;
* ‖ω_x‖: Size of the local window (neighborhood) around pixel x&lt;br /&gt;
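One way to realize these contrast terms on CVD-simulated tensors is sketched below. The random pair-sampling size and the blurred-neighborhood approximation of the local window are our simplifications of [2]; in training, the sign and weighting of CL would be tuned so that increased contrast is rewarded.

```python
import torch
import torch.nn.functional as F

def contrast_losses(sim_orig, sim_recolored, n_pairs=2048, patch=5):
    """Sketch of the global/local contrast terms.

    Inputs are CVD-simulated originals and model outputs, shape (B, 3, H, W).
    CL(x, y) = ||c'_x - c'_y|| - ||c_x - c_y|| over pixel pairs (x, y)."""
    B, C, H, W = sim_orig.shape
    flat_o = sim_orig.reshape(B, C, -1)
    flat_r = sim_recolored.reshape(B, C, -1)

    # Global term: random pixel pairs over the whole-image window omega.
    i = torch.randint(0, H * W, (n_pairs,))
    j = torch.randint(0, H * W, (n_pairs,))
    d_o = (flat_o[..., i] - flat_o[..., j]).norm(dim=1)
    d_r = (flat_r[..., i] - flat_r[..., j]).norm(dim=1)
    l_global = (d_r - d_o).mean()

    # Local term: contrast against the blurred neighborhood around each pixel.
    blur_o = F.avg_pool2d(sim_orig, patch, stride=1, padding=patch // 2)
    blur_r = F.avg_pool2d(sim_recolored, patch, stride=1, padding=patch // 2)
    l_local = ((sim_recolored - blur_r).norm(dim=1)
               - (sim_orig - blur_o).norm(dim=1)).mean()
    return l_global, l_local
```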
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural color distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{natural}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored (model output) images respectively&lt;br /&gt;
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
!   !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | Performance&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | SSIM Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.9985 || 0.9985 || 0.9985 || 0.9985&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.9565 || 0.9986 || 0.9986 || 0.9968&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | TCC Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0004 || 0.0003 || 0.0003 || 0.0003&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.0380 || 0.0003 || 0.0002 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CD ΔE76 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0462 || 0.0462 || 0.0462 || 0.0462&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 8.4251 || 0.0458 || 0.0435 || 0.0578&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE2000 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0681 || 0.0681 || 0.0681 || 0.0681&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 6.9145 || 0.0671 || 0.0630 || 0.0838&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE94 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0461 || 0.0461 || 0.0461 || 0.0461&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 5.3878 || 0.0457 || 0.0434 || 0.0576&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | D-CIELAB ΔEab Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 1.7209 || 1.7209 || 1.7209 || 1.7209&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 1.5926 || 1.9673 || 1.4363 || 2.4009&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the three neural network architectures: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics adapted from [1] and [2], such as the Structural Similarity Index (SSIM), total color contrast (TCC), chromatic difference (CD), and inference time, were used to assess the models&#039; effectiveness.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness are observed in the CVD-simulated recolored images compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that channel disentanglement was not very helpful here (especially for naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than the originals, indicating that the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images (possibly because naturalness was effectively prioritized, even though the weight coefficients in the loss favored contrast: alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we scored and evaluated comparison metrics against related work only for the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; we use the same definitions as those papers. At a high level, the metrics are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(λ is a constant, not a wavelength; l, a, b are the LAB-space coordinates of the original image and l′, a′, b′ those of the recolored image.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
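As an illustration, the chromatic-difference metric above can be computed directly from Lab arrays; the λ value used here is a placeholder constant, not the value from [1] or [2].

```python
import numpy as np

def chromatic_difference(lab_orig, lab_recolored, lam=0.3):
    """Per-pixel CD as in the formula: sqrt(lam*(dl)^2 + (da)^2 + (db)^2).

    lab_orig, lab_recolored: (..., 3) arrays of Lab coordinates;
    lam down-weights lightness differences (0.3 is illustrative)."""
    d = lab_recolored - lab_orig
    return np.sqrt(lam * d[..., 0] ** 2 + d[..., 1] ** 2 + d[..., 2] ** 2)
```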
&lt;br /&gt;
The key results are in Table 2 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2] for protan/deutan whichever is larger.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good, but not as good as in paper [2], because the authors there optimize a separate network for each CVD type.&lt;br /&gt;
* The outputs are blurry (the loss does not optimize for SSIM strongly enough).&lt;br /&gt;
* Handling multiple CVD types in a single network needs a more sophisticated approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned a couple of things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate (a few seconds per image), making them feasible for near-real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from Swin architecture added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. JOSA A, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 220px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400 px|Wikipedia encyclopedia]][[File:eb_2.png|400 px]] &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 800px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|300 px|center|thumb|Losses - Conditional Autoencoder]][[File:loss_unet.png|300 px|thumb|center|Losses - Conditional U-Net]][[File:loss_mlp.png|300 px|center|thumb|Losses - Conditional MLP]]&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60718</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60718"/>
		<updated>2024-12-13T09:31:06Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* GMM-based Method */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected RGB values are added back to the original LMS values to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
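As a concrete sketch, the degree-adaptable loss above can be evaluated on a small set of colors with NumPy. This is illustrative only, not the authors' released code: in practice `recolored` would itself be optimized (e.g. by gradient descent), and the function name is ours.

```python
import numpy as np

def degree_adaptable_loss(recolored, original, T, alpha, beta=0.5):
    """E from Zhu et al. [9] on N colors. recolored/original: (N, 3) arrays;
    T: (3, 3) CVD simulation matrix; alpha: (N,) per-pixel weights."""
    # Naturalness term: weighted perceptual difference under CVD simulation.
    diff = (recolored - original) @ T.T
    e_nat = np.sum(alpha * np.sum(diff ** 2, axis=1))
    # Contrast term: preserve pairwise color differences as seen through T.
    d_new = (recolored[:, None, :] - recolored[None, :, :]) @ T.T
    d_old = (original[:, None, :] - original[None, :, :]) @ T.T
    e_cont = np.sum((d_new - d_old) ** 2)  # the i == j terms vanish
    return beta * e_nat + e_cont
```

Note that a constant color shift leaves the contrast term at zero, so only the naturalness term penalizes it, matching the intuition behind the trade-off weight.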
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
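The distance computation above transcribes directly into Python. A minimal sketch (the function name is ours; points are CIE 1931 chromaticity coordinates):

```python
import math

def dist_to_confusion_line(v, copunctal, ref):
    """Perpendicular distance d(v, L) from chromaticity v = (x_v, y_v) to the
    confusion line L through the copunctal point (x_cp, y_cp) and the
    reference point (x_0, y_0), as defined above."""
    (xv, yv), (xcp, ycp), (x0, y0) = v, copunctal, ref
    numerator = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    return numerator / math.hypot(xcp - x0, ycp - y0)
```

A key color whose distance to an occupied confusion line falls below some threshold would then be flagged as confusing and become a candidate for shifting.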
&lt;br /&gt;
Confusing colors, i.e. key colors lying on the same (occupied) confusion lines, are iteratively shifted to the nearest unoccupied confusion lines, with higher-ranking colors (those more prominent in the image clusters) moved first. This reallocation makes these colors distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
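The combined objective can be sketched for small key-color clusters as follows. This is an illustration rather than the authors' code: `f_d` stands in for the dichromat simulation function, `lam` for the weight λ, and E2 is averaged over pairs within cluster A.

```python
import numpy as np

def luminance_objective(a, b, a_rec, f_d, lam=0.1):
    """E = (E1 + E2) + lam * E3 for key-color clusters a (recolored to a_rec)
    and b. a, b, a_rec: (n, 2) chromaticity arrays; f_d: dichromat simulator."""
    dist = lambda p, q: np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    # E1: contrast seen by normal trichromats vs. the recolored dichromat view.
    e1 = np.abs(dist(a, b) - dist(f_d(a_rec), f_d(b))).mean()
    # E2: contrast among recolored key colors under dichromat simulation.
    e2 = np.abs(dist(a, a) - dist(f_d(a_rec), f_d(a_rec))).mean()
    # E3: naturalness, distance between original and recolored key colors.
    e3 = np.linalg.norm(a - a_rec, axis=-1).mean()
    return (e1 + e2) + lam * e3
```

With `a_rec = a` and an identity simulator the objective is zero, confirming that every term measures deviation from the original image.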
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIE L*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
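Assuming diagonal covariance matrices (a simplification the method also adopts for efficiency), each KL term has a simple closed form. A sketch (function name ours):

```python
import numpy as np

def sym_kl_diag(mu1, var1, mu2, var2):
    """Symmetric KL divergence between two Gaussians with diagonal
    covariances. mu*: (3,) mean vectors; var*: (3,) per-dimension variances."""
    def kl(m1, v1, m2, v2):
        # D_KL(N1 || N2) for diagonal covariances, summed over dimensions.
        return 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
    return kl(mu1, var1, mu2, var2) + kl(mu2, var2, mu1, var1)
```

Symmetrizing removes the asymmetry of plain KL, so the divergence can serve as a distance-like contrast measure between the color clusters of the original and recolored images.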
&lt;br /&gt;
The optimization aims to preserve the contrast perceived by CVD viewers while maintaining naturalness. Weights are assigned to Gaussian components based on the perceptual importance of colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\lambda_i = \frac{\sum_{j=1}^N \alpha_j p(i|x_j, \Theta)}{\sum_{k=1}^K \sum_{j=1}^N \alpha_j p(k|x_j, \Theta)},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha_j = \|x_j - \text{Sim}(x_j)\|&amp;lt;/math&amp;gt; is the perceptual error of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature when simulated for CVD,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{Sim}(\cdot)&amp;lt;/math&amp;gt; is the simulation function for CVD perception.&lt;br /&gt;
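In code, these weights reduce to a single weighted sum over the GMM posterior responsibilities (a sketch; `resp[j, i]` plays the role of p(i|x_j, Θ) and `alpha[j]` of the perceptual error):

```python
import numpy as np

def component_weights(resp, alpha):
    """lambda_i from the formula above. resp: (N, K) posterior probabilities
    p(i|x_j, Theta) from the fitted GMM; alpha: (N,) per-pixel CVD errors."""
    weighted = (alpha[:, None] * resp).sum(axis=0)  # numerator, per component
    return weighted / weighted.sum()                # normalize over all K
```

Components whose member pixels suffer larger simulated perception errors thus receive larger weights in the optimization.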
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
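This soft-assignment hue blend is a single matrix product (a sketch; hues are treated as plain scalars here, ignoring wrap-around at 360°):

```python
import numpy as np

def interpolate_hue(x_hue, resp, mu_hue, mapped_mu_hue):
    """Step 4: per-pixel hue adjustment T(x_j)_H. x_hue: (N,) pixel hues;
    resp: (N, K) posteriors p(i|x_j, Theta); mu_hue / mapped_mu_hue: (K,)
    original and optimized hues of the Gaussian means."""
    return x_hue + resp @ (mapped_mu_hue - mu_hue)  # soft-assignment blending
```

Because each pixel mixes the shifts of all components in proportion to its posteriors, neighboring colors move together and hard recoloring seams are avoided.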
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions; as a result, the colors in the recolored images are sometimes not very natural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to apply in real-time settings.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, CycleGAN, and BicycleGAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This particular method also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. This design allows efficient handling of high-resolution images and has been applied to various computer vision tasks, including image classification and object detection. Despite its robust performance, the architecture is still computationally intensive and does not inherently account for the specific needs of individuals with CVD, as it lacks mechanisms for personalized color adjustment.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations with a triple-latent structure, the method supports continuous personalization, adjusting images to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications: in our experiment, a single epoch (one pass over the entire dataset) took around 18 days.&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD, personalized to different severity levels. We start by exploring existing methods and identifying opportunities for improvement. Since mathematical-based approaches provide a solid, well-documented foundation, we began our experiments by testing the methods described in the background. We later extended our exploration to deep learning-based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
&lt;br /&gt;
The error between the original and simulated LMS values is then mapped into the RGB color space and adjusted with a deficiency-specific correction matrix, which enhances contrast and recovers lost color differences. The correction matrix is applied to the error in RGB space, and the result is transformed back into LMS space for the final adjustment. The corrected values are added back to the original values, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
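Putting Method 1 together, a condensed NumPy sketch using the coefficients listed above (taken from [5]). We assume linear RGB values in [0, 1], and for brevity the compensation is added directly in RGB rather than round-tripping through LMS a second time:

```python
import numpy as np

# RGB -> LMS transform based on Stockman and Sharpe's cone fundamentals.
RGB2LMS = np.array([[0.3904725,  0.54990437, 0.00890159],
                    [0.07092586, 0.96310739, 0.00135809],
                    [0.02314268, 0.12801221, 0.93605194]])
LMS2RGB = np.linalg.inv(RGB2LMS)

# Replace the missing cone's response with a weighted sum of the other two
# (Brettel-style coefficients given above).
SIM = {
    "protan": np.array([[0.0,         0.90822864, 0.008192],
                        [0.0,         1.0,        0.0],
                        [0.0,         0.0,        1.0]]),
    "deutan": np.array([[1.0,         0.0,        0.0],
                        [1.10104433,  0.0,       -0.00901975],
                        [0.0,         0.0,        1.0]]),
    "tritan": np.array([[1.0,         0.0,        0.0],
                        [0.0,         1.0,        0.0],
                        [-0.15773032, 1.19465634, 0.0]]),
}

# Correction matrix: shifts the invisible error into visible channels.
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize(rgb, kind="deutan"):
    """rgb: (..., 3) linear RGB in [0, 1]; returns the recolored image."""
    lms = rgb @ RGB2LMS.T
    lms_sim = lms @ SIM[kind].T              # dichromat simulation in LMS
    err_rgb = (lms - lms_sim) @ LMS2RGB.T    # the invisible error, back in RGB
    comp = err_rgb @ CORRECTION.T            # rotate error into visible hues
    return np.clip(rgb + comp, 0.0, 1.0)
```

For an image containing no colors the simulated viewer would miss, the error is zero and the output equals the input; otherwise the lost red-green information is redistributed into the green and blue channels.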
&lt;br /&gt;
==== Optimization-based method ====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in later sections; an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and readily integrate the user label, they require a pre-existing ground-truth example of the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user-label information during training. They generally produce more natural-looking images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from the open-source image dataset of [2]: to improve their model&#039;s ability to enhance contrast between CVD-indistinguishable color pairs, the authors assembled 141,000 unlabeled pictures of natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 of these images and recolored them under randomly drawn labels for CVD type and severity. The ground-truth recoloring was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org SciPy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
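The label encoding above can be made concrete with a small sketch. This is an illustration of the sampling scheme we describe, not the project&#039;s actual data-generation code; the function name is ours.&lt;br /&gt;

```python
import random

def sample_cvd_label(rng=random):
    """Sample a condition label encoded as severity * [protan, deutan],
    with severity drawn from {0.1, 0.2, ..., 1.0}."""
    severity = rng.choice([round(0.1 * k, 1) for k in range(1, 11)])
    one_hot = rng.choice([[1.0, 0.0], [0.0, 1.0]])  # protan or deutan
    return [severity * v for v in one_hot]
```

For example, a sampled label of [0.6, 0.0] corresponds to protanopia at 60% severity, matching the encoding described above.&lt;br /&gt;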
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately, using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to the 3 parallel MLPs simultaneously. Each network is trained to predict one channel (R, G, or B) of the recolored image from the given ground truth. The outputs of the three networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground-truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
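In the PyTorch training loop this corresponds to `torch.nn.functional.mse_loss`; a framework-free sketch of the same computation:&lt;br /&gt;

```python
import numpy as np

def mse_loss(output, target):
    """Pixel-wise MSE between the predicted recoloring and the ground truth;
    arrays of shape (H, W, 3), averaged over all pixels and channels."""
    return float(np.mean((output - target) ** 2))
```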
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
In a similar fashion of inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: the recolored (model output) and ground-truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: the l-th feature layer of the pre-trained VGG network, with &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; elements&lt;br /&gt;
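The per-layer summation in the VGG loss can be sketched directly. This assumes the feature maps have already been extracted from a frozen, pre-trained VGG (e.g. with torchvision); the aggregation shown here in NumPy mirrors the formula above, and the function name is ours.&lt;br /&gt;

```python
import numpy as np

def vgg_perceptual_loss(feats_pred, feats_gt):
    """Sum over layers l of (1/N_l) times the squared L2 distance between
    the phi_l features of the predicted and ground-truth images.
    feats_pred, feats_gt: lists of per-layer feature arrays."""
    total = 0.0
    for f_pred, f_gt in zip(feats_pred, feats_gt):
        total += float(np.sum((f_pred - f_gt) ** 2)) / f_pred.size
    return total
```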
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to aligning this network with the recoloring task was the choice of loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: Two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: Size of the global (or large) window of the image&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: Size of the local window or neighborhood around pixel x&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural color distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{natural}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored images respectively&lt;br /&gt;
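Putting the three terms together is a one-liner; the sketch below combines them exactly as in the equations above, with default weights set to the values reported later in the Results section (alpha = 0.25, beta = 1.0). The function name is ours.&lt;br /&gt;

```python
def autoencoder_loss(l_natural, l_global, l_local, alpha=0.25, beta=1.0):
    """L_total = alpha * L_naturalness + 2 * (1 - alpha) * L_contrast,
    with L_contrast = beta * L_global + (2 - beta) * L_local."""
    l_contrast = beta * l_global + (2 - beta) * l_local
    return alpha * l_natural + 2 * (1 - alpha) * l_contrast
```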
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
!   !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | Performance&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | SSIM Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.9985 || 0.9985 || 0.9985 || 0.9985&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.9565 || 0.9986 || 0.9986 || 0.9968&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | TCC Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0004 || 0.0003 || 0.0003 || 0.0003&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.0380 || 0.0003 || 0.0002 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CD ΔE76 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0462 || 0.0462 || 0.0462 || 0.0462&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 8.4251 || 0.0458 || 0.0435 || 0.0578&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE2000 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0681 || 0.0681 || 0.0681 || 0.0681&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 6.9145 || 0.0671 || 0.0630 || 0.0838&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE94 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0461 || 0.0461 || 0.0461 || 0.0461&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 5.3878 || 0.0457 || 0.0434 || 0.0576&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | D-CIELAB ΔEab Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 1.7209 || 1.7209 || 1.7209 || 1.7209&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 1.5926 || 1.9673 || 1.4363 || 2.4009&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the three neural network architectures described above: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), Total Color Contrast (TCC), Chromatic Difference (CD), and inference time, as defined in [1] and [2], were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored image shows compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels appeared discretized, suggesting that channel disentanglement was not helpful in this case (especially for naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than those of the originals, meaning the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images (possibly because the naturalness term was effectively prioritized, even though the weight coefficients in the loss favored contrast: alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the qualitative results above, we decided to compute evaluation metrics for comparison with related work only for the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; please refer to those papers for the full definitions, which we use unchanged. At a high level, the components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a constant, not a wavelength; l, a, b are the CIELAB coordinates of the original image and primed values (&#039;) those of the recolored image.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
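The Chromatic Difference formula above translates directly to NumPy. This is a per-pixel sketch assuming CIELAB inputs; the value of &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; used here is a placeholder, and the function name is ours.&lt;br /&gt;

```python
import numpy as np

def chromatic_difference(lab_orig, lab_recolored, lam=1.0):
    """Per-pixel CD as defined above; inputs are (H, W, 3) CIELAB arrays
    and lam is the constant lambda (its default value is an assumption)."""
    d = lab_recolored - lab_orig
    return np.sqrt(lam * d[..., 0] ** 2 + d[..., 1] ** 2 + d[..., 2] ** 2)
```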
&lt;br /&gt;
The key results are in Table 2 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2], taking the larger of the protan/deutan values.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good, but not as high as in paper [2], because their method optimizes a separate network for each CVD type.&lt;br /&gt;
* Outputs remain blurry (SSIM is not sufficiently optimized for).&lt;br /&gt;
* Handling multiple CVD types within a single network needs a more sophisticated approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were only moderate; with further optimization, they could become practical and scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast-enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from the Swin architecture [2] added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. Josa a, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 220px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400 px|Wikipedia encyclopedia]][[File:eb_2.png|400 px]] &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 800px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|300 px|center|thumb|Losses - Conditional Autoencoder]][[File:loss_unet.png|300 px|thumb|center|Losses - Conditional U-Net]][[File:loss_mlp.png|300 px|center|thumb|Losses - Conditional MLP]]&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60714</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60714"/>
		<updated>2024-12-13T09:30:00Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* Confusion lines based Method */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected values are then added back to the original image to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
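As a concrete illustration, the degree-adaptable objective above can be sketched in NumPy. The function name and array shapes here are our own, not from [9]; the pairwise contrast term is computed densely, which is only practical for small pixel counts:

```python
import numpy as np

def degree_adaptable_loss(c, c_plus, T, alpha, beta):
    """Sketch of the degree-adaptable objective of Zhu et al. [9].

    c, c_plus : (N, 3) original and recolored pixel colors
    T         : (3, 3) CVD simulation matrix
    alpha     : (N,) per-pixel naturalness weights
    beta      : scalar trade-off weight
    """
    # Naturalness term: weighted perceptual difference under CVD simulation.
    diff = (c_plus - c) @ T.T                      # rows are T(c_i^+ - c_i)
    e_nat = np.sum(alpha * np.sum(diff ** 2, axis=1))

    # Contrast term: deviation of pairwise contrast under CVD simulation
    # (the i == j terms vanish, so summing over all pairs is harmless).
    d_plus = (c_plus[:, None, :] - c_plus[None, :, :]) @ T.T
    d_orig = (c[:, None, :] - c[None, :, :]) @ T.T
    e_cont = np.sum((d_plus - d_orig) ** 2)
    return beta * e_nat + e_cont
```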
&lt;br /&gt;
==== Confusion-line-based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The process begins with fuzzy clustering, which identifies representative colors (key colors) from the input image. These key colors are then analyzed on the chromaticity diagram, where confusion lines—paths representing colors indistinguishable by individuals with CVD—serve as the basis for recoloring. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
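The point-to-line distance above is a direct application of the standard formula; a minimal NumPy sketch (function name ours) is:

```python
import numpy as np

def dist_to_confusion_line(v, copunctal, ref):
    """Perpendicular distance from chromaticity v = (x_v, y_v) to the
    confusion line L through the copunctal point and a reference point."""
    xv, yv = v
    xcp, ycp = copunctal
    x0, y0 = ref
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    den = np.hypot(xcp - x0, ycp - y0)  # length of the direction vector
    return num / den
```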
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest unoccupied confusion lines, in order of their ranking by prominence in the image clusters, to enhance discriminability for CVD viewers. This reallocation ensures that these colors are distinguishable to viewers with CVD while minimizing disruption to the image&#039;s overall color harmony.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
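The three terms and the regularized total can be sketched in NumPy as below. This is an illustration under our own conventions (2-D chromaticities, mean normalization for each term), not the authors' implementation; `f_D` stands in for any dichromat simulation function:

```python
import numpy as np

def total_objective(a, a_rec, b, f_D, lam):
    """Sketch of the regularized objective E = (E1 + E2) + lam * E3 from [10].

    a, a_rec : (nA, 2) original / recolored key-color chromaticities (cluster A)
    b        : (nB, 2) key-color chromaticities of cluster B
    f_D      : dichromat simulation function applied row-wise
    lam      : naturalness weight lambda
    """
    # All pairwise Euclidean distances between rows of u and rows of v.
    dist = lambda u, v: np.linalg.norm(u[:, None, :] - v[None, :, :], axis=2)

    # E1: contrast deviation between clusters A and B under simulation.
    e1 = np.mean(np.abs(dist(a, b) - dist(f_D(a_rec), f_D(b))))
    # E2: contrast deviation within cluster A, as perceived by dichromats.
    e2 = np.mean(np.abs(dist(a, a) - dist(f_D(a_rec), f_D(a_rec))))
    # E3: naturalness, distance between original and recolored key colors.
    e3 = np.mean(np.linalg.norm(a - a_rec, axis=1))
    return (e1 + e2) + lam * e3
```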
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIE L*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
In more detail, the parameters of the GMM are initialized using the K-means algorithm and refined via the Expectation-Maximization (EM) algorithm, which consists of the E-step and the M-step:&lt;br /&gt;
&lt;br /&gt;
The E-step calculates the probability of each color (or pixel) belonging to a specific Gaussian component in the GMM. This probability, also known as the &amp;quot;responsibility,&amp;quot; reflects how much each Gaussian contributes to the representation of a color:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(i|x_j, \Theta^{\text{old}}) = \frac{\omega_i G_i(x_j|\theta_i)}{\sum_{k=1}^K \omega_k G_k(x_j|\theta_k)}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p(i|x_j, \Theta^{\text{old}})&amp;lt;/math&amp;gt; is the probability of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt; belonging to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian component,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian, representing its overall contribution to the color distribution,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x_j|\theta_i)&amp;lt;/math&amp;gt; is the Gaussian distribution for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th component, evaluated at &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt;, where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix),&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\sum_{k=1}^K \omega_k G_k(x_j|\theta_k)&amp;lt;/math&amp;gt; normalizes the probabilities by considering the contributions of all &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussians to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th pixel.&lt;br /&gt;
&lt;br /&gt;
This step essentially assigns each pixel a &amp;quot;soft&amp;quot; membership to each Gaussian component, rather than forcing a hard clustering decision. Pixels that are close to a Gaussian&#039;s mean (in feature space) will have higher probabilities of belonging to that Gaussian.&lt;br /&gt;
&lt;br /&gt;
The M-step updates the parameters of each Gaussian component based on the probabilities computed in the E-step. These updates refine the Gaussian model to better fit the data:&lt;br /&gt;
&lt;br /&gt;
1. Update the mixing weights:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\omega_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})}{N},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation calculates the proportion of pixels assigned to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It reflects how dominant each Gaussian is in representing the color distribution.&lt;br /&gt;
&lt;br /&gt;
2. Update the means:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mu_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}}) x_j}{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation computes the new mean vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mu_i^{\text{new}}&amp;lt;/math&amp;gt; for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It is a weighted average of all pixel feature vectors &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt;, where the weights are the probabilities &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p(i|x_j, \Theta^{\text{old}})&amp;lt;/math&amp;gt;. Pixels with higher probabilities contribute more to the new mean.&lt;br /&gt;
&lt;br /&gt;
3. Update the covariance matrices:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Sigma_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}}) (x_j - \mu_i^{\text{new}})(x_j - \mu_i^{\text{new}})^T}{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation calculates the new covariance matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Sigma_i^{\text{new}}&amp;lt;/math&amp;gt; for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It measures the spread of pixel features around the new mean, weighted by the probabilities from the E-step.&lt;br /&gt;
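The E-step and M-step above can be sketched together as one EM iteration in NumPy. This is a generic GMM update for illustration (full covariances, our own function names), not the paper's optimized implementation:

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """Multivariate normal density, evaluated row-wise on x of shape (N, d)."""
    d = x.shape[1]
    diff = x - mu
    inv = np.linalg.inv(cov)
    expo = -0.5 * np.sum(diff @ inv * diff, axis=1)  # quadratic form per row
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(expo) / norm

def em_step(x, weights, means, covs):
    """One EM iteration (E-step + M-step) for a K-component GMM."""
    K = len(weights)
    # E-step: responsibilities p(i | x_j, Theta_old), shape (N, K).
    resp = np.stack([w * gaussian_pdf(x, m, c)
                     for w, m, c in zip(weights, means, covs)], axis=1)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixing weights, means, and covariances.
    Nk = resp.sum(axis=0)
    new_weights = Nk / len(x)
    new_means = (resp.T @ x) / Nk[:, None]
    new_covs = []
    for i in range(K):
        diff = x - new_means[i]
        new_covs.append((resp[:, i, None] * diff).T @ diff / Nk[i])
    return new_weights, new_means, new_covs
```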
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The optimization aims to preserve the contrast perceived by CVD viewers while maintaining naturalness. Weights are assigned to Gaussian components based on the perceptual importance of colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\lambda_i = \frac{\sum_{j=1}^N \alpha_j p(i|x_j, \Theta)}{\sum_{k=1}^K \sum_{j=1}^N \alpha_j p(k|x_j, \Theta)},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha_j = \|x_j - \text{Sim}(x_j)\|&amp;lt;/math&amp;gt; is the perceptual error of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature when simulated for CVD,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{Sim}(\cdot)&amp;lt;/math&amp;gt; is the simulation function for CVD perception.&lt;br /&gt;
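For Gaussians, the KL divergence has a closed form, so the symmetric divergence above can be sketched directly (function names ours):

```python
import numpy as np

def kl_gaussian(mu_i, cov_i, mu_j, cov_j):
    """Closed-form KL divergence D_KL(G_i || G_j) between two Gaussians."""
    d = len(mu_i)
    inv_j = np.linalg.inv(cov_j)
    diff = mu_j - mu_i
    return 0.5 * (np.trace(inv_j @ cov_i) + diff @ inv_j @ diff - d
                  + np.log(np.linalg.det(cov_j) / np.linalg.det(cov_i)))

def symmetric_kl(mu_i, cov_i, mu_j, cov_j):
    """Symmetric KL divergence used to preserve pairwise Gaussian contrast."""
    return (kl_gaussian(mu_i, cov_i, mu_j, cov_j)
            + kl_gaussian(mu_j, cov_j, mu_i, cov_i))
```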
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
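The interpolation step is a responsibility-weighted sum of the mean shifts, which can be sketched as (names and shapes ours):

```python
import numpy as np

def interpolate_hue(x_h, resp, mapped_mu_h, mu_h):
    """Hue adjustment T(x_j)_H by responsibility-weighted interpolation.

    x_h         : (N,) hue of each pixel
    resp        : (N, K) responsibilities p(i | x_j, Theta)
    mapped_mu_h : (K,) hues of the relocated Gaussian means M_i(mu_i)
    mu_h        : (K,) hues of the original Gaussian means
    """
    return x_h + resp @ (mapped_mu_h - mu_h)
```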
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions; as a result, the colors in the recolored images are sometimes not very natural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to use in real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This specific approach also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, the architecture is computationally intensive and does not inherently account for the specific needs of individuals with CVD, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations through a unique triple-latent structure, the method enables continuous personalization, adjusting images to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, it took around 18 days for one epoch (one iteration over the entire dataset).&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD, personalized to different severity levels. We start by exploring existing methods and identifying opportunities for improvement. Since mathematical approaches provide a solid foundation and are well documented, we began our experiments by testing these methods, as described in the background. We later extended our exploration to deep learning based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
&lt;br /&gt;
The error between the original and simulated images is then mapped into the RGB color space using a deficiency-specific correction matrix, which adjusts the image to enhance contrast and recover lost color differences. The predefined correction matrix is applied to the error in RGB space, which is then transformed back into LMS space for final adjustments. The corrected LMS values are added back to the original values, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
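The full Daltonization pipeline can be sketched in NumPy using the matrices above. This is a simplified illustration for protanopia only: the correction is applied to the error directly in RGB and the result clipped to [0, 1], rather than reproducing every intermediate LMS round-trip described above:

```python
import numpy as np

# RGB-to-LMS matrix from the section above (Stockman & Sharpe based).
RGB2LMS = np.array([[0.3904725,  0.54990437, 0.00890159],
                    [0.07092586, 0.96310739, 0.00135809],
                    [0.02314268, 0.12801221, 0.93605194]])
LMS2RGB = np.linalg.inv(RGB2LMS)

# Protanopia: L is replaced by a weighted sum of M and S (Brettel et al. [7]).
SIM_PROTAN = np.array([[0.0, 0.90822864, 0.008192],
                       [0.0, 1.0,        0.0],
                       [0.0, 0.0,        1.0]])

# Daltonize-inspired correction matrix from above.
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize_protan(rgb):
    """Simulate protanopia, take the lost information as an RGB error,
    redistribute it into the visible channels, and add it back.

    rgb : (..., 3) image with values in [0, 1]."""
    lms = rgb @ RGB2LMS.T
    lms_sim = lms @ SIM_PROTAN.T
    rgb_sim = lms_sim @ LMS2RGB.T
    error = rgb - rgb_sim                    # information lost to the viewer
    corrected = rgb + error @ CORRECTION.T   # shift error into G and B
    return np.clip(corrected, 0.0, 1.0)
```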
&lt;br /&gt;
==== Optimization-based method ====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in further sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple to train and integrate the user label easily, they depend on a pre-existing ground-truth recolored image for every training sample.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information while training. They are generally better at generating more natural images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We first gathered an open-source RGB image dataset from [2]. To improve the ability of their proposed model to enhance the contrast between CVD-indistinguishable color pairs, the authors of [2] created a dataset of 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for supervised methods), we randomly sampled 15,000 images and recolored them by simulating random labels for severity and type of CVD. The ground-truth recoloring was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org SciPy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt;256x256&amp;lt;/code&amp;gt; pixels and normalized to the &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
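The label encoding can be illustrated with a small helper (the function name is ours; the encoding follows the scheme above):

```python
import numpy as np

def cvd_label(cvd_type, severity):
    """Encode a user condition as severity * [protan, deutan].

    cvd_type : 'protan' or 'deutan'; severity in [0.1, 1.0]."""
    base = {'protan': np.array([1.0, 0.0]),
            'deutan': np.array([0.0, 1.0])}[cvd_type]
    return severity * base
```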
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to the 3 parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. The outputs of the three networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error (MSE) loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: recolored (model output) image and ground-truth recolored image, respectively&lt;br /&gt;
* p: pixel index&lt;br /&gt;
* N: total number of pixels in the image&lt;br /&gt;
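A minimal NumPy sketch of this pixel-wise loss (the actual training code used PyTorch's equivalent):

```python
import numpy as np

def mse_loss(pred, target):
    """Pixel-wise mean-squared error between the model output and the
    ground-truth recolored image, both of shape (H, W, 3) in [0, 1]."""
    return np.mean((pred - target) ** 2)
```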
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
In a similar fashion of inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) and ground-truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: Feature maps of the l-th layer of a pre-trained VGG network; &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; is the number of elements in that layer&lt;br /&gt;
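The perceptual loss can be sketched generically as below. This is a simplified stand-in: the callables in `feature_layers` represent the VGG layer activations &amp;phi;_l (in practice they would be slices of a pre-trained torchvision VGG); the toy "layers" used here are our own assumptions for illustration.

```python
import numpy as np

def perceptual_loss(img, target, feature_layers):
    """L = sum_l (1/N_l) * ||phi_l(I) - phi_l(I')||^2, where each element
    of feature_layers is a callable standing in for a VGG layer."""
    loss = 0.0
    for phi in feature_layers:
        f_img, f_tgt = phi(img), phi(target)
        loss += np.sum((f_img - f_tgt) ** 2) / f_img.size
    return float(loss)

# Hypothetical stand-in "layers": identity and 2x2 average pooling.
toy_layers = [
    lambda x: x,
    lambda x: x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3)),
]
```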
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \frac{1}{\|\omega_x\|} \sum_{y \in \omega_x} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: Two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: Size of the global (or large) window of the image&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: Size of the local window (neighborhood) around pixel x&lt;br /&gt;
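The contrast term CL(x, y) and its averaging over a set of pixel pairs can be sketched as follows (a minimal NumPy version; the pair set `pairs` stands in for the global window &amp;omega; or a local neighborhood &amp;omega;_x, and how those windows are sampled is an implementation detail not specified here):

```python
import numpy as np

def mean_contrast_loss(sim_orig, sim_recolored, pairs):
    """Average CL(x, y) = ||c'_x - c'_y|| - ||c_x - c_y|| over pixel pairs.

    sim_orig / sim_recolored: (N, 3) arrays of CVD-simulated colors of the
    original and recolored images; pairs: iterable of (x, y) index pairs.
    """
    cl = [
        np.linalg.norm(sim_recolored[x] - sim_recolored[y])
        - np.linalg.norm(sim_orig[x] - sim_orig[y])
        for x, y in pairs
    ]
    return float(np.mean(cl))
```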
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural color distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored images respectively&lt;br /&gt;
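Putting the pieces together, the total training objective can be sketched as below. The SSIM here is a single-window (whole-image) version of the standard formula; library implementations average over sliding windows instead, so treat this as an illustrative approximation. The default weights alpha = 0.25, beta = 1.0 are the values reported later in this write-up.

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Whole-image SSIM for float arrays scaled to [0, 1]."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def naturalness_loss(recolored, original):
    """L_naturalness = 1 - SSIM(I', I)."""
    return 1.0 - ssim_global(recolored, original)

def total_loss(l_naturalness, l_global, l_local, alpha=0.25, beta=1.0):
    """L_total = alpha * L_nat + 2*(1 - alpha) * L_contrast, with
    L_contrast = beta * L_global + (2 - beta) * L_local."""
    l_contrast = beta * l_global + (2 - beta) * l_local
    return alpha * l_naturalness + 2 * (1 - alpha) * l_contrast
```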
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
!   !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | Performance&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | SSIM Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.9985 || 0.9985 || 0.9985 || 0.9985&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.9565 || 0.9986 || 0.9986 || 0.9968&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | TCC Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0004 || 0.0003 || 0.0003 || 0.0003&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.0380 || 0.0003 || 0.0002 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CD ΔE76 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0462 || 0.0462 || 0.0462 || 0.0462&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 8.4251 || 0.0458 || 0.0435 || 0.0578&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE2000 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0681 || 0.0681 || 0.0681 || 0.0681&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 6.9145 || 0.0671 || 0.0630 || 0.0838&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE94 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0461 || 0.0461 || 0.0461 || 0.0461&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 5.3878 || 0.0457 || 0.0434 || 0.0576&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | D-CIELAB ΔEab Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 1.7209 || 1.7209 || 1.7209 || 1.7209&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 1.5926 || 1.9673 || 1.4363 || 2.4009&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the three neural network architectures above: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), Total Color Contrast (TCC), Chromatic Difference (CD), and inference time, adapted from [1] and [2], were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored image shows compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The outputs appeared discretized at the pixel level, suggesting that channel disentanglement was not helpful for this task (especially for naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored versus original images were similar or worse, meaning the model was not performing well on this task.&lt;br /&gt;
* At times it over-saturated colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images, possibly because the naturalness factor was prioritized even though the loss weights favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to score and evaluate metrics for comparison with related work only using the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; please refer to the definitions in those papers, as we use the same. On a high level, the components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a constant weight, not a wavelength; l, a, b are the CIELAB coordinates of the recolored (&#039;) and original images respectively.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
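The per-pixel Chromatic Difference above can be sketched as a vectorized NumPy function (conversion to CIELAB is assumed done upstream; the default lam = 1.0 is a placeholder, since the value used in the project is not stated):

```python
import numpy as np

def chromatic_difference(lab_orig, lab_recolored, lam=1.0):
    """CD(i) = sqrt(lam*(l'-l)^2 + (a'-a)^2 + (b'-b)^2) per pixel.

    lab_orig / lab_recolored: (N, 3) arrays of CIELAB coordinates of the
    original and recolored images; returns an (N,) array of distances.
    """
    d = lab_recolored.astype(np.float64) - lab_orig.astype(np.float64)
    return np.sqrt(lam * d[:, 0] ** 2 + d[:, 1] ** 2 + d[:, 2] ** 2)
```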
&lt;br /&gt;
The key results are in Table 2 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2], taking the larger of the protan/deutan values.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but fall short of paper [2], which optimizes a separate network for each CVD type.&lt;br /&gt;
* Outputs are blurry (SSIM is not optimized strongly enough).&lt;br /&gt;
* Mixing CVD types in a single network requires more sophisticated conditioning.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned a couple of things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate, making them feasible for real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Loss functions borrowed from the Swin-transformer-based approach of [2] added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. JOSA A, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 220px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400 px|Wikipedia encyclopedia]][[File:eb_2.png|400 px]] &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 800px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|300 px|center|thumb|Losses - Conditional Autoencoder]][[File:loss_unet.png|300 px|thumb|center|Losses - Conditional U-Net]][[File:loss_mlp.png|300 px|center|thumb|Losses - Conditional MLP]]&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60712</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60712"/>
		<updated>2024-12-13T09:19:39Z</updated>

		<summary type="html">&lt;p&gt;Rainas: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix because the error contains the information that dichromats cannot see, and the correction matrix rotates it to a part of the spectrum that they can see. For example, the correction matrix, as implemented in tools like Daltonize [5] and Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected RGB values are added back to the original LMS values to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
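The daltonization step can be sketched as below. This is a simplified version that redistributes the error directly in RGB space (the RGB&lt;-&gt;LMS conversions and the CVD simulation producing `rgb_simulated` are omitted); the correction matrix is the one quoted above from Daltonize [5] / Vischeck [6].

```python
import numpy as np

# Correction matrix from the text: row i says where error channel j is
# redistributed (red error is pushed into the green and blue channels).
CORRECTION = np.array([
    [0.0, 0.0, 0.0],
    [0.7, 1.0, 0.0],
    [0.7, 0.0, 1.0],
])

def daltonize(rgb, rgb_simulated):
    """Add the rotated 'invisible' error back to the original image.

    rgb, rgb_simulated: (..., 3) float arrays in [0, 1]; rgb_simulated is
    the CVD simulation of rgb (computed elsewhere).
    """
    error = rgb - rgb_simulated          # information the dichromat misses
    corrected = error @ CORRECTION.T     # rotate into visible channels
    return np.clip(rgb + corrected, 0.0, 1.0)
```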
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
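The two energy terms above can be evaluated directly for a small set of pixel colors (a NumPy sketch of the objective of Zhu et al. [8]; the default beta = 0.5 is a placeholder for the user-chosen trade-off weight, and a real implementation would minimize this energy rather than just evaluate it):

```python
import numpy as np

def recoloring_energy(c, c_plus, beta=0.5):
    """E = beta * E_nat + E_cont for (N, 3) color arrays.

    E_nat  = sum_i ||c+_i - c_i||^2
    E_cont = sum_{i != j} ||(c+_i - c+_j) - (c_i - c_j)||^2
    """
    e_nat = np.sum((c_plus - c) ** 2)
    # Pairwise color differences, shape (N, N, 3); i == j terms are zero.
    diff_orig = c[:, None, :] - c[None, :, :]
    diff_new = c_plus[:, None, :] - c_plus[None, :, :]
    e_cont = np.sum((diff_new - diff_orig) ** 2)
    return float(beta * e_nat + e_cont)
```

Note that a uniform shift of all colors leaves E_cont at zero and is penalized only through E_nat, which is exactly the naturalness/contrast trade-off the text describes.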
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The method begins with fuzzy clustering to extract representative colors (key colors) from the input image. These colors are mapped onto the CIE 1931 chromaticity diagram, where confusion lines represent loci of colors perceived as identical by individuals with CVD. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
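The point-to-line distance above is a standard perpendicular distance in the chromaticity plane and can be computed directly (function name ours):

```python
import math

def confusion_line_distance(v, copunctal, ref):
    """d(v, L): perpendicular distance from chromaticity v = (x_v, y_v)
    to the confusion line L through the copunctal point (x_cp, y_cp)
    and a reference point (x_0, y_0), per the formula above."""
    x_v, y_v = v
    x_cp, y_cp = copunctal
    x_0, y_0 = ref
    num = abs((x_cp - x_0) * (y_0 - y_v) - (x_0 - x_v) * (y_cp - y_0))
    den = math.hypot(x_cp - x_0, y_cp - y_0)
    return num / den
```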
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. The translation process involves:&lt;br /&gt;
&lt;br /&gt;
1. Ranking key colors by their cluster sizes:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{rank}(v_i) = \frac{|A_i|}{\sum_{j=1}^{n_A}|A_j|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;|A_i|&amp;lt;/math&amp;gt; is the cardinality (number of pixels) of its associated cluster,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; is the total number of clusters.&lt;br /&gt;
&lt;br /&gt;
2. Translating the highest-ranked confusing color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; to its projection on the nearest non-occupied confusion line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
v^*_{\text{tr}} = \text{proj}(v^*, L^*),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*_{\text{tr}}&amp;lt;/math&amp;gt; is the new position of the color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; after translation,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L^*&amp;lt;/math&amp;gt; is the nearest non-occupied confusion line, determined as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(v^*, L^*) = \min_{L \in \text{CL}_D} d(v^*, L).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Updating the sets of confusing colors and non-occupied confusion lines iteratively:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Phi_V = \Phi_V - \{v^*\}, \quad \text{CL}_D = \text{CL}_D - \{L^*\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Phi_V&amp;lt;/math&amp;gt; is the set of confusing colors,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{CL}_D&amp;lt;/math&amp;gt; is the set of non-occupied confusion lines.&lt;br /&gt;
&lt;br /&gt;
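A minimal sketch of this greedy translation loop (our own paraphrase, with colors as chromaticity tuples and each confusion line represented by a reference point paired with the copunctal point) might look as follows:&lt;br /&gt;

```python
from math import hypot

def line_distance(v, cp, ref):
    """Perpendicular distance from point v to the line through cp and ref."""
    num = abs((cp[0] - ref[0]) * (ref[1] - v[1]) - (ref[0] - v[0]) * (cp[1] - ref[1]))
    return num / hypot(cp[0] - ref[0], cp[1] - ref[1])

def project_onto_line(v, p, q):
    """Orthogonal projection of point v onto the line through p and q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    t = ((v[0] - p[0]) * dx + (v[1] - p[1]) * dy) / (dx * dx + dy * dy)
    return (p[0] + t * dx, p[1] + t * dy)

def shift_confusing_colors(cluster_sizes, free_refs, copunctal):
    """cluster_sizes: {color: |A_i|}; free_refs: reference points, each defining
    a non-occupied confusion line through the copunctal point."""
    confusing = list(cluster_sizes)
    free = list(free_refs)
    moved = {}
    while confusing and free:
        # 1. highest-ranked confusing color v* (largest cluster)
        v_star = max(confusing, key=lambda v: cluster_sizes[v])
        # 2. nearest non-occupied confusion line L*, then project v* onto it
        ref = min(free, key=lambda r: line_distance(v_star, copunctal, r))
        moved[v_star] = project_onto_line(v_star, copunctal, ref)
        # 3. update Phi_V and CL_D
        confusing.remove(v_star)
        free.remove(ref)
    return moved
```

Each shifted color ends up exactly on its assigned free confusion line, so no two moved colors share a line afterward.&lt;br /&gt;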
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
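To make the objective concrete, a toy NumPy version of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is sketched below. Note the caveats: &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a stand-in for a real dichromat simulation, and the double sum in &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt; is read as running over pairs of key colors within cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;, following the definitions given with it.&lt;br /&gt;

```python
import numpy as np

def total_energy(a, a_rec, b, f_D, lam=0.5):
    """a, a_rec: (n_A, 2) original/recolored key colors of cluster A;
    b: (n_B, 2) key colors of cluster B; f_D: dichromat simulation (stand-in);
    lam: trade-off weight between contrast and naturalness."""
    n_A, n_B = len(a), len(b)
    # E1: contrast between clusters, original vs. simulated recolored
    E1 = np.mean([abs(np.linalg.norm(a[i] - b[j])
                      - np.linalg.norm(f_D(a_rec[i]) - f_D(b[j])))
                  for i in range(n_A) for j in range(n_B)])
    # E2: contrast among cluster-A colors as seen by a dichromat
    E2 = np.mean([abs(np.linalg.norm(a[i] - a[j])
                      - np.linalg.norm(f_D(a_rec[i]) - f_D(a_rec[j])))
                  for i in range(n_A) for j in range(n_A)])
    # E3: naturalness, distance between original and recolored colors
    E3 = np.mean([np.linalg.norm(a[i] - a_rec[i]) for i in range(n_A)])
    return (E1 + E2) + lam * E3
```

With the identity used as f_D and the recolored colors equal to the originals, all three terms vanish, which is a quick sanity check.&lt;br /&gt;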
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIE L*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
In more detail, the parameters of the GMM are initialized using the K-means algorithm and refined via the Expectation-Maximization (EM) algorithm, which consists of an E-step and an M-step:&lt;br /&gt;
&lt;br /&gt;
The E-step calculates the probability of each color (or pixel) belonging to a specific Gaussian component in the GMM. This probability, also known as the &amp;quot;responsibility,&amp;quot; reflects how much each Gaussian contributes to the representation of a color:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(i|x_j, \Theta^{\text{old}}) = \frac{\omega_i G_i(x_j|\theta_i)}{\sum_{k=1}^K \omega_k G_k(x_j|\theta_k)}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p(i|x_j, \Theta^{\text{old}})&amp;lt;/math&amp;gt; is the probability of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt; belonging to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian component,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian, representing its overall contribution to the color distribution,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x_j|\theta_i)&amp;lt;/math&amp;gt; is the Gaussian distribution for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th component, evaluated at &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt;, where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix),&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\sum_{k=1}^K \omega_k G_k(x_j|\theta_k)&amp;lt;/math&amp;gt; normalizes the probabilities by considering the contributions of all &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussians to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th pixel.&lt;br /&gt;
&lt;br /&gt;
This step essentially assigns each pixel a &amp;quot;soft&amp;quot; membership to each Gaussian component, rather than forcing a hard clustering decision. Pixels that are close to a Gaussian&#039;s mean (in feature space) will have higher probabilities of belonging to that Gaussian.&lt;br /&gt;
&lt;br /&gt;
The M-step updates the parameters of each Gaussian component based on the probabilities computed in the E-step. These updates refine the Gaussian model to better fit the data:&lt;br /&gt;
&lt;br /&gt;
1. Update the mixing weights:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\omega_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})}{N},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation calculates the proportion of pixels assigned to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It reflects how dominant each Gaussian is in representing the color distribution.&lt;br /&gt;
&lt;br /&gt;
2. Update the means:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mu_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}}) x_j}{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation computes the new mean vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mu_i^{\text{new}}&amp;lt;/math&amp;gt; for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It is a weighted average of all pixel feature vectors &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt;, where the weights are the probabilities &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p(i|x_j, \Theta^{\text{old}})&amp;lt;/math&amp;gt;. Pixels with higher probabilities contribute more to the new mean.&lt;br /&gt;
&lt;br /&gt;
3. Update the covariance matrices:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Sigma_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}}) (x_j - \mu_i^{\text{new}})(x_j - \mu_i^{\text{new}})^T}{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation calculates the new covariance matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Sigma_i^{\text{new}}&amp;lt;/math&amp;gt; for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It measures the spread of pixel features around the new mean, weighted by the probabilities from the E-step.&lt;br /&gt;
&lt;br /&gt;
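The E-step and M-step above can be sketched compactly with NumPy. This is our illustrative implementation of one EM iteration, not the authors' code; in practice, diagonal covariances and K-means initialization are used, as described.&lt;br /&gt;

```python
import numpy as np

def em_step(X, w, mu, cov):
    """One EM iteration for a GMM with full covariances.
    X: (N, 3) CIELAB features; w: (K,); mu: (K, 3); cov: (K, 3, 3)."""
    N, K = len(X), len(w)
    # E-step: responsibilities p(i | x_j) for each pixel and Gaussian
    resp = np.empty((N, K))
    for i in range(K):
        diff = X - mu[i]
        inv = np.linalg.inv(cov[i])
        mah = np.einsum('nd,de,ne->n', diff, inv, diff)  # Mahalanobis distances
        norm = ((2 * np.pi) ** 3 * np.linalg.det(cov[i])) ** -0.5
        resp[:, i] = w[i] * norm * np.exp(-0.5 * mah)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: update weights, means, and covariances
    Nk = resp.sum(axis=0)
    w_new = Nk / N
    mu_new = (resp.T @ X) / Nk[:, None]
    cov_new = np.empty_like(cov)
    for i in range(K):
        diff = X - mu_new[i]
        cov_new[i] = (resp[:, i, None] * diff).T @ diff / Nk[i]
    return w_new, mu_new, cov_new
```

Iterating this step until the log-likelihood stops improving yields the fitted mixture.&lt;br /&gt;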
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
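For reference, the KL divergence between two multivariate Gaussians has a closed form, so &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{sKL}&amp;lt;/math&amp;gt; can be computed directly (a standard-formula sketch, not the paper's code):&lt;br /&gt;

```python
import numpy as np

def kl_gauss(mu1, cov1, mu2, cov2):
    """Closed-form KL divergence D_KL(G1 || G2) between two Gaussians."""
    d = len(mu1)
    inv2 = np.linalg.inv(cov2)
    diff = mu2 - mu1
    return 0.5 * (np.trace(inv2 @ cov1) + diff @ inv2 @ diff - d
                  + np.log(np.linalg.det(cov2) / np.linalg.det(cov1)))

def d_skl(mu1, cov1, mu2, cov2):
    """Symmetric KL divergence D_sKL(G1, G2) = D_KL(G1||G2) + D_KL(G2||G1)."""
    return kl_gauss(mu1, cov1, mu2, cov2) + kl_gauss(mu2, cov2, mu1, cov1)
```

By construction the symmetric divergence is zero for identical Gaussians and invariant under swapping its arguments.&lt;br /&gt;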
The optimization aims to preserve the contrast perceived by CVD viewers while maintaining naturalness. Weights are assigned to Gaussian components based on the perceptual importance of colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\lambda_i = \frac{\sum_{j=1}^N \alpha_j p(i|x_j, \Theta)}{\sum_{k=1}^K \sum_{j=1}^N \alpha_j p(k|x_j, \Theta)},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha_j = \|x_j - \text{Sim}(x_j)\|&amp;lt;/math&amp;gt; is the perceptual error of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature when simulated for CVD,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{Sim}(\cdot)&amp;lt;/math&amp;gt; is the simulation function for CVD perception.&lt;br /&gt;
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
&lt;br /&gt;
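The interpolation step amounts to a responsibility-weighted shift of each pixel's hue, which can be sketched as follows (variable names are ours):&lt;br /&gt;

```python
import numpy as np

def interpolate_hue(x_H, resp, mu_H, mapped_mu_H):
    """x_H: (N,) pixel hues; resp: (N, K) responsibilities p(i|x_j);
    mu_H, mapped_mu_H: (K,) original and relocated Gaussian mean hues.
    Each pixel inherits a blend of its clusters' hue shifts."""
    return x_H + resp @ (mapped_mu_H - mu_H)
```

Because the responsibilities of each pixel sum to one, a uniform relocation of all means shifts every pixel by the same amount, and no relocation leaves the image unchanged, giving smooth transitions between regions.&lt;br /&gt;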
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions. Sometimes the colors in the recolored images are not very natural.&lt;br /&gt;
* The high computational complexity in the optimization step of this algorithm may be difficult for real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other GAN-based approaches struggle to balance naturalness and contrast. This particular method also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, the architecture remains computationally intensive and does not inherently account for the specific needs of individuals with CVD, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations using a unique triple-latent structure in their method, continuous personalization was possible to adjust images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, it took around 18 days for one epoch (or one iteration over the entire dataset).&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD with the personalization of different severity levels. We start by exploring existing methods and identifying opportunities for improvement. Since mathematical-based approaches provide a solid foundation and are well-documented, we began our experiments by testing these methods, as described in the background. We later extended our exploration to deep learning based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a baseline ====&lt;br /&gt;
We started with the relatively intuitive Daltonization method, where we adjusted the colors in an image to compensate for color vision deficiencies by simulating how the colors appear to individuals with CVD. This involves computing the difference between the original and simulated color perception in the LMS (Long, Medium, Short wavelength) color space. The calculated error is then corrected and mapped back to the RGB space using a transformation matrix, resulting in a recolored image that enhances color differentiation for viewers with CVD.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, we first transformed colors in RGB color space into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
&lt;br /&gt;
The error between the original and simulated images is then mapped into the RGB color space using a deficiency-specific correction matrix, which adjusts the image to enhance contrast and recover lost color differences. The predefined correction matrix is applied to the error in RGB space, transforming it back into LMS space for final adjustments. The corrected LMS values are added back to the original values, producing a recolored image that improves visual accessibility for viewers with CVD. This approach uses the Daltonize-inspired correction matrix:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
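Putting the pieces together, here is a minimal NumPy sketch of this Daltonization pipeline for protanopia, using the transformation and correction matrices above (array names and the final clipping to [0, 1] are ours; since the transforms are linear, adding the compensation in RGB is equivalent to the LMS round-trip described):&lt;br /&gt;

```python
import numpy as np

# RGB -> LMS transformation from Stockman and Sharpe's cone fundamentals
T = np.array([[0.3904725, 0.54990437, 0.00890159],
              [0.07092586, 0.96310739, 0.00135809],
              [0.02314268, 0.12801221, 0.93605194]])
T_inv = np.linalg.inv(T)

# Protanopia: replace the L response with a combination of M and S
SIM_PROTAN = np.array([[0.0, 0.90822864, 0.008192],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])

# Daltonize-inspired correction matrix from the text
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize_protan(rgb):
    """rgb: (..., 3) array in [0, 1]. Returns a recolored image (sketch)."""
    lms = rgb @ T.T                        # RGB -> LMS
    lms_sim = lms @ SIM_PROTAN.T           # simulate the missing L cone
    rgb_sim = lms_sim @ T_inv.T            # back to RGB
    error = rgb - rgb_sim                  # information lost to the viewer
    compensation = error @ CORRECTION.T    # redistribute error to visible channels
    return np.clip(rgb + compensation, 0.0, 1.0)
```

Running this on a pure red pixel shifts part of the lost red signal into the green and blue channels, which is exactly the intended compensation.&lt;br /&gt;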
==== Optimization-based method ====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in later sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and integrate the user label, they require a pre-existing ground-truth example of the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information during training. They are generally better at generating natural-looking images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from an open-source RGB image dataset from [2]: to improve their model&#039;s ability to enhance contrast between CVD-indistinguishable color pairs, the authors created a dataset of 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 images and recolored them by simulating random labels for severity and type of CVD. The ground-truth recoloring was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org Scipy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
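The label encoding described above can be generated with a small helper (the function name is ours):&lt;br /&gt;

```python
def cvd_label(cvd_type, severity):
    """Encode a CVD condition as severity * [protan, deutan]."""
    base = {"protan": [1.0, 0.0], "deutan": [0.0, 1.0]}[cvd_type]
    return [severity * v for v in base]

label = cvd_label("protan", 0.6)  # protanopia at 60% severity -> [0.6, 0.0]
```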
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to 3 parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. The outputs of the three networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground-truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
&lt;br /&gt;
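In code, this loss reduces to a single NumPy expression (shown here outside PyTorch for clarity):&lt;br /&gt;

```python
import numpy as np

def mse_loss(I, I_prime):
    """Mean-squared error over all pixels of the output and ground-truth images."""
    I, I_prime = np.asarray(I, dtype=float), np.asarray(I_prime, dtype=float)
    return np.mean((I - I_prime) ** 2)
```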
===== Conditional U-Net =====&lt;br /&gt;
Using the same inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: Recolored (model output) and ground-truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt; is the l-th feature layer of the pre-trained VGG network, and &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; is the number of elements in that layer&lt;br /&gt;
&lt;br /&gt;
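The perceptual loss compares feature maps rather than raw pixels. As an illustration only, the sketch below substitutes a toy multi-scale average-pooling for the real pre-trained VGG features (which would be used in practice):&lt;br /&gt;

```python
import numpy as np

def block_pool(x, k):
    """Average-pool an (H, W, C) image with kernel/stride k (toy feature map)."""
    H, W, C = x.shape
    x = x[:H - H % k, :W - W % k]
    return x.reshape(H // k, k, W // k, k, C).mean(axis=(1, 3))

def perceptual_loss(I, I_prime, scales=(1, 2, 4)):
    """Sum over 'layers' l of (1/N_l) * ||phi_l(I) - phi_l(I')||_2^2,
    with phi_l approximated here by pooling at several scales."""
    total = 0.0
    for k in scales:
        f, g = block_pool(I, k), block_pool(I_prime, k)
        total += np.sum((f - g) ** 2) / f.size
    return total
```

Swapping the pooling stand-in for intermediate activations of a pre-trained VGG recovers the loss used to train the U-Net.&lt;br /&gt;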
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the choice of loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y\rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: size of the global (whole-image) window&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: size of the local window (neighborhood) around pixel x&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the input and close to natural color distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: the original and recolored images, respectively&lt;br /&gt;
&lt;br /&gt;
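The loss terms above can be sketched in plain NumPy (an illustrative, non-batched version: the actual training operates on tensors, and the single-window SSIM here is a simplification of the usual sliding-window SSIM):&lt;br /&gt;

```python
import numpy as np

def contrast_loss(sim_orig, sim_rec, pairs):
    # CL(x, y) = ||c^_x' - c^_y'|| - ||c_x - c_y||, averaged over pixel pairs
    # sim_orig / sim_rec: (N, 3) CVD-simulated colors of original / recolored image
    total = sum(
        np.linalg.norm(sim_rec[x] - sim_rec[y]) - np.linalg.norm(sim_orig[x] - sim_orig[y])
        for x, y in pairs
    )
    return total / len(pairs)

def naturalness_loss(orig, rec):
    # 1 - SSIM, computed once from global image statistics (simplified)
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    mu_x, mu_y = orig.mean(), rec.mean()
    cov = ((orig - mu_x) * (rec - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (orig.var() + rec.var() + c2))
    return 1.0 - ssim

def total_loss(orig, rec, sim_orig, sim_rec, global_pairs, local_pairs,
               alpha=0.25, beta=1.0):
    # L_total = alpha * L_naturalness + 2 * (1 - alpha) * L_contrast,
    # L_contrast = beta * L_global + (2 - beta) * L_local
    l_contrast = (beta * contrast_loss(sim_orig, sim_rec, global_pairs)
                  + (2 - beta) * contrast_loss(sim_orig, sim_rec, local_pairs))
    return alpha * naturalness_loss(orig, rec) + 2 * (1 - alpha) * l_contrast
```

The default weights match the values reported later for our experiments (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;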
== Results == &lt;br /&gt;
=== Mathematical based methods ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1: Quantitative Evaluation Results for Mathematical Methods&lt;br /&gt;
!   !! Method 1 !! Method 2 !! Method 3 !! Method 4&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | Performance&lt;br /&gt;
|-&lt;br /&gt;
| Time/image || 0.2s || 1m13s || 4.4s || 1.6s&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | SSIM Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.0066 || 0.9998 || 0.9988 || 0.9902&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.9985 || 0.9985 || 0.9985 || 0.9985&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.9565 || 0.9986 || 0.9986 || 0.9968&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | TCC Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 0.4211 || 0.0001 || 0.0003 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0004 || 0.0003 || 0.0003 || 0.0003&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 0.0380 || 0.0003 || 0.0002 || 0.0005&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CD ΔE76 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.4513 || 0.0217 || 0.0632 || 0.1057&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0462 || 0.0462 || 0.0462 || 0.0462&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 8.4251 || 0.0458 || 0.0435 || 0.0578&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE2000 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 41.2667 || 0.0229 || 0.0675 || 0.1312&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0681 || 0.0681 || 0.0681 || 0.0681&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 6.9145 || 0.0671 || 0.0630 || 0.0838&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | CIEDE94 Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 57.3637 || 0.0217 || 0.0630 || 0.1056&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 0.0461 || 0.0461 || 0.0461 || 0.0461&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 5.3878 || 0.0457 || 0.0434 || 0.0576&lt;br /&gt;
|-&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | D-CIELAB ΔEab Metrics&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Recolored || 2.1314 || 3.8863 || 7.6867 || 8.0045&lt;br /&gt;
|-&lt;br /&gt;
| Original vs Original Simulated || 1.7209 || 1.7209 || 1.7209 || 1.7209&lt;br /&gt;
|-&lt;br /&gt;
| Recolored vs Recolored Simulated || 1.5926 || 1.9673 || 1.4363 || 2.4009&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures—Conditional Parallel RGB MLP, Conditional U-Net, and Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), Total Color Contrast (TCC), and Chromatic Difference (CD), adapted from [1] and [2], along with inference time, were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored images show compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than those of the originals, indicating that the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness in the recolored images was seen (possibly due to naturalness factor being more prioritized even though weight coefficients in the loss term favored contrast (alpha = 0.25, beta = 1.0)).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to compute and compare evaluation metrics against related work using only the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; please refer to the definitions in those papers, as we use the same ones. At a high level, the components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a constant, not a wavelength; l, a, b are the CIELAB coordinates of the recolored (&#039;) and original images, respectively.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
&lt;br /&gt;
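Following the CD formula above, the metric (and a per-image aggregate) can be sketched in NumPy; `lam = 0.2` is an illustrative lightness weight, not necessarily the constant used in [1] or [2]:&lt;br /&gt;

```python
import numpy as np

def chromatic_difference(lab_orig, lab_rec, lam=0.2):
    # CD(i) = sqrt(lam * (l' - l)^2 + (a' - a)^2 + (b' - b)^2), per pixel;
    # the last axis holds CIELAB (l, a, b) coordinates
    diff = lab_rec - lab_orig
    return np.sqrt(lam * diff[..., 0] ** 2 + diff[..., 1] ** 2 + diff[..., 2] ** 2)

def mean_cd(lab_orig, lab_rec, lam=0.2):
    # image-level score: average of the per-pixel differences
    return float(chromatic_difference(lab_orig, lab_rec, lam).mean())
```

&lt;br /&gt;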
The key results are in Table 2 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|+ &#039;&#039;&#039;Table 2: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2] for protan/deutan whichever is larger.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but not as good as in paper [2], because they optimize separate networks for each CVD type.&lt;br /&gt;
* Outputs are blurry (SSIM is not optimized for strongly enough).&lt;br /&gt;
* Mixing CVD types in the same network needs a more sophisticated approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate, making them feasible for real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from Swin architecture added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. Josa a, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 220px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400 px|Wikipedia encyclopedia]][[File:eb_2.png|400 px]] &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 800px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|300 px|center|thumb|Losses - Conditional Autoencoder]][[File:loss_unet.png|300 px|thumb|center|Losses - Conditional U-Net]][[File:loss_mlp.png|300 px|center|thumb|Losses - Conditional MLP]]&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60566</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60566"/>
		<updated>2024-12-13T06:33:58Z</updated>

		<summary type="html">&lt;p&gt;Rainas: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix. For example, the correction matrix for protanopia, as implemented in tools like Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix for Protanopia} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected error values are then added back to the original RGB image to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
&lt;br /&gt;
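A minimal NumPy sketch of this correction step, assuming a `simulated` image produced by the CVD simulation described below; for simplicity the error is computed and redistributed directly in RGB, as in the daltonize package [5]:&lt;br /&gt;

```python
import numpy as np

# Protanopia correction matrix from the text: lost red information is
# redistributed into the green and blue channels.
CORRECTION_PROTAN = np.array([[0.0, 0.0, 0.0],
                              [0.7, 1.0, 0.0],
                              [0.7, 0.0, 1.0]])

def daltonize(rgb, simulated, correction=CORRECTION_PROTAN):
    # error = what the CVD viewer loses relative to the original
    err = rgb - simulated
    # map the error through the correction matrix and add it back
    shift = err @ correction.T
    return np.clip(rgb + shift, 0.0, 1.0)
```

&lt;br /&gt;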
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, colors are first transformed into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
&lt;br /&gt;
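These transformations can be collected into a short NumPy sketch (protanopia case; the inverse matrix maps LMS back to RGB, and outputs are clipped to the displayable range):&lt;br /&gt;

```python
import numpy as np

# RGB -> LMS transform from the text (Stockman & Sharpe based)
T_RGB_TO_LMS = np.array([
    [0.3904725,  0.54990437, 0.00890159],
    [0.07092586, 0.96310739, 0.00135809],
    [0.02314268, 0.12801221, 0.93605194],
])
T_LMS_TO_RGB = np.linalg.inv(T_RGB_TO_LMS)

def simulate_protanopia(rgb):
    # replace the missing L response with the weighted M/S combination above,
    # then map back to RGB
    lms = rgb @ T_RGB_TO_LMS.T
    lms[..., 0] = 0.90822864 * lms[..., 1] + 0.008192 * lms[..., 2]
    return np.clip(lms @ T_LMS_TO_RGB.T, 0.0, 1.0)
```

The deuteranopia and tritanopia cases are identical except that the M or S channel is replaced using the corresponding coefficients above.&lt;br /&gt;
&lt;br /&gt;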
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
&lt;br /&gt;
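The two energy terms can be written directly from these definitions (an O(N&amp;sup2;)-style reference sketch over a small set of key colors; the paper&#039;s optimization would not enumerate all pixel pairs):&lt;br /&gt;

```python
import numpy as np

def naturalness_energy(c_orig, c_rec):
    # E_nat = sum_i ||c_i^+ - c_i||^2
    return float(np.sum((c_rec - c_orig) ** 2))

def contrast_energy(c_orig, c_rec):
    # E_cont = sum_{i != j} ||(c_i^+ - c_j^+) - (c_i - c_j)||^2
    n = len(c_orig)
    e = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                d = (c_rec[i] - c_rec[j]) - (c_orig[i] - c_orig[j])
                e += float(np.dot(d, d))
    return e

def total_energy(c_orig, c_rec, beta=0.5):
    # E = beta * E_nat + E_cont; beta trades naturalness against contrast
    return beta * naturalness_energy(c_orig, c_rec) + contrast_energy(c_orig, c_rec)
```

&lt;br /&gt;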
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The method begins with fuzzy clustering to extract representative colors (key colors) from the input image. These colors are mapped onto the CIE 1931 chromaticity diagram, where confusion lines represent loci of colors perceived as identical by individuals with CVD. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
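The point-to-line distance is ordinary analytic geometry and can be checked with a few lines of Python (the copunctal and reference points passed in are placeholders, not actual chromaticity values):&lt;br /&gt;

```python
import numpy as np

def distance_to_confusion_line(v, copunctal, ref):
    # perpendicular distance from chromaticity v = (x_v, y_v) to the line
    # through the copunctal point (x_cp, y_cp) and reference point (x_0, y_0)
    (xv, yv), (xcp, ycp), (x0, y0) = v, copunctal, ref
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    den = np.hypot(xcp - x0, ycp - y0)
    return num / den
```

&lt;br /&gt;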
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. The translation process involves:&lt;br /&gt;
&lt;br /&gt;
1. Ranking key colors by their cluster sizes:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{rank}(v_i) = \frac{|A_i|}{\sum_{j=1}^{n_A}|A_j|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;|A_i|&amp;lt;/math&amp;gt; is the cardinality (number of pixels) of its associated cluster,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; is the total number of clusters.&lt;br /&gt;
&lt;br /&gt;
2. Translating the highest-ranked confusing color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; to its projection on the nearest non-occupied confusion line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
v^*_{\text{tr}} = \text{proj}(v^*, L^*),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*_{\text{tr}}&amp;lt;/math&amp;gt; is the new position of the color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; after translation,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L^*&amp;lt;/math&amp;gt; is the nearest non-occupied confusion line, determined as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(v^*, L^*) = \min_{L \in \text{CL}_D} d(v^*, L).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Updating the sets of confusing colors and non-occupied confusion lines iteratively:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Phi_V = \Phi_V - \{v^*\}, \quad \text{CL}_D = \text{CL}_D - \{L^*\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Phi_V&amp;lt;/math&amp;gt; is the set of confusing colors,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{CL}_D&amp;lt;/math&amp;gt; is the set of non-occupied confusion lines.&lt;br /&gt;
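A minimal sketch of this ranking-and-translation loop, assuming each confusion line is specified by a second point alongside the copunctal point (function names ours, not the authors' implementation):&lt;br /&gt;

```python
import numpy as np

def project_onto_line(v, cp, p0):
    """Orthogonal projection of chromaticity v onto the confusion line
    through the copunctal point cp and a second point p0."""
    d = np.asarray(p0, float) - np.asarray(cp, float)
    t = np.dot(np.asarray(v, float) - cp, d) / np.dot(d, d)
    return np.asarray(cp, float) + t * d

def translate_confusing_colors(colors, sizes, free_lines, cp):
    """Greedy sketch: move confusing key colors (largest cluster first)
    onto their nearest non-occupied confusion line, removing both the
    color and the line from their sets at each step."""
    order = np.argsort(sizes)[::-1]   # rank by cluster cardinality |A_i|
    free = list(free_lines)           # each line given by a point p0 (plus cp)
    translated = {}
    for i in order:
        if not free:
            break
        # perpendicular distance = distance to the orthogonal projection
        dists = [np.linalg.norm(project_onto_line(colors[i], cp, p0) - colors[i])
                 for p0 in free]
        k = int(np.argmin(dists))
        translated[i] = project_onto_line(colors[i], cp, free.pop(k))
    return translated
```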
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
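The three terms combine into a single objective that can be evaluated for any candidate recoloring of the key colors. A NumPy sketch, with a caller-supplied simulation function f_D and the within-cluster term averaged over all pairs in A (an assumption on our part):&lt;br /&gt;

```python
import numpy as np

def recoloring_objective(a, a_rec, b, f_D, lam=0.5):
    """E = (E1 + E2) + lam * E3 for key colors a (cluster A, original),
    a_rec (cluster A, recolored) and b (cluster B), following the three
    terms defined above. f_D simulates dichromatic perception row-wise."""
    fa, fb = f_D(a_rec), f_D(b)
    # E1: contrast between clusters A and B, original vs. simulated recolored
    E1 = np.abs(np.linalg.norm(a[:, None] - b[None], axis=-1)
                - np.linalg.norm(fa[:, None] - fb[None], axis=-1)).mean()
    # E2: contrast within cluster A, original vs. simulated recolored
    E2 = np.abs(np.linalg.norm(a[:, None] - a[None], axis=-1)
                - np.linalg.norm(fa[:, None] - fa[None], axis=-1)).mean()
    # E3: naturalness -- how far each key color moved from its original
    E3 = np.linalg.norm(a - a_rec, axis=-1).mean()
    return (E1 + E2) + lam * E3
```

With the identity as simulation function and no recoloring, all three terms vanish, which is a quick sanity check on the implementation.&lt;br /&gt;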
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIEL*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
Diving into more details, the parameters of the GMM are initialized using the K-means algorithm and refined via the Expectation-Maximization (EM) algorithm, which consists of the E-step and the M-step:&lt;br /&gt;
&lt;br /&gt;
The E-step calculates the probability of each color (or pixel) belonging to a specific Gaussian component in the GMM. This probability, also known as the &amp;quot;responsibility,&amp;quot; reflects how much each Gaussian contributes to the representation of a color:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(i|x_j, \Theta^{\text{old}}) = \frac{\omega_i G_i(x_j|\theta_i)}{\sum_{k=1}^K \omega_k G_k(x_j|\theta_k)}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p(i|x_j, \Theta^{\text{old}})&amp;lt;/math&amp;gt; is the probability of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt; belonging to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian component,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian, representing its overall contribution to the color distribution,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x_j|\theta_i)&amp;lt;/math&amp;gt; is the Gaussian distribution for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th component, evaluated at &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt;, where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix),&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\sum_{k=1}^K \omega_k G_k(x_j|\theta_k)&amp;lt;/math&amp;gt; normalizes the probabilities by considering the contributions of all &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussians to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th pixel.&lt;br /&gt;
&lt;br /&gt;
This step essentially assigns each pixel a &amp;quot;soft&amp;quot; membership to each Gaussian component, rather than forcing a hard clustering decision. Pixels that are close to a Gaussian&#039;s mean (in feature space) will have higher probabilities of belonging to that Gaussian.&lt;br /&gt;
&lt;br /&gt;
The M-step updates the parameters of each Gaussian component based on the probabilities computed in the E-step. These updates refine the Gaussian model to better fit the data:&lt;br /&gt;
&lt;br /&gt;
1. Update the mixing weights:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\omega_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})}{N},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation calculates the proportion of pixels assigned to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It reflects how dominant each Gaussian is in representing the color distribution.&lt;br /&gt;
&lt;br /&gt;
2. Update the means:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mu_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}}) x_j}{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation computes the new mean vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mu_i^{\text{new}}&amp;lt;/math&amp;gt; for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It is a weighted average of all pixel feature vectors &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt;, where the weights are the probabilities &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p(i|x_j, \Theta^{\text{old}})&amp;lt;/math&amp;gt;. Pixels with higher probabilities contribute more to the new mean.&lt;br /&gt;
&lt;br /&gt;
3. Update the covariance matrices:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Sigma_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}}) (x_j - \mu_i^{\text{new}})(x_j - \mu_i^{\text{new}})^T}{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation calculates the new covariance matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Sigma_i^{\text{new}}&amp;lt;/math&amp;gt; for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It measures the spread of pixel features around the new mean, weighted by the probabilities from the E-step.&lt;br /&gt;
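The E- and M-step equations translate almost line for line into NumPy. The sketch below performs one EM iteration with full covariance matrices (the paper assumes diagonal ones for speed); function names are ours.&lt;br /&gt;

```python
import numpy as np

def gaussian_pdf(X, mu, Sigma):
    """Multivariate normal density G(x | mu, Sigma), evaluated row-wise."""
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    expo = -0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff)
    norm = np.sqrt(((2 * np.pi) ** d) * np.linalg.det(Sigma))
    return np.exp(expo) / norm

def em_step(X, weights, mus, Sigmas):
    """One EM iteration following the E- and M-step equations above."""
    K, N = len(weights), len(X)
    # E-step: responsibilities p(i | x_j, Theta_old)
    R = np.stack([w * gaussian_pdf(X, m, S)
                  for w, m, S in zip(weights, mus, Sigmas)], axis=1)
    R /= R.sum(axis=1, keepdims=True)
    # M-step: update mixing weights, means, and covariances
    Nk = R.sum(axis=0)
    weights = Nk / N
    mus = (R.T @ X) / Nk[:, None]
    Sigmas = []
    for i in range(K):
        diff = X - mus[i]
        Sigmas.append((R[:, i, None] * diff).T @ diff / Nk[i])
    return weights, mus, Sigmas
```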
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
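For Gaussians, each KL term has a standard closed form, so the symmetric divergence can be sketched directly (function names ours):&lt;br /&gt;

```python
import numpy as np

def kl_gaussian(mu1, S1, mu2, S2):
    """Closed-form KL(G1 || G2) for multivariate normal distributions."""
    d = len(mu1)
    S2_inv = np.linalg.inv(S2)
    diff = mu2 - mu1
    return 0.5 * (np.trace(S2_inv @ S1) + diff @ S2_inv @ diff - d
                  + np.log(np.linalg.det(S2) / np.linalg.det(S1)))

def symmetric_kl(mu1, S1, mu2, S2):
    """D_sKL(G_i, G_j) = KL(G_i || G_j) + KL(G_j || G_i), as above."""
    return kl_gaussian(mu1, S1, mu2, S2) + kl_gaussian(mu2, S2, mu1, S1)
```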
&lt;br /&gt;
The optimization aims to preserve the contrast perceived by CVD viewers while maintaining naturalness. Weights are assigned to Gaussian components based on the perceptual importance of colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\lambda_i = \frac{\sum_{j=1}^N \alpha_j p(i|x_j, \Theta)}{\sum_{k=1}^K \sum_{j=1}^N \alpha_j p(k|x_j, \Theta)},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha_j = \|x_j - \text{Sim}(x_j)\|&amp;lt;/math&amp;gt; is the perceptual error of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature when simulated for CVD,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{Sim}(\cdot)&amp;lt;/math&amp;gt; is the simulation function for CVD perception.&lt;br /&gt;
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions. Sometimes the colors in the recolored images are not very natural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to use in real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and BicycleGAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This specific reference also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, the architecture is computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations using a unique triple-latent structure in their method, continuous personalization was possible to adjust images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, it took around 18 days for one epoch (or one iteration over the entire dataset).&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD with the personalization of different severity levels. We start by exploring existing methods and identifying opportunities for improvement. Since mathematical-based approaches provide a solid foundation and are well-documented, we began our experiments by testing these methods, as described in the background. We later extended our exploration to deep learning based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a baseline ====&lt;br /&gt;
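As a baseline sketch, classic daltonization simulates the dichromat's view, takes the error against the original, and redistributes that error into the channels the viewer can still distinguish. The simulation and shift matrices below are the values commonly circulated in open-source daltonize implementations and are assumptions here, not the exact matrices used in our experiments.&lt;br /&gt;

```python
import numpy as np

# Approximate protanope simulation matrix in linear RGB (rows sum to 1);
# values as circulated in common daltonize code -- treat as an assumption.
SIM_PROTAN = np.array([[0.56667, 0.43333, 0.0],
                       [0.55833, 0.44167, 0.0],
                       [0.0,     0.24167, 0.75833]])

# Error-redistribution matrix: push the lost red information into G and B.
SHIFT = np.array([[0.0, 0.0, 0.0],
                  [0.7, 1.0, 0.0],
                  [0.7, 0.0, 1.0]])

def daltonize(img):
    """Classic daltonization on an HxWx3 image in [0, 1]: simulate the
    dichromat view, compute the perceptual error against the original,
    and add the redistributed error back to the image."""
    sim = img @ SIM_PROTAN.T
    err = img - sim
    return np.clip(img + err @ SHIFT.T, 0.0, 1.0)
```

Because the simulation matrix maps gray to gray, achromatic pixels are left untouched; only colors a protanope confuses get shifted.&lt;br /&gt;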
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user, we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in later sections; an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train and integrate the user label, they require an already present ground truth comparison of expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information while training. They are generally better at generating natural images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from the open-source RGB image dataset of [2]: to improve their model's ability to enhance contrast between CVD-indistinguishable color pairs, the authors assembled 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground truth recolored images for supervised methods), we randomly sampled 15,000 images and recolored them by simulating random labels for severity and type of CVD. The recoloring for ground truth images was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org SciPy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Original RGB Image&#039;&#039;&#039;: High-resolution images, resized to &amp;lt;code&amp;gt;256x256&amp;lt;/code&amp;gt; pixels and normalized to the &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
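The label encoding can be generated with a small helper (a hypothetical function illustrating the scheme above):&lt;br /&gt;

```python
import numpy as np

def cvd_label(cvd_type, severity):
    """Encode a user condition as severity * [protan, deutan];
    e.g. protanopia at 60% severity -> [0.6, 0.0]."""
    base = {"protan": np.array([1.0, 0.0]),
            "deutan": np.array([0.0, 1.0])}
    return severity * base[cvd_type]
```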
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to 3 parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of a recolored image from the given ground truth. The outputs of the three networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground truth recolored image respectively&lt;br /&gt;
* p: pixel index&lt;br /&gt;
* N: total number of pixels&lt;br /&gt;
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
In a similar fashion of inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was the commonly used VGG perceptual loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: recolored (model output) and ground truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: the feature map of the l-th layer of the pre-trained VGG network, with &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; its number of elements&lt;br /&gt;
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math&amp;gt;x, y&amp;lt;/math&amp;gt;: two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: size of the global (whole-image) window&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: size of the local window (neighborhood) around pixel &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{natural}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: original and recolored images respectively&lt;br /&gt;
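Putting the pieces together, the total objective can be sketched end to end in NumPy. The SSIM here is a single-window (global) SSIM, the global contrast window is the whole image, and the local window is the 8-pixel neighborhood; these simplifications, and the function names, are ours (the actual training used PyTorch tensors).&lt;br /&gt;

```python
import numpy as np

def ssim_global(X, Y, c1=1e-4, c2=9e-4):
    """Single-window SSIM between two images (arrays in [0, 1])."""
    mx, my = X.mean(), Y.mean()
    cov = ((X - mx) * (Y - my)).mean()
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx ** 2 + my ** 2 + c1) * (X.var() + Y.var() + c2)))

def contrast_loss(sim_orig, sim_rec, r=1):
    """Global (all-pairs) and local (8-neighborhood for r=1) averages of
    CL(x, y) on CVD-simulated HxWx3 images; sign convention follows the
    formulas above."""
    H, W, _ = sim_orig.shape
    fo, fr = sim_orig.reshape(-1, 3), sim_rec.reshape(-1, 3)
    cl = (np.linalg.norm(fr[:, None] - fr[None], axis=-1)
          - np.linalg.norm(fo[:, None] - fo[None], axis=-1))
    L_global = cl.mean()
    L_local, n = 0.0, 0
    for i in range(H):
        for j in range(W):
            nb = [(i + di, j + dj) for di in (-r, 0, r) for dj in (-r, 0, r)
                  if (di or dj) and 0 <= i + di < H and 0 <= j + dj < W]
            L_local += sum(np.linalg.norm(sim_rec[i, j] - sim_rec[p])
                           - np.linalg.norm(sim_orig[i, j] - sim_orig[p])
                           for p in nb) / len(nb)
            n += 1
    return L_global, L_local / n

def total_loss(orig, rec, sim_orig, sim_rec, alpha=0.25, beta=1.0):
    """L_total = alpha * L_naturalness + 2 * (1 - alpha) * L_contrast,
    with L_contrast = beta * L_global + (2 - beta) * L_local."""
    L_nat = 1.0 - ssim_global(rec, orig)
    L_g, L_l = contrast_loss(sim_orig, sim_rec)
    return alpha * L_nat + 2 * (1 - alpha) * (beta * L_g + (2 - beta) * L_l)
```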
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), total color contrast (TCC), chromatic difference (CD), and inference time, as defined in [1] and [2], were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground truth recolored image; for the unsupervised method, it means how much contrast and naturalness is observed in the CVD-simulated recolored images compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored versus original images were similar or worse, meaning the model was not doing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases it hurt the contrast in the CVD-simulated colors; in others, the contrast improvement was only marginal.&lt;br /&gt;
* Some blurriness was visible in the recolored images, possibly because the naturalness term was effectively prioritized even though the weight coefficients in the loss favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to compute the quantitative metrics and compare against related work using only the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; we use the same definitions, so please refer to those papers for details. At a high level, the components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a constant weight, not a wavelength; l, a, b are the CIELAB coordinates of the original image, and the primed values (&#039;) those of the recolored image.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
&lt;br /&gt;
The key results are in Table 1 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|+ &#039;&#039;&#039;Table 1: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results reported in paper [2], taking the larger of the protan and deutan values.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good, but not as good as in paper [2], because that work optimizes a separate network for each CVD type.&lt;br /&gt;
* Outputs remain blurry (the SSIM objective is not weighted strongly enough).&lt;br /&gt;
* Mixing CVD types in the same network needs a more sophisticated approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were only moderate (about 2.6 seconds per image), which limits their feasibility for real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from the Swin architecture added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. JOSA A, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 220px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400 px|Wikipedia encyclopedia]][[File:eb_2.png|400 px]] &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 800px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|300 px|center|thumb|Losses - Conditional Autoencoder]][[File:loss_unet.png|300 px|thumb|center|Losses - Conditional U-Net]][[File:loss_mlp.png|300 px|center|thumb|Losses - Conditional MLP]]&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60565</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60565"/>
		<updated>2024-12-13T06:33:28Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* Methods */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix. For example, the correction matrix for protanopia, as implemented in tools like Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix for Protanopia} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error, mapped into RGB, is then added back to the original RGB values to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, colors are first transformed into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
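Combining the simulation matrices above with the correction matrix from the Daltonization section, a minimal end-to-end pass for protanopia might look like the following sketch. It assumes linear RGB in [0, 1], computes the error in RGB after mapping the simulated LMS back, and simply clips the result; the function and constant names are our own.&lt;br /&gt;

```python
import numpy as np

# RGB -> LMS transform (Stockman & Sharpe based, from the text above)
RGB_TO_LMS = np.array([[0.3904725,  0.54990437, 0.00890159],
                       [0.07092586, 0.96310739, 0.00135809],
                       [0.02314268, 0.12801221, 0.93605194]])
LMS_TO_RGB = np.linalg.inv(RGB_TO_LMS)

# Protanopia: replace the L response with a combination of M and S
SIM_PROTAN = np.array([[0.0, 0.90822864, 0.008192],
                       [0.0, 1.0,        0.0],
                       [0.0, 0.0,        1.0]])

# Vischeck-style error-redistribution matrix for protanopia
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize_protan(rgb):
    """rgb: float array (..., 3) in [0, 1]. Returns the daltonized image."""
    lms = rgb @ RGB_TO_LMS.T
    lms_sim = lms @ SIM_PROTAN.T      # what a protanope would perceive
    rgb_sim = lms_sim @ LMS_TO_RGB.T
    error = rgb - rgb_sim             # information invisible to the viewer
    compensated = rgb + error @ CORRECTION.T
    return np.clip(compensated, 0.0, 1.0)
```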
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
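As a sketch, the two terms can be evaluated directly with NumPy on a small set of N colors; the full pairwise contrast term is O(N&amp;sup2;), so practical implementations sample pixel pairs. The function below is illustrative (our own names and default weight), not Zhu et al.&#039;s code.&lt;br /&gt;

```python
import numpy as np

def total_loss(c_plus, c, beta=0.1):
    """Sketch of the objective E = beta * E_nat + E_cont over N colors.

    c_plus, c: float arrays of shape (N, 3), the recolored and original
    colors.  beta weights naturalness preservation against contrast.
    """
    # E_nat: squared perceptual distance between recolored and original
    e_nat = np.sum(np.linalg.norm(c_plus - c, axis=1) ** 2)

    # E_cont: deviation of pairwise color differences after recoloring
    d_plus = c_plus[:, None, :] - c_plus[None, :, :]
    d_orig = c[:, None, :] - c[None, :, :]
    e_cont = np.sum(np.linalg.norm(d_plus - d_orig, axis=2) ** 2)

    return beta * e_nat + e_cont
```

Note that a uniform shift of every color leaves all pairwise differences intact, so only the naturalness term penalizes it, which is exactly the trade-off the objective encodes.&lt;br /&gt;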
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The method begins with fuzzy clustering to extract representative colors (key colors) from the input image. These colors are mapped onto the CIE 1931 chromaticity diagram, where confusion lines represent loci of colors perceived as identical by individuals with CVD. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
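The distance formula above translates almost verbatim into code; the following sketch takes all three points as (x, y) chromaticity pairs (the function name is our own).&lt;br /&gt;

```python
import math

def confusion_line_distance(v, cp, p0):
    """Perpendicular distance from chromaticity v to the confusion line
    through the copunctal point cp and reference point p0 (all (x, y))."""
    (xv, yv), (xcp, ycp), (x0, y0) = v, cp, p0
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    den = math.hypot(xcp - x0, ycp - y0)
    return num / den
```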
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. The translation process involves:&lt;br /&gt;
&lt;br /&gt;
1. Ranking key colors by their cluster sizes:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{rank}(v_i) = \frac{|A_i|}{\sum_{j=1}^{n_A}|A_j|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;|A_i|&amp;lt;/math&amp;gt; is the cardinality (number of pixels) of its associated cluster,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; is the total number of clusters.&lt;br /&gt;
&lt;br /&gt;
2. Translating the highest-ranked confusing color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; to its projection on the nearest non-occupied confusion line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
v^*_{\text{tr}} = \text{proj}(v^*, L^*),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*_{\text{tr}}&amp;lt;/math&amp;gt; is the new position of the color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; after translation,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L^*&amp;lt;/math&amp;gt; is the nearest non-occupied confusion line, determined as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(v^*, L^*) = \min_{L \in \text{CL}_D} d(v^*, L).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Updating the sets of confusing colors and non-occupied confusion lines iteratively:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Phi_V = \Phi_V - \{v^*\}, \quad \text{CL}_D = \text{CL}_D - \{L^*\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Phi_V&amp;lt;/math&amp;gt; is the set of confusing colors,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{CL}_D&amp;lt;/math&amp;gt; is the set of non-occupied confusion lines.&lt;br /&gt;
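Steps 1-3 amount to a greedy assignment loop: rank the confusing colors by cluster size, then repeatedly project the top-ranked one onto its nearest free confusion line and retire both. A compact sketch, representing each confusion line by two of its points and using our own helper names (not the authors&#039; implementation):&lt;br /&gt;

```python
import math

def project(v, line):
    """Orthogonal projection of point v onto the line through (p, q)."""
    (px, py), (qx, qy) = line
    dx, dy = qx - px, qy - py
    t = ((v[0] - px) * dx + (v[1] - py) * dy) / (dx * dx + dy * dy)
    return (px + t * dx, py + t * dy)

def distance(v, line):
    return math.dist(v, project(v, line))

def shift_confusing_colors(colors, free_lines):
    """colors: dict mapping chromaticity (x, y) -> cluster size.
    free_lines: list of non-occupied confusion lines, each a pair of
    points.  Returns the translated chromaticities, processed in rank
    order (largest cluster first), mirroring steps 1-3."""
    remaining = sorted(colors, key=colors.get, reverse=True)
    lines = list(free_lines)
    moved = {}
    for v in remaining:
        if not lines:
            break
        best = min(lines, key=lambda L: distance(v, L))  # nearest free line
        moved[v] = project(v, best)                      # translate v* -> proj
        lines.remove(best)                               # retire v* and L*
    return moved
```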
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective recoloring algorithm for individuals with CVD, using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using a GMM, optimization of the Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIE L*a*b* color space, which approximates perceptual differences by the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
In more detail, the parameters of the GMM are initialized using the K-means algorithm and refined via the Expectation-Maximization (EM) algorithm, which alternates between an E-step and an M-step:&lt;br /&gt;
&lt;br /&gt;
The E-step calculates the probability of each color (or pixel) belonging to a specific Gaussian component in the GMM. This probability, also known as the &amp;quot;responsibility,&amp;quot; reflects how much each Gaussian contributes to the representation of a color:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(i|x_j, \Theta^{\text{old}}) = \frac{\omega_i G_i(x_j|\theta_i)}{\sum_{k=1}^K \omega_k G_k(x_j|\theta_k)}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p(i|x_j, \Theta^{\text{old}})&amp;lt;/math&amp;gt; is the probability of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt; belonging to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian component,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian, representing its overall contribution to the color distribution,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x_j|\theta_i)&amp;lt;/math&amp;gt; is the Gaussian distribution for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th component, evaluated at &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt;, where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix),&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\sum_{k=1}^K \omega_k G_k(x_j|\theta_k)&amp;lt;/math&amp;gt; normalizes the probabilities by considering the contributions of all &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussians to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th pixel.&lt;br /&gt;
&lt;br /&gt;
This step essentially assigns each pixel a &amp;quot;soft&amp;quot; membership to each Gaussian component, rather than forcing a hard clustering decision. Pixels that are close to a Gaussian&#039;s mean (in feature space) will have higher probabilities of belonging to that Gaussian.&lt;br /&gt;
&lt;br /&gt;
The M-step updates the parameters of each Gaussian component based on the probabilities computed in the E-step. These updates refine the Gaussian model to better fit the data:&lt;br /&gt;
&lt;br /&gt;
1. Update the mixing weights:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\omega_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})}{N},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation calculates the proportion of pixels assigned to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It reflects how dominant each Gaussian is in representing the color distribution.&lt;br /&gt;
&lt;br /&gt;
2. Update the means:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mu_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}}) x_j}{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation computes the new mean vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mu_i^{\text{new}}&amp;lt;/math&amp;gt; for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It is a weighted average of all pixel feature vectors &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt;, where the weights are the probabilities &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p(i|x_j, \Theta^{\text{old}})&amp;lt;/math&amp;gt;. Pixels with higher probabilities contribute more to the new mean.&lt;br /&gt;
&lt;br /&gt;
3. Update the covariance matrices:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Sigma_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}}) (x_j - \mu_i^{\text{new}})(x_j - \mu_i^{\text{new}})^T}{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation calculates the new covariance matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Sigma_i^{\text{new}}&amp;lt;/math&amp;gt; for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It measures the spread of pixel features around the new mean, weighted by the probabilities from the E-step.&lt;br /&gt;
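The E-step and M-step updates above can be sketched in a few lines of NumPy. This standalone sketch (not the authors' code) uses a hand-rolled Gaussian density; &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; is an &amp;lt;code&amp;gt;(N, 3)&amp;lt;/code&amp;gt; array of color feature vectors:&lt;br /&gt;

```python
import numpy as np

# Sketch of one EM iteration for the GMM clustering step: K components fitted
# to N color feature vectors X of shape (N, 3).
def gauss_pdf(X, mu, cov):
    d = X - mu
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** X.shape[1] * np.linalg.det(cov))
    # Quadratic form d^T inv d per row, via einsum
    return np.exp(-0.5 * np.einsum('nd,dk,nk->n', d, inv, d)) / norm

def em_step(X, weights, means, covs):
    N, K = X.shape[0], len(weights)
    # E-step: responsibilities p(i | x_j, Theta_old)
    resp = np.column_stack([weights[i] * gauss_pdf(X, means[i], covs[i])
                            for i in range(K)])
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixing weights, means, and covariances
    Nk = resp.sum(axis=0)
    new_w = Nk / N
    new_mu = (resp.T @ X) / Nk[:, None]
    new_cov = np.stack([((resp[:, i, None] * (X - new_mu[i])).T @ (X - new_mu[i])) / Nk[i]
                        + 1e-6 * np.eye(X.shape[1])  # small jitter for stability
                        for i in range(K)])
    return new_w, new_mu, new_cov
```

Iterating `em_step` until the parameters stop changing gives the fitted GMM; in practice the means would be initialized from K-means as described above.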
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The optimization aims to preserve the contrast perceived by CVD viewers while maintaining naturalness. Weights are assigned to Gaussian components based on the perceptual importance of colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\lambda_i = \frac{\sum_{j=1}^N \alpha_j p(i|x_j, \Theta)}{\sum_{k=1}^K \sum_{j=1}^N \alpha_j p(k|x_j, \Theta)},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha_j = \|x_j - \text{Sim}(x_j)\|&amp;lt;/math&amp;gt; is the perceptual error of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature when simulated for CVD,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{Sim}(\cdot)&amp;lt;/math&amp;gt; is the simulation function for CVD perception.&lt;br /&gt;
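The symmetric KL divergence between two Gaussian components has a closed form, sketched below (a standard identity for multivariate Gaussians, not code from [11]):&lt;br /&gt;

```python
import numpy as np

# Closed-form KL divergence between two multivariate Gaussians, and the
# symmetric version D_sKL used to compare cluster pairs in the optimization.
def kl_gauss(mu0, cov0, mu1, cov1):
    d = len(mu0)
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def sym_kl(mu0, cov0, mu1, cov1):
    return kl_gauss(mu0, cov0, mu1, cov1) + kl_gauss(mu1, cov1, mu0, cov0)
```

Identical Gaussians give zero divergence, and the symmetrized form is invariant to swapping the two components, which is what makes it usable as a pairwise dissimilarity.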
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
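The hue interpolation above reduces to a responsibility-weighted sum of mean-hue shifts, as in this minimal sketch (array names are illustrative):&lt;br /&gt;

```python
import numpy as np

# Sketch of the hue interpolation step T(x_j)_H: each pixel's hue is shifted by
# a responsibility-weighted sum of the clusters' mean-hue shifts.
def recolor_hue(x_h, resp, mu_h, mapped_mu_h):
    """x_h: (N,) pixel hues; resp: (N, K) responsibilities p(i|x_j);
    mu_h, mapped_mu_h: (K,) original and optimized mean hues."""
    return x_h + resp @ (mapped_mu_h - mu_h)
```

Because the shifts are weighted by the soft responsibilities, a pixel that sits between two clusters receives a blend of both shifts, which is what produces smooth transitions between recolored regions.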
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions; as a result, the colors in the recolored images are sometimes not very natural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to use in real-time applications.&lt;br /&gt;
&lt;br /&gt;
==== Segmentation Guided Recoloring Method ====&lt;br /&gt;
One interesting method we found using machine learning algorithms is based on semantic segmentation. Chatzistamatis et al. [12] introduced a recoloring approach for digitized art paintings to enhance color perception for individuals with protanopia and deuteranopia. A key component of their method involves semantic segmentation, guided by transfer learning, to identify and preserve important visual elements in art paintings.&lt;br /&gt;
&lt;br /&gt;
The segmentation process leverages the Mask R-CNN architecture, utilizing transfer learning to adapt from natural image datasets to the domain of art paintings. This adaptation involves the following steps:&lt;br /&gt;
&lt;br /&gt;
1. Preprocessing and augmentation: images are preprocessed with techniques such as horizontal flips, random cropping, Gaussian blur, and affine transformations to enhance the diversity of the training set.&lt;br /&gt;
&lt;br /&gt;
2. Feature extraction: the ResNet-101 backbone, pre-trained on the ImageNet dataset, extracts features from input images. Lower layers detect basic features like edges, while higher layers identify complex structures such as objects within paintings.&lt;br /&gt;
&lt;br /&gt;
3. Region proposal: the Region Proposal Network (RPN) identifies regions of interest (RoIs) in the feature maps using sliding windows. These RoIs are further refined into accurate object boundaries.&lt;br /&gt;
&lt;br /&gt;
4. Object masking: masks for the identified objects are generated to preserve fine-grained details of the paintings.&lt;br /&gt;
&lt;br /&gt;
This semantic segmentation process divides the image pixels into two disjoint sets:&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_V&amp;lt;/math&amp;gt;, pixels belonging to segmented objects, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_U&amp;lt;/math&amp;gt;, background pixels outside segmented objects.&lt;br /&gt;
&lt;br /&gt;
By separating these sets, the algorithm focuses recoloring efforts on regions that are visually significant while maintaining the natural appearance of the background.&lt;br /&gt;
&lt;br /&gt;
The recoloring process modifies the colors in the segmented object set &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_V&amp;lt;/math&amp;gt; while leaving the background set &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_U&amp;lt;/math&amp;gt; largely intact. Key steps include:&lt;br /&gt;
&lt;br /&gt;
1. Color simulation: colors in &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_V&amp;lt;/math&amp;gt; are transformed to simulate the perception of dichromatic viewers, enabling identification of indistinguishable colors.&lt;br /&gt;
&lt;br /&gt;
2. Color clustering: fuzzy c-means clustering groups colors into clusters for efficient manipulation. Cluster centers, or &amp;quot;key colors,&amp;quot; are adjusted to reduce color confusion while preserving visual coherence.&lt;br /&gt;
&lt;br /&gt;
3. Recoloring optimization: an objective function is minimized to enhance contrast and naturalness, similar to the objective functions in the methods mentioned above:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
   E = E_1 + E_2 + cE_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt; preserves the contrast between object colors and background colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \sum_{p \in T_V, q \in T_U} \|f(p) - f(q)\| - \|p - q\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt; enhances contrast within object colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \sum_{p, q \in T_V} \|f(p) - f(q)\| - \|p - q\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt; minimizes the perceptual difference between original and recolored key colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \sum_{p \in T_V} \|f(p) - p\|^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;q&amp;lt;/math&amp;gt; are pixel values in the image.&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f(p)&amp;lt;/math&amp;gt; is the recolored pixel value for pixel &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p&amp;lt;/math&amp;gt;.&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\| \cdot \|&amp;lt;/math&amp;gt; represents the Euclidean distance in the perceptual color space.&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;c&amp;lt;/math&amp;gt; is a weighting factor that controls the importance of naturalness preservation relative to contrast enhancement.&lt;br /&gt;
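The objective can be sketched directly in NumPy. Note one assumption: absolute differences are applied to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt; terms so that the sum is well posed as a minimization target; the paper's exact regularization may differ:&lt;br /&gt;

```python
import numpy as np

# Sketch of E = E_1 + E_2 + c*E_3 over object pixels tv (recolored to f_tv)
# and background pixels tu (left unchanged). Arrays are (n, 3) Lab colors.
# Absolute differences in E_1/E_2 are an assumption added for well-posedness.
def recolor_objective(tv, tu, f_tv, c=0.5):
    # E_1: contrast between object colors and (unchanged) background colors
    e1 = sum(abs(np.linalg.norm(fp - q) - np.linalg.norm(p - q))
             for p, fp in zip(tv, f_tv) for q in tu)
    # E_2: contrast within the object colors
    e2 = sum(abs(np.linalg.norm(f_tv[i] - f_tv[j]) - np.linalg.norm(tv[i] - tv[j]))
             for i in range(len(tv)) for j in range(len(tv)))
    # E_3: squared distance between original and recolored object colors
    e3 = float(np.sum(np.linalg.norm(f_tv - tv, axis=1) ** 2))
    return e1 + e2 + c * e3
```

An identity recoloring (f equal to the original colors) scores exactly zero, which matches the intuition that every term measures deviation from the original contrast or colors.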
&lt;br /&gt;
While the method effectively balances contrast and naturalness for CVD viewers, it has several limitations:&lt;br /&gt;
* The success of segmentation relies heavily on the pre-trained Mask R-CNN model, which may not generalize well to all styles of art or real-life images.&lt;br /&gt;
* Semantic segmentation and optimization introduce significant computational overhead, making the method slow and less suitable for real-time applications.&lt;br /&gt;
* Errors in segmentation are difficult to control and may lead to misclassification of visually important regions, resulting in suboptimal recoloring.&lt;br /&gt;
* The focus of this method is restricted to protanopia and deuteranopia, with no flexibility for personalization.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbone structures showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This particular approach also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations through a unique triple-latent structure, the method enables continuous personalization, adjusting images to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, one epoch (one pass over the entire dataset) took around 18 days.&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
We aim to find effective and efficient ways to recolor images for people with CVD with the personalization of different severity levels. We start by exploring existing methods and identifying opportunities for improvement. Since mathematical-based approaches provide a solid foundation and are well-documented, we began our experiments by testing these methods, as described in the background. We later extended our exploration to deep learning based methods.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical based ===&lt;br /&gt;
We explored four main methods, building on the foundational work discussed in the background section.&lt;br /&gt;
&lt;br /&gt;
==== Method 1: Daltonization as a baseline ====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user, we want a deep learning model to output a recolored RGB image that is specific to that user. Inputs and outputs are discussed in more detail in later sections; an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and able to integrate the user label, they require a pre-existing ground-truth example of the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information during training. They are generally better at generating natural-looking images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from the open-source RGB image dataset of [2]: to improve their model&#039;s ability to enhance contrast between CVD-indistinguishable color pairs, the authors created an unlabeled dataset of 141,000 pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 images and recolored them by simulating random labels for the severity and type of CVD. The ground-truth recoloring was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org SciPy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
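The label encoding described above can be sketched as a small helper (&amp;lt;code&amp;gt;make_label&amp;lt;/code&amp;gt; is an illustrative name, not from our codebase):&lt;br /&gt;

```python
import numpy as np

# Sketch of the CVD condition-label encoding: a severity in [0.1, 1.0]
# multiplying a one-hot [protan, deutan] vector.
def make_label(cvd_type, severity):
    assert cvd_type in ("protan", "deutan")
    assert 0.1 <= severity <= 1.0
    base = np.array([1.0, 0.0]) if cvd_type == "protan" else np.array([0.0, 1.0])
    return severity * base
```

For example, protanopia at 60% severity encodes to &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt;, matching the example in the text.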
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately, using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to three parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. The outputs of the three networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The model was trained with a pixel-wise mean-squared error loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: the recolored (model output) image and the ground-truth recolored image, respectively&lt;br /&gt;
* p: pixel index&lt;br /&gt;
* N: total number of pixels&lt;br /&gt;
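The conditional input construction and the parallel per-channel design can be sketched in NumPy (the actual model was trained in PyTorch; weights here are random placeholders, so this only illustrates shapes and data flow):&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the conditional input: the (H, W, 3) image is concatenated with the
# 2-dim CVD label broadcast to every pixel location.
def conditional_input(img, label):
    h, w, _ = img.shape
    lab = np.broadcast_to(label, (h, w, 2))
    return np.concatenate([img, lab], axis=-1)        # (H, W, 5)

# Three parallel per-channel regressors; each small MLP predicts one of R, G, B.
def parallel_mlp(x, hidden=8):
    h, w, c = x.shape
    flat = x.reshape(-1, c)
    outs = []
    for _ in range(3):                                # one MLP per output channel
        w1 = rng.normal(size=(c, hidden))
        w2 = rng.normal(size=(hidden, 1))
        outs.append(np.clip(np.maximum(flat @ w1, 0) @ w2, 0, 1))  # ReLU, clamp to [0,1]
    return np.concatenate(outs, axis=1).reshape(h, w, 3)
```

The key point is that the three MLPs never share weights, so each channel's mapping can be adjusted independently of the others.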
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
In a similar fashion of inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: the recolored (model output) and ground-truth recolored images, respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: the &amp;lt;math&amp;gt;l&amp;lt;/math&amp;gt;-th layer feature map of the pre-trained VGG network, with &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; elements&lt;br /&gt;
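The perceptual loss itself is just a sum of per-layer mean-squared errors between feature maps. A minimal sketch, with plain arrays standing in for the VGG activations (in practice &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt; comes from a pre-trained VGG, e.g. via torchvision):&lt;br /&gt;

```python
import numpy as np

# Sketch of the VGG perceptual loss: given lists of feature maps phi_l(I) and
# phi_l(I') for matching layers l, average the squared differences per layer
# (dividing by N_l, the number of elements) and sum over layers.
def perceptual_loss(feats_a, feats_b):
    return sum(np.mean((fa - fb) ** 2) for fa, fb in zip(feats_a, feats_b))
```

Unlike pixel-wise MSE, this compares images in feature space, so it penalizes perceptual differences (textures, edges) more than exact per-pixel color matches.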
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to aligning this network with the recoloring task was the choice of loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x, y&amp;lt;/math&amp;gt;: two distinct pixels in the image,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;c_x&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\hat{c}_x&#039;&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output),&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: size of the global (or large) window of the image,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: size of the local window or neighborhood around pixel &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt;.&lt;br /&gt;
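The contrast term and the local contrast loss can be sketched directly from the definitions above (a standalone NumPy sketch; the trained model used PyTorch tensors):&lt;br /&gt;

```python
import numpy as np

# CL(x, y): difference between the recolored simulated contrast and the
# original simulated contrast for a pixel pair.
def cl(c_hat_x, c_hat_y, c_x, c_y):
    return np.linalg.norm(c_hat_x - c_hat_y) - np.linalg.norm(c_x - c_y)

# Local contrast loss: average CL over each pixel's (2r+1)^2 neighborhood.
# c and c_hat are (H, W, 3) arrays of CVD-simulated original / recolored colors.
def local_contrast_loss(c_hat, c, radius=1):
    h, w, _ = c.shape
    total = 0.0
    for x0 in range(h):
        for y0 in range(w):
            acc, m = 0.0, 0
            for dx in range(-radius, radius + 1):
                for dy in range(-radius, radius + 1):
                    x1, y1 = x0 + dx, y0 + dy
                    if 0 <= x1 < h and 0 <= y1 < w:
                        acc += cl(c_hat[x0, y0], c_hat[x1, y1], c[x0, y0], c[x1, y1])
                        m += 1
            total += acc / m
    return total / (h * w)
```

When the recolored image adds no contrast over the original, the loss is zero; positive values mean the recoloring increased local contrast under simulation.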
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural color distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: the original and recolored images, respectively&lt;br /&gt;
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the three neural network architectures above: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), total color contrast (TCC), chromatic difference (CD), and inference time, adapted from [1] and [2], were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored images show compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than those of the originals, meaning the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images, possibly because the naturalness term dominated even though the loss weights favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to score and evaluate metrics for comparison with related work only using the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]. Please refer to the definitions in the paper, as we have used the same. On a high level, the three components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weighting constant, not a wavelength; l, a, b are the CIELAB coordinates of the original image, and the primed values those of the recolored image.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
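As a concrete illustration, the Chromatic Difference above reduces to a few lines of NumPy. This is a minimal sketch; the function name and the example value of the weighting constant are ours, not taken from [1] or [2].&lt;br /&gt;

```python
import numpy as np

def chromatic_difference(lab, lab_rec, lam=0.5):
    """Per-pixel CD between original and recolored CIELAB images.

    lam down-weights lightness differences relative to chroma; the
    value 0.5 here is an illustrative assumption, not the papers'."""
    diff = lab_rec - lab
    dl, da, db = diff[..., 0], diff[..., 1], diff[..., 2]
    return np.sqrt(lam * dl**2 + da**2 + db**2)
```

Averaging the returned map over all pixels gives a single score comparable to the table below.&lt;br /&gt;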
&lt;br /&gt;
The key results are in Table 1 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|+ &#039;&#039;&#039;Table 1: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates the corresponding result reported in [2], taking the larger of the protan and deutan values.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but fall short of [2], which optimizes a separate network for each CVD type.&lt;br /&gt;
* Outputs are somewhat blurry (SSIM is not weighted heavily enough in the loss).&lt;br /&gt;
* Handling multiple CVD types in a single network needs a more sophisticated conditioning scheme.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were only moderate (about 2.6 seconds per image for the autoencoder), leaving room for optimization before real-time use. Reducing this cost would make them more practical for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from the Swin-based approach of [2] added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. JOSA A, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
* [https://github.com/rainasong/psych221-aut24-final-project.git Code]&lt;br /&gt;
* [https://drive.google.com/drive/folders/10WMXPbtpV7Hy5_qBA_TCEbW-kCpj1D7v Dataset]&lt;br /&gt;
&lt;br /&gt;
=== Additional results ===&lt;br /&gt;
1. &#039;&#039;&#039;Recolored Images - Conditional Autoencoder&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 220px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eb_1.png|400 px|Wikipedia encyclopedia]][[File:eb_2.png|400 px]] &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Loss curves&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;div style=&amp;quot;display: inline; width: 800px; float: center;&amp;quot;&amp;gt;&lt;br /&gt;
[[File:loss_ae.png|300 px|center|thumb|Losses - Conditional Autoencoder]][[File:loss_unet.png|300 px|thumb|center|Losses - Conditional U-Net]][[File:loss_mlp.png|300 px|center|thumb|Losses - Conditional MLP]]&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for all deep learning methods (MLP, U-Net and Autoencoder)&lt;br /&gt;
* GMM recoloring method in Python &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60514</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60514"/>
		<updated>2024-12-13T05:11:28Z</updated>

		<summary type="html">&lt;p&gt;Rainas: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix. For example, the correction matrix for protanopia, as implemented in tools like Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix for Protanopia} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error, mapped back into RGB and redistributed by the correction matrix, is added to the original RGB values to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, colors are first transformed into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
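Putting the pieces above together, the simulation and daltonization steps can be sketched in a few lines of NumPy. This is a minimal sketch for protanopia only, assuming linear RGB values in [0, 1]; the function names are ours, and a full implementation such as [5] also handles the other deficiency types and gamma correction.&lt;br /&gt;

```python
import numpy as np

# RGB -> LMS transform from the text (Stockman & Sharpe fundamentals)
RGB2LMS = np.array([
    [0.3904725,  0.54990437, 0.00890159],
    [0.07092586, 0.96310739, 0.00135809],
    [0.02314268, 0.12801221, 0.93605194],
])
LMS2RGB = np.linalg.inv(RGB2LMS)

# Protanopia: the L response is replaced by a combination of M and S
SIM_PROTAN = np.array([
    [0.0, 0.90822864, 0.008192],
    [0.0, 1.0,        0.0],
    [0.0, 0.0,        1.0],
])

# Vischeck-style correction matrix for protanopia (from the text)
CORRECT_PROTAN = np.array([
    [0.0, 0.0, 0.0],
    [0.7, 1.0, 0.0],
    [0.7, 0.0, 1.0],
])

def simulate_protanopia(rgb):
    """Simulate protan perception of an (H, W, 3) linear-RGB image."""
    lms = rgb @ RGB2LMS.T
    lms_sim = lms @ SIM_PROTAN.T
    return np.clip(lms_sim @ LMS2RGB.T, 0.0, 1.0)

def daltonize_protanopia(rgb):
    """Shift the information lost to protanopia into visible channels."""
    error = rgb - simulate_protanopia(rgb)  # lost information, in RGB
    return np.clip(rgb + error @ CORRECT_PROTAN.T, 0.0, 1.0)
```

Note that achromatic (gray) pixels pass through the simulation nearly unchanged, since the replacement coefficients approximately reproduce the missing cone's response to neutral stimuli.&lt;br /&gt;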
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
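As a concrete illustration, the basic energy of [8] can be evaluated directly. The sketch below computes E on a small set of key colors rather than all pixels, since the pairwise contrast sum is O(N&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;); the function and variable names are ours.&lt;br /&gt;

```python
import numpy as np

def recolor_energy(c_plus, c, beta=0.5):
    """Total energy E = beta * E_nat + E_cont from Zhu et al. [8].

    c, c_plus: (N, 3) arrays of original and recolored key colors.
    The pairwise sums are O(N^2), so N should be a set of key colors."""
    e_nat = np.sum((c_plus - c) ** 2)
    diff_plus = c_plus[:, None, :] - c_plus[None, :, :]  # recolored pairwise diffs
    diff_orig = c[:, None, :] - c[None, :, :]            # original pairwise diffs
    e_cont = np.sum((diff_plus - diff_orig) ** 2)
    return beta * e_nat + e_cont
```

A uniform color shift leaves E_cont at (essentially) zero, since all pairwise differences are preserved; only the naturalness term penalizes it, which is exactly the trade-off the beta weight controls.&lt;br /&gt;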
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The method begins with fuzzy clustering to extract representative colors (key colors) from the input image. These colors are mapped onto the CIE 1931 chromaticity diagram, where confusion lines represent loci of colors perceived as identical by individuals with CVD. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
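For concreteness, this is a standard point-to-line distance computation; a minimal Python sketch follows (the function name is ours; as a usage note, the protan copunctal point is approximately (0.7465, 0.2535) in CIE 1931 xy).&lt;br /&gt;

```python
import math

def dist_to_confusion_line(v, copunctal, ref):
    """Perpendicular distance from chromaticity v = (x_v, y_v) to the
    confusion line through the copunctal point and a reference point,
    using the formula above."""
    (xv, yv), (xcp, ycp), (x0, y0) = v, copunctal, ref
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    den = math.hypot(xcp - x0, ycp - y0)
    return num / den
```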
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. The translation process involves:&lt;br /&gt;
&lt;br /&gt;
1. Ranking key colors by their cluster sizes:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{rank}(v_i) = \frac{|A_i|}{\sum_{j=1}^{n_A}|A_j|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;|A_i|&amp;lt;/math&amp;gt; is the cardinality (number of pixels) of its associated cluster,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; is the total number of clusters.&lt;br /&gt;
&lt;br /&gt;
2. Translating the highest-ranked confusing color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; to its projection on the nearest non-occupied confusion line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
v^*_{\text{tr}} = \text{proj}(v^*, L^*),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*_{\text{tr}}&amp;lt;/math&amp;gt; is the new position of the color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; after translation,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L^*&amp;lt;/math&amp;gt; is the nearest non-occupied confusion line, determined as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(v^*, L^*) = \min_{L \in \text{CL}_D} d(v^*, L).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Updating the sets of confusing colors and non-occupied confusion lines iteratively:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Phi_V = \Phi_V - \{v^*\}, \quad \text{CL}_D = \text{CL}_D - \{L^*\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Phi_V&amp;lt;/math&amp;gt; is the set of confusing colors,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{CL}_D&amp;lt;/math&amp;gt; is the set of non-occupied confusion lines.&lt;br /&gt;
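Steps 1-3 above amount to a greedy assignment. The following Python sketch uses our own naming and represents each confusion line by its copunctal and reference points; it is an illustration of the iteration, not the authors' implementation.&lt;br /&gt;

```python
import numpy as np

def project_onto_line(v, p0, p1):
    """Orthogonal projection of chromaticity v onto the line through p0, p1."""
    v, p0, p1 = map(np.asarray, (v, p0, p1))
    d = p1 - p0
    t = np.dot(v - p0, d) / np.dot(d, d)
    return p0 + t * d

def shift_confusing_colors(ranked_colors, free_lines):
    """Greedy steps 1-3: for each confusing color (highest rank first),
    move it to its projection on the nearest non-occupied confusion
    line, then remove that line from the free set (updating CL_D)."""
    moved = {}
    free = list(free_lines)  # each line is a (copunctal, reference) pair
    for v in ranked_colors:
        if not free:
            break
        # nearest non-occupied line L*, by perpendicular distance
        best = min(free, key=lambda line: np.linalg.norm(
            np.asarray(v) - project_onto_line(v, *line)))
        moved[tuple(v)] = project_onto_line(v, *best)
        free.remove(best)
    return moved
```

Because each translated color consumes one confusion line, no two shifted key colors can end up on the same line, which is what restores their discriminability for the dichromat.&lt;br /&gt;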
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIEL*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
In more detail, the parameters of the GMM are initialized using the K-means algorithm and refined via the Expectation-Maximization (EM) algorithm, which alternates between an E-step and an M-step:&lt;br /&gt;
&lt;br /&gt;
The E-step calculates the probability of each color (or pixel) belonging to a specific Gaussian component in the GMM. This probability, also known as the &amp;quot;responsibility,&amp;quot; reflects how much each Gaussian contributes to the representation of a color:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(i|x_j, \Theta^{\text{old}}) = \frac{\omega_i G_i(x_j|\theta_i)}{\sum_{k=1}^K \omega_k G_k(x_j|\theta_k)}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p(i|x_j, \Theta^{\text{old}})&amp;lt;/math&amp;gt; is the probability of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt; belonging to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian component,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian, representing its overall contribution to the color distribution,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x_j|\theta_i)&amp;lt;/math&amp;gt; is the Gaussian distribution for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th component, evaluated at &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt;, where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix),&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\sum_{k=1}^K \omega_k G_k(x_j|\theta_k)&amp;lt;/math&amp;gt; normalizes the probabilities by considering the contributions of all &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussians to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th pixel.&lt;br /&gt;
&lt;br /&gt;
This step essentially assigns each pixel a &amp;quot;soft&amp;quot; membership to each Gaussian component, rather than forcing a hard clustering decision. Pixels that are close to a Gaussian&#039;s mean (in feature space) will have higher probabilities of belonging to that Gaussian.&lt;br /&gt;
&lt;br /&gt;
The M-step updates the parameters of each Gaussian component based on the probabilities computed in the E-step. These updates refine the Gaussian model to better fit the data:&lt;br /&gt;
&lt;br /&gt;
1. Update the mixing weights:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\omega_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})}{N},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation calculates the proportion of pixels assigned to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It reflects how dominant each Gaussian is in representing the color distribution.&lt;br /&gt;
&lt;br /&gt;
2. Update the means:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mu_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}}) x_j}{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation computes the new mean vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mu_i^{\text{new}}&amp;lt;/math&amp;gt; for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It is a weighted average of all pixel feature vectors &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt;, where the weights are the probabilities &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p(i|x_j, \Theta^{\text{old}})&amp;lt;/math&amp;gt;. Pixels with higher probabilities contribute more to the new mean.&lt;br /&gt;
&lt;br /&gt;
3. Update the covariance matrices:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Sigma_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}}) (x_j - \mu_i^{\text{new}})(x_j - \mu_i^{\text{new}})^T}{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation calculates the new covariance matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Sigma_i^{\text{new}}&amp;lt;/math&amp;gt; for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It measures the spread of pixel features around the new mean, weighted by the probabilities from the E-step.&lt;br /&gt;
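The E- and M-steps above can be sketched directly in NumPy. This is an illustrative, dense implementation (not the authors' code); it assumes pixel features &amp;lt;code&amp;gt;X&amp;lt;/code&amp;gt; of shape (N, 3) in CIEL*a*b* space, and it adds a small diagonal term to each covariance for numerical stability.&lt;br /&gt;

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, weights, means, covs):
    """One EM iteration for a K-component GMM on color features X of shape (N, 3)."""
    N, K = X.shape[0], len(weights)
    # E-step: responsibilities p(i | x_j) for every pixel j and component i
    resp = np.zeros((N, K))
    for i in range(K):
        resp[:, i] = weights[i] * multivariate_normal.pdf(X, means[i], covs[i])
    resp /= resp.sum(axis=1, keepdims=True)   # normalize over all K Gaussians
    # M-step: update weights, means, covariances from the responsibilities
    Nk = resp.sum(axis=0)                     # effective pixel count per component
    new_weights = Nk / N
    new_means = (resp.T @ X) / Nk[:, None]
    new_covs = []
    for i in range(K):
        d = X - new_means[i]
        new_covs.append((resp[:, i, None] * d).T @ d / Nk[i] + 1e-6 * np.eye(3))
    return new_weights, new_means, np.array(new_covs)
```

In practice the loop would be initialized from K-means centers, as the paper describes, and repeated until the log-likelihood converges.&lt;br /&gt;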
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The optimization aims to preserve the contrast perceived by CVD viewers while maintaining naturalness. Weights are assigned to Gaussian components based on the perceptual importance of colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\lambda_i = \frac{\sum_{j=1}^N \alpha_j p(i|x_j, \Theta)}{\sum_{k=1}^K \sum_{j=1}^N \alpha_j p(k|x_j, \Theta)},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha_j = \|x_j - \text{Sim}(x_j)\|&amp;lt;/math&amp;gt; is the perceptual error of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature when simulated for CVD,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{Sim}(\cdot)&amp;lt;/math&amp;gt; is the simulation function for CVD perception.&lt;br /&gt;
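For two Gaussian components, the symmetric KL divergence used here has a closed form; the helpers below are a straightforward NumPy sketch of that formula (the function names are our own, not from the paper).&lt;br /&gt;

```python
import numpy as np

def kl_gauss(mu1, cov1, mu2, cov2):
    """Closed-form KL divergence D_KL(N1 || N2) between multivariate Gaussians."""
    d = mu1.shape[0]
    inv2 = np.linalg.inv(cov2)
    diff = mu2 - mu1
    term_trace = np.trace(inv2 @ cov1)
    term_mahal = diff @ inv2 @ diff
    term_logdet = np.log(np.linalg.det(cov2) / np.linalg.det(cov1))
    return 0.5 * (term_trace + term_mahal - d + term_logdet)

def sym_kl(mu1, cov1, mu2, cov2):
    """Symmetric KL divergence D_sKL(G_i, G_j) used as the contrast measure."""
    return kl_gauss(mu1, cov1, mu2, cov2) + kl_gauss(mu2, cov2, mu1, cov1)
```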
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
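The interpolation step can be sketched as a responsibility-weighted blend of per-component hue shifts. The helper below is illustrative only, with hues flattened to one value per pixel; the name &amp;lt;code&amp;gt;recolor_hue&amp;lt;/code&amp;gt; is ours.&lt;br /&gt;

```python
import numpy as np

def recolor_hue(hues, resp, mu_hues, mapped_hues):
    """Soft interpolation of hue shifts (Step 4).

    hues:        (N,) original hue component of each pixel
    resp:        (N, K) responsibilities p(i | x_j) from the E-step
    mu_hues:     (K,) hue of each Gaussian mean mu_i
    mapped_hues: (K,) hue of each optimized (relocated) mean M_i(mu_i)
    """
    shifts = mapped_hues - mu_hues   # per-component hue displacement
    return hues + resp @ shifts      # responsibility-weighted blend per pixel
```

A pixel fully assigned to one component simply receives that component's hue shift; pixels with mixed membership get smooth transitions between recolored regions.&lt;br /&gt;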
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions. Sometimes the colors in the recolored images are not very natural.&lt;br /&gt;
* The high computational complexity in the optimization step of this algorithm may be difficult for real-time applications.&lt;br /&gt;
&lt;br /&gt;
==== Segmentation Guided Recoloring Method ====&lt;br /&gt;
One interesting method we found using machine learning algorithms is based on semantic segmentation. Chatzistamatis et al. [12] introduced a recoloring approach for digitized art paintings to enhance color perception for individuals with protanopia and deuteranopia. A key component of their method involves semantic segmentation, guided by transfer learning, to identify and preserve important visual elements in art paintings.&lt;br /&gt;
&lt;br /&gt;
The segmentation process leverages the Mask R-CNN architecture, utilizing transfer learning to adapt from natural image datasets to the domain of art paintings. This adaptation involves the following steps:&lt;br /&gt;
&lt;br /&gt;
1. Preprocessing and augmentation: images are preprocessed with techniques such as horizontal flips, random cropping, Gaussian blur, and affine transformations to enhance the diversity of the training set.&lt;br /&gt;
&lt;br /&gt;
2. Feature extraction: the ResNet-101 backbone, pre-trained on the ImageNet dataset, extracts features from input images. Lower layers detect basic features like edges, while higher layers identify complex structures such as objects within paintings.&lt;br /&gt;
&lt;br /&gt;
3. Region proposal: the Region Proposal Network (RPN) identifies regions of interest (RoIs) in the feature maps using sliding windows. These RoIs are further refined into accurate object boundaries.&lt;br /&gt;
&lt;br /&gt;
4. Object masking: masks for the identified objects are generated to preserve fine-grained details of the paintings.&lt;br /&gt;
&lt;br /&gt;
This semantic segmentation process divides the image pixels into two disjoint sets:&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_V&amp;lt;/math&amp;gt;, pixels belonging to segmented objects, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_U&amp;lt;/math&amp;gt;, background pixels outside segmented objects.&lt;br /&gt;
&lt;br /&gt;
By separating these sets, the algorithm focuses recoloring efforts on regions that are visually significant while maintaining the natural appearance of the background.&lt;br /&gt;
&lt;br /&gt;
The recoloring process modifies the colors in the segmented object set &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_V&amp;lt;/math&amp;gt; while leaving the background set &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_U&amp;lt;/math&amp;gt; largely intact. Key steps include:&lt;br /&gt;
&lt;br /&gt;
1. Color simulation: colors in &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_V&amp;lt;/math&amp;gt; are transformed to simulate the perception of dichromatic viewers, enabling identification of indistinguishable colors.&lt;br /&gt;
&lt;br /&gt;
2. Color clustering: fuzzy c-means clustering groups colors into clusters for efficient manipulation. Cluster centers, or &amp;quot;key colors,&amp;quot; are adjusted to reduce color confusion while preserving visual coherence.&lt;br /&gt;
&lt;br /&gt;
3. Recoloring optimization: an objective function is minimized to enhance contrast and naturalness, similar to the objective functions in the methods mentioned above:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
   E = E_1 + E_2 + cE_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt; preserves the contrast between object colors and background colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \sum_{p \in T_V, q \in T_U} \|f(p) - f(q)\| - \|p - q\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt; enhances contrast within object colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \sum_{p, q \in T_V} \|f(p) - f(q)\| - \|p - q\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt; minimizes the perceptual difference between original and recolored key colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \sum_{p \in T_V} \|f(p) - p\|^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;q&amp;lt;/math&amp;gt; are pixel values in the image.&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f(p)&amp;lt;/math&amp;gt; is the recolored pixel value for pixel &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p&amp;lt;/math&amp;gt;.&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\| \cdot \|&amp;lt;/math&amp;gt; represents the Euclidean distance in the perceptual color space.&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;c&amp;lt;/math&amp;gt; is a weighting factor that controls the importance of naturalness preservation relative to contrast enhancement.&lt;br /&gt;
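Under the definitions above, the combined objective can be evaluated for a small set of sampled key colors as below. This is a dense, illustrative sketch implementing the stated formulas term by term (the function name and the sampling of key colors are our own), not the authors' optimizer.&lt;br /&gt;

```python
import numpy as np

def objective(f_TV, p_TV, q_TU, c=0.5):
    """Evaluate E = E1 + E2 + c*E3 over sampled key colors.

    f_TV: (n, 3) recolored object key colors f(p)
    p_TV: (n, 3) original object key colors p
    q_TU: (m, 3) background key colors q (left unchanged)
    """
    # E1: contrast between recolored object colors and background colors,
    # relative to the original object/background contrast
    d_new = np.linalg.norm(f_TV[:, None, :] - q_TU[None, :, :], axis=-1)
    d_old = np.linalg.norm(p_TV[:, None, :] - q_TU[None, :, :], axis=-1)
    E1 = np.sum(d_new - d_old)
    # E2: contrast among recolored object colors vs. the original contrast
    d_new2 = np.linalg.norm(f_TV[:, None, :] - f_TV[None, :, :], axis=-1)
    d_old2 = np.linalg.norm(p_TV[:, None, :] - p_TV[None, :, :], axis=-1)
    E2 = np.sum(d_new2 - d_old2)
    # E3: squared distance between recolored and original colors (naturalness)
    E3 = np.sum(np.linalg.norm(f_TV - p_TV, axis=-1) ** 2)
    return E1 + E2 + c * E3
```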
&lt;br /&gt;
While the method effectively balances contrast and naturalness for CVD viewers, it has several limitations:&lt;br /&gt;
* The success of segmentation relies heavily on the pre-trained Mask R-CNN model, which may not generalize well to all styles of art or real-life images.&lt;br /&gt;
* Semantic segmentation and optimization introduce significant computational overhead, making the method slow and less suitable for real-time applications.&lt;br /&gt;
* Errors in segmentation are difficult to control and may lead to misclassification of visually important regions, resulting in suboptimal recoloring.&lt;br /&gt;
* The focus of this method is restricted to protanopia and deuteranopia, with no flexibility for personalization.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], Generative Adversarial Networks (GANs) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This specific reference also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (SWIN) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations with a unique triple-latent structure, the method enables continuous personalization, adjusting images to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, it took around 18 days for one epoch (one iteration over the entire dataset).&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in further sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and integrate the user label naturally, they require an existing ground-truth example of the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information while training. They are generally better at generating more natural images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We first gathered the open-source RGB image dataset from [2]: to improve their model&#039;s ability to enhance the contrast between CVD-indistinguishable color pairs, the authors created an unlabeled dataset of 141,000 pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 images and recolored them by simulating random labels for CVD severity and type. The recoloring for ground-truth images was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org Scipy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
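The label encoding above can be illustrated with a small helper (&amp;lt;code&amp;gt;cvd_label&amp;lt;/code&amp;gt; is a hypothetical name of our own, not part of the project code):&lt;br /&gt;

```python
import numpy as np

def cvd_label(kind, severity):
    """Encode a user's condition as severity * [protan, deutan].

    kind: 'protan' or 'deutan'; severity ranges from 0.1 to 1.0.
    """
    base = {'protan': np.array([1.0, 0.0]), 'deutan': np.array([0.0, 1.0])}
    return severity * base[kind]
```

For example, protanopia at 60% severity yields the label [0.6, 0], matching the example in the text.&lt;br /&gt;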
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately, using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to the three parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. Their outputs are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error (MSE) loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground-truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
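A framework-agnostic sketch of this pixel-wise MSE (PyTorch&#039;s built-in &amp;lt;code&amp;gt;nn.MSELoss&amp;lt;/code&amp;gt; computes the same quantity):&lt;br /&gt;

```python
import numpy as np

def mse_loss(output, target):
    """Pixel-wise mean-squared error between model output and ground truth.

    output, target: arrays of shape (H, W, 3) with values in [0, 1].
    """
    return np.mean((output - target) ** 2)
```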
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
Using the same inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and the decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) and ground-truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: the l-th layer (feature map) of the pre-trained VGG network&lt;br /&gt;
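Given per-layer feature maps from a pre-trained VGG, the perceptual loss reduces to a sum of mean squared feature distances. The sketch below is framework-agnostic and takes the already-extracted features as inputs (in practice these would come from a frozen torchvision VGG).&lt;br /&gt;

```python
import numpy as np

def perceptual_loss(feats_out, feats_gt):
    """VGG-style perceptual loss: mean squared distance between feature maps
    phi_l(I) and phi_l(I'), summed over layers l.

    feats_out, feats_gt: lists of per-layer feature arrays for the model
    output and the ground-truth image respectively.
    """
    total = 0.0
    for f_o, f_g in zip(feats_out, feats_gt):
        # mean over all elements corresponds to the 1/N_l normalization
        total += np.mean((f_o - f_g) ** 2)
    return total
```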
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y\rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math&amp;gt;x, y&amp;lt;/math&amp;gt;: Two distinct pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image,&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output),&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: Size of the global (large) window of the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: Size of the local window (neighborhood) around pixel &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural color distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored images respectively&lt;br /&gt;
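Putting the pieces together, the contrast term CL and the weighted combination can be sketched as follows. This is a dense, illustrative version over flattened (N, 3) color arrays; the function names are our own.&lt;br /&gt;

```python
import numpy as np

def contrast_loss_terms(sim_orig, sim_rec):
    """Mean of CL(x, y) over all pixel pairs, for CVD-simulated original and
    recolored colors flattened to (N, 3) arrays (a dense, global version)."""
    d_rec = np.linalg.norm(sim_rec[:, None, :] - sim_rec[None, :, :], axis=-1)
    d_orig = np.linalg.norm(sim_orig[:, None, :] - sim_orig[None, :, :], axis=-1)
    cl = d_rec - d_orig          # CL(x, y) for every pixel pair
    return cl.mean()

def total_loss(l_natural, l_global, l_local, alpha=0.25, beta=1.0):
    """Combine the loss components with the weights used in this report."""
    l_contrast = beta * l_global + (2 - beta) * l_local
    return alpha * l_natural + 2 * (1 - alpha) * l_contrast
```

Note that a positive mean CL indicates increased pairwise contrast in the CVD-simulated recolored image, so in training the contrast terms would enter the minimized objective with the appropriate sign.&lt;br /&gt;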
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics adapted from [1] and [2], such as the Structural Similarity Index (SSIM), total color contrast (TCC), Chromatic Difference (CD), and inference time, were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results: for the supervised methods, &#039;expected&#039; means how closely the outputs resemble the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored images show compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored versus original images were similar or worse, meaning the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images (possibly because naturalness was still effectively prioritized, even though the weight coefficients in the loss favored contrast: alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we computed metrics for comparison with related work only for the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]. Please refer to the definitions in the paper, as we have used the same. On a high level, the three components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(&amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a constant, not a wavelength; l, a, b are the LAB-space coordinates of the recolored (&#039;) and original images respectively.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
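The SSIM and CD formulas above can be computed directly with NumPy. The sketch below uses a single-window SSIM for illustration (a full evaluation would typically slide a small window over the image, e.g. with &amp;lt;code&amp;gt;skimage.metrics.structural_similarity&amp;lt;/code&amp;gt;), and the value of &amp;lt;code&amp;gt;lam&amp;lt;/code&amp;gt; is an illustrative choice for the constant &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt;.&lt;br /&gt;

```python
import numpy as np

def ssim_global(X, Y, c1=1e-4, c2=9e-4):
    """Single-window SSIM from the means, variances, and covariance of two
    images (real evaluations usually average local SSIM over a sliding window)."""
    mu_x, mu_y = X.mean(), Y.mean()
    cov_xy = ((X - mu_x) * (Y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (X.var() + Y.var() + c2)
    return num / den

def chromatic_difference(lab_rec, lab_orig, lam=0.2):
    """Per-pixel CD(i) in LAB space; lam is the constant lambda that
    down-weights lightness differences (0.2 is an illustrative value)."""
    d = lab_rec - lab_orig
    return np.sqrt(lam * d[..., 0]**2 + d[..., 1]**2 + d[..., 2]**2)
```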
&lt;br /&gt;
The key results are in Table 1 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Table 1: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates the corresponding result reported in paper [2] (the larger of its protan and deutan values).&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but fall short of paper [2], which optimizes a separate network for each CVD type.&lt;br /&gt;
* Outputs remain somewhat blurry (the loss does not weight structural fidelity, as measured by SSIM, strongly enough).&lt;br /&gt;
* Handling multiple CVD types in a single network needs a more sophisticated conditioning scheme.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate (about 2.6 seconds per image), which is promising but still short of real-time. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from the Swin architecture [2] added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. JOSA A, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
- Upload source code, test images, etc, and give a description of each link. In some cases, your acquired data may be too large to store practically. In this case, use your judgement (or consult one of us) and only link the most relevant data. Be sure to describe the purpose of your code and to edit the code for clarity. The purpose of placing the code online is to allow others to verify your methods and to learn from your ideas.&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for each of MLP, U-Net and Autoencoder&lt;br /&gt;
* GMM-based recolorization (adapting from [4]) &amp;amp; adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup &amp;amp; configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60511</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60511"/>
		<updated>2024-12-13T05:11:02Z</updated>

		<summary type="html">&lt;p&gt;Rainas: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix. For example, the correction matrix for protanopia, as implemented in tools like Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix for Protanopia} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The correction is then added back to the original RGB values to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
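A minimal daltonization sketch, assuming a dichromat simulator is available (the `simulate` argument below is a hypothetical placeholder for a Brettel-style simulator such as the one in [5]; for brevity the error is computed directly in RGB, but the LMS-space version follows the same pattern):

```python
import numpy as np

# Protanopia correction matrix quoted above.
CORRECTION_PROTAN = np.array([[0.0, 0.0, 0.0],
                              [0.7, 1.0, 0.0],
                              [0.7, 0.0, 1.0]])

def daltonize(rgb, simulate, correction=CORRECTION_PROTAN):
    """Redistribute the information lost to the simulated deficiency into
    the channels the viewer can still see."""
    rgb = np.asarray(rgb, dtype=np.float64)
    error = rgb - simulate(rgb)        # the part the CVD viewer cannot see
    compensation = error @ correction.T
    return np.clip(rgb + compensation, 0.0, 1.0)
```

With a perfect observer (`simulate` as the identity) the error is zero and the image is returned unchanged, which is a useful sanity check.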
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, colors are first transformed into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
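Putting the pieces together, the dichromat simulation can be sketched with the matrices quoted above (mapping back to RGB via the matrix inverse is an illustrative assumption; this is a generic sketch, not the exact code of [5]):

```python
import numpy as np

RGB2LMS = np.array([[0.3904725,  0.54990437, 0.00890159],
                    [0.07092586, 0.96310739, 0.00135809],
                    [0.02314268, 0.12801221, 0.93605194]])

# (index of the missing cone, replacement weights over [L, M, S]) per type
REPLACE = {
    "protan": (0, [0.0, 0.90822864, 0.008192]),
    "deutan": (1, [1.10104433, 0.0, -0.00901975]),
    "tritan": (2, [-0.15773032, 1.19465634, 0.0]),
}

def simulate_cvd(rgb, kind="protan"):
    """Transform to LMS, replace the missing cone's response with the
    weighted combination above, and map back to RGB."""
    rgb = np.asarray(rgb, dtype=np.float64)
    lms = rgb @ RGB2LMS.T
    idx, row = REPLACE[kind]
    sim = np.eye(3)
    sim[idx] = row                      # overwrite the missing cone's row
    return (lms @ sim.T) @ np.linalg.inv(RGB2LMS).T
```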
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
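The two terms combine into a single objective that can be sketched for a small set of colors as follows (beta = 1.0 below is a placeholder weight, and the contrast sum is taken over all ordered pairs i &amp;ne; j):

```python
import numpy as np

def recoloring_loss(c_orig, c_rec, beta=1.0):
    """E = beta * E_nat + E_cont over (N, 3) color arrays."""
    c_orig = np.asarray(c_orig, dtype=np.float64)
    c_rec = np.asarray(c_rec, dtype=np.float64)
    e_nat = np.sum((c_rec - c_orig) ** 2)
    d_rec = c_rec[:, None, :] - c_rec[None, :, :]    # pairwise differences
    d_orig = c_orig[:, None, :] - c_orig[None, :, :]
    e_cont = np.sum((d_rec - d_orig) ** 2)           # i == j terms vanish
    return beta * e_nat + e_cont
```

Note that a uniform shift of every color leaves all pairwise contrasts intact, so only the naturalness term penalizes it.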
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
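The degree-adaptable objective has the same shape, with every difference viewed through the simulation matrix T and the naturalness term weighted per pixel; a sketch (T and alpha would come from the CVD model and the perception errors, respectively):

```python
import numpy as np

def degree_adaptive_loss(c_orig, c_rec, T, alpha, beta=1.0):
    """E = beta * sum_i alpha_i ||T(c_i^+ - c_i)||^2
           + sum_{i != j} ||T(c_i^+ - c_j^+) - T(c_i - c_j)||^2."""
    c_orig = np.asarray(c_orig, dtype=np.float64)
    c_rec = np.asarray(c_rec, dtype=np.float64)
    d_nat = (c_rec - c_orig) @ T.T
    e_nat = np.sum(np.asarray(alpha) * np.sum(d_nat ** 2, axis=1))
    d_rec = (c_rec[:, None, :] - c_rec[None, :, :]) @ T.T
    d_orig = (c_orig[:, None, :] - c_orig[None, :, :]) @ T.T
    e_cont = np.sum((d_rec - d_orig) ** 2)
    return beta * e_nat + e_cont
```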
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The method begins with fuzzy clustering to extract representative colors (key colors) from the input image. These colors are mapped onto the CIE 1931 chromaticity diagram, where confusion lines represent loci of colors perceived as identical by individuals with CVD. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
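This is the standard perpendicular point-to-line distance applied to chromaticity coordinates; a direct sketch (the copunctal points themselves depend on the CVD type and are tabulated in [10]):

```python
import numpy as np

def confusion_line_distance(v, copunctal, ref):
    """Perpendicular distance d(v, L) from chromaticity v = (x_v, y_v) to
    the confusion line L through the copunctal point and a reference point."""
    (xv, yv), (xcp, ycp), (x0, y0) = v, copunctal, ref
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    den = np.hypot(xcp - x0, ycp - y0)
    return num / den
```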
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. The translation process involves:&lt;br /&gt;
&lt;br /&gt;
1. Ranking key colors by their cluster sizes:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{rank}(v_i) = \frac{|A_i|}{\sum_{j=1}^{n_A}|A_j|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;|A_i|&amp;lt;/math&amp;gt; is the cardinality (number of pixels) of its associated cluster,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; is the total number of clusters.&lt;br /&gt;
&lt;br /&gt;
2. Translating the highest-ranked confusing color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; to its projection on the nearest non-occupied confusion line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
v^*_{\text{tr}} = \text{proj}(v^*, L^*),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*_{\text{tr}}&amp;lt;/math&amp;gt; is the new position of the color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; after translation,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L^*&amp;lt;/math&amp;gt; is the nearest non-occupied confusion line, determined as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(v^*, L^*) = \min_{L \in \text{CL}_D} d(v^*, L).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Updating the sets of confusing colors and non-occupied confusion lines iteratively:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Phi_V = \Phi_V - \{v^*\}, \quad \text{CL}_D = \text{CL}_D - \{L^*\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Phi_V&amp;lt;/math&amp;gt; is the set of confusing colors,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{CL}_D&amp;lt;/math&amp;gt; is the set of non-occupied confusion lines.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIEL*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
In more detail, the parameters of the GMM are initialized using the K-means algorithm and refined via the Expectation-Maximization (EM) algorithm, which consists of the E-step and the M-step:&lt;br /&gt;
&lt;br /&gt;
The E-step calculates the probability of each color (or pixel) belonging to a specific Gaussian component in the GMM. This probability, also known as the &amp;quot;responsibility,&amp;quot; reflects how much each Gaussian contributes to the representation of a color:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(i|x_j, \Theta^{\text{old}}) = \frac{\omega_i G_i(x_j|\theta_i)}{\sum_{k=1}^K \omega_k G_k(x_j|\theta_k)}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p(i|x_j, \Theta^{\text{old}})&amp;lt;/math&amp;gt; is the probability of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt; belonging to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian component,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian, representing its overall contribution to the color distribution,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x_j|\theta_i)&amp;lt;/math&amp;gt; is the Gaussian distribution for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th component, evaluated at &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt;, where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix),&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\sum_{k=1}^K \omega_k G_k(x_j|\theta_k)&amp;lt;/math&amp;gt; normalizes the probabilities by considering the contributions of all &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussians to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th pixel.&lt;br /&gt;
&lt;br /&gt;
This step essentially assigns each pixel a &amp;quot;soft&amp;quot; membership to each Gaussian component, rather than forcing a hard clustering decision. Pixels that are close to a Gaussian&#039;s mean (in feature space) will have higher probabilities of belonging to that Gaussian.&lt;br /&gt;
&lt;br /&gt;
The M-step updates the parameters of each Gaussian component based on the probabilities computed in the E-step. These updates refine the Gaussian model to better fit the data:&lt;br /&gt;
&lt;br /&gt;
1. Update the mixing weights:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\omega_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})}{N},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation calculates the proportion of pixels assigned to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It reflects how dominant each Gaussian is in representing the color distribution.&lt;br /&gt;
&lt;br /&gt;
2. Update the means:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mu_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}}) x_j}{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation computes the new mean vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mu_i^{\text{new}}&amp;lt;/math&amp;gt; for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It is a weighted average of all pixel feature vectors &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt;, where the weights are the probabilities &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p(i|x_j, \Theta^{\text{old}})&amp;lt;/math&amp;gt;. Pixels with higher probabilities contribute more to the new mean.&lt;br /&gt;
&lt;br /&gt;
3. Update the covariance matrices:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Sigma_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}}) (x_j - \mu_i^{\text{new}})(x_j - \mu_i^{\text{new}})^T}{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation calculates the new covariance matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Sigma_i^{\text{new}}&amp;lt;/math&amp;gt; for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It measures the spread of pixel features around the new mean, weighted by the probabilities from the E-step.&lt;br /&gt;
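The E-step and M-step updates above can be condensed into a short NumPy sketch (a minimal illustration assuming diagonal covariances, as the algorithm does; the function name and toy sizes are ours):&lt;br /&gt;

```python
import numpy as np

def em_step(X, weights, means, covs):
    """One EM iteration for a GMM with diagonal covariances,
    following the E-step / M-step updates above.
    X: (N, D) color features; covs: (K, D) diagonal variances."""
    N, D = X.shape
    K = len(weights)
    # E-step: responsibilities p(i | x_j) for each pixel and component
    resp = np.zeros((N, K))
    for i in range(K):
        diff = X - means[i]
        norm = np.prod(2 * np.pi * covs[i]) ** -0.5  # diagonal Gaussian normalizer
        resp[:, i] = weights[i] * norm * np.exp(-0.5 * np.sum(diff**2 / covs[i], axis=1))
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: update weights, means, and covariances from the responsibilities
    Nk = resp.sum(axis=0)
    new_weights = Nk / N
    new_means = (resp.T @ X) / Nk[:, None]
    new_covs = np.stack([
        (resp[:, i, None] * (X - new_means[i])**2).sum(axis=0) / Nk[i]
        for i in range(K)
    ])
    return new_weights, new_means, new_covs
```

In practice the iteration is repeated from K-means initial values until the log-likelihood stops improving.&lt;br /&gt;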
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The optimization aims to preserve the contrast perceived by CVD viewers while maintaining naturalness. Weights are assigned to Gaussian components based on the perceptual importance of colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\lambda_i = \frac{\sum_{j=1}^N \alpha_j p(i|x_j, \Theta)}{\sum_{k=1}^K \sum_{j=1}^N \alpha_j p(k|x_j, \Theta)},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha_j = \|x_j - \text{Sim}(x_j)\|&amp;lt;/math&amp;gt; is the perceptual error of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature when simulated for CVD,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{Sim}(\cdot)&amp;lt;/math&amp;gt; is the simulation function for CVD perception.&lt;br /&gt;
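The symmetric KL divergence between two Gaussian components has a well-known closed form, which the optimization evaluates repeatedly; a minimal NumPy sketch (function names are ours):&lt;br /&gt;

```python
import numpy as np

def kl_gauss(mu0, cov0, mu1, cov1):
    """Closed-form KL divergence D_KL(N0 || N1) for full-covariance Gaussians."""
    D = len(mu0)
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    term = np.trace(inv1 @ cov0) + diff @ inv1 @ diff - D
    logdet = np.log(np.linalg.det(cov1) / np.linalg.det(cov0))
    return 0.5 * (term + logdet)

def sym_kl(mu0, cov0, mu1, cov1):
    """Symmetric KL divergence D_sKL(G_i, G_j) used to measure
    the contrast between two Gaussian components."""
    return kl_gauss(mu0, cov0, mu1, cov1) + kl_gauss(mu1, cov1, mu0, cov0)
```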
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
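The interpolation step then reduces to a responsibility-weighted sum of the mean shifts, as in the equation for &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; above; a minimal sketch (array shapes and names are ours):&lt;br /&gt;

```python
import numpy as np

def recolor_hue(x_hue, resp, mu_hue, mapped_mu_hue):
    """Soft interpolation of the hue adjustment: each pixel's hue is moved
    by the responsibility-weighted shifts of the Gaussian means.
    x_hue: (N,) pixel hues; resp: (N, K) responsibilities p(i | x_j);
    mu_hue, mapped_mu_hue: (K,) original and mapped mean hues."""
    shifts = mapped_mu_hue - mu_hue   # per-component hue shift
    return x_hue + resp @ shifts      # weighted sum over components
```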
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions. Sometimes the colors in the recolored images are not very natural.&lt;br /&gt;
* The high computational complexity in the optimization step of this algorithm may be difficult for real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== Segmentation Guided Recoloring Method ====&lt;br /&gt;
One interesting method we found using machine learning algorithms is based on semantic segmentation. Chatzistamatis et al. [12] introduced a recoloring approach for digitized art paintings to enhance color perception for individuals with protanopia and deuteranopia. A key component of their method involves semantic segmentation, guided by transfer learning, to identify and preserve important visual elements in art paintings.&lt;br /&gt;
&lt;br /&gt;
The segmentation process leverages the Mask R-CNN architecture, utilizing transfer learning to adapt from natural image datasets to the domain of art paintings. This adaptation involves the following steps:&lt;br /&gt;
&lt;br /&gt;
1. Preprocessing and augmentation: images are preprocessed with techniques such as horizontal flips, random cropping, Gaussian blur, and affine transformations to enhance the diversity of the training set.&lt;br /&gt;
&lt;br /&gt;
2. Feature extraction: the ResNet-101 backbone, pre-trained on the ImageNet dataset, extracts features from input images. Lower layers detect basic features like edges, while higher layers identify complex structures such as objects within paintings.&lt;br /&gt;
&lt;br /&gt;
3. Region proposal: the Region Proposal Network (RPN) identifies regions of interest (RoIs) in the feature maps using sliding windows. These RoIs are further refined into accurate object boundaries.&lt;br /&gt;
&lt;br /&gt;
4. Object masking: masks for the identified objects are generated to preserve fine-grained details of the paintings.&lt;br /&gt;
&lt;br /&gt;
This semantic segmentation process divides the image pixels into two disjoint sets:&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_V&amp;lt;/math&amp;gt;, pixels belonging to segmented objects, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_U&amp;lt;/math&amp;gt;, background pixels outside segmented objects.&lt;br /&gt;
&lt;br /&gt;
By separating these sets, the algorithm focuses recoloring efforts on regions that are visually significant while maintaining the natural appearance of the background.&lt;br /&gt;
&lt;br /&gt;
The recoloring process modifies the colors in the segmented object set &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_V&amp;lt;/math&amp;gt; while leaving the background set &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_U&amp;lt;/math&amp;gt; largely intact. Key steps include:&lt;br /&gt;
&lt;br /&gt;
1. Color simulation: colors in &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_V&amp;lt;/math&amp;gt; are transformed to simulate the perception of dichromatic viewers, enabling identification of indistinguishable colors.&lt;br /&gt;
&lt;br /&gt;
2. Color clustering: fuzzy c-means clustering groups colors into clusters for efficient manipulation. Cluster centers, or &amp;quot;key colors,&amp;quot; are adjusted to reduce color confusion while preserving visual coherence.&lt;br /&gt;
&lt;br /&gt;
3. Recoloring optimization: an objective function is minimized to enhance contrast and naturalness, similar to the objective functions in the methods mentioned above:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
   E = E_1 + E_2 + cE_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt; preserves the contrast between object colors and background colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \sum_{p \in T_V, q \in T_U} \|f(p) - f(q)\| - \|p - q\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt; enhances contrast within object colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \sum_{p, q \in T_V} \|f(p) - f(q)\| - \|p - q\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt; minimizes the perceptual difference between original and recolored key colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \sum_{p \in T_V} \|f(p) - p\|^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;q&amp;lt;/math&amp;gt; are pixel values in the image.&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f(p)&amp;lt;/math&amp;gt; is the recolored pixel value for pixel &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p&amp;lt;/math&amp;gt;.&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\| \cdot \|&amp;lt;/math&amp;gt; represents the Euclidean distance in the perceptual color space.&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;c&amp;lt;/math&amp;gt; is a weighting factor that controls the importance of naturalness preservation relative to contrast enhancement.&lt;br /&gt;
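The three energy terms can be evaluated directly on small sets of key colors; a minimal NumPy sketch (the pairwise-sum form and the value of c here are illustrative, not the paper&#039;s exact implementation):&lt;br /&gt;

```python
import numpy as np

def recolor_energy(TV, TU, f_TV, f_TU, c=0.5):
    """Evaluate E = E1 + E2 + c*E3 on small sets of key colors.
    TV / TU: original object / background colors, shape (n, 3);
    f_TV / f_TU: their recolored versions in the same color space."""
    def contrast(a, b, fa, fb):
        # sum over pairs of (recolored distance minus original distance);
        # self-pairs contribute zero and are kept for simplicity
        d_new = np.linalg.norm(fa[:, None] - fb[None], axis=-1)
        d_old = np.linalg.norm(a[:, None] - b[None], axis=-1)
        return np.sum(d_new - d_old)
    E1 = contrast(TV, TU, f_TV, f_TU)                    # object vs background
    E2 = contrast(TV, TV, f_TV, f_TV)                    # within objects
    E3 = np.sum(np.linalg.norm(f_TV - TV, axis=1) ** 2)  # naturalness term
    return E1 + E2 + c * E3
```

An identity recoloring (f(p) = p) makes every term vanish, which is a quick sanity check for the objective.&lt;br /&gt;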
&lt;br /&gt;
While the method effectively balances contrast and naturalness for CVD viewers, it has several limitations:&lt;br /&gt;
* The success of segmentation relies heavily on the pre-trained Mask R-CNN model, which may not generalize well to all styles of art or real-life images.&lt;br /&gt;
* Semantic segmentation and optimization introduce significant computational overhead, making the method slow and less suitable for real-time applications.&lt;br /&gt;
* Errors in segmentation are difficult to control and may lead to misclassification of visually important regions, resulting in suboptimal recoloring.&lt;br /&gt;
* The focus of this method is restricted to protanopia and deuteranopia, without any flexibility for personalization.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbone structures showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This specific reference also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (SWIN) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations using a unique triple-latent structure in their method, continuous personalization was possible to adjust images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, it took around 18 days for one epoch (or one iteration over the entire dataset).&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in further sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and integrate the user label naturally, they require an already available ground-truth example of the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information while training. They are generally better at generating more natural images, but they require more compute and sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We first gathered an open-source RGB image dataset from [2]: to improve their model&#039;s ability to enhance the contrast between CVD-indistinguishable color pairs, the authors created a dataset of 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 images and recolored them by simulating random labels for the severity and type of CVD. The ground-truth recoloring was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org Scipy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt;256x256&amp;lt;/code&amp;gt; pixels and normalized to the &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
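As a concrete illustration of the label encoding described above, a hypothetical helper (the function name is ours):&lt;br /&gt;

```python
import numpy as np

def cvd_label(kind, severity):
    """Encode a user's condition as severity * [protan, deutan];
    e.g. cvd_label('protan', 0.6) gives [0.6, 0.0] for 60% protanopia."""
    assert severity >= 0.1 and 1.0 >= severity, "severity ranges from 0.1 to 1.0"
    base = {"protan": np.array([1.0, 0.0]), "deutan": np.array([0.0, 1.0])}
    return severity * base[kind]
```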
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately, using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to 3 parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. The outputs of these networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error (MSE) loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground-truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
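A toy PyTorch sketch of the parallel-head idea, with a 1-D flattened image standing in for the 256x256 input (layer widths and names here are illustrative, not the project&#039;s actual sizes):&lt;br /&gt;

```python
import torch
import torch.nn as nn

class ParallelRGBMLP(nn.Module):
    """Sketch of the conditional parallel MLP: the flattened image plus the
    2-dim CVD label feeds three independent MLPs, one per output channel."""
    def __init__(self, n_pixels=64, label_dim=2, hidden=128):
        super().__init__()
        in_dim = 3 * n_pixels + label_dim
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, n_pixels))
            for _ in range(3)  # one head each for R, G, B
        ])

    def forward(self, img, label):
        # concatenate the flattened image with the CVD label encoding
        x = torch.cat([img.flatten(1), label], dim=1)
        # each head predicts one channel; stack back into an RGB image
        return torch.stack([h(x) for h in self.heads], dim=1)
```

The pixel-wise MSE loss above is then applied between this output and the ground-truth recolored image.&lt;br /&gt;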
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
In a similar fashion of inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: The recolored (model output) and ground-truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: The feature map of the l-th layer of the pre-trained VGG network&lt;br /&gt;
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x, y&amp;lt;/math&amp;gt;: Two distinct pixels in the image,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output),&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: Size of the global (or large) window of the image,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: Size of the local window or neighborhood around pixel &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt;.&lt;br /&gt;
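Putting the pieces together, a minimal NumPy sketch of CL and the two contrast terms on a tiny image (the quantity is signed; during training its negation is what encourages increased contrast; names and the neighborhood definition are ours):&lt;br /&gt;

```python
import numpy as np

def contrast_losses(sim_orig, sim_recolored, radius=1):
    """Global and local contrast terms on a small H x W x 3 array of
    CVD-simulated colors. CL(x, y) is the recolored pairwise distance
    minus the original pairwise distance; the global term averages CL
    over all pixel pairs (self-pairs contribute zero), the local term
    over each pixel's spatial neighborhood of the given radius."""
    H, W, _ = sim_orig.shape
    orig = sim_orig.reshape(-1, 3)
    rec = sim_recolored.reshape(-1, 3)
    d_orig = np.linalg.norm(orig[:, None] - orig[None], axis=-1)
    d_rec = np.linalg.norm(rec[:, None] - rec[None], axis=-1)
    cl = d_rec - d_orig                  # CL(x, y) for every pixel pair
    global_loss = cl.mean()
    # Chebyshev-distance neighborhood mask for the local term
    ii, jj = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    coords = np.stack([ii.ravel(), jj.ravel()], axis=1)
    cheb = np.abs(coords[:, None] - coords[None]).max(axis=-1)
    neighbor = np.logical_and(cheb > 0, radius >= cheb)
    local_loss = (cl * neighbor).sum() / neighbor.sum()
    return global_loss, local_loss
```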
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored images respectively&lt;br /&gt;
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), total color contrast (TCC), Chromatic Difference (CD), and inference time, as provided in [1] and [2], were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the outputs resemble the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness is observed in the CVD-simulated recolored images compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of recolored versus original were similar or worse, meaning that the model was not doing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness in the recolored images was seen (possibly due to naturalness factor being more prioritized even though weight coefficients in the loss term favored contrast (alpha = 0.25, beta = 1.0)).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to score and evaluate metrics for comparison with related work only using the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]. Please refer to the definitions in the paper, as we have used the same. On a high level, the three components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a constant, not a wavelength, and l, a, b represent the LAB-space coordinates of the recolored (&#039;) and original images respectively.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
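Two of these metrics are easy to sketch directly from their formulas (a single-window SSIM rather than the usual sliding-window variant, and an illustrative lambda value for CD):&lt;br /&gt;

```python
import numpy as np

def ssim_global(X, Y, c1=0.01**2, c2=0.03**2):
    """Single-window SSIM on images scaled to [0, 1], directly following
    the formula above (practical implementations average over local windows)."""
    mx, my = X.mean(), Y.mean()
    cov = ((X - mx) * (Y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx**2 + my**2 + c1) * (X.var() + Y.var() + c2)
    return num / den

def chromatic_difference(lab_orig, lab_rec, lam=0.5):
    """Per-pixel CD in CIELAB; lightness differences are weighted by the
    constant lambda (0.5 is an illustrative value, not from the paper)."""
    d = lab_rec - lab_orig
    return np.sqrt(lam * d[..., 0]**2 + d[..., 1]**2 + d[..., 2]**2)
```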
&lt;br /&gt;
The key results are in Table 1 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|+ &#039;&#039;&#039;Table 1: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2] for protan/deutan whichever is larger.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good, but not as good as in paper [2], because they optimize separate networks for each CVD type.&lt;br /&gt;
* Outputs are blurry (SSIM is not optimized for strongly enough).&lt;br /&gt;
* Mixing CVD types in the same network needs a more sophisticated approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned a couple of things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
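To illustrate how the SSIM metric reported in Table 1 behaves, here is a single-window ("global") SSIM in plain NumPy. This is a simplified sketch, not the evaluation code used in the project: the standard metric averages this statistic over local windows, and `global_ssim` is a name introduced here.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over a whole grayscale image.
    The standard metric averages this over local windows."""
    c1 = (0.01 * data_range) ** 2  # standard stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(0)
original = rng.random((64, 64))
identical = original.copy()
noisy = np.clip(original + 0.1 * rng.standard_normal((64, 64)), 0.0, 1.0)

print(global_ssim(original, identical))  # 1.0 for identical images
print(global_ssim(original, noisy))      # lower for a degraded image
```

Identical images score exactly 1.0; the score drops as structure is degraded, which is why a recoloring that preserves edges and layout keeps SSIM high even when colors shift.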
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were only moderate (about 2.6 seconds per image), which limits real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from the Swin architecture added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. JOSA A, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
- Upload source code, test images, etc, and give a description of each link. In some cases, your acquired data may be too large to store practically. In this case, use your judgement (or consult one of us) and only link the most relevant data. Be sure to describe the purpose of your code and to edit the code for clarity. The purpose of placing the code online is to allow others to verify your methods and to learn from your ideas.&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for each of MLP, U-Net and Autoencoder&lt;br /&gt;
* Recolorization script (adapting from MATLAB) and adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup and configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60510</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60510"/>
		<updated>2024-12-13T05:10:05Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* Segmentation-based Method */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix. For example, the correction matrix for protanopia, as implemented in tools like Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix for Protanopia} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected values are then added back to the original RGB image to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
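As a minimal NumPy sketch (not the implementation of [5] or [6]), the daltonization step can be written as follows. The simulation matrix uses the protanopia replacement coefficients given in the simulation discussion below, and `daltonize_protan` is a name introduced here for illustration.

```python
import numpy as np

# RGB -> LMS transform (Stockman–Sharpe fundamentals, values from the text).
RGB2LMS = np.array([[0.3904725,  0.54990437, 0.00890159],
                    [0.07092586, 0.96310739, 0.00135809],
                    [0.02314268, 0.12801221, 0.93605194]])
LMS2RGB = np.linalg.inv(RGB2LMS)

# Protanopia: the L response is replaced by 0.90822864*M + 0.008192*S.
SIM_PROTAN = np.array([[0.0, 0.90822864, 0.008192],
                       [0.0, 1.0,        0.0],
                       [0.0, 0.0,        1.0]])

# Vischeck-style error-redistribution matrix for protanopia (from the text).
CORRECTION = np.array([[0.0, 0.0, 0.0],
                       [0.7, 1.0, 0.0],
                       [0.7, 0.0, 1.0]])

def daltonize_protan(rgb):
    """Daltonize an (..., 3) float RGB array in [0, 1] for protanopia."""
    lms = rgb @ RGB2LMS.T
    rgb_sim = lms @ SIM_PROTAN.T @ LMS2RGB.T   # what a protanope sees
    error = rgb - rgb_sim                      # information lost to the viewer
    return np.clip(rgb + error @ CORRECTION.T, 0.0, 1.0)
```

Neutral grays are left nearly unchanged (the simulation maps them approximately to themselves), while the lost red information is redistributed into the green and blue channels by the correction matrix.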
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, colors are first transformed into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
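The three cone-replacement equations can be collected into per-type LMS matrices and applied in a few lines of NumPy. This is an illustrative sketch, not the daltonize package's code; `simulate_cvd` is a name introduced here.

```python
import numpy as np

RGB2LMS = np.array([[0.3904725,  0.54990437, 0.00890159],
                    [0.07092586, 0.96310739, 0.00135809],
                    [0.02314268, 0.12801221, 0.93605194]])
LMS2RGB = np.linalg.inv(RGB2LMS)

# Each matrix replaces the missing cone's row with the weighted combination
# given above; the two remaining cone responses pass through unchanged.
SIM = {
    "protan": np.array([[0.0,         0.90822864, 0.008192],
                        [0.0,         1.0,        0.0],
                        [0.0,         0.0,        1.0]]),
    "deutan": np.array([[1.0,         0.0,        0.0],
                        [1.10104433,  0.0,       -0.00901975],
                        [0.0,         0.0,        1.0]]),
    "tritan": np.array([[1.0,         0.0,        0.0],
                        [0.0,         1.0,        0.0],
                        [-0.15773032, 1.19465634, 0.0]]),
}

def simulate_cvd(rgb, kind):
    """Simulate dichromat perception of an (..., 3) float RGB array."""
    lms = rgb @ RGB2LMS.T
    return np.clip(lms @ SIM[kind].T @ LMS2RGB.T, 0.0, 1.0)
```

A useful sanity check on the coefficients: neutral grays are mapped approximately to themselves for all three deficiency types, while saturated reds and greens change substantially under the protan and deutan simulations.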
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
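For concreteness, the two terms can be sketched in NumPy as below. This is an illustration of the definitions, not Zhu et al.'s code; the contrast term is written naively over all pixel pairs, which is O(N&amp;#178;), so practical implementations subsample pairs.

```python
import numpy as np

def naturalness_loss(recolored, original):
    """E_nat: summed squared Euclidean distance between recolored
    and original colors, per pixel."""
    return np.sum((recolored - original) ** 2)

def contrast_loss(recolored, original):
    """E_cont: squared deviation of pairwise color differences before
    and after recoloring (naive all-pairs version)."""
    c = original.reshape(-1, 3)
    cp = recolored.reshape(-1, 3)
    d_orig = c[:, None, :] - c[None, :, :]    # (N, N, 3) original differences
    d_new = cp[:, None, :] - cp[None, :, :]   # (N, N, 3) recolored differences
    return np.sum((d_new - d_orig) ** 2)
```

The total objective is then `beta * naturalness_loss(...) + contrast_loss(...)`. Note the trade-off the two terms encode: shifting every color by the same offset leaves all pairwise differences, and hence E_cont, unchanged, but is penalized by E_nat.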
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The method begins with fuzzy clustering to extract representative colors (key colors) from the input image. These colors are mapped onto the CIE 1931 chromaticity diagram, where confusion lines represent loci of colors perceived as identical by individuals with CVD. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
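The distance formula is the standard point-to-line distance through two points, which can be written directly. A sketch (the copunctal coordinate below is a commonly cited value for protanopia, included only for illustration):

```python
import math

def confusion_line_distance(v, copunctal, ref):
    """Perpendicular distance from chromaticity v = (x_v, y_v) to the
    confusion line through the copunctal point and a reference point."""
    xv, yv = v
    xcp, ycp = copunctal
    x0, y0 = ref
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    den = math.hypot(xcp - x0, ycp - y0)
    return num / den

# Commonly cited protanopia copunctal point in CIE 1931 xy (illustrative).
PROTAN_CP = (0.7465, 0.2535)
```

Any chromaticity lying on the line (e.g. a point between the copunctal point and the reference point) has distance zero, which is how occupied confusion lines are detected.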
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. The translation process involves:&lt;br /&gt;
&lt;br /&gt;
1. Ranking key colors by their cluster sizes:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{rank}(v_i) = \frac{|A_i|}{\sum_{j=1}^{n_A}|A_j|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;|A_i|&amp;lt;/math&amp;gt; is the cardinality (number of pixels) of its associated cluster,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; is the total number of clusters.&lt;br /&gt;
&lt;br /&gt;
2. Translating the highest-ranked confusing color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; to its projection on the nearest non-occupied confusion line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
v^*_{\text{tr}} = \text{proj}(v^*, L^*),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*_{\text{tr}}&amp;lt;/math&amp;gt; is the new position of the color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; after translation,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L^*&amp;lt;/math&amp;gt; is the nearest non-occupied confusion line, determined as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(v^*, L^*) = \min_{L \in \text{CL}_D} d(v^*, L).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Updating the sets of confusing colors and non-occupied confusion lines iteratively:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Phi_V = \Phi_V - \{v^*\}, \quad \text{CL}_D = \text{CL}_D - \{L^*\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Phi_V&amp;lt;/math&amp;gt; is the set of confusing colors,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{CL}_D&amp;lt;/math&amp;gt; is the set of non-occupied confusion lines.&lt;br /&gt;
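The three-step loop above can be sketched as follows. The data structures are hypothetical (the paper operates on a discretized set of confusion lines CL_D, each representable by a reference point plus the copunctal point); this is an outline of the greedy procedure, not the authors' implementation.

```python
import math

def project_onto_line(v, copunctal, ref):
    """Orthogonal projection of point v onto the line through
    the copunctal point and a reference point."""
    dx, dy = ref[0] - copunctal[0], ref[1] - copunctal[1]
    t = ((v[0] - copunctal[0]) * dx + (v[1] - copunctal[1]) * dy) / (dx * dx + dy * dy)
    return (copunctal[0] + t * dx, copunctal[1] + t * dy)

def distance_to_line(v, copunctal, ref):
    p = project_onto_line(v, copunctal, ref)
    return math.hypot(v[0] - p[0], v[1] - p[1])

def shift_confusing_colors(confusing, free_lines, copunctal, sizes):
    """Greedy loop from the text. `confusing`: {name: (x, y)} chromaticities,
    `free_lines`: reference points defining non-occupied confusion lines,
    `sizes`: {name: cluster cardinality} used for ranking."""
    result = dict(confusing)
    remaining = set(confusing)
    while remaining and free_lines:
        v_star = max(remaining, key=lambda k: sizes[k])        # 1. rank by size
        ref = min(free_lines,                                   # 2. nearest free line
                  key=lambda r: distance_to_line(result[v_star], copunctal, r))
        result[v_star] = project_onto_line(result[v_star], copunctal, ref)
        remaining.discard(v_star)                               # 3. update both sets
        free_lines = [r for r in free_lines if r is not ref]
    return result
```

After the loop, every shifted key color lies exactly on its assigned confusion line, so no two shifted colors share a line and dichromat viewers can distinguish them.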
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
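Under the stated definitions, the full objective can be sketched as below. This is an illustration, not Tsekouras et al.'s code: `f_d` is passed in as a stand-in for the dichromat simulation, and the normalizations follow the formulas as printed.

```python
import numpy as np

def total_loss(a, a_rec, b, f_d, lam=0.5):
    """E = (E1 + E2) + lam * E3. `a`/`a_rec` are (n_A, 2) arrays of
    original/recolored key-color chromaticities, `b` is (n_B, 2), and
    f_d simulates dichromatic perception (a stand-in here)."""
    n_a, n_b = len(a), len(b)
    fa, fb = f_d(a_rec), f_d(b)
    # E1: contrast between cluster-A and cluster-B colors, original vs. simulated.
    e1 = np.abs(np.linalg.norm(a[:, None] - b[None, :], axis=-1)
                - np.linalg.norm(fa[:, None] - fb[None, :], axis=-1)).sum() / (n_a * n_b)
    # E2: contrast within cluster A (normalization as given in the text).
    e2 = np.abs(np.linalg.norm(a[:, None] - a[None, :], axis=-1)
                - np.linalg.norm(fa[:, None] - fa[None, :], axis=-1)).sum() / (n_a * n_b)
    # E3: naturalness, mean distance between original and recolored key colors.
    e3 = np.linalg.norm(a - a_rec, axis=-1).mean()
    return (e1 + e2) + lam * e3
```

If the recoloring is the identity and the simulation preserves distances, all three terms vanish; any recoloring that moves key colors pays through E3, which lam trades off against the two contrast terms.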
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== Segmentation Guided Recoloring Method ====&lt;br /&gt;
Lastly, one interesting method we found using machine learning algorithms is based on semantic segmentation. Chatzistamatis et al. [12] introduced a recoloring approach for digitized art paintings to enhance color perception for individuals with protanopia and deuteranopia. A key component of their method involves semantic segmentation, guided by transfer learning, to identify and preserve important visual elements in art paintings.&lt;br /&gt;
&lt;br /&gt;
The segmentation process leverages the Mask R-CNN architecture, utilizing transfer learning to adapt from natural image datasets to the domain of art paintings. This adaptation involves the following steps:&lt;br /&gt;
&lt;br /&gt;
1. Preprocessing and augmentation: images are preprocessed with techniques such as horizontal flips, random cropping, Gaussian blur, and affine transformations to enhance the diversity of the training set.&lt;br /&gt;
&lt;br /&gt;
2. Feature extraction: the ResNet-101 backbone, pre-trained on the ImageNet dataset, extracts features from input images. Lower layers detect basic features like edges, while higher layers identify complex structures such as objects within paintings.&lt;br /&gt;
&lt;br /&gt;
3. Region proposal: the Region Proposal Network (RPN) identifies regions of interest (RoIs) in the feature maps using sliding windows. These RoIs are further refined into accurate object boundaries.&lt;br /&gt;
&lt;br /&gt;
4. Object masking: masks for the identified objects are generated to preserve fine-grained details of the paintings.&lt;br /&gt;
&lt;br /&gt;
This semantic segmentation process divides the image pixels into two disjoint sets:&lt;br /&gt;
&amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_V&amp;lt;/math&amp;gt;, pixels belonging to segmented objects, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_U&amp;lt;/math&amp;gt;, background pixels outside segmented objects.&lt;br /&gt;
&lt;br /&gt;
By separating these sets, the algorithm focuses recoloring efforts on regions that are visually significant while maintaining the natural appearance of the background.&lt;br /&gt;
&lt;br /&gt;
The recoloring process modifies the colors in the segmented object set &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_V&amp;lt;/math&amp;gt; while leaving the background set &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_U&amp;lt;/math&amp;gt; largely intact. Key steps include:&lt;br /&gt;
&lt;br /&gt;
1. Color simulation: colors in &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T_V&amp;lt;/math&amp;gt; are transformed to simulate the perception of dichromatic viewers, enabling identification of indistinguishable colors.&lt;br /&gt;
&lt;br /&gt;
2. Color clustering: fuzzy c-means clustering groups colors into clusters for efficient manipulation. Cluster centers, or &amp;quot;key colors,&amp;quot; are adjusted to reduce color confusion while preserving visual coherence.&lt;br /&gt;
&lt;br /&gt;
3. Recoloring optimization: an objective function is minimized to enhance contrast and naturalness, similar to the objective functions in the methods mentioned above:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
   E = E_1 + E_2 + cE_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt; preserves the contrast between object colors and background colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \sum_{p \in T_V, q \in T_U} \|f(p) - f(q)\| - \|p - q\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt; enhances contrast within object colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \sum_{p, q \in T_V} \|f(p) - f(q)\| - \|p - q\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt; minimizes the perceptual difference between original and recolored key colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \sum_{p \in T_V} \|f(p) - p\|^2.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;q&amp;lt;/math&amp;gt; are pixel values in the image.&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f(p)&amp;lt;/math&amp;gt; is the recolored pixel value for pixel &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p&amp;lt;/math&amp;gt;.&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\| \cdot \|&amp;lt;/math&amp;gt; represents the Euclidean distance in the perceptual color space.&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;c&amp;lt;/math&amp;gt; is a weighting factor that controls the importance of naturalness preservation relative to contrast enhancement.&lt;br /&gt;
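This objective can be evaluated directly for a set of candidate key colors. Below is a minimal NumPy sketch; the function and variable names are illustrative, not from the paper, and key colors are assumed to be triplets in a perceptual color space.&lt;br /&gt;

```python
import numpy as np

def recolor_objective(fV, TV, TU, c=0.5):
    """E = E1 + E2 + c * E3 for a candidate recoloring of the object key colors.

    fV : recolored object key colors f(p), shape (n, 3)
    TV : original object key colors p in T_V, shape (n, 3)
    TU : background key colors q in T_U (left unchanged), shape (m, 3)
    """
    # E1: contrast between object and background colors
    E1 = sum(np.linalg.norm(fp - q) - np.linalg.norm(p - q)
             for fp, p in zip(fV, TV) for q in TU)
    # E2: contrast within the object colors
    E2 = sum(np.linalg.norm(fV[i] - fV[j]) - np.linalg.norm(TV[i] - TV[j])
             for i in range(len(TV)) for j in range(i + 1, len(TV)))
    # E3: squared deviation from the original key colors (naturalness term)
    E3 = sum(np.linalg.norm(fp - p) ** 2 for fp, p in zip(fV, TV))
    return E1 + E2 + c * E3
```

By construction, the identity recoloring (f(p) = p) scores zero, and any deviation is penalized by E3 while E1 and E2 track the change in pairwise contrast.&lt;br /&gt;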
&lt;br /&gt;
While the method effectively balances contrast and naturalness for CVD viewers, it has several limitations:&lt;br /&gt;
* The success of segmentation relies heavily on the pre-trained Mask R-CNN model, which may not generalize well to all styles of art or real-life images.&lt;br /&gt;
* Semantic segmentation and optimization introduce significant computational overhead, making the method slow and less suitable for real-time applications.&lt;br /&gt;
* Errors in segmentation are difficult to control and may lead to misclassification of visually important regions, resulting in suboptimal recoloring.&lt;br /&gt;
* The method is restricted to protanopia and deuteranopia, with no flexibility for personalization.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIEL*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
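The mixture density above is straightforward to evaluate. The following is a minimal NumPy sketch (helper names are ours), with the multivariate normal density written out explicitly:&lt;br /&gt;

```python
import numpy as np

def gauss_pdf(x, mu, S):
    """Density of a multivariate normal N(mu, Sigma) at point x."""
    d = len(mu)
    diff = np.asarray(x) - mu
    S_inv = np.linalg.inv(S)
    expo = -0.5 * np.einsum('...i,ij,...j->...', diff, S_inv, diff)
    return np.exp(expo) / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(S))

def gmm_density(x, weights, means, covs):
    """p(x | Theta) = sum_i w_i * G_i(x | mu_i, Sigma_i)."""
    return sum(w * gauss_pdf(x, mu, S)
               for w, mu, S in zip(weights, means, covs))
```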
&lt;br /&gt;
In more detail, the parameters of the GMM are initialized using the K-means algorithm and refined via the Expectation-Maximization (EM) algorithm, which alternates between an E-step and an M-step:&lt;br /&gt;
&lt;br /&gt;
The E-step calculates the probability of each color (or pixel) belonging to a specific Gaussian component in the GMM. This probability, also known as the &amp;quot;responsibility,&amp;quot; reflects how much each Gaussian contributes to the representation of a color:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(i|x_j, \Theta^{\text{old}}) = \frac{\omega_i G_i(x_j|\theta_i)}{\sum_{k=1}^K \omega_k G_k(x_j|\theta_k)}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p(i|x_j, \Theta^{\text{old}})&amp;lt;/math&amp;gt; is the probability of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt; belonging to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian component,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian, representing its overall contribution to the color distribution,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x_j|\theta_i)&amp;lt;/math&amp;gt; is the Gaussian distribution for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th component, evaluated at &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt;, where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix),&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\sum_{k=1}^K \omega_k G_k(x_j|\theta_k)&amp;lt;/math&amp;gt; normalizes the probabilities by considering the contributions of all &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussians to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th pixel.&lt;br /&gt;
&lt;br /&gt;
This step essentially assigns each pixel a &amp;quot;soft&amp;quot; membership to each Gaussian component, rather than forcing a hard clustering decision. Pixels that are close to a Gaussian&#039;s mean (in feature space) will have higher probabilities of belonging to that Gaussian.&lt;br /&gt;
&lt;br /&gt;
The M-step updates the parameters of each Gaussian component based on the probabilities computed in the E-step. These updates refine the Gaussian model to better fit the data:&lt;br /&gt;
&lt;br /&gt;
1. Update the mixing weights:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\omega_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})}{N},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation calculates the proportion of pixels assigned to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It reflects how dominant each Gaussian is in representing the color distribution.&lt;br /&gt;
&lt;br /&gt;
2. Update the means:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mu_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}}) x_j}{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation computes the new mean vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mu_i^{\text{new}}&amp;lt;/math&amp;gt; for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It is a weighted average of all pixel feature vectors &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt;, where the weights are the probabilities &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p(i|x_j, \Theta^{\text{old}})&amp;lt;/math&amp;gt;. Pixels with higher probabilities contribute more to the new mean.&lt;br /&gt;
&lt;br /&gt;
3. Update the covariance matrices:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Sigma_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}}) (x_j - \mu_i^{\text{new}})(x_j - \mu_i^{\text{new}})^T}{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation calculates the new covariance matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Sigma_i^{\text{new}}&amp;lt;/math&amp;gt; for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It measures the spread of pixel features around the new mean, weighted by the probabilities from the E-step.&lt;br /&gt;
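The E- and M-step updates above can be sketched in a few lines of NumPy. This is a simplified, unoptimized illustration (the density helper and function names are ours):&lt;br /&gt;

```python
import numpy as np

def gauss_pdf(X, mu, S):
    """Multivariate normal density N(mu, Sigma) evaluated at each row of X."""
    d = len(mu)
    diff = X - mu
    S_inv = np.linalg.inv(S)
    expo = -0.5 * np.einsum('ni,ij,nj->n', diff, S_inv, diff)
    return np.exp(expo) / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(S))

def em_step(X, weights, means, covs):
    """One EM iteration (E-step then M-step) for a GMM on features X, shape (N, 3)."""
    N, K = len(X), len(weights)
    # E-step: responsibilities p(i | x_j, Theta_old), shape (N, K)
    R = np.column_stack([w * gauss_pdf(X, mu, S)
                         for w, mu, S in zip(weights, means, covs)])
    R /= R.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixing weights, means, and covariance matrices
    Nk = R.sum(axis=0)                       # soft count of pixels per component
    new_weights = Nk / N
    new_means = (R.T @ X) / Nk[:, None]
    new_covs = [(R[:, i, None] * (X - new_means[i])).T @ (X - new_means[i]) / Nk[i]
                for i in range(K)]
    return new_weights, new_means, new_covs
```

Iterating `em_step` to convergence recovers the maximum-likelihood GMM parameters for the image's color features.&lt;br /&gt;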
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
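For Gaussians the KL divergence has a closed form, so the symmetric divergence is cheap to compute. A minimal NumPy sketch (function names are ours):&lt;br /&gt;

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """Closed-form KL(N(mu0, S0) || N(mu1, S1)) for multivariate Gaussians."""
    d = len(mu0)
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def symmetric_kl(mu0, S0, mu1, S1):
    """D_sKL(G_i, G_j) = KL(G_i || G_j) + KL(G_j || G_i)."""
    return kl_gauss(mu0, S0, mu1, S1) + kl_gauss(mu1, S1, mu0, S0)
```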
&lt;br /&gt;
The optimization aims to preserve the contrast perceived by CVD viewers while maintaining naturalness. Weights are assigned to Gaussian components based on the perceptual importance of colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\lambda_i = \frac{\sum_{j=1}^N \alpha_j p(i|x_j, \Theta)}{\sum_{k=1}^K \sum_{j=1}^N \alpha_j p(k|x_j, \Theta)},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha_j = \|x_j - \text{Sim}(x_j)\|&amp;lt;/math&amp;gt; is the perceptual error of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature when simulated for CVD,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{Sim}(\cdot)&amp;lt;/math&amp;gt; is the simulation function for CVD perception.&lt;br /&gt;
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
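The interpolation formula is a responsibility-weighted sum of the hue shifts of the Gaussian means. A minimal NumPy sketch, assuming hues are stored as plain scalars per pixel (names are ours):&lt;br /&gt;

```python
import numpy as np

def interpolate_hue(x_hue, resp, mapped_mean_hues, orig_mean_hues):
    """T(x_j)_H = x_j^H + sum_i p(i|x_j) * (M_i(mu_i)_H - mu_i^H).

    x_hue            : (N,) hue value of each pixel
    resp             : (N, K) responsibilities p(i | x_j, Theta)
    mapped_mean_hues : (K,) hues of the relocated (optimized) Gaussian means
    orig_mean_hues   : (K,) hues of the original Gaussian means
    """
    return x_hue + resp @ (mapped_mean_hues - orig_mean_hues)
```

Because the responsibilities vary smoothly over color space, nearby colors receive similar hue shifts, which avoids hard boundaries between recolored regions.&lt;br /&gt;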
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions; as a result, colors in the recolored images can appear unnatural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to use in real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], Generative Adversarial Networks (GANs) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This particular method also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. This design allows efficient handling of high-resolution images and has been applied to various vision tasks, including image classification and object detection. Despite its robust performance, the architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations using a unique triple-latent structure in their method, continuous personalization was possible to adjust images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, it took around 18 days for one epoch (or one iteration over the entire dataset).&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in further sections, but an overview is shown in Figure 1. All code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recoloring. While these methods are simple, easy to train, and integrate the user label directly, they require a pre-existing ground-truth example of the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information during training. They are generally better at generating natural-looking images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from the open-source RGB image dataset of [2]: to improve the ability of their model to enhance contrast between CVD-indistinguishable color pairs, the authors assembled 141,000 unlabeled pictures of natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 of these images and recolored them under randomly simulated labels for CVD type and severity. The ground-truth recoloring was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org SciPy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
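The label encoding described above can be reproduced in a few lines; the helper name below is ours:&lt;br /&gt;

```python
import numpy as np

def cvd_label(cvd_type, severity):
    """Encode a user condition as severity * [protan, deutan]."""
    base = {"protan": np.array([1.0, 0.0]),
            "deutan": np.array([0.0, 1.0])}
    return severity * base[cvd_type]
```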
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to the 3 parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. Their outputs are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The training loss was a pixel-wise mean-squared error:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground-truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
In a similar fashion of inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: recolored (model output) and ground-truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt; is the feature map from the l-th layer of the pre-trained VGG network&lt;br /&gt;
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the loss functions. The loss functions we used to train this network were inspired from [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{(x, y) \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: Two distinct pixels in the image.&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: Size of the global (whole-image) window&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: Size of the local window (neighborhood) around pixel x&lt;br /&gt;
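The two contrast losses can be sketched directly from these definitions. The following is a slow but readable NumPy illustration operating on CVD-simulated images (function names and the neighborhood radius are ours, not from [2]):&lt;br /&gt;

```python
import numpy as np

def CL(cx, cy, cx_rec, cy_rec):
    """CL(x, y): change in pairwise distance between CVD-simulated colors."""
    return np.linalg.norm(cx_rec - cy_rec) - np.linalg.norm(cx - cy)

def global_contrast_loss(sim_orig, sim_rec):
    """Average CL over all distinct pixel pairs in the global window (whole image)."""
    P, Q = sim_orig.reshape(-1, 3), sim_rec.reshape(-1, 3)
    n = len(P)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(CL(P[i], P[j], Q[i], Q[j]) for i, j in pairs) / len(pairs)

def local_contrast_loss(sim_orig, sim_rec, radius=1):
    """Average CL over each pixel's (2*radius+1)^2 neighborhood."""
    H, W, _ = sim_orig.shape
    total, count = 0.0, 0
    for i in range(H):
        for j in range(W):
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    if (di, dj) != (0, 0) and 0 <= ni < H and 0 <= nj < W:
                        total += CL(sim_orig[i, j], sim_orig[ni, nj],
                                    sim_rec[i, j], sim_rec[ni, nj])
                        count += 1
    return total / count
```

If the recolored image preserves all pairwise distances under CVD simulation, both losses are zero; negative values indicate lost contrast.&lt;br /&gt;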
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that remain visually similar to the original and close to natural color distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: Original and recolored images respectively&lt;br /&gt;
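Putting the pieces together, the total loss is a simple weighted combination of the three terms. A minimal sketch, using the weight values from our experiments (alpha = 0.25, beta = 1.0) as defaults:&lt;br /&gt;

```python
def total_loss(L_naturalness, L_global, L_local, alpha=0.25, beta=1.0):
    """L_total = alpha * L_naturalness + 2 * (1 - alpha) * L_contrast,
    with L_contrast = beta * L_global + (2 - beta) * L_local."""
    L_contrast = beta * L_global + (2 - beta) * L_local
    return alpha * L_naturalness + 2 * (1 - alpha) * L_contrast
```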
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: Conditional Parallel RGB MLP, Conditional U-Net, and Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), total color contrast (TCC), chromatic difference (CD), and inference time, as defined in [1] and [2], were used to assess the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored images show relative to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than the originals, indicating that the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images, possibly because the naturalness term was effectively prioritized even though the weight coefficients in the loss favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to score and evaluate metrics for comparison with related work only using the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; please refer to those papers for full definitions, as we use the same ones. At a high level, the components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(&amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weighting constant, not a wavelength; l, a, b and l&#039;, a&#039;, b&#039; are the CIELAB coordinates of the original and recolored images respectively.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
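Two of these metrics can be computed directly from image arrays. Below is a minimal NumPy sketch: a whole-image SSIM computed from global statistics (without the usual sliding window) and a per-pixel CD; the default &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; = 0.5 is an assumed value for illustration:&lt;br /&gt;

```python
import numpy as np

def ssim_global(X, Y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed from whole-image means, variances, and covariance."""
    mx, my = X.mean(), Y.mean()
    vx, vy = X.var(), Y.var()
    cov = ((X - mx) * (Y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def chromatic_difference(lab, lab_rec, lam=0.5):
    """Per-pixel CD in CIELAB; lam down-weights lightness differences."""
    d = lab_rec - lab
    return np.sqrt(lam * d[..., 0] ** 2 + d[..., 1] ** 2 + d[..., 2] ** 2)
```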
&lt;br /&gt;
The key results are in Table 1 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Table 1: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2], taking the larger of the protan/deutan values.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but fall short of paper [2], which optimizes a separate network for each CVD type.&lt;br /&gt;
* Outputs are somewhat blurry (SSIM is not optimized for strongly enough).&lt;br /&gt;
* Mixing CVD types in the same network needs a more sophisticated approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned a couple of things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate, making them feasible for real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from Swin architecture added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. JOSA A, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
- Upload source code, test images, etc, and give a description of each link. In some cases, your acquired data may be too large to store practically. In this case, use your judgement (or consult one of us) and only link the most relevant data. Be sure to describe the purpose of your code and to edit the code for clarity. The purpose of placing the code online is to allow others to verify your methods and to learn from your ideas.&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for each of MLP, U-Net and Autoencoder&lt;br /&gt;
* Recolorization script (adapting from MATLAB) and adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup and configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60506</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60506"/>
		<updated>2024-12-13T04:56:33Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow with a friend who has deuteranopia, I realized that they were unable to distinguish its vibrant array of colors, a sight I find among the most beautiful in the world. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, as those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix. For example, the correction matrix for protanopia, as implemented in tools like Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix for Protanopia} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected error values are then added back to the original RGB values to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, colors are first transformed into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
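To make the pipeline above concrete, here is a minimal NumPy sketch that chains the RGB-to-LMS transform, the protanopia cone replacement, and the Vischeck-style error redistribution. The function names are ours, and clipping to [0, 1] is an implementation assumption; this illustrates the equations rather than reproducing any reference implementation.

```python
import numpy as np

# RGB-to-LMS matrix (Stockman & Sharpe cone fundamentals, values from the text).
RGB_TO_LMS = np.array([
    [0.3904725,  0.54990437, 0.00890159],
    [0.07092586, 0.96310739, 0.00135809],
    [0.02314268, 0.12801221, 0.93605194],
])
LMS_TO_RGB = np.linalg.inv(RGB_TO_LMS)

# Protanopia: replace the missing L response with a combination of M and S.
SIM_PROTAN = np.array([
    [0.0, 0.90822864, 0.008192],
    [0.0, 1.0,        0.0],
    [0.0, 0.0,        1.0],
])

# Vischeck-style error redistribution matrix for protanopia.
CORRECTION = np.array([
    [0.0, 0.0, 0.0],
    [0.7, 1.0, 0.0],
    [0.7, 0.0, 1.0],
])

def simulate_protanopia(rgb):
    """Simulate protanope perception of an (..., 3) RGB array in [0, 1]."""
    lms = rgb @ RGB_TO_LMS.T
    lms_sim = lms @ SIM_PROTAN.T
    return np.clip(lms_sim @ LMS_TO_RGB.T, 0.0, 1.0)

def daltonize_protanopia(rgb):
    """Shift the information lost to protanopia into the visible channels."""
    error_rgb = rgb - simulate_protanopia(rgb)  # lost information, in RGB
    corrected = error_rgb @ CORRECTION.T        # redistribute into G and B
    return np.clip(rgb + corrected, 0.0, 1.0)

out = daltonize_protanopia(np.array([[1.0, 0.0, 0.0]]))  # a pure red pixel
```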
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
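The two loss terms translate directly into NumPy over an (N, 3) array of pixel colors. Note that the pairwise contrast term is O(N&#178;), so a practical implementation would evaluate it on a pixel subsample or on cluster representatives; the function names here are ours.

```python
import numpy as np

def naturalness_loss(c_orig, c_new):
    """E_nat: summed squared distance between original and recolored pixels."""
    return float(np.sum((c_new - c_orig) ** 2))

def contrast_loss(c_orig, c_new):
    """E_cont: deviation of pairwise color differences before vs. after
    recoloring. The full sum over i != j is O(N^2)."""
    d_orig = c_orig[:, None, :] - c_orig[None, :, :]
    d_new = c_new[:, None, :] - c_new[None, :, :]
    return float(np.sum((d_new - d_orig) ** 2))

def total_loss(c_orig, c_new, beta=0.5):
    """E = beta * E_nat + E_cont, as in Zhu et al. [8]."""
    return beta * naturalness_loss(c_orig, c_new) + contrast_loss(c_orig, c_new)
```

A useful sanity check on the decomposition: shifting every pixel by the same offset changes E_nat but leaves E_cont at zero, since all pairwise differences are preserved.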
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
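Under the same conventions, the degree-adaptable loss can be sketched by projecting every color difference through the simulation matrix T and weighting the naturalness term per pixel. This is our reading of the equation above, not Zhu et al.'s code.

```python
import numpy as np

def degree_adaptive_loss(c_orig, c_new, T, alpha, beta=0.5):
    """Degree-adaptable loss of Zhu et al. [9]: both terms are evaluated after
    projecting color differences through the CVD simulation matrix T, and the
    per-pixel weights alpha prioritize colors with small perception error.
    c_orig, c_new: (N, 3); T: (3, 3); alpha: (N,). A sketch only."""
    # Weighted naturalness under simulated perception.
    diff = (c_new - c_orig) @ T.T
    e_nat = np.sum(alpha * np.sum(diff ** 2, axis=1))
    # Contrast deviation under simulated perception, over all pixel pairs.
    d_orig = (c_orig[:, None, :] - c_orig[None, :, :]) @ T.T
    d_new = (c_new[:, None, :] - c_new[None, :, :]) @ T.T
    e_cont = np.sum((d_new - d_orig) ** 2)
    return float(beta * e_nat + e_cont)
```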
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The method begins with fuzzy clustering to extract representative colors (key colors) from the input image. These colors are mapped onto the CIE 1931 chromaticity diagram, where confusion lines represent loci of colors perceived as identical by individuals with CVD. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
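The distance formula above translates directly into code; the argument and function names are ours.

```python
import math

def confusion_line_distance(v, copunctal, ref):
    """Perpendicular distance from chromaticity v = (x_v, y_v) to the confusion
    line through the copunctal point (x_cp, y_cp) and reference point (x_0, y_0),
    following the formula in the text."""
    (xv, yv), (xcp, ycp), (x0, y0) = v, copunctal, ref
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    den = math.hypot(xcp - x0, ycp - y0)
    return num / den
```

Any chromaticity lying on the line itself yields a distance of zero, which is what identifies "confusing" colors sharing a confusion line.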
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. The translation process involves:&lt;br /&gt;
&lt;br /&gt;
1. Ranking key colors by their cluster sizes:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{rank}(v_i) = \frac{|A_i|}{\sum_{j=1}^{n_A}|A_j|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;|A_i|&amp;lt;/math&amp;gt; is the cardinality (number of pixels) of its associated cluster,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; is the total number of clusters.&lt;br /&gt;
&lt;br /&gt;
2. Translating the highest-ranked confusing color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; to its projection on the nearest non-occupied confusion line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
v^*_{\text{tr}} = \text{proj}(v^*, L^*),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*_{\text{tr}}&amp;lt;/math&amp;gt; is the new position of the color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; after translation,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L^*&amp;lt;/math&amp;gt; is the nearest non-occupied confusion line, determined as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(v^*, L^*) = \min_{L \in \text{CL}_D} d(v^*, L).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Updating the sets of confusing colors and non-occupied confusion lines iteratively:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Phi_V = \Phi_V - \{v^*\}, \quad \text{CL}_D = \text{CL}_D - \{L^*\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Phi_V&amp;lt;/math&amp;gt; is the set of confusing colors,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{CL}_D&amp;lt;/math&amp;gt; is the set of non-occupied confusion lines.&lt;br /&gt;
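Steps 1-3 can be sketched as a greedy loop that projects each confusing color onto its nearest free confusion line. The data structures here (a rank-sorted list of colors, and free lines given by reference points through the copunctal point) are simplifying assumptions, not the authors' implementation.

```python
import math

def project_onto_line(v, p, q):
    """Orthogonal projection of point v onto the line through p and q."""
    px, py = p
    qx, qy = q
    dx, dy = qx - px, qy - py
    t = ((v[0] - px) * dx + (v[1] - py) * dy) / (dx * dx + dy * dy)
    return (px + t * dx, py + t * dy)

def translate_confusing_colors(confusing, free_lines, copunctal):
    """Greedy loop from the text: repeatedly move the highest-ranked confusing
    color to its projection on the nearest free confusion line, then retire
    both (Phi_V and CL_D shrink each iteration). `confusing` is a list of
    (rank, (x, y)) pairs; `free_lines` holds reference points defining lines
    through the copunctal point. A simplified sketch of Tsekouras et al. [10]."""
    result = {}
    free = list(free_lines)
    for rank, v in sorted(confusing, reverse=True):  # highest rank first
        if not free:
            break
        def dist(ref):
            # Perpendicular distance from v to the line (copunctal, ref).
            num = abs((copunctal[0] - ref[0]) * (ref[1] - v[1])
                      - (ref[0] - v[0]) * (copunctal[1] - ref[1]))
            return num / math.hypot(copunctal[0] - ref[0], copunctal[1] - ref[1])
        best = min(free, key=dist)                   # nearest free line L*
        result[v] = project_onto_line(v, copunctal, best)
        free.remove(best)                            # CL_D <- CL_D - {L*}
    return result
```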
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
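The three terms combine into the objective E = (E&#8321; + E&#8322;) + &#955;E&#8323;. A sketch under assumed array shapes, with the dichromat simulation f_D passed in as a function; names and shapes are our assumptions.

```python
import numpy as np

def recolor_objective(a, a_rec, b, f_D, lam=0.7):
    """Regularized objective E = (E1 + E2) + lam * E3 from the text.
    a, a_rec: (n_A, d) original / recolored key colors of cluster A;
    b: (n_B, d) key colors of cluster B; f_D simulates dichromatic vision."""
    fa, fb = f_D(a_rec), f_D(b)
    # E1: original cross-cluster contrast vs. simulated recolored contrast.
    e1 = np.mean(np.abs(
        np.linalg.norm(a[:, None] - b[None, :], axis=-1)
        - np.linalg.norm(fa[:, None] - fb[None, :], axis=-1)))
    # E2: within-cluster contrast preserved under dichromatic simulation.
    e2 = np.mean(np.abs(
        np.linalg.norm(a[:, None] - a[None, :], axis=-1)
        - np.linalg.norm(fa[:, None] - fa[None, :], axis=-1)))
    # E3: naturalness, distance between original and recolored key colors.
    e3 = np.mean(np.linalg.norm(a - a_rec, axis=-1))
    return float(e1 + e2 + lam * e3)
```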
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== Segmentation-based Method ====&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIEL*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
In more detail, the GMM parameters are initialized with the K-means algorithm and refined via the Expectation-Maximization (EM) algorithm, which alternates between an E-step and an M-step:&lt;br /&gt;
&lt;br /&gt;
The E-step calculates the probability of each color (or pixel) belonging to a specific Gaussian component in the GMM. This probability, also known as the &amp;quot;responsibility,&amp;quot; reflects how much each Gaussian contributes to the representation of a color:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(i|x_j, \Theta^{\text{old}}) = \frac{\omega_i G_i(x_j|\theta_i)}{\sum_{k=1}^K \omega_k G_k(x_j|\theta_k)}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p(i|x_j, \Theta^{\text{old}})&amp;lt;/math&amp;gt; is the probability of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt; belonging to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian component,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian, representing its overall contribution to the color distribution,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x_j|\theta_i)&amp;lt;/math&amp;gt; is the Gaussian distribution for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th component, evaluated at &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt;, where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix),&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\sum_{k=1}^K \omega_k G_k(x_j|\theta_k)&amp;lt;/math&amp;gt; normalizes the probabilities by considering the contributions of all &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussians to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th pixel.&lt;br /&gt;
&lt;br /&gt;
This step essentially assigns each pixel a &amp;quot;soft&amp;quot; membership to each Gaussian component, rather than forcing a hard clustering decision. Pixels that are close to a Gaussian&#039;s mean (in feature space) will have higher probabilities of belonging to that Gaussian.&lt;br /&gt;
&lt;br /&gt;
The M-step updates the parameters of each Gaussian component based on the probabilities computed in the E-step. These updates refine the Gaussian model to better fit the data:&lt;br /&gt;
&lt;br /&gt;
1. Update the mixing weights:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\omega_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})}{N},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation calculates the proportion of pixels assigned to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It reflects how dominant each Gaussian is in representing the color distribution.&lt;br /&gt;
&lt;br /&gt;
2. Update the means:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mu_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}}) x_j}{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation computes the new mean vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mu_i^{\text{new}}&amp;lt;/math&amp;gt; for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It is a weighted average of all pixel feature vectors &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt;, where the weights are the probabilities &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p(i|x_j, \Theta^{\text{old}})&amp;lt;/math&amp;gt;. Pixels with higher probabilities contribute more to the new mean.&lt;br /&gt;
&lt;br /&gt;
3. Update the covariance matrices:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Sigma_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}}) (x_j - \mu_i^{\text{new}})(x_j - \mu_i^{\text{new}})^T}{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation calculates the new covariance matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Sigma_i^{\text{new}}&amp;lt;/math&amp;gt; for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It measures the spread of pixel features around the new mean, weighted by the probabilities from the E-step.&lt;br /&gt;
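The E- and M-step updates above can be sketched in a few lines of NumPy. This is an illustrative implementation for a diagonal-covariance GMM over pixel features, not the authors&#039; code; the array shapes and the small variance floor are our own choices.&lt;br /&gt;

```python
import numpy as np

def em_step(X, weights, means, covs):
    """One EM iteration for a diagonal-covariance GMM over pixel features.

    X: (N, d) pixel feature vectors; weights: (K,) mixing weights;
    means: (K, d); covs: (K, d) diagonal variances.
    Returns updated (weights, means, covs) and the (N, K) responsibilities.
    """
    N, d = X.shape
    K = weights.shape[0]

    # E-step: responsibilities p(i | x_j) for every pixel j and component i.
    log_prob = np.empty((N, K))
    for i in range(K):
        diff = X - means[i]                                  # (N, d)
        log_prob[:, i] = (np.log(weights[i])
                          - 0.5 * np.sum(np.log(2 * np.pi * covs[i]))
                          - 0.5 * np.sum(diff**2 / covs[i], axis=1))
    log_prob -= log_prob.max(axis=1, keepdims=True)          # numerical stability
    resp = np.exp(log_prob)
    resp /= resp.sum(axis=1, keepdims=True)                  # normalize over the K Gaussians

    # M-step: refit each Gaussian to the softly-assigned pixels.
    Nk = resp.sum(axis=0)                                    # effective pixel count per component
    new_weights = Nk / N                                     # updated mixing weights
    new_means = (resp.T @ X) / Nk[:, None]                   # responsibility-weighted means
    new_covs = np.empty_like(covs)
    for i in range(K):
        diff = X - new_means[i]
        new_covs[i] = (resp[:, i] @ diff**2) / Nk[i] + 1e-6  # small floor for stability
    return new_weights, new_means, new_covs, resp
```

Each call refines the fit; in practice the step is repeated until the log-likelihood stops improving.&lt;br /&gt;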
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
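The KL divergence between two Gaussians has a closed form, so &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{sKL}&amp;lt;/math&amp;gt; can be evaluated directly. A NumPy sketch (ours, not the paper&#039;s code):&lt;br /&gt;

```python
import numpy as np

def kl_gauss(mu1, cov1, mu2, cov2):
    """Closed-form D_KL(G1 || G2) between two multivariate Gaussians."""
    d = mu1.shape[0]
    inv2 = np.linalg.inv(cov2)
    diff = mu2 - mu1
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    return 0.5 * (np.trace(inv2 @ cov1) + diff @ inv2 @ diff - d + logdet2 - logdet1)

def skl(mu1, cov1, mu2, cov2):
    """Symmetric KL divergence D_sKL(G_i, G_j) used as the contrast measure."""
    return kl_gauss(mu1, cov1, mu2, cov2) + kl_gauss(mu2, cov2, mu1, cov1)
```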
&lt;br /&gt;
The optimization aims to preserve the contrast perceived by CVD viewers while maintaining naturalness. Weights are assigned to Gaussian components based on the perceptual importance of colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\lambda_i = \frac{\sum_{j=1}^N \alpha_j p(i|x_j, \Theta)}{\sum_{k=1}^K \sum_{j=1}^N \alpha_j p(k|x_j, \Theta)},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha_j = \|x_j - \text{Sim}(x_j)\|&amp;lt;/math&amp;gt; is the perceptual error of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature when simulated for CVD,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{Sim}(\cdot)&amp;lt;/math&amp;gt; is the simulation function for CVD perception.&lt;br /&gt;
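The component weights follow directly from the responsibilities and the per-pixel simulation error. In the sketch below the CVD simulation is a placeholder function supplied by the caller (a real implementation would use, e.g., the Brettel et al. model [7]):&lt;br /&gt;

```python
import numpy as np

def importance_weights(X, resp, simulate):
    """lambda_i: perceptual importance of each Gaussian component.

    X: (N, d) pixel color features; resp: (N, K) responsibilities p(i|x_j, Theta);
    simulate: placeholder CVD simulation mapping (N, d) colors to their CVD appearance.
    """
    alpha = np.linalg.norm(X - simulate(X), axis=1)  # alpha_j = ||x_j - Sim(x_j)||
    num = alpha @ resp                               # (K,) numerator per component
    return num / num.sum()                           # denominator sums over all K components
```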
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
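The interpolation step amounts to shifting each pixel&#039;s hue by a responsibility-weighted average of the component mean shifts, which is what keeps transitions smooth. A minimal NumPy sketch (our simplification; hue wrap-around is ignored):&lt;br /&gt;

```python
import numpy as np

def recolor_hue(hues, resp, mu_h, mapped_mu_h):
    """Soft hue shift: each pixel inherits the mean-shift of every Gaussian,
    weighted by its responsibilities.

    hues: (N,) original pixel hues; resp: (N, K) responsibilities;
    mu_h: (K,) original mean hues; mapped_mu_h: (K,) relocated mean hues M_i(mu_i)_H.
    """
    shift = mapped_mu_h - mu_h      # (K,) per-component hue displacement
    return hues + resp @ shift      # T(x_j)_H = x_j^H + sum_i p(i|x_j) * shift_i
```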
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions; as a result, the colors in the recolored images are sometimes unnatural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to use in real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This specific reference also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations with a triple-latent structure, the method allows continuous personalization, adjusting images to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications: in our experiment, one epoch (one pass over the entire dataset) took around 18 days.&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in further sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and integrate the user label naturally, they require a ground-truth example of the expected output for every training image.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information during training. They are generally better at generating natural images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We first gathered an open-source RGB image dataset from [2]. In that study, to improve their model&#039;s ability to enhance the contrast between CVD-indistinguishable color pairs, the authors created a dataset of 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 images and recolored them by simulating random labels for the severity and type of CVD. The ground-truth recoloring was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org SciPy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
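The label encoding described above is straightforward to implement; a small helper (ours, assuming the two-element protan/deutan encoding shown in Figure 1):&lt;br /&gt;

```python
import numpy as np

def cvd_label(cvd_type, severity):
    """Encode a user's condition as severity * [protan, deutan].

    cvd_type: 'protan' or 'deutan'; severity: float in [0.1, 1.0].
    """
    base = {'protan': np.array([1.0, 0.0]), 'deutan': np.array([0.0, 1.0])}
    return severity * base[cvd_type]
```

For example, `cvd_label('protan', 0.6)` yields the `[0.6, 0]` label for protanopia at 60% severity.&lt;br /&gt;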
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to 3 parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of a recolored image from the given ground truth. The outputs of the three networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used to train was pixel wise, mean-squared error loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground-truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
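A forward pass of this architecture can be sketched as follows. This is an untrained NumPy mock-up with an arbitrary hidden size, shown only to make the input concatenation and the per-channel heads concrete; the actual model was implemented in PyTorch.&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_mlp(x, params):
    """One per-pixel MLP head predicting a single output channel."""
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, x @ W1 + b1)          # ReLU hidden layer
    return (h @ W2 + b2).squeeze(-1)          # (N,) predicted channel values

def parallel_rgb_mlp(image, label, heads):
    """Concatenate each pixel's RGB with the 2-d CVD label, then run three
    independent MLP heads (R, G, B) and restack their outputs."""
    H, W, _ = image.shape
    pixels = image.reshape(-1, 3)
    lab = np.broadcast_to(label, (pixels.shape[0], 2))
    x = np.concatenate([pixels, lab], axis=1)             # (N, 5) input features
    out = np.stack([channel_mlp(x, p) for p in heads], axis=-1)
    return np.clip(out, 0.0, 1.0).reshape(H, W, 3)

def mse_loss(pred, target):
    """Pixel-wise mean-squared error against the ground-truth recoloring."""
    return np.mean((pred - target) ** 2)

# Random (untrained) weights for each head: 5 inputs, 16 hidden units, 1 output.
heads = [(rng.normal(0, 0.1, (5, 16)), np.zeros(16),
          rng.normal(0, 0.1, (16, 1)), np.zeros(1)) for _ in range(3)]
```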
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
In a similar fashion of inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: Recolored (model output) and ground-truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: the l-th feature map of the pre-trained VGG network, and &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; is the number of elements in that feature map&lt;br /&gt;
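The structure of the perceptual loss is easy to illustrate. In the sketch below the VGG feature maps &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt; are replaced by simple multi-scale average pooling so the example stays self-contained; the real loss uses activations of a pre-trained VGG network.&lt;br /&gt;

```python
import numpy as np

def pooled_features(img, scale):
    """Stand-in for a VGG layer phi_l: average-pool the image at a given scale.
    (In the real loss these would be pre-trained VGG feature maps.)"""
    H, W, C = img.shape
    h, w = H // scale, W // scale
    return img[:h * scale, :w * scale].reshape(h, scale, w, scale, C).mean(axis=(1, 3))

def perceptual_loss(pred, target, scales=(1, 2, 4)):
    """Sum over layers of (1/N_l) * ||phi_l(I) - phi_l(I')||^2, where N_l is
    the number of elements in the l-th feature map."""
    loss = 0.0
    for s in scales:
        f1, f2 = pooled_features(pred, s), pooled_features(target, s)
        loss += np.sum((f1 - f2) ** 2) / f1.size
    return loss
```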
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the choice of loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: Two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: Global (or large) window of the image&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: Local window or neighborhood around a pixel x&lt;br /&gt;
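Putting the CL terms together, the global and local contrast losses can be sketched as below. This is our illustrative NumPy version: pixel pairs for the global window are randomly sampled, and the sign is flipped so that higher contrast lowers the loss; both choices are implementation details rather than the paper&#039;s exact formulation.&lt;br /&gt;

```python
import numpy as np

def contrast_losses(orig_sim, recolored_sim, n_pairs=1000, radius=1, rng=None):
    """Global and local contrast terms built from CL(x, y) differences.

    orig_sim / recolored_sim: (H, W, 3) CVD-simulated original and recolored images.
    CL(x, y) = ||c'_x - c'_y|| - ||c_x - c_y||; we want contrast to grow, so the
    returned losses are the negated mean CL (smaller is better).
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    H, W, _ = orig_sim.shape

    def cl(p, q):
        d_new = np.linalg.norm(recolored_sim[p[0], p[1]] - recolored_sim[q[0], q[1]])
        d_old = np.linalg.norm(orig_sim[p[0], p[1]] - orig_sim[q[0], q[1]])
        return d_new - d_old

    # Global term: random pixel pairs drawn over the whole image window omega.
    pts = rng.integers(0, [H, W], size=(n_pairs, 2, 2))
    global_term = -np.mean([cl(a, b) for a, b in pts])

    # Local term: each pixel against its immediate neighborhood omega_x.
    vals = []
    for i in range(H):
        for j in range(W):
            for ni in range(max(i - radius, 0), min(i + radius + 1, H)):
                for nj in range(max(j - radius, 0), min(j + radius + 1, W)):
                    if (ni, nj) != (i, j):
                        vals.append(cl((i, j), (ni, nj)))
    local_term = -np.mean(vals)
    return global_term, local_term
```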
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored images respectively&lt;br /&gt;
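As a self-contained illustration, the naturalness term can be computed with the global-statistics form of SSIM (practical implementations average SSIM over local windows instead; the constants c1 and c2 are the usual stabilizers and their values here are assumed):&lt;br /&gt;

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Global-statistics form of SSIM between two images in [0, 1]."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def naturalness_loss(recolored, original):
    """1 - SSIM(I', I): near zero when the recoloring leaves structure intact."""
    return 1.0 - ssim_global(recolored, original)
```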
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the three neural network architectures described above: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics from [1] and [2], such as the Structural Similarity Index (SSIM), Total Color Contrast (TCC), Chromatic Difference (CD), and inference time, were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness is observed in the CVD-simulated recolored image compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than those of the originals, meaning the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images, possibly because naturalness was prioritized in practice even though the weight coefficients in the loss favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to score and evaluate metrics for comparison with related work only using the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]. Please refer to the definitions in the paper, as we have used the same. On a high level, the three components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weighting constant, not a wavelength, and l, a, b are the CIELAB coordinates of the recolored (&#039;) and original images respectively.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
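The Chromatic Difference metric, for instance, is a weighted Euclidean distance in CIELAB. A NumPy sketch (the value of the weighting constant &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; below is an assumed example, not the one used in [1] or [2]):&lt;br /&gt;

```python
import numpy as np

def chromatic_difference(lab_orig, lab_new, lam=0.3):
    """Per-pixel CD in CIELAB; lam (assumed 0.3 here) down-weights lightness.

    lab_orig / lab_new: (..., 3) arrays of L, a, b coordinates for the
    original and recolored images.
    """
    dl = lab_new[..., 0] - lab_orig[..., 0]
    da = lab_new[..., 1] - lab_orig[..., 1]
    db = lab_new[..., 2] - lab_orig[..., 2]
    return np.sqrt(lam * dl**2 + da**2 + db**2)
```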
&lt;br /&gt;
The key results are in Table 1 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|+ &#039;&#039;&#039;Table 1: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2], taking the larger of the protan/deutan values.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but not as high as in paper [2], because they optimize separate networks for each CVD type.&lt;br /&gt;
* Outputs are blurry (the loss does not optimize for SSIM strongly enough)&lt;br /&gt;
* Handling multiple CVD types in a single network needs a more sophisticated approach&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned a number of things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate, making them feasible for real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from Swin architecture added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] Li, H., Zhang, L., Zhang, X., Zhang, M., Zhu, G., Shen, P., ... &amp;amp; Shah, S. A. A. (2020). Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs. Multimedia Tools and Applications, 79, 27583-27614.&lt;br /&gt;
&lt;br /&gt;
[2] Chen, L., Zhu, Z., Huang, W., Go, K., Chen, X., &amp;amp; Mao, X. (2024). Image recoloring for color vision deficiency compensation using Swin transformer. Neural Computing and Applications, 36(11), 6051-6066.&lt;br /&gt;
&lt;br /&gt;
[3] Jiang, S., Liu, D., Li, D., &amp;amp; Xu, C. (2023). Personalized image generation for color vision deficiency population. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22571-22580).&lt;br /&gt;
&lt;br /&gt;
[4] Huang, J.-B., Chen, C.-S., Jen, T.-C., &amp;amp; Wang, S.-J. (n.d.). Image recolorization for the colorblind [GitHub repository]. Retrieved December 12, 2024, from https://github.com/jbhuang0604/RecolorForColorblind&lt;br /&gt;
&lt;br /&gt;
[5] Dietrich, J. (n.d.). Daltonize Python Package [GitHub repository]. Retrieved December 12, 2024, from https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py&lt;br /&gt;
&lt;br /&gt;
[6] Dougherty, B., &amp;amp; Wade, A. (2000). Vischeck. Retrieved December 12, 2024, from https://www.vischeck.com/&lt;br /&gt;
&lt;br /&gt;
[7] Brettel, H., Viénot, F., &amp;amp; Mollon, J. D. (1997). Computerized simulation of color appearance for dichromats. Josa a, 14(10), 2647-2655.&lt;br /&gt;
&lt;br /&gt;
[8] Zhu, Z., Toyoura, M., Go, K., Fujishiro, I., Kashiwagi, K., &amp;amp; Mao, X. (2019). Processing images for red–green dichromats compensation via naturalness and information-preservation considered recoloring. The Visual Computer, 35, 1053-1066.&lt;br /&gt;
&lt;br /&gt;
[9] Zhu, Z., Toyoura, M., Go, K., Kashiwagi, K., Fujishiro, I., Wong, T. T., &amp;amp; Mao, X. (2021). Personalized image recoloring for color vision deficiency compensation. IEEE Transactions on Multimedia, 24, 1721-1734.&lt;br /&gt;
&lt;br /&gt;
[10] Tsekouras, G. E., Rigos, A., Chatzistamatis, S., Tsimikas, J., Kotis, K., Caridakis, G., &amp;amp; Anagnostopoulos, C. N. (2021). A novel approach to image recoloring for color vision deficiency. Sensors, 21(8), 2740.&lt;br /&gt;
&lt;br /&gt;
[11] Huang, J. B., Chen, C. S., Jen, T. C., &amp;amp; Wang, S. J. (2009, April). Image recolorization for the colorblind. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1161-1164). IEEE.&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for each of MLP, U-Net and Autoencoder&lt;br /&gt;
* Recolorization script (adapting from MATLAB) and adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup and configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60504</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60504"/>
		<updated>2024-12-13T04:48:49Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* GMM-based Method */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix. For example, the correction matrix for protanopia, as implemented in tools like Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix for Protanopia} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The correction is added back to the original RGB values to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, colors are first transformed into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
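The pipeline above (RGB to LMS, replacement of the missing cone response, error correction) can be sketched in Python with NumPy using the matrices given above. This is a minimal sketch, not a reference implementation: it assumes linear RGB input in [0, 1] (gamma handling is omitted), and only the protanopia correction matrix is listed in the text, so only that case is daltonized here.&lt;br /&gt;

```python
import numpy as np

# RGB -> LMS transform based on Stockman and Sharpe's cone fundamentals (given above)
RGB_TO_LMS = np.array([
    [0.3904725,  0.54990437, 0.00890159],
    [0.07092586, 0.96310739, 0.00135809],
    [0.02314268, 0.12801221, 0.93605194],
])
LMS_TO_RGB = np.linalg.inv(RGB_TO_LMS)

# Each matrix replaces the missing cone's row with the weighted combination above
SIM = {
    "protanopia":   np.array([[0.0,         0.90822864, 0.008192],
                              [0.0,         1.0,        0.0],
                              [0.0,         0.0,        1.0]]),
    "deuteranopia": np.array([[1.0,         0.0,        0.0],
                              [1.10104433,  0.0,       -0.00901975],
                              [0.0,         0.0,        1.0]]),
    "tritanopia":   np.array([[1.0,         0.0,        0.0],
                              [0.0,         1.0,        0.0],
                              [-0.15773032, 1.19465634, 0.0]]),
}

# Protanopia correction matrix from the text (Vischeck-style)
CORRECTION = {
    "protanopia": np.array([[0.0, 0.0, 0.0],
                            [0.7, 1.0, 0.0],
                            [0.7, 0.0, 1.0]]),
}

def simulate(rgb, deficiency):
    """Simulate dichromatic perception of an (N, 3) linear-RGB array in [0, 1]."""
    lms = rgb @ RGB_TO_LMS.T
    lms_sim = lms @ SIM[deficiency].T
    return np.clip(lms_sim @ LMS_TO_RGB.T, 0.0, 1.0)

def daltonize(rgb, deficiency="protanopia"):
    """Shift the original-vs-simulated error back into the visible channels."""
    error = rgb - simulate(rgb, deficiency)
    return np.clip(rgb + error @ CORRECTION[deficiency].T, 0.0, 1.0)
```

Note that each simulation matrix is idempotent (applying it twice equals applying it once), which matches the intuition that a dichromat's percept of an already-simulated image is unchanged.&lt;br /&gt;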
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, preserves the distinguishability of colors in the recolored image by penalizing changes in pairwise color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
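The degree-adaptable objective above can be sketched in Python. This is an illustrative sketch only, not the authors' implementation: the quadratic pairwise sum over all pixel pairs is approximated here by randomly sampled pairs for tractability, and the function and parameter names are our own.&lt;br /&gt;

```python
import numpy as np

def degree_adaptable_loss(c_plus, c, T, alpha, beta=0.5, n_pairs=1024, seed=0):
    """Sketch of the degree-adaptable objective (formula above).
    c_plus, c: (N, 3) recolored / original pixel colors; T: (3, 3) CVD simulation
    matrix; alpha: (N,) per-pixel weights; beta: naturalness/contrast trade-off."""
    rng = np.random.default_rng(seed)
    # Naturalness: weighted squared perceptual difference under simulated CVD
    e_nat = np.sum(alpha * np.sum(((c_plus - c) @ T.T) ** 2, axis=1))
    # Contrast: deviation of pairwise color differences under simulation,
    # subsampled over random pixel pairs (an approximation, not the paper's sum)
    i = rng.integers(0, len(c), n_pairs)
    j = rng.integers(0, len(c), n_pairs)
    d_plus = (c_plus[i] - c_plus[j]) @ T.T
    d_orig = (c[i] - c[j]) @ T.T
    e_cont = np.sum((d_plus - d_orig) ** 2)
    return beta * e_nat + e_cont
```

As a sanity check, the loss is exactly zero when the recolored image equals the original, since both terms vanish.&lt;br /&gt;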
&lt;br /&gt;
==== Confusion lines based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The method begins with fuzzy clustering to extract representative colors (key colors) from the input image. These colors are mapped onto the CIE 1931 chromaticity diagram, where confusion lines represent loci of colors perceived as identical by individuals with CVD. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
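The point-to-line distance above is a one-liner in Python; variable names here are illustrative, and the copunctal and reference coordinates are whatever chromaticities the method supplies.&lt;br /&gt;

```python
import math

def confusion_line_distance(v, copunctal, ref):
    """Perpendicular distance from chromaticity v = (x_v, y_v) to the confusion
    line through the copunctal point (x_cp, y_cp) and reference (x_0, y_0),
    per the formula above."""
    (xv, yv), (xcp, ycp), (x0, y0) = v, copunctal, ref
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    den = math.hypot(xcp - x0, ycp - y0)
    return num / den
```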
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. The translation process involves:&lt;br /&gt;
&lt;br /&gt;
1. Ranking key colors by their cluster sizes:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{rank}(v_i) = \frac{|A_i|}{\sum_{j=1}^{n_A}|A_j|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;|A_i|&amp;lt;/math&amp;gt; is the cardinality (number of pixels) of its associated cluster,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; is the total number of clusters.&lt;br /&gt;
&lt;br /&gt;
2. Translating the highest-ranked confusing color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; to its projection on the nearest non-occupied confusion line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
v^*_{\text{tr}} = \text{proj}(v^*, L^*),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*_{\text{tr}}&amp;lt;/math&amp;gt; is the new position of the color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; after translation,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L^*&amp;lt;/math&amp;gt; is the nearest non-occupied confusion line, determined as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(v^*, L^*) = \min_{L \in \text{CL}_D} d(v^*, L).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Updating the sets of confusing colors and non-occupied confusion lines iteratively:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Phi_V = \Phi_V - \{v^*\}, \quad \text{CL}_D = \text{CL}_D - \{L^*\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Phi_V&amp;lt;/math&amp;gt; is the set of confusing colors,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{CL}_D&amp;lt;/math&amp;gt; is the set of non-occupied confusion lines.&lt;br /&gt;
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A^2} \sum_{i=1}^{n_A} \sum_{j=1}^{n_A} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
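Putting the three terms together, the regularized objective can be sketched in Python. This is a hedged sketch: `f_D` stands for any dichromacy simulation function, and the uniform per-pair normalizations are an assumption where the text leaves them implicit.&lt;br /&gt;

```python
import numpy as np

def recoloring_objective(a, a_rec, b, f_D, lam=0.1):
    """Sketch of E = (E1 + E2) + lam * E3 from the formulas above.
    a, a_rec: (n_A, 2) original / recolored key-color chromaticities of cluster A;
    b: (n_B, 2) key colors of cluster B; f_D: dichromatic-vision simulation."""
    # Pairwise Euclidean distance matrix between two sets of chromaticities
    d = lambda u, v: np.linalg.norm(u[:, None, :] - v[None, :, :], axis=-1)
    E1 = np.abs(d(a, b) - d(f_D(a_rec), f_D(b))).mean()      # contrast vs. cluster B
    E2 = np.abs(d(a, a) - d(f_D(a_rec), f_D(a_rec))).mean()  # contrast within cluster A
    E3 = np.linalg.norm(a - a_rec, axis=-1).mean()           # naturalness preservation
    return (E1 + E2) + lam * E3
```

When no recoloring is applied (and `f_D` is the identity), all three terms vanish, which is a useful correctness check.&lt;br /&gt;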
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
Huang et al. [11] proposed an efficient and effective re-coloring algorithm for individuals with CVD using a Gaussian Mixture Model (GMM) to represent color distributions. The algorithm comprises four main steps: feature extraction, clustering using GMM, optimization of Gaussian components, and interpolation for recoloring.&lt;br /&gt;
&lt;br /&gt;
Step 1 - Feature Extraction:&lt;br /&gt;
Each pixel in the input image is represented in the CIEL*a*b* color space, which approximates perceptual differences using the Euclidean distance between colors. The color feature vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x&amp;lt;/math&amp;gt; is used as input for clustering.&lt;br /&gt;
&lt;br /&gt;
Step 2 - Clustering via GMM:&lt;br /&gt;
The color distribution of the image is modeled using a GMM with &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussian components:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(x|\Theta) = \sum_{i=1}^K \omega_i G_i(x|\theta_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Theta&amp;lt;/math&amp;gt; is the parameter set containing all weights, means, and covariance matrices,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x|\theta_i)&amp;lt;/math&amp;gt; is the 3D normal distribution with parameters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix).&lt;br /&gt;
&lt;br /&gt;
In more detail, the parameters of the GMM are initialized using the K-means algorithm and refined via the Expectation-Maximization (EM) algorithm, which consists of the E-step and the M-step:&lt;br /&gt;
&lt;br /&gt;
The E-step calculates the probability of each color (or pixel) belonging to a specific Gaussian component in the GMM. This probability, also known as the &amp;quot;responsibility,&amp;quot; reflects how much each Gaussian contributes to the representation of a color:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
p(i|x_j, \Theta^{\text{old}}) = \frac{\omega_i G_i(x_j|\theta_i)}{\sum_{k=1}^K \omega_k G_k(x_j|\theta_k)}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p(i|x_j, \Theta^{\text{old}})&amp;lt;/math&amp;gt; is the probability of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt; belonging to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian component,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\omega_i&amp;lt;/math&amp;gt; is the mixing weight of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian, representing its overall contribution to the color distribution,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i(x_j|\theta_i)&amp;lt;/math&amp;gt; is the Gaussian distribution for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th component, evaluated at &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt;, where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\theta_i = (\mu_i, \Sigma_i)&amp;lt;/math&amp;gt; (mean vector and covariance matrix),&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\sum_{k=1}^K \omega_k G_k(x_j|\theta_k)&amp;lt;/math&amp;gt; normalizes the probabilities by considering the contributions of all &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt; Gaussians to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th pixel.&lt;br /&gt;
&lt;br /&gt;
This step essentially assigns each pixel a &amp;quot;soft&amp;quot; membership to each Gaussian component, rather than forcing a hard clustering decision. Pixels that are close to a Gaussian&#039;s mean (in feature space) will have higher probabilities of belonging to that Gaussian.&lt;br /&gt;
&lt;br /&gt;
The M-step updates the parameters of each Gaussian component based on the probabilities computed in the E-step. These updates refine the Gaussian model to better fit the data:&lt;br /&gt;
&lt;br /&gt;
1. Update the mixing weights:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\omega_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})}{N},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation calculates the proportion of pixels assigned to the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It reflects how dominant each Gaussian is in representing the color distribution.&lt;br /&gt;
&lt;br /&gt;
2. Update the means:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\mu_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}}) x_j}{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation computes the new mean vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\mu_i^{\text{new}}&amp;lt;/math&amp;gt; for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It is a weighted average of all pixel feature vectors &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;x_j&amp;lt;/math&amp;gt;, where the weights are the probabilities &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;p(i|x_j, \Theta^{\text{old}})&amp;lt;/math&amp;gt;. Pixels with higher probabilities contribute more to the new mean.&lt;br /&gt;
&lt;br /&gt;
3. Update the covariance matrices:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Sigma_i^{\text{new}} = \frac{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}}) (x_j - \mu_i^{\text{new}})(x_j - \mu_i^{\text{new}})^T}{\sum_{j=1}^N p(i|x_j, \Theta^{\text{old}})}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
This equation calculates the new covariance matrix &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Sigma_i^{\text{new}}&amp;lt;/math&amp;gt; for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian. It measures the spread of pixel features around the new mean, weighted by the probabilities from the E-step.&lt;br /&gt;
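The E-step and M-step above can be sketched with NumPy. This is a minimal sketch under stated assumptions: the paper initializes with K-means, while here a crude random initialization stands in, and small epsilons are added for numerical stability.&lt;br /&gt;

```python
import numpy as np

def gmm_em(x, K, n_iter=50, seed=0):
    """Minimal EM fit of a GMM to color features (sketch of the steps above).
    x: (N, d) color feature vectors (e.g. CIELab); returns the weights, means,
    covariances, and the (N, K) responsibilities from the final E-step."""
    rng = np.random.default_rng(seed)
    N, d = x.shape
    mu = x[rng.choice(N, K, replace=False)]        # crude init (the paper uses K-means)
    sigma = np.stack([np.cov(x.T) + 1e-3 * np.eye(d)] * K)
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities p(i | x_j, Theta_old), computed in log space
        logp = np.empty((N, K))
        for i in range(K):
            diff = x - mu[i]
            inv = np.linalg.inv(sigma[i])
            _, logdet = np.linalg.slogdet(sigma[i])
            maha = np.einsum("nd,dk,nk->n", diff, inv, diff)
            logp[:, i] = np.log(w[i]) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means, covariances (epsilon guards empty clusters)
        nk = r.sum(axis=0) + 1e-12
        w = nk / N
        mu = (r.T @ x) / nk[:, None]
        for i in range(K):
            diff = x - mu[i]
            sigma[i] = (r[:, i, None] * diff).T @ diff / nk[i] + 1e-6 * np.eye(d)
    return w, mu, sigma, r
```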
&lt;br /&gt;
Step 3 - Optimization: &lt;br /&gt;
To ensure color distinguishability for CVD viewers, the algorithm adjusts the mean vector of each Gaussian component using an optimization function that preserves the symmetric Kullback-Leibler (KL) divergence:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
D_{sKL}(G_i, G_j) = D_{KL}(G_i \| G_j) + D_{KL}(G_j \| G_i),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;D_{KL}(G_i \| G_j)&amp;lt;/math&amp;gt; measures the dissimilarity between two Gaussian distributions &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;G_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The optimization aims to preserve the contrast perceived by CVD viewers while maintaining naturalness. Weights are assigned to Gaussian components based on the perceptual importance of colors:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\lambda_i = \frac{\sum_{j=1}^N \alpha_j p(i|x_j, \Theta)}{\sum_{k=1}^K \sum_{j=1}^N \alpha_j p(k|x_j, \Theta)},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha_j = \|x_j - \text{Sim}(x_j)\|&amp;lt;/math&amp;gt; is the perceptual error of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color feature when simulated for CVD,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{Sim}(\cdot)&amp;lt;/math&amp;gt; is the simulation function for CVD perception.&lt;br /&gt;
&lt;br /&gt;
Step 4 - Interpolation for Recoloring:&lt;br /&gt;
After optimizing the Gaussians, the mapping function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\cdot)&amp;lt;/math&amp;gt; relocates the mean vectors while maintaining covariance matrices. Interpolation ensures smooth transitions between recolored regions:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
T(x_j)_H = x_j^H + \sum_{i=1}^K p(i|x_j, \Theta) (M_i(\mu_i)_H - \mu_i^H),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;T(x_j)_H&amp;lt;/math&amp;gt; is the hue adjustment for the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;M_i(\mu_i)_H&amp;lt;/math&amp;gt; is the mapped hue of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th Gaussian&#039;s mean.&lt;br /&gt;
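Step 4 reduces to a responsibility-weighted shift of each pixel's hue; a minimal sketch (array names are illustrative):&lt;br /&gt;

```python
import numpy as np

def interpolate_hue(x_H, resp, mu_H, mapped_mu_H):
    """Hue interpolation of Step 4 (formula above): each pixel's hue is shifted
    by the responsibility-weighted movement of the Gaussian mean hues.
    x_H: (N,) pixel hues; resp: (N, K) responsibilities p(i | x_j);
    mu_H, mapped_mu_H: (K,) original and optimized mean hues."""
    return x_H + resp @ (mapped_mu_H - mu_H)
```

Because the responsibilities are soft, pixels between clusters receive a blend of the mean shifts, which is what produces smooth transitions between recolored regions.&lt;br /&gt;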
&lt;br /&gt;
While the GMM-based approach effectively models color distributions and enhances the contrast of recolored images significantly, it has limitations:&lt;br /&gt;
* The accuracy of recoloring depends on the choice of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;K&amp;lt;/math&amp;gt;, which may vary for different images.&lt;br /&gt;
* The method assumes diagonal covariance matrices for computational efficiency, which may oversimplify real-world color distributions; as a result, the recolored images sometimes look unnatural.&lt;br /&gt;
* The high computational complexity of the optimization step makes the algorithm difficult to use in real-time applications.&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This specific method also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations using a unique triple-latent structure in their method, continuous personalization was possible to adjust images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, it took around 18 days for one epoch (or one iteration over the entire dataset).&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in further sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and integrate the user label naturally, they require a pre-existing ground-truth recolored image as the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information during training. They are generally better at generating natural-looking images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from the open-source RGB image dataset of [2]: to improve their model&#039;s ability to enhance the contrast between CVD-indistinguishable color pairs, the authors created a dataset of 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors.&lt;br /&gt;
To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 images and recolored them by simulating random severities and types of CVD. The ground-truth recoloring was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org SciPy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
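The label encoding described above can be sketched as follows. This is a minimal illustration, not the project&#039;s actual code: the function names, the dictionary layout, and the use of NumPy arrays are assumptions.&lt;br /&gt;

```python
import numpy as np

def make_cvd_label(cvd_type: str, severity: float) -> np.ndarray:
    """Encode a CVD condition as severity * [protan, deutan].

    The one-hot layout follows the dataset description; a label of
    [0.6, 0.0] means protanopia at 60% severity.
    """
    assert 0.1 <= severity <= 1.0, "severity ranges from 0.1 to 1.0"
    one_hot = {"protan": [1.0, 0.0], "deutan": [0.0, 1.0]}[cvd_type]
    return severity * np.array(one_hot)

def make_sample(image: np.ndarray, cvd_type: str, severity: float) -> dict:
    """Pair a [0,1]-normalized 256x256x3 image with its condition label.

    Resizing and normalization are assumed done upstream.
    """
    assert image.shape == (256, 256, 3)
    assert image.min() >= 0.0 and image.max() <= 1.0
    return {"image": image, "label": make_cvd_label(cvd_type, severity)}
```

In training, each such sample (optionally with a ground-truth recolored image for the supervised models) would be wrapped in a framework dataset object.&lt;br /&gt;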
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately, using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to the 3 parallel MLPs simultaneously. These parallel networks are trained to predict the R, G, and B channels of the recolored image from the given ground truth. The outputs of the three networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was the pixel-wise mean-squared error:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground-truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
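The architecture in Figure 2 can be sketched in PyTorch as follows. The class name, the hidden width of 64, and the per-pixel 5-dimensional input (3 RGB + 2 label channels) are illustrative assumptions, not the trained model&#039;s exact configuration.&lt;br /&gt;

```python
import torch
import torch.nn as nn

class ConditionalParallelRGBMLP(nn.Module):
    """One independent MLP per output channel; the 2-dim CVD label is
    broadcast spatially and concatenated with the RGB input."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        # 3 RGB channels + 2 label channels -> 1 output channel, per pixel
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(5, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(3)
        ])

    def forward(self, img: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
        # img: (B, 3, H, W); label: (B, 2)
        b, _, h, w = img.shape
        lab = label[:, :, None, None].expand(b, 2, h, w)
        x = torch.cat([img, lab], dim=1)   # (B, 5, H, W)
        x = x.permute(0, 2, 3, 1)          # (B, H, W, 5): per-pixel features
        out = torch.cat([head(x) for head in self.heads], dim=-1)
        return out.permute(0, 3, 1, 2)     # (B, 3, H, W) recolored image

# trained against the ground-truth recoloring with pixel-wise MSE
mse = nn.MSELoss()
```

Because each head sees only one pixel&#039;s features at a time, the model has no spatial context, which is consistent with the artifacts reported in the Results section.&lt;br /&gt;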
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
Using the same inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and the decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and transfer robustly to new tasks. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: Recolored (model output) and ground-truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: the l-th feature layer of the pre-trained VGG network, and &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; the number of elements in that layer&lt;br /&gt;
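The perceptual loss can be sketched generically as below. Here &#039;&#039;feats&#039;&#039; stands in for the pre-trained VGG feature extractor (in practice, selected layers of a torchvision VGG); keeping it pluggable makes the sketch self-contained.&lt;br /&gt;

```python
import torch

def perceptual_loss(feats, output: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """L_VGG = sum_l (1/N_l) * ||phi_l(I) - phi_l(I')||_2^2.

    `feats` maps an image to a list of feature maps (e.g. activations of
    chosen VGG layers); each layer's squared error is normalized by its size.
    """
    loss = output.new_zeros(())
    for f_out, f_tgt in zip(feats(output), feats(target)):
        loss = loss + (f_out - f_tgt).pow(2).sum() / f_out.numel()
    return loss
```

For a real VGG extractor, the images would first be normalized with the ImageNet statistics expected by the pretrained weights.&lt;br /&gt;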
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the loss functions. The loss functions we used to train this network were inspired from [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: Two distinct pixels in the image&lt;br /&gt;
* c_x and c_y: CVD-simulated colors of the original image&lt;br /&gt;
* ĉ&#039;_x and ĉ&#039;_y: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* ||ω||: Size of the global (large) window of the image, i.e. the set of pixel pairs considered&lt;br /&gt;
* ||ω_x||: Size of the local window (neighborhood) around a pixel x&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural color distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored (model output) images respectively&lt;br /&gt;
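The contrast term CL and the global contrast loss can be sketched directly. In this illustration the window ω is passed as an explicit list of pixel-coordinate pairs, and both inputs are assumed to be already CVD-simulated color arrays; real training code would vectorize this and add the SSIM-based naturalness term.&lt;br /&gt;

```python
import numpy as np

def contrast_change(c_sim: np.ndarray, chat_sim: np.ndarray, x, y) -> float:
    """CL(x, y) = ||c'_x - c'_y|| - ||c_x - c_y|| on CVD-simulated colors.

    c_sim / chat_sim: (H, W, 3) CVD simulations of the original and
    recolored images; x, y are (row, col) pixel coordinates.
    """
    return (np.linalg.norm(chat_sim[x] - chat_sim[y])
            - np.linalg.norm(c_sim[x] - c_sim[y]))

def global_contrast_loss(c_sim: np.ndarray, chat_sim: np.ndarray, pairs) -> float:
    """Average CL over a global set of pixel pairs (the window omega)."""
    return sum(contrast_change(c_sim, chat_sim, x, y) for x, y in pairs) / len(pairs)
```

The local loss follows the same pattern but averages CL over a small neighborhood ω_x around each pixel x instead of image-wide pairs.&lt;br /&gt;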
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the three neural network architectures above: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics provided in [1] and [2], namely the Structural Similarity Index (SSIM), Total Color Contrast (TCC), Chromatic Difference (CD), and inference time, were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the outputs resemble the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness is observed in the CVD-simulated recolored images compared to the originals.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than those of the originals, meaning the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images (possibly because naturalness was effectively prioritized, even though the weight coefficients in the loss term favored contrast: alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to score and evaluate metrics for comparison with related work only using the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; please refer to those papers for the full definitions, which we use unchanged. At a high level, the components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(&amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weighting constant, not a wavelength; l, a, b are the CIELAB coordinates of the recolored (&#039;) and original images respectively.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
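The Chromatic Difference above can be computed per pixel as follows; the CIELAB conversion is assumed to happen upstream, and the default λ shown is an illustrative example rather than the papers&#039; exact constant.&lt;br /&gt;

```python
import numpy as np

def chromatic_difference(lab_orig: np.ndarray, lab_recolored: np.ndarray,
                         lam: float = 0.5) -> np.ndarray:
    """CD(i) = sqrt(lam*(l'-l)^2 + (a'-a)^2 + (b'-b)^2), per pixel.

    Inputs are (..., 3) CIELAB arrays; `lam` down-weights lightness
    differences relative to the chromatic a/b axes.
    """
    d = lab_recolored - lab_orig
    return np.sqrt(lam * d[..., 0] ** 2 + d[..., 1] ** 2 + d[..., 2] ** 2)
```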
&lt;br /&gt;
The key results are in Table 1 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Table 1: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2], taking the larger of the protan/deutan values.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but not as high as in paper [2], because that work optimizes a separate network for each CVD type.&lt;br /&gt;
* Outputs are blurry (SSIM is not weighted heavily enough in the objective).&lt;br /&gt;
* Mixing CVD types in the same network needs a more sophisticated approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned several things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate, making them feasible for real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from Swin architecture added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] H. Li, L. Zhang, X. Zhang, M. Zhang, G. Zhu, P. Shen, P. Li, M. Bennamoun, and S. A. A. Shah, &amp;quot;Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs,&amp;quot; Multimedia Tools and Applications, vol. 79, no. 37–38, pp. 27583–27614, 2020, doi: 10.1007/s11042-020-09299-2.&lt;br /&gt;
&lt;br /&gt;
[2] L. Chen, Z. Zhu, W. Huang, K. Go, X. Chen, and X. Mao, &amp;quot;Image recoloring for color vision deficiency compensation using Swin transformer,&amp;quot; Neural Computing and Applications, vol. 36, no. 11, pp. 6051–6066, 2024, doi: 10.1007/s00521-023-09367-2.&lt;br /&gt;
&lt;br /&gt;
[3] S. Jiang, D. Liu, D. Li, and C. Xu, &amp;quot;Personalized Image Generation for Color Vision Deficiency Population,&amp;quot; in 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023, pp. 22514–22523, doi: 10.1109/ICCV51070.2023.02063.&lt;br /&gt;
&lt;br /&gt;
[4] J.-B. Huang, C.-S. Chen, T.-C. Jen, and S.-J. Wang, &amp;quot;Image recolorization for the colorblind,&amp;quot; GitHub repository, [Online]. Available: https://github.com/jbhuang0604/RecolorForColorblind. [Accessed: 12-Dec-2024].&lt;br /&gt;
&lt;br /&gt;
[5] J. Dietrich, &amp;quot;Daltonize Python Package,&amp;quot; GitHub repository, [Online]. Available: https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py. [Accessed: 12-Dec-2024].&lt;br /&gt;
&lt;br /&gt;
[6] B. Dougherty and A. Wade, &amp;quot;Vischeck,&amp;quot; 2000. [Online]. Available: https://www.vischeck.com/. [Accessed: 12-Dec-2024].&lt;br /&gt;
&lt;br /&gt;
[7] J. Brettel, F. Viénot, and J. D. Mollon, &amp;quot;Computerized simulation of color appearance for dichromats,&amp;quot; Journal of the Optical Society of America A, vol. 14, no. 10, pp. 2647–2655, 1997, doi: 10.1364/JOSAA.14.002647.&lt;br /&gt;
&lt;br /&gt;
[8] Z. Zhu, M. Toyoura, K. Go, I. Fujishiro, K. Kashiwagi, and X. Mao, &amp;quot;Processing images for red-green dichromats compensation via naturalness and information-preservation considered recoloring,&amp;quot; Visual Computer, vol. 35, no. 6–8, pp. 1053–1066, 2019, doi: 10.1007/s00371-019-01723-5.&lt;br /&gt;
&lt;br /&gt;
[9] Z. Zhu, M. Toyoura, K. Go, I. Fujishiro, K. Kashiwagi, and X. Mao, &amp;quot;Personalized image recoloring for color vision deficiency compensation,&amp;quot; IEEE Transactions on Multimedia, vol. 24, pp. 1721–1733, 2022, doi: 10.1109/TMM.2021.3130546.&lt;br /&gt;
&lt;br /&gt;
[10] G. E. Tsekouras, A. Rigos, S. Chatzistamatis, and N. Grammalidis, &amp;quot;A novel approach to image recoloring for color vision deficiency,&amp;quot; Sensors, vol. 21, no. 8, p. 2740, Apr. 2021, doi: 10.3390/s21082740.&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for each of MLP, U-Net and Autoencoder&lt;br /&gt;
* Recolorization script (adapting from MATLAB) and adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup and configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60492</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60492"/>
		<updated>2024-12-13T04:23:02Z</updated>

		<summary type="html">&lt;p&gt;Rainas: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix. For example, the correction matrix for protanopia, as implemented in tools like Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix for Protanopia} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected values are then added back to the original RGB image to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, colors are first transformed into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
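The simulation pipeline above can be sketched in NumPy using the matrices quoted in this section; the daltonization step follows the error-shifting idea from the previous subsection, with the error computed in RGB space (a common implementation choice). Function names are illustrative, and only full dichromacy is modeled here.&lt;br /&gt;

```python
import numpy as np

# RGB -> LMS transform from Stockman and Sharpe's cone fundamentals (as above)
RGB_TO_LMS = np.array([[0.3904725,  0.54990437, 0.00890159],
                       [0.07092586, 0.96310739, 0.00135809],
                       [0.02314268, 0.12801221, 0.93605194]])

# Each matrix replaces the missing cone's response with the weighted
# combination of the remaining cones given in the equations above.
SIM = {
    "protan": np.array([[0.0, 0.90822864, 0.008192],
                        [0.0, 1.0,        0.0],
                        [0.0, 0.0,        1.0]]),
    "deutan": np.array([[1.0,        0.0, 0.0],
                        [1.10104433, 0.0, -0.00901975],
                        [0.0,        0.0, 1.0]]),
    "tritan": np.array([[1.0,         0.0,        0.0],
                        [0.0,         1.0,        0.0],
                        [-0.15773032, 1.19465634, 0.0]]),
}

# Vischeck-style error-shifting matrix for protanopia (from the text)
CORRECTION_PROTAN = np.array([[0.0, 0.0, 0.0],
                              [0.7, 1.0, 0.0],
                              [0.7, 0.0, 1.0]])

def simulate_dichromat(rgb: np.ndarray, cvd_type: str) -> np.ndarray:
    """Simulate full dichromacy: RGB -> LMS, cone replacement, LMS -> RGB."""
    lms = rgb @ RGB_TO_LMS.T
    lms_sim = lms @ SIM[cvd_type].T
    return lms_sim @ np.linalg.inv(RGB_TO_LMS).T

def daltonize_protan(rgb: np.ndarray) -> np.ndarray:
    """Shift the simulation error into visible channels and add it back."""
    error = rgb - simulate_dichromat(rgb, "protan")
    return np.clip(rgb + error @ CORRECTION_PROTAN.T, 0.0, 1.0)
```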
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
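The combined loss can be sketched in a few lines of plain Python; the matrix T below is an identity placeholder rather than a fitted CVD-simulation matrix, and the colors and weights are made up:&lt;br /&gt;

```python
# Degree-adaptable loss: E = beta * sum_i alpha_i ||T(c_i+ - c_i)||^2
#                          + sum_{i != j} ||T(c_i+ - c_j+) - T(c_i - c_j)||^2
# T, alpha, beta, and the colors below are illustrative placeholders, not values from [9].

def mat_vec(T, v):
    """Apply a 3x3 matrix to a 3-vector."""
    return tuple(sum(T[r][k] * v[k] for k in range(3)) for r in range(3))

def sq_norm(v):
    return sum(x * x for x in v)

def diff(a, b):
    return tuple(x - y for x, y in zip(a, b))

def degree_adaptable_loss(c, c_plus, T, alpha, beta):
    n = len(c)
    fidelity = beta * sum(alpha[i] * sq_norm(mat_vec(T, diff(c_plus[i], c[i])))
                          for i in range(n))
    contrast = sum(
        sq_norm(diff(mat_vec(T, diff(c_plus[i], c_plus[j])),
                     mat_vec(T, diff(c[i], c[j]))))
        for i in range(n) for j in range(n) if i != j
    )
    return fidelity + contrast

T = [[1.0, 0.0, 0.0],   # identity stands in for a real CVD simulation matrix
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
c      = [(0.9, 0.1, 0.1), (0.1, 0.9, 0.1)]
c_plus = [(0.9, 0.1, 0.2), (0.1, 0.8, 0.1)]
print(degree_adaptable_loss(c, c_plus, T, alpha=[1.0, 1.0], beta=0.5))
```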
&lt;br /&gt;
==== Confusion-line-based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The method begins with fuzzy clustering to extract representative colors (key colors) from the input image. These colors are mapped onto the CIE 1931 chromaticity diagram, where confusion lines represent loci of colors perceived as identical by individuals with CVD. Each confusion line passes through the copunctal point of the missing cone type; the distance from a color to a confusion line is computed as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
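The distance computation is a standard point-to-line formula. A small sketch follows; the copunctal point below is the commonly quoted approximate protan value, used here only as an example, not necessarily the constant used in [10]:&lt;br /&gt;

```python
import math

# Perpendicular distance from a chromaticity v = (x_v, y_v) to the line L through
# the copunctal point (x_cp, y_cp) and a reference point (x_0, y_0).

def dist_to_confusion_line(v, cp, p0):
    (xv, yv), (xcp, ycp), (x0, y0) = v, cp, p0
    num = abs((xcp - x0) * (y0 - yv) - (x0 - xv) * (ycp - y0))
    den = math.hypot(xcp - x0, ycp - y0)
    return num / den

protan_cp = (0.747, 0.253)   # approximate protan copunctal point (illustrative)
ref       = (0.333, 0.333)   # white point as an arbitrary reference
print(dist_to_confusion_line((0.40, 0.35), protan_cp, ref))
```

A point lying on the line itself yields distance zero, which is how "occupied" confusion lines can be detected numerically.&lt;br /&gt;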
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. The translation process involves:&lt;br /&gt;
&lt;br /&gt;
1. Ranking key colors by their cluster sizes:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{rank}(v_i) = \frac{|A_i|}{\sum_{j=1}^{n_A}|A_j|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;|A_i|&amp;lt;/math&amp;gt; is the cardinality (number of pixels) of its associated cluster,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; is the total number of clusters.&lt;br /&gt;
&lt;br /&gt;
2. Translating the highest-ranked confusing color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; to its projection on the nearest non-occupied confusion line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
v^*_{\text{tr}} = \text{proj}(v^*, L^*),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*_{\text{tr}}&amp;lt;/math&amp;gt; is the new position of the color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; after translation,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L^*&amp;lt;/math&amp;gt; is the nearest non-occupied confusion line, determined as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(v^*, L^*) = \min_{L \in \text{CL}_D} d(v^*, L).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Updating the sets of confusing colors and non-occupied confusion lines iteratively:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Phi_V = \Phi_V - \{v^*\}, \quad \text{CL}_D = \text{CL}_D - \{L^*\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Phi_V&amp;lt;/math&amp;gt; is the set of confusing colors,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{CL}_D&amp;lt;/math&amp;gt; is the set of non-occupied confusion lines.&lt;br /&gt;
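The three steps form a simple greedy loop. A toy sketch with named stand-ins for colors and lines (the distance table is fabricated; a real implementation would use the chromaticity geometry above):&lt;br /&gt;

```python
# Greedy translation of confusing colors: rank by cluster size, move the top-ranked
# color to its nearest non-occupied confusion line, shrink both sets, repeat.
# Colors and lines are string ids; the distance table is a made-up placeholder.

def rank(sizes):
    total = sum(sizes.values())
    return {k: s / total for k, s in sizes.items()}

def shift_confusing_colors(confusing, free_lines, dist, sizes):
    assignment = {}
    confusing, free_lines = set(confusing), set(free_lines)
    r = rank(sizes)
    while confusing and free_lines:
        v_star = max(confusing, key=lambda v: r[v])              # step 1: rank
        l_star = min(free_lines, key=lambda L: dist(v_star, L))  # step 2: nearest free line
        assignment[v_star] = l_star
        confusing.discard(v_star)                                # step 3: update sets
        free_lines.discard(l_star)
    return assignment

sizes = {"red1": 500, "red2": 300, "green1": 200}
dist_table = {("red1", "L3"): 0.1, ("red1", "L4"): 0.3,
              ("red2", "L3"): 0.2, ("red2", "L4"): 0.1}
out = shift_confusing_colors(["red1", "red2"], ["L3", "L4"],
                             lambda v, L: dist_table[(v, L)], sizes)
print(out)  # {'red1': 'L3', 'red2': 'L4'}
```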
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
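A compact sketch of evaluating this objective for toy key colors. The dichromat simulation f_D is replaced here by a crude red-green collapse, so the numbers are purely illustrative; a real pipeline would use a Brettel-style simulation [7]:&lt;br /&gt;

```python
# Evaluate E = (E1 + E2) + lambda * E3 for toy key colors.
# f_D is a crude stand-in for dichromat simulation. The text's E2 indexing appears
# to mix clusters, so here both of its indices run over cluster A (an assumption).

def norm(v):
    return sum(x * x for x in v) ** 0.5

def diff(a, b):
    return tuple(x - y for x, y in zip(a, b))

def f_D(c):
    m = (c[0] + c[1]) / 2.0          # collapse red and green channels
    return (m, m, c[2])

def E1(a, a_rec, b):
    return sum(abs(norm(diff(ai, bj)) - norm(diff(f_D(ar), f_D(bj))))
               for ai, ar in zip(a, a_rec) for bj in b) / (len(a) * len(b))

def E2(a, a_rec):
    n = len(a)
    return sum(abs(norm(diff(a[i], a[j])) - norm(diff(f_D(a_rec[i]), f_D(a_rec[j]))))
               for i in range(n) for j in range(n) if i != j) / (n * n)

def E3(a, a_rec):
    return sum(norm(diff(ai, ar)) for ai, ar in zip(a, a_rec)) / len(a)

a     = [(0.8, 0.2, 0.2), (0.2, 0.8, 0.2)]
a_rec = [(0.8, 0.2, 0.4), (0.2, 0.8, 0.0)]
b     = [(0.5, 0.5, 0.5)]
lam   = 0.5
E = (E1(a, a_rec, b) + E2(a, a_rec)) + lam * E3(a, a_rec)
print(E)
```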
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== Segmentation-based Method ====&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and BicycleGAN backbones showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This specific approach also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations with a triple-latent structure, the model enables continuous personalization, adjusting images to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, it took around 18 days for one epoch (one iteration over the entire dataset).&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in later sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and able to integrate the user label, they require a ground-truth example of the expected output to already exist.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user-label information during training. They are generally better at generating natural-looking images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from an open-source RGB image dataset from [2]: to improve a model&#039;s ability to enhance contrast between CVD-indistinguishable color pairs, that study assembled 141,000 unlabeled pictures of natural scenes and artificial images containing CVD-confusing colors.&lt;br /&gt;
To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 of these images and recolored them under randomly simulated labels for CVD type and severity. The ground-truth recoloring used a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org Scipy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
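The label construction is straightforward. A small sketch of how such severity-scaled labels could be generated (the function names are ours, not from the project code):&lt;br /&gt;

```python
import random

# Sketch of the condition-label encoding severity * [protan, deutan].
# The one-hot choice and the 0.1-step severity grid are assumptions matching
# the description above, not the project's exact sampling code.

def make_cvd_label(cvd_type, severity):
    """Return the 2-vector label, e.g. ('protan', 0.6) -> [0.6, 0.0]."""
    assert cvd_type in ("protan", "deutan")
    assert 0.1 <= severity <= 1.0
    one_hot = [1.0, 0.0] if cvd_type == "protan" else [0.0, 1.0]
    return [severity * v for v in one_hot]

def random_label(rng):
    """Sample a random type and severity, as done for the 15,000 ground-truth images."""
    cvd_type = rng.choice(["protan", "deutan"])
    severity = rng.choice([round(0.1 * k, 1) for k in range(1, 11)])
    return make_cvd_label(cvd_type, severity)

print(make_cvd_label("protan", 0.6))  # [0.6, 0.0]
print(random_label(random.Random(0)))
```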
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to the 3 parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. The outputs of the three networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error (MSE) loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: the recolored (model output) image and the ground-truth recolored image respectively&lt;br /&gt;
* p: pixel index&lt;br /&gt;
* N: total number of pixels&lt;br /&gt;
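For reference, the pixel-wise MSE written out longhand for flat lists of pixel values (in training this is simply PyTorch&#039;s built-in MSE loss):&lt;br /&gt;

```python
# Pixel-wise mean-squared error between model output and ground-truth recoloring.

def mse(pred, target):
    assert len(pred) == len(target)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

pred   = [0.0, 0.5, 1.0]   # toy flattened pixel values
target = [0.0, 0.25, 1.0]
print(mse(pred, target))
```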
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
In a similar fashion of inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and are very robust to new tasks as well. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: the recolored (model output) and ground-truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: the activation of the &amp;lt;math&amp;gt;l&amp;lt;/math&amp;gt;-th layer of a pre-trained VGG network, with &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; elements&lt;br /&gt;
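The structure of the perceptual loss, a per-layer normalized squared difference of feature activations, can be sketched without the network itself. Here the feature extractor is stood in by precomputed feature lists; in practice these would be a frozen VGG&#039;s intermediate activations:&lt;br /&gt;

```python
# Shape of the perceptual loss: sum over layers l of (1/N_l) * ||phi_l(I) - phi_l(I')||^2.
# The "features" below are made-up stand-ins for VGG activations.

def perceptual_loss(feats_pred, feats_target):
    total = 0.0
    for fp, ft in zip(feats_pred, feats_target):
        n_l = len(fp)                          # N_l: elements in layer l
        total += sum((a - b) ** 2 for a, b in zip(fp, ft)) / n_l
    return total

# Two "layers" of fabricated features for the model output and the ground truth:
feats_pred   = [[0.2, 0.4], [1.0, 0.0, 0.5]]
feats_target = [[0.2, 0.5], [0.9, 0.1, 0.5]]
print(perceptual_loss(feats_pred, feats_target))
```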
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the choice of loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: size of the global (whole-image) window&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: size of the local window (neighborhood) around pixel x&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: the original and recolored images respectively&lt;br /&gt;
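The contrast terms above can be sketched over a toy &#039;image&#039; of CVD-simulated colors (all values fabricated; a real pipeline would obtain them from a CVD simulator):&lt;br /&gt;

```python
# CL(x, y) and the global/local contrast losses over a 3-pixel toy image.

def norm(v):
    return sum(d * d for d in v) ** 0.5

def diff(a, b):
    return tuple(p - q for p, q in zip(a, b))

def CL(cx_rec, cy_rec, cx, cy):
    """Contrast gap ||c^x' - c^y'|| - ||c_x - c_y||; positive means more contrast."""
    return norm(diff(cx_rec, cy_rec)) - norm(diff(cx, cy))

def global_contrast_loss(orig, rec, pairs):
    """Average CL over sampled pixel pairs from the global window omega."""
    return sum(CL(rec[x], rec[y], orig[x], orig[y]) for x, y in pairs) / len(pairs)

def local_contrast_loss(orig, rec, neighborhoods):
    """Average per-pixel CL over each pixel's local window omega_x."""
    per_pixel = [
        sum(CL(rec[x], rec[y], orig[x], orig[y]) for y in nbrs) / len(nbrs)
        for x, nbrs in neighborhoods.items()
    ]
    return sum(per_pixel) / len(per_pixel)

orig = [(0.50, 0.50, 0.5), (0.52, 0.50, 0.5), (0.50, 0.48, 0.5)]  # nearly confusable
rec  = [(0.50, 0.50, 0.5), (0.70, 0.50, 0.5), (0.50, 0.30, 0.5)]  # pushed apart
pairs = [(0, 1), (0, 2), (1, 2)]
neighborhoods = {0: [1], 1: [0, 2], 2: [1]}
print(global_contrast_loss(orig, rec, pairs))          # positive: contrast increased
print(local_contrast_loss(orig, rec, neighborhoods))
```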
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the three neural network architectures above: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), total color contrast (TCC), Chromatic Difference (CD), and inference time, adapted from [1] and [2], were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored image shows compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than those of the originals, meaning the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images, possibly because the naturalness term was effectively prioritized even though the loss weights favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to score and evaluate metrics for comparison with related work only using the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; please refer to those papers for the full definitions, as we used the same ones. On a high level, the metrics are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(&amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a constant weight, not a wavelength; l, a, b are the CIELAB coordinates of the recolored (&#039;) and original images respectively.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
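As an example of one of these metrics, the Chromatic Difference formula transcribed directly for one pixel&#039;s LAB coordinates (the &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; value and LAB triples below are illustrative):&lt;br /&gt;

```python
import math

# Chromatic Difference CD(i) = sqrt(lambda*(l'-l)^2 + (a'-a)^2 + (b'-b)^2) in LAB space.
# lam and the LAB triples are made-up example values, not constants from [1] or [2].

def chromatic_difference(lab_rec, lab_orig, lam=0.5):
    (l2, a2, b2), (l1, a1, b1) = lab_rec, lab_orig
    return math.sqrt(lam * (l2 - l1) ** 2 + (a2 - a1) ** 2 + (b2 - b1) ** 2)

orig_lab = (50.0, 20.0, -10.0)
rec_lab  = (52.0, 35.0, -10.0)
print(chromatic_difference(rec_lab, orig_lab))  # lightness change is down-weighted by lam
```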
&lt;br /&gt;
The key results are in Table 1 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|+ &#039;&#039;&#039;Table 1: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2] for protan/deutan whichever is larger.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good, but not as good as in paper [2], because that work optimizes a separate network for each CVD type.&lt;br /&gt;
* Outputs are blurry (SSIM is not weighted strongly enough in the objective).&lt;br /&gt;
* Handling mixed CVD types in a single network needs a more sophisticated approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned a couple of things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate (about 2.6 seconds per image), which is workable for interactive use but short of real-time. Further optimization could make these models more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from Swin architecture added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] H. Li, L. Zhang, X. Zhang, M. Zhang, G. Zhu, P. Shen, P. Li, M. Bennamoun, and S. A. A. Shah, &amp;quot;Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs,&amp;quot; Multimedia Tools and Applications, vol. 79, no. 37–38, pp. 27583–27614, 2020, doi: 10.1007/s11042-020-09299-2.&lt;br /&gt;
&lt;br /&gt;
[2] L. Chen, Z. Zhu, W. Huang, K. Go, X. Chen, and X. Mao, &amp;quot;Image recoloring for color vision deficiency compensation using Swin transformer,&amp;quot; Neural Computing and Applications, vol. 36, no. 11, pp. 6051–6066, 2024, doi: 10.1007/s00521-023-09367-2.&lt;br /&gt;
&lt;br /&gt;
[3] S. Jiang, D. Liu, D. Li, and C. Xu, &amp;quot;Personalized Image Generation for Color Vision Deficiency Population,&amp;quot; in 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023, pp. 22514–22523, doi: 10.1109/ICCV51070.2023.02063.&lt;br /&gt;
&lt;br /&gt;
[4] J.-B. Huang, C.-S. Chen, T.-C. Jen, and S.-J. Wang, &amp;quot;Image recolorization for the colorblind,&amp;quot; GitHub repository, [Online]. Available: https://github.com/jbhuang0604/RecolorForColorblind. [Accessed: 12-Dec-2024].&lt;br /&gt;
&lt;br /&gt;
[5] J. Dietrich, &amp;quot;Daltonize Python Package,&amp;quot; GitHub repository, [Online]. Available: https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py. [Accessed: 12-Dec-2024].&lt;br /&gt;
&lt;br /&gt;
[6] B. Dougherty and A. Wade, &amp;quot;Vischeck,&amp;quot; 2000. [Online]. Available: https://www.vischeck.com/. [Accessed: 12-Dec-2024].&lt;br /&gt;
&lt;br /&gt;
[7] J. Brettel, F. Viénot, and J. D. Mollon, &amp;quot;Computerized simulation of color appearance for dichromats,&amp;quot; Journal of the Optical Society of America A, vol. 14, no. 10, pp. 2647–2655, 1997, doi: 10.1364/JOSAA.14.002647.&lt;br /&gt;
&lt;br /&gt;
[8] Z. Zhu, M. Toyoura, K. Go, I. Fujishiro, K. Kashiwagi, and X. Mao, &amp;quot;Processing images for red-green dichromats compensation via naturalness and information-preservation considered recoloring,&amp;quot; Visual Computer, vol. 35, no. 6–8, pp. 1053–1066, 2019, doi: 10.1007/s00371-019-01723-5.&lt;br /&gt;
&lt;br /&gt;
[9] Z. Zhu, M. Toyoura, K. Go, I. Fujishiro, K. Kashiwagi, and X. Mao, &amp;quot;Personalized image recoloring for color vision deficiency compensation,&amp;quot; IEEE Transactions on Multimedia, vol. 24, pp. 1721–1733, 2022, doi: 10.1109/TMM.2021.3130546.&lt;br /&gt;
&lt;br /&gt;
[10] G. E. Tsekouras, A. Rigos, S. Chatzistamatis, and N. Grammalidis, &amp;quot;A novel approach to image recoloring for color vision deficiency,&amp;quot; Sensors, vol. 21, no. 8, p. 2740, Apr. 2021, doi: 10.3390/s21082740.&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
- Upload source code, test images, etc, and give a description of each link. In some cases, your acquired data may be too large to store practically. In this case, use your judgement (or consult one of us) and only link the most relevant data. Be sure to describe the purpose of your code and to edit the code for clarity. The purpose of placing the code online is to allow others to verify your methods and to learn from your ideas.&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for each of MLP, U-Net and Autoencoder&lt;br /&gt;
* Recolorization script (adapting from MATLAB) and adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup and configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60490</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60490"/>
		<updated>2024-12-13T04:21:15Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* Confusion-line-based Method */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix. For example, the correction matrix for protanopia, as implemented in tools like Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix for Protanopia} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected RGB values are then added back to the original image to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
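As an illustration, the correction step described above can be sketched in a few lines of NumPy. This is a simplified sketch rather than the exact Vischeck pipeline: it applies the error redistribution directly in RGB space, with `rgb_original` and `rgb_simulated` assumed to be float arrays in the [0, 1] range.

```python
import numpy as np

# Protanopia correction matrix from Vischeck [6], as quoted above.
CORRECTION_PROTAN = np.array([
    [0.0, 0.0, 0.0],
    [0.7, 1.0, 0.0],
    [0.7, 0.0, 1.0],
])

def daltonize(rgb_original, rgb_simulated, correction=CORRECTION_PROTAN):
    """Simplified daltonization sketch: redistribute the information lost
    to the simulated deficiency into the channels the viewer can see."""
    error = rgb_original - rgb_simulated           # what the CVD viewer misses
    shift = error @ correction.T                   # redistribute the error
    return np.clip(rgb_original + shift, 0.0, 1.0)
```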
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, colors are first transformed into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
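With the matrices and coefficients quoted above, the simulation itself reduces to a pair of matrix products. The following is a minimal sketch (our illustration, not the authors' implementation); it assumes a linear-RGB image of shape (H, W, 3) with values in [0, 1].

```python
import numpy as np

# RGB-to-LMS matrix from Stockman and Sharpe's cone fundamentals, as above.
RGB2LMS = np.array([
    [0.3904725,  0.54990437, 0.00890159],
    [0.07092586, 0.96310739, 0.00135809],
    [0.02314268, 0.12801221, 0.93605194],
])

# Brettel/Vienot-style replacements: the missing cone response is a weighted
# combination of the remaining two (coefficients derived from [5]).
SIM = {
    "protan": np.array([[0.0, 0.90822864, 0.008192],
                        [0.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0]]),
    "deutan": np.array([[1.0, 0.0, 0.0],
                        [1.10104433, 0.0, -0.00901975],
                        [0.0, 0.0, 1.0]]),
    "tritan": np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0],
                        [-0.15773032, 1.19465634, 0.0]]),
}

def simulate_cvd(rgb, kind="protan"):
    lms = rgb @ RGB2LMS.T                          # RGB -> LMS
    lms_sim = lms @ SIM[kind].T                    # replace the deficient cone
    return lms_sim @ np.linalg.inv(RGB2LMS).T      # back to RGB
```

Note that applying the same simulation twice changes nothing, since the replaced cone response already depends only on the two intact cones.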
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
==== Confusion-line-based Method ====&lt;br /&gt;
Tsekouras et al. [10] proposed a novel image recoloring approach for individuals with protanopia and deuteranopia, focusing on improving color naturalness and enhancing contrast. Their framework consists of four modules, with a key focus on shifting confusing colors along confusion lines in the CIE 1931 chromaticity diagram.&lt;br /&gt;
&lt;br /&gt;
The method begins with fuzzy clustering to extract representative colors (key colors) from the input image. These colors are mapped onto the CIE 1931 chromaticity diagram, where confusion lines represent loci of colors perceived as identical by individuals with CVD. Confusion lines are defined using the copunctal point of the missing cone type and another reference point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
d(v, L) = \frac{\left|(x_{cp} - x_0)(y_0 - y_v) - (x_0 - x_v)(y_{cp} - y_0)\right|}{\sqrt{(x_{cp} - x_0)^2 + (y_{cp} - y_0)^2}}, &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v = (x_v, y_v)&amp;lt;/math&amp;gt; is the chromaticity coordinate of the color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt; is the confusion line passing through the copunctal point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_{cp}, y_{cp})&amp;lt;/math&amp;gt; and another reference point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;d(v, L)&amp;lt;/math&amp;gt; measures the perpendicular distance from the point &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v&amp;lt;/math&amp;gt; to the confusion line &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Confusing colors, identified as key colors lying on occupied confusion lines, are iteratively shifted to the nearest non-occupied confusion lines to enhance discriminability for CVD viewers. The translation process involves:&lt;br /&gt;
&lt;br /&gt;
1. Ranking key colors by their cluster sizes:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\text{rank}(v_i) = \frac{|A_i|}{\sum_{j=1}^{n_A}|A_j|},&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;|A_i|&amp;lt;/math&amp;gt; is the cardinality (number of pixels) of its associated cluster,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; is the total number of clusters.&lt;br /&gt;
&lt;br /&gt;
2. Translating the highest-ranked confusing color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; to its projection on the nearest non-occupied confusion line:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
v^*_{\text{tr}} = \text{proj}(v^*, L^*),&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*_{\text{tr}}&amp;lt;/math&amp;gt; is the new position of the color &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v^*&amp;lt;/math&amp;gt; after translation,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L^*&amp;lt;/math&amp;gt; is the nearest non-occupied confusion line, determined as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
d(v^*, L^*) = \min_{L \in \text{CL}_D} d(v^*, L).&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Updating the sets of confusing colors and non-occupied confusion lines iteratively:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
\Phi_V = \Phi_V - \{v^*\}, \quad \text{CL}_D = \text{CL}_D - \{L^*\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Phi_V&amp;lt;/math&amp;gt; is the set of confusing colors,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\text{CL}_D&amp;lt;/math&amp;gt; is the set of non-occupied confusion lines.&lt;br /&gt;
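The three steps can be sketched as a short greedy loop. This is a hedged illustration with our own names: `distance` and `project` stand in for d(v, L) and proj(v, L), `confusing` maps key colors to their ranks, and `free_lines` is the set of non-occupied confusion lines (the function mutates its inputs for brevity).

```python
def shift_confusing_colors(confusing, free_lines, distance, project):
    """Greedy translation: repeatedly take the highest-ranked confusing color,
    project it onto the nearest free confusion line, and update both sets."""
    recolored = {}
    while confusing and free_lines:
        v_star = max(confusing, key=confusing.get)                 # step 1: highest rank
        line = min(free_lines, key=lambda L: distance(v_star, L))  # nearest free line
        recolored[v_star] = project(v_star, line)                  # step 2: translate
        del confusing[v_star]                                      # step 3: update sets
        free_lines.remove(line)
    return recolored
```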
&lt;br /&gt;
After shifting, the luminance of the recolored key colors is optimized using a regularized objective function to balance naturalness and contrast:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E = (E_1 + E_2) + \lambda E_3,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E&amp;lt;/math&amp;gt; is the total loss,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weight parameter controlling the trade-off between contrast enhancement and naturalness preservation.&lt;br /&gt;
&lt;br /&gt;
The first term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_1&amp;lt;/math&amp;gt;, measures contrast enhancement for normal trichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_1 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - b_j\| - \|f_D(a_{i,\text{rec}}) - f_D(b_j)\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n_B&amp;lt;/math&amp;gt; are the number of key colors in clusters &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;, respectively,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;b_j&amp;lt;/math&amp;gt; is the chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;B&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D&amp;lt;/math&amp;gt; is a function simulating the dichromatic vision of individuals with color vision deficiencies,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color.&lt;br /&gt;
&lt;br /&gt;
The second term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_2&amp;lt;/math&amp;gt;, measures contrast enhancement for dichromats:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_2 = \frac{1}{n_A n_B} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \left| \|a_i - a_j\| - \|f_D(a_{i,\text{rec}}) - f_D(a_{j,\text{rec}})\| \right|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_j&amp;lt;/math&amp;gt; are the chromaticities of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;j&amp;lt;/math&amp;gt;-th key colors in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;f_D(a_{i,\text{rec}})&amp;lt;/math&amp;gt; simulates the dichromatic perception of the recolored chromaticity &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The third term, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;E_3&amp;lt;/math&amp;gt;, preserves the naturalness of the recolored image:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;&lt;br /&gt;
E_3 = \frac{1}{n_A} \sum_{i=1}^{n_A} \|a_i - a_{i,\text{rec}}\|,&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_i&amp;lt;/math&amp;gt; is the original chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color in cluster &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;A&amp;lt;/math&amp;gt;,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;a_{i,\text{rec}}&amp;lt;/math&amp;gt; is the recolored chromaticity of the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt;-th key color,&lt;br /&gt;
* &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\|a_i - a_{i,\text{rec}}\|&amp;lt;/math&amp;gt; is the Euclidean distance between the original and recolored chromaticities, measuring how much the naturalness is preserved.&lt;br /&gt;
&lt;br /&gt;
This method significantly enhances the contrast and naturalness of recolored images by leveraging confusion line geometry and regularized optimization. However, challenges remain in achieving real-time performance and handling cases where shifting may distort the aesthetic quality of the image.&lt;br /&gt;
&lt;br /&gt;
==== Segmentation-based Method ====&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with backbone Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN structures showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This specific reference also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations with a triple-latent structure, their method enables continuous personalization, adjusting images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, it took around 18 days for one epoch (one iteration over the entire dataset).&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in further sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and integrate the user label, they require a pre-existing ground-truth recolored image as the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user-label information during training. They are generally better at generating natural-looking images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from the open-source RGB image dataset of [2]: to improve a model&#039;s ability to enhance the contrast between CVD-indistinguishable color pairs, the authors of that study created a dataset of 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 images and recolored them by simulating random labels for severity and type of CVD. The ground-truth recoloring was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] from [4], adapted to Python. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org SciPy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
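For concreteness, the label construction and conditioning can be sketched in PyTorch as follows. The function name and the channel-wise concatenation are illustrative assumptions rather than the exact project code; severity is a scalar in [0.1, 1.0] and the image a (3, 256, 256) tensor in [0, 1].

```python
import torch

def make_condition(image, severity, cvd_type):
    """Encode the CVD label as severity * [protan, deutan] and broadcast it
    to a per-pixel conditioning map stacked onto the RGB channels."""
    protan = 1.0 if cvd_type == "protan" else 0.0
    deutan = 1.0 if cvd_type == "deutan" else 0.0
    label = torch.tensor([severity * protan, severity * deutan])
    _, h, w = image.shape
    cond = label.view(2, 1, 1).expand(2, h, w)   # (2,) -> (2, H, W)
    return torch.cat([image, cond], dim=0)       # (5, H, W) conditioned input
```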
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately, using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to three parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. Their outputs are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error (MSE) loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored image (model output) and ground-truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
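This loss can be computed directly; the sketch below assumes the images are (H, W, 3) float arrays.&lt;br /&gt;

```python
import numpy as np

def mse_loss(pred, target):
    """Pixel-wise mean-squared error between the recolored model output
    and the ground-truth recolored image, both (H, W, 3) float arrays."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))
```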
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
Using the same input format, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and the decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and transfer robustly to new tasks. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: recolored image (model output) and ground-truth recolored image respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: the l-th feature layer of the pre-trained VGG network, with &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; the number of elements in that layer&#039;s feature map&lt;br /&gt;
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network fit the recoloring task was the loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: Two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: Size of the global (or large) window of the image&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: Size of the local window (neighborhood) around a pixel x&lt;br /&gt;
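A toy NumPy implementation of these contrast terms, assuming small (H, W, 3) float images of CVD-simulated colors, might look like the following; the exact windowing of [2] is simplified here to all pixel pairs (global) and 3x3 neighborhoods (local).&lt;br /&gt;

```python
import numpy as np
from itertools import combinations

def contrast_losses(orig_sim, rec_sim):
    """Global and local contrast terms for small images.

    CL(x, y) = ||c_hat_x - c_hat_y|| - ||c_x - c_y||. The global term
    averages CL over all pixel pairs; the local term averages it over each
    pixel's 3x3 neighborhood. Both windowings are simplifications.
    """
    H, W, _ = orig_sim.shape
    coords = [(i, j) for i in range(H) for j in range(W)]

    def cl(p, q):  # change in pairwise color distance after recoloring
        d_new = np.linalg.norm(rec_sim[p] - rec_sim[q])
        d_old = np.linalg.norm(orig_sim[p] - orig_sim[q])
        return d_new - d_old

    pairs = list(combinations(coords, 2))
    l_global = sum(cl(p, q) for p, q in pairs) / len(pairs)

    l_local, count = 0.0, 0
    for (i, j) in coords:
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (di, dj) != (0, 0) and 0 <= ni < H and 0 <= nj < W:
                    l_local += cl((i, j), (ni, nj))
                    count += 1
    return l_global, l_local / count
```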
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural color distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored images respectively&lt;br /&gt;
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics adapted from [1] and [2], such as the Structural Similarity Index (SSIM), Total Color Contrast (TCC), Chromatic Difference (CD), and inference time, were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored image shows compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that channel disentanglement was not very useful for this task (especially for naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement toward resembling the ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than those of the originals, meaning the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images, possibly because the naturalness term dominated even though the weight coefficients in the loss favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we scored and evaluated metrics for comparison with related work using only the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; we use the same definitions, so please refer to those papers for details. At a high level, the components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(Here &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a constant, not a wavelength, and l, a, b are the CIELAB coordinates of the recolored (&#039;) and original images respectively.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
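Under our reading of these definitions, the SSIM and CD metrics can be sketched as follows (a simplified single-window SSIM; TCC is omitted because its window sets depend on details in [2], and lam = 0.3 is an illustrative choice, not the paper&#039;s value).&lt;br /&gt;

```python
import numpy as np

def ssim_global(x, y, c1=0.01**2, c2=0.03**2):
    """Single-window SSIM over whole images in [0, 1]; practical SSIM
    averages this statistic over local Gaussian windows instead."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx**2 + my**2 + c1) * (x.var() + y.var() + c2)
    return num / den

def chromatic_difference(lab_orig, lab_rec, lam=0.3):
    """Per-pixel CD in CIELAB space: sqrt(lam*dl^2 + da^2 + db^2),
    for (H, W, 3) arrays of L, a, b coordinates."""
    d = np.asarray(lab_rec, float) - np.asarray(lab_orig, float)
    w = np.array([lam, 1.0, 1.0])
    return np.sqrt((w * d**2).sum(axis=-1))
```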
&lt;br /&gt;
The key results are in Table 1 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Table 1: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2] for protan/deutan whichever is larger.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good but not as high as in paper [2], because they optimize a separate network for each CVD type.&lt;br /&gt;
* Outputs are blurry (SSIM is not optimized for strongly enough).&lt;br /&gt;
* Mixing CVD types in the same network requires a more sophisticated approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned a few things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were only moderate. Optimizing these models further could make them scalable to real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Transformer-inspired loss functions borrowed from Swin architecture added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] H. Li, L. Zhang, X. Zhang, M. Zhang, G. Zhu, P. Shen, P. Li, M. Bennamoun, and S. A. A. Shah, &amp;quot;Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs,&amp;quot; Multimedia Tools and Applications, vol. 79, no. 37–38, pp. 27583–27614, 2020, doi: 10.1007/s11042-020-09299-2.&lt;br /&gt;
&lt;br /&gt;
[2] L. Chen, Z. Zhu, W. Huang, K. Go, X. Chen, and X. Mao, &amp;quot;Image recoloring for color vision deficiency compensation using Swin transformer,&amp;quot; Neural Computing and Applications, vol. 36, no. 11, pp. 6051–6066, 2024, doi: 10.1007/s00521-023-09367-2.&lt;br /&gt;
&lt;br /&gt;
[3] S. Jiang, D. Liu, D. Li, and C. Xu, &amp;quot;Personalized Image Generation for Color Vision Deficiency Population,&amp;quot; in 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023, pp. 22514–22523, doi: 10.1109/ICCV51070.2023.02063.&lt;br /&gt;
&lt;br /&gt;
[4] J.-B. Huang, C.-S. Chen, T.-C. Jen, and S.-J. Wang, &amp;quot;Image recolorization for the colorblind,&amp;quot; GitHub repository, [Online]. Available: https://github.com/jbhuang0604/RecolorForColorblind. [Accessed: 12-Dec-2024].&lt;br /&gt;
&lt;br /&gt;
[5] J. Dietrich, &amp;quot;Daltonize Python Package,&amp;quot; GitHub repository, [Online]. Available: https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py. [Accessed: 12-Dec-2024].&lt;br /&gt;
&lt;br /&gt;
[6] B. Dougherty and A. Wade, &amp;quot;Vischeck,&amp;quot; 2000. [Online]. Available: https://www.vischeck.com/. [Accessed: 12-Dec-2024].&lt;br /&gt;
&lt;br /&gt;
[7] J. Brettel, F. Viénot, and J. D. Mollon, &amp;quot;Computerized simulation of color appearance for dichromats,&amp;quot; Journal of the Optical Society of America A, vol. 14, no. 10, pp. 2647–2655, 1997, doi: 10.1364/JOSAA.14.002647.&lt;br /&gt;
&lt;br /&gt;
[8] Z. Zhu, M. Toyoura, K. Go, I. Fujishiro, K. Kashiwagi, and X. Mao, &amp;quot;Processing images for red-green dichromats compensation via naturalness and information-preservation considered recoloring,&amp;quot; Visual Computer, vol. 35, no. 6–8, pp. 1053–1066, 2019, doi: 10.1007/s00371-019-01723-5.&lt;br /&gt;
&lt;br /&gt;
[9] Z. Zhu, M. Toyoura, K. Go, I. Fujishiro, K. Kashiwagi, and X. Mao, &amp;quot;Personalized image recoloring for color vision deficiency compensation,&amp;quot; IEEE Transactions on Multimedia, vol. 24, pp. 1721–1733, 2022, doi: 10.1109/TMM.2021.3130546.&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for each of MLP, U-Net and Autoencoder&lt;br /&gt;
* Recolorization script (adapting from MATLAB) and adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup and configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60488</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60488"/>
		<updated>2024-12-13T04:11:52Z</updated>

		<summary type="html">&lt;p&gt;Rainas: /* Introduction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues. Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent, and only a few consider different severity levels.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix. For example, the correction matrix for protanopia, as implemented in tools like Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix for Protanopia} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected RGB values are added back to the original LMS values to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, colors are first transformed into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
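Putting the pieces together, a daltonization sketch using the matrices quoted above might look like this. Following common implementations such as [5], the error is computed in RGB after simulation; the function names and array layout are ours.&lt;br /&gt;

```python
import numpy as np

# RGB-to-LMS matrix from the text (Stockman and Sharpe based)
RGB_TO_LMS = np.array([
    [0.3904725,  0.54990437, 0.00890159],
    [0.07092586, 0.96310739, 0.00135809],
    [0.02314268, 0.12801221, 0.93605194],
])
LMS_TO_RGB = np.linalg.inv(RGB_TO_LMS)

# Error-redistribution matrix for protanopia (as used in Vischeck [6])
CORRECTION_PROTAN = np.array([
    [0.0, 0.0, 0.0],
    [0.7, 1.0, 0.0],
    [0.7, 0.0, 1.0],
])

def simulate_protanopia(rgb):
    """Simulate protanopia for an (H, W, 3) float RGB image in [0, 1]:
    replace the L response with the weighted M/S combination above."""
    lms = rgb @ RGB_TO_LMS.T
    L_sim = 0.90822864 * lms[..., 1] + 0.008192 * lms[..., 2]
    lms_sim = np.stack([L_sim, lms[..., 1], lms[..., 2]], axis=-1)
    return lms_sim @ LMS_TO_RGB.T

def daltonize_protan(rgb):
    """Daltonize: compute the simulation error, redistribute it through the
    correction matrix, and add it back to the original image."""
    error = rgb - simulate_protanopia(rgb)
    return np.clip(rgb + error @ CORRECTION_PROTAN.T, 0.0, 1.0)
```

Gray pixels excite all cones proportionally, so the simulation leaves them nearly unchanged and daltonization has almost no effect on them.&lt;br /&gt;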
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
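The two energy terms can be evaluated directly for a small set of colors; this NumPy sketch treats the colors as an (N, 3) array and computes the full O(N^2) pairwise contrast sum, which is only practical for small palettes.&lt;br /&gt;

```python
import numpy as np

def recolor_energy(orig, rec, beta=0.5):
    """Total energy E = beta * E_nat + E_cont from Zhu et al. [8],
    with colors as (N, 3) float arrays; beta is a free trade-off weight."""
    orig, rec = np.asarray(orig, float), np.asarray(rec, float)
    # E_nat: squared distance between recolored and original colors
    e_nat = np.sum((rec - orig) ** 2)
    # E_cont: change in every pairwise color difference (i = j terms are zero)
    d_rec = rec[:, None, :] - rec[None, :, :]
    d_orig = orig[:, None, :] - orig[None, :, :]
    e_cont = np.sum((d_rec - d_orig) ** 2)
    return beta * e_nat + e_cont
```

Note that a uniform shift of all colors costs only in the naturalness term, since pairwise differences are unchanged.&lt;br /&gt;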
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
&lt;br /&gt;
==== Confusion-line-based Method ====&lt;br /&gt;
&lt;br /&gt;
==== Segmentation-based Method ====&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and BicycleGAN backbone structures showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, this and other existing GAN approaches struggle to balance naturalness and contrast. This specific approach also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations using a unique triple-latent structure in their method, continuous personalization was possible to adjust images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications. In our experiment, it took around 18 days for one epoch (or one iteration over the entire dataset).&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in Figure 1), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in later sections; an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image from which the neural network learns the recolorization. While these methods are simple, easy to train, and integrate the user label naturally, they require a pre-existing ground-truth example of the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode the user label while training. They generally produce more natural images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from an open-source RGB image dataset from [2]: to improve their model&#039;s ability to enhance the contrast between CVD-indistinguishable color pairs, the authors created an unlabeled dataset of 141,000 pictures containing both natural scenes and artificial images with CVD-confusing colors. To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 images and recolored them by simulating random labels for severity and type of CVD. The ground-truth recoloring was done using a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org SciPy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
1. &#039;&#039;&#039;Original RGB Image&#039;&#039;&#039;: High-resolution images, resized to &amp;lt;code&amp;gt;256x256&amp;lt;/code&amp;gt; pixels and normalized to the &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
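To make the label encoding concrete, the short sketch below builds the condition vector described above (the function name and structure are illustrative, not the project&#039;s actual code):&lt;br /&gt;

```python
import numpy as np

def encode_cvd_label(cvd_type, severity):
    """Encode a CVD condition as severity * [protan, deutan].

    Hypothetical helper: cvd_type is 'protan' or 'deutan' and
    severity is a float between 0.1 and 1.0, as in the dataset above.
    """
    assert cvd_type in ('protan', 'deutan')
    assert severity >= 0.1 and 1.0 >= severity
    base = {'protan': [1.0, 0.0], 'deutan': [0.0, 1.0]}[cvd_type]
    return severity * np.array(base)

# Example: protanopia at 60% severity gives the label [0.6, 0.0]
label = encode_cvd_label('protan', 0.6)
```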
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately, using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to three parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. The outputs of the three networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
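A minimal numpy sketch of this per-pixel parallel design is shown below; a single random linear map stands in for each MLP, and all shapes and weights are illustrative only:&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 4, 4                       # tiny image for illustration
image = rng.random((H, W, 3))     # input RGB image in [0, 1]
label = np.array([0.6, 0.0])      # CVD label, e.g. protan at 60%

# Concatenate the label along the channel dimension (broadcast per pixel)
label_map = np.broadcast_to(label, (H, W, 2))
x = np.concatenate([image, label_map], axis=-1)    # (H, W, 5)

# Three independent "MLPs" (here: one linear map each), one per output channel
weights = [rng.standard_normal(5) for _ in range(3)]
channels = [x @ w for w in weights]                # each (H, W)

# Concatenate the channel predictions into the recolored RGB image
recolored = np.stack(channels, axis=-1)            # (H, W, 3)
```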
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error (MSE) loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground-truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
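For reference, the pixel-wise MSE above amounts to the following (a plain numpy sketch, not the PyTorch training code):&lt;br /&gt;

```python
import numpy as np

def mse_loss(output, target):
    """Mean squared error averaged over all pixels (and channels)."""
    diff = output.astype(float) - target.astype(float)
    return np.mean(diff ** 2)

a = np.zeros((2, 2, 3))
b = np.ones((2, 2, 3))
loss = mse_loss(a, b)   # every pixel differs by 1, so the mean is 1.0
```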
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
Using the same inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and the decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision tasks and adapt robustly to new tasks. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: Recolored (model output) and ground-truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: the l-th layer feature map of the pre-trained VGG network, with &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; the number of elements in that layer&lt;br /&gt;
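In PyTorch this is typically computed with a frozen feature extractor; the sketch below expresses the sum with generic per-layer feature maps (plain numpy arrays standing in for VGG activations, since the exact layers used are a training detail):&lt;br /&gt;

```python
import numpy as np

def perceptual_loss(feats_out, feats_gt):
    """Sum over layers l of (1/N_l) * ||phi_l(I) - phi_l(I')||^2.

    feats_out, feats_gt: lists of per-layer feature maps (numpy arrays),
    standing in for activations of a frozen, pre-trained VGG network.
    """
    total = 0.0
    for f_out, f_gt in zip(feats_out, feats_gt):
        n_l = f_out.size                       # number of elements in layer l
        total += np.sum((f_out - f_gt) ** 2) / n_l
    return total

# Two toy "layers": identical feature maps give zero loss
f = [np.ones((2, 4, 4)), np.ones((8,))]
loss = perceptual_loss(f, [x.copy() for x in f])
```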
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network suit the recoloring task was the choice of loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]]&lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{(x, y) \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: Two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: Size of the global (whole-image) window&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: Size of the local window (neighborhood) around pixel x&lt;br /&gt;
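The contrast term CL(x, y) and the local contrast loss can be sketched in numpy as follows (boundary handling is simplified and all names are ours, not the training code):&lt;br /&gt;

```python
import numpy as np

def contrast_term(sim_rec_x, sim_rec_y, sim_orig_x, sim_orig_y):
    """CL(x, y) = ||c'_x - c'_y|| - ||c_x - c_y|| for one pixel pair."""
    d_new = np.linalg.norm(sim_rec_x - sim_rec_y)
    d_old = np.linalg.norm(sim_orig_x - sim_orig_y)
    return d_new - d_old

def local_contrast_loss(sim_recolored, sim_orig, radius=1):
    """Average CL over each pixel's local square neighborhood."""
    H, W, _ = sim_orig.shape
    total, count = 0.0, 0
    for i in range(H):
        for j in range(W):
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    inside = ni >= 0 and nj >= 0 and H > ni and W > nj
                    if inside and (di, dj) != (0, 0):
                        total += contrast_term(sim_recolored[i, j],
                                               sim_recolored[ni, nj],
                                               sim_orig[i, j],
                                               sim_orig[ni, nj])
                        count += 1
    return total / count

# If recoloring leaves the simulated colors unchanged, the loss is zero
img = np.random.default_rng(1).random((3, 3, 3))
loss = local_contrast_loss(img, img)
```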
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural color distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored (model output) images respectively&lt;br /&gt;
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the three neural network architectures described above: the Conditional Parallel RGB MLP, the Conditional U-Net, and the Conditional Autoencoder. Quantitative metrics, namely the Structural Similarity Index (SSIM), total color contrast (TCC), chromatic difference (CD), and inference time, as defined in [1] and [2], were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored image shows compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The output pixels appeared discretized, suggesting that channel disentanglement was not helpful for this task (especially for naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored images were similar to or worse than those of the originals, indicating that the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images, possibly because the naturalness term dominated in practice even though the loss weights favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we decided to score and evaluate metrics for comparison with related work only using the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; we use the same definitions as those papers. At a high level, the components are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual difference in color before and after recoloring, ensuring enhanced distinguishability.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(&amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a constant, not a wavelength; l, a, b are the LAB-space coordinates of the recolored (&#039;) and original images respectively.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
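The SSIM and CD formulas above can be transcribed directly (a single-window SSIM sketch with the usual constants for [0, 1] images, and a placeholder value for the CD constant, since the exact choices follow [1] and [2]):&lt;br /&gt;

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM between two images scaled to [0, 1]."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den

def chromatic_difference(lab_orig, lab_recolored, lam=1.0):
    """CD(i) = sqrt(lam*(l'-l)^2 + (a'-a)^2 + (b'-b)^2), per pixel.

    lab_orig, lab_recolored: (..., 3) arrays of LAB coordinates;
    lam is a placeholder for the constant defined in the papers.
    """
    d = lab_recolored - lab_orig
    return np.sqrt(lam * d[..., 0] ** 2 + d[..., 1] ** 2 + d[..., 2] ** 2)

img = np.random.default_rng(2).random((8, 8))
score = ssim_global(img, img)   # identical images score (near) 1.0
```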
&lt;br /&gt;
The key results are in Table 1 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|+ &#039;&#039;&#039;Table 1: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2] for protan/deutan whichever is larger.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good, but not as high as in paper [2], because that work optimizes a separate network for each CVD type.&lt;br /&gt;
* Outputs are blurry (SSIM is not weighted strongly enough in the optimization).&lt;br /&gt;
* Mixing CVD types in a single network requires a more sophisticated approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned a couple of things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate, making them feasible for real-time applications. Optimizing these models further could make them more scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement using the local and global losses was effective in improving visibility for CVD individuals. Loss functions borrowed from the Swin-based recoloring work [2] added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] H. Li, L. Zhang, X. Zhang, M. Zhang, G. Zhu, P. Shen, P. Li, M. Bennamoun, and S. A. A. Shah, &amp;quot;Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs,&amp;quot; Multimedia Tools and Applications, vol. 79, no. 37–38, pp. 27583–27614, 2020, doi: 10.1007/s11042-020-09299-2.&lt;br /&gt;
&lt;br /&gt;
[2] L. Chen, Z. Zhu, W. Huang, K. Go, X. Chen, and X. Mao, &amp;quot;Image recoloring for color vision deficiency compensation using Swin transformer,&amp;quot; Neural Computing and Applications, vol. 36, no. 11, pp. 6051–6066, 2024, doi: 10.1007/s00521-023-09367-2.&lt;br /&gt;
&lt;br /&gt;
[3] S. Jiang, D. Liu, D. Li, and C. Xu, &amp;quot;Personalized Image Generation for Color Vision Deficiency Population,&amp;quot; in 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023, pp. 22514–22523, doi: 10.1109/ICCV51070.2023.02063.&lt;br /&gt;
&lt;br /&gt;
[4] J.-B. Huang, C.-S. Chen, T.-C. Jen, and S.-J. Wang, &amp;quot;Image recolorization for the colorblind,&amp;quot; GitHub repository, [Online]. Available: https://github.com/jbhuang0604/RecolorForColorblind. [Accessed: 12-Dec-2024].&lt;br /&gt;
&lt;br /&gt;
[5] J. Dietrich, &amp;quot;Daltonize Python Package,&amp;quot; GitHub repository, [Online]. Available: https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py. [Accessed: 12-Dec-2024].&lt;br /&gt;
&lt;br /&gt;
[6] B. Dougherty and A. Wade, &amp;quot;Vischeck,&amp;quot; 2000. [Online]. Available: https://www.vischeck.com/. [Accessed: 12-Dec-2024].&lt;br /&gt;
&lt;br /&gt;
[7] J. Brettel, F. Viénot, and J. D. Mollon, &amp;quot;Computerized simulation of color appearance for dichromats,&amp;quot; Journal of the Optical Society of America A, vol. 14, no. 10, pp. 2647–2655, 1997, doi: 10.1364/JOSAA.14.002647.&lt;br /&gt;
&lt;br /&gt;
[8] Z. Zhu, M. Toyoura, K. Go, I. Fujishiro, K. Kashiwagi, and X. Mao, &amp;quot;Processing images for red-green dichromats compensation via naturalness and information-preservation considered recoloring,&amp;quot; Visual Computer, vol. 35, no. 6–8, pp. 1053–1066, 2019, doi: 10.1007/s00371-019-01723-5.&lt;br /&gt;
&lt;br /&gt;
[9] Z. Zhu, M. Toyoura, K. Go, I. Fujishiro, K. Kashiwagi, and X. Mao, &amp;quot;Personalized image recoloring for color vision deficiency compensation,&amp;quot; IEEE Transactions on Multimedia, vol. 24, pp. 1721–1733, 2022, doi: 10.1109/TMM.2021.3130546.&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
- Upload source code, test images, etc, and give a description of each link. In some cases, your acquired data may be too large to store practically. In this case, use your judgement (or consult one of us) and only link the most relevant data. Be sure to describe the purpose of your code and to edit the code for clarity. The purpose of placing the code online is to allow others to verify your methods and to learn from your ideas.&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for each of MLP, U-Net and Autoencoder&lt;br /&gt;
* Recolorization script (adapting from MATLAB) and adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup and configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60487</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60487"/>
		<updated>2024-12-13T04:08:28Z</updated>

		<summary type="html">&lt;p&gt;Rainas: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Color Vision Deficiency (CVD) affects approximately 350 million individuals worldwide, impairing their ability to distinguish certain colors. Image recoloring for individuals with CVDs has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent. These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
In recent years, numerous methods have been developed to recolor images for individuals with CVDs, ranging from traditional mathematical approaches to advanced deep learning techniques. This section focuses on the prominent recent works in these two categories.&lt;br /&gt;
&lt;br /&gt;
=== Mathematical-based methods ===&lt;br /&gt;
Mathematical approaches to image recoloring for individuals with CVDs have been extensively developed to enhance color discrimination while trying to preserve the natural appearance of images. These methods typically involve color space transformations, optimization techniques, and perceptual modeling to achieve their objectives. &lt;br /&gt;
&lt;br /&gt;
==== Daltonization ====&lt;br /&gt;
Daltonization enhances images for individuals with CVD by correcting colors based on the simulated deficiency. The process involves comparing the original LMS values with the simulated deficient values to compute the error:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
\text{Error}_{\text{LMS}} = \text{LMS}_{\text{original}} - \text{LMS}_{\text{simulated}} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The error is then mapped back to the RGB space using a correction matrix. For example, the correction matrix for protanopia, as implemented in tools like Vischeck [6], is:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; \text{Correction Matrix for Protanopia} = \begin{bmatrix} 0.0 &amp;amp; 0.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 1.0 &amp;amp; 0.0 \\ 0.7 &amp;amp; 0.0 &amp;amp; 1.0 \end{bmatrix} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The corrected RGB values are added back to the original RGB image to generate a daltonized image that improves contrast for CVD viewers.&lt;br /&gt;
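A compact numpy sketch of this pipeline follows (in the style of the daltonize tool [5]; the CVD simulation step is assumed to be available as a function, and all names here are our own):&lt;br /&gt;

```python
import numpy as np

# Protanopia correction matrix (as used in Vischeck-style daltonization)
CORRECTION_PROTAN = np.array([
    [0.0, 0.0, 0.0],
    [0.7, 1.0, 0.0],
    [0.7, 0.0, 1.0],
])

def daltonize(rgb, simulate, correction=CORRECTION_PROTAN):
    """Shift the information lost to a CVD viewer into visible channels.

    rgb: (H, W, 3) image in [0, 1]; simulate: a function mapping an RGB
    image to its CVD-simulated appearance (assumed given).
    """
    error = rgb - simulate(rgb)             # what the CVD viewer cannot see
    correction_rgb = error @ correction.T   # redistribute the error
    return np.clip(rgb + correction_rgb, 0.0, 1.0)

# With a perfect-vision "simulation" the image is returned unchanged
img = np.random.default_rng(3).random((2, 2, 3))
out = daltonize(img, simulate=lambda x: x)
```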
&lt;br /&gt;
The simulation of CVDs relies on the physiology of human vision, particularly the responses of the Long (L), Medium (M), and Short (S) wavelength-sensitive cones in the retina. The LMS color space is derived from the spectral sensitivities of these cones, making it an ideal framework for modeling human color perception.&lt;br /&gt;
&lt;br /&gt;
To simulate CVD, colors are first transformed into the LMS color space using the following linear transformation matrix based on Stockman and Sharpe’s cone fundamentals:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
T_{\text{RGB-to-LMS}} = \begin{bmatrix} &lt;br /&gt;
0.3904725 &amp;amp; 0.54990437 &amp;amp; 0.00890159 \\ &lt;br /&gt;
0.07092586 &amp;amp; 0.96310739 &amp;amp; 0.00135809 \\ &lt;br /&gt;
0.02314268 &amp;amp; 0.12801221 &amp;amp; 0.93605194 &lt;br /&gt;
\end{bmatrix} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For individuals with CVD, the missing cone’s response is replaced by a weighted combination of the remaining two cones. This approach, introduced by Brettel, Viénot, and Mollon (1997) [7], uses specific coefficients derived from cone sensitivities. For example, in protanopia (L-cone deficiency), the L-cone response is approximated using the M- and S-cone responses as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
L_{\text{simulated}} = 0 \cdot L + 0.90822864 \cdot M + 0.008192 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For deuteranopia (M-cone deficiency), the M-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
M_{\text{simulated}} = 1.10104433 \cdot L + 0 \cdot M - 0.00901975 \cdot S &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For tritanopia (S-cone deficiency), the S-cone is replaced as:&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; &lt;br /&gt;
S_{\text{simulated}} = -0.15773032 \cdot L + 1.19465634 \cdot M + 0 \cdot S&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These transformations allow accurate simulation of the perceptual experience of individuals with CVD. (The numbers are derived from [5]).&lt;br /&gt;
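Putting the matrices above together, simulating protanopia amounts to a round trip through LMS with the L response replaced (coefficients from [5]; this sketches the standard pipeline, not our exact implementation):&lt;br /&gt;

```python
import numpy as np

RGB_TO_LMS = np.array([
    [0.3904725,  0.54990437, 0.00890159],
    [0.07092586, 0.96310739, 0.00135809],
    [0.02314268, 0.12801221, 0.93605194],
])
LMS_TO_RGB = np.linalg.inv(RGB_TO_LMS)

def simulate_protanopia(rgb):
    """Replace the L response by the weighted combination of M and S."""
    lms = rgb @ RGB_TO_LMS.T
    l_sim = 0.90822864 * lms[..., 1] + 0.008192 * lms[..., 2]
    lms_sim = np.stack([l_sim, lms[..., 1], lms[..., 2]], axis=-1)
    return lms_sim @ LMS_TO_RGB.T

# Dichromat simulation is idempotent: simulating twice changes nothing,
# since the M and S responses are preserved by the first pass
img = np.random.default_rng(4).random((2, 2, 3))
once = simulate_protanopia(img)
twice = simulate_protanopia(once)
```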
&lt;br /&gt;
==== Optimization-based Method ====&lt;br /&gt;
Zhu et al. [8] introduced an optimization-based recoloring framework for red-green dichromacy, aiming to balance naturalness and contrast. The framework minimizes a total loss function defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta E_{\text{nat}} + E_{\text{cont}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\beta&amp;lt;/math&amp;gt; is a scalar weight that controls the trade-off between the two objectives: naturalness preservation (&amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;) and contrast enhancement (&amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
The naturalness term, &amp;lt;math&amp;gt;E_{\text{nat}}&amp;lt;/math&amp;gt;, ensures that the recolored image closely resembles the original image for CVD viewers by minimizing perceptual differences:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{nat}} = \sum_{i=1}^N \| c_i^+ - c_i \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; is the total number of pixels in the image,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i&amp;lt;/math&amp;gt; is the original color of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;c_i^+&amp;lt;/math&amp;gt; is the recolored value of the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th pixel,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| c_i^+ - c_i \|&amp;lt;/math&amp;gt; is the Euclidean distance, measuring the perceptual difference between the original and recolored colors.&lt;br /&gt;
&lt;br /&gt;
The contrast term, &amp;lt;math&amp;gt;E_{\text{cont}}&amp;lt;/math&amp;gt;, enhances the distinguishability of colors in the recolored image by minimizing changes in color contrast:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E_{\text{cont}} = \sum_{i \neq j} \| (c_i^+ - c_j^+) - (c_i - c_j) \|^2, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where:&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i^+ - c_j^+)&amp;lt;/math&amp;gt; is the perceived color difference between pixels &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;(c_i - c_j)&amp;lt;/math&amp;gt; is the original color difference,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| (c_i^+ - c_j^+) - (c_i - c_j) \|&amp;lt;/math&amp;gt; represents the deviation in color contrast before and after recoloring.&lt;br /&gt;
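The two energy terms combine as follows (a direct numpy transcription over flattened pixel lists; since the full pairwise sum is quadratic in the number of pixels, this sketch samples a subset of pairs):&lt;br /&gt;

```python
import numpy as np

def total_energy(orig, recolored, beta=0.5, pair_sample=200, seed=0):
    """E = beta * E_nat + E_cont over flattened pixel colors.

    orig, recolored: (N, 3) pixel arrays. The full E_cont sums over all
    pixel pairs; a random sample of pairs keeps the sketch cheap.
    """
    # E_nat: squared perceptual distance between original and recolored pixels
    e_nat = np.sum(np.linalg.norm(recolored - orig, axis=1) ** 2)

    # E_cont: deviation in pairwise color contrast before and after recoloring
    rng = np.random.default_rng(seed)
    n = len(orig)
    i = rng.integers(0, n, size=pair_sample)
    j = rng.integers(0, n, size=pair_sample)
    d_new = recolored[i] - recolored[j]
    d_old = orig[i] - orig[j]
    e_cont = np.sum(np.linalg.norm(d_new - d_old, axis=1) ** 2)

    return beta * e_nat + e_cont

# The identity recoloring costs nothing: both terms vanish
pixels = np.random.default_rng(5).random((50, 3))
energy = total_energy(pixels, pixels)
```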
&lt;br /&gt;
To address the limitations of this approach, Zhu et al. [9] proposed a degree-adaptable framework incorporating a transformation matrix &amp;lt;math&amp;gt;T&amp;lt;/math&amp;gt; that simulates CVD perception. The transformation matrix is defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; T = \begin{bmatrix} t_{11} &amp;amp; t_{12} &amp;amp; t_{13} \\ t_{21} &amp;amp; t_{22} &amp;amp; t_{23} \\ t_{31} &amp;amp; t_{32} &amp;amp; t_{33} \end{bmatrix}, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;t_{ij}&amp;lt;/math&amp;gt; are the elements representing the relationships between the original and perceived LMS (Long, Medium, Short wavelength) cone responses for individuals with CVD.&lt;br /&gt;
&lt;br /&gt;
The degree-adaptable loss function extends the optimization by adjusting weights based on perceptual importance, defined as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt; E = \beta \sum_{i=1}^N \alpha_i \| T(c_i^+ - c_i) \|^2 + \sum_{i \neq j} \| T(c_i^+ - c_j^+) - T(c_i - c_j) \|^2. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here:&lt;br /&gt;
* &amp;lt;math&amp;gt;\alpha_i&amp;lt;/math&amp;gt; assigns weights to each pixel, prioritizing the preservation of colors with smaller perception errors,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_i) \|&amp;lt;/math&amp;gt; measures the perceptual difference after recoloring,&lt;br /&gt;
* &amp;lt;math&amp;gt;\| T(c_i^+ - c_j^+) - T(c_i - c_j) \|&amp;lt;/math&amp;gt; quantifies the deviation in color contrast under CVD simulation.&lt;br /&gt;
&lt;br /&gt;
This framework improves both contrast and personalization but requires further optimization for real-time performance.&lt;br /&gt;
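As a concrete illustration, the degree-adaptable energy above can be sketched in a few lines of Python/NumPy. The matrix T, the weights, and the colors below are placeholder values for illustration, not those used in [9]:&lt;br /&gt;

```python
import numpy as np

def degree_adaptable_energy(c, c_plus, T, alpha, beta):
    """Sketch of the degree-adaptable energy: weighted fidelity + contrast terms.

    c, c_plus : (N, 3) original and recolored pixel colors
    T         : (3, 3) CVD simulation matrix (placeholder values here)
    alpha     : (N,) per-pixel weights
    beta      : scalar balancing fidelity against contrast preservation
    """
    # Fidelity term: beta * sum_i alpha_i * ||T(c_i^+ - c_i)||^2
    d = (c_plus - c) @ T.T
    fidelity = beta * np.sum(alpha * np.sum(d ** 2, axis=1))

    # Contrast term: sum over pairs of ||T(c_i^+ - c_j^+) - T(c_i - c_j)||^2
    diff_plus = (c_plus[:, None, :] - c_plus[None, :, :]) @ T.T
    diff_orig = (c[:, None, :] - c[None, :, :]) @ T.T
    contrast = np.sum((diff_plus - diff_orig) ** 2)
    return fidelity + contrast
```

If the recolored image equals the original, both terms vanish and the energy is zero; minimizing it trades off color fidelity (first term) against contrast preservation under CVD simulation (second term).&lt;br /&gt;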
&lt;br /&gt;
==== Confusion-line-based Method ====&lt;br /&gt;
&lt;br /&gt;
==== Segmentation-based Method ====&lt;br /&gt;
&lt;br /&gt;
==== GMM-based Method ====&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
Conventional methods for recoloring, including optimization-based approaches (as discussed above), fail to generalize well across varying severity levels and CVD types. While these methods improve color differentiation, they frequently compromise naturalness or require extensive computational resources, making them less suitable for real-time, efficient, personalized applications.&lt;br /&gt;
&lt;br /&gt;
==== GAN-Based Recoloring for CVD ====&lt;br /&gt;
&lt;br /&gt;
In [1], GANs (Generative Adversarial Networks) were explored for recoloring, with Pix2Pix-GAN, Cycle-GAN, and Bicycle-GAN backbone structures showing promising results. These models generate recolored images by learning mappings between normal and CVD-affected color spaces. However, these GAN approaches struggle to balance naturalness and contrast. This particular method also requires paired datasets (since it is adapted from style transfer), making it computationally intensive and less suitable for personalization.&lt;br /&gt;
&lt;br /&gt;
==== Swin Transformer Recoloring ====&lt;br /&gt;
&lt;br /&gt;
The authors in [2] introduced a hierarchical vision transformer (Swin) architecture that processes images through shifted windows, effectively capturing both local and global contextual information. In computer vision, this design generally allows efficient handling of high-resolution images and has been applied to various tasks, including image classification and object detection. Despite its robust performance, this architecture is still computationally intensive and does not inherently account for the specific needs of CVD individuals, as it lacks mechanisms for personalized color adjustments.&lt;br /&gt;
&lt;br /&gt;
==== Personalized CVD-GAN ====&lt;br /&gt;
&lt;br /&gt;
To cater to the diverse needs of the CVD population, the Personalized CVD-GAN [3] was developed. This model generates images that are not only CVD-friendly but also tailored to individual degrees of color vision deficiency. By disentangling color representations with a unique triple-latent structure, the method enables continuous personalization, adjusting images according to specific CVD severities. While effective, this approach is computationally demanding, making it less practical for real-time applications: in our experiment, one epoch (one pass over the entire dataset) took around 18 days.&lt;br /&gt;
&lt;br /&gt;
Thus, existing methods either lack personalization or are too resource-intensive for widespread use.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
&lt;br /&gt;
=== Deep Learning based ===&lt;br /&gt;
&lt;br /&gt;
==== Task Overview ====&lt;br /&gt;
Given an input RGB image and a label for the user (as shown in the figure), we want a deep learning model to output a recolored RGB image that is specific to that user. More details on inputs and outputs are discussed in later sections, but an overview is shown in Figure 1. All of the code was written in Python using the deep learning framework [https://pytorch.org PyTorch].&lt;br /&gt;
[[File:Io.png|right|thumb|200px|Figure 1: Dataset]]&lt;br /&gt;
&lt;br /&gt;
==== Types ====&lt;br /&gt;
1. &#039;&#039;&#039; Supervised methods &#039;&#039;&#039;:&lt;br /&gt;
These are deep learning models that require a &#039;ground truth&#039; recolored image for the neural network to learn recolorization. While these methods are simple, easy to train, and integrate the user label naturally, they require a pre-existing ground-truth recolored image as the expected output.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; Unsupervised methods &#039;&#039;&#039;:&lt;br /&gt;
These models are trained without a ground truth and can also encode user label information during training. They are generally better at generating natural-looking images, but they require more compute and more sophisticated model architectures or loss functions for the recoloring task.&lt;br /&gt;
&lt;br /&gt;
==== Dataset ====&lt;br /&gt;
The dataset used for this project was constructed specifically to address the challenges of recoloring images for individuals with color vision deficiency (CVD). We started from an open-source RGB image dataset from [2]: to improve a model&#039;s ability to enhance the contrast between CVD-indistinguishable color pairs, the authors created a dataset of 141,000 unlabeled pictures of both natural scenes and artificial images containing CVD-confusing colors. To generate labels (and ground-truth recolored images for the supervised methods), we randomly sampled 15,000 images and recolored them using randomly sampled labels for CVD type and severity. The ground-truth recoloring was done with a [https://github.com/jbhuang0604/RecolorForColorblind/tree/master MATLAB script] (adapted to Python) from [4]. Note: the open-source tools used in the Python version of the recoloring script were [https://scikit-image.org Scikit-Image], [https://scipy.org SciPy] and [https://python-colormath.readthedocs.io/en/latest/ Colormath].&lt;br /&gt;
&lt;br /&gt;
As shown in Figure 1, each sample in the dataset consists of:&lt;br /&gt;
1. &#039;&#039;&#039; Original RGB Image&#039;&#039;&#039; : High-resolution images, resized to &amp;lt;code&amp;gt; 256x256&amp;lt;/code&amp;gt; pixels and normalized to &amp;lt;code&amp;gt;[0,1]&amp;lt;/code&amp;gt; range, representing the standard color space.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039; CVD Labels &#039;&#039;&#039; : Condition labels encoded as &amp;lt;code&amp;gt;severity * [protan, deutan]&amp;lt;/code&amp;gt;, where severity ranges from 0.1 to 1.0. For example, a label &amp;lt;code&amp;gt;[0.6, 0]&amp;lt;/code&amp;gt; corresponds to protanopia at 60% severity.&lt;br /&gt;
&lt;br /&gt;
Data augmentation techniques such as random rotations, crops, and brightness adjustments were applied to expand the dataset, ensuring robust model generalization across diverse scenarios.&lt;br /&gt;
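A minimal sketch of the label encoding described above; the helper name and dictionary are illustrative, and the exact encoding in our pipeline may differ in detail:&lt;br /&gt;

```python
import numpy as np

def encode_cvd_label(cvd_type: str, severity: float) -> np.ndarray:
    """Encode a CVD condition as severity * [protan, deutan].

    severity is expected in [0.1, 1.0]; e.g. ('protan', 0.6) -> [0.6, 0.0],
    matching the label format described in the Dataset section.
    """
    base = {"protan": np.array([1.0, 0.0]),
            "deutan": np.array([0.0, 1.0])}[cvd_type]
    return severity * base
```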
&lt;br /&gt;
==== Supervised Methods ====&lt;br /&gt;
===== Conditional Parallel RGB MLP =====&lt;br /&gt;
[[File:mlp.png|right|thumb|Figure 2: Conditional MLP architecture]]&lt;br /&gt;
As shown in Figure 2, the model predicts the R, G, and B channels separately using an independent multi-layer perceptron (MLP) for each channel. The input image is concatenated with the label encoding along the channel dimension and passed to 3 parallel MLPs simultaneously. These parallel networks learn to predict the R, G, and B channels of the recolored image from the given ground truth. The outputs of the three networks are concatenated to produce a recolored RGB image with the same spatial dimensions as the input. Essentially, each channel is disentangled, enabling targeted adjustments.&lt;br /&gt;
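The forward pass of this parallel design can be sketched as follows. The actual model was implemented in PyTorch; this NumPy version with random placeholder weights and illustrative layer sizes only shows the data flow (per-pixel input = RGB plus the 2-dimensional CVD label):&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, weights):
    """One small MLP: linear layers with ReLU between them, linear last layer."""
    for i, W in enumerate(weights):
        x = x @ W
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)  # ReLU
    return x

def parallel_rgb_forward(image, label, channel_weights):
    """image: (H, W, 3) in [0, 1]; label: (2,) CVD label.
    Each pixel (with the label appended) goes through one MLP per channel;
    the three scalar outputs are stacked into the recolored RGB image."""
    h, w, _ = image.shape
    pix = image.reshape(-1, 3)
    inp = np.concatenate([pix, np.tile(label, (pix.shape[0], 1))], axis=1)
    channels = [mlp_forward(inp, ws) for ws in channel_weights]  # 3 x (N, 1)
    return np.concatenate(channels, axis=1).reshape(h, w, 3)

# Illustrative random weights: input dim 5 (RGB + 2-dim label) -> 16 -> 1
channel_weights = [
    [rng.normal(size=(5, 16)) * 0.1, rng.normal(size=(16, 1)) * 0.1]
    for _ in range(3)
]
```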
&lt;br /&gt;
The loss function used for training was a pixel-wise mean-squared error (MSE) loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{p=1}^{N} \left( I(p) - I&#039;(p) \right)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Recolored (model output) image and ground-truth recolored image respectively&lt;br /&gt;
* p: Pixel index&lt;br /&gt;
* N: Total number of pixels&lt;br /&gt;
&lt;br /&gt;
===== Conditional U-Net =====&lt;br /&gt;
Using the same inputs, a convolutional neural network (CNN)-based U-Net architecture was tested to generate a full recolored image as output. The conditional inputs here affect both the encoder and the decoder. [[File:Unet condtional.png|right|thumb|Figure 3: Conditional U-Net architecture]]&lt;br /&gt;
U-Nets are widely used in computer vision and transfer robustly to new tasks. The architecture we adopted is shown in Figure 3.&lt;br /&gt;
The loss function used to train the U-Net was a commonly used VGG Perceptual Loss:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{VGG}} = \sum_{l} \frac{1}{N_l} \| \phi_l(I) - \phi_l(I&#039;) \|_2^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I and I&#039;: recolored (model output) and ground-truth recolored images respectively&lt;br /&gt;
* &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;: the feature map of the &amp;lt;math&amp;gt;l&amp;lt;/math&amp;gt;-th layer of the pre-trained VGG network, with &amp;lt;math&amp;gt;N_l&amp;lt;/math&amp;gt; the number of elements in that feature map&lt;br /&gt;
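The structure of this loss can be sketched as below. The real implementation extracts features with a pre-trained VGG (e.g. via torchvision); here a stand-in average-pooling feature extractor at a few scales plays the role of &amp;lt;math&amp;gt;\phi_l&amp;lt;/math&amp;gt;, purely to show how the per-layer terms are normalized and summed:&lt;br /&gt;

```python
import numpy as np

def avg_pool(img, k):
    """Stand-in feature extractor: k x k average pooling. A real perceptual
    loss would use feature maps from a pre-trained VGG instead."""
    h, w, c = img.shape
    return img[: h - h % k, : w - w % k].reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

def perceptual_loss(I, I_gt, scales=(1, 2, 4)):
    """Sum over 'layers' l of (1 / N_l) * ||phi_l(I) - phi_l(I_gt)||^2."""
    loss = 0.0
    for k in scales:
        f, g = avg_pool(I, k), avg_pool(I_gt, k)
        loss += np.sum((f - g) ** 2) / f.size  # f.size plays the role of N_l
    return loss
```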
&lt;br /&gt;
==== Unsupervised Methods ====&lt;br /&gt;
===== Conditional Autoencoder =====&lt;br /&gt;
As shown in Figure 4, an unsupervised CNN-based encoder-decoder network was trained to reconstruct full recolored images with a CVD-aware color palette. The key to making this network align with the recoloring task was the loss functions, which were inspired by [2]. [[File:Ae.png|right|350px|thumb|Figure 4: Conditional Autoencoder architecture]] &lt;br /&gt;
&lt;br /&gt;
The total loss function is given by:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{naturalness}} + 2 \cdot (1 - \alpha) \cdot \mathcal{L}_{\text{contrast}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{contrast}} = \beta \cdot \mathcal{L}_{\text{global}} + (2 - \beta) \cdot \mathcal{L}_{\text{local}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The components of the loss functions are described below:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Global Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The global contrast loss ensures that the overall contrast of the recolored image is preserved. It is defined as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{global}} = \frac{1}{\|\omega\|} \sum_{\langle x, y \rangle \in \omega} \text{CL}(x, y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Local Contrast Loss&#039;&#039;&#039;:&lt;br /&gt;
The local contrast loss focuses on preserving the contrast within a small neighborhood around each pixel. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{local}} = \frac{1}{N} \sum_{x=1}^{N} \sum_{y \in \omega_x} \frac{\text{CL}(x, y)}{\|\omega_x\|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{CL}(x, y) = \|\hat{c}_x&#039; - \hat{c}_y&#039;\| - \|c_x - c_y\|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x, y: Two distinct pixels in the image&lt;br /&gt;
* &amp;lt;math&amp;gt;c_x, c_y&amp;lt;/math&amp;gt;: CVD-simulated colors of the original image&lt;br /&gt;
* &amp;lt;math&amp;gt;\hat{c}_x&#039;, \hat{c}_y&#039;&amp;lt;/math&amp;gt;: CVD-simulated colors of the recolored image (model output)&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega\|&amp;lt;/math&amp;gt;: Size of the global (whole-image) window&lt;br /&gt;
* &amp;lt;math&amp;gt;\|\omega_x\|&amp;lt;/math&amp;gt;: Size of the local window (neighborhood) around pixel x&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Naturalness Loss&#039;&#039;&#039;:&lt;br /&gt;
The naturalness loss drives the output image toward colors that are visually similar to the original and close to natural distributions. &amp;lt;math&amp;gt;&lt;br /&gt;
\mathcal{L}_{\text{naturalness}} = 1 - \text{SSIM}(I&#039;, I)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where:&lt;br /&gt;
* I, I&#039;: Original and recolored images respectively&lt;br /&gt;
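The two contrast terms above can be sketched directly from the definition of CL. In this illustrative version, pixel pairs are taken at fixed offsets (long-range offsets for the global term, immediate neighbors for the local term); the training code may sample pairs differently, and the SSIM-based naturalness term is omitted here (it would typically use e.g. scikit-image&#039;s structural_similarity):&lt;br /&gt;

```python
import numpy as np

def shifted_cl(c_orig, c_rec, dy, dx):
    """CL(x, y) for all pixel pairs at offset (dy, dx):
    ||c'_x - c'_y|| - ||c_x - c_y|| (recolored minus original contrast)."""
    o0, o1 = c_orig[: -dy or None, : -dx or None], c_orig[dy:, dx:]
    r0, r1 = c_rec[: -dy or None, : -dx or None], c_rec[dy:, dx:]
    return np.linalg.norm(r0 - r1, axis=-1) - np.linalg.norm(o0 - o1, axis=-1)

def global_contrast_loss(c_orig, c_rec, offset=8):
    """Mean signed CL over long-range pixel pairs (offset is illustrative)."""
    return shifted_cl(c_orig, c_rec, offset, offset).mean()

def local_contrast_loss(c_orig, c_rec):
    """Mean signed CL over the immediate neighborhood of each pixel."""
    vals = [shifted_cl(c_orig, c_rec, dy, dx)
            for dy, dx in [(0, 1), (1, 0), (1, 1)]]
    return np.mean([v.mean() for v in vals])
```

Both quantities are zero when the recolored image equals the original; the signed CL mirrors the formulas above, with the sign convention determining whether contrast is preserved or enhanced during optimization.&lt;br /&gt;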
&lt;br /&gt;
== Results == &lt;br /&gt;
=== Deep Learning based methods ===&lt;br /&gt;
The results focus on evaluating the performance of the above neural network architectures: Conditional Parallel RGB MLP, Conditional U-Net, and Conditional Autoencoder. Quantitative metrics such as the Structural Similarity Index (SSIM), Total Color Contrast (TCC), Chromatic Difference (CD), and inference time, adapted from [1] and [2], were used to assess the effectiveness of the models.&lt;br /&gt;
&lt;br /&gt;
==== Qualitative Results ====&lt;br /&gt;
The recolored outputs were visually evaluated to determine their alignment with expected results. For the supervised methods, &#039;expected&#039; means how closely the output resembles the ground-truth recolored image; for the unsupervised method, it means how much contrast and naturalness the CVD-simulated recolored image shows compared to the original.&lt;br /&gt;
The results and takeaways can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Conditional Parallel RGB MLP&#039;&#039;&#039;: (Figure 5)&lt;br /&gt;
[[File:Mlp_res.png|right|400px|thumb|Figure 5 Conditional MLP: Model failure]]&lt;br /&gt;
* Recoloring was inconsistent, with visible artifacts in regions where spatial correlations were essential.&lt;br /&gt;
* The pixels seemed more discretized, suggesting that disentanglement was not very useful for this case (especially naturalness).&lt;br /&gt;
* Failed to preserve natural color transitions, particularly in complex images.&lt;br /&gt;
2. &#039;&#039;&#039;Conditional U-Net&#039;&#039;&#039;: (Figure 6, 7)&lt;br /&gt;
[[File:Unet_res1.png|right|400px|thumb|Figure 6 Conditional U-Net: Model failure]]&lt;br /&gt;
[[File:Unet_res2.png|right|400px|thumb|Figure 7 Conditional U-Net: CVD Simulated examples]]&lt;br /&gt;
* Produced stable recoloring, preserving structural details.&lt;br /&gt;
* Initially showed improvement towards resembling ground truth, but over time started &#039;reconstructing&#039; the colors of the original image.&lt;br /&gt;
* The CVD simulations of the recolored versus original images were similar or worse, meaning the model was not performing well on this task.&lt;br /&gt;
* Sometimes it over-saturated some colors, affecting the visual appeal.&lt;br /&gt;
3. &#039;&#039;&#039;Conditional Autoencoder&#039;&#039;&#039;: (Figure 8, 9)&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 8 Conditional Autoencoder: Majority good results]]&lt;br /&gt;
[[File:ae_res1.png|right|400px|thumb|Figure 9 Conditional Autoencoder: Marginal or negative improvement + Blurriness]]&lt;br /&gt;
* Achieved smooth and natural recoloring, with fewer artifacts.&lt;br /&gt;
* Showed the highest contrast improvement among the three models.&lt;br /&gt;
* In some cases, hurt the contrast in the CVD simulated colors and in some there was marginal improvement in contrast.&lt;br /&gt;
* Blurriness was seen in the recolored images, possibly because naturalness was effectively prioritized even though the loss weights favored contrast (alpha = 0.25, beta = 1.0).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== Quantitative Results ====&lt;br /&gt;
Based on the above qualitative results, we computed evaluation metrics for comparison with related work using only the Conditional Autoencoder.&lt;br /&gt;
As mentioned above, the evaluation metrics are adapted from [1] and [2]; we use the same definitions as in those papers. On a high level, the metrics are:&lt;br /&gt;
* SSIM: Measures the structural similarity between the original and recolored images, ensuring the structural integrity of the recolored image is maintained. &lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
SSIM(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2 + \mu_Y^2 + c_1)(\sigma_X^2 + \sigma_Y^2 + c_2)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Total Color Contrast: Quantifies the visibility improvement between indistinguishable colors for CVD individuals.&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
TCC = \frac{1}{n_1} \sum_{(i,j) \in \Omega_1} |x_i - x_j| &lt;br /&gt;
+ \frac{1}{N \cdot n_2} \sum_{i=1}^{N} \sum_{j \in \Omega_2} |x_i - x_j|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
* Chromatic Difference: Quantifies the perceptual differences in color before and after recoloring, ensuring enhanced distinguishability&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
CD(i) = \sqrt{\lambda (l_i&#039; - l_i)^2 + (a_i&#039; - a_i)^2 + (b_i&#039; - b_i)^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
(&amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; is a weighting constant, not a wavelength; l, a, b are the CIELAB coordinates of the recolored (&#039;) and original images respectively.)&lt;br /&gt;
* Inference Time: Determines the computational efficiency of the models.&lt;br /&gt;
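The chromatic-difference metric, for instance, is a one-liner once the images are in CIELAB (the RGB-to-LAB conversion would use e.g. scikit-image&#039;s rgb2lab; the value of the lightness weight below is a placeholder, not the constant used in [1]/[2]):&lt;br /&gt;

```python
import numpy as np

def chromatic_difference(lab_orig, lab_rec, lam=0.5):
    """Per-pixel CD: sqrt(lam*(l'-l)^2 + (a'-a)^2 + (b'-b)^2).

    lab_orig, lab_rec : (..., 3) arrays of L, a, b coordinates.
    lam is a placeholder weight on the lightness term.
    """
    d = lab_rec - lab_orig
    return np.sqrt(lam * d[..., 0] ** 2 + d[..., 1] ** 2 + d[..., 2] ** 2)
```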
&lt;br /&gt;
The key results are in Table 1 and takeaways for the Conditional Autoencoder can be summarized as follows:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center; width:30%; margin:auto;&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Metric&lt;br /&gt;
! Value&lt;br /&gt;
|-&lt;br /&gt;
| Inference Time&lt;br /&gt;
| 2.6 seconds/image&lt;br /&gt;
|-&lt;br /&gt;
| SSIM (&amp;quot;Structure&amp;quot;)&lt;br /&gt;
| 0.8707&lt;br /&gt;
|-&lt;br /&gt;
| Total Color Contrast (&amp;quot;Distinguishability&amp;quot;)&lt;br /&gt;
| 0.5771 / (~0.851)*&lt;br /&gt;
|-&lt;br /&gt;
| Chromatic Difference (&amp;quot;Color&amp;quot;)&lt;br /&gt;
| 0.3521 / (~0.963)*&lt;br /&gt;
|+ &#039;&#039;&#039;Table 1: Quantitative Evaluation Results&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: * indicates results from paper [2] for protan/deutan whichever is larger.&lt;br /&gt;
&lt;br /&gt;
* TCC and CD are good, but not as good as in paper [2], because the authors optimize separate networks for each CVD type.&lt;br /&gt;
* Outputs are blurry (SSIM is not sufficiently optimized for).&lt;br /&gt;
* Mixing CVD types in the same network needs a more sophisticated approach.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
Through our (many) experiments, we learned a couple of things:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Model Effectiveness&#039;&#039;&#039;:  &lt;br /&gt;
Among the models, the Conditional Autoencoder showed the best balance between enhancing color contrast and preserving naturalness. It improved the distinguishability of colors for CVD individuals while maintaining a smooth, visually appealing output. However, it produced slightly blurry images, which could be improved with better loss functions or refinement techniques. The Conditional U-Net was also effective in preserving structure and providing stable recoloring, but it required careful training to avoid overfitting. The Conditional Parallel RGB MLP, while computationally fast, lacked the ability to capture spatial relationships between pixels, making it unsuitable for this task.&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Importance of Loss Functions&#039;&#039;&#039;:  &lt;br /&gt;
Designing appropriate loss functions was crucial for achieving the right balance between naturalness, contrast enhancement, and structural preservation. The global and local contrast losses significantly improved the visibility of recolored images, while the naturalness loss ensured that the outputs did not look artificial. Incorporating metrics like SSIM and Chromatic Difference into the evaluation also helped us better understand how well the models performed.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Challenges with Data&#039;&#039;&#039;:  &lt;br /&gt;
One of the biggest challenges was ensuring that the dataset effectively represented real-world scenarios for CVD individuals. Simulating CVD perceptions and generating recolored images that matched those perceptions required a well-defined pipeline. A more diverse dataset or additional user studies with CVD participants could help fine-tune the models further.&lt;br /&gt;
&lt;br /&gt;
4. &#039;&#039;&#039;Computational Efficiency&#039;&#039;&#039;:  &lt;br /&gt;
While models like the Conditional Autoencoder and Conditional U-Net provided high-quality recoloring, their inference times were moderate (about 2.6 seconds per image); optimizing these models further could make them feasible and scalable for real-world use cases, such as accessibility tools in apps or websites.&lt;br /&gt;
&lt;br /&gt;
5. &#039;&#039;&#039;What Worked and What Didn’t&#039;&#039;&#039;:  &lt;br /&gt;
* Worked: Contrast enhancement methods using local and global losses were effective in improving visibility for CVD individuals. Loss functions borrowed from the Swin-transformer-based work [2] added robustness.  &lt;br /&gt;
* Didn’t Work: Pixel-wise methods like the Conditional RGB MLP struggled due to their inability to handle spatial dependencies. Additionally, overfitting was a recurring issue in larger architectures without careful training.&lt;br /&gt;
&lt;br /&gt;
6. &#039;&#039;&#039;Future Directions&#039;&#039;&#039;:  &lt;br /&gt;
* Better Loss Functions: Refining the loss functions to address issues like blurriness in outputs could further improve results.  &lt;br /&gt;
* User Studies: Testing the models with real CVD participants would provide valuable insights and help validate the results.  &lt;br /&gt;
* Model Optimization: Reducing the computational cost of high-performing models like the Conditional Autoencoder could make them more practical for deployment.  &lt;br /&gt;
* Exploration of New Architectures: Trying newer methods, such as lightweight transformers or diffusion-based models, might enhance recoloring performance while maintaining efficiency.&lt;br /&gt;
&lt;br /&gt;
While there’s still room for improvement, our models demonstrated the potential of deep learning in addressing the challenges faced by individuals with CVD. Our future work would focus on refining these methods and bringing them closer to practical, everyday applications.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] H. Li, L. Zhang, X. Zhang, M. Zhang, G. Zhu, P. Shen, P. Li, M. Bennamoun, and S. A. A. Shah, &amp;quot;Color vision deficiency datasets &amp;amp; recoloring evaluation using GANs,&amp;quot; Multimedia Tools and Applications, vol. 79, no. 37–38, pp. 27583–27614, 2020, doi: 10.1007/s11042-020-09299-2.&lt;br /&gt;
&lt;br /&gt;
[2] L. Chen, Z. Zhu, W. Huang, K. Go, X. Chen, and X. Mao, &amp;quot;Image recoloring for color vision deficiency compensation using Swin transformer,&amp;quot; Neural Computing and Applications, vol. 36, no. 11, pp. 6051–6066, 2024, doi: 10.1007/s00521-023-09367-2.&lt;br /&gt;
&lt;br /&gt;
[3] S. Jiang, D. Liu, D. Li, and C. Xu, &amp;quot;Personalized Image Generation for Color Vision Deficiency Population,&amp;quot; in 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023, pp. 22514–22523, doi: 10.1109/ICCV51070.2023.02063.&lt;br /&gt;
&lt;br /&gt;
[4] J.-B. Huang, C.-S. Chen, T.-C. Jen, and S.-J. Wang, &amp;quot;Image recolorization for the colorblind,&amp;quot; GitHub repository, [Online]. Available: https://github.com/jbhuang0604/RecolorForColorblind. [Accessed: 12-Dec-2024].&lt;br /&gt;
&lt;br /&gt;
[5] J. Dietrich, &amp;quot;Daltonize Python Package,&amp;quot; GitHub repository, [Online]. Available: https://github.com/joergdietrich/daltonize/blob/main/daltonize/daltonize.py. [Accessed: 12-Dec-2024].&lt;br /&gt;
&lt;br /&gt;
[6] B. Dougherty and A. Wade, &amp;quot;Vischeck,&amp;quot; 2000. [Online]. Available: https://www.vischeck.com/. [Accessed: 12-Dec-2024].&lt;br /&gt;
&lt;br /&gt;
[7] J. Brettel, F. Viénot, and J. D. Mollon, &amp;quot;Computerized simulation of color appearance for dichromats,&amp;quot; Journal of the Optical Society of America A, vol. 14, no. 10, pp. 2647–2655, 1997, doi: 10.1364/JOSAA.14.002647.&lt;br /&gt;
&lt;br /&gt;
[8] Z. Zhu, M. Toyoura, K. Go, I. Fujishiro, K. Kashiwagi, and X. Mao, &amp;quot;Processing images for red-green dichromats compensation via naturalness and information-preservation considered recoloring,&amp;quot; Visual Computer, vol. 35, no. 6–8, pp. 1053–1066, 2019, doi: 10.1007/s00371-019-01723-5.&lt;br /&gt;
&lt;br /&gt;
[9] Z. Zhu, M. Toyoura, K. Go, I. Fujishiro, K. Kashiwagi, and X. Mao, &amp;quot;Personalized image recoloring for color vision deficiency compensation,&amp;quot; IEEE Transactions on Multimedia, vol. 24, pp. 1721–1733, 2022, doi: 10.1109/TMM.2021.3130546.&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
- Upload source code, test images, etc, and give a description of each link. In some cases, your acquired data may be too large to store practically. In this case, use your judgement (or consult one of us) and only link the most relevant data. Be sure to describe the purpose of your code and to edit the code for clarity. The purpose of placing the code online is to allow others to verify your methods and to learn from your ideas.&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
&#039;&#039;&#039;Ishikaa&#039;&#039;&#039;: &lt;br /&gt;
* Training, evaluation and visualization for each of MLP, U-Net and Autoencoder&lt;br /&gt;
* Recolorization script (adapting from MATLAB) and adding severity index&lt;br /&gt;
* &#039;Ground Truth&#039; dataset creation and logging&lt;br /&gt;
* AWS Compute setup and configuration&lt;br /&gt;
* Written Report &amp;amp; Presentation&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Raina&#039;&#039;&#039;:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60187</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60187"/>
		<updated>2024-12-12T17:56:47Z</updated>

		<summary type="html">&lt;p&gt;Rainas: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Image recoloring for individuals with color vision deficiencies (CVDs) has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent. These conditions arise due to the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell), or anomalous trichromats (having all three types of cones but with altered sensitivity), causing milder color perception issues.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment through two main approaches: mathematical transformations and deep learning techniques. We will start by reviewing current advancements in these two domains, followed by presenting our experiments and results. Evaluations of each method will be provided, leading to a discussion of our findings and outlining potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences, such as the beauty of a rainbow, experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
- What is known from the literature.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
- Describe techniques you used to measure and analyze. Describe the instruments, and experimental procedures in enough detail so that someone could repeat your analysis. What software did you use? What was the idea of the algorithms and data analysis?&lt;br /&gt;
&lt;br /&gt;
=== Simulation Tools ===&lt;br /&gt;
&lt;br /&gt;
For this project, we have used three simulation tools:&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039; iset3d toolbox&#039;&#039;&#039; We used the iset3d toolbox in MATLAB to provide a physically-accurate ray traced rendering of 3D scenes.&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039; PBRT &#039;&#039;&#039; We ran PBRT from MATLAB to produce physically accurate images via physically based rendering and ray tracing through scenes.&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039; Docker &#039;&#039;&#039; The implementation of iset3d with PBRT is available in a set of Docker containers.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Sine Wave: ===&lt;br /&gt;
&lt;br /&gt;
=== Blue Noise: ===&lt;br /&gt;
&lt;br /&gt;
== Results == &lt;br /&gt;
- Organize your results in a good logical order (not necessarily historical order). Include relevant graphs and/or images. Make sure graph axes are labeled. Make sure you draw the reader&#039;s attention to the key element of the figure. The key aspect should be the most visible element of the figure or graph. Help the reader by writing a clear figure caption.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
- Describe what you learned. What worked? What didn&#039;t? Why? What should someone next year try?&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
- List references. Include links to papers that are online.&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
- Upload source code, test images, etc, and give a description of each link. In some cases, your acquired data may be too large to store practically. In this case, use your judgement (or consult one of us) and only link the most relevant data. Be sure to describe the purpose of your code and to edit the code for clarity. The purpose of placing the code online is to allow others to verify your methods and to learn from your ideas.&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
- (for groups only) - Work breakdown. Explain how the project work was divided among group members.&lt;br /&gt;
&lt;br /&gt;
Bridget:&lt;br /&gt;
&lt;br /&gt;
Caelia:&lt;br /&gt;
&lt;br /&gt;
Brian:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60186</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60186"/>
		<updated>2024-12-12T17:55:28Z</updated>

		<summary type="html">&lt;p&gt;Rainas: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Image recoloring for individuals with color vision deficiencies (CVDs) has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent. These conditions arise from the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell) or anomalous trichromats (having all three types of cones but with altered sensitivity); the latter experience milder color perception issues.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment with two main approaches: mathematical transformations and deep learning techniques. We begin by reviewing current advancements in these two domains, then present our experiments and results. We evaluate each method, discuss our findings, and outline potential directions for future work.&lt;br /&gt;
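The severity parameterization can be sketched as a simple mathematical transformation: simulate the dichromat endpoint and linearly blend it with the original image. The sketch below uses a Vienot-style protanopia model with widely reproduced matrix values; this is an illustrative assumption, not the recoloring pipeline developed in this project.&lt;br /&gt;

```python
# Illustrative sketch (an assumption, not this project's code):
# severity-adjustable protanopia simulation via an RGB-to-LMS pipeline.
import numpy as np

# Linear-RGB to LMS matrix (widely reproduced Vienot et al. 1999 values).
RGB2LMS = np.array([[17.8824,   43.5161,  4.11935],
                    [3.45565,   27.1554,  3.86714],
                    [0.0299566, 0.184309, 1.46709]])
LMS2RGB = np.linalg.inv(RGB2LMS)

# Protanope: the missing L response is rebuilt from M and S so that
# whites and grays are approximately preserved.
PROTAN = np.array([[0.0, 2.02344, -2.52581],
                   [0.0, 1.0,      0.0],
                   [0.0, 0.0,      1.0]])

def simulate_protan(img, severity=1.0):
    """img: H x W x 3 linear-RGB floats in [0, 1]; severity in [0, 1]."""
    sim = img @ RGB2LMS.T @ PROTAN.T @ LMS2RGB.T   # dichromat endpoint
    # Anomalous trichromacy: blend between normal vision and dichromacy.
    out = (1.0 - severity) * img + severity * sim
    return np.clip(out, 0.0, 1.0)
```

With severity 0 the image is returned unchanged; with severity 1 it shows the full dichromat simulation, and intermediate values model anomalous trichromats of varying strength.&lt;br /&gt;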
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow, a phenomenon I find among the most beautiful in the world, with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences—such as the beauty of a rainbow—experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
- What is known from the literature.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
- Describe techniques you used to measure and analyze. Describe the instruments, and experimental procedures in enough detail so that someone could repeat your analysis. What software did you use? What was the idea of the algorithms and data analysis?&lt;br /&gt;
&lt;br /&gt;
=== Simulation Tools ===&lt;br /&gt;
&lt;br /&gt;
For this project, we have used three simulation tools:&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039;iset3d toolbox&#039;&#039;&#039; We used the iset3d toolbox in MATLAB to produce physically accurate, ray-traced renderings of 3D scenes.&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039;PBRT&#039;&#039;&#039; We ran PBRT (Physically Based Ray Tracing) from MATLAB to render physically accurate images by tracing rays through each scene.&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039; Docker &#039;&#039;&#039; The implementation of iset3d with PBRT is available in a set of Docker containers.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Sine Wave: ===&lt;br /&gt;
&lt;br /&gt;
=== Blue Noise: ===&lt;br /&gt;
&lt;br /&gt;
== Results == &lt;br /&gt;
- Organize your results in a good logical order (not necessarily historical order). Include relevant graphs and/or images. Make sure graph axes are labeled. Make sure you draw the reader&#039;s attention to the key element of the figure. The key aspect should be the most visible element of the figure or graph. Help the reader by writing a clear figure caption.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
- Describe what you learned. What worked? What didn&#039;t? Why? What should someone next year try?&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
- List references. Include links to papers that are online.&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
- Upload source code, test images, etc, and give a description of each link. In some cases, your acquired data may be too large to store practically. In this case, use your judgement (or consult one of us) and only link the most relevant data. Be sure to describe the purpose of your code and to edit the code for clarity. The purpose of placing the code online is to allow others to verify your methods and to learn from your ideas.&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
- (for groups only) - Work breakdown. Explain how the project work was divided among group members.&lt;br /&gt;
&lt;br /&gt;
Bridget:&lt;br /&gt;
&lt;br /&gt;
Caelia:&lt;br /&gt;
&lt;br /&gt;
Brian:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
	<entry>
		<id>http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60185</id>
		<title>Personalized Recoloring for Color Vision Deficiency using Deep Learning</title>
		<link rel="alternate" type="text/html" href="http://vista.su.domains/psych221wiki/index.php?title=Personalized_Recoloring_for_Color_Vision_Deficiency_using_Deep_Learning&amp;diff=60185"/>
		<updated>2024-12-12T17:54:03Z</updated>

		<summary type="html">&lt;p&gt;Rainas: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction == &lt;br /&gt;
Image recoloring for individuals with color vision deficiencies (CVDs) has been a well-researched area, with numerous attempts aimed at creating images that make colors more distinguishable for those with CVDs. CVDs are typically classified into three main types: protanopia (difficulty perceiving red), deuteranopia (difficulty perceiving green), and tritanopia (difficulty perceiving blue). Most research has focused on protanopia and deuteranopia, as these conditions are more prevalent. These conditions arise from the absence or malfunction of certain types of cone cells in the retina, which are responsible for color vision. For instance, the lack or defect of red or green cones leads to protanopia or deuteranopia, respectively. This can result in partial or complete loss of perception of specific colors. Moreover, individuals with CVD can be dichromats (completely missing one type of cone cell) or anomalous trichromats (having all three types of cones but with altered sensitivity); the latter experience milder color perception issues.&lt;br /&gt;
&lt;br /&gt;
In this work, we aim to consider all three types of CVDs, taking into account varying levels of severity for personalization. We explore existing methods in the field and experiment with two main approaches: mathematical transformations and deep learning techniques. We begin by reviewing current advancements in these two domains, then present our experiments and results. We evaluate each method, discuss our findings, and outline potential directions for future work.&lt;br /&gt;
&lt;br /&gt;
The motivation for this work arose from a personal experience. While admiring a rainbow—a phenomenon I find among the most beautiful in the world—with a friend who has deuteranopia, I realized that they were unable to distinguish the vibrant array of colors. This experience highlighted the emotional and perceptual gap caused by CVD, inspiring the goal of this project: to develop personalized and efficient tools that enhance color perception for individuals with CVDs. Ultimately, we aim to enable those with CVDs to enjoy the same vivid experiences—such as the beauty of a rainbow—experienced by those with normal color vision.&lt;br /&gt;
&lt;br /&gt;
== Background == &lt;br /&gt;
- What is known from the literature.&lt;br /&gt;
&lt;br /&gt;
== Methods ==&lt;br /&gt;
- Describe techniques you used to measure and analyze. Describe the instruments, and experimental procedures in enough detail so that someone could repeat your analysis. What software did you use? What was the idea of the algorithms and data analysis?&lt;br /&gt;
&lt;br /&gt;
=== Simulation Tools ===&lt;br /&gt;
&lt;br /&gt;
For this project, we have used three simulation tools:&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039;iset3d toolbox&#039;&#039;&#039; We used the iset3d toolbox in MATLAB to produce physically accurate, ray-traced renderings of 3D scenes.&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039;PBRT&#039;&#039;&#039; We ran PBRT (Physically Based Ray Tracing) from MATLAB to render physically accurate images by tracing rays through each scene.&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039; Docker &#039;&#039;&#039; The implementation of iset3d with PBRT is available in a set of Docker containers.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Sine Wave: ===&lt;br /&gt;
&lt;br /&gt;
=== Blue Noise: ===&lt;br /&gt;
&lt;br /&gt;
== Results == &lt;br /&gt;
- Organize your results in a good logical order (not necessarily historical order). Include relevant graphs and/or images. Make sure graph axes are labeled. Make sure you draw the reader&#039;s attention to the key element of the figure. The key aspect should be the most visible element of the figure or graph. Help the reader by writing a clear figure caption.&lt;br /&gt;
&lt;br /&gt;
== Conclusions ==&lt;br /&gt;
- Describe what you learned. What worked? What didn&#039;t? Why? What should someone next year try?&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
- List references. Include links to papers that are online.&lt;br /&gt;
&lt;br /&gt;
== Appendix I ==&lt;br /&gt;
- Upload source code, test images, etc, and give a description of each link. In some cases, your acquired data may be too large to store practically. In this case, use your judgement (or consult one of us) and only link the most relevant data. Be sure to describe the purpose of your code and to edit the code for clarity. The purpose of placing the code online is to allow others to verify your methods and to learn from your ideas.&lt;br /&gt;
&lt;br /&gt;
== Appendix II ==&lt;br /&gt;
- (for groups only) - Work breakdown. Explain how the project work was divided among group members.&lt;br /&gt;
&lt;br /&gt;
Bridget:&lt;br /&gt;
&lt;br /&gt;
Caelia:&lt;br /&gt;
&lt;br /&gt;
Brian:&lt;/div&gt;</summary>
		<author><name>Rainas</name></author>
	</entry>
</feed>