Pokemon Color Transfer

From Psych 221 Image Systems Engineering
Revision as of 06:12, 22 November 2025




== Background ==


=== Palette Extraction ===
Palette extraction is the process of analyzing an image and summarizing its millions of pixel colors into a small, representative set of key colors — known as a color palette. Instead of working directly with every pixel’s RGB value, palette extraction identifies the dominant or most perceptually important colors that define the visual appearance of the image. A palette typically contains only 4–8 colors, yet these colors capture the essential chromatic structure of the image. This compact representation removes noise, eliminates redundant colors, and preserves the underlying style of the original artwork. We implement two palette extraction methods.
==== K-Means ====
We use K-Means to cluster pixel colors and take the cluster centers as the palette. To avoid clustering millions of pixels directly, we quantize RGB space into 16 bins per channel; each pixel with RGB value <math>(r,g,b)</math> falls into bin:
<math display="block">
\left(\left\lfloor \frac{r}{16} \right\rfloor,\; \left\lfloor \frac{g}{16} \right\rfloor,\; \left\lfloor \frac{b}{16} \right\rfloor\right)
</math>
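The binning step can be sketched in NumPy as follows. This is an illustrative sketch, not the project code: the function name is ours, pixels are assumed to be 8-bit RGB in an <math>(N,3)</math> array, and we average in RGB rather than LAB to keep the example self-contained.

```python
import numpy as np

def quantize_histogram(pixels):
    """Bin 8-bit RGB pixels (N, 3) into a 16x16x16 histogram.

    Returns, for each non-empty bin, a flat bin id, its pixel count
    (the bin's weight), and the mean color of the pixels it contains.
    """
    bins = pixels // 16                                      # indices in [0, 15]
    flat = bins[:, 0] * 256 + bins[:, 1] * 16 + bins[:, 2]  # one index per bin
    order = np.argsort(flat)                                 # group pixels by bin
    flat, pixels = flat[order], pixels[order]
    ids, starts, counts = np.unique(flat, return_index=True, return_counts=True)
    means = np.add.reduceat(pixels.astype(float), starts) / counts[:, None]
    return ids, counts, means

# Toy image: 100 reddish pixels and 50 bluish pixels
img = np.vstack([np.tile([200, 30, 30], (100, 1)),
                 np.tile([30, 30, 200], (50, 1))])
ids, counts, means = quantize_histogram(img)
```

Working on a few thousand non-empty bins instead of millions of pixels makes the subsequent clustering essentially instantaneous.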
For each non-empty bin, we record its pixel count as a weight <math>w_i</math> and compute the average LAB color of the pixels it contains. To avoid the randomness of standard K-Means seeding, the method uses a weighted farthest-point initialization: we select the bin with the largest weight as the first center, and for each subsequent center we compute each bin's squared distance <math>d_i^2</math> to the nearest existing center and attenuate its weight:
<math display="block">
w_i' = w_i \cdot \left(1 - e^{-d_i^2 / \sigma_a^2}\right)
</math>
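The attenuate-and-pick seeding rule can be sketched as follows. The function name and the value of <math>\sigma_a</math> are illustrative assumptions, not taken from the implementation.

```python
import numpy as np

def farthest_point_init(colors, weights, k, sigma_a=80.0):
    """Deterministic seeding: take the heaviest bin as the first center,
    then repeatedly attenuate each bin's weight by (1 - exp(-d^2/sigma_a^2)),
    where d is the distance to the nearest chosen center, and pick the
    heaviest remaining bin. sigma_a is an illustrative value.
    """
    centers = [colors[np.argmax(weights)]]
    for _ in range(k - 1):
        # squared distance from each bin to its nearest existing center
        d2 = np.min([np.sum((colors - c) ** 2, axis=1) for c in centers], axis=0)
        attenuated = weights * (1.0 - np.exp(-d2 / sigma_a ** 2))
        centers.append(colors[np.argmax(attenuated)])
    return np.array(centers)

colors = np.array([[0.0, 0, 0], [100, 0, 0], [101, 0, 0], [0, 90, 0]])
weights = np.array([5.0, 10.0, 9.0, 6.0])
centers = farthest_point_init(colors, weights, k=3)
```

Note how the third bin, despite its large weight, is never chosen: it sits next to an existing center, so its attenuated weight is nearly zero.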
We then pick the bin with the largest attenuated weight. Each histogram bin <math>i</math> with LAB color <math>x_i</math> and weight <math>w_i</math> is assigned to the nearest center. The center update rule is:
<math display="block">
C_j = \frac{\sum_{i \in j} w_i \cdot x_i}{\sum_{i \in j} w_i}
</math>
Black and white anchors remain fixed. The convergence criterion is:
<math display="block">
\| C_{\text{new}} - C_{\text{old}} \|^2 < 10^{-4}
</math>
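The weighted Lloyd iterations with fixed black/white anchors can be sketched as below. This is a minimal sketch under our own naming; the real implementation clusters LAB-averaged histogram bins.

```python
import numpy as np

def weighted_kmeans(colors, weights, centers, anchors, tol=1e-4, max_iter=100):
    """Weighted K-Means over histogram bins.

    `anchors` (e.g. black and white) are appended to the free centers and
    never updated. Each bin is assigned to its nearest center; each free
    center moves to the weight-averaged position of its assigned bins,
    until the squared center movement drops below `tol`.
    """
    free = centers.copy()
    for _ in range(max_iter):
        all_c = np.vstack([free, anchors])
        d2 = ((colors[:, None, :] - all_c[None, :, :]) ** 2).sum(axis=2)
        assign = np.argmin(d2, axis=1)
        new_free = free.copy()
        for j in range(len(free)):
            mask = assign == j
            if weights[mask].sum() > 0:  # skip empty clusters
                new_free[j] = (weights[mask, None] * colors[mask]).sum(axis=0) \
                              / weights[mask].sum()
        converged = ((new_free - free) ** 2).sum() < tol
        free = new_free
        if converged:
            break
    return np.vstack([free, anchors])

bins = np.array([[50.0, 0, 0], [52, 0, 0], [0, 50, 0], [0, 52, 0]])
w = np.ones(4)
palette = weighted_kmeans(bins, w,
                          centers=np.array([[40.0, 0, 0], [0, 40, 0]]),
                          anchors=np.array([[0.0, 0, 0], [100, 100, 100]]))
```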
==== Blind Separation Palette Extraction (BSS-LLE Method) ====
This second method treats palette extraction as a blind unmixing problem with spatial smoothness constraints. It is computationally more expensive but yields globally coherent palettes.
Each pixel forms a 5-D feature vector:
<math display="block">
X = [R,\; G,\; B,\; \alpha x,\; \alpha y]
</math>
where <math>(R,G,B) \in [0,1]</math> are the normalized color channels, <math>(x,y)</math> are normalized pixel coordinates, and <math>\alpha</math> controls spatial smoothness. For each pixel, we find its <math>K = 30</math> nearest neighbors in feature space and compute locally linear embedding (LLE) weights by solving:
<math display="block">
G w = \mathbf{1}, \qquad
G = (X_j - X_i)(X_j - X_i)^\top + \epsilon I
</math>
and normalize the weights:
<math display="block">
\sum_j w_j = 1
</math>
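The per-pixel weight computation can be sketched as follows (a self-contained sketch: the function name is ours, we use a small <math>K</math> on random data for brevity, and the weight matrix is returned dense rather than sparse).

```python
import numpy as np

def lle_weights(features, k=5, eps=1e-3):
    """LLE reconstruction weights for each row of `features`.

    For each point, solve G w = 1 with G the regularized Gram matrix of
    neighbor offsets, then normalize w to sum to 1. The full method uses
    k = 30 neighbors over 5-D [R, G, B, ax, ay] features.
    """
    n = len(features)
    Z = np.zeros((n, n))
    for i in range(n):
        d2 = np.sum((features - features[i]) ** 2, axis=1)
        nbrs = np.argsort(d2)[1:k + 1]        # nearest neighbors, excluding self
        D = features[nbrs] - features[i]      # neighbor offsets
        G = D @ D.T + eps * np.eye(k)         # regularized local Gram matrix
        w = np.linalg.solve(G, np.ones(k))
        Z[i, nbrs] = w / w.sum()              # normalize: weights sum to 1
    return Z

Z = lle_weights(np.random.default_rng(0).normal(size=(20, 5)))
```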
Stacking these per-pixel weights defines a sparse weight matrix <math>Z</math>, from which we construct the Laplacian operator:
<math display="block">
L = I - Z, \qquad L^T L
</math>
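The role of <math>L^T L</math> is that the smoothness penalty <math>\|W - ZW\|^2</math> used below equals the quadratic form <math>\operatorname{tr}(W^T L^T L\, W)</math>. A quick numerical check of this identity, with a random row-stochastic stand-in for <math>Z</math>:

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.random((6, 6))
Z /= Z.sum(axis=1, keepdims=True)   # row-stochastic stand-in for the LLE weights
W = rng.random((6, 3))              # stand-in mixture weights
L = np.eye(6) - Z

# ||W - ZW||^2 is exactly the quadratic form tr(W^T (L^T L) W)
lhs = np.sum((W - Z @ W) ** 2)
rhs = np.trace(W.T @ (L.T @ L) @ W)
```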
We assume each pixel’s color can be expressed as a mixture of <math>m</math> palette colors:
<math display="block">
I = W C
</math>
where <math>I \in \mathbb{R}^{N \times 3}</math> stacks the pixel colors, <math>W \in \mathbb{R}^{N \times m}</math> holds the mixture weights, and <math>C \in \mathbb{R}^{m \times 3}</math> holds the palette colors. We minimize:
<math display="block">
\|I - W C\|^2 +
\lambda_s \|W - ZW\|^2 +
\lambda_u \|W\mathbf{1} - \mathbf{1}\|^2 +
\lambda_{sp} \|W\|_0
</math>
The optimization uses alternating minimization:
1. '''Update W''' (closed-form linear system) 
2. '''Hard-threshold W''' to enforce sparsity 
3. '''Update C''' by solving 
<math display="block">
(W^T W)C = W^T I
</math>
4. '''Increase the continuation parameter''' <math>\beta</math> to gradually tighten the sparsity penalty (continuation method)
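The thresholding and palette-refit steps can be sketched as below. This is a sketch under our own naming: the <math>W</math>-update (a linear system involving the Laplacian) is omitted, and the threshold value is illustrative.

```python
import numpy as np

def update_C(W, I_pix):
    """Palette update: least-squares solution of (W^T W) C = W^T I."""
    return np.linalg.lstsq(W, I_pix, rcond=None)[0]

def alternating_step(W, C, I_pix, thresh=0.05):
    """One outer iteration (W-update omitted): hard-threshold small
    mixture weights, restore each row to sum one, then refit the palette."""
    W = np.where(W < thresh, 0.0, W)
    W = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    return W, update_C(W, I_pix)

# Exact two-color mixture: the C-update recovers the true palette
C_true = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
W0 = np.array([[0.7, 0.3], [0.2, 0.8], [1.0, 0.0]])
I_pix = W0 @ C_true
W1, C1 = alternating_step(W0, C_true, I_pix)
```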
The learned palette colors may lie off-manifold, i.e. outside the set of colors actually present in the image. Each palette color is therefore replaced by the mean of its nearest real RGB pixels, which keeps the palette interpretable and the color reproduction consistent.
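The snapping step can be sketched as follows (function name and neighbor count are illustrative assumptions):

```python
import numpy as np

def snap_palette(C, pixels, k=10):
    """Replace each learned palette color with the mean of its k nearest
    actual image pixels, pulling off-manifold colors back to real ones."""
    snapped = np.empty_like(C)
    for j, c in enumerate(C):
        d2 = np.sum((pixels - c) ** 2, axis=1)
        snapped[j] = pixels[np.argsort(d2)[:k]].mean(axis=0)
    return snapped

# An out-of-range palette color snaps onto the nearest real pixel colors
pixels = np.vstack([np.tile([0.1, 0.1, 0.1], (10, 1)),
                    np.tile([0.9, 0.9, 0.2], (10, 1))])
snapped = snap_palette(np.array([[1.2, 1.1, 0.3]]), pixels, k=5)
```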
==== Final Output ====
Both methods return a palette:
<math display="block">
\{(R_1,G_1,B_1), \dots, (R_m,G_m,B_m)\}.
</math>


== Methods ==

In this section, we describe the methods we use to transfer color between Pokemon images.

=== Baseline: Palette Based Random Transfer ===

=== Neighbor Segments ===

=== Neighbor Segments with Superpixel ===

== Results ==

{| class="wikitable"
|+ Table 1: Quantitative Evaluation Results on Bulbasaur and Squirtle
! Metric !! Baseline !! Clustering !! Clustering-NP !! Convex Hull !! NS !! NS-S
|-
| FID || 1 || 2 || 3 || 4 || 5 || 6
|-
| Histogram Similarity || 1 || 2 || 3 || 4 || 5 || 6
|-
| CIELAB || 1 || 2 || 3 || 4 || 5 || 6
|-
| CIE94 || 1 || 2 || 3 || 4 || 5 || 6
|-
| CIEDE2000 || 1 || 2 || 3 || 4 || 5 || 6
|-
| SSIM || 1 || 2 || 3 || 4 || 5 || 6
|-
| VGG Latent Space Distance || 1 || 2 || 3 || 4 || 5 || 6
|}

== Conclusions ==

== Appendix I ==

== Appendix II ==

Wenxiao Cai:


Yifei Deng: