Pokemon Color Transfer
Introduction
Background
Color and Style Transfer
Palette Extraction
Palette extraction is the process of analyzing an image and summarizing its millions of pixel colors into a small, representative set of key colors, known as a color palette. Instead of working directly with every pixel’s RGB value, palette extraction identifies the dominant or most perceptually important colors that define the visual appearance of the image. A palette typically contains only 4–8 colors, yet these colors capture the essential chromatic structure of the image. This compact representation suppresses noise, merges redundant colors, and preserves the underlying style of the original artwork. We implement two palette extraction methods.
K-Means
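We use K-Means to cluster pixel colors and take the cluster centers as the palette. To avoid clustering millions of pixels directly, we quantize RGB space into 16 bins per channel, giving $16^3 = 4096$ bins. Assuming 8-bit channels, a pixel with RGB value $(r, g, b)$ falls into the bin with index
$$(\lfloor r/16 \rfloor,\ \lfloor g/16 \rfloor,\ \lfloor b/16 \rfloor).$$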
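The quantization step can be sketched as follows. This is a minimal illustration that assumes 8-bit RGB input; the function name `rgb_histogram` and the flat bin-index layout are hypothetical choices, not taken from the original implementation.

```python
import numpy as np

BINS = 16  # bins per channel, as stated in the text


def rgb_histogram(pixels_rgb):
    """Quantize (N, 3) uint8 RGB pixels into a 16 x 16 x 16 grid.

    Returns the flat indices of the non-empty bins, the pixel count of each
    non-empty bin, and the flat bin index of every pixel.
    """
    pixels_rgb = np.asarray(pixels_rgb, dtype=np.uint8)
    # 256 levels / 16 bins = 16 levels per bin (assumes 8-bit channels).
    idx = (pixels_rgb // (256 // BINS)).astype(np.int64)
    # Flatten the 3-D bin coordinate into a single integer in [0, 4095].
    flat = (idx[:, 0] * BINS + idx[:, 1]) * BINS + idx[:, 2]
    bin_ids, counts = np.unique(flat, return_counts=True)
    return bin_ids, counts, flat
```

The per-pixel index `flat` can then be used to accumulate a mean color for each non-empty bin.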
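For each non-empty bin, we count its pixels to obtain a weight and compute the average LAB value of all pixels in that bin. To avoid the randomness of standard K-Means initialization, the method uses a weighted farthest-point initialization. We select the bin with the largest weight as the first center. Each time a new center is selected, we compute the squared distance from every bin to that center and attenuate the bin's weight accordingly, so that bins close to an already selected center are suppressed.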
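The initialization can be sketched as below. The exact attenuation function is not stated here, so the sketch assumes a Gaussian falloff, $w_i \leftarrow w_i\,(1 - e^{-d_i^2/\sigma^2})$, where $d_i$ is the distance from bin $i$ to the newly chosen center. The function name `farthest_point_init` and the default scale `sigma` are hypothetical.

```python
import numpy as np


def farthest_point_init(colors, weights, k, sigma=80.0):
    """Pick k initial centers from histogram bins.

    colors  : (B, 3) mean color of each non-empty bin (e.g. LAB)
    weights : (B,) pixel count of each bin
    sigma   : scale of the ASSUMED attenuation 1 - exp(-d^2 / sigma^2)
    """
    colors = np.asarray(colors, dtype=float)
    w = np.asarray(weights, dtype=float).copy()
    chosen = []
    for _ in range(k):
        i = int(np.argmax(w))  # bin with the largest (attenuated) weight
        chosen.append(i)
        d2 = np.sum((colors - colors[i]) ** 2, axis=1)
        # Bins near the new center are suppressed; the chosen bin drops to 0.
        w *= 1.0 - np.exp(-d2 / sigma ** 2)
    return colors[chosen]
```

Because the procedure is deterministic, repeated runs on the same image produce the same starting centers.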
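We then pick the bin with the largest attenuated weight as the next center, and repeat until all centers are chosen. In each K-Means iteration, every histogram bin $i$, with LAB color $c_i$ and weight $w_i$, is assigned to its nearest center. Each center $\mu_k$ is then updated to the weighted mean of the set $S_k$ of bins assigned to it:
$$\mu_k = \frac{\sum_{i \in S_k} w_i\, c_i}{\sum_{i \in S_k} w_i}.$$
The black and white anchor centers remain fixed and are not updated. The iteration stops once the centers have converged, that is, once they no longer change appreciably between consecutive iterations.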
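A minimal sketch of the weighted K-Means loop with fixed anchors follows. The stopping rule used here (largest center displacement below `tol`, or `max_iter` reached) is an assumption, as are the function name `weighted_kmeans` and its parameters.

```python
import numpy as np


def weighted_kmeans(colors, weights, init_centers, anchors=None,
                    tol=1e-3, max_iter=100):
    """Weighted K-Means over histogram bins.

    colors, weights : bin mean colors (B, 3) and pixel counts (B,)
    init_centers    : (K, 3) starting centers
    anchors         : optional (A, 3) fixed centers (e.g. black and white)
    tol, max_iter   : HYPOTHETICAL stopping parameters
    """
    colors = np.asarray(colors, dtype=float)
    weights = np.asarray(weights, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    if anchors is None:
        anchors = np.empty((0, 3))
    anchors = np.asarray(anchors, dtype=float).reshape(-1, 3)
    k = len(centers)
    for _ in range(max_iter):
        all_centers = np.vstack([centers, anchors])
        # Assign every bin to its nearest center (anchors compete for bins too).
        d2 = ((colors[:, None, :] - all_centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):  # anchors (index >= k) are never updated
            mask = labels == j
            if mask.any():
                new_centers[j] = np.average(colors[mask], axis=0,
                                            weights=weights[mask])
        shift = np.max(np.linalg.norm(new_centers - centers, axis=1))
        centers = new_centers
        if shift < tol:
            break
    return centers
```

Only the free centers are returned; the caller can append the anchors if they should be part of the final palette.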
Blind Source Separation Palette Extraction (BSS-LLE Method)
This second method treats palette extraction as a blind unmixing problem with spatial smoothness constraints. It is computationally more expensive but yields globally coherent palettes.
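Each pixel $i$ forms a 5-D feature vector that concatenates its color with its scaled image position:
$$f_i = (p_i,\ \sigma x_i,\ \sigma y_i),$$
where $p_i \in \mathbb{R}^3$ is the pixel color, $x_i, y_i$ are normalized coordinates, and $\sigma$ controls spatial smoothness. For each pixel, we find its nearest neighbors $N(i)$ in this feature space and compute LLE weights by solving
$$\min_{a_i} \Big\lVert f_i - \sum_{j \in N(i)} a_{ij} f_j \Big\rVert^2 \quad \text{subject to} \quad \sum_{j \in N(i)} a_{ij} = 1.$$
Collecting the weights into a sparse matrix $A$ (with $a_{ij} = 0$ for $j \notin N(i)$), we construct the Laplacian
$$L = (I - A)^\top (I - A).$$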
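The LLE weights and the resulting Laplacian can be sketched for a small point set as follows. This dense version is for illustration only; a real image would need a spatial index for the neighbor search and sparse matrices. The function name `lle_laplacian`, the neighbor count `k`, and the ridge term `reg` are assumptions.

```python
import numpy as np


def lle_laplacian(features, k=3, reg=1e-3):
    """Build L = (I - A)^T (I - A) from locally linear embedding weights.

    features : (N, D) feature vectors (N assumed small: dense O(N^2) memory)
    k        : number of nearest neighbors (hypothetical default, needs N > k)
    reg      : ridge term that keeps the local Gram matrix invertible
    """
    F = np.asarray(features, dtype=float)
    n = len(F)
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        Z = F[nbrs] - F[i]  # neighbors centered on point i
        G = Z @ Z.T  # local Gram matrix, shape (k, k)
        G += reg * (np.trace(G) + 1e-12) * np.eye(k)
        a = np.linalg.solve(G, np.ones(k))
        A[i, nbrs] = a / a.sum()  # enforce the sum-to-one constraint
    I = np.eye(n)
    return (I - A).T @ (I - A), A
```

Since every row of `A` sums to one, the constant vector lies in the null space of `L`, which is what makes it behave like a smoothness penalty.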
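We assume each pixel’s color $p_i$ can be expressed as a mixture of $K$ palette colors:
$$p_i \approx \sum_{k=1}^{K} w_{ik}\, c_k,$$
where $w_{ik}$ are mixture weights and $c_k$ are palette colors; in matrix form, $P \approx WC$. We minimize an objective over $W$ and $C$ that combines the reconstruction error of this mixture, the spatial smoothness of the weights under the Laplacian, and a sparsity penalty on the weights whose strength is governed by β.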
The optimization uses alternating minimization:
1. Update $W$ with $C$ fixed (closed-form linear system).
2. Hard-threshold $W$ to enforce sparsity.
3. Update $C$ with $W$ fixed by solving the resulting least-squares problem.
4. Increase β to gradually enforce sparsity (continuation method).
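The four steps can be sketched as below. The exact objective is not given here, so this sketch assumes a half-quadratic form, $\lVert P - WC \rVert_F^2 + \gamma\,\mathrm{tr}(W^\top L W) + \beta \lVert W - Z \rVert_F^2$, where $Z$ is the hard-thresholded copy of $W$. The names `unmix` and `hard_threshold` and the parameters `gamma`, `tau`, and `beta_growth` are hypothetical, and the dense solves are only practical for a small number of pixels.

```python
import numpy as np


def hard_threshold(W, tau):
    """Zero out weights whose magnitude is below tau."""
    return np.where(np.abs(W) >= tau, W, 0.0)


def unmix(P, L, k, gamma=0.1, beta=0.1, beta_growth=2.0, tau=0.05,
          n_outer=8, seed=0):
    """Alternating minimization for P ~= W C under an ASSUMED objective.

    P : (N, 3) pixel colors
    L : (N, N) symmetric positive semi-definite Laplacian
    k : palette size
    """
    P = np.asarray(P, dtype=float)
    n = len(P)
    rng = np.random.default_rng(seed)
    C = P[rng.choice(n, size=k, replace=False)].copy()  # init palette from pixels
    Z = np.full((n, k), 1.0 / k)  # sparse copy of W
    for _ in range(n_outer):
        # 1. W update: (gamma L + beta I) W + W (C C^T) = P C^T + beta Z.
        #    Diagonalizing C C^T decouples this into k linear systems.
        s, U = np.linalg.eigh(C @ C.T)
        rhs = (P @ C.T + beta * Z) @ U
        base = gamma * L + beta * np.eye(n)
        Wt = np.empty((n, k))
        for j in range(k):
            Wt[:, j] = np.linalg.solve(base + s[j] * np.eye(n), rhs[:, j])
        W = Wt @ U.T
        # 2. Hard-threshold W to obtain its sparse copy.
        Z = hard_threshold(W, tau)
        # 3. C update: least squares with the sparse weights fixed.
        C = np.linalg.lstsq(Z, P, rcond=None)[0]
        # 4. Continuation: tie W more tightly to its sparse copy.
        beta *= beta_growth
    return Z, C
```

Growing `beta` across outer iterations pulls `W` toward its thresholded copy, which is one common way to realize the continuation idea described in step 4.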
The learned palette may lie off-manifold. Thus each palette color is replaced by the mean of its nearest real RGB pixels. This ensures interpretability and consistent color reproduction.
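The snapping step can be sketched as follows. The number of nearest pixels is not stated, so the neighborhood size `m` and the function name `snap_palette` are assumptions.

```python
import numpy as np


def snap_palette(palette, pixels, m=50):
    """Replace each palette color by the mean of its m nearest real pixels.

    palette : (K, 3) learned palette colors
    pixels  : (N, 3) pixel colors of the image
    m       : HYPOTHETICAL neighborhood size; the text does not state one
    """
    palette = np.asarray(palette, dtype=float)
    pixels = np.asarray(pixels, dtype=float)
    m = min(m, len(pixels))
    d2 = ((palette[:, None, :] - pixels[None, :, :]) ** 2).sum(axis=2)  # (K, N)
    # Indices of the m closest pixels for each palette color (unordered).
    nearest = np.argpartition(d2, m - 1, axis=1)[:, :m]  # (K, m)
    return pixels[nearest].mean(axis=1)
```

A palette color that lies outside the set of image colors is pulled back toward colors that actually occur in the image.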
Final Output
Both methods return a palette of $K$ colors:
$$\mathcal{P} = \{c_1, c_2, \dots, c_K\}.$$
Methods
In this section, we describe the methods we use to transfer color between Pokemon.
Baseline: Palette-Based Random Transfer
Palette-based random transfer first extracts a compact color palette from each image, then randomly matches colors between the two palettes. Rather than enforcing a fixed one-to-one correspondence or optimizing for perceptual similarity, the method randomly permutes or samples the target palette and maps all pixels associated with a source palette color to the randomly chosen target palette color. The transferred image therefore preserves structural details while its colors vary widely from run to run. Because it operates on palette colors rather than on individual pixels, palette-based random transfer is fast, interpretable, and well suited to generating varied recolorings of Pokemon. We use it as the baseline.
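A minimal sketch of the baseline is given below. It assumes that each pixel is associated with its nearest source palette color, and that pixels are recolored by adding the offset between the source color and its randomly chosen target color, which keeps shading within a region. Both choices, as well as the function name `random_palette_transfer`, are assumptions, and the output is not clipped to any value range.

```python
import numpy as np


def random_palette_transfer(image, src_palette, tgt_palette, seed=None):
    """Recolor `image` by mapping each source palette color to a random target color.

    image       : (H, W, 3) float array
    src_palette : (K, 3) palette of `image`
    tgt_palette : (M, 3) palette of the reference image
    """
    rng = np.random.default_rng(seed)
    image = np.asarray(image, dtype=float)
    src = np.asarray(src_palette, dtype=float)
    tgt = np.asarray(tgt_palette, dtype=float)
    # Associate every pixel with its nearest source palette color.
    flat = image.reshape(-1, 3)
    d2 = ((flat[:, None, :] - src[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    # Random mapping: a permutation when the sizes match, otherwise sampling.
    if len(src) == len(tgt):
        mapping = rng.permutation(len(tgt))
    else:
        mapping = rng.integers(0, len(tgt), size=len(src))
    # Shift each pixel by the offset between its source and target colors.
    offset = tgt[mapping] - src
    out = flat + offset[labels]
    return out.reshape(image.shape), mapping
```

Passing different values of `seed` yields different recolorings of the same image.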
Neighbor Segments
The Neighbor Segments (NS) method groups pixels into local, perceptually coherent regions and uses neighborhood relationships to guide smooth and consistent recoloring. Instead of treating each pixel independently, the image is first segmented into small regions (superpixels or color-coherent clusters), where each segment is a set of spatially adjacent pixels with similar color statistics. Let the image be segmented into $N$ segments $\{S_1, S_2, \dots, S_N\}$, where each segment $S_i$ contains pixels with similar color features.
These segments form a neighborhood graph $G = (V, \mathcal{E})$ with one node per segment, where an undirected edge $(i, j) \in \mathcal{E}$ indicates that segments $S_i$ and $S_j$ touch in the spatial domain. The adjacency matrix $A$ of the graph satisfies
$$A_{ij} = \begin{cases} 1, & (i, j) \in \mathcal{E}, \\ 0, & \text{otherwise.} \end{cases}$$
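The adjacency matrix can be built directly from a segment label map, as in the following sketch. It assumes 4-connectivity and segment ids numbered from 0; the function name `segment_adjacency` is hypothetical.

```python
import numpy as np


def segment_adjacency(labels):
    """Adjacency matrix of segments in a 2-D integer label map (4-connectivity).

    labels : (H, W) array with segment ids 0 .. N-1
    """
    labels = np.asarray(labels)
    n = int(labels.max()) + 1
    A = np.zeros((n, n), dtype=np.uint8)
    # Compare each pixel with its right neighbor and with its lower neighbor.
    pairs = [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]
    for a, b in pairs:
        touching = a != b
        A[a[touching], b[touching]] = 1
        A[b[touching], a[touching]] = 1  # undirected graph
    return A
```

Two segments are marked adjacent as soon as one pair of their pixels shares an edge.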
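During color transfer, each segment is assigned one palette color. Let $\mathcal{P} = \{c_1, \dots, c_K\}$ be the target palette, and let $t_i$ denote the transferred color assigned to segment $i$, for $i = 1, \dots, N$. The NS method encourages smoothness by minimizing a neighborhood-consistency energy:
$$E_{\text{smooth}} = \sum_{(i, j) \in \mathcal{E}} w_{ij}\, \lVert t_i - t_j \rVert^2,$$
where the sum runs over the set $\mathcal{E}$ of pairs of adjacent segments and $w_{ij}$ is a weight encoding the similarity of segments $i$ and $j$ (e.g., based on LAB difference or boundary strength). This term penalizes large color differences between adjacent segments, preventing abrupt color transitions and blocky artifacts.
At the same time, the transferred color of each segment should remain close to the palette color $\hat{c}_i \in \mathcal{P}$ assigned to it by the palette mapping rule:
$$E_{\text{data}} = \sum_{i=1}^{N} \lVert t_i - \hat{c}_i \rVert^2.$$
The final transferred colors are obtained by minimizing the combined objective:
$$E = E_{\text{data}} + \lambda\, E_{\text{smooth}},$$
where $\lambda$ controls the strength of neighborhood smoothing.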
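If both terms are squared Euclidean distances (an assumption), the combined objective is quadratic and its minimizer solves a single linear system, $(I + \lambda L_w)\,T = \hat{C}$, where $L_w = D - W$ is the weighted graph Laplacian of the segment graph. The sketch below illustrates this; the function name `smooth_segment_colors` and the default `lam` are hypothetical.

```python
import numpy as np


def smooth_segment_colors(mapped, W, lam=0.5):
    """Minimize sum_i ||t_i - mapped_i||^2 + lam * sum_{(i,j)} w_ij ||t_i - t_j||^2.

    mapped : (N, 3) palette color assigned to each segment
    W      : (N, N) symmetric non-negative edge weights, zero for non-neighbors
    lam    : smoothing strength
    """
    mapped = np.asarray(mapped, dtype=float)
    W = np.asarray(W, dtype=float)
    lap = np.diag(W.sum(axis=1)) - W  # weighted graph Laplacian D - W
    # Setting the gradient to zero gives (I + lam * L) T = mapped.
    return np.linalg.solve(np.eye(len(mapped)) + lam * lap, mapped)
```

With `lam = 0` every segment keeps its mapped palette color, and larger values pull the colors of adjacent segments toward each other.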
This neighborhood-aware propagation allows the algorithm to maintain structural consistency, preserve texture boundaries, and generate recolorings that are both stable and visually coherent. Overall, Neighbor Segments provides a lightweight way to incorporate spatial smoothness into palette-based transfer, producing natural transitions while keeping computation efficient.
Neighbor Segments with Superpixel
Results
| Metric | Baseline | Clustering | Clustering-NP | Convex Hull | NS | NS-S |
|---|---|---|---|---|---|---|
| FID | 1 | 2 | 3 | 4 | 5 | 6 |
| Histogram Similarity | 1 | 2 | 3 | 4 | 5 | 6 |
| CIELAB | 1 | 2 | 3 | 4 | 5 | 6 |
| CIE94 | 1 | 2 | 3 | 4 | 5 | 6 |
| CIEDE2000 | 1 | 2 | 3 | 4 | 5 | 6 |
| SSIM | 1 | 2 | 3 | 4 | 5 | 6 |
| VGG Latent Space Distance | 1 | 2 | 3 | 4 | 5 | 6 |
Conclusions
Appendix I
Appendix II
Wenxiao Cai:
Yifei Deng: