Pokemon Color Transfer

From Psych 221 Image Systems Engineering

Introduction

Color transfer is a technique that modifies the color style of an image by borrowing or reassociating colors from another source while preserving the original spatial structure and content. By separating “what the image is” from “how the image is colored,” color transfer enables creative style manipulation without altering shapes, outlines, or semantic meaning. It has been widely used in artistic stylization, recoloring, and visual design tasks due to its ability to generate visually coherent outputs from simple palette or distribution transformations.

In the context of Pokemon artwork, color transfer offers a powerful and intuitive way to design new visual styles. Pokemon are typically drawn using a small number of distinctive colors that define their character identity (e.g., Charmander’s warm oranges or Squirtle’s blues). By transferring color styles between Pokemon, we can create novel, aesthetically pleasing variations that feel natural and consistent with the Pokemon universe, while still preserving the original line art and recognizable features. In this project, we focus on two primary goals:

a. Aesthetic quality — the recolored Pokemon should look visually pleasing and harmonious.

b. Content preservation — the recolored image should remain structurally similar to the original, with no distortions to shapes, edges, or important details.

To achieve these goals, we apply palette-based color transfer methods to generate new Pokemon styles and systematically evaluate the results. Aesthetic quality is assessed through a human-preference survey, while similarity to the original image is measured using quantitative metrics such as FID, SSIM, and other perceptual distance measures. Together, these evaluations allow us to study both the creativity and fidelity of color-transferred Pokemon, providing a balanced assessment of stylistic expressiveness and structural preservation.

Background

Color and Style Transfer

Palette Extraction

Palette extraction is the process of analyzing an image and summarizing its millions of pixel colors into a small, representative set of key colors — known as a color palette. Instead of working directly with every pixel’s RGB value, palette extraction identifies the dominant or most perceptually important colors that define the visual appearance of the image. A palette typically contains only 4–8 colors, yet these colors capture the essential chromatic structure of the image. This compact representation removes noise, eliminates redundant colors, and preserves the underlying style of the original artwork. We implement two palette extraction methods.

K-Means

We use K-Means to cluster pixel values and use the cluster centers as the palette. To avoid clustering millions of pixels, we quantize RGB space into 16 bins per channel. Each pixel with RGB value $(r, g, b)$ maps to the bin:

$(\lfloor r/16 \rfloor,\ \lfloor g/16 \rfloor,\ \lfloor b/16 \rfloor)$

For each non-empty bin, we count its pixels and compute the average LAB value of all pixels in that bin. To avoid the randomness of standard K-Means initialization, the method uses a weighted farthest-point initialization: we select the bin with the largest weight as the first center, and for each new center we compute each bin's squared distance $d_i$ to the existing centers and attenuate its weight:

$w_i \leftarrow w_i \left(1 - e^{-d_i^2 / \sigma_a^2}\right)$

We then pick the bin with the largest attenuated weight. Each histogram bin $i$ with LAB color $x_i$ and weight $w_i$ is assigned to the nearest center. The center update rule is:

$C_j = \frac{\sum_{i \in S_j} w_i x_i}{\sum_{i \in S_j} w_i}$, where $S_j$ is the set of bins currently assigned to center $j$.

Black and white anchors remain fixed. The convergence criterion is:

$\|C_{\text{new}} - C_{\text{old}}\|_2 < 10^{-4}$
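The histogram-weighted clustering above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the project code: it clusters raw RGB instead of LAB, omits the fixed black/white anchors, and the function name and defaults are our own.

```python
import numpy as np

def extract_palette(pixels, k=5, sigma_a=80.0, tol=1e-4, max_iter=50):
    """Histogram-based weighted K-Means palette extraction (sketch).

    pixels: (N, 3) uint8 array. The original method works in LAB; here we
    cluster raw RGB for simplicity and skip the black/white anchors.
    """
    # Quantize RGB into 16 bins per channel: bin index = floor(value / 16).
    bins = (pixels // 16).astype(np.int64)
    flat = bins[:, 0] * 256 + bins[:, 1] * 16 + bins[:, 2]
    uniq, inv, counts = np.unique(flat, return_inverse=True, return_counts=True)
    # Mean color of each non-empty bin; bin weight = its pixel count.
    sums = np.zeros((len(uniq), 3))
    np.add.at(sums, inv, pixels.astype(float))
    bin_colors = sums / counts[:, None]
    w = counts.astype(float)

    # Weighted farthest-point initialization: attenuate weights near the
    # newly added center each round, so attenuation accumulates over all
    # existing centers.
    centers = [bin_colors[np.argmax(w)]]
    w_att = w.copy()
    for _ in range(k - 1):
        d2 = ((bin_colors - centers[-1]) ** 2).sum(axis=1)
        w_att = w_att * (1.0 - np.exp(-d2 / sigma_a ** 2))
        centers.append(bin_colors[np.argmax(w_att)])
    C = np.array(centers)

    # Weighted Lloyd iterations on the histogram bins until the centers
    # move less than `tol` (the 1e-4 criterion above).
    for _ in range(max_iter):
        d2 = ((bin_colors[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        C_new = C.copy()
        for j in range(k):
            mask = assign == j
            if mask.any():
                C_new[j] = np.average(bin_colors[mask], axis=0, weights=w[mask])
        moved = np.linalg.norm(C_new - C)
        C = C_new
        if moved < tol:
            break
    return C
```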

Blind Separation Palette Extraction (BSS-LLE Method)

This second method treats palette extraction as a blind unmixing problem with spatial smoothness constraints. It is computationally more expensive but yields globally coherent palettes.

Each pixel forms a 5-D feature vector:

$X = [R,\ G,\ B,\ \alpha x,\ \alpha y]$

where $(R, G, B) \in [0, 1]$, $(x, y)$ are normalized coordinates, and $\alpha$ controls spatial smoothness. For each pixel, we find its $K = 30$ nearest neighbors and compute LLE weights by solving:

$G w = \mathbf{1}, \qquad G = (X_j - X_i)(X_j - X_i)^{\top} + \epsilon I$

Construct the graph Laplacian $L = I - Z$; its quadratic form $L^{\top} L$ supplies the smoothness penalty.

We assume each pixel’s color can be expressed as a mixture of m palette colors:

$I = W C$

where $W \in \mathbb{R}^{N \times m}$ are mixture weights and $C \in \mathbb{R}^{m \times 3}$ are palette colors. We minimize:

$\|I - WC\|^2 + \lambda_s \|W - ZW\|^2 + \lambda_u \|W\mathbf{1} - \mathbf{1}\|^2 + \lambda_{sp} \|W\|_0$

The optimization uses alternating minimization:

1. Update $W$ (closed-form linear system)
2. Hard-threshold $W$ to enforce sparsity
3. Update $C$ by solving $(W^{\top} W)\, C = W^{\top} I$
4. Increase the sparsity weight $\beta$ to gradually enforce sparsity (continuation method)

The learned palette may lie off-manifold. Thus each palette color is replaced by the mean of its nearest real RGB pixels. This ensures interpretability and consistent color reproduction.
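A minimal sketch of the alternating minimization, keeping only the reconstruction and sum-to-one terms (the LLE smoothness term, the sparsity continuation schedule, and the final snapping to real pixels are omitted; the function name, initialization, and defaults are ours):

```python
import numpy as np

def alternating_unmix(I, m=4, lam_u=1.0, thresh=0.05, iters=30):
    """Sketch of alternating minimization for I ≈ W C.

    I: (N, 3) pixel colors in [0, 1]. Returns mixture weights W (N, m)
    and palette C (m, 3). Only the ||I - WC||^2 and sum-to-one terms of
    the full objective are kept.
    """
    N = I.shape[0]
    # Deterministic init: take m evenly spaced pixels as the palette.
    C = I[np.linspace(0, N - 1, m).astype(int)].astype(float).copy()
    ones = np.ones((m, 1))
    for _ in range(iters):
        # 1. Update W row-wise in closed form:
        #    min ||I - WC||^2 + lam_u ||W 1 - 1||^2
        #    => W (C C^T + lam_u 1 1^T) = I C^T + lam_u 1
        A = C @ C.T + lam_u * (ones @ ones.T)
        B = I @ C.T + lam_u * np.ones((N, m))
        W = np.linalg.solve(A.T, B.T).T
        # 2. Hard-threshold small weights (sparsity), then renormalize.
        W[W < thresh] = 0.0
        s = W.sum(axis=1, keepdims=True)
        W = W / np.maximum(s, 1e-12)
        # 3. Update C by solving (W^T W) C = W^T I (tiny ridge for safety).
        C = np.linalg.solve(W.T @ W + 1e-8 * np.eye(m), W.T @ I)
    return W, C
```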

Final Output

Both methods return a palette:

$\{(R_1, G_1, B_1), \ldots, (R_m, G_m, B_m)\}.$

Methods

In this section, we describe the methods we use to transfer color between Pokémon.

Baseline: Palette Based Random Transfer

Palette-based random transfer works by first extracting a compact color palette from each image, then randomly matching colors between the two palettes to generate a playful and diverse recoloring. Instead of enforcing a strict one-to-one correspondence or optimizing for perceptual similarity, the method randomly permutes or samples palette colors and maps all pixels associated with a source palette color to a randomly chosen target palette color. This allows the transferred image to preserve structural details while producing vivid, surprising, and stylistically varied recolorings. Because it operates only on palette colors rather than individual pixels, palette-based random transfer is fast, interpretable, and ideal for generating creative variations in tasks like Pokemon color stylization. We use Palette-based random transfer as the baseline.
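The baseline can be sketched as follows, assuming the two palettes have already been extracted (the function name and signature are ours):

```python
import numpy as np

def random_palette_transfer(img, src_palette, tgt_palette, seed=0):
    """Baseline sketch: randomly permute the palette correspondence, then
    send every pixel to the target color matched to its nearest source
    palette color. img: (H, W, 3); palettes: (K, 3) arrays.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(src_palette))  # random palette matching
    # Assign each pixel to its nearest source palette color ...
    d2 = ((img[..., None, :].astype(float) - src_palette[None, None]) ** 2).sum(-1)
    idx = d2.argmin(axis=-1)
    # ... then replace it with the randomly matched target color, so all
    # pixels of one source color move together (structure is preserved).
    return tgt_palette[perm][idx]
```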

Neighbor Segments

The Neighbor Segments (NS) method groups pixels into local, perceptually coherent regions and uses neighborhood relationships to guide smooth and consistent recoloring. Instead of treating each pixel independently, the image is first segmented into small regions (superpixels or color-coherent clusters), where each segment represents a set of spatially adjacent pixels with similar color statistics. Let the image be segmented into $S = \{s_1, s_2, \ldots, s_K\}$, where each segment $s_k$ contains pixels with similar color features. These segments form a neighborhood graph $G = (S, E)$, where an undirected edge $(s_i, s_j) \in E$ indicates that the two segments touch in the spatial domain. The adjacency matrix $A$ of the graph satisfies $A_{ij} = 1$ if $(s_i, s_j) \in E$ and $0$ otherwise.

During color transfer, each segment is assigned one palette color. Let $P = \{p_1, p_2, \ldots, p_m\}$ be the target palette, and let $c_k$ denote the transferred color assigned to segment $s_k$. The NS method encourages smoothness by minimizing a neighborhood-consistency energy:

$E_{\text{smooth}} = \sum_{(i,j) \in E} w_{ij} \|c_i - c_j\|^2,$

where $w_{ij}$ is a weight encoding the similarity of the segments (e.g., based on LAB difference or boundary strength). This term penalizes large color differences between adjacent segments, preventing abrupt color transitions or blocky artifacts. At the same time, the transferred color for each segment should remain close to its mapped palette color $p_{\pi(k)}$, determined by the palette mapping rule:

$E_{\text{data}} = \sum_{k=1}^{K} \|c_k - p_{\pi(k)}\|^2.$

The final transferred colors are obtained by minimizing the combined objective:

$E = E_{\text{data}} + \lambda E_{\text{smooth}},$

where λ controls the strength of neighborhood smoothing. This neighborhood-aware propagation allows the algorithm to maintain structural consistency, preserve texture boundaries, and generate recolorings that are both stable and visually coherent. Overall, Neighbor Segments provides a lightweight way to incorporate spatial smoothness into palette-based transfer, producing natural transitions while keeping computation efficient.
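Because this combined objective is quadratic in the segment colors, it has a closed-form minimizer: setting the gradient to zero gives $(I + \lambda L)\, C = P$, where $L$ is the weighted graph Laplacian of the segment graph. A small NumPy sketch (function and argument names are ours):

```python
import numpy as np

def smooth_segment_colors(p_mapped, edges, weights, lam=1.0):
    """Minimize sum_k ||c_k - p_pi(k)||^2 + lam * sum_(i,j) w_ij ||c_i - c_j||^2.

    p_mapped: (K, 3) palette color mapped to each segment;
    edges: list of (i, j) adjacent segment pairs; weights: per-edge w_ij.
    Setting the gradient to zero yields (I + lam * L) C = P, with L the
    weighted graph Laplacian, solved here per color channel at once.
    """
    K = len(p_mapped)
    L = np.zeros((K, K))
    for (i, j), w in zip(edges, weights):
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return np.linalg.solve(np.eye(K) + lam * L, np.asarray(p_mapped, dtype=float))
```

With `lam=0` the segments keep their mapped palette colors exactly; as `lam` grows, adjacent segments are pulled toward each other while their mean is preserved.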

Palette-aware cluster transfer

Palette-aware cluster transfer aligns the two images' palettes first, then propagates that mapping to all pixels using soft, per-pixel weights. Let the two extracted palettes be $P^A = \{p_1^A, \ldots, p_K^A\}$ and $P^B = \{p_1^B, \ldots, p_K^B\}$ in CIE-Lab. We compute a one-to-one correspondence via the Hungarian algorithm by minimizing the total Lab distance: $\pi^* = \arg\min_{\pi \in S_K} \sum_{i=1}^{K} \|p_i^A - p_{\pi(i)}^B\|^2$. This yields a permutation $\pi^*$ that best matches palette colors across images.

For each pixel $x$ in image A, we compute soft memberships to A's palette using a temperatured softmax in Lab: $w_i(x) = \frac{\exp\!\left(-\|\mathrm{Lab}(x) - p_i^A\|_2^2 / \tau\right)}{\sum_{j=1}^{K} \exp\!\left(-\|\mathrm{Lab}(x) - p_j^A\|_2^2 / \tau\right)}$, with $\sum_{i=1}^{K} w_i(x) = 1$. For each matched pair, define the palette-to-palette shift in HSV: $\Delta_i = \mathrm{HSV}(p_{\pi^*(i)}^B) - \mathrm{HSV}(p_i^A)$, with hue wrapped to $(-\tfrac{1}{2}, \tfrac{1}{2}]$ (in normalized units). The pixel's HSV is updated by a weighted blend of these shifts, $\mathrm{HSV}'(x) = \mathrm{HSV}(x) + \sum_{i=1}^{K} w_i(x)\, \Delta_i$, followed by clipping to valid ranges and conversion back to RGB. The same procedure recolors image B by swapping the roles of $P^A$ and $P^B$.
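The matching and soft-membership steps can be sketched as below. A brute-force permutation search stands in for the Hungarian algorithm, which is fine for the small $K$ used here; function names are ours.

```python
import itertools
import numpy as np

def match_palettes(PA, PB):
    """Find the one-to-one correspondence pi* minimizing total squared
    distance between palettes. PA, PB: (K, 3) arrays (the method matches
    in CIE-Lab; any 3-vectors work here). Brute force over permutations
    replaces the Hungarian algorithm for small K.
    """
    K = len(PA)
    best, best_cost = None, np.inf
    for perm in itertools.permutations(range(K)):
        cost = sum(((PA[i] - PB[perm[i]]) ** 2).sum() for i in range(K))
        if cost < best_cost:
            best, best_cost = perm, cost
    return np.array(best)

def soft_memberships(lab_pixels, palette, tau=10.0):
    """Temperatured softmax over squared distances to the palette colors.
    lab_pixels: (N, 3); palette: (K, 3). Rows sum to one.
    """
    d2 = ((lab_pixels[:, None, :] - palette[None]) ** 2).sum(-1)
    e = np.exp(-d2 / tau)
    return e / e.sum(axis=1, keepdims=True)
```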

Cluster Transfer without Palette

This method discovers palette-like color regions automatically by clustering each image in a joint color–position space and then transfers colors by matching those clusters across the two images.

Cluster discovery. For each image, we build Lab+XY features for every pixel, $\mathbf{f}(x) = [\,L(x),\ a(x),\ b(x),\ \alpha\, x/W,\ \alpha\, y/H\,]$, where $(x, y)$ is the pixel coordinate, $(W, H)$ are the image dimensions, and $\alpha$ (the xy_scale) balances spatial locality against color. K-means on these features yields $K$ clusters $\{C_k\}_{k=1}^{K}$. For each cluster we store:

  • mean chroma $\boldsymbol{\mu}_k^{ab} \in \mathbb{R}^2$,
  • chroma covariance $\boldsymbol{\Sigma}_k^{ab} \in \mathbb{R}^{2 \times 2}$,
  • spatial centroid $\mathbf{m}_k^{xy}$, and
  • area $A_k = |C_k|$.

Cluster matching. Let superscripts A and B denote the two images. We define a matching cost between cluster $i$ in A and cluster $j$ in B: $\mathrm{cost}(i, j) = \lambda_c \|\boldsymbol{\mu}_i^{ab,A} - \boldsymbol{\mu}_j^{ab,B}\|_2^2 + \lambda_s \|\mathbf{m}_i^{xy,A} - \mathbf{m}_j^{xy,B}\|_2^2 + \lambda_a \left|\frac{A_i^A}{N_A} - \frac{A_j^B}{N_B}\right|,$

where $\lambda_c, \lambda_s, \lambda_a$ weight color, spatial proximity, and relative area, and $N_A, N_B$ are pixel counts. A one-to-one correspondence $\pi^*$ is obtained by minimizing the total cost with the Hungarian algorithm.

Per-pixel soft assignment. Each pixel $x$ in image A gets soft memberships to A's clusters using a temperatured softmax in chroma (ab) space: $w_i(x) = \frac{\exp\!\left(-\|[a(x), b(x)] - \boldsymbol{\mu}_i^{ab,A}\|_2^2 / \tau\right)}{\sum_{k=1}^{K} \exp\!\left(-\|[a(x), b(x)] - \boldsymbol{\mu}_k^{ab,A}\|_2^2 / \tau\right)}, \quad \sum_i w_i(x) = 1,$ where $\tau$ (soft_tau) controls how sharply pixels commit to a single cluster (smaller $\tau$ → crisper assignments; larger $\tau$ → smoother blends).

Cluster-pair color mapping. For each matched pair $(i, \pi^*(i))$ we define either:

a mean-shift in chroma: $\Delta_i^{ab} = \boldsymbol{\mu}_{\pi^*(i)}^{ab,B} - \boldsymbol{\mu}_i^{ab,A}$, or

an optional affine mapping in chroma (when use_affine = True): $\mathbf{A}_i = \left(\boldsymbol{\Sigma}_{\pi^*(i)}^{ab,B}\right)^{1/2} \left(\boldsymbol{\Sigma}_i^{ab,A}\right)^{-1/2}, \qquad \mathbf{b}_i = \boldsymbol{\mu}_{\pi^*(i)}^{ab,B} - \mathbf{A}_i\, \boldsymbol{\mu}_i^{ab,A}.$

Pixel update and reconstruction. For a pixel $x$ in A with chroma vector $\mathbf{z}(x) = [a(x), b(x)]$, we blend the cluster-pair mappings:

mean-shift mode: $\mathbf{z}'(x) = \mathbf{z}(x) + \sum_{i=1}^{K} w_i(x)\, \Delta_i^{ab}$,

affine mode: $\mathbf{z}'(x) = \sum_{i=1}^{K} w_i(x)\, \left(\mathbf{A}_i\, \mathbf{z}(x) + \mathbf{b}_i\right)$. We preserve lightness ($L$) to keep shading and speculars intact and form $\mathrm{Lab}'(x) = [L(x), \mathbf{z}'(x)]$, then convert to RGB with clipping. The same symmetric procedure recolors image B using clusters discovered in B and matched to A.

Notes & knobs.

K (n_clusters) trades detail for stability (4–8 for flat sprites; 8–12 for photos).

xy_scale (α) increases spatial locality (reduces distant same-color mixing).

soft_tau tunes edge sharpness vs. smooth transitions.

use_affine helps when cluster contrast/chroma shapes differ (photos), but for flat art mean-shift is often cleaner.

A fixed seed gives reproducible K-means partitions.
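The matching cost and assignment above can be sketched as follows, taking the per-cluster statistics (mean chroma, spatial centroid, relative area) as inputs. The $\lambda$ defaults shown are illustrative, not the project's tuned values, and brute force again stands in for the Hungarian algorithm.

```python
import itertools
import numpy as np

def match_clusters(mu_a, xy_a, area_a, mu_b, xy_b, area_b,
                   lam_c=1.0, lam_s=0.5, lam_a=0.25):
    """Sketch of cluster matching across two images.

    mu_*: (K, 2) mean chroma; xy_*: (K, 2) spatial centroids;
    area_*: (K,) relative areas (already divided by pixel count).
    cost(i, j) = lam_c * ||mu_i - mu_j||^2 + lam_s * ||m_i - m_j||^2
                 + lam_a * |area_i - area_j|
    """
    K = len(mu_a)
    cost = (lam_c * ((mu_a[:, None] - mu_b[None]) ** 2).sum(-1)
            + lam_s * ((xy_a[:, None] - xy_b[None]) ** 2).sum(-1)
            + lam_a * np.abs(area_a[:, None] - area_b[None]))
    # Brute-force one-to-one assignment minimizing total cost (K <= ~8).
    best, best_total = None, np.inf
    for perm in itertools.permutations(range(K)):
        total = cost[np.arange(K), list(perm)].sum()
        if total < best_total:
            best, best_total = perm, total
    return np.array(best)
```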

Neighbor Segments with Superpixel (NS-S)

The Neighbor Segments with Superpixels (NS-S) method extends palette-based color transfer by incorporating spatially coherent superpixel regions. Instead of operating on individual pixels, the image is first partitioned into a set of superpixels, each representing a compact region of adjacent pixels with similar color and texture characteristics. Let the superpixel segmentation produce $S = \{s_1, s_2, \ldots, s_K\}$, where each superpixel $s_k$ is treated as a unit for color assignment. A neighborhood graph $G = (S, E)$ is constructed, where an edge $(s_i, s_j) \in E$ indicates that two superpixels touch or share a boundary in the image plane. The adjacency matrix is defined as:

$A_{ij} = \begin{cases} 1, & (s_i, s_j) \in E, \\ 0, & \text{otherwise}. \end{cases}$

Given a target palette $P = \{p_1, \ldots, p_m\}$, each superpixel $s_k$ receives a transferred color $c_k$, determined by a palette-matching function $\pi(k)$ (e.g., hard assignment, soft matching, or nearest palette color).

To ensure smooth and visually coherent recoloring across the image, the NS-S method minimizes the following energy:

$E = \underbrace{\sum_{k=1}^{K} \|c_k - p_{\pi(k)}\|^2}_{E_{\text{data}}} + \lambda \underbrace{\sum_{(i,j) \in E} w_{ij} \|c_i - c_j\|^2}_{E_{\text{smooth}}},$

where:

  • $E_{\text{data}}$ encourages each superpixel to adopt its intended palette color,
  • $E_{\text{smooth}}$ enforces spatial smoothness between adjacent superpixels,
  • $w_{ij}$ weights the similarity between boundary segments (e.g., based on color difference, gradient magnitude, or LAB distance),
  • $\lambda$ controls the strength of neighborhood smoothing.

By working at the superpixel level, NS-S effectively reduces noise, prevents pixel-level flicker, and maintains clean boundaries between meaningful regions. The neighborhood smoothing further prevents abrupt color jumps, producing recolored images that maintain Pokémon shapes, shading, and visual consistency while still allowing strong palette-driven stylistic changes.

Overall, the NS-S formulation provides a computationally efficient and perceptually stable approach for palette-based color transfer, balancing stylization with structural fidelity.
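One concrete piece of NS-S, building the adjacency matrix $A$ from a superpixel label map, can be sketched as below (the labels may come from any superpixel method, e.g. SLIC; the function name is ours):

```python
import numpy as np

def superpixel_adjacency(labels):
    """Build the NS-S adjacency matrix A from a superpixel label map.

    labels: (H, W) int array where labels[y, x] is the superpixel id
    (0 .. K-1). A[i, j] = 1 iff superpixels i and j share a boundary.
    """
    K = int(labels.max()) + 1
    A = np.zeros((K, K), dtype=int)
    # Horizontally and vertically adjacent pixels with different labels
    # mark their superpixels as neighbors (symmetric, zero diagonal).
    for a, b in [(labels[:, :-1], labels[:, 1:]),
                 (labels[:-1, :], labels[1:, :])]:
        diff = a != b
        A[a[diff], b[diff]] = 1
        A[b[diff], a[diff]] = 1
    return A
```

The resulting edge set $(s_i, s_j) \in E$ then feeds the smoothness term $E_{\text{smooth}}$ exactly as in the plain NS method.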

Results

Table 1: Quantitative Evaluation Results on Bulbasaur and Squirtle (each cell reports Bulbasaur / Squirtle)

Metric                    | Palette Diff  | Clustering    | Clustering-NP | Convex Hull | NS            | NS-S
FID                       | 689.24/564.47 | 106.22/110.64 | 54.29/61.61   | 96.12/66.19 | 107.60/108.85 | 75.15/42.77
Histogram-Corr            | 0.91/0.96     | 0.91/0.96     | 0.00/0.96     | 0.87/0.96   | 0.91/0.96     | 0.90/0.96
Histogram-Chi2            | 2.76/1.78     | 8.97/2.35     | 930.03/2.62   | 23.28/2.02  | 8.35/2.39     | 6.38/9.97
CIELAB                    | 19.86/16.00   | 18.32/14.88   | 14.22/11.20   | 13.03/8.76  | 18.33/14.63   | 16.88/15.51
CIE94                     | 15.19/12.44   | 13.44/11.13   | 9.72/8.52     | 8.78/6.71   | 13.45/10.98   | 12.20/11.76
CIEDE2000                 | 14.33/12.07   | 13.10/10.97   | 10.34/9.17    | 8.77/7.14   | 13.11/10.81   | 11.83/11.20
SSIM                      | 0.81/0.87     | 0.96/0.95     | 0.99/0.99     | 0.98/0.99   | 0.96/0.96     | 0.97/0.95
VGG Latent Space Distance | 36.80/36.46   | 15.46/21.98   | 16.40/22.27   | 13.73/18.18 | 15.76/22.05   | 17.29/18.31
Figure 1: Results of Bulbasaur and Squirtle

Conclusions

Appendix I

Appendix II

Wenxiao Cai:


Yifei Deng: