LamTangYu

From Psych 221 Image Systems Engineering

Back to Psych 221 Projects 2013



Introduction

The purpose of the camera forensics project is to automatically detect whether an image produced by CFA interpolation has been tampered with. Digital cameras use CFA interpolation to generate digital images. The interpolation leaves specific statistical patterns in the pixels of an image, which can then be used to determine whether or not the image has been altered.

Background

What is CFA Interpolation?

When a digital camera captures an image, it saves the output of a single sensor after the light passes through a color filter array (CFA). Every pixel of a color image is composed of three color channels: red, green, and blue. However, the camera samples only a single color channel per pixel, so to form a full-color image, the values of the other two channels must be estimated. Various interpolation techniques are used to estimate these missing samples.

Different Types of CFA Interpolation Techniques

Figure 1. The Bayer Array

A common CFA is the Bayer Array. Colors are sampled according to the pattern in Fig. 1. As shown, the green color channel is sampled at twice the rate of the red and blue channels. All of the interpolation methods below assume that the image was captured through a Bayer Array.
Let $\tilde{R}(x,y)$, $\tilde{G}(x,y)$, and $\tilde{B}(x,y)$ represent the CFA-sampled versions of the three color channels.
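As a concrete illustration, the Bayer sampling pattern can be represented with boolean masks. This is a Python/NumPy sketch, not the project's code, and the RGGB phase of the array is an assumption for illustration:

```python
import numpy as np

def bayer_masks(h, w):
    """Boolean sampling masks for a Bayer pattern (RGGB layout assumed)."""
    i, j = np.indices((h, w))
    r = (i % 2 == 0) & (j % 2 == 0)   # red at even rows, even columns
    g = (i + j) % 2 == 1              # green on the remaining checkerboard sites
    b = (i % 2 == 1) & (j % 2 == 1)   # blue at odd rows, odd columns
    return r, g, b
```

Multiplying a full color channel by its mask yields the CFA-sampled version of that channel; note the green mask covers twice as many pixels as the red or blue mask, matching Fig. 1.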

Bilinear/Bicubic

Bilinear and bicubic are the simplest interpolation algorithms. Each color channel is interpolated independently by applying a 2D linear filter $h$ to $\tilde{R}(x,y)$, $\tilde{G}(x,y)$, and $\tilde{B}(x,y)$. For the red channel,

$\hat{R}(x,y) = \sum_{u,v} h(u,v)\, \tilde{R}(x-u, y-v)$

for $(x,y)$ such that the red channel was not sampled at $(x,y)$. Otherwise, $\hat{R}(x,y) = \tilde{R}(x,y)$. The same is done for the green and blue channels. For bilinear interpolation a 3 by 3 filter is used, and for bicubic interpolation a 7 by 7 filter is used.
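The filtering step above can be sketched as follows. This is a minimal Python/NumPy illustration, not the project's MATLAB code; normalizing by the filtered mask is one common way to make a single 3x3 kernel serve both the quarter-rate (red/blue) and half-rate (green) sampling patterns:

```python
import numpy as np
from scipy.signal import convolve2d

def bilinear_demosaic_channel(sampled, mask):
    """Fill missing CFA samples of one channel by bilinear interpolation.

    sampled: 2D array, zero where this channel was not sampled.
    mask:    2D boolean array, True where this channel was sampled.
    """
    # Standard 3x3 bilinear kernel.
    k = np.array([[0.25, 0.5, 0.25],
                  [0.5,  1.0, 0.5],
                  [0.25, 0.5, 0.25]])
    num = convolve2d(sampled, k, mode="same")
    den = convolve2d(mask.astype(float), k, mode="same")
    est = num / den
    # Keep the original samples where they exist.
    return np.where(mask, sampled, est)
```

On a constant image, this reproduces the constant exactly; on natural images it averages the nearest samples of the same channel.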

Smooth Hue Transition

One downfall of bilinear/bicubic interpolation is that neighboring pixels in the result may differ significantly in value, which is unlikely in natural images.
Since there are twice as many green samples as red and blue samples, the missing samples in the green color channel are first bilinearly interpolated as described above, yielding $\hat{G}(x,y)$. The red and blue channels can then be estimated by bilinearly interpolating the ratios $\tilde{R}(x,y)/\hat{G}(x,y)$ and $\tilde{B}(x,y)/\hat{G}(x,y)$ respectively, then multiplying by the interpolated green channel.
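A sketch of the red/blue step, assuming the green channel has already been interpolated (Python/NumPy illustration, not the project's code; the small `eps` guarding against division by zero is an implementation detail added here):

```python
import numpy as np
from scipy.signal import convolve2d

def smooth_hue_rb(sampled_rb, mask_rb, g_full, eps=1e-6):
    """Estimate a full red or blue channel by smooth hue transition.

    Bilinearly interpolates the hue ratio (R/G or B/G) from the sampled
    sites, then multiplies back by the interpolated green channel.
    """
    ratio = np.where(mask_rb, sampled_rb / (g_full + eps), 0.0)
    k = np.array([[0.25, 0.5, 0.25],
                  [0.5,  1.0, 0.5],
                  [0.25, 0.5, 0.25]])
    num = convolve2d(ratio, k, mode="same")
    den = convolve2d(mask_rb.astype(float), k, mode="same")
    est = (num / den) * g_full
    return np.where(mask_rb, sampled_rb, est)
```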

Median Filter

The channels $\hat{R}$, $\hat{G}$, and $\hat{B}$ are first interpolated (e.g., bilinearly). Then $M_{rg}$, $M_{rb}$, and $M_{gb}$ are computed by taking the pairwise differences of the interpolated color channels ($\hat{R}-\hat{G}$, $\hat{R}-\hat{B}$, and $\hat{G}-\hat{B}$) and filtering each with a median filter.
At each pixel, only one color channel was originally sampled. To obtain estimates for the other two channels, we take the sum or difference of the original color sample and the corresponding median-filtered value. For example, if at pixel $(x,y)$ the green channel was sampled with a value of $\tilde{G}(x,y)$, then we can estimate the red channel by $\tilde{G}(x,y) + M_{rg}(x,y)$ and the blue channel by $\tilde{G}(x,y) - M_{gb}(x,y)$. The missing values are estimated similarly for pixels where the red or blue channel was sampled.
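The sum/difference bookkeeping for all three sampled-channel cases can be sketched as below (Python/NumPy illustration under the convention that the median-filtered maps approximate R-G, R-B, and G-B; function and variable names are ours, not the project's):

```python
import numpy as np
from scipy.ndimage import median_filter

def median_filter_demosaic(hat_r, hat_g, hat_b, sampled, labels, size=3):
    """Refine an initial demosaic using median-filtered color differences.

    sampled: the original CFA value at each pixel.
    labels:  per-pixel character 'R', 'G', or 'B' naming the sampled channel.
    """
    m_rg = median_filter(hat_r - hat_g, size=size)  # approx. R - G
    m_rb = median_filter(hat_r - hat_b, size=size)  # approx. R - B
    m_gb = median_filter(hat_g - hat_b, size=size)  # approx. G - B
    r = np.where(labels == 'G', sampled + m_rg,
        np.where(labels == 'B', sampled + m_rb, sampled))
    g = np.where(labels == 'R', sampled - m_rg,
        np.where(labels == 'B', sampled + m_gb, sampled))
    b = np.where(labels == 'R', sampled - m_rb,
        np.where(labels == 'G', sampled - m_gb, sampled))
    return r, g, b
```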

Gradient Based

This method attempts to preserve edges by preventing interpolation across them. First, horizontal and vertical derivative estimates are computed for each pixel and used, together with the green color channel samples, to adaptively interpolate the green channel estimate $\hat{G}$. The red color channel is then estimated by bilinearly interpolating the difference between the red channel samples and the green channel estimates. The same is done to estimate the blue color channel.
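The edge-steered green step can be illustrated with a simplified single-pixel sketch (Python; a deliberately reduced version of gradient-based interpolation, not the exact scheme the project used):

```python
import numpy as np

def adaptive_green_at_rb(cfa, y, x):
    """Gradient-steered green estimate at a red/blue site (simplified).

    Interpolates along the direction (horizontal or vertical) with the
    smaller gradient so that interpolation does not cross an edge.
    """
    dh = abs(float(cfa[y, x - 1]) - float(cfa[y, x + 1]))  # horizontal gradient
    dv = abs(float(cfa[y - 1, x]) - float(cfa[y + 1, x]))  # vertical gradient
    if dh < dv:
        return (cfa[y, x - 1] + cfa[y, x + 1]) / 2.0
    if dv < dh:
        return (cfa[y - 1, x] + cfa[y + 1, x]) / 2.0
    return (cfa[y, x - 1] + cfa[y, x + 1] + cfa[y - 1, x] + cfa[y + 1, x]) / 4.0
```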

Adaptive Color Plane

The adaptive color plane is a modification of the gradient based interpolation method. It uses information from both the first and second derivatives.

Threshold-Based Variable Number of Gradients

Gradient estimates are calculated in eight directions (N, E, S, W, NE, NW, SE, SW). Depending on whether each gradient is above or below a threshold, the missing color samples are estimated from a linear combination of neighboring values.

Methods

We determine if an image is CFA interpolated by looking at the relationship between neighboring pixels and comparing this relationship to synthetic data. In particular, we determine the probability that each pixel in an image block has been interpolated. If the correlation between pixel probabilities is similar to the correlations we expect for a CFA interpolated image, then we can assume that our image is CFA interpolated. The following sections detail each step of this process.

Local Tampering

Tampering may occur in any portion of an image. To detect the location of tampered regions, we break input images into smaller blocks and independently run our algorithm on each section.

Figure 2. Image with localized tampering (red box)

Expectation/Maximization Algorithm

To determine the probability that each pixel in an image block has been interpolated, we use the Expectation/Maximization (EM) algorithm. First, we assume that each interpolated pixel in the image is a linear combination of its neighbors:

$f(x,y) = \sum_{u,v=-N}^{N} \alpha_{u,v}\, f(x+u, y+v) + n(x,y)$, with $\alpha_{0,0} = 0$,

where $f(x,y)$ is the intensity matrix for one of our channels, $N$ defines the neighborhood we assume for interpolation, $\alpha_{u,v}$ are the interpolation coefficients, and $n(x,y)$ is i.i.d. zero-mean Gaussian noise with unknown variance $\sigma^2$.


We use the EM algorithm because we do not know:

1. The probability that each pixel is interpolated
2. The coefficients used in the linear interpolation

However, the iterative approach of the EM algorithm allows us to begin with a random guess and converge towards a meaningful answer for both unknowns.

E-Step

In the first step, the EM algorithm assumes the interpolation coefficients are known. It can then calculate the probability that each pixel is interpolated using Bayes' rule:

$\Pr\{f(x,y) \in M_1 \mid f(x,y)\} = \dfrac{\Pr\{f(x,y) \mid f(x,y) \in M_1\}\,\Pr\{f(x,y) \in M_1\}}{\sum_{i=1}^{2} \Pr\{f(x,y) \mid f(x,y) \in M_i\}\,\Pr\{f(x,y) \in M_i\}}$

where $M_1$ is the interpolated class and $M_2$ is the non-interpolated class. The priors are assumed to have equal probability, i.e.:

$\Pr\{f(x,y) \in M_1\} = \Pr\{f(x,y) \in M_2\} = 1/2$.

Since we assumed zero-mean Gaussian noise, the probability of a pixel value given that it is interpolated is:

$\Pr\{f(x,y) \mid f(x,y) \in M_1\} = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\!\left[-\dfrac{\left(f(x,y) - \sum_{u,v} \alpha_{u,v} f(x+u, y+v)\right)^2}{2\sigma^2}\right]$

The probability of a pixel value given that it is not interpolated is assumed uniform over all possible values (typically $1/256$).

M-Step

In the second step, the EM algorithm assumes the probabilities that pixels are interpolated are known. It can then calculate the interpolation coefficients by minimizing the weighted error between the assumed interpolation and the actual pixel values:

$E(\boldsymbol{\alpha}) = \sum_{x,y} w(x,y) \left( f(x,y) - \sum_{u,v} \alpha_{u,v} f(x+u, y+v) \right)^2$

where $E(\boldsymbol{\alpha})$ is the weighted error and $w(x,y)$ are weights. These weights correspond to the pixel interpolation probabilities from the E-step:

$w(x,y) \equiv \Pr\{f(x,y) \in M_1 \mid f(x,y)\}$

The weighted error function is minimized by setting the partial derivative with respect to each interpolation coefficient to zero and solving the resulting linear system of equations. It can be shown that each partial derivative yields an equation of the form:

$\sum_{x,y} w(x,y)\, f(x+s, y+t)\, f(x,y) = \sum_{u,v} \alpha_{u,v} \sum_{x,y} w(x,y)\, f(x+s, y+t)\, f(x+u, y+v)$

for each $(s,t)$. The solution to this linear system of equations can be found with standard techniques.

Convergence

The EM algorithm runs until the interpolation coefficients in successive iterations differ by no more than some threshold $\epsilon$.
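The full E-step/M-step loop described above can be sketched as follows. This is an illustrative Python/NumPy version, not the project's MATLAB implementation; parameter names, the initial coefficient guess, and the floor on the noise estimate are our assumptions:

```python
import numpy as np

def em_interpolation_map(f, N=1, sigma=5.0, p0=1.0 / 256, tol=1e-3, max_iter=50):
    """EM estimate of per-pixel interpolation probabilities for one channel."""
    H, W = f.shape
    # Design matrix: each row holds the (2N+1)^2 - 1 neighbors of one
    # interior pixel; the center pixel itself is excluded (alpha_00 = 0).
    ys, xs = np.mgrid[N:H - N, N:W - N]
    ys, xs = ys.ravel(), xs.ravel()
    offsets = [(u, v) for u in range(-N, N + 1) for v in range(-N, N + 1)
               if (u, v) != (0, 0)]
    A = np.stack([f[ys + u, xs + v] for u, v in offsets], axis=1)
    y = f[ys, xs]

    alpha = np.zeros(A.shape[1])   # initial guess for the coefficients
    w = np.full(y.shape, 0.5)
    for _ in range(max_iter):
        # E-step: posterior that each residual came from the Gaussian
        # (interpolated) class rather than the uniform class.
        r = y - A @ alpha
        p1 = np.exp(-r ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
        w = p1 / (p1 + p0)          # equal priors cancel
        # M-step: weighted least squares for the coefficients, then a
        # re-estimate of the noise standard deviation (floored for stability).
        sw = np.sqrt(w)
        new_alpha, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        sigma = max(np.sqrt(np.sum(w * (y - A @ new_alpha) ** 2) / np.sum(w)), 1e-3)
        done = np.max(np.abs(new_alpha - alpha)) < tol
        alpha = new_alpha
        if done:
            break
    prob = np.zeros_like(f, dtype=float)
    prob[N:H - N, N:W - N] = w.reshape(H - 2 * N, W - 2 * N)
    return prob, alpha
```

On a block whose pixels are perfectly predictable from their neighbors, the returned probabilities approach 1 in the interior; on uncorrelated data they stay low.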

Probability Map and Its Fourier Transform

From the EM algorithm, we compute the probability map of each block of the image. However, these probability maps, as is, do not give a clear indication of whether a block is CFA interpolated (Fig. 3, Fig. 4).

Figure 3: Probability map of a CFA interpolated image block
Figure 4: Probability map of an image block without CFA interpolation

Fourier Transform

Taking the two-dimensional Fourier transform of the probability map, we can more easily visualize any periodicities in the probability maps. CFA-interpolated images that have not been tampered with will show visible peaks in the Fourier transform of the probability map, a result of the correlation between pixels.

Figure 5: 2D FFT of the probability map of a CFA interpolated image block.
Figure 6: 2D FFT of the probability map of an image block without CFA interpolation

Figures 5 and 6 illustrate the clear difference between CFA- and non-CFA-interpolated image blocks. CFA-interpolated blocks have a few distinct peaks, whereas non-CFA-interpolated blocks tend to have much more scattered Fourier coefficients.
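Computing this magnitude spectrum is straightforward; a minimal Python/NumPy sketch (ours, not the project's code) with the DC component suppressed so the periodic peaks dominate:

```python
import numpy as np

def prob_map_spectrum(prob_map):
    """DC-centered magnitude spectrum of a probability map.

    For a CFA-interpolated block, strong isolated peaks appear at the
    sampling periodicities; tampered blocks lack these peaks.
    """
    p = prob_map - prob_map.mean()   # suppress the DC component
    return np.abs(np.fft.fftshift(np.fft.fft2(p)))
```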

Thresholding for Determining Fake Images

After computing the probability map for each block over the image, we compute the cosine similarity between the Fourier transform of the probability map and the Fourier transform of a synthetic probability map. We do this for each of the three color channels.
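The cosine similarity between two spectra treats each as a flattened vector (a short Python/NumPy sketch; variable names are illustrative):

```python
import numpy as np

def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two Fourier magnitude maps, flattened."""
    a, b = spec_a.ravel(), spec_b.ravel()
    # Dot product normalized by the vector magnitudes; 1 means identical
    # direction, 0 means orthogonal spectra.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```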

Before we apply thresholding, we try to normalize the data by using two different normalization techniques:

  1. Normalize by the max similarity value in each block.
  2. Normalize by the mean similarity value in each block.

After acquiring the normalized similarity values, the threshold for each channel is determined empirically by iterating over different combinations of $T_R$, $T_G$, and $T_B$ to minimize the error $E = \lambda \cdot N_{FP} + N_{FN}$, where a false positive is defined as a fake image wrongly labeled as real and a false negative is a real image wrongly labeled as fake. The term $\lambda$ weighs the relative contribution of false positives and false negatives in the error sum. For $\lambda$ greater than 1, false positives are penalized more than false negatives; the reverse holds for $\lambda$ smaller than 1.

An image is labeled fake if there exists a block in which the normalized similarity value is below the corresponding threshold for all three channels. The image is labeled real if, in every block, at least one color channel has a similarity value above its threshold. This thresholding method helps reduce the number of false positives.
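The decision rule and the threshold search can be sketched together. This is an illustrative Python version under the error definition above; the grid, function names, and data layout (one array of per-block, per-channel similarities per image) are our assumptions:

```python
import numpy as np
from itertools import product

def is_fake(sim, thr):
    """Fake iff some block falls below threshold in all three channels.

    sim: array of shape (n_blocks, 3); thr: length-3 threshold vector.
    """
    return bool(np.any((sim < thr).all(axis=1)))

def fit_thresholds(train_sims, fake_labels, grid, lam=1.0):
    """Grid-search per-channel thresholds minimizing lam*FP + FN.

    Following the text: a false positive is a fake image labeled real,
    a false negative is a real image labeled fake.
    """
    best_thr, best_err = None, float("inf")
    for thr in product(grid, repeat=3):
        t = np.array(thr)
        fp = sum(1 for s, y in zip(train_sims, fake_labels)
                 if y and not is_fake(s, t))
        fn = sum(1 for s, y in zip(train_sims, fake_labels)
                 if not y and is_fake(s, t))
        err = lam * fp + fn
        if err < best_err:
            best_thr, best_err = t, err
    return best_thr, best_err
```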

Results

Running the algorithm on an image, we obtain the following similarity scores. Figure 5 shows the similarity scores for the RGB channels of a fake image; Figure 6 shows those of a real image.


Figure 5. Similarity values of a fake image using 256 x 256 windows
Figure 6. Similarity values of a real image using 256 x 256 windows


Data Set

Given a set of 70 pairs of images (real images and their corresponding fakes), we used 50 pairs as training data to find the threshold values for each channel. The remaining 20 pairs were used to test the performance of the algorithm.

Error Rates

The error rates below were computed with a fixed value of $\lambda$.

Normalization Techniques

  1. Unnormalized: uses raw similarity values.
  2. Max Normalized: normalize by the max similarity value in each block before finding and applying thresholds.
  3. Mean Normalized: normalize by the mean similarity value in each block before finding and applying thresholds.

Unnormalized

Block size: 64x64

Error Rates
True False
Positive 100% 0%
Negative 80% 20%

Thresholds used [0.1200 0.1200 0.1300].
Total error: 10%

Max Normalized

Block size of 64x64.

Error Rates
True False
Positive 85% 15%
Negative 80% 20%

Thresholds used [0.1750 0.1750 0.1750].
Total error: 17.5%.

Mean Normalized

Block size: 64x64.

Error Rates
True False
Positive 95% 5%
Negative 65% 35%

Threshold used [0.3700 0.4400 0.5700]
Total error: 20%

Finding and applying thresholds to the unnormalized data performed the best and is the method used in all of the error calculations below.

Varying Block Size

Normalization method: Unnormalized.

Error Rates
256x256 128x128 64x64
True Positive 55% 80% 100%
True Negative 65% 75% 80%
False Positive 45% 20% 0%
False Negative 35% 25% 20%
Total Error 40% 22.5% 10%

256x256 thresholds used [0.4500 0.5400 0.3600].
128x128 thresholds used [0.2100 0.2600 0.2500].
64x64 thresholds used [0.1200 0.1200 0.1300].

As seen above, using smaller block sizes improves the performance of the detection algorithm.

JPEG Compressed

Block size: 128x128
Normalization method: Unnormalized.

Error Rates
Quality 100 Quality 90 Quality 75
True Positive 30% 30% 30%
True Negative 90% 90% 90%
False Positive 70% 70% 70%
False Negative 10% 10% 10%
Total Error 40% 40% 40%

Thresholds used [0.2100 0.2600 0.2500].

The detection algorithm classified negative images fairly accurately but failed to correctly classify positive images. However, performance did not change at increasingly lower JPEG quality levels. This shows that although the detection algorithm's performance decreases for JPEG-compressed images, within the quality range of 75-100 the compression level does not greatly affect detection performance.

Cross Validation

Scores for first 100:
Threshold: [0.3500 0.1000 0.1000]
False Positive: 0.08
False Negative: 0.1
Score for test of last 40:
Threshold: [0.3500 0.1000 0.1000]
False Positive: 0.1500
False Negative: 0.2000

Scores for last 100:
Threshold: [0.1000 0.1500 0.1000]
False Positive: 0.1200
False Negative: 0.0600
Score for test of first 40:
Threshold: [0.1000 0.1500 0.1000]
False Positive: 0.1500
False Negative: 0.1000

Error Rates
First 50 Last 20 Last 50 First 20
True Positive 92% 85% 88% 85%
True Negative 90% 80% 94% 90%
False Positive 8% 15% 12% 15%
False Negative 10% 20% 6% 10%
Total Error 9% 17.5% 9% 12.5%

Test Set Results

The following are the results of our detection algorithm on the 20-image test set.

Block size: 64x64

Error Rates
True False
Positive 100% 0%
Negative 50% 50%

Thresholds used [0.1200 0.1200 0.1300].
Total error: 25%


Figure 1 shows an example where our algorithm successfully detects the location of the tampering. The black regions indicate the difference between the fake and real images, whereas the red box indicates our algorithm's estimate of where the tampering happened.

Figure 1: Localization of the tampering was successful.

Conclusions

We showed in this project that detecting the presence or absence of CFA interpolation is a reasonable indicator of whether a digital image has been tampered with. We built a general CFA detector that did not require images to have a particular CFA interpolation algorithm, and it performed reasonably well. However, performance dropped when images underwent any kind of JPEG compression. We also found that smaller window sizes (e.g., 64x64 vs. 128x128 pixels) led to better performance without increasing the algorithm's run time (on the order of half a minute).

Future Work

Given more time, we could extend this project in a few ways.

1. Evaluate the performance of the algorithm on images corrupted by known distortions (e.g., blurring, added white Gaussian noise, or rescaling).
2. Vary the neighborhood size (number of interpolation coefficients).
3. Use a more complex EM model that does not assume independence between the color channels.

References - Resources and related work

  1. A. C. Popescu and H. Farid, "Exposing digital forgeries in color filter array interpolated images," IEEE Transactions on Signal Processing, vol. 53, no. 10, pp. 3948-3959, Oct. 2005.
  2. H. Farid, "A survey of image forgery detection," IEEE Signal Processing Magazine, vol. 26, no. 2, pp. 16-25, 2009.

Appendix I - Code and Data

Code

File:FinalCode.zip

Data

Images provided by the class, courtesy of Farid.

Appendix II - Work partition (if a group project)

The work was split evenly among all group members.

  • Matt: EM algorithm and similarity scores
  • Eric & Chongxuan: Thresholding techniques and testing scripts
  • Everyone contributed to the wiki