Campiotti


Back to Psych 221 Projects 2013


Introduction

The primary motivation for this project is the proliferation of forged images in modern society (e.g. in advertising, viral images, and even political races) and the increasing need to detect these forgeries in a forensic setting. This project seeks to reproduce the results of [1], which proposes the use of the underlying statistics of an image resulting from color filter array (CFA) interpolation (i.e. demosaicking) to detect localized forgeries in an image.

CFA Interpolation

Most CMOS sensors used in cameras today have sensitivities spanning the entire visible spectrum or more. In order to obtain information about the different color bands when a photograph is taken, a CFA is placed in front of the sensor array. With this in place, each pixel in the array can only detect one band of colors, determined by the color filter in front of it. Numerous CFA patterns are utilized, each featuring three or more colors. The most common is the Bayer pattern, shown in Figure 1.

Figure 1. Bayer CFA [2]

The remaining color channels for a given pixel must be interpolated from neighboring pixels. This requires a CFA demosaicking algorithm, of which there are many (for a description of some of these algorithms, see [1]). The common theme among these algorithms is that the interpolated values are some combination of neighboring measured values. The method proposed in [1] and emulated here assumes a linear model (i.e. that interpolated values are a weighted sum of neighboring measured values).
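As a concrete illustration of this linear model, the sketch below bilinearly interpolates the green channel of an assumed Bayer mosaic. It is only an example of the kind of interpolation the model describes, not the demosaicking used by any particular camera; the image, mask layout, and kernel are placeholder assumptions.

  % Illustrative sketch only: bilinear interpolation of the green channel.
  % Each interpolated pixel is an equal-weight sum of its measured neighbors,
  % i.e. one instance of the linear model described above.
  mosaic = double(imread('peppers.png')) / 255;  % placeholder standing in for raw sensor data
  mosaic = mosaic(:,:,2);                        % pretend this plane is the sensor mosaic
  [X, Y] = meshgrid(1:size(mosaic,2), 1:size(mosaic,1));
  greenMeasured = mod(X + Y, 2) == 0;            % assumed Bayer layout: green on one checkerboard
  kernel = [0 1 0; 1 0 1; 0 1 0] / 4;            % average of the four measured neighbors
  estimate = conv2(mosaic .* greenMeasured, kernel, 'same');
  green = mosaic;
  green(~greenMeasured) = estimate(~greenMeasured);  % fill in the interpolated locations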

Methods

As mentioned previously, the methods used here are those used in [1], which describes an algorithm the authors call expectation/maximization (EM), a version of iteratively reweighted least squares.

Expectation/Maximization Algorithm

Even assuming a linear model, there are still several issues that need to be addressed. Chief among them, it is not known which pixels were interpolated and which were actually measured by the sensor (estimated in the expectation step), nor how heavily each neighbor of a given pixel should be weighted (estimated in the maximization step).

Expectation Step

The EM algorithm begins by estimating the probability that each pixel is interpolated and fits the linear model


f(x,y) = \sum_{u,v = -N}^{N} \alpha_{u,v} \, f(x+u, y+v) + n(x,y), \quad (u,v) \neq (0,0)     (1) [1]


where f(x,y) denotes the pixel value for a given color channel, αu,v denotes the weighting of each neighbor, N is the chosen neighborhood size, and n(x,y) is an error (i.e. noise) term. As described in [1], the set of interpolated pixels is denoted M1, and Bayes' rule can be used to estimate the probability that a given pixel belongs to this set.


\Pr\{f(x,y) \in M_1 \mid f(x,y)\} = \frac{\Pr\{f(x,y) \mid f(x,y) \in M_1\} \, \Pr\{f(x,y) \in M_1\}}{\sum_{i=1}^{2} \Pr\{f(x,y) \mid f(x,y) \in M_i\} \, \Pr\{f(x,y) \in M_i\}}     (2) [1]


It is assumed that there is equal probability (i.e. 0.5) that a given pixel was interpolated (i.e. part of M1) or measured (i.e. part of M2). This assumption may not be valid given the scarcity of red and blue pixels in a Bayer array, but it was not tested here. For pixels in M2, a uniform distribution of values is assumed (i.e. each value has a probability of 1/256 for an 8-bit color channel). The conditional probability for a pixel in M1 is assumed to be Gaussian, centered on the linear estimate of the pixel value, as shown in equation (3).


\Pr\{f(x,y) \mid f(x,y) \in M_1\} = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left[ -\frac{\left( f(x,y) - \sum_{u,v=-N}^{N} \alpha_{u,v} \, f(x+u, y+v) \right)^{2}}{2\sigma^{2}} \right]     (3) [1]
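A minimal sketch of the expectation step under these assumptions follows. The variable names, the use of filter2, and the treatment of sigma are choices made here for illustration rather than code from [1]; in particular, [1] also re-estimates the noise variance at each iteration, which is omitted.

  % Expectation step sketch: posterior probability that each pixel was interpolated.
  % f     : one color channel with 8-bit values in [0, 255]
  % alpha : (2N+1)x(2N+1) weights with the center entry fixed to zero
  % sigma : current estimate of the noise standard deviation
  estimate = filter2(alpha, f, 'same');                             % linear estimate from eq. (1), noise term omitted
  residual = f - estimate;
  pGivenM1 = exp(-residual.^2 / (2*sigma^2)) / (sigma*sqrt(2*pi));  % Gaussian conditional, eq. (3)
  pGivenM2 = 1/256;                                                 % uniform conditional for measured pixels
  w = (0.5 * pGivenM1) ./ (0.5 * pGivenM1 + 0.5 * pGivenM2);        % Bayes' rule with equal priors, eq. (2)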


Maximization Step

In the maximization step, weighted least squares is used to determine the αu,v terms that minimize the total error over the entire image between the actual pixel values and the estimates for the pixel values determined by the linear model.



E(\vec{\alpha}) = \sum_{x,y} w(x,y) \left( f(x,y) - \sum_{u,v=-N}^{N} \alpha_{u,v} \, f(x+u, y+v) \right)^{2}     (4) [1]


Each term in the sum is weighted by w(x,y), the probability from equation (2) that the pixel is interpolated. If these probabilities were exactly correct (i.e. 1 for interpolated pixels and 0 for measured pixels), only pixels that were truly interpolated would contribute to the total error. As shown in [3], equation (4) is the characteristic error function for iteratively reweighted least squares, which has solutions for the αu,v terms of the form


\vec{\alpha}^{\,(t+1)} = \left( F^{T} W^{(t)} F \right)^{-1} F^{T} W^{(t)} \vec{f}     (5) [3]

where F is the matrix of neighborhood pixel values, f is the vector of center pixel values, W^(t) is the diagonal matrix of the weights w(x,y), and t represents the current iteration. The EM algorithm is iterated until a stable estimate for the αu,v terms is achieved.
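The sketch below shows one way this weighted least-squares solve could be carried out in MATLAB. The design-matrix construction, border handling, and variable names are assumptions of this write-up, not code taken from [1] or [3].

  % Maximization step sketch: solve the weighted normal equations for the alpha terms.
  % f : color channel, w : posterior weights from the expectation step, N : neighborhood radius
  [h, wd] = size(f);
  F = [];                                          % design matrix: one column per neighbor offset
  for u = -N:N
      for v = -N:N
          if u == 0 && v == 0, continue; end       % the center pixel is excluded from the model
          shifted = f(N+1+u : h-N+u, N+1+v : wd-N+v);
          F = [F, shifted(:)];                     %#ok<AGROW> N is small, so growth is negligible
      end
  end
  y  = reshape(f(N+1:h-N, N+1:wd-N), [], 1);       % center pixel values (image interior only)
  wv = reshape(w(N+1:h-N, N+1:wd-N), [], 1);       % matching posterior weights
  alphaVec = (F' * bsxfun(@times, F, wv)) \ (F' * (wv .* y));  % weighted least squares, eq. (5)
  alpha = zeros(2*N+1);                            % reassemble the weights as a neighborhood kernel
  idx = 0;
  for u = -N:N
      for v = -N:N
          if u == 0 && v == 0, continue; end
          idx = idx + 1;
          alpha(u + N + 1, v + N + 1) = alphaVec(idx);
      end
  end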

Similarity Measure

The EM algorithm essentially provides two outputs (one from each of its steps): 1) a probability map of the likelihoods that each pixel is interpolated and 2) the αu,v terms that best fit the linear model. The authors of [1] propose using the Fourier transform of the probability map as a metric to determine whether an image is a forgery by comparing it to the Fourier transform of a synthesized, ideal probability map. For each color channel, this ideal probability map is generated by assuming the CFA has a Bayer pattern (as shown in Figure 1) and distributing the probabilities perfectly (i.e. the interpolation probability is 0 for measured pixels and 1 for interpolated pixels). This is illustrated for each color channel in equation (6).


s_c(x,y) = \begin{cases} 0, & \text{channel } c \text{ is measured at } (x,y) \text{ by the Bayer CFA} \\ 1, & \text{channel } c \text{ is interpolated at } (x,y) \end{cases}     (6) [1]
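The sketch below builds such a synthetic map for the red channel of a 128 x 128 window. The particular Bayer phase (which rows and columns carry measured red samples) is an assumption and may need to be shifted to match Figure 1.

  % Synthetic probability map sketch for the red channel: 0 where red is measured,
  % 1 where it would be interpolated, assuming red samples sit on odd rows and columns.
  sz = 128;                                    % window size used in this project
  [X, Y] = meshgrid(1:sz, 1:sz);
  sRed = ones(sz);
  sRed(mod(Y,2) == 1 & mod(X,2) == 1) = 0;     % measured red locations (one per 2x2 Bayer block)
  Smag = fftshift(abs(fft2(sRed)));            % Fourier magnitude compared against the EM maps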


As shown in equation (7), [1] proposes a measure of similarity between the two Fourier transforms as the inner product of their magnitudes. Magnitudes are used in order for the similarity measure to be phase-invariant (i.e. so that the synthesized probability map is not required to align perfectly with the EM-generated probability map).


M = \sum_{\omega} \left| P(\omega) \right| \left| S(\omega) \right|     (7) [1]

where P(ω) and S(ω) denote the Fourier transforms of the EM-generated and synthetic probability maps, respectively.

The authors of [1] propose setting an empirically determined similarity measure threshold, above which a color channel is determined to be CFA interpolated and thus not forged. Unfortunately, for the image set used here, this similarity measure spans too large a range to reliably set such a threshold. The cosine similarity (equation 8) is proposed as an alternative measure; it normalizes by the magnitudes of the vectors being compared and, because the Fourier-magnitude vectors have non-negative entries, yields a similarity in the range [0,1]. A phase-invariant version of this measure was used in this project.



\cos(\theta) = \frac{A \cdot B}{\left\| A \right\| \left\| B \right\|}     (8) [4]
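A sketch of the phase-invariant version used here follows; probMap and synthMap are placeholder names for an EM probability map and the synthetic map for the same channel and window size.

  % Phase-invariant cosine similarity sketch: only Fourier magnitudes are compared,
  % so the EM map and the synthetic map need not be spatially aligned.
  a = abs(fft2(probMap));  a = a(:);
  b = abs(fft2(synthMap)); b = b(:);
  cosSim = (a' * b) / (norm(a) * norm(b));   % eq. (8); in [0,1] because the magnitudes are non-negative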


Results

Training Image Set

Expectation/Maximization Algorithm

The EM algorithm and cosine similarity measure described above (using N = 1) were applied to 128-pixel-by-128-pixel sliding windows with 50% overlap on 68 real images and their 68 tampered counterparts provided in a training set for this project. For each window location, probability maps and their Fourier transforms were generated for all three color channels. Figure 2 shows the Fourier transform of a synthetically generated probability map for the red channel, Figure 3 shows the Fourier transform of a probability map in an untampered region, and Figure 4 shows the Fourier transform of a probability map in a tampered region. Each Fourier transform was up-sampled by a factor of 2, high-pass filtered, blurred, scaled to fill the range [0,1], and gamma corrected with a coefficient of 2 for display purposes. As can be seen in Figure 4, the Fourier transform of the tampered region still contains a significant amount of energy in the frequencies matching the synthetically generated probability map.
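The overall per-image loop might look like the sketch below. Here runEM and cosineSim are hypothetical helpers standing in for the EM algorithm and equation (8) as sketched above, and synthMap is assumed to be a cell array of per-channel synthetic maps.

  % Sliding-window analysis sketch: 128x128 windows with 50% overlap over each color channel.
  win = 128; step = win / 2;
  [h, w, ~] = size(img);
  sims = cell(3, 1);                               % per-channel cosine similarities, one per window
  for ch = 1:3
      vals = [];
      for r = 1:step:(h - win + 1)
          for c = 1:step:(w - win + 1)
              block = double(img(r:r+win-1, c:c+win-1, ch));
              probMap = runEM(block);              % hypothetical EM routine (expectation/maximization above)
              vals(end+1) = cosineSim(probMap, synthMap{ch});  %#ok<AGROW>
          end
      end
      sims{ch} = vals;
  end
  minSim = cellfun(@min, sims);                    % per-channel minima used later for thresholding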

Figure 2. FFT of synthetically generated probability map
Figure 3. FFT of probability map in untampered region
Figure 4. FFT of probability map in tampered region
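For reference, the display processing applied to each Fourier transform might look like the sketch below. The sequence of steps is taken from the text, but the filter sizes and the interpretation of the gamma coefficient are assumptions, and imresize, fspecial, and imfilter come from the Image Processing Toolbox.

  % Display processing sketch for Figures 2-4.
  F = fftshift(abs(fft2(probMap)));
  F = imresize(F, 2);                               % up-sample by a factor of 2
  F = F - imfilter(F, fspecial('average', 15));     % crude high-pass: remove the local mean
  F = imfilter(F, fspecial('gaussian', 5, 1));      % blur
  F = (F - min(F(:))) / (max(F(:)) - min(F(:)));    % scale to fill [0,1]
  F = F .^ (1/2);                                   % gamma correction with a coefficient of 2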

Similarity Measure

The cosine similarities between the Fourier transforms of the EM-generated probability maps and those of the synthetically generated probability maps were tabulated for each window and color channel. The real images were each compared to their tampered counterparts. In some cases, there was clear separability between the untampered image and the tampered image (i.e. the minimum cosine similarity was much lower for the tampered image than for the untampered image). The cosine similarities for the red, green, and blue channels are shown in Figures 5, 6, and 7, respectively, for the same image used to generate Figures 3 and 4. The untampered image is represented by the red lines.

Figure 5. Red channel cosine similarity (untampered = red, tampered = blue)
Figure 6. Green channel cosine similarity (untampered = red, tampered = blue)
Figure 7. Blue channel cosine similarity (untampered = red, tampered = blue)

In many cases, however, generating a separability threshold was not possible for one or more of the color channels. Figures 8 - 10 illustrate this point for a difficult image.

Figure 8. Red channel cosine similarity (untampered = red, tampered = blue)
Figure 9. Green channel cosine similarity (untampered = red, tampered = blue)
Figure 10. Blue channel cosine similarity (untampered = red, tampered = blue)

Histograms of the minimum cosine similarities across all images for each image channel were generated in the hopes of finding a bimodal distribution that could provide guidance on how to choose thresholds that would maximize classification success. These histograms are shown in Figures 11, 12, and 13 for the red, green, and blue channels, respectively.

Figure 11. Red channel minimum cosine similarity distribution (untampered = red, tampered = blue)
Figure 12. Green channel minimum cosine similarity distribution (untampered = red, tampered = blue)
Figure 13. Blue channel minimum cosine similarity distribution (untampered = red, tampered = blue)

As shown in these histograms, no bimodal distribution is present for this metric (although the tampered distribution is shifted slightly lower than the untampered distribution in all cases). This indicates that the problem illustrated in Figures 8 - 10 is the common case and that reliable separation thresholds cannot be determined.

Separation Attempts

Due to the success of the authors' proposed similarity measure in [1], many attempts were made to find a metric that would allow reliable separation of untampered and tampered images using similarity measures related to the authors' proposal. In addition to the phase-invariant cosine similarity already discussed, among the methods attempted were:

  • Phase-variant cosine similarity
  • Similarity measure proposed in [1] (both phase-invariant and phase-variant)
  • Vector projection (both phase-invariant and phase-variant)

For each method, several parameters of the EM and similarity measure algorithms were swept, including:

  • Window size
  • Convergence parameter, epsilon
  • High pass filter cutoff frequency (including no high pass filter)
  • Gamma correction coefficient

Several metrics were examined for each of the above combinations, including means, minimums, and the difference between these two quantities. None of these metrics produced separable distributions with any measurable improvement over the method presented above (and many of the methods were much worse).

Classification Results

Due to the lack of distribution separability, an attempt was made to select thresholds that could correctly classify tampered and untampered images at least 50% of the time, the success rate a random classifier would achieve. For an image to be classified as tampered, the minimum cosine similarity values of all three channels must fall below their respective thresholds. The cosine similarity thresholds for each channel that most closely achieved the 50% rate are shown in Table 1, and Table 2 shows the resulting success rates.

Channel          Threshold
Red Channel      0.81
Green Channel    0.66
Blue Channel     0.81

Table 1. Cosine similarity thresholds
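A minimal sketch of this decision rule, assuming minSim holds the per-channel minimum cosine similarities from the sliding-window analysis:

  % Classification sketch: flag an image as tampered only when every channel's minimum
  % cosine similarity falls below its threshold from Table 1.
  thresholds = [0.81, 0.66, 0.81];           % red, green, blue
  isTampered = all(minSim(:)' < thresholds);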

              Identified    Not Identified
Tampered      52.9%         47.1%
Untampered    48.5%         51.5%

Table 2. Training image set success rates

Test Image Set

The thresholds chosen from the training set were applied to a test set of images to determine whether the same success rate could be achieved. The test set consisted of 10 real images and their 10 tampered counterparts. The results are shown in Table 3. As shown, the algorithm misclassified 90% of the tampered images but only 10% of the untampered images. This indicates that the thresholds established previously were likely too low for this set of images.

              Identified    Not Identified
Tampered      10%           90%
Untampered    90%           10%

Table 3. Test image set success rates

Conclusions

Despite great effort, the results here indicate that, at least for the images used in this project, the similarity measure proposed by [1] and the related cosine similarity measure are not reliable methods of detecting localized image forgeries from the outputs of the EM algorithm. Sweeping of numerous parameters within these methods proved fruitless; it is likely that too much time was spent with these similarity measures and a novel approach should have been developed.

The EM algorithm itself appears to reliably generate reasonable probability map estimates that can often be interpreted by humans to determine forgeries; the issue was reliably quantifying these differences. Future work (and work already done in other Psych 221, Winter 2013 projects) should focus on using the probability maps (and possibly the αu,v values) in novel ways (i.e. not solely using the similarity measures used in this project) to determine localized image forgeries. With a more reliable method of discerning forgeries, the impact of distortions such as JPEG compression on the effectiveness of CFA-interpolation-based forensic techniques could be better analyzed.

Resources

References

[1] Popescu, A. C.; Farid, H., "Exposing digital forgeries in color filter array interpolated images," IEEE Transactions on Signal Processing, vol. 53, no. 10, pp. 3948-3959, Oct. 2005.
[2] http://en.wikipedia.org/wiki/Bayer_filter
[3] http://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares
[4] http://en.wikipedia.org/wiki/Cosine_similarity

Software

MATLAB Student Version (R2011a)

Appendix I - Code

File:CampiottiCodeFile.zip