BoinJou

From Psych 221 Image Systems Engineering

Back to Psych 221 Projects 2013



Introduction

In this modern day and age, digital technology has made it very easy to manipulate and tamper with digital images. The prevalence of the Internet and our easy access to a world of information make us vulnerable to false data and counterfeit products, so it is important to develop methods to detect such forgeries. One important family of detection approaches is camera forensics, whose methods are built on the specific pipeline of image formation inside the camera. One important class of these methods exploits the periodicity introduced by the camera's Color Filter Array to detect tampering; our algorithm is based on this idea.

There are two goals to our project. First, we want to build a classifier to distinguish between real and tampered images. Second, we want to localize the tampered region(s) in fake images.

Background

CFA Interpolation

A Color Filter Array (CFA) is a mosaic of color filters placed on top of a camera sensor, so that each pixel records light in only one wavelength band (usually classified as short, medium, and long). Different cameras can use different patterns for this CFA color sampling; the most common is the Bayer pattern, shown in Figure 1. The color information missing at each pixel is then interpolated in a specific fashion from neighboring pixels, a process called demosaicing.


Figure 1: Color Filter Array

For each channel, CFA interpolation introduces periodic statistical correlations between a subset of pixels. As mentioned previously, one important class of camera forensics methods exploits this periodicity to detect possible tampering: in tampered regions of an image, the periodic pattern of high correlations is lost. However, when given an image, we typically know neither the sampling lattice nor how the interpolation was done. To estimate the CFA sampling pattern and the interpolation coefficients, we can use the Expectation-Maximization (EM) algorithm.

Expectation/Maximization Algorithm

We assume that each pixel (sample) in an image belongs to one of two models: either the pixel was interpolated from its neighboring pixels with weights α (model M1), or the pixel was indeed "sampled" during the image formation process (model M2). The EM algorithm is a two-step iterative algorithm that we use to estimate the probability of each pixel belonging to each model (Expectation step), and to estimate the specific interpolation coefficients for the pixels that were interpolated (Maximization step).

Here, f(x,y) is the CFA image of one color channel. If the pixel is produced by interpolation over its neighborhood N, we can write f(x,y) as:

    f(x,y) = \sum_{(u,v) \in N} \alpha_{u,v} \, f(x+u, y+v) + n(x,y)

where \alpha_{u,v} are the interpolation coefficients and n(x,y) is a residual error drawn from a Gaussian distribution with zero mean. The probability of observing a sample f(x,y), given that it was generated from model M1, is estimated in the E-step:

    P\big(f(x,y) \mid f(x,y) \in M_1\big) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left[-\frac{\big(f(x,y) - \sum_{(u,v) \in N} \alpha_{u,v} f(x+u, y+v)\big)^2}{2\sigma^2}\right]

The variance σ² of this Gaussian distribution is estimated in the M-step. For the first E-step, the initial α is chosen randomly. In the M-step, a new estimate of α is computed by minimizing the weighted quadratic error:

    E(\alpha) = \sum_{x,y} w(x,y) \left( f(x,y) - \sum_{(u,v) \in N} \alpha_{u,v} f(x+u, y+v) \right)^2

where the weights are the posterior probabilities obtained in the E-step:

    w(x,y) \equiv P\big(f(x,y) \in M_1 \mid f(x,y)\big)

The two steps are executed iteratively until α is stable, i.e., until the difference between the previous and current estimates falls below a tolerance value.
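The E/M loop described above can be sketched in numpy as follows. This is a minimal illustration of the Popescu-Farid-style iteration, not the project's actual code: `em_cfa` and its defaults are our names, and we assume model M2 is uniform over the image's intensity range and that the M-step is solved by weighted least squares.

```python
import numpy as np

def em_cfa(f, radius=1, sigma=2.0, tol=1e-3, max_iter=50):
    """EM estimate of CFA interpolation weights for one color channel.

    Illustrative sketch (names and defaults are assumptions).  Returns
    (alpha, prob): the interpolation weights (8 of them for radius=1) and
    the per-pixel posterior probability of being interpolated (model M1).
    """
    h, w = f.shape
    offsets = [(u, v) for u in range(-radius, radius + 1)
               for v in range(-radius, radius + 1) if (u, v) != (0, 0)]
    # Each row of `nbrs` holds the neighborhood of one interior pixel.
    center = f[radius:h - radius, radius:w - radius].ravel()
    nbrs = np.stack([f[radius + u:h - radius + u,
                       radius + v:w - radius + v].ravel()
                     for (u, v) in offsets], axis=1)
    alpha = np.random.default_rng(0).normal(scale=0.01, size=len(offsets))
    p_m2 = 1.0 / (f.max() - f.min() + 1e-9)          # uniform density for M2
    wgt = np.full(center.shape, 0.5)
    for _ in range(max_iter):
        # E-step: posterior probability that each pixel follows model M1.
        resid = center - nbrs @ alpha
        p_m1 = np.exp(-resid ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
        wgt = p_m1 / (p_m1 + p_m2)
        # M-step: weighted least squares for alpha, then re-estimate sigma.
        sw = np.sqrt(wgt)
        alpha_new, *_ = np.linalg.lstsq(nbrs * sw[:, None], center * sw, rcond=None)
        sigma = np.sqrt(np.sum(wgt * (center - nbrs @ alpha_new) ** 2)
                        / (np.sum(wgt) + 1e-12)) + 1e-6
        converged = np.linalg.norm(alpha_new - alpha) < tol
        alpha = alpha_new
        if converged:
            break
    prob = np.zeros((h, w))
    prob[radius:h - radius, radius:w - radius] = wgt.reshape(h - 2 * radius,
                                                            w - 2 * radius)
    return alpha, prob
```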

Methods

Here is a brief outline of our algorithm.

We use a sliding window of 64x64 pixels to traverse the test image in 32-pixel increments, running the EM algorithm on every block. We then use a clustering step on the EM outputs to localize the tampered region. Finally, we use the cluster maps to classify the image as real or fake.
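The traversal step can be sketched with a small helper like the one below (an illustrative name, not the project's code; border handling may differ in practice).

```python
import numpy as np

def sliding_blocks(img, block=64, step=32):
    """Yield (row, col, view) for each block-sized window at the given stride."""
    h, w = img.shape[:2]
    for r in range(0, h - block + 1, step):
        for c in range(0, w - block + 1, step):
            # A view into the image; the EM algorithm is run on each of these.
            yield r, c, img[r:r + block, c:c + block]
```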

Localization: Clustering

After running EM on a block, we get two sets of useful outputs. First, we get a probability map showing how likely it is that each pixel was sampled rather than interpolated. To examine the periodicity of correlations between subsets of pixels, we take the Fourier Transform of this probability map and look at the frequency data. Second, we get the estimated α values, which describe the interpolation weights of neighboring pixels. Both outputs carry useful information, to different degrees for different images, and our algorithm uses both.

Fourier Transform data

For our Fourier Transform data, we assume the CFA has a Bayer sampling pattern, though this assumption is not strict in our algorithm, as will be shown later. For a non-tampered block, the periodicity introduced by the sampling pattern produces high intensity in at least one of the three points that correspond to the highest horizontal, vertical, and diagonal frequencies.

Figure 2: This is a non-tampered block and its Fourier Transform data. There is high intensity in the highest diagonal frequency.

We pick the one point out of these three with the highest intensity, and normalize its value by the maximum intensity of the entire block's spectrum. We call this value the representative "FFT peak" of the block, and we compute it for all three color channels. For non-tampered blocks, we expect the Fourier Transform to exhibit peaks at these frequency locations, so non-tampered regions mostly yield values near 1, while tampered blocks yield much lower values.
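Under the Bayer assumption, this feature for one channel's probability map could be computed roughly as below. This is a sketch of the described feature; the exact DC handling and normalization in the project's code may differ.

```python
import numpy as np

def fft_peak(prob_map):
    """Normalized "FFT peak" of one channel's block probability map.

    Takes the spectrum magnitudes at the highest horizontal, vertical and
    diagonal frequencies, keeps the largest, and normalizes it by the
    maximum magnitude of the whole spectrum.
    """
    F = np.abs(np.fft.fft2(prob_map - prob_map.mean()))  # suppress the DC term
    h, w = F.shape
    candidates = (F[0, w // 2],       # highest horizontal frequency
                  F[h // 2, 0],       # highest vertical frequency
                  F[h // 2, w // 2])  # highest diagonal frequency
    return max(candidates) / (F.max() + 1e-12)
```

A probability map with a period-2 diagonal pattern (as CFA interpolation tends to produce) yields a value near 1, while an aperiodic map yields a lower value.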

Here we show an example of the difference between a real image and a tampered version of it, where the tampered region is highlighted in the yellow box.

Figure 3: Example difference image between real and tampered version. The tampered region is shown in the yellow box.


Below we have shown the "FFT peak" map for this image, and we can see that the low value regions correspond well to the tampered region in the difference image.

Figure 4: FFT peak map of the tampered image. Low valued blocks correspond nicely to tampered regions.

Alpha vectors

For our EM algorithm, we set the interpolation neighborhood radius to 1, so we get 8 coefficients that make up an α vector for each block in every color channel. We find the median α of the image, and compute the norm of the difference between each block's α and this median. The idea is that tampered regions should produce large norm values, since their α vectors differ most from the rest of the image. Here we use the same example difference image shown above in Figure 3, where the tampered region is highlighted in the yellow box. Below we show the map of computed norms for this image, and we can see that the high-value regions correspond well to the tampered region in the difference image.
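The α-norm map can be sketched as follows, assuming the per-block α vectors have been stacked into a 3-D array (names are illustrative, not the project's code).

```python
import numpy as np

def alpha_norm_map(alphas):
    """Norm of each block's alpha vector relative to the per-image median.

    `alphas` has shape (n_rows, n_cols, d): one d-dimensional vector of
    interpolation coefficients per block (d = 8 per channel here).
    Tampered blocks are expected to lie far from the median, giving
    large norms.
    """
    med = np.median(alphas.reshape(-1, alphas.shape[-1]), axis=0)
    return np.linalg.norm(alphas - med, axis=-1)
```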

Figure 5: Alpha vector norm map of the tampered image. High valued blocks correspond nicely to tampered regions.

Clustering Algorithm

To localize the tampered region in fake images, we use a clustering algorithm. From the EM outputs, we obtain a 27-dimensional vector per block, consisting of the 8-element α vector and an FFT peak for each of the three color channels. We run a variant of the 2-cluster k-means algorithm to separate these 27-dimensional vectors into two clusters: a group of non-tampered blocks and a group of tampered blocks. We force one cluster to be smaller than the other by using an asymmetric ratio in the cost function, reflecting the expectation that a "localized" tampered region forms the smaller cluster.

We realized, from the FFT peak maps as mentioned before, that just using the minimum of that map already localizes the tampered region pretty well, though very roughly. Therefore, for initialization, we choose the centroid of the smaller cluster to be the block with the minimum FFT peak value. We then randomly generate the initial centroid for the other cluster until we have a centroid position that is at least half the image size away. This is done to reduce the chances of picking the second centroid in the tampered region as well.

Lastly, we run this clustering algorithm with different weightings between the FFT peak portion and the α portion of the 27-dimensional vector. We do this because some images are clustered more easily by the FFT peaks while others are clustered more easily with the α vectors. For example, if the image was not taken with a camera using a Bayer-pattern CFA, the intensities at the expected FFT peak locations would not give us much information, since the true peaks of the Fourier Transform would be elsewhere; in that case we rely more on the α vectors for clustering. Below we show an example of 5 different weightings and the different clustering maps we obtain from them. We can see visually that a certain weighting is optimal (red box).

Figure 6: Five clustering results using different weightings between the alpha vector and the FFT peaks. The optimum weighting is shown in the red box.
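A toy version of the asymmetric two-cluster step is sketched below. This is one reading of the "asymmetric ratio in the cost function": the distance to the small-cluster centroid is multiplied by a ratio greater than 1, so fewer blocks join that cluster and it stays small. The names and the exact mechanism are our assumptions, not the project's actual code.

```python
import numpy as np

def asymmetric_two_means(X, seed_idx, ratio=2.0, iters=20):
    """2-means where joining the small ('tampered') cluster is penalized.

    X is (n, d) with one feature vector per block; seed_idx initializes
    the small-cluster centroid (e.g. the block with the minimum FFT peak).
    Returns a boolean array: True for blocks in the small cluster.
    """
    c_small = X[seed_idx].copy()
    c_big = X.mean(axis=0)
    labels = np.zeros(len(X), dtype=bool)
    for _ in range(iters):
        d_small = ratio * np.linalg.norm(X - c_small, axis=1)  # penalized
        d_big = np.linalg.norm(X - c_big, axis=1)
        new = d_small < d_big
        if new.any():
            c_small = X[new].mean(axis=0)
        if (~new).any():
            c_big = X[~new].mean(axis=0)
        if np.array_equal(new, labels):   # assignments stable -> converged
            break
        labels = new
    return labels
```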

Assumptions

To improve the results from the clustering algorithm and to use these results to classify between real and fake images, we make some more assumptions.

1. There is only one tampered zone.

We assume this because we are working with a "localized" tampering zone. If we do want to detect multiple tampered regions, we can run our algorithm multiple times under these assumptions, removing one detected tampered region at each round, and reconstruct the total tampered region after several cycles.

2. The tampered zone is not too small.

We make this assumption because small tampered zones are hard even to detect visually, so this is a hard problem in itself. We run our algorithm with 64x64-pixel sliding blocks, and if the tampered region is much smaller than a block, it might not be detected. This is discussed in more detail in the "Analysis and ideas for improvement" section.

3. The tampered zone is not too big.

For the subsequent classification step, we assume that the tampered zone occupies less than a certain fraction of the test image (usually around 50%). Even if the tampered zone is larger than this assumed size, our clustering algorithm should still separate the image into two distinct regions; it is then just a matter of switching perspective as to which region is tampered and which is not, i.e., which cluster is larger and which is smaller.

Classification

For a given image, we generate 5 different block maps, each corresponding to a different relative weighting between the FFT peak data and the α components. Ideally, for a tampered image the clusters would perfectly subdivide the blocks into two categories: tampered and non-tampered. For a real image, there would be no clear distinction between the clusters, and the blocks of a cluster would not be as localized.

In practice, there can be noise even on tampered images (blocks outside the tampered zone that are classified as fake, as we can see in Figure 7), so our goal in this part is to draw a clear line between typical maps of tampered images and of untampered images.

Figure 7: Example of a non-ideal map. Although the tampered region is clearly identifiable, some blocks outside it are still classified as fake.

Because of the assumptions stated above, we expect that a map corresponding to a fake image should contain one group of fake blocks of moderate size, so we can use the number of blocks in each connected component of the map as a heuristic, noticing that:

  • A very small component very likely corresponds to noise (Figure 8)
  • A component covering more than a certain fraction of the total number of blocks is unlikely to be a tampered region (Figure 9)
  • A connected component of moderate size surrounded by a lot of noise is unlikely to be a tampered region (Figure 10)
Figure 8: Example of an untampered-image block map containing many small connected components
Figure 9: Example of an untampered-image block map containing one big connected component that is not a tampered region
Figure 10: Example of an untampered-image block map containing one connected component of moderate size, which is still not a tampered region because of the surrounding noise.
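The three criteria can be sketched as follows. The threshold values here are placeholders (the real values were tuned on the training images), and the helper names are illustrative.

```python
import numpy as np

def connected_sizes(mask):
    """Sizes of the 4-connected components of True cells in a 2-D mask."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    h, w = mask.shape
    sizes = []
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                stack, size = [(i, j)], 0   # flood fill from this cell
                seen[i, j] = True
                while stack:
                    y, x = stack.pop()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                sizes.append(size)
    return sizes

def classify_map(fake_mask, min_blocks=3, max_frac=0.5, noise_frac=0.5):
    """Return True if a block map looks tampered, per the criteria above."""
    sizes = connected_sizes(fake_mask)
    if not sizes:
        return False                      # no fake blocks at all
    biggest, total = max(sizes), fake_mask.size
    if biggest < min_blocks:              # everything tiny -> just noise
        return False
    if biggest > max_frac * total:        # dominant component too big
        return False
    noise = sum(sizes) - biggest          # fake blocks outside main component
    if noise > noise_frac * biggest:      # too much surrounding noise
        return False
    return True
```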

For each of these criteria, we associate a quantitative threshold that allows us to do the classification. These thresholds are parameters that we can tune to achieve the best performance, which we judged by running our algorithm on the 140 images we were given for training (70 tampered, 70 untampered). By trying different sets of parameters, we found that we could trade a higher rate of false positives (untampered images classified as fake) against a higher rate of false negatives (tampered images classified as real). When we plot each parameter set on a 2-dimensional plane of false-positive proportion versus false-negative proportion, some sets of parameters stand out as better than the others: they lie on a Pareto optimality curve (Figure 11).

Figure 11: Pareto optimality curve between the proportion of false positives and the proportion of false negatives.
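Given the (false-positive, false-negative) pair measured for each parameter set, the Pareto-optimal sets can be extracted with a small helper like this (an illustrative sketch, not the project's code):

```python
def pareto_front(points):
    """Indices of non-dominated (FP-rate, FN-rate) pairs when minimizing both.

    A parameter set is dominated if another set is at least as good on both
    error rates and strictly better on one.
    """
    front = []
    for i, (fp, fn) in enumerate(points):
        dominated = any(ofp <= fp and ofn <= fn and (ofp, ofn) != (fp, fn)
                        for j, (ofp, ofn) in enumerate(points) if j != i)
        if not dominated:
            front.append(i)
    return front
```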

The two main points of interest are the ones in red: they (and all the points on the line between them) minimize the sum of these two proportions. Depending on our goal, it made more sense to use one or the other. In our tests, we actually used the red point on the right, which achieved the following performance on our training images:

  • 4.29 % of false positives
  • 28.57 % of false negatives

In real life, we judged that it made more sense to minimize false positives if we want to prove an image is tampered: if our test turns out positive, we have rather good certainty that the image is tampered; if it turns out negative, we cannot say anything for sure. NB: it is important to note that we did not assume the tampered zone is a rectangle, although that is the case for the training and test images. That assumption would have made our work much easier, but unlike the other assumptions we made, it is totally unjustified for a genuinely tampered picture, so we did not limit ourselves to detecting rectangular tamperings.

Localization Results

Here are some results showing the efficacy of our algorithm to localize tampered regions in fake images. For each example, we show the difference image between the given real image and the tampered version of that image, and we also show the corresponding localization results of our algorithm.

Analysis and ideas for improvement

Our Pareto curve is still far from the ideal (0,0) point, which would correspond to a 100% success rate in our detection. This is because some images have characteristics that make them very difficult to classify. Two such characteristics make our job particularly difficult.

First, very small tampered areas are difficult to detect. We run our EM algorithm on 64x64 windows, sliding them by 32 pixels each time. This means that if a tampered zone contains a square of size 96x96, we are sure to have at least one window fully included in the tampered zone. This is what we want, because a window that partially overlaps the untampered zone still contains part of a CFA-interpolated image, so it will still exhibit correlations between some neighboring pixels; the output of the EM algorithm on such a window would not clearly indicate whether the block is tampered. But we are almost sure to correctly identify a tampered zone big enough to fully contain a window. In fact, we can also correctly identify a window that partially contains a CFA-interpolated zone, as long as that zone is small enough. A good rule of thumb in our case is that we cannot detect a tampered region smaller than our window (64x64 pixels).

At this point, one might wonder why we did not use a smaller window. The reason is that small windows lead to the second problem of our algorithm: the more uniform the image inside the window, the worse the results become. In this case, the EM algorithm has to invert an almost singular matrix, so the results can be unreliable. This can be seen in images with a big uniform zone, like a sky or a white wall (cf. Figure 12). The uniform blocks in fact give rise to another, parasitic cluster that either prevents us from identifying the tampered-zone cluster in a tampered image, or that we can mistake for a tampered-zone cluster in an untampered image.

Figure 12: Uniform zone (sky) that creates a parasitic cluster

A way to get rid of this problem would be to detect these zones and not take them into account in our map. We did not implement this in our algorithm because it would have required many changes, and we only realized late in the project that it was a real problem for our classification.

So, the smaller the window gets, the more uniform the image inside it, and the more unreliable our results become. There is a trade-off between the minimum detail we want to detect and the reliability of our algorithm. We empirically chose a 64x64 window because it usually gave reliable results while not being too limiting.

Another option would be to reduce the step between windows, sliding them by 8 or 16 pixels instead of 32. This would ensure that at least one window is (almost) fully included in a smaller tampered region, letting us detect smaller zones. But the cost in time would increase tremendously (halving the step alone gives 4 times more blocks, so our algorithm would take 4 times longer), for a gain that would probably not be that important: even with a continuous algorithm (sliding the window by a single pixel), we would still be limited by the window size, and we would not be able to detect tampered zones much smaller than 64x64 anyway.

These observations justify the choices we made. We will now see if the full algorithm works well on a set of test images.

Results on actual test images

We ran our test on the 20 test images that were given to us. Here are the results of processing them through our algorithm, as well as an analysis of these results.

Uncompressed / not resized images (12 images)

These are the images we trained our classifier on, so it is where we expect to achieve good results.

  • 4 tampered images out of 6 were detected, with the right tampered zone. This gives a proportion of 33% false negatives (compared with the 28.57% on our training images)
  • 6 untampered images out of 6 were detected. This gives a proportion of 0% false positives (compared with the 4.29% on our training images)

Of course, it is not very accurate to draw statistics from such a small set of data, but the results are in line with what we expected from the training images. We can also note that the 2 tampered images we failed to detect have either a small tampered zone (69x69 pixels, the order of magnitude of the smallest zone we can detect) or a very large uniform zone (black bands on the sides of the image). It is understandable that our algorithm does not perform well on these images, since they violate the assumptions it relies on.

JPEG compressed images (6 images)

For these images, we detected correctly:

  • 0 tampered images out of 3
  • 3 untampered images out of 3

There are two reasons for this result. Our classifier does not perform well even on most high-quality JPEG images, partly because we did not train it on JPEG images. These images introduce a lot of additional noise into our block maps (cf. Figure 13), so the classifier labels tampered images as untampered because there is too much surrounding noise. However, if we trained our classifier in the same way but on lightly compressed JPEG images, it would be able to take this surrounding noise into account. In this case, the part of our pipeline that broke is the interpretation of the map, not its synthesis: looking at the map, we can still visually see where the tampered zone is. This means our classifier could be made to work on lightly compressed JPEG images with minimal additional work. We did not do it here because we wanted to see how the same algorithm would perform on different kinds of images.

Figure 13: Map in the case of a lightly compressed image. The amount of noise is much higher than in the uncompressed case, but the map-generation steps still worked properly: we can identify that the fake zone is in the upper-left corner.

For more heavily compressed images, we do not expect the EM and clustering part to work because the correlations between pixels and their neighbors will be destroyed.

Resized images (2 images)

For these images, we detected correctly:

  • 0 tampered image out of 1
  • 1 untampered image out of 1

These images are out of the scope of our classifier: we based our analysis partly on the peaks of the Fourier transform, which occur in the case of a Bayer pattern. If the image is resized or rotated, the "peak" data are useless, and classifying the image properly becomes almost impossible.

Conclusions

We built a classifier that works on images satisfying a few assumptions, and we showed that it performs well when these constraints hold: we could detect tampered images and even show exactly where they were tampered. Some constraints are not a real problem; for example, we could adapt the classifier fairly easily to high-quality JPEG images, although we would need a version of our algorithm with modified parameters. But other constraints are more limiting and are tied to the nature of our algorithm: it is, for instance, very difficult to notice a very small tampered zone, and probably impossible to detect arbitrarily small ones. Getting a perfect classification with a single camera-forensics method is an illusion; this method is best combined with others for a better chance of detecting the tampered parts of an image.

References

1. Popescu, A. C.; Farid, H., "Exposing Digital Forgeries in Color Filter Array Interpolated Images", IEEE Transactions on Signal Processing, vol. 53, no. 10, pp. 3948-3959, Oct. 2005
2. Farid, H., "Image Forgery Detection: A Survey", IEEE Signal Processing Magazine, pp. 16-25, Mar. 2009

Appendix I - Code

Code

File:CodeBoinJou.zip

Appendix II - Work partition

We did most of this project together as a team, although most of the clustering work was done by Tiffany, while the classification part was mainly done by Jean-Baptiste. The rest was distributed very evenly between us.