WangMoreno

From Psych 221 Image Systems Engineering

Introduction

Over the past decade, the market for compact digital cameras has slowly eroded in light of improvements in mobile phone camera technology. However, consumers in the market for a new mobile phone face a difficult challenge when attempting to compare the quality of different cameras. While other components such as battery and processor have relatively clear-cut metrics, such as hours of battery life and frequency/number of cores, there are no such metrics accurately representing camera quality of phones across the market.

Traditionally, consumers looked to the megapixel count as a measure of image quality. In the past, with many digital cameras in the 1-2 megapixel range, this metric could mean the difference between a pixelated photo print and a clear one. However, megapixel count is a very poor measure of the perceived quality of the images produced by a camera; it does not take into consideration many important camera qualities such as color accuracy, signal to noise ratio (SNR), or sharpness. Additionally, with many manufacturers pushing higher and higher megapixel counts, most of today's megapixel counts run well in excess of the amount required for detail in printing or viewing. And yet, because many of these same manufacturers are not pairing these with higher quality or larger sensors, this metric has become poor even as a description of image detail and printable size alone.

Background

Image Quality Metrics

Recognizing the limitations of existing metrics, the International Imaging Industry Association (I3A) started the Camera Phone Image Quality (CPIQ) initiative; its development is detailed in the paper Development of the I3A CPIQ spatial metrics[1]. The goal of the initiative is to develop a set of camera metrics that correspond to subjective perceived image quality. To that end, metrics have been developed that measure the spatial resolution, noise, and color accuracy of mobile phone cameras. These metrics attempt to capture the differences discernible to humans viewing images on a computer display or paper printout, while ignoring qualities that do not correspond well to perceived quality.

In our project, we focus primarily on color accuracy. The measure of color accuracy we use is the International Commission on Illumination (CIE) color-difference metric ΔE* (Delta E). In the image below, the chromaticity diagram of the CIE XYZ color space can be seen. The XYZ color space was developed in 1931; Y represents luminance, and combinations of X and Z span all possible chromaticities. In 1976 the CIE derived the Lab (L*, a*, b*) color space from XYZ with the intention of being more perceptually uniform. In Lab space, colors are again represented by three dimensions: L* for lightness, and a* and b* for two opposing color dimensions. These values can be computed directly from the corresponding XYZ values.
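
As an illustration of that conversion (our experiments relied on ISET's routines; the code below is only a sketch, and the D65 reference white values are an assumption for the example), a single XYZ triplet can be converted to L*a*b* as follows:

```matlab
% Minimal sketch: convert a CIE XYZ triplet to CIE 1976 L*a*b*.
% The reference white (approximate D65, an assumed value for this
% example) should match the viewing condition in practice.
function lab = xyz2labSketch(xyz)
    white = [95.047 100.0 108.883];          % assumed D65 white point (X, Y, Z)
    r = xyz ./ white;                        % normalize by the reference white

    % Piecewise cube-root function from the CIE definition
    d = 6/29;
    f = @(t) (t > d^3) .* t.^(1/3) + (t <= d^3) .* (t/(3*d^2) + 4/29);

    fr = f(r);
    L = 116*fr(2) - 16;                      % lightness from Y
    a = 500*(fr(1) - fr(2));                 % red-green opponent axis
    b = 200*(fr(2) - fr(3));                 % blue-yellow opponent axis
    lab = [L a b];
end
```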

The metric ΔE* represents the Euclidean distance between two colors in the Lab color space, calculated from their L*, a*, and b* values via the following formula:

\Delta E^{*}_{ab} = \sqrt{(L^{*}_{2} - L^{*}_{1})^{2} + (a^{*}_{2} - a^{*}_{1})^{2} + (b^{*}_{2} - b^{*}_{1})^{2}}
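
A minimal MATLAB sketch of this calculation is shown below; ISET's metric routines were used for the measurements reported here, so this is purely illustrative.

```matlab
% Minimal sketch of the CIE 1976 color difference (deltaE): the
% Euclidean distance between two [L* a* b*] row vectors.
function dE = deltaE76Sketch(lab1, lab2)
    dE = sqrt(sum((lab1 - lab2).^2));
end

% Example (illustrative values): a patch rendered slightly off its target
%   deltaE76Sketch([52.1 -4.3 12.0], [50.0 -2.1 14.5])   % ~3.9
```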

In this project, we will explore how effective ΔE* is as a measure of color accuracy to compare between different cameras. We will also look at the subjective appearance of various images as viewed by humans on a display, and how well a camera's ΔE* value correlates with perceived image quality.

Color Accuracy in Modern Digital Cameras

Correct capture of colors in a digital camera is complicated by a couple of factors:

First, the apparent color of the light radiating from an object is partly determined by the illuminant's spectrum and partly by the properties of the object itself. For example, a white object can be made to appear to be any color by illuminating it with different light sources. And even among just the natural (e.g., direct sun, blue sky) sources of light, there is a wide variety of color biases.

There is ample research showing that the human visual system is capable, to a great extent, of adapting to different illuminants and substantially neutralizing the resulting color offset in a given scene.

In order to accurately capture a scene for later rendition on unknown media under unknown illumination, one of the tasks a camera system must perform during color conversion (the translation of data from the sensor's color space, determined by its color filters and sensor technology, to that of the output image, usually the sRGB standard) is to translate the observed color values to what they would look like if the scene were illuminated with a standard illuminant spectrum (e.g., D65). This requires that we either know or estimate the illuminant incident on the scene, in order to neutralize its color tint relative to the chosen standard. Neglecting this step results in a color cast on the image, often described as a lack of color balance and generally perceived as unfavorable by human observers.

Auto White Balance

Accurate color conversion in real-world conditions can be tricky because, unless it is told explicitly by the user or learns through separate hardware or procedures, the camera fundamentally cannot discern to what extent the perceived chromaticity in the scene is caused by the illuminant spectrum. Yet, for ease of use, modern consumer cameras are generally designed to operate without explicit knowledge of the illuminant. There is significant literature on methods for dealing with these situations.

Most modern cameras offer an auto white balance processing step for these situations. One of the most popular methods is based on the heuristic assumption that, in a properly color-balanced image, the chromaticity averaged over all pixels should come out approximately neutral (gray). Under this assumption, the image's actual average chromaticity is computed, and a single global correction is applied so that the average becomes neutral. In a large portion of real-world photographs, this kind of automatic white balance has a positive effect with no user involvement.
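
The sketch below illustrates this gray-world heuristic in its simplest form; it is a toy version for illustration, not ISET's implementation.

```matlab
% Toy gray-world sketch (not ISET's implementation): assume a properly
% balanced scene averages to neutral gray, and scale each channel so
% its mean matches the overall mean.
function balanced = grayWorldSketch(rgb)
    % rgb: H x W x 3 linear image
    channelMeans = squeeze(mean(mean(rgb, 1), 2));  % per-channel means (3 x 1)
    gains = mean(channelMeans) ./ channelMeans;     % push each channel mean toward gray
    balanced = rgb;
    for c = 1:3
        balanced(:,:,c) = rgb(:,:,c) * gains(c);
    end
end
```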

Under very biased or dim illuminants (e.g., lacking sufficient energy in a given range of visible wavelengths), or due to poor exposure saturating or underexposing sensor elements, color balance can be further complicated by noise or visible quantization caused by insufficient energy in certain parts of the spectrum (consider, for example, the color of objects illuminated solely by orange sodium lamps in a nighttime urban scene). In these cases, even knowledge of the illuminant is no guarantee of proper color balance.

Methods

Experimental Setup

We used ISET for full camera system simulation. Aside from its data structures and built-in modeling, we leveraged several of the sample scripts, particularly those for Macbeth chart delta E measurement, multispectral scene loading, and saving .tif files, as starting points, and developed our own similar scripts to automate our experimentation.

For scenes, we started with the default Macbeth chart, which we used both for obtaining conversion matrices and for measuring deltaE under a given set of conditions.

Later we downloaded and used several multispectral scenes, and leveraged ISET's ability to change a scene's illuminant to simulate photographing the same scene under different conditions. We used several of the built-in illuminant profiles, including D65, D50, Tungsten, and Fluorescent, as they represent the most common lighting conditions in real-world use.

We simulated a relatively standard camera with simple optics, while varying many sensor and processing parameters to explore their effect on the color accuracy of the resulting images.

The following table shows the default configuration as well as significant modifications we tried:

Scene and Camera Configuration
| Parameter Name | Default Value | Additional Values Tried |
| Scene Illumination | 75 lux | 0.1 lux; higher values on outdoor daylight scenes |
| Scene Distance | 10 m | N/A |
| Illuminant Type | D65 | D50, Tungsten, Fluorescent |
| Sensor Pixel Size | 2.8 μm | 1 μm, 5 μm |
| Aperture (F-stop) | f/4 | N/A |
| Exposure Time | Auto | N/A |
| Focal Length | 0.02 m | N/A |
| White Balance | None | Gray world, White world |
| Bayer Pattern | Standard RGGB | Various (no deltaE effect) |
| Demosaic Method | Nearest neighbor | Various (no deltaE effect) |

Any of these which resulted in significant deltaE variation will be mentioned in the results.

We used the Macbeth-based CCM calculation method to simulate a "best possible" single-step conversion from sensor space to sRGB, and then applied these calculated CCMs when processing other images under each corresponding set of simulated conditions. When exploring the effect of conversion with a mismatched or unknown illuminant, we also evaluated the built-in auto white balance algorithms.
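
As an illustration of what such a single-step conversion involves (ISET provides its own routines for this step; the code below is only a sketch), a 3x3 matrix can be fit by least squares from the sensor responses of the 24 Macbeth patches to their target values:

```matlab
% Illustrative least-squares fit of a 3x3 color correction matrix from
% Macbeth chart data (ISET has its own routines for this step).
%   sensorRGB : 24 x 3 mean sensor responses, one row per patch
%   targetRGB : 24 x 3 desired linear RGB values for the same patches
function ccm = fitCcmSketch(sensorRGB, targetRGB)
    ccm = sensorRGB \ targetRGB;   % least-squares solution of sensorRGB * ccm = targetRGB
end

% Applying it to an H x W x 3 image:
%   corrected = reshape(reshape(img, [], 3) * ccm, size(img));
```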

Finally, we saved the resulting images as .tif files for our subjective evaluation, again leveraging ISET's included examples of code for this purpose.

Results

The first results section below shows our results from measuring optimal delta E values on a Macbeth chart under various scene and camera configurations. The following section looks at real-world scenes processed under various combinations of color correction matrices, illuminants, and white balance settings, and provides a subjective analysis of the result.

Optimal DeltaE Value vs. Scene and Camera Configuration

In the following tables, we show the optimal deltaE value as a function of sensor pixel size, scene luminance, and scene illuminant. This is the deltaE value measured on an image after being processed through an optimal color correction matrix.


Optimal DeltaE Values, 1μm Pixel Size
| Scene Luminance | CIE D65 | CIE D50 | Fluorescent | Tungsten |
| 0.1 lux | 17.20 | 15.54 | 12.31 | 13.32 |
| 75 lux | 3.95 | 4.23 | 2.92 | 4.88 |

Optimal DeltaE Values, 2.8μm Pixel Size
| Scene Luminance | CIE D65 | CIE D50 | Fluorescent | Tungsten |
| 0.1 lux | 6.76 | 7.21 | 8.02 | 9.00 |
| 75 lux | 3.86 | 4.20 | 2.88 | 4.83 |

Optimal DeltaE Values, 5μm Pixel Size
| Scene Luminance | CIE D65 | CIE D50 | Fluorescent | Tungsten |
| 0.1 lux | 4.66 | 5.04 | 4.92 | 6.17 |
| 75 lux | 3.96 | 4.22 | 2.84 | 5.04 |


As expected, we see that in all cases the low scene luminance of 0.1 lux results in less color accuracy than a luminance of 75 lux. The lower light level yields far fewer photons and more noise in the final image, and thus less color accuracy.

We also see that reducing the size of the sensor pixel reduces color accuracy in the low-light case. This makes sense because the smaller the individual pixels, the fewer photons hit each individual pixel. With a larger pixel, more photons are collected, giving better accuracy. When the light level is only 0.1 lux, this difference is significant; for D65 light, we see the deltaE improve from 17.20 to 4.66 when moving from a small pixel to a large pixel. For normal light levels, however, we observed no real difference between the pixel sizes. Even with a small pixel size, enough photons were detected per pixel that the color accuracy was more or less optimal.
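
As a rough back-of-the-envelope check of this reasoning (an illustration only, assuming shot-noise-limited pixels and ignoring fill factor and read noise; these numbers are not ISET outputs):

```matlab
% Rough illustration: photon count per pixel scales with pixel area,
% and shot-noise-limited SNR scales with the square root of the count.
pixelSizes   = [1.0 2.8 5.0];                    % pixel pitch in microns
relativeArea = (pixelSizes / pixelSizes(1)).^2;  % ~[1 7.8 25]
relativeSNR  = sqrt(relativeArea);               % ~[1 2.8 5]
% A 5 um pixel collects ~25x the photons of a 1 um pixel at the same
% exposure, for roughly 5x better shot-noise-limited SNR.
```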

The illuminant with which we saw the best deltaE values was fluorescent lighting. The daylight spectra (D65 and D50) came next, with D65 performing slightly better than D50. Tungsten light tended to be the least color accurate, having the highest deltaE values in almost all cases.

The following two charts show the spectral power distribution of our illuminants. The first shows four CIE standard illuminants, including D65 and D50. The latter shows the spectrum of tungsten (incandescent) light in red, and fluorescent light in blue.

The wavelengths corresponding to visible light for humans range from about 390 nm to 700 nm. We can clearly see that the spectral power distribution of fluorescent light falls almost entirely within this range. While the spectrum is clearly not uniform, we are able to correct for this via a color correction matrix. As a result, the best (lowest) deltaE achievable under fluorescent light is better than for the other illuminants.

The spectral power distributions of the two CIE standard illuminants, D65 and D50, are clearly more spread out than that of fluorescent light, but both are still mostly concentrated within the visible range. The differences between the two are fairly minor: D65 has slightly more power at wavelengths below 560 nm, while D50 has slightly more above 560 nm. D65 has a slightly higher proportion of its spectrum within the visible range, resulting in a slightly better optimal deltaE.

In contrast to the other illuminants, tungsten/incandescent light has a spectrum heavily biased toward longer wavelengths; most of its power falls outside the visible range. Thus, many of the photons reaching the sensor correspond to light that would not even be visible to a human viewing the same scene. This results in a higher optimal deltaE than the other illuminants in most cases, corresponding to less color accuracy even in well-lit scenes.

The one major exception to the above observations was the case of the small sensor pixel combined with the low light level of 0.1 lux. In this case, tungsten light had much better deltaE color accuracy than the CIE daylight illuminants, and D50 also performed slightly better than D65. This may be explained by the energy-wavelength relationship of photons: a photon's energy is inversely proportional to its wavelength, so light concentrated at longer wavelengths delivers more photons for the same optical power, and the photometric (lux-based) normalization of scene brightness further discounts long-wavelength light. With this combination of pixel size and light level, the number of photons absorbed by any given pixel is very low, so any additional photons directly benefit color accuracy. We observed above that the spectrum of tungsten/incandescent light is heavily skewed toward longer wavelengths, and D50 also has more of its power at longer wavelengths than D65; at a fixed illuminance, these spectra therefore deliver more photons to the sensor, resulting in better color accuracy in this scenario.
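
The wavelength-energy relationship behind this argument can be checked with a quick calculation (illustrative wavelengths only; this is not part of the ISET simulation):

```matlab
% Photon energy E = h*c/lambda: longer wavelengths carry less energy per
% photon, so a fixed amount of optical power at 650 nm contains more
% photons than the same power at 450 nm.
h = 6.626e-34;               % Planck constant (J*s)
c = 2.998e8;                 % speed of light (m/s)
lambda = [450 650] * 1e-9;   % blue vs. red wavelengths (m)
photonsPerJoule = lambda / (h*c);
ratio = photonsPerJoule(2) / photonsPerJoule(1);   % ~1.44: ~44% more photons per joule at 650 nm
```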


We also experimented with the Bayer pattern and the demosaic method, as well as with other illumination levels both above and between the two main levels cited, but none of these variables resulted in significantly different deltaE values.

DeltaE vs. Viewer Opinion of Image Quality

In this section we examine the relationship between deltaE values and human perception of color accuracy. The data from the previous section provides our deltaE metrics; here we use the same virtual camera, with the settings corresponding to each deltaE measurement, on a number of real-world scenes, and then compare the results to each other and to the metrics.

Pixel Size

We saw above that the color accuracy of small pixels (1 μm in our case) suffers more acutely under low-light conditions. The images below visualize the low-light condition for 1 μm, 2.8 μm, and 5 μm pixels (in each case the number of pixels is adjusted for equal sensor size, so that the overall optical zoom, field of view, etc. are the same):

There is a clear loss of contrast as the pixels get smaller, along with a clear loss of color saturation (especially visible in the blues and yellows), which may correspond to a sharper loss of SNR on the blue-filtered pixels of the sensor. These scenes are under tungsten light, which is already biased toward having less energy in the blue part of the spectrum, and this is exacerbated here by the very low illumination level.

In this case we can say that a higher deltaE does indeed correlate with our perception of less accurate color rendition; however, this also argues for examining the deltaE of individual hues separately, as blues and yellows are clearly more heavily affected here than greens and reds.

When we move to better lighting levels, this difference in color accuracy between smaller and larger pixels disappears:

There are no significant visual color accuracy effects, and the deltaE metric is also similar.

Effect of Poor Estimation / Mismatch of Illuminant

Now we examine whether human perception agrees with deltaE in cases where we have good illumination (75 lux) and a normal (2.8 μm) pixel size, but vary whether the camera applies the correct color conversion matrix (CCM) or a different one during image processing. This simulates the everyday situation in which a camera is set up to expect one kind of illuminant but the images are captured under a different one (a common problem when moving between indoor and outdoor settings, or when using casual cameras whose users cannot or do not specify the illuminant).

For the scenes below, we list the actual illuminant, then the camera's assumed illuminant, and then the deltaE value. All of these have auto white balance set to 'none' and are otherwise taken with a camera setup identical to the earlier deltaE measurements, except that we adjusted the sensor's FOV to match the FOV of the scene data:

Indeed, the images that appear "properly" processed show small deltaE, and the improper processing results in clearly higher deltaE values.

Within each of these groups, however, it is hard to decide whether the differences in deltaE are warranted, even when examining the large original files and considering different scenes (including many images not shown here). Among the "good" images, with deltaE values of roughly 3, 4, and 5, there is very little visible difference, let alone preference. Among the "bad" images, the deltaE range spans 17.9 to 24.6, and it is hard to make a case, from human opinion, that the bottom center image is the worst of the lot. Another point worth investigating is whether the bottom left image (Tungsten illuminant processed as D65) is perceived as "less bad" simply because of more frequent human experience with warm-hued images of this kind (intentional or not).

Examining some outdoor-illuminant images (D65 and D50) with the same style of comparison gives an even stronger sense of how subtle the color variation is and how difficult human preference is to quantify. These were taken under an otherwise identical setup to the above.

Here we have even more difficulty both noticing the difference, and having a clear preference among the choices. It's not clear that the deltaE differences here would correlate cleanly to human opinion on the accuracy or desirability of the results.

Auto White Balance

Now we will examine briefly the effect of the two methods of auto-White-Balance included in ISET: grayworld and whiteworld.

Auto white balance, as mentioned earlier, is a way to try to overcome a mismatch in the CCM, due either to lack of knowledge of the illuminant or to a non-standard mix of illuminants in the scene. The two methods are similar: gray world corrects based on the assumption that the average color in the scene should usually be a shade of gray, while white world tries to bring the average highlight in the scene toward white.
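
As a rough counterpart to the gray-world sketch shown earlier, the following toy code illustrates the white-world idea; it is not ISET's implementation, and the "brightest 1%" definition of the highlight is an arbitrary choice for the example.

```matlab
% Toy white-world sketch (not ISET's implementation): treat the
% brightest ~1% of pixels as the scene highlight and scale each channel
% so the highlight's average becomes neutral.
function balanced = whiteWorldSketch(rgb)
    pixels     = reshape(rgb, [], 3);                % N x 3 list of pixels
    brightness = mean(pixels, 2);                    % crude brightness proxy
    sorted     = sort(brightness, 'descend');
    cutoff     = sorted(max(1, round(0.01 * numel(sorted))));   % top ~1% threshold
    highlight  = mean(pixels(brightness >= cutoff, :), 1);      % average highlight color
    gains      = max(highlight) ./ highlight;        % map the highlight toward white
    balanced   = rgb;
    for c = 1:3
        balanced(:,:,c) = rgb(:,:,c) * gains(c);
    end
end
```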

We obtained deltaE values in this case by:

  • Using the "wrong" CCM for an image of the Macbeth chart, producing a color-tinted image, and measuring the deltaE of the result.
  • Enabling the auto white balance setting on this image and measuring deltaE again.

For a subjective judgement, we then compare this data to the same mis-processing + AWB performed on a real-world scene.

In both AWB cases, the face image is arguably only slightly better, and far from "accurate" compared to the properly processed image (bottom right). On the Macbeth chart, it is hard to decide between the two images in the top row (both have inaccurate colors with different average hues), though one might agree that the bottom left chart is better than both of the top row and not as good as the proper processing.

To highlight our heightened human sensitivity to skin tones, we can examine another scene under the same 4 cases:

In this case, human reaction to the poor processing, as well as to the results of the auto white balance, would arguably be less disapproving than for the portrait above, though the difference is still noticeable.

It is worth noting that, although we can use our deltaE values as a rough guide when considering other scenes, they are probably an inaccurate indicator of the expected color accuracy under AWB. Because auto white balance is a content-dependent transformation on each image (the applied offset depends on the average of the existing pixels), a standard deltaE measurement taken post-AWB is really only valid for that particular image of the MCC. More accurate deltaE calculation on different scenes would require an experiment with more complex processing, which we did not undertake.

Conclusions

We began our project by performing initial testing using a Macbeth chart and various camera parameters. Through this process we learned that:

  • Sensor pixel size has a significant effect on color accuracy at low light levels.
  • The optimal deltaE value varies between illuminant types, with fluorescent light yielding the lowest (best) values, most likely because its spectrum is concentrated within the visible range.


Afterwards, we moved to real-world scenes and applied various combinations of settings, producing images both good and bad. By looking at the output images and comparing their subjective qualities and measured deltaE values, we determined that:

  • Large differences in deltaE seem to correlate decently well with "good" vs. "awful" photos.
  • Finer-grain differences (e.g., <10) don’t always match subjective judgement / preference. Even finer differences (e.g., <5) are sometimes hard to spot depending on scene contents.
  • The Auto White Balance methods evaluated are an unreliable way to correct improper conversion, both subjectively and by deltaE measure.

Thus, we found that the deltaE color accuracy metric may be acceptable for coarse-grained use, perhaps by an image processor to detect that an altogether incorrect color correction matrix has been applied (e.g., a camera performing automatic illuminant detection by examining an MCC target under an unknown illuminant). However, for the intended CPIQ application as a metric for comparing different cell phone cameras, we find the deltaE metric insufficient. Among reasonably accurate images, we found little (and subjective) correlation between the deltaE value and our perception of the quality of the output image. As such, a consumer looking to compare multiple phone cameras whose deltaE values differ by less than about 10 cannot reliably use this metric to find the one whose color accuracy they prefer, especially given that many other variables, such as JPEG encoding and the quality of the device's display, may affect the experience of the photos in a more substantial way.

Future Work / What We Didn't Try

Some ideas that occurred to us as we worked on this, which we did not have time to pursue:

  • LEDs are becoming popular as indoor lighting; color spectra from common variants should be included in ISET, and their color accuracy performance should be evaluated. Colloquially, they are said to offer better color performance than fluorescent light, but what does that mean quantitatively?
  • Night photography vs. human night vision: what does color accuracy mean in this situation?
  • DeltaE after auto white balance: as mentioned, the content-dependence of the "grayworld" and "whiteworld" methods would require a more carefully designed experiment (the deltaE calculation uses the Macbeth chart scene, but we want to evaluate a white balance transform generated from a different scene).
  • Take a different approach: move the illuminant slowly along (for example) the black-body radiation curve and find the points of significant difference in noticeability or human preference. What is the corresponding difference in deltaE? Also, evaluate the relative color inaccuracy of images taken along the black-body curve versus those taken under synthetic illuminants that deviate significantly from it.

References

Baxter, Donald; Frederic Cao; Henrik Eliasson; Jonathan Phillips. "Development of the I3A CPIQ spatial metrics." Web. 8 Mar 2014. <http://proceedings.spiedigitallibrary.org/data/Conferences/SPIEP/64097/829302_1.pdf>.

The ISET toolbox, particularly the scripts related to color accuracy and camera processing.

Thanks to Prof. Farrell as well for further guidance.

Appendix I

Multispectral Scenes Used

For the deltaE calculations, we used the basic Macbeth Color Chart scene from ISET. For the subjective evaluations, we obtained several other multispectral scenes, both indoor and outdoor, from the available archives, noted below.


Scripts

s_metricsMacbethDeltaE: The original s_metricsMacbethDeltaE script, with various scene and camera parameters adjusted in order to produce color correction matrices and deltaE values for a variety of configurations.

makeCamera: a script to instantiate a camera model (optics, sensor, and processor). The goal is to have a consistent camera setup in which the parameters relevant to the experiment can be easily varied through simple inputs.

makeImage: a script to load a multispectral scene of the user's choice, set the illumination type and intensity, call makeCamera with the chosen parameters, set up the camera's conversion using one of our pre-computed CCMs, and "take a picture". It then saves the image as a .tif file with a distinctive name. The parameters are chosen by setting string variables in MATLAB's workspace before calling the script. It can also measure deltaE if the chosen scene is a Macbeth color chart. It depends on the (included) .mat files containing the pre-calculated CCM matrices for each pixel size / illumination combination, and on being able to find the scene files mentioned above.

This last script can then be called from inside a (nested) loop to produce many variations of images without user intervention.
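
For example, a driver along the following lines can sweep illuminants and pixel sizes; the workspace variable names are placeholders, since the names actually expected by makeImage are not listed in this report.

```matlab
% Illustrative driver: sweep illuminant and pixel size, producing one
% .tif per combination. The workspace variable names are placeholders;
% the real names are defined in makeImage itself.
illuminantChoices = {'D65', 'D50', 'Tungsten', 'Fluorescent'};
pixelSizeChoices  = {'1um', '2.8um', '5um'};
for i = 1:numel(illuminantChoices)
    for j = 1:numel(pixelSizeChoices)
        illuminantName = illuminantChoices{i};   % placeholder variable name
        pixelSizeName  = pixelSizeChoices{j};    % placeholder variable name
        makeImage;                               % script reads its settings from the workspace
    end
end
```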

File:Winter14 deltaE scripts.zip

Further Simulated Images

Beyond the images used in this report, we examined many other simulated camera exposures; however, due to the size and number of the .tif files, and because they can easily be regenerated with the above scripts, they are not included here.

Appendix II

Edward Wang - Intro, Background I, calculating MCCs and deltaE with Macbeth Chart, Results I.

Camilo Moreno - Background II/III, generating simulated real-world images with MCCs/white balance, Results II.