Smartphone Camera Quality

From Psych 221 Image Systems Engineering

Introduction

With the increasing quality of smartphone cameras and software, everyone, from professional photographers to amateurs taking selfies, is using their smartphone as the primary device to capture images. In recent years, consumer desire to take and share high-quality photos with a device as portable as a smartphone has exploded, as evidenced by the popularity of image capture and sharing applications like Snapchat and Instagram and the ever-increasing need for photo storage space through services like Apple iCloud and Google Photos. Following this trend, there is increasing interest in quantifying image quality and in how consistently that quality is controlled across the smartphones that leave the manufacturing plant. The DxOMark rankings are treated as a reliable metric for comparison, but the complexity of the tests required to generate these rankings limits the sample size to only a few units of each phone.

For our project, we would like to assess the unit-to-unit variation in image quality among smartphones in the real world. In particular, we are curious whether there are variations based on smartphone manufacturer, model, price, firmware/OS version, length of ownership/use, and/or physical condition. We believe that metrics tied to the physical properties of the camera may vary from unit to unit, so chromatic aberration, distortion, and/or color metrics might show some variation in our preliminary results. Smartphones are now approaching $1000, with almost all recent improvements devoted to the camera, so we believe that this is an important and technologically relevant area to explore for our final project.

Background

Smartphone camera quality is an emerging area of interest for researchers and consumers. There are two primary benchmarks for smartphone camera quality evaluation: the DxOMark ratings [1] and the IEEE Standard [2].

DxOMark

DxOMark is a company that performs camera and lens image quality assessments and then provides ratings for consumers. In addition to digital camera sensors and lenses, DxOMark reviews mobile phone cameras and ranks them based on a variety of measurements. On a smartphone, they analyze the performance of the imaging pipeline in its entirety, including lens, sensor, camera control, and image processing. Their protocol includes a combination of lab testing and perceptual evaluation of images taken in the lab and in a variety of indoor and outdoor scenes. DxOMark reports sub-scores in several different categories in addition to an overall score, which is used to rank the smartphone cameras. DxOMark reviews both photos and videos captured on the smartphones. For photos, their evaluation metrics include:

  • Exposure and contrast, including dynamic range, exposure repeatability, and contrast
  • Color, including saturation and hue, white balance, white balance repeatability, and color shading
  • Texture and noise
  • Autofocus, including AF speed and repeatability
  • Artifacts, including softness in the frame, distortion, vignetting, chromatic aberrations, ringing, flare, ghosting, aliasing, moiré patterns, and more
  • Flash
  • Zoom at several subject distances
  • Bokeh

For videos, their evaluation metrics include:

  • Exposure
  • Color
  • Texture and noise
  • Autofocus
  • Artifacts
  • Stabilization

DxOMark is seen as the industry standard, and its rankings are referenced in the popular press and industry publications, including Forbes, The Verge, Wired, and TechRadar.

IEEE Standard

IEEE Std 1858-2016: Camera Phone Image Quality provides a detailed specification of test conditions and apparatus for evaluating smartphone image quality. The standard includes protocols for lab-based assessments as well as subjective perceptual evaluations. The evaluation metrics considered include spatial frequency response, lateral chromatic displacement, chroma level, color uniformity, local geometric distortion, visual noise, and texture blur.


Methods

Data Collection

For our project, we recruited participants to capture images with their smartphones on Stanford’s campus. Our objective was to have a broad sample of smartphone cameras, so we collected data at several different locations on campus over a period of two weeks. Our data collection locations included the Stanford Bookstore, the Graduate School of Business, the SCIEN Image Systems Engineering Seminar, the Psychology 221 class, and the SCIEN Affiliates Meeting.

Collecting data at the Stanford Bookstore

Our objective was for the data collection procedure to be controlled and repeatable. We created a series of three image targets and had participants take multiple photos of each of them. To provide a consistent illuminant, we borrowed a light booth, brought it to each data collection location, and used the daylight illuminant for every image captured. We covered the front of the light booth with a piece of cardboard and cut a small opening to point the phone camera through in order to minimize stray light entering the light booth. We used a piece of Styrofoam mounted on the cardboard front cover to serve as a stand for the phones while participants were taking photos. While different phones have the camera at different locations, the stand ensured that all phones were held in the same horizontal alignment during image capture.

Light booth used to create consistent illuminant
Front cover of light booth with stand

We had participants capture three consecutive photos of each of three different image targets. All photos were taken with the phone in landscape orientation, using the following settings:

  • Photo mode
  • Flash, live, and filters off
  • Zoom out
  • Highest resolution
  • No image compression
  • All other settings (HDR, etc.) to auto

The image targets were mounted on pieces of cardboard that we cut to match the dimensions of the rear panel of the light booth. For each round of photos with a given image target, we removed the light booth’s lid, placed the piece of cardboard (with the image target on it) flush against the rear panel of the light booth, and used binder clips to fasten it to the upper edge of the rear panel. We then placed the lid back on the light booth and had the participant take three photos. After these three photos were taken, we removed the light booth lid and replaced the cardboard panel with another panel bearing a different image target.

Light booth with the lid removed

Once participants had captured all nine images, we directed them to complete a Google form. On each face of the light booth, we posted a sign that included the URL for the form and a QR code that would take users directly to the form. The form included the following questions:

  1. How long have you had your smartphone?
  2. What brand of smartphone do you have?
  3. What model of smartphone do you have?
  4. What operating system version is your phone running?
  5. Image upload
  6. If you would like to be entered into a raffle for a $50 Amazon gift card, please enter your email

We encouraged participants to complete the form while they were with us at the light booth, since otherwise they might forget to submit it after the fact. We used the form responses to track our progress and participant count, and it does not appear that many participants, if any, took the photos but failed to complete the form. We spent most of our data collection time at events where attendees are interested in image systems engineering and our project topic, as we believed they would be more willing to participate and more likely to complete the process once they started. Over two weeks, we had 51 different participants take photos, with a total of 472 images uploaded (some people submitted additional photos beyond the required nine).

After we had received all of these form responses, we exported the metadata (responses to the questions) into a spreadsheet and the image files into a folder on our local machines for further processing. For the metadata, we created a unique ID for each participant and used that to associate their images with the correct smartphone model for analysis. A summary of the metadata is included in the Results section below.

For each image target, we computed values to use as our evaluation metrics and compared these values across images captured by the same model of smartphone.

Image Targets

Each cardboard panel has the image target mounted on it, surrounded by a rectangular border made of bright yellow duct tape. We chose brightly colored tape so that it would be clearly visible to participants: if they could see all four sides of the yellow rectangle, they knew they were capturing the entire area of the image target. The three image targets were as follows:

  • Color checker + grey card: One X-Rite ColorChecker Classic and one 5.3 in x 7.28 in Anwenk 18% grey card affixed to the cardboard panel. Both were ordered online from Amazon.com.
  • Grid of dots: A grid of 14 x 22 dots printed on 11 in x 17 in white paper at 1200 dpi. Each dot has diameter 0.12 in and 0.39 in spacing between dots. The dots were located inside of a 10 in x 16 in box with 0.5 in margins on all sides, with a blue outline of width 1 pt. The image target was created in PowerPoint and is attached in Appendix I.
  • Grid of lines: A 10 in x 16 in grid of squares, each with side length 0.5 in. The lines are black and of width 1 pt. The grid has 0.5 in margins on all sides and is printed on 11 in x 17 in white paper at 1200 dpi. The image target was created in PowerPoint and is attached in Appendix I.
Color checker and grey card
Grid of dots
Grid of lines

Image Processing

Image Standardization

Once all photos were collected and uploaded, we began processing the images. Even though we did our best to control the exact image that each camera should see, varying resolutions, fields of view, and even slight changes in tilt based on how heavily the photographer pressed the camera against our light booth introduced a surprising amount of variation between the pictures. Since it would have been a poor use of time to manually crop roughly 500 images, the first thing we had to do was standardize each of the samples using the following process (a minimal MATLAB sketch of the pipeline is shown after the figures below):

  • Rotating the photo 2 degrees counterclockwise – Since Matlab is only capable of cropping photos into perfect rectangles, it was important to find out where the corners of the target region existed. Built-in Matlab functions (discussed next) are capable of detecting where edges exist in an image, but these edges are stored as arrays of 2-dimensional points. Since Matlab would not know what shape these points were forming, we extracted the maximum and minimum X and Y coordinates in order to identify which points marked the corners of the rectangle. Since the images were imperfect, there were cases where the extreme values fell in unexpected locations, so the image was pre-rotated to ensure that the top left corner was located at the minimum X coordinate. While rotation may have introduced error, this experiment compares differences between similar phones, so the error would be present, but consistent, across all photos.
  • Detecting the edges in the photo – Matlab has quite a few built-in functions for detecting edges in a photo. We ended up having the most success converting the photo to grayscale (rgb2gray), computing a binary edge map using the Canny method with a threshold varying between 0.1 and 0.5 (edge), and grouping the resulting edge pixels together based on their locations (bwboundaries).
  • Determining the boundary of the critical region – The above step left us with a list of dozens of boundaries and the challenge of finding which one enclosed our intended features. We found a consistent edge location that was generally free from noise on the lower half of the left side of the image. At this point, the contrast between the tape and the cardboard (or between the black box and the white background for the lines and dots) was large enough that we could start at the left side of this line and move right pixel by pixel until a boundary was found. This boundary was taken to be the box around the target, and ideally it traced a complete box around the target.
  • Rotation and Cropping – Using the corners that were found earlier, the image was rotated by an angle that was calculated to line up the extremes in vertical and horizontal rows. After determining the length of the edges of the critical region, the image was then cropped down to a standard field.
Original image
Rotating the photo
Detecting edges
Determining boundary
Rotation and cropping
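
To make the sequence above concrete, below is a minimal MATLAB sketch of the standardization pipeline. The file name, the Canny threshold, and the rule for selecting the target boundary (the longest boundary is simply taken here) are simplifying assumptions; our actual scripts handled the additional edge cases described above.

  % Minimal sketch of the standardization pipeline. The file name, Canny
  % threshold, and boundary-selection rule are simplifying assumptions.
  img = imread('sample_photo.jpg');       % hypothetical input file

  % Pre-rotate 2 degrees counterclockwise so the top-left corner becomes the
  % extreme point we expect when scanning coordinates.
  pre = imrotate(img, 2);

  % Detect edges on a grayscale version of the photo.
  gray = rgb2gray(pre);
  bw   = edge(gray, 'Canny', 0.3);        % threshold tuned between 0.1 and 0.5
  boundaries = bwboundaries(bw);          % cell array of [row, col] point lists

  % Pick the boundary enclosing the target region. Here we simply take the
  % longest boundary as a stand-in for the left-to-right scan described above.
  [~, idx] = max(cellfun(@(b) size(b, 1), boundaries));
  target = boundaries{idx};               % N x 2 array of [row, col] points

  % Estimate the corners of the region from the extreme coordinates.
  rows = target(:, 1);  cols = target(:, 2);
  topLeft  = [min(rows), min(cols)];
  botRight = [max(rows), max(cols)];

  % Crop to the bounding box of the detected region. In the full pipeline, a
  % second rotation by the measured tilt angle happens before this crop.
  rect = [topLeft(2), topLeft(1), botRight(2) - topLeft(2), botRight(1) - topLeft(1)];
  standardized = imcrop(pre, rect);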

Grey Card

Since the exact RGB values of the 18% grey card are known [118 118 118], our goal was to evaluate how two things varied across different units of the same phone model:

  1. Accuracy: Was the average grey color similar to that of the known value?
  2. Consistency: Was there a large spread between the grey pixels across the field of view?

Since the images were now consistent, we were able to hardcode the locations (as percentages of the width and height, since different cameras had different resolutions) that corresponded to the dimensions of the grey card. Since each pixel had an RGB value, we were able to compute the ∆E value for each pixel relative to the known grey, using a white point that was measured in the light booth. However, when this proved to be too computationally demanding, we instead determined the average red, green, and blue values and computed the ∆E with those values alone. Due to the nonlinear properties of the CIELAB 1976 functions, we knew this value would not be the true average, but since our main objective was to compare differences between results (rather than absolute results), this would be a consistent offset for all samples. Therefore, we would still be able to compare ∆E values relative to each other. This average ∆E value was returned. In order to determine the spread of the variations, we paired together the minimum red, green, and blue values as well as the maximum ones. Since the final step in computing the ∆E is the Euclidean distance between the sample and known L*, a*, b* values, the value will always be positive. Therefore, the maximum ∆E created by either the minimum or maximum RGB values was returned to show the magnitude of the spread of color.
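
The sketch below outlines this grey-card computation in MATLAB. The crop fractions, reference grey value, and white point are placeholders standing in for the values we measured in the light booth, so it is an illustrative outline rather than our exact script.

  % Sketch of the grey-card Delta E computation. The crop fractions, reference
  % grey, and white point below are placeholders for our measured values.
  img = im2double(standardized);          % standardized image from the earlier sketch
  [h, w, ~] = size(img);

  % Grey-card region, hardcoded as fractions of the image size (placeholder values).
  region = img(round(0.55*h):round(0.90*h), round(0.55*w):round(0.90*w), :);

  % Average R, G, B over the grey-card region.
  avgRGB = squeeze(mean(mean(region, 1), 2))';    % 1 x 3

  % Reference grey and white point (the white point is a stand-in for the booth measurement).
  refRGB = [118 118 118] / 255;
  wp     = whitepoint('d65');

  % Convert both colors to CIELAB and take the Euclidean distance (CIE 1976 Delta E).
  labSample = rgb2lab(avgRGB, 'WhitePoint', wp);
  labRef    = rgb2lab(refRGB, 'WhitePoint', wp);
  deltaE    = norm(labSample - labRef);

  % Spread: Delta E for the per-channel minimum and maximum RGB values in the region.
  minRGB = squeeze(min(min(region, [], 1), [], 2))';
  maxRGB = squeeze(max(max(region, [], 1), [], 2))';
  spread = max(norm(rgb2lab(minRGB, 'WhitePoint', wp) - labRef), ...
               norm(rgb2lab(maxRGB, 'WhitePoint', wp) - labRef));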

Color Checker

As with the accuracy objective for the grey card, the color checker was used to compare the ∆E values between the average RGB values of each color swatch and the known RGB values of each of the colors under consistent lighting conditions.

Grid of Dots

The intention of this target was to evaluate chromatic aberration using a set of uniformly spaced and sized dots: a large difference in diameter between the dots at the center and the dots at the edges of the target would indicate a large amount of aberration. After the images were standardized, the boundaries were once again computed. These boundaries captured the pixels occupied by each of the dots. If a dot was too close to the edge of the image (within 2% of the width/height of the image), there was a chance that the full dot was not captured, so these dots were excluded from the computation. Each dot’s boundary was made up of roughly 200 (x, y) coordinates. Using the “polyarea” function, the area was computed for each of the dots and converted to an equivalent diameter. The difference between the largest and smallest diameters of these dots was returned (a sketch of this computation is shown below the figure).

Processing a grid of dots image
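
Below is a minimal MATLAB sketch of the dot measurement, assuming the standardized image produced by the earlier pipeline. The binarization step and the conversion from polygon area to an equivalent circular diameter are our own simplifications; the 2% border buffer matches the description above.

  % Sketch of the dot-size computation (assumes the dots are dark on white and
  % roughly circular; the 2% border buffer matches the description above).
  img  = im2double(standardized);
  gray = rgb2gray(img);
  bw   = ~imbinarize(gray);               % dots become foreground (true) pixels
  [h, w] = size(gray);
  buffer = 0.02;                          % ignore dots within 2% of the border

  boundaries = bwboundaries(bw, 8, 'noholes');
  diams = [];
  for k = 1:numel(boundaries)
      b = boundaries{k};                  % roughly 200 (row, col) points per dot
      rows = b(:, 1);  cols = b(:, 2);

      % Skip dots that might be clipped by the edge of the image.
      if any(rows < buffer*h | rows > (1-buffer)*h | ...
             cols < buffer*w | cols > (1-buffer)*w)
          continue;
      end

      % Area of the boundary polygon, then equivalent circular diameter.
      a = polyarea(cols, rows);
      diams(end+1) = 2 * sqrt(a / pi);    %#ok<AGROW>
  end

  % Metric: spread between the largest and smallest dot diameters.
  dotSpread = max(diams) - min(diams);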

Grid of Lines

The intention of this target was to evaluate the curvature of the grid lines at the edges of the image relative to the center by evaluating how the slope of each line changes. Much like with the dots, once the image was cropped, the boundaries were again found, this time with each small box treated as an individual boundary. Once again applying the 2% buffer, the “fit” function was used to determine the best linear fit for each box, yielding an equation from which the slope was easily extracted. The slopes of boxes at both edges were computed, and the “worst case scenario” value was taken. Taking into consideration the resolution of the photo, the slope was returned as the maximum number of millimeters the line would rise or fall over the span of one box (1/2 inch). A sketch of this computation is shown below the figure.

Processing a grid of lines image
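
Below is a minimal MATLAB sketch of the per-box slope measurement, again assuming the standardized image from the earlier pipeline. We use polyfit here as a base-MATLAB stand-in for the Curve Fitting Toolbox fit function mentioned above, take the worst-case slope across all measured boxes as a simplification of the edge-versus-center comparison, and express the result as millimeters of rise or fall over one 0.5 in (12.7 mm) box.

  % Sketch of the per-box slope computation. polyfit stands in for the "fit"
  % function; the 12.7 mm factor is the physical width of one grid box.
  img  = im2double(standardized);
  gray = rgb2gray(img);
  bw   = imbinarize(gray);                % each white cell of the grid becomes its own region
  [h, w] = size(gray);
  buffer = 0.02;

  boundaries = bwboundaries(bw, 8, 'noholes');
  slopes = [];
  for k = 1:numel(boundaries)
      b = boundaries{k};                  % boundary of one small box of the grid
      rows = b(:, 1);  cols = b(:, 2);
      if any(rows < buffer*h | rows > (1-buffer)*h | ...
             cols < buffer*w | cols > (1-buffer)*w)
          continue;
      end

      % Best linear fit through this box's boundary points.
      p = polyfit(cols, rows, 1);         % p(1) is the slope (dimensionless)
      slopes(end+1) = p(1);               %#ok<AGROW>
  end

  % Worst-case slope across the measured boxes, expressed as mm of rise or
  % fall across the 12.7 mm width of one box.
  worstSlope   = max(abs(slopes));
  bendPerBoxMM = worstSlope * 12.7;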

Results

Summary of Data Collected

In total, we had 51 participants take photos for us, with a total of 472 images uploaded. Of these images, 157 captured the color checker and grey card, 154 captured the grid of lines, and 161 captured the grid of dots. After image pre-processing (cropping, rotating, etc.), we had 76 color checker/grey card images, 135 grid of dots images, and 109 grid of lines images that we were able to analyze and compare.

There were seven different smartphone manufacturers in our sample, with the distribution of smartphone manufacturers in our sample shown below:

Distribution of smartphone manufacturers in sample

There were seven different smartphone models in our sample, with the distribution of smartphone models in our sample shown below:

Distribution of smartphone models in sample

In the form, we captured information about how long each participant has had his or her smartphone. Given that we are located in the heart of Silicon Valley, we anticipated that most of the smartphones in our sample would be relatively new, but we found that over half of the smartphones in our sample were over one year old. The length of ownership of smartphones in our sample is distributed as shown below:

Distribution of length of ownership in sample

We were concerned about having too small of sample size for some phone models (in some cases we had only one participant with a given smartphone model), so we decided to group the images together for smartphones made by the same manufacturer with the same or very similar camera specifications. The groupings were defined as follows, along with the number of samples and images for each group:

Image groupings

Image Targets

For each of our image targets, we chose to represent our results using a box-and-whiskers plot for the computed value of interest for each smartphone model group. As in a standard box-and-whiskers plot, the hash marks at the top and bottom represent the maximum and minimum values, respectively. The top edge of the shaded box represents the 3rd quartile value, the bottom edge represents the 1st quartile value, and the line in the middle of the box represents the median value. The “X” mark represents the mean value. Along the horizontal axis, the groups are ordered from largest number of images in the sample to smallest, starting from the left. A box-and-whiskers plot where the values are all close together would indicate consistency across different units of the same model of smartphone, while a box-and-whiskers plot where the values are farther apart would indicate less consistency. All charts shown below were created in this way.

Color Checker and Grey Card

For the color checker/grey card image target, we computed the CIELAB ∆E value of each color patch imaged by the smartphone camera relative to the true patches in the color checker. As the true values for each patch in the color checker and the grey card, as well as the white point, we used values measured in the lab with a spectrophotometer under the same illuminant conditions (the daylight illuminant in the light booth). The ∆E value is on the vertical axis in the plot. A value of zero means the color produced by the smartphone camera would be indistinguishable from the true color in the color checker, while larger values mean the difference would be easier to distinguish.

Prior to data collection, we expected each smartphone camera to have values that were non-zero, and that the relative magnitude of the ∆E values would be somewhat consistent across smartphones made by the same manufacturer (i.e. all iPhones should produce images with similar colors). As can be seen in the plots, the 6th-generation and most recent generation iPhones appear to be fairly consistent unit-to-unit, while the 7th-generation iPhones appear to show more unit-to-unit variation. For the dark skin patch, the iPhone models appear to have a similar magnitude relative to zero, but for other color patches, we do not see this trend.

Our results for the grey card are given below:

Grey card

For brevity, we have included results for a selection of patches from the color checker in this report. Because the appearance of skin tones is very important for the subjective perception of image realism, we have included the results for the light and dark skin patches on the color checker. Because of their use in color ink for printers, we have included the results for the cyan, magenta, and yellow patches on the color checker. Full results for all color patches can be found in the spreadsheet in Appendix II.

Our results for the light and dark skin patches are given below:


Light skin
Dark skin

Our results for the cyan, magenta, and yellow patches are given below:

Cyan
Magenta
Yellow

Grid of Dots

For the grid of dots image target, we computed the proportional diameter difference between the dots at the center of the image and the dots at the edge of the image, which is on the vertical axis in this plot. We are using this value as a proxy for chromatic aberration and geometric distortion. A value of zero means that there is no additional spreading for the dots at the edges of the image relative to those at the center, while a positive value means that the dots at the edges of the image exhibit more spreading than the dots at the center of the image.

Prior to data collection, we expected each smartphone camera to have values that were non-zero, and that the relative magnitude of the values would be smaller for newer smartphone models (i.e. newer phones have better lenses). As can be seen in the plots, the phones on the left-hand side of the chart seem to show distributions of a similar magnitude and no one phone model seems to be closer in magnitude to zero than any of the others.


Our results for the grid of dots images are as follows:

Grid of dots

Grid of Lines

For the grid of lines image target, we computed the relative bend per box (in mm) of the boxes at the edges of the image relative to the boxes at the center of the image, which is on the vertical axis in this plot. We are using this value as a proxy for geometric distortion. A value of zero means that there is no measurable bend at the edges relative to the center, while a positive value means that there is more distortion at the edges relative to the center of the lens.

Prior to data collection, we expected each smartphone camera to have values that were non-zero, and that the relative magnitude of the values would be smaller for newer smartphone models (i.e. newer phones have better lenses). As can be seen in the plots, the phones on the left-hand side of the chart seem to show distributions of a similar magnitude and no one phone model seems to be closer in magnitude to zero than any of the others.

Our results for the grid of lines images are as follows:

Grid of lines

Conclusions

Lessons Learned

The most valuable lesson learned was that future iterations of this effort should ensure there is strong contrast between the border line used to define the crop region and its background. This would ensure that the target area is easily found by the native Matlab algorithms. Since our dot and line targets had a clear black line around the target area, 90% of them cropped perfectly on the first try. Since our grey/color checker border consisted of reflective yellow tape on a brown cardboard background, only 60% of those images were successfully cropped. A white background and black border line would have greatly increased the amount of usable data.

The reflective tape and rough surface of the cardboard also created some confusion for the edge finder. With non-reflective tape and the above changes to the target border, this error would be greatly diminished.

It would also be useful to create a more rigid test stand and perfectly level targets. Rotating an image in Matlab inherently introduces error. While we operated under the assumption that the same error would be present in all samples, so that it could be ignored when comparing similar phones, it would be much more accurate if each image were captured so precisely that it could be processed automatically without any rotation being required to standardize the field of view.

When taking images of the grid of lines, we would expect that, with a circular lens, the image distortion would be fairly consistent in all directions. Therefore, we would only need horizontal lines, which experience the greatest amount of curvature given the dimensions of the image. This would also allow us to assign a boundary to an entire line instead of just one box at a time. We could then fit that line to an exponential function instead of a straight line and get a more accurate measurement of the bend.

When we proposed this project, we anticipated that the data collection process would take less time than the image processing. In reality, we spent the majority of our time collecting data and attempting to recruit participants and still ended up with a sample size that was smaller than we had hoped. From this project, we’ve learned that it takes a lot of time to collect good, usable data, and this should be factored in to the planning at the start of the project.

Future Work

Because we had limited time to work on this project during a one-quarter course, there are several ways in which this project could be expanded upon in the future. In particular, additional time spent collecting data could increase the number and variety of smartphones assessed as well as the number of samples per smartphone model. If enough photos are taken with each smartphone in the sample, it would be possible to not only compare unit-to-unit variation but also variation within images from the same unit. With additional image targets, it would also be possible to use additional evaluation metrics in analyzing the differences in smartphone camera quality between smartphone models. Overall, our preliminary analysis indicates that there could be some unit-to-unit variation, in particular for color-related metrics, so this topic is worth exploring in more detail in the future.

Acknowledgements

We would like to thank our mentor for this project, David Cardinal, for proposing this project idea and providing input throughout the project. We would also like to thank the Psych 221 teaching team—Professor Brian Wandell, Dr. Joyce Farrell, and Trisha Lian—for their mentorship and support throughout the quarter and in particular, their help in recruiting participants from within the SCIEN network.

References

  1. “How we test cameras, lenses, and smartphones,” DxOMark: https://www.dxomark.com/test-cameras-lenses-smartphones/
  2. IEEE Standard 1858-2016 Camera Phone Image Quality: https://standards.ieee.org/develop/project/1858.html
  3. “Image Quality Evaluation,” Dr. Joyce Farrell, Stanford Psychology 221 Lecture

Appendix I

Appendix II

We attempted to split the work as evenly as possible between the two team members. Megan took the lead on data collection, while Alex took the lead on image processing. Both team members evaluated the data, prepared and presented the findings, and wrote this report.