Smartphone Camera Quality
Revision as of 07:46, 15 December 2017
Introduction
With the increasing quality of smartphone cameras and software, everyone, from professional photographers to amateurs taking selfies, is using their smartphone as the primary device to capture images. In recent years, consumer desire to take and share high-quality photos with a device as portable as a smartphone has exploded, as evidenced by the popularity of image capture and sharing applications like Snapchat and Instagram and the ever-increasing need for photo storage space through services like Apple iCloud and Google Photos. Following this trend, there is increasing interest in quantifying and comparing the quality control of images captured by all smartphones that leave the manufacturing plant. The DxOMark rankings are treated as a reliable metric for comparison, but because of the complexity of the tests required to generate these rankings, the sample size is limited to only a few units of each phone.
For our project, we would like to assess the unit-to-unit variation in image quality among smartphones in the real world. In particular, we are curious whether there are variations based on smartphone manufacturer, model, price, firmware/OS version, length of ownership/use, and/or physical condition. We believe that metrics tied to the physical properties of the camera may vary from unit to unit, so we expect that chromatic aberration, distortion, and/or color metrics might show some variation in our preliminary results. Smartphones are now approaching $1000, with almost all recent improvements devoted to the camera, so we believe that this is an important and technologically relevant area to explore for our final project.
Background
Smartphone camera quality is an emerging area of interest for researchers and consumers. There are two primary standards for smartphone camera quality evaluation: the DxOMark ratings [1] and the IEEE Standard [2].
DxOMark
DxOMark is a company that performs camera and lens image quality assessments and then provides ratings for consumers. In addition to digital camera sensors and lenses, DxOMark reviews mobile phone cameras and ranks them based on a variety of measurements. On a smartphone, they analyze the performance of the imaging pipeline in its entirety, including lens, sensor, camera control, and image processing. Their protocol includes a combination of lab testing and perceptual evaluation of images taken in the lab and in a variety of indoor and outdoor scenes. DxOMark reports sub-scores in several different categories in addition to an overall score, which is used to rank the smartphone cameras. DxOMark reviews both photos and videos captured on the smartphones. For photos, their evaluation metrics include:
- Exposure and contrast, including dynamic range, exposure repeatability, and contrast
- Color, including saturation and hue, white balance, white balance repeatability, and color shading
- Texture and noise
- Autofocus, including AF speed and repeatability
- Artifacts, including softness in the frame, distortion, vignetting, chromatic aberrations, ringing, flare, ghosting, aliasing, moiré patterns, and more
- Flash
- Zoom at several subject distances
- Bokeh
For videos, their evaluation metrics include:
- Exposure
- Color
- Texture and noise
- Autofocus
- Artifacts
- Stabilization
DxOMark is seen as the industry standard and their rankings are referenced in popular press and industry publications, including Forbes, The Verge, Wired, and TechRadar.
IEEE Standard
IEEE Std 1858-2016: Camera Phone Image Quality provides a detailed specification of test conditions and apparatus for evaluating smartphone image quality. The standard includes protocols for lab-based assessments as well as subjective perceptual evaluations. The evaluation metrics considered include spatial frequency response, lateral chromatic displacement, chroma level, color uniformity, local geometric distortion, visual noise, and texture blur.
Methods
Data Collection
For our project, we recruited participants to capture images with their smartphones on Stanford’s campus. Our objective was to have a broad sample of smartphone cameras, so we collected data at several different locations on campus over a period of two weeks. Our data collection locations included the Stanford Bookstore, the Graduate School of Business, the SCIEN Image Systems Engineering Seminar, the Psychology 221 class, and the SCIEN Affiliates Meeting.
Our objective for our data collection procedure was for it to be controlled and repeatable. We created a series of three image targets and had participants take multiple photos of each of them. In order to provide a consistent illuminant, we borrowed a light booth and brought it to each data collection location and used the daylight illuminant for every image captured. We covered the front of the light booth with a piece of cardboard and cut a small opening to point the phone camera through in order to minimize stray light entering the light booth. We used a piece of Styrofoam mounted on the cardboard front cover to serve as a stand for the phones while participants were taking photos. While different phones have the camera at different locations, the stand ensured that all phones were held in the same horizontal alignment during image capture.
We had participants capture three consecutive photos of each of three different image targets. All photos were taken with the phone in landscape orientation, using the following settings:
- Photo mode
- Flash, live, and filters off
- Zoom out
- Highest resolution
- No image compression
- All other settings (HDR, etc.) to auto
The image targets were mounted on pieces of cardboard that we cut to match the dimensions of the rear panel of the light booth. For each round of photos with a given image target, we removed the light booth’s lid, placed the piece of cardboard (with the image target on it) flush against the rear panel of the light booth, and used binder clips to fasten it to the upper edge of the rear panel. We then placed the lid back on the light booth and had the participant take three photos. After these three photos were taken, we removed the light booth lid and replaced the cardboard panel with another panel holding a different image target.
Once participants had captured all nine images, we directed them to complete a Google form. On each face of the light booth, we posted a sign that included the URL for the form and a QR code that would take users directly to the form. The form included the following questions:
- How long have you had your smartphone?
- What brand of smartphone do you have?
- What model of smartphone do you have?
- What operating system version is your phone running?
- Image upload
- If you would like to be entered into a raffle for a $50 Amazon gift card, please enter your email
We encouraged participants to complete the form while they were with us at the light booth, as there was a possibility that they would forget to submit it after the fact. We used the form responses to track our progress and number of participants, and it does not appear that many, if any, took the photos but failed to complete the form. We spent most of our data collection time at events where attendees were interested in image systems engineering and our project topic, as we believed they would be more willing to participate and more likely to complete the process once they started. Over two weeks, we had 51 different participants take photos, with a total of 472 images uploaded (some people submitted additional photos beyond the required nine).
After we had received all of these form responses, we exported the metadata (responses to the questions) into a spreadsheet and the image files into a folder on our local machines for further processing. For the metadata, we created a unique ID for each participant and used that to associate their images with the correct smartphone model for analysis. A summary of the metadata is included in the Results section below.
For each image target, we computed values to use as our evaluation metrics and compared these values across images captured by the same model of smartphone.
Image Targets
Each cardboard panel has the image target mounted on it, with a rectangular border surrounding it made of bright yellow duct tape. We chose brightly colored tape so that it would be clearly visible to participants and ensure that they were capturing the entire area of the image target if they could see all sides of the rectangle formed by the yellow tape. The three image targets were as follows:
- Color checker + grey card: One X-Rite ColorChecker Classic and one 5.3 in x 7.28 in Anwenk 18% grey card affixed to the cardboard panel. Both were ordered online from Amazon.com.
- Grid of dots: A grid of 14 x 22 dots printed on 11 in x 17 in white paper at 1200 dpi. Each dot has diameter 0.12 in and 0.39 in spacing between dots. The dots were located inside of a 10 in x 16 in box with 0.5 in margins on all sides, with a blue outline of width 1 pt. The image target was created in PowerPoint and is attached in Appendix II.
- Grid of lines: A 10 in x 16 in grid of squares, each with side length 0.5 in. The lines are black and of width 1 pt. The grid has 0.5 in margins on all sides and is printed on 11 in x 17 in white paper at 1200 dpi. The image target was created in PowerPoint and is attached in Appendix II.
Image Processing
Image Standardization
Once all photos were collected and uploaded, it was time to start processing the images. Even though we did our best to control the exact image that a camera should see, varying resolutions, fields of view, and even the slight change in tilt based on how heavily the photographer pressed the camera against our light booth produced a surprising amount of variation between the pictures. Since it would have been a poor use of time to manually crop nearly 500 images, the first thing we had to do was standardize each of the samples using the following process:
- Rotating the photo 2 degrees counterclockwise – Since Matlab is only capable of cropping photos into perfect rectangles, it was important to find out where the corners of the edges existed. Built-in Matlab functions (discussed next) are capable of detecting where edges exist in an image, but these edges are stored as arrays of 2-dimensional points. Since Matlab would not know what shape these points were forming, we extracted the maximum and minimum X and Y coordinates in order to identify which points were the corners of the rectangle. Since the images were imperfect, there would be cases where the extreme values would be in unexpected locations. Therefore, the image was rotated to ensure that the top left corner was located at the minimum X coordinate. While rotation may have introduced error into the measurements, this experiment compares the difference between similar phones, so we expected the error to appear as the same offset in all photos.
- Detecting the edges in the photo – Matlab has quite a few built-in functions for detecting the edges in a photo. We ended up having the most success converting the photo to grayscale (rgb2gray), producing a binary edge map using the Canny method with a threshold varying between 0.1 and 0.5 (edge), and grouping the resulting edge pixels into boundaries based on their locations (bwboundaries).
- Determining the boundary of the critical region – The above result left us with a list of dozens of boundaries and the challenge of finding out which one highlights our intended features. We found a consistent edge location that was generally free from noise on the lower half of the left side of the image. At this point, the contrast between the tape and the cardboard (or the black box and white background for the lines and dots) was large enough for us to start on the left side of this line and move right pixel by pixel until a boundary was found. It was then determined that this was the boundary that should box each of our display cards, and ideally, this would identify a complete box around our targets.
- Rotation and Cropping – Using the corners that were found earlier, the image was rotated by an angle that was calculated to line up the extremes in vertical and horizontal rows. After determining the length of the edges of the critical region, the image was then cropped down to a standard field.
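The corner-finding and leveling steps above can be illustrated with a small Python sketch. This is not the authors' Matlab code: the function names (`find_corners`, `top_edge_angle_deg`, `edge_lengths`, `rotate_point`) are hypothetical, and the corner labelling assumes the slight pre-rotation described above, so that the top-left corner is the point with the minimum x coordinate.

```python
import math

def find_corners(boundary):
    """Label the extreme boundary points as corners.

    Assumes the rectangle is tilted slightly so that the top-left corner
    has the minimum x, the top-right the minimum y, and so on
    (pixel coordinates, y increasing downward).
    """
    return {
        "top_left": min(boundary, key=lambda p: p[0]),
        "top_right": min(boundary, key=lambda p: p[1]),
        "bottom_right": max(boundary, key=lambda p: p[0]),
        "bottom_left": max(boundary, key=lambda p: p[1]),
    }

def top_edge_angle_deg(corners):
    """Signed angle of the top edge relative to the x axis, in degrees."""
    (x0, y0), (x1, y1) = corners["top_left"], corners["top_right"]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))

def edge_lengths(corners):
    """Width and height of the critical region, measured along its edges."""
    width = math.dist(corners["top_left"], corners["top_right"])
    height = math.dist(corners["top_left"], corners["bottom_left"])
    return width, height

def rotate_point(p, angle_deg):
    """Rotate a point about the origin (used only to build synthetic data)."""
    a = math.radians(angle_deg)
    return (p[0] * math.cos(a) - p[1] * math.sin(a),
            p[0] * math.sin(a) + p[1] * math.cos(a))

# Synthetic example: a 100 x 50 rectangle tilted by -2 degrees.
tilted = [rotate_point(p, -2.0) for p in [(0, 0), (100, 0), (100, 50), (0, 50)]]
corners = find_corners(tilted)
angle = top_edge_angle_deg(corners)   # rotating the image by -angle levels the top edge
width, height = edge_lengths(corners)
```

Rotating the image by the negative of the returned angle levels the top edge, after which an axis-aligned crop of the measured width and height isolates the critical region.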
Grey Card
Since the exact RGB values of the 18% grey card are known [118 118 118], our goal was to evaluate how two things varied across different versions of the same phone:
- Accuracy: Was the average grey color similar to that of the known value?
- Consistency: Was there a large spread between the grey pixels across the field of view?
Since the images were now consistent, we were able to hardcode the locations (as percentages of the width and height, since different cameras had different resolutions) that corresponded to the dimensions of the grey card. Since each pixel had an RGB value, we were able to compute the ∆E value for each pixel relative to the known grey, with a white point that was measured in the light booth. However, when this proved to be too computationally demanding, we instead determined the average red, green, and blue values and computed the ∆E only with those values. Due to the nonlinear properties of the CIELAB 1976 functions, we knew this value would not be the true average, but since our main objective was to compare the difference in results (instead of absolute results), we determined that this would be a consistent offset for all samples. Therefore, we would still be able to compare ∆E values relative to each other. This average ∆E value was returned.
In order to determine the spread of the variations, we grouped the minimum red, green, and blue values into one color and the maximum values into another. Since the final step in computing the ∆E is the Euclidean distance between the sample and known L*, a*, b* values, the value is never negative. Therefore, the maximum ∆E produced by either the min or max RGB values was returned to show the magnitude of the spread of color.
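As an illustration of the two grey card metrics, here is a minimal Python sketch of a CIELAB 1976 ∆E computation on averaged and extreme RGB values. It is not the authors' code and makes two labelled assumptions: pixel values are treated as 8-bit sRGB, and the D65 white point stands in for the white point the authors measured in the light booth. The function names are hypothetical.

```python
import math

# ASSUMPTION: D65 white point; the authors used a white point measured in their light booth.
D65_WHITE = (95.047, 100.0, 108.883)

def srgb_to_linear(c8):
    """Undo the sRGB transfer curve for one 8-bit channel value (assumes sRGB encoding)."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def srgb_to_xyz(rgb):
    """Convert an sRGB triple to CIE XYZ scaled so that Y = 100 for white."""
    r, g, b = (srgb_to_linear(v) for v in rgb)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    return (100.0 * x, 100.0 * y, 100.0 * z)

def xyz_to_lab(xyz, white=D65_WHITE):
    """CIE 1976 L*a*b* from XYZ and a reference white."""
    delta = 6.0 / 29.0
    def f(t):
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3.0 * delta ** 2) + 4.0 / 29.0
    fx, fy, fz = (f(v / w) for v, w in zip(xyz, white))
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))

def delta_e_76(rgb_a, rgb_b, white=D65_WHITE):
    """Euclidean distance between two colors in L*a*b* (never negative)."""
    lab_a = xyz_to_lab(srgb_to_xyz(rgb_a), white)
    lab_b = xyz_to_lab(srgb_to_xyz(rgb_b), white)
    return math.dist(lab_a, lab_b)

def grey_card_metrics(pixels, known=(118, 118, 118)):
    """Return (accuracy, spread) for a list of (R, G, B) pixels from the grey card region."""
    n = len(pixels)
    mean_rgb = tuple(sum(p[i] for p in pixels) / n for i in range(3))
    min_rgb = tuple(min(p[i] for p in pixels) for i in range(3))
    max_rgb = tuple(max(p[i] for p in pixels) for i in range(3))
    accuracy = delta_e_76(mean_rgb, known)
    spread = max(delta_e_76(min_rgb, known), delta_e_76(max_rgb, known))
    return accuracy, spread
```

Averaging the RGB values first means the conversion runs three times per image instead of once per pixel, which is the computational shortcut described above.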
Color Checker
Just like the accuracy objective of the grey card, the color checker was used to compare the average RGB values of each color swatch against the known RGB values of that color under consistent lighting conditions, expressed as a ∆E value per swatch.
Grid of Dots
The intention of this target was to evaluate the chromatic aberration using a set of uniformly spaced and sized dots. A difference in diameter between the dots located in the center and those at the edges of the target would indicate a large amount of aberration. After the images were standardized, the boundaries were once again computed. These boundaries captured the pixels occupied by each of the dots. If a dot was too close to the edge of the image (within 2% of the width/height of the image), there was a chance that the full dot was not seen, so these dots were excluded from the computation. Each dot’s boundary was made up of roughly 200 (x,y) coordinates. Using the “polyarea” function, the area enclosed by each dot’s boundary was then computed. The difference between the largest and smallest diameters of these dots was returned.
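The dot measurement can be sketched in Python as follows. This is an illustration rather than the authors' Matlab code: `polygon_area` uses the shoelace formula, which yields the same kind of non-negative polygon area that “polyarea” reports, and converting each area to the diameter of a circle of equal area is an assumption, since the text does not say how diameters were derived. All names are hypothetical.

```python
import math

def polygon_area(points):
    """Area enclosed by an ordered list of (x, y) vertices (shoelace formula)."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0

def equivalent_diameter(area):
    """Diameter of the circle with the given area (ASSUMED conversion)."""
    return 2.0 * math.sqrt(area / math.pi)

def diameter_spread(dot_boundaries, width, height, margin=0.02):
    """Largest minus smallest dot diameter, skipping dots within `margin` of an image edge."""
    diameters = []
    for boundary in dot_boundaries:
        xs = [p[0] for p in boundary]
        ys = [p[1] for p in boundary]
        too_close = (min(xs) < margin * width or max(xs) > (1.0 - margin) * width or
                     min(ys) < margin * height or max(ys) > (1.0 - margin) * height)
        if too_close:
            continue
        diameters.append(equivalent_diameter(polygon_area(boundary)))
    if not diameters:
        raise ValueError("no dots left after applying the edge margin")
    return max(diameters) - min(diameters)

def circle_boundary(cx, cy, r, n=200):
    """Synthetic dot boundary with n vertices (used only for the example)."""
    return [(cx + r * math.cos(2.0 * math.pi * k / n),
             cy + r * math.sin(2.0 * math.pi * k / n)) for k in range(n)]

# Synthetic example: two interior dots and one dot clipped by the left edge.
dots = [circle_boundary(500, 500, 10), circle_boundary(300, 300, 11),
        circle_boundary(5, 500, 10)]
spread_px = diameter_spread(dots, width=1000, height=1000)
```

In the example, the dot centered at x = 5 is discarded by the 2% margin, so the spread reflects only the two interior dots.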
Grid of Lines
The intention of this target was to evaluate the curvature of the image at the edges relative to the center by evaluating how the slope of the grid lines changes. Much like the dots, once the image was cropped, the boundaries were again found, this time resulting in each little box being seen as an individual boundary. Once again applying the 2% buffer, the “fit” function was used to determine the best linear fit for each box. This resulted in an equation from which the slope was easily extracted. The slopes from both edges were computed, and the “worst case scenario” value was selected. Taking into consideration the resolution of the photo, the slope was returned as the maximum number of millimeters the line would increase or decrease over the span of one square (1/2 inch).
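A minimal Python sketch of the slope measurement is shown below. It substitutes an ordinary least-squares line for Matlab's “fit” and assumes the conversion to millimeters is simply the slope (a unitless rise over run) multiplied by the 0.5 in square width; the authors' exact conversion, which accounted for photo resolution, is not given in the text. The function names are hypothetical.

```python
MM_PER_INCH = 25.4

def fit_slope(points):
    """Slope of the ordinary least-squares line through a list of (x, y) points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    return sxy / sxx

def drift_mm_per_square(slope, square_in=0.5):
    """Rise or fall in millimeters over one grid square (ASSUMED conversion)."""
    return abs(slope) * square_in * MM_PER_INCH

def worst_case_drift_mm(edge_lines, square_in=0.5):
    """Largest drift among the fitted edge lines."""
    return max(drift_mm_per_square(fit_slope(line), square_in) for line in edge_lines)

# Synthetic example: one edge rising at slope 0.01, the other falling at slope 0.02.
left_edge = [(x, 0.01 * x + 5.0) for x in range(101)]
right_edge = [(x, -0.02 * x + 40.0) for x in range(101)]
worst = worst_case_drift_mm([left_edge, right_edge])
```

Because the slope is a ratio of pixels to pixels, the pixel size cancels in this simplified conversion; resolution would matter only if the rise were first measured in pixels.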