Virtual Background Detection

by Jaclyn Pytlarz

Introduction/Background

Over the past year, the use of video conferencing has skyrocketed, and with it the prevalence of virtual backgrounds. Video conference systems implement body/head detection and segmentation in order to replace the background behind the user [1]. This study explores the believability of these virtual backgrounds, specifically targeting color temperature differences between the subject and the background. A relationship will be shown between the color temperature of the original image, the color temperature of the background, and the detectability of the virtual background.

To avoid the extreme difficulties associated with image segmentation and cropping, this project utilizes a static image with a highly tuned Photoshop mask. The main question of this project is: "What is the detection threshold of a virtual background as it relates to changes in color temperature between the subject and the background?" One consideration is the typical mixed-lighting environment of video conferencing. It is not uncommon for the light from a display to make the subject appear distinctly different from the background. Since this is the case, I hypothesize that there may be a wide threshold of acceptability for changes in color temperature.

A natural starting point is the literature on color difference thresholds. Many metrics have been developed to objectively determine a just-noticeable-difference (JND) between two colors. One of the most prevalent is $\Delta E2000$, or $\Delta E_{00}$ [5]. This metric takes two L*a*b* colors as input and outputs a color difference value. Generally a value above 1 indicates a JND (although in practice a value closer to 2.5 is sometimes used [4]). The JND may give an indication of the detectable difference between illuminations, although the acceptance threshold is expected to be well above the detection threshold.
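To make the metric concrete, here is a minimal Python sketch of the $\Delta E_{00}$ computation, written to follow the formula as published in [5]. It is not part of the project code base; the function name `delta_e_2000` and the weighting parameters `k_l`, `k_c`, `k_h` (all left at 1) are choices made for this illustration only.

```python
import math

def delta_e_2000(lab1, lab2, k_l=1.0, k_c=1.0, k_h=1.0):
    """CIEDE2000 color difference between two (L*, a*, b*) triples."""
    l1, a1, b1 = lab1
    l2, a2, b2 = lab2

    # Step 1: rescale a* so that neutrals are handled better, then get C', h'.
    c_bar = (math.hypot(a1, b1) + math.hypot(a2, b2)) / 2.0
    g = 0.5 * (1.0 - math.sqrt(c_bar ** 7 / (c_bar ** 7 + 25.0 ** 7)))
    a1p, a2p = (1.0 + g) * a1, (1.0 + g) * a2
    c1p, c2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360.0
    h2p = math.degrees(math.atan2(b2, a2p)) % 360.0

    # Step 2: differences in lightness, chroma and hue.
    d_l = l2 - l1
    d_c = c2p - c1p
    if c1p * c2p == 0.0:
        d_h_deg = 0.0
    else:
        d_h_deg = h2p - h1p
        if d_h_deg > 180.0:
            d_h_deg -= 360.0
        elif d_h_deg < -180.0:
            d_h_deg += 360.0
    d_h = 2.0 * math.sqrt(c1p * c2p) * math.sin(math.radians(d_h_deg / 2.0))

    # Step 3: mean values and weighting functions.
    l_bar = (l1 + l2) / 2.0
    cp_bar = (c1p + c2p) / 2.0
    if c1p * c2p == 0.0:
        h_bar = h1p + h2p
    elif abs(h1p - h2p) <= 180.0:
        h_bar = (h1p + h2p) / 2.0
    elif h1p + h2p < 360.0:
        h_bar = (h1p + h2p + 360.0) / 2.0
    else:
        h_bar = (h1p + h2p - 360.0) / 2.0

    t = (1.0
         - 0.17 * math.cos(math.radians(h_bar - 30.0))
         + 0.24 * math.cos(math.radians(2.0 * h_bar))
         + 0.32 * math.cos(math.radians(3.0 * h_bar + 6.0))
         - 0.20 * math.cos(math.radians(4.0 * h_bar - 63.0)))
    d_theta = 30.0 * math.exp(-(((h_bar - 275.0) / 25.0) ** 2))
    r_c = 2.0 * math.sqrt(cp_bar ** 7 / (cp_bar ** 7 + 25.0 ** 7))
    s_l = 1.0 + 0.015 * (l_bar - 50.0) ** 2 / math.sqrt(20.0 + (l_bar - 50.0) ** 2)
    s_c = 1.0 + 0.045 * cp_bar
    s_h = 1.0 + 0.015 * cp_bar * t
    r_t = -math.sin(math.radians(2.0 * d_theta)) * r_c

    term_l = d_l / (k_l * s_l)
    term_c = d_c / (k_c * s_c)
    term_h = d_h / (k_h * s_h)
    return math.sqrt(term_l ** 2 + term_c ** 2 + term_h ** 2 + r_t * term_c * term_h)
```

Two identical colors give a difference of 0, and a result above roughly 1 would be read as a just-noticeable difference in the sense described above.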

Another place to start is illumination detection. One of the more common methods is computer-vision driven [9] and is not directly related to human perception; its most common application is white point adjustment in a camera. One study extended illumination discrimination to simulated human observers [2], following up on an earlier psychophysical experiment [10]. That paper showed that the detection threshold was larger towards blue than towards warmer hues. We might therefore expect a skew in our data towards the cooler temperatures, with observers being more lenient in that direction (it is worth noting that those results were presented in $\Delta E_{uv}$).

We can use these observations as the basis for our experiment design. In the end, you will see how the color temperature of the background influences our detection threshold of a fake background.

Methods

This section covers the experiment design and considerations: the display characteristics, the stimuli processing steps, and the experimental design implemented with the Matlab App Designer.

Display Characteristics

The experiment was conducted on a 2020 13" Macbook Pro. To ensure accurate processing and color transformation, measurements were taken of the target display and are shown below. These measurements were taken in a dark lab using a PhotoResearch PR-740 spectroradiometer.

Red, green, blue and white patches were displayed full screen through the Preview app. Since these images are not tagged as DCI-P3 or Rec2020, it is not surprising that we see the display attempting a Rec709/sRGB recreation. Notice the green stimulus produces power in what appears to be the red primary. Hence we are not getting the true native capabilities. However, this is the mode that the experiment will be conducted in, so we can use these results as our "native" assumed display.

The chromaticity diagram shows us the color gamut. The primaries are quite close to Rec709/sRGB shown in white. The white point however is far away from the typical D65 (0.3127, 0.3290). We will assume that the images have been rendered properly on the display and will decode accordingly based on the measured display primaries/white point.

The gamma/power response was measured in steps of 10 code values (8-bit), including 0 and 255. After scaling to the peak and minimum measured luminance values, it is nearly spot-on a power of 2.1. The peak luminance is 256.5 cd/m^2 and the minimum luminance is 0.28 cd/m^2 (with the backlight at full). From these measurements, I was then able to derive a transform to/from display space to XYZ or LMS for color transformation.
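As a rough illustration of the tone response described above, the following Python sketch maps an 8-bit code value to luminance and back. The exponent and the peak/minimum luminance come from the measurements quoted in the text; the additive black-offset form of the model and the function names are assumptions made for this sketch, not necessarily how the project code implements it.

```python
# Measured values quoted in the text.
PEAK_NITS = 256.5   # cd/m^2 at code value 255
BLACK_NITS = 0.28   # cd/m^2 at code value 0
GAMMA = 2.1         # fitted power after scaling to peak/minimum

def code_to_luminance(code, max_code=255):
    """Decode a display code value to luminance in cd/m^2.

    Assumption: the power law is applied to the normalized signal and then
    scaled between the measured black and peak levels.
    """
    signal = code / max_code
    return BLACK_NITS + (PEAK_NITS - BLACK_NITS) * signal ** GAMMA

def luminance_to_code(luminance, max_code=255):
    """Inverse of code_to_luminance, rounded to the nearest code value."""
    signal = (luminance - BLACK_NITS) / (PEAK_NITS - BLACK_NITS)
    signal = min(max(signal, 0.0), 1.0)  # clip to the displayable range
    return round(max_code * signal ** (1.0 / GAMMA))
```

Applying the same decode per channel and then multiplying by a 3x3 matrix built from the measured primaries and white point would give the display-to-XYZ transform mentioned above.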

Stimuli

The stimuli were created by capturing a video on an iPhone 11 Pro of the subject in a scene and then having the subject leave the scene, so that the framing of the two shots was identical. Auto exposure and auto focus were locked during the video capture. The relevant frames were then extracted from the video in Matlab (one with the subject, and one of just the background). A mask of the subject was created in Photoshop to be as realistic as possible. In addition, the frames were initially color graded in Photoshop to best match the hue/saturation/contrast of each image. This helps to ensure that the attribute being probed (color temperature) was free of interference from other image attributes. The two shots were also framed to match lighting angles, so that the lighting direction on the face was as close as possible.

An example of the renderings is given below. On the top row is the background from shot A, the mask from shot B, and the subject from shot B. You may notice from the top row that the color tone of the two scenes (shot A and B) match very well. The subject image is multiplied by the mask, and the opposite background is multiplied by the inverse of the mask. In these virtual background renderings, the opposite background/subject were always paired. On the bottom row we see how the transformation appears. The color transform is always applied to the background image, so the subject is identical in the two images being compared. The two renderings showcase the extremes of testing (4500 and 20000 Kelvin). The blend with the mask was done in linear space to best emulate the physics of a true background.

$\mathrm{CombinedImage}=\mathrm{FromLinear}(\mathrm{ToLinear}(\mathrm{Background}_{A})*(1-\mathrm{Mask}_{B})+\mathrm{ToLinear}(\mathrm{Subject}_{B})*\mathrm{Mask}_{B})$
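The blend above can be sketched per pixel in Python as follows. The function names mirror the equation; using a pure power of 2.1 for `to_linear`/`from_linear` is an assumption based on the measured display response, and the project's actual transform may include the measured black level and primaries.

```python
GAMMA = 2.1  # assumed: the display power measured earlier

def to_linear(value, gamma=GAMMA):
    """Normalized display signal (0..1) to linear light."""
    return value ** gamma

def from_linear(value, gamma=GAMMA):
    """Linear light (0..1) back to normalized display signal."""
    return value ** (1.0 / gamma)

def blend_pixel(background_rgb, subject_rgb, mask):
    """Composite one subject pixel over one background pixel in linear light.

    mask is the subject opacity in 0..1 (1 = subject only, 0 = background only).
    """
    return tuple(
        from_linear(to_linear(bg) * (1.0 - mask) + to_linear(fg) * mask)
        for bg, fg in zip(background_rgb, subject_rgb)
    )
```

At soft mask edges this differs from blending the encoded values directly: a half-covered pixel between black and white comes out at about 0.72 in display signal rather than 0.5, which is what an optical mix of the two would look like.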

Once I had the base renderings and the color transforms defined from the display measurements, I created a database of virtual background pairings while adjusting the color temperature of the background. The original white point was assumed to be D65, and the chromatic adaptation was derived relative to it. The Hunt-Pointer-Estevez [11] LMS space was used to apply the Von Kries transform for chromatic adaptation. Here $\rho$, $\gamma$, and $\beta$ are the L, M, and S values of the white point, for the target white and for D65 respectively. The adaptation is then accomplished by a matrix multiplication in LMS. The inverse transform $M_{HPE}^{-1}$ is used to go back to XYZ and then to display space.

$M_{HPE}={\begin{bmatrix}0.4002400&0.7076000&-0.0808100\\-0.2263000&1.1653200&0.0457000\\0.0000000&0.0000000&0.9182200\end{bmatrix}}$

$A_{VonKries}={\begin{bmatrix}\rho _{new}/\rho _{D65}&0&0\\0&\gamma _{new}/\gamma _{D65}&0\\0&0&\beta _{new}/\beta _{D65}\end{bmatrix}}$

${\begin{bmatrix}L\\M\\S\end{bmatrix}}=M_{HPE}*{\begin{bmatrix}X\\Y\\Z\end{bmatrix}}$

${\begin{bmatrix}L_{new}\\M_{new}\\S_{new}\end{bmatrix}}=A_{VonKries}*{\begin{bmatrix}L\\M\\S\end{bmatrix}}$
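The chain XYZ to LMS, diagonal scaling, and back to XYZ can be sketched in plain Python as below. The Hunt-Pointer-Estevez matrix is the one given above; the helper names (`mat_vec`, `inverse_3x3`, `xy_to_xyz`, `von_kries_adapt`) are hypothetical, and this is an illustration of the method rather than the contents of `ApplyChromaticAdjustment.m`.

```python
M_HPE = [
    [0.40024, 0.70760, -0.08081],
    [-0.22630, 1.16532, 0.04570],
    [0.00000, 0.00000, 0.91822],
]
D65_XY = (0.3127, 0.3290)

def mat_vec(m, v):
    """Multiply a 3x3 matrix by a length-3 vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def inverse_3x3(m):
    """Invert a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def xy_to_xyz(x, y, luminance=1.0):
    """Chromaticity (x, y) plus luminance Y to tristimulus XYZ."""
    return [x * luminance / y, luminance, (1.0 - x - y) * luminance / y]

def von_kries_adapt(xyz, new_white_xy, ref_white_xy=D65_XY):
    """Shift a color from the reference white towards new_white_xy."""
    lms_new = mat_vec(M_HPE, xy_to_xyz(*new_white_xy))
    lms_ref = mat_vec(M_HPE, xy_to_xyz(*ref_white_xy))
    gains = [n / r for n, r in zip(lms_new, lms_ref)]  # diagonal of A_VonKries
    lms = mat_vec(M_HPE, xyz)
    lms_adapted = [gain * value for gain, value in zip(gains, lms)]
    return mat_vec(inverse_3x3(M_HPE), lms_adapted)
```

A quick sanity check of the construction is that the D65 white itself maps exactly onto the new white, for example `von_kries_adapt(xy_to_xyz(*D65_XY), (0.3608, 0.3635))`, where the warm chromaticity is an illustrative value and not one read from the LUT used in the project.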

CCT=[4500,5100,5600,6000,6500,8000,12000,20000]K

The color temperature adjustments were applied only to the background image; the face remained the same as the original. The CCT values tested are given above. They were chosen to be roughly equally spaced in 1/CCT [12]. The translation between CCT and xy is computed via a look-up-table (LUT) [3]. An xy plot of the adjustment points is also shown below, with the white dashed line representing Rec709/sRGB. We have 8 test points including D65 (at 6500K). The LUT was shifted to perfectly align D65 to the standardized [0.3127 0.3290] chromaticity coordinates.
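The reciprocal spacing can be inspected with a few lines of Python. The sketch below converts each tested CCT to reciprocal megakelvin (mired, $10^{6}/CCT$) and lists the step between neighbouring levels; the use of mired as the unit and the function name are choices made for this illustration.

```python
CCT_LEVELS = [4500, 5100, 5600, 6000, 6500, 8000, 12000, 20000]  # Kelvin

def mired(cct_kelvin):
    """Reciprocal color temperature in mired (10^6 / K)."""
    return 1.0e6 / cct_kelvin

mired_levels = [mired(cct) for cct in CCT_LEVELS]
# Step between neighbouring test points, warm to cool.
mired_steps = [warm - cool for warm, cool in zip(mired_levels, mired_levels[1:])]

for cct, value in zip(CCT_LEVELS, mired_levels):
    print(f"{cct:>6d} K -> {value:6.1f} mired")
print("steps:", [round(step, 1) for step in mired_steps])
```

Working in 1/CCT compresses the cool end of the scale, which is why the jump from 12000K to 20000K is comparable in this unit to much smaller Kelvin steps near 5000K.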

Here is a plot of the adaptation change/direction of the background image for a target of 4500K in L*a*b*. We see that the pixels are all shifted towards the bottom left in the same direction. This indicates a shift of all pixels towards a warmer/orange hue.

These plots showcase the adaptation change/direction of the background for a 20,000K target in L*a*b*. The 3D plot shows that the chromatic adaptation transform keeps L* nearly constant, meaning that it preserves the lightness and general brightness of the image. As expected, this adjustment moves the pixels in the opposite direction to the previous comparison, since we are now moving towards a much cooler color temperature.

Experiment Design

The experiment used a two-alternative forced choice (2AFC) design in which a reference image (real background) was always present. The observers were asked to select the image with the fake background. There were a total of 16 unique trials, each of which was shown twice to every observer. The experiment was written using the Matlab App Designer, a screenshot of which is shown below.

There were two reference images which had the original background (unmodified, shown in the bottom two of the figure below). Then there were 8 image pairs that contained the same face (unaltered color) with the background from the opposite reference, combined as shown previously. The color of the background is modified using the color correction method described earlier. In total, this resulted in 32 trials for the experiment which took approximately 10 minutes on average to complete.
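The trial count follows from 2 subject/background pairings x 8 color temperatures x 2 repeats = 32. A minimal Python sketch of such a trial list is given below; the randomized order and the random left/right placement of the fake image are assumptions made for this illustration (the text does not state how the app ordered or positioned the images), and all names are hypothetical.

```python
import random

CCT_LEVELS = [4500, 5100, 5600, 6000, 6500, 8000, 12000, 20000]  # Kelvin
SUBJECTS = ["A", "B"]  # each subject is paired with the opposite background
REPEATS = 2

def build_trials(seed=None):
    """Return the 32 trials of one session in shuffled order."""
    rng = random.Random(seed)
    trials = []
    for _ in range(REPEATS):
        for subject in SUBJECTS:
            for cct in CCT_LEVELS:
                trials.append({
                    "subject": subject,
                    "background_cct": cct,
                    # Assumed: the side showing the fake image is randomized.
                    "fake_on_left": rng.random() < 0.5,
                })
    rng.shuffle(trials)
    return trials
```

Each color temperature then appears four times per observer, which matches the four results per level used in the analysis.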

Each observer went through the following steps. First they were greeted with a welcome screen. A trial counter was shown after each response so that the observer knew how much longer the experiment would take. For each trial they were asked to select the image that contained the "fake" background, and they made their selection by clicking on the image. At the end of the experiment a completion page was shown, and the app closed while saving out a results file.

Results

The experiment had 10 participants. Unfortunately most participants did not possess a 2020 13" Macbook Pro, but for the purposes of this analysis we will assume that their displays behaved like the measured one. From the 32 trials, each participant contributed four results for each of the 8 color temperatures tested. The results are a series of 1's and 0's, where a 1 indicates a correct choice and a 0 indicates an incorrect choice. They were compiled (with some outlier analysis) to produce a percent correct for each color temperature.

For analysis, the average of the correct responses was computed for each CCT level tested. The results for the two background types were combined into a single percent correct detection for each CCT. The results are plotted below in red with "standard error" error bars. A 50% threshold line is drawn in black. This level represents an observer "guessing" (i.e. the observer could not tell which image had the fake background). A best-fit Gaussian is shown in blue.
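The per-level summary can be sketched in Python as follows. The text does not say how the "standard error" was computed, so the binomial standard error of a proportion, sqrt(p(1-p)/n), is an assumption here, as are the function name and the input layout; this is not the code in `AnalyzeExperimentResults.m`.

```python
import math
from collections import defaultdict

def summarize(responses):
    """Percent correct and standard error per CCT level.

    responses: iterable of (cct, correct) pairs, correct being 1 or 0,
    pooled over observers, backgrounds and repeats.
    Returns {cct: (proportion_correct, standard_error)}.
    """
    by_cct = defaultdict(list)
    for cct, correct in responses:
        by_cct[cct].append(correct)

    summary = {}
    for cct, values in sorted(by_cct.items()):
        n = len(values)
        p = sum(values) / n
        # Assumed: binomial standard error of a proportion.
        summary[cct] = (p, math.sqrt(p * (1.0 - p) / n))
    return summary
```

With 10 observers and four responses each, every level pools 40 responses, so at chance performance (p = 0.5) this standard error is about 0.08, which gives a sense of how wide the error bars are expected to be.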

Some notable observations: at D65 (6500K) the background color temperature has not been altered. At this point, it has been graded to best match the color/appearance of the face. At 6500K, the responses are around 50% correct, i.e. at chance. This can be interpreted to mean that the fake and real backgrounds at this CCT were indistinguishable from each other. As we move further away from the reference D65, the percentage of correct responses goes up, meaning that observers are starting to be able to detect the difference. We can also consider the threshold at the 75% correct marker. This can be considered the "just-noticeable-difference" (JND) or detection threshold, which lies around 5100K on the warm side and 12000K on the cool side. Given the limited observer pool, I expect that more experienced observers would produce a steeper response. Given this data, a wider range of testing would have been more appropriate to probe all the way to 100% detection.
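One simple way to read a 75% threshold off a curve like this is to interpolate linearly between neighbouring test levels and report where the curve crosses the criterion. The Python sketch below does that; it is an alternative to reading the crossing off the fitted Gaussian, the function name is hypothetical, and interpolating in CCT rather than in 1/CCT is a simplification.

```python
def threshold_crossings(cct_levels, proportion_correct, criterion=0.75):
    """CCT values where the percent-correct curve crosses the criterion.

    cct_levels must be sorted in ascending order; proportion_correct holds
    the matching proportions in 0..1. Crossings are found by linear
    interpolation between neighbouring levels.
    """
    points = list(zip(cct_levels, proportion_correct))
    crossings = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if (y0 - criterion) * (y1 - criterion) < 0:  # criterion lies between
            crossings.append(x0 + (criterion - y0) * (x1 - x0) / (y1 - y0))
    return crossings

# Synthetic placeholder values for illustration only, not experiment data:
# threshold_crossings([5000, 6000, 7000], [0.8, 0.6, 0.9])
```

For a U-shaped curve such as the one described above, the function returns one crossing on the warm side and one on the cool side of the reference.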

Also, with such limited data, the error bars are quite large. In the future it would be helpful, first, to have all participants complete the experiment on the same display and, second, to include more observers, and more experienced ones.

Conclusions

An experiment was conducted that explored the detection threshold for a virtual background with changes in color temperature between the background and the face. We found that at the same color temperature, a virtual background was indistinguishable from a real background. We found the lower threshold to be around 5100K and the upper threshold to be around 12000K. This large threshold may be due to the fact that it is common to have different illumination between the face/background due to light contamination from a computer screen. It shows that matching the illumination between the background and the face can lead to a virtual background that is indistinguishable from a real background.

Code Base

The code to analyze display measurements, run the experiment, and analyze experiment results is given below. Some notable functions include:

VirtualBackgroundExperiment.mlapp:

Matlab App to run the experiment

AnalyzeExperimentResults.m:

Function to combine and analyze experiment results

Measurements/PlotMeasurements.m:

Function that plots the measurements of the 2020 13" Macbook Pro

ImageProcessing/ApplyChromaticAdjustment.m:

Function that applies the chromatic adjustment to the background image to create the stimuli for the experiment

Code Base: https://office365stanford-my.sharepoint.com/:f:/g/personal/jpytlarz_stanford_edu/Elv-OaT4FtRDpH7SKzwCnTcBTClh-qEpcYZE0ZHbtHC3yw?e=XIvwJ3

References

[1] G. Finlayson, C. Fredembach and M. S. Drew, "Detecting Illumination in Images," 2007 IEEE 11th International Conference on Computer Vision, Rio de Janeiro, 2007, pp. 1-8, doi: 10.1109/ICCV.2007.4409089.

[2] Xiaomao Ding, Ana Radonjić, Nicolas P. Cottaris, Haomiao Jiang, Brian A. Wandell, David H. Brainard; Computational-observer analysis of illumination discrimination. Journal of Vision 2019;19(7):11. doi: https://doi.org/10.1167/19.7.11.

[3] CCT LUT: https://www.waveformlighting.com/tech/black-body-and-reconstituted-daylight-locus-coordinates-by-cct-csv-excel-format

[4] Pytlarz, J., & Pieri, E. (2018). HOW CLOSE IS CLOSE ENOUGH? SPECIFYING COLOUR TOLERANCES FOR HDR AND WCG DISPLAYS.

[5] G. Sharma, W. Wu, E. N. Dalal, "The CIEDE2000 Color-Difference Formula: Implementation Notes, Supplementary Test Data, and Mathematical Observations," Color Research and Application, vol. 30, no. 1, pp. 21-30, February 2005.

[6] ITU-R BT.709-6: Parameter values for the HDTV standards for production and international programme exchange. June, 2015. Note that the -6 is the current version; previous versions were -1 through to -5.

[7] ITU-R BT.2020: Parameter values for ultra-high definition television systems for production and international programme exchange. International Telecommunication Union. 2012-08-23. Retrieved 2014-08-31.

[8] The Society of Motion Picture and Television Engineers, 2011, New York: RP 431-2, D-Cinema Quality – Reference Projector and Environment for the Display of DCDM in Review Rooms and Theaters

[9] G. Finlayson, C. Fredembach and M. S. Drew, "Detecting Illumination in Images," 2007 IEEE 11th International Conference on Computer Vision, Rio de Janeiro, 2007, pp. 1-8, doi: 10.1109/ICCV.2007.4409089.

[10] Radonjić, A., Ding, X., Krieger, A., Aston, S., Hurlbert, A. C., & Brainard, D. H. (2018). Illumination discrimination in the absence of a fixed surface-reflectance layout. Journal of Vision, 18(5):11, 1–27, https://doi.org/10.1167/18.5.11

[11] Moroney, Nathan; Fairchild, Mark D.; Hunt, Robert W.G.; Li, Changjun; Luo, M. Ronnier; Newman, Todd (November 12, 2002). "The CIECAM02 Color Appearance Model". IS&T/SID Tenth Color Imaging Conference. Scottsdale, Arizona: The Society for Imaging Science and Technology. ISBN 0-89208-241-0.

[12] Tominaga, Shoji. Wandell, Brian A., Natural Scene-Illuminant Estimation Using the Sensor Correlation. PROCEEDINGS OF THE IEEE, VOL. 90, NO. 1, JANUARY 2002.
