Pytlarz
Virtual Background Detection
Introduction/Background
Over the past year, the use of video conferencing has skyrocketed, and with it the prevalence of virtual backgrounds. Video conference systems implement body/head detection in order to replace the user's background [1]. This study explores the believability of these virtual backgrounds, specifically targeting color temperature differences between the user and the background. A relationship will be shown between the color temperature of the original image, the color temperature of the background, and the detectability of the virtual background.
Image segmentation is one of the most challenging aspects of creating a believable virtual background experience [1], especially for video. This project will not explore segmentation, and so uses a static image with a carefully tuned Photoshop mask. The main question of this project is: "what is the detection threshold of a virtual background as it relates to changes in color temperature between the subject and the background?"
Methods
The experiment used a two-alternative forced-choice (2AFC) design in which a reference (real background) image was always present. Observers were asked to select the image with the fake background. There were 16 unique trials, each repeated twice per observer.
Display Characteristics
The experiment was conducted on a 2020 13" MacBook Pro. To ensure accurate processing and color transformation, measurements of the target display were taken and are shown below. These measurements were made in a dark lab using a Photo Research PR-740 spectroradiometer.
Red, green, blue, and white patches were displayed full screen through the Preview app. Since these images are not tagged as DCI-P3 or Rec2020, it is not surprising that the display attempts a Rec709/sRGB rendition: notice that the green stimulus produces power in what appears to be the red primary, so we are not seeing the display's true native capabilities. However, this is the mode in which the experiment was conducted, so we treat these results as our assumed "native" display.
The chromaticity diagram shows the color gamut. The primaries are quite close to Rec709/sRGB, shown in white. The white point, however, is far from the typical D65 (0.3127, 0.3290). We assume the images are rendered properly on the display and decode accordingly based on the measured display primaries/white point.
The gamma/power response was measured in steps of 10 8-bit code words, including 0 and 255. After scaling to the measured peak and minimum luminance values, the response is nearly spot-on a power of 2.1. The peak luminance is 256.5 cd/m^2 and the minimum luminance is 0.28 cd/m^2 (with the backlight fully on). From these measurements I was able to derive a transform between display space and XYZ or LMS for color transformation.
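As a sketch of how such a display model can be assembled, the snippet below builds an RGB-to-XYZ matrix from primary/white chromaticities and applies the measured power of 2.1 with the measured luminance range. The chromaticity values are Rec709/D65 stand-ins (the report's actual measured values are not reproduced in the text), so only the structure, not the numbers, reflects the real characterization.

```python
import numpy as np

# Stand-in chromaticities (assumed Rec709/D65; the measured values differ).
xy_prims = np.array([[0.640, 0.330],   # red
                     [0.300, 0.600],   # green
                     [0.150, 0.060]])  # blue
xy_white = np.array([0.3127, 0.3290])
GAMMA = 2.1                  # measured power response
L_PEAK, L_MIN = 256.5, 0.28  # measured luminances, cd/m^2

def xy_to_XYZ(xy, Y=1.0):
    """Chromaticity (x, y) with luminance Y -> XYZ tristimulus values."""
    x, y = xy
    return np.array([x / y, 1.0, (1.0 - x - y) / y]) * Y

def rgb_to_xyz_matrix(prims, white):
    """3x3 matrix mapping linear RGB (white = [1,1,1]) to relative XYZ."""
    P = np.stack([xy_to_XYZ(p) for p in prims], axis=1)  # columns = R, G, B
    S = np.linalg.solve(P, xy_to_XYZ(white))             # per-primary scale
    return P * S

M_RGB2XYZ = rgb_to_xyz_matrix(xy_prims, xy_white)

def decode(code):
    """8-bit display code values -> absolute XYZ in cd/m^2."""
    rgb_lin = (np.asarray(code, float) / 255.0) ** GAMMA
    XYZ_rel = M_RGB2XYZ @ rgb_lin
    return XYZ_rel * (L_PEAK - L_MIN) + L_MIN * xy_to_XYZ(xy_white)
```

By construction, decoding full-scale white lands on the measured peak luminance and decoding black lands on the measured minimum.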
Stimuli
The stimuli were created by capturing a video of the subject in a scene, then having the subject leave the scene to capture the background alone with identical framing, on an iPhone 11 Pro with auto exposure and auto focus locked during capture. The desired frames were then extracted in Matlab. A mask of the subject was created in Photoshop to be as realistic as possible. In addition, the frames were initially color graded in Photoshop to best match the hue/saturation/contrast of each image; this helps ensure the direction I was probing was free of interference from other image attributes. The framing of the two shots was also chosen so that the lighting direction on the face matched as closely as possible. An example of the renderings is given below. Notice from the top row that the color tones of the two scenes match very well. The two renderings showcase the extremes of testing (4500 and 20,000 Kelvin). When blending with the mask, the blend was done in linear space to best emulate the physics of a true background.
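The linear-space blend can be sketched as follows: decode the display-encoded images with the measured power of 2.1, alpha-blend with the mask, and re-encode. The function names are illustrative, not the report's actual Matlab code.

```python
import numpy as np

GAMMA = 2.1  # measured display power response from the report

def composite_linear(fg, bg, mask, gamma=GAMMA):
    """Alpha-blend foreground over background in linear light.

    fg, bg : display-encoded float images in [0, 1]
    mask   : alpha matte in [0, 1] (1 = foreground/subject)
    """
    fg_lin = np.asarray(fg, float) ** gamma   # decode to linear light
    bg_lin = np.asarray(bg, float) ** gamma
    out_lin = mask * fg_lin + (1.0 - mask) * bg_lin
    return out_lin ** (1.0 / gamma)           # re-encode for display
```

Blending in linear light matters at soft mask edges: a 50/50 mix of black and white yields the physically correct average intensity rather than the darker result a gamma-encoded blend would give.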
Once I had the base renderings and the color transforms defined from the display measurements, I created a database of virtual background pairings while adjusting the color temperature of the background. The desired white point was assumed to be D65, and the chromatic adaptation was derived accordingly. Von Kries adaptation in LMS space was used: the diagonal adaptation gains are the ratios of the LMS values of the target white and of D65. Adaptation is then accomplished by a matrix multiplication in LMS, and the inverse transform M^-1 brings the result back to XYZ and then to display space.
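A minimal sketch of this von Kries step is below. The Hunt-Pointer-Estevez XYZ-to-LMS matrix is an assumption on my part; the report does not state which LMS matrix it used.

```python
import numpy as np

# Hunt-Pointer-Estevez XYZ -> LMS matrix (an assumed choice; the report
# only says a von Kries LMS space was used).
M_LMS = np.array([[ 0.4002, 0.7076, -0.0808],
                  [-0.2263, 1.1653,  0.0457],
                  [ 0.0000, 0.0000,  0.9182]])

def von_kries_adapt(XYZ, white_src, white_dst):
    """Adapt an XYZ color from a source white point to a destination white.

    The per-cone gains are the ratios of the LMS values of the two whites;
    the full transform is M^-1 * diag(gains) * M applied in XYZ.
    """
    gains = (M_LMS @ white_dst) / (M_LMS @ white_src)
    M = np.linalg.inv(M_LMS) @ np.diag(gains) @ M_LMS
    return M @ np.asarray(XYZ, float)
```

By construction, the source white maps exactly onto the destination white, which is the defining property of the adaptation.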
The adjustments were applied to the background image only; the face remained the same as the original. The adjustments and CCT values are given above. The translation between CCT and xy is computed via a look-up table (LUT) [3]. An xy plot of the adjustment points is also shown below, with the white dashed line representing Rec709/sRGB. There are 8 test points including D65 (at 6500 K). The LUT was shifted to align D65 exactly to the standardized [0.3127, 0.3290].
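For readers without the LUT at hand, the CIE daylight-locus polynomial gives essentially the same CCT-to-xy mapping in closed form; it is shown here only as an illustrative alternative to the LUT the report actually used. Note that at 6500 K it lands very close to, but not exactly on, the standardized D65 coordinates, which is consistent with the report's need to shift the LUT to align D65 perfectly.

```python
def daylight_xy(cct):
    """CIE daylight-locus chromaticity (x, y) for a CCT in Kelvin.

    Valid for roughly 4000-25000 K; an alternative to the CCT LUT [3].
    """
    T = float(cct)
    if 4000 <= T <= 7000:
        x = 0.244063 + 0.09911e3 / T + 2.9678e6 / T**2 - 4.6070e9 / T**3
    elif 7000 < T <= 25000:
        x = 0.237040 + 0.24748e3 / T + 1.8518e6 / T**2 - 3.5914e9 / T**3
    else:
        raise ValueError("CCT outside the 4000-25000 K range")
    # Daylight locus: y is a quadratic in x.
    y = -3.000 * x * x + 2.870 * x - 0.275
    return x, y
```

For example, `daylight_xy(4500)` gives a warmer (higher-x) chromaticity and `daylight_xy(20000)` a cooler (lower-x) one, matching the extremes of the test points.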
Here is a plot of the adaptation change/direction of the background image for a target of 4500 K in L*a*b*. We see that the pixels are all shifted toward the bottom left in the same direction.
These plots showcase the adaptation change/direction of the background for a 20,000 K target in L*a*b*. The 3D plot shows that L* is maintained, meaning the chromatic adaptation preserves the lightness of the image. As expected, this cooler color temperature adjustment moves in the opposite direction from the previous comparison.
Experiment Design
The experiment was conducted using a 2AFC (two-alternative forced-choice) design implemented in the Matlab App Designer.
There were two reference images with the original, unmodified backgrounds (shown in the bottom two of the figure below). There were then 8 image pairs containing the same face (unaltered color) with the background from the opposite scene. The color of the background was modified using the color correction method described earlier.
When an observer completed the experiment, they went through the following steps. First, they were greeted with a welcome screen. A trial label was shown after each response so that the observer knew how much longer the experiment would take. For each trial they were asked to select the image that contained the "fake" background, making their selection by clicking on the image. At the end of the experiment a completion page was shown, and the app saved a results file before closing.
Results
The experiment had six participants, each contributing four responses for each CCT tested. The results were compiled (with some outlier analysis) into a series of 1's and 0's, where a 1 indicates a correct choice and a 0 an incorrect choice.
For analysis, the average proportion of correct responses was computed for each CCT level tested. The results for the two background types were combined into a single percent-correct detection rate for each CCT. The results are plotted below in red with standard-error bars. A 50% line is also shown in black; this level represents chance performance, i.e., an observer guessing because they could not tell which image had the fake background. A best-fit Gaussian is shown in blue.
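The aggregation step can be sketched as below. The response values are hypothetical, invented purely for illustration; only the computation mirrors the analysis described.

```python
import numpy as np

# Hypothetical 0/1 responses per CCT (1 = fake background correctly
# identified); NOT the actual experimental data.
responses = {
    4500:  [1, 1, 1, 0, 1, 1, 0, 1],
    6500:  [0, 1, 0, 1, 1, 0, 0, 1],
    20000: [1, 1, 1, 1, 1, 0, 1, 1],
}

def summarize(r):
    """Percent correct and standard error of the mean for 0/1 responses."""
    r = np.asarray(r, float)
    pc = r.mean()
    sem = r.std(ddof=1) / np.sqrt(r.size)
    return pc, sem

for cct in sorted(responses):
    pc, sem = summarize(responses[cct])
    print(f"{cct:>6} K: {pc:.0%} correct (SEM {sem:.2f})")
```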
Some notable observations: at D65 (6500 K) the background color temperature has not been altered; at this point it has been graded to best match the color/appearance of the face, and responses are around the 50% (chance) level. This can be interpreted to mean that the fake and real backgrounds at this CCT were indistinguishable from each other. As we move further from the reference D65, the detection rate goes up, meaning observers begin to detect the difference. We can also consider the threshold at the 75%-correct marker. This can be considered the just-noticeable difference (JND) or detection threshold, which lies around 5100 K and 12,000 K. Given the limited observer pool, I expect that more experienced observers would show a steeper response.
Conclusions
An experiment was conducted exploring the detection threshold for a virtual background as the color temperature difference between the background and the face changes. We found that at the same color temperature, a virtual background was indistinguishable from a real background. The lower threshold was around 5100 K and the upper threshold around 12,000 K. This wide threshold range may reflect the fact that differing illumination between the face and background is common in practice, for example due to light contamination from a computer screen. The result shows that matching the illumination between the background and the face can make a virtual background indistinguishable from a real one.
References
[1] G. Finlayson, C. Fredembach and M. S. Drew, "Detecting Illumination in Images," 2007 IEEE 11th International Conference on Computer Vision, Rio de Janeiro, 2007, pp. 1-8, doi: 10.1109/ICCV.2007.4409089.
[2] Xiaomao Ding, Ana Radonjić, Nicolas P. Cottaris, Haomiao Jiang, Brian A. Wandell, David H. Brainard; Computational-observer analysis of illumination discrimination. Journal of Vision 2019;19(7):11. doi: https://doi.org/10.1167/19.7.11.
[3] Waveform Lighting, CCT LUT: black body and reconstituted daylight locus coordinates by CCT. https://www.waveformlighting.com/tech/black-body-and-reconstituted-daylight-locus-coordinates-by-cct-csv-excel-format