User:Llamp
Introduction
For many years, the way light is delivered from a display to our eyes stayed the same, and as digital imagery advanced, the way we encode and decode light followed that mechanism. As display technology advances, however, how we encode and decode light must change along with it. This project presents the problem of encoding luminance for high dynamic range (HDR) imagery and covers the work that has been done, both how and why, to standardize a luminance encoding function for HDR imagery.
Background
What is an EOTF and Why is it Necessary?
An Electro-Optical Transfer Function (EOTF) defines the mathematical relationship between the electrical video signal input to a display and the resulting light output (luminance) measured in candelas per square meter (cd/m²). This function is essential for ensuring consistent picture presentation across different display devices and production environments. As ITU-R BT.1886 emphasizes, reference displays play a crucial role in television programme production, and their characteristics must be unified to ensure that programmes appear consistent regardless of where they are produced or viewed. Without a standardized EOTF, the same video signal could produce dramatically different visual results on different displays, undermining the creative intent of content creators and leading to inconsistent viewer experiences.[3]
Above, the "Linear" greyscale is set with code values that increase by 10% at every step. As you can see, the brightness steps do not appear uniform.
The "Gamma" greyscale is set with gamma-encoded code values, and perceptually looks like a consistent luminance increase.
The Historical Use of Gamma Functions
The gamma function became the standard display EOTF because of the inherent physical characteristics of Cathode Ray Tube (CRT) displays. CRTs naturally exhibited a power-law relationship between input voltage and output luminance, with a gamma of roughly 2.4 to 2.5; ITU-R BT.1886 specifies γ = 2.4 for the reference EOTF. This power-law relationship offered a practical benefit: it provided approximately perceptually uniform encoding of luminance values, meaning that equal steps in the electrical signal corresponded to roughly equal perceived brightness differences for the human visual system. The complementary relationship between the camera's opto-electronic transfer function (OETF) specified in ITU-R BT.709, with its 0.45 exponent, and the display's EOTF, with its 2.4 exponent, created an end-to-end system that was both technically efficient and perceptually optimized for human vision.[5]
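The relationship described above can be sketched in a few lines of Python. This is a simplified illustration, not the exact BT.709 or BT.1886 math: both curves are treated as pure power laws (the linear toe segment of the BT.709 OETF is omitted), and the function names are hypothetical.

```python
def gamma_eotf(v, gamma=2.4):
    """Idealized display EOTF: normalized signal v in [0, 1] -> relative luminance."""
    return v ** gamma


def power_oetf(l, exponent=0.45):
    """Simplified camera OETF (pure power law; BT.709's linear segment is omitted)."""
    return l ** exponent


# End-to-end exponent of the simplified camera + display chain.
system_gamma = 0.45 * 2.4

# Ten equal steps in signal level map to very unequal steps in relative luminance.
signal_steps = [i / 10 for i in range(11)]
luminance_steps = [gamma_eotf(v) for v in signal_steps]

if __name__ == "__main__":
    print(f"system gamma: {system_gamma:.2f}")
    for v, l in zip(signal_steps, luminance_steps):
        print(f"signal {v:.1f} -> relative luminance {l:.4f}")
```

Under these assumptions the end-to-end exponent is slightly above 1, and a mid-level signal of 0.5 corresponds to less than 20% of peak luminance.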
BT1886
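$$L = a\,\bigl(\max[(V + b),\, 0]\bigr)^{\gamma}$$
where:
- $L$ denotes screen luminance in cd/m²
- $L_W$ denotes luminance for white
- $L_B$ denotes luminance for black
- $V$ denotes video signal level (normalized, black at $V = 0$ and white at $V = 1$)
- $\gamma$ denotes the exponent of the power function, $\gamma = 2.40$
- $a$ denotes user gain, $a = \bigl(L_W^{1/\gamma} - L_B^{1/\gamma}\bigr)^{\gamma}$
- $b$ denotes black level lift, $b = \dfrac{L_B^{1/\gamma}}{L_W^{1/\gamma} - L_B^{1/\gamma}}$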
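As a minimal sketch of the BT.1886 reference EOTF, L = a(max[V + b, 0])^γ, the Python below derives the gain a and the black lift b from the white and black luminances in the way the Recommendation specifies. The function name and the default levels of 100 cd/m² for white and 0.1 cd/m² for black are illustrative assumptions, not values taken from this page.

```python
def bt1886_eotf(v, l_white=100.0, l_black=0.1, gamma=2.4):
    """Map a normalized video signal v (0 = black, 1 = white) to luminance in cd/m^2.

    l_white and l_black are assumed example display levels.
    """
    lw = l_white ** (1.0 / gamma)
    lb = l_black ** (1.0 / gamma)
    a = (lw - lb) ** gamma          # user gain
    b = lb / (lw - lb)              # black level lift
    return a * max(v + b, 0.0) ** gamma


if __name__ == "__main__":
    for v in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"V = {v:.2f} -> {bt1886_eotf(v):8.3f} cd/m^2")
```

By construction, V = 0 returns the black luminance and V = 1 returns the white luminance, which is a quick sanity check on any implementation.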
Problem
Want and need for HDR
A set of experiments at Dolby Laboratories aimed to discover what luminance ranges viewers preferred. A group of viewers gave feedback on imagery shown at luminances from 0.005 to 18,000 nits on a custom dual-modulation display. One conclusion was that, taking display size into account, a brightness range of 0.005 to 10,000 nits would satisfy 90% of participants. [1] This extended brightness range is essential for accurately rendering specular highlights such as sunlight reflections, car headlights, fire, and other emissive light sources that are critical to conveying the creative intent and emotional impact of a scene. The 10,000 nit specification is a practical compromise: it covers the range most viewers preferred while remaining technically achievable for broadcast and distribution systems. By supporting this expanded dynamic range, HDR systems can break free from CRT-era constraints and deliver images that more closely match the visual impact of the real world or the filmmaker's artistic vision.
Barten Model
The Barten model, developed by P.G.J. Barten, is a mathematical framework that describes the contrast sensitivity function (CSF) of the human visual system, that is, how sensitive our eyes are to differences in brightness at various luminance levels. The model is particularly valuable because it quantifies a fundamental characteristic of human vision: the smallest detectable luminance step is not constant, and in absolute terms we can detect far smaller luminance differences in darker regions than in brighter ones. This means that in shadows and mid-tones, the human eye can perceive luminance gradations that would be invisible in highlights. For HDR imaging applications, the Barten model is useful because it provides an objective, scientifically grounded method for determining how many code values are needed at each luminance level, and therefore how many bits are needed overall, to avoid visible banding or contouring artifacts.
Why not Gamma?
As explained in [4], the traditional gamma 2.4 transfer function is poorly suited to HDR imagery because it allocates code values inefficiently across the extended dynamic range from 0 to 10,000 nits. According to the research presented there, gamma 2.4 would require a 15-bit representation to match the contrast sensitivity function of the human visual system across this expanded luminance range, which is impractical for broadcast and distribution systems. The gamma function wastes code values by over-allocating them to bright areas, where the steps are already well below the visible threshold for contouring artifacts, while under-allocating them to darker regions, where the eye is more sensitive to subtle differences. This mismatch between gamma's encoding characteristics and human visual perception results in visible banding artifacts, particularly in shadow detail and mid-tones. In contrast, the Perceptual Quantizer (PQ) follows the Barten curve for contrast sensitivity, distributing code values according to human perception and achieving artifact-free performance with only 12 bits for noise-free content or 10 bits for captured imagery with natural noise.
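The allocation argument above can be made concrete by comparing the relative luminance jump between two adjacent code values under each curve. The sketch below is an illustration under stated assumptions: both curves are scaled to a 10,000 nit peak, code values are full range (code / (2^bits - 1)), the PQ constants are those published in SMPTE ST 2084, and the helper names are hypothetical. It does not reproduce the Barten threshold itself.

```python
# PQ constants as published in SMPTE ST 2084.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_NITS = 10000.0


def pq_eotf(n):
    """PQ: normalized code value n in [0, 1] -> luminance in nits."""
    p = n ** (1.0 / M2)
    return PEAK_NITS * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1.0 / M1)


def gamma_eotf(n, gamma=2.4):
    """Gamma 2.4 stretched over the same 0 to 10,000 nit range (assumption)."""
    return PEAK_NITS * n ** gamma


def relative_step(eotf, target_nits, bits):
    """Relative luminance jump from the first code at or above target_nits to the next code."""
    levels = 2 ** bits - 1
    code = next(k for k in range(1, levels) if eotf(k / levels) >= target_nits)
    low, high = eotf(code / levels), eotf((code + 1) / levels)
    return (high - low) / low


if __name__ == "__main__":
    for nits in (0.1, 1.0, 100.0, 1000.0):
        g = relative_step(gamma_eotf, nits, 12)
        p = relative_step(pq_eotf, nits, 12)
        print(f"{nits:7.1f} nits: gamma step {g:.3%}, PQ step {p:.3%}")
```

At 12 bits, the gamma step near 0.1 nits comes out several times larger than the PQ step, while near 1,000 nits the gamma step is the smaller of the two, which is the over- and under-allocation pattern described above.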
Methods
Just Noticeable Difference
Looking at luminance alone is not sufficient, however. The researchers also needed to verify that the EOTF would remain perceptually accurate for color.
They developed a specialized JND Cross test to evaluate color quantization requirements for HDR imagery. The test consists of a grid of pixel blocks in which each square is perturbed by just 1 code value away from a uniform grey background.
They generated all possible RGB variations; if the patches are invisible against the grey background, then quantization artifacts are below the visible threshold. The CIEDE2000 color difference formula was used as the metric for judging patch visibility. This revealed that noise-free content (animation, CGI, graphics) requires a ΔE2000 threshold of approximately 2.5 to avoid contouring artifacts, while captured imagery with natural noise can tolerate a higher threshold of approximately 5.0. By applying the JND Cross test across multiple grey levels spanning the entire dynamic range, the researchers determined that 12-bit PQ encoding keeps both monochromatic and color banding artifacts below the visible threshold for noise-free content.
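The patch generation step can be sketched as follows. This is one plausible reading of "all possible RGB variations", assumed here to mean every combination of -1, 0, or +1 code value per channel around the grey level, excluding the unperturbed grey itself. The function name and the full-range code limits are hypothetical, and the CIEDE2000 evaluation of each patch is not shown.

```python
from itertools import product


def jnd_cross_patches(grey_code, bits=12):
    """Return RGB code triplets that differ from a grey background by at most 1 code per channel.

    Assumed interpretation of the JND Cross test; triplets that would fall
    outside the valid code range are dropped.
    """
    max_code = 2 ** bits - 1
    patches = []
    for dr, dg, db in product((-1, 0, 1), repeat=3):
        if (dr, dg, db) == (0, 0, 0):
            continue  # identical to the background, so not a test patch
        rgb = (grey_code + dr, grey_code + dg, grey_code + db)
        if all(0 <= c <= max_code for c in rgb):
            patches.append(rgb)
    return patches


if __name__ == "__main__":
    patches = jnd_cross_patches(2048)
    print(len(patches), "patches around grey level 2048")
    print(patches[:4])
```

Each returned triplet would then be decoded through the EOTF, converted to a perceptual color space, and compared against the background with ΔE2000.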
Results
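Based on these results, in 2014 SMPTE standardized the PQ EOTF as its HDR EOTF standard for mastering reference displays, ST 2084.[2] From ST 2084:
$$L = \left(\frac{\max\left[\left(N^{1/m_2} - c_1\right),\, 0\right]}{c_2 - c_3\, N^{1/m_2}}\right)^{1/m_1}$$
Where:
- $N$ denotes a nonlinear color value, in the range [0, 1]
- $L$ denotes the corresponding linear color value, normalized to [0, 1]; the displayed luminance is $10000 \cdot L$ cd/m²
- $m_1 = \frac{2610}{4096} \times \frac{1}{4} = 0.1593017578125$
- $m_2 = \frac{2523}{4096} \times 128 = 78.84375$
- $c_1 = \frac{3424}{4096} = 0.8359375 = c_3 - c_2 + 1$
- $c_2 = \frac{2413}{4096} \times 32 = 18.8515625$
- $c_3 = \frac{2392}{4096} \times 32 = 18.6875$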
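A minimal Python sketch of the PQ EOTF and its inverse is given below. The constants m1, m2, c1, c2, and c3 are the ones published in SMPTE ST 2084; the function names are hypothetical, and the output is scaled by 10,000 so that it is expressed directly in cd/m².

```python
# Constants from SMPTE ST 2084.
M1 = 2610 / 16384          # 2610/4096 * 1/4
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_eotf(n):
    """Nonlinear PQ value n in [0, 1] -> absolute luminance in cd/m^2."""
    p = n ** (1.0 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1.0 / M1)


def pq_inverse_eotf(nits):
    """Absolute luminance in cd/m^2 (0 to 10,000) -> nonlinear PQ value in [0, 1]."""
    y = (nits / 10000.0) ** M1
    return ((C1 + C2 * y) / (1.0 + C3 * y)) ** M2


if __name__ == "__main__":
    for nits in (0.005, 1.0, 100.0, 1000.0, 4000.0, 10000.0):
        n = pq_inverse_eotf(nits)
        print(f"{nits:9.3f} cd/m^2 -> N = {n:.4f} -> {pq_eotf(n):9.3f} cd/m^2")
```

A useful reference point is that 100 cd/m², a typical SDR reference white, lands at roughly half of the PQ signal range, leaving the upper half for highlights.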
This is an example of how a PQ-encoded image would be displayed on various displays. In this example, the creative intent was mastered (created) with pixels that peak at 4000 nits. It is being viewed on a bright TV with a peak of 1000 nits and on a dark TV with a peak of 200 nits. Despite the displays having different dynamic ranges, the image is not stretched to fit within each container. If a display cannot output a brightness encoded into the image, it simply clips.
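The clipping behavior in this example can be sketched as below. It assumes the PQ signal has already been decoded to absolute luminance in nits, and it models only the hard clip described here; real televisions commonly apply their own tone mapping instead. The function name and sample pixel values are hypothetical.

```python
def display_output(mastered_nits, display_peak_nits):
    """Hard-clip decoded pixel luminances (in nits) to what the display can emit."""
    return [min(level, display_peak_nits) for level in mastered_nits]


if __name__ == "__main__":
    # Sample pixels from content mastered with a 4000 nit peak.
    pixels = [0.05, 100.0, 800.0, 4000.0]
    print("bright TV (1000 nits):", display_output(pixels, 1000.0))
    print("dark TV   (200 nits): ", display_output(pixels, 200.0))
```

Pixels below a display's peak are reproduced at their mastered luminance on both televisions, which is the practical meaning of an absolute encoding; only the values above the peak are lost.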
Conclusions
Results
- Create a new, stable container able to express luminance that more closely resembles what can be found in the real world.
- Have an absolute EOTF/OETF. This means that the function used to convert the image data does not depend on any display characteristics. When levels are set properly, colorists can be reassured that the TV or display does not reinterpret how the viewer will experience the media.
- Set a peak brightness of 10,000 nits, which leaves room for display technology to catch up and decreases the need for constant new standardization.
Outcomes
- Today, film and television production companies are mastering content in PQ, following ST 2084. They monitor the content on 4000 nit reference monitors, ultimately setting the peak brightness of that content to up to 4000 nits (any pixel above the mastering display's capabilities may be flagged in QC).
- PQ encoded content has become the preferred choice for some OTT (over-the-top) service providers such as Netflix.
- PQ is one of two EOTFs valid for HDR content. TVs (and other displays) are capable of displaying HDR content encoded in PQ.
- Every year, brighter displays come out that are able to output more of the PQ range.
Appendix
- Bit Depth: The number of bits used to represent the color or luminance value of each pixel in a digital image. Higher bit depths allow for more discrete values and smoother gradations. Common bit depths include 8-bit (256 levels), 10-bit (1,024 levels), and 12-bit (4,096 levels).
- Nits: The unit commonly used for luminance, representing the amount of light emitted from a surface in a particular direction per unit area. One nit is equal to one candela per square meter (1 cd/m²).
- CIE DE2000 (ΔE2000): A color difference formula developed by the International Commission on Illumination (CIE) that quantifies the perceptual difference between two colors. Lower values indicate colors that are more similar; a ΔE2000 value below 1.0 is generally considered imperceptible to the human eye under ideal viewing conditions.
- Dynamic Range: The ratio between the brightest and darkest values in an image or that a display can reproduce, typically expressed in stops (powers of 2) or as a ratio. HDR extends this range significantly beyond traditional Standard Dynamic Range (SDR).
- Just Noticeable Difference (JND): The minimum amount of change in a stimulus (such as brightness or color) that can be detected by a human observer. JND testing is used to determine the minimum bit depth required to avoid visible artifacts.
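As a small worked example of the bit depth and dynamic range definitions above, the sketch below counts the code values available at a given bit depth and expresses a luminance range in stops. The helper names are hypothetical; the 0.005 to 10,000 nit range is the one quoted from [1].

```python
import math


def code_levels(bits):
    """Number of discrete code values available at a given bit depth."""
    return 2 ** bits


def stops(max_nits, min_nits):
    """Dynamic range in stops, where each stop is a doubling of luminance."""
    return math.log2(max_nits / min_nits)


if __name__ == "__main__":
    for bits in (8, 10, 12):
        print(f"{bits}-bit: {code_levels(bits)} levels")
    print(f"0.005 to 10,000 nits: {stops(10000.0, 0.005):.2f} stops")
```

The 0.005 to 10,000 nit range corresponds to a ratio of 2,000,000:1, or just under 21 stops.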
Citations
[1] Scott Daly, Timo Kunkel, Xing Sun, Suzanne Farrell, Poppy Crum, "Preference limits of the visual dynamic range for ultra high quality and aesthetic conveyance," Proc. SPIE 8651, Human Vision and Electronic Imaging XVIII, 86510J (14 March 2013); https://doi.org/10.1117/12.2013161
[2] "ST 2084:2014 - SMPTE Standard - High Dynamic Range Electro-Optical Transfer Function of Mastering Reference Displays". SMPTE. doi:10.5594/SMPTE.ST2084.2014. ISBN 978-1-61482-829-7
[3] "BT.1886 : Reference electro-optical transfer function for flat panel displays used in HDTV studio production". ITU-R Recommendation BT.1886. International Telecommunication Union. March 2011. www.itu.int. Retrieved 2021-11-07.
[4] Brooks, David. "The art of better pixels." (2015).
[5] "BT.709 : Parameter values for the HDTV standards for production and international programme exchange". ITU-R Recommendation BT.709-6. International Telecommunication Union. June 2015. www.itu.int.
[6] Dolby Laboratories. "Dolby Vision Whitepaper" (PDF). Archived from the original on 4 June 2016. Retrieved 24 August 2016 from Internet Archive: https://web.archive.org/web/20160604120415/http://www.dolby.com/us/en/technologies/dolby-vision/dolby-vision-white-paper.pdf