YCbCr

A visualization of YCbCr color space
The CbCr plane at constant luma Y′=0.5
A color image and its Y′, CB and CR components. The Y′ image is essentially a greyscale copy of the main image.

YCbCr, Y′CbCr, or Y Pb/Cb Pr/Cr, also written as YCBCR or Y′CBCR, is a family of color spaces used as a part of the color image pipeline in video and digital photography systems. Y′ is the luma component and CB and CR are the blue-difference and red-difference chroma components. Y′ (with prime) is distinguished from Y, which is luminance: the prime indicates that light intensity is nonlinearly encoded based on gamma-corrected RGB primaries.

Y′CbCr color spaces are defined by a mathematical coordinate transformation from the associated RGB primaries and white point. If the underlying RGB color space is absolute, the Y′CbCr color space is an absolute color space as well; conversely, if the RGB space is ill-defined, so is Y′CbCr. The transformation is defined in equations 32 and 33 of ITU-T H.273. In practice, however, the matrix is not always derived from the primaries in use. Netflix uses P3-D65 primaries with the BT.2020-NCL matrix, so there the matrix is not derived from the primaries, although Netflix has also allowed BT.2020 primaries since 2021.[1] The same happens with JPEG: it uses the BT.601 matrix derived from System M primaries, yet the primaries of most images are BT.709.

Rationale

Cathode ray tube displays are driven by red, green, and blue voltage signals, but these RGB signals are not efficient as a representation for storage and transmission, since they have a lot of redundancy.

YCbCr and Y′CbCr are a practical approximation to color processing and perceptual uniformity, where the primary colors corresponding roughly to red, green and blue are processed into perceptually meaningful information. As a result, subsequent image/video processing, transmission and storage can perform operations and introduce errors in perceptually meaningful ways. Y′CbCr is used to separate out a luma signal (Y′) that can be stored with high resolution or transmitted at high bandwidth, and two chroma components (CB and CR) that can be bandwidth-reduced, subsampled, compressed, or otherwise treated separately for improved system efficiency.

One practical example would be decreasing the bandwidth or resolution allocated to "color" compared to "black and white", since humans are more sensitive to the black-and-white information (see image example to the right). This is called chroma subsampling.

CbCr

YCbCr is sometimes abbreviated to YCC. Typically the terms Y′CbCr, YCbCr, YPbPr and YUV are used interchangeably, leading to some confusion. The main difference is that YPbPr is used with analog images and YCbCr with digital images, leading to different scaling values for Umax and Vmax (in YCbCr both are 1/2) when converting to/from YUV. Y′CbCr and YCbCr differ in whether the values are gamma corrected or not.

The equations below give a better picture of the common principles and general differences between these formats.

RGB conversion

R'G'B' to Y'PbPr

RGB to YCbCr conversion

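Y′CbCr signals (prior to scaling and offsets to place the signals into digital form) are called YPbPr, and are created from the corresponding gamma-adjusted RGB (red, green and blue) source using three defined constants KR, KG, and KB as follows:

$$\begin{aligned}
Y' &= K_R \cdot R' + K_G \cdot G' + K_B \cdot B' \\
P_B &= \frac{1}{2} \cdot \frac{B' - Y'}{1 - K_B} \\
P_R &= \frac{1}{2} \cdot \frac{R' - Y'}{1 - K_R}
\end{aligned}$$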

where KR, KG, and KB are ordinarily derived from the definition of the corresponding RGB space, and required to satisfy $K_R + K_G + K_B = 1$.

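The equivalent matrix manipulation is often referred to as the "color matrix":

$$\begin{bmatrix} Y' \\ P_B \\ P_R \end{bmatrix} =
\begin{bmatrix}
K_R & K_G & K_B \\
-\frac{1}{2} \cdot \frac{K_R}{1 - K_B} & -\frac{1}{2} \cdot \frac{K_G}{1 - K_B} & \frac{1}{2} \\
\frac{1}{2} & -\frac{1}{2} \cdot \frac{K_G}{1 - K_R} & -\frac{1}{2} \cdot \frac{K_B}{1 - K_R}
\end{bmatrix}
\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix}$$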

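And its inverse:

$$\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix} =
\begin{bmatrix}
1 & 0 & 2 - 2 K_R \\
1 & -\frac{K_B}{K_G} \cdot (2 - 2 K_B) & -\frac{K_R}{K_G} \cdot (2 - 2 K_R) \\
1 & 2 - 2 K_B & 0
\end{bmatrix}
\begin{bmatrix} Y' \\ P_B \\ P_R \end{bmatrix}$$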

Here, the prime (′) symbols mean gamma correction is being used; thus R′, G′ and B′ nominally range from 0 to 1, with 0 representing the minimum intensity (e.g., for display of the color black) and 1 the maximum (e.g., for display of the color white). The resulting luma (Y′) value will then have a nominal range from 0 to 1, and the chroma (PB and PR) values will have a nominal range from -0.5 to +0.5. The reverse conversion process can be readily derived by inverting the above equations.

Y'PbPr to Y'CbCr

When representing the signals in digital form, the results are scaled and rounded, and offsets are typically added. For example, the scaling and offset applied to the Y′ component per specification (e.g. MPEG-2[2]) results in the value of 16 for black and the value of 235 for white when using an 8-bit representation. The standard has 8-bit digitized versions of CB and CR scaled to a different range of 16 to 240. Consequently, rescaling by the fraction (235-16)/(240-16) = 219/224 is sometimes required when doing color matrixing or processing in YCbCr space, resulting in quantization distortions when the subsequent processing is not performed using higher bit depths.
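
As a rough sketch of the scaling, rounding and offset step for 8-bit studio range, the following C++ maps a normalized luma value to 16..235 and a normalized chroma value to 16..240. The function names are hypothetical, and clipping to 0..255 is an assumption made here for safety; interfaces that reserve codes 0 and 255 would need a narrower clip.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper: Y' in 0..1 -> 8-bit code, 16 for black, 235 for white.
std::uint8_t quantizeLuma(double y) {
    const double v = std::round(16.0 + 219.0 * y);
    return static_cast<std::uint8_t>(std::min(255.0, std::max(0.0, v)));
}

// Hypothetical helper: PB or PR in -0.5..+0.5 -> 8-bit code, 16..240,
// with 128 representing zero chroma.
std::uint8_t quantizeChroma(double p) {
    const double v = std::round(128.0 + 224.0 * p);
    return static_cast<std::uint8_t>(std::min(255.0, std::max(0.0, v)));
}
```

Inputs slightly outside the nominal range land in the headroom or toeroom instead of being clipped immediately.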

The scaling that results in the use of a smaller range of digital values than what might appear to be desirable for representation of the nominal range of the input data allows for some "overshoot" and "undershoot" during processing without necessitating undesirable clipping. This "headroom" and "toeroom"[3] can also be used for extension of the nominal color gamut, as specified by xvYCC.

The value 235 accommodates a maximum overshoot of (255 - 235) / (235 - 16) = 9.1%, which is slightly larger than the theoretical maximum overshoot (Gibbs phenomenon) of about 8.9% of the maximum (black-to-white) step. The toeroom is smaller, allowing only 16 / 219 = 7.3% undershoot, which is less than the theoretical maximum of 8.9%. In addition, because values 0 and 255 are reserved in HDMI, the available room is actually slightly less.

Y'CbCr to xvYCC

Since the equations defining Y'CbCr are formed in a way that rotates the entire nominal RGB color cube and scales it to fit within a (larger) YCbCr color cube, there are some points within the Y'CbCr color cube that cannot be represented in the corresponding RGB domain (at least not within the nominal RGB range). This causes some difficulty in determining how to correctly interpret and display some Y'CbCr signals. These out-of-range Y'CbCr values are used by xvYCC to encode colors outside the BT.709 gamut.

ITU-R BT.601 conversion

The form of Y′CbCr that was defined for standard-definition television use in the ITU-R BT.601 (formerly CCIR 601) standard for use with digital component video is derived from the corresponding RGB space (ITU-R BT.470-6 System M primaries) as follows:

$$K_R = 0.299, \quad K_G = 0.587, \quad K_B = 0.114$$

From the above constants and formulas, the following can be derived for ITU-R BT.601.

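Analog YPbPr from analog R'G'B' is derived as follows:

$$\begin{aligned}
Y' &= 0.299 \cdot R' + 0.587 \cdot G' + 0.114 \cdot B' \\
P_B &= -0.168736 \cdot R' - 0.331264 \cdot G' + 0.5 \cdot B' \\
P_R &= 0.5 \cdot R' - 0.418688 \cdot G' - 0.081312 \cdot B'
\end{aligned}$$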

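Digital Y′CbCr (8 bits per sample) is derived from analog R'G'B' as follows:

$$\begin{aligned}
Y' &= 16 + (65.481 \cdot R' + 128.553 \cdot G' + 24.966 \cdot B') \\
C_B &= 128 + (-37.797 \cdot R' - 74.203 \cdot G' + 112.0 \cdot B') \\
C_R &= 128 + (112.0 \cdot R' - 93.786 \cdot G' - 18.214 \cdot B')
\end{aligned}$$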

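or simply componentwise, in terms of the analog Y′, PB and PR values:

$$(Y', C_B, C_R) = (16, 128, 128) + (219 \cdot Y', 224 \cdot P_B, 224 \cdot P_R)$$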

The resultant signals range from 16 to 235 for Y′ (Cb and Cr range from 16 to 240); the values from 0 to 15 are called footroom, while the values from 236 to 255 are called headroom.
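
As a worked example under the BT.601 constants (KR = 0.299, KG = 0.587, KB = 0.114), consider pure red, R' = 1 and G' = B' = 0. The analog values follow from the Y'PbPr definitions, and the 8-bit codes from the scale factors 219 and 224 with offsets 16 and 128; rounding to the nearest integer is assumed here.

$$\begin{aligned}
Y' &= 0.299, &
P_B &= \tfrac{1}{2} \cdot \frac{0 - 0.299}{1 - 0.114} \approx -0.168736, &
P_R &= \tfrac{1}{2} \cdot \frac{1 - 0.299}{1 - 0.299} = 0.5 \\
Y'_{8} &= 16 + 219 \cdot 0.299 = 81.481 \approx 81, &
C_{B,8} &= 128 - 224 \cdot 0.168736 \approx 90.203 \approx 90, &
C_{R,8} &= 128 + 224 \cdot 0.5 = 240
\end{aligned}$$

The red-difference component reaches the top of its nominal range (240), while luma stays well below 235.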

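Alternatively, digital Y′CbCr can be derived from digital R'dG'dB'd (8 bits per sample, each using the full range with zero representing black and 255 representing white) according to the following equations:

$$\begin{aligned}
Y' &= 16 + \frac{65.481 \cdot R'_D}{255} + \frac{128.553 \cdot G'_D}{255} + \frac{24.966 \cdot B'_D}{255} \\
C_B &= 128 - \frac{37.797 \cdot R'_D}{255} - \frac{74.203 \cdot G'_D}{255} + \frac{112.0 \cdot B'_D}{255} \\
C_R &= 128 + \frac{112.0 \cdot R'_D}{255} - \frac{93.786 \cdot G'_D}{255} - \frac{18.214 \cdot B'_D}{255}
\end{aligned}$$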

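In the formula below, the scaling factors are multiplied by 256/255. This allows for the value 256 in the denominator, which can be calculated by a single bitshift.

$$\begin{aligned}
Y' &= 16 + \frac{65.738 \cdot R'_D}{256} + \frac{129.057 \cdot G'_D}{256} + \frac{25.064 \cdot B'_D}{256} \\
C_B &= 128 - \frac{37.945 \cdot R'_D}{256} - \frac{74.494 \cdot G'_D}{256} + \frac{112.439 \cdot B'_D}{256} \\
C_R &= 128 + \frac{112.439 \cdot R'_D}{256} - \frac{94.154 \cdot G'_D}{256} - \frac{18.285 \cdot B'_D}{256}
\end{aligned}$$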

If the R'd G'd B'd digital source includes footroom and headroom, the footroom offset 16 needs to be subtracted first from each signal, and a scale factor of 255/219 needs to be included in the equations.

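The inverse transform is:

$$\begin{aligned}
R'_D &= \frac{298.082 \cdot Y'}{256} + \frac{408.583 \cdot C_R}{256} - 222.921 \\
G'_D &= \frac{298.082 \cdot Y'}{256} - \frac{100.291 \cdot C_B}{256} - \frac{208.120 \cdot C_R}{256} + 135.576 \\
B'_D &= \frac{298.082 \cdot Y'}{256} + \frac{516.412 \cdot C_B}{256} - 276.836
\end{aligned}$$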

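The inverse transform without any roundings (using values coming directly from ITU-R BT.601 recommendation) is:

$$\begin{aligned}
R'_D &= \frac{255}{219} \cdot (Y' - 16) + \frac{255}{224} \cdot 1.402 \cdot (C_R - 128) \\
G'_D &= \frac{255}{219} \cdot (Y' - 16) - \frac{255}{224} \cdot 1.772 \cdot \frac{0.114}{0.587} \cdot (C_B - 128) - \frac{255}{224} \cdot 1.402 \cdot \frac{0.299}{0.587} \cdot (C_R - 128) \\
B'_D &= \frac{255}{219} \cdot (Y' - 16) + \frac{255}{224} \cdot 1.772 \cdot (C_B - 128)
\end{aligned}$$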
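
The unrounded inverse can be sketched in C++ as follows. This is an illustrative sketch only: the names `Rgb8`, `clampToByte` and `bt601StudioToRgb` are hypothetical, the constants 1.402 and 1.772 are 2(1 - KR) and 2(1 - KB) for the BT.601 values KR = 0.299 and KB = 0.114, and the final rounding and clipping to 0..255 are choices made for this example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct Rgb8 { std::uint8_t r, g, b; };

// Rounds to nearest and clips to the 8-bit range.
static std::uint8_t clampToByte(double v) {
    return static_cast<std::uint8_t>(
        std::min(255.0, std::max(0.0, std::round(v))));
}

// Hypothetical helper: 8-bit studio-range BT.601 Y'CbCr -> full-range R'G'B'.
Rgb8 bt601StudioToRgb(std::uint8_t y, std::uint8_t cb, std::uint8_t cr) {
    const double ys  = (255.0 / 219.0) * (y - 16);    // remove offset, expand luma
    const double cbs = (255.0 / 224.0) * (cb - 128);  // remove offset, expand chroma
    const double crs = (255.0 / 224.0) * (cr - 128);
    const double r = ys + 1.402 * crs;
    const double g = ys - 1.772 * (0.114 / 0.587) * cbs
                        - 1.402 * (0.299 / 0.587) * crs;
    const double b = ys + 1.772 * cbs;
    return {clampToByte(r), clampToByte(g), clampToByte(b)};
}
```

Studio black (16, 128, 128) maps to (0, 0, 0) and studio white (235, 128, 128) maps to (255, 255, 255); codes in the headroom or toeroom are clipped by `clampToByte`.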

This form of Y′CbCr is used primarily for older standard-definition television systems, as it uses an RGB model that fits the phosphor emission characteristics of older CRTs.

ITU-R BT.709 conversion

Rec. 709 compared with Rec. 2020

A different form of Y′CbCr is specified in the ITU-R BT.709 standard, primarily for HDTV use. The newer form is also used in some computer-display oriented applications, such as sRGB (though the matrix used for the sRGB form of YCbCr, sYCC, is still BT.601). In this case, the values of KB and KR differ, but the formulas for using them are the same. For ITU-R BT.709, the constants are:

$$K_B = 0.0722, \quad K_R = 0.2126, \quad K_G = 1 - K_B - K_R = 0.7152$$

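This form of Y′CbCr is based on an RGB model that more closely fits the phosphor emission characteristics of newer CRTs and other modern display equipment.[citation needed] The conversion matrices for BT.709 are these:

$$\begin{bmatrix} Y' \\ P_B \\ P_R \end{bmatrix} =
\begin{bmatrix}
0.2126 & 0.7152 & 0.0722 \\
-0.1146 & -0.3854 & 0.5 \\
0.5 & -0.4542 & -0.0458
\end{bmatrix}
\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix}$$

$$\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix} =
\begin{bmatrix}
1 & 0 & 1.5748 \\
1 & -0.1873 & -0.4681 \\
1 & 1.8556 & 0
\end{bmatrix}
\begin{bmatrix} Y' \\ P_B \\ P_R \end{bmatrix}$$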

The definitions of the R', G', and B' signals also differ between BT.709 and BT.601, and differ within BT.601 depending on the type of TV system in use (625-line as in PAL and SECAM or 525-line as in NTSC), and differ further in other specifications. In different designs there are differences in the definitions of the R, G, and B chromaticity coordinates, the reference white point, the supported gamut range, the exact gamma pre-compensation functions for deriving R', G' and B' from R, G, and B, and in the scaling and offsets to be applied during conversion from R'G'B' to Y′CbCr. So proper conversion of Y′CbCr from one form to the other is not just a matter of inverting one matrix and applying the other. In fact, when Y′CbCr is designed ideally, the values of KB and KR are derived from the precise specification of the RGB color primary signals, so that the luma (Y′) signal corresponds as closely as possible to a gamma-adjusted measurement of luminance (typically based on the CIE 1931 measurements of the response of the human visual system to color stimuli).[4]

ITU-R BT.2020 conversion

The ITU-R BT.2020 standard defines two forms: a Y′CbCr that uses the BT.709 gamma correction, and a constant-luminance variant called YcCbcCrc that uses the same gamma correction (except for Y′, which is calculated differently).[5]

For both, the coefficients are:

$$K_B = 0.0593, \quad K_R = 0.2627, \quad K_G = 1 - K_B - K_R = 0.6780$$

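The decoding matrix for BT.2020-NCL, given to 14 decimal places, is:

$$\begin{bmatrix} R' \\ G' \\ B' \end{bmatrix} =
\begin{bmatrix}
1 & 0 & 1.4746 \\
1 & -0.16455312684366 & -0.57135312684366 \\
1 & 1.8814 & 0
\end{bmatrix}
\begin{bmatrix} Y' \\ C_B \\ C_R \end{bmatrix}$$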

The smaller values in the matrix are not rounded; they are precise values. For systems with limited precision (8 or 10 bit, for example) a lower precision of the above matrix could be used, for example, retaining only 6 digits after the decimal point.[6]
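
To see why the short entries are exact while the long ones are not, the decoding matrix entries can be derived from the general inverse of the color matrix, taking the BT.2020 constants as KR = 0.2627 and KB = 0.0593 (so KG = 0.6780):

$$\begin{aligned}
2 - 2 K_R &= 2 \cdot (1 - 0.2627) = 1.4746 \\
2 - 2 K_B &= 2 \cdot (1 - 0.0593) = 1.8814 \\
\frac{K_B}{K_G} \cdot (2 - 2 K_B) &= \frac{0.0593 \cdot 1.8814}{0.6780} = \frac{0.11156702}{0.6780} \approx 0.16455312684366 \\
\frac{K_R}{K_G} \cdot (2 - 2 K_R) &= \frac{0.2627 \cdot 1.4746}{0.6780} = \frac{0.38737742}{0.6780} \approx 0.57135312684366
\end{aligned}$$

The red and blue entries are finite decimals, whereas the two green entries involve a division by KG and must be truncated at some precision.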

YcCbcCrc may be used when the top priority is the most accurate retention of luminance information.[5] A color representation has true constant luminance (CL) when the luma channel (Y′ of Y′CbCr encoded with the BT.709 transfer function or PQ, for example) matches the encoded luminance (BT.709- or PQ-encoded luminance Y of XYZ). YcCbcCrc does not, however, provide constant intensity (CI); that is done in ICTCP.[7][8] BT.2020 does not define PQ, and thus HDR; these are defined in SMPTE ST 2084 and BT.2100.

The derivation of BT.2020 coefficients from BT.2020 primaries further changes the space.[9]

SMPTE 240M conversion

The SMPTE 240M standard (used on the MUSE analog HD television system) defines YCC with these coefficients:

$$K_B = 0.087, \quad K_R = 0.212, \quad K_G = 1 - K_B - K_R = 0.701$$

The coefficients are derived from the SMPTE 170M primaries and white point, as used in the 240M standard.

JPEG conversion

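JFIF usage of JPEG supports a modified Rec. 601 Y′CbCr where Y′, CB and CR have the full 8-bit range of [0...255].[10] Below are the conversion equations expressed to six decimal digits of precision. (For ideal equations, see ITU-T T.871.[11]) Note that for the following formulae, the range of each input (R,G,B) is also the full 8-bit range of [0...255].

$$\begin{aligned}
Y' &= 0 + 0.299 \cdot R'_D + 0.587 \cdot G'_D + 0.114 \cdot B'_D \\
C_B &= 128 - 0.168736 \cdot R'_D - 0.331264 \cdot G'_D + 0.5 \cdot B'_D \\
C_R &= 128 + 0.5 \cdot R'_D - 0.418688 \cdot G'_D - 0.081312 \cdot B'_D
\end{aligned}$$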

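And back:

$$\begin{aligned}
R'_D &= Y' + 1.402 \cdot (C_R - 128) \\
G'_D &= Y' - 0.344136 \cdot (C_B - 128) - 0.714136 \cdot (C_R - 128) \\
B'_D &= Y' + 1.772 \cdot (C_B - 128)
\end{aligned}$$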

The above conversion is identical to sYCC when the input is given as sRGB, except that IEC 61966-2-1:1999/Amd1:2003 only gives four decimal digits.
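
A minimal C++ sketch of the full-range JFIF conversion, using the six-decimal Rec. 601 coefficients, is shown below. The type and function names are hypothetical, and rounding to nearest followed by clipping to 0..255 is an assumption of this example rather than something the text above prescribes.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct Rgb8   { std::uint8_t r, g, b; };
struct YCbCr8 { std::uint8_t y, cb, cr; };

// Rounds to nearest and clips to the full 8-bit range.
static std::uint8_t roundToByte(double v) {
    return static_cast<std::uint8_t>(
        std::min(255.0, std::max(0.0, std::round(v))));
}

// Hypothetical helper: full-range R'G'B' -> full-range JFIF Y'CbCr.
YCbCr8 rgbToJfifYcbcr(Rgb8 in) {
    const double r = in.r, g = in.g, b = in.b;
    return {roundToByte(0.299 * r + 0.587 * g + 0.114 * b),
            roundToByte(128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b),
            roundToByte(128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b)};
}

// Hypothetical helper: full-range JFIF Y'CbCr -> full-range R'G'B'.
Rgb8 jfifYcbcrToRgb(YCbCr8 in) {
    const double y = in.y, cb = in.cb - 128.0, cr = in.cr - 128.0;
    return {roundToByte(y + 1.402 * cr),
            roundToByte(y - 0.344136 * cb - 0.714136 * cr),
            roundToByte(y + 1.772 * cb)};
}
```

Because every component is rounded to 8 bits, a round trip is not always lossless: pure red (255, 0, 0) encodes to (76, 85, 255) and decodes to (254, 0, 0) in this sketch.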

JPEG also defines a "YCCK" format from Adobe for CMYK input. In this format, the "K" value is passed as-is, while CMY are used to derive YCbCr with the above matrix by assuming R=1-C, G=1-M, and B=1-Y. As a result, a similar set of subsampling techniques can be used.[12]

Coefficients for BT.470-6 System B, G primaries

These coefficients are not in use and were never in use.[13]

Chromaticity-derived luminance systems

H.273 also describes constant and non-constant luminance systems which are derived strictly from primaries and white point, so that situations like sRGB/BT.709 default primaries of JPEG that use BT.601 matrix (that is derived from BT.470-6 System M) do not happen.

Numerical approximations

Prior to the development of fast SIMD floating-point processors, most digital implementations of RGB → Y′UV used integer math, in particular fixed-point approximations. Approximation here means that the precision of the numbers used (input data, output data and constant values) is limited; a loss of precision, typically about the last binary digit, is accepted in exchange for faster computation.

Y′ values are conventionally shifted and scaled to the range [16, 235] (referred to as studio swing or "TV levels") rather than using the full range of [0, 255] (referred to as full swing or "PC levels"). This practice was standardized in SMPTE-125M in order to accommodate signal overshoots ("ringing") due to filtering.[14] U and V values, which may be positive or negative, are summed with 128 to make them always positive, giving a studio range of 16–240 for U and V. (These ranges are important in video editing and production, since using the wrong range will result either in an image with "clipped" blacks and whites, or a low-contrast image.)

Approximate 8-bit matrices for BT.601

These matrices round all factors to the closest 1/256 unit. As a result, only one 16-bit intermediate value is formed for each component, and a simple right-shift with rounding (x + 128) >> 8 can take care of the division.[14]

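For studio-swing:

$$\begin{bmatrix} Y' \\ C_B \\ C_R \end{bmatrix} = \frac{1}{256}
\begin{bmatrix}
66 & 129 & 25 \\
-38 & -74 & 112 \\
112 & -94 & -18
\end{bmatrix}
\begin{bmatrix} R'_D \\ G'_D \\ B'_D \end{bmatrix} +
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}$$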

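For full-swing:

$$\begin{bmatrix} Y' \\ C_B \\ C_R \end{bmatrix} = \frac{1}{256}
\begin{bmatrix}
77 & 150 & 29 \\
-43 & -84 & 127 \\
127 & -106 & -21
\end{bmatrix}
\begin{bmatrix} R'_D \\ G'_D \\ B'_D \end{bmatrix} +
\begin{bmatrix} 0 \\ 128 \\ 128 \end{bmatrix}$$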

Google's Skia used to use the above 8-bit full-range matrix, resulting in a slight greening effect on JPEG images encoded by Android devices, more noticeable on repeated saving. The issue was fixed in 2016, when the more accurate version was used instead. Due to optimizations in libjpeg-turbo, the accurate version was actually faster.[15]
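
The fixed-point scheme described in this section can be sketched for the studio-swing case as follows. The integer coefficients are the BT.601 studio-range factors (for example 219 × 0.299 × 256/255 ≈ 65.738) rounded to the nearest 1/256; the function name is hypothetical, and the code assumes that right-shifting a negative `int` is an arithmetic shift, which holds on mainstream compilers and is guaranteed from C++20.

```cpp
#include <cstdint>

struct YCbCr8 { std::uint8_t y, cb, cr; };

// Hypothetical helper: full-range 8-bit R'G'B' -> studio-swing 8-bit Y'CbCr
// using only integer multiplies, adds and shifts.
YCbCr8 rgbToYcbcrStudioFixed(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    // "+ 128" adds one half (in units of 1/256) so that ">> 8" rounds to nearest.
    const int y  = (( 66 * r + 129 * g +  25 * b + 128) >> 8) + 16;
    const int cb = ((-38 * r -  74 * g + 112 * b + 128) >> 8) + 128;
    const int cr = ((112 * r -  94 * g -  18 * b + 128) >> 8) + 128;
    return {static_cast<std::uint8_t>(y),
            static_cast<std::uint8_t>(cb),
            static_cast<std::uint8_t>(cr)};
}
```

For pure red this yields (82, 90, 240), whereas the unrounded equations give Y′ ≈ 81.48; the one-code difference in luma illustrates the precision loss accepted by this approach.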

Packed pixel formats and conversion

RGB files are typically encoded in 8, 12, 16 or 24 bits per pixel. In these examples, we will assume 24 bits per pixel, which is written as RGB888. The standard byte format is:

r0, g0, b0, r1, g1, b1, ...

YCbCr packed pixel formats are often referred to as "YUV". Such files can be encoded in 12, 16 or 24 bits per pixel. The common formats are YUV444, YUV411, YUV422 and YUV420p (or YUV420). The prime after the Y is often omitted, as is the "p" after YUV420p. In terms of actual file formats, YUV420 is the most common, as the data is more reduced, and the file extension is usually ".YUV".

The relation between data rate and sampling (A:B:C) is defined by the ratio of the Y channel to the U and V channels.[16][17]

To convert from RGB to YUV or back, it is simplest to use RGB888 and YUV444. For YUV411, YUV422 and YUV420, the bytes need to be converted to YUV444 first.

YUV444    3 bytes per pixel     (12 bytes per 4 pixels)
YUV422    4 bytes per 2 pixels   (8 bytes per 4 pixels)
YUV411    6 bytes per 4 pixels
YUV420p   6 bytes per 4 pixels, reordered

Y′UV444 to RGB888 conversion

See § RGB conversion.

Y′UV422 to RGB888 conversion

Input: Read 4 bytes of Y′UV (u, y1, v, y2)
Output: Writes 6 bytes of RGB (R, G, B, R, G, B)
u  = yuv[0];
y1 = yuv[1];
v  = yuv[2];
y2 = yuv[3];

With these values, each luma sample can be paired with the shared chroma samples and converted as regular Y′UV444 to obtain two RGB pixels:

rgb1 = Y′UV444toRGB888(y1, u, v);
rgb2 = Y′UV444toRGB888(y2, u, v);

Y′UV422 can also be expressed with the values in an alternative order, e.g. for the FourCC format code YUY2.

Input: Read 4 bytes of Y′UV (y1, u, y2, v), (y1, y2, u, v) or (u, v, y1, y2)
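
The byte orders discussed in this section can be sketched in C++ as follows. The sketch only expands one 4-byte group into two Y′UV444 triples and leaves the color conversion itself to a separate routine; the struct and function names are hypothetical.

```cpp
#include <array>
#include <cstdint>

struct YuvSample { std::uint8_t y, u, v; };

// Hypothetical helper for the (u, y1, v, y2) byte order, known as UYVY.
// Both output pixels share the same u and v.
std::array<YuvSample, 2> unpackUyvy(const std::uint8_t* p) {
    return {{ YuvSample{p[1], p[0], p[2]}, YuvSample{p[3], p[0], p[2]} }};
}

// Hypothetical helper for the (y1, u, y2, v) byte order, known as YUY2.
std::array<YuvSample, 2> unpackYuy2(const std::uint8_t* p) {
    return {{ YuvSample{p[0], p[1], p[3]}, YuvSample{p[2], p[1], p[3]} }};
}
```

Each returned triple can then be passed to whichever Y′UV444-to-RGB888 routine matches the color matrix and range of the source.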

Y′UV411 to RGB888 conversion

Input: Read 6 bytes of Y′UV
Output: Writes 12 bytes of RGB
// Extract YUV components
u  = yuv[0];
y1 = yuv[1];
y2 = yuv[2];
v  = yuv[3];
y3 = yuv[4];
y4 = yuv[5];
rgb1 = Y′UV444toRGB888(y1, u, v);
rgb2 = Y′UV444toRGB888(y2, u, v);
rgb3 = Y′UV444toRGB888(y3, u, v);
rgb4 = Y′UV444toRGB888(y4, u, v);

The result is four RGB pixel values (4 × 3 = 12 bytes) from 6 bytes of input. This halves the size of the transferred data, at the cost of some loss of quality.

Y′UV420p (and Y′V12 or YV12) to RGB888 conversion

Y′UV420p is a planar format, meaning that the Y′, U, and V values are grouped together instead of interspersed. The reason for this is that by grouping the U and V values together, the image becomes much more compressible. When given an array of an image in the Y′UV420p format, all the Y′ values come first, followed by all the U values, followed finally by all the V values.

The Y′V12 format is essentially the same as Y′UV420p, but it has the U and V data switched: the Y′ values are followed by the V values, with the U values last. As long as care is taken to extract U and V values from the proper locations, both Y′UV420p and Y′V12 can be processed using the same algorithm.

As with most Y′UV formats, there are as many Y′ values as there are pixels. Where X equals the height multiplied by the width, the first X indices in the array are Y′ values that correspond to each individual pixel. However, there are only one fourth as many U and V values. The U and V values correspond to each 2 by 2 block of the image, meaning each U and V entry applies to four pixels. After the Y′ values, the next X/4 indices are the U values for each 2 by 2 block, and the next X/4 indices after that are the V values that also apply to each 2 by 2 block.

As shown in the above image, the Y′, U and V components in Y′UV420 are encoded separately in sequential blocks. A Y′ value is stored for every pixel, followed by a U value for each 2×2 square block of pixels, and finally a V value for each 2×2 block. Corresponding Y′, U and V values are shown using the same color in the diagram above. Read line-by-line as a byte stream from a device, the Y′ block would be found at position 0, the U block at position x×y (6×4 = 24 in this example) and the V block at position x×y + (x×y)/4 (here, 6×4 + (6×4)/4 = 30).
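
The plane offsets described above can be sketched in C++ as an index calculation. This is an illustration under stated assumptions: the names are hypothetical, the width and height are assumed to be even, and the planes are assumed to be tightly packed with no row padding.

```cpp
#include <cstddef>
#include <cstdint>

struct YuvSample { std::uint8_t y, u, v; };

// Hypothetical helper: fetches the Y', U and V values that apply to pixel
// (x, y) of a planar 4:2:0 frame. Set vPlaneFirst to true for the Y'V12
// ordering, where the V plane precedes the U plane.
YuvSample samplePlanar420(const std::uint8_t* frame, int width, int height,
                          int x, int y, bool vPlaneFirst = false) {
    const std::size_t lumaSize   = static_cast<std::size_t>(width) * height;
    const std::size_t chromaSize = lumaSize / 4;  // one sample per 2x2 block
    const std::size_t chromaIdx  =
        static_cast<std::size_t>(y / 2) * (width / 2) + x / 2;
    const std::uint8_t* first  = frame + lumaSize;    // first chroma plane
    const std::uint8_t* second = first + chromaSize;  // second chroma plane
    const std::uint8_t* uPlane = vPlaneFirst ? second : first;
    const std::uint8_t* vPlane = vPlaneFirst ? first : second;
    return {frame[static_cast<std::size_t>(y) * width + x],
            uPlane[chromaIdx], vPlane[chromaIdx]};
}
```

For the 6×4 example, the U plane starts at byte 24 and the V plane at byte 30, so pixel (0, 0) reads bytes 0, 24 and 30.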

Y′UV420sp (NV21) to RGB conversion (Android)

This format (NV21) is the standard picture format for the Android camera preview. It is a YUV 4:2:0 planar image with 8-bit Y samples, followed by an interleaved V/U plane with 8-bit 2×2 subsampled chroma samples.[18]

C++ code used on Android to convert pixels of YUVImage:[19]

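void YUVImage::yuv2rgb(uint8_t yValue, uint8_t uValue, uint8_t vValue,
        uint8_t *r, uint8_t *g, uint8_t *b) const {
    // Chroma is stored with an offset of 128; remove it before scaling.
    // Floating-point results are truncated toward zero when stored in int.
    int rTmp = yValue + (1.370705 * (vValue - 128));
    // or fast integer computing with a small approximation; the shift is
    // parenthesized because >> binds less tightly than + in C++
    // rTmp = yValue + ((351 * (vValue - 128)) >> 8);
    int gTmp = yValue - (0.698001 * (vValue - 128)) - (0.337633 * (uValue - 128));
    // gTmp = yValue - ((179 * (vValue - 128) + 86 * (uValue - 128)) >> 8);
    int bTmp = yValue + (1.732446 * (uValue - 128));
    // bTmp = yValue + ((443 * (uValue - 128)) >> 8);
    // Clip to the 8-bit range; clamp() is assumed to be defined elsewhere.
    *r = clamp(rTmp, 0, 255);
    *g = clamp(gTmp, 0, 255);
    *b = clamp(bTmp, 0, 255);
}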
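
For a whole frame, the NV21 layout and the per-pixel formulas above could be combined roughly as follows. This is a sketch and not the Android implementation: the function names are hypothetical, the width and height are assumed to be even with no row padding, and the output is assumed to be tightly packed RGB888.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

static std::uint8_t clampToByte(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Hypothetical helper: converts an NV21 frame (Y plane, then interleaved
// V/U pairs, one pair per 2x2 block) to packed RGB888.
std::vector<std::uint8_t> nv21ToRgb888(const std::uint8_t* nv21,
                                       int width, int height) {
    const std::size_t lumaSize = static_cast<std::size_t>(width) * height;
    const std::uint8_t* vu = nv21 + lumaSize;
    std::vector<std::uint8_t> rgb(lumaSize * 3);
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            const std::size_t pixel = static_cast<std::size_t>(row) * width + col;
            // Each chroma row holds width/2 (V, U) pairs, i.e. width bytes.
            const std::size_t pair =
                static_cast<std::size_t>(row / 2) * width + (col / 2) * 2;
            const int y = nv21[pixel];
            const int v = vu[pair] - 128;      // V comes first in NV21
            const int u = vu[pair + 1] - 128;
            rgb[3 * pixel + 0] = clampToByte(static_cast<int>(y + 1.370705 * v));
            rgb[3 * pixel + 1] =
                clampToByte(static_cast<int>(y - 0.698001 * v - 0.337633 * u));
            rgb[3 * pixel + 2] = clampToByte(static_cast<int>(y + 1.732446 * u));
        }
    }
    return rgb;
}
```

A frame whose chroma bytes are all 128 converts to gray pixels with R = G = B = Y, which is a quick sanity check for the plane offsets.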

References

  1. ^ "Full Non-Branded Delivery Specification v9.2". Netflix | Partner Help Center. Retrieved 2022-09-24.
  2. ^ e.g. the MPEG-2 specification, ITU-T H.262 2000 E pg. 44
  3. ^ "MFNominalRange (mfobjects.h) - Win32 apps". docs.microsoft.com. Retrieved 10 November 2020.
  4. ^ Charles Poynton, Digital Video and HDTV, Chapter 24, pp. 291–292, Morgan Kaufmann, 2003.
  5. ^ a b "BT.2020 : Parameter values for ultra-high definition television systems for production and international programme exchange". International Telecommunication Union. June 2014. Retrieved 2014-09-08.
  6. ^ "ITU-T H Suppl. 18". October 2017. hdl:11.1002/1000/13441.
  7. ^ "High dynamic range television for production and international programme exchange". www.itu.int. Retrieved 2021-01-16.
  8. ^ "What Is ICtCp – Introduction?" (PDF).
  9. ^ "H.273: Coding-independent code points for video signal type identification". www.itu.int. p. 10. Retrieved 2021-04-09.
  10. ^ JPEG File Interchange Format Version 1.02
  11. ^ T.871: Information technology – Digital compression and coding of continuous-tone still images: JPEG File Interchange Format (JFIF). ITU-T. September 11, 2012. Retrieved 2016-07-25.
  12. ^ See libjpeg-turbo documentation for: CS_YCCK 'YCCK (AKA "YCbCrK") is not an absolute colorspace but rather a mathematical transformation of CMYK designed solely for storage and transmission', cmyk_ycck_convert(); see
  13. ^ "EBU Tech 3237 Supplement 1" (PDF). p. 18. Retrieved 15 April 2021.
  14. ^ a b Jack, Keith (1993). Video Demystified. HighText Publications. p. 30. ISBN 1-878707-09-4.
  15. ^ "Use libjpeg-turbo for YUV->RGB conversion in jpeg encoder · google/skia@c7d01d3". GitHub.
  16. ^ msdn.microsoft.com, Recommended 8-Bit YUV Formats for Video Rendering
  17. ^ msdn.microsoft.com, YUV Video Subtypes
  18. ^ fourcc.com YUV pixel formats
  19. ^ "Media/Libstagefright/Yuv/YUVImage.CPP - platform/Frameworks/Av - Git at Google".
