Name | Moving Picture Experts Group Phase 1 (MPEG-1) |
---|---|
Extension | .mpg, .mpeg, .mp1, .mp2, .mp3, .m1v, .m1a, .m2a, .mpa, .mpv |
Mime | audio/mpeg, video/mpeg |
Owner | ISO, IEC |
Genre | audio, video, container |
Created | 1988-1992 |
Extended from | JPEG, H.261 |
Extended to | MPEG-2 |
Standard | ISO/IEC 11172 |
MPEG-1 is a standard for lossy compression of video and audio. It is designed to compress VHS-quality raw digital video and CD audio down to 1.5 Mbit/s (26:1 and 6:1 compression ratios respectively).
Today, MPEG-1 has become the most widely compatible lossy audio/video format in the world, and is used in a large number of products and technologies. Perhaps the best-known part of the MPEG-1 standard is the MP3 audio format it introduced.
The MPEG-1 standard is published as ISO/IEC 11172 - Information technology—Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s. The standard consists of the following five Parts:
# Systems (storage and synchronization of video, audio, and other data together)
# Video (compressed video content)
# Audio (compressed audio content)
# Conformance testing (testing the correctness of implementations of the standard)
# Reference software (example software showing how to encode and decode according to the standard)
__TOC__
Development of the MPEG-1 standard began in May 1988. 14 video and 14 audio codec proposals were submitted by individual companies and institutions for evaluation. The codecs were extensively tested for computational complexity and subjective (human perceived) quality, at data rates of 1.5 Mbit/s. This specific bitrate was chosen for transmission over T-1/E-1 lines and as the approximate data rate of audio CDs. The codecs that excelled in this testing were utilized as the basis for the standard and refined further, with additional features and other improvements being incorporated in the process.
After 20 meetings of the full group in various cities around the world, and 4½ years of development and testing, the final standard (for parts 1–3) was approved in early November 1992 and published a few months later. The reported completion date of the MPEG-1 standard varies greatly: a largely complete draft standard was produced in September 1990, and from that point on, only minor changes were introduced. The standard was finished with the 6 November 1992 meeting. The Berkeley Plateau Multimedia Research Group developed an MPEG-1 decoder in November 1992. In July 1990, before the first draft of the MPEG-1 standard had even been written, work began on a second standard, MPEG-2, intended to extend MPEG-1 technology to provide full broadcast-quality video (as per CCIR 601) at high bitrates (3–15 Mbit/s), and support for interlaced video. Due in part to the similarity between the two codecs, the MPEG-2 standard includes full backwards compatibility with MPEG-1 video, so any MPEG-2 decoder can play MPEG-1 videos.
Notably, the MPEG-1 standard very strictly defines the bitstream and decoder function, but does not define how MPEG-1 encoding is to be performed (although a reference implementation is provided in ISO/IEC 11172-5).
{| class="wikitable sortable" width="100%" |+MPEG-1 Parts | Systems | |- | Part 2 | ISO/IEC 11172-2 | 1993 | 2006 | Video | |- | Part 3 | ISO/IEC 11172-3 | 1993 | 1996 | Audio | |- | Part 4 | ISO/IEC 11172-4 | 1995 | 2007 | Compliance testing | |- | Part 5 | ISO/IEC TR 11172-5 | 1998 | 2007 | Software simulation | |}
MPEG-1 Systems specifies the logical layout and methods used to store the encoded audio, video, and other data into a standard bitstream, and to maintain synchronization between the different contents. This file format is specifically designed for storage on media and transmission over data channels that are considered relatively reliable. Only limited error protection is defined by the standard, and small errors in the bitstream may cause noticeable defects.
This structure was later named an MPEG program stream: "The MPEG-1 Systems design is essentially identical to the MPEG-2 Program Stream structure." This terminology is more popular, precise (differentiates it from an MPEG transport stream) and will be used here.
Packetized Elementary Streams (PES) are elementary streams packetized into packets of variable length; that is, the ES is divided into independent chunks, and a cyclic redundancy check (CRC) checksum is added to each packet for error detection.
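As an illustration of the packetization idea only (not the exact header layout from the standard; real PES headers carry additional flags and the optional CRC), a minimal sketch:

<syntaxhighlight lang="python">
# Illustrative PES-style packetizer: split an elementary stream into
# variable-size chunks, each prefixed with the 0x000001 start-code
# prefix, a stream id (0xC0 = first MPEG audio stream), and a 16-bit
# payload length. Real PES headers carry more fields; CRC is omitted.

def packetize(es: bytes, stream_id: int = 0xC0, chunk: int = 2048):
    for i in range(0, len(es), chunk):
        payload = es[i:i + chunk]
        header = b"\x00\x00\x01" + bytes([stream_id])
        yield header + len(payload).to_bytes(2, "big") + payload

packets = list(packetize(b"\x00" * 5000))
print(len(packets), len(packets[0]))  # 3 packets; 2048 payload + 6 header bytes
</syntaxhighlight>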
System Clock Reference (SCR) is a timing value stored in a 33-bit header of each PES, at a frequency/precision of 90 kHz, with an extra 9-bit extension that stores additional timing data with a precision of 27 MHz. These are inserted by the encoder, derived from the system time clock (STC). Simultaneously encoded audio and video streams will not have identical SCR values, however, due to buffering, encoding, jitter, and other delays.
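The arithmetic relating the two fields follows from 27 MHz / 300 = 90 kHz; a minimal sketch (the function name is illustrative, not from the standard):

<syntaxhighlight lang="python">
# Split a 27 MHz STC tick count into the 33-bit, 90 kHz SCR base and the
# 9-bit extension counting the remaining 27 MHz ticks.

def split_scr(stc_ticks: int) -> tuple[int, int]:
    base = (stc_ticks // 300) & ((1 << 33) - 1)  # 90 kHz units, wraps at 33 bits
    ext = stc_ticks % 300                        # 0-299, fits in 9 bits
    return base, ext

base, ext = split_scr(2 * 27_000_000)  # STC after exactly two seconds
print(base, ext)                       # 180000 0  (90 kHz * 2 s)
</syntaxhighlight>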
Presentation time stamps (PTS) exist in PS to correct the inevitable disparity between audio and video SCR values (time-base correction). 90 kHz PTS values in the PS header tell the decoder which video SCR values match which audio SCR values. Either video or audio will be delayed by the decoder until the corresponding segment of the other arrives and can be decoded.
PTS handling can be problematic. Decoders must accept multiple program streams that have been concatenated (joined sequentially). This causes PTS values in the middle of the video to reset to zero, which then begin incrementing again. Such PTS wraparound disparities can cause timing issues that must be specially handled by the decoder.
Decoding Time Stamps (DTS), additionally, are required because of B-frames. With B-frames in the video stream, adjacent frames have to be encoded and decoded out-of-order (re-ordered frames). DTS is quite similar to PTS, but instead of just handling sequential frames, it contains the proper time-stamps to tell the decoder when to decode and display the next B-frame (types of frames explained below), ahead of its anchor (P- or I-) frame. Without B-frames in the video, PTS and DTS values are identical.
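A small sketch of how these timestamps interact for one reordered GOP fragment; the frame rate, the 90 kHz units, and the one-frame reordering delay are illustrative assumptions:

<syntaxhighlight lang="python">
# Display order I0 B1 B2 P3; decode order I0 P3 B1 B2. At 25 fps a frame
# lasts 3600 ticks of the 90 kHz clock. A one-frame delay keeps PTS >= DTS.

TICK = 90_000 // 25  # 3600 ticks per frame

decode_order  = ["I0", "P3", "B1", "B2"]
display_order = ["I0", "B1", "B2", "P3"]

dts = {f: i * TICK for i, f in enumerate(decode_order)}
pts = {f: (display_order.index(f) + 1) * TICK for f in decode_order}

for f in decode_order:
    print(f"{f}: DTS={dts[f]:6d} PTS={pts[f]:6d}")
# The anchor P3 is decoded early (DTS 3600) but shown last (PTS 14400),
# while each B-frame is displayed as soon as it is decoded (PTS == DTS).
</syntaxhighlight>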
Determining how much data from each stream should be in each interleaved segment (the size of the interleave) is complicated, yet an important requirement. Improper interleaving will result in buffer underflows or overflows, as the receiver gets more of one stream than it can store (e.g. audio), before it gets enough data to decode the other simultaneous stream (e.g. video). The MPEG Video Buffering Verifier (VBV) assists in determining if a multiplexed PS can be decoded by a device with a specified data throughput rate and buffer size. This offers feedback to the muxer and the encoder, so that they can change the mux size or adjust bitrates as needed for compliance.
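A toy model of this buffer check, assuming a constant channel rate and one picture removed per frame interval (names and numbers are illustrative, not taken from ISO/IEC 11172):

<syntaxhighlight lang="python">
# Simulate decoder buffer fullness: the channel adds bits each frame
# interval, and each coded picture is removed instantaneously. If a
# picture is larger than the current fullness, the buffer underflows.

def vbv_ok(frame_bits, rate_bps, fps, buf_bits):
    fullness = buf_bits  # assume playback starts with a full buffer
    fill = rate_bps / fps
    for bits in frame_bits:
        if bits > fullness:
            return False  # underflow: the decoder would stall
        fullness = min(fullness - bits + fill, buf_bits)  # refill, capped
    return True

# A large I-frame followed by cheaper predicted frames at 1.15 Mbit/s, 25 fps:
print(vbv_ok([300_000, 20_000, 20_000, 40_000], 1_150_000, 25, 327_680))  # True
</syntaxhighlight>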
MPEG-1 Video exploits perceptual compression methods to significantly reduce the data rate required by a video stream. It reduces or completely discards information in certain frequencies and areas of the picture that the human eye has limited ability to fully perceive. It also exploits temporal (over time) and spatial (across a picture) redundancy common in video to achieve better data compression than would be possible otherwise. (See: Video compression)
Before encoding video to MPEG-1, the color-space is transformed to Y'CbCr (Y'=Luma, Cb=Chroma Blue, Cr=Chroma Red). Luma (brightness, resolution) is stored separately from chroma (color, hue, phase) and even further separated into red and blue components. The chroma is also subsampled to 4:2:0, meaning it is reduced by one half vertically and one half horizontally, to just one quarter the resolution of the video.
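A minimal sketch of this step, using full-range Rec. 601 luma weights and simple 2×2 averaging for the chroma subsampling; a real encoder's value ranges and filters differ:

<syntaxhighlight lang="python">
import numpy as np

def rgb_to_ycbcr_420(rgb: np.ndarray):
    # Rec. 601 luma weights; chroma stored as scaled colour differences.
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y)
    cr = 0.713 * (r - y)
    # 4:2:0 subsampling: average each 2x2 chroma block -> quarter resolution.
    h, w = cb.shape
    cb420 = cb.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    cr420 = cr.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return y, cb420, cr420

y, cb, cr = rgb_to_ycbcr_420(np.random.rand(288, 352, 3))
print(y.shape, cb.shape, cr.shape)  # (288, 352) (144, 176) (144, 176)
</syntaxhighlight>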
The length between I-frames is known as the group of pictures (GOP) size. MPEG-1 most commonly uses a GOP size of 15–18, i.e. 1 I-frame for every 14–17 non-I frames (some combination of P- and B-frames). With more intelligent encoders, GOP size is dynamically chosen, up to some pre-selected maximum limit.
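A tiny illustrative helper that lays out such a fixed pattern (the 15-frame GOP with two B-frames between anchor frames is an assumed example, not a requirement):

<syntaxhighlight lang="python">
# Produce a fixed GOP layout: an I-frame, then P-frames separated by
# runs of B-frames, e.g. the familiar "IBBPBBPBBPBBPBB".

def gop_pattern(gop_size: int = 15, b_frames: int = 2) -> str:
    frames = []
    for i in range(gop_size):
        if i == 0:
            frames.append("I")
        elif i % (b_frames + 1) == 0:
            frames.append("P")
        else:
            frames.append("B")
    return "".join(frames)

print(gop_pattern())  # IBBPBBPBBPBBPBB -> 1 I-frame per 14 non-I frames
</syntaxhighlight>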
Partial macroblocks, and black borders/bars encoded into the video that do not fall exactly on a macroblock boundary, cause havoc with motion prediction. The block padding/border information prevents the macroblock from closely matching with any other area of the video, and so, significantly larger prediction error information must be encoded for every one of the several dozen partial macroblocks along the screen border. DCT encoding and quantization (see below) also isn't nearly as effective when there is large/sharp picture contrast in a block.
An even more serious problem exists with macroblocks that contain significant, random, edge noise, where the picture transitions to (typically) black. All the above problems also apply to edge noise. In addition, the added randomness is simply impossible to compress significantly. All of these effects will lower the quality (or increase the bitrate) of the video substantially.
The FDCT process converts the 8x8 block of uncompressed pixel values (brightness or color difference values) into an 8x8 indexed array of frequency coefficient values. One of these is the (statistically high in variance) DC coefficient, which represents the average value of the entire 8x8 block. The other 63 coefficients are the statistically smaller AC coefficients, which are positive or negative values each representing sinusoidal deviations from the flat block value represented by the DC coefficient.
An example of an encoded 8x8 FDCT block is shown in the sketch below.
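This is a minimal sketch, assuming scipy is available; the input block is arbitrary illustrative data rather than real video samples:

<syntaxhighlight lang="python">
import numpy as np
from scipy.fft import dctn

# A flat block with a hard vertical edge: mostly low-frequency energy.
block = np.full((8, 8), 12.0)
block[:, 4:] += 30.0

coeffs = dctn(block, norm="ortho")   # 2-D DCT-II of the 8x8 block
print(round(coeffs[0, 0], 1))        # DC coefficient: 8x the block mean = 216.0
print(np.count_nonzero(np.abs(coeffs) > 1e-9))  # only a handful of AC terms survive
</syntaxhighlight>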
Since the DC coefficient value is statistically correlated from one block to the next, it is compressed using DPCM encoding. Only the (smaller) amount of difference between each DC value and the value of the DC coefficient in the block to its left needs to be represented in the final bitstream.
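A one-line sketch of the idea, with made-up DC values:

<syntaxhighlight lang="python">
# DPCM: keep the first DC value, then code each block's DC as the
# difference from the block to its left -- mostly small numbers.
dc = [216, 220, 219, 219, 230]
diffs = [dc[0]] + [b - a for a, b in zip(dc, dc[1:])]
print(diffs)  # [216, 4, -1, 0, 11]
</syntaxhighlight>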
Additionally, the frequency conversion performed by applying the DCT provides a statistical decorrelation function to efficiently concentrate the signal into fewer high-amplitude values prior to applying quantization (see below).
The frame-level quantizer is a number from 0 to 31 (although encoders will usually omit/disable some of the extreme values) which determines how much information will be removed from a given frame. The frame-level quantizer is either dynamically selected by the encoder to maintain a certain user-specified bitrate, or (much less commonly) directly specified by the user.
Contrary to popular belief, a fixed frame-level quantizer (set by the user) does not deliver a constant level of quality. Instead, it is an arbitrary metric that will provide a somewhat varying level of quality, depending on the contents of each frame. Given two files of identical sizes, the one encoded at an average bitrate should look better than the one encoded with a fixed quantizer (variable bitrate). Constant quantizer encoding can be used, however, to accurately determine the minimum and maximum bitrates possible for encoding a given video.
A quantization matrix is a set of 64 numbers (values 0–255) which tells the encoder how relatively important or unimportant each piece of visual information is. Each number in the matrix corresponds to a certain frequency component of the video image.
An example quantization matrix (the default intra matrix defined by MPEG-1):
  8 16 19 22 26 27 29 34
 16 16 22 24 27 29 34 37
 19 22 26 27 29 34 34 38
 22 22 26 27 29 34 37 40
 22 26 27 29 32 35 40 48
 26 27 29 32 35 40 48 58
 26 27 29 34 38 46 56 69
 27 29 35 38 46 56 69 83
Quantization is performed by taking each of the 64 frequency values of the DCT block, dividing them by the frame-level quantizer, then dividing them by their corresponding values in the quantization matrix. Finally, the result is rounded down. This significantly reduces, or completely eliminates, the information in some frequency components of the picture. Typically, high frequency information is less visually important, and so high frequencies are much more strongly quantized (drastically reduced). MPEG-1 actually uses two separate quantization matrices, one for intra-blocks (I-blocks) and one for inter-blocks (P- and B-blocks), so quantization of different block types can be done independently, and so, more effectively.
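A simplified sketch of that division-and-truncation step (real MPEG-1 quantization adds per-block-type details such as special DC handling, which are omitted here):

<syntaxhighlight lang="python">
import numpy as np

def quantize(coeffs: np.ndarray, qmatrix: np.ndarray, quantizer: int) -> np.ndarray:
    # Divide by the frame-level quantizer and the matrix entry, then
    # drop the fractional part (truncating toward zero in this sketch).
    return np.fix(coeffs / (quantizer * qmatrix)).astype(int)

coeffs = np.array([[216.0, -31.0], [12.0, 2.0]])  # top-left corner of a DCT block
qmat   = np.array([[8.0, 16.0], [16.0, 16.0]])    # matching matrix entries
print(quantize(coeffs, qmat, 2))                  # [[13  0] [ 0  0]]
</syntaxhighlight>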
MPEG-1 Audio utilizes psychoacoustics to significantly reduce the data rate required by an audio stream. It reduces or completely discards certain parts of the audio that the human ear can't hear, either because they are in frequencies where the ear has limited sensitivity, or because they are masked by other (typically louder) sounds. Much of this technology came from MUSICAM, an audio codec developed as part of the EUREKA 147 pan-European inter-governmental research and development initiative for the development of digital audio broadcasting.
Most key features of MPEG-1 Audio were directly inherited from MUSICAM, including the filter bank, time-domain processing, audio frame sizes, etc. However, improvements were made, and the actual MUSICAM algorithm was not used in the final MPEG-1 Layer II audio standard. The widespread usage of the term MUSICAM to refer to Layer II is entirely incorrect and discouraged for both technical and legal reasons.
Layer II can also optionally use intensity stereo coding, a form of joint stereo. This means that the frequencies above 6 kHz of both channels are combined/down-mixed into one single (mono) channel, but the "side channel" information on the relative intensity (volume, amplitude) of each channel is preserved and encoded into the bitstream separately. On playback, the single channel is played through left and right speakers, with the intensity information applied to each channel to give the illusion of stereo sound. That (approximately) 1:6 compression ratio for CD audio is particularly impressive because it is quite close to the estimated upper limit of perceptual entropy, at just over 1:8. Achieving much higher compression is simply not possible without discarding some perceptible information.
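The arithmetic behind that figure is straightforward, assuming standard CD parameters (44.1 kHz, 16-bit, stereo):

<syntaxhighlight lang="python">
cd_rate = 44_100 * 16 * 2      # samples/s * bits * channels = 1,411,200 bit/s
print(cd_rate / 6)             # ~235 kbit/s, within Layer II's typical range
</syntaxhighlight>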
MP2 remains a favoured lossy audio coding standard due to its particularly high audio coding performance on important audio material such as castanets, symphonic orchestra, male and female voices, and particularly complex and high-energy transients (impulses) like percussive sounds: triangle, glockenspiel, and audience applause. This is one reason that MP2 audio continues to be used extensively. The MPEG-2 AAC Stereo verification tests reached a vastly different conclusion, however, showing AAC to provide superior performance to MP2 at half the bitrate. The reason for this disparity with both earlier and later tests is not clear, but strangely, a sample of applause is notably absent from this test.
Layer II audio files typically use the extension .mp2 or sometimes .m2a.
MP3 works on 1152 samples like Layer II, but needs to take multiple frames for analysis before frequency-domain (MDCT) processing and quantization can be effective. It outputs a variable number of samples, using a bit buffer to enable this variable bitrate (VBR) encoding while maintaining 1152-sample output frames. This causes a significantly longer delay before output, which has caused MP3 to be considered unsuitable for studio applications where editing or other processing needs to take place. MP3 uses pre-echo detection routines and VBR encoding, which allow it to temporarily increase the bitrate during difficult passages, in an attempt to reduce this effect. It is also able to switch between the normal 36-sample quantization window and 3 short 12-sample windows, to reduce the temporal (time) length of quantization artifacts.
Unlike Layers I and II, MP3 uses variable-length Huffman coding (after the perceptual coding stage) to further reduce the bitrate, without any further quality loss.

MPEG-2 Audio is defined in ISO/IEC 13818-3.
MPEG Multichannel - Backward compatible 5.1-channel surround sound.
Conformance: Procedures for testing conformance.
Provides two sets of guidelines and reference bitstreams for testing the conformance of MPEG-1 audio and video decoders, as well as the bitstreams produced by an encoder.
Simulation: Reference software.
C reference code for encoding and decoding of audio and video, as well as multiplexing and demultiplexing.
This includes the ISO Dist10 audio encoder code, which LAME and TooLAME were originally based upon.
.mp3 is the most common extension for files containing MPEG-1 Layer 3 audio. An MP3 file is typically an uncontained stream of raw audio; the conventional way to tag MP3 files is by writing data to "garbage" segments of each frame, which preserves the media information but is discarded by the player. This is similar in many respects to how raw .AAC files are tagged (though this is less supported nowadays, e.g. by iTunes).
Note that although it would apply, the .mpg extension is not normally used for raw AAC or for AAC in MPEG-2 Part 7 containers; the .aac extension normally denotes these audio files.
This text is licensed under the Creative Commons CC-BY-SA License. This text was originally published on Wikipedia and was developed by the Wikipedia community.