18-493 Electroacoustics
Coding of Sound for Multimedia Applications
(making liberal use of material compiled by Prof. Tsuhan Chen)
Richard Stern, rms@cs.cmu.edu
MPEG Audio
Outline
Basics
Elements of psychoacoustics
Digitization of signals
Subband coding
MPEG-1 audio
Layers I, II, and III
Frame structure and packetization
MPEG-2 audio
Multichannel audio
Compatibility issues
18-796/Spring 1999/Chen
Psychoacoustics
Threshold in quiet
26 critical bands, 0~24 kHz
Frequency masking in the same critical band
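The threshold in quiet and the critical-band scale can be made concrete with two closed-form approximations that are widely quoted in the psychoacoustics literature (attributed to Terhardt and to Zwicker and Terhardt). Neither formula appears on the slides, so treat the sketch below as an illustrative assumption rather than the model used in the course or in the MPEG standard; the function names are made up for this example.

```python
import math


def threshold_in_quiet_db(f_hz):
    """Approximate threshold in quiet in dB SPL (Terhardt-style fit).

    Assumption: this closed form is a common textbook approximation,
    not a formula taken from the slides.
    """
    f_khz = f_hz / 1000.0
    return (3.64 * f_khz ** -0.8
            - 6.5 * math.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)


def hz_to_bark(f_hz):
    """Approximate critical-band rate in Bark (Zwicker/Terhardt-style fit)."""
    return (13.0 * math.atan(0.00076 * f_hz)
            + 3.5 * math.atan((f_hz / 7500.0) ** 2))


if __name__ == "__main__":
    for f in (100, 1000, 3300, 10000):
        print(f, "Hz:",
              round(threshold_in_quiet_db(f), 1), "dB SPL,",
              round(hz_to_bark(f), 1), "Bark")
```

The ear is most sensitive around 3 to 4 kHz, which is where this approximation reaches its minimum, and the Bark value grows roughly linearly at low frequencies and roughly logarithmically at high ones.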
Frequency Masking
SMR (Signal-to-Mask Ratio)
Masking by bands of 1000, 250, and 10 Hz
Temporal Masking
Pre-masking: 1/10 of post-masking
Post-masking: 50~200 ms
Backward and forward masking (gaps of 100, 20, 0 ms)
The Sampling Theorem
Nyquist theorem: If a signal is sampled at a rate more than twice the highest frequency it contains, the original waveform can be recovered by lowpass filtering. At lower sampling rates, aliasing occurs, producing distortion from which the original signal cannot be recovered.
[Figure: sound wave, sampling pulse train, lowpass filter, recovered sound wave]
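To see why aliasing cannot be undone, it helps to compute where an undersampled tone lands. The sketch below is a minimal illustration, assuming an ideal sampler and a pure cosine; the function names and the 7 kHz / 10 kHz figures are chosen for the example and are not from the slides.

```python
import math


def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a real sinusoid at f_hz after sampling at fs_hz."""
    folded = f_hz % fs_hz
    # Frequencies above fs/2 fold back into the range 0..fs/2.
    return min(folded, fs_hz - folded)


def sample_cosine(f_hz, fs_hz, count):
    """Return `count` samples of cos(2*pi*f*t) taken at rate fs_hz."""
    return [math.cos(2.0 * math.pi * f_hz * n / fs_hz) for n in range(count)]


if __name__ == "__main__":
    fs = 10_000  # hypothetical sampling rate in Hz
    tone = 7_000  # above fs / 2, so it aliases
    print("alias of", tone, "Hz at fs =", fs, "Hz:", alias_frequency(tone, fs), "Hz")
    a = sample_cosine(tone, fs, 50)
    b = sample_cosine(alias_frequency(tone, fs), fs, 50)
    print("max sample difference:", max(abs(x - y) for x, y in zip(a, b)))
```

Because the two tones produce the same sample sequence, no filter applied after sampling can tell them apart, which is why the lowpass filtering has to happen before the sampler.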
Sampling of Continuous Sounds
Comment: Digitizing the samples also introduces quantization
Effects of Undersampling
Undersampling at 10 kHz
Effects of Quantization
16-bit representation
12-bit representation
8-bit representation
4-bit representation
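The audible difference between these word lengths follows from the usual rule of thumb that each bit of a uniform quantizer adds about 6 dB of signal-to-noise ratio (roughly 6.02 b + 1.76 dB for a full-scale sine). The sketch below measures this on a synthetic sine; the quantizer, the test tone, and all names are assumptions made for the illustration and are not taken from the slides.

```python
import math


def quantize(x, bits):
    """Uniform quantizer with 2**bits levels covering [-1, 1)."""
    step = 2.0 / (2 ** bits)
    q = math.floor(x / step + 0.5) * step
    # Clamp to the representable range of a two's-complement style grid.
    return max(-1.0, min(1.0 - step, q))


def measured_snr_db(bits, count=10_000):
    """SNR of a near-full-scale sine after quantization to `bits` bits."""
    step = 2.0 / (2 ** bits)
    amplitude = 1.0 - step  # stay just below full scale to avoid clipping
    signal_power = 0.0
    noise_power = 0.0
    for n in range(count):
        x = amplitude * math.sin(2.0 * math.pi * 0.01234567 * n)
        err = x - quantize(x, bits)
        signal_power += x * x
        noise_power += err * err
    return 10.0 * math.log10(signal_power / noise_power)


def rule_of_thumb_snr_db(bits):
    """Textbook estimate for a full-scale sine and uniform quantization noise."""
    return 6.02 * bits + 1.76


if __name__ == "__main__":
    for b in (16, 12, 8, 4):
        print(b, "bits: measured", round(measured_snr_db(b), 1),
              "dB, rule of thumb", round(rule_of_thumb_snr_db(b), 1), "dB")
```

At very short word lengths the error is no longer noise-like, so the measured value can drift from the rule of thumb.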
Digital Audio
CD: 44.1 kHz × 16 bits × 2 channels = 1.411 Mbits/s
Subband Coding
Maximal downsampling
The quantizer Q in each band should be based on the signal-to-masking ratio (SMR)
The ear's critical bands are not uniform in width, but roughly logarithmic
The filter bank should match the critical bands
[Figure: analysis filterbank, one quantizer Q per band, synthesis filterbank]
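Maximal downsampling means that a bank of M bands keeps only every M-th sample in each band, so the total number of subband samples equals the number of input samples. A two-band Haar filterbank is the smallest case where this can be checked by hand. The sketch below is only that toy case, with hypothetical function names; it is not the 32-band polyphase filterbank used in MPEG audio.

```python
import math


def analysis(x):
    """Split an even-length signal into low and high bands, each downsampled by 2."""
    r = math.sqrt(0.5)
    low = [(x[i] + x[i + 1]) * r for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * r for i in range(0, len(x), 2)]
    return low, high


def synthesis(low, high):
    """Rebuild the full-rate signal from the two downsampled bands."""
    r = math.sqrt(0.5)
    x = []
    for lo, hi in zip(low, high):
        x.append((lo + hi) * r)
        x.append((lo - hi) * r)
    return x


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0]
    low, high = analysis(signal)
    print("subband samples:", len(low) + len(high), "input samples:", len(signal))
    rebuilt = synthesis(low, high)
    print("max reconstruction error:",
          max(abs(a - b) for a, b in zip(signal, rebuilt)))
```

In a codec the quantizers sit between `analysis` and `synthesis`, so reconstruction is no longer exact and the goal becomes keeping the error below the masking threshold in each band.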
MPEG-1 Audio
ISO/IEC 11172-3 (1988~1991)
First high-quality audio compression standard
Sampling rates: 32, 44.1, 48 kHz
CD quality two-channel audio at ~256 kbits/s
CD: 44.1 kHz × 16 bits × 2 = 1.411 Mbits/s
Quality demonstration (MPEG-1 Layer II)
Stereo 44.1 kHz at 64 kbits/s
Stereo 44.1 kHz at 128 kbits/s
Stereo 44.1 kHz at 192 kbits/s
Stereo 44.1 kHz at 256 kbits/s
Codec Block Diagram
Layers
Increasing complexity, delay, and quality from Layer I to Layer III
Layer I: ~384 kbits/s for perceptually lossless quality (4:1)
Layer II: ~192 kbits/s for perceptually lossless quality (8:1)
Layer III: ~128 kbits/s for perceptually lossless quality (12:1)
(for two channels)
100% perceptually lossless
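The compression ratios can be checked against the uncompressed PCM rate. The sketch below is plain arithmetic with hypothetical helper names. One observation, offered as an assumption rather than something the slides state: the round figures 4:1, 8:1, and 12:1 come out exactly for a 48 kHz, 16-bit stereo source, while a 44.1 kHz CD source gives slightly smaller ratios.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(pcm_bps, coded_bps):
    """How many times smaller the coded stream is than the PCM stream."""
    return pcm_bps / coded_bps


# Approximate two-channel rates quoted on the slide, in bits per second.
LAYER_RATES_BPS = {"I": 384_000, "II": 192_000, "III": 128_000}

if __name__ == "__main__":
    for source_rate in (44_100, 48_000):
        pcm = pcm_bit_rate(source_rate, 16, 2)
        print(source_rate, "Hz source:", pcm, "bits/s")
        for layer, rate in LAYER_RATES_BPS.items():
            print("  Layer", layer, "ratio:",
                  round(compression_ratio(pcm, rate), 2))
```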
Layer I and II Encoder
[Block diagram. Blocks: Analysis Filterbank (512-tap, 32 subbands); Scaler & Quantizer; Mux; FFT (512-pt for Layer I, 1024-pt for Layer II/III); Masking Threshold Generator; Dynamic Bit Allocator; Coder]
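The dynamic bit allocator can be pictured as a greedy loop: repeatedly give one more bit to the subband whose quantization noise is currently most audible, that is, the band with the lowest mask-to-noise ratio (SNR minus SMR). The sketch below is a simplified illustration under stated assumptions: it uses a flat 6.02 dB of SNR per bit instead of the standard's quantizer tables, ignores scalefactor and side-information bits, and uses made-up SMR values and function names.

```python
def allocate_bits(smr_db, total_bits, max_bits_per_band=15, db_per_bit=6.02):
    """Greedy bit allocation across subbands.

    smr_db: signal-to-mask ratio per subband in dB (from a psychoacoustic model).
    total_bits: number of bits to hand out, one per iteration.
    Returns the number of bits assigned to each subband.
    """
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        candidates = [b for b in range(len(smr_db)) if bits[b] < max_bits_per_band]
        if not candidates:
            break
        # Mask-to-noise ratio: approximate SNR of the quantizer minus the SMR.
        worst = min(candidates, key=lambda b: bits[b] * db_per_bit - smr_db[b])
        bits[worst] += 1
    return bits


if __name__ == "__main__":
    example_smr = [20.0, 5.0, -10.0, 12.0]  # hypothetical values in dB
    print(allocate_bits(example_smr, total_bits=6))
```

A band with negative SMR is already below the masking threshold, so it receives bits only after every audible band has been served.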
Layer III Encoder
[Block diagram. Blocks: Analysis Filterbank; MDCT (6 or 18 points, with overlap); Scaler & Quantizer; Huffman Coding; Mux; FFT; Masking Threshold Generator; Coding]
Freq Resolution = 24 kHz / (32 × 18) = 41.67 Hz
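The resolution figure is the audio bandwidth (half the sampling rate) divided by the total number of frequency lines, which is the 32 subbands times the MDCT size per subband. The short sketch below repeats that arithmetic for the 18-point and 6-point cases, assuming a 48 kHz sampling rate as implied by the 24 kHz bandwidth; the function name is hypothetical.

```python
def frequency_resolution_hz(sample_rate_hz, subbands=32, lines_per_subband=18):
    """Spacing between frequency lines of a subband filterbank followed by an MDCT."""
    bandwidth_hz = sample_rate_hz / 2.0
    return bandwidth_hz / (subbands * lines_per_subband)


if __name__ == "__main__":
    fs = 48_000  # assumed sampling rate, giving the 24 kHz bandwidth above
    print("filterbank only:", frequency_resolution_hz(fs, lines_per_subband=1), "Hz")
    print("18-point MDCT:  ", round(frequency_resolution_hz(fs, lines_per_subband=18), 2), "Hz")
    print("6-point MDCT:   ", frequency_resolution_hz(fs, lines_per_subband=6), "Hz")
```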
Features in Layer III
Hybrid filterbank
MDCT with filterbank
Long/short window switching
Short for better temporal resolution (to prevent pre-echoes)
Long for better frequency resolution
Nonuniform quantization
Entropy coding
Run-length and Huffman coding
Bit reservoir (buffer)
VBR
CBR
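The MDCT is what lets Layer III use 50% overlapped windows without increasing the number of coefficients: each block of 2N windowed samples yields only N coefficients, and the aliasing this causes cancels when neighbouring blocks are overlap-added. The sketch below uses the textbook MDCT definition with a sine window, evaluated directly in O(N²); it is an illustration under those assumptions, not the bit-exact transform or window shapes of the standard, and the function names are invented for the example.

```python
import math


def sine_window(n_coeffs):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_coeffs))
            for n in range(2 * n_coeffs)]


def mdct(block, n_coeffs):
    """Forward MDCT: 2N input samples -> N coefficients."""
    n0 = 0.5 + n_coeffs / 2.0
    return [sum(block[n] * math.cos(math.pi / n_coeffs * (n + n0) * (k + 0.5))
                for n in range(2 * n_coeffs))
            for k in range(n_coeffs)]


def imdct(coeffs, n_coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples."""
    n0 = 0.5 + n_coeffs / 2.0
    return [(2.0 / n_coeffs) *
            sum(coeffs[k] * math.cos(math.pi / n_coeffs * (n + n0) * (k + 0.5))
                for k in range(n_coeffs))
            for n in range(2 * n_coeffs)]


def mdct_roundtrip(x, n_coeffs):
    """Analyse and resynthesise x (length a multiple of N) with 50% overlap."""
    w = sine_window(n_coeffs)
    padded = [0.0] * n_coeffs + list(x) + [0.0] * n_coeffs
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coeffs + 1, n_coeffs):
        block = [padded[start + n] * w[n] for n in range(2 * n_coeffs)]
        y = imdct(mdct(block, n_coeffs), n_coeffs)
        for n in range(2 * n_coeffs):
            out[start + n] += y[n] * w[n]  # windowed overlap-add
    return out[n_coeffs:n_coeffs + len(x)]


if __name__ == "__main__":
    signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(36)]
    for size in (18, 6):  # long and short block sizes
        rebuilt = mdct_roundtrip(signal, size)
        err = max(abs(a - b) for a, b in zip(signal, rebuilt))
        print("N =", size, "max reconstruction error:", err)
```

Switching from the long to the short size trades frequency resolution for time resolution, which is how the quantization noise of a sudden attack is kept from spreading backwards as a pre-echo.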
Stereo Redundancy Coding
Four modes: mono, stereo, dual (two separate channels), joint stereo
Joint stereo mode
Human stereo perception above 2 kHz is based on the signal envelope
Intensity stereo coding above 2 kHz
Encode (L + R)
Assign independent left and right scalefactors
Layer III supports (L+R) and (L–R) coding
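Sum/difference coding replaces the left and right channels with their sum and difference; when the two channels are similar, the difference signal is small and cheap to code. The sketch below shows the idea with an orthonormal 1/√2 scaling, which is one common normalization and is assumed here rather than taken from the slides; the function names are hypothetical.

```python
import math

_SCALE = math.sqrt(0.5)


def ms_encode(left, right):
    """Convert left/right sample lists to mid (L+R) and side (L-R) signals."""
    mid = [(l + r) * _SCALE for l, r in zip(left, right)]
    side = [(l - r) * _SCALE for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right from mid/side; exact inverse of ms_encode."""
    left = [(m + s) * _SCALE for m, s in zip(mid, side)]
    right = [(m - s) * _SCALE for m, s in zip(mid, side)]
    return left, right


if __name__ == "__main__":
    left = [0.50, 0.40, -0.20, 0.10]
    right = [0.48, 0.41, -0.19, 0.12]  # nearly identical to the left channel
    mid, side = ms_encode(left, right)
    print("side energy:", sum(s * s for s in side))
    print("mid energy: ", sum(m * m for m in mid))
```

Intensity stereo goes further than this: it keeps only the summed signal above some frequency and restores the stereo image from per-channel scalefactors, so it is not invertible in the way the sum/difference transform is.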
MPEG-2 Audio
ISO/IEC 13818-3
Allows lower sampling rates
16, 22.05, and 24 kHz: half of the MPEG-1 rates
From wideband speech to mediumband audio
Higher frequency resolution
Layer I, II, and III
Multichann
- published: 02 Jun 2016