Automatic Music Genre Classification of Audio Signals
George Tzanetakis, Georg Essl & Perry Cook
Presented by: Dave Kauchak
Department of Computer Science, University of California, San Diego
dkauchak@cs.ucsd.edu
,
Image Classification
? ? ?
,
Audio Classification
? ? ?
Rock Classical Country
,
Hierarchy of Sound
Sound: Music, Speech, Other?
Music: Jazz, Country, Rock, Classical, Disco, Hip Hop
Speech: Sports Announcer, Male, Female
Classical: Choir, Orchestra, String Quartet, Piano
,
Procedure
Preprocessing: Raw audio -> Digitally encode -> Extract features -> Build class models
Classification: Raw audio -> Digitally encode -> Extract features -> Decide class
Input processing: Raw audio -> Digitally encode -> Extract features
,
Digitally Encoding
Raw sound is simply a longitudinal compression wave traveling through some medium (often air).
Must be digitized to be processed
WAV
MIDI
MP3
Others…
,
WAV
Simple encoding
Sample sound at some interval (e.g. 44 kHz).
High sound quality
Large file sizes
,
MIDI
Musical Instrument Digital Interface
MIDI is a language
Sentences describe the channel, note, loudness, etc.
16 channels (each can be thought of and recorded as a separate instrument)
Common for audio retrieval and classification applications
,
MIDI Example
Music: Melodies, Tempo, Instrument
Sequence of Notes: Channel, Pitch, Amplitude, Duration
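A note sequence like the one on this slide can be sketched as a small data structure. This is only an illustration: the `Note` class and its field names are hypothetical and are not taken from the MIDI specification or from the slides. The pitch-to-frequency formula assumes standard equal temperament with A4 (note number 69) tuned to 440 Hz.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """One note event; field names are illustrative, not MIDI spec names."""
    channel: int     # 0-15, one of the 16 channels
    pitch: int       # note number, 0-127 (69 is A4)
    amplitude: int   # loudness, 0-127
    duration: float  # length in seconds


def pitch_to_hz(pitch: int) -> float:
    """Equal-tempered frequency of a note number, assuming A4 = 440 Hz."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)


# A three-note melody (C4, E4, G4) on a single channel.
melody = [Note(0, 60, 90, 0.5), Note(0, 64, 90, 0.5), Note(0, 67, 100, 1.0)]

for note in melody:
    print(note.channel, note.pitch, round(pitch_to_hz(note.pitch), 2), note.duration)
```

Because the notes are stored symbolically, features such as melody or tempo can be read directly from the sequence instead of being estimated from a waveform.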
,
MP3
Common compression format
3-4 MB vs. 30-40 MB for uncompressed
Perceptual noise shaping
The human ear cannot hear certain sounds
Some sounds are heard better than others
The louder of two sounds will be heard
,
MP3 Example
,
Extract Features
Mel-frequency cepstral coefficients (MFCCs)
Musical surface features
Rhythm Features
Others…
,
Tools for Feature Extraction
Fourier Transform (FT)
Short Term Fourier Transform (STFT)
Wavelets
,
Fourier Transform (FT)
Time-domain Frequency-domain
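The move from the time domain to the frequency domain can be sketched with a naive discrete Fourier transform. This is a minimal illustration, not the implementation used in the presentation: the sample rate, signal length and 5 Hz test tone are made-up values, and a real system would use an FFT library instead of this O(N^2) loop.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


SAMPLE_RATE = 64   # samples per second (illustrative value)
N = 64             # one second of signal

# Time domain: a pure 5 Hz sine wave.
signal = [math.sin(2 * math.pi * 5 * t / SAMPLE_RATE) for t in range(N)]

# Frequency domain: magnitude of each bin up to half the sample rate.
half = [abs(c) for c in dft(signal)][: N // 2]
peak_bin = max(range(len(half)), key=half.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / N
print(peak_hz)
```

The single peak in the magnitude spectrum sits at the bin corresponding to the tone's frequency.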
,
Another FT Example
Time Frequency
,
Problem?
,
Problem with FT
FT contains only frequency information
No time information is retained
Works fine for stationary signals
Non-stationary or changing signals cause problems
FT shows frequencies occurring at all times instead of specific times
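The loss of time information can be checked on a small example. In this sketch (all signal parameters are made up for illustration), one signal plays a low tone and then a high tone, and the other plays them in the opposite order. Their magnitude spectra come out the same, so the FT magnitudes alone cannot say which tone came first.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitude of each bin of a naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n)]


N = 64
low = [math.sin(2 * math.pi * 4 * t / N) for t in range(N // 2)]
high = [math.sin(2 * math.pi * 12 * t / N) for t in range(N // 2)]

low_then_high = low + high   # non-stationary: the frequency changes halfway
high_then_low = high + low   # same tones, opposite order in time

spec_a = dft_magnitudes(low_then_high)
spec_b = dft_magnitudes(high_then_low)

max_difference = max(abs(a - b) for a, b in zip(spec_a, spec_b))
print(max_difference)  # only floating-point rounding error remains
```

The two signals are circular shifts of each other, which changes only the phase of the transform, not its magnitude.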
,
Solution: STFT
How can we still use FT, but handle non-stationary signals?
How can we include time?
Idea:
Break up the signal into discrete windows
Assume the signal within each window is stationary
Take FT over each part
,
STFT Example
Window functions
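The windowing idea can be sketched as follows. This is an assumption-laden illustration, not the presenter's code: the Hann window, the window size of 32, the hop of 16 and the two test tones are choices made here for the example.

```python
import cmath
import math


def hann(n):
    """Hann window of length n (periodic form)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the first half of the DFT bins of one frame."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n // 2)]


def stft(signal, window_size, hop):
    """Slide a window over the signal and take the FT of each windowed piece."""
    window = hann(window_size)
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        piece = [signal[start + i] * window[i] for i in range(window_size)]
        frames.append(dft_magnitudes(piece))
    return frames


# Non-stationary test signal: a low tone followed by a higher tone.
signal = ([math.sin(2 * math.pi * 4 * t / 32) for t in range(64)]
          + [math.sin(2 * math.pi * 10 * t / 32) for t in range(64, 128)])

frames = stft(signal, window_size=32, hop=16)
peaks = [max(range(len(f)), key=f.__getitem__) for f in frames]
print(peaks)  # one dominant bin per time frame
```

Each frame now carries its own spectrum, so the early frames peak at the low tone's bin and the late frames at the high tone's bin.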
,
Better STFT Example
,
Problem: Resolution
We can vary time and frequency accuracy
Narrow window: good time resolution, poor frequency resolution
Wide window: good frequency resolution, poor time resolution
So, what’s the problem?
,
Varying the resolution
,
Where’s the problem?
How do you pick an appropriate window?
Too small = poor frequency resolution
Too large may result in violation of the stationarity condition
Different resolutions at different frequencies?
,
Solution: Wavelet Transform
Idea: Take a wavelet and vary scale
Check response of varying scales on signal
,
Wavelet Example: Scale 1
,
Wavelet Example: Scale 2
,
Wavelet Example: Scale 3
,
Wavelet Example
Scale = 1/frequency
Translation Time
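The idea of checking the response of a wavelet at varying scales can be sketched as below. The Mexican hat wavelet, the list of scales and the test tone are assumptions chosen for this example; the slides do not say which wavelet was used.

```python
import math


def mexican_hat(u):
    """Mexican hat (Ricker) wavelet, up to a constant factor."""
    return (1.0 - u * u) * math.exp(-0.5 * u * u)


def wavelet_response(signal, scale, translation):
    """Inner product of the signal with the wavelet stretched by `scale`
    and centered at sample index `translation`."""
    half_width = int(5 * scale)  # the wavelet is negligible beyond this
    first = max(0, translation - half_width)
    last = min(len(signal), translation + half_width + 1)
    total = sum(signal[t] * mexican_hat((t - translation) / scale)
                for t in range(first, last))
    return total / math.sqrt(scale)


PERIOD = 16  # samples per cycle of the test tone
signal = [math.sin(2 * math.pi * t / PERIOD) for t in range(256)]

scales = [1, 2, 4, 8, 16]
strength = {s: max(abs(wavelet_response(signal, s, tau)) for tau in range(96, 160))
            for s in scales}
best_scale = max(strength, key=strength.get)
print(best_scale)
```

Small scales respond to fast oscillations and large scales to slow ones, which is the sense in which scale behaves like 1/frequency, while the translation says where in time the response occurs.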
,
Discrete Wavelet Transform (DWT)
Wavelets come in pairs (high pass and low pass filter)
Split signal with filter and downsample
,
DWT cont.
Continue this process on the low frequency portion of the signal
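The split-and-downsample step can be sketched with the Haar wavelet, the simplest filter pair. The choice of Haar and the sample values are assumptions made for illustration. Note that the standard pyramid algorithm, followed here, repeats the split on the low-pass (approximation) half at each level.

```python
import math


def haar_step(x):
    """One DWT level: low-pass and high-pass filter, each downsampled by 2."""
    root2 = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / root2 for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / root2 for i in range(0, len(x) - 1, 2)]
    return approx, detail


def haar_dwt(x, levels):
    """Repeat the split on the low-pass output, collecting detail coefficients."""
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_dwt(signal, levels=3)

print([len(d) for d in details])  # detail coefficients per level
print(approx)                     # final coarse approximation
```

Each level halves the number of coefficients, so the first level's details describe fast changes with fine time spacing, and deeper levels describe slower changes with coarser time spacing.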
,
DWT Example
,
How did this solve the resolution problem?
Higher time resolution at high frequencies
Higher frequency resolution at low frequencies
,
Don’t Forget…
Why do we need these tools (FT, STFT & DWT)?
Feature extraction:
Mel-frequency cepstral coefficients (MFCCs)
Musical surface features
Rhythm Features
,
MFCC
Common for speech
Pre-Emphasis
Emphasize high frequencies to imitate ear
Window then FFT
Mel-scaling
Run frequency signal through bandpass filters
Filters are designed to mimic “critical bandwidths” in human hearing
Cepstral coefficients
Normalized cosine transform
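The MFCC steps listed on this slide can be sketched end to end for a single frame. Everything numeric here is an assumption, not something stated in the slides: the pre-emphasis coefficient 0.97, the Hamming window, the mel formula 2595*log10(1 + f/700), triangular filters, 12 filters and 5 coefficients are common choices, and the cosine transform is left unnormalized for brevity.

```python
import cmath
import math


def hz_to_mel(f):
    """One common mel-scale formula (an assumption, not from the slides)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def pre_emphasis(x, coeff=0.97):
    """First-order high-pass filter: y[n] = x[n] - coeff * x[n-1]."""
    return [x[0]] + [x[n] - coeff * x[n - 1] for n in range(1, len(x))]


def power_spectrum(frame):
    """Hamming-window the frame, then take squared DFT magnitudes."""
    n = len(frame)
    w = [frame[i] * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i in range(n)]
    return [abs(sum(w[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular bandpass filters with edges equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (num_filters + 1)) * n_fft / sample_rate
             for i in range(num_filters + 2)]  # fractional bin positions
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct2(x, num_coeffs):
    """Unnormalized DCT-II, keeping the first num_coeffs outputs."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(num_coeffs)]


def mfcc(frame, sample_rate, num_filters=12, num_coeffs=5):
    spectrum = power_spectrum(pre_emphasis(frame))
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(row, spectrum)) for row in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]  # guard against log(0)
    return dct2(log_energies, num_coeffs)


SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(128)]
coeffs = mfcc(frame, SAMPLE_RATE)
print(len(coeffs))
```

In practice the same computation is repeated for every windowed frame of the recording, giving one short coefficient vector per frame.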
,
Musical surface features
Represent characteristics of music
Texture
Timbre
Instrumentation
Statistics over spectral distribution
Centroid
Rolloff
Fl
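Two of the spectral statistics named here, centroid and rolloff, can be sketched on a magnitude spectrum. The 85% rolloff fraction is a commonly used value and is an assumption, since the slide does not give one. The two example spectra are made-up numbers.

```python
def spectral_centroid(magnitudes):
    """Magnitude-weighted mean bin index: the balancing point of the spectrum."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    return sum(k * m for k, m in enumerate(magnitudes)) / total


def spectral_rolloff(magnitudes, fraction=0.85):
    """Smallest bin index at which the running sum reaches `fraction` of the total."""
    target = fraction * sum(magnitudes)
    running = 0.0
    for k, m in enumerate(magnitudes):
        running += m
        if running >= target:
            return k
    return len(magnitudes) - 1


# Two made-up spectra: energy concentrated in low bins vs. in high bins.
dark = [8.0, 4.0, 2.0, 1.0, 0.5, 0.25, 0.125, 0.125]
bright = list(reversed(dark))

print(spectral_centroid(dark), spectral_rolloff(dark))
print(spectral_centroid(bright), spectral_rolloff(bright))
```

A spectrum with more high-frequency content gets a higher centroid and a higher rolloff bin, which is why these statistics are used as rough measures of brightness.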
- published: 19 May 2016