EE798P: Audio Representation Learning (Fall 2023)

MTh 1400-1515 L5

Instructor

Vipul Arora

Course Objectives:

This course introduces students to learning representations for audio. Speech audio can be represented as text plus metadata such as speaker information. Music audio can be represented as note sequences, pitch contours, etc. Environmental sounds can be represented as event labels with onset/offset times. These are directly interpretable representations. There are also statistically learnt latent representations, which may not be directly interpretable but are useful for downstream tasks such as retrieval, analysis, or extracting directly interpretable representations. The course will have reading assignments (papers). The lectures will focus on mathematical principles, and coding-based assignments will cover implementation.
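As an illustration of a directly interpretable representation, the sketch below stores environmental sound events as labels with onset/offset times and converts them to segment-wise (per-frame) binary labels. It is a minimal example, not course material: the names SoundEvent and events_to_frame_labels, the class list, and the 0.5 s hop are all assumptions made for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class SoundEvent:
    """Hypothetical container for one labelled event."""
    label: str
    onset: float   # start time in seconds
    offset: float  # end time in seconds


def events_to_frame_labels(events, classes, duration, hop=0.5):
    """Convert event-wise labels to a segment-wise binary grid.

    Frame t covers the interval [t * hop, (t + 1) * hop). A class is
    marked active (1) in every frame that the event overlaps.
    Returns a list of n_frames rows, each with len(classes) entries.
    """
    n_frames = math.ceil(duration / hop)
    grid = [[0] * len(classes) for _ in range(n_frames)]
    for ev in events:
        c = classes.index(ev.label)
        first = int(ev.onset // hop)
        last = min(n_frames, math.ceil(ev.offset / hop))
        for t in range(first, last):
            grid[t][c] = 1
    return grid


# Illustrative data (made up for this sketch).
classes = ["dog_bark", "car_horn"]
events = [
    SoundEvent("dog_bark", onset=0.5, offset=1.75),
    SoundEvent("car_horn", onset=2.0, offset=3.0),
]
frame_labels = events_to_frame_labels(events, classes, duration=3.0, hop=0.5)
for t, row in enumerate(frame_labels):
    print(f"{t * 0.5:.1f}s", row)
```

The event list is the compact, human-readable form, while the frame grid is the form typically used as a training target for segment-wise classifiers. The hop size controls the time resolution of that target.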

Pre-requisites:

Basic course on machine learning
Digital signal processing (EE301A or equivalent)
Basics of Programming (ESc101 or equivalent)

Topics:

Basics of Digital Signal Processing
Basics of Audio processing
Audio classification
- Event-wise and segment-wise
Speech Representation Learning
- HMMs
- CTC training
Music Representation Learning
- Basics of music processing
Audio retrieval
- Information retrieval
- Unsupervised representations

References:

This course will take excerpts from some standard books on machine learning and signal processing, but it will largely be based on articles and research papers from ML and audio conferences (e.g., NeurIPS, ICML, ICLR, Interspeech, ICASSP) and journals (e.g., IEEE Signal Processing Magazine).

Books:

DSP:
- Digital Signal Processing: A Computer-Based Approach - by Sanjit K. Mitra
- Discrete-Time Signal Processing (3rd ed.) - by Oppenheim and Schafer
- Lecture videos by Vipul Arora here
Machine Learning:
- “Pattern Recognition and Machine Learning”, C.M. Bishop, Springer, 2006 (corrected 2nd printing, 2011). https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf