EE798P: Audio Representation Learning (Fall 2023)
Mon & Thu, 14:00-15:15, L5
Instructor:
Vipul Arora
Course Objectives:
This course introduces students to learning representations of audio. Speech can be represented as text along with metadata such as speaker information. Music can be represented as note sequences, pitch contours, and similar symbolic forms. Environmental sounds can be represented as event labels with onset and offset times. These are directly interpretable representations. In contrast, statistically learnt latent representations may not be directly interpretable, but they are useful for downstream tasks such as retrieval, analysis, or extracting interpretable representations. The course will include reading assignments based on research papers. Lectures will focus on the underlying mathematical principles, and coding assignments will cover implementation.
Pre-requisites:
- A basic course in machine learning
- Digital signal processing (EE301A or equivalent)
- Basic programming (ESc101 or equivalent)
Topics:
- Basics of digital signal processing
- Basics of audio processing
- Audio classification
  - Event-wise and segment-wise classification
- Speech representation learning
  - Hidden Markov models (HMMs)
  - Connectionist temporal classification (CTC) training
- Music representation learning
  - Basics of music processing
- Audio retrieval
  - Information retrieval
- Unsupervised representations
References:
The course will draw excerpts from standard books on machine learning and signal processing, but it will be based largely on research papers from ML and audio conferences (e.g., NeurIPS, ICML, ICLR, Interspeech, ICASSP) and journals (e.g., IEEE Signal Processing Magazine).
Books:
- DSP:
  - "Digital Signal Processing: A Computer-Based Approach", Sanjit K. Mitra
  - "Discrete-Time Signal Processing", A. V. Oppenheim and R. W. Schafer, 3rd Edition
  - Lecture videos by Vipul Arora
- Machine Learning:
  - "Pattern Recognition and Machine Learning", C. M. Bishop, 2nd Edition, Springer, 2011. https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf