EE798P: Audio Representation Learning (Fall 2023)

MTh 1400-1515 L5

Instructor:

Vipul Arora

Course Objectives:

This course introduces students to learning representations for audio. Speech audio can be represented as text together with metadata such as speaker information. Music audio can be represented as note sequences, pitch contours, etc. Environmental sounds can be represented as event labels with onset and offset times. These representations are directly interpretable. There are also statistically learnt latent representations, which may not be directly interpretable but are useful for downstream tasks such as retrieval, analysis, or extracting directly interpretable representations. The course will have reading assignments (papers). The lectures will focus on mathematical principles, and there will be coding-based assignments for implementation.

Pre-requisites:

Topics:

References:

This course will draw excerpts from some standard books on machine learning and signal processing, but it will largely be based on articles and research papers from ML and audio conferences (e.g., NeurIPS, ICML, ICLR, Interspeech, ICASSP) and journals (e.g., IEEE Signal Processing Magazine).

Books: