EE627A: Speech Signal Processing (Spring 2021)

Vipul Arora
Department of Electrical Engineering, IIT Kanpur

Course Videos: YouTube

Course Objectives:

This course will be taught jointly with Prof. Rajesh Hegde. I will teach the latter half, focusing on ASR.

This part of the course introduces students to topics in automatic speech recognition (ASR). It covers the concepts involved in building an ASR system: starting with the conventional methods, it will touch upon the latest deep-learning-based methods. The Kaldi and OpenFst toolkits will be introduced. The lectures will focus on mathematical principles, and coding-based assignments will cover implementation.

Topics:

Conventional ASR systems
- Gaussian Mixture Models
- Hidden Markov Models
- Finite State Transducers
- Decision Trees
- Kaldi toolkit
Hybrid HMM-DNN ASR systems
- Deep Neural Networks
End-to-end ASR systems
- Connectionist Temporal Classification
Other topics of interest

Lecture Plan:

Wk of 2021	Week of Sem	Topics
11	Week-9	Hidden Markov Models
12	Week-10	Finite State Transducers (OpenFST) and Language Models
13	Week-11	GMM-HMM based ASR (HTK book, Kaldi)
14	Week-12	Decision Trees (HTK book, Kaldi)
15	Week-13	Kaldi toolkit
16	Week-14	Neural Networks
17	Week-15	DNN-HMM ASR
18	Week-16	End-to-end ASR

Grading Scheme

Project – 20%
Digit recognition using the Kaldi ASR toolkit.
- Follow https://kaldi-asr.org/doc/kaldi_for_dummies.html
- Prepare your own dataset. Many of you can collaborate to build the dataset.
- The test set will be provided by the instructor.
- Submission:
  - 10% for project report (up to 3 pages + 1 extra for references) and presentation (5-8 min; present from the report only, no slides needed). Use the ICASSP template.
  - 10% for test set evaluation.
  - Bonus (up to 5%) for a real-time ASR demo.
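As a rough illustration of the dataset-preparation step, the sketch below writes three of the data files that the Kaldi for Dummies tutorial asks for (wav.scp, text, utt2spk). The function name build_kaldi_data_dir, the speaker names, and the audio paths are hypothetical and used only for illustration; the tutorial also requires spk2gender, corpus.txt, and the language-data files, which are not covered here.

```python
from pathlib import Path


def build_kaldi_data_dir(utterances, out_dir):
    """Write wav.scp, text and utt2spk into out_dir.

    utterances: iterable of (speaker, name, wav_path, transcript) tuples.
    This is an illustrative helper, not part of Kaldi itself.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    rows = []
    for speaker, name, wav_path, transcript in utterances:
        # Prefixing the utterance ID with the speaker ID keeps the
        # per-utterance and per-speaker files in a consistent order.
        utt_id = f"{speaker}_{name}"
        rows.append((utt_id, speaker, str(wav_path), transcript))

    # Kaldi expects these files sorted by utterance ID (C-locale order);
    # Python's default string sort matches that for plain ASCII IDs.
    rows.sort(key=lambda r: r[0])

    with open(out / "wav.scp", "w") as wav_scp, \
         open(out / "text", "w") as text, \
         open(out / "utt2spk", "w") as utt2spk:
        for utt_id, speaker, wav_path, transcript in rows:
            wav_scp.write(f"{utt_id} {wav_path}\n")   # <uttID> <audio path>
            text.write(f"{utt_id} {transcript}\n")    # <uttID> <transcript>
            utt2spk.write(f"{utt_id} {speaker}\n")    # <uttID> <speakerID>

    return [r[0] for r in rows]
```

Calling build_kaldi_data_dir([("dad", "4_4_2", "/path/to/dad/4_4_2.wav", "four four two")], "data/train") would create data/train with one line per file for that recording. Running Kaldi's own validation script on the resulting directory is still advisable before training.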
End-semester Exam – 30%
Written exam on the CodeTantra platform.

Plagiarism Penalty:
As heavy as possible. Zero-tolerance policy.

References:

“Automatic Speech Recognition: A Deep Learning Approach”, D. Yu and L. Deng, Springer, 2016
The HTK book
https://kaldi-asr.org/doc/
“Pattern Recognition and Machine Learning”, C.M. Bishop, 2nd Edition, Springer, 2011. https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
“Deep Learning”, I. Goodfellow, Y. Bengio, A. Courville, MIT Press, 2016. https://www.deeplearningbook.org/