Best Student Paper at INTERSPEECH 2012


[Photo: Preethi Jyothi and Eric Fosler-Lussier]

Congratulations are due to Preethi Jyothi for the Best Student Paper recognition she earned at the 13th Annual Conference of the International Speech Communication Association (INTERSPEECH 2012). INTERSPEECH is the world's largest and most comprehensive technical conference focused on speech and language processing and its applications.

The paper, titled "Discriminatively Learning Factorized Finite State Pronunciation Models From Dynamic Bayesian Networks," is joint work with Dr. Eric Fosler-Lussier (Ms. Jyothi's advisor) and Karen Livescu. Spoken language, especially conversational speech, is characterized by a large amount of pronunciation variability: words often do not conform to their dictionary pronunciations. This makes conversational speech particularly challenging for automatic speech recognition (ASR) systems.

Traditionally, ASR systems model pronunciations by assuming that speech is composed of a single sequence of smaller sub-word units called phones. This restrictive representation has been argued to be insufficient to explain the pronunciation variation observed in spontaneous speech. An alternative explored in recent years is to model speech as multiple streams of linguistic features rather than a single stream of phones. The paper studies one such model, which uses a machine learning framework called dynamic Bayesian networks (DBNs) to relate the movements of a speaker's articulators (e.g., lips, tongue) to the sounds produced, treating them as loosely coupled streams. The authors present a general approach for transforming such DBN models into a finite-state representation, which allows for more flexible models that can be further trained to improve the recognizer's accuracy. Experimental results on an isolated-word task show that the proposed approach performs significantly better than the original DBN model.
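To give a flavor of the finite-state idea, here is a toy sketch of a weighted finite-state pronunciation model. Everything in it (the example word, the variant, and the weights) is hypothetical and illustrative, not taken from the paper; the paper's models are derived from DBNs and trained discriminatively, whereas this sketch just hand-codes one baseform and one reduced variant and finds the lowest-weight pronunciations.

```python
from heapq import heappush, heappop

# A minimal weighted finite-state acceptor, represented as an adjacency
# map: state -> list of (phone, next_state, weight). Weights act like
# negative log-probabilities, so lower is better.
def best_pronunciations(arcs, start, final, k=3):
    """Return up to k lowest-weight phone sequences from start to final."""
    heap = [(0.0, start, ())]
    results = []
    while heap and len(results) < k:
        w, state, path = heappop(heap)
        if state == final:
            results.append((w, " ".join(path)))
            continue
        for phone, nxt, cost in arcs.get(state, []):
            heappush(heap, (w + cost, nxt, path + (phone,)))
    return results

# Hypothetical example: "sense" with canonical pronunciation [s eh n s],
# plus a reduced variant where the vowel is nasalized and the /n/ is
# deleted ([s eh_n s]). The penalty 1.2 on the variant arc is made up;
# in a trained model such weights would be learned.
arcs = {
    0: [("s", 1, 0.0)],
    1: [("eh", 2, 0.0), ("eh_n", 3, 1.2)],  # nasalized-vowel variant skips /n/
    2: [("n", 3, 0.0)],
    3: [("s", 4, 0.0)],
}
print(best_pronunciations(arcs, start=0, final=4))
# -> [(0.0, 's eh n s'), (1.2, 's eh_n s')]
```

The appeal of the finite-state form is exactly this flexibility: variant arcs and their weights live in one uniform structure that standard shortest-path and training machinery can operate on.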

Preethi's research focuses on automatic speech recognition (ASR). She continues to collaborate with Karen Livescu and Eric Fosler-Lussier on articulatory feature-based models for ASR. This approach attempts to cope with the large amount of pronunciation variability in conversational speech by modeling it as the result of variations (asynchrony and substitution) in the movements of the articulators involved in speech production.
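As a toy illustration of what "asynchrony" between articulators means (my own sketch, not the paper's DBN), consider two feature streams, say a tongue stream and a lip stream, each with a sequence of target values. If the streams were perfectly synchronous they would advance in lockstep; allowing one stream to run slightly ahead of the other produces extra joint configurations, which is one source of surface pronunciation variants.

```python
def alignments(m, n, max_async=1):
    """Enumerate monotonic joint index paths for two feature streams of
    lengths m and n, where the streams may drift out of step by at most
    max_async positions (a toy notion of articulatory asynchrony)."""
    paths = []

    def step(i, j, path):
        if i == m - 1 and j == n - 1:
            paths.append(path)
            return
        # Either stream may advance alone, or both advance together,
        # as long as the desynchronization stays within max_async.
        for di, dj in ((1, 0), (0, 1), (1, 1)):
            ni, nj = i + di, j + dj
            if ni < m and nj < n and abs(ni - nj) <= max_async:
                step(ni, nj, path + [(ni, nj)])

    step(0, 0, [(0, 0)])
    return paths

# Hypothetical two-position streams: the synchronous path plus two
# paths where one articulator anticipates the other.
tongue = ["front", "back"]
lips = ["spread", "round"]
for p in alignments(len(tongue), len(lips)):
    print([(tongue[i], lips[j]) for i, j in p])
```

With `max_async=0` only the lockstep path survives; raising the bound admits the anticipatory paths, which is (very loosely) the kind of variation the DBN's coupled streams are designed to capture and score.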