Perception and Neurodynamics Laboratory (PNL)

Affiliated with OSU Laboratory for AI Research (LAIR), Department of Computer Science and Engineering (CSE), and Center for Cognitive and Brain Sciences

This lab conducts research on developing effective algorithms for solving real-world problems in machine perception, and on understanding the neurocomputational mechanisms underlying perceptual processes. We view these two goals as intimately related: the information-processing mechanisms of the brain, as the product of millions of years of evolution, likely represent optimal or near-optimal computational algorithms, and conversely, the computational algorithms that ultimately succeed in modeling perception will be closely related to those actually used by the brain.

The general strategy adopted by this lab is to focus on challenging problems that arise from real-world perception, and then attack them with multidisciplinary approaches. The analysis includes computational, cognitive/perceptual, and neurobiological perspectives. While paying close attention to cognitive and neurobiological processes, the thrust of the work conducted in this lab is computational.

Recent work in the lab focuses on machine learning algorithms, particularly deep neural networks (DNNs), for auditory scene analysis. To achieve the ultimate goal of constructing a cocktail party processor that matches the human ability in cocktail party environments, one must understand individual analyses, such as pitch, location, amplitude and frequency modulation, onset/offset, rhythm, and so on. One must also incorporate top-down information, including attention and recognition. The lab conducts research on a variety of topics under the general theme of computational audition, including speech separation and robust automatic speech/speaker recognition. For example, this lab originated the notion of the ideal binary mask (Wang, 2005), which formulates sound segregation as a classification problem. This formulation has enabled the use of supervised learning to address the source separation problem (known as supervised separation). This lab was the first to introduce DNNs to the domain of speech separation and enhancement (Wang & Wang, 2013), and the resulting DNN-based algorithm produced, for the first time, substantial speech intelligibility improvements for hearing-impaired listeners in background noise (Healy et al., 2013; see Press Release, YouTube Demo, and Test Data).
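To make the classification formulation concrete, here is a minimal sketch of how an ideal binary mask can be computed when the premixed speech and noise are available. It labels each time-frequency unit 1 when the local SNR exceeds a threshold (the local criterion) and 0 otherwise; the STFT helper, frame sizes, and toy signals are illustrative choices, not the lab's published implementation.

```python
import numpy as np

def stft_power(signal, frame_len=256, hop=128):
    """Power spectrogram via a simple Hann-windowed STFT (illustrative)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def ideal_binary_mask(speech, noise, lc_db=0.0):
    """IBM(t, f) = 1 where local speech power exceeds noise power by lc_db dB."""
    s_pow = stft_power(speech)
    n_pow = stft_power(noise)
    snr_db = 10.0 * np.log10((s_pow + 1e-12) / (n_pow + 1e-12))
    return (snr_db > lc_db).astype(np.float32)

# Toy example: a 440 Hz tone standing in for "speech", in white noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = np.sin(2 * np.pi * 440.0 * t)
noise = 0.5 * rng.standard_normal(16000)
mask = ideal_binary_mask(speech, noise)
print(mask.shape)  # (time frames, frequency bins)
```

In supervised separation, such a mask serves as the training target: a learner (e.g., a DNN) predicts the binary label of each time-frequency unit from features of the noisy mixture alone.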

In terms of neurodynamics, we view the brain as a gigantic dynamical system, and we build dynamical systems both for solving engineering problems and for understanding neurocomputational mechanisms. To illustrate this strategy, LEGION (Locally Excitatory Globally Inhibitory Oscillator Networks), invented by David Terman and DeLiang Wang (Terman & Wang, 1995; Wang & Terman, 1995), builds on neural oscillations in the brain and perceptual organization in human perception. The network shows remarkable computational power in synchronizing a locally coupled oscillator population and desynchronizing different populations. The LEGION network has been applied to image segmentation (see Wang & Terman, 1997) and speech segregation (see Wang & Brown, 1999), among other applications. The following figures show the input (left) to LEGION and the output (right) generated from LEGION (see Wang, 2005, for an extensive review of this effort).

[Figures: input to LEGION (left) and output generated by LEGION (right). The acoustic mixture is composed of a telephone ringing and the male utterance "Why were you all weary".]
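The building block of LEGION is a relaxation oscillator consisting of an excitatory unit x reciprocally coupled to an inhibitory unit y (Terman & Wang, 1995). The following is a minimal single-oscillator sketch, not the published network implementation: with a positive stimulus I the system settles onto a limit cycle, alternating between an active phase (x high) and a silent phase (x low). In the full network, many such oscillators are joined by local excitatory coupling and a global inhibitor, which synchronizes oscillators belonging to one object and desynchronizes different objects. The parameter values and forward-Euler integration below are illustrative choices.

```python
import numpy as np

def simulate_oscillator(I=0.2, eps=0.02, gamma=6.0, beta=0.1,
                        dt=0.02, steps=100_000, x0=0.1, y0=0.0):
    """Forward-Euler integration of one Terman-Wang relaxation oscillator:
         dx/dt = 3x - x^3 + 2 - y + I
         dy/dt = eps * (gamma * (1 + tanh(x / beta)) - y)
       Returns the trajectory of the excitatory unit x.
    """
    x, y = x0, y0
    xs = np.empty(steps)
    for k in range(steps):
        dx = 3.0 * x - x ** 3 + 2.0 - y + I
        dy = eps * (gamma * (1.0 + np.tanh(x / beta)) - y)
        x += dt * dx
        y += dt * dy
        xs[k] = x
    return xs

xs = simulate_oscillator()
# With these parameters the trace jumps between the right branch of the
# cubic nullcline (active phase, x near 2) and the left branch (silent
# phase, x near -2), producing relaxation oscillations.
print(xs.max(), xs.min())
```

Because eps is small, x changes much faster than y, which is what gives the motion its relaxation character: slow drift along a branch of the cubic nullcline punctuated by rapid jumps between branches.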