Sound Demo for Voiced Speech Segregation

 

First column: mixtures from a corpus of 10 voiced utterances mixed with 10 intrusions, collected by Martin Cooke. v3 and v8 are utterances of the same sentence, "Why were you all weary", spoken by different male speakers. The 10 intrusions are: n0 - 1 kHz pure tone, n1 - white noise, n2 - noise bursts, n3 - “cocktail party” noise, n4 - rock music, n5 - siren, n6 - trill telephone, n7 - female speech, n8 - male speech, and n9 - female speech.
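The mixture labels in the table (for example v3n6) combine an utterance ID and an intrusion ID. As a minimal sketch of that naming scheme, the snippet below decodes a label into its parts. The function name parse_mixture_name and the dictionary INTRUSIONS are hypothetical names introduced here for illustration; only the label format and the intrusion descriptions come from this page.

```python
import re

# Intrusion IDs and descriptions as listed on this page.
INTRUSIONS = {
    "n0": "1 kHz pure tone",
    "n1": "white noise",
    "n2": "noise bursts",
    "n3": "cocktail party noise",
    "n4": "rock music",
    "n5": "siren",
    "n6": "trill telephone",
    "n7": "female speech",
    "n8": "male speech",
    "n9": "female speech",
}


def parse_mixture_name(name):
    """Split a label such as 'v3n6' into (utterance, intrusion, description).

    Hypothetical helper: assumes single-digit IDs, as used on this page.
    """
    match = re.fullmatch(r"(v\d)(n\d)", name)
    if match is None:
        raise ValueError(f"not a mixture label: {name!r}")
    utterance, intrusion = match.groups()
    return utterance, intrusion, INTRUSIONS[intrusion]


if __name__ == "__main__":
    for label in ("v3n0", "v3n7", "v8n6"):
        print(label, "->", parse_mixture_name(label))
```

For instance, v8n6 decodes to utterance v8 mixed with the trill telephone intrusion.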

Second column: target speech segregated from the mixtures in the first column using the Wang-Brown 1999 model. For more details, see D. L. Wang and G. J. Brown (1999): Separation of speech from interfering sounds based on oscillatory correlation, IEEE Trans. Neural Networks, vol. 10, pp. 684-697.

Third column: target speech segregated from the mixtures in the first column using the Hu-Wang 2004 model. For more details, see G. Hu and D. L. Wang (2004): Monaural speech segregation based on pitch tracking and amplitude modulation, IEEE Trans. Neural Networks, vol. 15, pp. 1135-1150.

 

Mixture    Wang-Brown Model    Hu-Wang Model
v3n0       v3n0                v3n0
v3n1       v3n1                v3n1
v3n2       v3n2                v3n2
v3n3       v3n3                v3n3
v3n4       v3n4                v3n4
v3n5       v3n5                v3n5
v3n6       v3n6                v3n6
v3n7       v3n7                v3n7
v3n8       v3n8                v3n8
v3n9       v3n9                v3n9
v8n6       v8n6                v8n6