Intelligibility data

Speech Intelligibility Data from Kjems et al.'s Experiments

This folder contains intelligibility data obtained in the listening experiments of Kjems et al. The listening tests measured how intelligible noisy mixtures are to normal-hearing listeners after ideal time-frequency segregation (ITFS) is applied. The results and a related description are given in the following paper:

Role of mask pattern in intelligibility of ideal binary-masked noisy speech

Journal of the Acoustical Society of America

README

The data can be downloaded as a single ZIP file (about 2.18 GB). They have been reorganized from the original data provided by Kjems.

The sentences are from the Dantale II corpus [1], which includes 160 sentences (16 tracks with 10 sentences in each track). For the listening test, 150 sentences (track1--track12, track14--track16) were selected. Each sentence is five words long and follows the same structure: name-verb-numeral-adjective-noun. Four types of noise are considered: speech-shaped noise (SSN), cafeteria noise, car interior noise and noise from a bottling hall. Two kinds of binary masks are employed: the ideal binary mask (IBM) and the target binary mask (TBM). Since the TBM is equivalent to the IBM under SSN, there are 7 categories in total: IBM/bottles, IBM/car, IBM/cafe, IBM/ssn, TBM/bottles, TBM/car and TBM/cafe. The noises are mixed with clean speech at 3 SNR levels: the 20% speech reception threshold (SRT), the 50% SRT, and -60 dB. To compute the binary mask, 8 different values of the relative criterion RC (RC = LC - SNR, where LC is the local criterion) are used, one of which is the unprocessed condition. As a consequence, there are 7*8*3 = 168 conditions in total.
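The condition count above can be checked with a short Python sketch. It is only an illustration: the category and SNR labels are copied from the text, the 8 RC settings are represented by plain indices because their values are not listed here, and the function names are hypothetical.

```python
from itertools import product

# Labels taken from the description above.
CATEGORIES = ["IBM/bottles", "IBM/car", "IBM/cafe", "IBM/ssn",
              "TBM/bottles", "TBM/car", "TBM/cafe"]
SNR_LEVELS = ["20% SRT", "50% SRT", "-60 dB"]
NUM_RC = 8  # eight RC settings, one of which is the unprocessed condition


def local_criterion(rc_db, snr_db):
    """Recover LC from RC = LC - SNR, i.e. LC = RC + SNR (all values in dB)."""
    return rc_db + snr_db


def enumerate_conditions():
    """Return every (category, rc_index, snr_level) combination."""
    return list(product(CATEGORIES, range(NUM_RC), SNR_LEVELS))


conditions = enumerate_conditions()
print(len(conditions))  # 168
```

The RC index is a stand-in only; the actual RC and SNR values for each category are stored with the data.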

Files in the ZIP file

-- Clean audio files, ITFS-processed audio files and binary masks (IBM or TBM) are stored in "./ibm/bottles_20k", "./ibm/cafe_20k", "./ibm/ssn_20k", "./ibm/volvo_20k", "./tbm/bottles_20k", "./tbm/cafe_20k", and "./tbm/volvo_20k". Note: for each track, we use the target speech at RC = -100 dB and SNR = 50% SRT as the clean speech. For example, for track01 in "ibm/bottles", the clean speech is "target_track01_LCminusSNR-100.0_SNR-12.2.wav". More detailed information can be found in "./README.pdf".
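The example file name encodes the track number, the RC value and the SNR. A minimal Python sketch for pulling these fields out is given below. It assumes that other target files follow the same naming pattern as the single example above, which should be checked against "./README.pdf"; the function name and dictionary keys are hypothetical.

```python
import re

# Pattern inferred from the one example file name in this README (an assumption).
_TARGET_PATTERN = re.compile(
    r"^target_track(?P<track>\d+)"
    r"_LCminusSNR(?P<rc>-?\d+(?:\.\d+)?)"
    r"_SNR(?P<snr>-?\d+(?:\.\d+)?)\.wav$"
)


def parse_target_name(filename):
    """Split a target file name into track number, RC (dB) and SNR (dB)."""
    match = _TARGET_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return {
        "track": int(match["track"]),
        "rc_db": float(match["rc"]),
        "snr_db": float(match["snr"]),
    }


info = parse_target_name("target_track01_LCminusSNR-100.0_SNR-12.2.wav")
print(info)  # {'track': 1, 'rc_db': -100.0, 'snr_db': -12.2}
```

Names that do not match the assumed pattern raise a ValueError instead of being parsed silently.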

-- "./intelligibility_data.mat" stores the human speech intelligibility scores (7*8*3), RC values (7*8) and SNR values (7*3).
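The three arrays line up along shared axes, which the following Python sketch illustrates. It assumes the axis order is (category, RC, SNR), as the sizes 7*8*3, 7*8 and 7*3 suggest. The variable names inside the .mat file are not documented here, so the sketch works on plain nested lists filled with placeholders rather than the real data.

```python
def to_records(scores, rc, snr):
    """Pair every score with the RC and SNR value of its condition.

    Assumed layout: scores[c][r][s], rc[c][r], snr[c][s], where c is the
    category index, r the RC index and s the SNR index.
    """
    n_cat = len(scores)
    if len(rc) != n_cat or len(snr) != n_cat:
        raise ValueError("scores, rc and snr must cover the same categories")
    records = []
    for c in range(n_cat):
        for r, rc_value in enumerate(rc[c]):
            for s, snr_value in enumerate(snr[c]):
                records.append({
                    "category": c,
                    "rc": rc_value,
                    "snr": snr_value,
                    "score": scores[c][r][s],
                })
    return records


# Placeholder arrays with the documented sizes; these are NOT real data.
scores = [[[None] * 3 for _ in range(8)] for _ in range(7)]
rc = [[None] * 8 for _ in range(7)]
snr = [[None] * 3 for _ in range(7)]

records = to_records(scores, rc, snr)
print(len(records))  # 168
```

With the real file, the nested lists would be replaced by the arrays read from "./intelligibility_data.mat".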

-- To illustrate how the data can be used, we apply the short-time objective intelligibility measure (STOI) [2] to the data and reproduce Fig. 7 in [2]. The Matlab files are in the folder "./experiment". We use the free Matlab implementation of STOI that is available online.

-- "./README.pdf" provides a more detailed explanation of the data and also presents the STOI results of IBM/bottles.

[1] Wagener, K. C. (2003). Factors influencing sentence intelligibility in noise (Doctoral dissertation, Universität Oldenburg).

[2] Taal, C. H., Hendriks, R. C., Heusdens, R., and Jensen, J. (2011). An algorithm for intelligibility prediction of time–frequency weighted noisy speech. IEEE Transactions on Audio, Speech, and Language Processing, 19(7), 2125-2136.