The data can be downloaded as a single ZIP file (about 2.18 GB). It was reorganized from the original data provided by Kjems.
The sentences are from the Dantale II corpus [1], which contains 160 sentences (16 tracks of 10 sentences each). The listening test uses 150 of them (track1--track12 and track14--track16). Each sentence is five words long and follows the same structure: name-verb-numeral-adjective-noun. Four noise types are considered: speech-shaped noise (SSN), cafeteria noise, car interior noise, and noise from a bottling hall. Two kinds of binary masks are used: the ideal binary mask (IBM) and the target binary mask (TBM). Since the TBM is equivalent to the IBM under SSN, there are 7 categories in total: IBM/bottles, IBM/car, IBM/cafe, IBM/ssn, TBM/bottles, TBM/car and TBM/cafe. The noises are mixed with clean speech at 3 SNR levels: the 20% speech reception threshold (SRT), the 50% SRT, and -60 dB. The binary masks are computed with 8 different RC settings (RC = LC - SNR), one of which is the unprocessed condition. As a consequence, there are 7*8*3 = 168 conditions in total.
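The condition count above can be checked with a short Python sketch. The category names and SNR labels are taken from the text; the RC settings are represented by hypothetical placeholder indices 0..7, since the actual RC values differ per category and are stored in the data itself.

```python
from itertools import product

# Mask/noise categories as listed in the text. TBM/ssn is absent because
# the TBM is equivalent to the IBM under SSN.
CATEGORIES = [
    "IBM/bottles", "IBM/car", "IBM/cafe", "IBM/ssn",
    "TBM/bottles", "TBM/car", "TBM/cafe",
]

# Labels for the three SNR levels (the dB values behind the two SRT levels
# depend on the category and are not reproduced here).
SNR_LEVELS = ["20% SRT", "50% SRT", "-60 dB"]

# Hypothetical placeholder indices for the 8 RC settings,
# one of which is the unprocessed condition.
RC_INDICES = list(range(8))

# Every combination of category, RC setting, and SNR level is one condition.
conditions = list(product(CATEGORIES, RC_INDICES, SNR_LEVELS))
print(len(conditions))  # 168
```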
-- Clean audio files, ITFS-processed audio files, and binary masks (IBM or TBM) are stored in "./ibm/bottles_20k", "./ibm/cafe_20k", "./ibm/ssn_20k", "./ibm/volvo_20k", "./tbm/bottles_20k", "./tbm/cafe_20k", and "./tbm/volvo_20k". Note: for each track, the target speech at RC=-100dB, SNR=50% SRT is used as the clean speech. For example, for track01 in "ibm/bottles", the clean speech is "target_track01_LCminusSNR-100.0_SNR-12.2.wav". More detailed information can be found in "./README.pdf".
-- "./intelligibility_data.mat" stores the human speech intelligibility scores (7*8*3), RC values (7*8) and SNR values (7*3).
-- To illustrate the data, we apply the short-time objective intelligibility (STOI) measure [2] to it and reproduce Fig. 7 of [2]. The Matlab files are in the folder "./experiment". We use the free Matlab implementation of STOI that is available online.
-- "./README.pdf" provides a more detailed explanation of the data and also presents the STOI results of IBM/bottles.
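The file name quoted above encodes the track number, the RC value, and the SNR. The following Python sketch shows one way to extract them. The pattern is inferred from that single example name, so other files in the archive may not match it; the function name parse_name and the dictionary keys are hypothetical, and the hyphen before each number is assumed to be a minus sign (consistent with RC=-100dB in the text).

```python
import re

# Pattern inferred from the one example file name given in the text;
# this is an assumption, not a documented naming scheme.
_NAME_RE = re.compile(
    r"^(?P<kind>[A-Za-z]+)_track(?P<track>\d+)"
    r"_LCminusSNR(?P<rc>-?\d+(?:\.\d+)?)"
    r"_SNR(?P<snr>-?\d+(?:\.\d+)?)\.wav$"
)

def parse_name(filename):
    """Return the fields encoded in a file name, or None if it does not match."""
    m = _NAME_RE.match(filename)
    if m is None:
        return None
    return {
        "kind": m.group("kind"),          # e.g. "target"
        "track": int(m.group("track")),   # "01" -> 1
        "rc_db": float(m.group("rc")),    # RC = LC - SNR, in dB
        "snr_db": float(m.group("snr")),  # mixture SNR, in dB
    }

info = parse_name("target_track01_LCminusSNR-100.0_SNR-12.2.wav")
print(info)
```

Names that do not follow the assumed pattern return None, so unexpected files can be skipped or inspected by hand.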
[1] Wagener, K. C. (2003). Factors influencing sentence intelligibility in noise (Doctoral dissertation, Universität Oldenburg).
[2] Taal, C. H., Hendriks, R. C., Heusdens, R., and Jensen, J. (2011). An algorithm for intelligibility prediction of time–frequency weighted noisy speech. IEEE Transactions on Audio, Speech, and Language Processing, 19(7), 2125-2136.