Subjective Evaluation for the WASPAA paper "Closing the Gap Between Time-domain Multi-channel Speech Enhancement on Real and Simulation Conditions"

The subjective evaluation is conducted with webMUSHRA.

We randomly selected test samples from the et05_real subset of CHiME4 and asked participants to rate the speech enhancement quality (speech intelligibility and denoising performance) of audio produced by different systems: ① the original noisy speech (CH5), and speech enhanced by ② BLSTM MVDR, ③ FaSNet, ④ MC-Conv-TasNet, Beam-TasNet (both ⑤ sig-MVDR and ⑥ mask-MVDR), and ⑦ jointly trained MC-Conv-TasNet. The order of these audio clips is randomly shuffled for each test sample, and a close-talk recording from CH0 is given as the reference.
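The per-sample shuffling described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the condition identifiers, the `build_trials` helper, and the trial dictionary layout are hypothetical names, not taken from the paper or from webMUSHRA's own configuration format.

```python
import random

# Hypothetical identifiers for the seven systems listed above (①-⑦).
CONDITIONS = [
    "noisy_ch5",
    "blstm_mvdr",
    "fasnet",
    "mc_conv_tasnet",
    "beam_tasnet_sig_mvdr",
    "beam_tasnet_mask_mvdr",
    "joint_mc_conv_tasnet",
]


def build_trials(sample_ids, seed=0):
    """Return one trial per test sample, each with its own shuffled stimulus order."""
    rng = random.Random(seed)  # fixed seed so the listening test is reproducible
    trials = []
    for sample_id in sample_ids:
        order = list(CONDITIONS)  # copy so the master list stays untouched
        rng.shuffle(order)        # independent shuffle for every sample
        trials.append(
            {
                "sample": sample_id,
                "reference": "close_talk_ch0",  # CH0 is always shown as the reference
                "stimuli": order,
            }
        )
    return trials
```

Calling `build_trials(["utt_a", "utt_b"])` (placeholder utterance names) yields two trials that contain the same seven conditions in independently shuffled orders, so a listener cannot infer the system from its position on the page.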

Model                                     MOS     S-MOS   N-MOS
① Noisy Input (CH5)                       59.01   89.89   47.07
② BLSTM MVDR in [24]                      77.49   91.62   66.29
③ FaSNet [31]                             67.57   71.34   77.90
----------------------------------------  ------  ------  ------
④ MC-Conv-TasNet                          51.82   57.42   63.85
⑤   → Beam-TasNet (sig-MVDR)              69.45   79.71   64.75
⑥   → Beam-TasNet (mask-MVDR, 1-D)        77.32   92.21   70.07
⑦ Jointly trained MC-Conv-TasNet + ASR    74.61   74.13   85.17

The subjective evaluation is conducted in terms of the following criteria:

MOS: Determination of subjective global MOS. Select the category which best describes the heard sample for the purpose of everyday speech communication. The OVERALL SPEECH SAMPLE was 100-Excellent / 80-Good / 60-Fair / 40-Poor / 20-Bad.

S-MOS: Determination of subjective speech MOS (S-MOS). Attending ONLY to the SPEECH SIGNAL, select the category which best describes the heard sample. The SPEECH SIGNAL in this sample was 100-Not Distorted / 80-Slightly Distorted / 60-Somewhat Distorted / 40-Fairly Distorted / 20-Very Distorted.

N-MOS: Determination of subjective noise MOS (N-MOS). Attending ONLY to the BACKGROUND, select the category which best describes the heard sample. The BACKGROUND in this sample was 100-Not Noticeable / 80-Slightly Noticeable / 60-Noticeable But Not Intrusive / 40-Somewhat Intrusive / 20-Very Intrusive.
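To help read averaged scores against these anchors, the sketch below maps a 0-100 score to the closest labelled category. The nearest-anchor rule and the `nearest_label` helper are assumptions made for illustration; the evaluation itself reports only the numeric averages.

```python
# Anchor labels copied from the three criteria above.
MOS_ANCHORS = {100: "Excellent", 80: "Good", 60: "Fair", 40: "Poor", 20: "Bad"}
S_MOS_ANCHORS = {
    100: "Not Distorted",
    80: "Slightly Distorted",
    60: "Somewhat Distorted",
    40: "Fairly Distorted",
    20: "Very Distorted",
}
N_MOS_ANCHORS = {
    100: "Not Noticeable",
    80: "Slightly Noticeable",
    60: "Noticeable But Not Intrusive",
    40: "Somewhat Intrusive",
    20: "Very Intrusive",
}


def nearest_label(score, anchors):
    """Return the label of the anchor closest to `score`.

    Assumption: ties resolve to the higher anchor, because the dicts
    are ordered from 100 down to 20 and min() keeps the first match.
    """
    anchor = min(anchors, key=lambda a: abs(a - score))
    return anchors[anchor]
```

For example, `nearest_label(77.49, MOS_ANCHORS)` gives "Good" for the overall MOS of ② BLSTM MVDR, and `nearest_label(47.07, N_MOS_ANCHORS)` gives "Somewhat Intrusive" for the background of the noisy input.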