Investigating Training Objectives for Generative Speech Enhancement


Julius Richter, Danilo de Oliveira, Timo Gerkmann

Generative speech enhancement has recently shown promising advancements in improving speech quality in noisy environments. Multiple diffusion-based frameworks exist, each employing distinct training objectives and learning techniques. This paper aims at explaining the differences between these frameworks by focusing our investigation on score-based generative models and Schrödinger bridge. We conduct a series of comprehensive experiments to compare their performance and highlight differing training behaviors. Furthermore, we propose a novel perceptual loss function tailored for the Schrödinger bridge framework, demonstrating enhanced performance and improved perceptual quality of the enhanced speech signals. All experimental code and pre-trained models are publicly available to facilitate further research and development in this domain.

Results

We evaluate the performance of the proposed models on the VoiceBank-Demand (VB-DMD) dataset. The results are summarized in the table below. Models M1-M4 are based on score-based generative models for speech enhancement (SGMSE), while M5-M8 are based on the Schrödinger bridge framework.

Table: Speech Enhancement Performance on VB-DMD. Values indicate the mean of the metrics.

Model         POLQA   PESQ   SI-SDR [dB]   ESTOI   DNSMOS
Noisy         3.11    1.97    8.4          0.79    3.09
Conv-TasNet   3.56    2.63   19.1          0.85    3.37
MetricGAN+    3.72    3.13    8.5          0.83    3.37
SE-MAMBA      4.33    3.56   19.7          0.89    3.58
SGMSE+        3.95    2.93   17.3          0.87    3.56
M1            3.93    2.84   17.7          0.86    3.54
M2            3.96    2.90   18.0          0.86    3.55
M3            3.86    2.77   17.8          0.86    3.51
M4 (EDM2)     3.87    2.87   18.0          0.86    3.54
M5            4.15    2.91   19.4          0.88    3.59
M6            4.15    3.70    8.3          0.86    3.44
M7            4.25    3.50   14.1          0.87    3.55
M8            4.20    3.44   15.3          0.87    3.58
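
For readers unfamiliar with the SI-SDR column, the following is a minimal sketch of the commonly used definition of scale-invariant signal-to-distortion ratio: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. It is not taken from the authors' evaluation code; the function name `si_sdr`, the plain-list input format, and the mean removal step are assumptions made for illustration.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB for two equal-length sample sequences.

    Illustrative sketch only (hypothetical helper, not the paper's code).
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")

    # Assumption: remove the mean so a DC offset does not affect the score.
    mean_est = sum(estimate) / len(estimate)
    mean_ref = sum(reference) / len(reference)
    est = [x - mean_est for x in estimate]
    ref = [x - mean_ref for x in reference]

    ref_energy = sum(r * r for r in ref)
    if ref_energy == 0.0:
        raise ValueError("reference signal has zero energy")

    # Optimal scaling of the reference (orthogonal projection of the estimate).
    alpha = sum(e * r for e, r in zip(est, ref)) / ref_energy
    target = [alpha * r for r in ref]

    # Everything in the estimate that the scaled reference does not explain.
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0.0:
        return math.inf  # perfect reconstruction up to a scale factor
    if target_energy == 0.0:
        return -math.inf  # estimate is uncorrelated with the reference
    return 10.0 * math.log10(target_energy / residual_energy)


# Toy signals: the distortion is orthogonal to the reference by construction.
reference = [1.0, -1.0, 1.0, -1.0]
distortion = [1.0, 1.0, -1.0, -1.0]
estimate = [r + 0.1 * d for r, d in zip(reference, distortion)]
print(round(si_sdr(estimate, reference), 2))
```

Because the reference is rescaled before the comparison, multiplying the estimate by any nonzero gain leaves the score unchanged, so SI-SDR rewards waveform fidelity rather than loudness matching.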

Audio Examples

Audio examples are provided for the noisy and clean signals and for each system: Conv-TasNet, MetricGAN+, SE-MAMBA, SGMSE+, M1, M2, M3, M4 (EDM2), M5, M6, M7, and M8.


Citation

@article{richter2024investigating,
    title={Investigating Training Objectives for Generative Speech Enhancement},
    author={Julius Richter and Danilo de Oliveira and Timo Gerkmann},
    journal={arXiv preprint arXiv:2409.10753},
    year={2024}
}
