

Interspeech Paper Interpretation | Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition

Summary: Interspeech is the world's largest and most comprehensive top-tier conference in the speech field. This article covers the accepted paper by Shengkui Zhao, Chongjia Ni, Rong Tong, and Bin Ma.

In 2019, the 20th Annual Conference of the International Speech Communication Association, INTERSPEECH, will be held in Graz, Austria, from September 15 to 19. Interspeech is the world's largest and most comprehensive top-tier conference in the speech field; nearly 2,000 practitioners from industry and academia will take part in keynote speeches, tutorials, paper presentations, and the main exhibition. Alibaba had eight papers accepted this year. This article covers the paper by Shengkui Zhao, Chongjia Ni, Rong Tong, and Bin Ma, "Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition".

Download the paper

Paper Overview

Automatic speech recognition (ASR) systems have a wide range of real-world applications, but ambient noise and reverberation often make their outputs erroneous and unstable. Improving the robustness of ASR systems is therefore a key problem for their wider deployment. To address it, speech enhancement front-ends and model adaptation training have been studied for a long time. Recently, multi-task joint-learning schemes that train noise reduction and speech recognition simultaneously within a unified modeling framework have shown encouraging progress, but model training still depends heavily on paired clean and noisy data. To overcome this limitation, researchers have begun to introduce generative adversarial networks (GANs) and adversarial training into acoustic model training; by removing the need for complex front-end design and paired training data, this greatly simplifies the training process and its requirements. Although GANs have developed rapidly in computer vision, so far only regular GANs have been adopted for robust ASR, with limited model training experiments, and regular GANs suffer from mode collapse, which often causes training to fail.
In this work, we adopt the more advanced cycle-consistency generative adversarial network (CycleGAN) to address the training failures caused by mode collapse in regular GANs. In addition, combining it with the recently popular deep residual networks (ResNets), we further extend the multi-task learning scheme into a multi-task multi-network joint-learning scheme, achieving stronger noise reduction and model adaptation training.
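To make the joint objective concrete, here is a minimal NumPy sketch of how an ASR classification loss can be combined with a CycleGAN-style cycle-consistency term. The linear maps standing in for the two generators, the weight `lambda_cyc`, and the toy dimensions are illustrative assumptions, not values from the paper; the adversarial and identity terms of the full CycleGAN objective are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two CycleGAN generators: G maps noisy -> clean
# features, F maps clean -> noisy. The real models are deep networks;
# simple linear maps keep the sketch self-contained.
W_g = rng.normal(size=(4, 4))
W_f = rng.normal(size=(4, 4))

def G(x): return x @ W_g
def F(x): return x @ W_f

def cycle_consistency_loss(x_noisy, x_clean):
    """L1 reconstruction error after a round trip through both generators."""
    return (np.abs(F(G(x_noisy)) - x_noisy).mean()
            + np.abs(G(F(x_clean)) - x_clean).mean())

def cross_entropy(logits, labels):
    """Frame-level cross-entropy for the ASR classification branch."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Hypothetical joint objective: ASR loss plus a weighted cycle term.
lambda_cyc = 10.0  # weight chosen for illustration only
x_noisy = rng.normal(size=(8, 4))
x_clean = rng.normal(size=(8, 4))
logits = rng.normal(size=(8, 3))
labels = rng.integers(0, 3, size=8)
total = cross_entropy(logits, labels) + lambda_cyc * cycle_consistency_loss(x_noisy, x_clean)
print(float(total))
```

In an actual training loop, both generator networks and the recognition network would be updated jointly against this combined objective, which is what lets the scheme learn noise reduction without paired clean-noisy utterances.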

[Figure 7-1]

Experiments on single-channel ASR with CHiME-4 show that, compared with the state-of-the-art joint-learning approaches, our proposed method significantly improves the noise robustness of the ASR system by achieving a much lower word error rate (WER).
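For reference, the WER metric used in these comparisons is the word-level edit distance between the recognizer's hypothesis and the reference transcript, divided by the number of reference words. A small self-contained implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# 1 substitution (sat -> sit) + 1 deletion (the) over 6 reference words
print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2/6
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why lower is always better but the metric is not a bounded accuracy score.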

[Figure 7-2]

Building on the cycle-consistency GAN, our proposed multi-task multi-network joint-learning scheme effectively mitigates the mode collapse problem.

[Figure 7-3]

Abstract

Robustness of automatic speech recognition (ASR) systems is a critical issue due to noise and reverberations. Speech enhancement and model adaptation have been studied for a long time to address this issue. Recently, the development of multi-task joint-learning schemes that address noise reduction and ASR criteria in a unified modeling framework shows promising improvements, but the model training highly relies on paired clean-noisy data. To overcome this limit, the generative adversarial networks (GANs) and the adversarial training method are deployed, which have greatly simplified the model training process without the requirements of complex front-end design and paired training data. Despite the fast developments of GANs for computer vision, only regular GANs have been adopted for robust ASR. In this work, we adopt a more advanced cycle-consistency GAN (CycleGAN) to address the training failure problem due to mode collapse of regular GANs. Using deep residual networks (ResNets), we further expand the multi-task scheme to a multi-task multi-network joint-learning scheme for more robust noise reduction and model adaptation. Experiment results on CHiME-4 show that our proposed approach significantly improves the noise robustness of the ASR system by achieving much lower word error rates (WERs) than the state-of-the-art joint-learning approaches.
Index Terms: Robust speech recognition, convolutional neural networks, acoustic model, generative adversarial networks

Compiled by the Alibaba Cloud Developer Community

Copyright notice: All content in this article belongs to the Alibaba Cloud Developer Community. No media outlet, website, or individual may reproduce, link to, repost, or otherwise republish it without authorization from the Alibaba Cloud Developer Community. To request authorization, email developerteam@list.alibaba-inc.com. Authorized media outlets and websites must credit the source as "Alibaba Cloud Developer Community" together with the original author's name when republishing; violators will be held legally responsible. If you find suspected plagiarized content in this community, please report it to developer2020@service.aliyun.com with supporting evidence; once verified, the community will promptly remove the infringing content.
