语音顶会Interspeech 论文解读|Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks

简介: Interspeech是世界上规模最大,最全面的顶级语音领域会议,本文为Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma的入选论文

2019年,国际语音交流协会INTERSPEECH第20届年会将于9月15日至19日在奥地利格拉茨举行。Interspeech是世界上规模最大,最全面的顶级语音领域会议,近2000名一线业界和学界人士将会参与包括主题演讲,Tutorial,论文讲解和主会展览等活动,本次阿里论文有8篇入选,本文为Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma的论文《Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks》

点击下载论文

文章解读

语音转换(Voice Conversion,VC)的主要目标是将源说话者的语音转换为目标说话者的语音,同时具有与原始样本相同的语言内容。语音转换系统有许多应用场景,例如原始语音增强,口语辅助和个性化的语音合成(TTS)系统。目前性能较好的语音转换系统,比如基于高斯混合模型(GMM)的方法和基于神经网络(NN)的方法,一般基于并行训练数据,其应用场景局限于并行数据的收集和同语言间的一对一转换。当收集并行数据困难时比如进行跨语言语音转换或者多对多的语音转换时, 并行训练数据的要求极大地限制了上述方法在实际场景中的可用性。

最近,基于对抗生成网络(GAN)的StarGAN被引入到语音转换的问题中,利用其多对多的域映射性能和无需并行数据的训练性能,仅使用语音特征和域信息作为输入,获得了较成功的多对多不同说话者之间的语音转换实验结果。本文在上述StarGAN-VC方法的基础上,通过添加残差训练机制,提出了一种快速学习训练框架,我们的方法称为Res-StarGAN- VC,其主要思想是基于转换过程中的源语音特征和目标语音特征之间的语言内容共享,通过添加输入到输出的快捷连接方式(shortcut connections)来实现残差映射。

2-1.png

实验显示这种快捷连接方式在不增加参数和计算复杂性的情况下,加速了网络的学习过程,有助于在对抗训练开始时生成高质量的假样本来提高训练质量。实验结果和主观评估显示,在单语言和跨语言的多对多的语音转换任务中,与StarGAN-VC方法相比,我们提出的方法提供了(1)对抗训练中更快的收敛性和(2)更清晰的发音和更好的说话人相似性。
2-2.png

2-3.png

文章摘要

This paper proposes a fast learning framework for non-parallel many-to-many voice conversion with residual Star Generative Adversarial Networks (StarGAN). In addition to the state-ofthe-art StarGAN-VC approach that learns an unreferenced mapping between a group of speakers’ acoustic features for nonparallel many-to-many voice conversion, our method, which we call Res-StarGAN-VC, presents an enhancement by incorporating a residual mapping. The idea is to leverage on the shared linguistic content between source and target features during conversion. The residual mapping is realized by using identity shortcut connections from the input to the output of the generator in Res-StarGAN-VC. Such shortcut connections accelerate the learning process of the network with no increase of parameters and computational complexity. They also help generate high-quality fake samples at the very beginning of the adversarial training. Experiments and subjective evaluations show that the proposed method offers (1) significantly faster convergence in adversarial training and (2) clearer pronunciations and better speaker similarity of converted speech, compared to the StarGAN-VC baseline on both mono-lingual and cross-lingual many-to-many voice conversion tasks.
Index Terms: Voice conversion (VC), non-parallel VC,many-to-many VC, generative adversarial networks (GANs),StarGAN-VC, Res-StarGAN-VC

阿里云开发者社区整理

相关文章
|
6月前
|
机器学习/深度学习 算法 数据处理
Stanford 机器学习练习 Part 3 Neural Networks: Representation
从神经网络开始,感觉自己慢慢跟不上课程的节奏了,一些代码好多参考了别人的代码,而且,让我现在单独写也不一定写的出来了。学习就是一件慢慢积累的过程,两年前我学算法的时候,好多算法都完全看不懂,但后来,看的多了,做的多了,有一天就茅塞顿开。所有的困难都是一时的,只要坚持下去,一切问题都会解决的。没忍住发了点鸡汤文。
19 0
|
9月前
|
机器学习/深度学习 自然语言处理 算法
【论文精读】COLING 2022-KiPT: Knowledge-injected Prompt Tuning for Event Detection
事件检测旨在通过识别和分类事件触发词(最具代表性的单词)来从文本中检测事件。现有的大部分工作严重依赖复杂的下游网络,需要足够的训练数据。
86 0
【论文精读】COLING 2022-KiPT: Knowledge-injected Prompt Tuning for Event Detection
《Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition》电子版地址
Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition
72 0
《Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition》电子版地址
|
机器学习/深度学习 数据挖掘 计算机视觉
CV:翻译并解读2019《A Survey of the Recent Architectures of Deep Convolutional Neural Networks》第四章(三)
CV:翻译并解读2019《A Survey of the Recent Architectures of Deep Convolutional Neural Networks》第四章
|
机器学习/深度学习 数据挖掘 计算机视觉
CV:翻译并解读2019《A Survey of the Recent Architectures of Deep Convolutional Neural Networks》第四章(一)
CV:翻译并解读2019《A Survey of the Recent Architectures of Deep Convolutional Neural Networks》第四章
CV:翻译并解读2019《A Survey of the Recent Architectures of Deep Convolutional Neural Networks》第四章(一)
|
机器学习/深度学习 数据挖掘 Java
CV:翻译并解读2019《A Survey of the Recent Architectures of Deep Convolutional Neural Networks》第四章(二)
CV:翻译并解读2019《A Survey of the Recent Architectures of Deep Convolutional Neural Networks》第四章
|
语音技术 机器学习/深度学习 计算机视觉
语音顶会Interspeech 论文解读|Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robus
Interspeech是世界上规模最大,最全面的顶级语音领域会议,本文为Shengkui Zhao, Chongjia Ni, Rong Tong, Bin Ma的入选论文
语音顶会Interspeech 论文解读|Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robus
|
语音技术 数据挖掘 智能硬件
语音顶会Interspeech 论文解读|Autoencoder-based Semi-Supervised Curriculum Learning For Out-of-domain Speaker Verification
Interspeech是世界上规模最大,最全面的顶级语音领域会议,本文为Siqi Zheng, Gang Liu, Hongbin Suo, Yun Lei的入选论文
语音顶会Interspeech 论文解读|Autoencoder-based Semi-Supervised Curriculum Learning For Out-of-domain Speaker Verification
|
数据挖掘 开发者 数据库
语音顶会Interspeech 论文解读|Investigation of Transformer based Spelling Correction Model for CTC-based End-to-End Mandarin Speech Recognition
Interspeech是世界上规模最大,最全面的顶级语音领域会议,本文为Shiliang Zhang, Ming Lei, Zhijie Yan的入选论文
语音顶会Interspeech 论文解读|Investigation of Transformer based Spelling Correction Model for CTC-based End-to-End Mandarin Speech Recognition
|
机器学习/深度学习 算法 文件存储
论文笔记系列-Neural Architecture Search With Reinforcement Learning
摘要 神经网络在多个领域都取得了不错的成绩,但是神经网络的合理设计却是比较困难的。在本篇论文中,作者使用 递归网络去省城神经网络的模型描述,并且使用 增强学习训练RNN,以使得生成得到的模型在验证集上取得最大的准确率。
3256 0