[PaddleSpeech Genshin Impact] Voice Cloning: Hu Tao - Alibaba Cloud Developer Community

2023-02-14

1. Set up the PaddleSpeech development environment

Install PaddleSpeech, set up the tools under PaddleSpeech/examples/other/tts_finetune/tts3, and download the pretrained models.

# Set up the PaddleSpeech development environment
!git clone https://gitee.com/paddlepaddle/PaddleSpeech.git
%cd PaddleSpeech
!pip install . -i https://mirror.baidu.com/pypi/simple
# Download the NLTK data
%cd /home/aistudio
!wget -P data https://paddlespeech.bj.bcebos.com/Parakeet/tools/nltk_data.tar.gz
!tar zxvf data/nltk_data.tar.gz

# Delete broken symlinks
# AI Studio reports an error because the paddlespeech repo contains broken symlinks
# Be sure to run the command below
!find -L /home/aistudio -type l -delete

# Set up MFA and download the pretrained models
%cd /home/aistudio
!bash env.sh
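
Before moving on, it can help to confirm that the installation actually landed in the notebook's Python environment. The following is a minimal sketch, not part of the original project: the helper name `is_importable` is hypothetical, and the check only tells you whether the packages can be found, not whether they work.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# Report whether the packages this tutorial relies on are visible to Python.
for name in ("paddlespeech", "nltk"):
    status = "found" if is_importable(name) else "MISSING"
    print(f"{name}: {status}")
```

If a package shows up as MISSING, rerunning the pip install step above (or restarting the kernel) is usually the first thing to try.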

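2 Dataset configuration

The dataset for this project provides the complete wav files, labels, and MFA alignment annotation files.

To run the alignment yourself, see the full PaddleSpeech documentation:

Finetune your own AM based on FastSpeech2 with multi-speakers dataset.

The extracted archive contains

the audio

work/dataset/胡桃/wav/xx.wav

the labels

work/dataset/胡桃/wav/labels.txt

and the aligned TextGrid files

work/dataset/胡桃/textgrid/newdir/xx.TextGrid

This project uses the voice of Hu Tao (胡桃).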

2.1 Extract the dataset

!unzip /home/aistudio/data/data171682/yuanshen_zip.zip -d work/
!unzip /home/aistudio/work/yuanshen_zip/胡桃.zip -d work/dataset/
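
After extraction, a quick consistency check can catch missing alignment files early. The sketch below assumes, based on the layout described above, that every `xx.wav` should have an `xx.TextGrid` with the same base name; the function name `find_unaligned` is hypothetical and not part of PaddleSpeech.

```python
from pathlib import Path


def find_unaligned(wav_dir, textgrid_dir):
    """Return sorted utterance ids that have a .wav but no matching .TextGrid."""
    wav_ids = {p.stem for p in Path(wav_dir).glob("*.wav")}
    tg_ids = {p.stem for p in Path(textgrid_dir).glob("*.TextGrid")}
    return sorted(wav_ids - tg_ids)


# Paths follow the dataset layout described in this section.
wav_dir = "work/dataset/胡桃/wav"
textgrid_dir = "work/dataset/胡桃/textgrid/newdir"
if Path(wav_dir).is_dir():
    missing = find_unaligned(wav_dir, textgrid_dir)
    print(f"{len(missing)} wav file(s) without a TextGrid:", missing[:10])
```

An empty result means every audio file has an alignment; anything listed would need to be aligned with MFA or removed before training.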

2.2 Write a helper function to run shell commands

import subprocess
# Run a shell command, optionally from inside a given working directory
def run_cmd(cmd, cwd_path):
    p = subprocess.Popen(cmd, shell=True, cwd=cwd_path)
    res = p.wait()
    print(cmd)
    print("Return code:", res)
    if res == 0:
        # The command succeeded
        print("Succeeded")
        return True
    else:
        # The command failed
        print("Failed")
        return False
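
The helper above reports failure through its return value, which is easy to overlook when cells are run one after another. As an alternative, here is a minimal sketch of a stricter variant that raises an exception instead; `run_cmd_checked` is a hypothetical name, and this is not what the original project uses.

```python
import subprocess
import sys


def run_cmd_checked(cmd, cwd_path=None):
    """Run `cmd` through the shell and raise if it exits with a non-zero code."""
    completed = subprocess.run(cmd, shell=True, cwd=cwd_path)
    if completed.returncode != 0:
        raise RuntimeError(
            f"command failed with exit code {completed.returncode}: {cmd}"
        )
    return True


# Usage: a trivial command that should succeed wherever Python is available.
run_cmd_checked(f'"{sys.executable}" -c "print(1)"')
```

Raising stops a "Run All" execution at the first broken step, so later steps do not run against incomplete outputs.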

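2.3 Configure parameters

import os
# Experiment directory
exp_dir = "/home/aistudio/work/exp"
# Working directory for the fine-tuning scripts
cwd_path = "/home/aistudio/PaddleSpeech/examples/other/tts_finetune/tts3"
# See env.sh for details on the model download
pretrained_model_dir = "models/fastspeech2_mix_ckpt_1.2.0"
# The dataset includes the wav files with their label text, plus locally generated TextGrid alignment files
# Input dataset path
data_dir = "/home/aistudio/work/dataset/胡桃/wav"
# If MFA alignment results were uploaded, use the already aligned files
mfa_dir = "/home/aistudio/work/dataset/胡桃/textgrid"
# Output paths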
wav_output_dir = os.path.join(exp_dir, "output")
os.makedirs(wav_output_dir, exist_ok=True)
dump_dir = os.path.join(exp_dir, 'dump')
output_dir = os.path.join(exp_dir, 'exp')
lang = "zh"
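
Most failures in the later steps come from a path that does not exist yet. The following sketch, with the hypothetical helper `missing_paths`, shows one way to check the configured locations up front. Note that `pretrained_model_dir` is relative to `cwd_path`, so it has to be joined before checking.

```python
import os


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


# Hypothetical usage with the variables defined in this section:
# problems = missing_paths([
#     data_dir,
#     mfa_dir,
#     os.path.join(cwd_path, pretrained_model_dir),
# ])
# print("Missing paths:", problems)
```

An empty list means all inputs are in place.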

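2.4 Check that the dataset is valid

# Check for out-of-vocabulary (OOV) words
# new_dir was not defined in the original listing; this value is taken from
# the logged command shown below
new_dir = "work/dataset/胡桃/textgrid/newdir"
cmd = f"""
    python3 local/check_oov.py \
        --input_dir={data_dir} \
        --pretrained_model_dir={pretrained_model_dir} \
        --newdir_name={new_dir} \
        --lang={lang}
"""

# Run this step
run_cmd(cmd, cwd_path)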

    python3 local/check_oov.py         --input_dir=/home/aistudio/work/dataset/胡桃/wav         --pretrained_model_dir=models/fastspeech2_mix_ckpt_1.2.0         --newdir_name=work/dataset/胡桃/textgrid/newdir         --lang=zh
Return code: 0
Succeeded
True

2.5 Generate duration information

cmd = f"""
python3 local/generate_duration.py \
    --mfa_dir={mfa_dir}
"""

# Run this step
run_cmd(cmd, cwd_path)

python3 local/generate_duration.py     --mfa_dir=/home/aistudio/work/dataset/胡桃/textgrid
Return code: 0
Succeeded
True

2.6. Data preprocessing

cmd = f"""
python3 local/extract_feature.py \
    --duration_file="./durations.txt" \
    --input_dir={data_dir} \
    --dump_dir={dump_dir} \
    --pretrained_model_dir={pretrained_model_dir}
"""

# Run this step
run_cmd(cmd, cwd_path)

2.7. Prepare the fine-tuning environment

cmd = f"""
python3 local/prepare_env.py \
    --pretrained_model_dir={pretrained_model_dir} \
    --output_dir={output_dir}
"""

# Run this step
run_cmd(cmd, cwd_path)

python3 local/prepare_env.py     --pretrained_model_dir=models/fastspeech2_mix_ckpt_1.2.0     --output_dir=/home/aistudio/work/exp/exp
Return code: 0
Succeeded
True

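2.8. Fine-tune and train

No single set of training parameters suits every dataset, so at this step you can tune the parameters to fit your own training run. The key parameter:

Number of training epochs: epoch

epoch sets the number of training epochs. You can use the VisualDL service in AI Studio to check whether training has converged; as the dataset grows, the default of 100 epochs may not be enough to reach convergence.
When training for many epochs (epoch > 200), open a new terminal, change to /home/aistudio/PaddleSpeech/examples/other/tts_finetune/tts3, and run the cmd command there, because AI Studio raises errors when a very large amount of training output is printed.

Configuration file: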

/home/aistudio/PaddleSpeech/examples/other/tts_finetune/tts3/conf/finetune.yaml

# Copy the default yaml into exp_dir so it is easy to modify
import shutil
in_label = "/home/aistudio/PaddleSpeech/examples/other/tts_finetune/tts3/conf/finetune.yaml"
shutil.copy(in_label, exp_dir)

'/home/aistudio/work/exp/finetune.yaml'

epoch = 250
config_path = os.path.join(exp_dir, "finetune.yaml")
cmd = f"""
python3 local/finetune.py \
    --pretrained_model_dir={pretrained_model_dir} \
    --dump_dir={dump_dir} \
    --output_dir={output_dir} \
    --ngpu=1 \
    --epoch={epoch} \
    --finetune_config={config_path}
"""

# Run this step
# If training for many epochs, copy the cmd above into a terminal and run it there
run_cmd(cmd, cwd_path)
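
Since the problem with long runs is the volume of output printed into the notebook, another option is to send that output to a file instead. This is a minimal sketch under that assumption, not something the original project does; `run_cmd_logged` and the log file name are hypothetical.

```python
import subprocess


def run_cmd_logged(cmd, cwd_path, log_path):
    """Run `cmd` in `cwd_path`, writing stdout and stderr to `log_path`.

    Returns the process exit code instead of printing the output inline.
    """
    with open(log_path, "w", encoding="utf8") as log_file:
        completed = subprocess.run(
            cmd,
            shell=True,
            cwd=cwd_path,
            stdout=log_file,
            stderr=subprocess.STDOUT,
        )
    return completed.returncode


# Hypothetical usage with the training command defined above:
# code = run_cmd_logged(cmd, cwd_path, os.path.join(exp_dir, "finetune.log"))
# print("Return code:", code)
```

The log file can then be inspected from a terminal while training is still in progress.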

3 Generate audio

Enter the text you want to synthesize to generate the corresponding audio files.

3.1 Text input

text_dict = {
    "0": "大家好，我是 胡桃，今天 天气 很不错啊，大家一起来原神找我玩呀！",
    "1": "hehe，太阳 出 来 我 晒 太阳 ，月亮 出 来 我 晒 月亮 咯。",
    "2": "我是it er hui , 一名 P P D E ，欢迎 大家 来飞桨 社区 找我，谢谢大家 fork 这个项目"
}

# Write sentence.txt
text_file = os.path.join(exp_dir, "sentence.txt")
with open(text_file, "w", encoding="utf8") as f:
    for k,v in sorted(text_dict.items(), key=lambda x:x[0]):
        f.write(f"{k} {v}\n")
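
Each line written above has the form `id text`, with the id separated from the sentence by the first space. To double-check the file before synthesis, the lines can be read back with a small parser. The sketch below is only an illustration of that format; `parse_sentence_lines` is a hypothetical helper, not a PaddleSpeech function.

```python
def parse_sentence_lines(lines):
    """Split each non-empty 'utt_id text' line at the first space."""
    parsed = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        utt_id, _, text = line.partition(" ")
        parsed[utt_id] = text
    return parsed


# Example with two lines in the same format as sentence.txt.
sample = ["0 大家好\n", "1 hello world\n"]
print(parse_sentence_lines(sample))
```

Splitting only at the first space matters because the sentences themselves contain spaces.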

3.2 Locate the trained model

# Find the most recently saved checkpoint
def find_max_ckpt(model_path):
    max_ckpt = 0
    for filename in os.listdir(model_path):
        if filename.endswith('.pdz'):
            files = filename[:-4]
            a1, a2, it = files.split("_")
            if int(it) > max_ckpt:
                max_ckpt = int(it)
    return max_ckpt
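
The function above expects every `.pdz` file name to split into exactly three underscore-separated parts, so an unexpectedly named file in the directory would raise an error. A more defensive variant is sketched below. It assumes checkpoints are named `snapshot_iter_<N>.pdz`, matching the path passed to `--am_ckpt` in the synthesis command; `find_latest_iter` is a hypothetical name.

```python
import os
import re

# Assumed naming pattern: snapshot_iter_<N>.pdz
CKPT_PATTERN = re.compile(r"^snapshot_iter_(\d+)\.pdz$")


def find_latest_iter(model_path):
    """Return the largest iteration number among snapshot files, or None if none exist."""
    iters = []
    for filename in os.listdir(model_path):
        match = CKPT_PATTERN.match(filename)
        if match:
            iters.append(int(match.group(1)))
    return max(iters) if iters else None
```

Returning None when nothing matches makes a missing checkpoint explicit, rather than silently building a path to `snapshot_iter_0.pdz`.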

3.3 Generate speech

# Configure the parameters
model_path = os.path.join(output_dir, "checkpoints")
ckpt = find_max_ckpt(model_path)
cmd = f"""
python3 /home/aistudio/PaddleSpeech/paddlespeech/t2s/exps/fastspeech2/../synthesize_e2e.py \
                --am=fastspeech2_mix \
                --am_config=models/fastspeech2_mix_ckpt_1.2.0/default.yaml \
                --am_ckpt={output_dir}/checkpoints/snapshot_iter_{ckpt}.pdz \
                --am_stat=models/fastspeech2_mix_ckpt_1.2.0/speech_stats.npy \
                --voc="hifigan_aishell3" \
                --voc_config=models/hifigan_aishell3_ckpt_0.2.0/default.yaml \
                --voc_ckpt=models/hifigan_aishell3_ckpt_0.2.0/snapshot_iter_2500000.pdz \
                --voc_stat=models/hifigan_aishell3_ckpt_0.2.0/feats_stats.npy \
                --lang=mix \
                --text={text_file} \
                --output_dir={wav_output_dir} \
                --phones_dict={dump_dir}/phone_id_map.txt \
                --speaker_dict={dump_dir}/speaker_id_map.txt \
                --spk_id=0 \
                --ngpu=1
"""

run_cmd(cmd, cwd_path)

3.4 Play the audio

import IPython.display as ipd
ipd.Audio(os.path.join(wav_output_dir, "0.wav"))
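
Only the first sentence is played above. To get an overview of everything that was generated, the durations of all output files can be listed first. This sketch uses Python's standard `wave` module and assumes the outputs are standard PCM wav files; `wav_duration_seconds` is a hypothetical helper.

```python
import os
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM .wav file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


# Hypothetical usage with wav_output_dir from section 2.3:
# for name in sorted(os.listdir(wav_output_dir)):
#     if name.endswith(".wav"):
#         path = os.path.join(wav_output_dir, name)
#         print(name, f"{wav_duration_seconds(path):.2f} s")
```

A file with a duration near zero usually indicates that synthesis failed for that sentence.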
