Folk Goddess Sings Pop: Training Your Own Timbre Model with the AI so-vits Library (Ye Bei / Python 3.10)

Pop diva Stefanie Sun's (孙燕姿) timbre is certainly wonderful, but by now the whole internet is flooded with clones of her voice, and after a while it wears thin. A search around the web turned up no timbre model for any folk singer. People are like that: it is always the unattainable that stirs us. So this time we will build our own training set and train our own timbre model, letting a folk goddess sing pop songs — and what a kick that will be.

Building the Training Set

A training set is the collection of data used to train a neural network model. It usually consists of a large number of inputs and their corresponding outputs; the network learns the mapping between them, adjusting its parameters during training to minimize the error.

Put simply, if we want to train a timbre model for the folk singer Ye Bei (叶蓓), we need to feed her songs in as the input — that is, the training set. The training set provides the material the model learns from, so that it can learn the correct outputs from the input data. By iterating over the training set again and again, the network keeps optimizing itself and improves its ability to predict from the inputs.

Indeed, the so-vits library is built on a neural network architecture, and training a timbre model is, at heart, a prediction problem. For background on neural network architectures, see the earlier article "Dissecting the underlying principles of AI machine learning: artificial neurons, explained in plain language"; we won't repeat it here.

When picking training samples, prefer songs that showcase the singer's distinctive vocal "signature". Why is the whole internet full of Stefanie Sun? Simply because her timbre is so recognizable that a model can more easily learn the correct output from the input data.

Also, for training data, quality beats quantity. Clean samples with strong characteristic features train better than low-quality ones. Covers sung by the artist, or songs using unconventional vocal techniques, do carry some of the singer's timbre, but for model training they are actually counterproductive — something to watch out for.

Here we pick six songs from Ye Bei's early album 《幸福深处》:

Generally speaking, the more training data, the better the model performs — but in practice you have to weigh the trade-offs against your actual situation.

In deep learning, large amounts of data are usually needed to train a high-performing model. In computer vision, for instance, training a convolutional neural network takes a great deal of image data. In other tasks, however, such as speech recognition and natural language processing, a relatively small dataset can still yield a strong model.

Generally, the training set should contain enough diverse samples to cover all the input cases you expect. For classification it also needs sufficient positive and negative samples to guarantee the model's classification performance.

Beyond quantity, quality matters just as much. The training set should be free of bias and noise, and preprocessing steps such as data cleaning and data augmentation help improve its quality and diversity.

In short, how much training data you need depends on the specific problem — its complexity, the data's diversity, the model's capacity, and the training algorithm's efficiency all factor in. In practice, you experiment and validate to find the training-set size that fits.

All things considered — given this author's hardware and the cost in training time — the training set here is on the small side; feel free to scale it up or down to suit your own situation.

Cleaning the Training Data

With the training set assembled, we need to "clean" the data: strip out the accompaniment, pauses, and mixing from each song, leaving only the a-cappella vocal.

For separating vocals from accompaniment, the spleeter library is recommended:

pip3 install spleeter --user

Then run the separation over the training-set songs:

spleeter separate -o d:/output/ -p spleeter:2stems d:/数据.mp3

Here -o is the output directory, -p selects the separation model, and the final argument is the source file to separate.
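
If you have several training-set songs, running the command once per file gets tedious. A small driver script can loop over them — a sketch, assuming spleeter is installed on PATH and the songs live in a hypothetical songs/ folder:

```python
import subprocess
from pathlib import Path

def spleeter_cmd(src, out_dir="output"):
    """Build the spleeter CLI invocation for a single source file."""
    return ["spleeter", "separate", "-o", out_dir, "-p", "spleeter:2stems", str(src)]

if __name__ == "__main__":
    # Hypothetical input folder; adjust to your own layout.
    for mp3 in sorted(Path("songs").glob("*.mp3")):
        subprocess.run(spleeter_cmd(mp3), check=True)
```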

The first run is slow because spleeter downloads its pretrained model, about 1.73 GB. When it finishes, the separated audio tracks appear in the output directory:

D:\歌曲制作\清唱 的目录  
  
2023/05/11  15:38    <DIR>          .  
2023/05/11  13:45    <DIR>          ..  
2023/05/11  13:40        39,651,884 1_1_01. wxs.wav  
2023/05/11  15:34        46,103,084 1_1_02. qad_(Vocals)_(Vocals).wav  
2023/05/11  15:35        43,802,924 1_1_03. hs_(Vocals)_(Vocals).wav  
2023/05/11  15:36        39,054,764 1_1_04. hope_(Vocals)_(Vocals).wav  
2023/05/11  15:36        32,849,324 1_1_05. kamen_(Vocals)_(Vocals).wav  
2023/05/11  15:37        50,741,804 1_1_06. ctrl_(Vocals)_(Vocals).wav  
               6 个文件    252,203,784 字节  
               2 个目录 449,446,780,928 可用字节

For more on spleeter, see the earlier article "Free vocal and background-music separation with the AI library Spleeter in practice (Python 3.10)"; we won't repeat it here.

The separated samples still need a second pass, because the extracted vocals retain some faint background sound and mixing artifacts. The noisereduce library is recommended here:

pip3 install noisereduce soundfile

Then run the noise reduction:

import noisereduce as nr
import soundfile as sf

# Read the audio file (mono gives a 1-D array; for stereo, pass data.T
# to reduce_noise so channels come first)
data, rate = sf.read("audio_file.wav")

# Take a segment that contains only noise as the noise profile
noisy_part = data[10000:15000]

# Apply spectral-gating noise reduction, using the segment as the noise clip
reduced_noise = nr.reduce_noise(y=data, sr=rate, y_noise=noisy_part)

# Write the result to a new file
sf.write("audio_file_denoised.wav", reduced_noise, rate)

First the song file is read with soundfile, then a noise-only sample is taken and the noise-reduction algorithm is applied against it, and finally the result is written to a new file.

With that, the data cleaning is essentially done.

Slicing the Training Data

During deep learning, the computer loads the training data into GPU memory. If the training set is too large, it overflows the available memory — the notorious "out of memory" (OOM) error, colloquially known as "blowing up VRAM".

The fix is to split the dataset into multiple parts and load only one part at a time for training. This reduces memory usage, and also enables parallel processing, improving training efficiency.
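
The load-one-part-at-a-time idea can be sketched with a plain Python generator (illustrative only — not so-vits-svc's actual data loader):

```python
def batches(items, batch_size):
    """Yield successive fixed-size chunks, so only one chunk is live at a time."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# 140 clips with a batch size of 6 -> 24 chunks; the last holds the 2 leftovers.
chunks = list(batches(list(range(140)), 6))
```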

The github.com/openvpi/audio-slicer library works well for this:

git clone https://github.com/openvpi/audio-slicer.git

Then write the slicing code:

import librosa  # Optional. Use any library you like to read audio files.  
import soundfile  # Optional. Use any library you like to write audio files.  
  
from slicer2 import Slicer  
  
audio, sr = librosa.load('example.wav', sr=None, mono=False)  # Load an audio file with librosa.  
slicer = Slicer(  
    sr=sr,  
    threshold=-40,  
    min_length=5000,  
    min_interval=300,  
    hop_size=10,  
    max_sil_kept=500  
)  
chunks = slicer.slice(audio)  
for i, chunk in enumerate(chunks):  
    if len(chunk.shape) > 1:  
        chunk = chunk.T  # Swap axes if the audio is stereo.  
    soundfile.write(f'clips/example_{i}.wav', chunk, sr)  # Save sliced audio files with soundfile.

This script slices all the denoised a-cappella samples into small clips that are easier to train on. If your machine is on the weaker side, consider lowering min_interval and max_sil_kept so the clips come out even smaller — "chopped fine into mincemeat", as the saying goes.
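
As an aside, the threshold parameter above is in dB relative to full scale; a one-liner shows the linear amplitude ratio it corresponds to — that is, how quiet a frame must be for the slicer to treat it as silence:

```python
def db_to_amplitude(db):
    # Convert dB relative to full scale into a linear amplitude ratio
    # (0 dB == full scale).
    return 10 ** (db / 20)

ratio = db_to_amplitude(-40)  # the threshold used above -> 1% of full scale
```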

In the end, the six songs were sliced into 140 small samples:

D:\歌曲制作\slicer 的目录  
  
2023/05/11  15:45    <DIR>          .  
2023/05/11  13:45    <DIR>          ..  
2023/05/11  15:45           873,224 1_1_01. wxs_0.wav  
2023/05/11  15:45           934,964 1_1_01. wxs_1.wav  
2023/05/11  15:45         1,039,040 1_1_01. wxs_10.wav  
2023/05/11  15:45         1,391,840 1_1_01. wxs_11.wav  
2023/05/11  15:45         2,272,076 1_1_01. wxs_12.wav  
2023/05/11  15:45         2,637,224 1_1_01. wxs_13.wav  
2023/05/11  15:45         1,476,512 1_1_01. wxs_14.wav  
2023/05/11  15:45         1,044,332 1_1_01. wxs_15.wav  
2023/05/11  15:45         1,809,908 1_1_01. wxs_16.wav  
2023/05/11  15:45           887,336 1_1_01. wxs_17.wav  
2023/05/11  15:45           952,604 1_1_01. wxs_18.wav  
2023/05/11  15:45           989,648 1_1_01. wxs_19.wav  
2023/05/11  15:45           957,896 1_1_01. wxs_2.wav  
2023/05/11  15:45           231,128 1_1_01. wxs_20.wav  
2023/05/11  15:45         1,337,156 1_1_01. wxs_3.wav  
2023/05/11  15:45         1,308,932 1_1_01. wxs_4.wav  
2023/05/11  15:45         1,035,512 1_1_01. wxs_5.wav  
2023/05/11  15:45         2,388,500 1_1_01. wxs_6.wav  
2023/05/11  15:45         2,952,980 1_1_01. wxs_7.wav  
2023/05/11  15:45           929,672 1_1_01. wxs_8.wav  
2023/05/11  15:45           878,516 1_1_01. wxs_9.wav  
2023/05/11  15:45           963,188 1_1_02. qad_(Vocals)_(Vocals)_0.wav  
2023/05/11  15:45           901,448 1_1_02. qad_(Vocals)_(Vocals)_1.wav  
2023/05/11  15:45         1,411,244 1_1_02. qad_(Vocals)_(Vocals)_10.wav  
2023/05/11  15:45         2,070,980 1_1_02. qad_(Vocals)_(Vocals)_11.wav  
2023/05/11  15:45         2,898,296 1_1_02. qad_(Vocals)_(Vocals)_12.wav  
2023/05/11  15:45           885,572 1_1_02. qad_(Vocals)_(Vocals)_13.wav  
2023/05/11  15:45           841,472 1_1_02. qad_(Vocals)_(Vocals)_14.wav  
2023/05/11  15:45           876,752 1_1_02. qad_(Vocals)_(Vocals)_15.wav  
2023/05/11  15:45         1,091,960 1_1_02. qad_(Vocals)_(Vocals)_16.wav  
2023/05/11  15:45         1,188,980 1_1_02. qad_(Vocals)_(Vocals)_17.wav  
2023/05/11  15:45         1,446,524 1_1_02. qad_(Vocals)_(Vocals)_18.wav  
2023/05/11  15:45           924,380 1_1_02. qad_(Vocals)_(Vocals)_19.wav  
2023/05/11  15:45           255,824 1_1_02. qad_(Vocals)_(Vocals)_2.wav  
2023/05/11  15:45         1,718,180 1_1_02. qad_(Vocals)_(Vocals)_20.wav  
2023/05/11  15:45         2,070,980 1_1_02. qad_(Vocals)_(Vocals)_21.wav  
2023/05/11  15:45         2,827,736 1_1_02. qad_(Vocals)_(Vocals)_22.wav  
2023/05/11  15:45           862,640 1_1_02. qad_(Vocals)_(Vocals)_23.wav  
2023/05/11  15:45         1,628,216 1_1_02. qad_(Vocals)_(Vocals)_24.wav  
2023/05/11  15:45         1,626,452 1_1_02. qad_(Vocals)_(Vocals)_25.wav  
2023/05/11  15:45         1,499,444 1_1_02. qad_(Vocals)_(Vocals)_26.wav  
2023/05/11  15:45         1,303,640 1_1_02. qad_(Vocals)_(Vocals)_27.wav  
2023/05/11  15:45           998,468 1_1_02. qad_(Vocals)_(Vocals)_28.wav  
2023/05/11  15:45           781,496 1_1_02. qad_(Vocals)_(Vocals)_3.wav  
2023/05/11  15:45         1,368,908 1_1_02. qad_(Vocals)_(Vocals)_4.wav  
2023/05/11  15:45           892,628 1_1_02. qad_(Vocals)_(Vocals)_5.wav  
2023/05/11  15:45         1,386,548 1_1_02. qad_(Vocals)_(Vocals)_6.wav  
2023/05/11  15:45           883,808 1_1_02. qad_(Vocals)_(Vocals)_7.wav  
2023/05/11  15:45           952,604 1_1_02. qad_(Vocals)_(Vocals)_8.wav  
2023/05/11  15:45         1,303,640 1_1_02. qad_(Vocals)_(Vocals)_9.wav  
2023/05/11  15:45         1,354,796 1_1_03. hs_(Vocals)_(Vocals)_0.wav  
2023/05/11  15:45         1,344,212 1_1_03. hs_(Vocals)_(Vocals)_1.wav  
2023/05/11  15:45         1,305,404 1_1_03. hs_(Vocals)_(Vocals)_10.wav  
2023/05/11  15:45         1,291,292 1_1_03. hs_(Vocals)_(Vocals)_11.wav  
2023/05/11  15:45         1,338,920 1_1_03. hs_(Vocals)_(Vocals)_12.wav  
2023/05/11  15:45         1,093,724 1_1_03. hs_(Vocals)_(Vocals)_13.wav  
2023/05/11  15:45         1,375,964 1_1_03. hs_(Vocals)_(Vocals)_14.wav  
2023/05/11  15:45         1,409,480 1_1_03. hs_(Vocals)_(Vocals)_15.wav  
2023/05/11  15:45         1,481,804 1_1_03. hs_(Vocals)_(Vocals)_16.wav  
2023/05/11  15:45         2,247,380 1_1_03. hs_(Vocals)_(Vocals)_17.wav  
2023/05/11  15:45         1,312,460 1_1_03. hs_(Vocals)_(Vocals)_18.wav  
2023/05/11  15:45         1,428,884 1_1_03. hs_(Vocals)_(Vocals)_19.wav  
2023/05/11  15:45         1,051,388 1_1_03. hs_(Vocals)_(Vocals)_2.wav  
2023/05/11  15:45         1,377,728 1_1_03. hs_(Vocals)_(Vocals)_20.wav  
2023/05/11  15:45         1,485,332 1_1_03. hs_(Vocals)_(Vocals)_21.wav  
2023/05/11  15:45           897,920 1_1_03. hs_(Vocals)_(Vocals)_22.wav  
2023/05/11  15:45         1,591,172 1_1_03. hs_(Vocals)_(Vocals)_23.wav  
2023/05/11  15:45           920,852 1_1_03. hs_(Vocals)_(Vocals)_24.wav  
2023/05/11  15:45         1,046,096 1_1_03. hs_(Vocals)_(Vocals)_25.wav  
2023/05/11  15:45           730,340 1_1_03. hs_(Vocals)_(Vocals)_26.wav  
2023/05/11  15:45         1,383,020 1_1_03. hs_(Vocals)_(Vocals)_3.wav  
2023/05/11  15:45         1,188,980 1_1_03. hs_(Vocals)_(Vocals)_4.wav  
2023/05/11  15:45         1,003,760 1_1_03. hs_(Vocals)_(Vocals)_5.wav  
2023/05/11  15:45         1,243,664 1_1_03. hs_(Vocals)_(Vocals)_6.wav  
2023/05/11  15:45           845,000 1_1_03. hs_(Vocals)_(Vocals)_7.wav  
2023/05/11  15:45           892,628 1_1_03. hs_(Vocals)_(Vocals)_8.wav  
2023/05/11  15:45           539,828 1_1_03. hs_(Vocals)_(Vocals)_9.wav  
2023/05/11  15:45           725,048 1_1_04. hope_(Vocals)_(Vocals)_0.wav  
2023/05/11  15:45         1,023,164 1_1_04. hope_(Vocals)_(Vocals)_1.wav  
2023/05/11  15:45           202,904 1_1_04. hope_(Vocals)_(Vocals)_10.wav  
2023/05/11  15:45           659,780 1_1_04. hope_(Vocals)_(Vocals)_11.wav  
2023/05/11  15:45         1,017,872 1_1_04. hope_(Vocals)_(Vocals)_12.wav  
2023/05/11  15:45         1,495,916 1_1_04. hope_(Vocals)_(Vocals)_13.wav  
2023/05/11  15:45         1,665,260 1_1_04. hope_(Vocals)_(Vocals)_14.wav  
2023/05/11  15:45           675,656 1_1_04. hope_(Vocals)_(Vocals)_15.wav  
2023/05/11  15:45         1,187,216 1_1_04. hope_(Vocals)_(Vocals)_16.wav  
2023/05/11  15:45         1,201,328 1_1_04. hope_(Vocals)_(Vocals)_17.wav  
2023/05/11  15:45         1,368,908 1_1_04. hope_(Vocals)_(Vocals)_18.wav  
2023/05/11  15:45         1,462,400 1_1_04. hope_(Vocals)_(Vocals)_19.wav  
2023/05/11  15:45           963,188 1_1_04. hope_(Vocals)_(Vocals)_2.wav  
2023/05/11  15:45         1,121,948 1_1_04. hope_(Vocals)_(Vocals)_20.wav  
2023/05/11  15:45           165,860 1_1_04. hope_(Vocals)_(Vocals)_21.wav  
2023/05/11  15:45         1,116,656 1_1_04. hope_(Vocals)_(Vocals)_3.wav  
2023/05/11  15:45           622,736 1_1_04. hope_(Vocals)_(Vocals)_4.wav  
2023/05/11  15:45         1,349,504 1_1_04. hope_(Vocals)_(Vocals)_5.wav  
2023/05/11  15:45           984,356 1_1_04. hope_(Vocals)_(Vocals)_6.wav  
2023/05/11  15:45         2,104,496 1_1_04. hope_(Vocals)_(Vocals)_7.wav  
2023/05/11  15:45         1,762,280 1_1_04. hope_(Vocals)_(Vocals)_8.wav  
2023/05/11  15:45         1,116,656 1_1_04. hope_(Vocals)_(Vocals)_9.wav  
2023/05/11  15:45         1,114,892 1_1_05. kamen_(Vocals)_(Vocals)_0.wav  
2023/05/11  15:45           874,988 1_1_05. kamen_(Vocals)_(Vocals)_1.wav  
2023/05/11  15:45         1,400,660 1_1_05. kamen_(Vocals)_(Vocals)_10.wav  
2023/05/11  15:45           943,784 1_1_05. kamen_(Vocals)_(Vocals)_11.wav  
2023/05/11  15:45         1,351,268 1_1_05. kamen_(Vocals)_(Vocals)_12.wav  
2023/05/11  15:45         1,476,512 1_1_05. kamen_(Vocals)_(Vocals)_13.wav  
2023/05/11  15:45           933,200 1_1_05. kamen_(Vocals)_(Vocals)_14.wav  
2023/05/11  15:45         1,388,312 1_1_05. kamen_(Vocals)_(Vocals)_15.wav  
2023/05/11  15:45         1,012,580 1_1_05. kamen_(Vocals)_(Vocals)_16.wav  
2023/05/11  15:45         1,365,380 1_1_05. kamen_(Vocals)_(Vocals)_17.wav  
2023/05/11  15:45         1,614,104 1_1_05. kamen_(Vocals)_(Vocals)_18.wav  
2023/05/11  15:45         1,582,352 1_1_05. kamen_(Vocals)_(Vocals)_19.wav  
2023/05/11  15:45           949,076 1_1_05. kamen_(Vocals)_(Vocals)_2.wav  
2023/05/11  15:45         1,402,424 1_1_05. kamen_(Vocals)_(Vocals)_20.wav  
2023/05/11  15:45         1,268,360 1_1_05. kamen_(Vocals)_(Vocals)_21.wav  
2023/05/11  15:45         1,016,108 1_1_05. kamen_(Vocals)_(Vocals)_22.wav  
2023/05/11  15:45         1,065,500 1_1_05. kamen_(Vocals)_(Vocals)_3.wav  
2023/05/11  15:45           874,988 1_1_05. kamen_(Vocals)_(Vocals)_4.wav  
2023/05/11  15:45           954,368 1_1_05. kamen_(Vocals)_(Vocals)_5.wav  
2023/05/11  15:45         1,049,624 1_1_05. kamen_(Vocals)_(Vocals)_6.wav  
2023/05/11  15:45           878,516 1_1_05. kamen_(Vocals)_(Vocals)_7.wav  
2023/05/11  15:45         1,019,636 1_1_05. kamen_(Vocals)_(Vocals)_8.wav  
2023/05/11  15:45         1,383,020 1_1_05. kamen_(Vocals)_(Vocals)_9.wav  
2023/05/11  15:45         1,005,524 1_1_06. ctrl_(Vocals)_(Vocals)_0.wav  
2023/05/11  15:45         1,090,196 1_1_06. ctrl_(Vocals)_(Vocals)_1.wav  
2023/05/11  15:45            84,716 1_1_06. ctrl_(Vocals)_(Vocals)_10.wav  
2023/05/11  15:45           857,348 1_1_06. ctrl_(Vocals)_(Vocals)_11.wav  
2023/05/11  15:45           991,412 1_1_06. ctrl_(Vocals)_(Vocals)_12.wav  
2023/05/11  15:45         1,121,948 1_1_06. ctrl_(Vocals)_(Vocals)_13.wav  
2023/05/11  15:45           931,436 1_1_06. ctrl_(Vocals)_(Vocals)_14.wav  
2023/05/11  15:45         3,129,380 1_1_06. ctrl_(Vocals)_(Vocals)_15.wav  
2023/05/11  15:45         6,202,268 1_1_06. ctrl_(Vocals)_(Vocals)_16.wav  
2023/05/11  15:45         1,457,108 1_1_06. ctrl_(Vocals)_(Vocals)_17.wav  
2023/05/11  15:45         1,046,096 1_1_06. ctrl_(Vocals)_(Vocals)_2.wav  
2023/05/11  15:45           956,132 1_1_06. ctrl_(Vocals)_(Vocals)_3.wav  
2023/05/11  15:45         1,286,000 1_1_06. ctrl_(Vocals)_(Vocals)_4.wav  
2023/05/11  15:45           804,428 1_1_06. ctrl_(Vocals)_(Vocals)_5.wav  
2023/05/11  15:45         1,337,156 1_1_06. ctrl_(Vocals)_(Vocals)_6.wav  
2023/05/11  15:45         1,372,436 1_1_06. ctrl_(Vocals)_(Vocals)_7.wav  
2023/05/11  15:45         2,954,744 1_1_06. ctrl_(Vocals)_(Vocals)_8.wav  
2023/05/11  15:45         6,112,304 1_1_06. ctrl_(Vocals)_(Vocals)_9.wav  
             140 个文件    183,026,452 字节

With that, the data slicing is complete.

Start Training

Everything is ready except the training itself. First set up the so-vits-svc environment — see the earlier article "AI Diva, Singing Online: Stefanie Sun AI Model in Practice, Recreating 《遥远的歌》 (originally by 晴子) (Python 3.10)"; for brevity we won't repeat it here.

Next, put the sliced dataset into the dataset_raw/yebei folder under the project root; create the yebei folder if it doesn't exist.
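
Copying 140 clips by hand is error-prone, so here's a small helper that creates the folder and stages the clips (paths are examples; dataset_raw/yebei is simply the per-speaker layout described above):

```python
import shutil
from pathlib import Path

def stage_dataset(clips_dir, project_root, speaker="yebei"):
    """Copy sliced .wav clips into <project_root>/dataset_raw/<speaker>."""
    dest = Path(project_root) / "dataset_raw" / speaker
    dest.mkdir(parents=True, exist_ok=True)
    count = 0
    for wav in sorted(Path(clips_dir).glob("*.wav")):
        shutil.copy2(wav, dest / wav.name)
        count += 1
    return count
```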

Then build the training configuration file:

{  
    "train": {  
        "log_interval": 200,  
        "eval_interval": 800,  
        "seed": 1234,  
        "epochs": 10000,  
        "learning_rate": 0.0001,  
        "betas": [  
            0.8,  
            0.99  
        ],  
        "eps": 1e-09,  
        "batch_size": 6,  
        "fp16_run": false,  
        "lr_decay": 0.999875,  
        "segment_size": 10240,  
        "init_lr_ratio": 1,  
        "warmup_epochs": 0,  
        "c_mel": 45,  
        "c_kl": 1.0,  
        "use_sr": true,  
        "max_speclen": 512,  
        "port": "8001",  
        "keep_ckpts": 10,  
        "all_in_mem": false  
    },  
    "data": {  
        "training_files": "filelists/train.txt",  
        "validation_files": "filelists/val.txt",  
        "max_wav_value": 32768.0,  
        "sampling_rate": 44100,  
        "filter_length": 2048,  
        "hop_length": 512,  
        "win_length": 2048,  
        "n_mel_channels": 80,  
        "mel_fmin": 0.0,  
        "mel_fmax": 22050  
    },  
    "model": {  
        "inter_channels": 192,  
        "hidden_channels": 192,  
        "filter_channels": 768,  
        "n_heads": 2,  
        "n_layers": 6,  
        "kernel_size": 3,  
        "p_dropout": 0.1,  
        "resblock": "1",  
        "resblock_kernel_sizes": [  
            3,  
            7,  
            11  
        ],  
        "resblock_dilation_sizes": [  
            [  
                1,  
                3,  
                5  
            ],  
            [  
                1,  
                3,  
                5  
            ],  
            [  
                1,  
                3,  
                5  
            ]  
        ],  
        "upsample_rates": [  
            8,  
            8,  
            2,  
            2,  
            2  
        ],  
        "upsample_initial_channel": 512,  
        "upsample_kernel_sizes": [  
            16,  
            16,  
            4,  
            4,  
            4  
        ],  
        "n_layers_q": 3,  
        "use_spectral_norm": false,  
        "gin_channels": 768,  
        "ssl_dim": 768,  
        "n_speakers": 1  
    },  
    "spk": {  
        "yebei": 0  
    }  
}

Here epochs means complete passes over the training set. Each epoch consists of multiple training steps; each step draws a mini-batch from the training set, trains on it, and updates the model's parameters.
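
As a sanity check on the numbers above: with 140 clips and the config's batch_size of 6, one epoch is about 24 steps (slightly fewer in practice, since a few clips are held out in val.txt):

```python
import math

clips = 140       # sliced samples from earlier
batch_size = 6    # from the "train" section of config.json

steps_per_epoch = math.ceil(clips / batch_size)  # -> 24
```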

The parameter to tune is batch_size: if you run short of VRAM, turn it down, or you will hit the same out-of-memory problem. If this error appears during training:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 6.86 GiB already allocated; 0 bytes free; 7.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

then your VRAM has run out.
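
A common recovery pattern — a sketch, not something built into so-vits-svc — is to halve batch_size and retry until a step fits. Here the real training step is replaced by a stand-in that raises MemoryError whenever the batch exceeds a pretend VRAM limit:

```python
def fit_batch_size(train_step, batch_size):
    """Halve batch_size until train_step stops raising MemoryError."""
    while batch_size >= 1:
        try:
            train_step(batch_size)
            return batch_size
        except MemoryError:
            batch_size //= 2
    raise RuntimeError("even batch_size=1 does not fit in memory")

def fake_step(bs):
    # Stand-in for one training step: pretend only 4 samples fit in VRAM.
    if bs > 4:
        raise MemoryError

chosen = fit_batch_size(fake_step, 6)  # 6 fails -> 3 fits
```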

Finally, launch the training:

python3 train.py -c configs/config.json -m 44k

The terminal reports the training progress:

D:\work\so-vits-svc\workenv\lib\site-packages\torch\optim\lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate  
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "  
D:\work\so-vits-svc\workenv\lib\site-packages\torch\functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.  
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\SpectralOps.cpp:867.)  
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]  
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.  
D:\work\so-vits-svc\workenv\lib\site-packages\torch\autograd\__init__.py:200: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed.  This is not an error, but may impair performance.  
grad.sizes() = [32, 1, 4], strides() = [4, 1, 1]  
bucket_view.sizes() = [32, 1, 4], strides() = [4, 4, 1] (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\reducer.cpp:337.)  
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass  
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.  
INFO:44k:====> Epoch: 274, cost 39.02 s  
INFO:44k:====> Epoch: 275, cost 17.47 s  
INFO:44k:====> Epoch: 276, cost 17.74 s  
INFO:44k:====> Epoch: 277, cost 17.43 s  
INFO:44k:====> Epoch: 278, cost 17.59 s  
INFO:44k:====> Epoch: 279, cost 17.82 s  
INFO:44k:====> Epoch: 280, cost 17.64 s  
INFO:44k:====> Epoch: 281, cost 17.63 s  
INFO:44k:Train Epoch: 282 [65%]  
INFO:44k:Losses: [1.8697402477264404, 3.029414415359497, 11.415563583374023, 23.37869644165039, 0.2702481746673584], step: 6600, lr: 9.637943809624507e-05, reference_loss: 39.963661193847656

On every epoch the system reports the loss values and related information. Trained models are saved under the project's logs/44k directory with the .pth extension.
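
To grab the newest checkpoint from logs/44k, you can sort by the step number embedded in the filename — a small helper, assuming the G_<step>.pth naming so-vits-svc uses for generator checkpoints (adjust the prefix if your version differs):

```python
from pathlib import Path

def latest_checkpoint(log_dir="logs/44k", prefix="G_"):
    """Return the .pth checkpoint with the highest step number, or None."""
    ckpts = list(Path(log_dir).glob(f"{prefix}*.pth"))
    if not ckpts:
        return None
    # "G_6400" -> 6400
    return max(ckpts, key=lambda p: int(p.stem.split("_")[-1]))
```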

Conclusion

As a rule of thumb, once the training loss has dropped below 50% and the loss curves have stabilized on both the training and validation sets, the model can be considered converged. A converged model is ready to use — for how to use a trained model, see: "AI Diva, Singing Online: Stefanie Sun AI Model in Practice, Recreating 《遥远的歌》 (originally by 晴子) (Python 3.10)".

Finally, here is the timbre model of folk goddess Ye Bei, trained for a total of 6,400 steps, shared with one and all:

pan.baidu.com/s/1m3VGc7RktaO5snHw6RPLjQ?pwd=pqkb   
Extraction code: pqkb