Blessing Video Generator (AI-generated from one image and one sentence)

Summary: a blessing-video generator that turns one image and one sentence into a greeting video with AI.

The New Year is here, and so are the blessings!



Are you still sending plain text greetings?

Still fretting over how to put together a New Year greeting video?


Once you find this project, there is nothing left to worry about:


everything has already been prepared for you.


One image plus one sentence, and your greeting video is ready!


Project page: https://aistudio.baidu.com/aistudio/projectdetail/3435736


Demo video: https://www.bilibili.com/video/BV1KT4y1y7JR/


Project Description


This project uses the Parakeet toolkit to synthesize the speech, and the PaddleGAN toolkit to process the image and video.

The ten prepared voice templates let it produce ten distinct, pleasant-sounding voice types.

The image is then processed so that lip-sync can be applied either to a cartoonized avatar or to your own photo.


Reference Projects

Parakeet voice cloning: Conan's voice changer comes true

[A summons from the remix zone] Mixue Bingcheng's little gege


Features

It can synthesize a variety of male and female voices: add a picture and a line of text, and out comes the speech!


Custom Settings


Setting        Meaning                        Type
lable          which voice to use             int (1-11)
sentences      the text to synthesize         str
photo_patch    path to the photo              path
custom         path to a custom voice clip    path


Voice Selection


lable value    Voice
1              Taiwanese-accented young lady
2              Young lady
3              Crayon Shin-chan
4              Northeastern buddy
5              Cantonese young man
6              Young man
7              Deep-voiced uncle
8              Cute kid
9              Mature lady voice
10             Loli voice
11             Custom


lable = 1  # set this according to the voice table above
sentences = "虎起生活的风帆,走向虎关通途。"  # the greeting text to synthesize
photo_patch = "./靓照.jpg"  # path to the photo
custom = "./"  # path to a custom voice clip (only used when lable = 11)


Special Notes


If you want to use your own photo rather than the cartoon avatar, read the note marked "!!!" in the image-processing section below.


Unpacking the Materials


# Unzip the prepared voice clips (and the pretrained-model archive) into /home/aistudio/data
!unzip  -d /home/aistudio/data /home/aistudio/data/data126388/素材.zip 
# !unzip  -d /home/aistudio/work/ /home/aistudio/data/pretrained.zip


Archive:  /home/aistudio/data/data126388/素材.zip
  inflating: /home/aistudio/data/蜡笔小新.wav  
  inflating: /home/aistudio/data/萝莉.wav  
  inflating: /home/aistudio/data/台湾腔小姐姐.wav  
  inflating: /home/aistudio/data/小宝宝.wav  
  inflating: /home/aistudio/data/小哥哥.wav  
  inflating: /home/aistudio/data/小姐姐.wav  
  inflating: /home/aistudio/data/御姐.wav  
  inflating: /home/aistudio/data/粤语小哥哥.wav  
  inflating: /home/aistudio/data/pretrained.zip  
  inflating: /home/aistudio/data/低沉大叔.wav  
  inflating: /home/aistudio/data/东北老铁.wav  


Data Preprocessing


# Map each `lable` value to its reference voice clip; 11 uses the custom clip.
tone_gather = {1: 'data/台湾腔小姐姐.wav',
               2: 'data/小姐姐.wav',
               3: 'data/蜡笔小新.wav',
               4: 'data/东北老铁.wav',
               5: 'data/粤语小哥哥.wav',
               6: 'data/小哥哥.wav',
               7: 'data/低沉大叔.wav',
               8: 'data/小宝宝.wav',
               9: 'data/御姐.wav',
               10: 'data/萝莉.wav'}
tone_gather[11] = custom
# Fall back to voice 1 if no custom clip was given or `lable` is out of range
if (custom == "./" and lable == 11) or (lable not in range(1, 12)):
    lable = 1


# Annotate the greeting for the Parakeet frontend: '%' marks a short pause after
# each character, and punctuation is turned into a '$' (longer pause).
symbol = [',', '.', ',', '。','!', '!', ';', ';', ':', ":"]
sentence = ''
for i in sentences:
    if i in symbol:
        sentence = sentence[:-1] + '$'
    else:
        sentence = sentence + i + '%'
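
For example, with the default greeting text the loop above produces the pause-annotated string expected by the frontend; a quick sanity check you can run in a new cell might look like this:

# Quick check of the annotation: '%' follows each character, punctuation becomes '$'
demo = "虎起生活的风帆,走向虎关通途。"
annotated = ''
for ch in demo:
    if ch in symbol:
        annotated = annotated[:-1] + '$'
    else:
        annotated = annotated + ch + '%'
print(annotated)  # 虎%起%生%活%的%风%帆$走%向%虎%关%通%途$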


Speech Synthesis


1. Environment setup and imports


# Download and install Parakeet. It is already installed in this project, so there is
# no need to install it again; if you do need to, run:
# !git clone https://gitee.com/paddlepaddle/Parakeet.git -b release/v0.3 /home/aistudio/work/Parakeet


# Install the parakeet package
!pip install -e /home/aistudio/work/Parakeet/


If you get a "No module named parakeet" error, restart the project (kernel) and run again.


# Add the necessary paths to sys.path so the installed packages can be found
import sys
sys.path.append("/home/aistudio/work/Parakeet")
sys.path.append("/home/aistudio/work/Parakeet/examples/tacotron2_aishell3")
import numpy as np
import os
import paddle
from matplotlib import pyplot as plt
from IPython import display as ipd
import soundfile as sf
import librosa.display
from parakeet.utils import display
paddle.set_device("gpu:0")


CUDAPlace(0)
%matplotlib inline


2. Load the voice-cloning models


from examples.ge2e.audio_processor import SpeakerVerificationPreprocessor
from parakeet.models.lstm_speaker_encoder import LSTMSpeakerEncoder
# speaker encoder
p = SpeakerVerificationPreprocessor(
    sampling_rate=16000, 
    audio_norm_target_dBFS=-30, 
    vad_window_length=30, 
    vad_moving_average_width=8, 
    vad_max_silence_length=6, 
    mel_window_length=25, 
    mel_window_step=10, 
    n_mels=40, 
    partial_n_frames=160, 
    min_pad_coverage=0.75, 
    partial_overlap_ratio=0.5)
speaker_encoder = LSTMSpeakerEncoder(n_mels=40, num_layers=3, hidden_size=256, output_size=256)
speaker_encoder_params_path = "/home/aistudio/work/pretrained/ge2e_ckpt_0.3/step-3000000.pdparams"
speaker_encoder.set_state_dict(paddle.load(speaker_encoder_params_path))
speaker_encoder.eval()
# synthesizer
from parakeet.models.tacotron2 import Tacotron2
from examples.tacotron2_aishell3.chinese_g2p import convert_sentence
from examples.tacotron2_aishell3.aishell3 import voc_phones, voc_tones
from yacs.config import CfgNode
synthesizer = Tacotron2(
    vocab_size=68,
    n_tones=10,
    d_mels= 80,
    d_encoder= 512,
    encoder_conv_layers = 3,
    encoder_kernel_size= 5,
    d_prenet= 256,
    d_attention_rnn= 1024,
    d_decoder_rnn = 1024,
    attention_filters = 32,
    attention_kernel_size = 31,
    d_attention= 128,
    d_postnet = 512,
    postnet_kernel_size = 5,
    postnet_conv_layers = 5,
    reduction_factor = 1,
    p_encoder_dropout = 0.5,
    p_prenet_dropout= 0.5,
    p_attention_dropout= 0.1,
    p_decoder_dropout= 0.1,
    p_postnet_dropout= 0.5,
    d_global_condition=256,
    use_stop_token=False
)
params_path = "/home/aistudio/work/pretrained/tacotron2_aishell3_ckpt_0.3/step-450000.pdparams"
synthesizer.set_state_dict(paddle.load(params_path))
synthesizer.eval()
# vocoder
from parakeet.models import ConditionalWaveFlow
vocoder = ConditionalWaveFlow(upsample_factors=[16, 16], n_flows=8, n_layers=8, n_group=16, channels=128, n_mels=80, kernel_size=[3, 3])
params_path = "/home/aistudio/work/pretrained/waveflow_ljspeech_ckpt_0.3/step-2000000.pdparams"
vocoder.set_state_dict(paddle.load(params_path))
vocoder.eval()


3. Extract the speaker embedding of the target voice


Note: only wav and flac audio files are supported; if your clip is in another format, convert it first.
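
If your own reference clip is in another format (for example mp3), a minimal conversion sketch using librosa and soundfile might look like the following; the input file name is hypothetical, and decoding mp3 relies on an ffmpeg/audioread backend being available in the environment:

# Hedged sketch: convert a hypothetical mp3 clip to a 16 kHz mono wav so it can be
# used as the custom reference voice (then set custom to this path and lable = 11).
import librosa
import soundfile as sf

audio, sr = librosa.load("my_voice.mp3", sr=16000, mono=True)  # hypothetical input file
sf.write("/home/aistudio/data/custom_voice.wav", audio, samplerate=16000)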

ref_audio_path = tone_gather[lable]
mel_sequences = p.extract_mel_partials(p.preprocess_wav(ref_audio_path))
# print("mel_sequences: ", mel_sequences.shape)
with paddle.no_grad():
    embed = speaker_encoder.embed_utterance(paddle.to_tensor(mel_sequences))
# print("embed shape: ", embed.shape)
phones, tones = convert_sentence(sentence)
# print(phones)
# print(tones)
phones = np.array([voc_phones.lookup(item) for item in phones], dtype=np.int64)
tones = np.array([voc_tones.lookup(item) for item in tones], dtype=np.int64)
phones = paddle.to_tensor(phones).unsqueeze(0)
tones = paddle.to_tensor(tones).unsqueeze(0)
utterance_embeds = paddle.unsqueeze(embed, 0)


/home/aistudio/work/Parakeet/examples/ge2e/audio_processor.py:96: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  audio_mask = np.round(audio_mask).astype(np.bool)


4. Synthesize the mel spectrogram


With the reference speaker embedding extracted, the Tacotron2 model generates a mel spectrogram for the text to be synthesized.


The text frontend currently supports only Chinese characters plus two special pause symbols: '%' marks a short pause within a sentence and '$' marks a longer pause, matching the annotation used in the AISHELL-3 dataset. A more general frontend will be provided in later Parakeet releases.


with paddle.no_grad():
    outputs = synthesizer.infer(phones, tones=tones, global_condition=utterance_embeds)
mel_input = paddle.transpose(outputs["mel_outputs_postnet"], [0, 2, 1])
fig = display.plot_alignment(outputs["alignments"][0].numpy().T)
os.system('mkdir -p /home/aistudio/syn_audio/')
with paddle.no_grad():
    wav = vocoder.infer(mel_input)
wav = wav.numpy()[0]
sf.write(f"/home/aistudio/syn_audio/generate.wav", wav, samplerate=22050)
# librosa.display.waveplot(wav)
 98%|█████████▊| 984/1000 [00:02<00:00, 332.07it/s]
Warning! Reached max decoder steps!!!
time: 1.586832046508789s
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/image.py:425: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
  a_min = np.asscalar(a_min.astype(scaled_dtype))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/image.py:426: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
  a_max = np.asscalar(a_max.astype(scaled_dtype))


[Figure: attention alignment plot produced by display.plot_alignment]


5. Synthesize the final waveform


The WaveFlow vocoder converts the generated spectrogram into audio.


# Play back the generated speech
ipd.Audio(wav, rate=22050)
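
You can also plot the synthesized waveform, mirroring the commented-out librosa call in the cell above (optional; assumes the installed librosa version still provides waveplot):

# Optional: visualize the synthesized waveform
plt.figure(figsize=(10, 3))
librosa.display.waveplot(wav, sr=22050)
plt.title("generated waveform")
plt.show()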


Image and Video Processing



1. Environment setup and imports


# The current working directory is /home/aistudio/, the same directory shown in the file browser on the left
# Clone the latest PaddleGAN repository into the current directory
# !git clone https://github.com/PaddlePaddle/PaddleGAN.git
# If downloading from GitHub is slow, clone from Gitee instead:
!git clone https://gitee.com/paddlepaddle/PaddleGAN.git
%cd /home/aistudio/PaddleGAN/
!pip install -v -e .


# Install the PaddleGAN pip package so the prediction API can be used
!pip install --upgrade ppgan
!pip install dlib


2. Avatar processing

# Generate a cartoon avatar from the photo
from ppgan.apps import Photo2CartoonPredictor
%cd /home/aistudio
p2c = Photo2CartoonPredictor(output_path='/home/aistudio/result/')
p2c.run(photo_patch)
/home/aistudio
Cartoon image has been saved at '/home/aistudio/result/p2c_cartoon.png'.
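
Before animating it, you can preview the generated cartoon avatar inside the notebook (a small optional sketch; the path comes from the log line above):

# Optional: preview the cartoon avatar produced by Photo2CartoonPredictor
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

cartoon = mpimg.imread('/home/aistudio/result/p2c_cartoon.png')
plt.imshow(cartoon)
plt.axis('off')
plt.show()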


!!! Note for anyone who wants to use their own photo !!!


If you want the motion transferred onto your own photo instead of the cartoon avatar, change the --source_image argument in the command below to point to your own picture (for example the cropped photo saved at /home/aistudio/result/p2c_photo.png) and keep the rest of the command unchanged.


# Run the motion-transfer step from the command line.
# Parameter descriptions:
# - driving_video: the driving video; the facial motion of the person in it is what gets transferred
# - source_image: the source image; the motion from the video is transferred onto the person in this image
# - relative: use relative (rather than absolute) keypoint coordinates between the video and the image; relative is recommended, since absolute coordinates can distort the face
# - adapt_scale: adapt the motion scale from the convex hull of the keypoints
%cd /home/aistudio/PaddleGAN/applications/
!export PYTHONPATH=$PYTHONPATH:/home/aistudio/PaddleGAN && python -u tools/first-order-demo.py  --driving_video ~/2.MOV  --source_image /home/aistudio/result/p2c_cartoon.png --relative --adapt_scale --output  ~/work
/home/aistudio/PaddleGAN/applications
[01/23 22:30:03] ppgan INFO: Found /home/aistudio/.cache/ppgan/vox-cpk.pdparams
W0123 22:30:03.200150  2496 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0123 22:30:03.205381  2496 device_context.cc:465] device: 0, cuDNN Version: 7.6.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/tensor/creation.py:130: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. 
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  if data.dtype == np.object:
100%|████████████████████████████████| 109119/109119 [00:03<00:00, 33939.25it/s]
1 persons have been detected
100%|█████████████████████████████████████████| 251/251 [00:08<00:00, 30.01it/s]


3. Lip synthesis


# Run the Wav2Lip prediction from the command line.
# face: the source video; the lips of the person in it will be re-synthesized to match the audio (in plain terms: who should speak)
# audio: the audio that drives the lip sync (in plain terms: what that person should say)
%cd /home/aistudio/PaddleGAN/applications
!export PYTHONPATH=$PYTHONPATH:/home/aistudio/work/PaddleGAN && python tools/wav2lip.py --face /home/aistudio/work/result.mp4 --audio /home/aistudio/syn_audio/generate.wav --outfile /home/aistudio/result/target.mp4


/home/aistudio/PaddleGAN/applications
Reading video frames...
Number of frames available for inference: 251
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/librosa/core/constantq.py:1059: DeprecationWarning: `np.complex` is a deprecated alias for the builtin `complex`. To silence this warning, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype=np.complex,
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/librosa/util/utils.py:2099: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  np.dtype(np.float): np.complex,
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/librosa/util/utils.py:2099: DeprecationWarning: `np.complex` is a deprecated alias for the builtin `complex`. To silence this warning, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  np.dtype(np.float): np.complex,
Length of mel chunks: 344
W0123 22:40:21.509095  3459 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0123 22:40:21.513870  3459 device_context.cc:465] device: 0, cuDNN Version: 7.6.
Model loaded
  0%|                                                     | 0/3 [00:00<?, ?it/s]
  0%|                                                    | 0/16 [00:00<?, ?it/s][A/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/tensor/creation.py:130: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. 
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  if data.dtype == np.object:
  6%|██▊                                         | 1/16 [00:00<00:05,  2.92it/s][A
 12%|█████▌                                      | 2/16 [00:00<00:04,  2.91it/s][A
 19%|████████▎                                   | 3/16 [00:01<00:04,  2.94it/s][A
 25%|███████████                                 | 4/16 [00:01<00:04,  2.96it/s][A
 31%|█████████████▊                              | 5/16 [00:01<00:03,  2.94it/s][A
 38%|████████████████▌                           | 6/16 [00:02<00:03,  2.96it/s][A
 44%|███████████████████▎                        | 7/16 [00:02<00:03,  2.88it/s][A
 50%|██████████████████████                      | 8/16 [00:02<00:02,  2.88it/s][A
 56%|████████████████████████▊                   | 9/16 [00:03<00:02,  2.90it/s][A
 62%|██████████████████████████▉                | 10/16 [00:03<00:02,  2.83it/s][A
 69%|█████████████████████████████▌             | 11/16 [00:03<00:01,  2.77it/s][A
 75%|████████████████████████████████▎          | 12/16 [00:04<00:01,  2.69it/s][A
 81%|██████████████████████████████████▉        | 13/16 [00:04<00:01,  2.64it/s][A
 88%|█████████████████████████████████████▋     | 14/16 [00:04<00:00,  2.67it/s][A
 94%|████████████████████████████████████████▎  | 15/16 [00:05<00:00,  2.70it/s][A
100%|███████████████████████████████████████████| 16/16 [00:05<00:00,  3.04it/s][A
100%|█████████████████████████████████████████████| 3/3 [00:07<00:00,  3.36s/it]
ffmpeg version 2.8.15-0ubuntu0.16.04.1 Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.10) 20160609
  configuration: --prefix=/usr --extra-version=0ubuntu0.16.04.1 --build-suffix=-ffmpeg --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --cc=cc --cxx=g++ --enable-gpl --enable-shared --disable-stripping --disable-decoder=libopenjpeg --disable-decoder=libschroedinger --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librtmp --enable-libschroedinger --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxvid --enable-libzvbi --enable-openal --enable-opengl --enable-x11grab --enable-libdc1394 --enable-libiec61883 --enable-libzmq --enable-frei0r --enable-libx264 --enable-libopencv
  libavutil      54. 31.100 / 54. 31.100
  libavcodec     56. 60.100 / 56. 60.100
  libavformat    56. 40.101 / 56. 40.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 40.101 /  5. 40.101
  libavresample   2.  1.  0 /  2.  1.  0
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  2.101 /  1.  2.101
  libpostproc    53.  3.100 / 53.  3.100
[0;33mGuessed Channel Layout for  Input Stream #0.0 : mono
[0mInput #0, wav, from '/home/aistudio/syn_audio/generate.wav':
  Duration: 00:00:11.60, bitrate: 352 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels, s16, 352 kb/s
Input #1, avi, from 'temp/result.avi':
  Metadata:
    encoder         : Lavf58.31.101
  Duration: 00:00:11.47, start: 0.000000, bitrate: 522 kb/s
    Stream #1:0: Video: mpeg4 (Simple Profile) (DIVX / 0x58564944), yuv420p, 256x256 [SAR 1:1 DAR 1:1], 514 kb/s, 30 fps, 30 tbr, 30 tbn, 30 tbc
[1;36m[libx264 @ 0x128e080] [0m[0;33m-qscale is ignored, -crf is recommended.
[0m[1;36m[libx264 @ 0x128e080] [0musing SAR=1/1
[1;36m[libx264 @ 0x128e080] [0musing cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 AVX2 LZCNT BMI2
[1;36m[libx264 @ 0x128e080] [0mprofile High, level 1.3
[1;36m[libx264 @ 0x128e080] [0m264 - core 148 r2643 5c65704 - H.264/MPEG-4 AVC codec - Copyleft 2003-2015 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=8 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00


4. Sharpen the video!


# Run super-resolution on the finished video, this time using the EDVR model designed for video super-resolution
%cd /home/aistudio/PaddleGAN/applications/
!python tools/video-enhance.py --input /home/aistudio/result/target.mp4 \
                               --process_order EDVR \
                               --output output_dir


/home/aistudio/PaddleGAN/applications
Model EDVR process start..
W0123 22:40:49.740236  3580 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0123 22:40:49.745282  3580 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[01/23 22:40:54] ppgan INFO: Found /home/aistudio/.cache/ppgan/EDVR_L_w_tsa_SRx4.pdparams
100%|█████████████████████████████████████████| 345/345 [01:52<00:00,  3.09it/s]
Model EDVR output frames path: output_dir/EDVR/target/frames_pred/%08d.png
Model EDVR output video path: output_dir/EDVR/target_edvr_out.mp4
Model EDVR process done!
# Put the audio track back onto the upscaled video
!ffmpeg -y -i /home/aistudio/syn_audio/generate.wav -i /home/aistudio/PaddleGAN/applications/output_dir/EDVR/target_edvr_out.mp4 -strict -2 -q:v 1 /home/aistudio/new_target.mp4
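
To watch the finished greeting video right inside the notebook, here is an optional sketch (it assumes the ffmpeg step above succeeded and that your IPython version supports embedded video playback):

# Optional: play the final greeting video in the notebook
from IPython.display import Video
Video("/home/aistudio/new_target.mp4", embed=True, width=480)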


Usage Guide and Troubleshooting


Usage guide:

Fill in the custom settings described above. Absolute paths are recommended, because the PaddleGAN steps change the working directory and relative paths can then fail; a small sketch for converting them is shown below.
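
# Sketch: turn the relative paths from the settings cell into absolute paths
# before the PaddleGAN steps change the working directory.
import os
photo_patch = os.path.abspath("./靓照.jpg")
custom = os.path.abspath("./")
print(photo_patch, custom)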

The final result appears in the home directory as new_target.mp4.


Troubleshooting:


1. I already ran everything once, but switching to a different voice and rerunning raises an error!

Solution: add a new code cell below the data-preprocessing section containing:

%cd /home/aistudio/


2. The image-processing step fails because no face is detected.

Try another photo; the face detector simply cannot recognize every picture.

3. The dlib library fails to install or load.

Run the project on the premium (至尊版) GPU environment.

4. The generated speech is only about 11 seconds long.


That is because the prepared audio material is only about that long; besides, give the virtual speakers a break rather than making them say everything in one breath.


Output File Locations


Generated audio: /syn_audio/generate.wav

Cropped photo: result/p2c_photo.png

Cartoon avatar: result/p2c_cartoon.png

Lip-synced video: result/target.mp4

Super-resolved video: new_target.mp4


From the self-proclaimed worst coder in the PaddlePaddle community; let's keep improving together!

Remember: everything 三岁 ships is top quality (shameless-plug series).

