Making a Small Animation with ModelScope

2022-08-14


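Summary: This article shows how to use ModelScope to turn real video footage into an animation. The basic idea is to decode the video into images, cartoonize the frames one by one with a portrait cartoonization model, and then merge the frames back into a video.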

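Platform overview

Alibaba recently released the ModelScope platform (https://modelscope.cn/#/models). It aims to be an open-source, shared Model-as-a-Service platform that offers AI developers flexible, easy-to-use, low-cost, one-stop model services and makes models easier to apply. The model selection is decent: at the time of writing there are 138 models, 55 of which can be tried through an online demo and 4 of which support fine-tuning (fine-tuning support still needs work).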

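About the cartoonization model

Open the model library and the first thing you see is the portrait cartoonization model. I have to say the page looks quite nice. I wondered how well the model actually works and whether it is all surface polish, so I decided to try this first model.

The conversion shown in the animated image on the model page looks quite good, which made me wonder: if I use it to cartoonize a video, can I generate an animation? If the video has no people in it, does the background still cartoonize well? A video also contains frames of widely varying quality, which makes it a handy way to test the model on all kinds of corner cases. Enough talk, let's get started.

I won't repeat the basic usage and environment setup of ModelScope here; please refer to the documentation: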

https://modelscope.cn/#/docs/%E5%BF%AB%E9%80%9F%E5%BC%80%E5%A7%8B

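Video cartoonization

Here is a rough preview of the result first.

First, we need to understand the difference between a video and an image. A video generally consists of two parts, the picture track and the audio track, and the picture track can be thought of as a sequence of consecutive images. If we can extract these images, cartoonize them frame by frame with the model to produce new images, and then combine the generated images back into a video, we get a small animation.

So video cartoonization can be broken down into the following steps: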

Video decoding
Batch image cartoonization
Video synthesis
Audio track restoration

Below we go through the implementation step by step; the complete, runnable Python code is given at the end of the article.
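The four steps listed above can be sketched as one function that wires the stages together. This is only an illustrative skeleton: `cartoonize_video` and its callable parameters (`decode`, `stylize`, `encode`, `mux_audio`) are hypothetical names chosen for this sketch, not part of ModelScope, OpenCV, or MoviePy.

```python
from typing import Any, Callable, List, Sequence


def cartoonize_video(
    video_file: str,
    out_file: str,
    decode: Callable[[str], List[Any]],
    stylize: Callable[[Sequence[Any]], List[Any]],
    encode: Callable[[List[Any], str], None],
    mux_audio: Callable[[str, str, str], None],
    tmp_file: str = "video_tmp.mp4",
) -> int:
    """Run the four stages in order and return the number of frames processed.

    All four callables are hypothetical placeholders for the steps in the text.
    """
    frames = decode(video_file)                # 1. video decoding
    cartoon_frames = stylize(frames)           # 2. batch image cartoonization
    encode(cartoon_frames, tmp_file)           # 3. video synthesis (silent video)
    mux_audio(tmp_file, video_file, out_file)  # 4. audio track restoration
    return len(cartoon_frames)
```

Passing the stages in as callables keeps the data flow visible and lets each stage be swapped or tested on its own.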

Video decoding

First, we decode the video with opencv and store the decoded images in frames:

import cv2

video = cv2.VideoCapture(video_file)
if not video.isOpened():
    print("Error reading video file")

# Read the video frame by frame
frames = []
i = 0
while video.isOpened():
    ret, frame = video.read()
    if not ret:
        break
    frames.append(frame)
    i += 1
    # Log progress once every 10 frames
    if i % 10 == 0:
        print(f'loaded {i} frames')

# When everything is done, release the video capture object
video.release()
print('loading video done.')
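The loop above keeps every decoded frame in a list, which can use a lot of memory for long videos. As a minimal sketch, the same read loop can be wrapped in a generator. `iter_frames` and `log_every` are hypothetical names; the only thing assumed about `capture` is a `read()` method that returns `(ok, frame)`, which is how `cv2.VideoCapture.read()` behaves.

```python
def iter_frames(capture, log_every=10):
    """Yield frames from any object with a cv2.VideoCapture-style read() method.

    Stops as soon as read() reports that no further frame is available.
    """
    count = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        count += 1
        if count % log_every == 0:
            print(f"loaded {count} frames")
        yield frame
```

Calling `list(iter_frames(cv2.VideoCapture(video_file)))` would produce the same list of frames as the loop above, while iterating lazily allows frames to be handled a few at a time.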

Batch image cartoonization

First, initialize the cartoonization pipeline:

from modelscope.pipelines import pipeline

img_cartoon = pipeline('image-portrait-stylization', model='damo/cv_unet_person-image-cartoon_compound-models')

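The documentation example shows the pipeline taking an image file name as input, and I was not sure whether it also accepts image data directly. I tried it and it works. Then, based on the official pipeline documentation (https://modelscope.cn/#/docs/%E6%A8%A1%E5%9E%8B%E7%9A%84%E6%8E%A8%E7%90%86Pipeline), I guessed it should also accept multiple images as input, and a cautious try confirmed that it does. I was rather pleased with myself, though I also have to say the official documentation is lacking in detail.

In the end, the following code performs the batch cartoonization and stores the cartoonized images in result_frames: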

results = img_cartoon(frames)
result_frames = [r['output_img'] for r in results]

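One complaint: inference over many images on my local GPU machine was really slow, and I later found that the computation was running entirely on the CPU.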
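Passing every frame in a single call also means there is no progress feedback until the whole video has been processed. A minimal sketch of submitting frames in fixed-size batches follows. `chunks`, `cartoonize_in_batches`, and `batch_size` are hypothetical names, and `stylize` stands for any callable that, like `img_cartoon` above, takes a list of frames and returns a list of dicts with an `'output_img'` entry. Batching here only bounds how many frames go into each call and allows progress logging; it is not claimed to make inference faster.

```python
from typing import Any, Callable, Iterator, List, Sequence


def chunks(items: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def cartoonize_in_batches(
    frames: Sequence[Any],
    stylize: Callable[[List[Any]], List[dict]],
    batch_size: int = 8,
) -> List[Any]:
    """Call `stylize` batch by batch and collect the 'output_img' of each result."""
    result_frames: List[Any] = []
    for batch in chunks(frames, batch_size):
        results = stylize(list(batch))
        result_frames.extend(r["output_img"] for r in results)
        print(f"cartoonized {len(result_frames)}/{len(frames)} frames")
    return result_frames
```

With the pipeline from this article, the call would look like `cartoonize_in_batches(frames, img_cartoon)`.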

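Video synthesis

Finally, we use opencv to combine the images back into a video. One point needs emphasis here:

The output images have a different size from the input images, so when setting the output video size you cannot use the size of the original input video. This cost me half an hour of debugging.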

import cv2
import numpy as np

# Take the video size from the cartoonized frames, not from the input video
frame_height, frame_width, _ = result_frames[0].shape
size = (frame_width, frame_height)

# Some frames come back as float32 (together with a FutureWarning about `rcond`),
# so cast every frame to uint8 before writing
for idx in range(len(result_frames)):
    result_frames[idx] = result_frames[idx].astype(np.uint8)

# fps is the frame rate of the input video, read with video.get(cv2.CAP_PROP_FPS);
# out_tmp_file is the silent intermediate video that the audio is merged into later
print(f'saving video to file {out_tmp_file}')
out = cv2.VideoWriter(out_tmp_file, cv2.VideoWriter_fourcc(*'mp4v'), fps, size)
for f in result_frames:
    out.write(f)
out.release()
print('saving video done')
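OpenCV frames are arrays shaped `(height, width, channels)`, while `cv2.VideoWriter` takes its size as `(width, height)`, and a size mismatch can be a cause of an empty or unplayable output file. As a small sketch, the size can be derived from the cartoonized frames and checked for consistency in one place. `writer_size` is a hypothetical helper; it only relies on each frame having a `.shape` attribute like a NumPy array, and the assumption that all output frames should share one size is mine, not something the article states.

```python
def writer_size(frames):
    """Return (width, height) for cv2.VideoWriter from frames shaped (H, W, C).

    Raises ValueError if the list is empty or the frames disagree in size.
    """
    if not frames:
        raise ValueError("no frames to write")
    height, width = frames[0].shape[:2]
    for idx, frame in enumerate(frames):
        if tuple(frame.shape[:2]) != (height, width):
            raise ValueError(
                f"frame {idx} has size {tuple(frame.shape[:2])}, "
                f"expected {(height, width)}"
            )
    return (width, height)
```

In the code above this would replace the two lines that compute `size`, as in `size = writer_size(result_frames)`.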

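In addition, I ran into another problem when saving the video: for some video frames the conversion prints the log below, and the output image is then float32 instead of uint8. That is why the code casts every frame to uint8.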

FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
r, _, _, _ = lstsq(X, U)
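A plain `astype(np.uint8)` truncates fractions and does not guard against values outside 0 to 255. A slightly more defensive conversion is sketched below, assuming the float32 frames hold pixel values on the same 0 to 255 scale as the uint8 ones (the article does not say, so treat this as an assumption). `to_uint8` is a hypothetical helper name.

```python
import numpy as np


def to_uint8(frame: np.ndarray) -> np.ndarray:
    """Return `frame` as uint8, rounding and clipping float values to 0..255.

    Frames that are already uint8 are returned unchanged.
    """
    if frame.dtype == np.uint8:
        return frame
    return np.clip(np.rint(frame), 0, 255).astype(np.uint8)
```

It would be applied to every frame before writing, for example `result_frames = [to_uint8(f) for f in result_frames]`.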

Restoring the audio track

opencv cannot extract an audio track or merge audio with video, so we use MoviePy (https://zulko.github.io/moviepy/) for this.

First, install MoviePy:

pip install ffmpeg moviepy

Extract the original audio track with moviepy:

import moviepy.editor as mp

audio_file = 'out.mp3'
my_clip = mp.VideoFileClip(video_file)
my_clip.audio.write_audiofile(audio_file)
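Not every video has an audio track. To my knowledge MoviePy leaves `clip.audio` as `None` in that case, so the call above would fail with an `AttributeError`. A minimal guard is sketched here; `extract_audio` is a hypothetical helper name, and it only assumes the `audio` attribute and `write_audiofile` method already used above.

```python
def extract_audio(clip, audio_file):
    """Write the clip's audio track to `audio_file`.

    Returns True if audio was written, False if the clip has no audio track.
    """
    if getattr(clip, "audio", None) is None:
        return False
    clip.audio.write_audiofile(audio_file)
    return True
```

When it returns False, the audio merge step can simply be skipped and the silent video used as the final output.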

Load the synthesized video and the original audio track, and produce the animation with sound:

from moviepy.editor import VideoFileClip, AudioFileClip

# Load the synthesized (silent) video
clip = VideoFileClip(out_tmp_file)
# Load the extracted audio file
audioclip = AudioFileClip(audio_file)
# Add the audio to the video clip
videoclip = clip.set_audio(audioclip)
videoclip.write_videofile(out_file)
# Optionally save as a GIF
# videoclip.write_gif(out_gif_file)

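Results

Trees on Gulangyu: detailed comparison

Of the videos I tried, I think this one of the trees turned out best. There are no people in it, yet the result looks quite good. It seems that background cartoonization works better on images with dense texture and varied colors.

The sea

The sea result is decent, and the seagulls are all still visible. Note that this is background cartoonization with no portrait in the frame, and it looks quite acceptable.

The beach

This video did not convert so well. There is no strong light on the child's face in the original video, yet for some reason artifacts appear on the face after cartoonization.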

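Issues

After uploading an image, the model stayed stuck in the model-loading state; this was fixed after I reported it in the user group.

It seems the GPU was never actually used: GPU memory was allocated, but the computation ran on the CPU, and CPU utilization was very high.

The bad case with the child's face in the third (beach) video still needs to be investigated and fixed.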

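Complete code

Usage:

Change the video_file variable to point to your input video.

Change the out_file variable to set the output video path.

Run the following code with Python.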

import logging

import cv2
import numpy as np
from modelscope.pipelines import pipeline
from moviepy.editor import VideoFileClip, AudioFileClip

logging.basicConfig(level=logging.INFO)

img_cartoon = pipeline('image-portrait-stylization', model='damo/cv_unet_person-image-cartoon_compound-models')

video_file = 'apps/gulangyu-tree.mp4'
out_file = 'apps/gulangyu-tree_out.mp4'
out_tmp_file = 'video_tmp.mp4'
audio_file = 'audio_tmp.mp3'

# Extract the original audio track
my_clip = VideoFileClip(video_file)
my_clip.audio.write_audiofile(audio_file)
logging.info('save audio file done')

# Decode the video into frames
logging.info(f'load video {video_file}')
video = cv2.VideoCapture(video_file)
fps = video.get(cv2.CAP_PROP_FPS)
if not video.isOpened():
    logging.error("Error reading video file")

frames = []
i = 0
while video.isOpened():
    ret, frame = video.read()
    if not ret:
        break
    frames.append(frame)
    i += 1
    # Log progress once every 10 frames
    if i % 10 == 0:
        logging.info(f'loaded {i} frames')

# When everything is done, release the video capture object
video.release()
logging.info('loading video done.')

# Cartoonize all frames in one call
results = img_cartoon(frames)
result_frames = [r['output_img'] for r in results]

# Set the output resolution from the cartoonized frames (it differs from the input)
frame_height, frame_width, _ = result_frames[0].shape
size = (frame_width, frame_height)

# Some frames come back as float32 (together with a FutureWarning about `rcond`),
# so cast every frame to uint8 before writing
for idx in range(len(result_frames)):
    result_frames[idx] = result_frames[idx].astype(np.uint8)

logging.info(f'saving video to file {out_tmp_file}')
out = cv2.VideoWriter(out_tmp_file, cv2.VideoWriter_fourcc(*'mp4v'), fps, size)
for f in result_frames:
    out.write(f)
out.release()
logging.info('saving video done')

logging.info('merging audio and video')
# Load the synthesized (silent) video
clip = VideoFileClip(out_tmp_file)
# Load the extracted audio file
audioclip = AudioFileClip(audio_file)
# Add the audio to the video clip
videoclip = clip.set_audio(audioclip)
videoclip.write_videofile(out_file)
# Optionally save as a GIF
# videoclip.write_gif(out_gif_file)
logging.info('finished!')
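Instead of editing the `video_file` and `out_file` variables by hand, the two paths could be taken from the command line. The sketch below uses only the standard library. `default_out_file` and `parse_args` are hypothetical helpers, and the `_out` suffix simply mirrors the naming used in this article (`gulangyu-tree.mp4` to `gulangyu-tree_out.mp4`).

```python
import argparse
from pathlib import Path


def default_out_file(video_file: str) -> str:
    """Derive an output path by appending '_out' to the input file's stem."""
    path = Path(video_file)
    return path.with_name(f"{path.stem}_out{path.suffix}").as_posix()


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the input path and an optional output path."""
    parser = argparse.ArgumentParser(description="Cartoonize a video frame by frame.")
    parser.add_argument("video_file", help="path to the input video")
    parser.add_argument(
        "-o", "--out-file", default=None,
        help="path to the output video (default: <input stem>_out<suffix>)",
    )
    args = parser.parse_args(argv)
    if args.out_file is None:
        args.out_file = default_out_file(args.video_file)
    return args
```

In the script, `args = parse_args()` followed by `video_file, out_file = args.video_file, args.out_file` would replace the two hard-coded assignments.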
