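Over the past month, AnimateDiff has clearly been the dark horse of AI animation and video generation. Praised for output that is smooth, stable, and flicker-free, it has earned a reputation as the "god-tier plugin for Stable Diffusion".

AnimateDiff is a practical framework for animating text-to-image models. Without any model-specific tuning, it gives most existing personalized text-to-image models the ability to produce animations in one shot. The project is open source, and the models can be downloaded and tried out in the ModelScope community. This article provides an inference and training tutorial for the community. Developers are welcome to come and play!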

AnimateDiff Technical Overview

Paper: AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning (https://arxiv.org/abs/2307.04725)

GitHub: https://github.com/guoyww/AnimateDiff

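The common approach to text-to-video is to add temporal modeling to the original text-to-image model and then tune the model on video datasets. Ordinary users, however, usually cannot afford the sensitive hyperparameter tuning, the collection of large personalized video datasets, and the intensive compute this requires, which makes personalized text-to-video challenging.

AnimateDiff proposes a new approach. Its core idea is to attach a newly initialized motion modeling module to a frozen base text-to-image model, then train that module on video clips so that it distills a reasonable motion prior. Once trained, the motion modeling module can simply be injected into any personalized model derived from the same base model, which immediately turns it into a text-driven animation model that generates diverse, personalized animated images.

In short, the temporal module is split out and offered as a plug-and-play module, while the original pretrained model is left unchanged.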
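To make the plug-and-play idea concrete, here is a minimal PyTorch sketch of a temporal attention layer of the kind a motion module contains. It is not the official implementation: the class name TemporalSelfAttention and all sizes are hypothetical, and details such as positional encoding over frames are left out. Following the paper, the output projection is zero-initialized, so at initialization the module passes features through unchanged and the frozen image model behaves as before.

```python
import torch
from torch import nn


class TemporalSelfAttention(nn.Module):
    """Self-attention across the frame axis only (hypothetical, simplified)."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.proj_out = nn.Linear(channels, channels)
        # Zero-init the output projection so the module starts as an identity
        # mapping and does not disturb the frozen image model at step 0.
        nn.init.zeros_(self.proj_out.weight)
        nn.init.zeros_(self.proj_out.bias)

    def forward(self, x: torch.Tensor, num_frames: int) -> torch.Tensor:
        # x: (batch * frames, channels, height, width), the layout a 2D UNet sees.
        bf, c, h, w = x.shape
        b = bf // num_frames
        # Regroup so that every spatial location becomes a sequence over frames.
        seq = x.reshape(b, num_frames, c, h, w).permute(0, 3, 4, 1, 2)
        seq = seq.reshape(b * h * w, num_frames, c)
        normed = self.norm(seq)
        attended, _ = self.attn(normed, normed, normed, need_weights=False)
        seq = seq + self.proj_out(attended)  # residual connection
        # Restore the original (batch * frames, channels, height, width) layout.
        seq = seq.reshape(b, h, w, num_frames, c).permute(0, 3, 4, 1, 2)
        return seq.reshape(bf, c, h, w)


# A stand-in for one frozen layer of the base text-to-image model.
frozen_layer = nn.Conv2d(8, 8, kernel_size=3, padding=1).requires_grad_(False)
motion = TemporalSelfAttention(channels=8, num_heads=2)

x = torch.randn(2 * 16, 8, 4, 4)  # 2 clips of 16 frames each
features = frozen_layer(x)
out = motion(features, num_frames=16)
print(out.shape, torch.equal(out, features))
```

Because attention runs only along the frame axis, the spatial layers of the base model never need to change, which is what allows one trained module to be attached to different personalized models.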

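The experiments also show that the motion prior generalizes to domains such as 3D cartoons and 2D anime. AnimateDiff therefore provides a simple yet effective baseline for personalized animation: users only bear the cost of the personalized image model and can quickly obtain natural-looking personalized animations.

Official example results are available at: https://animatediff.github.io/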

ModelScope Community Hands-on Tutorial

Environment Setup and Installation

Python 3.8 or later
PyTorch 1.12 or later (2.0 or later recommended)
CUDA 11.4 or later is recommended
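The requirements above can be checked from inside a Notebook before installing anything else. The following is a small sketch, not part of the official tutorial; the helper names parse_version, meets_minimum, and check_environment are made up for illustration.

```python
import re
import sys


def parse_version(text: str) -> tuple:
    """Turn a string such as '2.0.1+cu118' into (2, 0, 1), ignoring build suffixes."""
    return tuple(int(part) for part in re.findall(r"\d+", text.split("+")[0])[:3])


def meets_minimum(found: str, minimum: tuple) -> bool:
    """Compare a version string against a minimum given as a tuple of ints."""
    return parse_version(found) >= minimum


def check_environment() -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or later is required")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if not meets_minimum(torch.__version__, (1, 12)):
        problems.append(f"PyTorch {torch.__version__} is older than 1.12")
    cuda = torch.version.cuda  # None for CPU-only builds
    if cuda is None:
        problems.append("this PyTorch build has no CUDA support")
    elif not meets_minimum(cuda, (11, 4)):
        problems.append(f"CUDA {cuda} is older than the suggested 11.4")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks fine"]:
        print(line)
```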

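The inference code demonstrated in this article runs on the free PAI-DSW instance offered by the ModelScope community (24 GB of GPU memory):

Step 1: Click the Notebook quick-development button to the right of the model and select the GPU environment.

Step 2: Create a new Notebook.

Install the dependencies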

pip install peft diffusers -U

Model Links

The AnimateDiff model comes in several versions. Those currently available on ModelScope are:

https://www.modelscope.cn/models/Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2/summary

https://www.modelscope.cn/models/Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5/summary

https://www.modelscope.cn/models/Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-4/summary

These are three versions of the AnimateDiff motion adapter; version 1.5.2 is recommended.

Model Inference

Once the environment is set up, open the Notebook and run:

import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif
from modelscope import snapshot_download
model_dir = snapshot_download("Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2")
# Load the motion adapter
adapter = MotionAdapter.from_pretrained(model_dir)
# load SD 1.5 based finetuned model
model_id = snapshot_download("wyj123456/Realistic_Vision_V5.1_noVAE")
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler
# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()
output = pipe(
    prompt=(
        "masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
        "orange sky, warm lighting, fishing boats, ocean waves seagulls, "
        "rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
        "golden hour, coastal landscape, seaside scenery"
    ),
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")

The generated result is shown below:

[Animated GIF: output of the inference code above]

The AnimateDiff family also supports a range of camera-motion LoRA models, which provide effects such as rotating the image and zooming in or out:

https://www.modelscope.cn/models/Shanghai_AI_Laboratory/animatediff-motion-lora-zoom-out/summary

https://www.modelscope.cn/models/Shanghai_AI_Laboratory/animatediff-motion-lora-zoom-in/summary

https://www.modelscope.cn/models/Shanghai_AI_Laboratory/animatediff-motion-lora-tilt-up/summary

https://www.modelscope.cn/models/Shanghai_AI_Laboratory/animatediff-motion-lora-tilt-down/summary

https://www.modelscope.cn/models/Shanghai_AI_Laboratory/animatediff-motion-lora-rolling-anticlockwise/summary

https://www.modelscope.cn/models/Shanghai_AI_Laboratory/animatediff-motion-lora-rolling-clockwise/summary

https://www.modelscope.cn/models/Shanghai_AI_Laboratory/animatediff-motion-lora-pan-left/summary

https://www.modelscope.cn/models/Shanghai_AI_Laboratory/animatediff-motion-lora-pan-right/summary

These models can be stacked on top of the motion adapter as follows:

import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif
from modelscope import snapshot_download
model_dir = snapshot_download("Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2")
# Load the motion adapter
adapter = MotionAdapter.from_pretrained(model_dir)
# load SD 1.5 based finetuned model
model_id = snapshot_download("wyj123456/Realistic_Vision_V5.1_noVAE")
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
lora_dir = snapshot_download("Shanghai_AI_Laboratory/animatediff-motion-lora-tilt-down")
pipe.load_lora_weights(lora_dir, adapter_name="tilt-down")
pipe.set_adapters(["tilt-down"], adapter_weights=[1.0])
scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler
# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()
output = pipe(
    prompt=(
        "masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
        "orange sky, warm lighting, fishing boats, ocean waves seagulls, "
        "rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
        "golden hour, coastal landscape, seaside scenery"
    ),
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")

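With the tilt-down camera LoRA applied, the actual result looks like this:

[Animated GIF: output with the tilt-down LoRA applied]

The other LoRA models can be used in the Notebook by following the code in their READMEs.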
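More than one camera LoRA can also be active at the same time. The helper below is a sketch, not code from the tutorial or the READMEs: the function name stack_motion_loras is made up, while load_lora_weights and set_adapters are the same pipeline methods used in the example above. The adapter names are free-form labels, and the weights shown in the usage comment are arbitrary starting points.

```python
def stack_motion_loras(pipe, lora_dirs, weights=None):
    """Load several motion LoRAs into `pipe` and activate them together.

    lora_dirs maps an adapter name (any label you choose) to a local directory
    that holds the LoRA weights. weights is an optional list of scales, one per
    adapter, in the same order as lora_dirs; it defaults to 1.0 for each.
    """
    names = list(lora_dirs)
    if weights is None:
        weights = [1.0] * len(names)
    if len(weights) != len(names):
        raise ValueError("need exactly one weight per LoRA")
    for name in names:
        pipe.load_lora_weights(lora_dirs[name], adapter_name=name)
    # Activate all loaded adapters at once with their individual scales.
    pipe.set_adapters(names, adapter_weights=list(weights))
    return names


# Usage with the pipeline built above (downloads the weights, so left commented):
# from modelscope import snapshot_download
# stack_motion_loras(
#     pipe,
#     {
#         "zoom-out": snapshot_download("Shanghai_AI_Laboratory/animatediff-motion-lora-zoom-out"),
#         "pan-left": snapshot_download("Shanghai_AI_Laboratory/animatediff-motion-lora-pan-left"),
#     },
#     weights=[1.0, 1.0],
# )
```

Lowering one of the weights is a simple way to make that camera motion less dominant when two effects compete.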

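Model Training

AnimateDiff training is now supported in SWIFT, the LLM and AIGC model training and inference framework provided by ModelScope. Based on the implementation in the official repository, we support two training modes: full-parameter and LoRA.

In SWIFT, we ran DDP training on the 2.5M subset of WebVid (2.5 million short videos). After one epoch of training, the results are as follows: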

Prompt1: masterpiece, bestquality, highlydetailed, ultradetailed, Snow rocky mountains peaks canyon. Snow blanketed rocky mountains surround and shadow deep canyons

[Animated GIF: result for Prompt1]

Prompt2: masterpiece, bestquality, highlydetailed, ultradetailed, girl, walking, on the street, flowers

[Animated GIF: result for Prompt2]

Prompt3: masterpiece, bestquality, highlydetailed, ultradetailed, A drone view of celebration with Christmas tree and fireworks, starry sky - background

[Animated GIF: result for Prompt3]

Prompt4: masterpiece, bestquality, highlydetailed, ultradetailed, beautiful house, mountain, snow top

[Animated GIF: result for Prompt4]

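Given the results above, interested developers can download the WebVid 10M dataset (ten million short videos) and train on it with the following steps:

Training AnimateDiff with SWIFT is straightforward. First clone the repository locally: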

git clone https://github.com/modelscope/swift.git
cd swift/examples/pytorch/animatediff

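Before training, download the WebVid dataset locally. The dataset has two parts. One is an index file of text prompts (in CSV format). The other is the short videos themselves: download each video into a single folder using the contentUrl field of the index, and keep each video's file name identical to the file name in its contentUrl. For example:

# contentUrl
https://ak.picdn.net/shutterstock/videos/1030221938/preview/stock-footage-lush-green-banana-palm-leaf-during-summer-in-rainforest-concept-of-travel-to-exotic-tropical.mp4
# the video file must then be named:
stock-footage-lush-green-banana-palm-leaf-during-summer-in-rainforest-concept-of-travel-to-exotic-tropical.mp4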
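The naming rule above can be expressed in a few lines of standard-library Python. This is a sketch rather than part of SWIFT: video_filename and missing_videos are hypothetical helpers, and the only thing assumed about the index file is that it has a contentUrl column, as described above.

```python
import csv
import os
import posixpath
from urllib.parse import urlparse


def video_filename(content_url: str) -> str:
    """Return the last path component of a contentUrl (query strings are ignored)."""
    return posixpath.basename(urlparse(content_url).path)


def missing_videos(csv_path: str, video_folder: str) -> list:
    """List file names referenced by the CSV index that are absent from video_folder."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        names = [video_filename(row["contentUrl"]) for row in csv.DictReader(handle)]
    return [name for name in names if not os.path.isfile(os.path.join(video_folder, name))]


example_url = (
    "https://ak.picdn.net/shutterstock/videos/1030221938/preview/"
    "stock-footage-lush-green-banana-palm-leaf-during-summer-in-rainforest-"
    "concept-of-travel-to-exotic-tropical.mp4"
)
print(video_filename(example_url))
```

Running missing_videos on the index and the download folder before launching training is a cheap way to catch incomplete downloads.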

Then simply change the dataset paths in the shell command below:

# Edit scripts/full/sft.sh
# Experimental environment: A100 * 4
# About 200GB of GPU memory in total
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
torchrun --nproc_per_node=4 animatediff_sft.py \
  --model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
  --csv_path /path/to/webvid/results_2M_train.csv \
  --video_folder /path/to/videos \
  --sft_type full \
  --lr_scheduler_type constant \
  --trainable_modules '.*motion_modules.*' \
  --batch_size 4 \
  --eval_steps 100 \
  --gradient_accumulation_steps 16

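Our test run used mixed-precision DDP training on 4 A100 GPUs. Training on the 2.5 million videos took about 40 hours and used about 200 GB of GPU memory in total.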
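The --trainable_modules flag in the full-parameter command above takes a regular expression. The sketch below shows roughly what such a filter does. It assumes the pattern has to match the whole parameter name; how SWIFT applies the pattern internally is not stated in this article. The parameter names are made up for illustration and are not taken from a real checkpoint.

```python
import re


def select_trainable(param_names, pattern=r".*motion_modules.*"):
    """Return the parameter names that fully match the regular expression."""
    regex = re.compile(pattern)
    return [name for name in param_names if regex.fullmatch(name)]


# Illustrative names only: two from the image model, two from the motion module.
example_names = [
    "down_blocks.0.resnets.0.conv1.weight",
    "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.weight",
    "down_blocks.0.motion_modules.0.transformer_blocks.0.attn1.to_q.weight",
    "up_blocks.1.motion_modules.2.proj_out.weight",
]

for name in select_trainable(example_names):
    print("trainable:", name)
```

Everything the pattern does not select stays frozen, which matches the design described earlier: the image model is untouched and only the motion module learns.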

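To run inference with the trained weights, use the following command. It starts a command-line prompt where you enter text and get short videos as output: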

# Experimental environment: A100
# 18GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python animatediff_infer.py \
  --model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
  --sft_type full \
  --ckpt_dir /output/path/like/checkpoints/iter-xxx \
  --eval_human true

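By comparison, we recommend LoRA training, because it can load Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2 directly and continue training from its results.

The dataset format is the same as above; the launch command differs slightly: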

# Experimental environment: A100
# 20GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python animatediff_sft.py \
  --model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
  --csv_path /path/to/webvid/results_2M_train.csv \
  --video_folder /path/to/videos \
  --motion_adapter_id_or_path Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2 \
  --sft_type lora \
  --lr_scheduler_type constant \
  --trainable_modules '.*motion_modules.*' \
  --batch_size 1 \
  --eval_steps 200 \
  --dataset_sample_size 10000 \
  --gradient_accumulation_steps 16

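Here we use mixed-precision single-GPU training with batch_size=1 and 16 gradient accumulation steps (GA=16), which takes about 20 GB of GPU memory. Developers do not have to use the WebVid dataset; they can train on their own specific short videos instead.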
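It helps to compare how many samples contribute to each optimizer step in the two setups. The arithmetic below is a sketch that assumes --batch_size is the per-GPU batch size and that the dataset sizes are the nominal figures quoted in this article (2.5 million videos for the full run, 10000 sampled videos for the LoRA run); the function names are made up for illustration.

```python
import math


def effective_batch_size(per_gpu_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Samples that contribute to a single optimizer step."""
    return per_gpu_batch * grad_accum_steps * num_gpus


def optimizer_steps_per_epoch(num_samples: int, effective_batch: int) -> int:
    """Optimizer steps needed to see every sample once (last partial batch kept)."""
    return math.ceil(num_samples / effective_batch)


full = effective_batch_size(per_gpu_batch=4, grad_accum_steps=16, num_gpus=4)
lora = effective_batch_size(per_gpu_batch=1, grad_accum_steps=16, num_gpus=1)

print(full, optimizer_steps_per_epoch(2_500_000, full))  # 256 9766
print(lora, optimizer_steps_per_epoch(10_000, lora))     # 16 625
```

Under these assumptions the LoRA run updates the weights with a 16 times smaller effective batch, which is worth keeping in mind when choosing a learning rate for a small custom dataset.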

The inference command after LoRA training is also slightly different:

# Experimental environment: A100
# 18GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python animatediff_infer.py \
  --model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
  --motion_adapter_id_or_path Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2 \
  --sft_type lora \
  --ckpt_dir /output/path/like/checkpoints/iter-xxx \
  --eval_human true

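That's all! We hope this article serves as a starting point, and we look forward to more AIGC creators building new works and new ways to play on top of the AnimateDiff open-source ecosystem!

Go straight to the open-source model: https://www.modelscope.cn/models/Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2/summary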
