LoRA Training_Q&A - Alibaba Cloud Developer Community

Background:
I plan to train a character LoRA (Annie) for the wan2.2 video model, intended for animation production in a realistic 3D style.
I have never trained a LoRA before, but today I successfully deployed DiffSynth-Studio and completed training using one of the official example projects.
Now I would like to officially begin my character LoRA training workflow, and I still have many questions regarding dataset construction and captioning.
My intended usage of the Annie LoRA is something like:
“Annie is reading by a stream, while another character is fishing nearby.”
My goal is:
Annie’s appearance remains correct and consistent
Other characters do NOT inherit Annie’s appearance

1. Training Dataset: Image-Related Questions
1.1 Do training samples require close-up facial images?
1.2 Do training samples require upper-body shots?
1.3 Do training samples require full-body images?
1.4 Should the dataset include various facial expressions (crying, smiling, angry, etc.)?
1.5 Are back-view images required? (I can provide them)
1.6 Are full 360-degree angle images (top, bottom, left, right) required? (I can provide them)
1.7 Should the dataset include various poses (squatting, sitting, standing, running, jumping, etc.)? (I can provide them)
1.8 Should the dataset include different outfits (styles, colors, etc.)? (I can provide them)
1.9 Should the dataset include different hairstyles (long, short, various styles)? (I can provide them)
1.10 Should the dataset include different solid-color backgrounds (pure white, gray, black, etc.)? (I can provide them)
1.11 Should hats be avoided in the training dataset?
1.12 Are there any other important image-related recommendations?
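As a concrete illustration of the image questions above, here is a minimal sketch of how a dataset could be tagged and checked for coverage gaps before training. The `Sample` fields, the attribute values, and the file names are all hypothetical and only mirror the categories asked about (framing, view, expression, pose, outfit, background); nothing here comes from DiffSynth-Studio.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One training image with hand-assigned tags (all fields are hypothetical)."""
    filename: str
    framing: str      # e.g. "close-up", "upper-body", "full-body"
    view: str         # e.g. "front", "side", "back"
    expression: str   # e.g. "smiling", "angry", "neutral"
    pose: str         # e.g. "standing", "sitting", "running"
    outfit: str
    background: str


def coverage(samples, attribute):
    """Count how many samples carry each value of the given attribute."""
    return Counter(getattr(s, attribute) for s in samples)


def missing_values(samples, attribute, wanted):
    """Return the wanted values that no sample covers yet, in the given order."""
    present = coverage(samples, attribute)
    return [value for value in wanted if present[value] == 0]


# Made-up example entries, not real data.
samples = [
    Sample("annie_001.png", "close-up", "front", "smiling", "standing", "red dress", "white"),
    Sample("annie_002.png", "full-body", "back", "neutral", "sitting", "blue jacket", "gray"),
    Sample("annie_003.png", "close-up", "side", "angry", "standing", "red dress", "black"),
]

print(coverage(samples, "framing"))
print(missing_values(samples, "framing", ["close-up", "upper-body", "full-body"]))
```

A tally like this does not say which categories are actually needed; it only shows where the dataset is thin once that has been decided.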
2. Training Dataset: Caption / Description Questions
2.1 Should the character name (Annie) be placed at the beginning of each caption?
2.2 Should facial features (eyes, mouth, nose, face shape, etc.) be described?
2.3 Should camera distance and angles (close-up, wide shot, left, right, top-down, etc.) be described?
2.4 Should facial expressions be described?
2.5 Should poses or actions be described?
2.6 Should clothing details (style, color, etc.) be described?
2.7 If the hairstyle is fixed, should it still be described?
2.8 If the hairstyle is not fixed, should it be described?
2.9 If hats appear, should their presence be explicitly described?
2.10 Should the background (solid color or scene) be described?
2.11 Should the character style (realistic 3D) be explicitly stated?
2.12 Should gender be described?
2.13 Should age be described?
2.14 Should body type be described?
2.15 Are there any additional important captioning recommendations?
2.16 To prevent other characters from inheriting Annie’s appearance, should captions emphasize Annie’s unique features?
(e.g., “Only Annie has red hair”)
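To make the captioning questions concrete, here is a minimal sketch of one possible caption template: character name first, followed by the variable attributes (framing, view, expression, action, outfit, background, style). Whether this ordering or level of detail is right is exactly what is being asked. The function names and the `file_name` / `prompt` column names are placeholders, not a confirmed DiffSynth-Studio format, and should be matched to whatever the training script actually expects.

```python
import csv
import io

TRIGGER = "Annie"  # character name placed first in every caption (assumption under question 2.1)


def build_caption(framing, view, expression, action, outfit, background,
                  style="realistic 3D style"):
    """Assemble one caption from per-image attributes; empty parts are skipped."""
    parts = [
        TRIGGER,
        framing,
        f"{view} view" if view else "",
        f"{expression} expression" if expression else "",
        action,
        f"wearing {outfit}" if outfit else "",
        f"{background} background" if background else "",
        style,
    ]
    return ", ".join(p for p in parts if p)


def metadata_csv(rows, file_column="file_name", caption_column="prompt"):
    """Render (filename, caption) pairs as CSV text; column names are placeholders."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([file_column, caption_column])
    for filename, caption in rows:
        writer.writerow([filename, caption])  # captions with commas get quoted automatically
    return buffer.getvalue()


caption = build_caption("close-up", "front", "smiling", "reading a book",
                        "a red dress", "plain white")
print(caption)
print(metadata_csv([("annie_001.png", caption)]))
```

Keeping the attributes as separate arguments makes it easy to drop one (for example, omitting the hairstyle or outfit when it is fixed) and compare how the resulting LoRA behaves.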
3. Training Dataset: Video Sample Questions
3.1 Are video samples required (e.g., a 360-degree rotation video of Annie)?
3.2 Are video samples required (e.g., a slow zoom-in / zoom-out shot of Annie)?
I would greatly appreciate any clarification; even an answer to just one of these questions would be extremely helpful. 🙏
