Best Practices for Fine-Tuning Kolors (可图) on Consumer-Grade GPUs!

2024-07-15

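Copyright notice: This article was voluntarily contributed by a real-name registered Alibaba Cloud user, and the copyright belongs to the original author. The Alibaba Cloud Developer Community does not own the copyright and assumes no related legal liability. For the specific rules, see the "Alibaba Cloud Developer Community User Service Agreement" and the "Alibaba Cloud Developer Community Intellectual Property Protection Guidelines". If you find suspected plagiarism in this community, report it through the infringement complaint form; once verified, the community will remove the infringing content immediately.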

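Kuaishou recently open-sourced Kolors (可图), a text-to-image generation model that understands both English and Chinese well and generates high-quality, photorealistic images.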

The ModelScope community provides a Kolors fine-tuning script in DiffSynth-Studio.

Source code:

https://github.com/Kwai-Kolors/Kolors

Model weights:

https://modelscope.cn/models/Kwai-Kolors/Kolors

Technical report:

https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf

Fine-tuning scripts:

https://github.com/modelscope/DiffSynth-Studio/tree/main/examples/train/kolors

Fine-Tuning Best Practices

Download the model weights

Download the Kolors model:

modelscope download --model=Kwai-Kolors/Kolors --local_dir models/kolors/Kolors

Download the additional fp16-fixed SDXL VAE (https://modelscope.cn/models/AI-ModelScope/sdxl-vae-fp16-fix); only its weights file is needed:

modelscope download --model=AI-ModelScope/sdxl-vae-fp16-fix --local_dir models/sdxl-vae-fp16-fix diffusion_pytorch_model.safetensors

Model file structure:

models
├── kolors
│   └── Kolors
│       ├── text_encoder
│       │   ├── config.json
│       │   ├── pytorch_model-00001-of-00007.bin
│       │   ├── pytorch_model-00002-of-00007.bin
│       │   ├── pytorch_model-00003-of-00007.bin
│       │   ├── pytorch_model-00004-of-00007.bin
│       │   ├── pytorch_model-00005-of-00007.bin
│       │   ├── pytorch_model-00006-of-00007.bin
│       │   ├── pytorch_model-00007-of-00007.bin
│       │   └── pytorch_model.bin.index.json
│       ├── unet
│       │   └── diffusion_pytorch_model.safetensors
│       └── vae
│           └── diffusion_pytorch_model.safetensors
└── sdxl-vae-fp16-fix
    └── diffusion_pytorch_model.safetensors
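Because the training command refers to these files by relative path, a quick existence check can save a failed launch. The sketch below is a hypothetical helper (not part of DiffSynth-Studio) that assumes the models/ root and the exact file names shown in the tree above.

```python
from pathlib import Path

# Relative paths taken from the file tree above (root = the models/ directory).
# This list is an assumption based on that tree; adjust it if your layout differs.
TEXT_ENCODER = "kolors/Kolors/text_encoder"
EXPECTED_FILES = (
    [f"{TEXT_ENCODER}/config.json", f"{TEXT_ENCODER}/pytorch_model.bin.index.json"]
    + [f"{TEXT_ENCODER}/pytorch_model-{i:05d}-of-00007.bin" for i in range(1, 8)]
    + [
        "kolors/Kolors/unet/diffusion_pytorch_model.safetensors",
        "kolors/Kolors/vae/diffusion_pytorch_model.safetensors",
        "sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors",
    ]
)


def missing_model_files(models_dir="models"):
    """Return the expected files that do not exist under models_dir."""
    root = Path(models_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected model files are present.")
```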

Fine-tuning:

Install dependencies:

pip install peft lightning pandas torchvision
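To confirm the install worked in the environment you will train from, a small sketch like the following can be used. It only checks that the import names of the four packages above (assumed to be peft, lightning, pandas and torchvision) can be found; it does not check versions.

```python
import importlib.util

# Import names of the packages installed by the pip command above.
REQUIRED_MODULES = ["peft", "lightning", "pandas", "torchvision"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```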

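Data preparation:

We have prepared a few open-source datasets:

Corgi dataset:

https://modelscope.cn/datasets/buptwq/lora-stable-diffusion-finetune

Text-to-image style customization dataset (metadata translated into Chinese):

https://modelscope.cn/datasets/iic/style_custom_dataset

Datasets should be organized in the following format: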

data/dog/
└── train
    ├── 00.jpg
    ├── 01.jpg
    ├── 02.jpg
    ├── 03.jpg
    ├── 04.jpg
    └── metadata.csv
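A long training run is an expensive way to discover a typo in the dataset. The hypothetical helper below checks a dataset directory against this layout and against the file_name,text columns of metadata.csv described below; it reports missing images and empty captions. It assumes metadata.csv is UTF-8 (a BOM, as written by some spreadsheet tools, is tolerated).

```python
import csv
from pathlib import Path


def validate_dataset(dataset_path):
    """Check the <dataset_path>/train layout and its metadata.csv.

    Returns a list of human-readable problems; an empty list means the
    layout looks fine.
    """
    train_dir = Path(dataset_path) / "train"
    metadata = train_dir / "metadata.csv"
    if not metadata.is_file():
        return [f"missing {metadata}"]

    problems = []
    # utf-8-sig also accepts files that start with a byte-order mark.
    with metadata.open(newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or not {"file_name", "text"} <= set(reader.fieldnames):
            return [f"{metadata} must have the columns file_name,text"]
        for row in reader:
            name = row["file_name"] or ""
            if not name or not (train_dir / name).is_file():
                problems.append(f"image not found: {name}")
            # Short rows yield None for missing fields, so guard before strip().
            if not (row["text"] or "").strip():
                problems.append(f"empty caption for {name}")
    return problems
```

For example, validate_dataset("data/dog") returns an empty list when every listed image exists and has a caption.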

metadata.csv (the text column holds each image's caption; 一只小狗 means "a puppy"):

file_name,text
00.jpg,一只小狗
01.jpg,一只小狗
02.jpg,一只小狗
03.jpg,一只小狗
04.jpg,一只小狗
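When every image shares one caption, as in the corgi example, metadata.csv does not need to be written by hand. The sketch below is a hypothetical helper that lists the images in a train/ directory (assumed extensions: .jpg, .jpeg, .png) and writes them with a single caption in the file_name,text format shown above.

```python
import csv
from pathlib import Path


def write_metadata(train_dir, caption, extensions=(".jpg", ".jpeg", ".png")):
    """Write train_dir/metadata.csv giving every image the same caption.

    Returns the number of images listed.
    """
    train_dir = Path(train_dir)
    # Collect image names first, so the new metadata.csv is never listed.
    images = sorted(
        p.name for p in train_dir.iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
    with (train_dir / "metadata.csv").open("w", newline="", encoding="utf-8") as f:
        # Use "\n" line endings to match the sample file above.
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["file_name", "text"])
        for name in images:
            writer.writerow([name, caption])
    return len(images)
```

For the corgi data, write_metadata("data/dog/train", "一只小狗") would reproduce the file above.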

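Train a LoRA model:

We provide the training script train_kolors_lora.py. Before running it, clone the DiffSynth-Studio repository and run the commands below from its root directory (the models/ and data/ paths in the command are relative to it):

git clone https://github.com/modelscope/DiffSynth-Studio.git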
cd DiffSynth-Studio

With the following settings, training requires 22 GB of VRAM:

CUDA_VISIBLE_DEVICES="0" python examples/train/kolors/train_kolors_lora.py \
  --pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors \
  --pretrained_text_encoder_path models/kolors/Kolors/text_encoder \
  --pretrained_fp16_vae_path models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors \
  --dataset_path data/dog \
  --output_path ./models \
  --max_epochs 10 \
  --center_crop \
  --use_gradient_checkpointing \
  --precision "16-mixed"
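If you launch several runs, for example one per dataset, it can help to assemble the command in Python. The wrapper below is a hypothetical convenience, not part of DiffSynth-Studio: it rebuilds exactly the flags of the command above, lets you append any of the optional arguments, and pins the run to one GPU through CUDA_VISIBLE_DEVICES. It assumes the current directory is the repository root.

```python
import os
import subprocess
import sys


def build_train_command(dataset_path, output_path="./models", max_epochs=10,
                        precision="16-mixed", extra_args=()):
    """Assemble the argv for train_kolors_lora.py using the flags shown above."""
    cmd = [
        sys.executable, "examples/train/kolors/train_kolors_lora.py",
        "--pretrained_unet_path", "models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors",
        "--pretrained_text_encoder_path", "models/kolors/Kolors/text_encoder",
        "--pretrained_fp16_vae_path", "models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors",
        "--dataset_path", str(dataset_path),
        "--output_path", str(output_path),
        "--max_epochs", str(max_epochs),
        "--center_crop",
        "--use_gradient_checkpointing",
        "--precision", precision,  # one of "32", "16", "16-mixed"
    ]
    cmd.extend(extra_args)
    return cmd


def run_training(cmd, gpu="0"):
    """Run the command on a single GPU; raises CalledProcessError on failure."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    subprocess.run(cmd, env=env, check=True)
```

For example, run_training(build_train_command("data/dog", extra_args=["--lora_rank", "4", "--random_flip"])) adds two of the optional arguments listed below.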

Optional arguments:

  -h, --help            show this help message and exit
  --pretrained_unet_path PRETRAINED_UNET_PATH
                        Path to pretrained model (UNet). For example, `models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors`.
  --pretrained_text_encoder_path PRETRAINED_TEXT_ENCODER_PATH
                        Path to pretrained model (Text Encoder). For example, `models/kolors/Kolors/text_encoder`.
  --pretrained_fp16_vae_path PRETRAINED_FP16_VAE_PATH
                        Path to pretrained model (VAE). For example, `models/kolors/Kolors/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors`.
  --dataset_path DATASET_PATH
                        The path of the Dataset.
  --output_path OUTPUT_PATH
                        Path to save the model.
  --steps_per_epoch STEPS_PER_EPOCH
                        Number of steps per epoch.
  --height HEIGHT       Image height.
  --width WIDTH         Image width.
  --center_crop         Whether to center crop the input images to the resolution. If not set, the images will be randomly cropped. The images will be resized to the resolution first before cropping.
  --random_flip         Whether to randomly flip images horizontally
  --batch_size BATCH_SIZE
                        Batch size (per device) for the training dataloader.
  --dataloader_num_workers DATALOADER_NUM_WORKERS
                        Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process.
  --precision {32,16,16-mixed}
                        Training precision
  --learning_rate LEARNING_RATE
                        Learning rate.
  --lora_rank LORA_RANK
                        The dimension of the LoRA update matrices.
  --lora_alpha LORA_ALPHA
                        The weight of the LoRA update matrices.
  --use_gradient_checkpointing
                        Whether to use gradient checkpointing.
  --accumulate_grad_batches ACCUMULATE_GRAD_BATCHES
                        The number of batches in gradient accumulation.
  --training_strategy {auto,deepspeed_stage_1,deepspeed_stage_2,deepspeed_stage_3}
                        Training strategy
  --max_epochs MAX_EPOCHS
                        Number of epochs.
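To make --lora_rank, --lora_alpha and --accumulate_grad_batches concrete: in PEFT's LoRA, a frozen weight W of shape d_out x d_in is adapted as W + (lora_alpha / lora_rank) · B·A, where B is d_out x r and A is r x d_in; with init_lora_weights="gaussian", PEFT draws A from a Gaussian and sets B to zero, so training starts from the unmodified model. The pure-Python sketch below shows that formula, the parameter count r · (d_in + d_out) per adapted layer, and the usual effective batch size under gradient accumulation. It is an illustration, not DiffSynth-Studio code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_merged_weight(W, A, B, lora_alpha, lora_rank):
    """Return W + (lora_alpha / lora_rank) * B @ A.

    W: d_out x d_in (frozen); B: d_out x r and A: r x d_in (trainable).
    """
    scaling = lora_alpha / lora_rank
    delta = matmul(B, A)
    return [[w + scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


def lora_param_count(d_in, d_out, lora_rank):
    """Trainable parameters LoRA adds to one d_out x d_in linear layer."""
    return lora_rank * (d_in + d_out)


def effective_batch_size(batch_size, accumulate_grad_batches, num_devices=1):
    """Samples contributing to each optimizer step with gradient accumulation."""
    return batch_size * accumulate_grad_batches * num_devices
```

For a hypothetical 1280 x 1280 projection, rank 4 adds 4 × (1280 + 1280) = 10,240 trainable parameters against 1,638,400 frozen ones, which is why LoRA fits on a consumer GPU. Because the scaling is lora_alpha / lora_rank, the same two values must be passed again at inference time.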

Inference after training

After training, you can use your own LoRA to generate new images. Here is an example:

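from diffsynth import ModelManager, KolorsImagePipeline
from peft import LoraConfig, inject_adapter_in_model
import torch


def load_lora(model, lora_rank, lora_alpha, lora_path):
    # Rank, alpha and target modules must match the values used during
    # training, so the injected layers line up with the saved LoRA weights.
    lora_config = LoraConfig(
        r=lora_rank,
        lora_alpha=lora_alpha,
        init_lora_weights="gaussian",
        target_modules=["to_q", "to_k", "to_v", "to_out"],
    )
    # Wrap the matching attention projections of the UNet with LoRA adapters.
    model = inject_adapter_in_model(lora_config, model)
    # Load the trained weights; strict=False because the frozen base UNet
    # weights are already loaded and are not expected in the LoRA checkpoint.
    state_dict = torch.load(lora_path, map_location="cpu")
    model.load_state_dict(state_dict, strict=False)
    return model
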
# Load models
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
                             file_path_list=[
                                 "models/kolors/Kolors/text_encoder",
                                 "models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors",
                                 "models/kolors/Kolors/vae/diffusion_pytorch_model.safetensors"
                             ])
pipe = KolorsImagePipeline.from_model_manager(model_manager)
# Generate an image with lora
pipe.unet = load_lora(
    pipe.unet,
    lora_rank=4, lora_alpha=4.0, # The two parameters should be consistent with those in your training script.
    lora_path="path/to/your/lora/model/lightning_logs/version_x/checkpoints/epoch=x-step=xxx.ckpt"
)
torch.manual_seed(0)
image = pipe(
    prompt="一只小狗蹦蹦跳跳，周围是姹紫嫣红的鲜花，远处是山脉",
    negative_prompt="",
    cfg_scale=4,
    num_inference_steps=50, height=1024, width=1024,
)
image.save("image_with_lora.jpg")
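The lora_path above is a placeholder for a PyTorch Lightning checkpoint named like epoch=x-step=xxx.ckpt inside lightning_logs/version_x/checkpoints/. Assuming that layout under your training output directory, the hypothetical helper below picks the newest checkpoint (highest version, then epoch, then step) so you do not have to copy the file name by hand.

```python
import re
from pathlib import Path

# Matches Lightning's default checkpoint names such as "epoch=9-step=500.ckpt".
_CKPT_RE = re.compile(r"epoch=(\d+)-step=(\d+)\.ckpt$")
_VERSION_RE = re.compile(r"version_(\d+)$")


def find_latest_checkpoint(output_path="./models"):
    """Return the newest .ckpt under output_path/lightning_logs, or None."""
    best, best_key = None, None
    for ckpt in Path(output_path).glob("lightning_logs/version_*/checkpoints/*.ckpt"):
        version = _VERSION_RE.search(ckpt.parent.parent.name)
        match = _CKPT_RE.search(ckpt.name)
        if version is None or match is None:
            continue  # skip files that do not follow the default naming
        key = (int(version.group(1)), int(match.group(1)), int(match.group(2)))
        if best_key is None or key > best_key:
            best, best_key = ckpt, key
    return best
```

The result can be passed as lora_path=str(find_latest_checkpoint("./models")) when ./models was the --output_path used for training.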

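Corgi LoRA:

Prompt: 一只小狗蹦蹦跳跳，周围是姹紫嫣红的鲜花，远处是山脉 (a puppy bouncing around, surrounded by colorful flowers, with mountains in the distance)

3D-style LoRA:

Prompt: 一只小狗和一只小猫3D (a puppy and a kitten, 3D)

Direct link to the model: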

https://modelscope.cn/models/Kwai-Kolors/Kolors?from=alizishequ__text
