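Recently, Kuaishou open-sourced Kolors (可图), a text-to-image generation model that understands both English and Chinese well and can generate high-quality, photorealistic images.
The ModelScope community provides fine-tuning scripts for Kolors in DiffSynth-Studio.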
Code repository:
https://github.com/Kwai-Kolors/Kolors
Model weights:
https://modelscope.cn/models/Kwai-Kolors/Kolors
Technical report:
https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf
Fine-tuning scripts:
https://github.com/modelscope/DiffSynth-Studio/tree/main/examples/train/kolors
Fine-tuning best practices
Downloading the model weights
Download the Kolors model:
modelscope download --model=Kwai-Kolors/Kolors --local_dir models/kolors/Kolors
Download the additional VAE model (https://modelscope.cn/models/AI-ModelScope/sdxl-vae-fp16-fix):
modelscope download --model=AI-ModelScope/sdxl-vae-fp16-fix --local_dir models/sdxl-vae-fp16-fix diffusion_pytorch_model.safetensors
Resulting model file layout:
models
├── kolors
│   └── Kolors
│       ├── text_encoder
│       │   ├── config.json
│       │   ├── pytorch_model-00001-of-00007.bin
│       │   ├── pytorch_model-00002-of-00007.bin
│       │   ├── pytorch_model-00003-of-00007.bin
│       │   ├── pytorch_model-00004-of-00007.bin
│       │   ├── pytorch_model-00005-of-00007.bin
│       │   ├── pytorch_model-00006-of-00007.bin
│       │   ├── pytorch_model-00007-of-00007.bin
│       │   └── pytorch_model.bin.index.json
│       ├── unet
│       │   └── diffusion_pytorch_model.safetensors
│       └── vae
│           └── diffusion_pytorch_model.safetensors
└── sdxl-vae-fp16-fix
    └── diffusion_pytorch_model.safetensors
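After both downloads, a quick check can confirm that the files sit where the training command expects them. This is a minimal sketch, not part of DiffSynth-Studio: `EXPECTED_FILES` and `missing_model_files` are hypothetical names, the list simply mirrors the tree above, and the script assumes it is run from the directory that contains `models/`.

```python
from pathlib import Path

# Files the layout above should contain, relative to the working directory.
EXPECTED_FILES = [
    "models/kolors/Kolors/text_encoder/config.json",
    "models/kolors/Kolors/text_encoder/pytorch_model.bin.index.json",
    "models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors",
    "models/kolors/Kolors/vae/diffusion_pytorch_model.safetensors",
    "models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors",
]
# The text encoder weights are sharded into seven .bin files.
EXPECTED_FILES += [
    f"models/kolors/Kolors/text_encoder/pytorch_model-{i:05d}-of-00007.bin"
    for i in range(1, 8)
]


def missing_model_files(root="."):
    """Return the expected files that do not exist under `root`, in list order."""
    base = Path(root)
    return [p for p in EXPECTED_FILES if not (base / p).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing files:")
        for p in missing:
            print("  " + p)
    else:
        print("All expected model files are present.")
```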
Fine-tuning
Install the dependencies:
pip install peft lightning pandas torchvision
Data preparation
We have prepared some open-source datasets:
Corgi dataset:
https://modelscope.cn/datasets/buptwq/lora-stable-diffusion-finetune
Text-to-image style customization dataset (with the metadata translated into Chinese):
https://modelscope.cn/datasets/iic/style_custom_dataset
Organize the dataset as follows:
data/dog/
└── train
    ├── 00.jpg
    ├── 01.jpg
    ├── 02.jpg
    ├── 03.jpg
    ├── 04.jpg
    └── metadata.csv
metadata.csv (the caption 一只小狗 means "a puppy"):
file_name,text
00.jpg,一只小狗
01.jpg,一只小狗
02.jpg,一只小狗
03.jpg,一只小狗
04.jpg,一只小狗
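For a new subject, `metadata.csv` can be generated instead of written by hand. The sketch below assumes, as in the corgi example, that every image shares one caption; `write_metadata` and `check_metadata` are hypothetical helpers, and the set of accepted image extensions is an assumption. It writes the `file_name,text` header shown above (UTF-8, so Chinese captions survive on any platform) and reports rows whose image file is missing.

```python
import csv
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def write_metadata(train_dir, caption):
    """Write train_dir/metadata.csv with one row per image, all sharing `caption`.

    Returns the sorted list of image file names that were written.
    """
    train_dir = Path(train_dir)
    images = sorted(
        p.name for p in train_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    with open(train_dir / "metadata.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file_name", "text"])
        for name in images:
            writer.writerow([name, caption])
    return images


def check_metadata(train_dir):
    """Return file_name entries in metadata.csv that have no matching image file."""
    train_dir = Path(train_dir)
    with open(train_dir / "metadata.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return [r["file_name"] for r in rows if not (train_dir / r["file_name"]).is_file()]
```

For the corgi data this would be `write_metadata("data/dog/train", "一只小狗")`; individual captions can then be edited per image.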
Training the LoRA model
We provide the training script train_kolors_lora.py. Before running it, clone the DiffSynth-Studio repository:
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
With the following settings, training requires 22 GB of VRAM:
CUDA_VISIBLE_DEVICES="0" python examples/train/kolors/train_kolors_lora.py \
  --pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors \
  --pretrained_text_encoder_path models/kolors/Kolors/text_encoder \
  --pretrained_fp16_vae_path models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors \
  --dataset_path data/dog \
  --output_path ./models \
  --max_epochs 10 \
  --center_crop \
  --use_gradient_checkpointing \
  --precision "16-mixed"
Optional arguments:
  -h, --help            show this help message and exit
  --pretrained_unet_path PRETRAINED_UNET_PATH
                        Path to pretrained model (UNet). For example, `models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors`.
  --pretrained_text_encoder_path PRETRAINED_TEXT_ENCODER_PATH
                        Path to pretrained model (Text Encoder). For example, `models/kolors/Kolors/text_encoder`.
  --pretrained_fp16_vae_path PRETRAINED_FP16_VAE_PATH
                        Path to pretrained model (VAE). For example, `models/kolors/Kolors/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors`.
  --dataset_path DATASET_PATH
                        The path of the Dataset.
  --output_path OUTPUT_PATH
                        Path to save the model.
  --steps_per_epoch STEPS_PER_EPOCH
                        Number of steps per epoch.
  --height HEIGHT       Image height.
  --width WIDTH         Image width.
  --center_crop         Whether to center crop the input images to the resolution. If not set, the images will be randomly cropped. The images will be resized to the resolution first before cropping.
  --random_flip         Whether to randomly flip images horizontally
  --batch_size BATCH_SIZE
                        Batch size (per device) for the training dataloader.
  --dataloader_num_workers DATALOADER_NUM_WORKERS
                        Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process.
  --precision {32,16,16-mixed}
                        Training precision
  --learning_rate LEARNING_RATE
                        Learning rate.
  --lora_rank LORA_RANK
                        The dimension of the LoRA update matrices.
  --lora_alpha LORA_ALPHA
                        The weight of the LoRA update matrices.
  --use_gradient_checkpointing
                        Whether to use gradient checkpointing.
  --accumulate_grad_batches ACCUMULATE_GRAD_BATCHES
                        The number of batches in gradient accumulation.
  --training_strategy {auto,deepspeed_stage_1,deepspeed_stage_2,deepspeed_stage_3}
                        Training strategy
  --max_epochs MAX_EPOCHS
                        Number of epochs.
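The `--lora_rank` and `--lora_alpha` options correspond to the standard LoRA formulation: a frozen weight W receives a trainable low-rank update B·A of rank r, scaled by lora_alpha / r (peft's `LoraConfig` applies the same lora_alpha / r scaling by default). The NumPy sketch below is illustrative only, not DiffSynth-Studio code; `lora_forward` and `merge_lora` are hypothetical helpers and the layer sizes are arbitrary. It shows that the adapter path and the merged weight produce the same output, and how few parameters the adapter adds. With rank 4 and alpha 4.0, as in the inference example, the scale is 1.0.

```python
import numpy as np


def lora_forward(x, W, A, B, lora_alpha):
    """LoRA forward pass: y = x W^T + (lora_alpha / r) * x A^T B^T.

    W: frozen weight, shape (out_features, in_features)
    A: trainable down-projection, shape (r, in_features)
    B: trainable up-projection, shape (out_features, r)
    """
    r = A.shape[0]
    return x @ W.T + (lora_alpha / r) * (x @ A.T @ B.T)


def merge_lora(W, A, B, lora_alpha):
    """Fold the low-rank update into one weight: W + (lora_alpha / r) * B A."""
    r = A.shape[0]
    return W + (lora_alpha / r) * (B @ A)


rng = np.random.default_rng(0)
in_features, out_features, rank = 64, 32, 4
W = rng.standard_normal((out_features, in_features))
A = rng.standard_normal((rank, in_features)) * 0.01
# peft starts B at zero, so an untrained adapter is a no-op; B is random here
# only so that the low-rank update has a visible effect in this demo.
B = rng.standard_normal((out_features, rank)) * 0.01
x = rng.standard_normal((2, in_features))

y = lora_forward(x, W, A, B, lora_alpha=4.0)
y_merged = x @ merge_lora(W, A, B, lora_alpha=4.0).T
print(np.allclose(y, y_merged))  # → True
# Trainable parameters for this layer: LoRA vs. full fine-tuning.
print(rank * (in_features + out_features), out_features * in_features)  # → 384 2048
```

Because the LoRA layers are rebuilt from `lora_rank` and `lora_alpha` at inference time, a different rank would not fit the saved tensors, while a different alpha would silently rescale the learned update. This is why both values must match the training run.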
Inference after training
After training, you can use your own LoRA to generate new images, for example:
from diffsynth import ModelManager, KolorsImagePipeline
from peft import LoraConfig, inject_adapter_in_model
import torch


def load_lora(model, lora_rank, lora_alpha, lora_path):
    # Add LoRA layers to the attention projections (to_q/to_k/to_v/to_out)
    lora_config = LoraConfig(
        r=lora_rank,
        lora_alpha=lora_alpha,
        init_lora_weights="gaussian",
        target_modules=["to_q", "to_k", "to_v", "to_out"],
    )
    model = inject_adapter_in_model(lora_config, model)
    # Load the trained LoRA weights; strict=False tolerates keys
    # (such as the frozen base weights) that are absent from the checkpoint
    state_dict = torch.load(lora_path, map_location="cpu")
    model.load_state_dict(state_dict, strict=False)
    return model


# Load models
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
                             file_path_list=[
                                 "models/kolors/Kolors/text_encoder",
                                 "models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors",
                                 "models/kolors/Kolors/vae/diffusion_pytorch_model.safetensors"
                             ])
pipe = KolorsImagePipeline.from_model_manager(model_manager)

# Generate an image with lora
pipe.unet = load_lora(
    pipe.unet,
    lora_rank=4, lora_alpha=4.0,  # The two parameters should be consistent with those in your training script.
    lora_path="path/to/your/lora/model/lightning_logs/version_x/checkpoints/epoch=x-step=xxx.ckpt"
)
torch.manual_seed(0)  # fixed seed for reproducible output
image = pipe(
    prompt="一只小狗蹦蹦跳跳,周围是姹紫嫣红的鲜花,远处是山脉",
    negative_prompt="",
    cfg_scale=4,
    num_inference_steps=50, height=1024, width=1024,
)
image.save("image_with_lora.jpg")
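The `lora_path` placeholder follows PyTorch Lightning's default log layout (`lightning_logs/version_x/checkpoints/`) under the `--output_path` directory. Rather than copying the versioned file name by hand, a small helper can pick the newest checkpoint. `latest_checkpoint` is a hypothetical convenience function, not part of DiffSynth-Studio; it assumes the default Lightning logger layout and treats "newest" as "most recently modified".

```python
from pathlib import Path


def latest_checkpoint(output_path):
    """Return the most recently modified .ckpt under output_path/lightning_logs, or None."""
    ckpts = list(Path(output_path).glob("lightning_logs/version_*/checkpoints/*.ckpt"))
    if not ckpts:
        return None
    return str(max(ckpts, key=lambda p: p.stat().st_mtime))
```

With `--output_path ./models` as in the training command, `latest_checkpoint("./models")` could be passed as `lora_path`.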
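Corgi LoRA:
Prompt: 一只小狗蹦蹦跳跳,周围是姹紫嫣红的鲜花,远处是山脉 (a puppy bounding about among brightly colored flowers, with mountains in the distance)
3D-style LoRA:
Prompt: 一只小狗和一只小猫3D (a puppy and a kitten, 3D)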