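Introduction
Kuaishou recently open-sourced Kolors (可图), a text-to-image generation model that understands both English and Chinese in depth and generates high-quality, photorealistic images. The technical report highlights several key contributions:
First, Kolors uses the General Language Model (ChatGLM) as its text encoder rather than the T5 large language model used by Imagen and Stable Diffusion 3, which strengthens its understanding of English and Chinese. The team also uses the multimodal large language model CogVLM to re-caption the images in the training set with more detailed descriptions;
Second, Kolors is trained in two stages, a concept-learning stage and a quality-improvement stage. It is trained on specific datasets to improve visual appeal, and image quality is improved by introducing high-quality data and optimizing high-resolution training techniques;
Finally, the Kolors team proposes KolorsPrompts, a category-balanced benchmark dataset, to guide the training and evaluation of Kolors.
According to the reported experiments, Kolors performs well even with a U-Net backbone: it surpasses existing open-source models in human evaluation and reaches a level comparable to Midjourney-v6. The Kolors code and weights are now open source.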
Code: https://github.com/Kwai-Kolors/Kolors
Model: https://modelscope.cn/models/Kwai-Kolors/Kolors
Technical report: https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf
Download and try Kolors
Direct model link:
https://modelscope.cn/models/Kwai-Kolors/Kolors?from=alizishequ__text
Download options:
SDK download:
# Download the model
from modelscope import snapshot_download
model_dir = snapshot_download('Kwai-Kolors/Kolors')
Git download:
git clone https://www.modelscope.cn/Kwai-Kolors/Kolors.git
CLI download:
modelscope download --model=Kwai-Kolors/Kolors --local_dir ./Kolors/
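Best practices
Following the open-source project https://github.com/kijai/ComfyUI-KwaiKolorsWrapper, we set up a ComfyUI environment for Kolors and tried it out on the free GPU compute provided by the ModelScope community.
Environment
Run the Kolors model in a ModelScope community Notebook:
Set up ComfyUI
Install from the latest ComfyUI source code: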
#@title Environment Setup

from pathlib import Path

OPTIONS = {}
UPDATE_COMFY_UI = True  #@param {type:"boolean"}
INSTALL_COMFYUI_MANAGER = True  #@param {type:"boolean"}
INSTALL_KOLORS = True  #@param {type:"boolean"}
INSTALL_CUSTOM_NODES_DEPENDENCIES = True  #@param {type:"boolean"}
OPTIONS['UPDATE_COMFY_UI'] = UPDATE_COMFY_UI
OPTIONS['INSTALL_COMFYUI_MANAGER'] = INSTALL_COMFYUI_MANAGER
OPTIONS['INSTALL_KOLORS'] = INSTALL_KOLORS
OPTIONS['INSTALL_CUSTOM_NODES_DEPENDENCIES'] = INSTALL_CUSTOM_NODES_DEPENDENCIES

# Assumes the notebook starts in /mnt/workspace, so WORKSPACE is /mnt/workspace/ComfyUI
current_dir = !pwd
WORKSPACE = f"{current_dir[0]}/ComfyUI"

%cd /mnt/workspace/

# Clone ComfyUI only if it is not there yet
![ ! -d $WORKSPACE ] && echo -= Initial setup ComfyUI =- && git clone https://github.com/comfyanonymous/ComfyUI
%cd $WORKSPACE

if OPTIONS['UPDATE_COMFY_UI']:
  !echo "-= Updating ComfyUI =-"
  !git pull

if OPTIONS['INSTALL_COMFYUI_MANAGER']:
  %cd custom_nodes
  ![ ! -d ComfyUI-Manager ] && echo -= Initial setup ComfyUI-Manager =- && git clone https://github.com/ltdrdata/ComfyUI-Manager
  %cd ComfyUI-Manager
  !git pull

if OPTIONS['INSTALL_KOLORS']:
  # Assumes the previous step left us in custom_nodes/ComfyUI-Manager
  %cd ../
  ![ ! -d ComfyUI-KwaiKolorsWrapper ] && echo -= Initial setup KOLORS =- && git clone https://github.com/kijai/ComfyUI-KwaiKolorsWrapper.git
  %cd ComfyUI-KwaiKolorsWrapper
  !git pull

%cd $WORKSPACE

if OPTIONS['INSTALL_CUSTOM_NODES_DEPENDENCIES']:
  !pwd
  !echo "-= Install custom nodes dependencies =-"
  ![ -f "custom_nodes/ComfyUI-Manager/scripts/colab-dependencies.py" ] && python "custom_nodes/ComfyUI-Manager/scripts/colab-dependencies.py"
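The setup cell relies on notebook magics (`!` and `%cd`) for its clone-if-missing, then pull pattern. Outside a notebook the same idea can be sketched in plain Python. This is only an illustration and not part of the original tutorial: the helper name `plan_git_sync` is hypothetical, and it merely builds the git command instead of running it (unlike the cell, it does not pull right after a fresh clone).

```python
from pathlib import Path
from typing import List


def plan_git_sync(repo_url: str, parent_dir: str) -> List[str]:
    """Return the git command that brings a repository up to date.

    Hypothetical helper: clone when the target folder is missing,
    pull when it already exists.
    """
    # Derive the folder name from the last URL segment, dropping ".git"
    name = repo_url.rstrip("/").split("/")[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    target = Path(parent_dir) / name

    if target.is_dir():
        # Folder exists: update it in place
        return ["git", "-C", str(target), "pull"]
    # Folder missing: clone into parent_dir/name
    return ["git", "clone", repo_url, str(target)]
```

To actually run the command, the returned list could be passed to `subprocess.run(cmd, check=True)`.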
Download the model weights
#@markdown ###Download standard resources

OPTIONS = {}

#@markdown **unet**
!wget -c "https://modelscope.cn/models/Kwai-Kolors/Kolors/resolve/master/unet/diffusion_pytorch_model.fp16.safetensors" -P ./models/diffusers/Kolors/unet/
!wget -c "https://modelscope.cn/models/Kwai-Kolors/Kolors/resolve/master/unet/config.json" -P ./models/diffusers/Kolors/unet/

#@markdown **encoder**
!modelscope download --model=ZhipuAI/chatglm3-6b-base --local_dir ./models/diffusers/Kolors/text_encoder/

#@markdown **vae**
!wget -c "https://modelscope.cn/models/AI-ModelScope/sdxl-vae-fp16-fix/resolve/master/sdxl.vae.safetensors" -P ./models/vae/ #sdxl-vae-fp16-fix.safetensors

#@markdown **scheduler**
!wget -c "https://modelscope.cn/models/Kwai-Kolors/Kolors/resolve/master/scheduler/scheduler_config.json" -P ./models/diffusers/Kolors/scheduler/

#@markdown **modelindex**
!wget -c "https://modelscope.cn/models/Kwai-Kolors/Kolors/resolve/master/model_index.json" -P ./models/diffusers/Kolors/
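Because the downloads are large and can be interrupted, it may help to confirm that everything landed where the download commands put it before launching ComfyUI. The following is a minimal sketch under stated assumptions: the helper `missing_files` is hypothetical, the paths are copied from the commands above and taken relative to the ComfyUI root, and the text encoder folder is only checked for being non-empty because the tutorial does not list its individual files.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the download commands (relative to the ComfyUI root)
EXPECTED_FILES = [
    "models/diffusers/Kolors/unet/diffusion_pytorch_model.fp16.safetensors",
    "models/diffusers/Kolors/unet/config.json",
    "models/vae/sdxl.vae.safetensors",
    "models/diffusers/Kolors/scheduler/scheduler_config.json",
    "models/diffusers/Kolors/model_index.json",
]
TEXT_ENCODER_DIR = "models/diffusers/Kolors/text_encoder"


def missing_files(comfyui_root: str) -> List[str]:
    """List expected download targets that are absent under comfyui_root."""
    root = Path(comfyui_root)
    missing = [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]

    # The encoder is downloaded as a whole folder; only check it is non-empty
    encoder = root / TEXT_ENCODER_DIR
    if not encoder.is_dir() or not any(encoder.iterdir()):
        missing.append(TEXT_ENCODER_DIR + "/")
    return missing
```

Calling `missing_files("/mnt/workspace/ComfyUI")` returns an empty list when every expected path is present. It does not verify file sizes or checksums.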
Launch ComfyUI through cloudflared
!wget "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/cloudflared-linux-amd64.deb"
!dpkg -i cloudflared-linux-amd64.deb

%cd /mnt/workspace/ComfyUI
import subprocess
import threading
import time
import socket
import urllib.request

def iframe_thread(port):
  # Poll until ComfyUI accepts connections on the given port
  while True:
    time.sleep(0.5)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    result = sock.connect_ex(('127.0.0.1', port))
    sock.close()
    if result == 0:
      break
  print("\nComfyUI finished loading, trying to launch cloudflared (if it gets stuck here cloudflared is having issues)\n")

  # Open a tunnel to the local server and print the public URL from cloudflared's log
  p = subprocess.Popen(["cloudflared", "tunnel", "--url", "http://127.0.0.1:{}".format(port)], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  for line in p.stderr:
    l = line.decode()
    if "trycloudflare.com " in l:
      print("This is the URL to access ComfyUI:", l[l.find("http"):], end='')
    #print(l, end='')

# Start the watcher in the background, then run ComfyUI in the foreground
threading.Thread(target=iframe_thread, daemon=True, args=(8188,)).start()

!python main.py --dont-print-server
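The launch cell finds the public address by searching each cloudflared log line for `trycloudflare.com` and printing everything from `http` onward. A slightly stricter variant extracts only the URL with a regular expression. This is a sketch rather than the tutorial's method: the helper name `extract_tunnel_url` is hypothetical, and the assumption that the hostname consists of letters, digits, and hyphens is mine.

```python
import re
from typing import Optional

# Assumed shape of a quick-tunnel address: https://<words-and-hyphens>.trycloudflare.com
_TUNNEL_URL = re.compile(r"https://[A-Za-z0-9-]+\.trycloudflare\.com")


def extract_tunnel_url(log_line: str) -> Optional[str]:
    """Return the first trycloudflare.com URL in a log line, or None."""
    match = _TUNNEL_URL.search(log_line)
    return match.group(0) if match else None
```

Inside the loop over `p.stderr`, the decoded line could be passed to this function and the result printed when it is not `None`, which avoids printing trailing log decoration along with the URL.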
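Click Load on the right and load the workflow provided by the ComfyUI-KwaiKolorsWrapper project.
Text-to-image:
Image-to-image (a white car):
GPU memory usage:
Results
Simple prompts
Complex prompts
Multi-entity generation is strong: the color of each entity can be controlled separately, and spatial relationships are rendered quite accurately.
Multiple styles
Handles a wide range of styles well!
Text
Simple text can be rendered.
Diversity
Diversity is decent.
Performance test
At 1024 resolution on an A10, generating one image (25 steps) takes 7 seconds.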
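For comparison with other setups, the measurement can be converted into per-step and per-minute figures. The snippet below is just arithmetic on the two reported numbers (25 steps, 7 seconds); the function name `throughput` is hypothetical. Since the 7 seconds presumably covers the whole run, including text encoding and VAE decoding, the per-step figure should be read as an upper bound on the time of a single denoising step.

```python
from typing import Dict


def throughput(steps: int, seconds: float) -> Dict[str, float]:
    """Derive simple rate figures from one end-to-end timing."""
    return {
        "seconds_per_step": seconds / steps,
        "steps_per_second": steps / seconds,
        "images_per_minute": 60.0 / seconds,
    }


# Numbers reported for 1024 resolution on an A10
stats = throughput(steps=25, seconds=7.0)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

This works out to 0.28 seconds per step, about 3.57 steps per second, and about 8.57 images per minute.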
The ModelScope community will keep exploring the Kolors model and will publish a fine-tuning tutorial. Stay tuned!