In-Context LoRA: Efficient Multi-Task Image Generation, Opening a New Chapter in Visual Creation

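Summary: This article introduces In-Context LoRA, proposed by Tongyi Lab: a task-agnostic framework built on existing text-to-image models for high-quality multi-task image generation.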

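01 Introduction

Tongyi Lab recently proposed In-Context LoRA, a task-agnostic framework built on existing text-to-image models such as FLUX.1-dev, for high-quality multi-task image generation. The method relies on the model's intrinsic in-context learning ability: it requires no changes to the model architecture, and only a small amount of training data has to be adapted for the model to handle different image generation tasks. This simplifies training and reduces the need for large annotated datasets while maintaining high generation quality. Experiments show that In-Context LoRA produces coherent, consistent image sets that closely follow the prompt in several practical scenarios, such as storyboard generation, font design, and home decoration.

The framework also supports conditional image generation, which further broadens its range of applications. Experiments across different tasks demonstrate the potential of In-Context LoRA as a general-purpose image generation tool and suggest a direction for future models that adapt to new tasks without task-specific architectural changes.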

Model link:

https://modelscope.cn/models/iic/In-Context-LoRA

Paper link:

https://arxiv.org/pdf/2410.23775

Project page:

https://ali-vilab.github.io/In-Context-LoRA-Page/

Code link:

https://github.com/ali-vilab/In-Context-LoRA

image.png

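Key points

  1. In-Context LoRA is a low-rank adaptation (LoRA) method that strengthens text-to-image generation.
  2. Task-agnostic framework: In-Context LoRA is a general framework, although it still needs task-specific fine-tuning for each application.
  3. Customizable image-set generation: a text-to-image model can be fine-tuned to generate image sets with customizable intrinsic relationships.
  4. Image-set conditioning: the generation of one set of images can be conditioned on another set of images, enabling a wide range of controllable generation applications.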

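How the model works:

In-Context LoRA starts from the hypothesis that base text-to-image models already have some in-context generation ability across a variety of tasks, even if the quality is uneven. Given this insight, extensive training on large datasets is unnecessary; instead, a carefully curated set of high-quality images is enough to activate the model's in-context ability.

A second observation is that text-to-image models can generate coherent multi-panel images from a single prompt that describes several panels. In-Context LoRA therefore simplifies the design: it uses one consolidated prompt for the whole image instead of requiring each image to attend only to its own text tokens. This allows the original text-to-image architecture to be reused without any structural modification.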

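In the final framework design, a set of images is generated simultaneously by concatenating them directly into one large image during training, and their captions are merged into a single prompt that contains an overall description plus clear guidance for each panel. After generation, the large image is split back into individual panels. Moreover, because text-to-image models already show in-context ability, In-Context LoRA does not tune the whole model; it applies low-rank adaptation (LoRA) on a small dataset of high-quality data to trigger and strengthen that ability.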
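
The concatenate-then-split pipeline described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: panels are modeled as nested lists of pixel values, stacking them vertically is an assumption, and the helper names (`build_merged_prompt`, `stack_panels`, `split_panels`) are hypothetical. The tag format follows the `[MOVIE-SHOTS]` / `[SCENE-n]` captions in the sample dataset used later in this article.

```python
def build_merged_prompt(task_tag, overall, panel_captions):
    """Merge an overall description and per-panel captions into one prompt."""
    parts = [f"[{task_tag}] {overall}"]
    for index, caption in enumerate(panel_captions, start=1):
        parts.append(f"[SCENE-{index}] {caption}")
    return ", ".join(parts)


def stack_panels(panels):
    """Concatenate equally sized panels (lists of pixel rows) into one tall image."""
    height = len(panels[0])
    width = len(panels[0][0])
    for panel in panels:
        if len(panel) != height or any(len(row) != width for row in panel):
            raise ValueError("all panels must share the same size")
    return [row for panel in panels for row in panel]


def split_panels(image, num_panels):
    """Cut the tall image back into num_panels equally sized panels."""
    if len(image) % num_panels != 0:
        raise ValueError("image height must be divisible by num_panels")
    height = len(image) // num_panels
    return [image[i * height:(i + 1) * height] for i in range(num_panels)]


# Three tiny 2x2 "panels", each filled with a single value (illustrative data).
panels = [[[v, v], [v, v]] for v in (1, 2, 3)]
canvas = stack_panels(panels)
prompt = build_merged_prompt(
    "MOVIE-SHOTS",
    "In a vibrant festival",
    [
        "a shy boy stands at the edge of a carnival",
        "he tries a daring game",
        "he wins a giant stuffed bear",
    ],
)
print(len(canvas), "rows;", prompt)
```

Splitting is the exact inverse of stacking here, so `split_panels(canvas, 3)` returns the original three panels.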

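To support conditioning on an additional set of images, In-Context LoRA uses SDEdit, a training-free method, to inpaint one set of images based on an unmasked set, with all of the images concatenated into a single large image.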
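
One way to picture the conditional setup: the condition panels and the panels to be generated share a single canvas, and a mask marks which region the inpainting pass may change. The sketch below only builds that mask and shows the final compositing step. The denoising itself is omitted, the vertical layout is an assumption, and `panel_mask` and `composite` are hypothetical helper names rather than part of any released code.

```python
def panel_mask(num_panels, panel_height, width, condition_panels):
    """Build a 0/1 mask for a vertically stacked canvas.

    0 = keep the pixel (condition panel), 1 = pixel may be regenerated.
    """
    mask = []
    for index in range(num_panels):
        value = 0 if index in condition_panels else 1
        mask.extend([value] * width for _ in range(panel_height))
    return mask


def composite(original, generated, mask):
    """Keep original pixels where mask is 0, take generated pixels where mask is 1."""
    return [
        [gen if m else orig for orig, gen, m in zip(orig_row, gen_row, mask_row)]
        for orig_row, gen_row, mask_row in zip(original, generated, mask)
    ]


# Two 2x3 panels: panel 0 is the condition, panel 1 is to be generated.
mask = panel_mask(num_panels=2, panel_height=2, width=3, condition_panels={0})
original = [[10] * 3 for _ in range(4)]   # canvas holding the condition panel
generated = [[99] * 3 for _ in range(4)]  # stand-in for the model output
result = composite(original, generated, mask)
print(result)
```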

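02 Results

This release of In-Context LoRA provides 10 LoRAs: couple profile design, film storyboard, font design, home decoration, portrait illustration, portrait photography, PPT templates, sandstorm visual effect, sparklers visual effect, and visual identity design.

Community developers have built a range of creative projects with In-Context LoRA (IC-LoRA):

Virtual try-on:

image.png

Product design:

image.png

Movie stills: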

image.png

03 Best practices

ComfyUI

Environment setup: install ComfyUI and the required custom nodes

# #@title Environment Setup
from pathlib import Path
OPTIONS = {}
UPDATE_COMFY_UI = True  #@param {type:"boolean"}
INSTALL_COMFYUI_MANAGER = True  #@param {type:"boolean"}
INSTALL_CUSTOM_NODES_DEPENDENCIES = True  #@param {type:"boolean"}
INSTALL_ComfyUI_CustomNodes = True #@param {type:"boolean"}
INSTALL_x_flux_comfyui = True  #@param {type:"boolean"}
OPTIONS['UPDATE_COMFY_UI'] = UPDATE_COMFY_UI
OPTIONS['INSTALL_COMFYUI_MANAGER'] = INSTALL_COMFYUI_MANAGER
OPTIONS['INSTALL_CUSTOM_NODES_DEPENDENCIES'] = INSTALL_CUSTOM_NODES_DEPENDENCIES
OPTIONS['INSTALL_ComfyUI_CustomNodes'] = INSTALL_ComfyUI_CustomNodes
OPTIONS['INSTALL_x_flux_comfyui'] = INSTALL_x_flux_comfyui
%cd /mnt/workspace/
# Resolve the workspace path after changing directory so it points at /mnt/workspace/ComfyUI
current_dir = !pwd
WORKSPACE = f"{current_dir[0]}/ComfyUI"
![ ! -d $WORKSPACE ] && echo -= Initial setup ComfyUI =- && git clone https://github.com/comfyanonymous/ComfyUI
%cd $WORKSPACE
if OPTIONS['UPDATE_COMFY_UI']:
  !echo "-= Updating ComfyUI =-"
  !git pull
if OPTIONS['INSTALL_COMFYUI_MANAGER']:
  %cd $WORKSPACE/custom_nodes
  ![ ! -d ComfyUI-Manager ] && echo -= Initial setup ComfyUI-Manager =- && git clone https://github.com/ltdrdata/ComfyUI-Manager
  %cd ComfyUI-Manager
  !git pull
if OPTIONS['INSTALL_ComfyUI_CustomNodes']:
  # Use an absolute path so this step does not depend on the previous step having run
  %cd $WORKSPACE/custom_nodes
  !echo -= Initial setup ComfyUI_Comfyroll_CustomNodes =- && git clone https://github.com/Suzie1/ComfyUI_Comfyroll_CustomNodes.git
  !echo -= Initial setup ComfyUI_rgthree_comfy =- && git clone https://github.com/rgthree/rgthree-comfy.git
  !echo -= Initial setup ComfyUI_JPS =- && git clone https://github.com/JPS-GER/ComfyUI_JPS-Nodes.git
  !echo -= Initial setup ComfyUI_Custom_Scripts =- && git clone https://github.com/pythongosssss/ComfyUI-Custom-Scripts.git
  !echo -= Initial setup ComfyUI-KJNodes =- && git clone https://github.com/kijai/ComfyUI-KJNodes.git
if OPTIONS['INSTALL_x_flux_comfyui']:
  %cd $WORKSPACE/custom_nodes
  !echo -= Initial setup x-flux-comfyui =- && git clone https://github.com/XLabs-AI/x-flux-comfyui.git
if OPTIONS['INSTALL_CUSTOM_NODES_DEPENDENCIES']:
  # Return to the ComfyUI root: the dependency script path below is relative to it
  %cd $WORKSPACE
  !pwd
  !echo "-= Install custom nodes dependencies =-"
  ![ -f "custom_nodes/ComfyUI-Manager/scripts/colab-dependencies.py" ] && python "custom_nodes/ComfyUI-Manager/scripts/colab-dependencies.py"
!pip install spandrel

Model download: download FLUX and the In-Context LoRA weights

#@markdown ###Download standard resources
%cd /mnt/workspace/ComfyUI
### FLUX1-DEV
# !modelscope download --model=AI-ModelScope/FLUX.1-dev --local_dir ./models/unet/ flux1-dev.safetensors
!modelscope download --model=AI-ModelScope/flux-fp8 --local_dir ./models/unet/ flux1-dev-fp8.safetensors
### clip
!modelscope download --model=AI-ModelScope/flux_text_encoders --local_dir ./models/clip/ clip_l.safetensors
!modelscope download --model=AI-ModelScope/flux_text_encoders --local_dir ./models/clip/ t5xxl_fp8_e4m3fn.safetensors
### vae
!modelscope download --model=AI-ModelScope/FLUX.1-dev --local_dir ./models/vae/ ae.safetensors
### lora
#!modelscope download --model=FluxLora/flux-koda --local_dir ./models/loras/ araminta_k_flux_koda.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ film-storyboard.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ font-design.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ couple-profile.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ home-decoration.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ portrait-illustration.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ portrait-photography.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ ppt-templates.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ sandstorm-visual-effect.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ sparklers-visual-effect.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ visual-identity-design.safetensors
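
The ten LoRA download commands above differ only in the file name, so they can be generated from a list. The sketch below only builds and prints the command strings and does not run them. The file names and the `modelscope download` arguments are copied from the commands above; `build_download_commands` is a hypothetical helper.

```python
LORA_FILES = [
    "film-storyboard",
    "font-design",
    "couple-profile",
    "home-decoration",
    "portrait-illustration",
    "portrait-photography",
    "ppt-templates",
    "sandstorm-visual-effect",
    "sparklers-visual-effect",
    "visual-identity-design",
]


def build_download_commands(names, model="iic/In-Context-LoRA", local_dir="./models/loras/"):
    """Return one `modelscope download` command string per LoRA file."""
    return [
        f"modelscope download --model={model} --local_dir {local_dir} {name}.safetensors"
        for name in names
    ]


commands = build_download_commands(LORA_FILES)
for command in commands:
    print(command)
```

Each string can then be executed, for example with `subprocess.run(command.split(), check=True)`.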

Run the workflow: start ComfyUI and expose it through a cloudflared tunnel

!wget "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/cloudflared-linux-amd64.deb"
!dpkg -i cloudflared-linux-amd64.deb
%cd /mnt/workspace/ComfyUI
import subprocess
import threading
import time
import socket
import urllib.request
def iframe_thread(port):
  while True:
      time.sleep(0.5)
      sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      result = sock.connect_ex(('127.0.0.1', port))
      sock.close()  # close the probe socket whether or not the port is open
      if result == 0:
        break
  print("\nComfyUI finished loading, trying to launch cloudflared (if it gets stuck here cloudflared is having issues)\n")
  p = subprocess.Popen(["cloudflared", "tunnel", "--url", "http://127.0.0.1:{}".format(port)], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  for line in p.stderr:
    l = line.decode()
    if "trycloudflare.com " in l:
      print("This is the URL to access ComfyUI:", l[l.find("http"):], end='')
    #print(l, end='')
threading.Thread(target=iframe_thread, daemon=True, args=(8188,)).start()
!python main.py --dont-print-server
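
The loop above prints everything from the first `http` to the end of the log line. If only the URL is wanted, a regular expression can pull it out. This is a minimal sketch: `extract_tunnel_url` is a hypothetical helper, and the sample log line is made up for illustration rather than copied from real cloudflared output.

```python
import re

# Matches a quick-tunnel URL such as https://some-words.trycloudflare.com
TUNNEL_URL = re.compile(r"https://[A-Za-z0-9-]+\.trycloudflare\.com")


def extract_tunnel_url(line):
    """Return the first trycloudflare.com URL in a log line, or None if absent."""
    match = TUNNEL_URL.search(line)
    return match.group(0) if match else None


sample = "INF |  https://example-words-here.trycloudflare.com  |"  # made-up line
print(extract_tunnel_url(sample))
print(extract_tunnel_url("INF Starting tunnel"))
```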

Notebook share link: https://modelscope.cn/notebook/share/ipynb/a4c365ec/ComfyUI_incontexlora.ipynb

Workflow result:

image.png

GPU memory usage:

image.png

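Training with DiffSynth-Studio

In-Context LoRA uses the standard LoRA training algorithm, so most frameworks that support FLUX LoRA training can be used to train it. The authors provide a sample dataset, and here we train on it with the open-source project DiffSynth-Studio.

Environment setup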

git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .

Model download

modelscope download --model AI-ModelScope/FLUX.1-dev --local_dir ./models/FLUX/FLUX.1-dev

Dataset preparation (download link: https://github.com/ali-vilab/In-Context-LoRA/blob/main/data/movie-shots.zip)

data/lora_datasets/movie-shots/
└── train
    ├── 0001.jpg
    ├── 0002.jpg
    ├── 0003.jpg
    ├── 0004.jpg
    ├── 0005.jpg
    ├── 0006.jpg
    ├── 0007.jpg
    ├── 0008.jpg
    ├── 0009.jpg
    ├── 0010.jpg
    └── metadata.csv

metadata.csv:

file_name,text
0001.jpg,"[MOVIE-SHOTS] In this image, we see a sequence of domestic life with a young girl named <Matilda> orchestrating her morning routine: [SCENE-1] <Matilda> diligently flips pancakes on the stove, demonstrating a sense of independence and skill in the kitchen, [SCENE-2] transitions to a close-up of her face that reveals a curious and thoughtful expression peeking over the counter, [SCENE-3] concludes the scene as <Matilda> sits at the breakfast table with a stack of books and magazines nearby, while she enjoys her meal, displaying a blend of intellect and simplicity in her everyday world."
0002.jpg,"[MOVIE-SHOTS] In a series of dynamic and vivid shots, the image captures the journey of <Alex>, a stylish traveler navigating a bustling train setting; [SCENE-1] beginning with <Alex> sprinting towards a departing train, clutching a large suitcase, against a backdrop of vibrant sunset hues and distant figures wrapped in daily routine, [SCENE-2] followed by a close-up as <Alex> pauses at the train door, exuding confidence with sunglasses momentarily raised and eyes set firmly on the horizon, evoking a sense of anticipation and introspection, [SCENE-3] and concluding with <Alex> stepping into a warmly-lit cabin interior, the intricate patterns on the walls and his nonchalant demeanor suggesting both the allure of exploration and the quiet introspection of new beginnings, effectively encapsulating the essence of adventure and self-discovery."
0003.jpg,"[MOVIE-SHOTS] In an animated heist sequence, [SCENE-1] the image opens with <Fox>, wearing a knit mask with hypnotic eyes, surveying the horizon against a backdrop of smokestacks and a cloudy sky, [SCENE-2] transitions to <Fox> now alongside <Badger>, also masked, exchanging a determined glance under the ominous overcast, [SCENE-3] finally converges with <Fox> joined by a team that includes a wide-eyed <Cat> and a serious-looking <Mouse>, all peering cautiously over a wooden ledge, poised for action as they navigate their bold adventure."
0004.jpg,"[MOVIE-SHOTS] In a delightful animated sequence, [SCENE-1] a young panda named <Bao> appears adorably forlorn as he lies on the floor surrounded by a scatter of vegetables, his wide eyes expressing a mix of curiosity and yearning, [SCENE-2] which quickly shifts to <Bao> gleefully rolling on his back, showcasing his playful nature amidst the same rustic setting, [SCENE-3] before finally nestling happily in a large wooden bucket, as an amused goose character, <Fei>, gazes at him affectionately, adding warmth and joy to this charming tableau."
0005.jpg,"[MOVIE-SHOTS] In a whimsical city adventure, [SCENE-1] young <Sophie> peers over a railing with a curious plush pig beside her, capturing a moment of anticipation amidst towering skyscrapers, as [SCENE-2] a bridge raises dramatically over a bustling, rain-soaked street, featuring silhouetted streetlights and gleaming car headlights highlighting the grandeur of the cityscape, while [SCENE-3] <Sophie>, now facing forward, joyfully embraces the night as city lights illuminate the vibrant urban waterfront behind her, creating an enchanting atmosphere of exploration and wonder."
0006.jpg,"[MOVIE-SHOTS] In a captivating sequence of moments, the image journey begins [SCENE-1] as <Anna> and <Tom> are seated together on a sun-dappled park bench, engaging in a sincere conversation that is accentuated by the vibrant greenery around them, [SCENE-2] then shifts to a quaint café where <Anna>, displaying a contemplative expression, rests her chin on her hand while <Tom> listens intently, their discussion punctuated by the soft clink of coffee cups and the background chatter of other patrons, [SCENE-3] concluding the visual narrative in an art gallery, where <Anna>, donned in a vivid purple coat, stands opposite <Tom> across a centerpiece of wildflowers, their shared gaze locked on the artistic arrangements, hinting at a deeper understanding cultivated through their interactions."
0007.jpg,"[MOVIE-SHOTS] In this captivating montage of scenes, we follow a poignant narrative set against an elegant winter backdrop, [SCENE-1] beginning with <Elena> gazing contemplatively into the distance, a world of thoughts wrapped in the serenity of falling snowflakes; [SCENE-2] as the ambiance shifts to a bustling ice rink adorned with twinkling lights, <Elena> now stands facing <Alexei>, both enveloped in luxurious winter attire, capturing a moment of intense, unspoken emotion amidst a crowd of skaters; [SCENE-3] the focus then centers on <Alexei>, whose earnest expression and intent gaze suggest a deep conversation, as the surrounding world fades into a blur of festive lights and gentle snowfall, weaving a story of connection and unresolved feelings within this enchanting winter setting."
0008.jpg,"[MOVIE-SHOTS] In this thrilling sequence of action-packed movie scenes, [SCENE-1] <Lola>, dressed in a casual tank top and vibrant green pants, is captured in a dynamic sprint along a city sidewalk, suggesting urgency while a character in a red jersey observes from the side, [SCENE-2] as she races against time and the traffic, her determined expression directed at an oncoming red van that adds intensity to the unfolding drama, [SCENE-3] where <Lola> is seamlessly joined by <Tom>, who pedals furiously on a bicycle beside her, both seemingly engaged in a high-stakes chase down an urban street, their determined pace reflecting a palpable tension and shared mission."
0009.jpg,"[MOVIE-SHOTS] A scenic road trip unfolds through the lens of youthful introspection and fleeting moments of serenity; [SCENE-1] <Chris> is seen enjoying the journey, his relaxed expression captured in the car's side mirror as the vehicle cruises down an open road, symbolizing freedom and escape, [SCENE-2] transitioning to a detail shot of wildflowers delicately arranged in a Sprite bottle on the dashboard, signifying a simple beauty amid the motion, [SCENE-3] culminating in a reflective moment where <Chris> sits contemplatively against a vast, tranquil sky, while in the background, <Jesse> is seen absorbed in their own world, enhancing the theme of introspection and connection to the natural surroundings."
0010.jpg,"[MOVIE-SHOTS] In this heartwarming and humorous sequence, [SCENE-1] <Charlie>, a curious toddler, mischievously smears green food coloring all over his face while giggling near a kitchen counter, bringing pure joy and a hint of chaos into the household; [SCENE-2] as the scene shifts to the family car, <Charlie> sits wide-eyed and innocent with the same green face, now snugly buckled in his car seat, capturing a moment of endearing surprise for those around him; [SCENE-3] finally, <Charlie> beams with a contagious smile, the car stopped at a cafe drive-thru, his green-masked happiness lighting up the vehicle interior, bringing laughter and warmth to everyone who sees him throughout this delightful and comedic slice of life adventure."
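
Every caption in the sample follows the same pattern: a `[MOVIE-SHOTS]` tag followed by `[SCENE-1]` to `[SCENE-3]`. Before training on your own data, a quick consistency check of `metadata.csv` can catch captions that break the pattern. The sketch below is one possible check, not part of DiffSynth-Studio; `check_metadata` is a hypothetical helper, the expected count of three scenes is inferred from the sample above, and the two captions in the demo are made up.

```python
import csv
import io
import re

SCENE_TAG = re.compile(r"\[SCENE-(\d+)\]")


def check_metadata(csv_text, expected_scenes=3):
    """Return a list of problems: rows whose scene tags are not 1..expected_scenes in order."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        scenes = [int(n) for n in SCENE_TAG.findall(row["text"])]
        if scenes != list(range(1, expected_scenes + 1)):
            problems.append(f"{row['file_name']}: scene tags {scenes}")
    return problems


# Made-up captions for illustration; the second one is missing [SCENE-2].
sample = (
    "file_name,text\n"
    '0001.jpg,"[MOVIE-SHOTS] A short story, [SCENE-1] first, [SCENE-2] second, [SCENE-3] third."\n'
    '0002.jpg,"[MOVIE-SHOTS] A broken caption, [SCENE-1] first, [SCENE-3] third."\n'
)
problems = check_metadata(sample)
print(problems)
```

For a real dataset, pass the file contents instead, for example `check_metadata(open("data/lora_datasets/movie-shots/train/metadata.csv", encoding="utf-8").read())`.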

Start training. Training requires 40 GB of GPU memory.

CUDA_VISIBLE_DEVICES="0" python examples/train/flux/train_flux_lora.py \
  --pretrained_text_encoder_path models/FLUX/FLUX.1-dev/text_encoder/model.safetensors \
  --pretrained_text_encoder_2_path models/FLUX/FLUX.1-dev/text_encoder_2 \
  --pretrained_dit_path models/FLUX/FLUX.1-dev/flux1-dev.safetensors \
  --pretrained_vae_path models/FLUX/FLUX.1-dev/ae.safetensors \
  --dataset_path data/lora_datasets/movie-shots \
  --output_path ./models \
  --max_epochs 10 \
  --steps_per_epoch 100 \
  --height 1536 \
  --width 1024 \
  --center_crop \
  --precision "bf16" \
  --learning_rate 1e-4 \
  --lora_rank 16 \
  --lora_alpha 16 \
  --use_gradient_checkpointing \
  --align_to_opensource_format
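
For intuition about `--lora_rank 16` and `--lora_alpha 16`: in the standard LoRA formulation, a frozen weight matrix W of shape (d_out, d_in) receives an update (alpha / rank) · B · A, where A has shape (rank, d_in) and B has shape (d_out, rank). The sketch below computes the scaling factor and the number of trainable parameters for one layer. The 3072 × 3072 layer size is an illustrative assumption, not a figure from this article, and that DiffSynth-Studio applies exactly this scaling is also an assumption.

```python
def lora_scale(rank, alpha):
    """Scaling factor applied to the low-rank update B @ A."""
    return alpha / rank


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters added by LoRA: A is (rank, d_in), B is (d_out, rank)."""
    return rank * (d_in + d_out)


rank, alpha = 16, 16      # values from the training command above
d_in = d_out = 3072       # hypothetical layer size, for illustration only
full = d_in * d_out       # parameters in the frozen weight matrix
added = lora_param_count(d_in, d_out, rank)
print(lora_scale(rank, alpha), added, full, f"{added / full:.2%}")
```

With rank equal to alpha the scaling factor is 1.0, so the low-rank update is added to the frozen weight unscaled.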

Once training finishes, the trained model file can be found under models/lightning_logs. Let's test the result.


Prompt: [MOVIE-SHOTS] In a vibrant festival, [SCENE-1] we find , a shy boy, standing at the edge of a bustling carnival, eyes wide with awe at the colorful rides and laughter, [SCENE-2] transitioning to him reluctantly trying a daring game, his friends cheering him on, [SCENE-3] culminating in a triumphant moment as he wins a giant stuffed bear, his face beaming with pride as he holds it up for all to see.

image.png

Follow this link to go to the model page: https://modelscope.cn/models/iic/In-Context-LoRA?from=alizishequ__text

