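01 Introduction
Tongyi Lab recently proposed In-Context LoRA, a task-agnostic framework built on existing text-to-image models (such as FLUX.1-dev) for high-quality multi-task image generation. The method exploits the model's intrinsic in-context learning ability: it requires no changes to the model architecture, only a small amount of adapted training data, to fit the model to different image generation tasks. This simplifies training and reduces the need for large annotated datasets while maintaining high generation quality. Experiments show that, in practical scenarios such as storyboard generation, font design, and home decoration, In-Context LoRA produces coherent image sets that closely follow the prompt.
The framework also supports conditional image generation, which further broadens its range of applications. The paper's experiments on different tasks demonstrate the potential of In-Context LoRA as a general-purpose image generation tool and suggest a direction for developing models that adapt to new tasks without task-specific architectural changes.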
Model link:
https://modelscope.cn/models/iic/In-Context-LoRA
Paper link:
https://arxiv.org/pdf/2410.23775
Project page:
https://ali-vilab.github.io/In-Context-LoRA-Page/
Code link:
https://github.com/ali-vilab/In-Context-LoRA
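Key takeaways
- In-Context LoRA is a low-rank adaptation (LoRA) method that strengthens the generation capabilities of a text-to-image model.
- Task-agnostic framework: the overall framework is general, although each application still requires task-specific fine-tuning.
- Customizable image-set generation: a text-to-image model can be fine-tuned to generate image sets with customizable intrinsic relationships.
- Image-set conditioning: the generation of one set of images can be conditioned on another set, enabling a wide range of controllable generation applications.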
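How the model works:
The starting point of In-Context LoRA is the assumption that base text-to-image models already possess some in-context generation ability across a variety of tasks, even if the quality varies. Given this insight, extensive training on large datasets is unnecessary; instead, a small, carefully curated collection of high-quality image sets is enough to activate the model's in-context ability.
A second observation is that text-to-image models can generate coherent multi-panel images from a single prompt that contains descriptions of several panels. In-Context LoRA therefore simplifies the architecture: it uses one consolidated prompt for the whole image set instead of requiring each image to attend only to its own text tokens. This allows the original text-to-image architecture to be reused without any structural modification.
In the final framework design, a set of images is generated simultaneously by concatenating them directly into one large image during training, and their captions are consolidated into a single merged prompt with an overall description and clear guidance for each panel. After the image set is generated, the large image is split back into individual panels. Moreover, because text-to-image models already show in-context ability, In-Context LoRA does not tune the whole model; it applies low-rank adaptation (LoRA) on a small dataset of high-quality data to trigger and strengthen that ability.
To support conditioning on an additional set of images, In-Context LoRA uses SDEdit, a training-free method, to inpaint one set of images based on an unmasked set, with all images concatenated into a single large image.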
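The concatenate, merge, and split bookkeeping described above can be sketched as follows. This is a minimal illustration, not the authors' code: panels are represented as plain lists of pixel rows to keep the sketch dependency-free, the vertical stacking order is an assumption, and the helper names `concat_panels`, `merge_captions`, and `split_canvas` are hypothetical.

```python
from typing import List

# A panel is a list of pixel rows (a stand-in for a real image array).
Panel = List[List[int]]


def concat_panels(panels: List[Panel]) -> Panel:
    """Stack equally sized panels top to bottom into one large canvas."""
    height = len(panels[0])
    width = len(panels[0][0])
    for panel in panels:
        if len(panel) != height or any(len(row) != width for row in panel):
            raise ValueError("all panels must share the same size")
    return [list(row) for panel in panels for row in panel]


def merge_captions(task_tag: str, overall: str, scenes: List[str]) -> str:
    """Join an overall description and per-panel captions into one prompt."""
    parts = [f"[SCENE-{i}] {text}" for i, text in enumerate(scenes, start=1)]
    return f"[{task_tag}] {overall} " + ", ".join(parts)


def split_canvas(canvas: Panel, num_panels: int) -> List[Panel]:
    """Cut the large canvas back into equally tall panels."""
    if len(canvas) % num_panels != 0:
        raise ValueError("canvas height is not divisible by the panel count")
    step = len(canvas) // num_panels
    return [canvas[i * step:(i + 1) * step] for i in range(num_panels)]
```

In a real pipeline the same three steps would operate on image arrays (for example with PIL or NumPy); only the bookkeeping is shown here.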
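For the conditional case, the part that is easy to show in code is the mask over the concatenated image: condition panels are kept, target panels are regenerated. The sketch below covers only that bookkeeping, under the assumption that panels are stacked vertically; it does not implement SDEdit's noising and denoising, and `panel_mask` and `blend` are hypothetical helper names.

```python
from typing import List, Set

Grid = List[List[int]]


def panel_mask(num_panels: int, panel_height: int, panel_width: int,
               known: Set[int]) -> Grid:
    """Build a mask for the large image: 0 = keep (condition), 1 = regenerate."""
    mask: Grid = []
    for index in range(num_panels):
        value = 0 if index in known else 1
        mask.extend([[value] * panel_width for _ in range(panel_height)])
    return mask


def blend(generated: Grid, reference: Grid, mask: Grid) -> Grid:
    """Take generated pixels where mask is 1 and reference pixels elsewhere."""
    return [
        [g if m == 1 else r for g, r, m in zip(g_row, r_row, m_row)]
        for g_row, r_row, m_row in zip(generated, reference, mask)
    ]
```

With `known={0}`, the first panel acts as the condition image and the remaining panels are the ones the model fills in.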
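02 Results
This release of In-Context LoRA provides 10 LoRAs: couple profile design, film storyboard, font design, home decoration, portrait illustration, portrait photography, PPT templates, sandstorm visual effect, sparklers visual effect, and visual identity design.
Community developers have built a range of creative projects with In-Context LoRA (IC-LoRA):
Virtual try-on:
Product design:
Film stills: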
03 Best practices
ComfyUI
Environment setup: install ComfyUI and the required custom nodes
# #@title Environment Setup

from pathlib import Path

OPTIONS = {}

UPDATE_COMFY_UI = True  #@param {type:"boolean"}
INSTALL_COMFYUI_MANAGER = True  #@param {type:"boolean"}
INSTALL_CUSTOM_NODES_DEPENDENCIES = True  #@param {type:"boolean"}
INSTALL_ComfyUI_CustomNodes = True  #@param {type:"boolean"}
INSTALL_x_flux_comfyui = True  #@param {type:"boolean"}
OPTIONS['UPDATE_COMFY_UI'] = UPDATE_COMFY_UI
OPTIONS['INSTALL_COMFYUI_MANAGER'] = INSTALL_COMFYUI_MANAGER
OPTIONS['INSTALL_CUSTOM_NODES_DEPENDENCIES'] = INSTALL_CUSTOM_NODES_DEPENDENCIES
OPTIONS['INSTALL_ComfyUI_CustomNodes'] = INSTALL_ComfyUI_CustomNodes
OPTIONS['INSTALL_x_flux_comfyui'] = INSTALL_x_flux_comfyui

current_dir = !pwd
WORKSPACE = f"{current_dir[0]}/ComfyUI"

%cd /mnt/workspace/

![ ! -d $WORKSPACE ] && echo -= Initial setup ComfyUI =- && git clone https://github.com/comfyanonymous/ComfyUI
%cd $WORKSPACE

if OPTIONS['UPDATE_COMFY_UI']:
  !echo "-= Updating ComfyUI =-"
  !git pull

if OPTIONS['INSTALL_COMFYUI_MANAGER']:
  %cd custom_nodes
  ![ ! -d ComfyUI-Manager ] && echo -= Initial setup ComfyUI-Manager =- && git clone https://github.com/ltdrdata/ComfyUI-Manager
  %cd ComfyUI-Manager
  !git pull

if OPTIONS['INSTALL_ComfyUI_CustomNodes']:
  %cd ..
  !echo -= Initial setup ComfyUI_Comfyroll_CustomNodes =- && git clone https://github.com/Suzie1/ComfyUI_Comfyroll_CustomNodes.git
  !echo -= Initial setup ComfyUI_rgthree_comfy =- && git clone https://github.com/rgthree/rgthree-comfy.git
  !echo -= Initial setup ComfyUI_JPS =- && git clone https://github.com/JPS-GER/ComfyUI_JPS-Nodes.git
  !echo -= Initial setup ComfyUI_Custom_Scripts =- && git clone https://github.com/pythongosssss/ComfyUI-Custom-Scripts.git
  !echo -= Initial setup ComfyUI-KJNodes =- && git clone https://github.com/kijai/ComfyUI-KJNodes.git

if OPTIONS['INSTALL_x_flux_comfyui']:
  !echo -= Initial setup x-flux-comfyui =- && git clone https://github.com/XLabs-AI/x-flux-comfyui.git

if OPTIONS['INSTALL_CUSTOM_NODES_DEPENDENCIES']:
  !pwd
  !echo "-= Install custom nodes dependencies =-"
  ![ -f "custom_nodes/ComfyUI-Manager/scripts/colab-dependencies.py" ] && python "custom_nodes/ComfyUI-Manager/scripts/colab-dependencies.py"

!pip install spandrel
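The custom-node clones in the setup cell run unconditionally, so re-running the cell makes `git clone` fail on directories that already exist. One possible way to make that step repeatable is sketched below: it only builds the git commands (clone when the directory is missing, pull otherwise) and leaves running them to the caller. The helper names `repo_dir_name` and `clone_commands` are hypothetical; the repository URLs are the ones from the cell.

```python
from pathlib import Path
from typing import List, Sequence

CUSTOM_NODE_REPOS = [
    "https://github.com/Suzie1/ComfyUI_Comfyroll_CustomNodes.git",
    "https://github.com/rgthree/rgthree-comfy.git",
    "https://github.com/JPS-GER/ComfyUI_JPS-Nodes.git",
    "https://github.com/pythongosssss/ComfyUI-Custom-Scripts.git",
    "https://github.com/kijai/ComfyUI-KJNodes.git",
    "https://github.com/XLabs-AI/x-flux-comfyui.git",
]


def repo_dir_name(url: str) -> str:
    """Directory name git would create for this URL (drops a trailing .git)."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".git") else name


def clone_commands(custom_nodes_dir: Path, repos: Sequence[str]) -> List[List[str]]:
    """Return one git command per repo: pull if present, clone if missing."""
    commands = []
    for url in repos:
        target = custom_nodes_dir / repo_dir_name(url)
        if target.is_dir():
            commands.append(["git", "-C", str(target), "pull"])
        else:
            commands.append(["git", "clone", url, str(target)])
    return commands
```

Each returned command can be passed to `subprocess.run(cmd, check=True)` from the notebook.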
Model download: download FLUX and the In-Context LoRA weights
#@markdown ###Download standard resources
%cd /mnt/workspace/ComfyUI

### FLUX1-DEV
# !modelscope download --model=AI-ModelScope/FLUX.1-dev --local_dir ./models/unet/ flux1-dev.safetensors
!modelscope download --model=AI-ModelScope/flux-fp8 --local_dir ./models/unet/ flux1-dev-fp8.safetensors

### clip
!modelscope download --model=AI-ModelScope/flux_text_encoders --local_dir ./models/clip/ clip_l.safetensors
!modelscope download --model=AI-ModelScope/flux_text_encoders --local_dir ./models/clip/ t5xxl_fp8_e4m3fn.safetensors

### vae
!modelscope download --model=AI-ModelScope/FLUX.1-dev --local_dir ./models/vae/ ae.safetensors

### lora
#!modelscope download --model=FluxLora/flux-koda --local_dir ./models/loras/ araminta_k_flux_koda.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ film-storyboard.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ font-design.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ couple-profile.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ home-decoration.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ portrait-illustration.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ portrait-photography.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ ppt-templates.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ sandstorm-visual-effect.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ sparklers-visual-effect.safetensors
!modelscope download --model=iic/In-Context-LoRA --local_dir ./models/loras/ visual-identity-design.safetensors
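Because a workflow fails at load time if any of these files is missing, it can help to check the download targets before starting ComfyUI. The sketch below lists whichever expected files are absent. The file names and directories are taken from the download commands above; the helper name `missing_files` is hypothetical, and it assumes each file ends up directly inside its `--local_dir`.

```python
from pathlib import Path
from typing import Dict, List

# Directory (relative to the ComfyUI root) -> files the download cell fetches.
EXPECTED_FILES: Dict[str, List[str]] = {
    "models/unet": ["flux1-dev-fp8.safetensors"],
    "models/clip": ["clip_l.safetensors", "t5xxl_fp8_e4m3fn.safetensors"],
    "models/vae": ["ae.safetensors"],
    "models/loras": [
        "film-storyboard.safetensors",
        "font-design.safetensors",
        "couple-profile.safetensors",
        "home-decoration.safetensors",
        "portrait-illustration.safetensors",
        "portrait-photography.safetensors",
        "ppt-templates.safetensors",
        "sandstorm-visual-effect.safetensors",
        "sparklers-visual-effect.safetensors",
        "visual-identity-design.safetensors",
    ],
}


def missing_files(comfy_root: Path) -> List[str]:
    """Return the relative paths of expected model files that are not on disk."""
    missing = []
    for subdir, names in EXPECTED_FILES.items():
        for name in names:
            if not (comfy_root / subdir / name).is_file():
                missing.append(f"{subdir}/{name}")
    return missing
```

For example, `missing_files(Path("/mnt/workspace/ComfyUI"))` returns an empty list once every download has completed.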
Run the workflow: start ComfyUI and expose it through a cloudflared tunnel
!wget "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/cloudflared-linux-amd64.deb"
!dpkg -i cloudflared-linux-amd64.deb

%cd /mnt/workspace/ComfyUI
import subprocess
import threading
import time
import socket
import urllib.request

def iframe_thread(port):
  while True:
      time.sleep(0.5)
      sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      result = sock.connect_ex(('127.0.0.1', port))
      if result == 0:
        break
      sock.close()
  print("\nComfyUI finished loading, trying to launch cloudflared (if it gets stuck here cloudflared is having issues)\n")

  p = subprocess.Popen(["cloudflared", "tunnel", "--url", "http://127.0.0.1:{}".format(port)], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  for line in p.stderr:
    l = line.decode()
    if "trycloudflare.com " in l:
      print("This is the URL to access ComfyUI:", l[l.find("http"):], end='')
    #print(l, end='')

threading.Thread(target=iframe_thread, daemon=True, args=(8188,)).start()

!python main.py --dont-print-server
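The polling loop in `iframe_thread` waits forever if ComfyUI never starts, and it leaves the last socket open after a successful connection. A possible variant with a timeout is sketched below; `wait_for_port` is a hypothetical helper, not part of the notebook, and the default values are illustrative.

```python
import socket
import time


def wait_for_port(port: int, host: str = "127.0.0.1",
                  timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll until something accepts TCP connections on host:port.

    Returns True once a connection succeeds, False if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # The context manager closes the socket on every iteration.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(interval)
            if sock.connect_ex((host, port)) == 0:
                return True
        time.sleep(interval)
    return False
```

In the notebook, the tunnel would then be launched only when `wait_for_port(8188)` returns True.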
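Notebook share link: https://modelscope.cn/notebook/share/ipynb/a4c365ec/ComfyUI_incontexlora.ipynb
Workflow result:
GPU memory usage:
Training with DiffSynth-Studio
In-Context LoRA uses a standard LoRA training algorithm, so most frameworks that support FLUX LoRA training can be used to train it. The authors provide a sample dataset, and here we train on it with the open-source project DiffSynth-Studio.
Environment setup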
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
Model download
modelscope download --model AI-ModelScope/FLUX.1-dev --local_dir ./models/FLUX/FLUX.1-dev
Dataset preparation (download link: https://github.com/ali-vilab/In-Context-LoRA/blob/main/data/movie-shots.zip)
data/lora_datasets/movie-shots/
└── train
    ├── 0001.jpg
    ├── 0002.jpg
    ├── 0003.jpg
    ├── 0004.jpg
    ├── 0005.jpg
    ├── 0006.jpg
    ├── 0007.jpg
    ├── 0008.jpg
    ├── 0009.jpg
    ├── 0010.jpg
    └── metadata.csv
metadata.csv:
file_name,text
0001.jpg,"[MOVIE-SHOTS] In this image, we see a sequence of domestic life with a young girl named <Matilda> orchestrating her morning routine: [SCENE-1] <Matilda> diligently flips pancakes on the stove, demonstrating a sense of independence and skill in the kitchen, [SCENE-2] transitions to a close-up of her face that reveals a curious and thoughtful expression peeking over the counter, [SCENE-3] concludes the scene as <Matilda> sits at the breakfast table with a stack of books and magazines nearby, while she enjoys her meal, displaying a blend of intellect and simplicity in her everyday world."
0002.jpg,"[MOVIE-SHOTS] In a series of dynamic and vivid shots, the image captures the journey of <Alex>, a stylish traveler navigating a bustling train setting; [SCENE-1] beginning with <Alex> sprinting towards a departing train, clutching a large suitcase, against a backdrop of vibrant sunset hues and distant figures wrapped in daily routine, [SCENE-2] followed by a close-up as <Alex> pauses at the train door, exuding confidence with sunglasses momentarily raised and eyes set firmly on the horizon, evoking a sense of anticipation and introspection, [SCENE-3] and concluding with <Alex> stepping into a warmly-lit cabin interior, the intricate patterns on the walls and his nonchalant demeanor suggesting both the allure of exploration and the quiet introspection of new beginnings, effectively encapsulating the essence of adventure and self-discovery."
0003.jpg,"[MOVIE-SHOTS] In an animated heist sequence, [SCENE-1] the image opens with <Fox>, wearing a knit mask with hypnotic eyes, surveying the horizon against a backdrop of smokestacks and a cloudy sky, [SCENE-2] transitions to <Fox> now alongside <Badger>, also masked, exchanging a determined glance under the ominous overcast, [SCENE-3] finally converges with <Fox> joined by a team that includes a wide-eyed <Cat> and a serious-looking <Mouse>, all peering cautiously over a wooden ledge, poised for action as they navigate their bold adventure."
0004.jpg,"[MOVIE-SHOTS] In a delightful animated sequence, [SCENE-1] a young panda named <Bao> appears adorably forlorn as he lies on the floor surrounded by a scatter of vegetables, his wide eyes expressing a mix of curiosity and yearning, [SCENE-2] which quickly shifts to <Bao> gleefully rolling on his back, showcasing his playful nature amidst the same rustic setting, [SCENE-3] before finally nestling happily in a large wooden bucket, as an amused goose character, <Fei>, gazes at him affectionately, adding warmth and joy to this charming tableau."
0005.jpg,"[MOVIE-SHOTS] In a whimsical city adventure, [SCENE-1] young <Sophie> peers over a railing with a curious plush pig beside her, capturing a moment of anticipation amidst towering skyscrapers, as [SCENE-2] a bridge raises dramatically over a bustling, rain-soaked street, featuring silhouetted streetlights and gleaming car headlights highlighting the grandeur of the cityscape, while [SCENE-3] <Sophie>, now facing forward, joyfully embraces the night as city lights illuminate the vibrant urban waterfront behind her, creating an enchanting atmosphere of exploration and wonder."
0006.jpg,"[MOVIE-SHOTS] In a captivating sequence of moments, the image journey begins [SCENE-1] as <Anna> and <Tom> are seated together on a sun-dappled park bench, engaging in a sincere conversation that is accentuated by the vibrant greenery around them, [SCENE-2] then shifts to a quaint café where <Anna>, displaying a contemplative expression, rests her chin on her hand while <Tom> listens intently, their discussion punctuated by the soft clink of coffee cups and the background chatter of other patrons, [SCENE-3] concluding the visual narrative in an art gallery, where <Anna>, donned in a vivid purple coat, stands opposite <Tom> across a centerpiece of wildflowers, their shared gaze locked on the artistic arrangements, hinting at a deeper understanding cultivated through their interactions."
0007.jpg,"[MOVIE-SHOTS] In this captivating montage of scenes, we follow a poignant narrative set against an elegant winter backdrop, [SCENE-1] beginning with <Elena> gazing contemplatively into the distance, a world of thoughts wrapped in the serenity of falling snowflakes; [SCENE-2] as the ambiance shifts to a bustling ice rink adorned with twinkling lights, <Elena> now stands facing <Alexei>, both enveloped in luxurious winter attire, capturing a moment of intense, unspoken emotion amidst a crowd of skaters; [SCENE-3] the focus then centers on <Alexei>, whose earnest expression and intent gaze suggest a deep conversation, as the surrounding world fades into a blur of festive lights and gentle snowfall, weaving a story of connection and unresolved feelings within this enchanting winter setting."
0008.jpg,"[MOVIE-SHOTS] In this thrilling sequence of action-packed movie scenes, [SCENE-1] <Lola>, dressed in a casual tank top and vibrant green pants, is captured in a dynamic sprint along a city sidewalk, suggesting urgency while a character in a red jersey observes from the side, [SCENE-2] as she races against time and the traffic, her determined expression directed at an oncoming red van that adds intensity to the unfolding drama, [SCENE-3] where <Lola> is seamlessly joined by <Tom>, who pedals furiously on a bicycle beside her, both seemingly engaged in a high-stakes chase down an urban street, their determined pace reflecting a palpable tension and shared mission."
0009.jpg,"[MOVIE-SHOTS] A scenic road trip unfolds through the lens of youthful introspection and fleeting moments of serenity; [SCENE-1] <Chris> is seen enjoying the journey, his relaxed expression captured in the car's side mirror as the vehicle cruises down an open road, symbolizing freedom and escape, [SCENE-2] transitioning to a detail shot of wildflowers delicately arranged in a Sprite bottle on the dashboard, signifying a simple beauty amid the motion, [SCENE-3] culminating in a reflective moment where <Chris> sits contemplatively against a vast, tranquil sky, while in the background, <Jesse> is seen absorbed in their own world, enhancing the theme of introspection and connection to the natural surroundings."
0010.jpg,"[MOVIE-SHOTS] In this heartwarming and humorous sequence, [SCENE-1] <Charlie>, a curious toddler, mischievously smears green food coloring all over his face while giggling near a kitchen counter, bringing pure joy and a hint of chaos into the household; [SCENE-2] as the scene shifts to the family car, <Charlie> sits wide-eyed and innocent with the same green face, now snugly buckled in his car seat, capturing a moment of endearing surprise for those around him; [SCENE-3] finally, <Charlie> beams with a contagious smile, the car stopped at a cafe drive-thru, his green-masked happiness lighting up the vehicle interior, bringing laughter and warmth to everyone who sees him throughout this delightful and comedic slice of life adventure."
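Every caption in the sample file starts with the task tag `[MOVIE-SHOTS]` and contains `[SCENE-1]` to `[SCENE-3]` in order. When preparing a custom dataset in the same style, a small consistency check can catch broken rows before training. The sketch below assumes that convention is intended (the source does not state it as a rule), and `check_metadata` is a hypothetical helper.

```python
import csv
import re
from pathlib import Path
from typing import List

SCENE_TAG = re.compile(r"\[SCENE-(\d+)\]")


def check_metadata(train_dir: Path, task_tag: str = "[MOVIE-SHOTS]",
                   scenes: int = 3) -> List[str]:
    """Return a list of human-readable problems found in train_dir/metadata.csv."""
    problems = []
    with open(train_dir / "metadata.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            name, text = row["file_name"], row["text"]
            # The image referenced by the row must exist next to the CSV.
            if not (train_dir / name).is_file():
                problems.append(f"{name}: image file not found")
            if not text.startswith(task_tag):
                problems.append(f"{name}: caption does not start with {task_tag}")
            # Scene tags must appear exactly once each, in ascending order.
            found = [int(n) for n in SCENE_TAG.findall(text)]
            if found != list(range(1, scenes + 1)):
                problems.append(f"{name}: expected scene tags 1..{scenes}, got {found}")
    return problems
```

An empty list from `check_metadata(Path("data/lora_datasets/movie-shots/train"))` means every row passed the three checks.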
Start training. Training requires about 40 GB of GPU memory.
CUDA_VISIBLE_DEVICES="0" python examples/train/flux/train_flux_lora.py \
  --pretrained_text_encoder_path models/FLUX/FLUX.1-dev/text_encoder/model.safetensors \
  --pretrained_text_encoder_2_path models/FLUX/FLUX.1-dev/text_encoder_2 \
  --pretrained_dit_path models/FLUX/FLUX.1-dev/flux1-dev.safetensors \
  --pretrained_vae_path models/FLUX/FLUX.1-dev/ae.safetensors \
  --dataset_path data/lora_datasets/movie-shots \
  --output_path ./models \
  --max_epochs 10 \
  --steps_per_epoch 100 \
  --height 1536 \
  --width 1024 \
  --center_crop \
  --precision "bf16" \
  --learning_rate 1e-4 \
  --lora_rank 16 \
  --lora_alpha 16 \
  --use_gradient_checkpointing \
  --align_to_opensource_format
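To make the hyperparameters in this command more concrete, the sketch below works through what they imply under the standard LoRA formulation, where the weight update is (alpha / rank) · B · A. Whether DiffSynth-Studio applies the scaling in exactly this way is an assumption, as is treating each step as one optimizer update; the layer width used in the usage note is purely illustrative, and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def total_steps(max_epochs: int, steps_per_epoch: int) -> int:
    """Number of training steps implied by the two epoch settings."""
    return max_epochs * steps_per_epoch


def lora_scale(alpha: float, rank: int) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / rank


def lora_extra_params(in_features: int, out_features: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer (A: r x in, B: out x r)."""
    return rank * (in_features + out_features)


def lora_delta(B: Matrix, A: Matrix, alpha: float, rank: int) -> Matrix:
    """Compute the weight update (alpha / rank) * B @ A with plain lists."""
    scale = lora_scale(alpha, rank)
    cols = len(A[0])
    return [
        [scale * sum(B[i][k] * A[k][j] for k in range(rank)) for j in range(cols)]
        for i in range(len(B))
    ]
```

With the values above, `total_steps(10, 100)` gives 1000 steps and `lora_scale(16, 16)` gives 1.0, so the update B · A is applied unscaled. For a hypothetical 3072 × 3072 linear layer, `lora_extra_params(3072, 3072, 16)` is 98,304 trainable parameters, against 9,437,184 in the frozen weight.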
After training finishes, the trained model files can be found in models/lightning_logs. Let's test the result.
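The exact file layout under models/lightning_logs is not shown here, so the newest weight file has to be located before testing. A minimal sketch follows, assuming checkpoints are saved with a `.ckpt` or `.safetensors` suffix somewhere below that directory; both suffixes and the helper name `latest_checkpoint` are assumptions.

```python
from pathlib import Path
from typing import Optional, Tuple


def latest_checkpoint(log_dir: Path,
                      suffixes: Tuple[str, ...] = (".ckpt", ".safetensors")) -> Optional[Path]:
    """Return the most recently modified weight file under log_dir, if any."""
    candidates = [
        path for path in log_dir.rglob("*")
        if path.is_file() and path.suffix in suffixes
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda path: path.stat().st_mtime)
```

Calling `latest_checkpoint(Path("models/lightning_logs"))` returns `None` when training has not produced any file yet.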
prompt:[MOVIE-SHOTS] In a vibrant festival, [SCENE-1] we find , a shy boy, standing at the edge of a bustling carnival, eyes wide with awe at the colorful rides and laughter, [SCENE-2] transitioning to him reluctantly trying a daring game, his friends cheering him on, [SCENE-3] culminating in a triumphant moment as he wins a giant stuffed bear, his face beaming with pride as he holds it up for all to see.
Click the link https://modelscope.cn/models/iic/In-Context-LoRA?from=alizishequ__text to go to the model page.