1. Introduction
InstantStyle is a general framework that uses two simple but effective techniques to disentangle style from content in a reference image.
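Separating content from the image. CLIP's global features are well-structured representations, so subtracting the content text feature from the image feature explicitly decouples style from content.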
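The subtraction can be illustrated with a minimal sketch. It uses plain Python lists as toy stand-ins for CLIP embeddings. The function name `subtract_content`, the `scale` parameter, and the vector values are assumptions made for illustration, not InstantStyle's actual code, which operates on real CLIP image and text features.

```python
def subtract_content(image_embed, content_text_embed, scale=1.0):
    """Return image_embed - scale * content_text_embed, element-wise.

    Hypothetical helper: both inputs are assumed to live in the same
    embedding space and to have the same dimension.
    """
    if len(image_embed) != len(content_text_embed):
        raise ValueError("embeddings must share the same dimension")
    return [i - scale * t for i, t in zip(image_embed, content_text_embed)]


# Toy 4-dimensional vectors standing in for CLIP embeddings (made-up values).
image_embed = [0.9, 0.4, 0.7, 0.1]          # reference image: style + content
content_text_embed = [0.8, 0.0, 0.6, 0.0]   # text describing the content only

# What remains after removing the content direction is treated as "style".
style_embed = subtract_content(image_embed, content_text_embed, scale=0.5)
```

A `scale` below 1.0 removes only part of the content direction, which is one way to trade off content leakage against loss of style detail.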
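Injecting into style blocks only. Empirically, each layer of a deep network captures different semantic information. The key observation in this work is that two specific attention blocks handle style-related information. Specifically, InstantStyle finds that up_blocks.0.attentions.1 captures style (color, material, atmosphere) and down_blocks.2.attentions.1 captures spatial layout (structure, composition).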
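As a rough sketch of block-selective injection, the snippet below decides, per attention layer, whether image features would be injected, by matching layer names against the target blocks. The helper `select_injection_layers` and the sample layer names are hypothetical and chosen for illustration; only the two block names come from this article.

```python
STYLE_BLOCK = "up_blocks.0.attentions.1"     # style: color, material, atmosphere
LAYOUT_BLOCK = "down_blocks.2.attentions.1"  # layout: structure, composition


def select_injection_layers(layer_names, target_blocks):
    """Map each layer name to True if it belongs to one of the target blocks."""
    return {
        name: any(block in name for block in target_blocks)
        for name in layer_names
    }


# Illustrative cross-attention layer names (assumed, not read from a real model).
layer_names = [
    "down_blocks.1.attentions.0.transformer_blocks.0.attn2.processor",
    "down_blocks.2.attentions.1.transformer_blocks.0.attn2.processor",
    "mid_block.attentions.0.transformer_blocks.0.attn2.processor",
    "up_blocks.0.attentions.1.transformer_blocks.0.attn2.processor",
    "up_blocks.1.attentions.0.transformer_blocks.0.attn2.processor",
]

# Style only: a single block receives the reference-image features.
style_only = select_injection_layers(layer_names, [STYLE_BLOCK])

# Style + layout: the layout block is enabled as well.
style_and_layout = select_injection_layers(layer_names, [STYLE_BLOCK, LAYOUT_BLOCK])
```

Layers mapped to `False` would fall back to ordinary text-only cross-attention, which is what keeps the reference image's content from leaking into the result.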
Some examples follow:
The InstantStyle project has also reached GitHub's global Trending list. Congratulations to the team!
2. InstantStyle Best Practices
Style transfer:
prompt: a girl, masterpiece, best quality, high quality
prompt: a cat, masterpiece, best quality, high quality
Style transfer + ControlNet:
prompt: a Chinese girl, masterpiece, best quality, high quality
InstantStyle inference code:
Environment setup and model download (notebook cells):
    !git clone https://github.com/InstantStyle/InstantStyle.git
    %cd InstantStyle
    !git clone https://www.modelscope.cn/AI-ModelScope/IP-Adapter.git
    !mv IP-Adapter/models models
    !mv IP-Adapter/sdxl_models sdxl_models
Model inference:
    import torch
    from diffusers import StableDiffusionXLPipeline
    from modelscope import snapshot_download
    from PIL import Image

    from ip_adapter import IPAdapterXL

    base_model_path = snapshot_download("AI-ModelScope/stable-diffusion-xl-base-1.0")
    image_encoder_path = "sdxl_models/image_encoder"
    ip_ckpt = "sdxl_models/ip-adapter_sdxl.bin"
    device = "cuda"

    # load SDXL pipeline
    pipe = StableDiffusionXLPipeline.from_pretrained(
        base_model_path,
        torch_dtype=torch.float16,
        add_watermarker=False,
    )

    # reduce memory consumption
    pipe.enable_vae_tiling()

    # load ip-adapter
    # target_blocks=["block"] for original IP-Adapter
    # target_blocks=["up_blocks.0.attentions.1"] for style blocks only
    # target_blocks=["up_blocks.0.attentions.1", "down_blocks.2.attentions.1"] for style+layout blocks
    ip_model = IPAdapterXL(pipe, image_encoder_path, ip_ckpt, device, target_blocks=["up_blocks.0.attentions.1"])

    # load the style reference image
    image = Image.open("./assets/0.jpg")
    # resize() returns a new image, so the result must be assigned back
    image = image.resize((512, 512))

    # generate an image in the reference style, guided by the text prompt
    images = ip_model.generate(
        pil_image=image,
        prompt="a cat, masterpiece, best quality, high quality",
        negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
        scale=1.0,
        guidance_scale=5,
        num_samples=1,
        num_inference_steps=30,
        seed=42,
        # neg_content_prompt="a rabbit",
        # neg_content_scale=0.5,
    )

    images[0].save("result.png")
Setting up the WebUI demo
Clone the ModelScope Studio code:
    git clone https://www.modelscope.cn/studios/instantx/InstantStyle.git
    cd InstantStyle
    python app.py
The front-end web application: