
T2I-Adapter

T2I-Adapter is an adapter that enables controllable generation, similar to ControlNet. A T2I-Adapter works by learning a *mapping* between a control signal (for example, a depth map) and the pretrained model's internal knowledge. The adapter is plugged into the base model to provide extra guidance based on the control signal during generation.

Load a T2I-Adapter conditioned on a specific control, such as Canny edges, and pass it to the pipeline in from_pretrained().

import torch
from diffusers import T2IAdapter, StableDiffusionXLAdapterPipeline, AutoencoderKL

t2i_adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-canny-sdxl-1.0",
    torch_dtype=torch.float16,
)
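
A T2I-Adapter is small relative to the base model, which is why plugging it in adds little overhead. As an optional sanity check (this snippet is an illustration, not part of the official example), you can count its parameters, since T2IAdapter is a torch.nn.Module:

# optional: confirm the adapter is lightweight compared to the base model
num_params = sum(p.numel() for p in t2i_adapter.parameters())
print(f"adapter parameters: {num_params / 1e6:.1f}M")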

Generate a Canny image with opencv-python.

import cv2
import numpy as np
from PIL import Image
from diffusers.utils import load_image

original_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/non-enhanced-prompt.png"
)

image = np.array(original_image)

low_threshold = 100
high_threshold = 200

# detect edges, then replicate the single channel into a 3-channel RGB image
image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)

Pass the Canny image to the pipeline to generate an image.

# use a fp16-friendly VAE to avoid numerical issues in half precision
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=t2i_adapter,
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")

prompt = """
A photorealistic overhead image of a cat reclining sideways in a flamingo pool floatie holding a margarita. 
The cat is floating leisurely in the pool and completely relaxed and happy.
"""

pipeline(
    prompt, 
    image=canny_image,
    num_inference_steps=100, 
    guidance_scale=10,
).images[0]
Original image | Canny image | Generated image
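
If you are short on GPU memory, Diffusers' built-in offloading can move model components to the CPU while they are idle. A minimal sketch; with offloading enabled, skip the .to("cuda") call above, since device placement is handled for you (requires accelerate):

# offload idle components to the CPU instead of keeping the whole pipeline on the GPU
pipeline.enable_model_cpu_offload()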

MultiAdapter

You can compose multiple controls, such as a Canny image and a depth map, with the MultiAdapter class. The example below combines a Canny image and a depth map.

Load the control images and the T2I-Adapters as lists.

import torch
from diffusers.utils import load_image
from diffusers import StableDiffusionXLAdapterPipeline, AutoencoderKL, MultiAdapter, T2IAdapter

canny_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/canny-cat.png"
)
depth_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_depth_image.png"
)
controls = [canny_image, depth_image]
prompt = ["""
a relaxed rabbit sitting on a striped towel next to a pool with a tropical drink nearby, 
bright sunny day, vacation scene, 35mm photograph, film, professional, 4k, highly detailed
"""]

adapters = MultiAdapter(
    [
        T2IAdapter.from_pretrained("TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16),
        T2IAdapter.from_pretrained("TencentARC/t2i-adapter-depth-midas-sdxl-1.0", torch_dtype=torch.float16),
    ]
)
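
The example above downloads a precomputed depth map. To produce one from your own image, you can run a depth estimation model first. Below is a sketch using the transformers depth-estimation pipeline; the model checkpoint is one possible choice and source_image stands in for any PIL image you want to condition on:

from transformers import pipeline as hf_pipeline

# sketch: estimate depth with a MiDaS-style model and convert it to a 3-channel control image
depth_estimator = hf_pipeline("depth-estimation", model="Intel/dpt-hybrid-midas")
depth_image = depth_estimator(source_image)["depth"].convert("RGB")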

Pass the adapters, prompt, and control images to StableDiffusionXLAdapterPipeline. Use the adapter_conditioning_scale parameter to set how much weight each control receives.

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    vae=vae,
    adapter=adapters,
).to("cuda")

pipeline(
    prompt,
    image=controls,
    height=1024,
    width=1024,
    adapter_conditioning_scale=[0.7, 0.7]
).images[0]
Canny image | Depth map | Generated image
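
adapter_conditioning_scale takes one value per adapter, so the controls don't have to be weighted equally. For example, to let the depth map guide the result more strongly than the Canny edges (the values here are illustrative, not tuned):

pipeline(
    prompt,
    image=controls,
    height=1024,
    width=1024,
    adapter_conditioning_scale=[0.4, 0.9],
).images[0]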