T2I-Adapter
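T2I-Adapter is an adapter that enables controllable generation similar to ControlNet. It works by learning a *mapping* between a control signal (for example, a depth map) and the pretrained model's internal knowledge. The adapter is plugged into the base model so that it can provide extra guidance based on the control signal during generation.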
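The idea of an adapter supplying extra guidance can be pictured with a toy example. The sketch below is not the Diffusers implementation. It assumes that the adapter's output is added to the base model's intermediate features at matching resolutions. The function name `inject_adapter_features`, the `scale` argument, and the array shapes are hypothetical and chosen only for illustration.

```python
import numpy as np


def inject_adapter_features(base_features, adapter_features, scale=1.0):
    """Add scaled adapter features to base-model features, level by level.

    Both arguments are lists of arrays, one per resolution level.
    This is a toy stand-in (assumption), not the Diffusers code path.
    """
    if len(base_features) != len(adapter_features):
        raise ValueError("expected one adapter feature map per base feature map")
    return [base + scale * extra for base, extra in zip(base_features, adapter_features)]


# Hypothetical feature maps at two resolutions: (channels, height, width).
base = [np.zeros((4, 8, 8)), np.zeros((8, 4, 4))]
guidance = [np.ones((4, 8, 8)), np.ones((8, 4, 4))]

guided = inject_adapter_features(base, guidance, scale=0.5)
print([f.shape for f in guided])
```

In this picture the base model's weights stay untouched; only the added feature maps change with the control signal.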
Load a T2I-Adapter conditioned on a specific control, such as Canny edges, and pass it to the pipeline in from_pretrained().
import torch
from diffusers import T2IAdapter, StableDiffusionXLAdapterPipeline, AutoencoderKL

t2i_adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-canny-sdxl-1.0",
    torch_dtype=torch.float16,
)
Generate a Canny image with opencv-python.
import cv2
import numpy as np
from PIL import Image
from diffusers.utils import load_image

original_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/non-enhanced-prompt.png"
)

image = np.array(original_image)

# Lower and upper hysteresis thresholds for the Canny edge detector.
low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
# Repeat the single-channel edge map across three channels.
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
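The last three lines of the snippet above turn the single-channel edge map into a three-channel image. A minimal sketch of that step as a reusable helper follows. The name `edges_to_control_image` is hypothetical, and the helper assumes a 2D `uint8` array such as the one `cv2.Canny` returns. A synthetic edge map stands in for real detector output so the example runs without OpenCV.

```python
import numpy as np
from PIL import Image


def edges_to_control_image(edges: np.ndarray) -> Image.Image:
    """Convert a 2D uint8 edge map into a 3-channel PIL image (hypothetical helper)."""
    if edges.ndim != 2:
        raise ValueError("expected a 2D edge map of shape (height, width)")
    stacked = np.stack([edges, edges, edges], axis=2)  # (H, W) -> (H, W, 3)
    return Image.fromarray(stacked.astype(np.uint8))


# Synthetic edge map: a single white vertical line on a black background.
edges = np.zeros((64, 48), dtype=np.uint8)
edges[:, 24] = 255

control = edges_to_control_image(edges)
print(control.mode, control.size)
```

Note that PIL reports size as (width, height), the reverse of the NumPy array's (height, width) shape.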
Pass the Canny image to the pipeline to generate an image.
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=t2i_adapter,
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")

prompt = """
A photorealistic overhead image of a cat reclining sideways in a flamingo pool floatie holding a margarita.
The cat is floating leisurely in the pool and completely relaxed and happy.
"""

pipeline(
    prompt,
    image=canny_image,
    num_inference_steps=100,
    guidance_scale=10,
).images[0]



MultiAdapter
You can use the MultiAdapter class to combine multiple controls. The example below combines a Canny image and a depth map.
Load the control images and the T2I-Adapters as lists.
import torch
from diffusers.utils import load_image
from diffusers import StableDiffusionXLAdapterPipeline, AutoencoderKL, MultiAdapter, T2IAdapter

canny_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/canny-cat.png"
)
depth_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_depth_image.png"
)
controls = [canny_image, depth_image]
prompt = ["""
a relaxed rabbit sitting on a striped towel next to a pool with a tropical drink nearby,
bright sunny day, vacation scene, 35mm photograph, film, professional, 4k, highly detailed
"""]

adapters = MultiAdapter(
    [
        T2IAdapter.from_pretrained("TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16),
        T2IAdapter.from_pretrained("TencentARC/t2i-adapter-depth-midas-sdxl-1.0", torch_dtype=torch.float16),
    ]
)
Pass the adapters, prompt, and control images to StableDiffusionXLAdapterPipeline. Use the adapter_conditioning_scale parameter to set the weight of each control.
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    vae=vae,
    adapter=adapters,
).to("cuda")

pipeline(
    prompt,
    image=controls,
    height=1024,
    width=1024,
    adapter_conditioning_scale=[0.7, 0.7],
).images[0]
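To build intuition for adapter_conditioning_scale, the toy sketch below treats each weight as a multiplier on the corresponding adapter's feature maps before they are summed. It is an illustration under that assumption, not the library's code. The function `combine_adapter_features` and the array shapes are hypothetical, and the weights are assumed to follow the same order as the adapters and control images.

```python
import numpy as np


def combine_adapter_features(features_per_adapter, weights):
    """Weighted sum of feature maps from several adapters (toy illustration).

    features_per_adapter: one list of arrays per adapter, with matching shapes.
    weights: one float per adapter, in the same order.
    """
    if len(features_per_adapter) != len(weights):
        raise ValueError("need exactly one weight per adapter")
    combined = [np.zeros_like(f, dtype=float) for f in features_per_adapter[0]]
    for features, weight in zip(features_per_adapter, weights):
        for level, feature in enumerate(features):
            combined[level] += weight * feature
    return combined


# Hypothetical single-level feature maps standing in for the Canny and depth adapters.
canny_features = [np.full((2, 4, 4), 1.0)]
depth_features = [np.full((2, 4, 4), 2.0)]

combined = combine_adapter_features([canny_features, depth_features], [0.7, 0.7])
print(combined[0][0, 0, 0])
```

Under this picture, raising one weight relative to the other lets that control contribute more to the summed guidance.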


