ControlNet
ControlNet is an adapter that enables controllable generation, such as generating an image of a cat in a specific **pose** or following the lines in a sketch of a **specific** cat. It works by adding a smaller network of "zero convolution" layers and progressively training these to avoid disturbing the original model. The original model's parameters are frozen so it doesn't have to be retrained.
ControlNet is conditioned on extra visual information, or "structural controls" (canny edge, depth maps, human pose, and so on), which can be combined with a text prompt to generate images guided by the visual input.
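As a minimal sketch (assuming PyTorch; this is not the Diffusers implementation), a "zero convolution" is a 1x1 convolution whose weight and bias start at zero, so the control branch contributes nothing at initialization and training gradually moves it away from zero:
import torch
from torch import nn

def zero_conv(channels: int) -> nn.Conv2d:
    # 1x1 convolution with weight and bias initialized to all zeros
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

# at initialization, the control branch adds exactly nothing to the frozen base model
x = torch.randn(1, 4, 64, 64)
assert torch.equal(zero_conv(4)(x), torch.zeros(1, 4, 64, 64))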
ControlNets are available for many models, such as Flux, Hunyuan-DiT, Stable Diffusion 3, and more. The examples in this guide use Flux and Stable Diffusion XL.
Load a ControlNet conditioned on a specific control, such as canny edge, and pass it to the pipeline in from_pretrained().
Generate the canny edge image with opencv-python.
import cv2
import numpy as np
from PIL import Image
from diffusers.utils import load_image

original_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/non-enhanced-prompt.png"
)
image = np.array(original_image)

# Canny edge detection with lower/upper hysteresis thresholds
low_threshold = 100
high_threshold = 200
image = cv2.Canny(image, low_threshold, high_threshold)

# stack the single-channel edge map into a 3-channel image
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
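As an optional alternative (an assumption, not part of this guide: the separately installed controlnet_aux package), a CannyDetector wraps the same steps:
# assumes `pip install controlnet_aux`; reuses original_image from above
from controlnet_aux import CannyDetector

canny_image = CannyDetector()(original_image, low_threshold=100, high_threshold=200)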
Pass the canny image to the pipeline. Use the controlnet_conditioning_scale parameter to determine how much weight to assign to the control.
import torch
from diffusers.utils import load_image
from diffusers import FluxControlNetPipeline, FluxControlNetModel

controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Canny", torch_dtype=torch.bfloat16
)
pipeline = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", controlnet=controlnet, torch_dtype=torch.bfloat16
).to("cuda")

prompt = """
A photorealistic overhead image of a cat reclining sideways in a flamingo pool floatie holding a margarita.
The cat is floating leisurely in the pool and completely relaxed and happy.
"""

pipeline(
    prompt,
    control_image=canny_image,
    controlnet_conditioning_scale=0.5,
    num_inference_steps=50,
    guidance_scale=3.5,
).images[0]



Multi-ControlNet
You can compose multiple ControlNet conditionings, such as a canny image and a depth map, to create a **Multi-ControlNet**. For the best results, you should mask the conditionings so they don't overlap and experiment with different controlnet_conditioning_scale parameters to adjust how much weight is assigned to each control input.
The example below composes a canny image and a depth map.
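This section doesn't prescribe how the two control images are produced. The sketch below is one possible preparation, reusing the canny recipe from above and (as an assumption) a monocular depth-estimation pipeline from transformers; the depth model choice is illustrative:
import cv2
import numpy as np
from PIL import Image
from diffusers.utils import load_image
from transformers import pipeline as depth_pipeline

original_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/non-enhanced-prompt.png"
)

# canny edge image, same recipe as the Flux example above
edges = cv2.Canny(np.array(original_image), 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=2))

# depth map from a monocular depth estimator; the model choice is illustrative
depth_estimator = depth_pipeline("depth-estimation", model="LiheYoung/depth-anything-small-hf")
depth_image = depth_estimator(original_image)["depth"]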
Pass the ControlNets as a list to the pipeline and resize the images to the expected input size.
import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, AutoencoderKL

controlnets = [
    ControlNetModel.from_pretrained(
        "diffusers/controlnet-depth-sdxl-1.0-small", torch_dtype=torch.float16
    ),
    ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16,
    ),
]

# fp16-safe SDXL VAE to avoid numerical overflow in half precision
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnets, vae=vae, torch_dtype=torch.float16
).to("cuda")
prompt = """
a relaxed rabbit sitting on a striped towel next to a pool with a tropical drink nearby,
bright sunny day, vacation scene, 35mm photograph, film, professional, 4k, highly detailed
"""
negative_prompt = "lowres, bad anatomy, worst quality, low quality, deformed, ugly"
images = [canny_image.resize((1024, 1024)), depth_image.resize((1024, 1024))]
pipeline(
prompt,
negative_prompt=negative_prompt,
image=images,
num_inference_steps=100,
controlnet_conditioning_scale=[0.5, 0.5],
strength=0.7,
).images[0]



Guess mode
Guess mode generates an image from **only** the control input (canny edge, depth map, pose, etc.), without any guidance from a prompt. It scales the ControlNet's output residuals by a fixed ratio depending on the block depth: the shallowest DownBlock is scaled by only 0.1, while the MidBlock is scaled fully by 1.0.
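A minimal sketch of that schedule (assuming an SD-style UNet with 12 down-block residuals plus the mid-block residual): the per-residual scales form a geometric ramp from 0.1 to 1.0.
# geometric ramp of guess-mode scales across the ControlNet residuals;
# the residual count depends on the UNet, 12 + 1 is assumed here
import torch

scales = torch.logspace(-1, 0, 12 + 1)
print(scales[0])   # tensor(0.1000) -> shallowest DownBlock residual
print(scales[-1])  # tensor(1.)     -> MidBlock residual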
import torch
from diffusers.utils import load_image
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

canny_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/canny-cat.png")

pipeline(
    "",  # empty prompt: the control image alone drives generation
    image=canny_image,
    guess_mode=True
).images[0]

