ControlNet

ControlNet is an adapter that enables controllable generation, such as generating an image of a cat in a specific **pose** or following the lines in a sketch of a **specific** cat. It works by adding a smaller network of "zero convolution" layers and progressively training these to avoid disturbing the original model. The original model parameters are frozen so it doesn't need to be retrained.
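
To make the "zero convolution" idea concrete, here is a minimal conceptual sketch (not the Diffusers implementation): a 1x1 convolution initialized to zero is added as a residual, so the trainable ControlNet branch contributes nothing at the start of training and cannot disturb the frozen base model. The channel count is illustrative.

import torch
import torch.nn as nn

def zero_conv(channels: int) -> nn.Conv2d:
    # 1x1 convolution with weights and bias initialized to zero
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

hidden = torch.randn(1, 320, 64, 64)    # feature map from the frozen base model
control = torch.randn(1, 320, 64, 64)   # feature map from the trainable ControlNet copy
out = hidden + zero_conv(320)(control)  # the residual is exactly zero before training starts
assert torch.allclose(out, hidden)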

A ControlNet is conditioned on extra visual information or "structural controls" (canny edge, depth maps, human pose, etc.) that can be combined with a text prompt to generate images guided by the visual input.

ControlNets are available for many models, such as Flux, Hunyuan-DiT, Stable Diffusion 3, and more. The examples in this guide use Flux and Stable Diffusion XL.

Load a ControlNet conditioned on a specific control, such as canny edge, and pass it to the pipeline in from_pretrained().

Text-to-image

Generate a canny image with opencv-python.

import cv2
import numpy as np
from PIL import Image
from diffusers.utils import load_image

original_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/non-enhanced-prompt.png"
)

image = np.array(original_image)

low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)

Pass the canny image to the pipeline. Use the controlnet_conditioning_scale parameter to determine how much weight to assign to the control.

import torch
from diffusers.utils import load_image
from diffusers import FluxControlNetPipeline, FluxControlNetModel

controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Canny", torch_dtype=torch.bfloat16
)
pipeline = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", controlnet=controlnet, torch_dtype=torch.bfloat16
).to("cuda")

prompt = """
A photorealistic overhead image of a cat reclining sideways in a flamingo pool floatie holding a margarita. 
The cat is floating leisurely in the pool and completely relaxed and happy.
"""

pipeline(
    prompt, 
    control_image=canny_image,
    controlnet_conditioning_scale=0.5,
    num_inference_steps=50, 
    guidance_scale=3.5,
).images[0]
Original image | Canny image | Generated image

MultiControlNet

You can compose multiple ControlNet conditionings, such as a canny image and a depth map, to create a **MultiControlNet**. For the best results, you should mask the conditionings so they don't overlap and experiment with different controlnet_conditioning_scale parameters to adjust how much weight is assigned to each control input.
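
As a rough illustration of masking, here is a minimal sketch that assumes the two control regions are simply split left/right; the mask_half helper and the split itself are illustrative assumptions, not part of the guide's own recipe.

import numpy as np
from PIL import Image

def mask_half(image: Image.Image, keep: str) -> Image.Image:
    # zero out the half of the control image that should not contribute
    arr = np.array(image)
    mid = arr.shape[1] // 2
    if keep == "left":
        arr[:, mid:] = 0
    else:
        arr[:, :mid] = 0
    return Image.fromarray(arr)

# e.g. restrict the canny edges to the left half and the depth map to the right half
# canny_image = mask_half(canny_image, keep="left")
# depth_image = mask_half(depth_image, keep="right")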

The example below composes a canny image and a depth map.
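
The snippet further down assumes canny_image and depth_image already exist. One possible way to prepare them, reusing the source image from the canny example above and a depth-estimation pipeline from transformers (the Intel/dpt-hybrid-midas checkpoint is an illustrative choice), is sketched here.

import cv2
import numpy as np
from PIL import Image
from transformers import pipeline as transformers_pipeline
from diffusers.utils import load_image

original_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/non-enhanced-prompt.png"
)

# canny edge image (same recipe as the text-to-image example above)
edges = cv2.Canny(np.array(original_image), 100, 200)
canny_image = Image.fromarray(np.concatenate([edges[:, :, None]] * 3, axis=2))

# depth map from a depth-estimation pipeline; the checkpoint choice is an assumption
depth_estimator = transformers_pipeline("depth-estimation", model="Intel/dpt-hybrid-midas")
depth = np.array(depth_estimator(original_image)["depth"])[:, :, None]
depth_image = Image.fromarray(np.concatenate([depth, depth, depth], axis=2))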

Pass the ControlNets as a list to the pipeline and resize the images to the expected input size.

import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, AutoencoderKL

controlnets = [
    ControlNetModel.from_pretrained(
        "diffusers/controlnet-depth-sdxl-1.0-small", torch_dtype=torch.float16
    ),
    ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16,
    ),
]

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnets, vae=vae, torch_dtype=torch.float16
).to("cuda")

prompt = """
a relaxed rabbit sitting on a striped towel next to a pool with a tropical drink nearby, 
bright sunny day, vacation scene, 35mm photograph, film, professional, 4k, highly detailed
"""
negative_prompt = "lowres, bad anatomy, worst quality, low quality, deformed, ugly"

# the order of the control images must match the order of the ControlNets above (depth, then canny)
images = [depth_image.resize((1024, 1024)), canny_image.resize((1024, 1024))]

pipeline(
    prompt,
    negative_prompt=negative_prompt,
    image=images,
    num_inference_steps=100,
    controlnet_conditioning_scale=[0.5, 0.5],
    strength=0.7,
).images[0]
Canny image | Depth map | Generated image

Guess mode

Guess mode generates an image from **only** the control input (canny edge, depth map, pose, etc.) without any guidance from a prompt. It scales the ControlNet's output residuals by a fixed ratio depending on block depth. The earliest DownBlock is only scaled by 0.1, while the MidBlock is fully scaled by 1.0.
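
The scaling schedule can be illustrated with a short sketch: the weights ramp geometrically from 0.1 to 1.0 across the residuals. The number of residuals below is an illustrative assumption, since the actual count depends on the UNet architecture.

import torch

# one scale per down-block residual plus one for the mid-block residual (count is illustrative)
num_residuals = 13
scales = torch.logspace(-1, 0, num_residuals)  # geometric ramp from 0.1 to 1.0
print(scales[0].item(), scales[-1].item())     # 0.1 for the earliest DownBlock, 1.0 for the MidBlock

The example below generates an image from a canny input alone, passing an empty prompt and guess_mode=True.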

import torch
from diffusers.utils import load_image
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel

controlnet = ControlNetModel.from_pretrained(
  "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
  "stabilityai/stable-diffusion-xl-base-1.0",
  controlnet=controlnet,
  torch_dtype=torch.float16
).to("cuda")

canny_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/canny-cat.png")
pipeline(
  "",
  image=canny_image,
  guess_mode=True
).images[0]
Canny image | Generated image