ControlNet
ControlNet is an adapter that enables controllable generation, such as generating an image of a cat in a specific **pose** or following the lines in a sketch of a **specific** cat. It works by adding a smaller network of "zero convolution" layers and progressively training these to avoid disturbing the original model. The original model's parameters are frozen so it doesn't have to be retrained.
ControlNet is conditioned on extra visual information, or "structural controls" (canny edge, depth maps, human pose, and so on), which can be combined with a text prompt to generate images guided by the visual input.
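As a minimal sketch (assuming PyTorch; this is not the Diffusers implementation), a "zero convolution" is a 1x1 convolution whose weight and bias start at zero, so the control branch contributes nothing at initialization and training gradually moves it away from zero:
import torch
from torch import nn

def zero_conv(channels: int) -> nn.Conv2d:
    # 1x1 convolution with weight and bias initialized to all zeros
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

# at initialization, the control branch adds exactly nothing to the frozen base model
x = torch.randn(1, 4, 64, 64)
assert torch.equal(zero_conv(4)(x), torch.zeros(1, 4, 64, 64))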
ControlNets are available for many models, such as Flux, Hunyuan-DiT, Stable Diffusion 3, and more. The examples in this guide use Flux and Stable Diffusion XL.
Load a ControlNet conditioned on a specific control, such as canny edge, and pass it to the pipeline in from_pretrained().
Generate the canny edge image with opencv-python.
import cv2
import numpy as np
from PIL import Image
from diffusers.utils import load_image

original_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/non-enhanced-prompt.png"
)
image = np.array(original_image)

# Canny edge detection with lower/upper hysteresis thresholds
low_threshold = 100
high_threshold = 200
image = cv2.Canny(image, low_threshold, high_threshold)

# stack the single-channel edge map into a 3-channel image
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
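As an optional alternative (an assumption, not part of this guide: the separately installed controlnet_aux package), a CannyDetector wraps the same steps:
# assumes `pip install controlnet_aux`; reuses original_image from above
from controlnet_aux import CannyDetector

canny_image = CannyDetector()(original_image, low_threshold=100, high_threshold=200)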
Pass the canny image to the pipeline. Use the controlnet_conditioning_scale parameter to determine how much weight to assign to the control.
import torch
from diffusers.utils import load_image
from diffusers import FluxControlNetPipeline, FluxControlNetModel

controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Canny", torch_dtype=torch.bfloat16
)
pipeline = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", controlnet=controlnet, torch_dtype=torch.bfloat16
).to("cuda")

prompt = """
A photorealistic overhead image of a cat reclining sideways in a flamingo pool floatie holding a margarita.
The cat is floating leisurely in the pool and completely relaxed and happy.
"""

pipeline(
    prompt,
    control_image=canny_image,
    controlnet_conditioning_scale=0.5,
    num_inference_steps=50,
    guidance_scale=3.5,
).images[0]



Multi-ControlNet
You can compose multiple ControlNet conditionings, such as a canny image and a depth map, to create a **Multi-ControlNet**. For the best results, you should mask the conditionings so they don't overlap and experiment with different controlnet_conditioning_scale parameters to adjust how much weight is assigned to each control input.
The example below composes a canny image and a depth map.
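This section doesn't prescribe how the two control images are produced. The sketch below is one possible preparation, reusing the canny recipe from above and (as an assumption) a monocular depth-estimation pipeline from transformers; the depth model choice is illustrative:
import cv2
import numpy as np
from PIL import Image
from diffusers.utils import load_image
from transformers import pipeline as depth_pipeline

original_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/non-enhanced-prompt.png"
)

# canny edge image, same recipe as the Flux example above
edges = cv2.Canny(np.array(original_image), 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=2))

# depth map from a monocular depth estimator; the model choice is illustrative
depth_estimator = depth_pipeline("depth-estimation", model="LiheYoung/depth-anything-small-hf")
depth_image = depth_estimator(original_image)["depth"]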
Pass the ControlNets as a list to the pipeline and resize the images to the expected input size.
import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, AutoencoderKL

controlnets = [
    ControlNetModel.from_pretrained(
        "diffusers/controlnet-depth-sdxl-1.0-small", torch_dtype=torch.float16
    ),
    ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16,
    ),
]

# fp16-safe SDXL VAE to avoid numerical overflow in half precision
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnets, vae=vae, torch_dtype=torch.float16
).to("cuda")
prompt = """
a relaxed rabbit sitting on a striped towel next to a pool with a tropical drink nearby,
bright sunny day, vacation scene, 35mm photograph, film, professional, 4k, highly detailed
"""
negative_prompt = "lowres, bad anatomy, worst quality, low quality, deformed, ugly"
images = [canny_image.resize((1024, 1024)), depth_image.resize((1024, 1024))]
pipeline(
prompt,
negative_prompt=negative_prompt,
image=images,
num_inference_steps=100,
controlnet_conditioning_scale=[0.5, 0.5],
strength=0.7,
).images[0]



Guess mode
Guess mode generates an image from **only** the control input (canny edge, depth map, pose, etc.), without any guidance from a prompt. It scales the ControlNet's output residuals by a fixed ratio depending on the block depth: the shallowest DownBlock is scaled by only 0.1, while the MidBlock is scaled fully by 1.0.
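A minimal sketch of that schedule (assuming an SD-style UNet with 12 down-block residuals plus the mid-block residual): the per-residual scales form a geometric ramp from 0.1 to 1.0.
# geometric ramp of guess-mode scales across the ControlNet residuals;
# the residual count depends on the UNet, 12 + 1 is assumed here
import torch

scales = torch.logspace(-1, 0, 12 + 1)
print(scales[0])   # tensor(0.1000) -> shallowest DownBlock residual
print(scales[-1])  # tensor(1.)     -> MidBlock residual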
import torch
from diffusers.utils import load_image
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

canny_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/canny-cat.png")

pipeline(
    "",  # empty prompt: the control image alone drives generation
    image=canny_image,
    guess_mode=True
).images[0]

