🧨 Diffusers welcomes Stable Diffusion 3.5 Large

Published October 22, 2024

Apolinário from multimodal AI art

multimodalart

Alvaro Somoza

OzzyGT

Aritra Roy Gosthipaty

ariG23498

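Stable Diffusion 3.5 is the improved variant of its predecessor, Stable Diffusion 3. As of today, the models are available on the Hugging Face Hub and can be used with 🧨 Diffusers.

The release comes with two checkpoints:

A large (8B) model
A large (8B) timestep-distilled model that enables few-step inference

In this post, we focus on how to use Stable Diffusion 3.5 (SD3.5) with Diffusers, covering both inference and training.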

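Architectural changes

The transformer architecture of SD3.5 (large) is very similar to that of SD3 (medium), with the following changes:

QK normalization: QK normalization has become standard practice for training large transformer models, and SD3.5 Large is no exception.
Dual attention layers: Instead of a single attention layer for each modality stream in the MMDiT blocks, SD3.5 uses dual attention layers.

The remaining details of the text encoders, VAE, and noise scheduler are exactly the same as in SD3 Medium. For more on SD3, we recommend checking out the original paper.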
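To illustrate what QK normalization does, here is a minimal, framework-free sketch. It assumes RMS normalization without a learned gain, applied to a single query/key pair; the function names are illustrative and this is not the Diffusers implementation.

```python
import math


def rms_norm(vec, eps=1e-6):
    """Rescale a vector so that its root-mean-square is approximately 1."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def attention_logit(q, k, qk_norm=True):
    """Scaled dot-product attention logit for one query/key pair."""
    if qk_norm:
        # Normalizing both sides bounds the dot product regardless of input scale.
        q, k = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


# Deliberately large activations, as might appear late in training.
q = [300.0, -200.0, 100.0, 50.0]
k = [250.0, -150.0, 80.0, 40.0]

raw_logit = attention_logit(q, k, qk_norm=False)
normed_logit = attention_logit(q, k, qk_norm=True)
print(f"raw logit: {raw_logit:.1f}, normalized logit: {normed_logit:.3f}")
```

Under these assumptions the normalized logit can never exceed the square root of the head dimension in magnitude, which keeps the softmax inputs in a stable range even when the raw activations grow.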

Using SD3.5 with Diffusers

Make sure you have the latest version of Diffusers installed:

pip install -U diffusers

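Because the model is gated, before using it with Diffusers you first need to visit the Stable Diffusion 3.5 Large page on Hugging Face, fill in the form, and accept the agreement. Once you have been granted access, you need to log in so that your system knows you have accepted it. Use the following command to log in: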

huggingface-cli login

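The following snippet downloads the 8B-parameter version of SD3.5 in torch.bfloat16 precision. This is the format of the original checkpoint published by Stability AI, and it is the recommended way to run inference.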

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="a photo of a cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=40,
    height=1024,
    width=1024,
    guidance_scale=4.5,
).images[0]

image.save("sd3_hello_world.png")

The release also comes with a **"timestep-distilled"** model that eliminates classifier-free guidance and lets us generate images in fewer steps (typically 4-8).

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large-turbo", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="a photo of a cat holding a sign that says hello world",
    num_inference_steps=4,
    height=1024,
    width=1024,
    guidance_scale=1.0,
).images[0]

image.save("sd3_hello_world.png")
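To make the role of guidance_scale concrete, here is a minimal sketch of the usual classifier-free guidance combination on plain Python lists. It assumes the common formulation uncond + scale * (cond - uncond), and that a pipeline skips the unconditional pass whenever the scale is not above 1. The function names are illustrative and are not Diffusers APIs.

```python
def cfg_combine(uncond, cond, guidance_scale):
    """Blend unconditional and conditional predictions element-wise."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


def denoiser_calls(num_steps, guidance_scale):
    """Count denoiser forward passes, assuming guidance doubles the work per step."""
    passes_per_step = 2 if guidance_scale > 1.0 else 1
    return num_steps * passes_per_step


uncond = [0.0, 1.0]  # toy prediction for the empty prompt
cond = [2.0, 3.0]    # toy prediction for the real prompt

# With a scale of 1.0 the blend collapses to the conditional prediction.
print(cfg_combine(uncond, cond, 1.0))
# With a larger scale the result is pushed further away from the unconditional one.
print(cfg_combine(uncond, cond, 4.5))

print(denoiser_calls(40, 4.5), "passes for the base settings")
print(denoiser_calls(4, 1.0), "passes for the distilled settings")
```

Under these assumptions, the distilled settings above save work twice over: fewer steps, and a single forward pass per step.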

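All the examples shown in our SD3 blog post and in the official Diffusers documentation should already work with SD3.5. In particular, both resources dive deep into reducing the memory needed to run inference. Since SD3.5 Large is significantly larger than SD3 Medium, memory optimization becomes crucial for running inference on consumer hardware.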
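A quick back-of-the-envelope calculation shows why the size difference matters. The sketch below counts only the transformer weights, uses rounded parameter counts (about 2B for SD3 Medium and about 8B for SD3.5 Large), and ignores the text encoders, VAE, and activations, so real memory usage will be higher.

```python
def weight_gigabytes(num_params, bits_per_param):
    """Storage for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


SD3_MEDIUM_PARAMS = 2e9  # approximate, rounded
SD35_LARGE_PARAMS = 8e9  # approximate, rounded

medium_bf16 = weight_gigabytes(SD3_MEDIUM_PARAMS, 16)
large_bf16 = weight_gigabytes(SD35_LARGE_PARAMS, 16)
large_fp32 = weight_gigabytes(SD35_LARGE_PARAMS, 32)

print(f"SD3 Medium transformer, bfloat16:  ~{medium_bf16:.0f} GB")
print(f"SD3.5 Large transformer, bfloat16: ~{large_bf16:.0f} GB")
print(f"SD3.5 Large transformer, float32:  ~{large_fp32:.0f} GB")
```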

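Running inference with quantization

Diffusers natively supports bitsandbytes quantization, which reduces memory usage even further.

First, make sure to install all the necessary libraries: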

pip install -Uq git+https://github.com/huggingface/transformers@main
pip install -Uq bitsandbytes

Then load the transformer in "NF4" precision:

from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
import torch

model_id = "stabilityai/stable-diffusion-3.5-large"
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model_nf4 = SD3Transformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16
)

Now we are ready to run inference:

from diffusers import StableDiffusion3Pipeline

pipeline = StableDiffusion3Pipeline.from_pretrained(
    model_id, 
    transformer=model_nf4,
    torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()

prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature's body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree.  As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"
image = pipeline(
    prompt=prompt,
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
image.save("whimsical.png")

You can control other knobs in the BitsAndBytesConfig. Refer to the documentation for details.
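For intuition about what 4-bit storage involves, here is a toy blockwise quantizer. It is a deliberately simplified stand-in: it uses 16 evenly spaced levels and one absmax scale per block, whereas NF4 uses a non-uniform set of 16 levels and bitsandbytes implements it in optimized kernels. All names here are illustrative.

```python
def quantize_block(values, levels=16):
    """Map floats to integer codes in [0, levels - 1] using a single absmax scale."""
    absmax = max(abs(v) for v in values) or 1.0  # avoid dividing by zero
    half = (levels - 1) / 2
    codes = [round((v / absmax) * half + half) for v in values]
    return codes, absmax


def dequantize_block(codes, absmax, levels=16):
    """Recover approximate floats from the integer codes and the block scale."""
    half = (levels - 1) / 2
    return [((c - half) / half) * absmax for c in codes]


weights = [0.12, -0.5, 0.33, 0.9, -0.07]  # toy weight block
codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)

max_error = max(abs(w - r) for w, r in zip(weights, restored))
print("codes:", codes)
print(f"largest reconstruction error: {max_error:.4f}")
```

Each weight is stored as a 4-bit code plus a shared per-block scale, and the reconstruction error is bounded by half the spacing between levels. Placing the levels non-uniformly, as NF4 does, is a way to spend those 16 codes where weights are most densely distributed.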

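It is also possible to directly load a model that was already quantized with the same nf4_config as above. This is particularly helpful for machines with low RAM. Refer to this Colab Notebook for an end-to-end example.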

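Training LoRAs with SD3.5 Large with quantization

Thanks to libraries like bitsandbytes and peft, it is possible to fine-tune large models such as SD3.5 Large on consumer GPUs with 24GB of VRAM. Our existing SD3 training script can already be used to train LoRAs. The following training command already works: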

accelerate launch train_dreambooth_lora_sd3.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-3.5-large"  \
  --dataset_name="Norod78/Yarn-art-style" \
  --output_dir="yart_art_sd3-5_lora" \
  --mixed_precision="bf16" \
  --instance_prompt="Frog, yarn art style" \
  --caption_column="text" \
  --resolution=768 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=4e-4 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=700 \
  --rank=16 \
  --seed="0" \
  --push_to_hub
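The --rank=16 flag controls how small the trainable adapter is. As a rough sketch, a LoRA adapter beside a frozen d_out x d_in weight adds two low-rank matrices with rank * (d_in + d_out) parameters in total. The layer width below is a hypothetical value chosen for illustration, not a dimension taken from SD3.5.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the two low-rank matrices A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


HIDDEN = 2048  # hypothetical layer width, for illustration only
RANK = 16      # matches --rank=16 in the command above

full_params = HIDDEN * HIDDEN
lora_params = lora_trainable_params(HIDDEN, HIDDEN, RANK)

print(f"frozen weight:  {full_params:,} parameters")
print(f"LoRA adapter:   {lora_params:,} parameters")
print(f"trainable share: {lora_params / full_params:.2%}")
```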

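However, to make it work with quantization, we need to adjust a couple of knobs. Below, we provide pointers on how to do that:

We initialize the transformer either with a quantization config or by loading a quantized checkpoint directly.
Then, we prepare it with the prepare_model_for_kbit_training() function from peft.
The rest of the process remains the same, thanks to peft's strong support for bitsandbytes!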
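The steps above can be sketched roughly as follows. This is a minimal outline under stated assumptions, not the training script itself: the target module names and the choice to disable gradient checkpointing inside peft's helper are assumptions, the helper function name is hypothetical, and running it requires torch, diffusers, peft, bitsandbytes, a CUDA GPU, and access to the gated checkpoint.

```python
# Plain keyword arguments, kept torch-free so they can be inspected anywhere.
NF4_KWARGS = {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"}
# Attention projections to adapt (an assumption; adjust to your needs).
LORA_TARGET_MODULES = ["to_k", "to_q", "to_v", "to_out.0"]


def build_quantized_lora_transformer(
    model_id="stabilityai/stable-diffusion-3.5-large", rank=16
):
    """Load the SD3.5 transformer in NF4 and attach a trainable LoRA adapter."""
    # Heavy dependencies are imported lazily so that defining this function
    # does not require a GPU stack to be installed.
    import torch
    from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
    from peft import LoraConfig, prepare_model_for_kbit_training

    # Step 1: initialize the transformer with a quantization config.
    nf4_config = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.bfloat16, **NF4_KWARGS
    )
    transformer = SD3Transformer2DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        quantization_config=nf4_config,
        torch_dtype=torch.bfloat16,
    )

    # Step 2: prepare the quantized model for k-bit training.
    # Gradient checkpointing is left off here (an assumption for this sketch).
    transformer = prepare_model_for_kbit_training(
        transformer, use_gradient_checkpointing=False
    )

    # Step 3: from here on, this is regular LoRA setup.
    lora_config = LoraConfig(
        r=rank,
        lora_alpha=rank,
        init_lora_weights="gaussian",
        target_modules=LORA_TARGET_MODULES,
    )
    transformer.add_adapter(lora_config)
    return transformer
```

Only the adapter weights are trainable after these steps; the quantized base weights stay frozen.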

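Refer to this example script for a fuller example.

Using single-file loading with the Stable Diffusion 3.5 Transformer

You can load the Stable Diffusion 3.5 Transformer model from the original checkpoint files published by Stability AI with the from_single_file method: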

import torch
from diffusers import SD3Transformer2DModel, StableDiffusion3Pipeline

transformer = SD3Transformer2DModel.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/sd3.5_large.safetensors",
    torch_dtype=torch.bfloat16,
)
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
image = pipe("a cat holding a sign that says hello world").images[0]
image.save("sd35.png")

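Important links

Stable Diffusion 3.5 Large collection on the Hub
Official Diffusers documentation for Stable Diffusion 3.5
Colab Notebook for running inference with quantization
Training LoRAs
Stable Diffusion 3 paper
Stable Diffusion 3 blog post

Acknowledgements: Thanks to Daniel Frank for providing the background photo used in the thumbnail of this blog post. Thanks to Pedro Cuenca and Tom Aarsen for reviewing the draft of this post.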
