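Latent Consistency Distillation

Latent Consistency Models (LCMs) can generate high-quality images in just a few steps, a big leap forward given that many pipelines need at least 25 steps. An LCM is produced by applying the latent consistency distillation method to any Stable Diffusion model. The method applies *one-stage guided distillation* in the latent space and adds a *skipping-step* method that consistently skips timesteps to accelerate distillation (see sections 4.1, 4.2, and 4.3 of the paper for details).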

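If you're training on a GPU with limited vRAM, try enabling gradient_checkpointing, gradient_accumulation_steps, and mixed_precision to reduce memory usage and speed up training. You can reduce memory usage further by enabling memory-efficient attention with xFormers and using the 8-bit optimizer from bitsandbytes.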

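This guide explores the train_lcm_distill_sd_wds.py script to help you become more familiar with it and show how you can adapt it to your own use case.

Before running the script, make sure you install the library from source: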

git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .

Then navigate to the example folder containing the training script and install the dependencies required by the script you're using:

cd examples/consistency_distillation
pip install -r requirements.txt

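🤗 Accelerate is a library that helps you train on multiple GPUs/TPUs or with mixed precision. It automatically configures your training setup based on your hardware and environment. See the 🤗 Accelerate Quick tour to learn more.

Initialize an 🤗 Accelerate environment (try enabling torch.compile to significantly speed up training):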

accelerate config

To set up a default 🤗 Accelerate environment without choosing any configurations:

accelerate config default

Or, if your environment doesn't support an interactive shell (a notebook, for example), you can use:

from accelerate.utils import write_basic_config

write_basic_config()

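Lastly, if you want to train a model on your own dataset, see the Create a dataset for training guide to learn how to create a dataset that works with the training script.

Script parameters

The following sections highlight the parts of the training script that matter for understanding how to modify it, but they don't cover every aspect of the script in detail. If you're interested in learning more, feel free to read through the script and let us know if you have any questions or concerns.

The training script provides many parameters to help you customize your training run. All of the parameters and their descriptions are found in the parse_args() function. This function provides a default value for each parameter, such as the training batch size and learning rate, but you can also set your own values in the training command.

For example, to speed up training with mixed precision using the fp16 format, add the --mixed_precision parameter to the training command: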

accelerate launch train_lcm_distill_sd_wds.py \
  --mixed_precision="fp16"
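
To make the default-plus-override behavior concrete, here is a minimal sketch of an argument parser in the style of parse_args(). It is a hypothetical, trimmed-down stand-in, not the script's actual function, and the default values shown are assumptions chosen for illustration.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical, trimmed-down stand-in for the script's parse_args()."""
    parser = argparse.ArgumentParser(description="LCM distillation (sketch)")
    # Defaults below are illustrative assumptions, not the script's values.
    parser.add_argument("--mixed_precision", type=str, default=None,
                        choices=["no", "fp16", "bf16"])
    parser.add_argument("--train_batch_size", type=int, default=16)
    parser.add_argument("--learning_rate", type=float, default=1e-4)
    return parser.parse_args(argv)


# Passing a flag overrides its default; everything else keeps the default.
args = parse_args(["--mixed_precision", "fp16"])
print(args.mixed_precision, args.train_batch_size, args.learning_rate)
```

Any parameter left off the command line keeps the value declared in the parser, which is why the launch command only needs to list the settings you want to change.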

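Most of the parameters are identical to those in the Text-to-image training guide, so this guide focuses on the parameters that are specific to latent consistency distillation.

--pretrained_teacher_model: the path to a pretrained latent diffusion model to use as the teacher model
--pretrained_vae_model_name_or_path: the path to a pretrained VAE; the SDXL VAE is known to suffer from numerical instability, so this parameter lets you specify an alternative VAE (for example, the VAE by madebyollin, which works in fp16)
--w_min and --w_max: the minimum and maximum guidance scale values for guidance scale sampling
--num_ddim_timesteps: the number of timesteps for DDIM sampling
--loss_type: the type of loss (L2 or Huber) used to compute the latent consistency distillation loss; Huber loss is generally preferred because it is more robust to outliers
--huber_c: the Huber loss parameter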
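
The two sampling-related parameters above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions: guidance scales are drawn uniformly from [w_min, w_max], and the DDIM timesteps are an evenly spaced subset of the 1000 training timesteps. The function names and the example values (5.0, 15.0, 50) are hypothetical and not taken from this guide.

```python
import random


def sample_guidance_scales(batch_size, w_min, w_max, rng=random):
    """Draw one guidance scale per example, uniformly from [w_min, w_max]."""
    return [w_min + (w_max - w_min) * rng.random() for _ in range(batch_size)]


def ddim_timestep_grid(num_train_timesteps, num_ddim_timesteps):
    """Evenly spaced subset of the training timesteps, in ascending order."""
    step_ratio = num_train_timesteps // num_ddim_timesteps
    return [k * step_ratio - 1 for k in range(1, num_ddim_timesteps + 1)]


scales = sample_guidance_scales(batch_size=4, w_min=5.0, w_max=15.0)
grid = ddim_timestep_grid(num_train_timesteps=1000, num_ddim_timesteps=50)
print(grid[:3], grid[-1])  # [19, 39, 59] 999
```

With 50 DDIM timesteps, neighboring points on the grid are 20 training timesteps apart, which is the gap the skipping-step method jumps over in a single distillation step.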

Training script

The training script starts by creating a dataset class, Text2ImageDataset, that preprocesses the images and creates the training dataset.

def transform(example):
    image = example["image"]
    image = TF.resize(image, resolution, interpolation=transforms.InterpolationMode.BILINEAR)

    c_top, c_left, _, _ = transforms.RandomCrop.get_params(image, output_size=(resolution, resolution))
    image = TF.crop(image, c_top, c_left, resolution, resolution)
    image = TF.to_tensor(image)
    image = TF.normalize(image, [0.5], [0.5])

    example["image"] = image
    return example
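
The geometry of this transform can be followed without any image library. The sketch below assumes that resizing with a single integer scales the shorter side to `resolution` while keeping the aspect ratio, and that the random crop picks a top-left corner uniformly among the valid positions. All function names here are hypothetical helpers written for illustration.

```python
import random


def resized_size(width, height, resolution):
    """Size after scaling the shorter side to `resolution`, keeping aspect ratio."""
    if width <= height:
        return resolution, int(resolution * height / width)
    return int(resolution * width / height), resolution


def random_crop_offsets(width, height, resolution, rng=random):
    """Top-left corner (top, left) of a random resolution x resolution crop."""
    top = rng.randint(0, height - resolution)
    left = rng.randint(0, width - resolution)
    return top, left


def normalize(value, mean=0.5, std=0.5):
    """Map a pixel value from [0, 1] to [-1, 1], as normalize([0.5], [0.5]) does."""
    return (value - mean) / std


w, h = resized_size(1024, 768, 512)
top, left = random_crop_offsets(w, h, 512)
print((w, h), normalize(0.0), normalize(1.0))  # (682, 512) -1.0 1.0
```

For a landscape image the crop can only slide horizontally, because the resized height already equals the target resolution.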

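To improve performance when reading and writing large datasets stored in the cloud, the script uses the WebDataset format to build a preprocessing pipeline that applies the transforms and creates the dataset and dataloader for training. Images are processed and fed to the training loop without downloading the full dataset first.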

processing_pipeline = [
    wds.decode("pil", handler=wds.ignore_and_continue),
    wds.rename(image="jpg;png;jpeg;webp", text="text;txt;caption", handler=wds.warn_and_continue),
    wds.map(filter_keys({"image", "text"})),
    wds.map(transform),
    wds.to_tuple("image", "text"),
]
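
The pipeline calls a filter_keys helper that is not shown in the excerpt. A plausible definition, assuming it simply keeps the whitelisted keys of each sample dictionary, might look like the sketch below; the sample contents are made up for illustration.

```python
def filter_keys(key_set):
    """Return a function that keeps only the entries whose key is in key_set."""
    def _keep(sample):
        return {k: v for k, v in sample.items() if k in key_set}
    return _keep


# A made-up sample with extra metadata that training does not need.
sample = {"__key__": "000001", "image": "<PIL image>", "text": "a cat", "json": {}}
print(filter_keys({"image", "text"})(sample))
# {'image': '<PIL image>', 'text': 'a cat'}
```

Dropping the unused keys before the transform step keeps each sample small by the time it is converted to an (image, text) tuple.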

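In the main() function, all the necessary components are loaded: the noise scheduler, tokenizer, text encoder, and VAE. The teacher UNet is also loaded here, and a student UNet is then created from it. The student UNet is the model that the optimizer updates during training.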

teacher_unet = UNet2DConditionModel.from_pretrained(
    args.pretrained_teacher_model, subfolder="unet", revision=args.teacher_revision
)

unet = UNet2DConditionModel(**teacher_unet.config)
unet.load_state_dict(teacher_unet.state_dict(), strict=False)
unet.train()

Now you can create the optimizer that updates the UNet parameters:

optimizer = optimizer_class(
    unet.parameters(),
    lr=args.learning_rate,
    betas=(args.adam_beta1, args.adam_beta2),
    weight_decay=args.adam_weight_decay,
    eps=args.adam_epsilon,
)

Create the dataset:

dataset = Text2ImageDataset(
    train_shards_path_or_url=args.train_shards_path_or_url,
    num_train_examples=args.max_train_samples,
    per_gpu_batch_size=args.train_batch_size,
    global_batch_size=args.train_batch_size * accelerator.num_processes,
    num_workers=args.dataloader_num_workers,
    resolution=args.resolution,
    shuffle_buffer_size=1000,
    pin_memory=True,
    persistent_workers=True,
)
train_dataloader = dataset.train_dataloader
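
The batch-size arguments passed above interact in a way that is easy to check by hand. The sketch below shows one possible bookkeeping scheme, assuming the dataset rounds the number of batches up so that every dataloader worker handles the same number of batches. The rounding rule and the function name are assumptions for illustration; the actual Text2ImageDataset may compute these values differently.

```python
import math


def dataset_sizes(num_train_examples, per_gpu_batch_size, num_processes, num_workers):
    """Return (global_batch_size, num_batches, num_samples) under the assumed rounding."""
    global_batch_size = per_gpu_batch_size * num_processes
    # Round up so each worker is assigned the same number of batches.
    num_worker_batches = math.ceil(num_train_examples / (global_batch_size * num_workers))
    num_batches = num_worker_batches * num_workers
    num_samples = num_batches * global_batch_size
    return global_batch_size, num_batches, num_samples


print(dataset_sizes(4_000_000, per_gpu_batch_size=12, num_processes=1, num_workers=8))
# (12, 333336, 4000032)
```

Under this scheme the nominal epoch is slightly longer than the requested number of examples, because the sample count is rounded up to a whole number of batches per worker.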

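Next, you're ready to set up the training loop and implement the latent consistency distillation method (see Algorithm 1 in the paper for details). This section of the script adds noise to the latents, samples a guidance scale and creates its embedding, and predicts the original image from the noise.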

pred_x_0 = predicted_origin(
    noise_pred,
    start_timesteps,
    noisy_model_input,
    noise_scheduler.config.prediction_type,
    alpha_schedule,
    sigma_schedule,
)

model_pred = c_skip_start * noisy_model_input + c_out_start * pred_x_0
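
The two lines above can be unpacked with scalars. The sketch assumes an epsilon-prediction model, where a noisy sample is x_t = alpha_t * x_0 + sigma_t * eps, so x_0 is recovered by inverting that relation. The boundary scalings follow the form commonly used for consistency models; the constants sigma_data=0.5 and timestep_scaling=10.0 and both function names are assumptions, not values confirmed by this guide.

```python
import math


def predicted_origin_epsilon(noise_pred, noisy_input, alpha_t, sigma_t):
    """Invert x_t = alpha_t * x_0 + sigma_t * eps to recover x_0."""
    return (noisy_input - sigma_t * noise_pred) / alpha_t


def boundary_scalings(timestep, sigma_data=0.5, timestep_scaling=10.0):
    """Scalings chosen so that the model is the identity at timestep 0."""
    scaled_t = timestep_scaling * timestep
    c_skip = sigma_data**2 / (scaled_t**2 + sigma_data**2)
    c_out = scaled_t / math.sqrt(scaled_t**2 + sigma_data**2)
    return c_skip, c_out


# Build a noisy value from a known clean value and noise, then recover it.
x0, eps = 0.8, -0.3
alpha_t, sigma_t = 0.6, 0.8
x_t = alpha_t * x0 + sigma_t * eps
recovered = predicted_origin_epsilon(eps, x_t, alpha_t, sigma_t)

c_skip, c_out = boundary_scalings(timestep=500)
model_pred = c_skip * x_t + c_out * recovered
print(recovered, boundary_scalings(0))
```

At timestep 0 the scalings reduce to c_skip = 1 and c_out = 0, so the prediction is just the input. At large timesteps c_skip shrinks toward 0 and c_out approaches 1, so the prediction is dominated by the estimated original sample.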

It then gets the teacher model predictions and the LCM predictions, calculates the loss, and backpropagates it to the LCM.

if args.loss_type == "l2":
    loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")
elif args.loss_type == "huber":
    loss = torch.mean(
        torch.sqrt((model_pred.float() - target.float()) ** 2 + args.huber_c**2) - args.huber_c
    )
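
The "huber" branch above has the form sqrt(d^2 + c^2) - c, which is often called the pseudo-Huber loss. It behaves quadratically for residuals much smaller than c and grows roughly linearly for large ones. The sketch below compares it with L2 on made-up residuals to show why it reacts less strongly to outliers; the residual values and the choice of huber_c=0.001 are assumptions for illustration.

```python
import math


def l2_loss(residuals):
    """Mean squared error over a list of residuals."""
    return sum(r**2 for r in residuals) / len(residuals)


def pseudo_huber_loss(residuals, huber_c):
    """Mean of sqrt(r^2 + c^2) - c, the same form as the script's huber branch."""
    return sum(math.sqrt(r**2 + huber_c**2) - huber_c for r in residuals) / len(residuals)


clean = [0.1, -0.2, 0.05, 0.0]
with_outlier = clean + [10.0]  # one made-up outlier

# How much does a single outlier inflate each loss?
l2_ratio = l2_loss(with_outlier) / l2_loss(clean)
huber_ratio = pseudo_huber_loss(with_outlier, 0.001) / pseudo_huber_loss(clean, 0.001)
print(f"L2 grows {l2_ratio:.0f}x, pseudo-Huber grows {huber_ratio:.1f}x")
```

The outlier contributes its squared value to the L2 loss but only about its absolute value to the pseudo-Huber loss, so a few bad samples pull the gradient far less.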

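If you want to learn more about how the training loop works, see the Understanding pipelines, models and schedulers tutorial, which breaks down the basic pattern of the denoising process.

Launch the script

Now you're ready to launch the training script and start distilling!

In this guide, you'll use --train_shards_path_or_url to specify the path to the Conceptual Captions 12M dataset stored on the Hub. Set the MODEL_DIR environment variable to the name of the teacher model and OUTPUT_DIR to where you want to save the model.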

export MODEL_DIR="stable-diffusion-v1-5/stable-diffusion-v1-5"
export OUTPUT_DIR="path/to/saved/model"

accelerate launch train_lcm_distill_sd_wds.py \
    --pretrained_teacher_model=$MODEL_DIR \
    --output_dir=$OUTPUT_DIR \
    --mixed_precision=fp16 \
    --resolution=512 \
    --learning_rate=1e-6 --loss_type="huber" --ema_decay=0.95 --adam_weight_decay=0.0 \
    --max_train_steps=1000 \
    --max_train_samples=4000000 \
    --dataloader_num_workers=8 \
    --train_shards_path_or_url="pipe:curl -L -s https://huggingface.co/datasets/laion/conceptual-captions-12m-webdataset/resolve/main/data/{00000..01099}.tar?download=true" \
    --validation_steps=200 \
    --checkpointing_steps=200 --checkpoints_total_limit=10 \
    --train_batch_size=12 \
    --gradient_checkpointing --enable_xformers_memory_efficient_attention \
    --gradient_accumulation_steps=1 \
    --use_8bit_adam \
    --resume_from_checkpoint=latest \
    --report_to=wandb \
    --seed=453645634 \
    --push_to_hub
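
The {00000..01099} range in --train_shards_path_or_url is a brace pattern that stands for one URL per shard. The sketch below is a hand-written stand-in that shows what the pattern expands to; in practice the expansion is handled by the data-loading code, and the expand_shards helper is hypothetical.

```python
def expand_shards(template, first, last, width=5):
    """Fill a zero-padded shard index into the template for each shard in range."""
    return [template.format(index=str(i).zfill(width)) for i in range(first, last + 1)]


template = (
    "https://huggingface.co/datasets/laion/conceptual-captions-12m-webdataset"
    "/resolve/main/data/{index}.tar?download=true"
)
shards = expand_shards(template, 0, 1099)
print(len(shards))  # 1100
print(shards[0])
```

To experiment on a smaller slice of the data, narrowing the range in the pattern reduces the number of shards that are streamed.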

Once training is complete, you can use your new LCM for inference.

from diffusers import UNet2DConditionModel, DiffusionPipeline, LCMScheduler
import torch

unet = UNet2DConditionModel.from_pretrained("your-username/your-model", torch_dtype=torch.float16, variant="fp16")
pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", unet=unet, torch_dtype=torch.float16, variant="fp16")

pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.to("cuda")

prompt = "sushi rolls in the form of panda heads, sushi platter"

image = pipeline(prompt, num_inference_steps=4, guidance_scale=1.0).images[0]

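LoRA

LoRA is a training technique that significantly reduces the number of trainable parameters. As a result, training is faster and the resulting weights are easier to store because they are much smaller (~100MB). Use the train_lcm_distill_lora_sd_wds.py or train_lcm_distill_lora_sdxl_wds.py script to train with LoRA.

The LoRA training script is discussed in more detail in the LoRA training guide.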

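Stable Diffusion XL

Stable Diffusion XL (SDXL) is a powerful text-to-image model that generates high-resolution images and adds a second text encoder to its architecture. Use the train_lcm_distill_sdxl_wds.py script to train an SDXL model.

The SDXL training script is discussed in more detail in the SDXL training guide.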

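Next steps

Congratulations on distilling an LCM! To learn more about LCMs, the following may be helpful:

Learn how to use LCMs for inference, including text-to-image, image-to-image, and use with LoRA checkpoints.
Read the SDXL in 4 steps with Latent Consistency LoRAs blog post to learn more about SDXL LCM-LoRAs for very fast inference, quality comparisons, benchmarks, and more.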
