Diffusers

Unconditional image generation

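Unconditional image generation models are not conditioned on text or images during training. They only generate images that resemble the distribution of their training data.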

This guide explores the train_unconditional.py training script to help you become familiar with it and to show how you can adapt it to your own use case.

Before running the script, make sure you install the library from source:

git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .

Then navigate to the example folder containing the training script and install the required dependencies:

cd examples/unconditional_image_generation
pip install -r requirements.txt

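🤗 Accelerate is a library that helps you train on multiple GPUs/TPUs or with mixed precision. It automatically configures your training setup based on your hardware and environment. See the 🤗 Accelerate Quick tour to learn more.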

Initialize a 🤗 Accelerate environment:

accelerate config

To set up a default 🤗 Accelerate environment without choosing any configurations:

accelerate config default

Or, if your environment doesn't support an interactive shell (a notebook, for example), you can use:

from accelerate.utils import write_basic_config

write_basic_config()

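Lastly, if you want to train a model on your own dataset, see the Create a dataset for training guide to learn how to create a dataset that works with the training script.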

Script parameters

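The following sections highlight the parts of the training script that matter most for understanding how to modify it, but they don't cover every aspect of the script in detail. If you're interested in learning more, feel free to read through the script and let us know if you have any questions or concerns.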

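The training script provides many parameters to help you customize your training run. All of the parameters and their descriptions are found in the parse_args() function. It provides default values for each parameter, such as the training batch size and learning rate, but you can also set your own values in the training command if you'd like.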

For example, to speed up training with mixed precision using the bf16 format, add the --mixed_precision parameter to the training command:

accelerate launch train_unconditional.py \
  --mixed_precision="bf16"
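As a rough illustration of how defaults and command-line overrides interact, the sketch below builds a small argparse parser. The flag names mirror parameters mentioned in this guide, but the default values and the helper name parse_args_sketch are assumptions made for illustration, not the script's actual definitions.

```python
import argparse


def parse_args_sketch(argv=None):
    """Hypothetical, trimmed-down stand-in for the script's parse_args()."""
    parser = argparse.ArgumentParser(description="Sketch of a training-script CLI.")
    # Default values below are illustrative assumptions, not the script's real defaults.
    parser.add_argument("--train_batch_size", type=int, default=16)
    parser.add_argument("--learning_rate", type=float, default=1e-4)
    parser.add_argument("--mixed_precision", type=str, default="no",
                        choices=["no", "fp16", "bf16"])
    return parser.parse_args(argv)


# No flags given: every parameter falls back to its default.
defaults = parse_args_sketch([])
# Flags given: only the named parameters are overridden.
custom = parse_args_sketch(["--mixed_precision", "bf16", "--learning_rate", "3e-4"])
```

Parameters that are not named on the command line keep their defaults, which is why a training command only needs to list the values you want to change.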

Some basic and important parameters to specify include:

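--dataset_name: the name of the dataset on the Hub or a local path to the dataset to train on
--output_dir: where to save the trained model
--push_to_hub: whether to push the trained model to the Hub
--checkpointing_steps: how often to save a checkpoint as the model trains; this is useful if training is interrupted, because you can resume from that checkpoint by adding --resume_from_checkpoint to your training command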
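To make the checkpointing behaviour concrete, here is a minimal sketch of how a script might pick the most recent checkpoint to resume from. It assumes checkpoints are saved as directories named checkpoint-<step> inside the output directory; that naming scheme and the helper latest_checkpoint are assumptions for illustration, so check the script itself for the exact logic.

```python
import os
import tempfile


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> directory name with the highest step, or None.

    Hypothetical helper; assumes the checkpoint-<step> naming scheme.
    """
    candidates = []
    for name in os.listdir(output_dir):
        prefix, _, step = name.partition("-")
        if prefix == "checkpoint" and step.isdigit():
            candidates.append((int(step), name))
    # Compare by numeric step so that checkpoint-1000 ranks above checkpoint-500.
    return max(candidates)[1] if candidates else None


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as output_dir:
    for step in (500, 1000, 1500):
        os.makedirs(os.path.join(output_dir, f"checkpoint-{step}"))
    resume_from = latest_checkpoint(output_dir)
```

Sorting by the numeric step rather than by name matters here, since a plain string sort would place checkpoint-500 after checkpoint-1500.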

Bring your dataset, and let the training script handle everything else!

Training script

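The code for preprocessing the dataset and the training loop is found in the main() function. If you need to adapt the training script, this is where you'll make your changes.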

The train_unconditional script initializes a UNet2DModel if you don't provide a model configuration. You can configure the UNet here if you'd like:

model = UNet2DModel(
    sample_size=args.resolution,
    in_channels=3,
    out_channels=3,
    layers_per_block=2,
    block_out_channels=(128, 128, 256, 256, 512, 512),
    down_block_types=(
        "DownBlock2D",
        "DownBlock2D",
        "DownBlock2D",
        "DownBlock2D",
        "AttnDownBlock2D",
        "DownBlock2D",
    ),
    up_block_types=(
        "UpBlock2D",
        "AttnUpBlock2D",
        "UpBlock2D",
        "UpBlock2D",
        "UpBlock2D",
        "UpBlock2D",
    ),
)
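To get a feel for what this configuration implies, the sketch below traces the channel count and spatial size after each down block. It assumes that every down block except the last one halves the resolution and that the input resolution is 64; both are assumptions for illustration, and the helper down_block_shapes is hypothetical.

```python
def down_block_shapes(sample_size, block_out_channels):
    """Hypothetical helper: (channels, height/width) after each down block.

    Assumes every block except the final one downsamples by a factor of 2.
    """
    shapes = []
    size = sample_size
    last = len(block_out_channels) - 1
    for i, channels in enumerate(block_out_channels):
        if i != last:
            size //= 2
        shapes.append((channels, size))
    return shapes


shapes = down_block_shapes(64, (128, 128, 256, 256, 512, 512))
for channels, size in shapes:
    print(f"{channels} channels at {size}x{size}")
```

Under these assumptions, six blocks mean five halvings, so the resolution should be divisible by 32 for the feature maps to stay well-formed.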

Next, the script initializes a scheduler and an optimizer:

# Initialize the scheduler
accepts_prediction_type = "prediction_type" in set(inspect.signature(DDPMScheduler.__init__).parameters.keys())
if accepts_prediction_type:
    noise_scheduler = DDPMScheduler(
        num_train_timesteps=args.ddpm_num_steps,
        beta_schedule=args.ddpm_beta_schedule,
        prediction_type=args.prediction_type,
    )
else:
    noise_scheduler = DDPMScheduler(num_train_timesteps=args.ddpm_num_steps, beta_schedule=args.ddpm_beta_schedule)

# Initialize the optimizer
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=args.learning_rate,
    betas=(args.adam_beta1, args.adam_beta2),
    weight_decay=args.adam_weight_decay,
    eps=args.adam_epsilon,
)
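The accepts_prediction_type check in the scheduler setup above is a general feature-detection pattern: inspect a constructor's signature and pass an optional argument only if the constructor declares it. The sketch below reproduces that pattern with two hypothetical stand-in classes (OldScheduler and NewScheduler are not diffusers classes) so it runs without any dependencies.

```python
import inspect


class OldScheduler:
    """Hypothetical scheduler whose constructor predates prediction_type."""

    def __init__(self, num_train_timesteps=1000):
        self.num_train_timesteps = num_train_timesteps


class NewScheduler:
    """Hypothetical scheduler whose constructor accepts prediction_type."""

    def __init__(self, num_train_timesteps=1000, prediction_type="epsilon"):
        self.num_train_timesteps = num_train_timesteps
        self.prediction_type = prediction_type


def build_scheduler(cls, num_train_timesteps, prediction_type):
    """Pass prediction_type only when the constructor declares it."""
    kwargs = {"num_train_timesteps": num_train_timesteps}
    if "prediction_type" in inspect.signature(cls.__init__).parameters:
        kwargs["prediction_type"] = prediction_type
    return cls(**kwargs)


old = build_scheduler(OldScheduler, 1000, "sample")
new = build_scheduler(NewScheduler, 1000, "sample")
```

This keeps a single code path working across library versions instead of failing with a TypeError on an unexpected keyword argument.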

Then it loads a dataset, and you can specify how to preprocess it:

dataset = load_dataset("imagefolder", data_dir=args.train_data_dir, cache_dir=args.cache_dir, split="train")

augmentations = transforms.Compose(
    [
        transforms.Resize(args.resolution, interpolation=transforms.InterpolationMode.BILINEAR),
        transforms.CenterCrop(args.resolution) if args.center_crop else transforms.RandomCrop(args.resolution),
        transforms.RandomHorizontalFlip() if args.random_flip else transforms.Lambda(lambda x: x),
        transforms.ToTensor(),
        transforms.Normalize([0.5], [0.5]),
    ]
)
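The last two transforms are worth a closer look: ToTensor() scales 8-bit pixel values to [0, 1], and Normalize([0.5], [0.5]) then computes (x - 0.5) / 0.5, which maps them to [-1, 1]. The dependency-free sketch below reproduces that arithmetic for a single pixel value; the helper names are hypothetical, and the inverse shows one way to turn values in [-1, 1] back into 8-bit pixels.

```python
def normalize_pixel(value_uint8, mean=0.5, std=0.5):
    """Mimic ToTensor() followed by Normalize([0.5], [0.5]) for one pixel value."""
    scaled = value_uint8 / 255.0      # ToTensor(): [0, 255] -> [0.0, 1.0]
    return (scaled - mean) / std      # Normalize(): [0.0, 1.0] -> [-1.0, 1.0]


def denormalize_pixel(value, mean=0.5, std=0.5):
    """Invert the mapping above and round back to an 8-bit pixel value."""
    scaled = value * std + mean
    clamped = min(max(scaled, 0.0), 1.0)
    return round(clamped * 255)


black, white = normalize_pixel(0), normalize_pixel(255)
roundtrip = [denormalize_pixel(normalize_pixel(v)) for v in (0, 64, 128, 255)]
```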

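Finally, the training loop handles everything else, such as adding noise to the images, predicting the noise residual, calculating the loss, saving checkpoints at specified steps, and saving and pushing the model to the Hub. If you'd like to learn more about how the training loop works, check out the Understanding pipelines, models and schedulers tutorial, which breaks down the basic pattern of the denoising process.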
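The noise-adding step can be illustrated with the DDPM forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise, where alpha_bar_t is the cumulative product of (1 - beta). The sketch below is a plain-Python illustration on a short list of numbers, not the script's code: it assumes a linear beta schedule from 1e-4 to 0.02 over 1000 steps, and all function names are hypothetical. In the script itself, this work is done on tensors by the scheduler.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed schedule for this sketch)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alphas_cumprod(betas):
    """Running product of (1 - beta_t), often written alpha_bar_t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def add_noise(x0, noise, t, alpha_bar):
    """x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise."""
    signal = math.sqrt(alpha_bar[t])
    sigma = math.sqrt(1.0 - alpha_bar[t])
    return [signal * x + sigma * n for x, n in zip(x0, noise)]


alpha_bar = alphas_cumprod(linear_betas())
rng = random.Random(0)
x0 = [0.5, -0.25, 1.0]                      # "pixels" already scaled to [-1, 1]
noise = [rng.gauss(0.0, 1.0) for _ in x0]   # the target when the model predicts the noise
x_early = add_noise(x0, noise, 0, alpha_bar)     # stays close to x0
x_late = add_noise(x0, noise, 999, alpha_bar)    # close to pure noise
```

At small timesteps the sample is dominated by the original signal, and at the final timestep it is dominated by the noise, which is what lets the model learn to denoise across the whole range.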

Launch the script

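Once you've made all your changes, or if you're happy with the default configuration, you're ready to launch the training script! 🚀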

A full training run takes 2 hours on 4 V100 GPUs.

The script can be launched on a single GPU or on multiple GPUs.

The training script creates and saves a checkpoint file in your repository. Now you can load your trained model and use it for inference:

from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained("anton-l/ddpm-butterflies-128").to("cuda")
image = pipeline().images[0]