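Custom Diffusion

Custom Diffusion is a training technique for personalizing image generation models. Like Textual Inversion, DreamBooth, and LoRA, Custom Diffusion only requires a few (about 4-5) example images. The technique works by training only the weights in the cross-attention layers, and it uses a special word to represent the newly learned concept. Custom Diffusion is unique in that it can also learn multiple concepts at the same time.

If you're training on a GPU with limited vRAM, try enabling xFormers with --enable_xformers_memory_efficient_attention for faster training with lower vRAM requirements (16GB). To save even more memory, add --set_grads_to_none to the training arguments to set the gradients to None instead of zero (this option can cause some issues, so if you run into any, try removing this argument).

This guide explores the train_custom_diffusion.py script to help you become more familiar with it and show how you can adapt it for your own use case.

Before running the script, make sure you install the library from source: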

git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .

Navigate to the example folder containing the training script and install the required dependencies:

cd examples/custom_diffusion
pip install -r requirements.txt
pip install clip-retrieval

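🤗 Accelerate is a library that helps you train on multiple GPUs/TPUs or with mixed precision. It automatically configures your training setup based on your hardware and environment. Take a look at the 🤗 Accelerate Quick tour to learn more.

Initialize an 🤗 Accelerate environment: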

accelerate config

To set up a default 🤗 Accelerate environment without choosing any configurations:

accelerate config default

Or, if your environment doesn't support an interactive shell (a notebook, for example), you can use:

from accelerate.utils import write_basic_config

write_basic_config()

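Lastly, if you want to train a model on your own dataset, take a look at the Create a dataset for training guide to learn how to create a dataset that works with the training script.

The following sections highlight the parts of the training script that matter most for understanding how to modify it, but they don't cover every aspect of the script in detail. If you're interested in learning more, feel free to read through the script and let us know if you have any questions or concerns.

Script parameters

The training script exposes all the parameters you need to customize your training run. They are defined in the parse_args() function. Each parameter comes with a default value, but you can also set your own values in the training command.

For example, to change the resolution of the input images: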

accelerate launch train_custom_diffusion.py \
  --resolution=256

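Many of the basic parameters are described in the DreamBooth training guide, so this guide focuses on the parameters unique to Custom Diffusion:

--freeze_model: selects which cross-attention parameters are trained while the rest stay frozen; the default, crossattn_kv, trains only the key and value parameters, and crossattn trains all the parameters in the cross-attention layer
--concepts_list: to learn multiple concepts, provide the path to a JSON file containing the concepts
--modifier_token: a special word used to represent the learned concept
--initializer_token: a word whose embedding is used to initialize the modifier_token embedding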
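
As an illustration of --concepts_list, the sketch below writes a two-concept JSON file from Python. The key names (instance_prompt, class_prompt, instance_data_dir, class_data_dir) are an assumption based on the example file that ships alongside the script, and the prompts, paths, and the file name concept_list.json are placeholders. Check them against the script before relying on them.

```python
import json
from pathlib import Path

# Hypothetical concepts, one entry per concept to learn.
# The key names are assumed to match what the training script reads.
concepts = [
    {
        "instance_prompt": "photo of a <new1> cat",
        "class_prompt": "cat",
        "instance_data_dir": "./data/cat",
        "class_data_dir": "./real_reg/samples_cat/",
    },
    {
        "instance_prompt": "photo of a <new2> dog",
        "class_prompt": "dog",
        "instance_data_dir": "./data/dog",
        "class_data_dir": "./real_reg/samples_dog/",
    },
]


def write_concepts_list(path, concepts):
    """Write the concepts to a JSON file and return the path as a string."""
    path = Path(path)
    path.write_text(json.dumps(concepts, indent=2), encoding="utf-8")
    return str(path)


concepts_file = write_concepts_list("concept_list.json", concepts)
print(f"Pass this to the script: --concepts_list={concepts_file}")
```

When several concepts are trained together, each one needs its own modifier token. To our understanding the script accepts them joined with a plus sign (for example "<new1>+<new2>"), but confirm this in parse_args().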

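Prior preservation loss

Prior preservation loss is a method that uses a model's own generated samples to help it learn how to generate more diverse images. Because these generated sample images belong to the same class as the images you provided, they help the model retain what it has already learned about the class and how to use that knowledge to make new compositions.

Many of the parameters for prior preservation loss are described in the DreamBooth training guide.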
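
The arithmetic behind a prior preservation term can be sketched in a few lines of plain Python. This is only an illustration of how a weighted class-image loss is added to the instance loss. The function names and toy numbers are made up for the example, and the actual script computes these losses on tensors of predicted noise rather than on Python lists.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(instance_pred, instance_target, class_pred, class_target,
               prior_loss_weight=1.0):
    """Instance loss plus the weighted prior preservation (class image) loss."""
    instance_loss = mse(instance_pred, instance_target)
    prior_loss = mse(class_pred, class_target)
    return instance_loss + prior_loss_weight * prior_loss


# Toy values: instance loss is 0.125, prior loss is 0.5.
loss = total_loss(
    instance_pred=[0.5, 1.0], instance_target=[0.0, 1.0],
    class_pred=[2.0, 2.0], class_target=[1.0, 2.0],
    prior_loss_weight=1.0,
)
print(loss)
```

Setting prior_loss_weight to 0 reduces the total to the instance loss alone, which is one way to see what the weight controls.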

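Regularization

Custom Diffusion trains on the target images together with a small set of real images to prevent overfitting. As you can imagine, overfitting happens easily when you train on only a few images! Download 200 real images with clip_retrieval. The class_prompt should be the same category as the target images. The downloaded images are stored in class_data_dir.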

python retrieve.py --class_prompt cat --class_data_dir real_reg/samples_cat --num_class_images 200

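To enable regularization, add the following parameters:

--with_prior_preservation: whether to use prior preservation loss
--prior_loss_weight: controls the influence of the prior preservation loss on the model
--real_prior: whether to use a small set of real images to prevent overfitting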

accelerate launch train_custom_diffusion.py \
  --with_prior_preservation \
  --prior_loss_weight=1.0 \
  --class_data_dir="./real_reg/samples_cat" \
  --class_prompt="cat" \
  --real_prior

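Training script

Much of the code in the Custom Diffusion training script is similar to the DreamBooth script. This guide focuses on the code that is specific to Custom Diffusion.

The Custom Diffusion training script has two dataset classes:

CustomDiffusionDataset: preprocesses the images, class images, and prompts for training
PromptDataset: prepares the prompts for generating class images

Next, the modifier_token is added to the tokenizer and converted to token ids, and the token embeddings are resized to account for the new modifier_token. The modifier_token embedding is then initialized with the embedding of the initializer_token. All parameters in the text encoder are frozen except for the token embeddings, since these are what the model learns to associate with the concept.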

params_to_freeze = itertools.chain(
    text_encoder.text_model.encoder.parameters(),
    text_encoder.text_model.final_layer_norm.parameters(),
    text_encoder.text_model.embeddings.position_embedding.parameters(),
)
freeze_params(params_to_freeze)

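Now you need to add the Custom Diffusion weights to the attention layers. This step is essential for getting the shape and size of the attention weights right, and for setting the appropriate number of attention processors in each UNet block.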

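# train_kv, train_q_out, and attention_class are set earlier in the script.
custom_diffusion_attn_procs = {}  # maps processor name -> attention processor
st = unet.state_dict()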
for name, _ in unet.attn_processors.items():
    cross_attention_dim = None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
    if name.startswith("mid_block"):
        hidden_size = unet.config.block_out_channels[-1]
    elif name.startswith("up_blocks"):
        block_id = int(name[len("up_blocks.")])
        hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
    elif name.startswith("down_blocks"):
        block_id = int(name[len("down_blocks.")])
        hidden_size = unet.config.block_out_channels[block_id]
    layer_name = name.split(".processor")[0]
    weights = {
        "to_k_custom_diffusion.weight": st[layer_name + ".to_k.weight"],
        "to_v_custom_diffusion.weight": st[layer_name + ".to_v.weight"],
    }
    if train_q_out:
        weights["to_q_custom_diffusion.weight"] = st[layer_name + ".to_q.weight"]
        weights["to_out_custom_diffusion.0.weight"] = st[layer_name + ".to_out.0.weight"]
        weights["to_out_custom_diffusion.0.bias"] = st[layer_name + ".to_out.0.bias"]
    if cross_attention_dim is not None:
        custom_diffusion_attn_procs[name] = attention_class(
            train_kv=train_kv,
            train_q_out=train_q_out,
            hidden_size=hidden_size,
            cross_attention_dim=cross_attention_dim,
        ).to(unet.device)
        custom_diffusion_attn_procs[name].load_state_dict(weights)
    else:
        custom_diffusion_attn_procs[name] = attention_class(
            train_kv=False,
            train_q_out=False,
            hidden_size=hidden_size,
            cross_attention_dim=cross_attention_dim,
        )
del st
unet.set_attn_processor(custom_diffusion_attn_procs)
custom_diffusion_layers = AttnProcsLayers(unet.attn_processors)
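
To make the name-based lookup in the loop easier to follow, here is a small standalone sketch of the same logic without a UNet. The helper names are hypothetical, and the channel widths (320, 640, 1280, 1280) and processor names are assumed values typical of a Stable Diffusion v1 UNet rather than values read from a real model.

```python
def hidden_size_for(name, block_out_channels):
    """Return the channel width of the UNet block that owns an attention processor."""
    if name.startswith("mid_block"):
        return block_out_channels[-1]
    if name.startswith("up_blocks"):
        # The up path mirrors the down path, so its widths are read in reverse.
        block_id = int(name[len("up_blocks.")])
        return list(reversed(block_out_channels))[block_id]
    if name.startswith("down_blocks"):
        block_id = int(name[len("down_blocks.")])
        return block_out_channels[block_id]
    raise ValueError(f"unexpected processor name: {name}")


def is_cross_attention(name):
    """attn1 processors are self-attention; everything else is cross-attention."""
    return not name.endswith("attn1.processor")


channels = (320, 640, 1280, 1280)  # assumed example configuration
names = [
    "down_blocks.0.attentions.0.transformer_blocks.0.attn1.processor",
    "down_blocks.1.attentions.0.transformer_blocks.0.attn2.processor",
    "mid_block.attentions.0.transformer_blocks.0.attn2.processor",
    "up_blocks.3.attentions.0.transformer_blocks.0.attn2.processor",
]
for name in names:
    print(name, hidden_size_for(name, channels), is_cross_attention(name))
```

Only the cross-attention processors receive trainable Custom Diffusion weights. The self-attention processors are created with training disabled, which matches the else branch of the loop above.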

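The optimizer is initialized to update the cross-attention layer parameters and, when a modifier_token is set, the text encoder's input embeddings as well.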

optimizer = optimizer_class(
    itertools.chain(text_encoder.get_input_embeddings().parameters(), custom_diffusion_layers.parameters())
    if args.modifier_token is not None
    else custom_diffusion_layers.parameters(),
    lr=args.learning_rate,
    betas=(args.adam_beta1, args.adam_beta2),
    weight_decay=args.adam_weight_decay,
    eps=args.adam_epsilon,
)

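In the training loop, it is important to update only the embeddings of the concept you're trying to learn. This means setting the gradients of all the other token embeddings to zero.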

if args.modifier_token is not None:
    if accelerator.num_processes > 1:
        grads_text_encoder = text_encoder.module.get_input_embeddings().weight.grad
    else:
        grads_text_encoder = text_encoder.get_input_embeddings().weight.grad
    # True for every token whose gradient should be zeroed (all but the modifier tokens).
    index_grads_to_zero = torch.arange(len(tokenizer)) != modifier_token_id[0]
    for token_id in modifier_token_id[1:]:
        index_grads_to_zero = index_grads_to_zero & (
            torch.arange(len(tokenizer)) != token_id
        )
    grads_text_encoder.data[index_grads_to_zero, :] = grads_text_encoder.data[
        index_grads_to_zero, :
    ].fill_(0)
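
The effect of this masking can be shown on a toy gradient table in plain Python. This is a sketch of the idea only: the function name, the four-token vocabulary, and the token ids are invented for the example, and the script itself performs the same operation on a torch tensor.

```python
def mask_embedding_grads(grads, modifier_token_ids):
    """Zero every gradient row except those of the modifier tokens (in place)."""
    keep = set(modifier_token_ids)
    for token_id, row in enumerate(grads):
        if token_id not in keep:
            for j in range(len(row)):
                row[j] = 0.0
    return grads


# Toy gradients for a 4-token vocabulary with 2-dimensional embeddings.
grads = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

# Pretend tokens 2 and 3 are the newly added modifier tokens.
mask_embedding_grads(grads, modifier_token_ids=[2, 3])
print(grads)
```

After masking, an optimizer step can only move the rows that belong to the modifier tokens, so the rest of the vocabulary keeps its pretrained embeddings.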

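Launch the script

Once you've made all your changes, or if you're happy with the default configuration, you're ready to launch the training script! 🚀

In this guide, you'll download and use these example cat images. You can also create and use your own dataset if you prefer (see the Create a dataset for training guide).

Set the environment variable MODEL_NAME to a model id on the Hub or a path to a local model, INSTANCE_DIR to the path where you just downloaded the cat images, and OUTPUT_DIR to where you want to save the model. You'll use <new1> as the special word that the newly learned embeddings are tied to. The script creates and saves model checkpoints and a pytorch_custom_diffusion_weights.bin file to your repository.

To monitor training progress with Weights and Biases, add the --report_to=wandb parameter to the training command and specify a validation prompt with --validation_prompt. This is useful for debugging and for saving intermediate results.

If you're training on human faces, the Custom Diffusion team has found the following parameters to work well:

--learning_rate=5e-6
--max_train_steps can be anywhere between 1000 and 2000
--freeze_model=crossattn
use at least 15-20 images for training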
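
As a rough sketch of how these settings fit together, the snippet below assembles a single-concept launch command as a Python list and prints it without running it. The flags --pretrained_model_name_or_path, --instance_data_dir, --output_dir, and --instance_prompt are assumed to follow the DreamBooth conventions, and the model id, paths, and prompts are placeholders. Verify every flag against parse_args() before use.

```python
import shlex


def build_launch_command(model_name, instance_dir, output_dir,
                         instance_prompt="photo of a <new1> cat",
                         modifier_token="<new1>", extra_args=()):
    """Assemble (but do not run) an accelerate launch command for the script."""
    cmd = [
        "accelerate", "launch", "train_custom_diffusion.py",
        f"--pretrained_model_name_or_path={model_name}",  # assumed flag name
        f"--instance_data_dir={instance_dir}",            # assumed flag name
        f"--output_dir={output_dir}",                     # assumed flag name
        f"--instance_prompt={instance_prompt}",           # assumed flag name
        f"--modifier_token={modifier_token}",
        "--resolution=512",
    ]
    cmd.extend(extra_args)
    return cmd


cmd = build_launch_command(
    model_name="CompVis/stable-diffusion-v1-4",  # placeholder model id
    instance_dir="./data/cat",                   # placeholder path
    output_dir="path-to-save-model",             # placeholder path
    extra_args=[
        "--report_to=wandb",
        "--validation_prompt=<new1> cat sitting in a bucket",
    ],
)
print(shlex.join(cmd))
```

Printing the command first makes it easy to review the quoting of prompts that contain spaces or angle brackets. To execute it, the list could be passed to subprocess.run(cmd, check=True).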

Once training is finished, you can use your new Custom Diffusion model for inference.
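
A hedged sketch of single-concept inference follows. It assumes that the output directory contains pytorch_custom_diffusion_weights.bin and a learned embedding saved as <new1>.bin (the embedding file name is an assumption), that the diffusers methods load_attn_procs and load_textual_inversion are available in your installed version, and that a CUDA GPU is present. The function name and sampling settings are illustrative only.

```python
def generate_image(model_id, weights_dir, prompt, modifier_token="<new1>"):
    """Load a base pipeline, attach Custom Diffusion weights, and return one image."""
    # Imported lazily so this file can be imported on machines without a GPU stack.
    import torch
    from diffusers import DiffusionPipeline

    pipeline = DiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to("cuda")
    # Cross-attention weights trained by the script.
    pipeline.unet.load_attn_procs(
        weights_dir, weight_name="pytorch_custom_diffusion_weights.bin"
    )
    # Learned embedding for the modifier token (file name is an assumption).
    pipeline.load_textual_inversion(
        weights_dir, weight_name=f"{modifier_token}.bin"
    )
    result = pipeline(prompt, num_inference_steps=100, guidance_scale=6.0, eta=1.0)
    return result.images[0]


# Example call (requires a GPU and trained weights):
# image = generate_image(
#     "CompVis/stable-diffusion-v1-4", "path-to-save-model",
#     "<new1> cat sitting in a bucket",
# )
# image.save("cat.png")
```

For a model trained on several concepts, the same pattern would presumably apply with one load_textual_inversion call per modifier token and a prompt that mentions each token.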

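Next steps

Congratulations on training a model with Custom Diffusion! 🎉 To learn more:

Read the Multi-Concept Customization of Text-to-Image Diffusion blog post for details on the Custom Diffusion team's experimental results.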
