CMStochasticIterativeScheduler

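Consistency Models by Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever introduced a multistep and one-step scheduler (Algorithm 1) that can generate high-quality samples in one step or a small number of steps.

The abstract from the paper is:

Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose **consistency models**, a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality. They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either by distilling pre-trained diffusion models, or as standalone generative models altogether. Through extensive experiments, we demonstrate that they outperform existing distillation techniques for diffusion models in one- and few-step sampling, achieving the new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for one-step generation. When trained in isolation, consistency models become a new family of generative models that can outperform existing one-step, non-adversarial generative models on standard benchmarks such as CIFAR-10, ImageNet 64x64 and LSUN 256x256.

The original codebase can be found at openai/consistency_models.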

CMStochasticIterativeScheduler

class diffusers.CMStochasticIterativeScheduler

( num_train_timesteps: int = 40, sigma_min: float = 0.002, sigma_max: float = 80.0, sigma_data: float = 0.5, s_noise: float = 1.0, rho: float = 7.0, clip_denoised: bool = True )

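Parameters

num_train_timesteps (int, defaults to 40) — The number of diffusion steps used to train the model.
sigma_min (float, defaults to 0.002) — Minimum noise magnitude in the sigma schedule. Defaults to 0.002 from the original implementation.
sigma_max (float, defaults to 80.0) — Maximum noise magnitude in the sigma schedule. Defaults to 80.0 from the original implementation.
sigma_data (float, defaults to 0.5) — The standard deviation of the data distribution from the EDM paper. Defaults to 0.5 from the original implementation.
s_noise (float, defaults to 1.0) — The amount of additional noise used to counteract loss of detail during sampling. A reasonable range is [1.000, 1.011]. Defaults to 1.0 from the original implementation.
rho (float, defaults to 7.0) — The parameter used to calculate the Karras sigma schedule from the EDM paper. Defaults to 7.0 from the original implementation.
clip_denoised (bool, defaults to True) — Whether to clip the denoised outputs to (-1, 1).
timesteps (List or np.ndarray or torch.Tensor, optional) — An explicit timestep schedule that can be optionally specified. The timesteps are expected to be in increasing order.

Multistep and one-step sampling for consistency models.

This model inherits from SchedulerMixin and ConfigMixin. Check the superclass documentation for the generic methods the library implements for all schedulers, such as loading and saving.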
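To make the roles of `sigma_min`, `sigma_max`, and `rho` concrete, the following is a minimal sketch of the Karras sigma schedule formula from the EDM paper. The helper name `karras_sigmas` is hypothetical and not part of the Diffusers API, and the evenly spaced ramp and the high-to-low ordering are assumptions made for illustration.

```python
def karras_sigmas(num_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Hypothetical helper: Karras-style noise levels from sigma_max down to sigma_min."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if num_steps == 1:
        # With a single step, only the highest noise level is used.
        return [sigma_max]

    min_inv_rho = sigma_min ** (1.0 / rho)
    max_inv_rho = sigma_max ** (1.0 / rho)

    sigmas = []
    for i in range(num_steps):
        ramp = i / (num_steps - 1)  # assumed: evenly spaced in [0, 1]
        # Interpolate linearly in sigma^(1/rho) space, then map back with ** rho.
        sigmas.append((max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho)
    return sigmas


if __name__ == "__main__":
    for sigma in karras_sigmas(5):
        print(f"{sigma:.4f}")
```

A larger `rho` concentrates more of the steps near `sigma_min`, which is why the schedule is not uniform in sigma.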

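get_scalings_for_boundary_condition

( sigma ) → tuple

Parameters

sigma (torch.Tensor) — The current sigma in the Karras sigma schedule.

tuple

A two-element tuple where `c_skip` (which weights the current sample) is the first element and `c_out` (which weights the consistency model output) is the second element.

Gets the scalings used in the consistency model parameterization (from Appendix C of the paper) to enforce the boundary condition.

`epsilon` in the equations for `c_skip` and `c_out` is set to `sigma_min`.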
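The boundary-condition scalings can be sketched in plain Python as follows. This follows the Appendix C parameterization with `epsilon = sigma_min`. The function name `boundary_scalings` is hypothetical, and it operates on floats rather than tensors for brevity.

```python
def boundary_scalings(sigma, sigma_min=0.002, sigma_data=0.5):
    """Hypothetical scalar version of the c_skip / c_out computation."""
    # c_skip weights the current (noisy) sample.
    c_skip = sigma_data**2 / ((sigma - sigma_min) ** 2 + sigma_data**2)
    # c_out weights the consistency model output.
    c_out = (sigma - sigma_min) * sigma_data / (sigma**2 + sigma_data**2) ** 0.5
    return c_skip, c_out


if __name__ == "__main__":
    # At sigma == sigma_min the parameterization reduces to the identity on the sample.
    print(boundary_scalings(0.002))
    print(boundary_scalings(80.0))
```

At `sigma = sigma_min` the pair is `(1, 0)`, so the denoised output equals the input sample, which is the boundary condition the parameterization is meant to enforce.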

scale_model_input

( sample: Tensor, timestep: typing.Union[float, torch.Tensor] ) → torch.Tensor

Parameters

sample (torch.Tensor) — The input sample.
timestep (float or torch.Tensor) — The current timestep in the diffusion chain.

torch.Tensor

A scaled input sample.

Scales the consistency model input by `(sigma**2 + sigma_data**2) ** 0.5`.
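As an illustration of the scaling factor, the sketch below assumes the factor is applied as a division, as in the EDM input preconditioning `c_in = 1 / sqrt(sigma**2 + sigma_data**2)`. That direction is an assumption, not something stated above. The helper `scale_input` is hypothetical and works on a plain list of floats instead of a tensor.

```python
import math


def scale_input(sample, sigma, sigma_data=0.5):
    """Hypothetical helper: divide each value by sqrt(sigma^2 + sigma_data^2)."""
    # Assumption: the factor is a divisor, as in EDM's c_in preconditioning.
    factor = math.sqrt(sigma**2 + sigma_data**2)
    return [x / factor for x in sample]


if __name__ == "__main__":
    print(scale_input([1.0, -0.5], sigma=1.2))
```

Under this assumption, a sample whose standard deviation is `sqrt(sigma**2 + sigma_data**2)` is brought to roughly unit variance before it is passed to the model.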

set_begin_index

( begin_index: int = 0 )

Parameters

begin_index (int) — The begin index for the scheduler.

Sets the begin index for the scheduler. This function should be run from the pipeline before inference.

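set_timesteps

( num_inference_steps: typing.Optional[int] = None, device: typing.Union[str, torch.device] = None, timesteps: typing.Optional[typing.List[int]] = None )

Parameters

num_inference_steps (int) — The number of diffusion steps used when generating samples with a pre-trained model.
device (str or torch.device, optional) — The device to which the timesteps should be moved. If None, the timesteps are not moved.
timesteps (List[int], optional) — Custom timesteps used to support arbitrary spacing between timesteps. If None, the default timestep spacing strategy of equal spacing between timesteps is used. If timesteps is passed, num_inference_steps must be None.

Sets the timesteps used for the diffusion chain (to be run before inference).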

sigma_to_t

( sigmas: typing.Union[float, numpy.ndarray] ) → float or np.ndarray

Parameters

sigmas (float or np.ndarray) — A single Karras sigma or an array of Karras sigmas.

float or np.ndarray

A scaled input timestep or scaled input timestep array.

Gets scaled timesteps from the Karras sigmas for input to the consistency model.
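The sketch below shows one plausible form of this mapping. It assumes the log rescaling `t = 1000 * 0.25 * ln(sigma)`, with a tiny offset to avoid `log(0)`, as used for the model input in the original consistency models code. The constants and the scalar helper `sigma_to_scaled_t` are assumptions for illustration rather than a statement of the library's exact implementation.

```python
import math


def sigma_to_scaled_t(sigma):
    """Hypothetical scalar mapping from a Karras sigma to a scaled timestep."""
    # Assumed rescaling: 1000 * 0.25 * log(sigma); 1e-44 guards against log(0).
    return 1000 * 0.25 * math.log(sigma + 1e-44)


if __name__ == "__main__":
    for sigma in (0.002, 1.0, 80.0):
        print(sigma, sigma_to_scaled_t(sigma))
```

Because the mapping is monotonic in sigma, larger noise levels always correspond to larger scaled timesteps.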

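step

( model_output: Tensor, timestep: typing.Union[float, torch.Tensor], sample: Tensor, generator: typing.Optional[torch._C.Generator] = None, return_dict: bool = True ) → CMStochasticIterativeSchedulerOutput or tuple

Parameters

model_output (torch.Tensor) — The direct output from the learned diffusion model.
timestep (float) — The current timestep in the diffusion chain.
sample (torch.Tensor) — A current instance of a sample created by the diffusion process.
generator (torch.Generator, optional) — A random number generator.
return_dict (bool, optional, defaults to True) — Whether to return a CMStochasticIterativeSchedulerOutput or a tuple.

CMStochasticIterativeSchedulerOutput or tuple

If return_dict is True, a CMStochasticIterativeSchedulerOutput is returned; otherwise a tuple is returned where the first element is the sample tensor.

Predicts the sample from the previous timestep by reversing the SDE. This function propagates the diffusion process from the learned model outputs (most often the predicted noise).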
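One iteration of multistep consistency sampling (Algorithm 1 in the paper) can be sketched on scalars as follows: denoise with the boundary-condition scalings, optionally clip, then re-noise to the next noise level. The function `cm_step` and its argument names are hypothetical, and the exact order of operations in the library may differ, so treat this as a sketch under those assumptions.

```python
import random


def cm_step(model_output, sample, sigma, sigma_next,
            sigma_min=0.002, sigma_max=80.0, sigma_data=0.5,
            s_noise=1.0, clip_denoised=True, rng=None):
    """Hypothetical scalar sketch of one multistep consistency sampling step."""
    rng = rng or random.Random()

    # Boundary-condition scalings with epsilon = sigma_min.
    c_skip = sigma_data**2 / ((sigma - sigma_min) ** 2 + sigma_data**2)
    c_out = (sigma - sigma_min) * sigma_data / (sigma**2 + sigma_data**2) ** 0.5

    # Combine the model output and the current sample into a denoised estimate.
    denoised = c_out * model_output + c_skip * sample
    if clip_denoised:
        denoised = max(-1.0, min(1.0, denoised))

    # Re-noise to the next level; the added noise vanishes when sigma_next <= sigma_min.
    sigma_hat = max(sigma_min, min(sigma_max, sigma_next))
    noise = s_noise * rng.gauss(0.0, 1.0)
    return denoised + noise * (sigma_hat**2 - sigma_min**2) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    print(cm_step(model_output=0.1, sample=40.0, sigma=80.0, sigma_next=10.0, rng=rng))
```

In one-step generation there is no next noise level to return to, so the denoised estimate from the first call is the final sample.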

CMStochasticIterativeSchedulerOutput

class diffusers.schedulers.scheduling_consistency_models.CMStochasticIterativeSchedulerOutput

( prev_sample: Tensor )

Parameters

prev_sample (torch.Tensor of shape (batch_size, num_channels, height, width) for images) — Computed sample (x_{t-1}) of the previous timestep. prev_sample should be used as the next model input in the denoising loop.

Output class for the scheduler's `step` function.