Diffusers

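Consistency Models

Consistency Models were proposed in the paper Consistency Models by Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever.

The abstract from the paper is:

Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality. They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either by distilling pre-trained diffusion models, or as standalone generative models altogether. Through extensive experiments, we demonstrate that they outperform existing distillation techniques for diffusion models in one- and few-step sampling, achieving the new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for one-step generation. When trained in isolation, consistency models become a new family of generative models that can outperform existing one-step, non-adversarial generative models on standard benchmarks such as CIFAR-10, ImageNet 64x64 and LSUN 256x256.

The original codebase can be found at openai/consistency_models, and additional checkpoints are available at openai.

The pipeline was contributed by dg845 and ayushtues. ❤️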

Tips

For an additional speed-up, use torch.compile to generate multiple images in <1 second:

  import torch
  from diffusers import ConsistencyModelPipeline

  device = "cuda"
  # Load the cd_bedroom256_lpips checkpoint.
  model_id_or_path = "openai/diffusers-cd_bedroom256_lpips"
  pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
  pipe.to(device)

  # Added step: compile the UNet once; the first calls are slower while compilation runs.
  pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

  # Multistep sampling
  # Timesteps can be explicitly specified; the particular timesteps below are from the original GitHub repo:
  # https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L83
  for _ in range(10):
      image = pipe(timesteps=[17, 0]).images[0]
      image.show()
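
When checking the speed-up from torch.compile, keep in mind that the first calls to a compiled module trigger compilation and are much slower than later ones. The following is a minimal timing sketch, not part of Diffusers: `time_calls` is a hypothetical helper that discards a few warm-up calls before recording latencies. It is shown on a stand-in workload so that it runs without a GPU.

```python
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], warmup: int = 3, runs: int = 10) -> List[float]:
    """Return per-call latencies in seconds, excluding warm-up calls.

    Hypothetical helper for illustration; it is not a Diffusers API.
    """
    # Warm-up calls absorb one-time costs such as torch.compile compilation.
    for _ in range(warmup):
        fn()

    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies


# Stand-in workload so the sketch runs anywhere; swap in a pipeline call
# such as `lambda: pipe(timesteps=[17, 0])` to time real generation.
latencies = time_calls(lambda: sum(range(10_000)), warmup=2, runs=5)
print(f"mean latency: {sum(latencies) / len(latencies):.6f} s")
```

Comparing the mean latency with and without the torch.compile line gives a rough picture of the gain on a given GPU.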

ConsistencyModelPipeline

class diffusers.ConsistencyModelPipeline

( unet: UNet2DModel scheduler: CMStochasticIterativeScheduler )

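Parameters

unet (UNet2DModel): A UNet2DModel to denoise the encoded image latents.
scheduler (SchedulerMixin): A scheduler to be used in combination with unet to denoise the encoded image latents. Currently only compatible with CMStochasticIterativeScheduler.

Pipeline for unconditional or class-conditional image generation.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).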

__call__

( batch_size: int = 1 class_labels: typing.Union[torch.Tensor, typing.List[int], int, NoneType] = None num_inference_steps: int = 1 timesteps: typing.List[int] = None generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None output_type: typing.Optional[str] = 'pil' return_dict: bool = True callback: typing.Optional[typing.Callable[[int, int, torch.Tensor], NoneType]] = None callback_steps: int = 1 ) → ImagePipelineOutput or tuple

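Parameters

batch_size (int, optional, defaults to 1): The number of images to generate.
class_labels (torch.Tensor or List[int] or int, optional): Optional class labels for conditioning class-conditional consistency models. Not used if the model is not class-conditional.
num_inference_steps (int, optional, defaults to 1): The number of denoising steps. More denoising steps usually lead to a higher-quality image at the expense of slower inference.
timesteps (List[int], optional): Custom timesteps to use for the denoising process. If not defined, equally spaced num_inference_steps timesteps are used. Must be in descending order.
generator (torch.Generator, optional): A torch.Generator to make generation deterministic.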
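latents (torch.Tensor, optional): Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to repeat or tweak the same generation, for example with different class labels. If not provided, a latents tensor is generated by sampling using the supplied random generator.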
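output_type (str, optional, defaults to "pil"): The output format of the generated image. Choose between PIL.Image or np.array.
return_dict (bool, optional, defaults to True): Whether or not to return an ImagePipelineOutput instead of a plain tuple.
callback (Callable, optional): A function that is called every callback_steps steps during inference. The function is called with the following arguments: callback(step: int, timestep: int, latents: torch.Tensor).
callback_steps (int, optional, defaults to 1): The frequency at which the callback function is called. If not specified, the callback is called at every step.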
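
As a rough illustration of how callback and callback_steps interact, the sketch below mimics the documented calling convention in plain Python. `run_steps` is a hypothetical stand-in for a denoising loop, not the pipeline's actual implementation, and it assumes the callback fires on steps whose index is a multiple of callback_steps.

```python
from typing import Callable, List, Optional, Tuple


def run_steps(
    timesteps: List[int],
    callback: Optional[Callable[[int, int, object], None]] = None,
    callback_steps: int = 1,
) -> None:
    """Hypothetical stand-in for a denoising loop (not Diffusers code)."""
    latents = None  # placeholder; a real pipeline would pass a torch.Tensor
    for step, timestep in enumerate(timesteps):
        # ... one denoising step would update `latents` here ...
        # Assumption: the callback fires when the step index is a multiple
        # of callback_steps.
        if callback is not None and step % callback_steps == 0:
            callback(step, timestep, latents)


seen: List[Tuple[int, int]] = []


def record(step: int, timestep: int, latents: object) -> None:
    # Same argument order as the documented callback signature.
    seen.append((step, timestep))


run_steps(timesteps=[30, 20, 10, 0], callback=record, callback_steps=2)
print(seen)  # [(0, 30), (2, 10)]
```

A function with the same three-argument signature as `record` could be passed as callback to the pipeline, for example to log progress during multistep sampling.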

ImagePipelineOutput or tuple

If return_dict is True, an ImagePipelineOutput is returned, otherwise a tuple is returned where the first element is a list with the generated images.
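
Code that has to accept both return forms can normalize them first. The sketch below is one way to do that under stated assumptions: `extract_images` is a hypothetical helper, and the two sample values are stand-ins for what `pipe(return_dict=True)` and `pipe(return_dict=False)` would return, so that the example runs without loading a model.

```python
from types import SimpleNamespace
from typing import Any


def extract_images(result: Any) -> Any:
    """Return the generated images from either return form.

    Hypothetical helper; assumes `result` is a tuple whose first element
    holds the images, or an object exposing an `images` attribute.
    """
    if isinstance(result, tuple):
        return result[0]
    return result.images


# Stand-ins for the two return forms described above.
as_output = SimpleNamespace(images=["img0", "img1"])
as_tuple = (["img0", "img1"],)

print(extract_images(as_output) == extract_images(as_tuple))  # True
```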

Examples

>>> import torch

>>> from diffusers import ConsistencyModelPipeline

>>> device = "cuda"
>>> # Load the cd_imagenet64_l2 checkpoint.
>>> model_id_or_path = "openai/diffusers-cd_imagenet64_l2"
>>> pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
>>> pipe.to(device)

>>> # Onestep Sampling
>>> image = pipe(num_inference_steps=1).images[0]
>>> image.save("cd_imagenet64_l2_onestep_sample.png")

>>> # Onestep sampling, class-conditional image generation
>>> # ImageNet-64 class label 145 corresponds to king penguins
>>> image = pipe(num_inference_steps=1, class_labels=145).images[0]
>>> image.save("cd_imagenet64_l2_onestep_sample_penguin.png")

>>> # Multistep sampling, class-conditional image generation
>>> # Timesteps can be explicitly specified; the particular timesteps below are from the original GitHub repo:
>>> # https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L77
>>> image = pipe(num_inference_steps=None, timesteps=[22, 0], class_labels=145).images[0]
>>> image.save("cd_imagenet64_l2_multistep_sample_penguin.png")
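
The multistep example passes timesteps=[22, 0]. Because custom timesteps must be in descending order, a small pre-flight check can catch mistakes before a pipeline call is made. `check_timesteps` below is a hypothetical helper, not a Diffusers function, and it assumes that "descending" means strictly descending.

```python
from typing import List


def check_timesteps(timesteps: List[int]) -> List[int]:
    """Return `timesteps` unchanged if strictly descending, else raise ValueError.

    Hypothetical helper for illustration; not part of Diffusers.
    """
    for previous, current in zip(timesteps, timesteps[1:]):
        if current >= previous:
            raise ValueError(
                f"timesteps must be in descending order, got {timesteps}"
            )
    return timesteps


print(check_timesteps([22, 0]))  # [22, 0]

try:
    check_timesteps([0, 22])
except ValueError as err:
    print(f"rejected: {err}")
```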

ImagePipelineOutput

class diffusers.ImagePipelineOutput

( images: typing.Union[typing.List[PIL.Image.Image], numpy.ndarray] )

Parameters

images (List[PIL.Image.Image] or np.ndarray): List of denoised PIL images of length batch_size, or a NumPy array of shape (batch_size, height, width, num_channels).

Output class for image pipelines.
