Diffusers documentation

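Latent Diffusion

Latent Diffusion was proposed in High-Resolution Image Synthesis with Latent Diffusion Models by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer.

The abstract from the paper is:

By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism to control the image generation process without retraining. However, since these models typically operate directly in pixel space, optimization of powerful DMs often consumes hundreds of GPU days and inference is expensive due to sequential evaluations. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. In contrast to previous work, training diffusion models on such a representation allows for the first time to reach a near-optimal point between complexity reduction and detail preservation, greatly boosting visual fidelity. By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks, including unconditional image generation, semantic scene synthesis, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs.

The original codebase can be found at CompVis/latent-diffusion.

Make sure to check out the Schedulers guide to learn how to explore the tradeoff between scheduler speed and quality, and see the reuse components across pipelines section to learn how to efficiently load the same components into multiple pipelines.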

LDMTextToImagePipeline

class diffusers.LDMTextToImagePipeline

( vqvae: typing.Union[diffusers.models.autoencoders.vq_model.VQModel, diffusers.models.autoencoders.autoencoder_kl.AutoencoderKL] bert: PreTrainedModel tokenizer: PreTrainedTokenizer unet: typing.Union[diffusers.models.unets.unet_2d.UNet2DModel, diffusers.models.unets.unet_2d_condition.UNet2DConditionModel] scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler] )

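Parameters

vqvae (VQModel): Vector-quantized (VQ) model to encode and decode images to and from latent representations.
bert (LDMBertModel): Text-encoder model based on BERT.
tokenizer (BertTokenizer): A BertTokenizer to tokenize text.
unet (UNet2DConditionModel): A UNet2DConditionModel to denoise the encoded image latents.
scheduler (SchedulerMixin): A scheduler to be used in combination with unet to denoise the encoded image latents. Can be one of DDIMScheduler, LMSDiscreteScheduler, or PNDMScheduler.

A pipeline for text-to-image generation using latent diffusion.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).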

__call__

( prompt: typing.Union[str, typing.List[str]] height: typing.Optional[int] = None width: typing.Optional[int] = None num_inference_steps: typing.Optional[int] = 50 guidance_scale: typing.Optional[float] = 1.0 eta: typing.Optional[float] = 0.0 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None output_type: typing.Optional[str] = 'pil' return_dict: bool = True **kwargs ) → ImagePipelineOutput or tuple

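Parameters

prompt (str or List[str]): The prompt or prompts to guide the image generation.
height (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor): The height in pixels of the generated image.
width (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor): The width in pixels of the generated image.
num_inference_steps (int, optional, defaults to 50): The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional, defaults to 1.0): A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. Guidance scale is enabled when guidance_scale > 1.
generator (torch.Generator, optional): A torch.Generator to make generation deterministic.
latents (torch.Tensor, optional): Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling with the supplied random generator.
output_type (str, optional, defaults to "pil"): The output format of the generated image. Choose between PIL.Image or np.array.
return_dict (bool, optional, defaults to True): Whether or not to return an ImagePipelineOutput instead of a plain tuple.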
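To make the role of guidance_scale more concrete, here is a minimal sketch of the classifier-free guidance combination commonly used by diffusion pipelines. It is an illustration under assumptions, not the pipeline's actual code: apply_guidance is a hypothetical helper, and the plain Python lists stand in for the unconditional and text-conditioned noise predictions of the UNet.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Push the prediction away from the unconditional one, toward the text-conditioned one."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


# Toy values standing in for per-element noise predictions.
uncond = [0.0, 1.0, 2.0]
text = [1.0, 1.0, 0.0]

same = apply_guidance(uncond, text, 1.0)      # equals the text-conditioned prediction
stronger = apply_guidance(uncond, text, 6.0)  # exaggerates the difference
print(same, stronger)
```

At a scale of 1 the result is just the text-conditioned prediction, which is consistent with guidance only taking effect when guidance_scale > 1.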

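Returns: ImagePipelineOutput or tuple

If return_dict is True, an ImagePipelineOutput is returned; otherwise a tuple is returned whose first element is a list with the generated images.

The call function to the pipeline for generation.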
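The default height and width listed in the parameters are the product of two configuration values. The sketch below spells out that arithmetic. The numbers 32 and 8 are illustrative assumptions rather than values read from any checkpoint, and resolve_size is a hypothetical helper.

```python
def resolve_size(requested, sample_size, vae_scale_factor):
    """Return the requested size, or fall back to sample_size * vae_scale_factor."""
    if requested is None:
        return sample_size * vae_scale_factor
    return requested


# Assumed configuration values, chosen only for illustration.
height = resolve_size(None, sample_size=32, vae_scale_factor=8)  # default applies
width = resolve_size(512, sample_size=32, vae_scale_factor=8)    # explicit value wins
print(height, width)
```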

Example:

>>> from diffusers import DiffusionPipeline

>>> # load model and scheduler
>>> ldm = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")

>>> # run pipeline in inference (sample random noise and denoise)
>>> prompt = "A painting of a squirrel eating a burger"
>>> images = ldm([prompt], num_inference_steps=50, eta=0.3, guidance_scale=6).images

>>> # save images
>>> for idx, image in enumerate(images):
...     image.save(f"squirrel-{idx}.png")

LDMSuperResolutionPipeline

class diffusers.LDMSuperResolutionPipeline

( vqvae: VQModel unet: UNet2DModel scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler, diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler, diffusers.schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteScheduler, diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler] )

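Parameters

vqvae (VQModel): Vector-quantized (VQ) model to encode and decode images to and from latent representations.
unet (UNet2DModel): A UNet2DModel to denoise the encoded image.
scheduler (SchedulerMixin): A scheduler to be used in combination with unet to denoise the encoded image latents. Can be one of DDIMScheduler, LMSDiscreteScheduler, EulerDiscreteScheduler, EulerAncestralDiscreteScheduler, DPMSolverMultistepScheduler, or PNDMScheduler.

A pipeline for image super-resolution using latent diffusion.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).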

__call__

( image: typing.Union[torch.Tensor, PIL.Image.Image] = None batch_size: typing.Optional[int] = 1 num_inference_steps: typing.Optional[int] = 100 eta: typing.Optional[float] = 0.0 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None output_type: typing.Optional[str] = 'pil' return_dict: bool = True ) → ImagePipelineOutput or tuple

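Parameters

image (torch.Tensor or PIL.Image.Image): Image or tensor representing an image batch to be used as the starting point for the process.
batch_size (int, optional, defaults to 1): Number of images to generate.
num_inference_steps (int, optional, defaults to 100): The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
eta (float, optional, defaults to 0.0): Corresponds to the parameter eta (η) from the DDIM paper. Only applies to the DDIMScheduler and is ignored in other schedulers.
generator (torch.Generator or List[torch.Generator], optional): A torch.Generator to make generation deterministic.
output_type (str, optional, defaults to "pil"): The output format of the generated image. Choose between PIL.Image or np.array.
return_dict (bool, optional, defaults to True): Whether or not to return an ImagePipelineOutput instead of a plain tuple.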
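For readers wondering what eta controls, the sketch below evaluates the per-step noise standard deviation from the DDIM paper, where alpha_bar denotes the cumulative product of the noise schedule's alphas. The function name ddim_sigma and the sample values 0.5 and 0.8 are hypothetical, and this is a standalone illustration of the formula rather than the scheduler's implementation.

```python
import math


def ddim_sigma(alpha_bar_t, alpha_bar_prev, eta):
    """Standard deviation of the noise injected when stepping from t to the previous timestep."""
    variance = (1 - alpha_bar_prev) / (1 - alpha_bar_t) * (1 - alpha_bar_t / alpha_bar_prev)
    return eta * math.sqrt(variance)


# Illustrative cumulative alphas: the earlier (less noisy) step has the larger value.
deterministic = ddim_sigma(0.5, 0.8, eta=0.0)  # no noise added: deterministic sampling
stochastic = ddim_sigma(0.5, 0.8, eta=1.0)     # full noise: DDPM-like sampling
print(deterministic, stochastic)
```

With eta set to 0 no fresh noise is injected at any step, so the same starting latents always produce the same image.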

Returns: ImagePipelineOutput or tuple

If return_dict is True, an ImagePipelineOutput is returned; otherwise a tuple is returned whose first element is a list with the generated images.
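Code that must cope with either return form can branch on the result type. The sketch below is one possible way to do that. extract_images is a hypothetical helper, and the SimpleNamespace object and string placeholders only stand in for a real ImagePipelineOutput and real images.

```python
from types import SimpleNamespace


def extract_images(result):
    """Return the generated images from either an output object or a plain tuple."""
    if isinstance(result, tuple):
        return result[0]  # return_dict=False: images are the first element
    return result.images  # return_dict=True: images are an attribute


as_object = SimpleNamespace(images=["img0", "img1"])  # stand-in for ImagePipelineOutput
as_tuple = (["img0", "img1"],)                        # stand-in for the tuple form
print(extract_images(as_object), extract_images(as_tuple))
```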

The call function to the pipeline for generation.

Example:

>>> import requests
>>> from PIL import Image
>>> from io import BytesIO
>>> from diffusers import LDMSuperResolutionPipeline
>>> import torch

>>> # load model and scheduler
>>> pipeline = LDMSuperResolutionPipeline.from_pretrained("CompVis/ldm-super-resolution-4x-openimages")
>>> pipeline = pipeline.to("cuda")

>>> # let's download an image
>>> url = (
...     "https://user-images.githubusercontent.com/38061659/199705896-b48e17b8-b231-47cd-a270-4ffa5a93fa3e.png"
... )
>>> response = requests.get(url)
>>> low_res_img = Image.open(BytesIO(response.content)).convert("RGB")
>>> low_res_img = low_res_img.resize((128, 128))

>>> # run pipeline in inference (sample random noise and denoise)
>>> upscaled_image = pipeline(low_res_img, num_inference_steps=100, eta=1).images[0]
>>> # save image
>>> upscaled_image.save("ldm_generated_image.png")

ImagePipelineOutput

class diffusers.ImagePipelineOutput

( images: typing.Union[typing.List[PIL.Image.Image], numpy.ndarray] )

Parameters

images (List[PIL.Image.Image] or np.ndarray): List of denoised PIL images of length batch_size, or a NumPy array of shape (batch_size, height, width, num_channels).

Output class for image pipelines.
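When the NumPy form is requested, the batch layout described above can be handled as in the following sketch. It assumes the array holds floats in the range [0, 1], and to_uint8 is a hypothetical helper written for illustration, not a function of the library.

```python
import numpy as np


def to_uint8(images):
    """Convert a float batch in [0, 1] with shape (B, H, W, C) to 8-bit pixel values."""
    if images.ndim == 3:
        images = images[None, ...]  # promote a single image to a batch of one
    return (images * 255).round().astype("uint8")


# Toy batch: one black and one white 4x4 RGB image.
batch = np.zeros((2, 4, 4, 3), dtype=np.float32)
batch[1] = 1.0

pixels = to_uint8(batch)
print(pixels.shape, pixels.dtype)
```

Each pixels[i] is then an (height, width, num_channels) array that could be passed to PIL.Image.fromarray to obtain a PIL image.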
