Diffusers

( prior: PriorTransformer image_encoder: CLIPVisionModelWithProjection text_encoder: CLIPTextModelWithProjection tokenizer: CLIPTokenizer scheduler: UnCLIPScheduler image_processor: CLIPImageProcessor )

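Parameters

prior (PriorTransformer) — The canonical unCLIP prior used to approximate the image embedding from the text embedding.
image_encoder (CLIPVisionModelWithProjection) — Frozen image encoder.
text_encoder (CLIPTextModelWithProjection) — Frozen text encoder.
tokenizer (CLIPTokenizer) — Tokenizer of class CLIPTokenizer.
scheduler (UnCLIPScheduler) — A scheduler to be used in combination with prior to generate the image embedding.

Pipeline for generating the image prior for Kandinsky.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all pipelines (such as downloading or saving, running on a particular device, etc.).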

call

( prompt: typing.Union[str, typing.List[str]] negative_prompt: typing.Union[str, typing.List[str], NoneType] = None num_images_per_prompt: int = 1 num_inference_steps: int = 25 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None guidance_scale: float = 4.0 output_type: typing.Optional[str] = 'pt' return_dict: bool = True ) → KandinskyPriorPipelineOutput or tuple

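Parameters

prompt (str or List[str]) — The prompt or prompts to guide the image generation.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
num_inference_steps (int, optional, defaults to 25) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
generator (torch.Generator or List[torch.Generator], optional) — One or a list of torch generator(s) to make generation deterministic.
latents (torch.Tensor, optional) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.
guidance_scale (float, optional, defaults to 4.0) — Guidance scale as defined in Classifier-Free Diffusion Guidance. guidance_scale is defined as w of equation 2 of the Imagen Paper. Guidance is enabled by setting guidance_scale > 1. A higher guidance scale encourages generating images that are closely linked to the text prompt, usually at the expense of lower image quality.
output_type (str, optional, defaults to "pt") — The output format of the generated image. Choose between "np" (np.array) or "pt" (torch.Tensor).
return_dict (bool, optional, defaults to True) — Whether or not to return a KandinskyPriorPipelineOutput instead of a plain tuple.

KandinskyPriorPipelineOutput or tuple

Function invoked when calling the pipeline for generation.

Examples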

>>> from diffusers import KandinskyPipeline, KandinskyPriorPipeline
>>> import torch

>>> pipe_prior = KandinskyPriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior")
>>> pipe_prior.to("cuda")

>>> prompt = "red cat, 4k photo"
>>> out = pipe_prior(prompt)
>>> image_emb = out.image_embeds
>>> negative_image_emb = out.negative_image_embeds

>>> pipe = KandinskyPipeline.from_pretrained("kandinsky-community/kandinsky-2-1")
>>> pipe.to("cuda")

>>> image = pipe(
...     prompt,
...     image_embeds=image_emb,
...     negative_image_embeds=negative_image_emb,
...     height=768,
...     width=768,
...     num_inference_steps=100,
... ).images

>>> image[0].save("cat.png")
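
The guidance_scale parameter described above can be illustrated with a minimal sketch of classifier-free guidance. This is not the pipeline's own code: the helper name apply_guidance and the use of plain Python lists in place of tensors are assumptions made for illustration. It blends an unconditional prediction with a text-conditioned one using the weight w.

```python
from typing import List


def apply_guidance(uncond: List[float], cond: List[float], guidance_scale: float) -> List[float]:
    """Blend unconditional and text-conditioned predictions.

    Hypothetical helper: implements guided = uncond + w * (cond - uncond),
    the usual classifier-free guidance combination.
    """
    if len(uncond) != len(cond):
        raise ValueError("uncond and cond must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]  # stand-in for the prediction without the prompt
cond = [1.0, 3.0]    # stand-in for the prediction with the prompt

# With w = 1 the result is just the conditioned prediction (no extra guidance).
print(apply_guidance(uncond, cond, 1.0))
# With w = 4 (the default above) the prediction is pushed further toward the prompt.
print(apply_guidance(uncond, cond, 4.0))
```

Because w = 1 reproduces the conditioned prediction unchanged, guidance only has an effect for values above 1, which is consistent with the parameter description.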

interpolate

( images_and_prompts: typing.List[typing.Union[str, PIL.Image.Image, torch.Tensor]] weights: typing.List[float] num_images_per_prompt: int = 1 num_inference_steps: int = 25 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None negative_prior_prompt: typing.Optional[str] = None negative_prompt: str = '' guidance_scale: float = 4.0 device = None ) → KandinskyPriorPipelineOutput or tuple

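Parameters

images_and_prompts (List[Union[str, PIL.Image.Image, torch.Tensor]]) — List of prompts and images to guide the image generation.
weights (List[float]) — List of weights for each condition in images_and_prompts.
num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
num_inference_steps (int, optional, defaults to 25) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
generator (torch.Generator or List[torch.Generator], optional) — One or a list of torch generator(s) to make generation deterministic.
latents (torch.Tensor, optional) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.
negative_prior_prompt (str, optional) — The prompt not to guide the prior diffusion process. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
guidance_scale (float, optional, defaults to 4.0) — Guidance scale as defined in Classifier-Free Diffusion Guidance. guidance_scale is defined as w of equation 2 of the Imagen Paper. Guidance is enabled by setting guidance_scale > 1. A higher guidance scale encourages generating images that are closely linked to the text prompt, usually at the expense of lower image quality.

KandinskyPriorPipelineOutput or tuple

Function invoked when using the prior pipeline for interpolation.

Examples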

>>> from diffusers import KandinskyPriorPipeline, KandinskyPipeline
>>> from diffusers.utils import load_image
>>> import PIL

>>> import torch
>>> from torchvision import transforms

>>> pipe_prior = KandinskyPriorPipeline.from_pretrained(
...     "kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16
... )
>>> pipe_prior.to("cuda")

>>> img1 = load_image(
...     "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main"
...     "/kandinsky/cat.png"
... )

>>> img2 = load_image(
...     "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main"
...     "/kandinsky/starry_night.jpeg"
... )

>>> images_texts = ["a cat", img1, img2]
>>> weights = [0.3, 0.3, 0.4]
>>> image_emb, zero_image_emb = pipe_prior.interpolate(images_texts, weights)

>>> pipe = KandinskyPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
>>> pipe.to("cuda")

>>> image = pipe(
...     "",
...     image_embeds=image_emb,
...     negative_image_embeds=zero_image_emb,
...     height=768,
...     width=768,
...     num_inference_steps=150,
... ).images[0]

>>> image.save("starry_cat.png")
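
To make the role of weights more concrete, here is a minimal sketch that assumes the conditions are combined as a weighted sum of their embeddings. The helper weighted_embedding and the tiny two-dimensional vectors are hypothetical; in the real pipeline each entry of images_and_prompts is first turned into an embedding tensor by the prior or the image encoder.

```python
from typing import List, Sequence


def weighted_embedding(embeddings: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Combine one embedding per condition into a single vector.

    Hypothetical helper: assumes interpolation is a weighted sum,
    with exactly one weight per condition.
    """
    if len(embeddings) != len(weights):
        raise ValueError("embeddings and weights must have the same length")
    dim = len(embeddings[0])
    mixed = [0.0] * dim
    for emb, weight in zip(embeddings, weights):
        if len(emb) != dim:
            raise ValueError("all embeddings must have the same dimension")
        for i, value in enumerate(emb):
            mixed[i] += weight * value
    return mixed


# Three toy "embeddings" standing in for a text prompt and two images.
embeddings = [[4.0, 0.0], [0.0, 4.0], [2.0, 2.0]]
weights = [0.25, 0.25, 0.5]
print(weighted_embedding(embeddings, weights))
```

Under this assumption, weights that sum to 1 (as in the example above) keep the mixed embedding on a scale comparable to the individual embeddings.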

KandinskyPipeline

class diffusers.KandinskyPipeline

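Parameters

text_encoder (MultilingualCLIP) — Frozen text encoder.
tokenizer (XLMRobertaTokenizer) — Tokenizer of class XLMRobertaTokenizer.
scheduler (Union[DDIMScheduler, DDPMScheduler]) — A scheduler to be used in combination with unet to generate image latents.
unet (UNet2DConditionModel) — Conditional U-Net architecture to denoise the image embedding.
movq (VQModel) — MoVQ decoder to generate the image from the latents.

Pipeline for text-to-image generation using Kandinsky.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all pipelines (such as downloading or saving, running on a particular device, etc.).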

call

( prompt: typing.Union[str, typing.List[str]] image_embeds: typing.Union[torch.Tensor, typing.List[torch.Tensor]] negative_image_embeds: typing.Union[torch.Tensor, typing.List[torch.Tensor]] negative_prompt: typing.Union[str, typing.List[str], NoneType] = None height: int = 512 width: int = 512 num_inference_steps: int = 100 guidance_scale: float = 4.0 num_images_per_prompt: int = 1 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None output_type: typing.Optional[str] = 'pil' callback: typing.Optional[typing.Callable[[int, int, torch.Tensor], NoneType]] = None callback_steps: int = 1 return_dict: bool = True ) → ImagePipelineOutput or tuple

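Parameters

prompt (str or List[str]) — The prompt or prompts to guide the image generation.
image_embeds (torch.Tensor or List[torch.Tensor]) — The CLIP image embeddings for the text prompt, which will be used to condition the image generation.
negative_image_embeds (torch.Tensor or List[torch.Tensor]) — The CLIP image embeddings for the negative text prompt, which will be used to condition the image generation.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
height (int, optional, defaults to 512) — The height in pixels of the generated image.
width (int, optional, defaults to 512) — The width in pixels of the generated image.
num_inference_steps (int, optional, defaults to 100) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional, defaults to 4.0) — Guidance scale as defined in Classifier-Free Diffusion Guidance. guidance_scale is defined as w of equation 2 of the Imagen Paper. Guidance is enabled by setting guidance_scale > 1. A higher guidance scale encourages generating images that are closely linked to the text prompt, usually at the expense of lower image quality.
num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
generator (torch.Generator or List[torch.Generator], optional) — One or a list of torch generator(s) to make generation deterministic.
latents (torch.Tensor, optional) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between "pil" (PIL.Image.Image), "np" (np.array), or "pt" (torch.Tensor).
callback (Callable, optional) — A function that is called every callback_steps steps during inference. The function is called with the following arguments: callback(step: int, timestep: int, latents: torch.Tensor).
callback_steps (int, optional, defaults to 1) — The frequency at which the callback function is called. If not specified, the callback is called at every step.
return_dict (bool, optional, defaults to True) — Whether or not to return an ImagePipelineOutput instead of a plain tuple.

ImagePipelineOutput or tuple

Function invoked when calling the pipeline for generation.

Examples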

>>> from diffusers import KandinskyPipeline, KandinskyPriorPipeline
>>> import torch

>>> pipe_prior = KandinskyPriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior")
>>> pipe_prior.to("cuda")

>>> prompt = "red cat, 4k photo"
>>> out = pipe_prior(prompt)
>>> image_emb = out.image_embeds
>>> negative_image_emb = out.negative_image_embeds

>>> pipe = KandinskyPipeline.from_pretrained("kandinsky-community/kandinsky-2-1")
>>> pipe.to("cuda")

>>> image = pipe(
...     prompt,
...     image_embeds=image_emb,
...     negative_image_embeds=negative_image_emb,
...     height=768,
...     width=768,
...     num_inference_steps=100,
... ).images

>>> image[0].save("cat.png")
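
The callback and callback_steps parameters described above can be sketched with a stand-in denoising loop. Everything here is illustrative: run_denoising_loop is a hypothetical function, the latents are a plain float rather than a torch.Tensor, and the rule "invoke when step % callback_steps == 0" is an assumption about how the frequency is applied.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def run_denoising_loop(
    timesteps: Sequence[int],
    callback: Optional[Callable[[int, int, float], None]] = None,
    callback_steps: int = 1,
) -> float:
    """Toy loop showing when a callback(step, timestep, latents) would fire."""
    if callback_steps < 1:
        raise ValueError("callback_steps must be a positive integer")
    latents = 0.0  # stand-in for the latents tensor
    for step, timestep in enumerate(timesteps):
        latents += 1.0  # placeholder for one denoising update
        if callback is not None and step % callback_steps == 0:
            callback(step, timestep, latents)
    return latents


seen: List[Tuple[int, int]] = []


def record(step: int, timestep: int, latents: float) -> None:
    """Example callback that just logs which steps it was called on."""
    seen.append((step, timestep))


run_denoising_loop([900, 600, 300, 0], callback=record, callback_steps=2)
print(seen)  # every second step under the assumption above
```

A callback with this signature is a convenient place to log progress or inspect intermediate latents without modifying the pipeline itself.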

KandinskyCombinedPipeline

class diffusers.KandinskyCombinedPipeline

( text_encoder: MultilingualCLIP tokenizer: XLMRobertaTokenizer unet: UNet2DConditionModel scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_ddpm.DDPMScheduler] movq: VQModel prior_prior: PriorTransformer prior_image_encoder: CLIPVisionModelWithProjection prior_text_encoder: CLIPTextModelWithProjection prior_tokenizer: CLIPTokenizer prior_scheduler: UnCLIPScheduler prior_image_processor: CLIPImageProcessor )

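Parameters

text_encoder (MultilingualCLIP) — Frozen text encoder.
tokenizer (XLMRobertaTokenizer) — Tokenizer of class XLMRobertaTokenizer.
scheduler (Union[DDIMScheduler, DDPMScheduler]) — A scheduler to be used in combination with unet to generate image latents.
unet (UNet2DConditionModel) — Conditional U-Net architecture to denoise the image embedding.
movq (VQModel) — MoVQ decoder to generate the image from the latents.
prior_prior (PriorTransformer) — The canonical unCLIP prior used to approximate the image embedding from the text embedding.
prior_image_encoder (CLIPVisionModelWithProjection) — Frozen image encoder.
prior_text_encoder (CLIPTextModelWithProjection) — Frozen text encoder.
prior_tokenizer (CLIPTokenizer) — Tokenizer of class CLIPTokenizer.
prior_scheduler (UnCLIPScheduler) — A scheduler to be used in combination with prior to generate the image embedding.

Combined pipeline for text-to-image generation using Kandinsky.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all pipelines (such as downloading or saving, running on a particular device, etc.).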

call

( prompt: typing.Union[str, typing.List[str]] negative_prompt: typing.Union[str, typing.List[str], NoneType] = None num_inference_steps: int = 100 guidance_scale: float = 4.0 num_images_per_prompt: int = 1 height: int = 512 width: int = 512 prior_guidance_scale: float = 4.0 prior_num_inference_steps: int = 25 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None output_type: typing.Optional[str] = 'pil' callback: typing.Optional[typing.Callable[[int, int, torch.Tensor], NoneType]] = None callback_steps: int = 1 return_dict: bool = True ) → ImagePipelineOutput or tuple

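Parameters

prompt (str or List[str]) — The prompt or prompts to guide the image generation.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
num_inference_steps (int, optional, defaults to 100) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
height (int, optional, defaults to 512) — The height in pixels of the generated image.
width (int, optional, defaults to 512) — The width in pixels of the generated image.
prior_guidance_scale (float, optional, defaults to 4.0) — Guidance scale for the prior, as defined in Classifier-Free Diffusion Guidance. guidance_scale is defined as w of equation 2 of the Imagen Paper. Guidance is enabled by setting guidance_scale > 1. A higher guidance scale encourages generating images that are closely linked to the text prompt, usually at the expense of lower image quality.
prior_num_inference_steps (int, optional, defaults to 25) — The number of denoising steps for the prior. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional, defaults to 4.0) — Guidance scale as defined in Classifier-Free Diffusion Guidance. guidance_scale is defined as w of equation 2 of the Imagen Paper. Guidance is enabled by setting guidance_scale > 1. A higher guidance scale encourages generating images that are closely linked to the text prompt, usually at the expense of lower image quality.
generator (torch.Generator or List[torch.Generator], optional) — One or a list of torch generator(s) to make generation deterministic.
latents (torch.Tensor, optional) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between "pil" (PIL.Image.Image), "np" (np.array), or "pt" (torch.Tensor).
callback (Callable, optional) — A function that is called every callback_steps steps during inference. The function is called with the following arguments: callback(step: int, timestep: int, latents: torch.Tensor).
callback_steps (int, optional, defaults to 1) — The frequency at which the callback function is called. If not specified, the callback is called at every step.
return_dict (bool, optional, defaults to True) — Whether or not to return an ImagePipelineOutput instead of a plain tuple.

ImagePipelineOutput or tuple

Function invoked when calling the pipeline for generation.

Examples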

from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

prompt = "A lion in galaxies, spirals, nebulae, stars, smoke, iridescent, intricate detail, octane render, 8k"

image = pipe(prompt=prompt, num_inference_steps=25).images[0]

enable_sequential_cpu_offload

( gpu_id: typing.Optional[int] = None device: typing.Union[torch.device, str] = None )

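Offloads all models (unet, text_encoder, vae, and safety checker state dicts) to CPU using 🤗 Accelerate, significantly reducing memory usage. Models are moved to torch.device('meta') and loaded onto the GPU only when the forward method of their specific submodule is called. Offloading happens on a submodule basis. Memory savings are higher than with enable_model_cpu_offload, but performance is lower.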

KandinskyImg2ImgPipeline

class diffusers.KandinskyImg2ImgPipeline

( text_encoder: MultilingualCLIP movq: VQModel tokenizer: XLMRobertaTokenizer unet: UNet2DConditionModel scheduler: DDIMScheduler )

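Parameters

text_encoder (MultilingualCLIP) — Frozen text encoder.
tokenizer (XLMRobertaTokenizer) — Tokenizer of class XLMRobertaTokenizer.
scheduler (DDIMScheduler) — A scheduler to be used in combination with unet to generate image latents.
unet (UNet2DConditionModel) — Conditional U-Net architecture to denoise the image embedding.
movq (VQModel) — MoVQ image encoder and decoder.

Pipeline for image-to-image generation using Kandinsky.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all pipelines (such as downloading or saving, running on a particular device, etc.).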

call

( prompt: typing.Union[str, typing.List[str]] image: typing.Union[torch.Tensor, PIL.Image.Image, typing.List[torch.Tensor], typing.List[PIL.Image.Image]] image_embeds: Tensor negative_image_embeds: Tensor negative_prompt: typing.Union[str, typing.List[str], NoneType] = None height: int = 512 width: int = 512 num_inference_steps: int = 100 strength: float = 0.3 guidance_scale: float = 7.0 num_images_per_prompt: int = 1 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None output_type: typing.Optional[str] = 'pil' callback: typing.Optional[typing.Callable[[int, int, torch.Tensor], NoneType]] = None callback_steps: int = 1 return_dict: bool = True ) → ImagePipelineOutput or tuple

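Parameters

prompt (str or List[str]) — The prompt or prompts to guide the image generation.
image (torch.Tensor, PIL.Image.Image) — Image, or tensor representing an image batch, that will be used as the starting point for the process.
image_embeds (torch.Tensor or List[torch.Tensor]) — The CLIP image embeddings for the text prompt, which will be used to condition the image generation.
negative_image_embeds (torch.Tensor or List[torch.Tensor]) — The CLIP image embeddings for the negative text prompt, which will be used to condition the image generation.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
height (int, optional, defaults to 512) — The height in pixels of the generated image.
width (int, optional, defaults to 512) — The width in pixels of the generated image.
num_inference_steps (int, optional, defaults to 100) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
strength (float, optional, defaults to 0.3) — Conceptually, indicates how much to transform the reference image. Must be between 0 and 1. image is used as a starting point, and the larger the strength, the more noise is added to it. The number of denoising steps depends on the amount of noise initially added. When strength is 1, the added noise is maximal and the denoising process runs for the full number of iterations specified in num_inference_steps. A value of 1 therefore essentially ignores image.
guidance_scale (float, optional, defaults to 7.0) — Guidance scale as defined in Classifier-Free Diffusion Guidance. guidance_scale is defined as w of equation 2 of the Imagen Paper. Guidance is enabled by setting guidance_scale > 1. A higher guidance scale encourages generating images that are closely linked to the text prompt, usually at the expense of lower image quality.
num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
generator (torch.Generator or List[torch.Generator], optional) — One or a list of torch generator(s) to make generation deterministic.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between "pil" (PIL.Image.Image), "np" (np.array), or "pt" (torch.Tensor).
callback (Callable, optional) — A function that is called every callback_steps steps during inference. The function is called with the following arguments: callback(step: int, timestep: int, latents: torch.Tensor).
callback_steps (int, optional, defaults to 1) — The frequency at which the callback function is called. If not specified, the callback is called at every step.
return_dict (bool, optional, defaults to True) — Whether or not to return an ImagePipelineOutput instead of a plain tuple.

ImagePipelineOutput or tuple

Function invoked when calling the pipeline for generation.

Examples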

>>> from diffusers import KandinskyImg2ImgPipeline, KandinskyPriorPipeline
>>> from diffusers.utils import load_image
>>> import torch

>>> pipe_prior = KandinskyPriorPipeline.from_pretrained(
...     "kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16
... )
>>> pipe_prior.to("cuda")

>>> prompt = "A red cartoon frog, 4k"
>>> image_emb, zero_image_emb = pipe_prior(prompt, return_dict=False)

>>> pipe = KandinskyImg2ImgPipeline.from_pretrained(
...     "kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16
... )
>>> pipe.to("cuda")

>>> init_image = load_image(
...     "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main"
...     "/kandinsky/frog.png"
... )

>>> image = pipe(
...     prompt,
...     image=init_image,
...     image_embeds=image_emb,
...     negative_image_embeds=zero_image_emb,
...     height=768,
...     width=768,
...     num_inference_steps=100,
...     strength=0.2,
... ).images

>>> image[0].save("red_frog.png")
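
The relationship between strength and the number of denoising steps actually run can be sketched as follows. The helper effective_denoising_steps is hypothetical, and the formula (truncate num_inference_steps * strength, capped at num_inference_steps) is an assumption based on the common image-to-image convention rather than a statement about this pipeline's exact code.

```python
def effective_denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Estimate how many denoising iterations run for a given strength.

    Hypothetical helper: assumes the init image is noised to a fraction
    `strength` of the schedule, so only that fraction is denoised.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    if num_inference_steps < 0:
        raise ValueError("num_inference_steps must be non-negative")
    return min(int(num_inference_steps * strength), num_inference_steps)


# Settings from the example above: 100 steps at strength 0.2.
print(effective_denoising_steps(100, 0.2))
# strength = 1 runs the full schedule, which effectively ignores the init image.
print(effective_denoising_steps(100, 1.0))
```

Under this assumption, a low strength both preserves more of the input image and shortens the run, since fewer iterations are needed to remove the smaller amount of added noise.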

KandinskyImg2ImgCombinedPipeline

class diffusers.KandinskyImg2ImgCombinedPipeline

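Parameters

text_encoder (MultilingualCLIP) — Frozen text encoder.
tokenizer (XLMRobertaTokenizer) — Tokenizer of class XLMRobertaTokenizer.
scheduler (Union[DDIMScheduler, DDPMScheduler]) — A scheduler to be used in combination with unet to generate image latents.
unet (UNet2DConditionModel) — Conditional U-Net architecture to denoise the image embedding.
movq (VQModel) — MoVQ decoder to generate the image from the latents.
prior_prior (PriorTransformer) — The canonical unCLIP prior used to approximate the image embedding from the text embedding.
prior_image_encoder (CLIPVisionModelWithProjection) — Frozen image encoder.
prior_text_encoder (CLIPTextModelWithProjection) — Frozen text encoder.
prior_tokenizer (CLIPTokenizer) — Tokenizer of class CLIPTokenizer.
prior_scheduler (UnCLIPScheduler) — A scheduler to be used in combination with prior to generate the image embedding.

Combined pipeline for image-to-image generation using Kandinsky.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all pipelines (such as downloading or saving, running on a particular device, etc.).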

call

( prompt: typing.Union[str, typing.List[str]] image: typing.Union[torch.Tensor, PIL.Image.Image, typing.List[torch.Tensor], typing.List[PIL.Image.Image]] negative_prompt: typing.Union[str, typing.List[str], NoneType] = None num_inference_steps: int = 100 guidance_scale: float = 4.0 num_images_per_prompt: int = 1 strength: float = 0.3 height: int = 512 width: int = 512 prior_guidance_scale: float = 4.0 prior_num_inference_steps: int = 25 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None output_type: typing.Optional[str] = 'pil' callback: typing.Optional[typing.Callable[[int, int, torch.Tensor], NoneType]] = None callback_steps: int = 1 return_dict: bool = True ) → ImagePipelineOutput or tuple

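Parameters

prompt (str or List[str]) — The prompt or prompts to guide the image generation.
image (torch.Tensor, PIL.Image.Image, np.ndarray, List[torch.Tensor], List[PIL.Image.Image], or List[np.ndarray]) — Image, or tensor representing an image batch, that will be used as the starting point for the process. If latents are passed directly, they are not encoded again.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
num_inference_steps (int, optional, defaults to 100) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
height (int, optional, defaults to 512) — The height in pixels of the generated image.
width (int, optional, defaults to 512) — The width in pixels of the generated image.
strength (float, optional, defaults to 0.3) — Conceptually, indicates how much to transform the reference image. Must be between 0 and 1. image is used as a starting point, and the larger the strength, the more noise is added to it. The number of denoising steps depends on the amount of noise initially added. When strength is 1, the added noise is maximal and the denoising process runs for the full number of iterations specified in num_inference_steps. A value of 1 therefore essentially ignores image.
prior_guidance_scale (float, optional, defaults to 4.0) — Guidance scale for the prior, as defined in Classifier-Free Diffusion Guidance. guidance_scale is defined as w of equation 2 of the Imagen Paper. Guidance is enabled by setting guidance_scale > 1. A higher guidance scale encourages generating images that are closely linked to the text prompt, usually at the expense of lower image quality.
prior_num_inference_steps (int, optional, defaults to 25) — The number of denoising steps for the prior. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional, defaults to 4.0) — Guidance scale as defined in Classifier-Free Diffusion Guidance. guidance_scale is defined as w of equation 2 of the Imagen Paper. Guidance is enabled by setting guidance_scale > 1. A higher guidance scale encourages generating images that are closely linked to the text prompt, usually at the expense of lower image quality.
generator (torch.Generator or List[torch.Generator], optional) — One or a list of torch generator(s) to make generation deterministic.
latents (torch.Tensor, optional) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between "pil" (PIL.Image.Image), "np" (np.array), or "pt" (torch.Tensor).
callback (Callable, optional) — A function that is called every callback_steps steps during inference. The function is called with the following arguments: callback(step: int, timestep: int, latents: torch.Tensor).
callback_steps (int, optional, defaults to 1) — The frequency at which the callback function is called. If not specified, the callback is called at every step.
return_dict (bool, optional, defaults to True) — Whether or not to return an ImagePipelineOutput instead of a plain tuple.

ImagePipelineOutput or tuple

Function invoked when calling the pipeline for generation.

Examples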

from diffusers import AutoPipelineForImage2Image
import torch
import requests
from io import BytesIO
from PIL import Image
import os

pipe = AutoPipelineForImage2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

prompt = "A fantasy landscape, Cinematic lighting"
negative_prompt = "low quality, bad quality"

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
image = Image.open(BytesIO(response.content)).convert("RGB")
image.thumbnail((768, 768))

image = pipe(prompt=prompt, image=image, num_inference_steps=25).images[0]

enable_sequential_cpu_offload

( gpu_id: typing.Optional[int] = None device: typing.Union[torch.device, str] = None )

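Offloads all models to CPU using Accelerate, significantly reducing memory usage. When called, the state dicts of unet, text_encoder, vae, and safety checker are saved to CPU and then moved to torch.device('meta'), and are loaded onto the GPU only when the forward method of their specific submodule is called. Note that offloading happens on a submodule basis. Memory savings are higher than with enable_model_cpu_offload, but performance is lower.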

KandinskyInpaintPipeline

class diffusers.KandinskyInpaintPipeline

( text_encoder: MultilingualCLIP movq: VQModel tokenizer: XLMRobertaTokenizer unet: UNet2DConditionModel scheduler: DDIMScheduler )

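Parameters

text_encoder (MultilingualCLIP) — Frozen text encoder.
tokenizer (XLMRobertaTokenizer) — Tokenizer of class XLMRobertaTokenizer.
scheduler (DDIMScheduler) — A scheduler to be used in combination with unet to generate image latents.
unet (UNet2DConditionModel) — Conditional U-Net architecture to denoise the image embedding.
movq (VQModel) — MoVQ image encoder and decoder.

Pipeline for text-guided image inpainting using Kandinsky2.1.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all pipelines (such as downloading or saving, running on a particular device, etc.).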

call

( prompt: typing.Union[str, typing.List[str]] image: typing.Union[torch.Tensor, PIL.Image.Image] mask_image: typing.Union[torch.Tensor, PIL.Image.Image, numpy.ndarray] image_embeds: Tensor negative_image_embeds: Tensor negative_prompt: typing.Union[str, typing.List[str], NoneType] = None height: int = 512 width: int = 512 num_inference_steps: int = 100 guidance_scale: float = 4.0 num_images_per_prompt: int = 1 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None output_type: typing.Optional[str] = 'pil' callback: typing.Optional[typing.Callable[[int, int, torch.Tensor], NoneType]] = None callback_steps: int = 1 return_dict: bool = True ) → ImagePipelineOutput or tuple

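Parameters

prompt (str or List[str]) — The prompt or prompts to guide the image generation.
image (torch.Tensor, PIL.Image.Image, or np.ndarray) — Image, or tensor representing an image batch, that will be used as the starting point for the process.
mask_image (PIL.Image.Image, torch.Tensor, or np.ndarray) — Image, or tensor representing an image batch, used to mask image. White pixels in the mask are repainted, while black pixels are preserved. You can pass a PyTorch tensor as the mask only if the image you passed is also a PyTorch tensor, and it should contain one color channel (L) instead of 3, so the expected shape is (B, 1, H, W), (B, H, W), (1, H, W), or (H, W). If image is a PIL image or NumPy array, the mask should also be a PIL image or NumPy array. If it is a PIL image, it is converted to a single channel (luminance) before use. If it is a NumPy array, the expected shape is (H, W).
image_embeds (torch.Tensor or List[torch.Tensor]) — The CLIP image embeddings for the text prompt, which will be used to condition the image generation.
negative_image_embeds (torch.Tensor or List[torch.Tensor]) — The CLIP image embeddings for the negative text prompt, which will be used to condition the image generation.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
height (int, optional, defaults to 512) — The height in pixels of the generated image.
width (int, optional, defaults to 512) — The width in pixels of the generated image.
num_inference_steps (int, optional, defaults to 100) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional, defaults to 4.0) — Guidance scale as defined in Classifier-Free Diffusion Guidance. guidance_scale is defined as w of equation 2 of the Imagen Paper. Guidance is enabled by setting guidance_scale > 1. A higher guidance scale encourages generating images that are closely linked to the text prompt, usually at the expense of lower image quality.
num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
generator (torch.Generator or List[torch.Generator], optional) — One or a list of torch generator(s) to make generation deterministic.
latents (torch.Tensor, optional) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between "pil" (PIL.Image.Image), "np" (np.array), or "pt" (torch.Tensor).
callback (Callable, optional) — A function that is called every callback_steps steps during inference. The function is called with the following arguments: callback(step: int, timestep: int, latents: torch.Tensor).
callback_steps (int, optional, defaults to 1) — The frequency at which the callback function is called. If not specified, the callback is called at every step.
return_dict (bool, optional, defaults to True) — Whether or not to return an ImagePipelineOutput instead of a plain tuple.

ImagePipelineOutput or tuple

Function invoked when calling the pipeline for generation.

Examples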

>>> from diffusers import KandinskyInpaintPipeline, KandinskyPriorPipeline
>>> from diffusers.utils import load_image
>>> import torch
>>> import numpy as np

>>> pipe_prior = KandinskyPriorPipeline.from_pretrained(
...     "kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16
... )
>>> pipe_prior.to("cuda")

>>> prompt = "a hat"
>>> image_emb, zero_image_emb = pipe_prior(prompt, return_dict=False)

>>> pipe = KandinskyInpaintPipeline.from_pretrained(
...     "kandinsky-community/kandinsky-2-1-inpaint", torch_dtype=torch.float16
... )
>>> pipe.to("cuda")

>>> init_image = load_image(
...     "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main"
...     "/kandinsky/cat.png"
... )

>>> mask = np.zeros((768, 768), dtype=np.float32)
>>> mask[:250, 250:-250] = 1

>>> out = pipe(
...     prompt,
...     image=init_image,
...     mask_image=mask,
...     image_embeds=image_emb,
...     negative_image_embeds=zero_image_emb,
...     height=768,
...     width=768,
...     num_inference_steps=50,
... )

>>> image = out.images[0]
>>> image.save("cat_with_hat.png")
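
The mask convention described for mask_image (white pixels repainted, black pixels kept, shape (H, W)) can be illustrated without any imaging library. The helper rectangle_mask below is hypothetical and builds the mask as nested Python lists; it reproduces the same region as the slice mask[:250, 250:-250] used in the example above, and could be converted with np.asarray(mask, dtype=np.float32) before being passed to a pipeline.

```python
from typing import List


def rectangle_mask(height: int, width: int, top: int, bottom: int, left: int, right: int) -> List[List[float]]:
    """Build an (H, W) mask with 1.0 inside a rectangle and 0.0 elsewhere.

    Hypothetical helper: 1.0 marks pixels to repaint, 0.0 marks pixels to keep.
    Rows top..bottom-1 and columns left..right-1 are inside the rectangle.
    """
    return [
        [1.0 if top <= row < bottom and left <= col < right else 0.0 for col in range(width)]
        for row in range(height)
    ]


# Same region as mask[:250, 250:-250] on a 768 x 768 image.
mask = rectangle_mask(768, 768, top=0, bottom=250, left=250, right=768 - 250)
repainted = sum(sum(row) for row in mask)
print(len(mask), len(mask[0]))          # mask height and width
print(repainted / (768 * 768))          # fraction of the image that is repainted
```

Checking the repainted fraction before running inpainting is a quick way to confirm that the mask covers the intended area and has not been inverted.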

KandinskyInpaintCombinedPipeline

class diffusers.KandinskyInpaintCombinedPipeline

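Parameters

text_encoder (MultilingualCLIP) — Frozen text encoder.
tokenizer (XLMRobertaTokenizer) — Tokenizer of class XLMRobertaTokenizer.
scheduler (Union[DDIMScheduler, DDPMScheduler]) — A scheduler to be used in combination with unet to generate image latents.
unet (UNet2DConditionModel) — Conditional U-Net architecture to denoise the image embedding.
movq (VQModel) — MoVQ decoder to generate the image from the latents.
prior_prior (PriorTransformer) — The canonical unCLIP prior used to approximate the image embedding from the text embedding.
prior_image_encoder (CLIPVisionModelWithProjection) — Frozen image encoder.
prior_text_encoder (CLIPTextModelWithProjection) — Frozen text encoder.
prior_tokenizer (CLIPTokenizer) — Tokenizer of class CLIPTokenizer.
prior_scheduler (UnCLIPScheduler) — A scheduler to be used in combination with prior to generate the image embedding.

Combined pipeline for generation using Kandinsky.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all pipelines (such as downloading or saving, running on a particular device, etc.).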

call