Diffusers documentation

LEDITS++

LEDITS++ was proposed in LEDITS++: Limitless Image Editing using Text-to-Image Models by Manuel Brack, Felix Friedrich, Katharina Kornmeier, Linoy Tsaban, Patrick Schramowski, Kristian Kersting, and Apolinário Passos.

The abstract from the paper is:

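Text-to-image diffusion models have recently received increasing interest for their astonishing ability to produce high-fidelity images from solely text inputs. Subsequent research efforts aim to exploit and apply their capabilities to real image editing. However, existing image-to-image methods are often inefficient, imprecise, and of limited versatility. They either require time-consuming fine-tuning, deviate unnecessarily strongly from the input image, and/or lack support for multiple, simultaneous edits. To address these issues, we introduce LEDITS++, an efficient yet versatile and precise textual image manipulation technique. LEDITS++'s novel inversion approach requires no tuning nor optimization and produces high-fidelity results with a few diffusion steps. Second, our methodology supports multiple simultaneous edits and is architecture-agnostic. Third, we use a novel implicit masking technique that limits changes to relevant image regions. We propose the novel TEdBench++ benchmark as part of our exhaustive evaluation. Our results demonstrate the capabilities of LEDITS++ and its improvements over previous methods. The project page is available at https://leditsplusplus-project.static.hf.space.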

You can find additional information about LEDITS++ on the project page and try it out in a demo.

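Due to some backward compatibility issues with the current diffusers implementation of [DPMSolverMultistepScheduler](/docs/diffusers/v0.34.0/en/api/schedulers/multistep_dpm_solver#diffusers.DPMSolverMultistepScheduler), this implementation of LEdits++ can no longer guarantee perfect inversion. This issue is unlikely to have any noticeable effect on practical use cases. However, we provide an alternative implementation that guarantees perfect inversion in a dedicated [GitHub repo](https://github.com/ml-research/ledits_pp).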

We provide two distinct pipelines based on different pre-trained models.

LEditsPPPipelineStableDiffusion

class diffusers.LEditsPPPipelineStableDiffusion

( vae: AutoencoderKL text_encoder: CLIPTextModel tokenizer: CLIPTokenizer unet: UNet2DConditionModel scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler] safety_checker: StableDiffusionSafetyChecker feature_extractor: CLIPImageProcessor requires_safety_checker: bool = True )

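Parameters

vae (AutoencoderKL) — Variational Auto-Encoder (VAE) model to encode and decode images to and from latent representations.
text_encoder (CLIPTextModel) — Frozen text encoder. Stable Diffusion uses the text portion of CLIP, specifically the clip-vit-large-patch14 variant.
tokenizer (CLIPTokenizer) — Tokenizer of class CLIPTokenizer.
unet (UNet2DConditionModel) — Conditional U-Net architecture to denoise the encoded image latents.
scheduler (DPMSolverMultistepScheduler or DDIMScheduler) — A scheduler used in combination with unet to denoise the encoded image latents. Can be one of DPMSolverMultistepScheduler or DDIMScheduler. If any other scheduler is passed, it is automatically replaced by DPMSolverMultistepScheduler.
safety_checker (StableDiffusionSafetyChecker) — Classification module that estimates whether generated images could be considered offensive or harmful. Please refer to the model card for details.
feature_extractor (CLIPImageProcessor) — Model that extracts features from generated images to be used as inputs for the safety_checker.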

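Pipeline for textual image editing using LEDits++ with Stable Diffusion.

This model inherits from DiffusionPipeline and builds on the StableDiffusionPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).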

__call__

( negative_prompt: typing.Union[str, typing.List[str], NoneType] = None generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None output_type: typing.Optional[str] = 'pil' return_dict: bool = True editing_prompt: typing.Union[str, typing.List[str], NoneType] = None editing_prompt_embeds: typing.Optional[torch.Tensor] = None negative_prompt_embeds: typing.Optional[torch.Tensor] = None reverse_editing_direction: typing.Union[bool, typing.List[bool], NoneType] = False edit_guidance_scale: typing.Union[float, typing.List[float], NoneType] = 5 edit_warmup_steps: typing.Union[int, typing.List[int], NoneType] = 0 edit_cooldown_steps: typing.Union[int, typing.List[int], NoneType] = None edit_threshold: typing.Union[float, typing.List[float], NoneType] = 0.9 user_mask: typing.Optional[torch.Tensor] = None sem_guidance: typing.Optional[typing.List[torch.Tensor]] = None use_cross_attn_mask: bool = False use_intersect_mask: bool = True attn_store_steps: typing.Optional[typing.List[int]] = [] store_averaged_over_steps: bool = True cross_attention_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None guidance_rescale: float = 0.0 clip_skip: typing.Optional[int] = None callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] **kwargs ) → LEditsPPDiffusionPipelineOutput or tuple

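Parameters

negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
generator (torch.Generator, optional) — One or a list of torch generator(s) to make generation deterministic.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL: PIL.Image.Image or np.array.
return_dict (bool, optional, defaults to True) — Whether or not to return a LEditsPPDiffusionPipelineOutput instead of a plain tuple.
editing_prompt (str or List[str], optional) — The prompt or prompts to guide the image generation. The image is reconstructed by setting editing_prompt = None. The guidance direction of each prompt should be specified via reverse_editing_direction.
editing_prompt_embeds (torch.Tensor, optional) — Pre-computed embeddings to use for guiding the image generation. The guidance direction of each embedding should be specified via reverse_editing_direction.
negative_prompt_embeds (torch.Tensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, negative_prompt_embeds are generated from the negative_prompt input argument.
reverse_editing_direction (bool or List[bool], optional, defaults to False) — Whether the corresponding prompt in editing_prompt should be increased or decreased.
edit_guidance_scale (float or List[float], optional, defaults to 5) — Guidance scale for guiding the image generation. If provided as a list, values should correspond to editing_prompt. edit_guidance_scale is defined as s_e of equation 12 of the LEDITS++ paper.
edit_warmup_steps (int or List[int], optional, defaults to 0) — Number of diffusion steps (for each prompt) for which guidance is not applied.
edit_cooldown_steps (int or List[int], optional, defaults to None) — Number of diffusion steps (for each prompt) after which guidance is no longer applied.
edit_threshold (float or List[float], optional, defaults to 0.9) — Masking threshold of guidance. The threshold should be proportional to the image region that is modified. edit_threshold is defined as λ of equation 12 of the LEDITS++ paper.
user_mask (torch.Tensor, optional) — User-provided mask for even better control over the editing process. This is helpful when the implicit masks of LEDITS++ do not meet user preferences.
sem_guidance (List[torch.Tensor], optional) — List of pre-generated guidance vectors to be applied at generation. The length of the list has to correspond to num_inference_steps.
use_cross_attn_mask (bool, defaults to False) — Whether cross-attention masks are used. Cross-attention masks are always used when use_intersect_mask is set to True. Cross-attention masks are defined as "M^1" of equation 12 of the LEDITS++ paper.
use_intersect_mask (bool, defaults to True) — Whether the masking term is calculated as the intersection of cross-attention masks and masks derived from the noise estimate. Cross-attention masks are defined as "M^1" and masks derived from the noise estimate are defined as "M^2" of equation 12 of the LEDITS++ paper.
attn_store_steps (List[int], optional) — Steps for which the attention maps are stored in the AttentionStore. Only for visualization purposes.
store_averaged_over_steps (bool, defaults to True) — Whether the attention maps for the 'attn_store_steps' are stored averaged over the diffusion steps. If False, attention maps for each step are stored separately. Only for visualization purposes.
cross_attention_kwargs (dict, optional) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined in self.processor.
guidance_rescale (float, optional, defaults to 0.0) — Guidance rescale factor from Common Diffusion Noise Schedules and Sample Steps are Flawed. The guidance rescale factor should fix overexposure when using zero terminal SNR.
clip_skip (int, optional) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.
callback_on_step_end (Callable, optional) — A function that is called at the end of each denoising step during inference. The function is called with the following arguments: callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict). callback_kwargs will include a list of all tensors as specified by callback_on_step_end_tensor_inputs.
callback_on_step_end_tensor_inputs (List, optional) — The list of tensor inputs for the callback_on_step_end function. The tensors specified in the list will be passed as the callback_kwargs argument. You will only be able to include variables listed in the ._callback_tensor_inputs attribute of your pipeline class.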
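Several of the editing parameters above interact within each denoising step. The plain-Python sketch below shows one way to read them together for a single editing prompt. The guidance direction is the difference between the edit-conditioned and the unconditional noise estimates, and its sign is flipped by `reverse_editing_direction`. Only entries above the `edit_threshold` quantile survive the masks, the survivors are scaled by `edit_guidance_scale`, and nothing is applied outside the warm-up/cool-down window. This is a simplified illustration under stated assumptions, not the pipeline's actual code. The helper names are hypothetical, flat lists stand in for latent tensors, and the quantile uses a nearest-rank rule.

```python
# Hypothetical sketch: names, list-based "tensors", and the nearest-rank quantile
# are illustrative assumptions, not the diffusers implementation.
import math
from typing import List, Optional


def quantile_mask(values: List[float], threshold: float) -> List[bool]:
    """Mark entries whose magnitude reaches the `threshold` quantile (nearest rank)."""
    magnitudes = sorted(abs(v) for v in values)
    idx = min(len(magnitudes) - 1, math.floor(threshold * len(magnitudes)))
    cutoff = magnitudes[idx]
    return [abs(v) >= cutoff for v in values]


def edit_guidance(
    noise_edit: List[float],    # noise estimate conditioned on one editing prompt
    noise_uncond: List[float],  # unconditional noise estimate
    attn_map: List[float],      # cross-attention map of that editing prompt
    step: int,
    scale: float = 5.0,         # plays the role of s_e (edit_guidance_scale)
    threshold: float = 0.9,     # plays the role of lambda (edit_threshold)
    reverse: bool = False,      # reverse_editing_direction
    warmup: int = 0,            # edit_warmup_steps
    cooldown: Optional[int] = None,  # edit_cooldown_steps
    use_cross_attn_mask: bool = False,
    use_intersect_mask: bool = True,
) -> List[float]:
    """Masked guidance term contributed by a single editing prompt at `step`."""
    # Outside the [warmup, cooldown) window the prompt contributes nothing.
    if step < warmup or (cooldown is not None and step >= cooldown):
        return [0.0] * len(noise_edit)
    sign = -1.0 if reverse else 1.0
    diff = [sign * (e - u) for e, u in zip(noise_edit, noise_uncond)]
    m1 = quantile_mask(attn_map, threshold)  # stand-in for the cross-attention mask M^1
    m2 = quantile_mask(diff, threshold)      # stand-in for the noise-estimate mask M^2
    if use_intersect_mask:
        mask = [a and b for a, b in zip(m1, m2)]
    elif use_cross_attn_mask:
        mask = m1
    else:
        mask = m2
    return [scale * d if keep else 0.0 for d, keep in zip(diff, mask)]


print(edit_guidance([0.0, 1.0, 0.2, -3.0], [0.0] * 4, [0.1, 0.9, 0.8, 0.2], step=3, threshold=0.5))
# [0.0, 5.0, 0.0, 0.0]
```

With the intersection mask, only the second entry is both salient in the attention map and large in the noise difference, so it is the only one that is edited.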

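Returns

LEditsPPDiffusionPipelineOutput or tuple

If return_dict is True, a LEditsPPDiffusionPipelineOutput is returned, otherwise a tuple. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools indicating whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) content, according to the safety_checker.

The call function to the pipeline for editing. The invert() method has to be called beforehand. Edits will always be performed on the last inverted image(s).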

Examples

>>> import torch

>>> from diffusers import LEditsPPPipelineStableDiffusion
>>> from diffusers.utils import load_image

>>> # Load the pipeline in half precision; VAE tiling reduces decoding memory.
>>> pipe = LEditsPPPipelineStableDiffusion.from_pretrained(
...     "stable-diffusion-v1-5/stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16
... )
>>> pipe.enable_vae_tiling()
>>> pipe = pipe.to("cuda")

>>> img_url = "https://www.aiml.informatik.tu-darmstadt.de/people/mbrack/cherry_blossom.png"
>>> image = load_image(img_url).resize((512, 512))

>>> # Invert the input image first; edits are always applied to the last inverted image.
>>> _ = pipe.invert(image=image, num_inversion_steps=50, skip=0.1)

>>> # Steer the inverted image towards the editing prompt.
>>> edited_image = pipe(
...     editing_prompt=["cherry blossom"], edit_guidance_scale=10.0, edit_threshold=0.75
... ).images[0]

invert

( image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]] source_prompt: str = '' source_guidance_scale: float = 3.5 num_inversion_steps: int = 30 skip: float = 0.15 generator: typing.Optional[torch._C.Generator] = None cross_attention_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None clip_skip: typing.Optional[int] = None height: typing.Optional[int] = None width: typing.Optional[int] = None resize_mode: typing.Optional[str] = 'default' crops_coords: typing.Optional[typing.Tuple[int, int, int, int]] = None ) → LEditsPPInversionPipelineOutput

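Parameters

image (PipelineImageInput) — Input for the image(s) that are to be edited. Multiple input images have to share the same aspect ratio.
source_prompt (str, defaults to "") — Prompt describing the input image that will be used for guidance during inversion. Guidance is disabled if source_prompt is "".
source_guidance_scale (float, defaults to 3.5) — Strength of guidance during inversion.
num_inversion_steps (int, defaults to 30) — Total number of inversion steps performed after discarding the initial skip steps.
skip (float, defaults to 0.15) — Portion of initial steps that will be ignored for inversion and subsequent generation. Lower values lead to stronger changes to the input image. skip has to be between 0 and 1.
generator (torch.Generator, optional) — A torch.Generator to make inversion deterministic.
cross_attention_kwargs (dict, optional) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined in self.processor.
clip_skip (int, optional) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.
height (int, optional, defaults to None) — The height of the preprocessed image. If None, get_default_height_width() is used to get the default height.
width (int, optional, defaults to None) — The width of the preprocessed image. If None, get_default_height_width() is used to get the default width.
resize_mode (str, optional, defaults to default) — The resize mode, one of default, fill, or crop. If default, the image is resized to the specified width and height and may not keep its original aspect ratio. If fill, the image is resized to fit within the specified width and height while keeping its aspect ratio, then centered, and the empty space is filled with image data. If crop, the image is resized to cover the specified width and height while keeping its aspect ratio, then centered, and the excess is cropped. Note that the resize modes fill and crop are only supported for PIL image input.
crops_coords (List[Tuple[int, int, int, int]], optional, defaults to None) — The crop coordinates for each image in the batch. If None, the image is not cropped.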
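To make the three `resize_mode` options concrete, the hypothetical helper below computes only the intermediate size an image is scaled to before it is filled or cropped to `width` × `height`. It is an illustrative sketch, and the integer arithmetic is our own choice. The actual image processor may round differently, and it also performs the fill or crop step itself.

```python
# Hypothetical sketch of resize_mode sizing; not the diffusers image processor.
from typing import Tuple


def scaled_size(src_w: int, src_h: int, width: int, height: int, mode: str = "default") -> Tuple[int, int]:
    """Size the source image is resized to before the final fill/crop step."""
    if mode == "default":
        # Stretch directly to the target; the aspect ratio may change.
        return width, height
    target_ratio = width / height
    src_ratio = src_w / src_h
    if mode == "fill":
        # Fit inside the target while keeping the aspect ratio; the border is filled afterwards.
        if target_ratio < src_ratio:
            return width, src_h * width // src_w
        return src_w * height // src_h, height
    if mode == "crop":
        # Cover the target while keeping the aspect ratio; the excess is cropped afterwards.
        if target_ratio > src_ratio:
            return width, src_h * width // src_w
        return src_w * height // src_h, height
    raise ValueError(f"unknown resize_mode: {mode!r}")


# A 1024x768 landscape image prepared for a 512x512 target:
print(scaled_size(1024, 768, 512, 512, "default"))  # (512, 512)
print(scaled_size(1024, 768, 512, 512, "fill"))     # (512, 384)
print(scaled_size(1024, 768, 512, 512, "crop"))     # (682, 512)
```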

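Returns

LEditsPPInversionPipelineOutput

The output contains the resized input image(s) and the respective VAE reconstruction(s).

The function of the pipeline for image inversion as described by the LEDITS++ paper. If the scheduler is set to DDIMScheduler, the inversion proposed by edit-friendly DDPM is performed instead.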
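The edit-friendly DDPM inversion mentioned above can be illustrated on a single scalar. The sketch samples every noisy latent x_t independently from q(x_t | x_0). It then solves for the noise map z_t that makes one DDPM step from x_t land exactly on x_{t-1}, so replaying the same z_t values reconstructs x_0. The linear `mean_fn`, the noise schedule, and the per-step sigmas are made-up stand-ins for the U-Net-based posterior mean and the scheduler's variances. This is a toy illustration of the idea, not the pipeline's code.

```python
# Hypothetical toy example: the schedule, sigmas, and linear mean_fn are made up;
# a real pipeline would use the scheduler's values and a U-Net-based posterior mean.
import math
import random
from typing import Callable, List, Tuple

MeanFn = Callable[[float, int], float]


def edit_friendly_inversion(
    x0: float, alpha_bars: List[float], sigmas: List[float], mean_fn: MeanFn, seed: int = 0
) -> Tuple[List[float], List[float]]:
    """Return the sampled trajectory xs and extracted noise maps zs (index 0 unused)."""
    rng = random.Random(seed)
    # 1) Sample every x_t independently from q(x_t | x_0), each with its own fresh noise.
    xs = [x0] + [math.sqrt(a) * x0 + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for a in alpha_bars[1:]]
    # 2) Solve x_{t-1} = mean_fn(x_t, t) + sigmas[t] * z_t for z_t at every step.
    zs = [0.0] * len(xs)
    for t in range(len(xs) - 1, 0, -1):
        zs[t] = (xs[t - 1] - mean_fn(xs[t], t)) / sigmas[t]
    return xs, zs


def regenerate(x_T: float, zs: List[float], sigmas: List[float], mean_fn: MeanFn) -> float:
    """Run DDPM-style sampling using the extracted zs instead of fresh noise."""
    x = x_T
    for t in range(len(zs) - 1, 0, -1):
        x = mean_fn(x, t) + sigmas[t] * zs[t]
    return x


def mean_fn(x: float, t: int) -> float:
    """Hypothetical stand-in for the denoiser-based posterior mean mu(x_t, t)."""
    return 0.95 * x


alpha_bars = [1.0, 0.9, 0.7, 0.5, 0.3]  # hypothetical cumulative alphas for t = 0..4
sigmas = [0.0, 0.1, 0.2, 0.3, 0.4]      # hypothetical per-step std devs; sigmas[0] is unused

xs, zs = edit_friendly_inversion(0.5, alpha_bars, sigmas, mean_fn)
print(regenerate(xs[-1], zs, sigmas, mean_fn))  # approximately 0.5, up to floating-point error
```

Because the extracted `zs` reproduce the input exactly, an edit in this toy setting amounts to replaying them with a differently conditioned `mean_fn`.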

disable_vae_slicing

( )

Disable sliced VAE decoding. If enable_vae_slicing was previously enabled, this method goes back to computing decoding in one step.

disable_vae_tiling

( )

Disable tiled VAE decoding. If enable_vae_tiling was previously enabled, this method goes back to computing decoding in one step.

enable_vae_slicing

( )

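Enable sliced VAE decoding. When this option is enabled, the VAE splits the input tensor into slices and computes decoding in several steps. This is useful to save some memory and allow larger batch sizes.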
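Conceptually, sliced decoding trades one large decoder call for several smaller ones. The sketch below is hypothetical: `decode_fn` stands in for the VAE decoder and plain lists stand in for batched tensors. It only shows that splitting the batch leaves the result unchanged while each call sees at most `slice_size` samples. As far as we know, the diffusers VAE decodes one sample at a time, which corresponds to `slice_size=1` here.

```python
# Hypothetical sketch of batch slicing; decode_fn is a stand-in, not the VAE API.
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def decode_in_slices(batch: List[T], decode_fn: Callable[[List[T]], List[U]], slice_size: int = 1) -> List[U]:
    """Decode `batch` in chunks of `slice_size` and concatenate the outputs in order."""
    if slice_size < 1:
        raise ValueError("slice_size must be >= 1")
    outputs: List[U] = []
    for start in range(0, len(batch), slice_size):
        outputs.extend(decode_fn(batch[start:start + slice_size]))
    return outputs


calls: List[int] = []


def fake_decode(chunk: List[int]) -> List[int]:
    calls.append(len(chunk))  # record how many samples each call processed
    return [2 * x for x in chunk]


print(decode_in_slices([1, 2, 3, 4, 5], fake_decode, slice_size=2))  # [2, 4, 6, 8, 10]
print(calls)                                                          # [2, 2, 1]
```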

enable_vae_tiling

( )

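Enable tiled VAE decoding. When this option is enabled, the VAE splits the input tensor into tiles and computes encoding and decoding in several steps. This is useful for saving a large amount of memory and for processing larger images.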
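Tiled decoding splits the spatial dimensions rather than the batch. As a rough illustration, the hypothetical helper below lists overlapping tile windows along one axis. The tile size and overlap fraction are arbitrary example values, not the VAE's actual settings, and a real implementation also blends the overlapping seams.

```python
# Hypothetical sketch of 1-D tile windows; sizes and overlap are example values only.
from typing import List, Tuple


def tile_windows(length: int, tile: int, overlap: float = 0.25) -> List[Tuple[int, int]]:
    """Return (start, end) windows of size <= `tile` covering [0, length) with overlap."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    stride = max(1, int(tile * (1.0 - overlap)))
    windows = []
    start = 0
    while True:
        end = min(start + tile, length)
        windows.append((start, end))
        if end == length:
            break
        start += stride
    return windows


# A 1024-pixel axis processed with 512-pixel tiles that overlap by 25%:
print(tile_windows(1024, 512))  # [(0, 512), (384, 896), (768, 1024)]
```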

encode_prompt

( device num_images_per_prompt enable_edit_guidance negative_prompt = None editing_prompt = None negative_prompt_embeds: typing.Optional[torch.Tensor] = None editing_prompt_embeds: typing.Optional[torch.Tensor] = None lora_scale: typing.Optional[float] = None clip_skip: typing.Optional[int] = None )

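Parameters

device (torch.device) — torch device.
num_images_per_prompt (int) — Number of images that should be generated per prompt.
enable_edit_guidance (bool) — Whether to perform any editing or reconstruct the input image instead.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
editing_prompt (str or List[str], optional) — Editing prompt(s) to be encoded. If not defined, one has to pass editing_prompt_embeds instead.
editing_prompt_embeds (torch.Tensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings are generated from the editing_prompt input argument.
negative_prompt_embeds (torch.Tensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds are generated from the negative_prompt input argument.
lora_scale (float, optional) — A LoRA scale that will be applied to all LoRA layers of the text encoder if LoRA layers are loaded.
clip_skip (int, optional) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.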

Encodes the prompt into text encoder hidden states.
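When `num_images_per_prompt` is greater than 1, each prompt's embedding has to be duplicated so that it lines up with every image generated for that prompt. The plain-Python sketch below assumes that copies of the same prompt are kept adjacent, which is the ordering `torch.repeat_interleave` produces along the batch dimension. Strings stand in for embedding tensors, and the helper name is hypothetical.

```python
# Hypothetical sketch: strings stand in for per-prompt embedding tensors.
from typing import List, TypeVar

T = TypeVar("T")


def repeat_per_prompt(embeds: List[T], num_images_per_prompt: int) -> List[T]:
    """Repeat each entry `num_images_per_prompt` times, keeping copies of a prompt adjacent."""
    if num_images_per_prompt < 1:
        raise ValueError("num_images_per_prompt must be >= 1")
    return [e for e in embeds for _ in range(num_images_per_prompt)]


print(repeat_per_prompt(["emb(cat)", "emb(dog)"], 2))
# ['emb(cat)', 'emb(cat)', 'emb(dog)', 'emb(dog)']
```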

LEditsPPPipelineStableDiffusionXL

class diffusers.LEditsPPPipelineStableDiffusionXL

( vae: AutoencoderKL text_encoder: CLIPTextModel text_encoder_2: CLIPTextModelWithProjection tokenizer: CLIPTokenizer tokenizer_2: CLIPTokenizer unet: UNet2DConditionModel scheduler: typing.Union[diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler, diffusers.schedulers.scheduling_ddim.DDIMScheduler] image_encoder: CLIPVisionModelWithProjection = None feature_extractor: CLIPImageProcessor = None force_zeros_for_empty_prompt: bool = True add_watermarker: typing.Optional[bool] = None )

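Parameters

vae (AutoencoderKL) — Variational Auto-Encoder (VAE) model to encode and decode images to and from latent representations.
text_encoder (CLIPTextModel) — Frozen text encoder. Stable Diffusion XL uses the text portion of CLIP, specifically the clip-vit-large-patch14 variant.
text_encoder_2 (CLIPTextModelWithProjection) — Second frozen text encoder. Stable Diffusion XL uses the text and pooled portion of CLIP, specifically the laion/CLIP-ViT-bigG-14-laion2B-39B-b160k variant.
tokenizer (CLIPTokenizer) — Tokenizer of class CLIPTokenizer.
tokenizer_2 (CLIPTokenizer) — Second tokenizer of class CLIPTokenizer.
unet (UNet2DConditionModel) — Conditional U-Net architecture to denoise the encoded image latents.
scheduler (DPMSolverMultistepScheduler or DDIMScheduler) — A scheduler used in combination with unet to denoise the encoded image latents. Can be one of DPMSolverMultistepScheduler or DDIMScheduler. If any other scheduler is passed, it is automatically replaced by DPMSolverMultistepScheduler.
force_zeros_for_empty_prompt (bool, optional, defaults to True) — Whether the negative prompt embeddings should always be forced to 0. Also see the config of stabilityai/stable-diffusion-xl-base-1.0.
add_watermarker (bool, optional) — Whether to use the invisible_watermark library to watermark output images. If not defined, it defaults to True if the package is installed; otherwise no watermarker is used.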
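The effect of `force_zeros_for_empty_prompt` can be sketched as a simple branch. This illustration rests on several assumptions. `encode` is a hypothetical stand-in for the two text encoders, and embeddings are plain lists. We also assume, as in the Stable Diffusion XL pipelines we are aware of, that zeroing applies only when no negative prompt is passed at all.

```python
# Hypothetical sketch; `encode` and the list-based embeddings are illustrative stand-ins.
from typing import Callable, List, Optional


def negative_embedding(
    negative_prompt: Optional[str],
    encode: Callable[[str], List[float]],
    dim: int,
    force_zeros_for_empty_prompt: bool = True,
) -> List[float]:
    """Return the negative-prompt embedding used for classifier-free guidance."""
    if negative_prompt is None and force_zeros_for_empty_prompt:
        # No negative prompt given: use an all-zero embedding instead of encoding "".
        return [0.0] * dim
    # Otherwise encode the (possibly empty) negative prompt as usual.
    return encode(negative_prompt or "")


def toy_encode(text: str) -> List[float]:
    # Hypothetical encoder: a fixed-size vector derived from the text length.
    return [float(len(text)) + 1.0] * 4


print(negative_embedding(None, toy_encode, dim=4))                                      # [0.0, 0.0, 0.0, 0.0]
print(negative_embedding(None, toy_encode, dim=4, force_zeros_for_empty_prompt=False))  # [1.0, 1.0, 1.0, 1.0]
print(negative_embedding("blurry", toy_encode, dim=4))                                  # [7.0, 7.0, 7.0, 7.0]
```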

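Pipeline for textual image editing using LEDits++ with Stable Diffusion XL.

This model inherits from DiffusionPipeline and builds on the StableDiffusionXLPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).

In addition, the pipeline inherits the following loading methods: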

LoRA: LEditsPPPipelineStableDiffusionXL.load_lora_weights()
Ckpt: loaders.FromSingleFileMixin.from_single_file()

as well as the following saving methods:

LoRA: loaders.StableDiffusionXLPipeline.save_lora_weights

__call__

( denoising_end: typing.Optional[float] = None negative_prompt: typing.Union[str, typing.List[str], NoneType] = None negative_prompt_2: typing.Union[str, typing.List[str], NoneType] = None negative_prompt_embeds: typing.Optional[torch.Tensor] = None negative_pooled_prompt_embeds: typing.Optional[torch.Tensor] = None ip_adapter_image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor], NoneType] = None output_type: typing.Optional[str] = 'pil' return_dict: bool = True cross_attention_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None guidance_rescale: float = 0.0 crops_coords_top_left: typing.Tuple[int, int] = (0, 0) target_size: typing.Optional[typing.Tuple[int, int]] = None editing_prompt: typing.Union[str, typing.List[str], NoneType] = None editing_prompt_embeddings: typing.Optional[torch.Tensor] = None editing_pooled_prompt_embeds: typing.Optional[torch.Tensor] = None reverse_editing_direction: typing.Union[bool, typing.List[bool], NoneType] = False edit_guidance_scale: typing.Union[float, typing.List[float], NoneType] = 5 edit_warmup_steps: typing.Union[int, typing.List[int], NoneType] = 0 edit_cooldown_steps: typing.Union[int, typing.List[int], NoneType] = None edit_threshold: typing.Union[float, typing.List[float], NoneType] = 0.9 sem_guidance: typing.Optional[typing.List[torch.Tensor]] = None use_cross_attn_mask: bool = False use_intersect_mask: bool = False user_mask: typing.Optional[torch.Tensor] = None attn_store_steps: typing.Optional[typing.List[int]] = [] store_averaged_over_steps: bool = True clip_skip: typing.Optional[int] = None callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] **kwargs ) → LEditsPPDiffusionPipelineOutput or tuple

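Parameters

denoising_end (float, optional) — When specified, determines the fraction (between 0.0 and 1.0) of the total denoising process to be completed before it is intentionally terminated early. As a result, the returned sample still retains a substantial amount of noise, as determined by the discrete timesteps selected by the scheduler. The `denoising_end` parameter should be used when this pipeline is part of a "Mixture of Denoisers" multi-pipeline setup, as described in "Refining the Image".
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, one has to pass `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than `1`).
negative_prompt_2 (str or List[str], optional) — The prompt or prompts not to guide the image generation, to be sent to `tokenizer_2` and `text_encoder_2`. If not defined, `negative_prompt` is used in both text encoders.
negative_prompt_embeds (torch.Tensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds are generated from the `negative_prompt` input argument.
negative_pooled_prompt_embeds (torch.Tensor, optional) — Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, pooled negative_prompt_embeds are generated from the `negative_prompt` input argument.
ip_adapter_image (PipelineImageInput, optional) — Optional image input to work with IP Adapters.
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL: PIL.Image.Image or np.array.
return_dict (bool, optional, defaults to True) — Whether or not to return a LEditsPPDiffusionPipelineOutput instead of a plain tuple.
callback (Callable, optional) — A function that is called every `callback_steps` steps during inference. The function is called with the following arguments: `callback(step: int, timestep: int, latents: torch.Tensor)`.
callback_steps (int, optional, defaults to 1) — The frequency at which the `callback` function is called. If not specified, the callback is called at every step.
cross_attention_kwargs (dict, optional) — A kwargs dictionary that, if specified, is passed along to the `AttentionProcessor` as defined under `self.processor` in diffusers.models.attention_processor.
guidance_rescale (float, optional, defaults to 0.0) — Guidance rescale factor proposed by Common Diffusion Noise Schedules and Sample Steps are Flawed. `guidance_rescale` corresponds to `φ` in equation 16 of that paper. The guidance rescale factor should fix overexposure when using zero terminal SNR.
crops_coords_top_left (Tuple[int], optional, defaults to (0, 0)) — `crops_coords_top_left` can be used to generate an image that appears to be "cropped" from the position `crops_coords_top_left` downwards. Favorable, well-centered images are usually achieved by setting `crops_coords_top_left` to (0, 0). Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952.
target_size (Tuple[int], optional, defaults to None) — For most cases, `target_size` should be set to the desired height and width of the generated image. If not specified, it defaults to the size of the generated image. Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952.
editing_prompt (str or List[str], optional) — The prompt or prompts to guide the image generation. The image is reconstructed by setting `editing_prompt = None`. The guidance direction of each prompt should be specified via `reverse_editing_direction`.
editing_prompt_embeddings (torch.Tensor, optional) — Pre-generated edit text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, `editing_prompt_embeddings` are generated from the `editing_prompt` input argument.
editing_pooled_prompt_embeds (torch.Tensor, optional) — Pre-generated pooled edit text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, pooled edit embeddings are generated from the `editing_prompt` input argument.
reverse_editing_direction (bool or List[bool], optional, defaults to False) — Whether the corresponding prompt in `editing_prompt` should be increased or decreased.
edit_guidance_scale (float or List[float], optional, defaults to 5) — Guidance scale for guiding the image generation. If provided as a list, values should correspond to `editing_prompt`. `edit_guidance_scale` is defined as `s_e` of equation 12 of the LEDITS++ paper.
edit_warmup_steps (int or List[int], optional, defaults to 0) — Number of diffusion steps (for each prompt) for which guidance is not applied.
edit_cooldown_steps (int or List[int], optional, defaults to None) — Number of diffusion steps (for each prompt) after which guidance is no longer applied.
edit_threshold (float or List[float], optional, defaults to 0.9) — Masking threshold of guidance. The threshold should be proportional to the image region that is modified. `edit_threshold` is defined as `λ` of equation 12 of the LEDITS++ paper.
sem_guidance (List[torch.Tensor], optional) — List of pre-generated guidance vectors to be applied at generation. The length of the list has to correspond to `num_inference_steps`.
use_cross_attn_mask (bool, defaults to False) — Whether cross-attention masks are used. Cross-attention masks are always used when `use_intersect_mask` is set to True. Cross-attention masks are defined as `M^1` of equation 12 of the LEDITS++ paper.
use_intersect_mask (bool, defaults to False) — Whether the masking term is calculated as the intersection of cross-attention masks and masks derived from the noise estimate. Cross-attention masks are defined as `M^1` and masks derived from the noise estimate are defined as `M^2` of equation 12 of the LEDITS++ paper.
user_mask (torch.Tensor, optional) — User-provided mask for even better control over the editing process. This is helpful when the implicit masks of LEDITS++ do not meet user preferences.
attn_store_steps (List[int], optional) — Steps for which the attention maps are stored in the AttentionStore. Only for visualization purposes.
store_averaged_over_steps (bool, defaults to True) — Whether the attention maps for the `attn_store_steps` are stored averaged over the diffusion steps. If False, attention maps for each step are stored separately. Only for visualization purposes.
clip_skip (int, optional) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.
callback_on_step_end (Callable, optional) — A function that is called at the end of each denoising step during inference. The function is called with the following arguments: `callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict)`. `callback_kwargs` will include a list of all tensors as specified by `callback_on_step_end_tensor_inputs`.
callback_on_step_end_tensor_inputs (List, optional) — The list of tensor inputs for the `callback_on_step_end` function. The tensors specified in the list will be passed as the `callback_kwargs` argument. You will only be able to include variables listed in the `._callback_tensor_inputs` attribute of your pipeline class.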
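`crops_coords_top_left` and `target_size` are SDXL micro-conditioning signals. In SDXL, they are concatenated with the original image size into a six-value "time id" vector that is embedded alongside the timestep. The minimal sketch below shows how such a vector is assembled. It assumes the concatenation order `(original_size, crops_coords_top_left, target_size)` and uses explicit size tuples. The helper itself is hypothetical.

```python
# Hypothetical sketch of assembling SDXL micro-conditioning values.
from typing import List, Tuple

Size = Tuple[int, int]


def sdxl_time_ids(
    original_size: Size, crops_coords_top_left: Size = (0, 0), target_size: Size = (1024, 1024)
) -> List[int]:
    """Concatenate the micro-conditioning tuples into one flat list."""
    return list(original_size + crops_coords_top_left + target_size)


print(sdxl_time_ids((1024, 1024)))          # [1024, 1024, 0, 0, 1024, 1024]
print(sdxl_time_ids((768, 1024), (64, 0)))  # [768, 1024, 64, 0, 1024, 1024]
```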

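Returns

LEditsPPDiffusionPipelineOutput or tuple

A LEditsPPDiffusionPipelineOutput if `return_dict` is True, otherwise a `tuple`. When returning a tuple, the first element is a list with the generated images.

The call function to the pipeline for editing. The invert() method has to be called beforehand. Edits will always be performed on the last inverted image(s).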

Examples

>>> import torch

>>> from diffusers import LEditsPPPipelineStableDiffusionXL
>>> from diffusers.utils import load_image

>>> # Load SDXL in half precision; VAE tiling reduces memory when decoding 1024x1024 images.
>>> pipe = LEditsPPPipelineStableDiffusionXL.from_pretrained(
...     "stabilityai/stable-diffusion-xl-base-1.0", variant="fp16", torch_dtype=torch.float16
... )
>>> pipe.enable_vae_tiling()
>>> pipe = pipe.to("cuda")

>>> img_url = "https://www.aiml.informatik.tu-darmstadt.de/people/mbrack/tennis.jpg"
>>> image = load_image(img_url).resize((1024, 1024))

>>> # Invert once; every subsequent call edits this inverted image.
>>> _ = pipe.invert(image=image, num_inversion_steps=50, skip=0.2)

>>> # Two simultaneous edits: remove "tennis ball" (reversed direction) and add "tomato".
>>> # Each list entry applies to the editing prompt at the same position.
>>> edited_image = pipe(
...     editing_prompt=["tennis ball", "tomato"],
...     reverse_editing_direction=[True, False],
...     edit_guidance_scale=[5.0, 10.0],
...     edit_threshold=[0.9, 0.85],
... ).images[0]

invert

( image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]] source_prompt: str = '' source_guidance_scale = 3.5 negative_prompt: str = None negative_prompt_2: str = None num_inversion_steps: int = 50 skip: float = 0.15 generator: typing.Optional[torch._C.Generator] = None crops_coords_top_left: typing.Tuple[int, int] = (0, 0) num_zero_noise_steps: int = 3 cross_attention_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None height: typing.Optional[int] = None width: typing.Optional[int] = None resize_mode: typing.Optional[str] = 'default' crops_coords: typing.Optional[typing.Tuple[int, int, int, int]] = None ) → LEditsPPInversionPipelineOutput

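Parameters

image (PipelineImageInput) — Input for the image(s) that are to be edited. Multiple input images have to share the same aspect ratio.
source_prompt (str, defaults to "") — Prompt describing the input image that will be used for guidance during inversion. Guidance is disabled if `source_prompt` is "".
source_guidance_scale (float, defaults to 3.5) — Strength of guidance during inversion.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, one has to pass `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than `1`).
negative_prompt_2 (str or List[str], optional) — The prompt or prompts not to guide the image generation, to be sent to `tokenizer_2` and `text_encoder_2`. If not defined, `negative_prompt` is used in both text encoders.
num_inversion_steps (int, defaults to 50) — Total number of inversion steps performed after discarding the initial `skip` steps.
skip (float, defaults to 0.15) — Portion of initial steps that will be ignored for inversion and subsequent generation. Lower values lead to stronger changes to the input image. `skip` has to be between 0 and 1.
generator (torch.Generator, optional) — A torch.Generator to make inversion deterministic.
crops_coords_top_left (Tuple[int], optional, defaults to (0, 0)) — `crops_coords_top_left` can be used to generate an image that appears to be "cropped" from the position `crops_coords_top_left` downwards. Favorable, well-centered images are usually achieved by setting `crops_coords_top_left` to (0, 0). Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952.
num_zero_noise_steps (int, defaults to 3) — Number of final diffusion steps that will not renoise the current image. If no steps are set to zero, SD-XL in combination with DPMSolverMultistepScheduler will produce noise artifacts.
cross_attention_kwargs (dict, optional) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined under self.processor in diffusers.models.attention_processor.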
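The interaction between `skip` and `num_inversion_steps` can be illustrated with a hypothetical schedule. The schedule is stretched so that exactly `num_inversion_steps` timesteps remain after its noisiest `skip` fraction is dropped. The spacing rule, rounding, and 1000-step training range below are our own assumptions for illustration and are not necessarily what the pipeline computes internally. The point is that a larger `skip` starts inversion (and editing) from a less noisy timestep, which keeps the result closer to the input image.

```python
# Hypothetical sketch: spacing and rounding are illustrative assumptions,
# not the pipeline's internal scheduler arithmetic.
from typing import List


def inversion_timesteps(num_inversion_steps: int, skip: float, num_train_timesteps: int = 1000) -> List[int]:
    """Descending timesteps left after dropping the noisiest `skip` fraction of a stretched schedule."""
    if not 0.0 <= skip < 1.0:
        raise ValueError("skip must be in [0, 1)")
    if num_inversion_steps < 1:
        raise ValueError("num_inversion_steps must be >= 1")
    total = round(num_inversion_steps / (1.0 - skip))
    if total > num_train_timesteps:
        raise ValueError("schedule longer than the number of training timesteps")
    stride = num_train_timesteps // total
    full = [i * stride for i in reversed(range(total))]  # noisiest timestep first
    return full[total - num_inversion_steps:]


print(inversion_timesteps(50, 0.15)[:3])  # [784, 768, 752]
print(inversion_timesteps(50, 0.0)[:3])   # [980, 960, 940]
```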

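Returns

LEditsPPInversionPipelineOutput

The output contains the resized input image(s) and the respective VAE reconstruction(s).

The function of the pipeline for image inversion as described by the LEDITS++ paper. If the scheduler is set to DDIMScheduler, the inversion proposed by edit-friendly DDPM is performed instead.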

disable_vae_slicing

( )

Disable sliced VAE decoding. If enable_vae_slicing was previously enabled, this method goes back to computing decoding in one step.

disable_vae_tiling

( )

Disable tiled VAE decoding. If enable_vae_tiling was previously enabled, this method goes back to computing decoding in one step.

enable_vae_slicing

( )

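Enable sliced VAE decoding. When this option is enabled, the VAE splits the input tensor into slices and computes decoding in several steps. This is useful to save some memory and allow larger batch sizes.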

enable_vae_tiling

( )

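Enable tiled VAE decoding. When this option is enabled, the VAE splits the input tensor into tiles and computes encoding and decoding in several steps. This is useful for saving a large amount of memory and for processing larger images.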

encode_prompt

( device: typing.Optional[torch.device] = None num_images_per_prompt: int = 1 negative_prompt: typing.Optional[str] = None negative_prompt_2: typing.Optional[str] = None negative_prompt_embeds: typing.Optional[torch.Tensor] = None negative_pooled_prompt_embeds: typing.Optional[torch.Tensor] = None lora_scale: typing.Optional[float] = None clip_skip: typing.Optional[int] = None enable_edit_guidance: bool = True editing_prompt: typing.Optional[str] = None editing_prompt_embeds: typing.Optional[torch.Tensor] = None editing_pooled_prompt_embeds: typing.Optional[torch.Tensor] = None )

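Parameters

device (torch.device) — torch device.
num_images_per_prompt (int) — Number of images that should be generated per prompt.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead.
negative_prompt_2 (str or List[str], optional) — The prompt or prompts not to guide the image generation, to be sent to tokenizer_2 and text_encoder_2. If not defined, negative_prompt is used in both text encoders.
negative_prompt_embeds (torch.Tensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds are generated from the negative_prompt input argument.
negative_pooled_prompt_embeds (torch.Tensor, optional) — Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, pooled negative_prompt_embeds are generated from the negative_prompt input argument.
lora_scale (float, optional) — A LoRA scale that will be applied to all LoRA layers of the text encoder if LoRA layers are loaded.
clip_skip (int, optional) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.
enable_edit_guidance (bool) — Whether to guide towards an editing prompt or not.
editing_prompt (str or List[str], optional) — Editing prompt(s) to be encoded. If not defined and enable_edit_guidance is True, one has to pass editing_prompt_embeds instead.
editing_prompt_embeds (torch.Tensor, optional) — Pre-generated edit text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided and enable_edit_guidance is True, editing_prompt_embeds are generated from the editing_prompt input argument.
editing_pooled_prompt_embeds (torch.Tensor, optional) — Pre-generated pooled edit text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, editing_pooled_prompt_embeds are generated from the editing_prompt input argument.

Encodes the prompt into text encoder hidden states.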

get_guidance_scale_embedding

( w: Tensor embedding_dim: int = 512 dtype: dtype = torch.float32 ) → torch.Tensor

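Parameters

w (torch.Tensor) — Generate embedding vectors with a specified guidance scale to subsequently enrich timestep embeddings.
embedding_dim (int, optional, defaults to 512) — Dimension of the embeddings to generate.
dtype (torch.dtype, optional, defaults to torch.float32) — Data type of the generated embeddings.

Returns

torch.Tensor

Embedding vectors with shape (len(w), embedding_dim).

See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298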
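The referenced VDM code builds a sinusoidal embedding of a scalar, in the same way timestep embeddings are built. The pure-Python sketch below computes that kind of embedding for a list of guidance scales. Several details are assumptions based on common implementations of this function: scaling `w` by 1000, the 10000 frequency base, and zero padding for odd dimensions. Compare against the library source before relying on exact values.

```python
# Hypothetical sketch of a sinusoidal guidance-scale embedding; constants are assumptions.
import math
from typing import List


def guidance_scale_embedding(w: List[float], embedding_dim: int = 512) -> List[List[float]]:
    """Sinusoidal embedding of each guidance scale in `w`; output shape (len(w), embedding_dim)."""
    half_dim = embedding_dim // 2
    if half_dim < 2:
        raise ValueError("embedding_dim must be at least 4")
    # Geometric frequency ladder from 1 down to 1/10000.
    log_step = math.log(10000.0) / (half_dim - 1)
    freqs = [math.exp(-i * log_step) for i in range(half_dim)]
    rows = []
    for scale in w:
        args = [scale * 1000.0 * f for f in freqs]
        row = [math.sin(a) for a in args] + [math.cos(a) for a in args]
        if embedding_dim % 2 == 1:
            row.append(0.0)  # pad odd dimensions with a trailing zero
        rows.append(row)
    return rows


emb = guidance_scale_embedding([0.0, 7.5], embedding_dim=8)
print(len(emb), len(emb[0]))  # 2 8
print(emb[0])                 # [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
```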

LEditsPPDiffusionPipelineOutput

class diffusers.pipelines.LEditsPPDiffusionPipelineOutput

( images: typing.Union[typing.List[PIL.Image.Image], numpy.ndarray] nsfw_content_detected: typing.Optional[typing.List[bool]] )

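Parameters

images (List[PIL.Image.Image] or np.ndarray) — List of denoised PIL images of length batch_size or NumPy array of shape (batch_size, height, width, num_channels).
nsfw_content_detected (List[bool]) — List indicating whether the corresponding generated image contains "not-safe-for-work" (nsfw) content, or None if safety checking could not be performed.

Output class for LEdits++ Diffusion pipelines.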

LEditsPPInversionPipelineOutput

class diffusers.pipelines.LEditsPPInversionPipelineOutput

( images: typing.Union[typing.List[PIL.Image.Image], numpy.ndarray] vae_reconstruction_images: typing.Union[typing.List[PIL.Image.Image], numpy.ndarray] )

Parameters

images (List[PIL.Image.Image] or np.ndarray) — List of the cropped and resized input images as PIL images of length batch_size or NumPy array of shape (batch_size, height, width, num_channels).
vae_reconstruction_images (List[PIL.Image.Image] or np.ndarray) — List of VAE reconstructions of all input images as PIL images of length batch_size or NumPy array of shape (batch_size, height, width, num_channels).

Output class for LEdits++ inversion.
