Inpainting

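Inpainting replaces or edits specific areas of an image. This makes it a useful tool for image restoration, such as removing defects and artifacts, or even replacing an image area with something entirely new. Inpainting relies on a mask to determine which regions of an image to fill in: the area to inpaint is represented by white pixels, and the area to keep is represented by black pixels. The white pixels are filled in according to the prompt.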

With 🤗 Diffusers, here is how you can do inpainting:

Load an inpainting checkpoint with the AutoPipelineForInpainting class. It automatically detects the appropriate pipeline class to load based on the checkpoint:

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

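Throughout this guide, we use enable_model_cpu_offload() and enable_xformers_memory_efficient_attention() to save memory and increase inference speed. If you're using PyTorch 2.0 or later, you don't need to call enable_xformers_memory_efficient_attention() on your pipeline, because it already uses PyTorch 2.0's native scaled dot-product attention.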

Load the base and mask images:

init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

Create a prompt to inpaint the image with, and pass it to the pipeline together with the base and mask images:

prompt = "a black cat with glowing eyes, cute, adorable, disney, pixar, highly detailed, 8k"
negative_prompt = "bad anatomy, deformed, ugly, disfigured"
image = pipeline(prompt=prompt, negative_prompt=negative_prompt, image=init_image, mask_image=mask_image).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)

[Figure: base image | mask image | generated image]

Create a mask image

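For convenience, every code example in this guide comes with a ready-made mask image. You can inpaint your own images, but you'll need to create a mask image for each of them. Use the Space below to create a mask image easily.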

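Upload the base image you want to inpaint and use the sketch tool to draw a mask. When you're done, click **Run** to generate and download the mask image.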
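If you would rather build a mask in code than draw it by hand, here is a minimal sketch using Pillow. It assumes a 512x512 base image and an arbitrary rectangle as the region to inpaint. Both are illustrative choices, not values from this guide. Following the convention above, white (255) marks pixels to repaint and black (0) marks pixels to keep.

```python
from PIL import Image, ImageDraw

# Assumed size of the base image; the mask should match the base image's size.
WIDTH, HEIGHT = 512, 512

# Start with an all-black single-channel ("L") image: black = keep.
mask = Image.new("L", (WIDTH, HEIGHT), 0)

# Paint the region to inpaint white. The box coordinates are purely illustrative.
draw = ImageDraw.Draw(mask)
draw.rectangle((128, 192, 383, 447), fill=255)

mask.save("my_mask.png")
```

You can then pass `mask` (or the saved file loaded with `load_image`) as `mask_image`, together with a base image of the same size.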

Mask blur

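The `~VaeImageProcessor.blur` method controls how the original image and the inpainted area are blended. The amount of blur is set by the `blur_factor` parameter. Increasing `blur_factor` increases the blur applied to the mask edges, which softens the transition between the original image and the inpainted area. A low or zero `blur_factor` keeps the mask edges sharp.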

To use this, create a blurred mask with the image processor:

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
from PIL import Image

pipeline = AutoPipelineForInpainting.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16).to('cuda')

mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore_mask.png")
blurred_mask = pipeline.mask_processor.blur(mask, blur_factor=33)
blurred_mask

[Figure: mask with no blur | mask with blur applied]
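The sketch below shows what blurring does to the mask values. It applies Pillow's Gaussian blur directly to a hard-edged toy mask. We assume `VaeImageProcessor.blur` behaves similarly, as a Gaussian blur whose strength grows with `blur_factor`. Treat this as an illustration, not the library's implementation. Pixels near the edge end up with intermediate gray values, and those values are what soften the transition.

```python
import numpy as np
from PIL import Image, ImageFilter

# A hard-edged toy mask: left half black (keep), right half white (inpaint).
hard = np.zeros((64, 64), dtype=np.uint8)
hard[:, 32:] = 255
mask = Image.fromarray(hard)  # 2D uint8 array -> single-channel "L" image

# Blur the mask; a larger radius gives a wider, softer transition band.
soft = np.array(mask.filter(ImageFilter.GaussianBlur(radius=4)))

# Count pixels in the first row that are neither pure black nor pure white.
gray_hard = int(np.sum((hard[0] > 0) & (hard[0] < 255)))
gray_soft = int(np.sum((soft[0] > 0) & (soft[0] < 255)))
print(gray_hard, gray_soft)
```

The hard mask has no gray pixels. The blurred mask has a band of them around the edge, and the pipeline treats those pixels as a partial blend of original and inpainted content.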

Non-inpaint specific checkpoints

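So far, this guide has used inpaint-specific checkpoints such as stable-diffusion-v1-5/stable-diffusion-inpainting. You can also use regular checkpoints such as stable-diffusion-v1-5/stable-diffusion-v1-5. Let's compare the results of these two checkpoints.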

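The image on the left was generated with a regular checkpoint, and the image on the right with an inpainting checkpoint. The left image is visibly less clean, and you can still see the outline of the area the model was supposed to inpaint. The right image is much cleaner, and the inpainted area looks more natural.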

[Figure: stable-diffusion-v1-5/stable-diffusion-v1-5 (left) | runwayml/stable-diffusion-inpainting (right)]

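For simpler tasks, such as erasing an object from an image (for example, the rocks on the road), a regular checkpoint produces fairly good results. The difference between the regular and inpainting checkpoints is much less noticeable.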

[Figure: stable-diffusion-v1-5/stable-diffusion-v1-5 (left) | runwayml/stable-diffusion-inpainting (right)]

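The trade-off of using a non-inpaint specific checkpoint is that overall image quality may be lower, but it tends to stick closely to the mask (which is why you can see the mask outline). Inpaint-specific checkpoints are trained specifically to produce higher quality inpainting results, including a more natural transition between the masked and unmasked areas. As a result, these checkpoints are more likely to change your unmasked area.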

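If preserving the unmasked area is important for your task, you can use the `VaeImageProcessor.apply_overlay` method to force the unmasked area of the image to stay unchanged. The cost is a potentially less natural transition between the masked and unmasked areas.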

import PIL
import numpy as np
import torch

from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

device = "cuda"
pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipeline = pipeline.to(device)

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = load_image(img_url).resize((512, 512))
mask_image = load_image(mask_url).resize((512, 512))

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
repainted_image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
repainted_image.save("repainted_image.png")

unmasked_unchanged_image = pipeline.image_processor.apply_overlay(mask_image, init_image, repainted_image)
unmasked_unchanged_image.save("force_unmasked_unchanged.png")
make_image_grid([init_image, mask_image, repainted_image, unmasked_unchanged_image], rows=2, cols=2)
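Conceptually, forcing the unmasked area to stay unchanged is a per-pixel composite. Repainted pixels are used where the mask is white, and original pixels where it is black. The numpy sketch below shows that idea with a hypothetical helper, `composite_keep_unmasked`. It reflects our assumption about the intent of `apply_overlay`, not its actual implementation, which also handles details such as resizing and cropping. The sketch also shows why seams can appear: the two sources meet abruptly at a hard mask edge.

```python
import numpy as np

def composite_keep_unmasked(original, repainted, mask):
    """Hypothetical helper: keep `original` where mask is 0, use `repainted` where mask is 255.

    original, repainted: uint8 arrays of shape (H, W, 3)
    mask: uint8 array of shape (H, W); white (255) = inpainted area
    """
    m = (mask.astype(np.float32) / 255.0)[..., None]  # (H, W, 1) weights in [0, 1]
    out = m * repainted.astype(np.float32) + (1.0 - m) * original.astype(np.float32)
    return np.round(out).astype(np.uint8)

# Tiny example: a gray 4x4 "original" and a red 4x4 "repainted" image.
original = np.full((4, 4, 3), 128, dtype=np.uint8)
repainted = np.zeros((4, 4, 3), dtype=np.uint8)
repainted[..., 0] = 255
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 255  # inpaint only the 2x2 center

result = composite_keep_unmasked(original, repainted, mask)
```

With a blurred mask (gray values between 0 and 255), the same formula produces a soft blend instead of a hard seam.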

Configure pipeline parameters

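Image features such as quality and "creativity" depend on the pipeline parameters. Knowing what these parameters do is important for getting the results you want. Let's look at the most important parameters and how changing them affects the output.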

Strength

`strength` is a measure of how much noise is added to the base image, and it determines how similar the output is to the base image.

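📈 A high `strength` value means more noise is added to the image and the denoising process takes longer, but you'll get higher quality images that differ more from the base image.
📉 A low `strength` value means less noise is added to the image and the denoising process is faster, but the image quality may not be as good and the generated image resembles the base image more closely.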
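The link between `strength` and speed comes from the denoising schedule. As far as we understand the Stable Diffusion image-to-image and inpainting pipelines, `strength` trims the schedule so that only about `num_inference_steps * strength` steps actually run. The sketch below reproduces that arithmetic with a hypothetical helper, `effective_steps`. Details may differ between pipelines, schedulers and versions.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Hypothetical helper: number of denoising steps run for a given strength.

    Noise is added up to the timestep that corresponds to `strength`, and
    denoising starts from there, so a low strength skips the early steps.
    """
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start

for s in (0.6, 0.8, 1.0):
    print(f"strength={s}: {effective_steps(50, s)} of 50 steps")
```

This explains why a low `strength` is faster: fewer steps run, and they start from an image that is still close to the base image.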

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, strength=0.6).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)

[Figure: strength = 0.6 | strength = 0.8 | strength = 1.0]

Guidance scale

`guidance_scale` affects how closely the generated image follows the text prompt.

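📈 A high `guidance_scale` value means the prompt and the generated image are closely aligned, so the output is a stricter interpretation of the prompt.
📉 A low `guidance_scale` value means the prompt and the generated image are more loosely aligned, so the output may deviate more from the prompt.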

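You can use `strength` and `guidance_scale` together for more control over how expressive the model is. For example, combining high `strength` and high `guidance_scale` values gives the model the most creative freedom.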
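Under the hood, `guidance_scale` is the weight in classifier-free guidance. At each step, the model predicts noise once with the prompt and once without it, or with the negative prompt when one is given. The two predictions are then combined with the formula below. This numpy sketch uses made-up vectors in place of real noise predictions and only illustrates the formula.

```python
import numpy as np

def classifier_free_guidance(noise_uncond, noise_text, guidance_scale):
    """Combine unconditional (or negative-prompt) and text-conditioned noise predictions."""
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)

# Made-up stand-ins for the two noise predictions at one denoising step.
noise_uncond = np.array([0.2, 0.1])
noise_text = np.array([1.0, -0.5])

for scale in (2.5, 7.5, 12.5):
    guided = classifier_free_guidance(noise_uncond, noise_text, scale)
    print(scale, guided)
```

A scale of 1.0 reduces to the text-conditioned prediction alone. Larger scales push further along the direction from the unconditional prediction toward the prompt, which is why the output follows the prompt more strictly.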

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, guidance_scale=2.5).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)

[Figure: guidance_scale = 2.5 | guidance_scale = 7.5 | guidance_scale = 12.5]

Negative prompt

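A negative prompt plays the opposite role of a prompt: it steers the model away from generating certain things in an image. This is useful for quickly improving image quality and keeping the model from generating things you don't want.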

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
negative_prompt = "bad architecture, unstable, poor details, blurry"
image = pipeline(prompt=prompt, negative_prompt=negative_prompt, image=init_image, mask_image=mask_image).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)

Padding mask crop

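One way to improve inpainting quality is the `padding_mask_crop` parameter. When enabled, it crops the masked area with some user-specified padding, and crops the same area from the original image. Both the cropped image and mask are upscaled to a higher resolution for inpainting, and the result is then overlaid on the original image. This is a quick and easy way to improve image quality without a separate pipeline such as StableDiffusionUpscalePipeline.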
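Here is a simplified sketch of how a padded crop region can be computed from a mask. It finds the bounding box of the white pixels, grows it by the padding on every side, and clamps it to the image. The helper `padded_mask_bbox` is hypothetical. As we understand it, the pipeline's own crop logic also adjusts the box to match the aspect ratio of the processing resolution, which this sketch skips.

```python
import numpy as np

def padded_mask_bbox(mask, pad):
    """Hypothetical helper: (x1, y1, x2, y2) box around white mask pixels, padded and clamped.

    mask: 2D uint8 array, nonzero = area to inpaint. x2 and y2 are exclusive.
    """
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        raise ValueError("mask has no white pixels")
    h, w = mask.shape
    x1 = max(int(xs.min()) - pad, 0)
    y1 = max(int(ys.min()) - pad, 0)
    x2 = min(int(xs.max()) + 1 + pad, w)
    y2 = min(int(ys.max()) + 1 + pad, h)
    return x1, y1, x2, y2

mask = np.zeros((512, 512), dtype=np.uint8)
mask[200:260, 10:90] = 255  # a small object near the left edge

box = padded_mask_bbox(mask, pad=32)
print(box)
```

Only this region (122x124 pixels in the example) is upscaled and inpainted, so a small masked object gets far more of the model's resolution than it would in the full 512x512 image.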

Add the `padding_mask_crop` parameter to the pipeline call and set it to the desired padding value:

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
from PIL import Image

generator = torch.Generator(device='cuda').manual_seed(0)
pipeline = AutoPipelineForInpainting.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16).to('cuda')

base = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore.png")
mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore_mask.png")

image = pipeline("boat", image=base, mask_image=mask, strength=0.75, generator=generator, padding_mask_crop=32).images[0]
image

[Figure: default inpaint image | inpaint image with padding_mask_crop enabled]

Chained inpainting pipelines

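AutoPipelineForInpainting can be chained with other 🤗 Diffusers pipelines to edit their outputs. This often helps improve the output quality of your other diffusion pipelines. If you're using multiple pipelines, chaining them so the outputs stay in latent space and the same pipeline components are reused can also be more memory-efficient.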

Text-to-image-to-inpaint

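Chaining a text-to-image pipeline and an inpainting pipeline lets you inpaint the generated image without providing a base image first. This makes it convenient to edit your favorite text-to-image outputs without generating an entirely new image.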

Start with the text-to-image pipeline to create a castle:

import torch
from diffusers import AutoPipelineForText2Image, AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

text2image = pipeline("concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k").images[0]

Load the mask image for the output from above:

mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_text-chain-mask.png")

Then inpaint the masked area with a waterfall:

pipeline = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

prompt = "digital painting of a fantasy waterfall, cloudy"
image = pipeline(prompt=prompt, image=text2image, mask_image=mask_image).images[0]
make_image_grid([text2image, mask_image, image], rows=1, cols=3)

[Figure: text-to-image | inpaint]

Inpaint-to-image-to-image

You can also chain an inpainting pipeline before another pipeline, such as image-to-image or an upscaler, to improve quality.

Start by inpainting an image:

import torch
from diffusers import AutoPipelineForInpainting, AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image_inpainting = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]

# resize image to 1024x1024 for SDXL
image_inpainting = image_inpainting.resize((1024, 1024))

Now pass the image to another inpainting pipeline that uses the SDXL refiner model to enhance image details and quality:

pipeline = AutoPipelineForInpainting.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

image = pipeline(prompt=prompt, image=image_inpainting, mask_image=mask_image, output_type="latent").images[0]

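It is important to specify `output_type="latent"` in the pipeline so that all outputs stay in latent space, which avoids an unnecessary decode-encode step. This only works if the chained pipelines use the same VAE. For example, in the Text-to-image-to-inpaint section, Kandinsky 2.2 uses a different VAE class than the Stable Diffusion model, so it won't work there. But if you use Stable Diffusion v1.5 for both pipelines, you can keep everything in latent space because both use AutoencoderKL.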
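Comparing tensor sizes shows why staying in latent space helps. The sketch assumes the AutoencoderKL settings used by Stable Diffusion v1.5 and SDXL: a spatial downsampling factor of 8 and 4 latent channels. The helper `latent_shape` is hypothetical.

```python
def latent_shape(height: int, width: int, vae_scale_factor: int = 8, latent_channels: int = 4):
    """Hypothetical helper: shape (C, H, W) of the latent for an image of the given size."""
    return (latent_channels, height // vae_scale_factor, width // vae_scale_factor)

pixels = 3 * 1024 * 1024            # RGB image at SDXL's 1024x1024 resolution
c, h, w = latent_shape(1024, 1024)
latents = c * h * w

print((c, h, w), pixels // latents)
```

Each hand-off in pixel space would cost one VAE decode (latent to pixels) plus one encode (pixels to latent). Both are extra network passes, and the round trip is not perfectly lossless.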

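Finally, you can pass this image to an image-to-image pipeline to put the finishing touches on it. It is more efficient to use the from_pipe() method, because it reuses the existing pipeline components and avoids loading all of them into memory again.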

pipeline = AutoPipelineForImage2Image.from_pipe(pipeline)
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

image = pipeline(prompt=prompt, image=image).images[0]
make_image_grid([init_image, mask_image, image_inpainting, image], rows=2, cols=2)

[Figure: initial image | inpaint | image-to-image]

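Image-to-image and inpainting are actually very similar tasks. Image-to-image generates a new image that resembles the provided image. Inpainting does the same, but it only transforms the image area defined by the mask and leaves the rest of the image unchanged. You can think of inpainting as a more precise tool for making specific changes, while image-to-image has a broader scope for making more sweeping changes.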
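One way to picture the difference: an image-to-image update touches every element, while an inpainting update keeps the unmasked region pinned to the original. The numpy sketch below shows this masking idea on toy arrays. It is a simplification under our own assumptions. Real pipelines operate on latents and re-noise the original to the current timestep before blending, and inpaint-specific checkpoints also receive the mask as an extra model input.

```python
import numpy as np

original = np.zeros((3, 3))   # stand-in for the original image (or latent)
proposal = np.ones((3, 3))    # stand-in for what the model wants to produce
mask = np.zeros((3, 3))
mask[1, 1] = 1.0              # only the center element is masked (to be repainted)

# Image-to-image: the update is applied everywhere.
img2img_result = proposal.copy()

# Inpainting: take the update inside the mask, keep the original outside it.
inpaint_result = mask * proposal + (1.0 - mask) * original

changed_img2img = int(np.sum(img2img_result != original))
changed_inpaint = int(np.sum(inpaint_result != original))
print(changed_img2img, changed_inpaint)
```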

Control image generation

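Getting an image to look exactly the way you want is challenging because the denoising process is random. While you can control certain aspects of generation with parameters like `negative_prompt`, there are better and more efficient ways to control image generation.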

Prompt weighting

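Prompt weighting provides a quantifiable way to scale how strongly each concept in a prompt is represented. You can use it to increase or decrease the magnitude of the text embedding vector for each concept in the prompt, which in turn determines how much of each concept is generated. The Compel library offers an intuitive syntax for scaling prompt weights and generating the embeddings. Learn how to create the embeddings in the Prompt weighting guide.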
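As a toy illustration of scaling the magnitude of a concept's embedding, the sketch below multiplies the embedding vectors of selected tokens by a weight. The tokens, the 8-dimensional embeddings and the helper `weight_tokens` are all made up. This is not Compel's exact algorithm, only the basic idea behind it.

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = ["a", "red", "cat", "on", "a", "ball"]
embeddings = rng.normal(size=(len(tokens), 8))  # made-up 8-dim token embeddings

def weight_tokens(embeddings, tokens, weights):
    """Hypothetical helper: scale the embedding of each token listed in `weights`."""
    out = embeddings.copy()
    for i, tok in enumerate(tokens):
        out[i] *= weights.get(tok, 1.0)
    return out

# Emphasize "ball" and de-emphasize "red".
weighted = weight_tokens(embeddings, tokens, {"ball": 1.5, "red": 0.5})
norms_before = np.linalg.norm(embeddings, axis=1)
norms_after = np.linalg.norm(weighted, axis=1)
print(np.round(norms_after / norms_before, 2))
```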

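Once you've generated the embeddings, pass them to the `prompt_embeds` parameter of AutoPipelineForInpainting (and to `negative_prompt_embeds` if you're using a negative prompt). The embeddings replace the `prompt` parameter: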

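import torch
from compel import Compel  # assumes the compel library is installed (pip install compel)
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipeline = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16,
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

# build weighted embeddings with Compel; "++" upweights the preceding word (see the Prompt weighting guide)
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)
prompt_embeds = compel("concept art digital painting of an elven castle++, highly detailed, 8k")
negative_prompt_embeds = compel("bad architecture, blurry")

image = pipeline(prompt_embeds=prompt_embeds, # generated from Compel
    negative_prompt_embeds=negative_prompt_embeds, # generated from Compel
    image=init_image,
    mask_image=mask_image
).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)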

ControlNet

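ControlNet models are used together with other diffusion models such as Stable Diffusion, and they offer an even more flexible and accurate way to control how an image is generated. A ControlNet accepts an additional conditioning image as input, which guides the diffusion model to preserve the features in it.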

For example, let's condition an image with a ControlNet pretrained on inpaint images:

import torch
import numpy as np
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline
from diffusers.utils import load_image, make_image_grid

# load ControlNet
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16, variant="fp16")

# pass ControlNet to the pipeline
pipeline = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", controlnet=controlnet, torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

# load base and mask image
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

# prepare control image
def make_inpaint_condition(init_image, mask_image):
    init_image = np.array(init_image.convert("RGB")).astype(np.float32) / 255.0
    mask_image = np.array(mask_image.convert("L")).astype(np.float32) / 255.0

    assert init_image.shape[0:2] == mask_image.shape[0:2], "image and image_mask must have the same image size"  # compare (height, width)
    init_image[mask_image > 0.5] = -1.0  # set as masked pixel
    init_image = np.expand_dims(init_image, 0).transpose(0, 3, 1, 2)
    init_image = torch.from_numpy(init_image)
    return init_image

control_image = make_inpaint_condition(init_image, mask_image)
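To see what `make_inpaint_condition` produces, the numpy-only sketch below applies the same steps to a made-up 2x2 image and skips the final conversion to a torch tensor. Masked pixels become -1 in every channel, and the result is laid out as (batch, channels, height, width).

```python
import numpy as np

# Toy 2x2 RGB image (values already scaled to [0, 1]) and a mask with one white pixel.
image = np.full((2, 2, 3), 0.5, dtype=np.float32)
mask = np.array([[0.0, 1.0],
                 [0.0, 0.0]], dtype=np.float32)

cond = image.copy()
cond[mask > 0.5] = -1.0                                # mark masked pixels with -1 in all channels
cond = np.expand_dims(cond, 0).transpose(0, 3, 1, 2)   # (H, W, C) -> (1, C, H, W)

print(cond.shape)
print(cond[0, :, 0, 1])
```

The -1 value lies outside the [0, 1] range of real pixels, so the ControlNet can tell which pixels are missing.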

Now generate an image from the base, mask and control images. Features of the base image are strongly preserved in the generated image.

from PIL import Image
prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, control_image=control_image).images[0]
control_vis = Image.fromarray((control_image[0].permute(1, 2, 0).clamp(0, 1).numpy() * 255).astype(np.uint8))  # masked pixels (-1) clamp to black
make_image_grid([init_image, mask_image, control_vis, image], rows=2, cols=2)

You can take this a step further and chain it with an image-to-image pipeline to apply a new style:

from diffusers import AutoPipelineForImage2Image

pipeline = AutoPipelineForImage2Image.from_pretrained(
    "nitrosocke/elden-ring-diffusion", torch_dtype=torch.float16,
)
pipeline.enable_model_cpu_offload()
# remove following line if xFormers is not installed or you have PyTorch 2.0 or higher installed
pipeline.enable_xformers_memory_efficient_attention()

prompt = "elden ring style castle" # include the token "elden ring style" in the prompt
negative_prompt = "bad architecture, deformed, disfigured, poor details"

image_elden_ring = pipeline(prompt, negative_prompt=negative_prompt, image=image).images[0]
make_image_grid([init_image, mask_image, image, image_elden_ring], rows=2, cols=2)

[Figure: initial image | ControlNet inpaint | image-to-image]

Optimize

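Running diffusion models can be difficult and slow if your resources are limited, but a few optimization tricks go a long way. One of the biggest (and easiest) optimizations is switching to memory-efficient attention. If you're using PyTorch 2.0, scaled dot-product attention is enabled automatically and you don't need to do anything else. If you're not on PyTorch 2.0, you can install and use the xFormers implementation of memory-efficient attention. Both options reduce memory usage and speed up inference.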

You can also offload the model to the CPU to save even more memory:

pipeline.enable_xformers_memory_efficient_attention()
pipeline.enable_model_cpu_offload()

To speed up your inference code even further, use `torch.compile`. Wrap `torch.compile` around the most compute-intensive component in the pipeline, which is usually the UNet:

pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)

Learn more in the Reduce memory usage and Speed up inference guides.
