Outpainting III - Inpaint Model

Community Article Published April 23, 2024

Alvaro Somoza

OzzyGT

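This is the third guide about outpainting. If you want to read about the other methods, you can find them here.

In this guide we will explore how to outpaint without changing the original subject. We can do this with an inpainting model: even though it was trained for a different task, it still works as long as we help the model understand what we want to generate in the new areas of the image.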

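1.- Original image with a transparent background

First we need a good image, and for this I'm going to use this one from Wikimedia.

The car has a lot of text and recognizable logos, so it is easy to tell whether the image has been distorted.

Let's start by removing the background. For this I'll use `RMBG v1.4`. You can find the model, along with instructions on how to use it, here: https://huggingface.co/briaai/RMBG-1.4, or you can do it directly with the Hugging Face Space: https://huggingface.co/spaces/briaai/BRIA-RMBG-1.4.

Our goal is to keep only the subject, on a transparent (alpha) background.

If you want the best results with this method, it is better to remove the background manually with a professional tool such as Photoshop. As you can see in this example, the cutout of the car isn't perfect, but it is good enough for this guide.

Now that we have the subject, I always prefer to work with square images because SDXL performs better at 1024x1024, but technically this works with any image size as long as you have enough VRAM.

With Pillow this is as simple as scaling the image and pasting it onto a square canvas. We also need the background to be white.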
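Before moving on, it can be worth checking that the cutout really carries the transparency we expect. The following is a minimal sketch using only Pillow; the helper name `describe_alpha` and the synthetic test image are my own stand-ins, not part of the original workflow.

```python
from PIL import Image


def describe_alpha(image: Image.Image) -> dict:
    """Summarise the alpha channel of a background-removed image."""
    if image.mode != "RGBA":
        raise ValueError(f"expected an RGBA image, got {image.mode}")
    alpha = image.getchannel("A")
    lowest, highest = alpha.getextrema()
    return {
        # at least one pixel is not fully opaque
        "has_transparency": lowest < 255,
        # at least one pixel is not fully transparent
        "has_subject": highest > 0,
        # bounding box of all pixels with non-zero alpha (None if empty)
        "subject_box": alpha.getbbox(),
    }


# Synthetic stand-in for a cutout: an opaque red rectangle (the "subject")
# on a fully transparent canvas.
cutout = Image.new("RGBA", (200, 100), (0, 0, 0, 0))
cutout.paste((255, 0, 0, 255), (50, 20, 150, 80))
info = describe_alpha(cutout)
```

With a real cutout you would pass the image loaded with `Image.open(...).convert("RGBA")` instead of the synthetic one. If `has_transparency` comes back `False`, the background was not actually removed.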

from PIL import Image


def scale_and_paste(original_image):
    aspect_ratio = original_image.width / original_image.height

    if original_image.width > original_image.height:
        new_width = 1024
        new_height = round(new_width / aspect_ratio)
    else:
        new_height = 1024
        new_width = round(new_height * aspect_ratio)

    resized_original = original_image.resize((new_width, new_height), Image.LANCZOS)
    white_background = Image.new("RGBA", (1024, 1024), "white")
    x = (1024 - new_width) // 2
    y = (1024 - new_height) // 2
    white_background.paste(resized_original, (x, y), resized_original)

    return resized_original, white_background
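To make the arithmetic concrete, here is the same sizing logic isolated from the image handling. This is only a sketch: `fit_in_square` is a hypothetical helper name, and the two input sizes are made-up examples rather than the dimensions of the Wikimedia photo.

```python
def fit_in_square(width: int, height: int, size: int = 1024):
    """Return the scaled (width, height) and the top-left paste offset."""
    aspect_ratio = width / height

    if width > height:
        # landscape: the width fills the canvas
        new_width = size
        new_height = round(new_width / aspect_ratio)
    else:
        # portrait or square: the height fills the canvas
        new_height = size
        new_width = round(new_height * aspect_ratio)

    # centre the scaled image on the square canvas
    offset = ((size - new_width) // 2, (size - new_height) // 2)
    return (new_width, new_height), offset


landscape = fit_in_square(1600, 900)
portrait = fit_in_square(600, 1200)
```

A 1600x900 image ends up as 1024x576 pasted at (0, 224), so the white bands sit above and below the subject; a 600x1200 image ends up as 512x1024 pasted at (256, 0), with the white bands on the sides.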

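2.- Generating a temporary background

Next, we need to fill the white area with something similar to what we want in the final image. In this case, for example, I want the car to be driving on a highway.

We will use the `inpaint controlnet` to generate the temporary background, since it gives the best results. If you want to know how it works, I covered it in the first guide.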

controlnet = ControlNetModel.from_pretrained(
    "destitech/controlnet-inpaint-dreamer-sdxl", torch_dtype=torch.float16, variant="fp16"
)

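This model likes to add details, so it will often add a spoiler or make the roof or the bumpers bigger.

To mitigate this, we will also use a `zoe depth controlnet` and make the car a little smaller than the original, so that we can paste the original car back over the image without any problems.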

from controlnet_aux import ZoeDetector

def scale_and_paste(original_image):
    ...
    # make the subject a little smaller
    new_width = new_width - 20
    new_height = new_height - 20
    ...

# load preprocessor and generate depth map
zoe = ZoeDetector.from_pretrained("lllyasviel/Annotators")
image_zoe = zoe(white_bg_image, detect_resolution=512, image_resolution=1024)

controlnets = [
    ControlNetModel.from_pretrained(
        "destitech/controlnet-inpaint-dreamer-sdxl", torch_dtype=torch.float16, variant="fp16"
    ),
    ControlNetModel.from_pretrained("diffusers/controlnet-zoe-depth-sdxl-1.0", torch_dtype=torch.float16),
]

def generate_image(prompt, negative_prompt, inpaint_image, zoe_image, seed: int = None):
    if seed is None:
        seed = random.randint(0, 2**32 - 1)

    generator = torch.Generator(device="cpu").manual_seed(seed)

    image = pipeline(
        prompt,
        negative_prompt=negative_prompt,
        image=[inpaint_image, zoe_image],
        guidance_scale=6.5,
        num_inference_steps=25,
        generator=generator,
        controlnet_conditioning_scale=[0.5, 0.8],
        control_guidance_end=[0.9, 0.6],
    ).images[0]

    return image
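The two lists passed to the pipeline are matched to the ControlNets in order: the inpaint ControlNet gets a weight of 0.5 and is released at 90% of the steps, while the depth ControlNet gets 0.8 and is released at 60%. The sketch below shows roughly which steps each one stays active for. It assumes the pipeline gates a ControlNet by the fraction of steps completed, which is my reading of how `control_guidance_start` and `control_guidance_end` behave; the exact boundary step may differ between diffusers versions, and `active_steps` is a hypothetical helper, not a diffusers function.

```python
def active_steps(num_inference_steps, guidance_start, guidance_end):
    """Indices of the denoising steps during which a ControlNet is applied.

    Assumption: a step i is active when it starts at or after
    guidance_start and finishes at or before guidance_end, both
    expressed as fractions of the total number of steps.
    """
    return [
        i
        for i in range(num_inference_steps)
        if i / num_inference_steps >= guidance_start
        and (i + 1) / num_inference_steps <= guidance_end
    ]


# values used in generate_image: 25 steps, control_guidance_end=[0.9, 0.6]
inpaint_steps = active_steps(25, 0.0, 0.9)
depth_steps = active_steps(25, 0.0, 0.6)
```

Under that assumption the inpaint ControlNet guides the first 22 of 25 steps and the depth ControlNet only the first 15, so the depth map fixes the shape of the car early on and then leaves the remaining steps free to refine details.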

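Now we can generate a few backgrounds and pick the one we like.

I like the last one, so that is the image we will keep using in the following steps.

Now that we have a background, we just need to paste the original car over it, and we also need to create a mask from it for the outpainting.

Original pasted	Mask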
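The paste and the mask both come from the subject's alpha channel. Here is a small self-contained sketch of the idea with synthetic images standing in for `temp_image` and `resized_img`; the sizes and colors are made up for illustration.

```python
from PIL import Image, ImageOps

# Stand-ins for the real data: a grey "generated background" and a subject
# that is opaque in the middle and transparent around its border.
background = Image.new("RGB", (64, 64), (120, 120, 120))
subject = Image.new("RGBA", (32, 32), (0, 0, 0, 0))
subject.paste((200, 30, 30, 255), (4, 4, 28, 28))

# centre the subject on the background
x = (background.width - subject.width) // 2
y = (background.height - subject.height) // 2

# paste the subject, using its own alpha channel as the paste mask
composite = background.copy()
composite.paste(subject, (x, y), subject)

# start from a black canvas, copy the subject's alpha into place,
# invert it so the subject is black (keep) and the rest white (repaint),
# then threshold to get a hard-edged binary mask
mask = Image.new("L", composite.size)
mask.paste(subject.getchannel("A"), (x, y))
mask = ImageOps.invert(mask)
mask = mask.point(lambda p: 255 if p > 128 else 0)
```

In the resulting mask the subject is black and everything else is white, which is the convention the inpainting pipeline expects: white pixels are repainted, black pixels are preserved.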

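3.- Outpainting

For some reason the background removal treated part of the headlights as transparent. It is very important to make sure the subject in your original image has exactly the alpha channel you want. In this case it doesn't matter much, because the headlights match the generated image.

Now we can finally generate the outpaint with an inpainting model. I'll use an inpainting model merged with the RealVisXL model.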
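One way to spot this kind of problem before generating anything is to look at how the alpha values are distributed inside the subject's bounding box. This is a rough sketch with Pillow only; `alpha_report` is a hypothetical helper and the image is synthetic, with one fully transparent hole and one half-transparent patch added on purpose.

```python
from PIL import Image


def alpha_report(image):
    """Share of transparent, partial and opaque pixels inside the subject box."""
    alpha = image.getchannel("A")
    box = alpha.getbbox()
    if box is None:
        return None  # nothing but transparency
    hist = alpha.crop(box).histogram()  # 256 bins for an "L" image
    total = sum(hist)
    return {
        "transparent": hist[0] / total,
        "partial": sum(hist[1:255]) / total,
        "opaque": hist[255] / total,
    }


# Synthetic cutout: a 100x100 opaque square with two defects.
cutout = Image.new("RGBA", (120, 120), (0, 0, 0, 0))
cutout.paste((255, 255, 255, 255), (10, 10, 110, 110))
cutout.paste((255, 255, 255, 0), (30, 30, 40, 40))    # fully transparent hole
cutout.paste((255, 255, 255, 128), (60, 60, 70, 70))  # half-transparent patch
report = alpha_report(cutout)
```

For a subject that is not rectangular the transparent share will naturally be above zero, so the numbers are only a hint. Saving `image.getchannel("A")` and looking at it as a grayscale picture is usually the quickest way to find holes such as the headlights here.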

pipeline = StableDiffusionXLInpaintPipeline.from_pretrained(
    "OzzyGT/RealVisXL_V4.0_inpainting",
    torch_dtype=torch.float16,
    variant="fp16",
    vae=vae,
).to("cuda")

image = pipeline(
    prompt,
    negative_prompt=negative_prompt,
    image=image,
    mask_image=mask,
    guidance_scale=10.0,
    strength=0.8,
    num_inference_steps=30,
    generator=generator,
).images[0]

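I like the last one, but since the whole image goes through the outpainting pass, the original car changed slightly. To fix this we just need to paste the original car over it once more.

4.- Final touch-ups

The image looks fine as it is, but if you want a really good result you need to put in some effort. Up to this step everything can be done programmatically, but to get a really good final result, now is the time to fix some details and to apply filters and enhance the colors with other software.

For example, I don't like that there are no shadows under the car, so I'll paint some to simulate them and then run an image-to-image pass. As always, I then just paste the original over the generated image.

Drawing	img2img pass	Final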
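If you want to see how much the subject drifted, and confirm that pasting it back fixes it, you can compare the result against the original inside the subject's opaque area. This is a sketch on synthetic data; `subject_drift` is a hypothetical helper and the colors are invented to simulate a small change.

```python
from PIL import Image, ImageChops


def subject_drift(result, subject, position):
    """Largest per-channel difference inside the subject's fully opaque area."""
    x, y = position
    region = result.crop((x, y, x + subject.width, y + subject.height)).convert("RGB")
    diff = ImageChops.difference(region, subject.convert("RGB"))
    # only compare pixels where the subject is fully opaque
    opaque = subject.getchannel("A").point(lambda p: 255 if p == 255 else 0)
    black = Image.new("RGB", subject.size)
    masked = Image.composite(diff, black, opaque)
    return max(high for _, high in masked.getextrema())


# Original subject: opaque green square with a transparent border.
subject = Image.new("RGBA", (32, 32), (0, 0, 0, 0))
subject.paste((10, 200, 10, 255), (4, 4, 28, 28))

# Simulated outpainting result in which the subject's color shifted a little.
generated = Image.new("RGB", (64, 64), (90, 90, 90))
generated.paste((14, 196, 12), (20, 20, 44, 44))

position = (16, 16)
drift_before = subject_drift(generated, subject, position)

# paste the original subject back, using its alpha as the paste mask
restored = generated.copy()
restored.paste(subject, position, subject)
drift_after = subject_drift(restored, subject, position)
```

After the paste the drift inside the opaque area drops to zero, while the generated background around the subject is left untouched.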
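I painted the shadows by hand, but a rough version of the same idea can be sketched in Pillow: draw a dark ellipse on a transparent layer, blur it, and composite it under where the car will be pasted. The helper name `add_soft_shadow`, the box coordinates, the opacity and the blur radius are all assumptions for illustration, and the result would still need the image-to-image pass to look natural.

```python
from PIL import Image, ImageDraw, ImageFilter


def add_soft_shadow(image, box, opacity=140, blur_radius=12):
    """Composite a blurred dark ellipse onto image.

    box is (left, top, right, bottom) in pixels; opacity is 0-255.
    """
    layer = Image.new("RGBA", image.size, (0, 0, 0, 0))
    ImageDraw.Draw(layer).ellipse(box, fill=(0, 0, 0, opacity))
    # blurring the layer softens the alpha edge of the ellipse
    layer = layer.filter(ImageFilter.GaussianBlur(blur_radius))
    return Image.alpha_composite(image.convert("RGBA"), layer).convert("RGB")


# Flat grey "road" as a stand-in for the generated background.
road = Image.new("RGB", (256, 256), (150, 150, 150))
shaded = add_soft_shadow(road, (48, 180, 208, 220))
```

The shadow should be drawn before the original car is pasted back, so the car always ends up on top of it.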

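Doing this in code can be tiresome, so I recommend using a good UI for the final touch-ups. I like InvokeAI for this, and I also recommend watching the video tutorials, where you can learn how to add details without any complex painting, for example: https://www.youtube.com/watch?v=GAlaOlihZ20

I won't fix every detail for this demo, but I will do a little color correction so it looks a bit more professional.

I hope this helps you better understand how to do outpainting with Diffusers. If you have any questions, feel free to ask in the discussion section.

Here is the complete code: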
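I did the color correction in an image editor, but if you prefer to stay in Python, a very basic pass can be sketched with Pillow's `ImageEnhance`. The function name and the factors below are arbitrary starting points of mine, not the values used for the final image.

```python
from PIL import Image, ImageEnhance


def basic_color_pass(image, saturation=1.15, contrast=1.1):
    """Slightly boost saturation and contrast of an RGB image."""
    image = ImageEnhance.Color(image).enhance(saturation)
    image = ImageEnhance.Contrast(image).enhance(contrast)
    return image


# Small flat-colored sample so the effect is easy to inspect.
sample = Image.new("RGB", (8, 8), (180, 120, 90))
corrected = basic_color_pass(sample)
```

A factor of 1.0 leaves the image unchanged, values above 1.0 strengthen the effect and values below 1.0 weaken it, so it is easy to try a few settings on the final image and compare.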

import random

import requests
import torch
from controlnet_aux import ZoeDetector
from PIL import Image, ImageOps

from diffusers import (
    AutoencoderKL,
    ControlNetModel,
    StableDiffusionXLControlNetPipeline,
    StableDiffusionXLInpaintPipeline,
)


def scale_and_paste(original_image):
    aspect_ratio = original_image.width / original_image.height

    if original_image.width > original_image.height:
        new_width = 1024
        new_height = round(new_width / aspect_ratio)
    else:
        new_height = 1024
        new_width = round(new_height * aspect_ratio)

    # make the subject a little smaller
    new_width = new_width - 20
    new_height = new_height - 20

    resized_original = original_image.resize((new_width, new_height), Image.LANCZOS)
    white_background = Image.new("RGBA", (1024, 1024), "white")
    x = (1024 - new_width) // 2
    y = (1024 - new_height) // 2
    white_background.paste(resized_original, (x, y), resized_original)

    return resized_original, white_background


# load the original image with alpha
original_image = Image.open(
    requests.get(
        "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/outpainting/BMW_i8_Safety_Car_Front.png?download=true",
        stream=True,
    ).raw
).convert("RGBA")
resized_img, white_bg_image = scale_and_paste(original_image)

# load preprocessor and generate depth map
zoe = ZoeDetector.from_pretrained("lllyasviel/Annotators")
image_zoe = zoe(white_bg_image, detect_resolution=512, image_resolution=1024)

# load controlnets
controlnets = [
    ControlNetModel.from_pretrained(
        "destitech/controlnet-inpaint-dreamer-sdxl", torch_dtype=torch.float16, variant="fp16"
    ),
    ControlNetModel.from_pretrained("diffusers/controlnet-zoe-depth-sdxl-1.0", torch_dtype=torch.float16),
]

# vae in case it doesn't come with model
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to("cuda")

# initial pipeline for temp background
pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0", torch_dtype=torch.float16, variant="fp16", controlnet=controlnets, vae=vae
).to("cuda")


# function to generate
def generate_image(prompt, negative_prompt, inpaint_image, zoe_image, seed: int = None):
    if seed is None:
        seed = random.randint(0, 2**32 - 1)

    generator = torch.Generator(device="cpu").manual_seed(seed)

    image = pipeline(
        prompt,
        negative_prompt=negative_prompt,
        image=[inpaint_image, zoe_image],
        guidance_scale=6.5,
        num_inference_steps=25,
        generator=generator,
        controlnet_conditioning_scale=[0.5, 0.8],
        control_guidance_end=[0.9, 0.6],
    ).images[0]

    return image


# initial prompt
prompt = "a car on the highway"
negative_prompt = ""

temp_image = generate_image(prompt, negative_prompt, white_bg_image, image_zoe, 4138619029)

# paste original subject over temporary background
x = (1024 - resized_img.width) // 2
y = (1024 - resized_img.height) // 2
temp_image.paste(resized_img, (x, y), resized_img)

# create a mask for the final outpainting
mask = Image.new("L", temp_image.size)
mask.paste(resized_img.split()[3], (x, y))
mask = ImageOps.invert(mask)
final_mask = mask.point(lambda p: p > 128 and 255)

# clear old pipeline for VRAM savings
pipeline = None
torch.cuda.empty_cache()

# new pipeline with inpainting model
pipeline = StableDiffusionXLInpaintPipeline.from_pretrained(
    "OzzyGT/RealVisXL_V4.0_inpainting",
    torch_dtype=torch.float16,
    variant="fp16",
    vae=vae,
).to("cuda")

# Use a blurred mask for better blend
mask_blurred = pipeline.mask_processor.blur(final_mask, blur_factor=20)


# function for final outpainting
def generate_outpaint(prompt, negative_prompt, image, mask, seed: int = None):
    if seed is None:
        seed = random.randint(0, 2**32 - 1)

    generator = torch.Generator(device="cpu").manual_seed(seed)

    image = pipeline(
        prompt,
        negative_prompt=negative_prompt,
        image=image,
        mask_image=mask,
        guidance_scale=10.0,
        strength=0.8,
        num_inference_steps=30,
        generator=generator,
    ).images[0]

    return image


# better prompt for final outpainting
prompt = "high quality photo of a car on the highway, shadows, highly detailed"
negative_prompt = ""

# generate the image
final_image = generate_outpaint(prompt, negative_prompt, temp_image, mask_blurred, 3352253467)

# paste original subject over final background
x = (1024 - resized_img.width) // 2
y = (1024 - resized_img.height) // 2
final_image.paste(resized_img, (x, y), resized_img)
final_image.save("result.png")
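One detail in the script that the text doesn't cover is the blurred mask. As far as I can tell, `pipeline.mask_processor.blur` applies a Gaussian blur from Pillow to the mask, so the effect can be reproduced without loading any model. Treat that equivalence as an assumption; the sketch below only shows what blurring does to a hard-edged mask.

```python
from PIL import Image, ImageFilter

# Hard mask: white (repaint) everywhere except a black (keep) square.
hard_mask = Image.new("L", (128, 128), 255)
hard_mask.paste(0, (32, 32, 96, 96))

# Same radius as blur_factor=20 in the script.
soft_mask = hard_mask.filter(ImageFilter.GaussianBlur(20))

edge_before = hard_mask.getpixel((32, 64))
edge_after = soft_mask.getpixel((32, 64))
```

At the edge of the protected square the hard mask jumps straight from 0 to 255, while the blurred mask holds an intermediate grey, so the generated background fades into the subject instead of meeting it at a sharp seam. Because the blur also lets the model touch the border of the subject, pasting the original car back at the end matters even more.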
