Diffusers

Stable Diffusion 2

Stable Diffusion 2 is a text-to-image *latent diffusion* model that builds on the work of the original Stable Diffusion. Its development was led by Robin Rombach and Katherine Crowson from Stability AI and LAION.

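The Stable Diffusion 2.0 release includes robust text-to-image models trained with a new text encoder (OpenCLIP), developed by LAION with support from Stability AI, which greatly improves the quality of the generated images compared to the earlier V1 releases. The text-to-image models in this release can generate images at default resolutions of 512x512 pixels and 768x768 pixels. These models are trained on an aesthetic subset of the LAION-5B dataset created by the DeepFloyd team at Stability AI, which is then further filtered with LAION's NSFW filter to remove adult content.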
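
Because these are latent diffusion models, denoising runs on a compressed latent rather than directly on pixels. The sketch below estimates the latent tensor shape for the two default resolutions. It assumes the 8x spatial downsampling and 4 latent channels commonly used by Stable Diffusion autoencoders; neither value is stated on this page, and `latent_shape` is a hypothetical helper, not a diffusers function.

```python
# Assumed values (not stated on this page): the autoencoder downsamples
# each spatial side by 8 and produces 4 latent channels.
VAE_SCALE_FACTOR = 8
LATENT_CHANNELS = 4


def latent_shape(width: int, height: int) -> tuple:
    """Return (channels, latent_height, latent_width) for an image size."""
    if width % VAE_SCALE_FACTOR or height % VAE_SCALE_FACTOR:
        raise ValueError("width and height should be multiples of 8")
    return (
        LATENT_CHANNELS,
        height // VAE_SCALE_FACTOR,
        width // VAE_SCALE_FACTOR,
    )


for side in (512, 768):
    print(side, latent_shape(side, side))
```

Under these assumptions the 768x768 model denoises 2.25 times as many latent positions as the 512x512 model, which suggests why it tends to need more memory and time per step.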

For more details about how Stable Diffusion 2 works and how it differs from the original Stable Diffusion, see the official announcement post.

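The architecture of Stable Diffusion 2 is largely identical to that of the original Stable Diffusion model, so refer to the original model's API documentation to learn how to use Stable Diffusion 2. We recommend the DPMSolverMultistepScheduler because it offers a reasonable speed/quality trade-off and can run in as few as 20 steps.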

Stable Diffusion 2 is available for tasks such as text-to-image, inpainting, super-resolution, and depth-to-image:

Task	Repository
text-to-image (512x512)	stabilityai/stable-diffusion-2-base
text-to-image (768x768)	stabilityai/stable-diffusion-2
inpainting	stabilityai/stable-diffusion-2-inpainting
super-resolution	stabilityai/stable-diffusion-x4-upscaler
depth-to-image	stabilityai/stable-diffusion-2-depth
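
When scripting against several of these checkpoints, the table can be kept as a small lookup. This is only a sketch: the task keys and the `checkpoint_for` helper are hypothetical names chosen for illustration. The repository IDs are the ones listed above, with the upscaler written with the `stabilityai/` prefix as in the super-resolution example.

```python
# Task keys are illustrative; repository IDs come from the table above.
SD2_CHECKPOINTS = {
    "text-to-image-512": "stabilityai/stable-diffusion-2-base",
    "text-to-image-768": "stabilityai/stable-diffusion-2",
    "inpainting": "stabilityai/stable-diffusion-2-inpainting",
    "super-resolution": "stabilityai/stable-diffusion-x4-upscaler",
    "depth-to-image": "stabilityai/stable-diffusion-2-depth",
}


def checkpoint_for(task: str) -> str:
    """Return the repository ID for a task, or raise with the known tasks."""
    try:
        return SD2_CHECKPOINTS[task]
    except KeyError:
        known = ", ".join(sorted(SD2_CHECKPOINTS))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}") from None


print(checkpoint_for("inpainting"))
```

The returned string can be passed as the first argument to `from_pretrained`, as in the examples on this page.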

Here are some examples of how to use Stable Diffusion 2 for each task:

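Make sure to check out the Stable Diffusion Tips section to learn how to explore the trade-off between scheduler speed and quality, and how to reuse pipeline components efficiently!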

If you are interested in using one of the official checkpoints for a task, explore the CompVis, Runway, and Stability AI Hub organizations!
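
As a rough illustration of the component reuse mentioned above: diffusers pipelines expose their loaded modules through a `components` dictionary, which can be passed to another compatible pipeline class so the weights are not loaded a second time. The wrapper name `reuse_components` is hypothetical, and pairing the text-to-image pipeline with `StableDiffusionImg2ImgPipeline` is an assumption made for the example, not something this page prescribes.

```python
def reuse_components(source_pipe, target_cls):
    """Build `target_cls` from the modules already held by `source_pipe`.

    `source_pipe.components` is a dict of the loaded modules (VAE, UNet,
    text encoder, scheduler, ...). Passing it as keyword arguments shares
    those objects instead of loading a second copy. This only works when
    the target class accepts the same component names.
    """
    return target_cls(**source_pipe.components)


# Usage with diffusers (commented out because it needs downloaded weights):
# from diffusers import DiffusionPipeline, StableDiffusionImg2ImgPipeline
# pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-base")
# img2img = reuse_components(pipe, StableDiffusionImg2ImgPipeline)
```

Both pipelines then reference the same modules, so moving one to a device or changing its scheduler object affects the other as well.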

Text-to-image

from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
import torch

repo_id = "stabilityai/stable-diffusion-2-base"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, variant="fp16")

pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "High quality photo of an astronaut riding a horse in space"
image = pipe(prompt, num_inference_steps=25).images[0]
image

Inpainting

import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import load_image, make_image_grid

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = load_image(img_url).resize((512, 512))
mask_image = load_image(mask_url).resize((512, 512))

repo_id = "stabilityai/stable-diffusion-2-inpainting"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, variant="fp16")

pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, num_inference_steps=25).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)

Super-resolution

from diffusers import StableDiffusionUpscalePipeline
from diffusers.utils import load_image, make_image_grid
import torch

# load model and scheduler
model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipeline = StableDiffusionUpscalePipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipeline = pipeline.to("cuda")

# download a low-resolution input image and shrink it to 128x128
url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
low_res_img = load_image(url)
low_res_img = low_res_img.resize((128, 128))
prompt = "a white cat"
upscaled_image = pipeline(prompt=prompt, image=low_res_img).images[0]
make_image_grid([low_res_img.resize((512, 512)), upscaled_image.resize((512, 512))], rows=1, cols=2)

Depth-to-image

import torch
from diffusers import StableDiffusionDepth2ImgPipeline
from diffusers.utils import load_image, make_image_grid

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")


url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = load_image(url)
prompt = "two tigers"
negative_prompt = "bad, deformed, ugly, bad anatomy"
image = pipe(prompt=prompt, image=init_image, negative_prompt=negative_prompt, strength=0.7).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
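
The `strength=0.7` argument in the call above controls how far the input image is noised before denoising starts, and therefore how many of the scheduled steps actually run. The sketch below shows that arithmetic. It assumes the depth-to-image pipeline follows the usual diffusers image-to-image convention and a default of 50 inference steps; neither is stated on this page, and `effective_steps` is a hypothetical helper.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps that run for a given strength.

    Assumed convention: only the last `strength` fraction of the schedule
    is executed, so strength=1.0 runs every step and strength=0.0 leaves
    the input image essentially unchanged.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength should be in [0.0, 1.0]")
    return min(int(num_inference_steps * strength), num_inference_steps)


print(effective_steps(50, 0.7))  # 35
```

Lower values keep the result closer to the input image, while values near 1.0 let the prompt dominate.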
