Diffusers

Stable unCLIP

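Stable unCLIP checkpoints are fine-tuned from Stable Diffusion 2.1 checkpoints to condition on CLIP image embeddings. Stable unCLIP still conditions on text embeddings as well. Given these two separate conditionings, Stable unCLIP can be used for text-guided image variation. When combined with an unCLIP prior, it can also be used for full text-to-image generation.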

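The abstract from the paper is:

Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. We show that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity. Our decoders conditioned on image representations can also produce variations of an image that preserve both its semantics and style, while varying the non-essential details absent from the image representation. Moreover, the joint embedding space of CLIP enables language-guided image manipulations in a zero-shot fashion. We use diffusion models for the decoder and experiment with both autoregressive and diffusion models for the prior, finding that the latter are computationally more efficient and produce higher-quality samples.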

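Tips

Stable unCLIP takes `noise_level` as input during inference, which determines how much noise is added to the image embeddings. A higher `noise_level` increases variation in the final denoised images. By default, we do not add any additional noise to the image embeddings (`noise_level = 0`).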
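To make the effect of `noise_level` more concrete, here is a minimal, dependency-free sketch of DDPM-style forward noising applied to an embedding vector. It is an illustration only, not the pipeline's implementation: the helper names `make_alphas_cumprod` and `noise_embedding`, the linear beta schedule, and the 1000-step range are assumptions chosen for the example. The pipeline's own `noise_image_embeddings` method does more than this.

```python
import math
import random


def make_alphas_cumprod(num_train_timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    The linear schedule and its endpoints are assumptions for illustration;
    a real checkpoint defines its own noising schedule.
    """
    alphas_cumprod = []
    running = 1.0
    for t in range(num_train_timesteps):
        beta = beta_start + (beta_end - beta_start) * t / (num_train_timesteps - 1)
        running *= 1.0 - beta
        alphas_cumprod.append(running)
    return alphas_cumprod


def noise_embedding(embedding, noise_level, alphas_cumprod, rng):
    """Apply the DDPM forward process at timestep `noise_level`.

    Each component becomes sqrt(a) * x + sqrt(1 - a) * eps, where a is the
    cumulative alpha at that timestep and eps is standard Gaussian noise.
    """
    a = alphas_cumprod[noise_level]
    signal_weight = math.sqrt(a)
    noise_weight = math.sqrt(1.0 - a)
    return [signal_weight * x + noise_weight * rng.gauss(0.0, 1.0) for x in embedding]


if __name__ == "__main__":
    rng = random.Random(0)
    alphas_cumprod = make_alphas_cumprod()
    embedding = [1.0] * 8  # stand-in for a CLIP image embedding
    for level in (0, 500, 999):
        noised = noise_embedding(embedding, level, alphas_cumprod, rng)
        drift = sum(abs(n - x) for n, x in zip(noised, embedding)) / len(embedding)
        print(f"noise_level={level}: mean absolute change {drift:.3f}")
```

Under these assumptions, the weight on the original embedding shrinks as `noise_level` grows, so the decoder is conditioned on a progressively looser version of the input image, which is why higher values yield more varied outputs.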

Text-to-Image Generation

Stable unCLIP can be used for text-to-image generation by pipelining it with the prior model of Karlo, KakaoBrain's open-source replication of DALL-E 2.

import torch
from diffusers import UnCLIPScheduler, DDPMScheduler, StableUnCLIPPipeline
from diffusers.models import PriorTransformer
from transformers import CLIPTokenizer, CLIPTextModelWithProjection

# Load the Karlo prior, which generates CLIP image embeddings from text
prior_model_id = "kakaobrain/karlo-v1-alpha"
data_type = torch.float16
prior = PriorTransformer.from_pretrained(prior_model_id, subfolder="prior", torch_dtype=data_type)

# Load the CLIP ViT-L/14 tokenizer and text encoder used by the prior
prior_text_model_id = "openai/clip-vit-large-patch14"
prior_tokenizer = CLIPTokenizer.from_pretrained(prior_text_model_id)
prior_text_model = CLIPTextModelWithProjection.from_pretrained(prior_text_model_id, torch_dtype=data_type)

# Load the prior's scheduler config and build a DDPMScheduler from it
prior_scheduler = UnCLIPScheduler.from_pretrained(prior_model_id, subfolder="prior_scheduler")
prior_scheduler = DDPMScheduler.from_config(prior_scheduler.config)

# Stable unCLIP checkpoint trained on CLIP ViT-L/14 image embeddings
stable_unclip_model_id = "stabilityai/stable-diffusion-2-1-unclip-small"

# Assemble the pipeline from the prior components and the Stable unCLIP decoder
pipe = StableUnCLIPPipeline.from_pretrained(
    stable_unclip_model_id,
    torch_dtype=data_type,
    variant="fp16",
    prior_tokenizer=prior_tokenizer,
    prior_text_encoder=prior_text_model,
    prior=prior,
    prior_scheduler=prior_scheduler,
)

pipe = pipe.to("cuda")
wave_prompt = "dramatic wave, the Oceans roar, Strong wave spiral across the oceans as the waves unfurl into roaring crests; perfect wave form; perfect wave shape; dramatic wave shape; wave shape unbelievable; wave; wave shape spectacular"

# Generate one image from the text prompt
image = pipe(prompt=wave_prompt).images[0]
image

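For text-to-image, we use `stabilityai/stable-diffusion-2-1-unclip-small` because it was trained on CLIP ViT-L/14 embeddings, the same as the Karlo model prior. We do not recommend `stabilityai/stable-diffusion-2-1-unclip` here, because it was trained on OpenCLIP ViT-H embeddings.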

Text-Guided Image-to-Image Variation

from diffusers import StableUnCLIPImg2ImgPipeline
from diffusers.utils import load_image
import torch

# Load the fp16 weights of the image-variation checkpoint
pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16, variant="fp16"
)
pipe = pipe.to("cuda")

# Download the image to generate variations of
url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_unclip/tarsila_do_amaral.png"
init_image = load_image(url)

# Generate a variation conditioned only on the image and save it
images = pipe(init_image).images
images[0].save("variation_image.png")

Optionally, you can also pass a prompt to `pipe`, for example:

prompt = "A fantasy landscape, trending on artstation"

image = pipe(init_image, prompt=prompt).images[0]
image
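The `noise_level` argument described in the tips can be passed to the same image-to-image call. The sketch below wraps this in a hypothetical helper, `generate_variations`, which is not part of Diffusers. The noise levels and the seed are illustrative assumptions rather than recommended values, and the imports are deferred into the function so that defining it does not require a GPU or a model download.

```python
def generate_variations(image_url, noise_levels=(0, 200, 800), seed=0):
    """Generate one variation per noise level from the same input image.

    Hypothetical helper for illustration; the chosen noise levels are
    assumptions, not recommended values.
    """
    # Heavy imports are deferred so the function can be defined without a GPU.
    import torch
    from diffusers import StableUnCLIPImg2ImgPipeline
    from diffusers.utils import load_image

    pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1-unclip",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    init_image = load_image(image_url)

    results = {}
    for level in noise_levels:
        # Re-seed for every call so that only noise_level differs between runs.
        generator = torch.Generator(device="cuda").manual_seed(seed)
        results[level] = pipe(init_image, noise_level=level, generator=generator).images[0]
    return results
```

Calling `generate_variations(url)` with the image URL from the example above would return a dictionary mapping each noise level to a PIL image, which makes it easy to compare how far each output drifts from the input.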

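Check out the Schedulers guide to learn how to explore the tradeoff between scheduler speed and quality, and see the section on reusing components across pipelines to learn how to efficiently load the same components into multiple pipelines.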

StableUnCLIPPipeline

class diffusers.StableUnCLIPPipeline

__call__

enable_attention_slicing

disable_attention_slicing

enable_vae_slicing

disable_vae_slicing

enable_xformers_memory_efficient_attention

disable_xformers_memory_efficient_attention

encode_prompt

noise_image_embeddings

StableUnCLIPImg2ImgPipeline

class diffusers.StableUnCLIPImg2ImgPipeline