Getting Started: VAE Encode with Hybrid Inference

VAE encoding is used for training, image-to-image, and image-to-video: it turns images or videos into latent representations.
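To make "latent representation" concrete in terms of tensor shape, here is a small sketch. It assumes the usual 8x spatial downsampling of these VAEs, 4 latent channels for SD v1 and SDXL, and 16 for Flux. The helper name `latent_shape` and the model keys are hypothetical, and nothing is queried from a real VAE config.

```python
# Hypothetical helper: estimate the latent tensor shape a VAE encode produces.
# Assumptions: 8x spatial downsampling; 4 latent channels for SD v1/SDXL and
# 16 for Flux. These values are hard-coded here, not read from a model config.
LATENT_CHANNELS = {"sd-v1": 4, "sdxl": 4, "flux": 16}
DOWNSAMPLE = 8


def latent_shape(model: str, height: int, width: int, batch: int = 1) -> tuple:
    """Return (batch, channels, height // 8, width // 8) for a height x width image."""
    if height % DOWNSAMPLE or width % DOWNSAMPLE:
        raise ValueError("height and width should be multiples of 8")
    return (batch, LATENT_CHANNELS[model], height // DOWNSAMPLE, width // DOWNSAMPLE)


# The img2img example on this page resizes its input to 768x512 (width x height).
print(latent_shape("sd-v1", height=512, width=768))  # (1, 4, 64, 96)
print(latent_shape("flux", height=1024, width=1024))  # (1, 16, 128, 128)
```

Under these assumptions a 512x512 RGB image (786,432 values) becomes a 4x64x64 latent (16,384 values) for SD v1, a 48x reduction.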

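Memory

These tables show the VRAM used for VAE encoding with SD v1 and SDXL on different GPUs.

For most of these GPUs, the memory usage percentage means that other models (text encoders, UNet/Transformer) must be offloaded, or tiled encoding must be used, which increases processing time and can affect quality.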
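Tiled encoding caps peak memory by encoding overlapping tiles one at a time instead of the whole image. The sketch below only computes the tile grid. It assumes square tiles with 25% overlap and no tiling when the image already fits in one tile, which is our understanding of the `AutoencoderKL` defaults, not a guaranteed implementation detail. The tile sizes used (512 px for SD v1, 1024 px for SDXL) are inferred from the tables that follow, where the tiled memory for SD v1 at 1024 and 2048 equals its 512x512 figure and the SDXL 2048 figure roughly matches its 1024x1024 one. The function names are hypothetical.

```python
# Simplified, hypothetical sketch of the tile grid used for tiled VAE encoding.
# Assumptions: square tiles of `tile` pixels, neighbouring tiles overlapping by
# `overlap` (25% by default), and no tiling when an axis fits in one tile.
def tile_origins(size: int, tile: int, overlap: float = 0.25) -> list:
    """Return the start offsets of tiles along one axis of length `size`."""
    if size <= tile:
        return [0]  # the whole axis fits in a single tile
    stride = int(tile * (1 - overlap))  # distance between consecutive tile starts
    return list(range(0, size, stride))


def tile_count(height: int, width: int, tile: int) -> int:
    """Number of tiles encoded for a height x width image."""
    return len(tile_origins(height, tile)) * len(tile_origins(width, tile))


# SD v1 at 2048x2048 with 512 px tiles: stride 384, 6 tiles per axis.
print(tile_origins(2048, 512))  # [0, 384, 768, 1152, 1536, 1920]
print(tile_count(2048, 2048, 512))  # 36
# SDXL at 1024x1024 with 1024 px tiles: a single tile, so no tiling happens.
print(tile_count(1024, 1024, 1024))  # 1
```

Each tile is encoded separately and the overlaps are blended, so peak memory stays near the cost of one tile while total time grows with the number of tiles; the seams between tiles are where quality can suffer.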

SD v1.5

GPU	Resolution	Time (s)	Memory (%)	Tiled Time (s)	Tiled Memory (%)
NVIDIA GeForce RTX 4090	512x512	0.015	3.51901	0.015	3.51901
NVIDIA GeForce RTX 4090	256x256	0.004	1.3154	0.005	1.3154
NVIDIA GeForce RTX 4090	2048x2048	0.402	47.1852	0.496	3.51901
NVIDIA GeForce RTX 4090	1024x1024	0.078	12.2658	0.094	3.51901
NVIDIA GeForce RTX 4080 SUPER	512x512	0.023	5.30105	0.023	5.30105
NVIDIA GeForce RTX 4080 SUPER	256x256	0.006	1.98152	0.006	1.98152
NVIDIA GeForce RTX 4080 SUPER	2048x2048	0.574	71.08	0.656	5.30105
NVIDIA GeForce RTX 4080 SUPER	1024x1024	0.111	18.4772	0.14	5.30105
NVIDIA GeForce RTX 3090	512x512	0.032	3.52782	0.032	3.52782
NVIDIA GeForce RTX 3090	256x256	0.01	1.31869	0.009	1.31869
NVIDIA GeForce RTX 3090	2048x2048	0.742	47.3033	0.954	3.52782
NVIDIA GeForce RTX 3090	1024x1024	0.136	12.2965	0.207	3.52782
NVIDIA GeForce RTX 3080	512x512	0.036	8.51761	0.036	8.51761
NVIDIA GeForce RTX 3080	256x256	0.01	3.18387	0.01	3.18387
NVIDIA GeForce RTX 3080	2048x2048	0.863	86.7424	1.191	8.51761
NVIDIA GeForce RTX 3080	1024x1024	0.157	29.6888	0.227	8.51761
NVIDIA GeForce RTX 3070	512x512	0.051	10.6941	0.051	10.6941
NVIDIA GeForce RTX 3070	256x256	0.015	3.99743	0.015	3.99743
NVIDIA GeForce RTX 3070	2048x2048	1.217	96.054	1.482	10.6941
NVIDIA GeForce RTX 3070	1024x1024	0.223	37.2751	0.327	10.6941

SDXL

GPU	Resolution	Time (s)	Memory (%)	Tiled Time (s)	Tiled Memory (%)
NVIDIA GeForce RTX 4090	512x512	0.029	4.95707	0.029	4.95707
NVIDIA GeForce RTX 4090	256x256	0.007	2.29666	0.007	2.29666
NVIDIA GeForce RTX 4090	2048x2048	0.873	66.3452	0.863	15.5649
NVIDIA GeForce RTX 4090	1024x1024	0.142	15.5479	0.143	15.5479
NVIDIA GeForce RTX 4080 SUPER	512x512	0.044	7.46735	0.044	7.46735
NVIDIA GeForce RTX 4080 SUPER	256x256	0.01	3.4597	0.01	3.4597
NVIDIA GeForce RTX 4080 SUPER	2048x2048	1.317	87.1615	1.291	23.447
NVIDIA GeForce RTX 4080 SUPER	1024x1024	0.213	23.4215	0.214	23.4215
NVIDIA GeForce RTX 3090	512x512	0.058	5.65638	0.058	5.65638
NVIDIA GeForce RTX 3090	256x256	0.016	2.45081	0.016	2.45081
NVIDIA GeForce RTX 3090	2048x2048	1.755	77.8239	1.614	18.4193
NVIDIA GeForce RTX 3090	1024x1024	0.265	18.4023	0.265	18.4023
NVIDIA GeForce RTX 3080	512x512	0.064	13.6568	0.064	13.6568
NVIDIA GeForce RTX 3080	256x256	0.018	5.91728	0.018	5.91728
NVIDIA GeForce RTX 3080	2048x2048	OOM	OOM	1.866	44.4717
NVIDIA GeForce RTX 3080	1024x1024	0.302	44.4308	0.302	44.4308
NVIDIA GeForce RTX 3070	512x512	0.093	17.1465	0.093	17.1465
NVIDIA GeForce RTX 3070	256x256	0.025	7.42931	0.026	7.42931
NVIDIA GeForce RTX 3070	2048x2048	OOM	OOM	2.674	55.8355
NVIDIA GeForce RTX 3070	1024x1024	0.443	55.7841	0.443	55.7841
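The memory columns are relative, so the same percentage means different absolute amounts on different cards. A minimal sketch for converting them, assuming the percentage is of total VRAM and the nominal capacities of these cards (24 GB for the RTX 4090 and 3090, 16 GB for the 4080 SUPER, 10 GB for the 3080, 8 GB for the 3070); those capacities are our assumption, not part of the tables. The SD v1.5 512x512 rows are consistent with it: on every card they work out to roughly 0.85 GB. The names `VRAM_GB` and `percent_to_gb` are hypothetical.

```python
# Convert the "Memory (%)" columns into approximate absolute VRAM.
# Assumption: percentages are of total VRAM, with these nominal capacities in GB.
VRAM_GB = {
    "NVIDIA GeForce RTX 4090": 24,
    "NVIDIA GeForce RTX 4080 SUPER": 16,
    "NVIDIA GeForce RTX 3090": 24,
    "NVIDIA GeForce RTX 3080": 10,
    "NVIDIA GeForce RTX 3070": 8,
}


def percent_to_gb(gpu: str, percent: float) -> float:
    """Approximate VRAM in GB for an encode that used `percent` of the card."""
    return VRAM_GB[gpu] * percent / 100


# SD v1.5 at 2048x2048 on an RTX 4090: untiled vs tiled.
untiled = percent_to_gb("NVIDIA GeForce RTX 4090", 47.1852)
tiled = percent_to_gb("NVIDIA GeForce RTX 4090", 3.51901)
print(f"{untiled:.2f} GB untiled, {tiled:.2f} GB tiled")  # 11.32 GB untiled, 0.84 GB tiled
```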

Available VAEs

	Endpoint	Model
Stable Diffusion v1	https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud	`stabilityai/sd-vae-ft-mse`
Stable Diffusion XL	https://xjqqhmyn62rog84g.us-east-1.aws.endpoints.huggingface.cloud	`madebyollin/sdxl-vae-fp16-fix`
Flux	https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud	`black-forest-labs/FLUX.1-schnell`
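When switching between these VAEs, the endpoint and the latent normalization factors have to change together. One way to keep them paired is a small lookup table. The SD v1 and Flux factors below are the ones used in the examples on this page; the SDXL scaling factor of 0.13025 is, to our knowledge, the value in the SDXL VAE config and is an assumption here. `VAE_ENDPOINTS` and `encode_kwargs` are hypothetical names.

```python
# Hypothetical lookup table pairing each encode endpoint with its latent factors.
# SD v1 and Flux factors are taken from the examples on this page; the SDXL
# scaling factor (0.13025) is assumed from the SDXL VAE config.
VAE_ENDPOINTS = {
    "sd-v1": {
        "endpoint": "https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud/",
        "scaling_factor": 0.18215,
        "shift_factor": None,
    },
    "sdxl": {
        "endpoint": "https://xjqqhmyn62rog84g.us-east-1.aws.endpoints.huggingface.cloud/",
        "scaling_factor": 0.13025,
        "shift_factor": None,
    },
    "flux": {
        "endpoint": "https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud/",
        "scaling_factor": 0.3611,
        "shift_factor": 0.1159,
    },
}


def encode_kwargs(model: str) -> dict:
    """Keyword arguments for remote_encode, dropping factors the model does not use."""
    return {key: value for key, value in VAE_ENDPOINTS[model].items() if value is not None}


print(encode_kwargs("flux"))
```

With this, a call could look like `remote_encode(image=image, **encode_kwargs("flux"))`, so the factors can never drift away from the endpoint they belong to.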

Support for additional models can be requested here.

Code

Install `diffusers` from `main` to run the code: `pip install git+https://github.com/huggingface/diffusers@main`

A helper method simplifies interacting with Hybrid Inference.

from diffusers.utils.remote_utils import remote_encode

Basic example

Let's encode an image, then decode it again to demonstrate the round trip.

Code

from diffusers.utils import load_image
from diffusers.utils.remote_utils import remote_decode, remote_encode

# Load a sample image to encode.
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg?download=true")

# Encode remotely with the Flux VAE endpoint; the factors normalize the returned latent.
latent = remote_encode(
    endpoint="https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud/",
    image=image,
    scaling_factor=0.3611,
    shift_factor=0.1159,
)

# Decode remotely with a Flux decode endpoint, using the same factors as the encode.
decoded = remote_decode(
    endpoint="https://whhx50ex1aryqvw6.us-east-1.aws.endpoints.huggingface.cloud/",
    tensor=latent,
    scaling_factor=0.3611,
    shift_factor=0.1159,
)
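`scaling_factor` and `shift_factor` must match between `remote_encode` and `remote_decode`, because decoding undoes the normalization applied after encoding. A minimal sketch of that pair of transforms, assuming the convention Diffusers pipelines use with Flux latents (`(z - shift) * scale` after encoding, `z / scale + shift` before decoding); whether the endpoints apply exactly these formulas server-side is an assumption, and `normalize`/`denormalize` are hypothetical names. Plain floats keep the arithmetic easy to follow; the same expressions work elementwise on a tensor.

```python
# Hypothetical sketch of latent normalization with the Flux factors used above.
# Assumption: encoded latents are normalized as (z - shift) * scale, and the
# decoder side first inverts this with z / scale + shift.
SCALING_FACTOR = 0.3611
SHIFT_FACTOR = 0.1159


def normalize(z: float, scale: float = SCALING_FACTOR, shift: float = SHIFT_FACTOR) -> float:
    """Map a raw VAE latent value into the range the denoiser expects."""
    return (z - shift) * scale


def denormalize(z: float, scale: float = SCALING_FACTOR, shift: float = SHIFT_FACTOR) -> float:
    """Invert normalize() before handing latents back to the VAE decoder."""
    return z / scale + shift


raw = [1.2, -0.7, 0.1159]
normalized = [normalize(v) for v in raw]
restored = [denormalize(v) for v in normalized]
# The round trip recovers the raw values up to floating-point error.
print(all(abs(a - b) < 1e-9 for a, b in zip(raw, restored)))
```

For SD v1 the same functions apply with `scale=0.18215` and `shift=0.0`. Decoding with factors from a different model skips the correct inverse and hands the decoder latents in the wrong range.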

Generation

Now let's look at a generation example: we'll encode the image remotely, generate locally, then decode remotely too!

Code

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image
from diffusers.utils.remote_utils import remote_decode, remote_encode

# Load the pipeline without a local VAE: encoding and decoding happen remotely.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    variant="fp16",
    vae=None,
).to("cuda")

init_image = load_image(
    "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
)
# Resize to 768x512 (width x height); both sides are multiples of 8.
init_image = init_image.resize((768, 512))

# Encode the input with the SD v1 VAE endpoint; 0.18215 is the SD v1 scaling factor.
init_latent = remote_encode(
    endpoint="https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud/",
    image=init_image,
    scaling_factor=0.18215,
)

prompt = "A fantasy landscape, trending on artstation"
# Pass the latent in place of an image and keep the result in latent space.
latent = pipe(
    prompt=prompt,
    image=init_latent,
    strength=0.75,
    output_type="latent",
).images

# Decode the generated latent remotely with the same scaling factor.
image = remote_decode(
    endpoint="https://q1bj3bpq6kzilnsu.us-east-1.aws.endpoints.huggingface.cloud/",
    tensor=latent,
    scaling_factor=0.18215,
)
image.save("fantasy_landscape.jpg")
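`strength=0.75` controls how much of the denoising schedule is re-run on top of the encoded latent. A minimal sketch of the step arithmetic, assuming the img2img pipeline's default of 50 inference steps and our understanding of how `StableDiffusionImg2ImgPipeline` trims its schedule (`int(num_inference_steps * strength)` steps, capped at the total). The function name `img2img_steps` is hypothetical.

```python
# Hypothetical helper: how many denoising steps an img2img run performs.
# Assumption: the schedule is trimmed to int(num_inference_steps * strength),
# capped at num_inference_steps.
def img2img_steps(num_inference_steps: int, strength: float) -> int:
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return min(int(num_inference_steps * strength), num_inference_steps)


print(img2img_steps(50, 0.75))  # 37
print(img2img_steps(50, 0.5))  # 25
print(img2img_steps(50, 1.0))  # 50
```

Under this assumption the generation example runs 37 denoising steps; lowering `strength` runs fewer steps and keeps more of the input image's composition.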

Integrations

SD.Next: All-in-one UI with direct support for Hybrid Inference.
ComfyUI-HFRemoteVae: ComfyUI node for Hybrid Inference.
