Flux

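Flux is a series of text-to-image generation models based on diffusion transformers. To learn more about Flux, check out the original blog post by Black Forest Labs, the creators of Flux.

Original model checkpoints for Flux can be found here. Original inference code can be found here.

Flux can be quite expensive to run on consumer hardware. However, you can apply a set of optimizations to run it faster and with less memory. Check out this section for more details. Additionally, Flux can benefit from quantization for memory efficiency, with a trade-off in inference latency. Refer to this blog post to learn more. For an exhaustive list of resources, check out this gist.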

Flux comes in the following variants:

Model type	Model ID
Timestep-distilled	`black-forest-labs/FLUX.1-schnell`
Guidance-distilled	`black-forest-labs/FLUX.1-dev`
Fill Inpainting/Outpainting (Guidance-distilled)	`black-forest-labs/FLUX.1-Fill-dev`
Canny Control (Guidance-distilled)	`black-forest-labs/FLUX.1-Canny-dev`
Depth Control (Guidance-distilled)	`black-forest-labs/FLUX.1-Depth-dev`
Canny Control (LoRA)	`black-forest-labs/FLUX.1-Canny-dev-lora`
Depth Control (LoRA)	`black-forest-labs/FLUX.1-Depth-dev-lora`
Redux (Adapter)	`black-forest-labs/FLUX.1-Redux-dev`

All checkpoints have different usage, which we detail below.

Timestep-distilled

max_sequence_length cannot be more than 256.
guidance_scale needs to be 0.
As this is a timestep-distilled model, it benefits from fewer sampling steps.

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

prompt = "A cat holding a sign that says hello world"
out = pipe(
    prompt=prompt,
    guidance_scale=0.,
    height=768,
    width=1360,
    num_inference_steps=4,
    max_sequence_length=256,
).images[0]
out.save("image.png")
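
The constraints listed above can be checked before the pipeline is called. The helper below is a minimal sketch and not part of Diffusers: `schnell_call_kwargs` is a hypothetical name, and the default of 4 steps simply mirrors the example above.

```python
def schnell_call_kwargs(prompt, num_inference_steps=4, max_sequence_length=256, **extra):
    """Build keyword arguments for a FLUX.1-schnell call (hypothetical helper).

    Enforces the two constraints stated above: max_sequence_length <= 256
    and guidance_scale == 0.
    """
    if max_sequence_length > 256:
        raise ValueError("FLUX.1-schnell: max_sequence_length cannot exceed 256")
    if extra.get("guidance_scale", 0.0) != 0.0:
        raise ValueError("FLUX.1-schnell: guidance_scale needs to be 0")
    kwargs = dict(extra)  # e.g. height, width, generator
    kwargs.update(
        prompt=prompt,
        guidance_scale=0.0,
        num_inference_steps=num_inference_steps,
        max_sequence_length=max_sequence_length,
    )
    return kwargs


# Usage with the pipeline from the example above (not executed here):
# out = pipe(**schnell_call_kwargs(prompt, height=768, width=1360)).images[0]
```

Failing early with a `ValueError` is a design choice of this sketch; it avoids loading the model only to discover an invalid argument.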

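Guidance-distilled

The guidance-distilled variant takes about 50 sampling steps for good-quality generation.
It doesn't have any limitations around the max_sequence_length.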

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

prompt = "a tiny astronaut hatching from an egg on the moon"
out = pipe(
    prompt=prompt,
    guidance_scale=3.5,
    height=768,
    width=1360,
    num_inference_steps=50,
).images[0]
out.save("image.png")

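Fill Inpainting/Outpainting

The Flux Fill pipeline does not require strength as an input, unlike regular inpainting pipelines.
It supports both inpainting and outpainting.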

import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/cup.png")
mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/cup_mask.png")

repo_id = "black-forest-labs/FLUX.1-Fill-dev"
pipe = FluxFillPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16).to("cuda")

image = pipe(
    prompt="a white paper cup",
    image=image,
    mask_image=mask,
    height=1632,
    width=1232,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save("output.png")
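
The example above is an inpainting call. For outpainting, one common approach (an assumption here, not something this page spells out) is to paste the source image onto a larger canvas and pass a mask that is white over the new border and black over the original pixels, since white mask pixels mark the region to repaint. The sketch below uses Pillow; `outpaint_layout` and `build_outpaint_inputs` are hypothetical helper names, and rounding the canvas up to a multiple of 16 is an assumption about the resolutions the pipeline prefers.

```python
def outpaint_layout(width, height, left=0, top=0, right=0, bottom=0, multiple=16):
    """Return ((canvas_w, canvas_h), (x0, y0, x1, y1)) for an outpainting canvas.

    The box is where the original image sits on the canvas. Canvas sides are
    rounded up to `multiple`; any extra pixels extend the right/bottom border.
    """
    def round_up(value):
        return -(-value // multiple) * multiple

    canvas_w = round_up(left + width + right)
    canvas_h = round_up(top + height + bottom)
    return (canvas_w, canvas_h), (left, top, left + width, top + height)


def build_outpaint_inputs(image, **padding):
    """Build (canvas, mask) PIL images; the mask is white where new content goes."""
    from PIL import Image  # imported lazily so outpaint_layout works without Pillow

    size, box = outpaint_layout(image.width, image.height, **padding)
    canvas = Image.new("RGB", size, "black")
    canvas.paste(image.convert("RGB"), box[:2])
    mask = Image.new("L", size, 255)  # white everywhere: repaint ...
    mask.paste(0, box)                # ... except over the original pixels
    return canvas, mask


# Usage with the Fill pipeline from the example above (not executed here):
# canvas, mask = build_outpaint_inputs(image, left=256, right=256)
# result = pipe(prompt="a white paper cup", image=canvas, mask_image=mask,
#               height=canvas.height, width=canvas.width).images[0]
```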

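Canny Control

Note: black-forest-labs/FLUX.1-Canny-dev is not a ControlNetModel model. ControlNet models are a separate component from the UNet/Transformer whose residuals are added to the actual underlying model. Canny Control is an alternate architecture that achieves effectively the same results as a ControlNet model would, by using channel-wise concatenation with the input control condition and ensuring the transformer learns structure control by following the condition as closely as possible.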

# !pip install -U controlnet-aux
import torch
from controlnet_aux import CannyDetector
from diffusers import FluxControlPipeline
from diffusers.utils import load_image

pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-Canny-dev", torch_dtype=torch.bfloat16).to("cuda")

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

processor = CannyDetector()
control_image = processor(control_image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024)

image = pipe(
    prompt=prompt,
    control_image=control_image,
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=30.0,
).images[0]
image.save("output.png")

Canny Control is also possible with a LoRA variant of this condition. The usage is as follows:

# !pip install -U controlnet-aux
import torch
from controlnet_aux import CannyDetector
from diffusers import FluxControlPipeline
from diffusers.utils import load_image

pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights("black-forest-labs/FLUX.1-Canny-dev-lora")

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

processor = CannyDetector()
control_image = processor(control_image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024)

image = pipe(
    prompt=prompt,
    control_image=control_image,
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=30.0,
).images[0]
image.save("output.png")
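
To make the note at the top of this section more concrete, the arithmetic below sketches how channel-wise concatenation changes the transformer input, in contrast to a ControlNet, which adds residuals and leaves the input width unchanged. The numbers are assumptions about the FLUX.1 architecture rather than statements from this page: an 8x VAE downsampling factor, 16 latent channels, and 2x2 patch packing. `packed_latent_shape` is a hypothetical helper.

```python
def packed_latent_shape(height, width, vae_scale=8, latent_channels=16, patch=2):
    """Return (sequence_length, channels) of the packed latents for one image."""
    latent_h, latent_w = height // vae_scale, width // vae_scale
    seq_len = (latent_h // patch) * (latent_w // patch)
    channels = latent_channels * patch * patch
    return seq_len, channels


seq_len, channels = packed_latent_shape(1024, 1024)

# Image latents and control latents have the same packed shape ...
image_tokens = (seq_len, channels)
control_tokens = (seq_len, channels)

# ... and are concatenated along the channel axis, so the sequence length is
# unchanged while the transformer input width doubles.
concat_tokens = (seq_len, image_tokens[1] + control_tokens[1])

print(image_tokens)   # (4096, 64)
print(concat_tokens)  # (4096, 128)
```

Under these assumptions, a Control checkpoint or Control LoRA needs an input projection twice as wide as that of the base model.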

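Depth Control

Note: black-forest-labs/FLUX.1-Depth-dev is not a ControlNetModel model. ControlNet models are a separate component from the UNet/Transformer whose residuals are added to the actual underlying model. Depth Control is an alternate architecture that achieves effectively the same results as a ControlNet model would, by using channel-wise concatenation with the input control condition and ensuring the transformer learns structure control by following the condition as closely as possible.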

# !pip install git+https://github.com/huggingface/image_gen_aux
import torch
from diffusers import FluxControlPipeline
from diffusers.utils import load_image
from image_gen_aux import DepthPreprocessor

pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-Depth-dev", torch_dtype=torch.bfloat16).to("cuda")

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(control_image)[0].convert("RGB")

image = pipe(
    prompt=prompt,
    control_image=control_image,
    height=1024,
    width=1024,
    num_inference_steps=30,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("output.png")

Depth Control is also possible with a LoRA variant of this condition. The usage is as follows:

# !pip install git+https://github.com/huggingface/image_gen_aux
import torch
from diffusers import FluxControlPipeline
from diffusers.utils import load_image
from image_gen_aux import DepthPreprocessor

pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora")

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(control_image)[0].convert("RGB")

image = pipe(
    prompt=prompt,
    control_image=control_image,
    height=1024,
    width=1024,
    num_inference_steps=30,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("output.png")

Redux

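The Flux Redux pipeline is an adapter for FLUX.1 base models. It can be used with both flux-dev and flux-schnell for image-to-image generation.
You can first use the FluxPriorReduxPipeline to get the prompt_embeds and pooled_prompt_embeds, and then feed them into the FluxPipeline for image-to-image generation.
When using FluxPriorReduxPipeline with a base pipeline, you can set text_encoder=None and text_encoder_2=None in the base pipeline to save VRAM.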

import torch
from diffusers import FluxPriorReduxPipeline, FluxPipeline
from diffusers.utils import load_image
device = "cuda"
dtype = torch.bfloat16


repo_redux = "black-forest-labs/FLUX.1-Redux-dev"
repo_base = "black-forest-labs/FLUX.1-dev" 
pipe_prior_redux = FluxPriorReduxPipeline.from_pretrained(repo_redux, torch_dtype=dtype).to(device)
pipe = FluxPipeline.from_pretrained(
    repo_base, 
    text_encoder=None,
    text_encoder_2=None,
    torch_dtype=torch.bfloat16
).to(device)

image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/style_ziggy/img5.png")
pipe_prior_output = pipe_prior_redux(image)
images = pipe(
    guidance_scale=2.5,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0),
    **pipe_prior_output,
).images
images[0].save("flux-redux.png")

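Combining Flux Turbo LoRAs with Flux Control, Fill, and Redux

We can combine Flux Turbo LoRAs with Flux Control and other pipelines like Fill and Redux to enable few-step inference. The example below shows how to do that with the Flux Control LoRA for depth and the Turbo LoRA from ByteDance/Hyper-SD.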

from diffusers import FluxControlPipeline
from image_gen_aux import DepthPreprocessor
from diffusers.utils import load_image
from huggingface_hub import hf_hub_download
import torch

control_pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
control_pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora", adapter_name="depth")
control_pipe.load_lora_weights(
    hf_hub_download("ByteDance/Hyper-SD", "Hyper-FLUX.1-dev-8steps-lora.safetensors"), adapter_name="hyper-sd"
)
control_pipe.set_adapters(["depth", "hyper-sd"], adapter_weights=[0.85, 0.125])
control_pipe.enable_model_cpu_offload()

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(control_image)[0].convert("RGB")

image = control_pipe(
    prompt=prompt,
    control_image=control_image,
    height=1024,
    width=1024,
    num_inference_steps=8,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("output.png")

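Note about unload_lora_weights() when using Flux LoRAs

When unloading the Control LoRA weights, call pipe.unload_lora_weights(reset_to_overwritten_params=True) to reset pipe.transformer completely back to its original form. The resulting pipeline can then be used with methods like DiffusionPipeline.from_pipe(). More details about this argument are available in this PR.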
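
The call described above can be wrapped as follows. This is a minimal sketch: `unload_control_lora` is a hypothetical helper, and the commented usage assumes a Control LoRA was loaded into a FluxControlPipeline as in the earlier examples.

```python
def unload_control_lora(pipe):
    """Unload Control LoRA weights and reset the transformer to its original form."""
    # reset_to_overwritten_params=True is the argument named in the text above.
    pipe.unload_lora_weights(reset_to_overwritten_params=True)
    return pipe


# Usage sketch (not executed here; needs the model weights):
# control_pipe = FluxControlPipeline.from_pretrained(
#     "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
# )
# control_pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora")
# ... run depth-controlled inference ...
# unload_control_lora(control_pipe)
# pipe = FluxPipeline.from_pipe(control_pipe)  # reuse the components for text-to-image
```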

IP-Adapter

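Check out IP-Adapter to learn more about how IP-Adapters work.

An IP-Adapter lets you prompt Flux with images in addition to the text prompt. This is especially useful when you have reference images and need to describe complex concepts that are difficult to express through text alone.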

import torch
from diffusers import FluxPipeline
from diffusers.utils import load_image

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flux_ip_adapter_input.jpg").resize((1024, 1024))

pipe.load_ip_adapter(
    "XLabs-AI/flux-ip-adapter",
    weight_name="ip_adapter.safetensors",
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14"
)
pipe.set_ip_adapter_scale(1.0)

image = pipe(
    width=1024,
    height=1024,
    prompt="wearing sunglasses",
    negative_prompt="",
    true_cfg_scale=4.0,
    generator=torch.Generator().manual_seed(4444),
    ip_adapter_image=image,
).images[0]

image.save('flux_ip_adapter_output.jpg')

IP-Adapter examples with the prompt "wearing sunglasses"

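Optimize

Flux is a very large model and requires about 50GB of RAM/VRAM to load all the modeling components. Enable some of the optimizations below to lower the memory requirements.

Group offloading

Group offloading lowers VRAM usage by offloading groups of internal layers rather than the whole model or weights. You need to use apply_group_offloading() on all the model components of a pipeline. The offload_type parameter lets you switch between block-level and leaf-level offloading. Setting it to leaf_level offloads the lowest leaf-level parameters to the CPU instead of offloading at the module level.

On CUDA devices that support asynchronous data streaming, set use_stream=True to overlap data transfer and computation and accelerate inference.

It is possible to mix block-level and leaf-level offloading for different components in a pipeline.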

import torch
from diffusers import FluxPipeline
from diffusers.hooks import apply_group_offloading

model_id = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16
pipe = FluxPipeline.from_pretrained(
    model_id,
    torch_dtype=dtype,
)

apply_group_offloading(
    pipe.transformer,
    offload_type="leaf_level",
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    use_stream=True,
)
apply_group_offloading(
    pipe.text_encoder, 
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    offload_type="leaf_level",
    use_stream=True,
)
apply_group_offloading(
    pipe.text_encoder_2, 
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    offload_type="leaf_level",
    use_stream=True,
)
apply_group_offloading(
    pipe.vae, 
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    offload_type="leaf_level",
    use_stream=True,
)

prompt = "A cat wearing sunglasses and working as a lifeguard at pool."

generator = torch.Generator().manual_seed(181201)
image = pipe(
    prompt,
    width=576,
    height=1024,
    num_inference_steps=30,
    generator=generator
).images[0]
image
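
The example above applies leaf-level offloading to every component. Since block-level and leaf-level offloading can be mixed, a variation is sketched below that uses block-level groups for the transformer and leaf-level groups elsewhere. `offload_plan` and `apply_plan` are hypothetical helpers, `num_blocks_per_group=2` is an arbitrary illustrative value, and `use_stream` is left at its default for the block-level component as a conservative choice of this sketch.

```python
def offload_plan(block_level=("transformer",),
                 leaf_level=("text_encoder", "text_encoder_2", "vae"),
                 num_blocks_per_group=2):
    """Map pipeline component names to apply_group_offloading keyword arguments."""
    plan = {}
    for name in block_level:
        plan[name] = {
            "offload_type": "block_level",
            "num_blocks_per_group": num_blocks_per_group,
        }
    for name in leaf_level:
        # Streams are only enabled for leaf-level groups in this sketch.
        plan[name] = {"offload_type": "leaf_level", "use_stream": True}
    return plan


def apply_plan(pipe, plan, onload_device="cuda", offload_device="cpu"):
    """Apply the plan to each component of `pipe` (requires torch and diffusers)."""
    import torch
    from diffusers.hooks import apply_group_offloading

    for name, kwargs in plan.items():
        apply_group_offloading(
            getattr(pipe, name),
            onload_device=torch.device(onload_device),
            offload_device=torch.device(offload_device),
            **kwargs,
        )


# Usage with the pipeline from the example above (not executed here):
# apply_plan(pipe, offload_plan())
```

Keeping the plan as plain data makes it easy to inspect or adjust per component before any weights are moved.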

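Running FP16 inference

Flux can generate high-quality images using FP16 (for example, to accelerate inference on Turing/Volta GPUs), but it produces different outputs compared to FP32/BF16. The issue is that some activations in the text encoders have to be clipped when running in FP16, which affects the overall image. Forcing the text encoders to run in FP32 therefore removes this output difference. See here for details.

FP16 inference code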

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16) # can replace schnell with dev
# to run on low vram GPUs (i.e. between 4 and 32 GB VRAM)
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

pipe.to(torch.float16) # casting here instead of in the pipeline constructor because doing so in the constructor loads all models into CPU memory at once

prompt = "A cat holding a sign that says hello world"
out = pipe(
    prompt=prompt,
    guidance_scale=0.,
    height=768,
    width=1360,
    num_inference_steps=4,
    max_sequence_length=256,
).images[0]
out.save("image.png")
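
To see why some activations have to be clipped in FP16, it helps to look at the format's range: the largest finite half-precision value is 65504, whereas BF16 keeps the FP32 exponent range. The standalone sketch below illustrates that limit with Python's `struct` module; it is not what Diffusers does internally, and `fits_in_fp16` and `clip_to_fp16` are hypothetical helpers.

```python
import struct

FP16_MAX = 65504.0  # largest finite value in IEEE 754 half precision


def fits_in_fp16(value):
    """Return True if `value` can be packed as a finite half-precision float."""
    try:
        struct.pack("<e", value)
        return True
    except OverflowError:
        return False


def clip_to_fp16(value):
    """Clamp `value` into the finite FP16 range, as a clipping step would."""
    return max(-FP16_MAX, min(FP16_MAX, value))


activation = 70000.0
print(fits_in_fp16(activation))  # False
print(clip_to_fp16(activation))  # 65504.0
print(fits_in_fp16(1000.0))      # True
```

Any activation beyond the limit is altered by clipping, which is one way the FP16 output can drift from the FP32/BF16 result.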

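Quantization

Quantization helps reduce the memory requirements of very large models by storing model weights in a lower-precision data type. However, quantization may have a varying impact on image quality depending on the model.

Refer to the Quantization overview to learn more about the supported quantization backends and how to select one that fits your use case. The example below demonstrates how to load a quantized FluxPipeline for inference with bitsandbytes.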

import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, FluxTransformer2DModel, FluxPipeline
from transformers import BitsAndBytesConfig, T5EncoderModel

quant_config = BitsAndBytesConfig(load_in_8bit=True)
text_encoder_8bit = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder_2=text_encoder_8bit,
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

prompt = "a tiny astronaut hatching from an egg on the moon"
image = pipeline(prompt, guidance_scale=3.5, height=768, width=1360, num_inference_steps=50).images[0]
image.save("flux.png")
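
As a rough guide to what 8-bit loading saves, weights-only memory can be estimated as parameter count times bytes per parameter. The sketch below is back-of-the-envelope arithmetic: the parameter counts are approximate assumptions (about 12B for the Flux transformer and about 4.7B for the T5 text encoder), and the estimate ignores activations, quantization metadata, and framework overhead.

```python
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "float16": 2.0, "int8": 1.0}

# Approximate parameter counts (assumptions, for illustration only).
PARAMS = {"transformer": 12e9, "text_encoder_2": 4.7e9}


def weight_gb(num_params, dtype):
    """Weights-only memory in GB (1e9 bytes) for a given storage dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, count in PARAMS.items():
    print(f"{name}: {weight_gb(count, 'float16'):.1f} GB in float16, "
          f"{weight_gb(count, 'int8'):.1f} GB in int8")
```

Actual memory use will differ, so treat these figures as an order-of-magnitude check rather than a measurement.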

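Single File Loading for the FluxTransformer2DModel

FluxTransformer2DModel supports loading checkpoints in the original format shipped by Black Forest Labs. This is also useful when loading fine-tuned or quantized versions of the models that have been published by the community.

Depending on your GPU type, CUDA version, and `torch` version, `FP8` inference can be unstable. We recommend using the `optimum-quanto` library to run FP8 inference on your machine.

The following example demonstrates how to run Flux with less than 16GB of VRAM.

First install optimum-quanto:

pip install optimum-quanto

Then run the following example: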

import torch
from diffusers import FluxTransformer2DModel, FluxPipeline
from transformers import T5EncoderModel, CLIPTextModel
from optimum.quanto import freeze, qfloat8, quantize

bfl_repo = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16

transformer = FluxTransformer2DModel.from_single_file("https://huggingface.co/Kijai/flux-fp8/blob/main/flux1-dev-fp8.safetensors", torch_dtype=dtype)
quantize(transformer, weights=qfloat8)
freeze(transformer)

text_encoder_2 = T5EncoderModel.from_pretrained(bfl_repo, subfolder="text_encoder_2", torch_dtype=dtype)
quantize(text_encoder_2, weights=qfloat8)
freeze(text_encoder_2)

pipe = FluxPipeline.from_pretrained(bfl_repo, transformer=None, text_encoder_2=None, torch_dtype=dtype)
pipe.transformer = transformer
pipe.text_encoder_2 = text_encoder_2

pipe.enable_model_cpu_offload()

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    guidance_scale=3.5,
    output_type="pil",
    num_inference_steps=20,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]

image.save("flux-fp8-dev.png")


FluxPipeline

class diffusers.FluxPipeline

__call__

disable_vae_slicing

disable_vae_tiling

enable_vae_slicing

enable_vae_tiling

encode_prompt

FluxImg2ImgPipeline

class diffusers.FluxImg2ImgPipeline

__call__

disable_vae_slicing

disable_vae_tiling

enable_vae_slicing

enable_vae_tiling

encode_prompt

FluxInpaintPipeline

class diffusers.FluxInpaintPipeline

__call__

encode_prompt

FluxControlNetInpaintPipeline

class diffusers.FluxControlNetInpaintPipeline

__call__

encode_prompt

FluxControlNetImg2ImgPipeline

class diffusers.FluxControlNetImg2ImgPipeline

__call__

encode_prompt

FluxControlPipeline

class diffusers.FluxControlPipeline

__call__

disable_vae_slicing

disable_vae_tiling

enable_vae_slicing

enable_vae_tiling

encode_prompt

FluxControlImg2ImgPipeline

class diffusers.FluxControlImg2ImgPipeline

__call__

encode_prompt

FluxPriorReduxPipeline

class diffusers.FluxPriorReduxPipeline

__call__

encode_prompt

FluxFillPipeline

class diffusers.FluxFillPipeline

__call__

disable_vae_slicing

disable_vae_tiling

enable_vae_slicing

enable_vae_tiling

encode_prompt
