Diffusers

Latent Consistency Model

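Latent Consistency Models (LCMs) enable fast, high-quality image generation by directly predicting the reverse diffusion process in latent space rather than pixel space. In other words, where a typical diffusion model iteratively removes noise from a noisy image, an LCM tries to predict the noiseless image from the noisy image directly. Because it avoids the iterative sampling process, an LCM can generate high-quality images in 2-4 steps instead of 20-30 steps.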
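To make the step-count difference concrete, the sketch below subsamples a handful of timesteps from a longer reference schedule. It is a simplified illustration rather than the library's scheduler: the function name `few_step_timesteps` and the defaults (1000 training timesteps, a 50-step reference schedule) are assumptions chosen for the example.

```python
def few_step_timesteps(num_inference_steps, num_train_timesteps=1000, reference_steps=50):
    """Pick evenly spaced timesteps, ordered from most to least noisy.

    Assumed, simplified scheme: subsample a reference schedule of
    `reference_steps` timesteps down to `num_inference_steps` entries.
    """
    if not 1 <= num_inference_steps <= reference_steps:
        raise ValueError("num_inference_steps must be between 1 and reference_steps")
    stride = num_train_timesteps // reference_steps
    # Reference schedule (999, 979, ..., 19 with the defaults).
    reference = [i * stride - 1 for i in range(reference_steps, 0, -1)]
    # Evenly spaced indices into the reference schedule.
    indices = [(j * reference_steps) // num_inference_steps for j in range(num_inference_steps)]
    return [reference[i] for i in indices]


print(few_step_timesteps(4))        # the four timesteps a 4-step run would visit
print(len(few_step_timesteps(25)))  # a 25-step run visits 25 timesteps
```

Each selected timestep costs one forward pass of the denoising network, so going from 25 steps to 4 cuts the number of passes roughly sixfold.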

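LCMs are distilled from pretrained models, which requires roughly 32 A100 GPU hours of compute. To speed this up, LCM-LoRA trains a LoRA adapter, which has far fewer trainable parameters than the full model. Once trained, the LCM-LoRA can be plugged into a diffusion model.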
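As a rough illustration of why a LoRA adapter has fewer parameters to train, the sketch below compares a full weight matrix with its low-rank replacement. The layer size (1280 x 1280) and rank (64) are assumptions chosen for the arithmetic, not values taken from any LCM-LoRA checkpoint.

```python
def full_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for LoRA: B is (d_out x rank), A is (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, for illustration only.
d_out, d_in, rank = 1280, 1280, 64
full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.1%}")
```

For this hypothetical layer the adapter trains one tenth of the parameters, and the saving grows as the rank shrinks relative to the layer width.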

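This guide shows you how to use LCMs and LCM-LoRAs for fast inference on a variety of tasks, and how to use them together with other adapters such as ControlNet or T2I-Adapter.

LCMs and LCM-LoRAs are available for Stable Diffusion v1.5, Stable Diffusion XL, and the SSD-1B model. You can find their checkpoints in the Latent Consistency collection.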

Text-to-image

LCM

LCM-LoRA
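A minimal sketch of text-to-image generation with LCM-LoRA, following the same pattern as the inpainting example in this guide: replace the scheduler with LCMScheduler, then load the LCM-LoRA weights. The function name `generate_text_to_image`, the base checkpoint, the LCM-LoRA repository name, and the default `guidance_scale` are assumptions for illustration. Running it requires `torch`, `diffusers`, and a CUDA GPU.

```python
BASE_MODEL = "stabilityai/stable-diffusion-xl-base-1.0"  # assumed base checkpoint
LCM_LORA = "latent-consistency/lcm-lora-sdxl"            # assumed LCM-LoRA checkpoint


def generate_text_to_image(prompt, num_inference_steps=4, guidance_scale=1.0, seed=0):
    """Generate one image with an SDXL pipeline accelerated by LCM-LoRA."""
    # Heavy dependencies are imported locally, only when the function is called.
    import torch
    from diffusers import DiffusionPipeline, LCMScheduler

    pipe = DiffusionPipeline.from_pretrained(
        BASE_MODEL,
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    # Swap in the LCM scheduler, then load the LCM-LoRA weights.
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights(LCM_LORA)

    generator = torch.manual_seed(seed)
    return pipe(
        prompt=prompt,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        generator=generator,
    ).images[0]
```

Calling `generate_text_to_image("a watercolor painting of a lighthouse")` returns a PIL image that can be saved with `.save("output.png")`.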

Image-to-image

LCM

LCM-LoRA
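Image-to-image generation with LCM-LoRA can be sketched the same way, with an initial image passed alongside the prompt. The function name `generate_image_to_image`, the base checkpoint, and the default `strength` and `guidance_scale` values are assumptions for illustration. Running it requires `torch`, `diffusers`, and a CUDA GPU.

```python
def generate_image_to_image(prompt, init_image, strength=0.6,
                            num_inference_steps=4, guidance_scale=1.0, seed=0):
    """Transform `init_image` (a PIL image) according to `prompt` using LCM-LoRA."""
    # Heavy dependencies are imported locally, only when the function is called.
    import torch
    from diffusers import AutoPipelineForImage2Image, LCMScheduler

    pipe = AutoPipelineForImage2Image.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",  # assumed base checkpoint
        torch_dtype=torch.float16,
    ).to("cuda")

    # Swap in the LCM scheduler, then load the LCM-LoRA weights.
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

    generator = torch.manual_seed(seed)
    return pipe(
        prompt=prompt,
        image=init_image,
        strength=strength,  # how far the result may drift from init_image
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        generator=generator,
    ).images[0]
```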

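Inpainting

To use LCM-LoRA for inpainting, replace the scheduler with LCMScheduler and load the LCM-LoRA weights with the load_lora_weights() method. You can then use the pipeline as usual, passing a text prompt, an initial image, and a mask image, to generate an image in just 4 steps.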

import torch
from diffusers import AutoPipelineForInpainting, LCMScheduler
from diffusers.utils import load_image, make_image_grid

pipe = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

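# Swap the default scheduler for the LCM scheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Load the LCM-LoRA weights for Stable Diffusion v1.5
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")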

init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png")
mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png")

prompt = "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k"
generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt,
    image=init_image,
    mask_image=mask_image,
    generator=generator,
    num_inference_steps=4,
    guidance_scale=4,
).images[0]
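# Show the initial image, the mask, and the result side by side
make_image_grid([init_image, mask_image, image], rows=1, cols=3)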

Initial image

Generated image

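Adapters

LCMs are compatible with adapters such as LoRA, ControlNet, T2I-Adapter, and AnimateDiff. You can bring the speed of LCMs to these adapters to generate images in a particular style, or to condition the model on another input such as a canny image.

LoRA

LoRA adapters can be finetuned quickly to learn a new style from just a few images, then plugged into a pretrained model to generate images in that style.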

LCM

LCM-LoRA

ControlNet

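ControlNets are adapters that can be trained on a variety of inputs, such as canny edges, pose estimation, or depth. A ControlNet can be inserted into the pipeline to give the model additional conditioning and control for more precise generation.

You can find more ControlNet models trained on other inputs in lllyasviel's repository.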

LCM

LCM-LoRA
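A minimal sketch of combining a canny ControlNet with LCM-LoRA. The function name `generate_with_canny_controlnet`, the base checkpoint, the ControlNet checkpoint, and the default scales are assumptions for illustration. `canny_image` is expected to be a PIL image of precomputed canny edges. Running it requires `torch`, `diffusers`, and a CUDA GPU.

```python
def generate_with_canny_controlnet(prompt, canny_image, num_inference_steps=4,
                                   guidance_scale=1.5, conditioning_scale=0.8, seed=0):
    """Generate an image conditioned on canny edges, accelerated by LCM-LoRA."""
    # Heavy dependencies are imported locally, only when the function is called.
    import torch
    from diffusers import ControlNetModel, LCMScheduler, StableDiffusionControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny",  # assumed ControlNet checkpoint
        torch_dtype=torch.float16,
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",  # assumed base checkpoint
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    # Swap in the LCM scheduler, then load the LCM-LoRA weights.
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

    generator = torch.manual_seed(seed)
    return pipe(
        prompt,
        image=canny_image,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        controlnet_conditioning_scale=conditioning_scale,
        generator=generator,
    ).images[0]
```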

T2I-Adapter

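T2I-Adapter is an even more lightweight adapter than ControlNet. It gives a pretrained model an additional input to condition on. It is faster than ControlNet, but the results may be slightly worse.

You can find more T2I-Adapter checkpoints trained on other inputs in TencentArc's repository.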

LCM

LCM-LoRA

AnimateDiff

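AnimateDiff is an adapter that adds motion to an image. It works with most Stable Diffusion models, effectively turning them into "video generation" models. Getting good results from a video model usually requires generating multiple frames (16-24), which can be very slow with a regular Stable Diffusion model. LCM-LoRA speeds this up by needing only 4-8 steps per frame.

Load an AnimateDiffPipeline and pass it a MotionAdapter. Then replace the scheduler with LCMScheduler and combine the two LoRA adapters with the set_adapters() method. You can now pass a prompt to the pipeline and generate an animated image.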

import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, LCMScheduler
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5")
pipe = AnimateDiffPipeline.from_pretrained(
    "frankjoshua/toonyou_beta6",
    motion_adapter=adapter,
).to("cuda")

# Swap the default scheduler for the LCM scheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Load LCM-LoRA and a motion LoRA under separate adapter names
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5", adapter_name="lcm")
pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-in", weight_name="diffusion_pytorch_model.safetensors", adapter_name="motion-lora")

# Activate both adapters, each with its own weight
pipe.set_adapters(["lcm", "motion-lora"], adapter_weights=[0.55, 1.2])

prompt = "best quality, masterpiece, 1girl, looking at viewer, blurry background, upper body, contemporary, dress"
generator = torch.manual_seed(0)
frames = pipe(
    prompt=prompt,
    num_inference_steps=5,
    guidance_scale=1.25,
    cross_attention_kwargs={"scale": 1},
    num_frames=24,
    generator=generator
).frames[0]
export_to_gif(frames, "animation.gif")
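The `adapter_weights` passed to `set_adapters` control how strongly each LoRA contributes. The sketch below illustrates the idea on a toy 2x2 weight matrix, assuming each adapter weight linearly scales that adapter's low-rank update. It is a conceptual illustration with made-up matrices, not the library's internal implementation, and `combine_adapters` is a hypothetical helper.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def combine_adapters(base_weight, adapters, adapter_weights):
    """Return base_weight + sum_i w_i * (B_i @ A_i) for LoRA pairs (B_i, A_i)."""
    combined = [row[:] for row in base_weight]
    for (b, a), weight in zip(adapters, adapter_weights):
        delta = matmul(b, a)  # low-rank update of this adapter
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                combined[i][j] += weight * value
    return combined


# Toy values for illustration only.
base = [[1.0, 0.0], [0.0, 1.0]]
lcm = ([[1.0], [0.0]], [[0.0, 2.0]])     # B is 2x1, A is 1x2
motion = ([[0.0], [1.0]], [[1.0, 0.0]])  # B is 2x1, A is 1x2

print(combine_adapters(base, [lcm, motion], [0.55, 1.2]))
```

Lowering an adapter's weight shrinks its update toward zero, which is how the example above tones down the LCM adapter (0.55) while emphasizing the motion LoRA (1.2).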
