DeepCache

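DeepCache accelerates StableDiffusionPipeline and StableDiffusionXLPipeline by exploiting the U-Net architecture: it strategically caches and reuses high-level features while efficiently updating low-level features.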

First, install DeepCache:

pip install DeepCache

Then load and enable the DeepCacheSDHelper:

  import torch
  from diffusers import StableDiffusionPipeline
  pipe = StableDiffusionPipeline.from_pretrained('stable-diffusion-v1-5/stable-diffusion-v1-5', torch_dtype=torch.float16).to("cuda")

+ from DeepCache import DeepCacheSDHelper
+ helper = DeepCacheSDHelper(pipe=pipe)
+ helper.set_params(
+     cache_interval=3,
+     cache_branch_id=0,
+ )
+ helper.enable()

  image = pipe("a photo of an astronaut on a moon").images[0]

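The set_params method accepts two arguments: cache_interval and cache_branch_id. cache_interval is the frequency of feature caching, expressed as the number of steps between consecutive cache operations. cache_branch_id identifies which branch of the network (ordered from the shallowest to the deepest layer) performs the caching. A lower cache_branch_id or a larger cache_interval gives faster inference at the cost of some image quality (ablation experiments for these two hyperparameters can be found in the paper). Once these arguments are set, use the enable or disable methods to activate or deactivate the DeepCacheSDHelper.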
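To make the effect of cache_interval concrete, the minimal sketch below lists the denoising steps at which the cache would be refreshed with a full U-Net pass. It assumes a uniform schedule that refreshes at step 0 and then every cache_interval steps, with all other steps reusing cached features. The function full_compute_steps is a hypothetical illustration, not part of the DeepCache package, and the library's actual scheduling may differ in detail.

```python
def full_compute_steps(num_inference_steps: int, cache_interval: int) -> list[int]:
    """Return the step indices at which the cache is refreshed (full U-Net pass).

    Assumption: a uniform schedule with a refresh at step 0 and then every
    `cache_interval` steps. All remaining steps reuse the cached features.
    """
    if cache_interval < 1:
        raise ValueError("cache_interval must be >= 1")
    return list(range(0, num_inference_steps, cache_interval))


steps = full_compute_steps(num_inference_steps=50, cache_interval=3)
print(len(steps))  # 17 full passes out of 50 steps
print(steps[:4])   # [0, 3, 6, 9]
```

Under the same assumption, cache_interval=5 would leave only 10 full passes out of 50 steps, which is consistent with the speed versus quality trade-off described above.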

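You can find more generated samples (original pipeline vs DeepCache) and the corresponding inference latency in the WandB report. The prompts are randomly selected from the MS-COCO 2017 dataset.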

Benchmark

We measured how much DeepCache speeds up Stable Diffusion v2.1 with 50 inference steps on an NVIDIA RTX A5000, using different configurations of resolution, batch size, cache interval (I), and cache branch (B).

Resolution	Batch size	Original	DeepCache(I=3, B=0)	DeepCache(I=5, B=0)	DeepCache(I=5, B=1)
512	8	15.96	6.88 (2.32x)	5.03 (3.18x)	7.27 (2.20x)
512	4	8.39	3.60 (2.33x)	2.62 (3.21x)	3.75 (2.24x)
512	1	2.61	1.12 (2.33x)	0.81 (3.24x)	1.11 (2.35x)
768	8	43.58	18.99 (2.29x)	13.96 (3.12x)	21.27 (2.05x)
768	4	22.24	9.67 (2.30x)	7.10 (3.13x)	10.74 (2.07x)
768	1	6.33	2.72 (2.33x)	1.97 (3.21x)	2.98 (2.12x)
1024	8	101.95	45.57 (2.24x)	33.72 (3.02x)	53.00 (1.92x)
1024	4	49.25	21.86 (2.25x)	16.19 (3.04x)	25.78 (1.91x)
1024	1	13.83	6.07 (2.28x)	4.43 (3.12x)	7.15 (1.93x)
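
The factors in parentheses appear to be the ratio of the original value to the DeepCache value. The short check below works under that assumption, using numbers copied from the resolution 512 rows of the table. The function name speedup is illustrative only. Some cells may differ in the last digit when recomputed this way, because the published values are themselves rounded.

```python
def speedup(original: float, accelerated: float) -> float:
    """Ratio of the baseline value to the accelerated value, rounded to 2 decimals."""
    return round(original / accelerated, 2)


# Resolution 512, DeepCache(I=3, B=0), values taken from the table above.
print(speedup(15.96, 6.88))  # batch size 8 -> 2.32
print(speedup(8.39, 3.60))   # batch size 4 -> 2.33
print(speedup(2.61, 1.12))   # batch size 1 -> 2.33
```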
