Stable Diffusion XL
There is a notebook version of this tutorial here.
Overview
Stable Diffusion XL (SDXL) is a latent diffusion model for text-to-image generation. Compared to previous versions of Stable Diffusion, it improves the quality of generated images with a larger UNet.
🤗 Optimum extends Diffusers to support inference on the second generation of Neuron devices (powering Trainium and Inferentia 2). It aims to inherit the ease of use of Diffusers on Neuron.
Export to Neuron
To deploy an SDXL model, we will first compile it. We support exporting the following components of the pipeline to boost speed:
- Text encoder
- Second text encoder
- U-Net (three times larger than the UNet in the Stable Diffusion pipeline)
- VAE encoder
- VAE decoder
You can compile and export a Stable Diffusion XL checkpoint either via the CLI or via the NeuronStableDiffusionXLPipeline class.
Option 1: CLI
Here is an example of exporting the SDXL components with the Optimum CLI:
```bash
optimum-cli export neuron --model stabilityai/stable-diffusion-xl-base-1.0 \
  --batch_size 1 \
  --height 1024 `# height in pixels of generated image, e.g. 768, 1024` \
  --width 1024 `# width in pixels of generated image, e.g. 768, 1024` \
  --num_images_per_prompt 1 `# number of images to generate per prompt, defaults to 1` \
  --auto_cast matmul `# cast only matrix multiplication operations` \
  --auto_cast_type bf16 `# cast operations from FP32 to BF16` \
  sd_neuron_xl/
```
We recommend an inf2.8xlarge or larger instance for model compilation. You can also compile the model with the Optimum CLI on a CPU-only instance (it needs roughly 35 GB of memory), then run the pre-compiled model on an inf2.xlarge to reduce costs. In that case, don't forget to disable inference validation by adding the --disable-validation argument.
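For instance, a compile-only run on a CPU instance could look like the following. This simply mirrors the export command above with validation turned off; only the `--disable-validation` flag name comes from the note above, so treat its exact placement as an assumption:

```bash
optimum-cli export neuron --model stabilityai/stable-diffusion-xl-base-1.0 \
  --batch_size 1 --height 1024 --width 1024 \
  --num_images_per_prompt 1 \
  --auto_cast matmul --auto_cast_type bf16 \
  --disable-validation \
  sd_neuron_xl/
```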
Option 2: Python API
Here is an example of exporting the Stable Diffusion components with NeuronStableDiffusionXLPipeline:
```python
>>> from optimum.neuron import NeuronStableDiffusionXLPipeline

>>> model_id = "stabilityai/stable-diffusion-xl-base-1.0"
>>> compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
>>> input_shapes = {"batch_size": 1, "height": 1024, "width": 1024}
>>> stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained(model_id, export=True, **compiler_args, **input_shapes)

# Save locally or upload to the HuggingFace Hub
>>> save_directory = "sd_neuron_xl/"
>>> stable_diffusion_xl.save_pretrained(save_directory)
>>> stable_diffusion_xl.push_to_hub(
...     save_directory, repository_id="my-neuron-repo"
... )
```
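If you pushed the compiled artifacts to the Hub, they can be loaded back with `from_pretrained` just like a local directory. A minimal sketch, reusing the placeholder repository id `my-neuron-repo` from the example above:

```python
from optimum.neuron import NeuronStableDiffusionXLPipeline

# Load the pre-compiled pipeline directly from the Hub repository.
stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained("my-neuron-repo")
```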
Text-to-Image
With a pre-compiled SDXL model, you can now generate an image from a text prompt on Neuron:
```python
>>> from optimum.neuron import NeuronStableDiffusionXLPipeline

>>> stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl/")
>>> prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
>>> image = stable_diffusion_xl(prompt).images[0]
```
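The pipeline returns standard PIL images, so the result can be saved directly (the file name below is arbitrary):

```python
# `image` is a PIL.Image.Image
image.save("astronaut_in_jungle.png")
```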

Image-to-Image
With NeuronStableDiffusionXLImg2ImgPipeline, you can pass an initial image and a text prompt to condition the generated image:
```python
from optimum.neuron import NeuronStableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

prompt = "a dog running, lake, moat"
url = "https://huggingface.co/datasets/optimum/documentation-images/resolve/main/intel/openvino/sd_xl/castle_friedrich.png"
init_image = load_image(url).convert("RGB")

pipe = NeuronStableDiffusionXLImg2ImgPipeline.from_pretrained("sd_neuron_xl/")
image = pipe(prompt=prompt, image=init_image).images[0]
```
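As with the regular Diffusers img2img pipeline, a `strength` argument controls how far the output departs from the initial image; a minimal sketch with an illustrative value:

```python
# strength close to 1.0 keeps little of init_image and follows the prompt
# more; close to 0.0 keeps most of the initial image.
image = pipe(prompt=prompt, image=init_image, strength=0.7).images[0]
```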
| Image | Prompt | Output |
|---|---|---|
| ![]() | a dog running, lake, moat | ![]() |
Inpaint
With NeuronStableDiffusionXLInpaintPipeline, pass the original image and a mask of the area you want to replace in it. The masked area is then replaced with the content described in the prompt.
```python
from optimum.neuron import NeuronStableDiffusionXLInpaintPipeline
from diffusers.utils import load_image

img_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl-text2img.png"
mask_url = (
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl-inpaint-mask.png"
)

init_image = load_image(img_url).convert("RGB")
mask_image = load_image(mask_url).convert("RGB")
prompt = "A deep sea diver floating"

pipe = NeuronStableDiffusionXLInpaintPipeline.from_pretrained("sd_neuron_xl/")
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, strength=0.85, guidance_scale=12.5).images[0]
```
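The mask follows the usual Diffusers convention: white pixels are repainted and black pixels are kept. If your mask encodes the opposite, you can invert it with PIL before passing it in; a minimal sketch:

```python
from PIL import ImageOps

# Flip white/black so that the region to repaint becomes white.
mask_image = ImageOps.invert(mask_image.convert("L")).convert("RGB")
```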
| Image | Mask image | Prompt | Output |
|---|---|---|---|
| ![]() | ![]() | A deep sea diver floating | ![]() |
Refine image quality
SDXL includes a refiner model to denoise the low-noise stage images generated by the base model. There are two ways to use the refiner (both assume a refiner that has already been compiled; see the export sketch after this list):
- Use the base and refiner models together to produce a refined image.
- Use the base model to produce an image, and subsequently use the refiner model to add more details to it.
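Both flows below load a pre-compiled refiner from `sd_neuron_xl_refiner/`. Assuming the refiner checkpoint `stabilityai/stable-diffusion-xl-refiner-1.0` is exported the same way as the base model, the export could look like this (the flags simply mirror the base export above):

```bash
optimum-cli export neuron --model stabilityai/stable-diffusion-xl-refiner-1.0 \
  --batch_size 1 --height 1024 --width 1024 \
  --num_images_per_prompt 1 \
  --auto_cast matmul --auto_cast_type bf16 \
  sd_neuron_xl_refiner/
```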
Base + refiner model
```python
from optimum.neuron import NeuronStableDiffusionXLPipeline, NeuronStableDiffusionXLImg2ImgPipeline

prompt = "A majestic lion jumping from a big stone at night"

# The base model handles the first 80% of the denoising steps and
# outputs latents instead of a decoded image.
base = NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl/")
image = base(
    prompt=prompt,
    num_inference_steps=40,
    denoising_end=0.8,
    output_type="latent",
).images[0]
del base  # To avoid neuron device OOM

# The refiner picks up where the base left off (denoising_start matches
# denoising_end above) and completes the remaining 20% of the steps.
refiner = NeuronStableDiffusionXLImg2ImgPipeline.from_pretrained("sd_neuron_xl_refiner/")
image = refiner(
    prompt=prompt,
    num_inference_steps=40,
    denoising_start=0.8,
    image=image,
).images[0]
```

Base to refiner model
```python
from optimum.neuron import NeuronStableDiffusionXLPipeline, NeuronStableDiffusionXLImg2ImgPipeline

prompt = "A majestic lion jumping from a big stone at night"

base = NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl/")
image = base(prompt=prompt, output_type="latent").images[0]
del base  # To avoid neuron device OOM

refiner = NeuronStableDiffusionXLImg2ImgPipeline.from_pretrained("sd_neuron_xl_refiner/")
# The latent output has no batch dimension, so add one with `image[None, :]`.
image = refiner(prompt=prompt, image=image[None, :]).images[0]
```
| Base image | Refined image |
|---|---|
| ![]() | ![]() |
To avoid running out of memory on the Neuron device, it is suggested to finish all base inference and release the device memory before running the refiner.
NeuronStableDiffusionXLPipeline
class optimum.neuron.NeuronStableDiffusionXLPipeline
< source >( config: dict[str, typing.Any] configs: dict[str, 'PretrainedConfig'] neuron_configs: dict[str, 'NeuronDefaultConfig'] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: diffusers.schedulers.scheduling_utils.SchedulerMixin | None vae_decoder: torch.jit._script.ScriptModule | NeuronModelVaeDecoder text_encoder: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None text_encoder_2: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None unet: torch.jit._script.ScriptModule | NeuronModelUnet | None = None transformer: torch.jit._script.ScriptModule | NeuronModelTransformer | None = None vae_encoder: torch.jit._script.ScriptModule | NeuronModelVaeEncoder | None = None image_encoder: torch.jit._script.ScriptModule | None = None safety_checker: torch.jit._script.ScriptModule | None = None tokenizer: transformers.models.clip.tokenization_clip.CLIPTokenizer | transformers.models.t5.tokenization_t5.T5Tokenizer | None = None tokenizer_2: transformers.models.clip.tokenization_clip.CLIPTokenizer | None = None feature_extractor: transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor | None = None controlnet: torch.jit._script.ScriptModule | list[torch.jit._script.ScriptModule] | NeuronControlNetModel | NeuronMultiControlNetModel | None = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: bool | None = None model_save_dir: str | pathlib.Path | tempfile.TemporaryDirectory | None = None model_and_config_save_paths: dict[str, tuple[str, pathlib.Path]] | None = None )
NeuronStableDiffusionXLImg2ImgPipeline
class optimum.neuron.NeuronStableDiffusionXLImg2ImgPipeline
< source >( config: dict[str, typing.Any] configs: dict[str, 'PretrainedConfig'] neuron_configs: dict[str, 'NeuronDefaultConfig'] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: diffusers.schedulers.scheduling_utils.SchedulerMixin | None vae_decoder: torch.jit._script.ScriptModule | NeuronModelVaeDecoder text_encoder: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None text_encoder_2: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None unet: torch.jit._script.ScriptModule | NeuronModelUnet | None = None transformer: torch.jit._script.ScriptModule | NeuronModelTransformer | None = None vae_encoder: torch.jit._script.ScriptModule | NeuronModelVaeEncoder | None = None image_encoder: torch.jit._script.ScriptModule | None = None safety_checker: torch.jit._script.ScriptModule | None = None tokenizer: transformers.models.clip.tokenization_clip.CLIPTokenizer | transformers.models.t5.tokenization_t5.T5Tokenizer | None = None tokenizer_2: transformers.models.clip.tokenization_clip.CLIPTokenizer | None = None feature_extractor: transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor | None = None controlnet: torch.jit._script.ScriptModule | list[torch.jit._script.ScriptModule] | NeuronControlNetModel | NeuronMultiControlNetModel | None = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: bool | None = None model_save_dir: str | pathlib.Path | tempfile.TemporaryDirectory | None = None model_and_config_save_paths: dict[str, tuple[str, pathlib.Path]] | None = None )
NeuronStableDiffusionXLInpaintPipeline
class optimum.neuron.NeuronStableDiffusionXLInpaintPipeline
< source >( config: dict[str, typing.Any] configs: dict[str, 'PretrainedConfig'] neuron_configs: dict[str, 'NeuronDefaultConfig'] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: diffusers.schedulers.scheduling_utils.SchedulerMixin | None vae_decoder: torch.jit._script.ScriptModule | NeuronModelVaeDecoder text_encoder: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None text_encoder_2: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None unet: torch.jit._script.ScriptModule | NeuronModelUnet | None = None transformer: torch.jit._script.ScriptModule | NeuronModelTransformer | None = None vae_encoder: torch.jit._script.ScriptModule | NeuronModelVaeEncoder | None = None image_encoder: torch.jit._script.ScriptModule | None = None safety_checker: torch.jit._script.ScriptModule | None = None tokenizer: transformers.models.clip.tokenization_clip.CLIPTokenizer | transformers.models.t5.tokenization_t5.T5Tokenizer | None = None tokenizer_2: transformers.models.clip.tokenization_clip.CLIPTokenizer | None = None feature_extractor: transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor | None = None controlnet: torch.jit._script.ScriptModule | list[torch.jit._script.ScriptModule] | NeuronControlNetModel | NeuronMultiControlNetModel | None = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: bool | None = None model_save_dir: str | pathlib.Path | tempfile.TemporaryDirectory | None = None model_and_config_save_paths: dict[str, tuple[str, pathlib.Path]] | None = None )
Are there any other diffusion features you would like us to support in 🤗 Optimum-neuron? Please file an issue in the Optimum-neuron GitHub repo or discuss with us on the HuggingFace community forum, cheers 🤗!