Stable Diffusion
Overview
Stable Diffusion is a text-to-image *latent diffusion* model that builds on the original Stable Diffusion work led by Robin Rombach and Katherine Crowson from Stability AI and LAION.
🤗 Optimum extends Diffusers to support inference on the second generation of Neuron devices (powering Trainium and Inferentia 2). It aims to inherit the ease of use of Diffusers on Neuron.
Export to Neuron
To deploy the models, you will need to compile them to TorchScript optimized for AWS Neuron. For Stable Diffusion, there are four components that need to be exported to the .neuron format to boost performance:
- Text encoder
- U-Net
- VAE encoder
- VAE decoder
You can compile and export a Stable Diffusion checkpoint either with the CLI or with the NeuronStableDiffusionPipeline class.
Option 1: CLI
Here is an example of exporting the Stable Diffusion components with the Optimum CLI:
optimum-cli export neuron --model stabilityai/stable-diffusion-2-1-base \
--batch_size 1 \
--height 512 `# height in pixels of generated image, eg. 512, 768` \
--width 512 `# width in pixels of generated image, eg. 512, 768` \
--num_images_per_prompt 1 `# number of images to generate per prompt, defaults to 1` \
--auto_cast matmul `# cast only matrix multiplication operations` \
--auto_cast_type bf16 `# cast operations from FP32 to BF16` \
sd_neuron/
We recommend using an inf2.8xlarge or larger instance for model compilation. You can also compile the model with the Optimum CLI on a CPU-only instance (it needs about 35 GB of memory), then run the pre-compiled model on inf2.xlarge to reduce costs. In that case, don't forget to disable inference validation by adding the --disable-validation argument, as in the sketch below.
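For instance, a compilation run on a CPU-only instance could look like the following sketch (reusing the checkpoint and shapes from the example above):
optimum-cli export neuron --model stabilityai/stable-diffusion-2-1-base \
  --batch_size 1 \
  --height 512 \
  --width 512 \
  --auto_cast matmul \
  --auto_cast_type bf16 \
  --disable-validation `# skip inference validation, since a CPU-only host has no Neuron cores` \
  sd_neuron/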
Option 2: Python API
Here is an example of exporting the Stable Diffusion components with NeuronStableDiffusionPipeline:
To apply the optimized computation of U-Net attention scores, configure your environment with export NEURON_FUSE_SOFTMAX=1.
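If you prefer setting it from Python rather than the shell, a minimal sketch, assuming the variable is set before the export is triggered:
import os

# NEURON_FUSE_SOFTMAX must be visible to the Neuron compiler,
# so set it before calling any export/compilation APIs
os.environ["NEURON_FUSE_SOFTMAX"] = "1"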
Besides, don't hesitate to tweak the compilation configuration to find the best tradeoff between performance and accuracy for your use case. By default, we suggest casting FP32 matrix multiplication operations to BF16, which offers good performance with a modest sacrifice in accuracy. Check out the guide in the AWS Neuron documentation to better understand your compilation options.
>>> from optimum.neuron import NeuronStableDiffusionPipeline
>>> model_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"
>>> compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
>>> input_shapes = {"batch_size": 1, "height": 512, "width": 512}
>>> stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained(model_id, export=True, **compiler_args, **input_shapes)
# Save locally or upload to the HuggingFace Hub
>>> save_directory = "sd_neuron/"
>>> stable_diffusion.save_pretrained(save_directory)
>>> stable_diffusion.push_to_hub(
... save_directory, repository_id="my-neuron-repo"
... )
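Once exported, the compiled pipeline can be reloaded without recompiling. A minimal sketch, assuming the local directory and the my-neuron-repo repository from the snippet above exist:
>>> from optimum.neuron import NeuronStableDiffusionPipeline
>>> # Reload the pre-compiled pipeline from the local directory...
>>> stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained("sd_neuron/")
>>> # ...or from the Hub repository pushed above
>>> stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained("my-neuron-repo")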
Text-to-Image
The NeuronStableDiffusionPipeline class allows you to generate images from a text prompt on Neuron devices, similar to the experience with Diffusers.
With a pre-compiled Stable Diffusion model, you can now generate images from a prompt on Neuron:
>>> from optimum.neuron import NeuronStableDiffusionPipeline
>>> stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained("sd_neuron/")
>>> prompt = "a photo of an astronaut riding a horse on mars"
>>> image = stable_diffusion(prompt).images[0]
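The pipeline call also accepts the usual Diffusers generation arguments. A minimal sketch, assuming the standard negative_prompt, num_inference_steps, and guidance_scale parameters behave as in Diffusers:
>>> image = stable_diffusion(
...     prompt,
...     negative_prompt="blurry, low quality",  # what the model should avoid
...     num_inference_steps=30,                 # denoising steps
...     guidance_scale=7.5,                     # classifier-free guidance strength
... ).images[0]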

Image-to-Image
With the NeuronStableDiffusionImg2ImgPipeline class, you can generate a new image conditioned on a text prompt and an initial image.
import requests
from PIL import Image
from io import BytesIO
from optimum.neuron import NeuronStableDiffusionImg2ImgPipeline
# compile & save
model_id = "nitrosocke/Ghibli-Diffusion"
input_shapes = {"batch_size": 1, "height": 512, "width": 512}
pipeline = NeuronStableDiffusionImg2ImgPipeline.from_pretrained(model_id, export=True, **input_shapes)
pipeline.save_pretrained("sd_img2img/")
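# load an example input image and run img2img inference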
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))
prompt = "ghibli style, a fantasy landscape with snowcapped mountains, trees, lake with detailed reflection. sunlight and cloud in the sky, warm colors, 8K"
image = pipeline(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]
image.save("fantasy_landscape.png")
Image | Prompt | Output
---|---|---
*(input sketch)* | ghibli style, a fantasy landscape with snowcapped mountains, trees, lake with detailed reflection. sunlight and cloud in the sky, warm colors, 8K | *(generated image)*
Inpainting
With the NeuronStableDiffusionInpaintPipeline class, you can edit specific parts of an image by providing a mask and a text prompt.
import requests
from PIL import Image
from io import BytesIO
from optimum.neuron import NeuronStableDiffusionInpaintPipeline
model_id = "stable-diffusion-v1-5/stable-diffusion-inpainting"
input_shapes = {"batch_size": 1, "height": 512, "width": 512}
pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained(model_id, export=True, **input_shapes)
pipeline.save_pretrained("sd_inpaint/")
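# helper to fetch an image from a URL as RGB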
def download_image(url):
    response = requests.get(url)
    return Image.open(BytesIO(response.content)).convert("RGB")
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))
prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
image.save("cat_on_bench.png")
Image | Mask image | Prompt | Output
---|---|---|---
*(input image)* | *(mask image)* | Face of a yellow cat, high resolution, sitting on a park bench | *(generated image)*
NeuronStableDiffusionPipeline
class optimum.neuron.NeuronStableDiffusionPipeline
( config: dict[str, typing.Any] configs: dict[str, 'PretrainedConfig'] neuron_configs: dict[str, 'NeuronDefaultConfig'] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: diffusers.schedulers.scheduling_utils.SchedulerMixin | None vae_decoder: torch.jit._script.ScriptModule | NeuronModelVaeDecoder text_encoder: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None text_encoder_2: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None unet: torch.jit._script.ScriptModule | NeuronModelUnet | None = None transformer: torch.jit._script.ScriptModule | NeuronModelTransformer | None = None vae_encoder: torch.jit._script.ScriptModule | NeuronModelVaeEncoder | None = None image_encoder: torch.jit._script.ScriptModule | None = None safety_checker: torch.jit._script.ScriptModule | None = None tokenizer: transformers.models.clip.tokenization_clip.CLIPTokenizer | transformers.models.t5.tokenization_t5.T5Tokenizer | None = None tokenizer_2: transformers.models.clip.tokenization_clip.CLIPTokenizer | None = None feature_extractor: transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor | None = None controlnet: torch.jit._script.ScriptModule | list[torch.jit._script.ScriptModule] | NeuronControlNetModel | NeuronMultiControlNetModel | None = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: bool | None = None model_save_dir: str | pathlib.Path | tempfile.TemporaryDirectory | None = None model_and_config_save_paths: dict[str, tuple[str, pathlib.Path]] | None = None )
NeuronStableDiffusionImg2ImgPipeline
class optimum.neuron.NeuronStableDiffusionImg2ImgPipeline
( config: dict[str, typing.Any] configs: dict[str, 'PretrainedConfig'] neuron_configs: dict[str, 'NeuronDefaultConfig'] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: diffusers.schedulers.scheduling_utils.SchedulerMixin | None vae_decoder: torch.jit._script.ScriptModule | NeuronModelVaeDecoder text_encoder: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None text_encoder_2: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None unet: torch.jit._script.ScriptModule | NeuronModelUnet | None = None transformer: torch.jit._script.ScriptModule | NeuronModelTransformer | None = None vae_encoder: torch.jit._script.ScriptModule | NeuronModelVaeEncoder | None = None image_encoder: torch.jit._script.ScriptModule | None = None safety_checker: torch.jit._script.ScriptModule | None = None tokenizer: transformers.models.clip.tokenization_clip.CLIPTokenizer | transformers.models.t5.tokenization_t5.T5Tokenizer | None = None tokenizer_2: transformers.models.clip.tokenization_clip.CLIPTokenizer | None = None feature_extractor: transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor | None = None controlnet: torch.jit._script.ScriptModule | list[torch.jit._script.ScriptModule] | NeuronControlNetModel | NeuronMultiControlNetModel | None = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: bool | None = None model_save_dir: str | pathlib.Path | tempfile.TemporaryDirectory | None = None model_and_config_save_paths: dict[str, tuple[str, pathlib.Path]] | None = None )
NeuronStableDiffusionInpaintPipeline
class optimum.neuron.NeuronStableDiffusionInpaintPipeline
( config: dict[str, typing.Any] configs: dict[str, 'PretrainedConfig'] neuron_configs: dict[str, 'NeuronDefaultConfig'] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: diffusers.schedulers.scheduling_utils.SchedulerMixin | None vae_decoder: torch.jit._script.ScriptModule | NeuronModelVaeDecoder text_encoder: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None text_encoder_2: torch.jit._script.ScriptModule | NeuronModelTextEncoder | None = None unet: torch.jit._script.ScriptModule | NeuronModelUnet | None = None transformer: torch.jit._script.ScriptModule | NeuronModelTransformer | None = None vae_encoder: torch.jit._script.ScriptModule | NeuronModelVaeEncoder | None = None image_encoder: torch.jit._script.ScriptModule | None = None safety_checker: torch.jit._script.ScriptModule | None = None tokenizer: transformers.models.clip.tokenization_clip.CLIPTokenizer | transformers.models.t5.tokenization_t5.T5Tokenizer | None = None tokenizer_2: transformers.models.clip.tokenization_clip.CLIPTokenizer | None = None feature_extractor: transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor | None = None controlnet: torch.jit._script.ScriptModule | list[torch.jit._script.ScriptModule] | NeuronControlNetModel | NeuronMultiControlNetModel | None = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: bool | None = None model_save_dir: str | pathlib.Path | tempfile.TemporaryDirectory | None = None model_and_config_save_paths: dict[str, tuple[str, pathlib.Path]] | None = None )
Are there any other diffusion features you would like us to support in 🤗 Optimum-neuron? Please file an issue in the Optimum-neuron GitHub repo or discuss with us on HuggingFace's community forum, cheers 🤗!