IP-Adapter

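Overview

IP-Adapter is an image prompt adapter that can be plugged into diffusion models to enable image prompting without any changes to the underlying model. The adapter can also be reused with other models fine-tuned from the same base model, and it can be combined with other adapters such as ControlNet. The key idea behind IP-Adapter is the decoupled cross-attention mechanism: it adds a separate cross-attention layer for image features instead of using the same cross-attention layer for both text and image features. This allows the model to learn more image-specific features.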
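To make the decoupled cross-attention idea concrete, here is a minimal pure-Python sketch for a single query vector. The function names, the toy vectors, and the `scale` weight on the image branch (playing a role similar to `ip_adapter_scale`) are illustrative assumptions. The real adapter uses learned projections inside the UNet attention layers, which this sketch leaves out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def decoupled_cross_attention(query, text_kv, image_kv, scale=0.5):
    """Text and image features get separate attention passes; the outputs are summed."""
    text_out = attend(query, *text_kv)
    image_out = attend(query, *image_kv)
    return [t + scale * i for t, i in zip(text_out, image_out)]


# Toy example: one query, two text tokens, one image token (all values are made up).
query = [1.0, 0.0]
text_kv = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [3.0, 3.0]])
image_kv = ([[0.5, 0.5]], [[2.0, 4.0]])

with_image = decoupled_cross_attention(query, text_kv, image_kv, scale=0.5)
text_only = decoupled_cross_attention(query, text_kv, image_kv, scale=0.0)
```

With `scale=0.0` the result reduces to plain text cross-attention, which is why the image prompt can be weakened or switched off without touching the text branch.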

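🤗 Optimum extends Diffusers to support inference on the second generation of Neuron devices (which power Trainium and Inferentia 2). It aims to bring the ease of use of Diffusers to Neuron.

Export to Neuron

To deploy models, you need to compile them to TorchScript optimized for AWS Neuron.

You can compile and export a Stable Diffusion checkpoint either via the CLI or with the NeuronStableDiffusionPipeline class.

Option 1: CLI

Here is an example of exporting Stable Diffusion components with the Optimum CLI: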

optimum-cli export neuron --model stable-diffusion-v1-5/stable-diffusion-v1-5 \
    --ip_adapter_id h94/IP-Adapter \
    --ip_adapter_subfolder models \
    --ip_adapter_weight_name ip-adapter-full-face_sd15.bin \
    --ip_adapter_scale 0.5 \
    --batch_size 1 --height 512 --width 512 --num_images_per_prompt 1 \
    --auto_cast matmul --auto_cast_type bf16 ip_adapter_neuron/

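We recommend using an inf2.8xlarge or larger instance for model compilation. You can also compile the models with the Optimum CLI on a CPU-only instance (this needs about 35 GB of memory) and then run the pre-compiled models on an inf2.xlarge to reduce costs. In that case, remember to disable inference validation by adding the --disable-validation argument.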
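If the export is driven from a script, for example on a CPU-only build machine, the command above can be assembled in Python. This is only a sketch: the helper `build_export_command` is hypothetical, and it uses only the flags that appear in this section, appending `--disable-validation` when requested.

```python
import shlex


def build_export_command(output_dir, disable_validation=False):
    """Assemble the optimum-cli export command from this section as an argument list."""
    command = [
        "optimum-cli", "export", "neuron",
        "--model", "stable-diffusion-v1-5/stable-diffusion-v1-5",
        "--ip_adapter_id", "h94/IP-Adapter",
        "--ip_adapter_subfolder", "models",
        "--ip_adapter_weight_name", "ip-adapter-full-face_sd15.bin",
        "--ip_adapter_scale", "0.5",
        "--batch_size", "1", "--height", "512", "--width", "512",
        "--num_images_per_prompt", "1",
        "--auto_cast", "matmul", "--auto_cast_type", "bf16",
    ]
    if disable_validation:
        # Skip inference validation, e.g. when compiling on a CPU-only instance.
        command.append("--disable-validation")
    # The output directory is the final positional argument.
    command.append(output_dir)
    return command


command = build_export_command("ip_adapter_neuron/", disable_validation=True)
print(shlex.join(command))
# To actually run the export: subprocess.run(command, check=True)
```

Keeping the command as a list rather than a single string avoids shell quoting issues when it is later passed to `subprocess.run`.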

Option 2: Python API

Here is an example of exporting Stable Diffusion components with NeuronStableDiffusionPipeline:

from optimum.neuron import NeuronStableDiffusionPipeline

model_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"
compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
input_shapes = {"batch_size": 1, "height": 512, "width": 512}

stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained(
    model_id, 
    export=True, 
    ip_adapter_id="h94/IP-Adapter",
    ip_adapter_subfolder="models",
    ip_adapter_weight_name="ip-adapter-full-face_sd15.bin",
    ip_adapter_scale=0.5,
    **compiler_args, 
    **input_shapes,
)

# Save locally or upload to the HuggingFace Hub
save_directory = "ip_adapter_neuron/"
stable_diffusion.save_pretrained(save_directory)
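Once saved, the compiled artifacts can be reloaded without compiling again. A minimal sketch, assuming the directory written by `save_pretrained` above and a machine where the Neuron packages are installed; the helper name `load_compiled_pipeline` is hypothetical.

```python
def load_compiled_pipeline(save_directory="ip_adapter_neuron/"):
    """Reload a previously exported pipeline; export=True is omitted, so nothing is recompiled."""
    # Imported lazily so this module can still be imported on machines without Neuron packages.
    from optimum.neuron import NeuronStableDiffusionPipeline

    return NeuronStableDiffusionPipeline.from_pretrained(save_directory)
```

On a Neuron instance, `stable_diffusion = load_compiled_pipeline()` would then give a pipeline ready for inference.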

Text-to-Image

With ip_adapter_image as input

from optimum.neuron import NeuronStableDiffusionPipeline

model_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"
compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
input_shapes = {"batch_size": 1, "height": 512, "width": 512}

stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained(
    model_id, 
    export=True, 
    ip_adapter_id="h94/IP-Adapter",
    ip_adapter_subfolder="models",
    ip_adapter_weight_name="ip-adapter-full-face_sd15.bin",
    ip_adapter_scale=0.5,
    **compiler_args, 
    **input_shapes,
)

# Save locally or upload to the HuggingFace Hub
save_directory = "ip_adapter_neuron/"
stable_diffusion.save_pretrained(save_directory)
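The export above only prepares the pipeline. Passing the reference image at generation time might look like the following sketch. It assumes that the pipeline call accepts `ip_adapter_image` (as the heading of this subsection indicates) and that `load_image` from `diffusers.utils` is used to read the image. The helper name, the default seed, and the default step count are placeholders.

```python
def generate_with_image_prompt(pipeline, image_path, prompt, seed=0, steps=100):
    """Run text-to-image generation guided by a reference image."""
    # Lazy imports: torch and diffusers are only needed when the function runs.
    import torch
    from diffusers.utils import load_image

    reference = load_image(image_path)  # accepts a local path or a URL
    # A seeded CPU generator makes the result reproducible.
    generator = torch.Generator(device="cpu").manual_seed(seed)
    result = pipeline(
        prompt=prompt,
        ip_adapter_image=reference,
        num_inference_steps=steps,
        generator=generator,
    )
    return result.images[0]
```

A call such as `generate_with_image_prompt(stable_diffusion, "reference.png", "a polar bear sitting in a chair drinking a milkshake")` would return a PIL image, where `reference.png` stands in for your own file.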

With ip_adapter_image_embeds as input (encode the image first)

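import torch
from diffusers.utils import load_image

# Reference image for the adapter; replace the placeholder path with your own file or URL.
image = load_image("path/to/reference_image.png")

# Encode the reference image once and cache the embeddings on disk.
image_embeds = stable_diffusion.prepare_ip_adapter_image_embeds(
    ip_adapter_image=image,
    ip_adapter_image_embeds=None,
    device=None,
    num_images_per_prompt=1,
    do_classifier_free_guidance=True,
)
torch.save(image_embeds, "image_embeds.ipadpt")

# Reload the cached embeddings and generate with a fixed (arbitrary) seed.
image_embeds = torch.load("image_embeds.ipadpt")
generator = torch.Generator(device="cpu").manual_seed(0)
generated_image = stable_diffusion(
    prompt="a polar bear sitting in a chair drinking a milkshake",
    ip_adapter_image_embeds=image_embeds,
    negative_prompt="deformed, ugly, wrong proportion, low res, bad anatomy, worst quality, low quality",
    num_inference_steps=100,
    generator=generator,
).images[0]

# Save the generated image (not the reference image).
generated_image.save("polar_bear.png")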

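Are there other diffusion features you would like us to support in 🤗 Optimum-neuron? Please file an issue in the Optimum-neuron GitHub repository or discuss it with us on the Hugging Face community forum. Cheers 🤗!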
