Hotswapping adapters

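The idea behind hotswapping adapters is the following: we can already load multiple adapters at the same time, for example two LoRAs. Sometimes, however, we want to load one LoRA and then replace its weights in place with the LoRA weights of another adapter. The hotswap_adapter function makes this possible.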

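In general, this should be faster than deleting one adapter and loading the other adapter in its place, which is how the same result would be achieved without hotswapping. A further advantage is that hotswapping avoids recompilation if the PEFT model has already been compiled with torch.compile, which can save a lot of time.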

Example without torch.compile

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel
from peft.utils.hotswap import hotswap_adapter

model_id = ...
inputs = ...
device = ...
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

# load lora 0
model = PeftModel.from_pretrained(model, "<path-adapter-0>")  # placeholder path
with torch.inference_mode():
    output_adapter_0 = model(inputs).logits

# replace the "default" lora adapter with the new one
hotswap_adapter(model, "<path-adapter-1>", adapter_name="default", torch_device=device)  # placeholder path
with torch.inference_mode():
    output_adapter_1 = model(inputs).logits

Example with torch.compile

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel
from peft.utils.hotswap import hotswap_adapter, prepare_model_for_compiled_hotswap

model_id = ...
inputs = ...
device = ...
max_rank = ...  # maximum rank among all LoRA adapters that will be used
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

# load lora 0
model = PeftModel.from_pretrained(model, "<path-adapter-0>")  # placeholder path
# Prepare the model to allow hotswapping even if ranks/scalings of 2nd adapter differ.
# You can skip this step if all ranks and scalings are identical.
prepare_model_for_compiled_hotswap(model, target_rank=max_rank)
model = torch.compile(model)
with torch.inference_mode():
    output_adapter_0 = model(inputs).logits

# replace the "default" lora adapter with the new one
hotswap_adapter(model, "<path-adapter-1>", adapter_name="default", torch_device=device)  # placeholder path
with torch.inference_mode():
    output_adapter_1 = model(inputs).logits
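
The max_rank placeholder above has to cover every adapter that will later be swapped in. The following is a minimal sketch of how it might be derived, assuming each adapter directory contains an adapter_config.json with an "r" entry and an optional "rank_pattern" mapping of per-layer ranks. The helper names max_rank_from_config and max_rank_for_adapters are hypothetical and not part of PEFT.

```python
import json
from pathlib import Path


def max_rank_from_config(config: dict) -> int:
    """Return the largest LoRA rank declared in one adapter config dict."""
    ranks = [config["r"]]
    # Assumption: rank_pattern, if present, maps layer names to per-layer ranks.
    ranks.extend((config.get("rank_pattern") or {}).values())
    return max(ranks)


def max_rank_for_adapters(adapter_dirs) -> int:
    """Read adapter_config.json in each directory and return the overall maximum rank."""
    best = 0
    for adapter_dir in adapter_dirs:
        config_path = Path(adapter_dir) / "adapter_config.json"
        with open(config_path, encoding="utf-8") as f:
            best = max(best, max_rank_from_config(json.load(f)))
    return best
```

The result could then be passed as target_rank to prepare_model_for_compiled_hotswap before compiling.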

Caveats

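Hotswapping works with transformers models and diffusers models. However, there are some caveats:

Right now, only LoRA is properly supported.
It only works for the same PEFT method, so LoRA and LoHa cannot be swapped for one another.
The adapter being swapped in must target the same layers as the previous adapter, or a subset of them. It cannot target new layers. If possible, therefore, start with the adapter that targets the most layers.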
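
The layer-subset rule can be checked before any weights are loaded. The sketch below assumes each adapter's targets are available as an explicit list of module names (for example from its config). It does not handle targets given as a regular expression or other shorthand, and the function names can_hotswap and pick_first_adapter are hypothetical, not PEFT APIs.

```python
def can_hotswap(loaded_targets, incoming_targets) -> bool:
    """True if the incoming adapter only targets layers the loaded adapter already targets."""
    return set(incoming_targets) <= set(loaded_targets)


def pick_first_adapter(targets_by_adapter: dict) -> str:
    """Pick the adapter that targets the most layers, so it can be loaded first."""
    return max(targets_by_adapter, key=lambda name: len(targets_by_adapter[name]))


# Illustrative data only: adapter names and target lists are made up.
adapters = {
    "adapter-0": ["q_proj", "v_proj"],
    "adapter-1": ["q_proj", "k_proj", "v_proj"],
}
first = pick_first_adapter(adapters)
swappable = [name for name in adapters if can_hotswap(adapters[first], adapters[name])]
```

Loading the adapter with the most targets first only helps if the other adapters' targets are in fact subsets of it, so the subset check is still worth running for every adapter.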

peft.utils.hotswap.hotswap_adapter


( model, model_name_or_path, adapter_name, torch_device = None, **kwargs )

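Parameters

model (~PeftModel) — The PEFT model with the adapter already loaded.
model_name_or_path (str) — The name or path of the model to load the new adapter from.
adapter_name (str) — The name of the adapter to swap, e.g. "default". The name stays the same after swapping.
torch_device (str, optional, defaults to None) — The device to load the new adapter onto.
**kwargs (optional) — Additional keyword arguments used for loading the config and weights.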

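Replaces the old adapter data with the new adapter data, leaving everything else unchanged.

Currently, only LoRA is supported.

This function is useful when you want to replace the loaded adapter with a new adapter. The adapter name will remain the same, but the weights and other parameters will be swapped out.

An error is raised if the adapters are incompatible, e.g. because they target different layers or have different alpha values.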

Example

>>> import torch
>>> from transformers import AutoModelForCausalLM
>>> from peft import PeftModel
>>> from peft.utils.hotswap import hotswap_adapter

>>> model_id = ...
>>> inputs = ...
>>> device = ...
>>> model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

>>> # load lora 0
>>> model = PeftModel.from_pretrained(model, "path-adapter-0")
>>> model = torch.compile(model)  # optionally compile the model
>>> with torch.inference_mode():
...     output_adapter_0 = model(inputs).logits

>>> # replace the "default" lora adapter with the new one
>>> hotswap_adapter(model, "path-adapter-1", adapter_name="default", torch_device=device)
>>> with torch.inference_mode():
...     output_adapter_1 = model(inputs).logits

peft.utils.hotswap.hotswap_adapter_from_state_dict


( model: torch.nn.Module, state_dict: dict[str, torch.Tensor], adapter_name: str, config: LoraConfig, parameter_prefix: str = 'lora_' )

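Parameters

model (nn.Module) — The model with the adapter already loaded.
state_dict (dict[str, torch.Tensor]) — The state dict of the new adapter, which needs to be compatible (targeting the same modules etc.).
adapter_name (str) — The name of the adapter that should be hotswapped, e.g. "default". The name stays the same after swapping.
config (LoraConfig) — The config of the LoRA adapter. This is used to determine the scaling and rank of the adapter.
parameter_prefix (str, optional, defaults to "lora_") — The prefix used to identify the adapter's keys in the state dict. For LoRA, this would be "lora_" (the default).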

Raises

RuntimeError — If the old and the new adapter are not compatible, a RuntimeError is raised.

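Swaps the adapter weights in the model with the weights from state_dict.

Currently, only LoRA is supported.

This is a low-level function that assumes the adapters have already been checked for compatibility and that the state_dict has been correctly mapped to work with PEFT. For a high-level function that does this work for you, use hotswap_adapter instead.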
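
What "correctly mapped" involves is not spelled out here. One plausible reading, stated as an assumption, is that keys in a saved adapter checkpoint omit the adapter name (for example "...lora_A.weight"), while keys inside a loaded PEFT model include it (for example "...lora_A.default.weight"). The sketch below illustrates that renaming on a plain dictionary. The helper insert_adapter_name is hypothetical and is not a PEFT function.

```python
def insert_adapter_name(state_dict: dict, adapter_name: str, parameter_prefix: str = "lora_") -> dict:
    """Rename '<module>.lora_A.weight' style keys to '<module>.lora_A.<adapter_name>.weight'.

    Assumption: the adapter name belongs directly after the key component that
    starts with parameter_prefix. Keys without such a component are left unchanged.
    """
    remapped = {}
    for key, value in state_dict.items():
        parts = key.split(".")
        # Index of the first component such as "lora_A" or "lora_B", if any.
        idx = next((i for i, part in enumerate(parts) if part.startswith(parameter_prefix)), None)
        # Skip keys that already carry the adapter name, so the function is idempotent.
        already_named = idx is not None and parts[idx + 1 : idx + 2] == [adapter_name]
        if idx is not None and not already_named:
            parts.insert(idx + 1, adapter_name)
        remapped[".".join(parts)] = value
    return remapped
```

If the real key layout differs from this assumption, the mapping has to be adapted, which is one more reason to prefer hotswap_adapter whenever the adapter can be loaded from a path.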
