Custom models

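Some fine-tuning techniques, such as prompt tuning, are specific to language models. That means in 🤗 PEFT, it is assumed that a 🤗 Transformers model is being used. However, other fine-tuning techniques, like LoRA, are not restricted to specific model types.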

In this guide, we will see how LoRA can be applied to a multilayer perceptron, a computer vision model from the timm library, or a new 🤗 Transformers architecture.

Multilayer perceptron

Let's assume that we want to fine-tune a multilayer perceptron with LoRA. Here is the definition:

from torch import nn


class MLP(nn.Module):
    def __init__(self, num_units_hidden=2000):
        super().__init__()
        self.seq = nn.Sequential(
            nn.Linear(20, num_units_hidden),
            nn.ReLU(),
            nn.Linear(num_units_hidden, num_units_hidden),
            nn.ReLU(),
            nn.Linear(num_units_hidden, 2),
            nn.LogSoftmax(dim=-1),
        )

    def forward(self, X):
        return self.seq(X)

This is a straightforward multilayer perceptron with an input layer, a hidden layer, and an output layer.

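For this toy example, we choose an exceedingly large number of hidden units to highlight the efficiency gains from PEFT, but those gains are in line with more realistic examples.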

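There are a few linear layers in this model that could be tuned with LoRA. When working with common 🤗 Transformers models, PEFT knows which layers to apply LoRA to, but in this case, it is up to us as users to choose the layers. To determine the names of the layers to tune: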

print([(n, type(m)) for n, m in MLP().named_modules()])

This should print:

[('', __main__.MLP),
 ('seq', torch.nn.modules.container.Sequential),
 ('seq.0', torch.nn.modules.linear.Linear),
 ('seq.1', torch.nn.modules.activation.ReLU),
 ('seq.2', torch.nn.modules.linear.Linear),
 ('seq.3', torch.nn.modules.activation.ReLU),
 ('seq.4', torch.nn.modules.linear.Linear),
 ('seq.5', torch.nn.modules.activation.LogSoftmax)]

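Let's say we want to apply LoRA to the input layer and to the hidden layer, which are 'seq.0' and 'seq.2'. Moreover, let's assume we want to update the output layer without LoRA, which would be 'seq.4'. The corresponding config would be: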

from peft import LoraConfig

config = LoraConfig(
    target_modules=["seq.0", "seq.2"],
    modules_to_save=["seq.4"],
)
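For reference, the LoraConfig documentation describes how a list of target_modules is matched: a module is selected if its name equals an entry exactly, or if it ends with "." followed by that entry. The hypothetical helper matches_target below is a minimal stdlib sketch of that rule, applied to the names printed above; it approximates the documented behavior and is not PEFT's own code.

```python
# Simplified approximation of list-based target_modules matching:
# exact name match, or the module name ends with "." + entry.
# matches_target is a hypothetical helper, not part of PEFT.


def matches_target(module_name: str, targets: list[str]) -> bool:
    return any(
        module_name == target or module_name.endswith("." + target)
        for target in targets
    )


# Module names from the MLP listing above ("" is the root module).
names = ["", "seq", "seq.0", "seq.1", "seq.2", "seq.3", "seq.4", "seq.5"]
targets = ["seq.0", "seq.2"]

selected = [n for n in names if matches_target(n, targets)]
print(selected)  # ['seq.0', 'seq.2']
```

The suffix rule is what lets short names such as "q_proj", used later in this guide, select layers nested deep inside a transformers model.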

With that, we can create our PEFT model and check the fraction of parameters trained:

from peft import get_peft_model

model = MLP()
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
# prints trainable params: 56,164 || all params: 4,100,164 || trainable%: 1.369798866581922
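The "all params" figure can be reproduced by hand. The sketch below assumes LoraConfig's default rank of r=8 and that modules_to_save keeps an extra copy of 'seq.4' next to the original layer; the helper names are illustrative. It only cross-checks the total and does not attempt to reproduce the trainable count.

```python
# Rough parameter arithmetic for the MLP above.
# Assumptions: LoRA rank r=8 (LoraConfig default), LoRA adds no bias,
# and modules_to_save adds one full copy of the wrapped layer.


def linear_params(in_features: int, out_features: int) -> int:
    """Weight plus bias of an nn.Linear layer."""
    return in_features * out_features + out_features


def lora_params(in_features: int, out_features: int, r: int = 8) -> int:
    """LoRA A (r x in) and B (out x r) matrices."""
    return r * in_features + out_features * r


hidden = 2000
layers = {"seq.0": (20, hidden), "seq.2": (hidden, hidden), "seq.4": (hidden, 2)}

base = sum(linear_params(i, o) for i, o in layers.values())
lora = sum(lora_params(*layers[name]) for name in ("seq.0", "seq.2"))
saved_copy = linear_params(*layers["seq.4"])  # copy made by modules_to_save

total = base + lora + saved_copy
print(f"base: {base:,} | lora: {lora:,} | copy: {saved_copy:,} | total: {total:,}")
```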

Finally, we can use any training framework we like, or write our own fit loop, to train the peft_model.

For a complete example, check out this notebook.

timm models

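The timm library contains a large number of pretrained computer vision models. Those can also be fine-tuned with PEFT. Let's check out how this works in practice.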

To start, ensure that timm is installed in the Python environment:

python -m pip install -U timm

Next, we load a timm model for an image classification task:

import timm

num_classes = ...
model_id = "timm/poolformer_m36.sail_in1k"
model = timm.create_model(model_id, pretrained=True, num_classes=num_classes)

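Again, we need to decide which layers to apply LoRA to. Since LoRA supports 2D conv layers, and since those are a major building block of this model, we should apply LoRA to the 2D conv layers. To identify the names of those layers, let's look at all the layer names: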

print([(n, type(m)) for n, m in model.named_modules()])

This will print a very long list; we only show the first and last few entries:

[('', timm.models.metaformer.MetaFormer),
 ('stem', timm.models.metaformer.Stem),
 ('stem.conv', torch.nn.modules.conv.Conv2d),
 ('stem.norm', torch.nn.modules.linear.Identity),
 ('stages', torch.nn.modules.container.Sequential),
 ('stages.0', timm.models.metaformer.MetaFormerStage),
 ('stages.0.downsample', torch.nn.modules.linear.Identity),
 ('stages.0.blocks', torch.nn.modules.container.Sequential),
 ('stages.0.blocks.0', timm.models.metaformer.MetaFormerBlock),
 ('stages.0.blocks.0.norm1', timm.layers.norm.GroupNorm1),
 ('stages.0.blocks.0.token_mixer', timm.models.metaformer.Pooling),
 ('stages.0.blocks.0.token_mixer.pool', torch.nn.modules.pooling.AvgPool2d),
 ('stages.0.blocks.0.drop_path1', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.0.layer_scale1', timm.models.metaformer.Scale),
 ('stages.0.blocks.0.res_scale1', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.0.norm2', timm.layers.norm.GroupNorm1),
 ('stages.0.blocks.0.mlp', timm.layers.mlp.Mlp),
 ('stages.0.blocks.0.mlp.fc1', torch.nn.modules.conv.Conv2d),
 ('stages.0.blocks.0.mlp.act', torch.nn.modules.activation.GELU),
 ('stages.0.blocks.0.mlp.drop1', torch.nn.modules.dropout.Dropout),
 ('stages.0.blocks.0.mlp.norm', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.0.mlp.fc2', torch.nn.modules.conv.Conv2d),
 ('stages.0.blocks.0.mlp.drop2', torch.nn.modules.dropout.Dropout),
 ('stages.0.blocks.0.drop_path2', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.0.layer_scale2', timm.models.metaformer.Scale),
 ('stages.0.blocks.0.res_scale2', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.1', timm.models.metaformer.MetaFormerBlock),
 ('stages.0.blocks.1.norm1', timm.layers.norm.GroupNorm1),
 ('stages.0.blocks.1.token_mixer', timm.models.metaformer.Pooling),
 ('stages.0.blocks.1.token_mixer.pool', torch.nn.modules.pooling.AvgPool2d),
 ...
 ('head.global_pool.flatten', torch.nn.modules.linear.Identity),
 ('head.norm', timm.layers.norm.LayerNorm2d),
 ('head.flatten', torch.nn.modules.flatten.Flatten),
 ('head.drop', torch.nn.modules.linear.Identity),
 ('head.fc', torch.nn.modules.linear.Linear)]

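Upon closer inspection, we see that the 2D conv layers have names such as "stages.0.blocks.0.mlp.fc1" and "stages.0.blocks.0.mlp.fc2". How can we match those layer names specifically? You can write a regular expression to match the layer names. For our case, the regex r".*\.mlp\.fc\d" should do the job.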
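A quick way to test such a pattern before building the PEFT model is to run it against a few names from the listing with Python's re module. The LoraConfig documentation states that a string target_modules is matched as a full regex match, so re.fullmatch is used here; the sample names are copied from the output above.

```python
import re

# A string target_modules is treated as a regex that must match the
# whole module name, hence re.fullmatch rather than re.search.
pattern = r".*\.mlp\.fc\d"

# A few module names taken from the listing above.
names = [
    "stem.conv",
    "stages.0.blocks.0.mlp",
    "stages.0.blocks.0.mlp.fc1",
    "stages.0.blocks.0.mlp.act",
    "stages.0.blocks.0.mlp.fc2",
    "head.fc",
]

selected = [n for n in names if re.fullmatch(pattern, n)]
print(selected)  # ['stages.0.blocks.0.mlp.fc1', 'stages.0.blocks.0.mlp.fc2']
```

Note that "stem.conv", although it is also a Conv2d layer, is not selected by this pattern.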

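Furthermore, as in the first example, we should ensure that the output layer, in this case the classification head, is also updated. Looking at the end of the list printed above, we can see that it's named 'head.fc'. With that in mind, here is our LoRA config: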

config = LoraConfig(target_modules=r".*\.mlp\.fc\d", modules_to_save=["head.fc"])

Then we only need to create the PEFT model by passing our base model and the config to get_peft_model:

peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
# prints trainable params: 1,064,454 || all params: 56,467,974 || trainable%: 1.88505789139876

This shows us that we only need to train less than 2% of all parameters, which is a huge efficiency gain.

For a complete example, check out this notebook.

New transformers architectures

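When new popular transformers architectures are released, we do our best to quickly add them to PEFT. If you come across a transformers model that is not supported out of the box, don't worry, it will most likely still work if the config is set correctly. Specifically, you have to identify the layers that should be adapted and set them correctly when initializing the corresponding config class, e.g. LoraConfig. Here are some tips to help with this.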

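As a first step, it is a good idea to check the existing models for inspiration. You can find them inside of constants.py in the PEFT repository. Often, you'll find a similar architecture that uses the same names. For example, if the new model architecture is a variation of the "mistral" model and you want to apply LoRA, you can see that the entry for "mistral" in TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING contains ["q_proj", "v_proj"]. This tells you that for "mistral" models, the target_modules for LoRA should be ["q_proj", "v_proj"]: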

from peft import LoraConfig, get_peft_model

my_mistral_model = ...
config = LoraConfig(
    target_modules=["q_proj", "v_proj"],
    # add other LoRA arguments here (e.g. r, lora_alpha)
)
peft_model = get_peft_model(my_mistral_model, config)

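If that doesn't help, check the existing modules in your model architecture with the named_modules method and try to identify the attention layers, especially the key, query, and value layers. Those will often have names such as c_attn, query, q_proj, etc. The key layer is not always adapted, and ideally, you should check whether including it results in better performance.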

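Additionally, linear layers are common targets to be adapted (e.g. in the QLoRA paper, the authors suggest adapting them as well). Their names will often contain the strings fc or dense.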
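When scanning a long module listing, a small filter can shortlist candidates based on the naming hints above. The helper candidate_targets and the sample module names are hypothetical; with a real model you would feed it [name for name, _ in model.named_modules()]. The result is only a starting point to confirm by inspecting the model, not a definitive choice of target_modules.

```python
# Hypothetical helper: shortlist possible LoRA targets by the leaf name of
# each module, using the naming hints for attention and linear layers.
ATTENTION_HINTS = ("c_attn", "query", "key", "value", "q_proj", "k_proj", "v_proj")
LINEAR_HINTS = ("fc", "dense")


def candidate_targets(module_names):
    """Return sorted unique leaf names that look like attention or linear layers."""
    found = set()
    for name in module_names:
        leaf = name.rsplit(".", 1)[-1]
        if leaf in ATTENTION_HINTS or any(hint in leaf for hint in LINEAR_HINTS):
            found.add(leaf)
    return sorted(found)


# Made-up module names for illustration only.
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.fc1",
    "model.layers.0.mlp.dense_out",
    "model.layers.0.norm",
]
print(candidate_targets(names))
```

Because leaf names are returned, the result can be passed straight to target_modules, where the suffix matching selects every layer with that name.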

If you want to add a new model to PEFT, please create an entry in constants.py and open a pull request on the repository. Don't forget to update the README as well.

Verify parameters and layers

You can verify whether you've correctly applied a PEFT method to your model in a few ways.

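Check the fraction of parameters that are trainable with the print_trainable_parameters() method. If this number is lower or higher than expected, check the model repr by printing the model. This shows the names of all the layer types in the model. Ensure that only the intended target layers are replaced by the adapter layers. For example, if LoRA is applied to nn.Linear layers, then you should only see lora.Linear layers being used.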

peft_model.print_trainable_parameters()
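The figures reported by print_trainable_parameters() essentially come from counting the parameters with requires_grad=True and dividing by the total count. In PyTorch you could get the same counts with sum(p.numel() for p in model.parameters() if p.requires_grad); the stdlib sketch below uses made-up records so the arithmetic is easy to follow. The record names are hypothetical, and PEFT's own method may apply extra adjustments for special parameter types that this sketch ignores.

```python
# Sketch of the ratio print_trainable_parameters() reports, computed from
# hypothetical (parameter name, number of elements, requires_grad) records.


def trainable_summary(records):
    """Return (trainable count, total count, trainable percentage)."""
    trainable = sum(numel for _, numel, requires_grad in records if requires_grad)
    total = sum(numel for _, numel, _ in records)
    return trainable, total, 100 * trainable / total


records = [
    ("base.weight", 9_900, False),  # frozen base weights
    ("base.lora_A.default.weight", 50, True),  # adapter weights
    ("base.lora_B.default.weight", 50, True),
]
trainable, total, pct = trainable_summary(records)
print(f"trainable params: {trainable:,} || all params: {total:,} || trainable%: {pct}")
```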

Another way to view the adapted layers is to use the targeted_module_names attribute to list the name of each module that was adapted.

print(peft_model.targeted_module_names)

Unsupported module types

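Methods like LoRA only work if the target modules are supported by PEFT. For example, it's possible to apply LoRA to nn.Linear and nn.Conv2d layers, but not to nn.LSTM. If you find that a layer class you want to apply PEFT to is not supported, you can: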

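define a custom mapping to dynamically dispatch custom modules in LoRA
open an issue and request the feature; maintainers will implement it, or guide you on how to implement it yourself if demand for this module type is sufficiently high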

Experimental support for dynamic dispatch of custom modules in LoRA

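This feature is experimental and subject to change, depending on its reception by the community. We will introduce a public and stable API if there is significant demand for it.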

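PEFT supports an experimental API for custom module types for LoRA. Let's assume you have a LoRA implementation for LSTMs. Normally, you would not be able to tell PEFT to use it, even if it would theoretically work with PEFT. However, this is possible with dynamic dispatch of custom layers.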

The experimental API currently looks like this:

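from peft import LoraConfig, get_peft_model
from torch import nn


class MyLoraLSTMLayer:
    ...  # your LoRA implementation for nn.LSTM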

base_model = ...  # load the base model that uses LSTMs

# add the LSTM layer names to target_modules
config = LoraConfig(..., target_modules=["lstm"])
# define a mapping from base layer type to LoRA layer type
custom_module_mapping = {nn.LSTM: MyLoraLSTMLayer}
# register the new mapping
config._register_custom_module(custom_module_mapping)
# after registration, create the PEFT model
peft_model = get_peft_model(base_model, config)
# do training

When you call get_peft_model(), you will see a warning because PEFT does not recognize the target module type. In this case, you can ignore this warning.

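When you provide a custom mapping, PEFT first checks the base model's layers against the custom mapping and dispatches to the custom LoRA layer type if there is a match. If there is no match, PEFT checks the built-in LoRA layer types for a match.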

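Therefore, this feature can also be used to override the existing dispatch logic, e.g. if you want to use your own LoRA layer for nn.Linear instead of the one provided by PEFT.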
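The lookup order described above can be illustrated with a toy model. Every class below is a hypothetical stand-in rather than a real PyTorch or PEFT class, the function only returns the LoRA class instead of constructing the layer, and raising TypeError when nothing matches is an assumption of this sketch, not PEFT's behavior.

```python
# Toy model of the dispatch order: custom mapping first, then built-ins.
# All classes are hypothetical stand-ins, not real PyTorch or PEFT classes.


class Linear:
    """Stand-in for nn.Linear."""


class LSTM:
    """Stand-in for nn.LSTM."""


class BuiltinLoraLinear:
    """Stand-in for PEFT's built-in LoRA layer for Linear."""


class MyLoraLSTMLayer:
    """Stand-in for a user-provided LoRA layer for LSTM."""


class MyLoraLinear:
    """Stand-in for a user-provided LoRA layer overriding Linear."""


BUILTIN_MAPPING = {Linear: BuiltinLoraLinear}


def dispatch(base_layer, custom_mapping):
    """Return the LoRA layer type to use for base_layer."""
    # 1. The custom mapping takes precedence.
    for base_type, lora_type in custom_mapping.items():
        if isinstance(base_layer, base_type):
            return lora_type
    # 2. Fall back to the built-in LoRA layer types.
    for base_type, lora_type in BUILTIN_MAPPING.items():
        if isinstance(base_layer, base_type):
            return lora_type
    raise TypeError(f"unsupported layer type: {type(base_layer).__name__}")


print(dispatch(LSTM(), {LSTM: MyLoraLSTMLayer}).__name__)  # MyLoraLSTMLayer
print(dispatch(Linear(), {LSTM: MyLoraLSTMLayer}).__name__)  # BuiltinLoraLinear
print(dispatch(Linear(), {Linear: MyLoraLinear}).__name__)  # MyLoraLinear
```

The last call shows the override case: because the custom mapping is consulted first, a user-provided entry for Linear wins over the built-in one.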

When creating your custom LoRA module, please follow the same rules as the existing LoRA modules. Some important constraints to consider:

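The custom module should inherit from nn.Module and peft.tuners.lora.layer.LoraLayer.
The __init__ method of the custom module should have the positional arguments base_layer and adapter_name. After these, there are additional **kwargs that you are free to use or ignore.
The learnable parameters should be stored in an nn.ModuleDict or nn.ParameterDict, where the key corresponds to the name of the specific adapter (remember that a model can have more than one adapter at a time).
The names of these learnable parameter attributes should start with "lora_", e.g. self.lora_new_param = ....
Some methods are optional, e.g. you only need to implement merge and unmerge if you want to support weight merging.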

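Currently, the information about the custom module is not persisted when you save the model. When loading the model, you have to register the custom modules again.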

from peft import LoraConfig, PeftModel
from torch import nn

# saving works as always and includes the parameters of the custom modules
peft_model.save_pretrained(<model-path>)

# loading the model later:
base_model = ...
# load the LoRA config that you saved earlier
config = LoraConfig.from_pretrained(<model-path>)
# register the custom module again, the same way as the first time
custom_module_mapping = {nn.LSTM: MyLoraLSTMLayer}
config._register_custom_module(custom_module_mapping)
# pass the config instance to from_pretrained:
peft_model = PeftModel.from_pretrained(base_model, <model-path>, config=config)

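If you use this feature and find it useful, or if you encounter problems, let us know by creating an issue or a discussion on GitHub. This allows us to estimate the demand for this feature and add a public API if it is sufficiently high.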
