LoRA methods

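A popular way to train large models efficiently is to insert smaller trainable matrices (typically in the attention blocks) that are a low-rank decomposition of the delta weight matrix to be learned during fine-tuning. The pretrained model's original weight matrix is frozen, and only the smaller matrices are updated during training. This reduces the number of trainable parameters, which lowers memory usage and training time, both of which can be very expensive for large models.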
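
To make the savings concrete, here is a minimal sketch that counts the trainable parameters for a single weight matrix. The helper name `lora_parameter_counts`, the 768 x 768 layer size, and the rank of 16 are illustrative assumptions, not values taken from this guide's configuration.

```python
def lora_parameter_counts(d_out, d_in, rank):
    """Return (full, low_rank) trainable-parameter counts for one weight matrix.

    A full update learns a d_out x d_in delta matrix. A low-rank update
    learns B (d_out x rank) and A (rank x d_in) so that delta = B @ A.
    """
    full = d_out * d_in
    low_rank = rank * (d_out + d_in)
    return full, low_rank


# Hypothetical sizes: a 768 x 768 projection with rank 16.
full, low_rank = lora_parameter_counts(768, 768, 16)
print(f"full update: {full:,} parameters")
print(f"low-rank update: {low_rank:,} parameters ({low_rank / full:.2%} of full)")
```

The relative saving grows as the rank shrinks compared with the matrix dimensions, which is why small ranks are attractive for large layers.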

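There are several ways to express a weight matrix as a low-rank decomposition, but Low-Rank Adaptation (LoRA) is the most common. The PEFT library supports several other LoRA variants, such as Low-Rank Hadamard Product (LoHa), Low-Rank Kronecker Product (LoKr), and Adaptive Low-Rank Adaptation (AdaLoRA). You can learn how these methods work conceptually in the Adapters guide. If you're interested in applying these methods to other tasks and use cases, such as semantic segmentation or token classification, take a look at our notebook collection!

In addition, PEFT supports the X-LoRA (Mixture of LoRA Experts) method.

This guide shows you how to quickly train an image classification model with a low-rank decomposition method to identify the class of food shown in an image.

Some familiarity with the general process of training an image classification model is very helpful and lets you focus on the low-rank decomposition methods. If you're new to this, we recommend reading the image classification guide in the Transformers documentation first. When you're ready, come back and see how easy it is to drop PEFT into your training!

Before you begin, make sure you have all the necessary libraries installed.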

pip install -q peft transformers datasets torch torchvision

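Dataset

In this guide, you'll use the Food-101 dataset, which contains images of 101 food classes (take a look at the dataset viewer to get a better idea of what the dataset looks like).

Load the dataset with the load_dataset function.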

from datasets import load_dataset

ds = load_dataset("food101")

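Each food class is labeled with an integer. To make it easier to understand what these integers represent, you'll create a label2id and an id2label dictionary that map between the integers and their class labels.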

labels = ds["train"].features["label"].names
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
    label2id[label] = i
    id2label[i] = label

id2label[2]
"baklava"

Load an image processor to properly resize and normalize the pixel values of the training and evaluation images.

from transformers import AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")

You can also use the image processor to prepare some transformation functions for data augmentation and pixel scaling.

from torchvision.transforms import (
    CenterCrop,
    Compose,
    Normalize,
    RandomHorizontalFlip,
    RandomResizedCrop,
    Resize,
    ToTensor,
)

normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
train_transforms = Compose(
    [
        RandomResizedCrop(image_processor.size["height"]),
        RandomHorizontalFlip(),
        ToTensor(),
        normalize,
    ]
)

val_transforms = Compose(
    [
        Resize(image_processor.size["height"]),
        CenterCrop(image_processor.size["height"]),
        ToTensor(),
        normalize,
    ]
)

def preprocess_train(example_batch):
    example_batch["pixel_values"] = [train_transforms(image.convert("RGB")) for image in example_batch["image"]]
    return example_batch

def preprocess_val(example_batch):
    example_batch["pixel_values"] = [val_transforms(image.convert("RGB")) for image in example_batch["image"]]
    return example_batch
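
As a rough illustration of what ToTensor followed by Normalize does to a single channel value, the sketch below reproduces the arithmetic in plain Python. The helper `normalize_pixel` is hypothetical, and the mean and standard deviation of 0.5 are assumptions for illustration; the real values come from image_processor.image_mean and image_processor.image_std.

```python
def normalize_pixel(value, mean=0.5, std=0.5):
    """Scale an 8-bit channel value to [0, 1], then standardize it."""
    scaled = value / 255.0          # what ToTensor does for 8-bit images
    return (scaled - mean) / std    # what Normalize does per channel


# Darkest, mid-gray, and brightest 8-bit values.
for raw in (0, 128, 255):
    print(raw, "->", round(normalize_pixel(raw), 4))
```

Under these assumed statistics, the 8-bit range 0 to 255 is mapped onto roughly -1 to 1.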

Define the training and validation datasets, and use the set_transform function to apply the transformations on the fly.

train_ds = ds["train"]
val_ds = ds["validation"]

train_ds.set_transform(preprocess_train)
val_ds.set_transform(preprocess_val)

Finally, you'll need a data collator to create a batch of training and evaluation data and convert the labels to torch.tensor objects.

import torch

def collate_fn(examples):
    pixel_values = torch.stack([example["pixel_values"] for example in examples])
    labels = torch.tensor([example["label"] for example in examples])
    return {"pixel_values": pixel_values, "labels": labels}

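Model

Now let's load a pretrained model to use as the base model. This guide uses the google/vit-base-patch16-224-in21k model, but you can use any image classification model you want. Pass the label2id and id2label dictionaries to the model so it knows how to map the integer labels to their class labels. If you're fine-tuning a checkpoint that has already been fine-tuned, you can optionally pass ignore_mismatched_sizes=True.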

from transformers import AutoModelForImageClassification, TrainingArguments, Trainer

model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    label2id=label2id,
    id2label=id2label,
    ignore_mismatched_sizes=True,
)

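PEFT configuration and model

Every PEFT method requires a configuration that holds all the parameters specifying how the PEFT method should be applied. Once the configuration is set up, pass it to the get_peft_model() function along with the base model to create a trainable PeftModel.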

Call the print_trainable_parameters() method to compare the number of parameters of the PeftModel with the number of parameters in the base model!

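Training

For training, let's use the Trainer class from Transformers. The Trainer contains a PyTorch training loop, and when you're ready, call train to start training. To customize the training run, configure the training hyperparameters in the TrainingArguments class. With LoRA-like methods, you can afford to use a higher batch size and learning rate.

AdaLoRA has an update_and_allocate() method that should be called at each training step to update the parameter budget and mask; otherwise the adaptation step is not performed. This requires writing a custom training loop or subclassing the Trainer to incorporate this method. As an example, take a look at this custom training loop.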

from transformers import TrainingArguments, Trainer

account = "stevhliu"
peft_model_id = f"{account}/vit-base-patch16-224-in21k-lora"
batch_size = 128

args = TrainingArguments(
    peft_model_id,
    remove_unused_columns=False,
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-3,
    per_device_train_batch_size=batch_size,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=batch_size,
    fp16=True,
    num_train_epochs=5,
    logging_steps=10,
    load_best_model_at_end=True,
    label_names=["labels"],
)
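
Because gradient accumulation is enabled, the number of samples contributing to each optimizer step is larger than per_device_train_batch_size. A small sketch of that arithmetic follows; the helper name `effective_batch_size` and the single-device default are illustrative assumptions.

```python
def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of samples that contribute to one optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Values from the TrainingArguments above, assuming a single device.
print(effective_batch_size(128, 4))
```

If memory is tight, lowering the per-device batch size while raising gradient_accumulation_steps keeps this product unchanged.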

Begin training with train.

trainer = Trainer(
    model,
    args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    processing_class=image_processor,
    data_collator=collate_fn,
)
trainer.train()

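Share your model

Once training is complete, you can upload your model to the Hub with the push_to_hub method. You'll need to log in to your Hugging Face account first and enter your token when prompted.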

from huggingface_hub import notebook_login

notebook_login()

Call push_to_hub to save your model to your repository.

model.push_to_hub(peft_model_id)

Inference

Let's load the model from the Hub and test it out on a food image.

from peft import PeftConfig, PeftModel
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import requests

config = PeftConfig.from_pretrained("stevhliu/vit-base-patch16-224-in21k-lora")
model = AutoModelForImageClassification.from_pretrained(
    config.base_model_name_or_path,
    label2id=label2id,
    id2label=id2label,
    ignore_mismatched_sizes=True,
)
model = PeftModel.from_pretrained(model, "stevhliu/vit-base-patch16-224-in21k-lora")

url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/beignets.jpeg"
image = Image.open(requests.get(url, stream=True).raw)
image

Convert the image to RGB and return the underlying PyTorch tensors.

encoding = image_processor(image.convert("RGB"), return_tensors="pt")

Now run the model and return the predicted class!

with torch.no_grad():
    outputs = model(**encoding)
    logits = outputs.logits

predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
"Predicted class: beignets"
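
If you also want confidence scores rather than only the top class, the logits can be turned into probabilities with a softmax. The following is a plain-Python sketch on made-up logits and labels; `softmax`, `example_labels`, and `example_logits` are hypothetical names, and with the real model you would more likely apply torch.softmax to outputs.logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three classes, for illustration only.
example_labels = ["apple_pie", "baklava", "beignets"]
example_logits = [0.2, 1.1, 3.4]

probs = softmax(example_logits)
# Sort classes from most to least likely.
ranked = sorted(zip(example_labels, probs), key=lambda pair: pair[1], reverse=True)
for label, prob in ranked:
    print(f"{label}: {prob:.3f}")
```

Subtracting the largest logit before exponentiating does not change the result but avoids overflow for large values.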
