Fine-Tuning Gemma on Google TPU

This tutorial shows how to fine-tune an open large language model (LLM) such as Google Gemma on Google Cloud TPUs. The example uses Hugging Face Optimum TPU, 🤗 Transformers, and 🤗 Datasets.

Google's TPU

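Google Cloud TPUs are custom-designed AI accelerators optimized for training and inference of large AI models. They suit a wide range of use cases, such as chatbots, code generation, media content generation, synthetic speech, vision services, recommendation engines, and personalization models.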

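Advantages of using TPUs include:

They are designed to scale cost-efficiently across a wide range of AI workloads, including training, fine-tuning, and inference.
They are optimized for TensorFlow, PyTorch, and JAX, and are available in a variety of form factors, including edge devices, workstations, and cloud-based infrastructure.
They are available in Google Cloud and are integrated with Vertex AI and Google Kubernetes Engine (GKE).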

Environment Setup

For this example, a single-host v5litepod8 TPU is sufficient. To set up a TPU environment with PyTorch XLA, follow the Google Cloud guide, which explains how.

We can log in to the remote TPU with ssh or the gcloud command, enabling port forwarding for port 8888, for example:

gcloud compute tpus tpu-vm ssh $TPU_NAME \
        --zone=$ZONE \
        -- -L 8888:localhost:8888

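Once we have access to the TPU VM, we can clone the optimum-tpu repository, which contains the related notebook. We can then install the packages used in this tutorial and launch the notebook: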

git clone https://github.com/huggingface/optimum-tpu.git
# Install Optimum TPU from the cloned checkout (run from the directory that contains it)
pip install -e ./optimum-tpu -f https://storage.googleapis.com/libtpu-releases/index.html
# Install TRL and PEFT for training (see later how they are used)
pip install trl peft
# Install Jupyter notebook
pip install -U jupyterlab notebook
# Optionally, install widgets extensions for better rendering
pip install ipywidgets widgetsnbextension
# Change directory and launch Jupyter notebook
cd optimum-tpu/examples/language-modeling
jupyter notebook --port 8888

We should then see the familiar Jupyter output showing the address that can be opened in a browser:

http://:8888/tree?token=3ceb24619d0a2f99acf5fba41c51b475b1ddce7cadb2a133

Because we will use the gated Gemma model, we need to log in with a Hugging Face token:

!huggingface-cli login --token YOUR_HF_TOKEN

Enable FSDPv2

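To fine-tune an LLM, it may be necessary to shard the model across the TPUs to prevent memory issues and improve tuning performance. Fully Sharded Data Parallel (FSDP) is an algorithm implemented in PyTorch that wraps modules so they can be distributed. When using PyTorch/XLA on TPU, FSDPv2 is a utility that re-expresses the well-known FSDP algorithm using SPMD (Single Program Multiple Data). In optimum-tpu, FSDPv2 is enabled through a dedicated helper function, which should be called at the start of execution: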

from optimum.tpu import fsdp_v2


fsdp_v2.use_fsdp_v2()
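
To give a feel for why sharding helps with memory, here is a rough back-of-the-envelope sketch. It is not part of optimum-tpu, and every figure in it is an assumption made for illustration: a parameter count of about 2.5 billion for a "2b" model, 2 bytes per parameter for bfloat16, and 8 devices on a single-host v5litepod8. It counts weights only; gradients, optimizer state, and activations add to the real footprint.

```python
# All figures below are illustrative assumptions, not measurements.
NUM_PARAMS = 2_500_000_000   # assumed approximate parameter count of a "2b" model
BYTES_PER_PARAM = 2          # bfloat16 stores each value in 2 bytes
NUM_DEVICES = 8              # assumed number of chips on a single-host v5litepod8


def weight_bytes_per_device(num_params, bytes_per_param, num_devices):
    """Bytes of weights each device holds when parameters are evenly sharded."""
    return num_params * bytes_per_param / num_devices


# num_devices=1 models the case where every device keeps a full copy.
replicated = weight_bytes_per_device(NUM_PARAMS, BYTES_PER_PARAM, 1)
sharded = weight_bytes_per_device(NUM_PARAMS, BYTES_PER_PARAM, NUM_DEVICES)

print(f"full copy per device: {replicated / 1e9:.2f} GB")
print(f"evenly sharded:       {sharded / 1e9:.3f} GB")
```

Under these assumptions, each device holds one eighth of the weights instead of a full copy, which leaves more memory for activations and larger batches.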

Load and Prepare the Dataset

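We will use Dolly, an open-source dataset of instruction-following records covering the categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.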

We load the dataset from the Hub:

from datasets import load_dataset


dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

We can look at one sample:

dataset[321]

This returns a result similar to the following:

{
    "instruction": "When was the 8088 processor released?",
    "context": "The 8086 (also called iAPX 86) is a 16-bit microprocessor chip designed by Intel between early 1976 and June 8, 1978, when it was released. The Intel 8088, released July 1, 1979, is a slightly modified chip with an external 8-bit data bus (allowing the use of cheaper and fewer supporting ICs),[note 1] and is notable as the processor used in the original IBM PC design.",
    "response": "The Intel 8088 processor was released July 1, 1979.",
    "category": "information_extraction",
}

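We will define a formatting function that combines the instruction, context, and response fields into a single text prompt and appends the end-of-sequence token. The tokenizer is loaded here only to supply that token, so it must be the one that matches the model we intend to fine-tune; the prompt itself is tokenized later by the trainer.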

from transformers import AutoTokenizer


model_id = "google/gemma-2b"

tokenizer = AutoTokenizer.from_pretrained(model_id)


def preprocess_function(sample):
    instruction = f"### Instruction\n{sample['instruction']}"
    context = f"### Context\n{sample['context']}" if len(sample["context"]) > 0 else None
    response = f"### Answer\n{sample['response']}"
    # join all the parts together
    prompt = "\n\n".join([i for i in [instruction, context, response] if i is not None])
    prompt += tokenizer.eos_token
    sample["prompt"] = prompt
    return sample

This function can now be mapped over the dataset, after which the original columns can be dropped:

data = dataset.map(preprocess_function, remove_columns=list(dataset.features))
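
To see what the formatting step produces without loading a tokenizer or the dataset, the following standalone sketch applies the same logic to two hand-written records. The names `format_prompt` and `EOS_TOKEN` are hypothetical, and the `"<eos>"` string is only a stand-in for `tokenizer.eos_token`, whose real value depends on the tokenizer.

```python
# Stand-in for tokenizer.eos_token; the real value depends on the tokenizer (assumption).
EOS_TOKEN = "<eos>"


def format_prompt(sample, eos_token=EOS_TOKEN):
    """Build one training prompt from a Dolly-style record."""
    parts = [f"### Instruction\n{sample['instruction']}"]
    # The context section is skipped when the record has no context.
    if sample["context"]:
        parts.append(f"### Context\n{sample['context']}")
    parts.append(f"### Answer\n{sample['response']}")
    return "\n\n".join(parts) + eos_token


with_context = format_prompt(
    {
        "instruction": "When was the 8088 processor released?",
        "context": "The Intel 8088 was released July 1, 1979.",
        "response": "July 1, 1979.",
    }
)
without_context = format_prompt(
    {"instruction": "Name a primary color.", "context": "", "response": "Red."}
)

print(with_context)
print("---")
print(without_context)
```

Records with an empty context produce a two-section prompt, so the model never sees an empty "### Context" header.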

Preparing the Model for Tuning

The dataset is now ready for fine-tuning, so we can load the model to be tuned:

import torch
from transformers import AutoModelForCausalLM


model = AutoModelForCausalLM.from_pretrained(model_id, use_cache=False, torch_dtype=torch.bfloat16)

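We will now use Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA) to fine-tune the model efficiently on the prepared dataset. In the LoraConfig instance, we specify the nn.Linear modules to be fine-tuned.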

from peft import LoraConfig


# Set up PEFT LoRA for fine-tuning.
lora_config = LoraConfig(
    r=8,
    target_modules=["k_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
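
As a rough illustration of why LoRA is parameter-efficient, the sketch below counts the trainable parameters that a rank-r adapter adds to one linear layer. It assumes the standard LoRA formulation, in which the frozen weight is augmented by two trainable low-rank factors. The layer dimensions are hypothetical values chosen for the example and are not read from the model.

```python
def lora_trainable_params(in_features, out_features, r):
    """Parameters in the two low-rank factors: A is (r x in), B is (out x r)."""
    return r * in_features + out_features * r


# Hypothetical layer size chosen for illustration; not read from the model.
IN_FEATURES, OUT_FEATURES, RANK = 2048, 256, 8

full = IN_FEATURES * OUT_FEATURES
lora = lora_trainable_params(IN_FEATURES, OUT_FEATURES, RANK)

print(f"full weight:  {full} parameters")
print(f"LoRA factors: {lora} parameters ({100 * lora / full:.1f}% of the full weight)")
```

Only the low-rank factors receive gradients, so the optimizer state shrinks by the same proportion for each targeted layer.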

A dedicated optimum-tpu function helps us obtain the arguments needed to create the trainer instance.

from transformers import TrainingArguments
from trl import SFTTrainer


# Set up the FSDP arguments
fsdp_training_args = fsdp_v2.get_fsdp_training_args(model)

# Set up the trainer
trainer = SFTTrainer(
    model=model,
    train_dataset=data,
    args=TrainingArguments(
        per_device_train_batch_size=64,
        num_train_epochs=32,
        max_steps=-1,
        output_dir="./output",
        optim="adafactor",
        logging_steps=1,
        dataloader_drop_last=True,  # Required for FSDPv2.
        **fsdp_training_args,
    ),
    peft_config=lora_config,
    dataset_text_field="prompt",
    max_seq_length=1024,
    packing=True,
)
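
The `packing=True` and `max_seq_length=1024` settings ask the trainer to combine several short prompts into fixed-length training sequences. The sketch below shows the general idea on toy token ids. It is a simplified illustration, not TRL's implementation: the `pack_sequences` helper is hypothetical, and the handling of the incomplete tail (dropped here) is an assumption.

```python
def pack_sequences(tokenized_samples, block_size):
    """Concatenate token lists and cut them into blocks of exactly block_size tokens."""
    stream = [tok for sample in tokenized_samples for tok in sample]
    usable = len(stream) - len(stream) % block_size  # drop the incomplete tail
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]


# Toy token ids; 0 stands in for the EOS token that ends each prompt.
samples = [[5, 6, 0], [7, 8, 9, 10, 0], [11, 0]]
blocks = pack_sequences(samples, block_size=4)

print(blocks)
```

Because every block has the same length, no padding tokens are needed, and the EOS token appended to each prompt marks where one example ends and the next begins.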

With everything in place, tuning the model is as simple as calling one function:

trainer.train()

After this, we have successfully fine-tuned the model on the Dolly dataset.