Prompt Tuning with PEFT

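Author: Pere Martra

In this notebook we show how to apply Prompt Tuning to a pretrained model with the PEFT library.

For the complete list of models compatible with PEFT, refer to the PEFT documentation.

Examples of models that can be trained with PEFT include Bloom, Llama, GPT-J, GPT-2, and BERT. Hugging Face is working to add more models to the library.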

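A brief introduction to Prompt Tuning

Prompt Tuning is an Additive Fine-Tuning technique. This means that we **do not modify any weights of the original model**. You might be wondering how we fine-tune the model, then. We train additional layers that are added to the model, which is why the technique is called additive.

Since it is an additive technique and its name is Prompt Tuning, it is clear that the layers we add and train are related to the prompt.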

[Figure: Prompt Tuning diagram]

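We create a kind of super prompt by letting the model use its acquired knowledge to enhance a portion of the prompt. However, that portion of the prompt cannot be translated into natural language. **It is as if we had mastered expressing ourselves in embeddings and generating highly effective prompts.**

In each training cycle, the only weights that can be modified to minimize the loss function are those integrated into the prompt.

The main consequence of this technique is that the number of parameters to train is very small. There is a second, perhaps more important consequence: **since we do not modify the weights of the pretrained model, it does not change its behavior or forget anything it has previously learned**.

Training is faster and cheaper. Moreover, we can train several models, and at inference time we only need to load one foundational model along with the new, much smaller trained models, because the weights of the original model have not been altered.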
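To make the idea of "only the prompt weights are updated" concrete, here is a minimal, library-free sketch. Everything in it is an assumption made for illustration: the tiny linear scoring "model", the embedding values, the target, and the learning rate. It is not how PEFT implements Prompt Tuning; it only shows virtual-token embeddings being prepended to the input and trained while the base weights stay frozen.

```python
# Toy illustration of prompt tuning: the "model" weights stay frozen and only
# the virtual-token embeddings prepended to the input are updated.
# All sizes and values are made up for the example.

EMBED_DIM = 3

# Frozen "pretrained" weights: never updated below.
frozen_weights = [0.5, -1.0, 2.0]

# Embeddings of the real input tokens (also fixed).
input_embeddings = [[1.0, 0.0, 0.5], [0.2, 0.3, -0.1]]

# Trainable soft prompt: two virtual tokens, initialised to zeros.
soft_prompt = [[0.0] * EMBED_DIM for _ in range(2)]


def forward(prompt, inputs, weights):
    """Score = mean over all tokens of the dot product with the frozen weights."""
    sequence = prompt + inputs  # virtual tokens are prepended to the input
    scores = [sum(w * x for w, x in zip(weights, token)) for token in sequence]
    return sum(scores) / len(sequence)


def train_step(prompt, inputs, weights, target, lr):
    """One gradient-descent step on the soft prompt only; returns the loss before the step."""
    n_tokens = len(prompt) + len(inputs)
    error = forward(prompt, inputs, weights) - target
    # For loss = error**2, d(loss)/d(prompt[i][j]) = 2 * error * weights[j] / n_tokens.
    for token in prompt:
        for j in range(len(token)):
            token[j] -= lr * 2 * error * weights[j] / n_tokens
    return error ** 2


weights_before = list(frozen_weights)
losses = []
for _ in range(50):
    losses.append(train_step(soft_prompt, input_embeddings, frozen_weights, target=1.0, lr=0.1))

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.8f}")
print("frozen weights unchanged:", frozen_weights == weights_before)
```

The loss shrinks even though `frozen_weights` is never touched, which is the property the paragraphs above describe.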

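What are we going to do in the notebook?

We will train two different models from two different datasets, both based on the same pretrained model from the Bloom family. One model will be trained on a dataset of prompts, the other on a dataset of inspirational sentences. We will compare how both models answer the same question before and after training.

We will also explore how to load both models while keeping only one copy of the foundational model in memory.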

Loading the PEFT library

This library contains the Hugging Face implementation of various fine-tuning techniques, including Prompt Tuning.

!pip install -q peft==0.8.2

!pip install -q datasets==2.14.5

From the transformers library, we import the classes needed to instantiate the model and the tokenizer.

from transformers import AutoModelForCausalLM, AutoTokenizer

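Loading the model and the tokenizer

Bloom is one of the smallest and smartest models that can be trained with the PEFT library using Prompt Tuning. You can choose any model from the Bloom family, and I encourage you to try at least two of them to observe the differences.

I am using the smallest one to minimize training time and avoid memory issues in Colab.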

model_name = "bigscience/bloomz-560m"
# model_name="bigscience/bloom-1b1"
NUM_VIRTUAL_TOKENS = 4
NUM_EPOCHS = 6

tokenizer = AutoTokenizer.from_pretrained(model_name)
foundational_model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

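Inference with the pretrained Bloom model

If you want more varied and original generations, uncomment the temperature, top_p, and do_sample parameters in model.generate below.

With the default configuration, the model's responses stay the same from one call to the next.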
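As a rough illustration of why the default (greedy) decoding is repeatable while sampling is not, the sketch below applies temperature scaling and top-p filtering to a made-up set of logits. The function names and the four-token vocabulary are hypothetical, and the library's own implementation differs in its details; only the values 0.2 and 0.95 come from the commented-out parameters in this notebook.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding: always the highest-scoring token, so repeated calls agree."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_p_candidates(probs, top_p):
    """Smallest set of tokens whose cumulative probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


def sample_pick(logits, temperature, top_p, rng):
    """Sample one token from the temperature-scaled, top-p-filtered distribution."""
    probs = softmax(logits, temperature)
    kept = top_p_candidates(probs, top_p)
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


toy_logits = [2.0, 1.8, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
rng = random.Random(0)

greedy_choices = {greedy_pick(toy_logits) for _ in range(20)}
sampled_choices = {sample_pick(toy_logits, temperature=0.2, top_p=0.95, rng=rng) for _ in range(200)}

print("greedy picks:", greedy_choices)
print("sampled picks:", sampled_choices)
```

Greedy decoding returns a single token every time, while sampling spreads its choices over the tokens that survive the top-p cut.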

# Generate a continuation with the given model for the tokenized inputs.
# Note: this function uses the global tokenizer to obtain the end-of-sequence token.
def get_outputs(model, inputs, max_new_tokens=100):
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        # temperature=0.2,
        # top_p=0.95,
        # do_sample=True,
        repetition_penalty=1.5,  # Avoid repetition.
        early_stopping=True,  # The model can stop before reaching the maximum length.
        eos_token_id=tokenizer.eos_token_id,
    )
    return outputs
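The repetition_penalty=1.5 argument above discourages the model from repeating tokens. A simplified sketch of the usual rule follows: for every token already in the sequence, a positive score is divided by the penalty and a negative score is multiplied by it. The helper name and the logits are hypothetical; this is an illustration of the idea rather than the library's code.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Lower the score of tokens that already appear in the sequence.

    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so in both cases the score moves down when penalty > 1.
    """
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


toy_logits = [3.0, 1.0, -2.0, 0.5]  # made-up scores for a 4-token vocabulary
already_generated = [0, 2]          # token ids produced so far

penalized = apply_repetition_penalty(toy_logits, already_generated, penalty=1.5)
print(penalized)
```

Tokens 0 and 2 become less likely, while the scores of tokens that have not appeared yet are left untouched.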

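Since we want two differently trained models, I will create two different prompts.

The first model will be trained on a dataset of prompts, and the second on a dataset of motivational sentences.

The first model will receive the prompt "I want you to act as a motivational coach." and the second will receive "There are two nice things that should matter to you:".

But first, I will collect some results from the model without fine-tuning.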

>>> input_prompt = tokenizer("I want you to act as a motivational coach. ", return_tensors="pt")
>>> foundational_outputs_prompt = get_outputs(foundational_model, input_prompt, max_new_tokens=50)

>>> print(tokenizer.batch_decode(foundational_outputs_prompt, skip_special_tokens=True))

["I want you to act as a motivational coach.  Don't be afraid of being challenged."]

>>> input_sentences = tokenizer("There are two nice things that should matter to you:", return_tensors="pt")
>>> foundational_outputs_sentence = get_outputs(foundational_model, input_sentences, max_new_tokens=50)

>>> print(tokenizer.batch_decode(foundational_outputs_sentence, skip_special_tokens=True))

['There are two nice things that should matter to you: the price and quality of your product.']

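Both answers are more or less correct. Any Bloom model is pretrained and can generate sentences accurately and sensibly. Let's see whether, after training, the responses stay the same or become more accurate.

Preparing the datasets

The datasets used are fka/awesome-chatgpt-prompts and Abirate/english_quotes, both loaded by name with load_dataset.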

import os

# os.environ["TOKENIZERS_PARALLELISM"] = "false"

from datasets import load_dataset

dataset_prompt = "fka/awesome-chatgpt-prompts"

# Create the Dataset to create prompts.
data_prompt = load_dataset(dataset_prompt)
data_prompt = data_prompt.map(lambda samples: tokenizer(samples["prompt"]), batched=True)
train_sample_prompt = data_prompt["train"].select(range(50))

display(train_sample_prompt)

>>> print(train_sample_prompt[:1])

{'act': ['Linux Terminal'], 'prompt': ['I want you to act as a linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply with the terminal output inside one unique code block, and nothing else. do not write explanations. do not type commands unless I instruct you to do so. when i need to tell you something in english, i will do so by putting text inside curly brackets {like this}. my first command is pwd'], 'input_ids': [[44, 4026, 1152, 427, 1769, 661, 267, 104105, 28434, 17, 473, 2152, 4105, 49123, 530, 1152, 2152, 57502, 1002, 3595, 368, 28434, 3403, 6460, 17, 473, 4026, 1152, 427, 3804, 57502, 1002, 368, 28434, 10014, 14652, 2592, 19826, 4400, 10973, 15, 530, 16915, 4384, 17, 727, 1130, 11602, 184637, 17, 727, 1130, 4105, 49123, 35262, 473, 32247, 1152, 427, 727, 1427, 17, 3262, 707, 3423, 427, 13485, 1152, 7747, 361, 170205, 15, 707, 2152, 727, 1427, 1331, 55385, 5484, 14652, 6291, 999, 117805, 731, 29726, 1119, 96, 17, 2670, 3968, 9361, 632, 269, 42512]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]}
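The record printed above shows what the batched map call produces: the original columns plus the new input_ids and attention_mask columns returned by the tokenizer. The sketch below mimics that behaviour with plain dictionaries. The toy_tokenizer and batched_map helpers are hypothetical stand-ins written for this example, not part of the datasets or transformers libraries.

```python
def toy_tokenizer(texts):
    """Hypothetical stand-in for a real tokenizer: one id per whitespace-separated word."""
    vocab = {}
    input_ids = []
    for text in texts:
        ids = [vocab.setdefault(word, len(vocab)) for word in text.lower().split()]
        input_ids.append(ids)
    return {
        "input_ids": input_ids,
        "attention_mask": [[1] * len(ids) for ids in input_ids],
    }


def batched_map(columns, fn):
    """Mimic a batched map: fn receives whole columns and returns new columns to merge in."""
    new_columns = fn(columns)
    return {**columns, **new_columns}


toy_data = {
    "act": ["Linux Terminal", "Motivational Coach"],
    "prompt": [
        "I want you to act as a linux terminal",
        "I want you to act as a motivational coach",
    ],
}

tokenized = batched_map(toy_data, lambda samples: toy_tokenizer(samples["prompt"]))
print(sorted(tokenized.keys()))
print(tokenized["input_ids"])
```

As in the real output, each row keeps its act and prompt fields and gains one list of token ids and one attention mask of the same length.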

dataset_sentences = load_dataset("Abirate/english_quotes")

data_sentences = dataset_sentences.map(lambda samples: tokenizer(samples["quote"]), batched=True)
train_sample_sentences = data_sentences["train"].select(range(25))
train_sample_sentences = train_sample_sentences.remove_columns(["author", "tags"])

display(train_sample_sentences)

Fine-tuning

PEFT configuration

API docs: https://huggingface.co/docs/peft/main/en/package_reference/tuners#peft.PromptTuningConfig

We can use the same configuration for both models to be trained.

from peft import get_peft_model, PromptTuningConfig, TaskType, PromptTuningInit

generation_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,  # This type indicates the model will generate text.
    prompt_tuning_init=PromptTuningInit.RANDOM,  # The added virtual tokens are initialized with random numbers.
    num_virtual_tokens=NUM_VIRTUAL_TOKENS,  # Number of virtual tokens to be added and trained.
    tokenizer_name_or_path=model_name,  # The pre-trained model.
)

Creating two Prompt Tuning models

We will create two identical Prompt Tuning models from the same pretrained model and the same configuration.

>>> peft_model_prompt = get_peft_model(foundational_model, generation_config)
>>> print(peft_model_prompt.print_trainable_parameters())

trainable params: 4,096 || all params: 559,218,688 || trainable%: 0.0007324504863471229
None

>>> peft_model_sentences = get_peft_model(foundational_model, generation_config)
>>> print(peft_model_sentences.print_trainable_parameters())

trainable params: 4,096 || all params: 559,218,688 || trainable%: 0.0007324504863471229
None

Great: did you notice the reduction in trainable parameters? We will train only about 0.0007% of the available parameters.
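The numbers printed above can be reproduced by hand: each virtual token contributes one embedding vector, so the trainable parameter count is the number of virtual tokens times the embedding width. The width of 1024 used below is an assumption about bloomz-560m that is consistent with the 4,096 trainable parameters reported; the total comes from the printed output.

```python
NUM_VIRTUAL_TOKENS = 4
HIDDEN_SIZE = 1024        # assumed embedding width of bloomz-560m
ALL_PARAMS = 559_218_688  # total reported above (base model plus soft prompt)

trainable_params = NUM_VIRTUAL_TOKENS * HIDDEN_SIZE
trainable_percent = 100 * trainable_params / ALL_PARAMS

print(f"trainable params: {trainable_params:,}")
print(f"trainable%: {trainable_percent:.7f}")
```

Doubling NUM_VIRTUAL_TOKENS doubles the trainable parameters, which still leaves them a tiny fraction of the full model.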

Now we will create the training arguments, using the same configuration for both training runs.

from transformers import TrainingArguments


def create_training_arguments(path, learning_rate=0.0035, epochs=6):
    training_args = TrainingArguments(
        output_dir=path,  # Where the model predictions and checkpoints will be written
        use_cpu=True,  # This is necessary for CPU clusters.
        auto_find_batch_size=True,  # Find a suitable batch size that will fit into memory automatically
        learning_rate=learning_rate,  # Higher learning rate than full Fine-Tuning
        num_train_epochs=epochs,
    )
    return training_args

import os

working_dir = "./"

# It is best to store the models in separate folders.
# Build the names of the directories where the models will be stored.
output_directory_prompt = os.path.join(working_dir, "peft_outputs_prompt")
output_directory_sentences = os.path.join(working_dir, "peft_outputs_sentences")

# Create the directories (and working_dir) if they do not exist yet.
os.makedirs(output_directory_prompt, exist_ok=True)
os.makedirs(output_directory_sentences, exist_ok=True)

When creating the TrainingArguments, we need to indicate the directory that will contain each model.

training_args_prompt = create_training_arguments(output_directory_prompt, 0.003, NUM_EPOCHS)
training_args_sentences = create_training_arguments(output_directory_sentences, 0.003, NUM_EPOCHS)
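To get a feel for how short these training runs are, the sketch below estimates the number of optimizer steps for the 50-sample and 25-sample datasets selected earlier, over 6 epochs. The batch size of 8 is an assumption (the value actually used is not shown in this notebook, and auto_find_batch_size may lower it if memory runs out); the sketch also assumes a single device and no gradient accumulation.

```python
import math


def total_optimizer_steps(num_samples, batch_size, epochs):
    """Batches per epoch (the last partial batch included) times the number of epochs."""
    return math.ceil(num_samples / batch_size) * epochs


ASSUMED_BATCH_SIZE = 8  # assumption: not stated in the notebook
NUM_EPOCHS = 6

steps_prompt = total_optimizer_steps(50, ASSUMED_BATCH_SIZE, NUM_EPOCHS)
steps_sentences = total_optimizer_steps(25, ASSUMED_BATCH_SIZE, NUM_EPOCHS)

print("prompt model steps:", steps_prompt)
print("sentences model steps:", steps_sentences)
```

With so few updates, a learning rate higher than the one typically used for full fine-tuning is a reasonable choice, as the comment in create_training_arguments notes.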

Training

We will create one trainer object for each model to be trained.

from transformers import Trainer, DataCollatorForLanguageModeling


def create_trainer(model, training_args, train_dataset):
    trainer = Trainer(
        model=model,  # We pass in the PEFT version of the foundation model, bloomz-560M
        args=training_args,  # The args for the training.
        train_dataset=train_dataset,  # The dataset used to train the model.
        data_collator=DataCollatorForLanguageModeling(
            tokenizer, mlm=False
        ),  # mlm=False indicates not to use masked language modeling
    )
    return trainer
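With mlm=False, the data collator prepares batches for causal language modeling: sequences are padded to a common length and the labels are a copy of the input ids, with padded positions set to -100 so that the loss ignores them. The sketch below is a simplified, hypothetical re-implementation with plain lists. It pads on the right and uses a made-up pad id, whereas the real collator follows the tokenizer's settings.

```python
IGNORE_INDEX = -100  # label value that the loss skips


def collate_for_causal_lm(examples, pad_token_id):
    """Pad a batch to equal length and build labels as a copy of the inputs.

    Simplification: padding is added on the right.
    """
    max_len = max(len(ids) for ids in examples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ids in examples:
        n_pad = max_len - len(ids)
        batch["input_ids"].append(ids + [pad_token_id] * n_pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * n_pad)
        # Labels mirror the inputs; padded positions are excluded from the loss.
        batch["labels"].append(ids + [IGNORE_INDEX] * n_pad)
    return batch


# Two toy sequences of different lengths; pad_token_id=3 is a made-up value.
toy_batch = collate_for_causal_lm([[44, 4026, 1152], [44, 17]], pad_token_id=3)
for key, value in toy_batch.items():
    print(key, value)
```

No tokens are masked out at random, which is the difference from masked language modeling (mlm=True).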

# Training first model.
trainer_prompt = create_trainer(peft_model_prompt, training_args_prompt, train_sample_prompt)
trainer_prompt.train()

# Training second model.
trainer_sentences = create_trainer(peft_model_sentences, training_args_sentences, train_sample_sentences)
trainer_sentences.train()

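In less than 10 minutes (CPU time on an M1 Pro), we trained two different models, with two different tasks, from the same foundational model.

Saving the models

We will now save the models. They can be used at any time, as long as the pretrained model they were created from is loaded in memory.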

trainer_prompt.model.save_pretrained(output_directory_prompt)
trainer_sentences.model.save_pretrained(output_directory_sentences)
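Because only the soft prompt is trained, what needs to be stored per model is tiny compared with the base model. The estimate below uses the parameter counts printed earlier and assumes 32-bit floats; the files actually written also contain a small configuration file and some format overhead, so treat the figures as rough.

```python
BYTES_PER_PARAM = 4  # assumption: 32-bit floats

soft_prompt_params = 4_096               # trainable params reported earlier
base_model_params = 559_218_688 - 4_096  # all params minus the soft prompt

soft_prompt_bytes = soft_prompt_params * BYTES_PER_PARAM
base_model_bytes = base_model_params * BYTES_PER_PARAM

print(f"soft prompt: {soft_prompt_bytes / 1024:.0f} KiB")
print(f"base model:  {base_model_bytes / 1024**3:.2f} GiB")
```

This gap is why it is practical to keep many trained prompts around while loading the foundational model only once.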

Inference

You can load a model from the path where it was saved and ask it to generate text from our earlier input!

from peft import PeftModel

loaded_model_prompt = PeftModel.from_pretrained(
    foundational_model,
    output_directory_prompt,
    # device_map='auto',
    is_trainable=False,
)

>>> loaded_model_prompt_outputs = get_outputs(loaded_model_prompt, input_prompt)
>>> print(tokenizer.batch_decode(loaded_model_prompt_outputs, skip_special_tokens=True))

['I want you to act as a motivational coach.  You will be helping students learn how they can improve their performance in the classroom and at school.']

If we compare both answers, something has changed.

Pretrained model: I want you to act as a motivational coach. Don't be afraid of being challenged.
Fine-tuned model: I want you to act as a motivational coach. You will be helping students learn how they can improve their performance in the classroom and at school.

Keep in mind that we trained the model for only a few minutes, yet that was enough to obtain a response closer to what we were looking for.

loaded_model_prompt.load_adapter(output_directory_sentences, adapter_name="quotes")
loaded_model_prompt.set_adapter("quotes")
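The two calls above add a second trained prompt to the already loaded model and make it the active one. Conceptually, this amounts to keeping one shared base and a dictionary of named soft prompts. The class below is a toy model of that idea with made-up values; it is not PEFT's implementation, and the adapter name "default" for the first prompt is used here only for illustration.

```python
class SharedBaseWithPrompts:
    """Toy model of one shared base plus several named soft prompts."""

    def __init__(self, base_weights):
        self.base_weights = base_weights  # single copy, shared by every adapter
        self.prompts = {}                 # adapter name -> soft prompt
        self.active = None

    def load_adapter(self, name, soft_prompt):
        """Register a trained soft prompt under a name."""
        self.prompts[name] = soft_prompt

    def set_adapter(self, name):
        """Choose which soft prompt is used for the next inputs."""
        if name not in self.prompts:
            raise KeyError(f"unknown adapter: {name}")
        self.active = name

    def build_input(self, token_embeddings):
        """Prepend the active soft prompt to the embeddings of the real tokens."""
        return self.prompts[self.active] + token_embeddings


base = SharedBaseWithPrompts(base_weights=[0.5, -1.0, 2.0])
base.load_adapter("default", [[0.1, 0.1, 0.1]])
base.load_adapter("quotes", [[0.9, 0.9, 0.9]])

tokens = [[1.0, 0.0, 0.5]]
base.set_adapter("default")
with_default = base.build_input(tokens)
base.set_adapter("quotes")
with_quotes = base.build_input(tokens)

print(with_default)
print(with_quotes)
```

Switching adapters changes only the few vectors placed in front of the input; the base weights are the same object throughout.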

>>> loaded_model_sentences_outputs = get_outputs(loaded_model_prompt, input_sentences)
>>> print(tokenizer.batch_decode(loaded_model_sentences_outputs, skip_special_tokens=True))

['There are two nice things that should matter to you: the weather and your health.']

With the second model we get a similar result.

Pretrained model: There are two nice things that should matter to you: the price and quality of your product.
Fine-tuned model: There are two nice things that should matter to you: the weather and your health.

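Conclusion

Prompt Tuning is a remarkable technique that can save hours of training and a great deal of money. In this notebook we trained two models in just a few minutes, and we can keep both loaded in memory at the same time to serve different clients.

If you would like to try different combinations and models, the notebook is ready to use another model from the Bloom family.

You can change the number of training epochs, the number of virtual tokens, and the model in the third cell (where model_name, NUM_VIRTUAL_TOKENS, and NUM_EPOCHS are defined). There are many other settings to change as well. If you are looking for a good exercise, try replacing the random initialization of the virtual tokens with a fixed value.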

The responses of the fine-tuned models may vary from one training run to the next. I have pasted the results of one of my runs, but your actual results may differ.