Templates

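The chat pipeline guide introduced TextGenerationPipeline and the concept of a chat prompt or chat template for conversing with a model. Underlying this high-level pipeline is the apply_chat_template method. A chat template is part of the tokenizer, and it specifies how to convert a conversation into a single tokenizable string in the format the model expects.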

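For example, Mistral-7B-Instruct and Zephyr-7B are finetuned from the same base model, but they're trained with different chat formats. Without chat templates, you would have to write formatting code by hand for each model, and even minor errors can hurt performance. Chat templates offer a universal way to format chat inputs for any model.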


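The following sketch makes the formatting difference concrete. It renders one conversation with two simplified templates. The first is a Zephyr-style template modeled on the output shown later in this guide. The second is a hypothetical [INST]-style template that only loosely resembles Mistral's. Neither is the exact template shipped with those models.

Rendering happens directly with jinja2's ImmutableSandboxedEnvironment, configured with trim_blocks and lstrip_blocks. This approximates how a tokenizer renders its template, but it is an assumption and is not a call to apply_chat_template.

```python
from jinja2.sandbox import ImmutableSandboxedEnvironment

# Simplified, illustrative templates; NOT the exact templates shipped with either model.
ZEPHYR_STYLE = (
    "{%- for message in messages %}"
    "{{- '<|' + message['role'] + '|>\n' + message['content'] + eos_token + '\n' }}"
    "{%- endfor %}"
)
INST_STYLE = (
    "{{- bos_token }}"
    "{%- for message in messages %}"
    "{%- if message['role'] == 'user' %}"
    "{{- '[INST] ' + message['content'] + ' [/INST]' }}"
    "{%- else %}"
    "{{- message['content'] + eos_token }}"
    "{%- endif %}"
    "{%- endfor %}"
)


def render(template: str, messages: list, **kwargs) -> str:
    """Render a chat template with a sandboxed Jinja environment (approximation)."""
    env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
    return env.from_string(template).render(messages=messages, **kwargs)


chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great."},
    {"role": "user", "content": "Show me how chat templating works!"},
]

# Same conversation, two different model formats.
zephyr_text = render(ZEPHYR_STYLE, chat, bos_token="<s>", eos_token="</s>")
inst_text = render(INST_STYLE, chat, bos_token="<s>", eos_token="</s>")
print(zephyr_text)
print(inst_text)
```

A model trained on one of these layouts sees a very different token sequence if it is fed the other. This is the mismatch chat templates are meant to prevent.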

This guide explores apply_chat_template and chat templates in more detail.

apply_chat_template

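Structure a chat as a list of dictionaries with role and content keys. The role key specifies the speaker (typically "user", "assistant", or "system"), and the content key contains the message. For the system role, content is a high-level description of how the model should behave and respond when it's chatting with you.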

Pass your messages to apply_chat_template to tokenize and format them. Set add_generation_prompt=True to append the tokens that mark the start of the assistant's response.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The tokenizer carries Zephyr's chat template
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/zephyr-7b-beta", device_map="auto", torch_dtype=torch.bfloat16)

# A chat is a list of {"role": ..., "content": ...} dictionaries
messages = [
    {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate"},
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
# Format with the template, append the assistant prompt, return PyTorch tensors on the model's device
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
print(tokenizer.decode(tokenized_chat[0]))

<|system|>
You are a friendly chatbot who always responds in the style of a pirate</s>
<|user|>
How many helicopters can a human eat in one sitting?</s>
<|assistant|>

Now pass the tokenized chat to generate() to produce a response.

outputs = model.generate(tokenized_chat, max_new_tokens=128) 
print(tokenizer.decode(outputs[0]))

<|system|>
You are a friendly chatbot who always responds in the style of a pirate</s>
<|user|>
How many helicopters can a human eat in one sitting?</s>
<|assistant|>
Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all.

add_generation_prompt

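The add_generation_prompt parameter adds the tokens that indicate the start of a response. This ensures the chat model generates an assistant response instead of continuing the user's message.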

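Not all models require a generation prompt. Some models, like Llama, don't have any special tokens before the assistant response, so add_generation_prompt has no effect for them.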

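# assumes a ChatML-format tokenizer and a `messages` list holding the three turns shown below
formatted_chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
print(formatted_chat)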

<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
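The sketch below shows what add_generation_prompt changes. It renders a ChatML-style template directly with jinja2 instead of going through a tokenizer. The second template is hypothetical: it has no assistant header at all, similar to the Llama case described above, so the flag changes nothing.

```python
from jinja2.sandbox import ImmutableSandboxedEnvironment

env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)

# ChatML-style template: appends an assistant header only when add_generation_prompt is true.
chatml = env.from_string(
    "{%- for message in messages %}"
    "{{- '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' }}"
    "{%- endfor %}"
    "{%- if add_generation_prompt %}{{- '<|im_start|>assistant\n' }}{%- endif %}"
)
# Hypothetical template with no assistant header, so the flag is simply ignored.
no_header = env.from_string(
    "{%- for message in messages %}{{- message['content'] + '\n' }}{%- endfor %}"
)

messages = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]

without_prompt = chatml.render(messages=messages, add_generation_prompt=False)
with_prompt = chatml.render(messages=messages, add_generation_prompt=True)
print(with_prompt)
```

With the ChatML-style template, the only difference is the trailing `<|im_start|>assistant` line. That line is what cues the model to answer as the assistant.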

continue_final_message

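The continue_final_message parameter controls whether the final message in the chat is continued instead of starting a new one. It removes the end-of-sequence tokens from the final message so the model continues generating from it.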

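This is useful for "prefilling" a model response. In the example below, the model generates text that continues the JSON string rather than starting a new message. Prefilling can be very useful for improving instruction-following accuracy when you know how the reply should start.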

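chat = [
    {"role": "user", "content": "Can you format the answer in JSON?"},
    {"role": "assistant", "content": '{"name": "'},  # prefilled start of the reply
]

# Leave the final assistant message open instead of closing it with end-of-sequence tokens
formatted_chat = tokenizer.apply_chat_template(chat, tokenize=True, return_dict=True, return_tensors="pt", continue_final_message=True).to(model.device)
outputs = model.generate(**formatted_chat, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))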

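Don't use add_generation_prompt and continue_final_message together. The former adds tokens that start a new message, while the latter removes the end-of-sequence tokens. Using them together raises an error.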
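The idea behind continue_final_message can be emulated in plain Python on top of a ChatML-style template rendered with jinja2. The sketch renders the chat, then cuts the string right after the final message's content, which drops its closing `<|im_end|>`. It also rejects combining the two flags.

This is a simplified stand-in written for illustration, not the transformers implementation. The helper name render_chat is hypothetical.

```python
from jinja2.sandbox import ImmutableSandboxedEnvironment

CHATML = (
    "{%- for message in messages %}"
    "{{- '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' }}"
    "{%- endfor %}"
    "{%- if add_generation_prompt %}{{- '<|im_start|>assistant\n' }}{%- endif %}"
)


def render_chat(messages, add_generation_prompt=False, continue_final_message=False):
    """Hypothetical helper: render CHATML, optionally leaving the last message open."""
    if add_generation_prompt and continue_final_message:
        raise ValueError("add_generation_prompt and continue_final_message are mutually exclusive")
    env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
    text = env.from_string(CHATML).render(messages=messages, add_generation_prompt=add_generation_prompt)
    if continue_final_message:
        final = messages[-1]["content"]
        # Keep everything up to the end of the final message; drop its closing tokens.
        text = text[: text.rindex(final) + len(final)]
    return text


chat = [
    {"role": "user", "content": "Can you format the answer in JSON?"},
    {"role": "assistant", "content": '{"name": "'},
]
prefilled = render_chat(chat, continue_final_message=True)
print(prefilled)
```

The prompt now ends mid-string at `{"name": "`, so the most natural continuation for the model is the value of that JSON field.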

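TextGenerationPipeline sets add_generation_prompt=True by default to start a new message. However, if the final message in the chat has the "assistant" role, the pipeline assumes it's a prefill and switches to continue_final_message=True instead, because most models don't support multiple consecutive assistant messages. To override this behavior, pass continue_final_message explicitly to the pipeline.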
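The defaulting rule described above fits in a small pure-Python function. The name choose_template_flags is hypothetical. The function restates the documented behavior and is not TextGenerationPipeline's source.

```python
def choose_template_flags(messages, continue_final_message=None):
    """Hypothetical restatement of the pipeline's default flag choice."""
    if continue_final_message is None:
        # A trailing assistant message is treated as a prefill to continue.
        continue_final_message = messages[-1]["role"] == "assistant"
    return {
        "add_generation_prompt": not continue_final_message,
        "continue_final_message": continue_final_message,
    }


print(choose_template_flags([{"role": "user", "content": "Hi!"}]))
print(choose_template_flags([
    {"role": "user", "content": "Answer in JSON."},
    {"role": "assistant", "content": "{"},
]))
```

Passing continue_final_message explicitly skips the role check. This is the override mentioned above.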

Multiple templates

A model may have several templates for different use cases. For example, a model may have separate templates for regular chat, tool use, and RAG.

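When there are multiple templates, the chat template is a dictionary in which each key is a template name. apply_chat_template selects among them by name. In most cases, it looks for a template named default and raises an error if it can't find one.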

For tool calling, if the user passes a tools argument and a tool_use template exists, the tool_use template is used instead of default.

To use a template with a different name, pass its name to the chat_template parameter of apply_chat_template. For example, to use a RAG template, set chat_template="rag".
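The lookup rules above can be sketched as a small function over a dictionary of named templates. The function select_template and the placeholder template strings are hypothetical. transformers performs this selection internally. The sketch only resolves template names and raises a plain ValueError, so it is a simplification.

```python
def select_template(templates, chat_template=None, tools=None):
    """Hypothetical sketch of picking one template from a dict of named templates."""
    if chat_template is not None:
        # An explicit name always wins, e.g. chat_template="rag".
        name = chat_template
    elif tools is not None and "tool_use" in templates:
        # Tools were passed and a dedicated tool template exists.
        name = "tool_use"
    else:
        name = "default"
    if name not in templates:
        raise ValueError(f"No chat template named {name!r}")
    return templates[name]


templates = {
    "default": "<default template>",
    "tool_use": "<tool template>",
    "rag": "<rag template>",
}
print(select_template(templates))
print(select_template(templates, tools=[{"type": "function"}]))
print(select_template(templates, chat_template="rag"))
```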

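However, managing multiple templates can be confusing, so we recommend using a single template for all use cases. Use Jinja statements such as if tools is defined and {% macro %} definitions to wrap multiple code paths in a single template.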
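The single-template approach might look like the sketch below. One Jinja template uses a {% macro %} to format each message and an `if tools is defined` branch to add a tool section only when tools are passed.

Several parts of this are hypothetical:
- The template text itself.
- The `<|tools|>` marker.
- The extra `and tools` guard, added so that a None or empty tools value also skips the section.

Rendering is done directly with jinja2 rather than through apply_chat_template.

```python
from jinja2.sandbox import ImmutableSandboxedEnvironment

# Hypothetical single template: a macro for messages plus an optional tools branch.
TEMPLATE = (
    "{%- macro format_message(message) -%}"
    "{{- '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' -}}"
    "{%- endmacro -%}"
    "{%- if tools is defined and tools -%}"
    "{{- '<|tools|>' + (tools | map(attribute='name') | join(', ')) + '\n' -}}"
    "{%- endif -%}"
    "{%- for message in messages -%}"
    "{{- format_message(message) -}}"
    "{%- endfor -%}"
)

env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
template = env.from_string(TEMPLATE)

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
tools = [{"name": "get_weather"}, {"name": "get_time"}]

plain = template.render(messages=messages)                  # tools undefined: branch skipped
with_tools = template.render(messages=messages, tools=tools)  # tools defined: section added
print(plain)
print(with_tools)
```

Both code paths live in one template, so there is no dictionary of templates to keep in sync.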

Template selection

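It's important to use a chat template that matches the format the model was trained with, otherwise performance may suffer. Even if you train the model further, performance is best when the chat tokens stay the same.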

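If you're training a model from scratch or finetuning a model for chat, however, you have more freedom in choosing a template. For example, ChatML is a popular format that is flexible enough to handle many use cases. It even supports generation prompts, but it doesn't add beginning-of-string (BOS) or end-of-string (EOS) tokens. If your model expects BOS and EOS tokens, set add_special_tokens=True and make sure to add them to your template.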

{%- for message in messages %}
    {{- '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}
{%- endfor %}
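Rendering the loop above directly with jinja2 shows the string it produces. The variant also prepends bos_token, which is one hypothetical way to add a BOS token to the template if your model expects one. The `<s>` value is an assumption, so use your tokenizer's actual BOS token.

```python
from jinja2.sandbox import ImmutableSandboxedEnvironment

# The ChatML loop from above, verbatim (raw string so Jinja sees the '\n' escapes).
CHATML_LOOP = r"""{%- for message in messages %}
    {{- '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}
{%- endfor %}"""

messages = [
    {"role": "system", "content": "You are a helpful chatbot."},
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "I'm doing great!"},
]

env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
plain = env.from_string(CHATML_LOOP).render(messages=messages)
# Hypothetical variant that emits a BOS token before the first message.
with_bos = env.from_string("{{- bos_token }}" + CHATML_LOOP).render(messages=messages, bos_token="<s>")
print(plain)
print(with_bos)
```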

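Set the template with the following logic to support generation prompts. The template wraps each message in <|im_start|> and <|im_end|> tokens and writes the role as a string, which makes it easy to customize the roles you want to train with.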

tokenizer.chat_template = "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
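The first {% if %} in that template gives add_generation_prompt a default of false, so the template still renders when the caller doesn't pass the variable. A quick check follows. It renders the exact template string with jinja2 directly rather than through a tokenizer, an assumption made so the sketch runs offline.

```python
from jinja2.sandbox import ImmutableSandboxedEnvironment

# The template string assigned to tokenizer.chat_template above, unchanged.
CHAT_TEMPLATE = "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"

env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
template = env.from_string(CHAT_TEMPLATE)
messages = [{"role": "user", "content": "How are you?"}]

# No add_generation_prompt passed: the guard sets it to false.
default_render = template.render(messages=messages)
prompted_render = template.render(messages=messages, add_generation_prompt=True)
print(repr(default_render))
print(repr(prompted_render))
```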

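The user, system, and assistant roles are the standard roles in chat templates. We recommend using them when it makes sense, especially if you're using your model with TextGenerationPipeline. For example, here is a conversation using all three roles, formatted with the ChatML template: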

<|im_start|>system
You are a helpful chatbot that will do its best not to say anything so stupid that people tweet about it.<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
I'm doing great!<|im_end|>

Model training

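Training a model with a chat template is a good way to ensure the template matches the tokens the model is trained on. Apply the chat template to your dataset as a preprocessing step. Set add_generation_prompt=False, because the extra tokens that prompt an assistant response aren't useful during training.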
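To see why, the sketch below renders a complete training example, ending with the assistant's answer, both ways. It uses a simplified Zephyr-style template rendered with jinja2. This template is illustrative only and is not Zephyr's shipped one.

With the flag on, an empty assistant header is left dangling after the answer. The model would then learn to emit that header after every reply.

```python
from jinja2.sandbox import ImmutableSandboxedEnvironment

# Simplified Zephyr-style template (illustrative only, not the model's actual template).
TEMPLATE = (
    "{%- for message in messages %}"
    "{{- '<|' + message['role'] + '|>\n' + message['content'] + eos_token + '\n' }}"
    "{%- endfor %}"
    "{%- if add_generation_prompt %}{{- '<|assistant|>\n' }}{%- endif %}"
)

env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
template = env.from_string(TEMPLATE)
example = [
    {"role": "user", "content": "Which is bigger, the moon or the sun?"},
    {"role": "assistant", "content": "The sun."},
]

train_text = template.render(messages=example, eos_token="</s>", add_generation_prompt=False)
bad_text = template.render(messages=example, eos_token="</s>", add_generation_prompt=True)
print(train_text)
print(bad_text)  # ends with a stray, empty assistant header
```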

The example below preprocesses a dataset with a chat template.

from transformers import AutoTokenizer
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

chat1 = [
    {"role": "user", "content": "Which is bigger, the moon or the sun?"},
    {"role": "assistant", "content": "The sun."}
]
chat2 = [
    {"role": "user", "content": "Which is bigger, a virus or a bacterium?"},
    {"role": "assistant", "content": "A bacterium."}
]

dataset = Dataset.from_dict({"chat": [chat1, chat2]})
dataset = dataset.map(lambda x: {"formatted_chat": tokenizer.apply_chat_template(x["chat"], tokenize=False, add_generation_prompt=False)})
print(dataset['formatted_chat'][0])

<|user|>
Which is bigger, the moon or the sun?</s>
<|assistant|>
The sun.</s>

After this step, you can continue with a standard causal language modeling training recipe using the formatted_chat column.

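Some tokenizers add special <bos> and <eos> tokens. Chat templates should already include all the necessary special tokens, so adding extra special tokens is often incorrect or duplicative and hurts model performance. When you format text with apply_chat_template(tokenize=False) and tokenize it in a separate step, set add_special_tokens=False in that tokenizer call to avoid duplicating them.

formatted_chat = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer(formatted_chat, add_special_tokens=False)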

This isn't an issue if you use apply_chat_template(tokenize=True).
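Here is a toy illustration of the duplication. toy_tokenize is a hypothetical whitespace tokenizer that prepends `<s>` when add_special_tokens=True, as many real tokenizers do by default. The formatted chat string is also made up. It already starts with `<s>` because the template emitted it.

```python
BOS = "<s>"


def toy_tokenize(text, add_special_tokens=True):
    """Hypothetical stand-in for a tokenizer call: split on spaces, optionally prepend BOS."""
    tokens = text.split()
    return [BOS] + tokens if add_special_tokens else tokens


# Pretend output of apply_chat_template(..., tokenize=False) for a template that emits BOS itself.
formatted_chat = "<s> [INST] Hello! [/INST]"

doubled = toy_tokenize(formatted_chat)                            # BOS appears twice
correct = toy_tokenize(formatted_chat, add_special_tokens=False)  # BOS appears once
print(doubled)
print(correct)
```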
