Tools and RAG

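The apply_chat_template() method accepts almost any additional argument type besides the chat messages: strings, lists, and dicts. This makes chat templates usable for many use cases.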

This guide demonstrates how to use chat templates with tools and retrieval-augmented generation (RAG).

Tools

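Tools are functions that a large language model (LLM) can call to perform specific tasks. They are a powerful way to extend the capabilities of conversational agents with real-time information, computational tools, or access to large databases.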

Follow the rules below when creating a tool.

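The function should have a descriptive name.
Function arguments must have a type hint in the function header (do not repeat the types in the `Args` block).
The function must have a Google-style docstring.
The function can have a return type and a `Returns` block, but these are optional because most tool-use models ignore them.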
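
The rules above can be checked mechanically before a function is handed to a model. The following is a minimal sketch that uses only the standard library. `check_tool` is a hypothetical helper, not part of Transformers, and its docstring check is a rough heuristic that only looks for an `Args:` line.

```python
import inspect
from typing import get_type_hints


def check_tool(func):
    """Return a list of rule violations for a candidate tool (hypothetical helper)."""
    problems = []
    doc = inspect.getdoc(func) or ""
    hints = get_type_hints(func)
    params = inspect.signature(func).parameters

    if not doc:
        problems.append("missing docstring")
    for name in params:
        # Every argument needs a type hint in the function header.
        if name not in hints:
            problems.append(f"argument '{name}' has no type hint")
    # Rough heuristic: a Google-style docstring documents arguments in an `Args:` block.
    if params and "Args:" not in doc:
        problems.append("docstring has no Args block")
    return problems


def good_tool(city: str) -> float:
    """
    Get the temperature in a city.

    Args:
        city: The name of the city.
    """
    return 22.0


def bad_tool(city):
    """Get the temperature in a city."""
    return 22.0


print(check_tool(good_tool))  # []
print(check_tool(bad_tool))
```

A check like this only catches structural problems. It says nothing about whether the descriptions are clear enough for a model to pick the right tool.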

Example tools for getting the temperature and wind speed are shown below.

def get_current_temperature(location: str, unit: str) -> float:
    """
    Get the current temperature at a location.
    
    Args:
        location: The location to get the temperature for, in the format "City, Country"
        unit: The unit to return the temperature in. (choices: ["celsius", "fahrenheit"])
    Returns:
        The current temperature at the specified location in the specified units, as a float.
    """
    return 22.  # A real function should probably actually get the temperature!

def get_current_wind_speed(location: str) -> float:
    """
    Get the current wind speed in km/h at a given location.
    
    Args:
        location: The location to get the wind speed for, in the format "City, Country"
    Returns:
        The current wind speed at the given location in km/h, as a float.
    """
    return 6.  # A real function should probably actually get the wind speed!

tools = [get_current_temperature, get_current_wind_speed]

Load a model and tokenizer that support tool use, such as NousResearch/Hermes-2-Pro-Llama-3-8B. If your hardware allows it, also consider a larger model such as Command-R or Mixtral-8x22B.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B")
model = AutoModelForCausalLM.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B", torch_dtype=torch.bfloat16, device_map="auto")

Create the chat messages.

messages = [
  {"role": "system", "content": "You are a bot that responds to weather queries. You should reply with the unit used in the queried location."},
  {"role": "user", "content": "Hey, what's the temperature in Paris right now?"}
]

Pass `messages` and the list of tools to apply_chat_template(). You can then pass the inputs to the model for generation.

inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}  # move the input tensors to the model's device
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):]))

<tool_call>
{"arguments": {"location": "Paris, France", "unit": "celsius"}, "name": "get_current_temperature"}
</tool_call><|im_end|>

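The chat model called the `get_current_temperature` tool with the correct arguments from the docstring. It inferred from "Paris" that the location is in France and that the temperature should be reported in Celsius.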
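
Before the call can be executed, the tool name and arguments have to be extracted from the generated text. The following is a minimal sketch, assuming the `<tool_call>` tags shown in the output above (other models use other formats). `parse_tool_calls` and the `available_tools` registry are hypothetical names, and the tool itself is a stub.

```python
import json
import re

# Assumption: the model wraps each call in <tool_call>...</tool_call> tags,
# as in the sample output above.
TOOL_CALL_PATTERN = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_calls(generated_text):
    """Extract every tool call found in the generated text as a dict."""
    return [json.loads(match) for match in TOOL_CALL_PATTERN.findall(generated_text)]


def get_current_temperature(location: str, unit: str) -> float:
    return 22.0  # stub standing in for a real lookup


# Hypothetical registry mapping tool names to callables.
available_tools = {"get_current_temperature": get_current_temperature}

generated_text = (
    "<tool_call>\n"
    '{"arguments": {"location": "Paris, France", "unit": "celsius"}, '
    '"name": "get_current_temperature"}\n'
    "</tool_call><|im_end|>"
)

for call in parse_tool_calls(generated_text):
    # Look up the tool by name and call it with the parsed arguments.
    result = available_tools[call["name"]](**call["arguments"])
    print(call["name"], result)
```

Looking the tool up in an explicit registry, rather than evaluating the name, keeps the model from invoking anything that was not deliberately exposed.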

Now append the `get_current_temperature` function and these arguments to the chat messages as a `tool_call`. The `tool_call` dictionary should be provided to the `assistant` role, not the `system` or `user` role.

The OpenAI API uses a JSON string as its `tool_call` format. Using that format in Transformers may cause errors or strange model behavior, because Transformers expects a dict.
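
A minimal sketch of this step follows. The message layout used here (a `tool_calls` list on the assistant message, followed by a `tool` message carrying the result) is a commonly used convention. The exact keys a given chat template expects may differ, so treat the layout as an assumption and check the model card. `normalize_arguments` is a hypothetical helper that converts OpenAI-style JSON-string arguments into the dict form.

```python
import json


def normalize_arguments(arguments):
    """Return tool arguments as a dict, decoding them first if they are a JSON string."""
    if isinstance(arguments, str):
        return json.loads(arguments)
    return arguments


messages = [
    {"role": "user", "content": "Hey, what's the temperature in Paris right now?"}
]

# An OpenAI-style call carries its arguments as a JSON string.
openai_style_arguments = '{"location": "Paris, France", "unit": "celsius"}'

tool_call = {
    "name": "get_current_temperature",
    "arguments": normalize_arguments(openai_style_arguments),  # now a dict
}

# The tool call belongs to the assistant role.
messages.append(
    {"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]}
)

# Assumption: the tool result is returned in a message with the `tool` role.
messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})

print(json.dumps(messages, indent=2))
```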

Schema

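apply_chat_template() converts functions into a JSON schema, which is passed to the chat template. The LLM never sees the code inside the function. In other words, the LLM doesn't care how the function works technically; it only cares about the function **definition** and **arguments**.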

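The JSON schema is generated automatically behind the scenes as long as your function follows the rules listed earlier. You can also use get_json_schema to convert a function manually, either for more visibility or for debugging.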

from transformers.utils import get_json_schema

def multiply(a: float, b: float):
    """
    A function that multiplies two numbers
    
    Args:
        a: The first number to multiply
        b: The second number to multiply
    """
    return a * b

schema = get_json_schema(multiply)
print(schema)

{
  "type": "function", 
  "function": {
    "name": "multiply", 
    "description": "A function that multiplies two numbers", 
    "parameters": {
      "type": "object", 
      "properties": {
        "a": {
          "type": "number", 
          "description": "The first number to multiply"
        }, 
        "b": {
          "type": "number",
          "description": "The second number to multiply"
        }
      }, 
      "required": ["a", "b"]
    }
  }
}

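You can edit the schema or write one entirely from scratch. This gives you a lot of flexibility to define precise schemas for more complex functions, such as functions with nested arguments.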
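
As an illustration of editing a schema, the sketch below starts from a hand-written schema dict and restricts one parameter to a fixed set of values with the standard JSON Schema `enum` keyword. The schema mirrors the temperature tool from earlier in this guide. `restrict_choices` is a hypothetical helper, and whether a model honors the `enum` depends on its chat template and training.

```python
import copy

temperature_schema = {
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Get the current temperature at a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to get the temperature for",
                },
                "unit": {
                    "type": "string",
                    "description": "The unit to return the temperature in",
                },
            },
            "required": ["location", "unit"],
        },
    },
}


def restrict_choices(schema, parameter, choices):
    """Return a copy of the schema in which `parameter` only accepts `choices`."""
    edited = copy.deepcopy(schema)  # leave the original schema untouched
    edited["function"]["parameters"]["properties"][parameter]["enum"] = list(choices)
    return edited


edited_schema = restrict_choices(temperature_schema, "unit", ["celsius", "fahrenheit"])
print(edited_schema["function"]["parameters"]["properties"]["unit"])
```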

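Try to keep function signatures simple, with as few arguments as possible. Simple signatures are easier for a model to understand and use than complex ones, such as functions with nested arguments.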

The example below demonstrates writing schemas manually and then passing them to apply_chat_template().

# A simple function that takes no arguments
current_time = {
  "type": "function", 
  "function": {
    "name": "current_time",
    "description": "Get the current local time as a string.",
    "parameters": {
      'type': 'object',
      'properties': {}
    }
  }
}

# A more complete function that takes two numerical arguments
multiply = {
  'type': 'function',
  'function': {
    'name': 'multiply',
    'description': 'A function that multiplies two numbers', 
    'parameters': {
      'type': 'object', 
      'properties': {
        'a': {
          'type': 'number',
          'description': 'The first number to multiply'
        }, 
        'b': {
          'type': 'number', 'description': 'The second number to multiply'
        }
      }, 
      'required': ['a', 'b']
    }
  }
}

model_input = tokenizer.apply_chat_template(
    messages,
    tools=[current_time, multiply]
)

RAG

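Retrieval-augmented generation (RAG) models extend a model's existing knowledge by letting it search documents for additional information before it answers a query. For RAG models, add a `documents` argument to apply_chat_template(). The `documents` argument should be a list of documents, and each document should be a single dict with `title` and `text` keys.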

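The `documents` argument for RAG isn't widely supported, and many models have chat templates that ignore `documents`. Verify whether a model supports `documents` by reading its model card or by running `print(tokenizer.chat_template)` and checking whether the `documents` key appears. Command-R and Command-R+ both support `documents` in their RAG chat templates.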
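
The template check can also be done in code rather than by eye. The following is a minimal sketch. `template_supports_documents` is a hypothetical helper, the check is only a substring heuristic, and the two Jinja templates are toy examples that are not taken from any real model. With a real tokenizer, you would pass `tokenizer.chat_template` instead.

```python
def template_supports_documents(chat_template):
    """Heuristically check whether a chat template references `documents`."""
    if chat_template is None:
        return False
    # Assumption: some tokenizers expose several named templates as a dict.
    if isinstance(chat_template, dict):
        return any("documents" in template for template in chat_template.values())
    return "documents" in chat_template


# Toy Jinja templates for illustration only.
plain_template = "{% for message in messages %}{{ message['content'] }}{% endfor %}"
rag_template = (
    "{% for document in documents %}{{ document['title'] }}{% endfor %}"
    "{% for message in messages %}{{ message['content'] }}{% endfor %}"
)

print(template_supports_documents(plain_template))  # False
print(template_supports_documents({"default": plain_template, "rag": rag_template}))  # True
```

A substring match can give false positives, for example when the word appears only in a comment, so the model card remains the more reliable source.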

Create a list of documents to pass to the model.

documents = [
    {
        "title": "The Moon: Our Age-Old Foe", 
        "text": "Man has always dreamed of destroying the moon. In this essay, I shall..."
    },
    {
        "title": "The Sun: Our Age-Old Friend",
        "text": "Although often underappreciated, the sun provides several notable benefits..."
    }
]

Set `chat_template="rag"` in apply_chat_template() and generate a response.

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01-4bit")
model = AutoModelForCausalLM.from_pretrained("CohereForAI/c4ai-command-r-v01-4bit", device_map="auto")
device = model.device # Get the device the model is loaded on

# Define conversation input
conversation = [
    {"role": "user", "content": "What has Man always dreamed of?"}
]

input_ids = tokenizer.apply_chat_template(
    conversation=conversation,
    documents=documents,
    chat_template="rag",
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt").to(device)

# Generate a response 
generated_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
    )

# Decode and print the generated text along with generation prompt
generated_text = tokenizer.decode(generated_tokens[0])
print(generated_text)
