Text-to-SQL

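In this tutorial, we'll see how to implement an agent that leverages SQL using smolagents.

Let's start with the golden question: why not keep it simple and use a standard text-to-SQL pipeline?

A standard text-to-SQL pipeline is brittle, because the generated SQL query can be incorrect. Even worse, an incorrect query may not raise an error at all: it runs successfully and silently returns wrong or useless output.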

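👉 An agent system, by contrast, can critically inspect the output and decide whether the query needs to be changed, which makes it considerably more reliable.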
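To make the silent-failure case concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module and a small hypothetical table with made-up rows (not the tutorial's data). Both queries are valid SQL and run without error, but only one answers the question asked.

```python
import sqlite3

# Hypothetical two-row table, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE receipts (customer_name TEXT, price REAL, tip REAL)")
conn.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?)",
    [("Ann", 50.0, 0.0), ("Bob", 45.0, 10.0)],
)

# Question: "Which customer paid the most in total, tip included?"
wrong_sql = "SELECT customer_name FROM receipts ORDER BY price DESC LIMIT 1"
right_sql = "SELECT customer_name FROM receipts ORDER BY price + tip DESC LIMIT 1"

wrong_answer = conn.execute(wrong_sql).fetchone()[0]
right_answer = conn.execute(right_sql).fetchone()[0]

print(wrong_answer)  # Ann: the query ignores the tip, yet raises no error
print(right_answer)  # Bob
```

A one-shot pipeline would return the first answer with no warning. An agent that reads the result has a chance to notice that the tip was left out and to retry.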

Let's build this agent! 💪

Run the line below to install the required dependencies:

!pip install smolagents python-dotenv sqlalchemy --upgrade -q

To call Inference Providers, you need a valid token stored in the environment variable HF_TOKEN. We use python-dotenv to load it.

from dotenv import load_dotenv
load_dotenv()
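If the token is missing, the failure may only show up later, when the model is first called. As an optional sketch, a small check can fail early with a clearer message. The helper name `require_env` is hypothetical and not part of smolagents or python-dotenv.

```python
import os

def require_env(name: str = "HF_TOKEN", env=os.environ) -> str:
    """Return the variable's value, or raise a clear error if it is missing or empty."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in your shell."
        )
    return value

# Demonstration with an explicit mapping and a placeholder value,
# so the sketch runs without a real token.
print(require_env(env={"HF_TOKEN": "hf_placeholder"}))
```

In the notebook you would call `require_env()` with no arguments right after `load_dotenv()`.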

Then, we set up the SQL environment:

from sqlalchemy import (
    create_engine,
    MetaData,
    Table,
    Column,
    String,
    Integer,
    Float,
    insert,
    inspect,
    text,
)

engine = create_engine("sqlite:///:memory:")
metadata_obj = MetaData()

def insert_rows_into_table(rows, table, engine=engine):
    for row in rows:
        stmt = insert(table).values(**row)
        with engine.begin() as connection:
            connection.execute(stmt)

table_name = "receipts"
receipts = Table(
    table_name,
    metadata_obj,
    Column("receipt_id", Integer, primary_key=True),
    Column("customer_name", String(16), primary_key=True),
    Column("price", Float),
    Column("tip", Float),
)
metadata_obj.create_all(engine)

rows = [
    {"receipt_id": 1, "customer_name": "Alan Payne", "price": 12.06, "tip": 1.20},
    {"receipt_id": 2, "customer_name": "Alex Mason", "price": 23.86, "tip": 0.24},
    {"receipt_id": 3, "customer_name": "Woodrow Wilson", "price": 53.43, "tip": 5.43},
    {"receipt_id": 4, "customer_name": "Margaret James", "price": 21.11, "tip": 1.00},
]
insert_rows_into_table(rows, receipts)

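Build our agent

Now let's make our SQL table retrievable by a tool.

The tool's description attribute is embedded in the LLM's prompt by the agent system: it tells the LLM how to use the tool. This is where we want to describe the SQL table.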

inspector = inspect(engine)
columns_info = [(col["name"], col["type"]) for col in inspector.get_columns("receipts")]

table_description = "Columns:\n" + "\n".join([f"  - {name}: {col_type}" for name, col_type in columns_info])
print(table_description)

Columns:
  - receipt_id: INTEGER
  - customer_name: VARCHAR(16)
  - price: FLOAT
  - tip: FLOAT

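Now let's build our tool. It needs the following (read the tool documentation for more details):

A docstring with an Args: section listing the arguments.
Type hints on both inputs and output.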

from smolagents import tool

@tool
def sql_engine(query: str) -> str:
    """
    Allows you to perform SQL queries on the table. Returns a string representation of the result.
    The table is named 'receipts'. Its description is as follows:
        Columns:
        - receipt_id: INTEGER
        - customer_name: VARCHAR(16)
        - price: FLOAT
        - tip: FLOAT

    Args:
        query: The query to perform. This should be correct SQL.
    """
    output = ""
    with engine.connect() as con:
        rows = con.execute(text(query))
        for row in rows:
            output += "\n" + str(row)
    return output
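Note that the tool above returns an empty string when a query matches no rows, which can be hard for an LLM to interpret. As one possible variation, the sketch below returns an explicit message for empty results and for SQL errors, so the agent has something concrete to react to. It uses the built-in sqlite3 module instead of SQLAlchemy to stay dependency-free, and `run_query` is a hypothetical helper, not the tutorial's tool.

```python
import sqlite3

def run_query(conn: sqlite3.Connection, query: str) -> str:
    """Run a query and return its rows, or a readable message, as a string."""
    try:
        rows = conn.execute(query).fetchall()
    except sqlite3.Error as exc:
        # Hand the database's own error text back to the caller
        return f"SQL error: {exc}"
    if not rows:
        return "Query ran but returned no rows."
    return "\n".join(str(row) for row in rows)

# Tiny hypothetical table for demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE receipts (receipt_id INTEGER, price REAL)")
conn.execute("INSERT INTO receipts VALUES (1, 12.06)")

print(run_query(conn, "SELECT price FROM receipts"))                 # one row
print(run_query(conn, "SELECT price FROM receipts WHERE price < 0"))  # no rows
print(run_query(conn, "SELECT cost FROM receipts"))                  # unknown column
```

Whether catching errors inside the tool is preferable depends on how your agent framework surfaces exceptions; treat this as a design option rather than a requirement.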

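Now let's create an agent that leverages this tool.

We use CodeAgent, the main agent class in smolagents: an agent that writes its actions in code and can iterate on previous outputs, following the ReAct framework.

The model is the LLM that powers the agent system. InferenceClientModel lets you call LLMs through HF's Inference API, via either a serverless or a dedicated endpoint, but you could also use any proprietary API.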

from smolagents import CodeAgent, InferenceClientModel

agent = CodeAgent(
    tools=[sql_engine],
    model=InferenceClientModel(model_id="meta-llama/Llama-3.1-8B-Instruct"),
)
agent.run("Can you give me the name of the client who got the most expensive receipt?")
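As a sanity check on the agent's answer, the question can also be answered directly. The sketch below rebuilds the receipts rows with the built-in sqlite3 module (an assumption made to keep it self-contained) and runs one query that answers the question; the agent may generate different SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts (receipt_id INTEGER, customer_name TEXT, price REAL, tip REAL)"
)
# Same rows as the tutorial's receipts table
conn.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?)",
    [
        (1, "Alan Payne", 12.06, 1.20),
        (2, "Alex Mason", 23.86, 0.24),
        (3, "Woodrow Wilson", 53.43, 5.43),
        (4, "Margaret James", 21.11, 1.00),
    ],
)

# Highest-priced receipt first, keep only the top row
top_customer = conn.execute(
    "SELECT customer_name FROM receipts ORDER BY price DESC LIMIT 1"
).fetchone()[0]
print(top_customer)  # Woodrow Wilson
```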

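Level 2: Table joins

Now let's make it more challenging! We want our agent to handle joins across multiple tables.

So let's create a second table that records the name of the waiter for each receipt_id!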

table_name = "waiters"
waiters = Table(
    table_name,
    metadata_obj,
    Column("receipt_id", Integer, primary_key=True),
    Column("waiter_name", String(16), primary_key=True),
)
metadata_obj.create_all(engine)

rows = [
    {"receipt_id": 1, "waiter_name": "Corey Johnson"},
    {"receipt_id": 2, "waiter_name": "Michael Watts"},
    {"receipt_id": 3, "waiter_name": "Michael Watts"},
    {"receipt_id": 4, "waiter_name": "Margaret James"},
]
insert_rows_into_table(rows, waiters)

Since we added a table, we update the description of the sql_engine tool to cover it, so that the LLM can properly use the information in this table.

updated_description = """Allows you to perform SQL queries on the table. Beware that this tool's output is a string representation of the execution output.
It can use the following tables:"""

inspector = inspect(engine)
for table in ["receipts", "waiters"]:
    columns_info = [(col["name"], col["type"]) for col in inspector.get_columns(table)]

    table_description = f"Table '{table}':\n"

    table_description += "Columns:\n" + "\n".join([f"  - {name}: {col_type}" for name, col_type in columns_info])
    updated_description += "\n\n" + table_description

print(updated_description)

Since this request is a bit harder than the previous one, we'll switch the LLM engine to the more powerful Qwen/Qwen2.5-Coder-32B-Instruct!

sql_engine.description = updated_description

agent = CodeAgent(
    tools=[sql_engine],
    model=InferenceClientModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct"),
)

agent.run("Which waiter got more total money from tips?")

It works right away! The setup was surprisingly simple, wasn't it?
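For reference, the question requires joining the two tables on receipt_id and summing tips per waiter. The sketch below shows one query of that shape, using the built-in sqlite3 module and only the columns the query needs (both are assumptions made to keep it short and self-contained). The SQL the agent actually writes may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE receipts (receipt_id INTEGER, tip REAL)")
conn.execute("CREATE TABLE waiters (receipt_id INTEGER, waiter_name TEXT)")
# Tip and waiter values copied from the tutorial's rows
conn.executemany(
    "INSERT INTO receipts VALUES (?, ?)",
    [(1, 1.20), (2, 0.24), (3, 5.43), (4, 1.00)],
)
conn.executemany(
    "INSERT INTO waiters VALUES (?, ?)",
    [
        (1, "Corey Johnson"),
        (2, "Michael Watts"),
        (3, "Michael Watts"),
        (4, "Margaret James"),
    ],
)

query = """
SELECT w.waiter_name, SUM(r.tip) AS total_tips
FROM receipts AS r
JOIN waiters AS w ON r.receipt_id = w.receipt_id
GROUP BY w.waiter_name
ORDER BY total_tips DESC
"""
results = conn.execute(query).fetchall()
for name, total in results:
    print(name, round(total, 2))
```

With these rows, Michael Watts comes first, with tips of 0.24 + 5.43 = 5.67 in total.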

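This example is done! We've touched on these concepts:

Building new tools.
Updating a tool's description.
Switching to a stronger LLM to help the agent reason.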

✅ Now you can go build the text-to-SQL system you've always dreamt of! ✨
