Agent for text-to-SQL with automatic error correction
In this tutorial, we'll see how to implement an agent that leverages SQL and corrects its own errors, using smolagents.
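What's the advantage over a standard text-to-SQL pipeline?
A standard text-to-SQL pipeline is brittle, since the generated SQL query can be incorrect. Even worse, the query can be incorrect without raising an error, silently returning wrong or useless output.
👉 An agent system, by contrast, can critically inspect the output and decide whether the query needs to be changed, which greatly improves its performance.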
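To make the silent-failure case concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 module instead of the SQLAlchemy engine used in the rest of this tutorial, purely so that it runs on its own, and it reuses the receipts rows defined in the setup section. The first query is valid SQL that answers the wrong question, and nothing signals the mistake.

```python
import sqlite3

# In-memory database holding the same rows as the tutorial's receipts table.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE receipts (receipt_id INTEGER, customer_name TEXT, price REAL, tip REAL)"
)
con.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?)",
    [
        (1, "Alan Payne", 12.06, 1.20),
        (2, "Alex Mason", 23.86, 0.24),
        (3, "Woodrow Wilson", 53.43, 5.43),
        (4, "Margaret James", 21.11, 1.00),
    ],
)

# Question: "Who got the most expensive receipt?"
# Plausible but wrong: ORDER BY sorts ascending by default, so this returns the cheapest receipt.
wrong_query = "SELECT customer_name FROM receipts ORDER BY price LIMIT 1"
# Correct: sort in descending order.
right_query = "SELECT customer_name FROM receipts ORDER BY price DESC LIMIT 1"

wrong_answer = con.execute(wrong_query).fetchone()[0]
right_answer = con.execute(right_query).fetchone()[0]

print(wrong_answer)  # Alan Payne (no error raised, but the answer is wrong)
print(right_answer)  # Woodrow Wilson
```

Both queries run cleanly, so a pipeline that only checks for exceptions cannot tell them apart. Something has to look at the result and judge whether it is plausible.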
Let's build this agent! 💪
Set up the SQL tables
from sqlalchemy import (
    create_engine,
    MetaData,
    Table,
    Column,
    String,
    Integer,
    Float,
    insert,
    inspect,
    text,
)
engine = create_engine("sqlite:///:memory:")
metadata_obj = MetaData()
# create the receipts SQL table
table_name = "receipts"
receipts = Table(
    table_name,
    metadata_obj,
    Column("receipt_id", Integer, primary_key=True),
    Column("customer_name", String(16), primary_key=True),
    Column("price", Float),
    Column("tip", Float),
)
metadata_obj.create_all(engine)
rows = [
    {"receipt_id": 1, "customer_name": "Alan Payne", "price": 12.06, "tip": 1.20},
    {"receipt_id": 2, "customer_name": "Alex Mason", "price": 23.86, "tip": 0.24},
    {"receipt_id": 3, "customer_name": "Woodrow Wilson", "price": 53.43, "tip": 5.43},
    {"receipt_id": 4, "customer_name": "Margaret James", "price": 21.11, "tip": 1.00},
]
for row in rows:
    stmt = insert(receipts).values(**row)
    with engine.begin() as connection:
        cursor = connection.execute(stmt)
Let's check that our system works with a basic query.
>>> with engine.connect() as con:
...     rows = con.execute(text("""SELECT * from receipts"""))
...     for row in rows:
...         print(row)
(1, 'Alan Payne', 12.06, 1.2)
(2, 'Alex Mason', 23.86, 0.24)
(3, 'Woodrow Wilson', 53.43, 5.43)
(4, 'Margaret James', 21.11, 1.0)
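Build our agent
Now let's make our SQL table retrievable by a tool.
Our sql_engine tool needs the following (read the documentation for more details):
- A docstring with an Args: section. This docstring will be parsed into the tool's description attribute, which serves as the instruction manual for the LLM powering the agent, so it's important to provide it!
- Type hints for the inputs and the output.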
from smolagents import tool
@tool
def sql_engine(query: str) -> str:
    """
    Allows you to perform SQL queries on the table. Returns a string representation of the result.
    The table is named 'receipts'. Its description is as follows:
        Columns:
        - receipt_id: INTEGER
        - customer_name: VARCHAR(16)
        - price: FLOAT
        - tip: FLOAT
    Args:
        query: The query to perform. This should be correct SQL.
    """
    output = ""
    with engine.connect() as con:
        rows = con.execute(text(query))
        for row in rows:
            output += "\n" + str(row)
    return output
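To see why the docstring and the type hints matter, the following sketch shows one way such information could be read from a plain Python function using only the standard library. This is not how smolagents implements the tool decorator. The helper `describe_function` and the function `sql_engine_stub` are hypothetical names introduced for illustration.

```python
# Aliased so it does not shadow the `inspect` imported from sqlalchemy in this tutorial.
import inspect as pyinspect
from typing import get_type_hints


def describe_function(func) -> dict:
    """Hypothetical helper: gather what an LLM-facing tool description could be built from."""
    doc = pyinspect.getdoc(func) or ""
    # Split the docstring into the free-text description and the Args: section.
    description, _, args_section = doc.partition("Args:")
    hints = get_type_hints(func)
    return {
        "name": func.__name__,
        "description": description.strip(),
        "args": args_section.strip(),
        "input_types": {k: v.__name__ for k, v in hints.items() if k != "return"},
        "output_type": hints["return"].__name__ if "return" in hints else None,
    }


def sql_engine_stub(query: str) -> str:
    """
    Runs a SQL query on the 'receipts' table and returns the result as a string.

    Args:
        query: The query to perform. This should be correct SQL.
    """
    return ""


info = describe_function(sql_engine_stub)
print(info["description"])
print(info["args"])
print(info["input_types"], info["output_type"])
```

Without the docstring the description would be empty, and without the type hints the input and output types would be unknown. In both cases the LLM would have less to go on when deciding how to call the tool.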
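Now let's create an agent that leverages this tool.
We use CodeAgent, the main agent class in smolagents: an agent that writes its actions in code and can iterate on previous outputs, following the ReAct framework.
The model argument is the LLM that powers the agent system. InferenceClientModel lets you call LLMs through Hugging Face's Inference API, via either serverless or dedicated endpoints, but you could also use any proprietary API: check this tutorial to see how to adapt it.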
from smolagents import CodeAgent, InferenceClientModel
agent = CodeAgent(
    tools=[sql_engine],
    model=InferenceClientModel("meta-llama/Meta-Llama-3-8B-Instruct"),
)
agent.run("Can you give me the name of the client who got the most expensive receipt?")
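The idea of iterating on previous outputs can be illustrated without an LLM. The sketch below is not how CodeAgent is implemented. It is a hypothetical loop in which a scripted function, `fake_model`, stands in for the LLM, and sqlite3 stands in for the SQLAlchemy engine. The first attempt uses a wrong column name, the database error is fed back as an observation, and the second attempt succeeds.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE receipts (receipt_id INTEGER, customer_name TEXT, price REAL, tip REAL)"
)
con.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?)",
    [(1, "Alan Payne", 12.06, 1.20), (3, "Woodrow Wilson", 53.43, 5.43)],
)


def run_sql(query: str) -> str:
    """Execute a query and return the rows as text, or the error message as an observation."""
    try:
        return "\n".join(str(row) for row in con.execute(query))
    except sqlite3.Error as exc:
        return f"ERROR: {exc}"


def fake_model(history: list[str]) -> str:
    """Hypothetical stand-in for the LLM: scripted to fix its query after seeing an error."""
    if any(obs.startswith("ERROR") for obs in history):
        return "SELECT customer_name FROM receipts ORDER BY price DESC LIMIT 1"
    return "SELECT client FROM receipts ORDER BY price DESC LIMIT 1"  # wrong column name


def self_correcting_run(max_steps: int = 3) -> str:
    """Propose a query, observe the result, and retry until a query runs without error."""
    history: list[str] = []
    for _ in range(max_steps):
        query = fake_model(history)
        observation = run_sql(query)
        history.append(observation)
        if not observation.startswith("ERROR"):
            return observation
    return "No valid answer within the step budget."


answer = self_correcting_run()
print(answer)  # ('Woodrow Wilson',)
```

A real agent goes further than this sketch: it can also judge whether an error-free result actually answers the question, which is what catches queries that fail silently.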
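Increasing difficulty: table joins
Now let's make it more challenging! We want our agent to handle joins across multiple tables.
So let's create a second table that records the name of the waiter for each receipt_id!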
table_name = "waiters"
# Use a separate variable so the receipts table object is not overwritten.
waiters = Table(
    table_name,
    metadata_obj,
    Column("receipt_id", Integer, primary_key=True),
    Column("waiter_name", String(16), primary_key=True),
)
metadata_obj.create_all(engine)
rows = [
    {"receipt_id": 1, "waiter_name": "Corey Johnson"},
    {"receipt_id": 2, "waiter_name": "Michael Watts"},
    {"receipt_id": 3, "waiter_name": "Michael Watts"},
    {"receipt_id": 4, "waiter_name": "Margaret James"},
]
for row in rows:
    stmt = insert(waiters).values(**row)
    with engine.begin() as connection:
        cursor = connection.execute(stmt)
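We need to update the description of the sql_engine tool with this table's description, so that the LLM can properly leverage information from this table.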
>>> updated_description = """Allows you to perform SQL queries on the table. Beware that this tool's output is a string representation of the execution output.
... It can use the following tables:"""
>>> inspector = inspect(engine)
>>> for table in ["receipts", "waiters"]:
...     columns_info = [(col["name"], col["type"]) for col in inspector.get_columns(table)]
...     table_description = f"Table '{table}':\n"
...     table_description += "Columns:\n" + "\n".join([f" - {name}: {col_type}" for name, col_type in columns_info])
...     updated_description += "\n\n" + table_description
>>> print(updated_description)
Allows you to perform SQL queries on the table. Beware that this tool's output is a string representation of the execution output.
It can use the following tables:

Table 'receipts':
Columns:
 - receipt_id: INTEGER
 - customer_name: VARCHAR(16)
 - price: FLOAT
 - tip: FLOAT

Table 'waiters':
Columns:
 - receipt_id: INTEGER
 - waiter_name: VARCHAR(16)
Since this request is a bit harder than the previous one, we'll switch the model to the more powerful Qwen/Qwen2.5-72B-Instruct!
sql_engine.description = updated_description
agent = CodeAgent(
    tools=[sql_engine],
    model=InferenceClientModel("Qwen/Qwen2.5-72B-Instruct"),
)
agent.run("Which waiter got more total money from tips?")
It directly works! The setup was surprisingly simple, wasn't it?
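To double-check an agent's answer to a question like this one, it helps to have a hand-written reference query. The sketch below assumes Python's built-in sqlite3 module rather than the tutorial's SQLAlchemy engine, so that it runs standalone, and loads the same rows as the two tables above. The query is one reasonable way to express the join; an agent may well arrive at a different but equivalent formulation.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE receipts (receipt_id INTEGER, customer_name TEXT, price REAL, tip REAL)"
)
con.execute("CREATE TABLE waiters (receipt_id INTEGER, waiter_name TEXT)")
con.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?)",
    [
        (1, "Alan Payne", 12.06, 1.20),
        (2, "Alex Mason", 23.86, 0.24),
        (3, "Woodrow Wilson", 53.43, 5.43),
        (4, "Margaret James", 21.11, 1.00),
    ],
)
con.executemany(
    "INSERT INTO waiters VALUES (?, ?)",
    [
        (1, "Corey Johnson"),
        (2, "Michael Watts"),
        (3, "Michael Watts"),
        (4, "Margaret James"),
    ],
)

# Join the two tables on receipt_id and sum the tips per waiter, highest total first.
query = """
SELECT w.waiter_name, SUM(r.tip) AS total_tip
FROM receipts AS r
JOIN waiters AS w ON r.receipt_id = w.receipt_id
GROUP BY w.waiter_name
ORDER BY total_tip DESC
"""
totals = con.execute(query).fetchall()
for name, total in totals:
    print(f"{name}: {total:.2f}")

top_waiter = totals[0][0]
print(top_waiter)  # Michael Watts
```

With this data, Michael Watts served receipts 2 and 3, so his tips total 0.24 + 5.43 = 5.67, ahead of Corey Johnson (1.20) and Margaret James (1.00).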
✅ Now you can go build the text-to-SQL system you've always dreamt of! ✨