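Build an agent with tool-calling superpowers 🦸 using smolagents

This notebook demonstrates how to use smolagents to build awesome agents!

What are agents? Agents are systems powered by an LLM that enable the LLM (with careful prompting and output parsing) to use specific *tools* to solve problems.

These *tools* are basically functions that the LLM cannot perform well on its own: for a text-generation LLM such as Llama-3-70B, this could be an image generation tool, a web search tool, a calculator...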
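To make the idea of "prompting plus output parsing" concrete, here is a minimal sketch of a single tool-calling step using only the standard library. It is not how smolagents is implemented: `fake_llm`, `TOOLS`, `calculator`, and `run_agent_step` are hypothetical names, and the "LLM" is a stub that always returns the same JSON tool call.

```python
import json


def calculator(expression: str) -> str:
    """Hypothetical tool: evaluate a basic arithmetic expression."""
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed:
        raise ValueError("unsupported characters in expression")
    # Builtins are disabled so that only plain arithmetic can be evaluated.
    return str(eval(expression, {"__builtins__": {}}, {}))


# Registry mapping tool names to callables (an assumption for this sketch).
TOOLS = {"calculator": calculator}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM: always requests the calculator tool."""
    return json.dumps({"tool": "calculator", "arguments": {"expression": "12 * 7"}})


def run_agent_step(task: str) -> str:
    """Prompt the model, parse its output, and dispatch to the chosen tool."""
    raw = fake_llm(f"Task: {task}\nAvailable tools: {list(TOOLS)}")
    call = json.loads(raw)  # output parsing
    tool = TOOLS[call["tool"]]  # tool lookup
    return tool(**call["arguments"])  # tool execution


result = run_agent_step("What is 12 times 7?")
print(result)
```

A real agent would feed the tool result back to the model and loop until the model produces a final answer; the sketch stops after one step to keep the mechanics visible.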

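What is smolagents? It is a library that provides building blocks for building your own agents! Learn more in the documentation.

Let's see how to use it and which use cases it can solve.

Run the line below to install the required dependencies: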

!pip install smolagents datasets langchain sentence-transformers faiss-cpu duckduckgo-search openai langchain-community --upgrade -q

Let's log in so that we can call the HF Inference API:

from huggingface_hub import notebook_login

notebook_login()

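1. 🏞️ Multimodal + 🌐 Web-browsing assistant

For this use case, we want to show an agent that can browse the web and generate images.

To build it, we only need two tools: image generation and web search.

For image generation, we load a tool from the Hub that uses the HF Inference API (Serverless) to generate images with a text-to-image model.
For web search, we use a built-in tool.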

>>> from smolagents import load_tool, CodeAgent, InferenceClientModel, DuckDuckGoSearchTool

>>> # Import tool from Hub
>>> image_generation_tool = load_tool("m-ric/text-to-image", trust_remote_code=True)


>>> search_tool = DuckDuckGoSearchTool()

>>> model = InferenceClientModel("Qwen/Qwen2.5-72B-Instruct")
>>> # Initialize the agent with both tools
>>> agent = CodeAgent(tools=[image_generation_tool, search_tool], model=model)

>>> # Run it!
>>> result = agent.run(
...     "Generate me a photo of the car that James bond drove in the latest movie.",
... )
>>> result

TOOLCODE:
from smolagents import Tool
from huggingface_hub import InferenceClient


class TextToImageTool(Tool):
    description = "This tool creates an image according to a prompt, which is a text description."
    name = "image_generator"
    inputs = {"prompt": {"type": "string", "description": "The image generator prompt. Don't hesitate to add details in the prompt to make the image look better, like 'high-res, photorealistic', etc."}}
    output_type = "image"
    model_sdxl = "black-forest-labs/FLUX.1-schnell"
    client = InferenceClient(model_sdxl)


    def forward(self, prompt):
        return self.client.text_to_image(prompt)

Image of an Aston Martin DB5

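2. 📚💬 RAG with iterative query refinement and source selection

Quick definition: Retrieval-Augmented Generation (RAG) is **_"using an LLM to answer a user query, but basing the answer on information retrieved from a knowledge base"_**.

This approach has many advantages over using a vanilla or fine-tuned LLM: to name a few, it grounds the answer in true facts and reduces confabulation, it provides the LLM with domain-specific knowledge, and it allows fine-grained control over access to information in the knowledge base.

Now suppose we want to perform RAG with an additional constraint: some parameters must be generated dynamically. For example, depending on the user query, we may want to restrict the search to specific subsets of the knowledge base, or adjust the number of retrieved documents. The difficulty is: **how do we dynamically adjust these parameters based on the user query?**
A common failure case of RAG is that retrieval based on the user query returns no relevant supporting documents. **Is there a way to iterate by calling the retriever again with a modified query when the previous results were not relevant?**

🔧 Well, we can solve the points above in a simple way: we will **give our agent control over the retriever's parameters!**

➡️ Let's see how to do this. We start by loading a knowledge base on which we want to perform RAG: this dataset is a compilation of the documentation pages for many `huggingface` packages, stored as markdown.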

import datasets

knowledge_base = datasets.load_dataset("m-ric/huggingface_doc", split="train")

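Now we prepare the knowledge base by processing the dataset and storing it in a vector database to be used by the retriever. We will use LangChain, since it features excellent utilities for vector databases: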

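# These import paths replace the older `langchain.docstore`, `langchain.text_splitter`
# and `langchain.vectorstores` locations, which recent LangChain releases no longer provide
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings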

source_docs = [
    Document(page_content=doc["text"], metadata={"source": doc["source"].split("/")[1]}) for doc in knowledge_base
]

docs_processed = RecursiveCharacterTextSplitter(chunk_size=500).split_documents(source_docs)[:1000]

embedding_model = HuggingFaceEmbeddings(model_name="thenlper/gte-small")
vectordb = FAISS.from_documents(documents=docs_processed, embedding=embedding_model)
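To illustrate what splitting documents into chunks of at most 500 characters means, here is a simplified fixed-size splitter. This is only a sketch: `split_text` and its `overlap` parameter are hypothetical, and LangChain's `RecursiveCharacterTextSplitter` is more sophisticated, since it tries to break on separators such as paragraph and line boundaries before falling back to smaller pieces.

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 0) -> list[str]:
    """Cut text into consecutive windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters, which can help keep
    sentences that straddle a boundary retrievable from either chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


document = "a" * 1200
chunks = split_text(document, chunk_size=500)
print([len(chunk) for chunk in chunks])  # [500, 500, 200]
```

Smaller chunks tend to give more precise matches at retrieval time, at the cost of less context per retrieved document.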

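Now that the database is ready, let's build a RAG system that answers user queries based on it!

We want our system to select only from the most relevant sources of information, depending on the query.

Our documentation pages come from the following sources: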

>>> all_sources = list(set([doc.metadata["source"] for doc in docs_processed]))
>>> print(all_sources)

['datasets-server', 'datasets', 'optimum', 'gradio', 'blog', 'course', 'hub-docs', 'pytorch-image-models', 'peft', 'evaluate', 'diffusers', 'hf-endpoints-documentation', 'deep-rl-class', 'transformers']

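👉 Now let's build a `RetrieverTool` that our agent can use to retrieve information from the knowledge base.

Since we need to add the vectordb as an attribute of the tool, we cannot simply use the simple tool constructor with the `@tool` decorator: instead, we will follow the advanced setup highlighted in the advanced agents documentation.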

import json
from smolagents import Tool
from langchain_core.vectorstores import VectorStore


class RetrieverTool(Tool):
    name = "retriever"
    description = (
        "Retrieves some documents from the knowledge base that have the closest embeddings to the input query."
    )
    inputs = {
        "query": {
            "type": "string",
            "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
        },
        "source": {"type": "string", "description": ""},
        "number_of_documents": {
            "type": "string",
            "description": "the number of documents to retrieve. Stay under 10 to avoid drowning in docs",
        },
    }
    output_type = "string"

    def __init__(self, vectordb: VectorStore, all_sources: list[str], **kwargs):
        super().__init__(**kwargs)
        self.vectordb = vectordb
        self.inputs["source"]["description"] = (
            f"The source of the documents to search, as a str representation of a list. Possible values in the list are: {all_sources}. If this argument is not provided, all sources will be searched.".replace(
                "'", "`"
            )
        )

    def forward(self, query: str, source: str = None, number_of_documents=7) -> str:
        assert isinstance(query, str), "Your search query must be a string"
        # The agent passes this argument as a string, so cast it before use
        number_of_documents = int(number_of_documents)

        if source:
            # A bare source name such as "blog" becomes a one-element list
            if isinstance(source, str) and "[" not in str(source):
                source = [source]
            # Parse the str representation of a list, e.g. "['blog', 'peft']"
            source = json.loads(str(source).replace("'", '"'))

        # Restrict the similarity search to the requested sources, if any
        docs = self.vectordb.similarity_search(
            query,
            filter=({"source": source} if source else None),
            k=number_of_documents,
        )

        if len(docs) == 0:
            return "No documents found with this filtering. Try removing the source filter."
        return "Retrieved documents:\n\n" + "\n===Document===\n".join([doc.page_content for doc in docs])
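The handling of the `source` argument is the least obvious part of `forward`, because the agent may pass either a bare source name or the string representation of a list. The standalone sketch below mirrors that logic so it can be tried without a vector database; `normalize_sources` is a hypothetical helper name and is not part of smolagents or LangChain.

```python
import json


def normalize_sources(source):
    """Turn the agent-provided `source` argument into a list of names, or None."""
    if not source:
        # No filter requested: every source will be searched.
        return None
    if isinstance(source, str) and "[" not in source:
        # A bare name such as "blog".
        return [source]
    # A str representation of a list such as "['transformers', 'blog']".
    # Single quotes are swapped for double quotes so the text is valid JSON.
    return json.loads(str(source).replace("'", '"'))


print(normalize_sources("blog"))
print(normalize_sources("['transformers', 'blog']"))
print(normalize_sources(None))
```

Note that the quote replacement assumes source names contain no apostrophes, which holds for the sources listed above.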

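Optional: share your retriever tool to the Hub

To share your tool to the Hub, first copy-paste the code from the RetrieverTool definition cell into a new file named, for instance, `retriever.py`.

Once the tool is loaded from a separate file, you can push it to the Hub with the code below (make sure to log in with a token that has `write` access):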

share_to_hub = True

if share_to_hub:
    from huggingface_hub import login
    from retriever import RetrieverTool

    login("your_token")

    tool = RetrieverTool(vectordb, all_sources)

    tool.push_to_hub(repo_id="m-ric/retriever-tool")

    # Loading the tool
    from smolagents import load_tool

    retriever_tool = load_tool("m-ric/retriever-tool", vectordb=vectordb, all_sources=all_sources)

Run the agent!

from smolagents import InferenceClientModel, ToolCallingAgent

model = InferenceClientModel("Qwen/Qwen2.5-72B-Instruct")

retriever_tool = RetrieverTool(vectordb=vectordb, all_sources=all_sources)
agent = ToolCallingAgent(tools=[retriever_tool], model=model, verbosity_level=0)

agent_output = agent.run("Please show me a LORA finetuning script")

print("Final output:")
print(agent_output)

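What happened here? First, the agent launched the retriever with specific sources in mind (`['transformers', 'blog']`).

But this retrieval did not yield enough results ⇒ no problem! The agent can iterate on previous results, so it simply re-ran its retrieval with less restrictive search parameters. Thus the search was successful!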
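The retry behavior described above can be sketched without any LLM. In the toy example below, the decision to relax the filter is hard-coded, whereas in the notebook it is the agent that chooses to call the retriever again. The corpus `DOCS`, the keyword-overlap scoring, and the function names are all assumptions made for illustration; they stand in for the FAISS similarity search.

```python
# Tiny in-memory corpus standing in for the vector database (illustrative data).
DOCS = [
    {"source": "peft", "text": "LoRA fine-tuning script using peft and transformers"},
    {"source": "blog", "text": "Announcing a new Gradio release"},
    {"source": "transformers", "text": "Trainer API overview"},
]


def retrieve(query, sources=None, k=3):
    """Rank documents by the number of words they share with the query."""
    words = set(query.lower().split())
    scored = []
    for doc in DOCS:
        if sources and doc["source"] not in sources:
            continue  # apply the source filter
        overlap = len(words & set(doc["text"].lower().split()))
        if overlap:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def retrieve_with_fallback(query, sources, k=3):
    """Try the filtered search first, then retry without any source filter."""
    for attempt in (sources, None):
        docs = retrieve(query, sources=attempt, k=k)
        if docs:
            return docs, attempt
    return [], None


docs, used_filter = retrieve_with_fallback("lora fine-tuning script", ["transformers", "blog"])
print(used_filter, [doc["source"] for doc in docs])
```

Here the filtered search finds nothing, so the second attempt searches every source and returns the `peft` document, which is the same pattern the agent followed.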

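Note that **an LLM agent that calls a retriever as a tool and can dynamically modify the query and other retrieval parameters** is a more general formulation of RAG, which also covers many RAG improvement techniques such as iterative query refinement.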

3. 💻 Debugging Python code

Since the CodeAgent has a built-in Python code interpreter, we can use it to debug our faulty Python script!

from smolagents import CodeAgent, InferenceClientModel

agent = CodeAgent(tools=[], model=InferenceClientModel("Qwen/Qwen2.5-72B-Instruct"))

code = """
numbers=[0, 1, 2]

for i in range(4):
    print(numbers(i))
"""

final_answer = agent.run(
    "I have some code that creates a bug: please debug it, then run it to make sure it works and return the final code",
    additional_args=dict(code=code),
)

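As you can see, the agent tried the given code, got an error, analyzed the error, corrected the code, and returned it after verifying that it works!

The final code is the corrected code: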

>>> print(final_answer)

numbers=[0, 1, 2]

for i in range(len(numbers)):
    print(numbers[i])
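To see why two separate fixes were needed, the sketch below runs the original script, a partially fixed version, and the final version, and reports which exception each one raises. `run_snippet` is a hypothetical helper written for this illustration; it is not how the agent executes code.

```python
buggy = """
numbers=[0, 1, 2]

for i in range(4):
    print(numbers(i))
"""

# Only the call syntax is fixed; the loop still runs one index too far.
partially_fixed = """
numbers=[0, 1, 2]

for i in range(4):
    print(numbers[i])
"""

fixed = """
numbers=[0, 1, 2]

for i in range(len(numbers)):
    print(numbers[i])
"""


def run_snippet(source):
    """Execute source in a fresh namespace; return the exception name or None."""
    try:
        exec(source, {})
    except Exception as exc:
        return type(exc).__name__
    return None


outcomes = [run_snippet(code) for code in (buggy, partially_fixed, fixed)]
print(outcomes)
```

The first version fails because a list is called like a function, the second because index 3 is out of range, and only the last one runs to completion.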

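➡️ Conclusion

The use cases above should give you a glimpse of the possibilities of our agents framework!

For more advanced usage, read the documentation.

All feedback is welcome, it will help us improve the framework! 🚀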
