在 Azure AI 上使用 smolagents 構建智慧體

此示例展示瞭如何使用 smolagents 構建智慧體，利用 Azure AI Foundry Hub 中 Hugging Face Collection 的大型語言模型 (LLM)，並將其部署為 Azure ML 託管線上終結點。

本示例並非旨在深入探討如何在 Azure AI 上部署大型語言模型 (LLM)，而是側重於如何使用它構建智慧體。因此，強烈建議您閱讀有關 Azure AI 部署的更多資訊，例如“在 Azure AI 上部署大型語言模型 (LLM)”。

簡而言之，Smolagents 是一個開源 Python 庫，旨在讓您極輕鬆地構建和執行智慧體，只需幾行程式碼。Azure AI Foundry 提供了一個統一平臺，用於企業 AI 操作、模型構建和應用程式開發。Azure Machine Learning 是一種雲服務，用於加速和管理機器學習 (ML) 專案生命週期。

本示例將特別部署來自 Hugging Face Hub 的 Qwen/Qwen2.5-Coder-32B-Instruct（或在 AzureML 或 Azure AI Foundry 上檢視），作為 Azure AI Foundry Hub 上的 Azure ML 託管線上終結點。

Qwen2.5-Coder 是 Code-Specific Qwen 大型語言模型（前身為 CodeQwen）的最新系列，與 CodeQwen1.5 相比，帶來了以下改進：

在程式碼生成、程式碼推理和程式碼修復方面取得了顯著進步。在強大的 Qwen2.5 的基礎上，我們將訓練令牌擴充套件到 5.5 萬億，包括原始碼、文字-程式碼接地、合成數據等。Qwen2.5-Coder-32B 已成為當前最先進的開原始碼 LLM，其編碼能力與 GPT-4o 相當。
為程式碼智慧體等實際應用提供了更全面的基礎。不僅增強了編碼能力，還在數學和通用能力方面保持了其優勢。
支援長達 128K 令牌的長上下文。

Qwen2.5 Coder 32B Instruct on the Hugging Face Hub

Qwen2.5 Coder 32B Instruct on Azure AI Foundry

有關更多資訊，請務必檢視 Hugging Face Hub 上的其模型卡。

請注意，您可以選擇 Hugging Face Hub 上任何啟用“部署到 AzureML”選項的 LLM，或者直接選擇 Azure ML 或 Azure AI Foundry Hub 模型目錄中“HuggingFace”集合下的任何 LLM（請注意，對於 Azure AI Foundry，Hugging Face Collection 僅適用於基於 Hub 的專案）。

先決條件

要執行以下示例，您需要滿足以下先決條件，或者，您也可以在Azure Machine Learning 教程：建立入門所需的資源中閱讀更多相關資訊。

具有活動訂閱的 Azure 帳戶。
已安裝並登入 Azure CLI。
適用於 Azure CLI 的 Azure 機器學習擴充套件。
一個 Azure 資源組。
基於 Azure AI Foundry Hub 的專案。

有關更多資訊，請按照為 Azure AI 配置 Microsoft Azure 中的步驟操作。

設定和安裝

在本示例中，將使用 Azure Machine Learning SDK for Python 來建立終結點和部署，以及呼叫已部署的 API。同時，您還需要安裝 azure-identity，以便透過 Python 使用 Azure 憑據進行身份驗證。

%pip install azure-ai-ml azure-identity --upgrade --quiet

更多資訊請參見適用於 Python 的 Azure 機器學習 SDK。

然後，為了方便起見，建議設定以下環境變數，因為它們將在示例中用於 Azure ML 客戶端，因此請務必根據您的 Microsoft Azure 帳戶和資源更新並設定這些值。

%env LOCATION eastus
%env SUBSCRIPTION_ID <YOUR_SUBSCRIPTION_ID>
%env RESOURCE_GROUP <YOUR_RESOURCE_GROUP>
%env AI_FOUNDRY_HUB_PROJECT <YOUR_AI_FOUNDRY_HUB_PROJECT>

最後，您還需要定義終結點和部署名稱，因為它們也將在整個示例中使用。

請注意，終結點名稱在每個區域內必須是全域性唯一的，也就是說，即使您沒有在訂閱下執行任何以此命名的終結點，如果該名稱已被其他 Azure 客戶預留，您也將無法使用相同的名稱。建議新增時間戳或自定義識別符號，以防止在嘗試部署已鎖定/預留名稱的終結點時遇到 HTTP 400 驗證問題。此外，終結點名稱的長度必須在 3 到 32 個字元之間。

import os
from uuid import uuid4

os.environ["ENDPOINT_NAME"] = f"qwen-coder-endpoint-{str(uuid4())[:8]}"
os.environ["DEPLOYMENT_NAME"] = f"qwen-coder-deployment-{str(uuid4())[:8]}"

向 Azure ML 進行身份驗證

首先，您需要透過 Azure ML Python SDK 向 Azure AI Foundry Hub 進行身份驗證，該 SDK 將用於將 Qwen/Qwen2.5-Coder-32B-Instruct 部署為 Azure AI Foundry Hub 中的 Azure ML 託管線上終結點。

在標準 Azure ML 部署中，您需要使用 Azure ML 工作區作為 workspace_name 來建立 MLClient，而在 Azure AI Foundry 中，您需要將 Azure AI Foundry Hub 名稱作為 workspace_name 提供，這將把終結點也部署到 Azure AI Foundry 中。

import os
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=os.getenv("SUBSCRIPTION_ID"),
    resource_group_name=os.getenv("RESOURCE_GROUP"),
    workspace_name=os.getenv("AI_FOUNDRY_HUB_PROJECT"),
)

建立和部署 Azure AI 終結點

在建立託管線上終結點之前，您需要構建模型 URI，其格式如下：azureml://registries/HuggingFace/models/<MODEL_ID>/labels/latest，其中 MODEL_ID 不是 Hugging Face Hub ID，而是它在 Azure 上的名稱，如下所示：

model_id = "Qwen/Qwen2.5-Coder-32B-Instruct"

model_uri = (
    f"azureml://registries/HuggingFace/models/{model_id.replace('/', '-').replace('_', '-').lower()}/labels/latest"
)
model_uri

要檢查 Hugging Face Hub 中的模型是否在 Azure 中可用，您應該閱讀支援的模型。如果不可用，您隨時可以在 Azure 上的 Hugging Face 集合中請求新增模型）。

然後，您需要透過 Azure ML Python SDK 建立 ManagedOnlineEndpoint，如下所示。

Hugging Face Collection 中的每個模型都由高效的推理後端提供支援，每個模型都可以在各種例項型別上執行（如支援的硬體中所列）。由於模型和推理引擎需要 GPU 加速例項，您可能需要根據管理和增加 Azure Machine Learning 資源的配額和限制來請求增加配額。

from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

endpoint = ManagedOnlineEndpoint(name=os.getenv("ENDPOINT_NAME"))

deployment = ManagedOnlineDeployment(
    name=os.getenv("DEPLOYMENT_NAME"),
    endpoint_name=os.getenv("ENDPOINT_NAME"),
    model=model_uri,
    instance_type="Standard_NC40ads_H100_v5",
    instance_count=1,
)

client.begin_create_or_update(endpoint).wait()

Azure AI Endpoint from Azure AI Foundry

在 Azure AI Foundry 中，終結點只有在部署建立後才會在“我的資產 -> 模型 + 終結點”選項卡中列出，不像 Azure ML 那樣，即使終結點不包含任何活動或正在進行的部署也會顯示。

client.online_deployments.begin_create_or_update(deployment).wait()

Azure AI Deployment from Azure AI Foundry

請注意，儘管 Azure AI 終結點的建立相對較快，但部署將花費更長時間，因為它需要在 Azure 上分配資源，所以預計需要大約 10-15 分鐘，但也可能根據例項配置和可用性而花費更長時間。

部署完成後，您可以透過 Azure AI Foundry 或 Azure ML Studio 檢查終結點詳細資訊、即時日誌、如何使用終結點，甚至使用仍處於預覽階段的監控功能。欲瞭解更多資訊，請訪問Azure ML 託管線上終結點

使用 smolagents 構建智慧體

現在 Azure AI 終結點已執行，您可以開始向其傳送請求。由於有多種方法，但以下僅涵蓋 OpenAI Python SDK 方法，您應該訪問例如在 Azure AI 上部署大型語言模型 (LLM) 以檢視不同的替代方案。

因此，構建智慧體的步驟如下：

使用 smolagents 建立 OpenAI 客戶端，透過 smolagents.OpenAIServerModel 連線到正在執行的 Azure AI 終結點（請注意，smolagents 也公開了 smolagents.AzureOpenAIServerModel，但這用於透過 Azure 使用 OpenAI 的客戶端，而不是連線到 Azure AI）。
定義智慧體將有權訪問的工具集，即帶有 smolagents.tool 裝飾器的 Python 函式。
建立 smolagents.CodeAgent，利用部署在 Azure AI 上的程式碼-LLM，並新增先前定義的工具集，以便智慧體可以在適當的時候使用這些工具，使用本地執行器（如果執行的程式碼敏感或未識別，則不推薦）。

建立 OpenAI 客戶端

由於 Hugging Face 目錄中的每個 LLM 都部署了暴露 OpenAI 相容路由的推理引擎，因此您也可以透過 smolagents 利用 OpenAI Python SDK 向已部署的 Azure ML 終結點發送請求。

%pip install "smolagents[openai]" --upgrade --quiet

要將 OpenAI Python SDK 與 Azure ML 託管線上終結點一起使用，您需要首先檢索

api_url，帶 /v1 路由（包含 OpenAI Python SDK 將向其傳送請求的 v1/chat/completions 終結點）
api_key，它是 Azure AI 中的 API 金鑰或 Azure ML 中的主金鑰（除非使用專用的 Azure ML 令牌）

from urllib.parse import urlsplit

api_key = client.online_endpoints.get_keys(os.getenv("ENDPOINT_NAME")).primary_key

url_parts = urlsplit(client.online_endpoints.get(os.getenv("ENDPOINT_NAME")).scoring_uri)
api_url = f"{url_parts.scheme}://{url_parts.netloc}/v1"

或者，您也可以手動構建 API URL，如下所示，因為 URI 在每個區域中都是全域性唯一的，這意味著在同一區域中只會有一個同名終結點。

api_url = f"https://{os.getenv('ENDPOINT_NAME')}.{os.getenv('LOCATION')}.inference.ml.azure.com/v1"

或者直接從 Azure AI Foundry 或 Azure ML Studio 中檢索。

然後，您可以正常使用 OpenAI Python SDK，確保包含包含 Azure AI / ML 部署名稱的額外標頭 azureml-model-deployment。

額外的標頭將透過例項化客戶端時 OpenAI Python SDK 的 default_headers 引數提供，透過 smolagents.OpenAIServerModel 的 client_kwargs 引數在 smolagents 中提供，該引數將把這些標頭傳播到底層的 OpenAI 客戶端。

from smolagents import OpenAIServerModel

model = OpenAIServerModel(
    model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
    api_base=api_url,
    api_key=api_key,
    client_kwargs={"default_headers": {"azureml-model-deployment": os.getenv("DEPLOYMENT_NAME")}},
)

構建 Python 工具

smolagents 將用於構建智慧體將利用的工具，以及構建 smolagents.CodeAgent 本身。以下工具將使用 smolagents.tool 裝飾器定義，這將準備 Python 函式以用作 LLM 智慧體中的工具。

請注意，函式簽名應附帶適當的型別提示以指導 LLM，以及清晰的函式名稱，最重要的是，格式良好的文件字串，說明函式的功能、引數、返回值以及可能引發的錯誤（如果適用）。

在這種情況下，將提供給智慧體的工具如下：

世界時間 API - get_time_in_timezone：使用世界時間 API 獲取給定位置的當前時間。
維基百科 API - search_wikipedia：使用維基百科 API 獲取維基百科條目的摘要。

在本例中，為簡化起見，所使用的工具已從https://github.com/huggingface/smolagents/blob/main/examples/multiple_tools.py移植而來，因此所有功勞歸於 smolagents GitHub 倉庫的原始作者和維護者。此外，只保留了用於查詢世界時間 API 和維基百科 API 的工具，因為它們擁有慷慨的免費套餐，允許任何人免費使用，無需付費或建立帳戶/API 令牌。

from smolagents import tool

世界時間 API - get_time_in_timezone

@tool
def get_time_in_timezone(location: str) -> str:
    """
    Fetches the current time for a given location using the World Time API.
    Args:
        location: The location for which to fetch the current time, formatted as 'Region/City'.
    Returns:
        str: A string indicating the current time in the specified location, or an error message if the request fails.
    Raises:
        requests.exceptions.RequestException: If there is an issue with the HTTP request.
    """
    import requests

    url = f"http://worldtimeapi.org/api/timezone/{location}.json"

    try:
        response = requests.get(url)
        response.raise_for_status()

        data = response.json()
        current_time = data["datetime"]

        return f"The current time in {location} is {current_time}."

    except requests.exceptions.RequestException as e:
        return f"Error fetching time data: {str(e)}"

維基百科 API - search_wikipedia

@tool
def search_wikipedia(query: str) -> str:
    """
    Fetches a summary of a Wikipedia page for a given query.
    Args:
        query: The search term to look up on Wikipedia.
    Returns:
        str: A summary of the Wikipedia page if successful, or an error message if the request fails.
    Raises:
        requests.exceptions.RequestException: If there is an issue with the HTTP request.
    """
    import requests

    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{query}"

    try:
        response = requests.get(url)
        response.raise_for_status()

        data = response.json()
        title = data["title"]
        extract = data["extract"]

        return f"Summary for {title}: {extract}"

    except requests.exceptions.RequestException as e:
        return f"Error fetching Wikipedia data: {str(e)}"

建立智慧體

由於本例中部署在 Azure AI 上的 LLM 是程式碼專用 LLM，因此將使用 smolagents.CodeAgent 建立智慧體，該智慧體添加了相關的提示和解析功能，以便將 LLM 輸出解釋為程式碼。或者，也可以使用 smolagents.ToolCallingAgent，它是一個工具呼叫智慧體，這意味著給定的 LLM 應該具有工具呼叫能力。

然後，smolagents.CodeAgent 需要 model 和模型可以訪問的 tools 集，然後透過 run 方法，您可以以自動方式利用智慧體的所有潛力，無需手動干預；這樣智慧體將在需要時使用給定工具，以回答或滿足您的初始請求。

from smolagents import CodeAgent

agent = CodeAgent(
    tools=[
        get_time_in_timezone,
        search_wikipedia,
    ],
    model=model,
    stream_outputs=True,
)

agent.run(
    "Could you create a Python function that given the summary of 'What is a Lemur?'"
    " replaces all the occurrences of the letter E with the letter U (ignore the casing)"
)
# Summary for Lumur: Lumurs aru wut-nosud primatus of thu supurfamily Lumuroidua, dividud into 8 familius and consisting of 15 gunura and around 100 uxisting spucius. Thuy aru undumic to thu island of Madagascar. Most uxisting lumurs aru small, with a pointud snout, largu uyus, and a long tail. Thuy chiufly livu in truus and aru activu at night.

agent.run("What time is in Thailand right now? And what's the time difference with France?")
# The current time in Thailand is 5 hours ahead of the current time in France.

釋放資源

完成 Azure AI 終結點/部署的使用後，您可以按如下方式刪除資源，這意味著您將停止支付模型執行所在的例項費用，並且所有相關費用都將停止。

client.online_endpoints.begin_delete(name=os.getenv("ENDPOINT_NAME")).result()

結論

透過本示例，您學習瞭如何在基於 Azure AI Foundry Hub 的專案中部署 Azure ML 託管線上終結點，該終結點執行 Azure AI Foundry Hub / Azure ML 模型目錄中的 Hugging Face Collection 中的開放模型，並利用它使用 smolagents 構建智慧體，最後，如何停止和釋放資源。

如果您對此示例有任何疑問、問題或疑問，請隨時提出問題，我們將盡力提供幫助！

📍 在 GitHub 上找到完整示例此處！

< > 在 GitHub 上更新