Deploy Vision Language Models (VLMs) on Azure AI

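This example shows how to deploy a Vision Language Model (VLM), i.e., a Large Language Model (LLM) with vision understanding, from the Hugging Face Collection in Azure AI Foundry Hub as an Azure ML Managed Online Endpoint. It also shows how to run inference with the Azure Python SDK and the OpenAI Python SDK, and how to run a Gradio application locally for chat completion with images.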

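Note that this example deploys programmatically, via the Python SDK / Azure CLI. If you prefer a one-click deployment experience, see One-click deployments from the Hugging Face Hub on Azure ML. Be aware that when deploying from the Hugging Face Hub, the endpoint and deployment are created in Azure ML rather than in Azure AI Foundry, whereas this example focuses on Azure AI Foundry Hub deployments (which are also available on Azure ML, but not the other way around).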

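TL;DR Azure AI Foundry provides a unified platform for enterprise AI operations, model builders, and application development. Azure Machine Learning is a cloud service for accelerating and managing the machine learning (ML) project lifecycle.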

This example deploys Qwen/Qwen2.5-VL-32B-Instruct from the Hugging Face Hub (also viewable on AzureML or Azure AI Foundry) as an Azure ML Managed Online Endpoint on Azure AI Foundry Hub.

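Qwen2.5-VL is one of Qwen's latest VLMs, released building on the impact and feedback that followed the earlier Qwen2 VL release, with key enhancements such as:

Visual understanding: Qwen2.5-VL is not only proficient at recognizing common objects such as flowers, birds, fish, and insects, but is also highly capable of analyzing text, charts, icons, graphics, and layouts within images.
Agentic capabilities: Qwen2.5-VL acts directly as a visual agent that can reason and dynamically direct tools, and is capable of computer use and phone use.
Understanding long videos and capturing events: Qwen2.5-VL can comprehend videos over 1 hour long, and it now has the new ability to capture events by pinpointing the relevant video segments.
Visual localization in different formats: Qwen2.5-VL can accurately localize objects in an image by generating bounding boxes or points, and it can provide stable JSON outputs for coordinates and attributes.
Generating structured outputs: for scanned data such as invoices, forms, and tables, Qwen2.5-VL supports structured outputs of their contents, which benefits uses in finance, commerce, and similar fields.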

Qwen2.5 VL 32B Instruct on the Hugging Face Hub

Qwen2.5 VL 32B Instruct on Azure AI Foundry

For more information, make sure to check the model card on the Hugging Face Hub.

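Note that you can select any VLM on the Hugging Face Hub that has the "Deploy to AzureML" option enabled, or directly select any model listed under the "HuggingFace" collection in the Azure ML or Azure AI Foundry Hub Model Catalog (note that for Azure AI Foundry, the Hugging Face Collection is only available for Hub-based projects).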

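Prerequisites

To run this example, you need to meet the following prerequisites; alternatively, you can read more about them in the Azure Machine Learning tutorial: Create resources you need to get started.

An Azure account with an active subscription.
The Azure CLI installed and logged in.
The Azure Machine Learning extension for the Azure CLI.
An Azure Resource Group.
A project based on an Azure AI Foundry Hub.

For more information, follow the steps in Configure Microsoft Azure for Azure AI.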

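Setup and installation

This example uses the Azure Machine Learning SDK for Python to create the endpoint and the deployment, and to invoke the deployed API. You also need to install azure-identity to authenticate with your Azure credentials from Python.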

%pip install azure-ai-ml azure-identity --upgrade --quiet

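For more information, see the Azure Machine Learning SDK for Python.

Next, for convenience, set the following environment variables, since the Azure ML client uses them throughout the example. Make sure to update them to match your Microsoft Azure account and resources.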

%env LOCATION eastus
%env SUBSCRIPTION_ID <YOUR_SUBSCRIPTION_ID>
%env RESOURCE_GROUP <YOUR_RESOURCE_GROUP>
%env AI_FOUNDRY_HUB_PROJECT <YOUR_AI_FOUNDRY_HUB_PROJECT>

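Finally, define the endpoint and deployment names, which are also used throughout the example.

Note that endpoint names must be globally unique within each region: even if no endpoint with a given name exists under your subscription, you cannot use that name if another Azure customer has already reserved it. Adding a timestamp or a custom identifier is recommended, to avoid HTTP 400 validation errors when trying to deploy an endpoint whose name is already locked or reserved. Endpoint names must also be between 3 and 32 characters long.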

import os
from uuid import uuid4

os.environ["ENDPOINT_NAME"] = f"qwen-vl-endpoint-{str(uuid4())[:8]}"
os.environ["DEPLOYMENT_NAME"] = f"qwen-vl-deployment-{str(uuid4())[:8]}"
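The length rule described above can be checked before any deployment call is made. The following is a minimal sketch, not part of the Azure SDK: make_endpoint_name is a hypothetical helper that appends a random suffix and enforces only the 3 to 32 character limit stated in this example.

```python
from uuid import uuid4


def make_endpoint_name(prefix: str, suffix_len: int = 8) -> str:
    """Return `<prefix>-<random suffix>`, checked against the 3-32 character limit."""
    # Hypothetical helper: only the length rule from this example is enforced here.
    name = f"{prefix}-{uuid4().hex[:suffix_len]}"
    if not 3 <= len(name) <= 32:
        raise ValueError(f"Endpoint name {name!r} has {len(name)} characters; expected 3 to 32.")
    return name


name = make_endpoint_name("qwen-vl-endpoint")
print(name, len(name))
```

Failing early with a ValueError is cheaper than waiting for the service to reject the name with an HTTP 400 response.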

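Authenticate to Azure ML

First, authenticate to the Azure AI Foundry Hub via the Azure ML Python SDK, which is then used to deploy Qwen/Qwen2.5-VL-32B-Instruct as an Azure ML Managed Online Endpoint in your Azure AI Foundry Hub.

In a standard Azure ML deployment, you create the MLClient with the Azure ML Workspace as the workspace_name; for Azure AI Foundry, you instead provide the Azure AI Foundry Hub name as the workspace_name, so that the endpoint is also deployed under Azure AI Foundry.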

import os
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=os.getenv("SUBSCRIPTION_ID"),
    resource_group_name=os.getenv("RESOURCE_GROUP"),
    workspace_name=os.getenv("AI_FOUNDRY_HUB_PROJECT"),
)

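Create and deploy the Azure AI Endpoint

Before creating the Managed Online Endpoint, build the model URI, which has the format azureml://registries/HuggingFace/models/<MODEL_ID>/labels/latest, where MODEL_ID is not the Hugging Face Hub ID but the model's name on Azure, derived as follows: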

model_id = "Qwen/Qwen2.5-VL-32B-Instruct"

model_uri = (
    f"azureml://registries/HuggingFace/models/{model_id.replace('/', '-').replace('_', '-').lower()}/labels/latest"
)
model_uri
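To make the ID-to-name mapping easier to reuse, the same transformation can be wrapped in a function. This is a sketch that mirrors the replacement rules in the snippet above; to_model_uri is a hypothetical name, and the second model ID is invented purely to show how underscores are handled.

```python
def to_model_uri(model_id: str) -> str:
    """Map a Hugging Face Hub ID to its URI in the HuggingFace registry on Azure."""
    # Same rules as above: slashes and underscores become hyphens, then lowercase.
    azure_name = model_id.replace("/", "-").replace("_", "-").lower()
    return f"azureml://registries/HuggingFace/models/{azure_name}/labels/latest"


print(to_model_uri("Qwen/Qwen2.5-VL-32B-Instruct"))
# azureml://registries/HuggingFace/models/qwen-qwen2.5-vl-32b-instruct/labels/latest

# Hypothetical ID, used only to show how underscores are handled.
print(to_model_uri("my_org/My_Model"))
# azureml://registries/HuggingFace/models/my-org-my-model/labels/latest
```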

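To check whether a model from the Hugging Face Hub is available in Azure, read about it in Supported Models. If it is not supported, you can always request that the model be added to the Hugging Face Collection on Azure.

Then create the ManagedOnlineEndpoint via the Azure ML Python SDK, as follows.

Every model in the Hugging Face Collection is powered by an efficient inference backend, and each can run on a variety of instance types (as listed in Supported Hardware). Because the models and inference engines require GPU-accelerated instances, you may need to request a quota increase, as described in Manage and increase quotas and limits for resources with Azure Machine Learning.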

from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

endpoint = ManagedOnlineEndpoint(name=os.getenv("ENDPOINT_NAME"))

deployment = ManagedOnlineDeployment(
    name=os.getenv("DEPLOYMENT_NAME"),
    endpoint_name=os.getenv("ENDPOINT_NAME"),
    model=model_uri,
    instance_type="Standard_NC40ads_H100_v5",
    instance_count=1,
)

client.begin_create_or_update(endpoint).wait()

Azure AI Endpoint from Azure AI Foundry

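In Azure AI Foundry, the endpoint is only listed under the "My assets -> Models + endpoints" tab once the deployment has been created, unlike Azure ML, where the endpoint is shown even if it contains no active or in-progress deployments.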

client.online_deployments.begin_create_or_update(deployment).wait()

Azure AI Deployment from Azure AI Foundry

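Note that while the Azure AI Endpoint is created relatively quickly, the deployment takes longer because resources must be allocated on Azure. Expect it to take 10-15 minutes, though it can take longer depending on the instance provisioning and availability.

Once deployed, you can use Azure AI Foundry or Azure ML Studio to inspect the endpoint details, view real-time logs, see how to consume the endpoint, and even try the monitoring feature, which is still in preview. Find more information in Azure ML Managed Online Endpoints.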

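Send requests to the Azure AI Endpoint

Finally, once the Azure AI Endpoint is deployed, you can send requests to it. Because the model's task is image-text-to-text (also known as chat-completion with image support), you can either use the default scoring endpoint, /generate, which is the standard text generation endpoint without chat capabilities (it neither applies the chat template nor offers an OpenAI-compatible OpenAPI interface), or use the OpenAI-compatible routes exposed by the inference engine the model runs on, such as /v1/chat/completions.

Note that only some of the options are listed below. You can send requests to the deployed endpoint as long as each HTTP request sets the azureml-model-deployment header to the name of the Azure AI Deployment (not the Endpoint) and carries the authentication token / key required for that endpoint. You can then send HTTP requests to every route the backend engine exposes, not just the scoring route.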
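The header requirements above apply to any HTTP client. As a minimal sketch using only the Python standard library, the request below is built but not sent, so it can be inspected offline; the URL, API key, and deployment name are placeholders, not real values.

```python
import json
from urllib.request import Request

# Placeholder values: replace with your endpoint URL, key, and deployment name.
api_url = "https://my-endpoint.eastus.inference.ml.azure.com"
api_key = "<YOUR_API_KEY>"
deployment_name = "my-deployment"

payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 32,
}

request = Request(
    url=f"{api_url}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # Must hold the deployment name, not the endpoint name.
        "azureml-model-deployment": deployment_name,
    },
    method="POST",
)

# Inspect the request without sending it; pass it to urllib.request.urlopen to send.
print(request.get_method(), request.full_url)
print(request.header_items())
```

Swapping the path in the URL is enough to target another route exposed by the backend engine, since the headers stay the same.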

Azure Python SDK

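You can invoke the Azure ML Endpoint on the scoring route, /generate in this case, via the Azure Python SDK using the azure.ai.ml.MLClient instantiated earlier (or a new one, if you are working in a different session). For more information, see the Qwen/Qwen2.5-VL-32B-Instruct page in the AzureML or Azure AI Foundry catalog.

Because this example deploys a Vision Language Model (VLM) served with Text Generation Inference (TGI), using the vision capabilities through the /generate endpoint requires including either the image URL or the base64 encoding of the image in Markdown image syntax, e.g. ![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png)What is this a picture of?\n\n or ![](data:image/png;base64,...)What is this a picture of?\n\n, respectively.

For more information, see Vision Language Model Inference in TGI.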

import json
import os
import tempfile

with tempfile.NamedTemporaryFile(mode="w+", delete=True, suffix=".json") as tmp:
    json.dump(
        {
            "inputs": "![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png)What is this a picture of?\n\n",
            "parameters": {"max_new_tokens": 128},
        },
        tmp,
    )

    tmp.flush()

    response = client.online_endpoints.invoke(
        endpoint_name=os.getenv("ENDPOINT_NAME"),
        deployment_name=os.getenv("DEPLOYMENT_NAME"),
        request_file=tmp.name,
    )

print(json.loads(response))

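Note that the Azure ML Python SDK requires the path to a JSON file when invoking an endpoint, so any payload you want to send must first be written to a JSON file. This only applies to requests sent via the Azure ML Python SDK.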
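The base64 variant of the Markdown image syntax described earlier can be assembled in the same way. The sketch below is illustrative only: build_generate_payload is a hypothetical helper, the image bytes are a stand-in rather than a real image, and the payload is written to a JSON file because that is what the SDK's invoke call takes.

```python
import base64
import json
import tempfile
from pathlib import Path


def build_generate_payload(image_bytes: bytes, question: str, mime_type: str = "image/png") -> dict:
    """Embed an image as a base64 data URL in Markdown image syntax, followed by the question."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "inputs": f"![](data:{mime_type};base64,{encoded}){question}\n\n",
        "parameters": {"max_new_tokens": 128},
    }


# Stand-in bytes for illustration; in practice read them with Path("image.png").read_bytes().
payload = build_generate_payload(b"not-a-real-image", "What is this a picture of?")

# Write the payload to a JSON file, since the SDK takes a file path.
with tempfile.TemporaryDirectory() as tmp_dir:
    request_file = Path(tmp_dir) / "request.json"
    request_file.write_text(json.dumps(payload), encoding="utf-8")
    print(request_file.read_text(encoding="utf-8")[:60])
```

The resulting file path would then be passed as request_file to client.online_endpoints.invoke, as in the snippet above.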

OpenAI Python SDK

Because the inference engine the model runs on exposes OpenAI-compatible routes, you can also use the OpenAI Python SDK to send requests to the deployed Azure AI Endpoint.

%pip install openai --upgrade --quiet

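To use the OpenAI Python SDK with an Azure ML Managed Online Endpoint, first retrieve:

api_url, the endpoint's base URL, to which the /v1 route is appended (it contains the v1/chat/completions endpoint that the OpenAI Python SDK sends requests to)
api_key, which is the API Key in Azure AI or the primary key in Azure ML (unless a dedicated Azure ML Token is used instead)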

from urllib.parse import urlsplit

api_key = client.online_endpoints.get_keys(os.getenv("ENDPOINT_NAME")).primary_key

url_parts = urlsplit(client.online_endpoints.get(os.getenv("ENDPOINT_NAME")).scoring_uri)
api_url = f"{url_parts.scheme}://{url_parts.netloc}"

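Alternatively, you can build the API URL manually as follows, because endpoint URIs are globally unique per region, meaning there is only one endpoint with a given name within the same region.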

# Base URL only; the /v1 route is appended when the OpenAI client is created.
api_url = f"https://{os.getenv('ENDPOINT_NAME')}.{os.getenv('LOCATION')}.inference.ml.azure.com"

Or retrieve it directly from Azure AI Foundry or Azure ML Studio.
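To see what the urlsplit step produces, here is a small sketch run on a hypothetical scoring URI. The host name is invented, and the /generate path is assumed from the default scoring route described in this example.

```python
from urllib.parse import urlsplit

# Hypothetical scoring URI, shaped like the value returned for an endpoint's scoring_uri.
scoring_uri = "https://my-endpoint.eastus.inference.ml.azure.com/generate"

url_parts = urlsplit(scoring_uri)
api_base = f"{url_parts.scheme}://{url_parts.netloc}"  # drop the /generate path
openai_base_url = f"{api_base}/v1"  # what the OpenAI client expects as base_url

print(api_base)
# https://my-endpoint.eastus.inference.ml.azure.com
print(openai_base_url)
# https://my-endpoint.eastus.inference.ml.azure.com/v1
```

Keeping the base URL free of any route makes it reusable for both the OpenAI client, which needs /v1, and raw HTTP calls, which need the full path.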

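You can then use the OpenAI Python SDK as usual, making sure to include the extra header azureml-model-deployment containing the Azure AI / ML Deployment name.

With the OpenAI Python SDK, the header can be set on each call to chat.completions.create via the extra_headers parameter, as shown in the comment below, or once via the default_headers parameter when instantiating the OpenAI client (the recommended approach, since the header must be present on every request).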

import os
from openai import OpenAI

openai_client = OpenAI(
    base_url=f"{api_url}/v1",
    api_key=api_key,
    default_headers={"azureml-model-deployment": os.getenv("DEPLOYMENT_NAME")},
)

completion = openai_client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-32B-Instruct",
    messages=[
        {"role": "system", "content": "You are an assistant that responds like a pirate."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"
                    },
                },
            ],
        },
    ],
    max_tokens=128,
    # extra_headers={"azureml-model-deployment": os.getenv("DEPLOYMENT_NAME")},
)
print(completion)

cURL

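Alternatively, you can use cURL to send requests to the deployed endpoint directly, with the api_url and api_key values retrieved programmatically in the OpenAI snippet now set as environment variables so that cURL can use them, as follows: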

os.environ["API_URL"] = api_url
os.environ["API_KEY"] = api_key

!curl -sS $API_URL/v1/chat/completions \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -H "azureml-model-deployment: $DEPLOYMENT_NAME" \
    -d '{ \
"messages":[ \
    {"role":"system","content":"You are an assistant that replies like a pirate."}, \
    {"role":"user","content": [ \
        {"type":"text","text":"What is in this image?"}, \
        {"type":"image_url","image_url":{"url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"}} \
    ]} \
], \
"max_tokens":128 \
}' | jq

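Alternatively, go to the Azure AI Endpoint in Azure AI Foundry under "My assets -> Models + endpoints", or in Azure ML Studio under "Endpoints", and retrieve the URL (note that it defaults to the /generate endpoint; to use the OpenAI-compatible layer, use the /v1/chat/completions endpoint instead) and the API Key, as well as the Azure AI Deployment name for the given model.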

Gradio

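Gradio is the fastest way to demo a machine learning model with a friendly web interface so that anyone can use it. You can also use the OpenAI Python SDK to build a simple multimodal (text and image) ChatInterface that runs within the Jupyter Notebook cell you execute it in.

Ideally, you would deploy the Gradio Chat Interface connected to your Azure ML Managed Online Endpoint as an Azure Container App, as described in Tutorial: Build and deploy from source code to Azure Container Apps. If you would like an example showing how to do this for Gradio specifically, feel free to open an issue requesting it.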

%pip install gradio --upgrade --quiet

See below an example of how to use Gradio's ChatInterface, or find more information in the Gradio ChatInterface docs.

import os
import base64
from typing import Dict, Iterator, List, Literal

import gradio as gr
from openai import OpenAI

openai_client = OpenAI(
    base_url=f"{os.getenv('API_URL')}/v1",  # API_URL holds the base URL without the /v1 route
    api_key=os.getenv("API_KEY"),
    default_headers={"azureml-model-deployment": os.getenv("DEPLOYMENT_NAME")},
)


def predict(
    message: Dict[str, str | List[str]], history: List[Dict[Literal["role", "content"], str]]
) -> Iterator[str]:
    content = []
    if message["text"]:
        content.append({"type": "text", "text": message["text"]})

    for file_path in message.get("files", []):
        with open(file_path, "rb") as image_file:
            base64_image = base64.b64encode(image_file.read()).decode("utf-8")
            content.append(
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                }
            )

    messages = history.copy()
    messages.append({"role": "user", "content": content})

    stream = openai_client.chat.completions.create(
        model="Qwen/Qwen2.5-VL-32B-Instruct",
        messages=messages,
        stream=True,
    )
    buffer = ""
    for chunk in stream:
        if chunk.choices[0].delta.content:
            buffer += chunk.choices[0].delta.content
            yield buffer


demo = gr.ChatInterface(
    predict,
    textbox=gr.MultimodalTextbox(label="Input", file_types=[".jpg", ".png", ".jpeg"], file_count="multiple"),
    multimodal=True,
    type="messages",
)

demo.launch()

Gradio Chat Interface with Azure ML Endpoint
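The predict function above labels every uploaded file as image/png, even though the textbox also accepts .jpg and .jpeg files. One possible refinement, sketched here with the standard library only, is to derive the MIME type from the file extension; to_image_content is a hypothetical helper, and the file written below contains placeholder bytes rather than a real JPEG.

```python
import base64
import mimetypes
import os
import tempfile


def to_image_content(file_path: str) -> dict:
    """Build an OpenAI-style image_url content part from a local image file."""
    # Guess the MIME type from the extension instead of assuming PNG.
    mime_type, _ = mimetypes.guess_type(file_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Unsupported image file: {file_path}")
    with open(file_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "photo.jpg")
    with open(path, "wb") as f:
        f.write(b"placeholder bytes, not a real JPEG")
    part = to_image_content(path)

print(part["image_url"]["url"][:23])
# data:image/jpeg;base64,
```

Inside predict, the body of the loop over message["files"] could then be reduced to content.append(to_image_content(file_path)).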

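Release resources

Once you are done using the Azure AI Endpoint / Deployment, you can delete the resources as follows, meaning that you stop paying for the instance the model runs on and all attached costs are stopped.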

client.online_endpoints.begin_delete(name=os.getenv("ENDPOINT_NAME")).result()

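Conclusion

Throughout this example you learned how to create and configure your Azure account for Azure ML and Azure AI Foundry, how to create a Managed Online Endpoint running an open model from the Hugging Face Collection in the Azure AI Foundry Hub / Azure ML Model Catalog, how to send inference requests to it in several alternative ways, how to build a simple Gradio chat interface around it, and finally, how to stop and release the resources.

If you have any doubts, issues, or questions about this example, feel free to open an issue and we'll do our best to help!

📍 Find the complete example on GitHub here!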
