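Webhook Listener

The webhook listener is the entry point of our Pull Request agent. It receives real-time events from the Hugging Face Hub whenever a discussion is created or updated, which triggers our MCP-powered tagging workflow. In this section, we will implement a webhook handler using FastAPI.

Understanding Webhook Integration

Following the Hugging Face Webhooks guide, our webhook listener validates incoming requests and processes discussion events in real time.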

Webhook Creation

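Webhook Event Flow

Understanding the webhook flow is crucial for building a reliable listener:

User action: someone creates a comment in a model repository discussion
Hub event: Hugging Face generates a webhook event
Webhook delivery: the Hub sends a POST request to our endpoint
Authentication: we validate the webhook secret
Processing: we extract tags from the comment content
Action: we use MCP tools to create a pull request for the new tags

Webhooks are push notifications: the Hugging Face Hub actively sends events to your application, so you do not have to poll for changes. This makes it possible to respond to discussions and comments in real time.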

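FastAPI Webhook Application

Let's build the webhook listener step by step, starting from the foundation and working up to the complete processing logic.

1. Application Setup

First, let's set up the basic FastAPI application with all the necessary imports and configuration: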

import os
import json
from datetime import datetime
from typing import List, Dict, Any, Optional

from fastapi import FastAPI, Request, BackgroundTasks
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

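These imports give us everything we need to build a robust webhook handler. `FastAPI` provides the web framework, `BackgroundTasks` enables asynchronous processing, and the typing imports support type hints and data validation.

Now let's configure our application: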

# Configuration
WEBHOOK_SECRET = os.getenv("WEBHOOK_SECRET")
HF_TOKEN = os.getenv("HF_TOKEN")

# Simple storage for processed operations
tag_operations_store: List[Dict[str, Any]] = []

app = FastAPI(title="HF Tagging Bot")
app.add_middleware(CORSMiddleware, allow_origins=["*"])

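This configuration sets up:

Webhook secret: used to validate incoming webhooks
HF token: used to authenticate with the Hub API
Operations store: in-memory storage for monitoring processed operations
CORS middleware: allows cross-origin requests from the web interface

The `tag_operations_store` list tracks recent webhook processing operations. This is useful for debugging and monitoring, but in production you may want to use a database or limit the size of this list.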
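One way to limit the size of the store, sketched here with the standard library only, is to swap the list for a `collections.deque` with a `maxlen`. The cap of 100 and the name `MAX_STORED_OPERATIONS` are assumptions made for illustration and are not part of the application above.

```python
from collections import deque
from typing import Any, Deque, Dict

# Hypothetical cap; pick a value that suits your monitoring needs.
MAX_STORED_OPERATIONS = 100

# A deque with maxlen discards the oldest entry once it is full.
tag_operations_store: Deque[Dict[str, Any]] = deque(maxlen=MAX_STORED_OPERATIONS)

# Simulate 150 processed webhooks.
for i in range(150):
    tag_operations_store.append({"discussion_num": i, "status": "processing"})

print(len(tag_operations_store))                  # 100
print(tag_operations_store[0]["discussion_num"])  # 50 (entries 0-49 were dropped)
```

A deque does not support slicing, so code that reads the last 50 entries with `[-50:]` would need to convert it first, for example `list(tag_operations_store)[-50:]`.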

2. Webhook Data Model

Based on the Hugging Face webhook documentation, we need to understand the structure of the webhook data:

class WebhookEvent(BaseModel):
    event: Dict[str, str]          # Contains action and scope information
    comment: Dict[str, Any]        # Comment content and metadata
    discussion: Dict[str, Any]     # Discussion information
    repo: Dict[str, str]           # Repository details

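This Pydantic model helps us understand the webhook structure.

The key fields we care about are:

event.action: usually "create" for new comments
event.scope: usually "discussion.comment" for comment events
comment.content: the actual comment text
repo.name: the repository where the comment was made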
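To make the four fields concrete, the following sketch pulls them out of a payload dictionary with `.get()` so that a missing section yields `None` instead of raising. The helper name `summarize_event` and the sample payload are hypothetical; the payload only mirrors the fields listed above, not the full Hub schema.

```python
from typing import Any, Dict, Optional


def summarize_event(payload: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Pull out the four fields the handler relies on; missing ones become None."""
    event = payload.get("event") or {}
    comment = payload.get("comment") or {}
    repo = payload.get("repo") or {}
    return {
        "action": event.get("action"),
        "scope": event.get("scope"),
        "content": comment.get("content"),
        "repo_name": repo.get("name"),
    }


# Illustrative payload containing only the fields discussed above.
sample_payload = {
    "event": {"action": "create", "scope": "discussion.comment"},
    "comment": {"content": "This model needs tags: pytorch"},
    "repo": {"name": "test-user/test-model"},
}

print(summarize_event(sample_payload))
print(summarize_event({}))  # every field is None for an empty payload
```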

3. Core Webhook Handler

Now for the main webhook handler, where the important work happens. Let's break it down into digestible pieces:

@app.post("/webhook")
async def webhook_handler(request: Request, background_tasks: BackgroundTasks):
    """
    Handle incoming webhooks from Hugging Face Hub
    Following the pattern from: https://raw.githubusercontent.com/huggingface/hub-docs/refs/heads/main/docs/hub/webhooks-guide-discussion-bot.md
    """
    print("🔔 Webhook received!")
    
    # Step 1: Validate webhook secret (security)
    webhook_secret = request.headers.get("X-Webhook-Secret")
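    if webhook_secret != WEBHOOK_SECRET:
        print("❌ Invalid webhook secret")
        # FastAPI would serialize a bare `return {...}, 400` tuple with status 200,
        # so build the response explicitly to send a real 400.
        from fastapi.responses import JSONResponse  # local import keeps this step self-contained
        return JSONResponse(status_code=400, content={"error": "incorrect secret"})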

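The first step is security validation. We check the `X-Webhook-Secret` header against the configured secret to make sure the webhook is legitimate.

Always validate the webhook secret! Without this check, anyone could send fake webhook requests to your application. The secret acts as a shared password between Hugging Face and your application.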
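As a possible hardening of that check, the sketch below compares the secrets with `hmac.compare_digest` and rejects the request when no secret is configured at all. The helper name `is_valid_secret` is an assumption for illustration; the handler in this section uses a plain `!=` comparison.

```python
import hmac
from typing import Optional


def is_valid_secret(received: Optional[str], expected: Optional[str]) -> bool:
    """Return True only when a secret is configured and the header matches it."""
    # Reject outright if no secret is configured or the header is absent;
    # a plain `!=` check would let a request through when both values are None.
    if not expected or received is None:
        return False
    # compare_digest avoids short-circuiting on the first differing character.
    return hmac.compare_digest(received.encode("utf-8"), expected.encode("utf-8"))


print(is_valid_secret("s3cret", "s3cret"))  # True
print(is_valid_secret("wrong", "s3cret"))   # False
print(is_valid_secret(None, None))          # False
```

In the handler, this could be called as `is_valid_secret(request.headers.get("X-Webhook-Secret"), WEBHOOK_SECRET)`.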

Next, let's parse and validate the webhook data:

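    # Step 2: Parse webhook data
    # Imported locally so this step stands alone; it normally lives with the top-level imports.
    from fastapi.responses import JSONResponse
    try:
        webhook_data = await request.json()
        print(f"📥 Webhook data: {json.dumps(webhook_data, indent=2)}")
    except Exception as e:
        print(f"❌ Error parsing webhook data: {str(e)}")
        # A bare `return {...}, 400` tuple would be sent with status 200
        return JSONResponse(status_code=400, content={"error": "invalid JSON"})
    
    # Step 3: Validate event structure
    event = webhook_data.get("event", {})
    if not event:
        print("❌ No event data in webhook")
        return JSONResponse(status_code=400, content={"error": "missing event data"})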

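This parsing step handles potential JSON errors gracefully and verifies that we have the expected event structure.

Now for the event filtering logic: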

    # Step 4: Check if this is a discussion comment creation
    # Following the webhook guide pattern:
    if (
        event.get("action") == "create" and 
        event.get("scope") == "discussion.comment"
    ):
        print("✅ Valid discussion comment creation event")
        
        # Process in background to return quickly to Hub
        background_tasks.add_task(process_webhook_comment, webhook_data)
        
        return {
            "status": "accepted",
            "message": "Comment processing started",
            "timestamp": datetime.now().isoformat()
        }
    else:
        print(f"ℹ️ Ignoring event: action={event.get('action')}, scope={event.get('scope')}")
        return {
            "status": "ignored",
            "reason": "Not a discussion comment creation"
        }

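This filtering ensures that we only process the events we care about: new discussion comments. We ignore other events, such as repository creation and model uploads.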
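The filter condition can also be expressed as a small predicate, which makes it easy to check against a few events in isolation. The function name `is_new_discussion_comment` and the sample events are illustrative assumptions, not part of the handler above.

```python
from typing import Any, Dict


def is_new_discussion_comment(event: Dict[str, Any]) -> bool:
    """Mirror the handler's condition: only newly created discussion comments pass."""
    return (
        event.get("action") == "create"
        and event.get("scope") == "discussion.comment"
    )


# Illustrative events; only the first one should be processed.
sample_events = [
    {"action": "create", "scope": "discussion.comment"},  # new comment
    {"action": "update", "scope": "discussion.comment"},  # edited comment
    {"action": "create", "scope": "discussion"},          # new discussion, no comment
    {},                                                   # malformed event
]

decisions = [is_new_discussion_comment(e) for e in sample_events]
print(decisions)  # [True, False, False, False]
```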

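We use FastAPI's `background_tasks.add_task()` to process the webhook asynchronously. This lets us return a response quickly (within seconds) while the actual tag processing happens in the background.

Webhook endpoints should respond within 10 seconds, or the sending platform may consider the delivery failed. Using background tasks ensures a fast response while allowing complex processing to run asynchronously.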
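The effect of deferring work can be illustrated with plain `asyncio`. This is only an analogy for the "respond first, process later" pattern, not a description of how FastAPI implements `BackgroundTasks`; the names `slow_processing` and `handler` and the 0.2 second delay are made up for the sketch.

```python
import asyncio
import time
from typing import List, Set


async def slow_processing(results: List[str]) -> None:
    """Stand-in for tag extraction and agent calls that take a while."""
    await asyncio.sleep(0.2)
    results.append("done")


async def handler(results: List[str], pending: Set[asyncio.Task]) -> dict:
    """Schedule the slow work and return immediately, without awaiting it."""
    task = asyncio.create_task(slow_processing(results))
    pending.add(task)  # keep a reference so the task is not garbage-collected
    task.add_done_callback(pending.discard)
    return {"status": "accepted"}


async def main():
    results: List[str] = []
    pending: Set[asyncio.Task] = set()
    start = time.perf_counter()
    response = await handler(results, pending)
    elapsed = time.perf_counter() - start
    finished_at_response = list(results)  # snapshot: the work has not run yet
    await asyncio.gather(*pending)        # let the background work finish
    return response, elapsed, finished_at_response, results


response, elapsed, finished_at_response, results = asyncio.run(main())
print(response, finished_at_response, results)
```

The handler returns well before the 0.2 seconds of simulated work have elapsed, which is the property that keeps webhook deliveries from timing out.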

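4. Comment Processing Logic

Now let's implement the core comment processing function, which performs the actual tag extraction and uses the MCP tools: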

async def process_webhook_comment(webhook_data: Dict[str, Any]):
    """
    Process webhook comment to detect and add tags
    Integrates with our MCP client for Hub interactions
    """
    print("🏷️ Starting process_webhook_comment...")
    
    try:
        # Extract comment and repository information
        comment_content = webhook_data["comment"]["content"]
        discussion_title = webhook_data["discussion"]["title"]
        repo_name = webhook_data["repo"]["name"]
        discussion_num = webhook_data["discussion"]["num"]
        comment_author = webhook_data["comment"]["author"].get("id", "unknown")
        
        print(f"📝 Comment from {comment_author}: {comment_content}")
        print(f"📰 Discussion: {discussion_title}")
        print(f"📦 Repository: {repo_name}")

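This initial section extracts all the relevant information from the webhook data. We take both the comment content and the discussion title, because tags may be mentioned in either place.

Next, we extract and process the tags: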

        # Extract potential tags from comment and title
        comment_tags = extract_tags_from_text(comment_content)
        title_tags = extract_tags_from_text(discussion_title)
        all_tags = list(set(comment_tags + title_tags))
        
        print(f"🔍 Found tags: {all_tags}")
        
        # Store operation for monitoring
        operation = {
            "timestamp": datetime.now().isoformat(),
            "repo_name": repo_name,
            "discussion_num": discussion_num,
            "comment_author": comment_author,
            "extracted_tags": all_tags,
            "comment_preview": comment_content[:100] + "..." if len(comment_content) > 100 else comment_content,
            "status": "processing"
        }
        tag_operations_store.append(operation)

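We combine the tags from both sources and create an operation record for monitoring. This record tracks the progress of each webhook processing operation.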
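`extract_tags_from_text` is not defined in this section, so the following is a hypothetical stand-in that shows the general idea: match words against a set of known tags, then merge the results from the comment and the title. The `RECOGNIZED_TAGS` set and the `merge_tags` helper are assumptions made for this sketch. Unlike `list(set(...))`, `dict.fromkeys` keeps the tags in a stable, first-seen order.

```python
import re
from typing import List

# Hypothetical allow-list; a real one would be much longer.
RECOGNIZED_TAGS = {"pytorch", "transformers", "tensorflow", "nlp"}


def extract_tags_from_text(text: str) -> List[str]:
    """Return the recognized tags mentioned in text, in order of first appearance."""
    words = re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower())
    found: List[str] = []
    for word in words:
        if word in RECOGNIZED_TAGS and word not in found:
            found.append(word)
    return found


def merge_tags(*tag_lists: List[str]) -> List[str]:
    """Combine several tag lists, dropping duplicates but keeping order."""
    return list(dict.fromkeys(tag for tags in tag_lists for tag in tags))


comment_tags = extract_tags_from_text("This model needs tags: pytorch, transformers")
title_tags = extract_tags_from_text("Missing tags for PyTorch model")
all_tags = merge_tags(comment_tags, title_tags)
print(all_tags)  # ['pytorch', 'transformers']
```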

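Storing operation records is essential for debugging and monitoring. When something goes wrong, you can review recent operations to understand what happened and why.

Now for the MCP agent integration: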

        if not all_tags:
            operation["status"] = "no_tags"
            operation["message"] = "No recognizable tags found"
            print("❌ No tags found to process")
            return
        
        # Get MCP agent for tag processing
        agent = await get_agent()
        if not agent:
            operation["status"] = "error"
            operation["message"] = "Agent not configured (missing HF_TOKEN)"
            print("❌ No agent available")
            return
        
        # Process each extracted tag
        operation["results"] = []
        for tag in all_tags:
            try:
                print(f"🤖 Processing tag '{tag}' for repo '{repo_name}'")
                
                # Create prompt for agent to handle tag processing
                prompt = f"""
                Analyze the repository '{repo_name}' and determine if the tag '{tag}' should be added.
                
                First, check the current tags using get_current_tags.
                If '{tag}' is not already present and it's a valid tag, add it using add_new_tag.
                
                Repository: {repo_name}
                Tag to process: {tag}
                
                Provide a clear summary of what was done.
                """
                
                response = await agent.run(prompt)
                print(f"🤖 Agent response for '{tag}': {response}")
                
                # Parse response and store result
                tag_result = {
                    "tag": tag,
                    "response": response,
                    "timestamp": datetime.now().isoformat()
                }
                operation["results"].append(tag_result)
                
            except Exception as e:
                error_msg = f"❌ Error processing tag '{tag}': {str(e)}"
                print(error_msg)
                operation["results"].append({
                    "tag": tag,
                    "error": str(e),
                    "timestamp": datetime.now().isoformat()
                })
        
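        operation["status"] = "completed"
        print(f"✅ Completed processing {len(all_tags)} tags")

    except Exception as e:
        # Closes the `try` opened at the top of the function so that a malformed
        # payload is logged instead of raising inside the background task.
        print(f"❌ Error in process_webhook_comment: {str(e)}")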

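This section handles the core business logic:

Validation: make sure we have tags to process and an agent available
Processing: for each tag, create a natural-language prompt for the agent
Recording: store all results for monitoring and debugging
Error handling: handle errors for individual tags gracefully

The agent prompt is carefully written to tell the AI exactly which steps to take: first check the current tags, then add the new tag if appropriate.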
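Because the prompt is an indented triple-quoted string inside a loop, every line sent to the agent carries leading whitespace. A small helper, sketched below, builds the same prompt text and strips that indentation with `textwrap.dedent`. The helper name `build_tag_prompt` is hypothetical; the tool names `get_current_tags` and `add_new_tag` are the ones referenced in the prompt above.

```python
import textwrap


def build_tag_prompt(repo_name: str, tag: str) -> str:
    """Build the per-tag agent prompt without the code's leading indentation."""
    return textwrap.dedent(f"""\
        Analyze the repository '{repo_name}' and determine if the tag '{tag}' should be added.

        First, check the current tags using get_current_tags.
        If '{tag}' is not already present and it's a valid tag, add it using add_new_tag.

        Repository: {repo_name}
        Tag to process: {tag}

        Provide a clear summary of what was done.
        """)


prompt = build_tag_prompt("test-user/test-model", "pytorch")
print(prompt)
```

Keeping prompt construction in one function also makes it possible to test the wording without calling the agent.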

5. Health and Monitoring Endpoints

Besides the webhook handler, we need endpoints for monitoring and debugging. Let's add these essential endpoints:

@app.get("/")
async def root():
    """Root endpoint with basic information"""
    return {
        "name": "HF Tagging Bot",
        "status": "running",
        "description": "Webhook listener for automatic model tagging",
        "endpoints": {
            "webhook": "/webhook",
            "health": "/health",
            "operations": "/operations"
        }
    }

The root endpoint provides basic information about your service and its available endpoints.

@app.get("/health")
async def health_check():
    """Health check endpoint for monitoring"""
    agent = await get_agent()
    
    return {
        "status": "healthy",
        "timestamp": datetime.now().isoformat(),
        "components": {
            "webhook_secret": "configured" if WEBHOOK_SECRET else "missing",
            "hf_token": "configured" if HF_TOKEN else "missing",
            "mcp_agent": "ready" if agent else "not_ready"
        }
    }

The health check endpoint verifies that all of your components are configured correctly. This is essential for production monitoring.

@app.get("/operations")
async def get_operations():
    """Get recent tag operations for monitoring"""
    # Return last 50 operations
    recent_ops = tag_operations_store[-50:] if tag_operations_store else []
    return {
        "total_operations": len(tag_operations_store),
        "recent_operations": recent_ops
    }

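The operations endpoint lets you inspect recent webhook processing activity, which is invaluable for debugging and monitoring.

Health and monitoring endpoints are essential for production deployments. They help you quickly spot configuration problems and monitor application activity without digging through logs.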
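When the store grows, a per-status count can be easier to scan than the raw records. The sketch below aggregates operation records with `collections.Counter`; the helper name `summarize_operations` and the sample records are assumptions, while the status values are the ones set by `process_webhook_comment`.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_operations(operations: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count stored operations by their status field."""
    counts = Counter(op.get("status", "unknown") for op in operations)
    return dict(counts)


# Illustrative records using the statuses set by the processing function.
sample_store = [
    {"repo_name": "test-user/test-model", "status": "completed"},
    {"repo_name": "test-user/test-model", "status": "no_tags"},
    {"repo_name": "test-user/other-model", "status": "completed"},
    {"repo_name": "test-user/other-model", "status": "error"},
]

summary = summarize_operations(sample_store)
print(summary)  # {'completed': 2, 'no_tags': 1, 'error': 1}
```

A summary like this could be added to the `/operations` response next to `total_operations`.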

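Webhook Configuration on the Hugging Face Hub

Now that our webhook listener is ready, let's configure it on the Hugging Face Hub. This is where we connect our application to real repository events.

1. Create the Webhook in Settings

Following the webhook setup guide: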

Webhook Settings

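Navigate to your Hugging Face settings and configure:

Target repositories: specify which repositories to monitor
Webhook URL: your deployed application endpoint (for example, `https://your-space.hf.space/webhook`)
Secret: use the same value as your `WEBHOOK_SECRET` environment variable
Events: subscribe to "Community (PR & discussions)" events

Start with one or two test repositories before configuring webhooks for many repositories. This lets you verify that your application works correctly before scaling up.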

2. Space URL Configuration

For Hugging Face Spaces deployments, you need to get your direct URL:

Direct URL

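The process is:

Click "Embed this Space" in your Space settings
Copy the "Direct URL"
Append `/webhook` to create your webhook endpoint
Update your webhook configuration with this URL

For example, if your Space URL is `https://username-space-name.hf.space`, your webhook endpoint is `https://username-space-name.hf.space/webhook`.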
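The append step is simple, but a copied URL with a trailing slash would produce a double slash in the endpoint. A tiny helper, shown here as a sketch with the hypothetical name `webhook_endpoint`, guards against that:

```python
def webhook_endpoint(direct_url: str, path: str = "/webhook") -> str:
    """Join a Space's direct URL and the webhook path without doubling slashes."""
    return direct_url.rstrip("/") + path


print(webhook_endpoint("https://username-space-name.hf.space"))
print(webhook_endpoint("https://username-space-name.hf.space/"))
# Both print: https://username-space-name.hf.space/webhook
```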

Space URL

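Testing the Webhook Listener

Testing is essential before deploying to production. Let's walk through the different testing approaches:

1. Local Testing

You can test your webhook handler locally with a simple script: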

# test_webhook_local.py
import requests
import json

# Test data matching webhook format
test_webhook_data = {
    "event": {
        "action": "create",
        "scope": "discussion.comment"
    },
    "comment": {
        "content": "This model needs tags: pytorch, transformers",
        "author": {"id": "test-user"}
    },
    "discussion": {
        "title": "Missing tags",
        "num": 1
    },
    "repo": {
        "name": "test-user/test-model"
    }
}

# Send test webhook
response = requests.post(
    "http://localhost:8000/webhook",  # assumes the app is served locally on port 8000
    json=test_webhook_data,
    headers={"X-Webhook-Secret": "your-test-secret"}
)

print(f"Status: {response.status_code}")
print(f"Response: {response.json()}")

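This script simulates a real webhook request, so you can test your handler without waiting for a real event.

2. Simulation Endpoint for Development

You can also add a simulation endpoint to your FastAPI application to make testing easier: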

@app.post("/simulate_webhook")
async def simulate_webhook(
    repo_name: str, 
    discussion_title: str, 
    comment_content: str
) -> str:
    """Simulate webhook for testing purposes"""
    
    # Create mock webhook data
    mock_webhook_data = {
        "event": {
            "action": "create",
            "scope": "discussion.comment"
        },
        "comment": {
            "content": comment_content,
            "author": {"id": "test-user"}
        },
        "discussion": {
            "title": discussion_title,
            "num": 999
        },
        "repo": {
            "name": repo_name
        }
    }
    
    # Process the simulated webhook
    await process_webhook_comment(mock_webhook_data)
    
    return f"Simulated webhook processed for {repo_name}"

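This endpoint makes it easy to test different scenarios through your application's interface.

Simulation endpoints are very useful during development. They let you test different tag combinations and edge cases without creating actual repository discussions.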

Expected Webhook Result

When everything is working, you should see a result similar to the discussion bot example:

Discussion Result

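This screenshot shows successful webhook processing, where the bot created a pull request in response to a discussion comment.

Next Steps

With the webhook listener implemented, we now have:

Secure webhook validation that follows Hugging Face best practices
Real-time event processing with background task handling
MCP integration for intelligent tag management
Monitoring and debugging capabilities

In the next section, we will bring everything together into a complete Pull Request agent that demonstrates the full workflow from webhook to PR creation.

Always return webhook responses quickly (within 10 seconds) to avoid timeouts. Use background tasks for longer operations such as MCP tool execution and pull request creation.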

