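Webhook guide: Set up an automatic system to re-train a model when a dataset changes

Webhooks are now publicly available!

This guide walks you through setting up an automatic training pipeline on the Hugging Face platform using HF Datasets, Webhooks, Spaces, and AutoTrain.

We will build a Webhook that listens for changes to an image classification dataset and triggers a fine-tuning of microsoft/resnet-50 using AutoTrain.

Prerequisite: Upload your dataset to the Hub

For this example, we will use a simple image classification dataset. For more information on uploading data to the Hub, see the Hub documentation.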

[Image: dataset]

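Create a Webhook to react to the dataset's changes

First, create a Webhook from your account settings.

Select your dataset as the target repository. In this example, the target is huggingface-projects/input-dataset.
You can enter a dummy Webhook URL for now. Once the Webhook is defined, you can inspect the events that will be sent to it. You can also replay them, which is useful for debugging!
Enter a secret to make it more secure.
Subscribe to "Repo update" events, since we want to react to data changes.

Your Webhook will look like this: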

[Image: webhook-creation]

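Create a Space to react to your Webhook

We now need a way to react to your Webhook events. An easy way to do this is to use a Space!

An example Space is available on the Hub.

This Space uses Docker, Python, FastAPI, and uvicorn to run a simple HTTP server. Read more about Docker Spaces at https://huggingface.co/docs/hub/spaces-sdks-docker.

The entry point is src/main.py. Let's walk through this file and what it does.

It spawns a FastAPI app that listens for HTTP POST requests on /webhook: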

from fastapi import FastAPI

# the application object that the route decorator below registers on
app = FastAPI()

# [...]
@app.post("/webhook")
async def post_webhook(
	# ...
):

# ...

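This route checks that the X-Webhook-Secret header is present and that its value matches the one you set in your Webhook's settings. The WEBHOOK_SECRET secret must be set in the Space's settings and must be identical to the secret set in your Webhook.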

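import os
from typing import Optional

from fastapi import Header, HTTPException

# [...]

# read the shared secret from the Space's environment
WEBHOOK_SECRET = os.getenv("WEBHOOK_SECRET")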

# [...]

@app.post("/webhook")
async def post_webhook(
	# [...]
	x_webhook_secret: Optional[str] = Header(default=None),
	# ^ checks for the X-Webhook-Secret HTTP header
):
	if x_webhook_secret is None:
		raise HTTPException(401)
	if x_webhook_secret != WEBHOOK_SECRET:
		raise HTTPException(403)
	# [...]
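
The header check above can also be expressed as a small pure function, which makes it easy to test without starting a server. The following is only a sketch: `check_webhook_secret` is a hypothetical helper that is not part of the example Space, and the use of `hmac.compare_digest` for a constant-time comparison is an added hardening assumption rather than what the Space itself does.

```python
import hmac
from typing import Optional


def check_webhook_secret(header_value: Optional[str], expected: str) -> int:
    """Return the HTTP status the route would answer with (hypothetical helper)."""
    if header_value is None:
        # the X-Webhook-Secret header is missing
        return 401
    # compare_digest takes the same time whether or not the values match,
    # so response timing does not reveal how much of the secret was right
    if not hmac.compare_digest(header_value.encode(), expected.encode()):
        # the header is present but does not match the configured secret
        return 403
    return 200
```

A route could call this helper and raise `HTTPException` with the returned status whenever it is not 200.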

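The event's payload is encoded as JSON. Here, we use pydantic models to parse the event payload. We also specify that the Webhook should run only when:

the event concerns the input dataset
the event is an update to the repo's content, i.e. there has been a new commit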

# defined in src/models.py
from typing import Literal

from pydantic import BaseModel

class WebhookPayloadEvent(BaseModel):
	action: Literal["create", "update", "delete"]
	scope: str

class WebhookPayloadRepo(BaseModel):
	type: Literal["dataset", "model", "space"]
	name: str
	id: str
	private: bool
	headSha: str

class WebhookPayload(BaseModel):
	event: WebhookPayloadEvent
	repo: WebhookPayloadRepo

# [...]

@app.post("/webhook")
async def post_webhook(
	# [...]
	payload: WebhookPayload,
	# ^ Pydantic model defining the payload format
):
	# [...]
	if not (
		payload.event.action == "update"
		and payload.event.scope.startswith("repo.content")
		and payload.repo.name == config.input_dataset
		and payload.repo.type == "dataset"
	):
		# no-op if the payload does not match our expectations
		return {"processed": False}
	# [...]
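
To make the filter concrete, here is a sketch that applies the same four conditions to a payload held in a plain dictionary, using only the standard library. `should_retrain` and `SAMPLE_EVENT` are hypothetical names, and the `id` and `headSha` values are made-up placeholders. Only the field names come from the models above.

```python
import json

# Illustrative payload: field names follow the pydantic models above,
# the id and headSha values are placeholders.
SAMPLE_EVENT = json.loads("""
{
  "event": {"action": "update", "scope": "repo.content"},
  "repo": {
    "type": "dataset",
    "name": "huggingface-projects/input-dataset",
    "id": "0000000000000000",
    "private": false,
    "headSha": "0000000000000000000000000000000000000000"
  }
}
""")


def should_retrain(payload: dict, input_dataset: str) -> bool:
    """Return True only for content updates on the configured dataset."""
    event, repo = payload["event"], payload["repo"]
    return (
        event["action"] == "update"
        and event["scope"].startswith("repo.content")
        and repo["name"] == input_dataset
        and repo["type"] == "dataset"
    )
```

With this sketch, `should_retrain(SAMPLE_EVENT, "huggingface-projects/input-dataset")` is true, while the same event checked against any other dataset name is ignored.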

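If the payload is valid, the next step is to create a project on AutoTrain, schedule a fine-tuning of the input model (microsoft/resnet-50 in our case), and create a discussion on the dataset when it's done!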

import requests

# AutoTrain and notify_success are helpers from the example Space (not shown here)
def schedule_retrain(payload: WebhookPayload):
	# Create the autotrain project
	try:
		project = AutoTrain.create_project(payload)
		AutoTrain.add_data(project_id=project["id"])
		AutoTrain.start_processing(project_id=project["id"])
	except requests.HTTPError as err:
		print("ERROR while requesting AutoTrain API:")
		print(f"  code: {err.response.status_code}")
		print(f"  {err.response.json()}")
		raise
	# Notify in the community tab
	notify_success(project["id"])

Visit the link in the comment to review the training cost estimate and start fine-tuning the model!

[Image: community tab notification]

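In this example, we used Hugging Face AutoTrain to quickly fine-tune our model, but you can of course plug in your own training infrastructure!

Feel free to duplicate the Space to your personal namespace and experiment with it. You will need to provide two secrets:

WEBHOOK_SECRET: the secret from your Webhook.
HF_ACCESS_TOKEN: a User Access Token with write permission. You can create one from your settings.

You will also need to edit the config.json file to use the dataset and model of your choice: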

{
	"target_namespace": "the namespace where the trained model should end up",
	"input_dataset": "the dataset on which the model will be trained",
	"input_model": "the base model to re-train",
	"autotrain_project_prefix": "A prefix for the AutoTrain project"
}
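
A missing key in config.json is easier to diagnose at startup than in the middle of a Webhook call. The sketch below shows one way to parse and validate the file with the standard library. `Config` and `parse_config` are hypothetical names, and the example Space may load its configuration differently. Only the four key names come from the file above.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Config:
    target_namespace: str
    input_dataset: str
    input_model: str
    autotrain_project_prefix: str


def parse_config(text: str) -> Config:
    """Parse the JSON text of config.json and fail early on missing keys."""
    raw = json.loads(text)
    expected = {f.name for f in fields(Config)}
    missing = expected - raw.keys()
    if missing:
        raise ValueError(f"config.json is missing keys: {sorted(missing)}")
    # keep only the known keys so that extra entries are ignored
    return Config(**{name: raw[name] for name in expected})
```

For example, `parse_config(open("config.json").read())` returns a `Config` whose `input_dataset` attribute can be compared with the repo name in the incoming payload.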

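Configure your Webhook to send events to your Space

Last but not least, you need to configure your Webhook to send POST requests to your Space.

First, get the "Direct URL" of your Space from the contextual menu. Click "Embed this Space" and copy the "Direct URL".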

[Image: embed this Space]

[Image: direct URL]

Update your Webhook to send requests to that URL:

[Image: webhook settings]
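
Besides replaying events from the Webhook settings page, you may want to send a hand-built request to your Space to check that the secret and the route are wired up correctly. The sketch below only constructs such a request with the standard library. `build_test_request` is a hypothetical helper, and the URL and secret in the usage note are placeholders for your own Direct URL and Webhook secret.

```python
import json
import urllib.request


def build_test_request(space_url: str, secret: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that mimics a Webhook call."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=space_url.rstrip("/") + "/webhook",
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-Webhook-Secret": secret,
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_test_request("https://your-space-direct-url", "your-secret", payload))`. A 401 or 403 response would indicate a missing or mismatched secret.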

And that's it! Now every commit to the input dataset will trigger a fine-tuning of ResNet-50 with AutoTrain 🎉
