How to deploy and fine-tune DeepSeek models on AWS

Published January 30, 2025

Simon Pagezy (pagezyhf)

Jeff Boudier (jeffboudier)

David Corvoysier (dacorvo)

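A running document that shows how to deploy and fine-tune DeepSeek R1 models with Hugging Face on AWS.

What is DeepSeek-R1?

If you've ever struggled with a tough math problem, you know how useful it is to think a little longer and work through it carefully. OpenAI's o1 model showed that when LLMs are trained to do the same, by using more compute during inference, they get significantly better at solving reasoning tasks such as mathematics, coding, and logic.

However, the recipe behind OpenAI's reasoning models has been a well-kept secret. That is, until last week, when DeepSeek released their DeepSeek-R1 model and promptly broke the internet (and the stock market!).

DeepSeek AI open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 that are based on the Llama and Qwen architectures. You can find them all in the DeepSeek R1 collection.

We collaborate with Amazon Web Services to make it easier for developers to deploy the latest Hugging Face models on AWS services and build better generative AI applications.

Let's review how you can deploy and fine-tune DeepSeek R1 models with Hugging Face on AWS.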

Deploy DeepSeek R1 models
- Deploy on AWS with Hugging Face Inference Endpoints
- Deploy on Amazon Bedrock Marketplace
- Deploy on Amazon SageMaker AI with Hugging Face LLM DLCs
- Deploy on EC2 Neuron with the Hugging Face Neuron Deep Learning AMI
Fine-tune DeepSeek R1 models
- Fine-tune on Amazon SageMaker AI with Hugging Face Training DLCs
- Fine-tune on EC2 Neuron with the Hugging Face Neuron Deep Learning AMI

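Deploy DeepSeek R1 models

Deploy on AWS with Hugging Face Inference Endpoints

Hugging Face Inference Endpoints offers an easy and secure way to deploy machine learning models on dedicated compute for production use on AWS. Inference Endpoints lets developers and data scientists alike build AI applications without managing infrastructure: deployment takes a few clicks, and the service handles large request volumes with autoscaling, reduces infrastructure costs by scaling to zero when idle, and provides advanced security.

With Inference Endpoints, you can deploy any of the 6 models distilled from DeepSeek-R1, as well as a quantized version of DeepSeek R1 made by Unsloth: https://huggingface.co/unsloth/DeepSeek-R1-GGUF. On the model page, click "Deploy", then "HF Inference Endpoints". You will be redirected to the Inference Endpoints page, where an optimized inference container and the recommended hardware to run the model are already selected for you. Once you have created your endpoint, you can send queries to DeepSeek R1 for $8.3 per hour, a remarkably low price! 🤯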
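To put the hourly price in perspective, here is a rough cost sketch. It uses the $8.3 per hour figure quoted above and assumes, as a simplification, that you are billed only for the hours the endpoint is running and that a month has 30 days. The function name and parameters are illustrative and are not part of any Hugging Face API; check the actual billing rules before relying on these numbers.

```python
HOURLY_RATE_USD = 8.3  # price quoted in this post for the DeepSeek R1 endpoint


def endpoint_cost(active_hours_per_day: float, days: int = 30,
                  hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Estimate the cost in USD for an endpoint that runs a fixed number of hours per day.

    Assumption: hours spent scaled to zero are not billed.
    """
    if not 0 <= active_hours_per_day <= 24:
        raise ValueError("active_hours_per_day must be between 0 and 24")
    return round(active_hours_per_day * days * hourly_rate, 2)


always_on = endpoint_cost(24)    # endpoint never scales down
office_hours = endpoint_cost(8)  # endpoint active 8 hours per day, idle otherwise

print(f"Always on: ${always_on:,.2f} per 30 days")
print(f"8 hours a day: ${office_hours:,.2f} per 30 days")
```

Under these assumptions, scaling to zero outside working hours cuts the bill to a third of the always-on cost.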

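You can find DeepSeek R1 and the distilled models, along with other popular open LLMs, in the Inference Endpoints Model Catalog, ready to deploy on optimized configurations.

| Note: The team is working on enabling DeepSeek model deployment on Inferentia instances. Stay tuned!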

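Deploy on Amazon Bedrock Marketplace

You can deploy the DeepSeek distilled models on Amazon Bedrock through the Bedrock Marketplace, which deploys an endpoint on Amazon SageMaker AI under the hood. The accompanying video shows how to do this through the AWS console.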

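Deploy on Amazon SageMaker AI with Hugging Face LLM DLCs

DeepSeek R1 on GPUs

| Note: The team is working on enabling DeepSeek-R1 deployment on GPUs on Amazon SageMaker AI with the Hugging Face LLM DLCs. Stay tuned!

Distilled models on GPUs

You can deploy the DeepSeek distilled models on Amazon SageMaker AI with the Hugging Face LLM DLCs, using either JumpStart or the Python SageMaker SDK. The accompanying video shows how to do this through the AWS console.

Now that we have seen how to deploy with JumpStart, let's walk through deploying DeepSeek-R1-Distill-Llama-70B with the Python SageMaker SDK.

The code snippets are available on the model page under the "Deploy" button!

Before that, let's go over a few prerequisites. Make sure you have a SageMaker Domain configured, sufficient quota in SageMaker, and a JupyterLab space. For DeepSeek-R1-Distill-Llama-70B, you should raise the default quota for ml.g6.48xlarge for endpoint usage to 1.

For reference, here are the hardware configurations we recommend for each distilled variant: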

Model	Instance type	GPUs per replica
deepseek-ai/DeepSeek-R1-Distill-Llama-70B	ml.g6.48xlarge	8
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B	ml.g6.12xlarge	4
deepseek-ai/DeepSeek-R1-Distill-Qwen-14B	ml.g6.12xlarge	4
deepseek-ai/DeepSeek-R1-Distill-Llama-8B	ml.g6.2xlarge	1
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	ml.g6.2xlarge	1
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B	ml.g6.2xlarge	1
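If you deploy more than one variant, it can help to keep the table above in code. The sketch below copies the recommendations into a dictionary and derives the container environment from it. The `HF_MODEL_ID` and `SM_NUM_GPUS` keys match the deployment code in this post; the helper name `deployment_settings` is hypothetical.

```python
import json

# (instance type, GPUs per replica), copied from the table above
RECOMMENDED_HARDWARE = {
    "deepseek-ai/DeepSeek-R1-Distill-Llama-70B": ("ml.g6.48xlarge", 8),
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B": ("ml.g6.12xlarge", 4),
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B": ("ml.g6.12xlarge", 4),
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B": ("ml.g6.2xlarge", 1),
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B": ("ml.g6.2xlarge", 1),
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B": ("ml.g6.2xlarge", 1),
}


def deployment_settings(model_id: str) -> tuple[str, dict[str, str]]:
    """Return the recommended instance type and container environment for a model."""
    try:
        instance_type, num_gpus = RECOMMENDED_HARDWARE[model_id]
    except KeyError:
        raise ValueError(f"No recommendation listed for {model_id!r}") from None
    env = {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": json.dumps(num_gpus),  # environment values must be strings
    }
    return instance_type, env


instance_type, env = deployment_settings("deepseek-ai/DeepSeek-R1-Distill-Qwen-32B")
print(instance_type, env)
```

The returned `env` can be passed as the `env` argument of the model object, and `instance_type` to the `deploy` call.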

Once in the notebook, make sure to install the latest version of the SageMaker SDK.

!pip install sagemaker --upgrade

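Then, import the SDK and resolve the execution role. The role is read from the notebook environment when available, and otherwise looked up by name through IAM.

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    # Inside SageMaker, the execution role comes from the environment.
    role = sagemaker.get_execution_role()
except ValueError:
    # Elsewhere, fall back to a role named "sagemaker_execution_role" in your account.
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]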

Create the SageMaker Model object with the Python SDK

model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"
model_name = model_id.split("/")[-1].lower()

# Hub Model configuration. https://huggingface.co/models
hub = {
    "HF_MODEL_ID": model_id,
    "SM_NUM_GPUS": json.dumps(8)
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="3.0.1"),
    env=hub,
    role=role,
)

Deploy the model to a SageMaker endpoint and test the endpoint

endpoint_name = f"{model_name}-ep"

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    endpoint_name=endpoint_name,
    initial_instance_count=1,
    instance_type="ml.g6.48xlarge",
    container_startup_health_check_timeout=2400,
)
  
# send request
predictor.predict({"inputs": "What is the meaning of life?"})

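That's it, you deployed a Llama 70B reasoning model!

Because you are using a TGI v3 container, the best-performing parameters for the given hardware are selected automatically.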
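DeepSeek-R1 style models usually write their chain of thought before the final answer and close it with a `</think>` tag. The sketch below separates the two parts. It assumes the endpoint returns a list of dictionaries with a `generated_text` key, which is the usual shape for the Hugging Face LLM DLC, and it runs on a hand-written sample response rather than real model output. `split_reasoning` is a hypothetical helper, not part of the SageMaker SDK.

```python
def split_reasoning(generated_text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer) around the closing </think> tag.

    If no </think> tag is present, the whole text is treated as the answer.
    """
    marker = "</think>"
    if marker not in generated_text:
        return "", generated_text.strip()
    reasoning, answer = generated_text.split(marker, 1)
    # The opening tag may be missing when the prompt template already supplies it.
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()


# Hand-written sample in the assumed response shape: [{"generated_text": "..."}]
sample_response = [
    {"generated_text": "<think>\nThe question is philosophical.\n</think>\n\nThere is no single answer."}
]

reasoning, answer = split_reasoning(sample_response[0]["generated_text"])
print("Reasoning:", reasoning)
print("Answer:", answer)
```

With a live endpoint, you would pass the first element of the `predictor.predict(...)` result instead of `sample_response`. If the response echoes the prompt, it ends up in the reasoning part.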

Make sure you delete the endpoint once you have finished testing it.

predictor.delete_model()
predictor.delete_endpoint()
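Since a 70B endpoint bills by the hour, it is worth making sure the cleanup runs even when a test request fails. One way to do that is a small context manager, sketched below. `temporary_endpoint` is a hypothetical helper, and the example uses a stand-in `FakePredictor` class so it runs without AWS credentials; a real predictor exposes the same `delete_model` and `delete_endpoint` methods used above.

```python
from contextlib import contextmanager


@contextmanager
def temporary_endpoint(predictor):
    """Yield the predictor, then delete its model and endpoint even if an error occurs."""
    try:
        yield predictor
    finally:
        try:
            predictor.delete_model()
        finally:
            # Runs even if delete_model raised, so the endpoint does not keep billing.
            predictor.delete_endpoint()


class FakePredictor:
    """Stand-in for a SageMaker predictor that records which methods were called."""

    def __init__(self):
        self.calls = []

    def predict(self, payload):
        self.calls.append("predict")
        raise RuntimeError("simulated request failure")

    def delete_model(self):
        self.calls.append("delete_model")

    def delete_endpoint(self):
        self.calls.append("delete_endpoint")


fake = FakePredictor()
try:
    with temporary_endpoint(fake) as endpoint:
        endpoint.predict({"inputs": "What is the meaning of life?"})
except RuntimeError:
    pass  # the request failed, but cleanup still ran

print(fake.calls)
```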

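Distilled models on Neuron

Let's walk through deploying DeepSeek-R1-Distill-Llama-70B on a Neuron instance, such as AWS Trainium 2 or AWS Inferentia 2.

The code snippets are available on the model page under the "Deploy" button!

The prerequisites for deploying to a Neuron instance are the same. Make sure you have a SageMaker Domain configured, sufficient quota in SageMaker, and a JupyterLab space. For DeepSeek-R1-Distill-Llama-70B, you should raise the default quota for ml.inf2.48xlarge for endpoint usage to 1.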

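Then, as before, import the SDK and resolve the execution role.

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    # Inside SageMaker, the execution role comes from the environment.
    role = sagemaker.get_execution_role()
except ValueError:
    # Elsewhere, fall back to a role named "sagemaker_execution_role" in your account.
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]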

Create the SageMaker Model object with the Python SDK

image_uri = get_huggingface_llm_image_uri("huggingface-neuronx", version="0.0.25")
model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"
model_name = model_id.split("/")[-1].lower()

# Hub Model configuration
hub = {
    "HF_MODEL_ID": model_id,
    "HF_NUM_CORES": "24",
    "HF_AUTO_CAST_TYPE": "bf16",
    "MAX_BATCH_SIZE": "4",
    "MAX_INPUT_TOKENS": "3686",
    "MAX_TOTAL_TOKENS": "4096",
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=image_uri,
    env=hub,
    role=role,
)

Deploy the model to a SageMaker endpoint and test the endpoint

endpoint_name = f"{model_name}-ep"

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    endpoint_name=endpoint_name,
    initial_instance_count=1,
    instance_type="ml.inf2.48xlarge",
    container_startup_health_check_timeout=3600,
    volume_size=512,
)
  
# send request
predictor.predict(
    {
        "inputs": "What is the capital of France?",
        "parameters": {
            "do_sample": True,
            "max_new_tokens": 128,
            "temperature": 0.7,
            "top_k": 50,
            "top_p": 0.95,
        }
    }
)

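That's it, you deployed a Llama 70B reasoning model on a Neuron instance! Under the hood, it downloaded a precompiled model from Hugging Face to speed up the endpoint start time.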
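The configuration above sets `MAX_INPUT_TOKENS` to 3686 and `MAX_TOTAL_TOKENS` to 4096. In TGI, the prompt tokens plus `max_new_tokens` must fit within the total budget, so the room left for generation depends on the prompt length. The sketch below works through that arithmetic; the function name is hypothetical, and counting the prompt tokens (with the model's tokenizer) is left out.

```python
MAX_INPUT_TOKENS = 3686  # from the hub configuration above
MAX_TOTAL_TOKENS = 4096  # from the hub configuration above


def max_new_tokens_allowed(prompt_tokens: int,
                           max_input_tokens: int = MAX_INPUT_TOKENS,
                           max_total_tokens: int = MAX_TOTAL_TOKENS) -> int:
    """Return the largest max_new_tokens that still fits in the total token budget."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if prompt_tokens > max_input_tokens:
        raise ValueError(
            f"Prompt has {prompt_tokens} tokens, above the input limit of {max_input_tokens}"
        )
    return max_total_tokens - prompt_tokens


worst_case = max_new_tokens_allowed(3686)  # longest accepted prompt
typical = max_new_tokens_allowed(1000)     # a 1000-token prompt

print(f"Longest prompt leaves {worst_case} tokens for generation")
print(f"A 1000-token prompt leaves {typical} tokens for generation")
```

Reasoning models tend to produce long outputs, so the `max_new_tokens` of 128 used in the request above may cut the answer short; the budget computed here is the upper bound you can raise it to.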

Make sure you delete the endpoint once you have finished testing it.

predictor.delete_model()
predictor.delete_endpoint()

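Deploy on EC2 Neuron with the Hugging Face Neuron Deep Learning AMI

This guide details how to export, deploy, and run DeepSeek-R1-Distill-Llama-70B on an inf2.48xlarge AWS EC2 instance.

Before that, let's go over a few prerequisites. Make sure you have subscribed to the Hugging Face Neuron Deep Learning AMI on the Marketplace. It provides all the dependencies you need to train and deploy Hugging Face models on Trainium and Inferentia. Then, launch an inf2.48xlarge instance in EC2 with that AMI and connect to it through SSH. If you have never done this, you can follow our step-by-step guide.

Once connected to the instance, you can deploy the model on an endpoint with this command: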

docker run -p 8080:80 \
    -v $(pwd)/data:/data \
    --device=/dev/neuron0 \
    --device=/dev/neuron1 \
    --device=/dev/neuron2 \
    --device=/dev/neuron3 \
    --device=/dev/neuron4 \
    --device=/dev/neuron5 \
    --device=/dev/neuron6 \
    --device=/dev/neuron7 \
    --device=/dev/neuron8 \
    --device=/dev/neuron9 \
    --device=/dev/neuron10 \
    --device=/dev/neuron11 \
    -e HF_BATCH_SIZE=4 \
    -e HF_SEQUENCE_LENGTH=4096 \
    -e HF_AUTO_CAST_TYPE="bf16" \
    -e HF_NUM_CORES=24 \
    ghcr.io/huggingface/neuronx-tgi:latest \
    --model-id deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
    --max-batch-size 4 \
    --max-total-tokens 4096

It takes a few minutes to download the compiled model from the Hugging Face cache and launch a TGI endpoint.
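The twelve `--device` flags in the command above correspond to the 12 Inferentia2 devices of an inf2.48xlarge, each with 2 NeuronCores, which is where `HF_NUM_CORES=24` comes from. If you prefer to generate the command rather than type it, a sketch like the following builds the same string. `neuron_docker_command` is a hypothetical helper; all other values are copied from the command above.

```python
def neuron_docker_command(num_devices: int = 12, cores_per_device: int = 2) -> str:
    """Build the docker run command shown above as a multi-line shell string."""
    device_flags = [f"--device=/dev/neuron{i}" for i in range(num_devices)]
    parts = [
        "docker run -p 8080:80",
        "-v $(pwd)/data:/data",
        *device_flags,
        "-e HF_BATCH_SIZE=4",
        "-e HF_SEQUENCE_LENGTH=4096",
        '-e HF_AUTO_CAST_TYPE="bf16"',
        # Each Inferentia2 device exposes two NeuronCores.
        f"-e HF_NUM_CORES={num_devices * cores_per_device}",
        "ghcr.io/huggingface/neuronx-tgi:latest",
        "--model-id deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
        "--max-batch-size 4",
        "--max-total-tokens 4096",
    ]
    # Join with shell line continuations so the output can be pasted into a terminal.
    return " \\\n    ".join(parts)


command = neuron_docker_command()
print(command)
```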

Then, you can test the endpoint:

curl localhost:8080/generate \
    -X POST \
    -d '{"inputs":"Why is the sky dark at night?"}' \
    -H 'Content-Type: application/json'
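The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming the container from the previous step is listening on `localhost:8080` and that the `/generate` route returns a JSON object with a `generated_text` field, as TGI does. The helper names and the `max_new_tokens` value are illustrative.

```python
import json
import urllib.request


def build_generate_request(prompt: str, base_url: str = "http://localhost:8080",
                           max_new_tokens: int = 128) -> urllib.request.Request:
    """Build a POST request for the TGI /generate route."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local endpoint and return the generated text."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["generated_text"]


# Building the request does not contact the server, so this runs anywhere.
request = build_generate_request("Why is the sky dark at night?")
print(request.get_method(), request.full_url)
```

Once the endpoint is up, calling `generate("Why is the sky dark at night?")` returns the model's text.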

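Make sure you stop the EC2 instance once you are done testing it.

| Note: The team is working on enabling DeepSeek R1 deployment on Trainium and Inferentia with the Hugging Face Neuron Deep Learning AMI. Stay tuned!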

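Fine-tune DeepSeek R1 models

Fine-tune on Amazon SageMaker AI with Hugging Face Training DLCs

| Note: The team is working on enabling fine-tuning of all DeepSeek models with the Hugging Face Training DLCs. Stay tuned!

Fine-tune on EC2 Neuron with the Hugging Face Neuron Deep Learning AMI

| Note: The team is working on enabling fine-tuning of all DeepSeek models with the Hugging Face Neuron Deep Learning AMI. Stay tuned!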

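Community

nikitsenka

Feb 1

Nice post. Would love to see a cost analysis.

kvasist

Mar 5

Nice post. I have been trying to deploy deepseek-ai/DeepSeek-R1-Distill-Qwen-32B on Inferentia with a context window larger than 4096 (e.g. MAX_TOTAL_TOKENS=8192), but there seem to be no precompiled models for it. It would be great if you could add instructions for compiling these models.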
