AWS Trainium & Inferentia Documentation

Quickstart


🤗 Optimum Neuron lets Hugging Face users adopt AWS accelerators seamlessly by providing **drop-in replacements** for standard training and inference components.

*🚀 Need to set up your environment first? Check out our EC2 getting-started page for complete installation and AWS setup instructions.*

Key features

  • 🔄 Drop-in replacements for standard Transformers training and inference
  • Distributed training support with minimal code changes
  • 🎯 **Optimized models** for AWS accelerators
  • 📈 **Production-ready** inference with compiled models

Training

Training on AWS Trainium requires only minimal changes to your existing code: simply swap in Optimum Neuron's drop-in replacement components.

import torch
import torch_xla.runtime as xr

from datasets import load_dataset
from transformers import AutoTokenizer

# Optimum Neuron's drop-in replacements for standard training components
from optimum.neuron import NeuronSFTConfig, NeuronSFTTrainer, NeuronTrainingArguments
from optimum.neuron.models.training import NeuronModelForCausalLM


def format_dolly_dataset(example):
    """Format Dolly dataset into instruction-following format."""
    instruction = f"### Instruction\n{example['instruction']}"
    context = f"### Context\n{example['context']}" if example["context"] else None
    response = f"### Answer\n{example['response']}"
    
    # Combine all parts with double newlines
    parts = [instruction, context, response]
    return "\n\n".join(part for part in parts if part)


def main():
    # Load instruction-following dataset
    dataset = load_dataset("databricks/databricks-dolly-15k", split="train")
    
    # Model configuration
    model_id = "Qwen/Qwen3-1.7B"
    output_dir = "qwen3-1.7b-finetuned"
    
    # Setup tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token
    
    # Configure training for Trainium
    training_args = NeuronTrainingArguments(
        learning_rate=1e-4,
        tensor_parallel_size=8,  # Split model across 8 accelerators
        per_device_train_batch_size=1,  # Batch size per device
        gradient_accumulation_steps=8,
        logging_steps=1,
        output_dir=output_dir,
    )
    
    # Load model optimized for Trainium
    model = NeuronModelForCausalLM.from_pretrained(
        model_id,
        training_args.trn_config,
        torch_dtype=torch.bfloat16,
        use_flash_attention_2=True,  # Enable fast attention
    )
    
    # Setup supervised fine-tuning
    sft_config = NeuronSFTConfig(
        max_seq_length=2048,
        packing=True,  # Pack multiple samples for efficiency
        **training_args.to_dict(),
    )
    
    # Initialize trainer and start training
    trainer = NeuronSFTTrainer(
        model=model,
        args=sft_config,
        tokenizer=tokenizer,
        train_dataset=dataset,
        formatting_func=format_dolly_dataset,
    )
    
    trainer.train()
    
    # Share your model with the community
    trainer.push_to_hub(
        commit_message="Fine-tuned on Databricks Dolly dataset",
        blocking=True,
        model_name=output_dir,
    )
    
    if xr.local_ordinal() == 0:
        print(f"Training complete! Model saved to {output_dir}")


if __name__ == "__main__":
    main()

This example demonstrates supervised fine-tuning on the Databricks Dolly dataset using NeuronSFTTrainer and NeuronModelForCausalLM, the Trainium-optimized versions of the standard Transformers components.
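Since format_dolly_dataset is plain Python, it can be sanity-checked in isolation before launching a training run. A minimal check (same logic as the function above, applied to a hypothetical record whose context field is empty):

```python
def format_dolly_dataset(example):
    """Format a Dolly record into instruction-following text (same logic as above)."""
    instruction = f"### Instruction\n{example['instruction']}"
    context = f"### Context\n{example['context']}" if example["context"] else None
    response = f"### Answer\n{example['response']}"

    # Empty sections are dropped before joining with double newlines
    parts = [instruction, context, response]
    return "\n\n".join(part for part in parts if part)


sample = {"instruction": "Name a primary color.", "context": "", "response": "Red."}
print(format_dolly_dataset(sample))
# ### Instruction
# Name a primary color.
#
# ### Answer
# Red.
```

Note that the `### Context` section is omitted whenever the record has no context, so the packed training samples stay clean.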

Running the training

Compilation (optional on first run)

NEURON_CC_FLAGS="--model-type transformer" neuron_parallel_compile torchrun --nproc_per_node 32 sft_finetune_qwen3.py

Training

NEURON_CC_FLAGS="--model-type transformer" torchrun --nproc_per_node 32 sft_finetune_qwen3.py
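With 32 workers per node and tensor_parallel_size=8, the remaining cores form data-parallel replicas. A quick sanity check of the resulting parallelism and effective global batch size, derived from the settings above (the even split of cores between tensor and data parallelism is an assumption):

```python
# Values taken from the torchrun command and NeuronTrainingArguments above
nproc_per_node = 32
tensor_parallel_size = 8
per_device_train_batch_size = 1
gradient_accumulation_steps = 8

# Cores not consumed by tensor parallelism act as data-parallel replicas
data_parallel_size = nproc_per_node // tensor_parallel_size
effective_batch_size = (
    data_parallel_size * per_device_train_batch_size * gradient_accumulation_steps
)
print(data_parallel_size, effective_batch_size)  # 4 32
```

So each optimizer step sees an effective global batch of 32 samples spread over 4 model replicas.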

Inference

Optimized inference takes two steps: **export** the model to the Neuron format, then **run** it with a NeuronModelForXXX class.

1. Export your model

optimum-cli export neuron \
  --model distilbert-base-uncased-finetuned-sst-2-english \
  --batch_size 1 \
  --sequence_length 32 \
  --auto_cast matmul \
  --auto_cast_type bf16 \
  distilbert_base_uncased_finetuned_sst2_english_neuron/

This exports the model with optimized settings: static shapes (batch_size=1, sequence_length=32) and BF16 precision for matmul operations. See the exporter guide for more compilation options.
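If you prefer to stay in Python, the export can also be done programmatically by passing export=True to from_pretrained. A sketch assuming a Neuron-enabled instance; the keyword arguments mirror the CLI flags above:

```python
from optimum.neuron import NeuronModelForSequenceClassification

# Export with the same settings as the CLI: static shapes + BF16 matmuls
model = NeuronModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    export=True,
    batch_size=1,
    sequence_length=32,
    auto_cast="matmul",
    auto_cast_type="bf16",
)

# Save the compiled artifacts so later runs can skip recompilation
model.save_pretrained("distilbert_base_uncased_finetuned_sst2_english_neuron/")
```

Either route produces the same directory of compiled artifacts used in the next step.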

2. Run inference

from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForSequenceClassification

# Load the compiled Neuron model
model = NeuronModelForSequenceClassification.from_pretrained(
    "distilbert_base_uncased_finetuned_sst2_english_neuron"
)

# Setup tokenizer (same as original model)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

# Run inference
inputs = tokenizer("Hamilton is considered to be the best musical of past years.", return_tensors="pt")
logits = model(**inputs).logits

print(model.config.id2label[logits.argmax().item()])
# 'POSITIVE'

The NeuronModelForXXX classes serve as drop-in replacements for their AutoModelForXXX counterparts, making migration seamless.

Next steps

Ready to dive deeper? Check out our comprehensive guides.
