AWS Trainium & Inferentia Documentation
Quickstart
🤗 Optimum Neuron lets Hugging Face users adopt AWS accelerators seamlessly by providing **drop-in replacements** for the standard training and inference components.
*🚀 Need to set up your environment first? Check out our EC2 getting-started page for complete installation and AWS setup instructions.*
Key features
- 🔄 Drop-in replacements for standard Transformers training and inference
- ⚡ Distributed training support with minimal code changes
- 🎯 Models **optimized** for AWS accelerators
- 📈 **Production-ready** inference with compiled models
Training
Training on AWS Trainium requires only minimal changes to your existing code: just swap in Optimum Neuron's drop-in replacement components.
```python
import torch
import torch_xla.runtime as xr

from datasets import load_dataset
from transformers import AutoTokenizer

# Optimum Neuron's drop-in replacements for standard training components
from optimum.neuron import NeuronSFTConfig, NeuronSFTTrainer, NeuronTrainingArguments
from optimum.neuron.models.training import NeuronModelForCausalLM


def format_dolly_dataset(example):
    """Format Dolly dataset into instruction-following format."""
    instruction = f"### Instruction\n{example['instruction']}"
    context = f"### Context\n{example['context']}" if example["context"] else None
    response = f"### Answer\n{example['response']}"

    # Combine all parts with double newlines
    parts = [instruction, context, response]
    return "\n\n".join(part for part in parts if part)


def main():
    # Load instruction-following dataset
    dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

    # Model configuration
    model_id = "Qwen/Qwen3-1.7B"
    output_dir = "qwen3-1.7b-finetuned"

    # Setup tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token

    # Configure training for Trainium
    training_args = NeuronTrainingArguments(
        learning_rate=1e-4,
        tensor_parallel_size=8,  # Split model across 8 accelerators
        per_device_train_batch_size=1,  # Batch size per device
        gradient_accumulation_steps=8,
        logging_steps=1,
        output_dir=output_dir,
    )

    # Load model optimized for Trainium
    model = NeuronModelForCausalLM.from_pretrained(
        model_id,
        training_args.trn_config,
        torch_dtype=torch.bfloat16,
        use_flash_attention_2=True,  # Enable fast attention
    )

    # Setup supervised fine-tuning
    sft_config = NeuronSFTConfig(
        max_seq_length=2048,
        packing=True,  # Pack multiple samples for efficiency
        **training_args.to_dict(),
    )

    # Initialize trainer and start training
    trainer = NeuronSFTTrainer(
        model=model,
        args=sft_config,
        tokenizer=tokenizer,
        train_dataset=dataset,
        formatting_func=format_dolly_dataset,
    )
    trainer.train()

    # Share your model with the community
    trainer.push_to_hub(
        commit_message="Fine-tuned on Databricks Dolly dataset",
        blocking=True,
        model_name=output_dir,
    )

    if xr.local_ordinal() == 0:
        print(f"Training complete! Model saved to {output_dir}")


if __name__ == "__main__":
    main()
```
This example demonstrates supervised fine-tuning on the Databricks Dolly dataset using NeuronSFTTrainer and NeuronModelForCausalLM, the Trainium-optimized versions of the standard Transformers components.
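To make the target format concrete, here is what `format_dolly_dataset` produces for a record that has a context field (the record itself is made up for illustration):

```python
example = {
    "instruction": "What is AWS Trainium?",
    "context": "Trainium is AWS's purpose-built machine learning training chip.",
    "response": "It is the accelerator that powers Trn1 instances for model training.",
}

print(format_dolly_dataset(example))
# ### Instruction
# What is AWS Trainium?
#
# ### Context
# Trainium is AWS's purpose-built machine learning training chip.
#
# ### Answer
# It is the accelerator that powers Trn1 instances for model training.
```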
Run the training

Compilation (optional for the first run):

```bash
NEURON_CC_FLAGS="--model-type transformer" neuron_parallel_compile torchrun --nproc_per_node 32 sft_finetune_qwen3.py
```

Training:

```bash
NEURON_CC_FLAGS="--model-type transformer" torchrun --nproc_per_node 32 sft_finetune_qwen3.py
```

The `neuron_parallel_compile` step pre-compiles the model graphs and populates the compilation cache so the actual training run does not stall on compilation; `--nproc_per_node 32` matches the 32 Neuron cores of a trn1.32xlarge instance.
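If `torchrun` reports that no accelerators are available, you can verify that the instance actually exposes Neuron devices with the `neuron-ls` tool shipped with the AWS Neuron SDK:

```bash
# List the Neuron devices and cores visible on this instance
neuron-ls
```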
Inference
Optimized inference is a two-step process: **export** the model to the Neuron format, then **run** it with a NeuronModelForXXX class.
1. Export your model

```bash
optimum-cli export neuron \
  --model distilbert-base-uncased-finetuned-sst-2-english \
  --batch_size 1 \
  --sequence_length 32 \
  --auto_cast matmul \
  --auto_cast_type bf16 \
  distilbert_base_uncased_finetuned_sst2_english_neuron/
```
This exports the model with optimized settings: static shapes (batch_size=1, sequence_length=32) and BF16 precision for matmul operations. Check out the exporter guide for more compilation options.
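If you prefer to stay in Python, the export can also be triggered from `from_pretrained` with `export=True`; a minimal sketch, assuming the Python export path accepts the same shape and casting options as the CLI flags above:

```python
from optimum.neuron import NeuronModelForSequenceClassification

# Compile on the fly: export=True triggers the Neuron export with
# the static input shapes and casting options used by the CLI above
model = NeuronModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    export=True,
    batch_size=1,
    sequence_length=32,
    auto_cast="matmul",
    auto_cast_type="bf16",
)

# Save the compiled artifacts so later runs can skip compilation
model.save_pretrained("distilbert_base_uncased_finetuned_sst2_english_neuron/")
```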
2. Run inference

```python
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForSequenceClassification

# Load the compiled Neuron model
model = NeuronModelForSequenceClassification.from_pretrained(
    "distilbert_base_uncased_finetuned_sst2_english_neuron"
)

# Setup tokenizer (same as original model)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

# Run inference
inputs = tokenizer("Hamilton is considered to be the best musical of past years.", return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax().item()])
# 'POSITIVE'
```
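Optimum Neuron also exposes a `pipeline` helper mirroring the Transformers one; a sketch, assuming the task can be inferred from the exported model directory:

```python
from optimum.neuron import pipeline

# Build a text-classification pipeline on top of the compiled model
classifier = pipeline("text-classification", model="distilbert_base_uncased_finetuned_sst2_english_neuron")
print(classifier("Hamilton is considered to be the best musical of past years."))
```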
The NeuronModelForXXX classes act as drop-in replacements for their AutoModelForXXX counterparts, so migration is seamless.
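In practice, migrating an existing inference script is typically a matter of swapping the import and the checkpoint path; a sketch of the before/after:

```python
# Before: standard Transformers model on CPU/GPU
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)

# After: compiled model running on Inferentia/Trainium
from optimum.neuron import NeuronModelForSequenceClassification
model = NeuronModelForSequenceClassification.from_pretrained(
    "distilbert_base_uncased_finetuned_sst2_english_neuron"
)
```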
Next steps
Ready to dive deeper? Check out our comprehensive guides.