Optimum Inference with ONNX Runtime

Optimum is a utility package for building and running inference with accelerated runtimes like ONNX Runtime. It can be used to load optimized models from the Hugging Face Hub and to create pipelines that run accelerated inference, without having to rewrite your APIs.
Loading

Transformers models

Once your model has been exported to the ONNX format, you can load it by replacing `AutoModelForXxx` with the corresponding `ORTModelForXxx` class.
from transformers import AutoTokenizer, pipeline
- from transformers import AutoModelForCausalLM
+ from optimum.onnxruntime import ORTModelForCausalLM
- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B") # PyTorch checkpoint
+ model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx") # ONNX checkpoint
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
result = pipe("He never went out without a book under his arm")
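The pipeline returns a list of dictionaries with the generated text under the `generated_text` key (the continuation itself will vary from run to run):

print(result[0]["generated_text"])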
More information on all the supported `ORTModelForXxx` classes can be found in our documentation.

Diffusers models

Once your model has been exported to the ONNX format, you can load it by replacing `DiffusionPipeline` with the corresponding `ORTDiffusionPipeline` class.
- from diffusers import DiffusionPipeline
+ from optimum.onnxruntime import ORTDiffusionPipeline
model_id = "runwayml/stable-diffusion-v1-5"
- pipeline = DiffusionPipeline.from_pretrained(model_id)
+ pipeline = ORTDiffusionPipeline.from_pretrained(model_id, revision="onnx")
prompt = "sailing ship in storm by Leonardo da Vinci"
image = pipeline(prompt).images[0]
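The pipeline returns PIL images, so the result can be saved directly; the filename below is arbitrary:

image.save("sailing_ship.png")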
Sentence Transformers models

Once your model has been exported to the ONNX format, you can load it by replacing `AutoModel` with the corresponding `ORTModelForFeatureExtraction` class.
from transformers import AutoTokenizer
- from transformers import AutoModel
+ from optimum.onnxruntime import ORTModelForFeatureExtraction
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
- model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
+ model = ORTModelForFeatureExtraction.from_pretrained("optimum/all-MiniLM-L6-v2")
inputs = tokenizer("This is an example sentence", return_tensors="pt")
outputs = model(**inputs)
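Note that `ORTModelForFeatureExtraction` returns token-level hidden states. To get a single sentence embedding comparable to Sentence Transformers output, the usual recipe for this model family is mean pooling over the attention mask followed by L2 normalization; a minimal sketch (not an Optimum API):

import torch

# Mean-pool the token embeddings over the attention mask, then L2-normalize.
token_embeddings = outputs.last_hidden_state
mask = inputs["attention_mask"].unsqueeze(-1).to(token_embeddings.dtype)
embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
embedding = torch.nn.functional.normalize(embedding, p=2, dim=1)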
You can also load your ONNX model directly with the `sentence_transformers.SentenceTransformer` class; just make sure you have `sentence-transformers>=3.2` installed. If the model has not been converted to ONNX yet, it will be converted automatically on-the-fly.
from sentence_transformers import SentenceTransformer
model_id = "sentence-transformers/all-MiniLM-L6-v2"
- model = SentenceTransformer(model_id)
+ model = SentenceTransformer(model_id, backend="onnx")
sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
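The resulting embeddings can be compared as usual, for example with the `similarity` helper that ships with recent sentence-transformers releases:

# Cosine-similarity matrix between all pairs of sentences
similarities = model.similarity(embeddings, embeddings)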
Timm models

Once your model has been exported to the ONNX format, you can load it by replacing `create_model` with the corresponding `ORTModelForImageClassification` class.
import requests
from PIL import Image
- from timm import create_model
from timm.data import resolve_data_config, create_transform
+ from optimum.onnxruntime import ORTModelForImageClassification
- model = create_model("timm/mobilenetv3_large_100.ra_in1k", pretrained=True)
+ model = ORTModelForImageClassification.from_pretrained("optimum/mobilenetv3_large_100.ra_in1k")
transform = create_transform(**resolve_data_config(model.config.pretrained_cfg, model=model))
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png"
image = Image.open(requests.get(url, stream=True).raw)
inputs = transform(image).unsqueeze(0)
outputs = model(inputs)
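The model returns raw logits; to turn them into a prediction you can apply a softmax and take the top classes. A generic post-processing sketch (standard top-5, not Optimum-specific):

import torch

# Convert logits to class probabilities and take the five most likely labels.
probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
top5_prob, top5_ids = torch.topk(probabilities, k=5)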
Converting your model to ONNX on-the-fly

In case your model hasn't been converted to ONNX yet, ORTModel includes a method to convert it on-the-fly. Simply pass `export=True` to the `from_pretrained()` method, and your model will be loaded and converted to ONNX on-the-fly:
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> # Load the model from the hub and export it to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
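If you would rather export once ahead of time instead of converting at every load, the same conversion is also exposed through the Optimum CLI (the output directory name below is arbitrary):

optimum-cli export onnx --model distilbert-base-uncased-finetuned-sst-2-english distilbert_onnx/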
Pushing your model to the Hub

You can also call `push_to_hub` directly on your model to upload it to the Hub.
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> # Load the model from the hub and export it to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
>>> # Save the converted model locally
>>> output_dir = "a_local_path_for_convert_onnx_model"
>>> model.save_pretrained(output_dir)
>>> # Push the ONNX model to the HF Hub
>>> model.push_to_hub(output_dir, repository_id="my-onnx-repo")
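Once pushed, the model can be loaded back like any other ONNX checkpoint on the Hub; the repository id below is the hypothetical one from the example above, prefixed with your username:

>>> # Load the uploaded ONNX model back from the Hub (hypothetical repo id)
>>> model = ORTModelForSequenceClassification.from_pretrained("your-username/my-onnx-repo")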