Inference pipelines with the ONNX Runtime accelerator
The pipeline() function makes it simple to use models from the Model Hub for accelerated inference on a variety of tasks, such as text classification, question answering and image classification.
You can also use the pipeline() function from Transformers and provide your Optimum model class.
Currently the supported tasks are:
- feature extraction
- text classification
- token classification
- question answering
- zero-shot classification
- text generation
- text-to-text generation
- summarization
- translation
- image classification
- automatic speech recognition
- image to text
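As noted above, instead of the pipeline() function from optimum.pipelines you can also use the one from Transformers and pass it an Optimum model instance. Below is a minimal sketch, reusing the optimum/roberta-base-squad2 ONNX checkpoint that appears later on this page:
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/roberta-base-squad2")
>>> model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2")
>>> # The vanilla Transformers pipeline accepts the ORT model like any other model instance
>>> onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
>>> pred = onnx_qa(question="What's my name?", context="My name is Philipp and I live in Nuremberg.")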
Optimum pipeline usage
While each task has an associated pipeline class, it is simpler to use the general pipeline() function, which wraps all the task-specific pipelines. The pipeline() function automatically loads a default model and tokenizer/feature-extractor capable of performing inference for your task.
- Start by creating a pipeline by specifying an inference task:
>>> from optimum.pipelines import pipeline
>>> classifier = pipeline(task="text-classification", accelerator="ort")
- Pass your input text/image to the pipeline() function:
>>> classifier("I like you. I love you.")
[{'label': 'POSITIVE', 'score': 0.9998838901519775}]
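The same works for image tasks from the list above. Below is a minimal sketch for image classification, assuming Pillow is installed and the example image URL is reachable at runtime:
>>> from optimum.pipelines import pipeline
>>> image_classifier = pipeline(task="image-classification", accelerator="ort")
>>> # A local path, a PIL image or a URL works as input for image pipelines
>>> image_classifier("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png")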
Note: The default models used in the pipeline() function are not optimized for inference or quantized, so there will be no performance improvement compared to their PyTorch counterparts.
Using vanilla Transformers models and converting to ONNX
The pipeline() function accepts any supported model from the Hugging Face Hub. There are tags on the Model Hub that allow you to filter for a model you would like to use for your task.
To be able to load the model with the ONNX Runtime backend, the export to ONNX needs to be supported for the considered architecture.
You can check the list of supported architectures here.
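If you prefer to check support programmatically rather than browsing the documentation, the exporters API can be queried. Below is a minimal sketch, assuming the TasksManager helper from optimum.exporters (the exact module layout and signature may vary across Optimum versions):
>>> from optimum.exporters.tasks import TasksManager
>>> # List the tasks for which the ONNX export of the "roberta" architecture is supported
>>> print(list(TasksManager.get_supported_tasks_for_model_type("roberta", "onnx")))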
Once you have picked an appropriate model, you can create the pipeline() by specifying the model repo:
>>> from optimum.pipelines import pipeline
# The model will be loaded to an ORTModelForQuestionAnswering.
>>> onnx_qa = pipeline("question-answering", model="deepset/roberta-base-squad2", accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."
>>> pred = onnx_qa(question=question, context=context)
It is also possible to load it with the from_pretrained(model_name_or_path, export=True) method associated with the ORTModelForXXX class.
For example, here is how you can load the ORTModelForQuestionAnswering class for question answering:
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering
>>> from optimum.pipelines import pipeline
>>> tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
>>> # Loading the PyTorch checkpoint and converting to the ONNX format by providing
>>> # export=True
>>> model = ORTModelForQuestionAnswering.from_pretrained(
... "deepset/roberta-base-squad2",
... export=True
... )
>>> onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer, accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."
>>> pred = onnx_qa(question=question, context=context)
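Note that export=True converts the PyTorch checkpoint every time the model is loaded. If you plan to reuse the model, you can save the ONNX export once and reload it from the local directory afterwards. A minimal sketch, with a local directory name chosen here only for illustration:
>>> save_dir = "roberta-base-squad2-onnx"
>>> # Persist the ONNX model, its config and the tokenizer so that export=True is not needed next time
>>> model.save_pretrained(save_dir)
>>> tokenizer.save_pretrained(save_dir)
>>> model = ORTModelForQuestionAnswering.from_pretrained(save_dir)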
Using Optimum models
The pipeline() function is tightly integrated with the Hugging Face Hub and can load ONNX models directly.
>>> from optimum.pipelines import pipeline
>>> onnx_qa = pipeline("question-answering", model="optimum/roberta-base-squad2", accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."
>>> pred = onnx_qa(question=question, context=context)
It is also possible to load it with the from_pretrained(model_name_or_path) method associated with the ORTModelForXXX class.
For example, here is how you can load the ORTModelForQuestionAnswering class for question answering:
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering
>>> from optimum.pipelines import pipeline
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/roberta-base-squad2")
>>> # Loading directly an ONNX model from a model repo.
>>> model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2")
>>> onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer, accelerator="ort")
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."
>>> pred = onnx_qa(question=question, context=context)
Optimizing and quantizing in pipelines
The pipeline() function can not only run inference on vanilla ONNX Runtime checkpoints, but also on checkpoints optimized or quantized with the ORTOptimizer and the ORTQuantizer.
Below you can find two examples of how to use the ORTQuantizer and the ORTOptimizer to quantize/optimize your model and use it for inference afterwards.
Quantizing with the ORTQuantizer
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import (
... AutoQuantizationConfig,
... ORTModelForSequenceClassification,
... ORTQuantizer
... )
>>> from optimum.pipelines import pipeline
>>> # Load the tokenizer and export the model to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> save_dir = "distilbert_quantized"
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
>>> # Load the quantization configuration detailing the quantization we wish to apply
>>> qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=True)
>>> quantizer = ORTQuantizer.from_pretrained(model)
>>> # Apply dynamic quantization and save the resulting model
>>> quantizer.quantize(save_dir=save_dir, quantization_config=qconfig)
>>> # Load the quantized model from a local repository
>>> model = ORTModelForSequenceClassification.from_pretrained(save_dir)
>>> # Create the transformers pipeline
>>> onnx_clx = pipeline("text-classification", model=model, tokenizer=tokenizer, accelerator="ort")
>>> text = "I like the new ORT pipeline"
>>> pred = onnx_clx(text)
>>> print(pred)
>>> # [{'label': 'POSITIVE', 'score': 0.9974810481071472}]
>>> # Save and push the model to the hub (in practice save_dir could be used here instead)
>>> tokenizer.save_pretrained("new_path_for_directory")
>>> model.save_pretrained("new_path_for_directory")
>>> model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)
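Once pushed, the quantized checkpoint can be consumed like any other ONNX model on the Hub. A minimal sketch, assuming the hypothetical repository my-onnx-repo created by the push above lives under your username:
>>> from optimum.pipelines import pipeline
>>> # "your-username/my-onnx-repo" stands for the hypothetical repository pushed above
>>> onnx_clx = pipeline("text-classification", model="your-username/my-onnx-repo", accelerator="ort")
>>> onnx_clx("I like the new ORT pipeline")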
Optimizing with the ORTOptimizer
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import (
... AutoOptimizationConfig,
... ORTModelForSequenceClassification,
... ORTOptimizer
... )
>>> from optimum.pipelines import pipeline
>>> # Load the tokenizer and export the model to the ONNX format
>>> model_id = "distilbert-base-uncased-finetuned-sst-2-english"
>>> save_dir = "distilbert_optimized"
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
>>> # Load the optimization configuration detailing the optimization we wish to apply
>>> optimization_config = AutoOptimizationConfig.O3()
>>> optimizer = ORTOptimizer.from_pretrained(model)
>>> optimizer.optimize(save_dir=save_dir, optimization_config=optimization_config)
>>> # Load the optimized model from a local repository
>>> model = ORTModelForSequenceClassification.from_pretrained(save_dir)
>>> # Create the transformers pipeline
>>> onnx_clx = pipeline("text-classification", model=model, tokenizer=tokenizer, accelerator="ort")
>>> text = "I like the new ORT pipeline"
>>> pred = onnx_clx(text)
>>> print(pred)
>>> # [{'label': 'POSITIVE', 'score': 0.9973127245903015}]
>>> # Save and push the model to the hub
>>> tokenizer.save_pretrained("new_path_for_directory")
>>> model.save_pretrained("new_path_for_directory")
>>> model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)