Inference pipelines with AWS Neuron (Inf2/Trn1)

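The pipeline() function makes it easy to run accelerated inference with models from the Model Hub on a variety of tasks, such as text classification, question answering, and image classification.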

You can also use the pipeline() function from Transformers and provide your NeuronModel model class.

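The currently supported tasks are:

feature-extraction
fill-mask
text-classification
token-classification
question-answering
zero-shot-classification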
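
As a rough sketch of how the task list above might be enforced before any model is loaded, the hypothetical helpers below (check_task and build_pipeline are illustrative names, not part of Optimum) validate a task string against the standard Transformers pipeline identifiers for those six tasks. The Neuron import is deferred so that the validation runs even on a machine without the Neuron SDK.

```python
# Pipeline identifiers for the six supported tasks listed above.
SUPPORTED_TASKS = frozenset({
    "feature-extraction",
    "fill-mask",
    "text-classification",
    "token-classification",
    "question-answering",
    "zero-shot-classification",
})


def check_task(task: str) -> str:
    """Return the task unchanged, or raise ValueError if it is not supported."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(
            f"Unsupported task {task!r}; expected one of {sorted(SUPPORTED_TASKS)}"
        )
    return task


def build_pipeline(task: str, **kwargs):
    """Hypothetical wrapper: validate the task, then create the Neuron pipeline."""
    check_task(task)
    # Imported lazily so check_task() stays usable without optimum-neuron installed.
    from optimum.neuron.pipelines import pipeline

    return pipeline(task, **kwargs)
```

Calling build_pipeline("text-classification") would then behave like the pipeline() call in the next section, while an unlisted task fails early with a readable error.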

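Optimum pipeline usage

While each task has an associated pipeline class, it is simpler to use the general pipeline() function, which wraps all the task-specific pipelines in one object. The pipeline() function automatically loads a default model and tokenizer/feature extractor capable of performing inference for your task.

Start by creating a pipeline and specifying an inference task: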

>>> from optimum.neuron.pipelines import pipeline

>>> classifier = pipeline(task="text-classification")

Pass your input text/image to the pipeline() function:

>>> classifier("I like you. I love you.")
[{'label': 'POSITIVE', 'score': 0.9998838901519775}]
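
The result is a list of dictionaries, each holding a label and a score. A minimal sketch of consuming that structure follows; top_label and its min_score threshold are hypothetical names introduced here for illustration, and the sample data is copied from the output above.

```python
def top_label(results, min_score=0.5):
    """Return the highest-scoring label, or None if nothing reaches min_score."""
    if not results:
        return None
    best = max(results, key=lambda r: r["score"])
    return best["label"] if best["score"] >= min_score else None


# Output copied from the classifier call above.
results = [{"label": "POSITIVE", "score": 0.9998838901519775}]
print(top_label(results))
```

Returning None for low-confidence predictions is only one possible design; raising an exception or returning the raw score may suit other applications better.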

Note: The default models used by the pipeline() function are not optimized or quantized for inference, so they will not perform better than their PyTorch counterparts.

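Using vanilla Transformers models and converting to AWS Neuron

The pipeline() function accepts any supported model from the Hugging Face Hub. Tags on the Model Hub let you filter for a model suited to your task.

To load a model with the Neuron Runtime, its architecture must support export to Neuron.

The Optimum Neuron documentation lists the supported architectures.

Once you have picked an appropriate model, you can create the pipeline() by specifying the model repository: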

>>> from optimum.neuron.pipelines import pipeline

>>> # The model will be loaded as a NeuronModelForQuestionAnswering.
>>> neuron_qa = pipeline("question-answering", model="deepset/roberta-base-squad2", export=True)
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."

>>> pred = neuron_qa(question=question, context=context)

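It is also possible to load the model with the from_pretrained(model_name_or_path, export=True) method associated with the NeuronModelForXXX class.

For example, here is how you can load the NeuronModelForQuestionAnswering class for question answering: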

>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForQuestionAnswering, pipeline

>>> tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

>>> # Loading the PyTorch checkpoint and converting to the neuron format by providing export=True
>>> model = NeuronModelForQuestionAnswering.from_pretrained(
...     "deepset/roberta-base-squad2",
...     export=True
... )

>>> neuron_qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
>>> question = "What's my name?"
>>> context = "My name is Philipp and I live in Nuremberg."

>>> pred = neuron_qa(question=question, context=context)
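
The snippet above stops at pred without showing how to read it. Assuming the prediction follows the usual Transformers question-answering layout (a dictionary with answer, start, end, and score keys, where start and end are character offsets into the context), the sketch below checks that the offsets really point at the answer. The helper name and the sample dictionary are hand-written for illustration and are not real model output.

```python
def span_matches_answer(pred, context):
    """Check that context[start:end] equals the answer text in the prediction."""
    return context[pred["start"]:pred["end"]] == pred["answer"]


context = "My name is Philipp and I live in Nuremberg."

# Hand-written stand-in for a prediction (assumption, not produced by a model).
example_pred = {"answer": "Philipp", "start": 11, "end": 18}

print(span_matches_answer(example_pred, context))
```

A check like this can be handy when highlighting the answer inside the original text, since it catches offset mismatches before they reach the user.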

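Defining input shapes

NeuronModels currently require static input_shapes to run inference. If you pass export=True without providing input shapes, the default input shapes are used. The example below shows how to specify the input shapes for the sequence length and batch size.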

>>> from optimum.neuron.pipelines import pipeline

>>> input_shapes = {"batch_size": 1, "sequence_length": 64}
>>> clt = pipeline("token-classification", model="dslim/bert-base-NER", export=True, input_shapes=input_shapes)
>>> context = "My name is Philipp and I live in Nuremberg."

>>> pred = clt(context)
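
To illustrate what a static shape implies for the inputs, here is a minimal pure-Python sketch that pads or truncates tokenized sequences to a fixed batch_size and sequence_length. It is not a description of how Optimum Neuron handles inputs internally; pad_to_static_shape, pad_id, and the token ids used below are all hypothetical.

```python
def pad_to_static_shape(sequences, batch_size, sequence_length, pad_id=0):
    """Return (input_ids, attention_mask) with shape batch_size x sequence_length."""
    if len(sequences) > batch_size:
        raise ValueError(
            f"Got {len(sequences)} sequences but the static batch size is {batch_size}"
        )
    input_ids, attention_mask = [], []
    for seq in sequences:
        kept = list(seq[:sequence_length])  # truncate sequences that are too long
        n_pad = sequence_length - len(kept)  # pad sequences that are too short
        input_ids.append(kept + [pad_id] * n_pad)
        attention_mask.append([1] * len(kept) + [0] * n_pad)
    # Fill any missing rows with padding so the batch dimension is also fixed.
    for _ in range(batch_size - len(sequences)):
        input_ids.append([pad_id] * sequence_length)
        attention_mask.append([0] * sequence_length)
    return input_ids, attention_mask


# Made-up token ids, padded to a 2 x 6 batch.
ids, mask = pad_to_static_shape([[101, 7, 8, 102]], batch_size=2, sequence_length=6)
print(ids)
print(mask)
```

The attention mask marks real tokens with 1 and padding with 0, which is the usual way to keep padded positions from influencing the result.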
