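Accelerated inference on AMD GPUs supported by ROCm

By default, ONNX Runtime runs inference on CPU devices. However, supported operations can be placed on an AMD Instinct GPU while any unsupported ones stay on the CPU. In most cases this puts the costly operations on the GPU and speeds up inference significantly.

Our testing involved AMD Instinct GPUs. For compatibility with a specific GPU, refer to the official GPU support list.

This guide shows how to run inference with ROCMExecutionProvider, the execution provider that ONNX Runtime offers for AMD GPUs.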

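Installation

The following setup installs ONNX Runtime with support for ROCm 6.0, together with the ROCm Execution Provider.

1 ROCm installation

Refer to the ROCm installation guide to install ROCm 6.0.

2 Installing onnxruntime-rocm

Use the provided example Dockerfile or build locally from source, since pip wheels are currently unavailable.

Docker installation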

docker build -f Dockerfile -t ort/rocm .

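Local installation steps

2.1 PyTorch with ROCm support

The Optimum ONNX Runtime integration relies on some Transformers functionality that requires PyTorch. For now, we recommend PyTorch compiled against ROCm 6.0, which can be installed by following the PyTorch installation guide: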

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
# Use 'rocm/pytorch:rocm6.0.2_ubuntu22.04_py3.10_pytorch_2.1.2' as the preferred base image when using Docker for PyTorch installation.

2.2 ONNX Runtime with the ROCm Execution Provider

# pre-requisites
pip install -U pip
pip install cmake onnx
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install ONNXRuntime from source
git clone --single-branch --branch main --recursive https://github.com/Microsoft/onnxruntime onnxruntime
cd onnxruntime

./build.sh --config Release --build_wheel --allow_running_as_root --update --build --parallel --cmake_extra_defines CMAKE_HIP_ARCHITECTURES=gfx90a,gfx942 ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) --use_rocm --rocm_home=/opt/rocm
pip install build/Linux/Release/dist/*

Note: These instructions build ORT for `MI210/MI250/MI300` GPUs. To support other architectures, update `CMAKE_HIP_ARCHITECTURES` in the build command.
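As a rough illustration of how the `CMAKE_HIP_ARCHITECTURES` value might be assembled for a given set of GPUs, the sketch below maps the GPU families named in the note to the two targets used in the build command. The helper name `hip_architectures` and the `GFX_TARGETS` table are assumptions introduced here for illustration (`gfx90a` for MI210/MI250, `gfx942` for MI300). For any other GPU, look up its target in the ROCm documentation rather than guessing.

```python
# Hypothetical helper, not part of ONNX Runtime or ROCm.
# The table only covers the GPUs named in the note above.
GFX_TARGETS = {
    "MI210": "gfx90a",
    "MI250": "gfx90a",
    "MI300": "gfx942",
}


def hip_architectures(gpus):
    """Return a comma-separated, de-duplicated target list for the given GPU names."""
    targets = []
    for gpu in gpus:
        key = gpu.upper()
        if key not in GFX_TARGETS:
            raise ValueError(f"Unknown GPU {gpu!r}: add its gfx target to GFX_TARGETS")
        target = GFX_TARGETS[key]
        if target not in targets:  # keep first-seen order, drop duplicates
            targets.append(target)
    return ",".join(targets)


if __name__ == "__main__":
    value = hip_architectures(["MI210", "MI250", "MI300"])
    print(f"CMAKE_HIP_ARCHITECTURES={value}")
```

The resulting string can be pasted after `--cmake_extra_defines` in the `build.sh` command.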

To avoid conflicts between `onnxruntime` and `onnxruntime-rocm`, make sure the `onnxruntime` package is uninstalled by running `pip uninstall onnxruntime` before installing `onnxruntime-rocm`.
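One way to spot such a conflict before installing is to ask Python's packaging metadata which ONNX Runtime distributions are present. This is a minimal sketch built on the standard-library `importlib.metadata`. The function name `installed_variants` and the candidate list are assumptions for illustration, and the distribution name of a wheel built from source may differ from the one listed here.

```python
from importlib import metadata

# Distribution names to look for (assumed; taken from the package names in this guide).
CANDIDATES = ("onnxruntime", "onnxruntime-rocm")


def installed_variants(candidates=CANDIDATES, lookup=metadata.version):
    """Return {distribution name: version} for each candidate that is installed."""
    found = {}
    for name in candidates:
        try:
            found[name] = lookup(name)
        except metadata.PackageNotFoundError:
            continue  # not installed, skip it
    return found


if __name__ == "__main__":
    variants = installed_variants()
    print("Installed ONNX Runtime distributions:", variants)
    if "onnxruntime" in variants:
        print("Run `pip uninstall onnxruntime` before installing onnxruntime-rocm.")
```

The `lookup` parameter exists only so the function can be exercised without touching the real environment.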

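Checking that the ROCm installation succeeded

Before going further, run the following sample code to check that the installation succeeded: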

>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> from transformers import AutoTokenizer

>>> ort_model = ORTModelForSequenceClassification.from_pretrained(
...   "philschmid/tiny-bert-sst2-distilled",
...   export=True,
...   provider="ROCMExecutionProvider",
... )

>>> tokenizer = AutoTokenizer.from_pretrained("philschmid/tiny-bert-sst2-distilled")
>>> inputs = tokenizer("expectations were low, actual enjoyment was high", return_tensors="pt", padding=True)

>>> outputs = ort_model(**inputs)
>>> assert ort_model.providers == ["ROCMExecutionProvider", "CPUExecutionProvider"]

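If this code runs without errors, congratulations, the installation succeeded! If you encounter the following error or a similar one,

ValueError: Asked to use ROCMExecutionProvider as an ONNX Runtime execution provider, but the available execution providers are ['CPUExecutionProvider'].

then something is wrong with the ROCm or ONNX Runtime installation.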
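To narrow down which side is at fault, it can help to list the execution providers that the installed ONNX Runtime build actually exposes. The sketch below relies on `onnxruntime.get_available_providers()`. The helper `missing_providers` is a hypothetical name introduced here, and the import is guarded so the script still reports something useful when ONNX Runtime is not installed at all.

```python
REQUIRED = ["ROCMExecutionProvider", "CPUExecutionProvider"]


def missing_providers(available, required=REQUIRED):
    """Return the required providers that are absent from `available`, in order."""
    return [name for name in required if name not in available]


if __name__ == "__main__":
    try:
        import onnxruntime
    except ImportError:
        print("onnxruntime is not importable: check the wheel installation.")
    else:
        available = onnxruntime.get_available_providers()
        print("Available providers:", available)
        missing = missing_providers(available)
        if missing:
            print("Missing providers:", missing)
        else:
            print("ROCMExecutionProvider is available.")
```

If `ROCMExecutionProvider` is reported as missing, the installed wheel was most likely not built with ROCm support, or a plain `onnxruntime` package is shadowing it.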

Using the ROCm Execution Provider with ORT models

For ORT models, usage is straightforward: specify the `provider` argument in the `ORTModel.from_pretrained()` method. Here is an example:

>>> from optimum.onnxruntime import ORTModelForSequenceClassification

>>> ort_model = ORTModelForSequenceClassification.from_pretrained(
...   "distilbert-base-uncased-finetuned-sst-2-english",
...   export=True,
...   provider="ROCMExecutionProvider",
... )

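The model can then be used with the usual 🤗 Transformers API for inference and evaluation, for example with pipelines. When using a Transformers pipeline, note that the `device` argument should be set so that pre-processing and post-processing run on the GPU, as in the following example: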

>>> from optimum.pipelines import pipeline
>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

>>> pipe = pipeline(task="text-classification", model=ort_model, tokenizer=tokenizer, device="cuda:0")
>>> result = pipe("Both the music and visual were astounding, not to mention the actors performance.")
>>> print(result)
# printing: [{'label': 'POSITIVE', 'score': 0.9997727274894714}]
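The `"cuda:0"` string in the example applies to AMD GPUs as well, because ROCm builds of PyTorch expose the GPU through the `torch.cuda` namespace. As a small sketch of how the `device` value could be chosen at run time, the hypothetical helper `pipeline_device` below returns a GPU device string when PyTorch reports a GPU and `"cpu"` otherwise. The CPU fallback is an assumption about the desired behaviour, not something this guide prescribes.

```python
def pipeline_device(gpu_available, index=0):
    """Return a device string suitable for `pipeline(..., device=...)`."""
    return f"cuda:{index}" if gpu_available else "cpu"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        gpu_available = False
        print("PyTorch is not installed; falling back to CPU.")
    else:
        gpu_available = torch.cuda.is_available()
        # On ROCm builds this is a version string; on other builds it is None.
        print("HIP version:", getattr(torch.version, "hip", None))
    print("device =", pipeline_device(gpu_available))
```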

Additionally, you can pass the session option `log_severity_level = 0` (verbose) to check whether all nodes are indeed placed on the ROCm Execution Provider:

>>> import onnxruntime

>>> session_options = onnxruntime.SessionOptions()
>>> session_options.log_severity_level = 0

>>> ort_model = ORTModelForSequenceClassification.from_pretrained(
...     "distilbert-base-uncased-finetuned-sst-2-english",
...     export=True,
...     provider="ROCMExecutionProvider",
...     session_options=session_options
... )

Observed time gains

Coming soon!
