Using TEI locally with a GPU

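You can install `text-embeddings-inference` locally and run it on your own machine with a GPU. To confirm that your hardware is supported, see the Supported models and hardware page.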

Step 1: CUDA and NVIDIA drivers

Make sure CUDA and the NVIDIA drivers are installed. The NVIDIA drivers on your device need to be compatible with CUDA version 12.2 or higher.

Add the NVIDIA binaries to your path:

export PATH=$PATH:/usr/local/cuda/bin
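After updating the path, it can help to confirm that the CUDA compiler and the driver utility are actually reachable before starting a long build. The following is a minimal sketch; the helper names `require_tool` and `check_cuda_tools` are hypothetical and not part of TEI or CUDA.

```shell
# Hypothetical helper: report whether a command is reachable through PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1 -> $(command -v "$1")"
    else
        echo "missing: $1 (is its directory on PATH?)" >&2
        return 1
    fi
}

# Hypothetical helper: check both NVIDIA tools without aborting the shell.
check_cuda_tools() {
    status=0
    require_tool nvcc || status=1
    require_tool nvidia-smi || status=1
    return "$status"
}
```

Calling `check_cuda_tools` prints one line per tool and returns a non-zero status if either is missing. If both are found, `nvcc --version` and `nvidia-smi` show the installed CUDA toolkit and driver versions.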

Install Rust on your machine by running the following command in your terminal, then follow the instructions:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
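A freshly installed toolchain is often not visible in the shell session that ran the installer. The sketch below assumes rustup's default layout, in which an `env` file is written under `$CARGO_HOME` (or `$HOME/.cargo` when that variable is unset). The function names `load_cargo_env` and `have_rust_toolchain` are hypothetical.

```shell
# Hypothetical helper: source the environment file that rustup writes by
# default, if it exists. Assumes the default rustup layout.
load_cargo_env() {
    env_file="${CARGO_HOME:-$HOME/.cargo}/env"
    if [ -f "$env_file" ]; then
        . "$env_file"
    fi
}

# Hypothetical helper: succeed only if both rustc and cargo are on PATH.
have_rust_toolchain() {
    command -v rustc >/dev/null 2>&1 && command -v cargo >/dev/null 2>&1
}
```

Running `load_cargo_env && have_rust_toolchain && cargo --version` in the current shell shows whether `cargo` is ready to use without opening a new terminal.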

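This step can take a while because many CUDA kernels need to be compiled. Run the command from the root of your local clone of the repository, since `--path router` refers to its `router` directory.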

For Turing GPUs (T4, RTX 2000 series, ...):

cargo install --path router -F candle-cuda-turing -F http --no-default-features

For Ampere and newer architectures (check the Supported models and hardware page for your GPU):

cargo install --path router -F candle-cuda -F http --no-default-features
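If you are unsure which of the two commands applies, the choice can be derived from the GPU's compute capability. The sketch below assumes that Turing corresponds to compute capability 7.5 and that 8.x and 9.x devices use the `candle-cuda` feature; anything else is reported as unrecognized rather than guessed. The function name `tei_cuda_feature` is hypothetical, and the commented `nvidia-smi` query assumes a driver recent enough to support the `compute_cap` field.

```shell
# Hypothetical helper: map a compute capability such as "7.5" or "8.6"
# to one of the two cargo feature names used on this page.
tei_cuda_feature() {
    case "$1" in
        7.5) echo "candle-cuda-turing" ;;      # Turing
        8.*|9.*) echo "candle-cuda" ;;         # assumed: Ampere and newer
        *) echo "unrecognized compute capability: $1" >&2; return 1 ;;
    esac
}

# Example usage (not executed here; requires a GPU and a recent driver):
#   cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)
#   cargo install --path router -F "$(tei_cuda_feature "$cap")" -F http --no-default-features
```

Keeping the mapping in a small pure function makes it easy to check by hand before it is used to start a long compilation.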

You can now launch Text Embeddings Inference on the GPU with:

model=Qwen/Qwen3-Embedding-0.6B

text-embeddings-router --model-id $model --dtype float16 --port 8080
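Once the router is running, a quick request confirms that it is serving embeddings. The sketch below posts a single input to the `/embed` route on the port chosen above. The helper names `embed_payload` and `embed_request` are hypothetical, and the payload builder assumes the input text contains no double quotes or backslashes, since it does no JSON escaping.

```shell
# Hypothetical helper: build the JSON body for a single input string.
# Assumes the text contains no double quotes or backslashes.
embed_payload() {
    printf '{"inputs":"%s"}' "$1"
}

# Hypothetical helper: send the request to a locally running router.
# The second argument is the port and defaults to 8080.
embed_request() {
    curl -sS "http://127.0.0.1:${2:-8080}/embed" \
        -X POST \
        -H 'Content-Type: application/json' \
        -d "$(embed_payload "$1")"
}
```

For example, `embed_request 'What is Deep Learning?'` sends one sentence to the server started above and prints the JSON response.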