Evaluate models on a server or container
An alternative to launching the evaluation locally is to serve the model on a TGI-compatible server/container and then run the evaluation by sending requests to the server. The command is the same as before, except that you also need to specify the path to a yaml configuration file (detailed below):
lighteval endpoint {tgi,inference-endpoint} \
    "/path/to/config/file" \
    <task parameters>
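For example, a concrete invocation against an endpoint config might look like the following; the task string here is a hypothetical example written in lighteval's suite|task|few_shot|truncate format:

lighteval endpoint inference-endpoint \
    "endpoint_model.yaml" \
    "leaderboard|truthfulqa:mc|0|0"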
There are two types of configuration files that can be used when running on a server:
Hugging Face Inference Endpoints
To launch a model using HuggingFace's Inference Endpoints, you need to provide the following file: endpoint_model.yaml. Lighteval will automatically deploy the endpoint, run the evaluation, and finally delete the endpoint (unless you specify an already-launched endpoint, in which case the endpoint will not be deleted afterwards).
Example of a configuration file:
model_parameters:
  reuse_existing: false # if true, ignore all params in instance, and don't delete the endpoint after evaluation
  # endpoint_name: "llama-2-7B-lighteval" # needs to be lower case without special characters
  model_name: "meta-llama/Llama-2-7b-hf"
  revision: "main" # defaults to "main"
  dtype: "float16" # can be any of "awq", "eetq", "gptq", "4bit" or "8bit" (will use bitsandbytes), "bfloat16" or "float16"
  accelerator: "gpu"
  region: "eu-west-1"
  vendor: "aws"
  instance_type: "nvidia-a10g"
  instance_size: "x1"
  framework: "pytorch"
  endpoint_type: "protected"
  namespace: null # the namespace under which to launch the endpoint; defaults to the current user's namespace
  image_url: null # optionally specify the docker image to use when launching the endpoint model, e.g. a later TGI release with support for newer models
  env_vars: null # optional environment variables to include when launching the endpoint, e.g. `MAX_INPUT_LENGTH: 2048`
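If you instead want to evaluate against an endpoint that is already running (and keep it alive afterwards), the comments above suggest setting reuse_existing together with the endpoint's name. A minimal sketch, assuming an endpoint named llama-2-7b-lighteval is already deployed:

model_parameters:
  reuse_existing: true # all instance params above are ignored, and the endpoint is not deleted after evaluation
  endpoint_name: "llama-2-7b-lighteval" # name of the already-running endpoint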
Text Generation Inference (TGI)
To use a model already deployed on a TGI server, for example on HuggingFace's serverless inference.
Example of a configuration file:
model_parameters:
  inference_server_address: ""
  inference_server_auth: null
  model_id: null # Optional, only required if the TGI container was launched with model_id pointing to a local directory
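As a worked example, you could first serve a model in a local TGI container and then point the configuration file at it. This is only a sketch, assuming Docker with GPU support; the image tag, port mapping, and volume path are assumptions to adapt to your setup:

docker run --gpus all -p 8080:80 \
    -v $PWD/data:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id meta-llama/Llama-2-7b-hf

The configuration file would then set inference_server_address: "http://localhost:8080" (leaving inference_server_auth as null if the server requires no token), and the evaluation is launched with lighteval endpoint tgi "/path/to/config/file" <task parameters>.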