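Evaluating Custom Models

Lighteval lets you evaluate custom model implementations by creating a custom model class that inherits from LightevalModel. This is useful when you want to evaluate a model that the standard backends (transformers, vllm, etc.) do not support directly.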

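Creating a Custom Model

Create a Python file containing your custom model implementation. The model must inherit from LightevalModel and implement all required methods.

Here is a basic example: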

from lighteval.models.abstract_model import LightevalModel

class MyCustomModel(LightevalModel):
    def __init__(self, config):
        super().__init__(config)
        # Initialize your model here...

    def greedy_until(self, requests, max_tokens=None, stop_sequences=None):
        # Implement generation logic
        pass

    def loglikelihood(self, requests, log=True):
        # Implement loglikelihood computation
        pass

    def loglikelihood_rolling(self, requests):
        # Implement rolling loglikelihood computation
        pass

    def loglikelihood_single_token(self, requests):
        # Implement single token loglikelihood computation
        pass

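The custom model file should contain exactly one class that inherits from LightevalModel. This class is automatically detected and instantiated when the model is loaded.

You can find a complete custom model implementation in examples/custom_models/google_translate_model.py.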
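
To make the "exactly one class per file" rule concrete, here is a minimal sketch of how a loader could import a file and pick out its single subclass of a given base. This is not Lighteval's actual loader: the function name load_single_subclass and its behaviour are assumptions for illustration, built only on the standard importlib and inspect modules.

```python
import importlib.util
import inspect
from pathlib import Path


def load_single_subclass(path, base):
    """Import the Python file at `path` and return its one subclass of `base`.

    Hypothetical helper, not part of Lighteval.
    """
    # Build a module from the file without requiring it to be on sys.path.
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # Keep every class visible in the module that derives from `base`,
    # excluding `base` itself if the file imported it by name.
    candidates = [
        obj
        for _, obj in inspect.getmembers(module, inspect.isclass)
        if issubclass(obj, base) and obj is not base
    ]
    if len(candidates) != 1:
        raise ValueError(
            f"expected exactly one {base.__name__} subclass in {path}, "
            f"found {len(candidates)}"
        )
    return candidates[0]
```

With a scheme like this, a second subclass in the same file makes the choice ambiguous, which is one plausible reason to keep a single model class per file.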

Running the Evaluation

You can evaluate your custom model using either the command-line interface or the Python API.

Using the Command Line

lighteval custom \
    "google-translate" \
    "examples/custom_models/google_translate_model.py" \
    "lighteval|wmt20:fr-de|0|0" \
    --max-samples 10

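The command takes three required arguments:

The model name (used for tracking in results and logs)
The path to your model implementation file
The tasks to evaluate (same format as for other backends)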
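
The task argument in the command above, "lighteval|wmt20:fr-de|0|0", is a pipe-separated string. The sketch below splits such a string into named fields. The field meanings (suite, task, few-shot count, few-shot truncation flag) are an assumption based on this four-field layout, and TaskSpec and parse_task_spec are hypothetical names, not Lighteval APIs; check the task documentation for your Lighteval version.

```python
from typing import NamedTuple


class TaskSpec(NamedTuple):
    """Hypothetical container for the four pipe-separated fields."""
    suite: str
    task: str
    num_fewshot: int
    truncate_fewshot: bool  # assumed meaning of the last field


def parse_task_spec(spec):
    """Split 'suite|task|num_fewshot|flag' into a TaskSpec."""
    parts = spec.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(parts)}: {spec!r}")
    suite, task, num_fewshot, flag = parts
    return TaskSpec(suite, task, int(num_fewshot), flag == "1")


print(parse_task_spec("lighteval|wmt20:fr-de|0|0"))
```

Under these assumptions, the example string parses to suite "lighteval", task "wmt20:fr-de", zero few-shot examples, and the flag unset.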

Using the Python API

from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.custom.custom_model import CustomModelConfig
from lighteval.pipeline import ParallelismManager, Pipeline, PipelineParameters

# Set up evaluation tracking
evaluation_tracker = EvaluationTracker(
    output_dir="results",
    save_details=True
)

# Configure the pipeline
pipeline_params = PipelineParameters(
    launcher_type=ParallelismManager.CUSTOM,
)

# Configure your custom model
model_config = CustomModelConfig(
    model="my-custom-model",
    model_definition_file_path="path/to/my_model.py"
)

# Create and run the pipeline
pipeline = Pipeline(
    tasks="leaderboard|truthfulqa:mc|0|0",
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=model_config
)

pipeline.evaluate()
pipeline.save_and_push_results()

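Required Methods

Your custom model must implement the following core methods:

greedy_until: generates text until a stop sequence or the maximum number of tokens is reached
loglikelihood: computes the log probability of a specific continuation
loglikelihood_rolling: computes the rolling log probability of a sequence
loglikelihood_single_token: computes the log probability of a single token

See the LightevalModel base class documentation for detailed method signatures and requirements.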
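
Two of the ideas behind these methods can be shown without any model at all: cutting generated text at the first stop sequence, and scoring a continuation by summing per-token log probabilities. The helpers below are a standalone sketch with hypothetical names (truncate_at_stop, continuation_logprob); they do not reproduce Lighteval's request or response types, which the base class documentation defines.

```python
import math


def truncate_at_stop(text, stop_sequences):
    """Return `text` cut at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences or []:
        if not stop:
            continue  # an empty stop string would match at index 0
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def continuation_logprob(token_logprobs):
    """Log probability of a continuation = sum of its token log probabilities."""
    return sum(token_logprobs)


# The "." appears before the newline, so the text is cut there.
print(truncate_at_stop("Bonjour le monde.\nNext", ["\n", "."]))  # Bonjour le monde

# Two tokens with probabilities 0.5 and 0.25 give a joint probability of 0.125.
score = continuation_logprob([math.log(0.5), math.log(0.25)])
print(math.isclose(math.exp(score), 0.125))  # True
```

A custom greedy_until that calls an external API could apply a helper like truncate_at_stop to the raw response when the API itself has no stop-sequence parameter.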

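Best Practices

Error handling: implement robust error handling in your model methods so that edge cases are handled gracefully.
Batching: consider implementing efficient batching in your model methods to improve performance.
Resource management: properly manage any resources (e.g., API connections, model weights) in your model's __init__ and __del__ methods.
Documentation: add clear docstrings to your model class and methods, explaining any specific requirements or limitations.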
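
The batching and error-handling advice can be sketched with two small helpers that a custom model's methods might call. Both names (batched, call_with_retry) and the default retry settings are illustrative assumptions, not part of Lighteval; tune the retry policy to the service you are wrapping.

```python
import time


def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def call_with_retry(fn, *args, retries=3, base_delay=0.5, **kwargs):
    """Call `fn`, retrying on any exception with exponential backoff.

    The exception from the last attempt is re-raised unchanged.
    """
    for attempt in range(retries):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == retries - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            time.sleep(base_delay * 2 ** attempt)


# Example: process requests in groups of two.
for batch in batched(["a", "b", "c", "d", "e"], 2):
    print(batch)
```

Inside a method such as greedy_until, a model backed by a remote API could loop over batched(requests, n) and wrap each API call in call_with_retry so that a transient failure does not abort the whole evaluation.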

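Example Use Cases

Custom models are particularly useful in the following scenarios:

Evaluating models accessed through custom APIs
Wrapping models that need specialized preprocessing or postprocessing
Testing novel model architectures
Evaluating ensemble models
Integrating with external services or tools

For a complete example of a custom model that wraps the Google Translate API, see examples/custom_models/google_translate_model.py.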
