Contributing to multilingual evaluations

Contributing a small set of translations

We define 19 literals: basic keywords or punctuation marks, such as yes, no, and because, that are used when evaluation prompts are built automatically.

We welcome translations in your language!

To contribute, you need to:

Open the translation_literals file
Edit the file to add or extend the literals for your language of interest.

    Language.ENGLISH: TranslationLiterals(
        language=Language.ENGLISH,
        question_word="question", # Usage: "Question: How are you?"
        answer="answer", # Usage: "Answer: I am fine"
        confirmation_word="right", # Usage: "He is smart, right?"
        yes="yes", # Usage: "Yes, he is"
        no="no", # Usage: "No, he is not"
        also="also", # Usage: "Also, she is smart."
        cause_word="because", # Usage: "She is smart, because she is tall"
        effect_word="therefore", # Usage: "He is tall therefore he is smart"
        or_word="or", # Usage: "He is tall or small"
        true="true", # Usage: "He is smart, true, false or neither?"
        false="false", # Usage: "He is smart, true, false or neither?"
        neither="neither", # Usage: "He is smart, true, false or neither?"
        # Punctuation and spacing: only adjust if your language uses something different than in English
        full_stop=".",
        comma=",",
        question_mark="?",
        exclamation_mark="!",
        word_space=" ",
        sentence_space=" ",
        colon=":",
        # The first characters of your alphabet used in enumerations, if different from English
        indices=["A", "B", "C", ...]
    )

Open a PR with your changes! That's it!
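To make the role of these literals concrete, here is a minimal sketch of how a few of them might be combined into a prompt. The `Literals` class and the `build_qa_prompt` helper are hypothetical stand-ins written for this illustration, not the library's `TranslationLiterals` or its template code, and the Chinese values are illustrative rather than copied from the translation_literals file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literals:
    """Hypothetical stand-in holding a handful of the fields shown above."""

    question_word: str
    answer: str
    colon: str = ":"
    sentence_space: str = " "


def capitalize(word: str) -> str:
    # Upper-case only the first character; a no-op for scripts without case.
    return word[:1].upper() + word[1:]


def build_qa_prompt(lit: Literals, question: str) -> str:
    # "Question: <text>" followed by an open "Answer:" anchor on the next line.
    q = f"{capitalize(lit.question_word)}{lit.colon}{lit.sentence_space}{question}"
    a = f"{capitalize(lit.answer)}{lit.colon}"
    return f"{q}\n{a}"


english = Literals(question_word="question", answer="answer")
# Illustrative values: full-width colon and no space between sentences.
chinese = Literals(question_word="问题", answer="答案", colon="：", sentence_space="")

print(build_qa_prompt(english, "How are you?"))
print(build_qa_prompt(chinese, "你好吗？"))
```

The same helper yields a correctly punctuated prompt in both languages because the punctuation and spacing come from the literals rather than being hard-coded, which is why the punctuation fields only need adjusting when your language differs from English.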

Contributing a new multilingual task

You should first read our guide on adding a custom task, to better understand the different parameters we use.

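Then, take a look at the current multilingual tasks file to see how tasks are defined. For multilingual evaluations, the prompt_function should be implemented through a language-adapted template. The template takes care of correct formatting and of correct, consistent use of language-adapted prompt anchors (e.g. Question/Answer) and punctuation.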
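As a rough sketch of what "a prompt function implemented through a language-adapted template" means, the example below builds a prompt function from a language code and an adapter. Everything in it (`QAInput`, `ANCHORS`, `make_prompt_function`, and the dataset keys `q_text`, `options`, `label`) is hypothetical and only mimics the idea; the library's own templates and adapter types may look different.

```python
from typing import Callable, TypedDict


class QAInput(TypedDict):
    """Hypothetical template keys that an adapter must provide."""

    question: str
    choices: list[str]
    gold_idx: int


# Hypothetical per-language prompt anchors (illustrative values only).
ANCHORS = {
    "en": {"question": "Question", "answer": "Answer", "colon": ":", "space": " "},
    "de": {"question": "Frage", "answer": "Antwort", "colon": ":", "space": " "},
}


def make_prompt_function(lang: str, adapter: Callable[[dict], QAInput]):
    anchors = ANCHORS[lang]

    def prompt_function(line: dict) -> dict:
        # The adapter maps raw dataset keys to the keys the template expects.
        fields = adapter(line)
        query = (
            f"{anchors['question']}{anchors['colon']}{anchors['space']}"
            f"{fields['question']}\n{anchors['answer']}{anchors['colon']}"
        )
        return {
            "query": query,
            "choices": fields["choices"],
            "gold_index": fields["gold_idx"],
        }

    return prompt_function


# A made-up dataset row with its own key names.
row = {"q_text": "Wie viele Tage hat eine Woche?", "options": ["fünf", "sieben"], "label": 1}

prompt_fn = make_prompt_function(
    "de",
    adapter=lambda line: {
        "question": line["q_text"],
        "choices": line["options"],
        "gold_idx": line["label"],
    },
)
doc = prompt_fn(row)
print(doc["query"])
```

The point of the split is that the task author only writes the adapter, while the anchors and punctuation stay in one place per language and are applied consistently across tasks.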

Browse the list of all templates here and see which ones best fit your task.

Then, when you are ready to define your own task, you should:

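Create a Python file as indicated in the guide above
Import the relevant templates for your task type (XNLI, Copa, multiple choice, question answering, etc.)
Define one task, or a list of tasks, for each relevant language and evaluation formulation (for multiple choice), using our parametrizable LightevalTaskConfig class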

your_tasks = [
    LightevalTaskConfig(
        # Name of your evaluation
        name=f"evalname_{language.value}_{formulation.name.lower()}",
        # The evaluation is community contributed
        suite=["community"],
        # This will automatically get the correct metrics for your chosen formulation
        metric=get_metrics_for_formulation(
            formulation,
            [
                loglikelihood_acc_metric(normalization=None),
                loglikelihood_acc_metric(normalization=LogProbTokenNorm()),
                loglikelihood_acc_metric(normalization=LogProbCharNorm()),
            ],
        ),
        # In this function, you choose which template to follow and for which language and formulation
        prompt_function=get_template_prompt_function(
            language=language,
            # then use the adapter to define the mapping between the
            # keys of the template (left), and the keys of your dataset
            # (right)
            # To know which template keys are required and available,
            # consult the appropriate adapter type and doc-string.
            adapter=lambda line: {
                "key": line["relevant_key"],
                ...
            },
            formulation=formulation,
        ),
        # You can also add specific filters to remove irrelevant samples
        hf_filter=lambda line: line["label"] in <condition>,
        # You then select your huggingface dataset as well as
        # the splits available for evaluation
        hf_repo=<dataset>,
        hf_subset=<subset>,
        evaluation_splits=["train"],
        hf_avail_splits=["train"],
    )
    for language in [
        Language.YOUR_LANGUAGE, ...
    ]
    for formulation in [MCFFormulation(), CFFormulation(), HybridFormulation()]
]
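The two `for` clauses at the end of the list comprehension above produce one task configuration per (language, formulation) pair. The sketch below shows only that expansion and the resulting task names. The `Language` enum values and the `Formulation` class here are hypothetical stand-ins defined for the example, not the library's classes.

```python
from dataclasses import dataclass
from enum import Enum


class Language(Enum):
    """Hypothetical stand-in; the values are assumed for illustration."""

    ENGLISH = "eng"
    FRENCH = "fra"


@dataclass(frozen=True)
class Formulation:
    """Hypothetical stand-in exposing only the `name` attribute used in task names."""

    name: str


formulations = [Formulation("MCF"), Formulation("CF"), Formulation("Hybrid")]

# Same nesting as in the task list: languages in the outer loop, formulations in the inner one.
task_names = [
    f"evalname_{language.value}_{formulation.name.lower()}"
    for language in [Language.ENGLISH, Language.FRENCH]
    for formulation in formulations
]

for name in task_names:
    print(name)
```

With two languages and three formulations this gives six names, from `evalname_eng_mcf` to `evalname_fra_hybrid`, so each variant can be selected and reported separately.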

Then, you can go back to the guide to test that your task is correctly implemented!

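All LightevalTaskConfig parameters are strongly typed, including the inputs to the template function. Make sure to take advantage of your IDE's features to fill in these parameters correctly.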

Once everything is ready, open a PR, and we will be happy to review it!
