Main classes

EvaluationModuleInfo

The base class EvaluationModuleInfo implements the logic shared by its subclasses MetricInfo, ComparisonInfo, and MeasurementInfo.

class evaluate.EvaluationModuleInfo

( description: str citation: str features: typing.Union[datasets.features.features.Features, typing.List[datasets.features.features.Features]] inputs_description: str = <factory> homepage: str = <factory> license: str = <factory> codebase_urls: typing.List[str] = <factory> reference_urls: typing.List[str] = <factory> streamable: bool = False format: typing.Optional[str] = None module_type: str = 'metric' module_name: typing.Optional[str] = None config_name: typing.Optional[str] = None experiment_id: typing.Optional[str] = None )

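Base class that stores evaluation information for MetricInfo, ComparisonInfo, and MeasurementInfo.

EvaluationModuleInfo documents an evaluation, including its name, version, and features. See the constructor arguments and properties for the full list.

Note: Not all fields are known at construction time; some may be updated later.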

from_directory

( metric_info_dir )

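Parameters

metric_info_dir (str) — The directory containing the metric_info JSON file. This should be the root directory of a specific metric version.

Create an EvaluationModuleInfo from the JSON file in metric_info_dir.

Example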

>>> from evaluate import EvaluationModuleInfo
>>> my_metric = EvaluationModuleInfo.from_directory("/path/to/directory/")

write_to_directory

( metric_info_dir )

Parameters

metric_info_dir (str) — The directory in which to save the metric info.

Write the EvaluationModuleInfo as JSON to metric_info_dir. The license is also saved separately in a LICENSE file.

Example

>>> my_metric.info.write_to_directory("/path/to/directory/")
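
The round trip performed by write_to_directory and from_directory can be pictured with a small standard-library sketch. It does not use the evaluate library: ToyModuleInfo is a hypothetical stand-in with only a few of the real fields, and the JSON file name metric_info.json is an assumption made for illustration.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field, fields
from typing import List


@dataclass
class ToyModuleInfo:
    """Hypothetical stand-in for EvaluationModuleInfo with a few of its fields."""

    description: str
    citation: str
    license: str = ""
    reference_urls: List[str] = field(default_factory=list)
    module_type: str = "metric"

    def write_to_directory(self, info_dir: str) -> None:
        # Store every field as JSON (the file name is an assumption of this sketch).
        with open(os.path.join(info_dir, "metric_info.json"), "w", encoding="utf-8") as f:
            json.dump(asdict(self), f)
        # Store the license text separately in a LICENSE file.
        with open(os.path.join(info_dir, "LICENSE"), "w", encoding="utf-8") as f:
            f.write(self.license)

    @classmethod
    def from_directory(cls, info_dir: str) -> "ToyModuleInfo":
        with open(os.path.join(info_dir, "metric_info.json"), encoding="utf-8") as f:
            data = json.load(f)
        # Keep only keys that match declared fields.
        known = {fld.name for fld in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


with tempfile.TemporaryDirectory() as tmp_dir:
    original = ToyModuleInfo(description="Toy accuracy", citation="n/a", license="MIT")
    original.write_to_directory(tmp_dir)
    restored = ToyModuleInfo.from_directory(tmp_dir)
    saved_files = sorted(os.listdir(tmp_dir))

print(restored == original)  # True
print(saved_files)           # ['LICENSE', 'metric_info.json']
```

Fields with a factory default, such as reference_urls here, can be left out at construction and filled in later, which matches the note that not every field is known up front.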

class evaluate.MetricInfo

( description: str citation: str features: typing.Union[datasets.features.features.Features, typing.List[datasets.features.features.Features]] inputs_description: str = <factory> homepage: str = <factory> license: str = <factory> codebase_urls: typing.List[str] = <factory> reference_urls: typing.List[str] = <factory> streamable: bool = False format: typing.Optional[str] = None module_type: str = 'metric' module_name: typing.Optional[str] = None config_name: typing.Optional[str] = None experiment_id: typing.Optional[str] = None )

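Information about a metric.

EvaluationModuleInfo documents a metric, including its name, version, and features. See the constructor arguments and properties for the full list.

Note: Not all fields are known at construction time; some may be updated later.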

class evaluate.ComparisonInfo

( description: str citation: str features: typing.Union[datasets.features.features.Features, typing.List[datasets.features.features.Features]] inputs_description: str = <factory> homepage: str = <factory> license: str = <factory> codebase_urls: typing.List[str] = <factory> reference_urls: typing.List[str] = <factory> streamable: bool = False format: typing.Optional[str] = None module_type: str = 'comparison' module_name: typing.Optional[str] = None config_name: typing.Optional[str] = None experiment_id: typing.Optional[str] = None )

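Information about a comparison.

EvaluationModuleInfo documents a comparison, including its name, version, and features. See the constructor arguments and properties for the full list.

Note: Not all fields are known at construction time; some may be updated later.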

class evaluate.MeasurementInfo

( description: str citation: str features: typing.Union[datasets.features.features.Features, typing.List[datasets.features.features.Features]] inputs_description: str = <factory> homepage: str = <factory> license: str = <factory> codebase_urls: typing.List[str] = <factory> reference_urls: typing.List[str] = <factory> streamable: bool = False format: typing.Optional[str] = None module_type: str = 'measurement' module_name: typing.Optional[str] = None config_name: typing.Optional[str] = None experiment_id: typing.Optional[str] = None )

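Information about a measurement.

EvaluationModuleInfo documents a measurement, including its name, version, and features. See the constructor arguments and properties for the full list.

Note: Not all fields are known at construction time; some may be updated later.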

EvaluationModule

The base class EvaluationModule implements the logic shared by its subclasses Metric, Comparison, and Measurement.

class evaluate.EvaluationModule

( config_name: typing.Optional[str] = None keep_in_memory: bool = False cache_dir: typing.Optional[str] = None num_process: int = 1 process_id: int = 0 seed: typing.Optional[int] = None experiment_id: typing.Optional[str] = None hash: str = None max_concurrent_cache_files: int = 10000 timeout: typing.Union[int, float] = 100 **kwargs )

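Parameters

config_name (str) — Used to define a hash specific to the module's computation script. It prevents the module's data from being overwritten when the module loading script is modified.
keep_in_memory (bool) — Keep all predictions and references in memory. Not possible in distributed settings.
cache_dir (str) — Path to a directory in which temporary prediction/reference data is stored. In distributed setups, the data directory should be located on a shared file system.
num_process (int) — The total number of nodes in a distributed setting. This is useful for computing modules in distributed setups (in particular non-additive modules such as F1).
process_id (int) — The ID of the current process in a distributed setup (between 0 and num_process-1). This is useful for computing modules in distributed setups (in particular non-additive metrics such as F1).
seed (int, optional) — If specified, temporarily sets numpy's random seed while compute() runs.
experiment_id (str) — A specific experiment ID. Used when several distributed evaluations share the same file system. This is useful for computing modules in distributed setups (in particular non-additive metrics such as F1).
hash (str) — Used to identify the evaluation module based on the hash of its file contents.
max_concurrent_cache_files (int) — The maximum number of concurrent module cache files (defaults to 10000).
timeout (Union[int, float]) — Timeout, in seconds, for synchronization in distributed settings.

EvaluationModule is the base class and common API for metrics, comparisons, and measurements.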
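
Several parameters above are described as useful for non-additive modules such as F1. The sketch below illustrates what "non-additive" means, using plain Python rather than the evaluate library: averaging per-process F1 scores does not give the F1 of the pooled data, which is why predictions and references from all processes have to be gathered before computing. The helper f1_score and the two shards are hypothetical.

```python
from typing import List, Tuple


def f1_score(predictions: List[int], references: List[int]) -> float:
    """Binary F1 for the positive class (label 1)."""
    pairs = list(zip(predictions, references))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Two hypothetical processes, each holding its own shard of (predictions, references).
shards: List[Tuple[List[int], List[int]]] = [
    ([1, 1, 1, 1], [1, 1, 1, 0]),  # process_id 0
    ([0, 0, 1, 0], [1, 1, 1, 0]),  # process_id 1
]

# Naive approach: compute F1 per process, then average the scores.
per_shard = [f1_score(preds, refs) for preds, refs in shards]
averaged_f1 = sum(per_shard) / len(per_shard)

# Pooled approach: gather all predictions and references first, then compute once.
all_preds = [p for preds, _ in shards for p in preds]
all_refs = [r for _, refs in shards for r in refs]
pooled_f1 = f1_score(all_preds, all_refs)

print(f"averaged per-shard F1: {averaged_f1:.4f}")
print(f"F1 on pooled data:     {pooled_f1:.4f}")
```

The two printed values differ. An additive quantity such as a count of correct predictions could be summed across processes, but F1 cannot be recovered from per-process scores alone.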

add

( prediction = None reference = None **kwargs )

Parameters

prediction (list/array/tensor, optional) — Prediction.
reference (list/array/tensor, optional) — Reference.

Add one prediction and reference to the evaluation module's stack.

Example

>>> import evaluate
>>> accuracy = evaluate.load("accuracy")
>>> accuracy.add(references=0, predictions=1)

add_batch

( predictions = None references = None **kwargs )

Parameters

predictions (list/array/tensor, optional) — Predictions.
references (list/array/tensor, optional) — References.

Add a batch of predictions and references to the evaluation module's stack.

Example

>>> import evaluate
>>> accuracy = evaluate.load("accuracy")
>>> for refs, preds in zip([[0,1],[0,1]], [[1,0],[0,1]]):
...     accuracy.add_batch(references=refs, predictions=preds)

compute

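( predictions = None references = None **kwargs ) → dict or None

Parameters

predictions (list/array/tensor, optional) — Predictions.
references (list/array/tensor, optional) — References.
**kwargs (optional) — Keyword arguments that are forwarded to the evaluation module's compute() method (see its docstring for details).

Returns

dict or None

A dictionary with the results if this evaluation module runs on the main process (process_id == 0).
None if the evaluation module does not run on the main process (process_id != 0).

Compute the evaluation module.

Positional arguments are not allowed, to prevent mistakes.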

>>> import evaluate
>>> accuracy = evaluate.load("accuracy")
>>> accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 1])
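
The call pattern of add, add_batch, and compute, including the keyword-only rule and the None result on non-main processes, can be sketched without the library. ToyAccuracy below is a hypothetical stand-in written for illustration; it is not how evaluate implements these methods, and it keeps everything in a Python list instead of a cache file.

```python
from typing import Dict, List, Optional


class ToyAccuracy:
    """Hypothetical stand-in that mimics the add / add_batch / compute call pattern."""

    def __init__(self, process_id: int = 0) -> None:
        self.process_id = process_id
        self._predictions: List[int] = []
        self._references: List[int] = []

    # The bare `*` makes every argument keyword-only, so the two inputs
    # cannot be swapped by accident through positional calls.
    def add(self, *, prediction: int, reference: int) -> None:
        self._predictions.append(prediction)
        self._references.append(reference)

    def add_batch(self, *, predictions: List[int], references: List[int]) -> None:
        if len(predictions) != len(references):
            raise ValueError("predictions and references must have the same length")
        self._predictions.extend(predictions)
        self._references.extend(references)

    def compute(self, *, predictions=None, references=None) -> Optional[Dict[str, float]]:
        # In this sketch, inputs passed to compute() are appended to the stored ones.
        if predictions is not None and references is not None:
            self.add_batch(predictions=predictions, references=references)
        preds, refs = self._predictions, self._references
        self._predictions, self._references = [], []  # reset the stack
        if self.process_id != 0:
            return None  # only the main process reports a result
        correct = sum(p == r for p, r in zip(preds, refs))
        return {"accuracy": correct / len(refs)}


toy = ToyAccuracy()
toy.add(prediction=1, reference=1)
toy.add_batch(predictions=[0, 1], references=[1, 1])
result = toy.compute()  # 2 of the 3 stored pairs match

try:
    toy.compute([0, 1], [0, 1])  # positional arguments are rejected
    positional_rejected = False
except TypeError:
    positional_rejected = True

worker_result = ToyAccuracy(process_id=1).compute(predictions=[1], references=[1])
print(result, positional_rejected, worker_result)
```

Making the arguments keyword-only is a cheap guard: accuracy is symmetric in its inputs, but metrics such as precision or recall are not, so a silent swap would change the score.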

download_and_prepare

( download_config: typing.Optional[datasets.download.download_config.DownloadConfig] = None dl_manager: typing.Optional[datasets.download.download_manager.DownloadManager] = None )

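Parameters

download_config (DownloadConfig, optional) — Specific download configuration parameters.
dl_manager (DownloadManager, optional) — Specific download manager to use.

Download and prepare the evaluation module for reading.

Example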

>>> import evaluate
>>> evaluate.load("accuracy").download_and_prepare()

class evaluate.Metric

( config_name: typing.Optional[str] = None keep_in_memory: bool = False cache_dir: typing.Optional[str] = None num_process: int = 1 process_id: int = 0 seed: typing.Optional[int] = None experiment_id: typing.Optional[str] = None hash: str = None max_concurrent_cache_files: int = 10000 timeout: typing.Union[int, float] = 100 **kwargs )

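Parameters

config_name (str) — Used to define a hash specific to the metric's computation script. It prevents the metric's data from being overwritten when the metric loading script is modified.
keep_in_memory (bool) — Keep all predictions and references in memory. Not possible in distributed settings.
cache_dir (str) — Path to a directory in which temporary prediction/reference data is stored. In distributed setups, the data directory should be located on a shared file system.
num_process (int) — The total number of nodes in a distributed setting. This is useful for computing metrics in distributed setups (in particular non-additive metrics such as F1).
process_id (int) — The ID of the current process in a distributed setup (between 0 and num_process-1). This is useful for computing metrics in distributed setups (in particular non-additive metrics such as F1).
seed (int, optional) — If specified, temporarily sets numpy's random seed while compute() runs.
experiment_id (str) — A specific experiment ID. Used when several distributed evaluations share the same file system. This is useful for computing metrics in distributed setups (in particular non-additive metrics such as F1).
max_concurrent_cache_files (int) — The maximum number of concurrent metric cache files (defaults to 10000).
timeout (Union[int, float]) — Timeout, in seconds, for synchronization in distributed settings.

Metric is the base class and common API for all metrics.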

class evaluate.Comparison

( config_name: typing.Optional[str] = None keep_in_memory: bool = False cache_dir: typing.Optional[str] = None num_process: int = 1 process_id: int = 0 seed: typing.Optional[int] = None experiment_id: typing.Optional[str] = None hash: str = None max_concurrent_cache_files: int = 10000 timeout: typing.Union[int, float] = 100 **kwargs )

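Parameters

config_name (str) — Used to define a hash specific to the comparison's computation script. It prevents the comparison's data from being overwritten when the comparison loading script is modified.
keep_in_memory (bool) — Keep all predictions and references in memory. Not possible in distributed settings.
cache_dir (str) — Path to a directory in which temporary prediction/reference data is stored. In distributed setups, the data directory should be located on a shared file system.
num_process (int) — The total number of nodes in a distributed setting. This is useful for computing comparisons in distributed setups (in particular non-additive comparisons).
process_id (int) — The ID of the current process in a distributed setup (between 0 and num_process-1). This is useful for computing comparisons in distributed setups (in particular non-additive comparisons).
seed (int, optional) — If specified, temporarily sets numpy's random seed while compute() runs.
experiment_id (str) — A specific experiment ID. Used when several distributed evaluations share the same file system. This is useful for computing comparisons in distributed setups (in particular non-additive comparisons).
max_concurrent_cache_files (int) — The maximum number of concurrent comparison cache files (defaults to 10000).
timeout (Union[int, float]) — Timeout, in seconds, for synchronization in distributed settings.

Comparison is the base class and common API for all comparisons.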

class evaluate.Measurement

( config_name: typing.Optional[str] = None keep_in_memory: bool = False cache_dir: typing.Optional[str] = None num_process: int = 1 process_id: int = 0 seed: typing.Optional[int] = None experiment_id: typing.Optional[str] = None hash: str = None max_concurrent_cache_files: int = 10000 timeout: typing.Union[int, float] = 100 **kwargs )

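Parameters

config_name (str) — Used to define a hash specific to the measurement's computation script. It prevents the measurement's data from being overwritten when the measurement loading script is modified.
keep_in_memory (bool) — Keep all predictions and references in memory. Not possible in distributed settings.
cache_dir (str) — Path to a directory in which temporary prediction/reference data is stored. In distributed setups, the data directory should be located on a shared file system.
num_process (int) — The total number of nodes in a distributed setting. This is useful for computing measurements in distributed setups (in particular non-additive measurements).
process_id (int) — The ID of the current process in a distributed setup (between 0 and num_process-1). This is useful for computing measurements in distributed setups (in particular non-additive measurements).
seed (int, optional) — If specified, temporarily sets numpy's random seed while compute() runs.
experiment_id (str) — A specific experiment ID. Used when several distributed evaluations share the same file system. This is useful for computing measurements in distributed setups (in particular non-additive measurements).
max_concurrent_cache_files (int) — The maximum number of concurrent measurement cache files (defaults to 10000).
timeout (Union[int, float]) — Timeout, in seconds, for synchronization in distributed settings.

Measurement is the base class and common API for all measurements.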

CombinedEvaluations

The combine function merges multiple EvaluationModules into a single CombinedEvaluations.

evaluate.combine

( evaluations force_prefix = False )

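Parameters

evaluations (Union[list, dict]) — A list or dictionary of evaluation modules. Modules can be passed either as strings or as already loaded EvaluationModules. If a dictionary is passed, its keys are the names to use and its values are the modules. The names are used as prefixes when the results returned by the modules have overlapping names, or when force_prefix=True.
force_prefix (bool, optional, defaults to False) — If True, all scores from a module are prefixed with its name. If a dictionary is passed, the keys are used as names; otherwise the module's own name is used.

Combine several metrics, comparisons, or measurements into a single CombinedEvaluations object that can be used like a single evaluation module.

If two scores have the same name, they are prefixed with their module names. If two modules have the same name, use a dictionary to give them different names; otherwise an integer ID is appended to the prefix.

Example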

>>> import evaluate
>>> accuracy = evaluate.load("accuracy")
>>> f1 = evaluate.load("f1")
>>> clf_metrics = evaluate.combine(["accuracy", "f1"])
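
The naming rules described above (plain keys when there is no clash, module-name prefixes on clashes or with force_prefix, and an integer ID for modules that share a name) can be sketched in plain Python. The function merge_results is hypothetical, and the exact key layout with underscore separators is an assumption of this sketch, not a documented guarantee of the library.

```python
from collections import Counter
from typing import Dict, List


def merge_results(
    module_names: List[str],
    results: List[Dict[str, float]],
    force_prefix: bool = False,
) -> Dict[str, float]:
    """Merge per-module score dicts, prefixing keys where names would clash."""
    key_counts = Counter(key for result in results for key in result)
    name_counts = Counter(module_names)
    seen: Counter = Counter()  # how often each module name has appeared so far

    merged: Dict[str, float] = {}
    for name, result in zip(module_names, results):
        for key, value in result.items():
            if key_counts[key] == 1 and not force_prefix:
                merged[key] = value  # unique score name: keep it as is
            elif name_counts[name] > 1:
                # Duplicated module name: add an integer ID to the prefix.
                merged[f"{name}_{seen[name]}_{key}"] = value
            else:
                # Clashing score name or forced prefix: use the module name.
                merged[f"{name}_{key}"] = value
        seen[name] += 1
    return merged


no_clash = merge_results(["accuracy", "f1"], [{"accuracy": 0.5}, {"f1": 0.67}])
forced = merge_results(["accuracy", "f1"], [{"accuracy": 0.5}, {"f1": 0.67}], force_prefix=True)
same_module = merge_results(["f1", "f1"], [{"f1": 0.6}, {"f1": 0.7}])

print(no_clash)     # {'accuracy': 0.5, 'f1': 0.67}
print(forced)       # {'accuracy_accuracy': 0.5, 'f1_f1': 0.67}
print(same_module)  # {'f1_0_f1': 0.6, 'f1_1_f1': 0.7}
```

Passing a dictionary to combine avoids the integer IDs in the last case, since each module then has a distinct, caller-chosen name.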

class evaluate.CombinedEvaluations

( evaluation_modules force_prefix = False )

add

( prediction = None reference = None **kwargs )

Parameters

prediction (list/array/tensor, optional) — Prediction.
reference (list/array/tensor, optional) — Reference.

Add one prediction and reference to each evaluation module's stack.

Example

>>> import evaluate
>>> accuracy = evaluate.load("accuracy")
>>> f1 = evaluate.load("f1")
>>> clf_metrics = evaluate.combine(["accuracy", "f1"])
>>> for ref, pred in zip([0,1,0,1], [1,0,0,1]):
...     clf_metrics.add(references=ref, predictions=pred)

add_batch

( predictions = None references = None **kwargs )

Parameters

predictions (list/array/tensor, optional) — Predictions.
references (list/array/tensor, optional) — References.

Add a batch of predictions and references to each evaluation module's stack.

Example

>>> import evaluate
>>> accuracy = evaluate.load("accuracy")
>>> f1 = evaluate.load("f1")
>>> clf_metrics = evaluate.combine(["accuracy", "f1"])
>>> for refs, preds in zip([[0,1],[0,1]], [[1,0],[0,1]]):
...     clf_metrics.add_batch(references=refs, predictions=preds)

compute

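( predictions = None references = None **kwargs ) → dict or None

Parameters

predictions (list/array/tensor, optional) — Predictions.
references (list/array/tensor, optional) — References.
**kwargs (optional) — Keyword arguments that are forwarded to the evaluation module's compute() method (see its docstring for details).

Returns

dict or None

A dictionary with the results if this evaluation module runs on the main process (process_id == 0).
None if the evaluation module does not run on the main process (process_id != 0).

Compute each evaluation module.

Positional arguments are not allowed, to prevent mistakes.

Example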

>>> import evaluate
>>> accuracy = evaluate.load("accuracy")
>>> f1 = evaluate.load("f1")
>>> clf_metrics = evaluate.combine(["accuracy", "f1"])
>>> clf_metrics.compute(predictions=[0,1], references=[1,1])
{'accuracy': 0.5, 'f1': 0.6666666666666666}
