GaudiTrainer

The GaudiTrainer class provides an extended API for the feature-complete Transformers Trainer. It is used in all the example scripts.

Before instantiating your GaudiTrainer, create a GaudiTrainingArguments object to access all the points of customization during training.

The GaudiTrainer class is optimized for 🤗 Transformers models running on Intel Gaudi.
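For instance, a minimal instantiation could look like the sketch below. The model name, the Gaudi configuration name, and the datasets are placeholders to adapt to your setup:

from transformers import AutoModelForSequenceClassification
from optimum.habana import GaudiTrainer, GaudiTrainingArguments

# Placeholder model; any 🤗 Transformers model supported on Gaudi works similarly.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

training_args = GaudiTrainingArguments(
    output_dir="./results",
    use_habana=True,     # run on Intel Gaudi HPUs
    use_lazy_mode=True,  # lazy execution mode (the default)
    gaudi_config_name="Habana/bert-base-uncased",  # placeholder Gaudi config from the Hub
    num_train_epochs=3,
    per_device_train_batch_size=8,
)

trainer = GaudiTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # assumed to be an already tokenized dataset
    eval_dataset=eval_dataset,    # assumed to be an already tokenized dataset
)
trainer.train()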

Here is an example of how to customize GaudiTrainer to use a weighted loss (useful when you have an unbalanced training set):

import torch
from torch import nn

from optimum.habana import GaudiTrainer


class CustomGaudiTrainer(GaudiTrainer):
    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        labels = inputs.get("labels")
        # forward pass
        outputs = model(**inputs)
        logits = outputs.get("logits")
        # compute custom loss (suppose one has 3 labels with different weights)
        loss_fct = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 3.0], device=model.device))
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss

Another way to customize the training loop behavior for the PyTorch GaudiTrainer is to use callbacks that can inspect the training loop state (for progress reporting, logging on TensorBoard or other ML platforms, etc.) and take decisions (like early stopping); see the sketch below.
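A minimal sketch using the stock EarlyStoppingCallback from transformers; it assumes `model`, `training_args` (configured with an evaluation strategy and load_best_model_at_end=True) and the datasets are defined elsewhere:

from transformers import EarlyStoppingCallback
from optimum.habana import GaudiTrainer

# Stop training when the tracked metric has not improved for 3 evaluations.
trainer = GaudiTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)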

GaudiTrainer

class optimum.habana.GaudiTrainer


( model: typing.Union[transformers.modeling_utils.PreTrainedModel, torch.nn.modules.module.Module, NoneType] = None gaudi_config: GaudiConfig = None args: TrainingArguments = None data_collator: typing.Optional[transformers.data.data_collator.DataCollator] = None train_dataset: typing.Union[torch.utils.data.dataset.Dataset, torch.utils.data.dataset.IterableDataset, ForwardRef('datasets.Dataset'), NoneType] = None eval_dataset: typing.Union[torch.utils.data.dataset.Dataset, dict[str, torch.utils.data.dataset.Dataset], ForwardRef('datasets.Dataset'), NoneType] = None processing_class: typing.Union[transformers.tokenization_utils_base.PreTrainedTokenizerBase, transformers.image_processing_utils.BaseImageProcessor, transformers.feature_extraction_utils.FeatureExtractionMixin, transformers.processing_utils.ProcessorMixin, NoneType] = None model_init: typing.Optional[typing.Callable[[], transformers.modeling_utils.PreTrainedModel]] = None compute_loss_func: typing.Optional[typing.Callable] = None compute_metrics: typing.Optional[typing.Callable[[transformers.trainer_utils.EvalPrediction], dict]] = None callbacks: typing.Optional[list[transformers.trainer_callback.TrainerCallback]] = None optimizers: tuple = (None, None) optimizer_cls_and_kwargs: typing.Optional[tuple[type[torch.optim.optimizer.Optimizer], dict[str, typing.Any]]] = None preprocess_logits_for_metrics: typing.Optional[typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor]] = None )

GaudiTrainer is built on top of the Transformers' Trainer to enable deployment on Habana's Gaudi.

autocast_smart_context_manager


( cache_enabled: typing.Optional[bool] = True )

A helper wrapper that creates an appropriate `autocast` context manager as needed and feeds it the desired arguments.

Modified by Habana to enable the usage of `autocast` on Gaudi devices.
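A minimal sketch of how a subclass might wrap a forward pass with this helper; the `run_forward` method here is a hypothetical illustration, not part of the API:

from optimum.habana import GaudiTrainer

class AutocastTrainer(GaudiTrainer):
    def run_forward(self, model, inputs):
        # Run the forward pass under the Gaudi-aware autocast context manager.
        with self.autocast_smart_context_manager(cache_enabled=True):
            return model(**inputs)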

evaluate


( eval_dataset: typing.Union[torch.utils.data.dataset.Dataset, dict[str, torch.utils.data.dataset.Dataset], NoneType] = None ignore_keys: typing.Optional[list[str]] = None metric_key_prefix: str = 'eval' )

Adapted from https://github.com/huggingface/transformers/blob/v4.38.2/src/transformers/trainer.py#L3162 with the following modification:

  1. use throughput_warmup_steps in the evaluation throughput calculation
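Usage mirrors the base Trainer. For example, assuming `trainer` is an instantiated GaudiTrainer:

# metric_key_prefix controls the naming of the returned keys, e.g. "eval_loss".
metrics = trainer.evaluate(metric_key_prefix="eval")
print(metrics["eval_loss"])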

evaluation_loop


( dataloader: DataLoader description: str prediction_loss_only: typing.Optional[bool] = None ignore_keys: typing.Optional[list[str]] = None metric_key_prefix: str = 'eval' )

Prediction/evaluation loop, shared by `Trainer.evaluate()` and `Trainer.predict()`. Works both with or without labels.

predict


( test_dataset: Dataset ignore_keys: typing.Optional[list[str]] = None metric_key_prefix: str = 'test' )

Adapted from https://github.com/huggingface/transformers/blob/v4.45.2/src/transformers/trainer.py#L3904 with the following modifications:

  1. comment out TPU-related code
  2. use throughput_warmup_steps in the evaluation throughput calculation

prediction_step


( model: Module inputs: dict prediction_loss_only: bool ignore_keys: typing.Optional[list[str]] = None ) Tuple[Optional[torch.Tensor], Optional[torch.Tensor], Optional[torch.Tensor]]

Parameters

  • model (torch.nn.Module) — The model to evaluate.
  • inputs (Dict[str, Union[torch.Tensor, Any]]) — The inputs and targets of the model. The dictionary will be unpacked before being fed to the model. Most models expect the targets under the argument `labels`. Check your model's documentation for all accepted arguments.
  • prediction_loss_only (bool) — Whether or not to return the loss only.
  • ignore_keys (List[str], optional) — A list of keys in the output of your model (if it is a dictionary) that should be ignored when gathering predictions.

Returns

Tuple[Optional[torch.Tensor], Optional[torch.Tensor], Optional[torch.Tensor]]

A tuple with the loss, logits and labels (each being optional).

Perform an evaluation step on `model` using `inputs`. Subclass and override to inject custom behavior.
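A hedged sketch of such an override, delegating to the parent implementation and then turning single-tensor logits into probabilities:

import torch
from optimum.habana import GaudiTrainer

class ProbabilityTrainer(GaudiTrainer):
    def prediction_step(self, model, inputs, prediction_loss_only, ignore_keys=None):
        # Reuse the parent logic for the actual evaluation step.
        loss, logits, labels = super().prediction_step(
            model, inputs, prediction_loss_only, ignore_keys=ignore_keys
        )
        # Post-process: softmax over the last dimension when logits is a single tensor.
        if isinstance(logits, torch.Tensor):
            logits = torch.nn.functional.softmax(logits, dim=-1)
        return loss, logits, labels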

save_model


( output_dir: typing.Optional[str] = None _internal_call: bool = False )

Will save the model, so you can reload it using `from_pretrained()`. Will only save from the main process.
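For example, assuming `trainer` is an instantiated GaudiTrainer and the output path is a placeholder:

from transformers import AutoModelForSequenceClassification

trainer.save_model("./my_finetuned_model")
# The saved checkpoint can be reloaded later with from_pretrained().
reloaded = AutoModelForSequenceClassification.from_pretrained("./my_finetuned_model")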

training_step


( model: Module inputs: dict num_items_in_batch = None ) torch.Tensor

Parameters

  • model (torch.nn.Module) — The model to train.
  • inputs (Dict[str, Union[torch.Tensor, Any]]) — The inputs and targets of the model.

    The dictionary will be unpacked before being fed to the model. Most models expect the targets under the argument `labels`. Check your model's documentation for all accepted arguments.

Returns

torch.Tensor

The tensor with training loss on this batch.

Perform a training step on a batch of inputs.

Subclass and override to inject custom behavior.

GaudiSeq2SeqTrainer

class optimum.habana.GaudiSeq2SeqTrainer


( model: typing.Union[ForwardRef('PreTrainedModel'), torch.nn.modules.module.Module] = None gaudi_config: GaudiConfig = None args: GaudiTrainingArguments = None data_collator: typing.Optional[ForwardRef('DataCollator')] = None train_dataset: typing.Union[torch.utils.data.dataset.Dataset, ForwardRef('IterableDataset'), ForwardRef('datasets.Dataset'), NoneType] = None eval_dataset: typing.Union[torch.utils.data.dataset.Dataset, dict[str, torch.utils.data.dataset.Dataset], NoneType] = None processing_class: typing.Union[ForwardRef('PreTrainedTokenizerBase'), ForwardRef('BaseImageProcessor'), ForwardRef('FeatureExtractionMixin'), ForwardRef('ProcessorMixin'), NoneType] = None model_init: typing.Optional[typing.Callable[[], ForwardRef('PreTrainedModel')]] = None compute_loss_func: typing.Optional[typing.Callable] = None compute_metrics: typing.Optional[typing.Callable[[ForwardRef('EvalPrediction')], dict]] = None callbacks: typing.Optional[list['TrainerCallback']] = None optimizers: tuple = (None, None) preprocess_logits_for_metrics: typing.Optional[typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor]] = None )

evaluate


( eval_dataset: typing.Optional[torch.utils.data.dataset.Dataset] = None ignore_keys: typing.Optional[list[str]] = None metric_key_prefix: str = 'eval' **gen_kwargs )

Parameters

  • eval_dataset (Dataset, optional) — Pass a dataset if you wish to override `self.eval_dataset`. If it is a Dataset, columns not accepted by the `model.forward()` method are automatically removed. It must implement the `__len__` method.
  • ignore_keys (List[str], optional) — A list of keys in the output of your model (if it is a dictionary) that should be ignored when gathering predictions.
  • metric_key_prefix (str, optional, defaults to "eval") — An optional prefix to be used as the metrics key prefix. For example, the metric "bleu" will be named "eval_bleu" if the prefix is "eval" (default).
  • max_length (int, optional) — The maximum target length to use when predicting with the generate method.
  • num_beams (int, optional) — Number of beams for beam search that will be used when predicting with the generate method. 1 means no beam search.
  • gen_kwargs — Additional `generate`-specific kwargs.

Run evaluation and return metrics. The calling script will be responsible for providing a method to compute metrics, as they are task-dependent (pass it to the init `compute_metrics` argument). You can also subclass and override this method to inject custom behavior.
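A brief sketch, assuming `trainer` is a GaudiSeq2SeqTrainer whose training arguments set predict_with_generate=True:

# max_length and num_beams are forwarded to generate() during evaluation.
metrics = trainer.evaluate(max_length=128, num_beams=4, metric_key_prefix="eval")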

predict


( test_dataset: Dataset ignore_keys: typing.Optional[list[str]] = None metric_key_prefix: str = 'test' **gen_kwargs )

Parameters

  • test_dataset (Dataset) — Dataset to run the predictions on. If it is a Dataset, columns not accepted by the `model.forward()` method are automatically removed. It must implement the `__len__` method.
  • ignore_keys (List[str], optional) — A list of keys in the output of your model (if it is a dictionary) that should be ignored when gathering predictions.
  • metric_key_prefix (str, optional, defaults to "test") — An optional prefix to be used as the metrics key prefix. For example, the metric "bleu" will be named "test_bleu" if the prefix is "test" (default).
  • max_length (int, optional) — The maximum target length to use when predicting with the generate method.
  • num_beams (int, optional) — Number of beams for beam search that will be used when predicting with the generate method. 1 means no beam search.
  • gen_kwargs — Additional `generate`-specific kwargs.

Run prediction and return predictions and potential metrics. Depending on the dataset and your use case, your test dataset may contain labels. In that case, this method will also return metrics, like in `evaluate()`.

If your predictions or labels have different sequence lengths (for instance because you're doing dynamic padding in a token classification task), the predictions will be padded (on the right) to allow for concatenation into one array. The padding index is -100.

Returns: *NamedTuple* A namedtuple with the following keys:

  • predictions (`np.ndarray`): The predictions on `test_dataset`.
  • label_ids (`np.ndarray`, *optional*): The labels (if the dataset contained some).
  • metrics (`Dict[str, float]`, *optional*): The potential dictionary of metrics (if the dataset contained labels).
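A brief usage sketch, where `test_dataset` is a placeholder for your tokenized test split:

results = trainer.predict(test_dataset, max_length=128, num_beams=4)
print(results.predictions.shape)  # predictions, right-padded with -100 where needed
print(results.metrics)            # only populated when the dataset has labels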

GaudiTrainingArguments

class optimum.habana.GaudiTrainingArguments


( output_dir: typing.Optional[str] = None overwrite_output_dir: bool = False do_train: bool = False do_eval: bool = False do_predict: bool = False eval_strategy: typing.Union[transformers.trainer_utils.IntervalStrategy, str] = 'no' prediction_loss_only: bool = False per_device_train_batch_size: int = 8 per_device_eval_batch_size: int = 8 per_gpu_train_batch_size: typing.Optional[int] = None per_gpu_eval_batch_size: typing.Optional[int] = None gradient_accumulation_steps: int = 1 eval_accumulation_steps: typing.Optional[int] = None eval_delay: typing.Optional[float] = 0 torch_empty_cache_steps: typing.Optional[int] = None learning_rate: float = 5e-05 weight_decay: float = 0.0 adam_beta1: float = 0.9 adam_beta2: float = 0.999 adam_epsilon: typing.Optional[float] = 1e-06 max_grad_norm: float = 1.0 num_train_epochs: float = 3.0 max_steps: int = -1 lr_scheduler_type: typing.Union[transformers.trainer_utils.SchedulerType, str] = 'linear' lr_scheduler_kwargs: typing.Union[dict, str, NoneType] = <factory> warmup_ratio: float = 0.0 warmup_steps: int = 0 log_level: typing.Optional[str] = 'passive' log_level_replica: typing.Optional[str] = 'warning' log_on_each_node: bool = True logging_dir: typing.Optional[str] = None logging_strategy: typing.Union[transformers.trainer_utils.IntervalStrategy, str] = 'steps' logging_first_step: bool = False logging_steps: float = 500 logging_nan_inf_filter: typing.Optional[bool] = False save_strategy: typing.Union[transformers.trainer_utils.SaveStrategy, str] = 'steps' save_steps: float = 500 save_total_limit: typing.Optional[int] = None save_safetensors: typing.Optional[bool] = True save_on_each_node: bool = False save_only_model: bool = False restore_callback_states_from_checkpoint: bool = False no_cuda: bool = False use_cpu: bool = False use_mps_device: bool = False seed: int = 42 data_seed: typing.Optional[int] = None jit_mode_eval: bool = False use_ipex: bool = False bf16: bool = False fp16: bool = False fp16_opt_level: str = 'O1' half_precision_backend: str = 'hpu_amp' bf16_full_eval: bool = False fp16_full_eval: bool = False tf32: typing.Optional[bool] = None local_rank: int = -1 ddp_backend: typing.Optional[str] = None tpu_num_cores: typing.Optional[int] = None tpu_metrics_debug: bool = False debug: typing.Union[str, list[transformers.debug_utils.DebugOption]] = '' dataloader_drop_last: bool = False eval_steps: typing.Optional[float] = None dataloader_num_workers: int = 0 dataloader_prefetch_factor: typing.Optional[int] = None past_index: int = -1 run_name: typing.Optional[str] = None disable_tqdm: typing.Optional[bool] = None remove_unused_columns: typing.Optional[bool] = True label_names: typing.Optional[list[str]] = None load_best_model_at_end: typing.Optional[bool] = False metric_for_best_model: typing.Optional[str] = None greater_is_better: typing.Optional[bool] = None ignore_data_skip: bool = False fsdp: typing.Union[list[transformers.trainer_utils.FSDPOption], str, NoneType] = '' fsdp_min_num_params: int = 0 fsdp_config: typing.Union[dict, str, NoneType] = None tp_size: typing.Optional[int] = 0 fsdp_transformer_layer_cls_to_wrap: typing.Optional[str] = None accelerator_config: typing.Union[dict, str, NoneType] = None deepspeed: typing.Union[dict, str, NoneType] = None label_smoothing_factor: float = 0.0 optim: typing.Union[transformers.training_args.OptimizerNames, str, NoneType] = 'adamw_torch' optim_args: typing.Optional[str] = None adafactor: bool = False group_by_length: bool = False length_column_name: typing.Optional[str] = 'length' report_to: 
typing.Union[NoneType, str, list[str]] = None ddp_find_unused_parameters: typing.Optional[bool] = False ddp_bucket_cap_mb: typing.Optional[int] = 230 ddp_broadcast_buffers: typing.Optional[bool] = None dataloader_pin_memory: bool = True dataloader_persistent_workers: bool = False skip_memory_metrics: bool = True use_legacy_prediction_loop: bool = False push_to_hub: bool = False resume_from_checkpoint: typing.Optional[str] = None hub_model_id: typing.Optional[str] = None hub_strategy: typing.Union[transformers.trainer_utils.HubStrategy, str] = 'every_save' hub_token: typing.Optional[str] = None hub_private_repo: typing.Optional[bool] = None hub_always_push: bool = False gradient_checkpointing: bool = False gradient_checkpointing_kwargs: typing.Union[dict, str, NoneType] = None include_inputs_for_metrics: bool = False include_for_metrics: list = <factory> eval_do_concat_batches: bool = True fp16_backend: str = 'auto' push_to_hub_model_id: typing.Optional[str] = None push_to_hub_organization: typing.Optional[str] = None push_to_hub_token: typing.Optional[str] = None mp_parameters: str = '' auto_find_batch_size: bool = False full_determinism: bool = False torchdynamo: typing.Optional[str] = None ray_scope: typing.Optional[str] = 'last' ddp_timeout: typing.Optional[int] = 1800 torch_compile: bool = False torch_compile_backend: typing.Optional[str] = None torch_compile_mode: typing.Optional[str] = None include_tokens_per_second: typing.Optional[bool] = False include_num_input_tokens_seen: typing.Optional[bool] = False neftune_noise_alpha: typing.Optional[float] = None optim_target_modules: typing.Union[NoneType, str, list[str]] = None batch_eval_metrics: bool = False eval_on_start: bool = False use_liger_kernel: typing.Optional[bool] = False eval_use_gather_object: typing.Optional[bool] = False average_tokens_across_devices: typing.Optional[bool] = False use_habana: typing.Optional[bool] = False gaudi_config_name: typing.Optional[str] = None use_lazy_mode: typing.Optional[bool] = True use_hpu_graphs: typing.Optional[bool] = False use_hpu_graphs_for_inference: typing.Optional[bool] = False use_hpu_graphs_for_training: typing.Optional[bool] = False use_compiled_autograd: typing.Optional[bool] = False compile_from_sec_iteration: typing.Optional[bool] = False compile_dynamic: typing.Optional[bool] = None use_zero3_leaf_promotion: typing.Optional[bool] = False cache_size_limit: typing.Optional[int] = None use_regional_compilation: typing.Optional[bool] = False inline_inbuilt_nn_modules: typing.Optional[bool] = None allow_unspec_int_on_nn_module: typing.Optional[bool] = None disable_tensor_cache_hpu_graphs: typing.Optional[bool] = False max_hpu_graphs: typing.Optional[int] = None distribution_strategy: typing.Optional[str] = 'ddp' context_parallel_size: typing.Optional[int] = 1 minimize_memory: typing.Optional[bool] = False throughput_warmup_steps: typing.Optional[int] = 0 adjust_throughput: bool = False pipelining_fwd_bwd: typing.Optional[bool] = False ignore_eos: typing.Optional[bool] = True non_blocking_data_copy: typing.Optional[bool] = False profiling_warmup_steps: typing.Optional[int] = 0 profiling_steps: typing.Optional[int] = 0 profiling_warmup_steps_eval: typing.Optional[int] = 0 profiling_steps_eval: typing.Optional[int] = 0 profiling_record_shapes: typing.Optional[bool] = True profiling_with_stack: typing.Optional[bool] = False attn_implementation: typing.Optional[str] = 'eager' sdp_on_bf16: bool = False fp8: typing.Optional[bool] = False )

Parameters

  • use_habana (bool, optional, defaults to False) — Whether to use Habana's HPU for running the model.
  • gaudi_config_name (str, optional) — Pretrained Gaudi config name or path.
  • use_lazy_mode (bool, optional, defaults to True) — Whether to use lazy mode for running the model.
  • use_hpu_graphs (bool, optional, defaults to False) — Deprecated, use use_hpu_graphs_for_inference instead. Whether to use HPU graphs for performing inference.
  • use_hpu_graphs_for_inference (bool, optional, defaults to False) — Whether to use HPU graphs for performing inference. It will speed up latency but may not be compatible with some operations.
  • use_hpu_graphs_for_training (bool, optional, defaults to False) — Whether to use HPU graphs for performing training. It will speed up training but may not be compatible with some operations.
  • use_compiled_autograd (bool, optional, defaults to False) — Whether to use compiled autograd for training. Currently only for summarization models.
  • compile_from_sec_iteration (bool, optional, defaults to False) — Whether to torch.compile from the second training iteration.
  • compile_dynamic (bool|None, optional, defaults to None) — Set the value of the `dynamic` parameter for torch.compile.
  • use_regional_compilation (bool, optional, defaults to False) — Whether to use regional compilation with DeepSpeed.
  • inline_inbuilt_nn_modules (bool, optional, defaults to None) — Set the value of the `inline_inbuilt_nn_modules` parameter for torch._dynamo.config. Currently, disabling this parameter improves the performance of the ALBERT model.
  • cache_size_limit (int, optional, defaults to None) — Set the value of the `cache_size_limit` parameter for torch._dynamo.config.
  • allow_unspec_int_on_nn_module (bool, optional, defaults to None) — Set the value of the `allow_unspec_int_on_nn_module` parameter for torch._dynamo.config.
  • disable_tensor_cache_hpu_graphs (bool, optional, defaults to False) — Whether to disable the tensor cache when using HPU graphs. If True, tensors won't be cached in the HPU graph, which can save memory.
  • max_hpu_graphs (int, optional) — Maximum number of HPU graphs to cache. Reduce this to save device memory.
  • distribution_strategy (str, optional, defaults to "ddp") — Determines how data parallel distributed training is achieved. Can be: `ddp` or `fast_ddp`.
  • throughput_warmup_steps (int, optional, defaults to 0) — Number of steps to ignore for throughput calculation. For example, with throughput_warmup_steps=N, the first N steps will not be considered in the calculation of the throughput. This is especially useful in lazy mode, where the first two or three iterations typically take longer.
  • adjust_throughput (bool, optional, defaults to False) — Whether to exclude the time spent on logging, evaluating and saving from the throughput calculation.
  • pipelining_fwd_bwd (bool, optional, defaults to False) — Whether to add an additional `mark_step` between forward and backward to pipeline host backward building and HPU forward computing.
  • non_blocking_data_copy (bool, optional, defaults to False) — Whether to enable asynchronous data copy when preparing the inputs.
  • profiling_warmup_steps (int, optional, defaults to 0) — Number of training steps to ignore for profiling.
  • profiling_steps (int, optional, defaults to 0) — Number of training steps to capture when profiling is enabled.
  • profiling_warmup_steps_eval (int, optional, defaults to 0) — Number of evaluation steps to ignore for profiling.
  • profiling_steps_eval (int, optional, defaults to 0) — Number of evaluation steps to capture when profiling is enabled.

GaudiTrainingArguments is built on top of the Transformers' TrainingArguments to enable deployment on Habana's Gaudi.
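A sketch combining several of the Gaudi-specific knobs described above; the Gaudi configuration name is a placeholder, pick one matching your model:

from optimum.habana import GaudiTrainingArguments

args = GaudiTrainingArguments(
    output_dir="./results",
    use_habana=True,
    use_lazy_mode=True,
    gaudi_config_name="Habana/gpt2",
    use_hpu_graphs_for_training=True,  # faster training, check op compatibility
    throughput_warmup_steps=3,         # skip slow warm-up steps in throughput stats
    profiling_warmup_steps=2,          # ignore 2 steps, then...
    profiling_steps=4,                 # ...capture 4 steps with the profiler
)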

GaudiSeq2SeqTrainingArguments

class optimum.habana.GaudiSeq2SeqTrainingArguments


( output_dir: typing.Optional[str] = None overwrite_output_dir: bool = False do_train: bool = False do_eval: bool = False do_predict: bool = False eval_strategy: typing.Union[transformers.trainer_utils.IntervalStrategy, str] = 'no' prediction_loss_only: bool = False per_device_train_batch_size: int = 8 per_device_eval_batch_size: int = 8 per_gpu_train_batch_size: typing.Optional[int] = None per_gpu_eval_batch_size: typing.Optional[int] = None gradient_accumulation_steps: int = 1 eval_accumulation_steps: typing.Optional[int] = None eval_delay: typing.Optional[float] = 0 torch_empty_cache_steps: typing.Optional[int] = None learning_rate: float = 5e-05 weight_decay: float = 0.0 adam_beta1: float = 0.9 adam_beta2: float = 0.999 adam_epsilon: typing.Optional[float] = 1e-06 max_grad_norm: float = 1.0 num_train_epochs: float = 3.0 max_steps: int = -1 lr_scheduler_type: typing.Union[transformers.trainer_utils.SchedulerType, str] = 'linear' lr_scheduler_kwargs: typing.Union[dict, str, NoneType] = <factory> warmup_ratio: float = 0.0 warmup_steps: int = 0 log_level: typing.Optional[str] = 'passive' log_level_replica: typing.Optional[str] = 'warning' log_on_each_node: bool = True logging_dir: typing.Optional[str] = None logging_strategy: typing.Union[transformers.trainer_utils.IntervalStrategy, str] = 'steps' logging_first_step: bool = False logging_steps: float = 500 logging_nan_inf_filter: typing.Optional[bool] = False save_strategy: typing.Union[transformers.trainer_utils.SaveStrategy, str] = 'steps' save_steps: float = 500 save_total_limit: typing.Optional[int] = None save_safetensors: typing.Optional[bool] = True save_on_each_node: bool = False save_only_model: bool = False restore_callback_states_from_checkpoint: bool = False no_cuda: bool = False use_cpu: bool = False use_mps_device: bool = False seed: int = 42 data_seed: typing.Optional[int] = None jit_mode_eval: bool = False use_ipex: bool = False bf16: bool = False fp16: bool = False fp16_opt_level: str = 'O1' half_precision_backend: str = 'hpu_amp' bf16_full_eval: bool = False fp16_full_eval: bool = False tf32: typing.Optional[bool] = None local_rank: int = -1 ddp_backend: typing.Optional[str] = None tpu_num_cores: typing.Optional[int] = None tpu_metrics_debug: bool = False debug: typing.Union[str, list[transformers.debug_utils.DebugOption]] = '' dataloader_drop_last: bool = False eval_steps: typing.Optional[float] = None dataloader_num_workers: int = 0 dataloader_prefetch_factor: typing.Optional[int] = None past_index: int = -1 run_name: typing.Optional[str] = None disable_tqdm: typing.Optional[bool] = None remove_unused_columns: typing.Optional[bool] = True label_names: typing.Optional[list[str]] = None load_best_model_at_end: typing.Optional[bool] = False metric_for_best_model: typing.Optional[str] = None greater_is_better: typing.Optional[bool] = None ignore_data_skip: bool = False fsdp: typing.Union[list[transformers.trainer_utils.FSDPOption], str, NoneType] = '' fsdp_min_num_params: int = 0 fsdp_config: typing.Union[dict, str, NoneType] = None tp_size: typing.Optional[int] = 0 fsdp_transformer_layer_cls_to_wrap: typing.Optional[str] = None accelerator_config: typing.Union[dict, str, NoneType] = None deepspeed: typing.Union[dict, str, NoneType] = None label_smoothing_factor: float = 0.0 optim: typing.Union[transformers.training_args.OptimizerNames, str, NoneType] = 'adamw_torch' optim_args: typing.Optional[str] = None adafactor: bool = False group_by_length: bool = False length_column_name: typing.Optional[str] = 'length' report_to: 
typing.Union[NoneType, str, list[str]] = None ddp_find_unused_parameters: typing.Optional[bool] = False ddp_bucket_cap_mb: typing.Optional[int] = 230 ddp_broadcast_buffers: typing.Optional[bool] = None dataloader_pin_memory: bool = True dataloader_persistent_workers: bool = False skip_memory_metrics: bool = True use_legacy_prediction_loop: bool = False push_to_hub: bool = False resume_from_checkpoint: typing.Optional[str] = None hub_model_id: typing.Optional[str] = None hub_strategy: typing.Union[transformers.trainer_utils.HubStrategy, str] = 'every_save' hub_token: typing.Optional[str] = None hub_private_repo: typing.Optional[bool] = None hub_always_push: bool = False gradient_checkpointing: bool = False gradient_checkpointing_kwargs: typing.Union[dict, str, NoneType] = None include_inputs_for_metrics: bool = False include_for_metrics: list = <factory> eval_do_concat_batches: bool = True fp16_backend: str = 'auto' push_to_hub_model_id: typing.Optional[str] = None push_to_hub_organization: typing.Optional[str] = None push_to_hub_token: typing.Optional[str] = None mp_parameters: str = '' auto_find_batch_size: bool = False full_determinism: bool = False torchdynamo: typing.Optional[str] = None ray_scope: typing.Optional[str] = 'last' ddp_timeout: typing.Optional[int] = 1800 torch_compile: bool = False torch_compile_backend: typing.Optional[str] = None torch_compile_mode: typing.Optional[str] = None include_tokens_per_second: typing.Optional[bool] = False include_num_input_tokens_seen: typing.Optional[bool] = False neftune_noise_alpha: typing.Optional[float] = None optim_target_modules: typing.Union[NoneType, str, list[str]] = None batch_eval_metrics: bool = False eval_on_start: bool = False use_liger_kernel: typing.Optional[bool] = False eval_use_gather_object: typing.Optional[bool] = False average_tokens_across_devices: typing.Optional[bool] = False use_habana: typing.Optional[bool] = False gaudi_config_name: typing.Optional[str] = None use_lazy_mode: typing.Optional[bool] = True use_hpu_graphs: typing.Optional[bool] = False use_hpu_graphs_for_inference: typing.Optional[bool] = False use_hpu_graphs_for_training: typing.Optional[bool] = False use_compiled_autograd: typing.Optional[bool] = False compile_from_sec_iteration: typing.Optional[bool] = False compile_dynamic: typing.Optional[bool] = None use_zero3_leaf_promotion: typing.Optional[bool] = False cache_size_limit: typing.Optional[int] = None use_regional_compilation: typing.Optional[bool] = False inline_inbuilt_nn_modules: typing.Optional[bool] = None allow_unspec_int_on_nn_module: typing.Optional[bool] = None disable_tensor_cache_hpu_graphs: typing.Optional[bool] = False max_hpu_graphs: typing.Optional[int] = None distribution_strategy: typing.Optional[str] = 'ddp' context_parallel_size: typing.Optional[int] = 1 minimize_memory: typing.Optional[bool] = False throughput_warmup_steps: typing.Optional[int] = 0 adjust_throughput: bool = False pipelining_fwd_bwd: typing.Optional[bool] = False ignore_eos: typing.Optional[bool] = True non_blocking_data_copy: typing.Optional[bool] = False profiling_warmup_steps: typing.Optional[int] = 0 profiling_steps: typing.Optional[int] = 0 profiling_warmup_steps_eval: typing.Optional[int] = 0 profiling_steps_eval: typing.Optional[int] = 0 profiling_record_shapes: typing.Optional[bool] = True profiling_with_stack: typing.Optional[bool] = False attn_implementation: typing.Optional[str] = 'eager' sdp_on_bf16: bool = False fp8: typing.Optional[bool] = False sortish_sampler: bool = False predict_with_generate: 
bool = False generation_max_length: typing.Optional[int] = None generation_num_beams: typing.Optional[int] = None generation_config: typing.Union[str, pathlib.Path, optimum.habana.transformers.generation.configuration_utils.GaudiGenerationConfig, NoneType] = None )

Parameters

  • predict_with_generate (bool, optional, defaults to False) — Whether to use generate to calculate generative metrics (ROUGE, BLEU).
  • generation_max_length (int, optional) — The max_length to use on each evaluation loop when predict_with_generate=True. Will default to the max_length value of the model configuration.
  • generation_num_beams (int, optional) — The num_beams to use on each evaluation loop when predict_with_generate=True. Will default to the num_beams value of the model configuration.
  • generation_config (str, Path, or transformers.generation.GenerationConfig, optional) — Allows to load a transformers.generation.GenerationConfig from the from_pretrained method. This can be either:

    • a string, the model id of a pretrained model configuration hosted inside a model repo on huggingface.co.
    • a path to a directory containing a configuration file saved using the transformers.GenerationConfig.save_pretrained method, e.g., ./my_model_directory/.
    • a transformers.generation.GenerationConfig object.

GaudiSeq2SeqTrainingArguments is built on top of the Transformers' Seq2SeqTrainingArguments to enable deployment on Habana's Gaudi.
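A sketch for a generative task such as summarization; the values and the Gaudi configuration name are illustrative:

from optimum.habana import GaudiSeq2SeqTrainingArguments

seq2seq_args = GaudiSeq2SeqTrainingArguments(
    output_dir="./results",
    use_habana=True,
    use_lazy_mode=True,
    gaudi_config_name="Habana/t5",
    predict_with_generate=True,  # compute generative metrics (ROUGE, BLEU) with generate()
    generation_max_length=128,
    generation_num_beams=4,
)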

to_dict


( )

Serializes this instance while replacing the Enum by their values and GaudiGenerationConfig by dictionaries (for JSON serialization support). It obfuscates the token values by removing their value.
