Seq2Seq

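Seq2Seq is a task that transforms one sequence of words into another sequence of words. It is used in machine translation, text summarization, and question answering.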

Data Format

You can store your dataset as a CSV file:

text,target
"this movie is great","dieser Film ist großartig"
"this movie is bad","dieser Film ist schlecht"
.
.
.

Or as a JSONL file:

{"text": "this movie is great", "target": "dieser Film ist großartig"}
{"text": "this movie is bad", "target": "dieser Film ist schlecht"}
.
.
.
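As a minimal sketch of how files in these two formats could be produced, the following uses only the Python standard library. The helper names write_csv and write_jsonl are hypothetical and not part of AutoTrain; the sample pairs are the ones shown above.

```python
import csv
import json

# Example (text, target) pairs taken from the samples above.
PAIRS = [
    ("this movie is great", "dieser Film ist großartig"),
    ("this movie is bad", "dieser Film ist schlecht"),
]


def write_csv(path, pairs):
    """Write (text, target) pairs as a CSV file with a 'text,target' header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "target"])
        writer.writerows(pairs)


def write_jsonl(path, pairs):
    """Write one JSON object per line, each with 'text' and 'target' keys."""
    with open(path, "w", encoding="utf-8") as f:
        for text, target in pairs:
            record = {"text": text, "target": target}
            # ensure_ascii=False keeps characters such as 'ß' readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Calling write_csv("train.csv", PAIRS) or write_jsonl("train.jsonl", PAIRS) would then create a file in the corresponding layout; the file names here are only placeholders.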

Columns

Your CSV/JSONL dataset must contain two columns: text and target.
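Before starting a training run, it may be useful to confirm that both columns are present. The sketch below is one possible check, not something AutoTrain provides: missing_columns is a hypothetical helper, it decides the format from the file extension, and for JSONL it inspects only the first record.

```python
import csv
import json

REQUIRED_COLUMNS = ("text", "target")


def missing_columns(path, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from a CSV or JSONL file."""
    with open(path, newline="", encoding="utf-8") as f:
        if str(path).endswith(".jsonl"):
            # Assumption: every record has the same keys as the first one.
            first = f.readline()
            present = set(json.loads(first)) if first.strip() else set()
        else:
            # For CSV, the column names are the fields of the header row.
            present = set(next(csv.reader(f), []))
    return [name for name in required if name not in present]
```

An empty list means the file has both columns; otherwise the result names whichever of text and target is missing.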

Parameters

class autotrain.trainers.seq2seq.params.Seq2SeqParams


( data_path: str = None model: str = 'google/flan-t5-base' username: typing.Optional[str] = None seed: int = 42 train_split: str = 'train' valid_split: typing.Optional[str] = None project_name: str = 'project-name' token: typing.Optional[str] = None push_to_hub: bool = False text_column: str = 'text' target_column: str = 'target' lr: float = 5e-05 epochs: int = 3 max_seq_length: int = 128 max_target_length: int = 128 batch_size: int = 2 warmup_ratio: float = 0.1 gradient_accumulation: int = 1 optimizer: str = 'adamw_torch' scheduler: str = 'linear' weight_decay: float = 0.0 max_grad_norm: float = 1.0 logging_steps: int = -1 eval_strategy: str = 'epoch' auto_find_batch_size: bool = False mixed_precision: typing.Optional[str] = None save_total_limit: int = 1 peft: bool = False quantization: typing.Optional[str] = 'int8' lora_r: int = 16 lora_alpha: int = 32 lora_dropout: float = 0.05 target_modules: str = 'all-linear' log: str = 'none' early_stopping_patience: int = 5 early_stopping_threshold: float = 0.01 )

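Parameters

data_path (str): Path to the dataset.
model (str): Name of the model to use. Defaults to "google/flan-t5-base".
username (Optional[str]): Hugging Face username.
seed (int): Random seed for reproducibility. Defaults to 42.
train_split (str): Name of the training data split. Defaults to "train".
valid_split (Optional[str]): Name of the validation data split.
project_name (str): Name of the project or output directory. Defaults to "project-name".
token (Optional[str]): Hub token used for authentication.
push_to_hub (bool): Whether to push the model to the Hugging Face Hub. Defaults to False.
text_column (str): Name of the text column in the dataset. Defaults to "text".
target_column (str): Name of the target text column in the dataset. Defaults to "target".
lr (float): Learning rate for training. Defaults to 5e-5.
epochs (int): Number of training epochs. Defaults to 3.
max_seq_length (int): Maximum sequence length for the input text. Defaults to 128.
max_target_length (int): Maximum sequence length for the target text. Defaults to 128.
batch_size (int): Training batch size. Defaults to 2.
warmup_ratio (float): Proportion of training steps used for warmup. Defaults to 0.1.
gradient_accumulation (int): Number of gradient accumulation steps. Defaults to 1.
optimizer (str): Optimizer to use. Defaults to "adamw_torch".
scheduler (str): Learning rate scheduler to use. Defaults to "linear".
weight_decay (float): Weight decay for the optimizer. Defaults to 0.0.
max_grad_norm (float): Maximum gradient norm for gradient clipping. Defaults to 1.0.
logging_steps (int): Number of steps between logging events. Defaults to -1 (disabled).
eval_strategy (str): Evaluation strategy. Defaults to "epoch".
auto_find_batch_size (bool): Whether to find the batch size automatically. Defaults to False.
mixed_precision (Optional[str]): Mixed precision training mode (fp16, bf16, or None).
save_total_limit (int): Maximum number of checkpoints to keep. Defaults to 1.
peft (bool): Whether to use Parameter-Efficient Fine-Tuning (PEFT). Defaults to False.
quantization (Optional[str]): Quantization mode (int4, int8, or None). Defaults to "int8".
lora_r (int): LoRA-R parameter for PEFT. Defaults to 16.
lora_alpha (int): LoRA-Alpha parameter for PEFT. Defaults to 32.
lora_dropout (float): LoRA-Dropout parameter for PEFT. Defaults to 0.05.
target_modules (str): Target modules for PEFT. Defaults to "all-linear".
log (str): Logging method for experiment tracking. Defaults to "none".
early_stopping_patience (int): Patience value for early stopping. Defaults to 5.
early_stopping_threshold (float): Threshold for early stopping. Defaults to 0.01.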
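Several of these parameters interact: batch_size and gradient_accumulation together set the number of examples per optimizer update, and epochs and warmup_ratio then determine how many updates are spent on warmup. The sketch below illustrates that arithmetic under stated assumptions: a single device, ceiling rounding, and a hypothetical helper named training_schedule. The rounding the actual trainer applies is not specified here and may differ.

```python
import math


def training_schedule(num_examples, batch_size=2, gradient_accumulation=1,
                      epochs=3, warmup_ratio=0.1):
    """Estimate step counts from the documented defaults (single device assumed)."""
    # Examples consumed per optimizer update.
    effective_batch = batch_size * gradient_accumulation
    # Assumption: a final partial batch still counts as one update.
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    # Assumption: warmup steps are rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# 1000 examples, batch_size=2, gradient_accumulation=4, default epochs and warmup_ratio.
print(training_schedule(1000, batch_size=2, gradient_accumulation=4))
```

With these inputs the effective batch is 8, giving 125 updates per epoch and 375 in total, of which 38 are warmup under the rounding assumed here.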

Seq2SeqParams is a configuration class for sequence-to-sequence training parameters.
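To illustrate how a configuration of this kind might be assembled, the sketch below merges user overrides into a subset of the defaults listed in the signature above and rejects unknown names. It is a standalone stand-in, not the Seq2SeqParams class itself: DEFAULTS and build_config are hypothetical names, and the data path is a placeholder.

```python
import json

# Subset of the defaults copied from the Seq2SeqParams signature above.
DEFAULTS = {
    "data_path": None,
    "model": "google/flan-t5-base",
    "text_column": "text",
    "target_column": "target",
    "lr": 5e-05,
    "epochs": 3,
    "max_seq_length": 128,
    "max_target_length": 128,
    "batch_size": 2,
    "peft": False,
    "quantization": "int8",
}


def build_config(**overrides):
    """Return DEFAULTS updated with overrides, rejecting unknown parameter names."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"unknown parameter(s): {', '.join(unknown)}")
    return {**DEFAULTS, **overrides}


# "path/to/data" is a placeholder, not a real dataset location.
config = build_config(data_path="path/to/data", epochs=5, batch_size=8)
print(json.dumps(config, indent=2))
```

Since the signature accepts these names as keyword arguments, a dictionary like this could presumably be passed as Seq2SeqParams(**config) when AutoTrain is installed; that call is not exercised here.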
