Fine-tuning PaliGemma with AutoTrain

Community article, published July 25, 2024

Abhishek Thakur

abhishek

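In this blog post, we look at how to fine-tune the PaliGemma model with AutoTrain for two tasks: visual question answering (VQA) and image captioning.

AutoTrain is a no-code solution that simplifies the workflow for data scientists, machine learning engineers, and enthusiasts. It lets you train (almost) any state-of-the-art model without writing a single line of code. To get started with AutoTrain, see the documentation and the GitHub repository.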

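Dataset

You can use either a dataset from the Hub or a local dataset.

Hub dataset

A Hub dataset should contain the following columns, which are the ones we care about: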

image: the image (image_column)
question: the question (prompt_text_column)
multiple_choice_answer: the answer (text_column)

Note: for the VQA task, we use all three columns above. For the captioning task, we use only image_column and text_column.

Local dataset

If you use a local dataset, it should be laid out as follows:

train/
├── 0001.png
├── 0002.png
├── 0003.png
├── .
├── .
├── .
└── metadata.jsonl

where metadata.jsonl contains one JSON object per line:

{"file_name": "0001.png", "question": "What vehicles are shown?", "multiple_choice_answer": "motorcycles"}
{"file_name": "0002.png", "question": "Is the plane upside down?", "multiple_choice_answer": "no"}
{"file_name": "0003.png", "question": "What is the boy doing?", "multiple_choice_answer": "batting"}

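metadata.jsonl must contain a file_name column whose values match the image files in the folder. The other columns can be named however you like.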
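
As a minimal sketch of producing this file, the Python helper below writes one JSON object per line and refuses rows that lack file_name or point to an image that is not in the folder. The name write_metadata is hypothetical (it is not part of AutoTrain), and the sketch uses only the standard library.

```python
import json
from pathlib import Path


def write_metadata(folder, rows):
    """Write rows to <folder>/metadata.jsonl, one JSON object per line.

    Hypothetical helper, not part of AutoTrain. Every row must carry a
    "file_name" key that names an image stored directly in `folder`.
    """
    folder = Path(folder)
    # Validate everything first so a bad row never leaves a partial file.
    for i, row in enumerate(rows, start=1):
        if "file_name" not in row:
            raise ValueError(f"row {i} has no 'file_name' key")
        if not (folder / row["file_name"]).is_file():
            raise FileNotFoundError(
                f"row {i}: {row['file_name']} not found in {folder}"
            )
    out_path = folder / "metadata.jsonl"
    with out_path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return out_path
```

For example, write_metadata("train", rows) with rows shaped like the three records above would create train/metadata.jsonl.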

If you have validation data, add a second folder in the same format as the one above.

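Note: when using the AutoTrain UI, the folder must be compressed into a ZIP file. When train.zip is extracted, it should yield all the images and the metadata.jsonl file directly, with no enclosing folder or subfolders.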
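
One way to get such a flat archive is sketched below with Python's standard zipfile module. The function name zip_flat is hypothetical. It stores each file under its bare name, so nothing in the archive carries a folder prefix, and it skips subfolders entirely.

```python
import zipfile
from pathlib import Path


def zip_flat(folder, zip_path):
    """Zip the files directly inside `folder` with no directory prefix.

    Hypothetical helper: subfolders are ignored, and the archive itself is
    skipped in case `zip_path` lives inside `folder`.
    """
    folder = Path(folder)
    zip_path = Path(zip_path)
    files = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.resolve() != zip_path.resolve()
    )
    if not any(p.name == "metadata.jsonl" for p in files):
        raise ValueError(f"metadata.jsonl not found in {folder}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            # arcname=p.name drops the parent path, keeping the archive flat.
            zf.write(p, arcname=p.name)
    return [p.name for p in files]
```

Calling zip_flat("train", "train.zip") returns the list of names stored in the archive, which is a quick way to confirm that none of them contains a slash.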

Local training

Locally, autotrain can be used in either UI mode or CLI mode.

To install autotrain, use pip:

$ pip install -U autotrain-advanced

Once it is installed, you can launch the UI with:

$ autotrain app

Training with the CLI/config

To train with a config file, create a file named config.yml with the following contents:

task: vlm:vqa
base_model: google/paligemma-3b-pt-224
project_name: autotrain-paligemma-finetuned-vqa
log: tensorboard
backend: local

data:
  path: abhishek/vqa_small
  train_split: train
  valid_split: validation
  column_mapping:
    image_column: image
    text_column: multiple_choice_answer
    prompt_text_column: question

params:
  epochs: 3
  batch_size: 2
  lr: 2e-5
  optimizer: adamw_torch
  scheduler: linear
  gradient_accumulation: 4
  mixed_precision: fp16
  peft: true
  quantization: int4

hub:
  username: ${HF_USERNAME}
  token: ${HF_TOKEN}
  push_to_hub: true

The config above uses a dataset from the Hub. To use a local dataset instead, change the data section as follows:

data:
  path: local_dataset_folder_path # where training and validation (optional) folders are
  train_split: train # name of training folder
  valid_split: validation # name of validation folder or none
  column_mapping:
    image_column: image
    text_column: multiple_choice_answer
    prompt_text_column: question

Double-check the column mapping!
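
For a local dataset, that check can be automated with a few lines of Python. The sketch below is an assumption-laden illustration, not an AutoTrain feature: check_column_mapping is a hypothetical name, the mapping is passed in as a plain dict instead of being parsed from config.yml, and it assumes that text_column and prompt_text_column must appear as keys in every row of metadata.jsonl, alongside file_name.

```python
import json
from pathlib import Path


def check_column_mapping(metadata_path, column_mapping):
    """Return a list of problems; an empty list means every row is complete.

    Hypothetical helper. `column_mapping` mirrors the column_mapping section
    of the config, e.g. {"text_column": "...", "prompt_text_column": "..."}.
    """
    required = [
        column_mapping[key]
        for key in ("text_column", "prompt_text_column")
        if key in column_mapping
    ]
    problems = []
    with Path(metadata_path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            row = json.loads(line)
            for col in ["file_name", *required]:
                if col not in row:
                    problems.append(f"line {line_no}: missing column '{col}'")
    return problems
```

With the mapping from the config above, you would call check_column_mapping("train/metadata.jsonl", {"text_column": "multiple_choice_answer", "prompt_text_column": "question"}) and expect an empty list before starting training.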

Once that is done, run:

$ export HF_USERNAME=your_hugging_face_username
$ export HF_TOKEN=your_hugging_face_write_token
$ autotrain --config path_to_config.yml
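
If you would rather launch training from a Python script, the same three steps can be sketched with the standard subprocess module. The function names below are hypothetical. The only things taken from this post are the autotrain --config command and the HF_USERNAME and HF_TOKEN variables that the config reads.

```python
import os
import subprocess


def build_autotrain_run(config_path, username, token):
    """Return (command, environment) for `autotrain --config <config_path>`.

    Hypothetical helper: it copies the current environment and adds the two
    variables that config.yml reads via ${HF_USERNAME} and ${HF_TOKEN}.
    """
    env = dict(os.environ)
    env["HF_USERNAME"] = username
    env["HF_TOKEN"] = token
    return ["autotrain", "--config", str(config_path)], env


def run_autotrain(config_path, username, token):
    """Start training and raise CalledProcessError if autotrain fails."""
    cmd, env = build_autotrain_run(config_path, username, token)
    subprocess.run(cmd, env=env, check=True)
```

Building the command separately from running it makes it easy to print or log exactly what will be executed before committing to a long training run.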

Then sit back and watch the training progress :)

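Training with the UI

Screenshot: the AutoTrain UI configured with a dataset from the Hugging Face Hub.

Screenshot: the AutoTrain UI configured with a local dataset.

Once again, pay special attention to the column mapping ;)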

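Finally, your model can be pushed to the Hub (this is optional), after which it is ready to use. If you run into any problems, please raise them on the GitHub issue tracker.

Happy AutoTraining! 🤗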
