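Train Custom Models on Hugging Face Spaces with AutoTrain SpaceRunner

Community Article, published May 9, 2024

Did you know you can train your own models on Hugging Face Spaces? Yes, it's possible, and AutoTrain SpaceRunner makes it super easy 💥 All you need is a Hugging Face account (which you probably already have) with a payment method attached (only if you want to use GPUs; CPU training is free!). So stop spending time setting up environments on other cloud providers and use AutoTrain SpaceRunner to train your models: the training environment is already set up for you, and you can install or uninstall any dependencies your project needs. Sounds exciting? Let's see how to do it!

The first step is to create a project folder. The project folder can contain anything, but it must contain a script.py file. This script is the entry point.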

-- my_project
---- some_module
---- some_other_module
---- script.py
---- requirements.txt

requirements.txt is optional and only needed if you want to add or remove dependencies. For example, the following requirements.txt removes the preinstalled xgboost and then installs catboost.

-xgboost
catboost

A - in front of a package name means that the package is uninstalled.
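To make the convention concrete, here is a minimal Python sketch that splits such a file into packages to uninstall and packages to install. The helper name `split_requirements` is hypothetical, and the sketch only illustrates the leading `-` rule described above; it is not AutoTrain's actual implementation, which may handle more cases.

```python
def split_requirements(text: str) -> tuple[list[str], list[str]]:
    """Split requirements text into (to_uninstall, to_install).

    Hypothetical helper: it illustrates the leading "-" convention only.
    """
    to_uninstall, to_install = [], []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("-"):
            # "-xgboost" means: remove the preinstalled xgboost package
            to_uninstall.append(line[1:].strip())
        else:
            to_install.append(line)
    return to_uninstall, to_install


if __name__ == "__main__":
    remove, add = split_requirements("-xgboost\ncatboost\n")
    print("uninstall:", remove)
    print("install:", add)
```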

What should script.py look like?

Well, however you like. Here is an example:

for _ in range(10):
    print("Hello World!")

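You can do whatever you want in script.py. You can also import local modules, as long as they are present in the project directory.

The final step is to run the code on Spaces. Here is how.

If you haven't installed AutoTrain yet, install it first: pip install -U autotrain-advanced. Then run autotrain spacerunner --help to see all the arguments you need.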

❯ autotrain spacerunner --help
usage: autotrain <command> [<args>] spacerunner [-h] --project-name PROJECT_NAME --script-path SCRIPT_PATH --username USERNAME --token TOKEN
                                                --backend {spaces-a10gl,spaces-a10gs,spaces-a100,spaces-t4m,spaces-t4s,spaces-cpu,spaces-cpuf}
                                                [--env ENV] [--args ARGS]

✨ Run AutoTrain SpaceRunner

options:
  -h, --help            show this help message and exit
  --project-name PROJECT_NAME
                        Name of the project. Must be unique.
  --script-path SCRIPT_PATH
                        Path to the script
  --username USERNAME   Hugging Face Username, can also be an organization name
  --token TOKEN         Hugging Face API Token
  --backend {spaces-a10gl,spaces-a10gs,spaces-a100,spaces-t4m,spaces-t4s,spaces-cpu,spaces-cpuf}
                        Hugging Face backend to use
  --env ENV             Environment variables, e.g. --env FOO=bar;FOO2=bar2;FOO3=bar3
  --args ARGS           Arguments to pass to the script, e.g. --args foo=bar;foo2=bar2;foo3=bar3;store_true_arg

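--project-name is the unique name used to create the Space and the dataset (containing your project files) on the Hugging Face Hub. Everything is stored privately, and you can delete it after your script has finished running.

--script-path is the local path to the directory that contains script.py.

Need to pass environment variables? Use --env. To pass arguments to script.py, use --args.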
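Judging from the help text and the example in this article, each `key=value` entry in `--args` becomes `--key value` on the script's command line, a bare entry becomes a flag, and `--env` entries become environment variables. The sketch below illustrates that mapping. The function names `args_to_argv` and `env_to_dict` are hypothetical, and AutoTrain's real parsing may differ in details.

```python
def args_to_argv(args: str) -> list[str]:
    """Turn "foo=bar;store_true_arg" into ["--foo", "bar", "--store_true_arg"]."""
    argv = []
    for item in args.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments such as a trailing ";"
        if "=" in item:
            key, value = item.split("=", 1)
            argv.extend([f"--{key}", value])
        else:
            argv.append(f"--{item}")  # bare name: treated as a boolean flag
    return argv


def env_to_dict(env: str) -> dict[str, str]:
    """Turn "FOO=bar;FOO2=bar2" into {"FOO": "bar", "FOO2": "bar2"}."""
    result = {}
    for item in env.split(";"):
        item = item.strip()
        if not item:
            continue
        if "=" not in item:
            raise ValueError(f"expected KEY=value, got {item!r}")
        key, value = item.split("=", 1)
        result[key] = value
    return result


if __name__ == "__main__":
    print(args_to_argv("padding=right;push_to_hub"))
    print(env_to_dict("TOKENIZERS_PARALLELISM=false;TRANSFORMERS_VERBOSITY=error"))
```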

You can choose any spaces-* backend to run your code. Once the job is done, the Space pauses itself automatically (saving you money) 🚀

Here is an example command:

$ autotrain spacerunner \
    --project-name custom_llama_training \
    --script-path /path/to/script/py/ \
    --username abhishek \
    --token $HF_WRITE_TOKEN \
    --backend spaces-a10g-large \
    --args "padding=right;push_to_hub" \
    --env "TOKENIZERS_PARALLELISM=false;TRANSFORMERS_VERBOSITY=error"

Locally, the same script would be run like this:

$ TOKENIZERS_PARALLELISM=false TRANSFORMERS_VERBOSITY=error python script.py --padding right --push_to_hub
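For a script.py to accept the `--padding` and `--push_to_hub` arguments used in this example, it needs to parse them itself. Below is a minimal sketch using the standard library's argparse. The default values and the printed summary are assumptions made for illustration, and the actual training logic is left out.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the two example arguments; argv=None means "read sys.argv"."""
    parser = argparse.ArgumentParser(description="Example training entry point")
    parser.add_argument("--padding", default="right", choices=["left", "right"])
    parser.add_argument("--push_to_hub", action="store_true")
    # parse_known_args ignores arguments this sketch does not define.
    args, _unknown = parser.parse_known_args(argv)
    return args


def main(argv=None):
    args = parse_args(argv)
    # Environment variables passed with --env are read like any others.
    verbosity = os.environ.get("TRANSFORMERS_VERBOSITY", "warning")
    print(f"padding={args.padding} push_to_hub={args.push_to_hub} verbosity={verbosity}")
    # ... your training code goes here ...
    return args


if __name__ == "__main__":
    main()
```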

Available backends

"spaces-a10g-large": "a10g-large",
"spaces-a10g-small": "a10g-small",
"spaces-a100-large": "a100-large",
"spaces-t4-medium": "t4-medium",
"spaces-t4-small": "t4-small",
"spaces-cpu-upgrade": "cpu-upgrade",
"spaces-cpu-basic": "cpu-basic",
"spaces-l4x1": "l4x1",
"spaces-l4x4": "l4x4",
"spaces-a10g-largex2": "a10g-largex2",
"spaces-a10g-largex4": "a10g-largex4",

After running the spacerunner command, you get a link to the Space, where you can monitor your training run. It's that simple!
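If you would rather launch the job from Python than from a shell, one option is to build the command as a list of arguments and hand it to subprocess. The sketch below uses only the flags shown in the help output above; the function names `build_spacerunner_command` and `launch` are hypothetical. Passing a list also sidesteps shell interpretation of the `;` separators in `--args` and `--env`.

```python
import os
import subprocess


def build_spacerunner_command(project_name, script_path, username, token,
                              backend, args=None, env=None):
    """Build the argv list for `autotrain spacerunner` (hypothetical helper)."""
    cmd = [
        "autotrain", "spacerunner",
        "--project-name", project_name,
        "--script-path", script_path,
        "--username", username,
        "--token", token,
        "--backend", backend,
    ]
    if args:
        cmd += ["--args", args]  # e.g. "padding=right;push_to_hub"
    if env:
        cmd += ["--env", env]    # e.g. "FOO=bar;FOO2=bar2"
    return cmd


def launch(cmd):
    """Run the command; requires autotrain-advanced to be installed."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    command = build_spacerunner_command(
        project_name="custom_llama_training",
        script_path="/path/to/script/py/",
        username="abhishek",
        token=os.environ.get("HF_WRITE_TOKEN", "<your-token>"),
        backend="spaces-a10g-large",
        args="padding=right;push_to_hub",
        env="TOKENIZERS_PARALLELISM=false;TRANSFORMERS_VERBOSITY=error",
    )
    # Print the command with the token masked; call launch(command) to run it.
    masked = ["***" if part == command[7] else part for part in command]
    print(" ".join(masked))
```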

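Note: autotrain spacerunner does not save artifacts automatically, so script.py must include code that saves your artifacts and outputs. Saving them to a Hugging Face dataset repository is recommended ;)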
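As a rough sketch of what that saving step could look like, the function below uploads a local output folder to a private dataset repository with the huggingface_hub client. The function name `save_artifacts`, the folder and repository names, and the assumption that a write token is available in an `HF_TOKEN` environment variable are all illustrative; adapt them to your own setup.

```python
import os


def save_artifacts(folder_path, repo_id, api=None):
    """Upload a local output folder to a private dataset repo on the Hub."""
    if api is None:
        # Imported lazily so the function can be exercised with a stand-in client.
        from huggingface_hub import HfApi

        # Assumption: a write token is exposed as the HF_TOKEN environment variable.
        api = HfApi(token=os.environ.get("HF_TOKEN"))
    # Create the dataset repo if it does not exist yet, and keep it private.
    api.create_repo(repo_id=repo_id, repo_type="dataset", private=True, exist_ok=True)
    # Upload everything inside folder_path to the root of the repo.
    return api.upload_folder(folder_path=folder_path, repo_id=repo_id, repo_type="dataset")
```

At the end of script.py you could then call something like `save_artifacts("outputs", "your-username/your-artifacts")`, where both names are placeholders.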

Questions, comments, feature requests, or issues? Use the GitHub issues page for AutoTrain Advanced: https://github.com/huggingface/autotrain-advanced ⭐️

Community

9voltfan2009

Jul 11

Honestly, this is a really great feature.
