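Model debugging toolboxes
This page lists all the debugging and model-addition tools used by the library, as well as the utility functions it provides for them.
Most of these are only useful if you are adding new models to the library.
Model addition debuggers
Model addition debugger - context manager for model adders
This context manager is a power-user tool intended for model adders. It tracks every forward call made during a model's forward pass and logs a slice of each input and output in a nested JSON file. Note that this context manager enforces torch.no_grad().
Rationale
When porting models to Transformers, even from Python to Python, model adders often have to do a lot of manual work: saving and loading tensors, comparing dtypes, and so on. This small tool can hopefully save some time.
Usage
Add this context manager as follows to debug a model: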
import torch
from PIL import Image
import requests
from transformers import LlavaProcessor, LlavaForConditionalGeneration
from transformers.model_debugging_utils import model_addition_debugger_context
torch.random.manual_seed(673)
# load pretrained model and processor
model_id = "llava-hf/llava-1.5-7b-hf"
processor = LlavaProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)
# create random image input
random_image = Image.fromarray(torch.randint(0, 256, (224, 224, 3), dtype=torch.uint8).numpy())
# prompt
prompt = "<image>Describe this image."
# process inputs
inputs = processor(text=prompt, images=random_image, return_tensors="pt")
# call forward method (not .generate!)
with model_addition_debugger_context(
    model,
    debug_path="optional_path_to_your_directory",
    do_prune_layers=False,  # This will output ALL the layers of a model.
):
    output = model.forward(**inputs)
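Reading results
The debugger generates two files from the forward call. Both share the same base name, but one ends with _SUMMARY.json and the other with _FULL_TENSORS.json.
The first file contains a summary of each module's *input* and *output* tensor values and shapes.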
{
  "module_path": "MolmoForConditionalGeneration",
  "inputs": {
    "args": [],
    "kwargs": {
      "input_ids": {
        "shape": "torch.Size([1, 589])",
        "dtype": "torch.int64"
      },
      "attention_mask": {
        "shape": "torch.Size([1, 589])",
        "dtype": "torch.int64"
      },
      "pixel_values": {
        "shape": "torch.Size([1, 5, 576, 588])",
        "dtype": "torch.float32",
        "mean": "tensor(-8.9514e-01, device='cuda:0')",
        "std": "tensor(9.2586e-01, device='cuda:0')",
        "min": "tensor(-1.7923e+00, device='cuda:0')",
        "max": "tensor(1.8899e+00, device='cuda:0')"
      }
    }
  },
  "children": [
    {
      "module_path": "MolmoForConditionalGeneration.language_model.model.embed_tokens",
      "inputs": {
        "args": [
          {
            "shape": "torch.Size([1, 589])",
            "dtype": "torch.int64"
          }
        ]
      },
      "outputs": {
        "shape": "torch.Size([1, 589, 3584])",
        "dtype": "torch.float32",
        "mean": "tensor(6.5460e-06, device='cuda:0')",
        "std": "tensor(2.3807e-02, device='cuda:0')",
        "min": "tensor(-3.3398e-01, device='cuda:0')",
        "max": "tensor(3.9453e-01, device='cuda:0')"
      }
    },
    {
      "module_path": "MolmoForConditionalGeneration.vision_tower",
      "inputs": {
        "args": [
          {
            "shape": "torch.Size([5, 1, 576, 588])",
            "dtype": "torch.float32",
            "mean": "tensor(-8.9514e-01, device='cuda:0')",
            "std": "tensor(9.2586e-01, device='cuda:0')",
            "min": "tensor(-1.7923e+00, device='cuda:0')",
            "max": "tensor(1.8899e+00, device='cuda:0')"
          }
        ],
        "kwargs": {
          "output_hidden_states": "True"
        }
      },
      "children": [
        { ... and so on
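Because the summary is ordinary nested JSON, it can be walked with a few lines of standard-library Python. The sketch below is illustrative only: the helper name iter_modules and the toy summary are assumptions made for this example, not part of Transformers. It also assumes each node records its outputs as a single dictionary; modules that return several tensors may be recorded differently.

```python
import json


def iter_modules(node, depth=0):
    """Yield (depth, module_path, output_shape) for a node and its descendants."""
    outputs = node.get("outputs")
    # Only read a shape when outputs is a single tensor-summary dictionary.
    shape = outputs.get("shape") if isinstance(outputs, dict) else None
    yield depth, node.get("module_path"), shape
    for child in node.get("children") or []:
        yield from iter_modules(child, depth + 1)


# Tiny hand-written stand-in for the contents of a *_SUMMARY.json file.
summary = json.loads("""
{
  "module_path": "ToyModel",
  "inputs": {"args": [], "kwargs": {}},
  "children": [
    {"module_path": "ToyModel.embed_tokens",
     "inputs": {"args": [{"shape": "torch.Size([1, 4])", "dtype": "torch.int64"}]},
     "outputs": {"shape": "torch.Size([1, 4, 8])", "dtype": "torch.float32"}},
    {"module_path": "ToyModel.vision_tower",
     "inputs": {"args": []},
     "children": []}
  ]
}
""")

rows = list(iter_modules(summary))
for depth, path, shape in rows:
    # Indent each module by its nesting depth.
    print("  " * depth + f"{path}: {shape}")
```

For a real trace, the toy string would be replaced by the result of json.load on the generated summary file.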
The _FULL_TENSORS.json file shows a full view of every tensor, which is useful for comparing two files.
"pixel_values": {
"shape": "torch.Size([1, 5, 576, 588])",
"dtype": "torch.float32",
"value": [
"tensor([[[[-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
" [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
" [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
" ...,",
" [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
" [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
" [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00]],",
"",
" [[-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
" [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
" [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
" ...,",
" [-1.4857e+00, -1.4820e+00, -1.2100e+00, ..., -6.0979e-01, -5.9650e-01, -3.8527e-01],",
" [-1.6755e+00, -1.7221e+00, -1.4518e+00, ..., -7.5577e-01, -7.4658e-01, -5.5592e-01],",
" [-7.9957e-01, -8.2162e-01, -5.7014e-01, ..., -1.3689e+00, -1.3169e+00, -1.0678e+00]],",
"",
" [[-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
" [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
" [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
" ...,",
" [-3.0322e-01, -5.0645e-01, -5.8436e-01, ..., -6.2439e-01, -7.9160e-01, -8.1188e-01],",
" [-4.4921e-01, -6.5653e-01, -7.2656e-01, ..., -3.4702e-01, -5.2146e-01, -5.1326e-01],",
" [-3.4702e-01, -5.3647e-01, -5.4170e-01, ..., -1.0915e+00, -1.1968e+00, -1.0252e+00]],",
"",
" [[-1.1207e+00, -1.2718e+00, -1.0678e+00, ..., 1.2013e-01, -1.3126e-01, -1.7197e-01],",
" [-6.9738e-01, -9.1166e-01, -8.5454e-01, ..., -5.5050e-02, -2.8134e-01, -4.2793e-01],",
" [-3.4702e-01, -5.5148e-01, -5.8436e-01, ..., 1.9312e-01, -8.6235e-02, -2.1463e-01],",
" ...,",
" [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
" [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
" [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00]],",
"",
" [[-1.0039e+00, -9.5669e-01, -6.5546e-01, ..., -1.4711e+00, -1.4219e+00, -1.1389e+00],",
" [-1.0039e+00, -9.5669e-01, -6.5546e-01, ..., -1.7193e+00, -1.6771e+00, -1.4091e+00],",
" [-1.6317e+00, -1.6020e+00, -1.2669e+00, ..., -1.2667e+00, -1.2268e+00, -8.9720e-01],",
" ...,",
" [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
" [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00],",
" [-1.7923e+00, -1.7521e+00, -1.4802e+00, ..., -1.7923e+00, -1.7521e+00, -1.4802e+00]]]], device='cuda:0')"
],
"mean": "tensor(-8.9514e-01, device='cuda:0')",
"std": "tensor(9.2586e-01, device='cuda:0')",
"min": "tensor(-1.7923e+00, device='cuda:0')",
"max": "tensor(1.8899e+00, device='cuda:0')"
},
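Saving tensors to disk
Some model adders may benefit from logging full tensor values to disk, for example to support numerical analysis across implementations.
Set use_repr=False to write tensors to disk using SafeTensors.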
with model_addition_debugger_context(
    model,
    debug_path="optional_path_to_your_directory",
    do_prune_layers=False,
    use_repr=False,  # Defaults to True
):
    output = model.forward(**inputs)
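When use_repr=False is used, tensors are written to the same disk location as the _SUMMARY.json and _FULL_TENSORS.json files. The value property of each entry in the _FULL_TENSORS.json file then contains a relative path reference to the associated .safetensors file. Each tensor is written to its own file as the data property of the state dictionary. File names use the module_path as a prefix, followed by a few possible suffixes that are built recursively.
- Module inputs are denoted with _inputs and outputs with _outputs.
- list and tuple instances, such as args or function return values, are suffixed with _{index}.
- dict instances are suffixed with _{key}.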
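The suffix rules above can be sketched as a small recursive function. This is only an illustration of the naming scheme as described here, not the library's own code: the function name tensor_file_stems, the placeholder strings standing in for tensors, and the exact way the args and kwargs keys appear in the name are all assumptions.

```python
def tensor_file_stems(prefix, value):
    """Recursively build one hypothetical file stem per leaf found in `value`."""
    if isinstance(value, (list, tuple)):
        stems = []
        for index, item in enumerate(value):
            # list and tuple members get an _{index} suffix
            stems.extend(tensor_file_stems(f"{prefix}_{index}", item))
        return stems
    if isinstance(value, dict):
        stems = []
        for key, item in value.items():
            # dict members get an _{key} suffix
            stems.extend(tensor_file_stems(f"{prefix}_{key}", item))
        return stems
    # Anything else is treated as a leaf (a tensor in the real tool).
    return [prefix]


module_path = "ToyModel.vision_tower"
inputs = {"args": ["<tensor>"], "kwargs": {"pixel_values": "<tensor>"}}
outputs = ("<tensor>", "<tensor>")

# Inputs and outputs are distinguished by the _inputs / _outputs marker.
input_stems = tensor_file_stems(f"{module_path}_inputs", inputs)
output_stems = tensor_file_stems(f"{module_path}_outputs", outputs)

for stem in input_stems + output_stems:
    print(stem + ".safetensors")
```

The file names actually produced by the debugger should be read from the value entries in _FULL_TENSORS.json rather than reconstructed this way.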
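Comparing between implementations
Once the debugger has traced the forward passes of two models, the json output files can be compared. In the example comparison, the key projection layers of the two implementations differ slightly: their inputs are mostly identical, but not quite. Looking through the file differences makes it easier to pinpoint which layer is wrong.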
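Besides a textual diff, the two summaries can be compared programmatically. The following is a minimal sketch under several assumptions: both traces use identical module_path values, each node stores its outputs as a single dictionary, and statistics are strings such as "tensor(-8.9514e-01, device='cuda:0')" as in the excerpt above. The helpers stat_value, flatten, and compare and the tolerance are hypothetical choices for this example.

```python
import re

# Pull the leading number out of a string like "tensor(-8.9514e-01, device='cuda:0')".
_STAT = re.compile(r"tensor\(\s*([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)")


def stat_value(text):
    match = _STAT.match(text)
    return float(match.group(1)) if match else None


def flatten(node, out=None):
    """Map every module_path in the tree to its recorded outputs entry."""
    out = {} if out is None else out
    out[node["module_path"]] = node.get("outputs")
    for child in node.get("children") or []:
        flatten(child, out)
    return out


def compare(summary_a, summary_b, keys=("mean", "std", "min", "max"), tol=1e-4):
    """Return (module_path, field) pairs whose recorded outputs disagree."""
    a, b = flatten(summary_a), flatten(summary_b)
    mismatches = []
    for path, out_a in a.items():
        out_b = b.get(path)
        if not isinstance(out_a, dict) or not isinstance(out_b, dict):
            continue  # nothing comparable was recorded for this module
        if out_a.get("shape") != out_b.get("shape"):
            mismatches.append((path, "shape"))
            continue
        for key in keys:
            va = stat_value(out_a.get(key, ""))
            vb = stat_value(out_b.get(key, ""))
            if va is not None and vb is not None and abs(va - vb) > tol:
                mismatches.append((path, key))
    return mismatches


def toy_summary(k_mean):
    # Hand-written stand-in for a parsed *_SUMMARY.json file.
    return {
        "module_path": "Toy",
        "children": [
            {"module_path": "Toy.q_proj",
             "outputs": {"shape": "torch.Size([1, 4])", "mean": "tensor(1.0000e-01)"}},
            {"module_path": "Toy.k_proj",
             "outputs": {"shape": "torch.Size([1, 4])", "mean": f"tensor({k_mean})"}},
        ],
    }


mismatches = compare(toy_summary("2.0000e-01"), toy_summary("2.5000e-01"))
print(mismatches)
```

Summary statistics can coincide even when individual values differ, so a clean result here does not replace a comparison of the full tensors.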
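Limitations and scope
This feature only works for torch-based models. Supporting, say, jax-based models, which are usually compiled, would require more work and a case-by-case approach. Models that rely heavily on external kernel calls may work, but the trace will probably miss some things. Regardless, any Python implementation that aims to mimic another implementation can be traced once, instead of being rerun N times with breakpoints.
If you pass do_prune_layers=False to your model debugger, all the layers are written to the json output. Otherwise, only the first and last layers are shown. Passing False is useful when some layers (typically cross-attention) only appear after N layers.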
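To make the effect of pruning concrete, here is a rough sketch of keeping only the first and last numbered layer among a node's children. It is not the library's pruning logic: the function names, the assumption that repeated layers end in a numeric index (for example layers.0, layers.1), and the choice to keep every non-numbered child are all hypothetical.

```python
import re

# A trailing ".<number>" is taken to mark a repeated layer, e.g. "Toy.layers.3".
LAYER_INDEX = re.compile(r"\.(\d+)$")


def layer_index(module_path):
    match = LAYER_INDEX.search(module_path)
    return int(match.group(1)) if match else None


def prune_children(children):
    """Keep non-numbered children plus the first and last numbered layer."""
    indices = [layer_index(child["module_path"]) for child in children]
    numbered = [i for i in indices if i is not None]
    if len(numbered) <= 2:
        return list(children)  # nothing to prune
    keep = {min(numbered), max(numbered)}
    return [c for c, i in zip(children, indices) if i is None or i in keep]


children = [{"module_path": f"Toy.layers.{i}"} for i in range(4)]
children.append({"module_path": "Toy.norm"})

pruned = prune_children(children)
print([child["module_path"] for child in pruned])
```

With this kind of pruning, a layer that only appears in the middle of the stack would be dropped from the output, which is why do_prune_layers=False matters in that case.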
transformers.model_addition_debugger_context
( model, debug_path: typing.Optional[str] = None, do_prune_layers: typing.Optional[bool] = True, use_repr: typing.Optional[bool] = True )
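Model addition debugger - context manager for model adders
This context manager is a power-user tool intended for model adders.
It tracks every forward call made during a model's forward pass and logs a slice of each input and output in a nested JSON file. If use_repr=True (the default), the JSON file records the repr() version of each tensor as a list of strings. If use_repr=False, the full tensors are stored in separate SafeTensors files and the JSON file provides a relative path to each file.
Note that this context manager enforces torch.no_grad().
Usage
Add the context manager to a model to debug it: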
import torch
from PIL import Image
from transformers import LlavaProcessor, LlavaForConditionalGeneration, model_addition_debugger_context
torch.random.manual_seed(673)
# load pretrained model and processor
model_id = "llava-hf/llava-1.5-7b-hf"
processor = LlavaProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)
# create random image input
random_image = Image.fromarray(torch.randint(0, 256, (224, 224, 3), dtype=torch.uint8).numpy())
# prompt
prompt = "<image>Describe this image."
# process inputs
inputs = processor(text=prompt, images=random_image, return_tensors="pt")
# call forward method (not .generate!)
with model_addition_debugger_context(model, debug_path="Your_debug_path", do_prune_layers=False):
    output = model.forward(**inputs)