DePlot

Overview

DePlot was proposed in the paper DePlot: One-shot visual language reasoning by plot-to-table translation by Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, and Yasemin Altun.

The abstract of the paper states the following:

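Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples and their reasoning capabilities are still much limited, especially on complex human-written queries. This paper presents the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key in this method is a modality conversion module, named as DePlot, which translates the image of a plot or chart to a linearized table. The output of DePlot can then be directly used to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs. To obtain DePlot, we standardize the plot-to-table task by establishing unified task formats and metrics, and train DePlot end-to-end on this task. DePlot can then be used off-the-shelf together with LLMs in a plug-and-play fashion. Compared with a SOTA model finetuned on more than 28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over finetuned SOTA on human-written queries from the task of chart QA.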

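DePlot is a model trained using the `Pix2Struct` architecture. You can find more information about `Pix2Struct` in the Pix2Struct documentation. DePlot is a Visual Question Answering subset of the `Pix2Struct` architecture: it renders the input question on the image and predicts the answer.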
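
The second step described in the abstract, reasoning over the translated table with an LLM, can be illustrated with a minimal sketch. It only assembles a one-shot prompt as a string; the function name `build_one_shot_prompt`, the prompt wording, and the sample tables are hypothetical and are not taken from the paper or from the Transformers library. Sending the prompt to an actual LLM is left out.

```python
def build_one_shot_prompt(example_table, example_question, example_answer,
                          table, question):
    """Assemble a one-shot prompt: one solved example, then the new query.

    The layout is an illustrative assumption, not the prompt used in the paper.
    """
    return (
        "Read the table and answer the question.\n\n"
        f"Table:\n{example_table}\n"
        f"Question: {example_question}\n"
        f"Answer: {example_answer}\n\n"
        f"Table:\n{table}\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical tables standing in for DePlot output.
prompt = build_one_shot_prompt(
    example_table="Year | Units\n2020 | 10\n2021 | 15",
    example_question="In which year were more units sold?",
    example_answer="2021",
    table="City | Population\nA | 300\nB | 120",
    question="Which city has the larger population?",
)
print(prompt)
```

The prompt ends with an open `Answer:` line so that the LLM completes it for the new table, following the pattern of the solved example.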

Usage example

Currently one checkpoint is available for DePlot:

`google/deplot`: DePlot fine-tuned on the ChartQA dataset

from transformers import AutoProcessor, Pix2StructForConditionalGeneration
import requests
from PIL import Image

model = Pix2StructForConditionalGeneration.from_pretrained("google/deplot")
processor = AutoProcessor.from_pretrained("google/deplot")
url = "https://raw.githubusercontent.com/vis-nlp/ChartQA/main/ChartQA%20Dataset/val/png/5090.png"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, text="Generate underlying data table of the figure below:", return_tensors="pt")
predictions = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(predictions[0], skip_special_tokens=True))
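
The decoded prediction is a linearized table in a single string. As a rough sketch, it can be split back into rows and cells. This assumes that rows are separated by the literal token `<0x0A>` and cells by `|`, which is how DePlot output commonly appears but is not stated on this page; the helper `parse_linearized_table` and the sample string are hypothetical.

```python
def parse_linearized_table(text, row_sep="<0x0A>", cell_sep="|"):
    """Split a linearized table string into a list of rows of cell strings.

    The separators are assumptions about the output format; adjust them
    if the decoded text uses different markers.
    """
    rows = []
    for line in text.split(row_sep):
        cells = [cell.strip() for cell in line.split(cell_sep)]
        if any(cells):  # skip rows in which every cell is empty
            rows.append(cells)
    return rows


# Made-up sample in the assumed format, not real model output.
sample = "TITLE | Sales <0x0A> Year | Units <0x0A> 2020 | 10 <0x0A> 2021 | 15"
rows = parse_linearized_table(sample)
for row in rows:
    print(row)
```

In practice, `sample` would be replaced by the string returned from `processor.decode(...)` in the example above.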

Fine-tuning

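To fine-tune DePlot, refer to the pix2struct fine-tuning notebook. For `Pix2Struct` models, we have found that fine-tuning with Adafactor and a cosine learning rate scheduler leads to faster convergence: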

from transformers.optimization import Adafactor, get_cosine_schedule_with_warmup

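# `model` is the Pix2StructForConditionalGeneration instance being fine-tuned;
# inside a training module class, pass `self.parameters()` instead.
optimizer = Adafactor(model.parameters(), scale_parameter=False, relative_step=False, lr=0.01, weight_decay=1e-05)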
scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=1000, num_training_steps=40000)
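
To see what this schedule does to the learning rate, the sketch below computes the multiplier applied to the base `lr` at a given step: a linear warmup followed by a half-cosine decay to zero. It is a plain-Python approximation of the schedule's shape under the default settings, not the library's implementation, and the function name `cosine_with_warmup_factor` is hypothetical.

```python
import math


def cosine_with_warmup_factor(step, num_warmup_steps, num_training_steps):
    """Return the learning-rate multiplier at `step`.

    Linear warmup from 0 to 1, then a half-cosine decay from 1 to 0.
    """
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


base_lr = 0.01  # same base learning rate as in the snippet above
for step in (0, 500, 1000, 20500, 40000):
    factor = cosine_with_warmup_factor(step, 1000, 40000)
    print(step, base_lr * factor)
```

With 1000 warmup steps out of 40000, the learning rate reaches its peak of 0.01 at step 1000 and decays smoothly afterwards.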

DePlot is a model trained using the `Pix2Struct` architecture. For the API reference, see the `Pix2Struct` documentation.
