MatCha

Overview

MatCha was proposed in the paper MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering by Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, and Julian Martin Eisenschlos.

The abstract of the paper reads:

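Visual language data such as plots, charts, and infographics are ubiquitous in the human world. However, state-of-the-art vision-language models do not perform well on these data. We propose MatCha (Math reasoning and Chart derendering pretraining) to enhance visual language models' capabilities in jointly modeling charts/plots and language data. Specifically, we propose several pretraining tasks that cover plot deconstruction and numerical reasoning, which are the key capabilities in visual language modeling. We perform the MatCha pretraining starting from Pix2Struct, a recently proposed image-to-text visual language model. On standard benchmarks such as PlotQA and ChartQA, the MatCha model outperforms state-of-the-art methods by as much as nearly 20%. We also examine how well MatCha pretraining transfers to domains such as screenshots, textbook diagrams, and document figures, and observe overall improvement, verifying the usefulness of MatCha pretraining on broader visual language tasks.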

Model description

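MatCha is trained using the Pix2Struct architecture; see the Pix2Struct documentation for more information about that architecture. MatCha is the visual question answering subset of Pix2Struct: it renders the input question onto the image and predicts the answer.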
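To make the idea of "rendering the question onto the image" concrete, here is a minimal sketch using PIL. The helper `render_question_on_image` and its `header_height` parameter are hypothetical and exist only for illustration; the actual Pix2Struct processor does its own text rendering and patch extraction internally, and its layout will differ from this one.

```python
from PIL import Image, ImageDraw


def render_question_on_image(image, question, header_height=40):
    """Return a new image with `question` drawn in a white band above `image`.

    Hypothetical helper for illustration only; it is NOT the rendering
    routine used by the Pix2Struct processor.
    """
    width, height = image.size
    canvas = Image.new("RGB", (width, height + header_height), "white")
    # Draw the question in the header band using PIL's default font.
    ImageDraw.Draw(canvas).text((5, 5), question, fill="black")
    # Paste the original chart below the header band.
    canvas.paste(image.convert("RGB"), (0, header_height))
    return canvas


# Stand-in for a chart: a plain gray 200x100 image.
chart = Image.new("RGB", (200, 100), "gray")
combined = render_question_on_image(chart, "Is the sum of all 4 places greater than Laos?")
print(combined.size)  # (200, 140)
```

Because the question becomes part of the pixels, the model sees a single image input and needs no separate text encoder for the question.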

Usage

Six MatCha checkpoints are currently available:

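google/matcha: the base MatCha model, intended for fine-tuning on downstream tasks.
google/matcha-chartqa: MatCha fine-tuned on the ChartQA dataset. It can be used to answer questions about charts.
google/matcha-plotqa-v1: MatCha fine-tuned on the PlotQA dataset. It can be used to answer questions about plots.
google/matcha-plotqa-v2: MatCha fine-tuned on the PlotQA dataset. It can be used to answer questions about plots.
google/matcha-chart2text-statista: MatCha fine-tuned on the Statista dataset.
google/matcha-chart2text-pew: MatCha fine-tuned on the Pew dataset.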

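The models fine-tuned on chart2text-pew and chart2text-statista are better suited to summarization, whereas the models fine-tuned on plotqa and chartqa are better suited to question answering.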
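The guidance above can be captured in a small lookup table. This is only a sketch: the checkpoint names are copied from the list above (verify them on the Hub before use), while the task labels `"qa"` and `"summarization"` and the function `checkpoints_for` are hypothetical names introduced for illustration.

```python
# Checkpoint names copied from the list above; task labels are illustrative.
CHECKPOINTS_BY_TASK = {
    "qa": [
        "google/matcha-chartqa",
        "google/matcha-plotqa-v1",
        "google/matcha-plotqa-v2",
    ],
    "summarization": [
        "google/matcha-chart2text-statista",
        "google/matcha-chart2text-pew",
    ],
}


def checkpoints_for(task):
    """Return the fine-tuned checkpoints suggested for `task` (hypothetical helper)."""
    try:
        return CHECKPOINTS_BY_TASK[task]
    except KeyError:
        valid = ", ".join(sorted(CHECKPOINTS_BY_TASK))
        raise ValueError(f"Unknown task {task!r}; expected one of: {valid}") from None


print(checkpoints_for("qa"))
print(checkpoints_for("summarization"))
```

The base checkpoint is deliberately left out of the table, since it is meant as a starting point for fine-tuning rather than for direct inference.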

You can use these models as follows (this example uses the ChartQA checkpoint):

from transformers import AutoProcessor, Pix2StructForConditionalGeneration
import requests
import torch
from PIL import Image

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = Pix2StructForConditionalGeneration.from_pretrained("google/matcha-chartqa").to(device)
processor = AutoProcessor.from_pretrained("google/matcha-chartqa")

# Download an example chart from the ChartQA validation set.
url = "https://raw.githubusercontent.com/vis-nlp/ChartQA/main/ChartQA%20Dataset/val/png/20294671002019.png"
image = Image.open(requests.get(url, stream=True).raw)

# The processor renders the question onto the image before encoding it.
inputs = processor(images=image, text="Is the sum of all 4 places greater than Laos?", return_tensors="pt").to(device)
predictions = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(predictions[0], skip_special_tokens=True))

Fine-tuning

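To fine-tune MatCha, refer to the Pix2Struct fine-tuning notebook. For Pix2Struct models, we have found that fine-tuning with Adafactor and a cosine learning rate scheduler leads to faster convergence: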

from transformers.optimization import Adafactor, get_cosine_schedule_with_warmup

# `model` is the Pix2StructForConditionalGeneration instance being fine-tuned.
# Inside a training-module class, pass self.parameters() instead.
optimizer = Adafactor(model.parameters(), scale_parameter=False, relative_step=False, lr=0.01, weight_decay=1e-05)
scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=1000, num_training_steps=40000)
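To get a feel for what this schedule does to the learning rate, the sketch below computes the multiplier by hand in plain Python. It is intended to mirror the shape of a cosine schedule with linear warmup under the scheduler's default single half-cosine setting; the function name `cosine_with_warmup_multiplier` is hypothetical, and this is an illustration rather than the library's own code.

```python
import math


def cosine_with_warmup_multiplier(step, num_warmup_steps=1000, num_training_steps=40000):
    """Learning-rate multiplier at `step`: linear warmup, then half-cosine decay to 0.

    Hypothetical helper for illustration; assumes a single half-cosine cycle.
    """
    if step < num_warmup_steps:
        # Linear ramp from 0 to 1 over the warmup steps.
        return step / max(1, num_warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


base_lr = 0.01  # same value as the `lr` passed to Adafactor above
for step in (0, 500, 1000, 20500, 40000):
    print(step, base_lr * cosine_with_warmup_multiplier(step))
```

With the settings used above, the learning rate climbs to 0.01 over the first 1000 steps, passes through half that value midway through the decay phase, and reaches zero at step 40000.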

