SuperGlue

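Overview

The SuperGlue model was proposed in SuperGlue: Learning Feature Matching with Graph Neural Networks by Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz and Andrew Rabinovich.

The model matches two sets of interest points detected in a pair of images. Paired with the SuperPoint model, it can be used to match two images and estimate the pose between them. It is useful for tasks such as image matching and homography estimation.

The abstract from the paper is the following:

This paper introduces SuperGlue, a neural network that matches two sets of local features by jointly finding correspondences and rejecting non-matchable points. Assignments are estimated by solving a differentiable optimal transport problem, whose costs are predicted by a graph neural network. We introduce a flexible context aggregation mechanism based on attention, enabling SuperGlue to reason about the underlying 3D scene and feature assignments jointly. Compared to traditional, hand-designed heuristics, our technique learns priors over geometric transformations and regularities of the 3D world through end-to-end training from image pairs. SuperGlue outperforms other learned approaches and achieves state-of-the-art results on the task of pose estimation in challenging real-world indoor and outdoor environments. The proposed method performs matching in real-time on a modern GPU and can be readily integrated into modern SfM or SLAM systems. The code and trained weights are publicly available at this URL.

How to use

Here is a quick example of using the model. Since this is an image matching model, it requires pairs of images to be matched. The raw outputs contain the list of keypoints detected by the keypoint detector as well as the list of matches with their corresponding matching scores.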

from transformers import AutoImageProcessor, AutoModel
import torch
from PIL import Image
import requests

url_image1 = "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_98169888_3347710852.jpg"
image1 = Image.open(requests.get(url_image1, stream=True).raw)
url_image2 = "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_26757027_6717084061.jpg"
image2 = Image.open(requests.get(url_image2, stream=True).raw)

images = [image1, image2]

processor = AutoImageProcessor.from_pretrained("magic-leap-community/superglue_outdoor")
model = AutoModel.from_pretrained("magic-leap-community/superglue_outdoor")

inputs = processor(images, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

You can use the post_process_keypoint_matching method of SuperGlueImageProcessor to get the keypoints and matches in a more readable format.

image_sizes = [[(image.height, image.width) for image in images]]
outputs = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)
for i, output in enumerate(outputs):
    print("For the image pair", i)
    for keypoint0, keypoint1, matching_score in zip(
            output["keypoints0"], output["keypoints1"], output["matching_scores"]
    ):
        print(
            f"Keypoint at coordinate {keypoint0.numpy()} in the first image matches with keypoint at coordinate {keypoint1.numpy()} in the second image with a score of {matching_score}."
        )
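The image_sizes argument above nests one list of (height, width) tuples per image pair. As a minimal sketch of how that nesting extends to several pairs, the snippet below uses a hypothetical FakeImage stand-in (not part of PIL or Transformers) so that it runs without downloading any images. The sizes are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in exposing the same .height / .width attributes as a PIL image.
FakeImage = namedtuple("FakeImage", ["height", "width"])

# Two image pairs with made-up sizes.
pairs = [
    [FakeImage(480, 640), FakeImage(600, 800)],
    [FakeImage(720, 1280), FakeImage(1080, 1920)],
]

# One inner list per pair, one (height, width) tuple per image,
# which corresponds to a (batch_size, 2, 2) layout.
image_sizes = [[(image.height, image.width) for image in pair] for pair in pairs]
print(image_sizes)
```

With real PIL images the same list comprehension applies unchanged, since PIL images expose height and width attributes.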

From the outputs, you can visualize the matches between the two images using the following code:

import matplotlib.pyplot as plt
import numpy as np

# Create side by side image
merged_image = np.zeros((max(image1.height, image2.height), image1.width + image2.width, 3))
merged_image[: image1.height, : image1.width] = np.array(image1) / 255.0
merged_image[: image2.height, image1.width :] = np.array(image2) / 255.0
plt.imshow(merged_image)
plt.axis("off")

# Retrieve the keypoints and matches
output = outputs[0]
keypoints0 = output["keypoints0"]
keypoints1 = output["keypoints1"]
matching_scores = output["matching_scores"]
keypoints0_x, keypoints0_y = keypoints0[:, 0].numpy(), keypoints0[:, 1].numpy()
keypoints1_x, keypoints1_y = keypoints1[:, 0].numpy(), keypoints1[:, 1].numpy()

# Plot the matches
for keypoint0_x, keypoint0_y, keypoint1_x, keypoint1_y, matching_score in zip(
        keypoints0_x, keypoints0_y, keypoints1_x, keypoints1_y, matching_scores
):
    plt.plot(
        [keypoint0_x, keypoint1_x + image1.width],
        [keypoint0_y, keypoint1_y],
        color=plt.get_cmap("RdYlGn")(matching_score.item()),
        alpha=0.9,
        linewidth=0.5,
    )
    plt.scatter(keypoint0_x, keypoint0_y, c="black", s=2)
    plt.scatter(keypoint1_x + image1.width, keypoint1_y, c="black", s=2)

# Save the plot
plt.savefig("matched_image.png", dpi=300, bbox_inches='tight')
plt.close()
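Matched keypoints can also feed a geometric estimate such as a homography. The following is a minimal sketch of the direct linear transform (DLT) in NumPy, checked on synthetic points rather than real model outputs. The helper name estimate_homography and all point values are assumptions made for illustration, and this is not part of the Transformers API.

```python
import numpy as np


def estimate_homography(points0, points1):
    """Direct linear transform: solve for H such that points1 ~ H @ points0."""
    rows = []
    for (x, y), (u, v) in zip(points0, points1):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    homography = vt[-1].reshape(3, 3)
    return homography / homography[2, 2]


# Synthetic check: project made-up points through a known homography.
true_h = np.array([[1.2, 0.1, 5.0], [-0.05, 0.9, -3.0], [0.01, 0.02, 1.0]])
points0 = np.array([[0, 0], [10, 0], [10, 8], [0, 8], [5, 3], [2, 7]], dtype=np.float64)
homogeneous = np.hstack([points0, np.ones((len(points0), 1))]) @ true_h.T
points1 = homogeneous[:, :2] / homogeneous[:, 2:]

estimated_h = estimate_homography(points0, points1)
print(np.round(estimated_h, 3))
```

With real outputs, points0 and points1 would come from output["keypoints0"].numpy() and output["keypoints1"].numpy(). Learned matches can still contain outliers, so a robust estimator such as RANSAC is usually preferable to a plain least-squares fit.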

This model was contributed by stevenbucaille. The original code can be found here.

SuperGlueConfig

class transformers.SuperGlueConfig

( keypoint_detector_config: SuperPointConfig = None hidden_size: int = 256 keypoint_encoder_sizes: typing.Optional[list[int]] = None gnn_layers_types: typing.Optional[list[str]] = None num_attention_heads: int = 4 sinkhorn_iterations: int = 100 matching_threshold: float = 0.0 initializer_range: float = 0.02 **kwargs )

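Parameters

keypoint_detector_config (Union[AutoConfig, dict], optional, defaults to SuperPointConfig): The config object or dictionary of the keypoint detector.
hidden_size (int, optional, defaults to 256): The dimension of the descriptors.
keypoint_encoder_sizes (list[int], optional, defaults to [32, 64, 128, 256]): The sizes of the keypoint encoder layers.
gnn_layers_types (list[str], optional, defaults to ['self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross']): The types of the GNN layers. Each entry must be either 'self' or 'cross'.
num_attention_heads (int, optional, defaults to 4): The number of attention heads in the GNN layers.
sinkhorn_iterations (int, optional, defaults to 100): The number of Sinkhorn iterations.
matching_threshold (float, optional, defaults to 0.0): The threshold used to filter out low-scoring matches.
initializer_range (float, optional, defaults to 0.02): The standard deviation of the truncated_normal_initializer for initializing all weight matrices.

This is the configuration class to store the configuration of a SuperGlueModel. It is used to instantiate a SuperGlue model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the SuperGlue magic-leap-community/superglue_indoor architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.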
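To make the relationship between some of these parameters concrete, here is a small sketch in plain Python. The check_settings helper is hypothetical and does not reproduce the library's own validation. In particular, the divisibility requirement between hidden_size and num_attention_heads is an assumption based on how multi-head attention usually splits features.

```python
# Default values as listed in the parameter descriptions.
hidden_size = 256
num_attention_heads = 4
gnn_layers_types = ["self", "cross"] * 9  # alternating layers, 18 in total


def check_settings(hidden_size, num_attention_heads, gnn_layers_types):
    """Hypothetical sanity checks, not the library's own validation."""
    if any(layer not in ("self", "cross") for layer in gnn_layers_types):
        raise ValueError("each GNN layer type must be 'self' or 'cross'")
    # Assumption: the descriptor is split evenly across attention heads.
    if hidden_size % num_attention_heads != 0:
        raise ValueError("hidden_size should be divisible by num_attention_heads")
    return hidden_size // num_attention_heads


head_dim = check_settings(hidden_size, num_attention_heads, gnn_layers_types)
print(len(gnn_layers_types), head_dim)  # 18 64
```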

Example

>>> from transformers import SuperGlueConfig, SuperGlueModel

>>> # Initializing a SuperGlue superglue style configuration
>>> configuration = SuperGlueConfig()

>>> # Initializing a model from the superglue style configuration
>>> model = SuperGlueModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
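The sinkhorn_iterations setting controls how many normalization rounds are applied when turning pairwise scores into a soft assignment. The following is a minimal sketch of the basic Sinkhorn iteration in plain Python, assuming a small square score matrix with made-up values. It omits the handling of unmatched points described in the paper and is not the library's implementation.

```python
import math


def sinkhorn(scores, iterations=100):
    """Alternately normalize rows and columns of exp(scores)."""
    # Start from strictly positive weights.
    plan = [[math.exp(value) for value in row] for row in scores]
    n_rows, n_cols = len(plan), len(plan[0])
    for _ in range(iterations):
        # Row normalization: each row sums to 1.
        for i in range(n_rows):
            total = sum(plan[i])
            plan[i] = [value / total for value in plan[i]]
        # Column normalization: each column sums to 1.
        for j in range(n_cols):
            total = sum(plan[i][j] for i in range(n_rows))
            for i in range(n_rows):
                plan[i][j] /= total
    return plan


# Made-up similarity scores between three keypoints in each image.
scores = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.0, 0.4, 1.8]]
assignment = sinkhorn(scores, iterations=100)
row_sums = [sum(row) for row in assignment]
col_sums = [sum(assignment[i][j] for i in range(3)) for j in range(3)]
print(row_sums, col_sums)
```

After enough iterations both the row sums and the column sums approach 1, so each entry can be read as a soft matching weight between a keypoint in one image and a keypoint in the other.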

SuperGlueImageProcessor

class transformers.SuperGlueImageProcessor

( do_resize: bool = True size: typing.Optional[dict[str, int]] = None resample: Resampling = <Resampling.BILINEAR: 2> do_rescale: bool = True rescale_factor: float = 0.00392156862745098 do_grayscale: bool = True **kwargs )

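Parameters

do_resize (bool, optional, defaults to True): Controls whether to resize the image's (height, width) dimensions to the specified size. Can be overridden by do_resize in the preprocess method.
size (dict[str, int], optional, defaults to {"height": 480, "width": 640}): Resolution of the output image after resize is applied. Only has an effect if do_resize is set to True. Can be overridden by size in the preprocess method.
resample (PILImageResampling, optional, defaults to Resampling.BILINEAR): Resampling filter to use if resizing the image. Can be overridden by resample in the preprocess method.
do_rescale (bool, optional, defaults to True): Whether to rescale the image by the specified rescale_factor. Can be overridden by do_rescale in the preprocess method.
rescale_factor (int or float, optional, defaults to 1/255): Scale factor to use if rescaling the image. Can be overridden by rescale_factor in the preprocess method.
do_grayscale (bool, optional, defaults to True): Whether to convert the image to grayscale. Can be overridden by do_grayscale in the preprocess method.

Constructs a SuperGlue image processor.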
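As a rough illustration of what the do_rescale and do_grayscale steps mean numerically, the sketch below applies both to a made-up 2x2 RGB array with NumPy. Resizing is left out, and the luma weights used for the grayscale conversion are an assumption (ITU-R BT.601), not necessarily the ones used by the processor.

```python
import numpy as np

# A made-up 2x2 RGB image with 8-bit pixel values, in (height, width, channels) layout.
image = np.array(
    [[[255, 0, 0], [0, 255, 0]], [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8
)

# do_rescale: multiply by rescale_factor to bring values into [0, 1].
rescale_factor = 1 / 255
rescaled = image.astype(np.float32) * rescale_factor

# do_grayscale: weighted sum over channels (ITU-R BT.601 luma weights, an assumption).
luma_weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
gray = rescaled @ luma_weights

print(gray.shape, float(gray.min()), float(gray.max()))
```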

post_process_keypoint_matching

( outputs: KeypointMatchingOutput target_sizes: typing.Union[transformers.utils.generic.TensorType, list[tuple]] threshold: float = 0.0 ) → list[Dict]

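Parameters

outputs (KeypointMatchingOutput): Raw outputs of the model.
target_sizes (torch.Tensor or list[tuple[tuple[int, int]]]): Tensor of shape (batch_size, 2, 2) or list of tuples (tuple[int, int]) containing the target size (height, width) of each image in the batch. This must be the original image size (before any processing).
threshold (float, optional, defaults to 0.0): The threshold used to filter out low-scoring matches.

Returns

list[Dict]

A list of dictionaries, each containing the keypoints in the first and second image of the pair, the matching scores and the matching indices.

Converts the raw output of KeypointMatchingOutput into lists of matched keypoints and matching scores, with coordinates expressed relative to the original image sizes.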
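The effect of threshold can be pictured with a small sketch in plain Python. The filter_matches helper and its data are hypothetical. It assumes that each keypoint in the first image stores the index of its match in the second image and that -1 marks an unmatched keypoint. It is not the processor's implementation.

```python
def filter_matches(keypoints0, keypoints1, matches0, scores0, threshold=0.0):
    """Keep pairs whose match index is valid and whose score exceeds the threshold."""
    kept = []
    for index0, (index1, score) in enumerate(zip(matches0, scores0)):
        if index1 > -1 and score > threshold:
            kept.append((keypoints0[index0], keypoints1[index1], score))
    return kept


# Made-up data: three keypoints per image, -1 marks "no match" (an assumption).
keypoints0 = [(10, 20), (30, 40), (50, 60)]
keypoints1 = [(12, 22), (33, 41), (70, 80)]
matches0 = [1, -1, 0]
scores0 = [0.9, 0.0, 0.1]

filtered = filter_matches(keypoints0, keypoints1, matches0, scores0, threshold=0.2)
for point0, point1, score in filtered:
    print(point0, "->", point1, score)
```

Raising the threshold trades recall for precision: fewer pairs survive, but the ones that do carry higher matching scores.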

preprocess

( images do_resize: typing.Optional[bool] = None size: typing.Optional[dict[str, int]] = None resample: Resampling = None do_rescale: typing.Optional[bool] = None rescale_factor: typing.Optional[float] = None do_grayscale: typing.Optional[bool] = None return_tensors: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None data_format: ChannelDimension = <ChannelDimension.FIRST: 'channels_first'> input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None **kwargs )

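Parameters

images (ImageInput): Image pairs to preprocess. Expects either a list of 2 images or a list of lists of 2 images, with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set do_rescale=False.
do_resize (bool, optional, defaults to self.do_resize): Whether to resize the image.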
size (dict[str, int], optional, defaults to self.size): Size of the output image after resize has been applied, given as a dictionary of the form {"height": int, "width": int}. Only has an effect if do_resize is set to True.
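resample (PILImageResampling, optional, defaults to self.resample): Resampling filter to use if resizing the image. This can be one of the PILImageResampling filters. Only has an effect if do_resize is set to True.
do_rescale (bool, optional, defaults to self.do_rescale): Whether to rescale the image values to the range [0, 1].
rescale_factor (float, optional, defaults to self.rescale_factor): Rescale factor to rescale the image by if do_rescale is set to True.
do_grayscale (bool, optional, defaults to self.do_grayscale): Whether to convert the image to grayscale.
return_tensors (str or TensorType, optional): The type of tensors to return. Can be one of:
- Unset: Return a list of np.ndarray.
- TensorType.TENSORFLOW or 'tf': Return a batch of type tf.Tensor.
- TensorType.PYTORCH or 'pt': Return a batch of type torch.Tensor.
- TensorType.NUMPY or 'np': Return a batch of type np.ndarray.
- TensorType.JAX or 'jax': Return a batch of type jax.numpy.ndarray.
data_format (ChannelDimension or str, optional, defaults to ChannelDimension.FIRST): The channel dimension format for the output image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- Unset: Use the channel dimension format of the input image.
input_data_format (ChannelDimension or str, optional): The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.

Preprocess an image or a batch of images.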
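The two channel layouts named by data_format and input_data_format differ only in axis order. A minimal NumPy sketch, using a made-up blank image, shows the conversion in both directions:

```python
import numpy as np

# A made-up image in "channels_last" layout: (height, width, num_channels).
channels_last = np.zeros((480, 640, 3), dtype=np.float32)

# Convert to "channels_first": (num_channels, height, width).
channels_first = np.transpose(channels_last, (2, 0, 1))

# Convert back to "channels_last".
restored = np.transpose(channels_first, (1, 2, 0))

print(channels_last.shape, channels_first.shape, restored.shape)
```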

resize

( image: ndarray size: dict data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None **kwargs )

Parameters

image (np.ndarray): Image to resize.
size (dict[str, int]): Dictionary of the form {"height": int, "width": int}, specifying the size of the output image.
data_format (ChannelDimension or str, optional): The channel dimension format of the output image. If not provided, it will be inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.
input_data_format (ChannelDimension or str, optional): The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.

Resize an image.
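Because resizing changes the pixel grid, a point located in the resized image has to be scaled to be expressed in the original image. The helper below is a hypothetical sketch of that arithmetic in plain Python, assuming the point is given in pixel coordinates of the resized image. It is not a function of the library.

```python
def to_original_coordinates(keypoint, resized_size, original_size):
    """Map an (x, y) point from the resized image back to the original image."""
    x, y = keypoint
    resized_height, resized_width = resized_size
    original_height, original_width = original_size
    return (x * original_width / resized_width, y * original_height / resized_height)


# Made-up example: an image of 960x1280 (height x width) resized to 480x640.
point = to_original_coordinates((320, 240), (480, 640), (960, 1280))
print(point)  # (640.0, 480.0)
```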

SuperGlueForKeypointMatching

class transformers.SuperGlueForKeypointMatching

( config: SuperGlueConfig )

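Parameters

config (SuperGlueConfig): Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

SuperGlue model taking images as inputs and outputting the matching between them.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.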

forward

( pixel_values: FloatTensor labels: typing.Optional[torch.LongTensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.models.superglue.modeling_superglue.KeypointMatchingOutput or tuple(torch.FloatTensor)

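Parameters

pixel_values (torch.FloatTensor of shape (batch_size, num_channels, image_size, image_size)): The tensors corresponding to the input images. Pixel values can be obtained using SuperGlueImageProcessor. See SuperGlueImageProcessor.__call__ for details.
labels (torch.LongTensor of shape (batch_size, sequence_length), optional): Labels for computing the masked language modeling loss. Indices should either be in [0, ..., config.vocab_size] or -100 (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].
output_attentions (bool, optional): Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional): Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional): Whether or not to return a ModelOutput instead of a plain tuple.

Returns

transformers.models.superglue.modeling_superglue.KeypointMatchingOutput or tuple(torch.FloatTensor)

A transformers.models.superglue.modeling_superglue.KeypointMatchingOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (SuperGlueConfig) and inputs.

loss (torch.FloatTensor of shape (1,), optional): Loss computed during training.
matches (torch.FloatTensor of shape (batch_size, 2, num_matches)): Index of the keypoint matched in the other image.
matching_scores (torch.FloatTensor of shape (batch_size, 2, num_matches)): Scores of the predicted matches.
keypoints (torch.FloatTensor of shape (batch_size, num_keypoints, 2)): Absolute (x, y) coordinates of the predicted keypoints in a given image.
mask (torch.IntTensor of shape (batch_size, num_keypoints)): Mask indicating which values in matches and matching_scores are keypoint matching information.
hidden_states (tuple[torch.FloatTensor, ...], optional): Tuple of torch.FloatTensor (one for the output of each stage) of shape (batch_size, 2, num_channels, num_keypoints), returned when output_hidden_states=True is passed or when config.output_hidden_states=True.
attentions (tuple[torch.FloatTensor, ...], optional): Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, 2, num_heads, num_keypoints, num_keypoints), returned when output_attentions=True is passed or when config.output_attentions=True.

The SuperGlueForKeypointMatching forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.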

Example

>>> from transformers import AutoImageProcessor, AutoModel
>>> import torch
>>> from PIL import Image
>>> import requests

>>> url = "https://github.com/magicleap/SuperGluePretrainedNetwork/blob/master/assets/phototourism_sample_images/london_bridge_78916675_4568141288.jpg?raw=true"
>>> image1 = Image.open(requests.get(url, stream=True).raw)
>>> url = "https://github.com/magicleap/SuperGluePretrainedNetwork/blob/master/assets/phototourism_sample_images/london_bridge_19481797_2295892421.jpg?raw=true"
>>> image2 = Image.open(requests.get(url, stream=True).raw)
>>> images = [image1, image2]

>>> processor = AutoImageProcessor.from_pretrained("magic-leap-community/superglue_outdoor")
>>> model = AutoModel.from_pretrained("magic-leap-community/superglue_outdoor")

>>> with torch.no_grad():
...     inputs = processor(images, return_tensors="pt")
...     outputs = model(**inputs)
