SuperPoint

Overview

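The SuperPoint model was proposed in SuperPoint: Self-Supervised Interest Point Detection and Description by Daniel DeTone, Tomasz Malisiewicz and Andrew Rabinovich.

This model is the result of self-supervised training of a fully-convolutional network for interest point detection and description. It detects interest points that are repeatable under homographic transformations and provides a descriptor for each point. The model has limited use on its own, but it can serve as a feature extractor for other tasks such as homography estimation and image matching.

The abstract from the paper is the following:

This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision. As opposed to patch-based neural networks, our fully-convolutional model operates on full-sized images and jointly computes pixel-level interest point locations and associated descriptors in one forward pass. We introduce Homographic Adaptation, a multi-scale, multi-homography approach for boosting interest point detection repeatability and performing cross-domain adaptation (e.g., synthetic-to-real). Our model, when trained on the MS-COCO generic image dataset using Homographic Adaptation, is able to repeatedly detect a much richer set of interest points than the initial pre-adapted deep model and any other traditional corner detector. The final system gives rise to state-of-the-art homography estimation results on HPatches when compared to LIFT, SIFT and ORB.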

SuperPoint overview. Taken from the original paper.

Usage tips

Here is a quick example of using the model to detect interest points in an image:

from transformers import AutoImageProcessor, SuperPointForKeypointDetection
import torch
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("magic-leap-community/superpoint")
model = SuperPointForKeypointDetection.from_pretrained("magic-leap-community/superpoint")

inputs = processor(image, return_tensors="pt")
outputs = model(**inputs)

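The outputs contain the list of keypoint coordinates with their respective scores and descriptors (each descriptor is a 256-dimensional vector).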
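Since the overview mentions image matching as a downstream use, the following minimal sketch shows one common way to pair descriptors from two images: mutual nearest neighbours under cosine similarity. It is an illustration only, not part of the library. The helper names `cosine_similarity` and `mutual_nearest_neighbors` are hypothetical, and tiny 3-dimensional vectors stand in for the 256-dimensional descriptors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mutual_nearest_neighbors(descriptors_a, descriptors_b):
    """Return (i, j) pairs where a[i] and b[j] are each other's best match."""
    # Best match in B for every descriptor of A.
    best_for_a = [
        max(range(len(descriptors_b)), key=lambda j: cosine_similarity(d, descriptors_b[j]))
        for d in descriptors_a
    ]
    # Best match in A for every descriptor of B.
    best_for_b = [
        max(range(len(descriptors_a)), key=lambda i: cosine_similarity(descriptors_a[i], d))
        for d in descriptors_b
    ]
    # Keep only the pairs that agree in both directions.
    return [(i, j) for i, j in enumerate(best_for_a) if best_for_b[j] == i]


# Toy 3-dimensional descriptors (hypothetical data) for two images.
descriptors_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
descriptors_b = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0]]

matches = mutual_nearest_neighbors(descriptors_a, descriptors_b)
print(matches)  # [(0, 1), (1, 0)]
```

The third descriptor of the first image has no counterpart, so the mutual check drops it. With real descriptors a vectorized similarity matrix would be far faster than these Python loops.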

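You can also feed multiple images to the model. Because SuperPoint outputs a dynamic number of keypoints per image, the batched outputs are padded, and you need to use the mask attribute to retrieve the entries that belong to each image: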

from transformers import AutoImageProcessor, SuperPointForKeypointDetection
import torch
from PIL import Image
import requests

url_image_1 = "http://images.cocodataset.org/val2017/000000039769.jpg"
image_1 = Image.open(requests.get(url_image_1, stream=True).raw)
url_image_2 = "http://images.cocodataset.org/test-stuff2017/000000000568.jpg"
image_2 = Image.open(requests.get(url_image_2, stream=True).raw)

images = [image_1, image_2]

processor = AutoImageProcessor.from_pretrained("magic-leap-community/superpoint")
model = SuperPointForKeypointDetection.from_pretrained("magic-leap-community/superpoint")

inputs = processor(images, return_tensors="pt")
outputs = model(**inputs)
image_sizes = [(image.height, image.width) for image in images]
outputs = processor.post_process_keypoint_detection(outputs, image_sizes)

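for output in outputs:
    # Each entry holds the keypoints, scores and descriptors of one image;
    # zip() walks through them one keypoint at a time.
    for keypoint, score, descriptor in zip(output["keypoints"], output["scores"], output["descriptors"]):
        print(f"Keypoint: {keypoint}")
        print(f"Score: {score}")
        print(f"Descriptor: {descriptor}")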
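As a rough illustration of what the mask is for, the sketch below applies it by hand to padded batch tensors. The tensors are synthetic stand-ins with the shapes documented for the model output (keypoints, scores, descriptors, mask), and the variable `valid_per_image` is a name chosen for this example. It shows the idea only and is not the library's post-processing code.

```python
import torch

# Synthetic stand-ins for the padded outputs of a batch of two images.
# Image 0 has 3 real keypoints, image 1 has only 2; the last slot is padding.
keypoints = torch.rand(2, 3, 2)        # (batch_size, num_keypoints, 2)
scores = torch.rand(2, 3)              # (batch_size, num_keypoints)
descriptors = torch.rand(2, 3, 256)    # (batch_size, num_keypoints, descriptor_size)
mask = torch.tensor([[True, True, True], [True, True, False]])

valid_per_image = []
for i in range(mask.shape[0]):
    # .bool() is a no-op here, and guards against an integer-typed mask.
    valid = mask[i].bool()
    valid_per_image.append(
        {
            "keypoints": keypoints[i][valid],
            "scores": scores[i][valid],
            "descriptors": descriptors[i][valid],
        }
    )

for i, item in enumerate(valid_per_image):
    print(i, tuple(item["keypoints"].shape), tuple(item["descriptors"].shape))
```

Boolean indexing drops the padded rows, so each image ends up with tensors whose first dimension equals its own keypoint count.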

You can then plot the keypoints on the image of your choice to visualize the result:

import matplotlib.pyplot as plt

plt.axis("off")
plt.imshow(image_1)
plt.scatter(
    outputs[0]["keypoints"][:, 0],
    outputs[0]["keypoints"][:, 1],
    c=outputs[0]["scores"] * 100,
    s=outputs[0]["scores"] * 50,
    alpha=0.8
)
plt.savefig("output_image.png")

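This model was contributed by stevenbucaille. The original code can be found here.

Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with SuperPoint. If you are interested in submitting a resource to be included here, please feel free to open a Pull Request and we will review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

A notebook showcasing inference and visualization with SuperPoint can be found here. 🌎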

SuperPointConfig

class transformers.SuperPointConfig

( encoder_hidden_sizes: list = [64, 64, 128, 128] decoder_hidden_size: int = 256 keypoint_decoder_dim: int = 65 descriptor_decoder_dim: int = 256 keypoint_threshold: float = 0.005 max_keypoints: int = -1 nms_radius: int = 4 border_removal_distance: int = 4 initializer_range = 0.02 **kwargs )

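Parameters

encoder_hidden_sizes (List, optional, defaults to [64, 64, 128, 128]) — The number of channels in each convolutional layer of the encoder.
decoder_hidden_size (int, optional, defaults to 256) — The hidden size of the decoder.
keypoint_decoder_dim (int, optional, defaults to 65) — The output dimension of the keypoint decoder.
descriptor_decoder_dim (int, optional, defaults to 256) — The output dimension of the descriptor decoder.
keypoint_threshold (float, optional, defaults to 0.005) — The threshold to use for extracting keypoints.
max_keypoints (int, optional, defaults to -1) — The maximum number of keypoints to extract. If -1, all keypoints are extracted.
nms_radius (int, optional, defaults to 4) — The radius for non-maximum suppression.
border_removal_distance (int, optional, defaults to 4) — The distance from the image border within which keypoints are removed.
initializer_range (float, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.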
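To make the roles of keypoint_threshold, border_removal_distance, nms_radius and max_keypoints more concrete, here is a minimal pure-Python sketch of how such filters might act on a list of candidate points. The function `select_keypoints` and the candidate data are hypothetical. The order of the steps and the square-neighbourhood distance are assumptions made for illustration, so the library's actual implementation may differ.

```python
def select_keypoints(candidates, image_size, keypoint_threshold=0.005,
                     nms_radius=4, border_removal_distance=4, max_keypoints=-1):
    """Filter (x, y, score) candidates. Illustrative only, not the library code."""
    height, width = image_size

    # 1. Drop candidates whose score does not exceed the threshold.
    kept = [c for c in candidates if c[2] > keypoint_threshold]

    # 2. Drop candidates that lie too close to the image border.
    d = border_removal_distance
    kept = [c for c in kept if d <= c[0] < width - d and d <= c[1] < height - d]

    # 3. Greedy non-maximum suppression: visit points by descending score and
    #    keep one only if no already-kept point lies within nms_radius pixels.
    kept.sort(key=lambda c: c[2], reverse=True)
    selected = []
    for x, y, score in kept:
        if all(max(abs(x - sx), abs(y - sy)) > nms_radius for sx, sy, _ in selected):
            selected.append((x, y, score))

    # 4. Optionally cap the number of keypoints (-1 keeps everything).
    if max_keypoints >= 0:
        selected = selected[:max_keypoints]
    return selected


candidates = [
    (10, 10, 0.90),   # strong point, kept
    (12, 11, 0.50),   # within nms_radius of the first point, suppressed
    (1, 20, 0.80),    # too close to the left border, removed
    (30, 30, 0.001),  # below keypoint_threshold, removed
    (40, 25, 0.30),   # kept
]
result = select_keypoints(candidates, image_size=(48, 64))
print(result)  # [(10, 10, 0.9), (40, 25, 0.3)]
```

Raising keypoint_threshold or setting max_keypoints trades recall for a smaller, more confident set of points, while a larger nms_radius spreads the surviving points further apart.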

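This is the configuration class to store the configuration of a SuperPointForKeypointDetection. It is used to instantiate a SuperPoint model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the SuperPoint magic-leap-community/superpoint architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation of PretrainedConfig for more information.

Example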

>>> from transformers import SuperPointConfig, SuperPointForKeypointDetection

>>> # Initializing a SuperPoint superpoint style configuration
>>> configuration = SuperPointConfig()
>>> # Initializing a model from the superpoint style configuration
>>> model = SuperPointForKeypointDetection(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config

SuperPointImageProcessor

class transformers.SuperPointImageProcessor

( do_resize: bool = True size: typing.Optional[dict[str, int]] = None do_rescale: bool = True rescale_factor: float = 0.00392156862745098 do_grayscale: bool = False **kwargs )

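Parameters

do_resize (bool, optional, defaults to True) — Controls whether to resize the image's (height, width) dimensions to the specified size. Can be overridden by do_resize in the preprocess method.
size (dict[str, int], optional, defaults to {"height": 480, "width": 640}) — Resolution of the output image after resize is applied. Only has an effect if do_resize is set to True. Can be overridden by size in the preprocess method.
do_rescale (bool, optional, defaults to True) — Whether to rescale the image by the specified scale rescale_factor. Can be overridden by do_rescale in the preprocess method.
rescale_factor (int or float, optional, defaults to 1/255) — Scale factor to use when rescaling the image, that is, when do_rescale is set to True. Can be overridden by rescale_factor in the preprocess method.
do_grayscale (bool, optional, defaults to False) — Whether to convert the image to grayscale. Can be overridden by do_grayscale in the preprocess method.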
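The default rescale_factor of 1/255 maps 8-bit pixel values from the range 0 to 255 into the range 0 to 1. The short sketch below works through that arithmetic in plain Python. The `rescale` helper is a hypothetical stand-in for the multiplication the image processor performs, not its actual code.

```python
RESCALE_FACTOR = 1 / 255  # the documented default, shown as 0.00392156862745098


def rescale(pixels, factor=RESCALE_FACTOR):
    """Multiply every pixel value by the rescale factor."""
    return [value * factor for value in pixels]


row = [0, 51, 255]      # 8-bit pixel values
scaled = rescale(row)   # approximately [0.0, 0.2, 1.0]
print(scaled)
```

If the input image already holds values between 0 and 1, applying the factor again would shrink them to near zero, which is why the preprocess documentation suggests setting do_rescale=False in that case.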

Constructs a SuperPoint image processor.

post_process_keypoint_detection

( outputs: SuperPointKeypointDescriptionOutput target_sizes: typing.Union[transformers.utils.generic.TensorType, list[tuple]] ) → list[Dict]

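Parameters

outputs (SuperPointKeypointDescriptionOutput) — Raw outputs of the model, containing keypoints in a relative (x, y) format together with scores and descriptors.
target_sizes (torch.Tensor or list[tuple[int, int]]) — Tensor of shape (batch_size, 2) or list of tuples (tuple[int, int]) containing the target size (height, width) of each image in the batch. This must be the original image size (before any processing).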

list[Dict]

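A list of dictionaries, one per image in the batch, each containing the keypoints in absolute format according to target_sizes, along with the scores and descriptors predicted by the model for that image.

Converts the raw output of SuperPointForKeypointDetection into lists of keypoints, scores and descriptors, with keypoint coordinates expressed in absolute terms relative to the original image sizes.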
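The conversion from relative to absolute coordinates can be pictured with the following minimal sketch. It assumes that relative coordinates are fractions of the image width and height and that the result is truncated to integer pixels. The helper `to_absolute` is hypothetical, and the library may round or order things differently.

```python
def to_absolute(relative_keypoints, target_size):
    """Scale relative (x, y) keypoints to pixel coordinates.

    target_size is (height, width), matching the documented target_sizes order.
    """
    height, width = target_size
    # x scales with the width, y scales with the height.
    return [(int(x * width), int(y * height)) for x, y in relative_keypoints]


relative = [(0.5, 0.25), (0.0, 0.0), (0.75, 0.5)]
absolute = to_absolute(relative, target_size=(480, 640))
print(absolute)  # [(320, 120), (0, 0), (480, 240)]
```

Note the ordering: sizes are passed as (height, width) while keypoints are (x, y), so x is multiplied by the second element of the size and y by the first.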

preprocess

( images do_resize: typing.Optional[bool] = None size: typing.Optional[dict[str, int]] = None do_rescale: typing.Optional[bool] = None rescale_factor: typing.Optional[float] = None do_grayscale: typing.Optional[bool] = None return_tensors: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None data_format: ChannelDimension = <ChannelDimension.FIRST: 'channels_first'> input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None **kwargs )

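Parameters

images (ImageInput) — Image to preprocess. Expects a single image or a batch of images with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set do_rescale=False.
do_resize (bool, optional, defaults to self.do_resize) — Whether to resize the image.
size (dict[str, int], optional, defaults to self.size) — Size of the output image after resize has been applied. If size["shortest_edge"] >= 384, the image is resized to (size["shortest_edge"], size["shortest_edge"]). Otherwise, the shorter edge of the image is matched to int(size["shortest_edge"] / crop_pct), after which the image is cropped to (size["shortest_edge"], size["shortest_edge"]). Only has an effect if do_resize is set to True.
do_rescale (bool, optional, defaults to self.do_rescale) — Whether to rescale the image values to the range [0, 1].
rescale_factor (float, optional, defaults to self.rescale_factor) — Scale factor to use when rescaling the image, that is, when do_rescale is set to True.
do_grayscale (bool, optional, defaults to self.do_grayscale) — Whether to convert the image to grayscale.
return_tensors (str or TensorType, optional) — The type of tensors to return. Can be one of:
- Unset: Return a list of np.ndarray.
- TensorType.TENSORFLOW or 'tf': Return a batch of type tf.Tensor.
- TensorType.PYTORCH or 'pt': Return a batch of type torch.Tensor.
- TensorType.NUMPY or 'np': Return a batch of type np.ndarray.
- TensorType.JAX or 'jax': Return a batch of type jax.numpy.ndarray.
data_format (ChannelDimension or str, optional, defaults to ChannelDimension.FIRST) — The channel dimension format of the output image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- Unset: Use the channel dimension format of the input image.
input_data_format (ChannelDimension or str, optional) — The channel dimension format of the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.

Preprocess an image or a batch of images.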

resize

( image: ndarray size: dict data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None **kwargs )

Parameters

image (np.ndarray) — Image to resize.
size (dict[str, int]) — Dictionary of the form {"height": int, "width": int}, specifying the size of the output image.
data_format (ChannelDimension or str, optional) — The channel dimension format of the output image. If not provided, it is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.
input_data_format (ChannelDimension or str, optional) — The channel dimension format of the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.

Resize an image.

SuperPointForKeypointDetection

class transformers.SuperPointForKeypointDetection

( config: SuperPointConfig )

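Parameters

config (SuperPointConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

SuperPoint model outputting keypoints and descriptors.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward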

( pixel_values: FloatTensor labels: typing.Optional[torch.LongTensor] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.models.superpoint.modeling_superpoint.SuperPointKeypointDescriptionOutput or tuple(torch.FloatTensor)

Parameters

pixel_values (torch.FloatTensor of shape (batch_size, num_channels, image_size, image_size)) — The tensors corresponding to the input images. Pixel values can be obtained using SuperPointImageProcessor. See SuperPointImageProcessor.__call__ for details.
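labels (torch.LongTensor of shape (batch_size, sequence_length), optional) — Labels for computing the masked language modeling loss. Indices should either be in [0, ..., config.vocab_size] or -100 (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].
output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.

transformers.models.superpoint.modeling_superpoint.SuperPointKeypointDescriptionOutput or tuple(torch.FloatTensor)

A transformers.models.superpoint.modeling_superpoint.SuperPointKeypointDescriptionOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (SuperPointConfig) and inputs.

loss (torch.FloatTensor of shape (1,), optional) — Loss computed during training.
keypoints (torch.FloatTensor of shape (batch_size, num_keypoints, 2)) — Relative (x, y) coordinates of the predicted keypoints in a given image.
scores (torch.FloatTensor of shape (batch_size, num_keypoints)) — Scores of the predicted keypoints.
descriptors (torch.FloatTensor of shape (batch_size, num_keypoints, descriptor_size)) — Descriptors of the predicted keypoints.
mask (torch.BoolTensor of shape (batch_size, num_keypoints)) — Mask indicating which values in keypoints, scores and descriptors are actual keypoint information rather than padding.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each stage) of shape (batch_size, sequence_length, hidden_size). Hidden states (also called feature maps) of the model at the output of each stage.

The SuperPointForKeypointDetection forward method, overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example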

>>> from transformers import AutoImageProcessor, SuperPointForKeypointDetection
>>> import torch
>>> from PIL import Image
>>> import requests

>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)

>>> processor = AutoImageProcessor.from_pretrained("magic-leap-community/superpoint")
>>> model = SuperPointForKeypointDetection.from_pretrained("magic-leap-community/superpoint")

>>> inputs = processor(image, return_tensors="pt")
>>> outputs = model(**inputs)
