SegGPT

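Overview

The SegGPT model was proposed in SegGPT: Segmenting Everything In Context by Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, and Tiejun Huang. SegGPT employs a decoder-only Transformer that generates a segmentation mask given an input image, a prompt image, and the corresponding prompt mask. The model achieves one-shot results of 56.1 mIoU on COCO-20 and 85.6 mIoU on FSS-1000.

The abstract from the paper is the following:

We present SegGPT, a generalist model for segmenting everything in context. We unify various segmentation tasks into a generalist in-context learning framework that accommodates different kinds of segmentation data by transforming them into the same format of images. The training of SegGPT is formulated as an in-context coloring problem with random color mapping for each data sample. The objective is to accomplish diverse tasks according to the context, rather than relying on specific colors. After training, SegGPT can perform arbitrary segmentation tasks in images or videos via in-context inference, such as object instance, stuff, part, contour, and text. SegGPT is evaluated on a broad range of tasks, including few-shot semantic segmentation, video object segmentation, semantic segmentation, and panoptic segmentation. Our results show strong capabilities in both in-domain and out-of-domain segmentation.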

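Usage tips

Use SegGptImageProcessor to prepare the input image, the prompt image, and the prompt mask for the model.
Prompt masks can be either segmentation maps or RGB images. If you use RGB images, set do_convert_rgb=False in the preprocess method.
When you use segmentation maps, it is highly advisable to pass num_labels (not counting the background) to SegGptImageProcessor during both preprocessing and postprocessing.
When running inference with SegGptForImageSegmentation with a batch_size greater than 1, you can use feature ensemble across your images by passing feature_ensemble=True to the forward method.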
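
To make the feature ensemble tip more concrete, the snippet below sketches the underlying idea: features computed for the same input image under several prompts are averaged, so every prompt contributes to one shared prediction. This is a conceptual illustration in plain Python, not the library's implementation; `ensemble_features` and the toy feature vectors are hypothetical.

```python
def ensemble_features(per_prompt_features):
    """Average feature vectors element-wise across prompts.

    per_prompt_features: list of equally sized lists of floats, one per prompt.
    Returns one averaged vector per prompt, so the batch keeps its size.
    """
    num_prompts = len(per_prompt_features)
    length = len(per_prompt_features[0])
    mean = [
        sum(features[i] for features in per_prompt_features) / num_prompts
        for i in range(length)
    ]
    # Broadcast the mean back so every prompt sees the same features.
    return [list(mean) for _ in range(num_prompts)]


# Toy features for one input image paired with two different prompts.
features = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
ensembled = ensemble_features(features)
print(ensembled)  # [[2.0, 3.0, 4.0], [2.0, 3.0, 4.0]]
```

With the real model, the equivalent step is requested by passing feature_ensemble=True to the forward method with a batch that holds several prompts.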

Here's how to use the model for one-shot semantic segmentation:

import torch
from datasets import load_dataset
from transformers import SegGptImageProcessor, SegGptForImageSegmentation

checkpoint = "BAAI/seggpt-vit-large"
image_processor = SegGptImageProcessor.from_pretrained(checkpoint)
model = SegGptForImageSegmentation.from_pretrained(checkpoint)

dataset_id = "EduardoPacheco/FoodSeg103"
ds = load_dataset(dataset_id, split="train")
# Number of labels in FoodSeg103 (not including background)
num_labels = 103

image_input = ds[4]["image"]
ground_truth = ds[4]["label"]
image_prompt = ds[29]["image"]
mask_prompt = ds[29]["label"]

# The prompt mask is passed as prompt_masks, matching the preprocess signature
inputs = image_processor(
    images=image_input,
    prompt_images=image_prompt,
    prompt_masks=mask_prompt,
    num_labels=num_labels,
    return_tensors="pt"
)

with torch.no_grad():
    outputs = model(**inputs)

target_sizes = [image_input.size[::-1]]
mask = image_processor.post_process_semantic_segmentation(outputs, target_sizes, num_labels=num_labels)[0]
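
The target_sizes line relies on the fact that PIL reports an image size as (width, height), whereas post_process_semantic_segmentation expects (height, width). A minimal sketch of that conversion, using a hypothetical stand-in tuple instead of a real image:

```python
# PIL's Image.size is (width, height); the post-processor wants (height, width).
pil_size = (640, 480)  # hypothetical image: 640 pixels wide, 480 pixels high

target_size = pil_size[::-1]  # reverse the tuple -> (height, width)
target_sizes = [target_size]  # one entry per image in the batch

print(target_sizes)  # [(480, 640)]
```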

This model was contributed by EduardoPacheco. The original code can be found in the baaivision/Painter repository on GitHub.

SegGptConfig

class transformers.SegGptConfig

( hidden_size = 1024 num_hidden_layers = 24 num_attention_heads = 16 hidden_act = 'gelu' hidden_dropout_prob = 0.0 initializer_range = 0.02 layer_norm_eps = 1e-06 image_size = [896, 448] patch_size = 16 num_channels = 3 qkv_bias = True mlp_dim = None drop_path_rate = 0.1 pretrain_image_size = 224 decoder_hidden_size = 64 use_relative_position_embeddings = True merge_index = 2 intermediate_hidden_state_indices = [5, 11, 17, 23] beta = 0.01 **kwargs )

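Parameters

hidden_size (int, optional, defaults to 1024): Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (int, optional, defaults to 24): Number of hidden layers in the Transformer encoder.
num_attention_heads (int, optional, defaults to 16): Number of attention heads for each attention layer in the Transformer encoder.
hidden_act (str or function, optional, defaults to "gelu"): The non-linear activation function (function or string) in the encoder and pooler. If a string, "gelu", "relu", "selu" and "gelu_new" are supported.
hidden_dropout_prob (float, optional, defaults to 0.0): The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
initializer_range (float, optional, defaults to 0.02): The standard deviation of the truncated normal initializer used to initialize all weight matrices.
layer_norm_eps (float, optional, defaults to 1e-06): The epsilon used by the layer normalization layers.
image_size (list[int], optional, defaults to [896, 448]): The size (resolution) of each image.
patch_size (int, optional, defaults to 16): The size (resolution) of each patch.
num_channels (int, optional, defaults to 3): The number of input channels.
qkv_bias (bool, optional, defaults to True): Whether to add a bias to the queries, keys, and values.
mlp_dim (int, optional): The dimensionality of the MLP layer in the Transformer encoder. If unset, defaults to hidden_size * 4.
drop_path_rate (float, optional, defaults to 0.1): The drop path rate for the dropout layers.
pretrain_image_size (int, optional, defaults to 224): The pretrained size of the absolute position embeddings.
decoder_hidden_size (int, optional, defaults to 64): Hidden size of the decoder.
use_relative_position_embeddings (bool, optional, defaults to True): Whether to use relative position embeddings in the attention layers.
merge_index (int, optional, defaults to 2): The index of the encoder layer at which the embeddings are merged.
intermediate_hidden_state_indices (list[int], optional, defaults to [5, 11, 17, 23]): The indices of the encoder layers whose outputs are stored as features for the decoder.
beta (float, optional, defaults to 0.01): Regularization factor for SegGptLoss (smooth-L1 loss).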

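This is the configuration class to store the configuration of a SegGptModel. It is used to instantiate a SegGPT model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults yields a configuration similar to that of the SegGPT BAAI/seggpt-vit-large architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation of PretrainedConfig for more information.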
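
As a quick sanity check on the defaults listed in the signature above, the following sketch derives a few quantities they imply. It only does arithmetic on the documented default values and does not import the library; `head_dim` assumes the usual even split of hidden_size across attention heads.

```python
# Default values copied from the SegGptConfig signature.
hidden_size = 1024
num_attention_heads = 16
image_size = [896, 448]
patch_size = 16
mlp_dim = None

# mlp_dim falls back to hidden_size * 4 when it is not set.
effective_mlp_dim = mlp_dim if mlp_dim is not None else hidden_size * 4

# Number of patches along each spatial dimension.
patch_grid = [side // patch_size for side in image_size]

# Size of each attention head, assuming an even split across heads.
head_dim = hidden_size // num_attention_heads

print(effective_mlp_dim)  # 4096
print(patch_grid)         # [56, 28]
print(head_dim)           # 64
```

The 56 x 28 patch grid matches the last_hidden_state shape [1, 56, 28, 1024] printed in the SegGptModel example.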

Example:

>>> from transformers import SegGptConfig, SegGptModel

>>> # Initializing a SegGPT seggpt-vit-large style configuration
>>> configuration = SegGptConfig()

>>> # Initializing a model (with random weights) from the seggpt-vit-large style configuration
>>> model = SegGptModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config

SegGptImageProcessor

class transformers.SegGptImageProcessor

( do_resize: bool = True size: typing.Optional[dict[str, int]] = None resample: Resampling = <Resampling.BICUBIC: 3> do_rescale: bool = True rescale_factor: typing.Union[int, float] = 0.00392156862745098 do_normalize: bool = True image_mean: typing.Union[float, list[float], NoneType] = None image_std: typing.Union[float, list[float], NoneType] = None do_convert_rgb: bool = True **kwargs )

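Parameters

do_resize (bool, optional, defaults to True): Whether to resize the image's (height, width) dimensions to the specified (size["height"], size["width"]). Can be overridden by the do_resize parameter in the preprocess method.
size (dict, optional, defaults to {"height": 448, "width": 448}): Size of the output image after resizing. Can be overridden by the size parameter in the preprocess method.
resample (PILImageResampling, optional, defaults to Resampling.BICUBIC): Resampling filter to use if resizing the image. Can be overridden by the resample parameter in the preprocess method.
do_rescale (bool, optional, defaults to True): Whether to rescale the image by the specified scale rescale_factor. Can be overridden by the do_rescale parameter in the preprocess method.
rescale_factor (int or float, optional, defaults to 1/255): Scale factor to use if rescaling the image. Can be overridden by the rescale_factor parameter in the preprocess method.
do_normalize (bool, optional, defaults to True): Whether to normalize the image. Can be overridden by the do_normalize parameter in the preprocess method.
image_mean (float or list[float], optional, defaults to IMAGENET_DEFAULT_MEAN): Mean to use if normalizing the image. This is a float or a list of floats whose length equals the number of channels in the image. Can be overridden by the image_mean parameter in the preprocess method.
image_std (float or list[float], optional, defaults to IMAGENET_DEFAULT_STD): Standard deviation to use if normalizing the image. This is a float or a list of floats whose length equals the number of channels in the image. Can be overridden by the image_std parameter in the preprocess method.
do_convert_rgb (bool, optional, defaults to True): Whether to convert the prompt mask to RGB format. Can be overridden by the do_convert_rgb parameter in the preprocess method.

Constructs a SegGpt image processor.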
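
The rescaling and normalization defaults can be illustrated on a single pixel. The sketch below is plain Python rather than the processor's own code; the mean and standard deviation are written out with the widely used ImageNet values, which is an assumption about what IMAGENET_DEFAULT_MEAN and IMAGENET_DEFAULT_STD contain, and `rescale_and_normalize` is a hypothetical helper.

```python
# Widely used ImageNet statistics (assumed values of the library constants).
IMAGENET_DEFAULT_MEAN = (0.485, 0.456, 0.406)
IMAGENET_DEFAULT_STD = (0.229, 0.224, 0.225)
RESCALE_FACTOR = 1 / 255  # the 0.00392156862745098 shown in the signature


def rescale_and_normalize(pixel):
    """Apply do_rescale, then do_normalize, to one (R, G, B) pixel in [0, 255]."""
    rescaled = [channel * RESCALE_FACTOR for channel in pixel]
    return [
        (value - mean) / std
        for value, mean, std in zip(rescaled, IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD)
    ]


white = rescale_and_normalize((255, 255, 255))
black = rescale_and_normalize((0, 0, 0))
print([round(v, 3) for v in white])  # [2.249, 2.429, 2.64]
print([round(v, 3) for v in black])  # [-2.118, -2.036, -1.804]
```

If the input pixels are already in the range 0 to 1, the rescaling step would be skipped by setting do_rescale=False.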

preprocess

( images: typing.Union[ForwardRef('PIL.Image.Image'), numpy.ndarray, ForwardRef('torch.Tensor'), list['PIL.Image.Image'], list[numpy.ndarray], list['torch.Tensor'], NoneType] = None prompt_images: typing.Union[ForwardRef('PIL.Image.Image'), numpy.ndarray, ForwardRef('torch.Tensor'), list['PIL.Image.Image'], list[numpy.ndarray], list['torch.Tensor'], NoneType] = None prompt_masks: typing.Union[ForwardRef('PIL.Image.Image'), numpy.ndarray, ForwardRef('torch.Tensor'), list['PIL.Image.Image'], list[numpy.ndarray], list['torch.Tensor'], NoneType] = None do_resize: typing.Optional[bool] = None size: typing.Optional[dict[str, int]] = None resample: Resampling = None do_rescale: typing.Optional[bool] = None rescale_factor: typing.Optional[float] = None do_normalize: typing.Optional[bool] = None image_mean: typing.Union[float, list[float], NoneType] = None image_std: typing.Union[float, list[float], NoneType] = None do_convert_rgb: typing.Optional[bool] = None num_labels: typing.Optional[int] = None return_tensors: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None data_format: typing.Union[str, transformers.image_utils.ChannelDimension] = <ChannelDimension.FIRST: 'channels_first'> input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None **kwargs )

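Parameters

images (ImageInput): Image to preprocess. Expects a single image or a batch of images with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set do_rescale=False.
prompt_images (ImageInput): Prompt image to preprocess. Expects a single image or a batch of images with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set do_rescale=False.
prompt_masks (ImageInput): Prompt mask of the prompt image to preprocess; it is returned as prompt_masks in the preprocessed output. Can be a segmentation map (no channel dimension) or an RGB image. If it is an RGB image, do_convert_rgb should be set to False. If it is a segmentation map, specifying num_labels is recommended so that a palette is built to map the prompt mask from a single channel to 3-channel RGB. If num_labels is not specified, the prompt mask is repeated across the channel dimension.
do_resize (bool, optional, defaults to self.do_resize): Whether to resize the image.
size (dict[str, int], optional, defaults to self.size): Dictionary in the format {"height": h, "width": w} specifying the size of the output image after resizing.
resample (PILImageResampling filter, optional, defaults to self.resample): PILImageResampling filter to use if resizing the image, e.g. PILImageResampling.BICUBIC. Only has an effect if do_resize is set to True. Does not apply to the prompt mask, which is resized with nearest-neighbor interpolation.
do_rescale (bool, optional, defaults to self.do_rescale): Whether to rescale the image values to the range [0, 1].
rescale_factor (float, optional, defaults to self.rescale_factor): Rescale factor to rescale the image by if do_rescale is set to True.
do_normalize (bool, optional, defaults to self.do_normalize): Whether to normalize the image.
image_mean (float or list[float], optional, defaults to self.image_mean): Image mean to use if do_normalize is set to True.
image_std (float or list[float], optional, defaults to self.image_std): Image standard deviation to use if do_normalize is set to True.
do_convert_rgb (bool, optional, defaults to self.do_convert_rgb): Whether to convert the prompt mask to RGB format. If num_labels is specified, a palette is built to map the prompt mask from a single channel to 3-channel RGB. If unset, the prompt mask is repeated across the channel dimension. Must be set to False if the prompt mask is already in RGB format.
num_labels (int, optional): Number of classes in the segmentation task (excluding the background). If specified, a palette is built, assuming class_idx 0 is the background, to map the prompt mask from a plain segmentation map with no channels to 3-channel RGB. If not specified, the prompt mask is either passed through as is when it is already in RGB format (do_convert_rgb is False) or repeated across the channel dimension.
return_tensors (str or TensorType, optional): The type of tensors to return. Can be one of:
- Unset: Return a list of np.ndarray.
- TensorType.TENSORFLOW or 'tf': Return a batch of type tf.Tensor.
- TensorType.PYTORCH or 'pt': Return a batch of type torch.Tensor.
- TensorType.NUMPY or 'np': Return a batch of type np.ndarray.
- TensorType.JAX or 'jax': Return a batch of type jax.numpy.ndarray.
data_format (ChannelDimension or str, optional, defaults to ChannelDimension.FIRST): The channel dimension format for the output image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- Unset: Use the channel dimension format of the input image.
input_data_format (ChannelDimension or str, optional): The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.

Preprocesses an image or a batch of images.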
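
The two ways a segmentation map can become a 3-channel prompt mask (palette lookup when num_labels is given, channel repetition otherwise) can be sketched as follows. `toy_palette` and `mask_to_rgb` are hypothetical helpers with an arbitrary color scheme; the colors the library actually assigns may differ.

```python
def toy_palette(num_labels):
    """Hypothetical palette: index 0 is background, then one color per class."""
    palette = [(0, 0, 0)]  # class 0 = background
    for label in range(1, num_labels + 1):
        # Arbitrary, purely illustrative way to spread labels over RGB.
        palette.append(((label * 37) % 256, (label * 91) % 256, (label * 173) % 256))
    return palette


def mask_to_rgb(segmentation_map, num_labels=None):
    """Turn a 2D map of class indices into a 2D grid of (R, G, B) tuples."""
    if num_labels is None:
        # No palette: repeat the single channel three times.
        return [[(v, v, v) for v in row] for row in segmentation_map]
    palette = toy_palette(num_labels)
    return [[palette[v] for v in row] for row in segmentation_map]


segmentation_map = [[0, 1], [2, 1]]
print(mask_to_rgb(segmentation_map, num_labels=2))
# [[(0, 0, 0), (37, 91, 173)], [(74, 182, 90), (37, 91, 173)]]
print(mask_to_rgb(segmentation_map))
# [[(0, 0, 0), (1, 1, 1)], [(2, 2, 2), (1, 1, 1)]]
```

Without a palette, neighboring class indices end up as nearly identical dark gray values, which is one reason passing num_labels is recommended for segmentation maps.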

post_process_semantic_segmentation

( outputs target_sizes: typing.Optional[list[tuple[int, int]]] = None num_labels: typing.Optional[int] = None ) → semantic_segmentation

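Parameters

outputs (SegGptImageSegmentationOutput): Raw outputs of the model.
target_sizes (list[tuple[int, int]], optional): List of length batch_size, where each item (tuple[int, int]) is the requested final size (height, width) of the corresponding prediction. If left unset, predictions are not resized.
num_labels (int, optional): Number of classes in the segmentation task (excluding the background). If specified, a palette is built, assuming class_idx 0 is the background, to map the predicted masks from RGB values to class indices. This value should be the same as the one used when preprocessing the inputs.

semantic_segmentation

list[torch.Tensor] of length batch_size, where each item is a semantic segmentation map of shape (height, width) corresponding to the target_sizes entry (if target_sizes is specified). Each entry of each torch.Tensor corresponds to a semantic class ID.

Converts the output of SegGptImageSegmentationOutput into segmentation maps. Only supports PyTorch.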
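
The num_labels description implies that post-processing inverts the palette used during preprocessing. One plausible way to do that is to snap each predicted color to its nearest palette entry, as sketched below. This is an assumption for illustration, not the library's code; `toy_palette` and `rgb_to_class` are hypothetical.

```python
def toy_palette(num_labels):
    """Hypothetical palette: index 0 is background, then one color per class."""
    palette = [(0, 0, 0)]
    for label in range(1, num_labels + 1):
        palette.append(((label * 37) % 256, (label * 91) % 256, (label * 173) % 256))
    return palette


def rgb_to_class(pixel, palette):
    """Return the index of the palette color closest to `pixel` (squared distance)."""
    distances = [
        sum((p - c) ** 2 for p, c in zip(pixel, color)) for color in palette
    ]
    return distances.index(min(distances))


palette = toy_palette(num_labels=2)
# Predicted colors are rarely exact, so each pixel snaps to its nearest entry.
predicted = [[(3, 2, 1), (40, 95, 170)], [(70, 180, 92), (35, 90, 175)]]
class_map = [[rgb_to_class(pixel, palette) for pixel in row] for row in predicted]
print(class_map)  # [[0, 1], [2, 1]]
```

The lookup only recovers the right classes when the palette is built with the same num_labels as in preprocessing, which is why the two values should match.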

SegGptModel

class transformers.SegGptModel

( config: SegGptConfig )

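Parameters

config (SegGptConfig): Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

The bare SegGpt model outputting raw hidden states without any specific head on top.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch module and refer to the PyTorch documentation for all matters related to general usage and behavior.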

forward

( pixel_values: Tensor prompt_pixel_values: Tensor prompt_masks: Tensor bool_masked_pos: typing.Optional[torch.BoolTensor] = None feature_ensemble: typing.Optional[bool] = None embedding_type: typing.Optional[str] = None labels: typing.Optional[torch.FloatTensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.models.seggpt.modeling_seggpt.SegGptEncoderOutput or tuple(torch.FloatTensor)

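Parameters

pixel_values (torch.Tensor of shape (batch_size, num_channels, image_size, image_size)): The tensors corresponding to the input images. Pixel values can be obtained using SegGptImageProcessor. See SegGptImageProcessor.__call__() for details.
prompt_pixel_values (torch.FloatTensor of shape (batch_size, num_channels, height, width)): Prompt pixel values. Prompt pixel values can be obtained using AutoImageProcessor. See SegGptImageProcessor.__call__() for details.
prompt_masks (torch.FloatTensor of shape (batch_size, num_channels, height, width)): Prompt masks. Prompt masks can be obtained using AutoImageProcessor. See SegGptImageProcessor.__call__() for details.
bool_masked_pos (torch.BoolTensor of shape (batch_size, num_patches), optional): Boolean masked positions. Indicates which patches are masked (1) and which are not (0).
feature_ensemble (bool, optional): Boolean indicating whether to use feature ensemble. If True, the model uses feature ensemble when there are at least two prompts. If False, the model does not use feature ensemble. Consider this argument when doing few-shot inference on an input image, i.e. when there is more than one prompt for the same image.
embedding_type (str, optional): Embedding type. Indicates whether the prompt is a semantic or an instance embedding. Can be either instance or semantic.
labels (torch.FloatTensor of shape (batch_size, num_channels, height, width), optional): Ground truth mask for the input images.
output_attentions (bool, optional): Whether to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional): Whether to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional): Whether to return a ModelOutput instead of a plain tuple.

transformers.models.seggpt.modeling_seggpt.SegGptEncoderOutput or tuple(torch.FloatTensor)

A transformers.models.seggpt.modeling_seggpt.SegGptEncoderOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (SegGptConfig) and inputs.

last_hidden_state (torch.FloatTensor of shape (batch_size, patch_height, patch_width, hidden_size)): Sequence of hidden states at the output of the last layer of the model.
hidden_states (tuple[torch.FloatTensor], optional, returned when config.output_hidden_states=True): Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, patch_height, patch_width, hidden_size).
attentions (tuple[torch.FloatTensor], optional, returned when config.output_attentions=True): Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, seq_len, seq_len).
intermediate_hidden_states (tuple[torch.FloatTensor], optional, returned when config.intermediate_hidden_state_indices is set): Tuple of torch.FloatTensor of shape (batch_size, patch_height, patch_width, hidden_size). Each element in the tuple corresponds to the output of a layer specified in config.intermediate_hidden_state_indices. Additionally, each feature passes through a LayerNorm.

The SegGptModel forward method overrides the __call__ special method.

Although the forward pass is defined in this method, call the Module instance itself rather than forward() directly: calling the instance takes care of running the pre- and post-processing steps, whereas calling forward() silently ignores them.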

Examples:

>>> from transformers import SegGptImageProcessor, SegGptModel
>>> from PIL import Image
>>> import requests

>>> image_input_url = "https://raw.githubusercontent.com/baaivision/Painter/main/SegGPT/SegGPT_inference/examples/hmbb_2.jpg"
>>> image_prompt_url = "https://raw.githubusercontent.com/baaivision/Painter/main/SegGPT/SegGPT_inference/examples/hmbb_1.jpg"
>>> mask_prompt_url = "https://raw.githubusercontent.com/baaivision/Painter/main/SegGPT/SegGPT_inference/examples/hmbb_1_target.png"

>>> image_input = Image.open(requests.get(image_input_url, stream=True).raw)
>>> image_prompt = Image.open(requests.get(image_prompt_url, stream=True).raw)
>>> mask_prompt = Image.open(requests.get(mask_prompt_url, stream=True).raw).convert("L")

>>> checkpoint = "BAAI/seggpt-vit-large"
>>> model = SegGptModel.from_pretrained(checkpoint)
>>> image_processor = SegGptImageProcessor.from_pretrained(checkpoint)

>>> inputs = image_processor(images=image_input, prompt_images=image_prompt, prompt_masks=mask_prompt, return_tensors="pt")

>>> outputs = model(**inputs)
>>> list(outputs.last_hidden_state.shape)
[1, 56, 28, 1024]

SegGptForImageSegmentation

class transformers.SegGptForImageSegmentation

( config: SegGptConfig )

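Parameters

config (SegGptConfig): Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

SegGpt model with a decoder on top for one-shot image segmentation.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch module and refer to the PyTorch documentation for all matters related to general usage and behavior.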

forward

Parameters

pixel_values (torch.Tensor of shape (batch_size, num_channels, image_size, image_size)): The tensors corresponding to the input images. Pixel values can be obtained using SegGptImageProcessor. See SegGptImageProcessor.__call__() for details.
prompt_pixel_values (torch.FloatTensor of shape (batch_size, num_channels, height, width)): Prompt pixel values. Prompt pixel values can be obtained using AutoImageProcessor. See SegGptImageProcessor.__call__() for details.
prompt_masks (torch.FloatTensor of shape (batch_size, num_channels, height, width)): Prompt masks. Prompt masks can be obtained using AutoImageProcessor. See SegGptImageProcessor.__call__() for details.
bool_masked_pos (torch.BoolTensor of shape (batch_size, num_patches), optional): Boolean masked positions. Indicates which patches are masked (1) and which are not (0).
feature_ensemble (bool, optional): Boolean indicating whether to use feature ensemble. If True, the model uses feature ensemble when there are at least two prompts. If False, the model does not use feature ensemble. Consider this argument when doing few-shot inference on an input image, i.e. when there is more than one prompt for the same image.
embedding_type (str, optional): Embedding type. Indicates whether the prompt is a semantic or an instance embedding. Can be either instance or semantic.
labels (torch.FloatTensor of shape (batch_size, num_channels, height, width), optional): Ground truth mask for the input images.
output_attentions (bool, optional): Whether to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional): Whether to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional): Whether to return a ModelOutput instead of a plain tuple.

transformers.models.seggpt.modeling_seggpt.SegGptImageSegmentationOutput or tuple(torch.FloatTensor)

A transformers.models.seggpt.modeling_seggpt.SegGptImageSegmentationOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (SegGptConfig) and inputs.

loss (torch.FloatTensor, optional, returned when labels is provided): The loss value.
pred_masks (torch.FloatTensor of shape (batch_size, num_channels, height, width)): The predicted masks.
hidden_states (tuple[torch.FloatTensor], optional, returned when config.output_hidden_states=True): Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, patch_height, patch_width, hidden_size).
attentions (tuple[torch.FloatTensor], optional, returned when config.output_attentions=True): Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, seq_len, seq_len).
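
The loss returned when labels are provided is described in the configuration as a smooth-L1 loss controlled by beta (default 0.01). The sketch below shows the standard element-wise smooth-L1 definition in plain Python; how SegGptLoss selects and averages the elements it is applied to is not shown here, and `smooth_l1` is a hypothetical helper.

```python
def smooth_l1(prediction, target, beta=0.01):
    """Element-wise smooth-L1 loss: quadratic below `beta`, linear above it."""
    diff = abs(prediction - target)
    if diff < beta:
        return 0.5 * diff ** 2 / beta
    return diff - 0.5 * beta


print(smooth_l1(0.5, 0.5))    # 0.0 (no error)
print(smooth_l1(0.505, 0.5))  # small error: quadratic branch
print(smooth_l1(1.0, 0.5))    # large error: linear branch
```

A small beta such as the default makes the loss behave like an L1 loss almost everywhere, while staying smooth around zero error.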

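The SegGptForImageSegmentation forward method overrides the __call__ special method.

Although the forward pass is defined in this method, call the Module instance itself rather than forward() directly: calling the instance takes care of running the pre- and post-processing steps, whereas calling forward() silently ignores them.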

Examples:

>>> from transformers import SegGptImageProcessor, SegGptForImageSegmentation
>>> from PIL import Image
>>> import requests

>>> image_input_url = "https://raw.githubusercontent.com/baaivision/Painter/main/SegGPT/SegGPT_inference/examples/hmbb_2.jpg"
>>> image_prompt_url = "https://raw.githubusercontent.com/baaivision/Painter/main/SegGPT/SegGPT_inference/examples/hmbb_1.jpg"
>>> mask_prompt_url = "https://raw.githubusercontent.com/baaivision/Painter/main/SegGPT/SegGPT_inference/examples/hmbb_1_target.png"

>>> image_input = Image.open(requests.get(image_input_url, stream=True).raw)
>>> image_prompt = Image.open(requests.get(image_prompt_url, stream=True).raw)
>>> mask_prompt = Image.open(requests.get(mask_prompt_url, stream=True).raw).convert("L")

>>> checkpoint = "BAAI/seggpt-vit-large"
>>> model = SegGptForImageSegmentation.from_pretrained(checkpoint)
>>> image_processor = SegGptImageProcessor.from_pretrained(checkpoint)

>>> inputs = image_processor(images=image_input, prompt_images=image_prompt, prompt_masks=mask_prompt, return_tensors="pt")
>>> outputs = model(**inputs)
>>> result = image_processor.post_process_semantic_segmentation(outputs, target_sizes=[(image_input.height, image_input.width)])[0]
>>> print(list(result.shape))
[170, 297]
