Image segmentation

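Image segmentation models separate an image into regions that correspond to different areas of interest. These models work by assigning a label to each pixel. There are several types of segmentation: semantic segmentation, instance segmentation, and panoptic segmentation.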

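In this guide, we will:

Take a look at the different types of segmentation.
Walk through an end-to-end fine-tuning example for semantic segmentation.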

Before you begin, make sure you have all the necessary libraries installed:

# uncomment to install the necessary libraries
# !pip install -q datasets transformers evaluate accelerate

We encourage you to log in to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to log in:

>>> from huggingface_hub import notebook_login

>>> notebook_login()

Types of segmentation

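Semantic segmentation assigns a label or class to every single pixel in an image. Let's take a look at the output of a semantic segmentation model. It assigns the same class to every instance of an object it comes across in an image; for example, all cats are labeled "cat" rather than "cat-1", "cat-2". We can use the Transformers image segmentation pipeline to quickly run inference with a semantic segmentation model. Let's take a look at the example image.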

from transformers import pipeline
from PIL import Image
import requests

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/segmentation_input.jpg"
image = Image.open(requests.get(url, stream=True).raw)
image

We will use nvidia/segformer-b1-finetuned-cityscapes-1024-1024.

semantic_segmentation = pipeline("image-segmentation", "nvidia/segformer-b1-finetuned-cityscapes-1024-1024")
results = semantic_segmentation(image)
results

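The segmentation pipeline output includes a mask for every predicted class. For this model, the `score` field is `None` for every entry.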

[{'score': None,
  'label': 'road',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': None,
  'label': 'sidewalk',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': None,
  'label': 'building',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': None,
  'label': 'wall',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': None,
  'label': 'pole',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': None,
  'label': 'traffic sign',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': None,
  'label': 'vegetation',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': None,
  'label': 'terrain',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': None,
  'label': 'sky',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': None,
  'label': 'car',
  'mask': <PIL.Image.Image image mode=L size=612x415>}]

Looking at the mask for the car class, we can see that every car is covered by the same mask.

results[-1]["mask"]
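To make the "one mask per class" idea concrete, here is a minimal pure-Python sketch that splits a semantic label map into one binary mask per class, roughly the shape of what the pipeline returns. The tiny label map, the class ids, and the `masks_per_class` helper are hypothetical illustrations, not pipeline internals.

```python
# A toy 3x4 semantic label map: every pixel holds exactly one class id.
# Hypothetical ids for illustration: 0 = road, 1 = car.
label_map = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
id2label = {0: "road", 1: "car"}


def masks_per_class(label_map, id2label):
    """Return one binary mask (nested lists of 0/1) per class present in the map."""
    present = {pixel for row in label_map for pixel in row}
    masks = {}
    for class_id in sorted(present):
        masks[id2label[class_id]] = [
            [1 if pixel == class_id else 0 for pixel in row] for row in label_map
        ]
    return masks


masks = masks_per_class(label_map, id2label)
# The car pixels at the top and the separate car pixels in the bottom corners
# all land in the single "car" mask: semantic segmentation has no notion of instances.
```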

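In instance segmentation, the goal is not to classify every pixel, but to predict a mask for **each instance of an object** in a given image. It works very similarly to object detection, except that instead of a bounding box for each instance, there is a segmentation mask. We will use facebook/mask2former-swin-large-cityscapes-instance for this.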

instance_segmentation = pipeline("image-segmentation", "facebook/mask2former-swin-large-cityscapes-instance")
results = instance_segmentation(image)
results

As you can see below, multiple cars are classified, and pixels that don't belong to a car or person instance are not classified at all.

[{'score': 0.999944,
  'label': 'car',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.999945,
  'label': 'car',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.999652,
  'label': 'car',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.903529,
  'label': 'person',
  'mask': <PIL.Image.Image image mode=L size=612x415>}]

Let's check out one of the car masks below.

results[2]["mask"]
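Because an instance mask carries more detail than a bounding box, the box can always be recovered from the mask (but not the other way round), which is one way to see how closely instance segmentation relates to object detection. The sketch assumes masks are plain nested lists of 0/1; `mask_to_bbox` and the two car masks are hypothetical, and a PIL mask from the pipeline would first need converting (for example with `np.array(mask) > 0`).

```python
def mask_to_bbox(mask):
    """Return the tight box (x_min, y_min, x_max, y_max), as inclusive pixel indices,
    around the nonzero pixels of a binary mask, or None if the mask is empty."""
    xs = [x for row in mask for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(mask) for value in row if value]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


# Two hypothetical "car" instance masks on the same 3x4 image.
car_1 = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0]]
car_2 = [[0, 0, 0, 0],
         [0, 0, 0, 1],
         [0, 0, 1, 1]]
boxes = [mask_to_bbox(m) for m in (car_1, car_2)]
```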

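Panoptic segmentation combines semantic segmentation and instance segmentation: every pixel is classified into a class and an instance of that class, and each class can have multiple masks, one per instance. We can use facebook/mask2former-swin-large-cityscapes-panoptic for this.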

panoptic_segmentation = pipeline("image-segmentation", "facebook/mask2former-swin-large-cityscapes-panoptic")
results = panoptic_segmentation(image)
results

As you can see below, we have more classes. We will illustrate later that every pixel is classified into one of these classes.

[{'score': 0.999981,
  'label': 'car',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.999958,
  'label': 'car',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.99997,
  'label': 'vegetation',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.999575,
  'label': 'pole',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.999958,
  'label': 'building',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.999634,
  'label': 'road',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.996092,
  'label': 'sidewalk',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.999221,
  'label': 'car',
  'mask': <PIL.Image.Image image mode=L size=612x415>},
 {'score': 0.99987,
  'label': 'sky',
  'mask': <PIL.Image.Image image mode=L size=612x415>}]
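One way to picture a panoptic result is a single map in which every pixel carries both a class id and an instance id. The sketch below packs them as `class_id * LABEL_DIVISOR + instance_id`; that packing scheme, the divisor of 1000, and the class ids are assumptions chosen for illustration (a common convention, not necessarily what the pipeline uses internally). "Stuff" classes such as road get instance id 0, while each car gets its own instance id.

```python
LABEL_DIVISOR = 1000  # assumption: room for up to 1000 instances per class

# Hypothetical class ids for illustration only.
ROAD, CAR = 1, 2


def encode(class_id, instance_id):
    """Pack a (class, instance) pair into one panoptic segment id."""
    return class_id * LABEL_DIVISOR + instance_id


def decode_segments(panoptic_map):
    """Group the pixels of a packed panoptic map into segments:
    {segment_id: (class_id, instance_id, pixel_count)}."""
    segments = {}
    for row in panoptic_map:
        for segment_id in row:
            class_id, instance_id = divmod(segment_id, LABEL_DIVISOR)
            _, _, count = segments.get(segment_id, (class_id, instance_id, 0))
            segments[segment_id] = (class_id, instance_id, count + 1)
    return segments


# Road is a single segment with instance 0; the two cars stay separate segments.
panoptic_map = [
    [encode(ROAD, 0), encode(CAR, 1), encode(CAR, 1)],
    [encode(ROAD, 0), encode(ROAD, 0), encode(CAR, 2)],
]
segments = decode_segments(panoptic_map)
```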

Let's compare all types of segmentation side by side.

Having seen all the types of segmentation, let's take a deep dive into fine-tuning a model for semantic segmentation.

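Common real-world applications of semantic segmentation include training self-driving cars to identify pedestrians and important traffic information, identifying cells and abnormalities in medical imagery, and monitoring environmental changes from satellite imagery.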

Fine-tuning a model for segmentation

We will now:

Fine-tune SegFormer on the SceneParse150 dataset.
Use your fine-tuned model for inference.

To see all architectures and checkpoints compatible with this task, we recommend checking the task page.

Load the SceneParse150 dataset

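Start by loading a smaller subset of the SceneParse150 dataset from the 🤗 Datasets library. This gives you a chance to experiment and make sure everything works before spending more time training on the full dataset.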

>>> from datasets import load_dataset

>>> ds = load_dataset("scene_parse_150", split="train[:50]")

Split the dataset's `train` split into a train and a test set with the train_test_split method:

>>> ds = ds.train_test_split(test_size=0.2)
>>> train_ds = ds["train"]
>>> test_ds = ds["test"]

Then take a look at an example:

>>> train_ds[0]
{'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=512x683 at 0x7F9B0C201F90>,
 'annotation': <PIL.PngImagePlugin.PngImageFile image mode=L size=512x683 at 0x7F9B0C201DD0>,
 'scene_category': 368}

# view the image
>>> train_ds[0]["image"]

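image: a PIL image of the scene.
annotation: a PIL image of the segmentation map, which is also the model's target.
scene_category: a category id that describes the image scene, like "kitchen" or "office". In this guide, you'll only need image and annotation, both of which are PIL images.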

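You'll also want to create a dictionary that maps a label id to a label class, which will be useful when you set up the model later. Download the mappings from the Hub and create the `id2label` and `label2id` dictionaries: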

>>> import json
>>> from pathlib import Path
>>> from huggingface_hub import hf_hub_download

>>> repo_id = "huggingface/label-files"
>>> filename = "ade20k-id2label.json"
>>> id2label = json.loads(Path(hf_hub_download(repo_id, filename, repo_type="dataset")).read_text())
>>> id2label = {int(k): v for k, v in id2label.items()}
>>> label2id = {v: k for k, v in id2label.items()}
>>> num_labels = len(id2label)
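The `int(k)` conversion above is needed because JSON object keys are always strings, so the downloaded mapping comes back keyed by `"0"`, `"1"`, and so on. A small self-contained check, using a hypothetical three-entry stand-in for the downloaded file rather than the real download:

```python
import json

# Hypothetical three-entry stand-in for the downloaded ade20k-id2label.json.
raw = '{"0": "wall", "1": "building", "2": "sky"}'

id2label = json.loads(raw)                           # keys are strings: {"0": "wall", ...}
id2label = {int(k): v for k, v in id2label.items()}  # integer keys: {0: "wall", ...}
label2id = {v: k for k, v in id2label.items()}       # inverse mapping: {"wall": 0, ...}
num_labels = len(id2label)
```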

Custom dataset

You can also create and use your own dataset if you prefer to train with the run_semantic_segmentation.py script instead of a notebook instance. The script requires:

a DatasetDict with two Image columns, "image" and "label"

from datasets import Dataset, DatasetDict, Image

image_paths_train = ["path/to/image_1.jpg", "path/to/image_2.jpg", ..., "path/to/image_n.jpg"]
label_paths_train = ["path/to/annotation_1.png", "path/to/annotation_2.png", ..., "path/to/annotation_n.png"]

image_paths_validation = [...]
label_paths_validation = [...]

def create_dataset(image_paths, label_paths):
    dataset = Dataset.from_dict({"image": sorted(image_paths),
                                "label": sorted(label_paths)})
    dataset = dataset.cast_column("image", Image())
    dataset = dataset.cast_column("label", Image())
    return dataset

# step 1: create Dataset objects
train_dataset = create_dataset(image_paths_train, label_paths_train)
validation_dataset = create_dataset(image_paths_validation, label_paths_validation)

# step 2: create DatasetDict
dataset = DatasetDict({
     "train": train_dataset,
     "validation": validation_dataset,
     }
)

# step 3: push to Hub (assumes you have run the huggingface-cli login command in a terminal/notebook)
dataset.push_to_hub("your-name/dataset-repo")

# optionally, you can push to a private repo on the Hub
# dataset.push_to_hub("name of repo on the hub", private=True)
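`create_dataset` above pairs images and annotations by sorting both lists independently, which only lines up if the two sets of filenames sort in the same order. A hedged alternative, assuming each annotation shares its file stem with its image (unlike the placeholder names above), is to pair them explicitly and fail loudly when a label is missing. `pair_by_stem` is a hypothetical helper, not part of 🤗 Datasets:

```python
from pathlib import Path


def pair_by_stem(image_paths, label_paths):
    """Pair each image with the annotation that shares its file stem
    (e.g. 'a.jpg' <-> 'a.png'). Raises ValueError if an image has no annotation."""
    labels = {Path(p).stem: p for p in label_paths}
    pairs = []
    for image_path in sorted(image_paths):
        stem = Path(image_path).stem
        if stem not in labels:
            raise ValueError(f"no annotation found for {image_path}")
        pairs.append((image_path, labels[stem]))
    return pairs


pairs = pair_by_stem(["data/img/b.jpg", "data/img/a.jpg"],
                     ["data/ann/a.png", "data/ann/b.png"])
```

The resulting pairs can be split with `zip(*pairs)` into the two lists that go into `Dataset.from_dict`.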

an id2label dictionary mapping the class integers to their class names

import json

# simple example
id2label = {0: 'cat', 1: 'dog'}
with open('id2label.json', 'w') as fp:
    json.dump(id2label, fp)

For an example, take a look at this example dataset, which was created with the steps shown above.

Preprocess

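The next step is to load a SegFormer image processor to prepare the images and annotations for the model. Some datasets, like this one, use the zero-index as the background class. However, the background class isn't actually included in the 150 classes, so you'll need to set `do_reduce_labels=True` to subtract one from all the labels. The zero-index is replaced by `255` so it's ignored by SegFormer's loss function: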

>>> from transformers import AutoImageProcessor

>>> checkpoint = "nvidia/mit-b0"
>>> image_processor = AutoImageProcessor.from_pretrained(checkpoint, do_reduce_labels=True)
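The label shift that `do_reduce_labels=True` performs, as described above, can be sketched on a plain nested list. This illustrates the described behavior under the assumption that raw annotation ids run from 0 (background) to 150; it is not the image processor's actual code:

```python
def reduce_labels(label_map, ignore_index=255):
    """Background id 0 becomes ignore_index; every other class id moves down by one,
    so the 150 real classes occupy ids 0..149."""
    return [
        [ignore_index if value == 0 else value - 1 for value in row]
        for row in label_map
    ]


raw_annotation = [[0, 1, 150],
                  [12, 0, 3]]
reduced = reduce_labels(raw_annotation)
```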

Pytorch

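It is common to apply some data augmentation to an image dataset to make a model more robust against overfitting. In this guide, you'll use the ColorJitter function from torchvision to randomly change the color properties of an image, but you can also use any image library you like.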

>>> from torchvision.transforms import ColorJitter

>>> jitter = ColorJitter(brightness=0.25, contrast=0.25, saturation=0.25, hue=0.1)
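As a rough illustration of what the brightness part of `ColorJitter(brightness=0.25, ...)` does, here is a pure-Python sketch over a list of RGB pixels. It assumes the factor is drawn uniformly from [max(0, 1 - brightness), 1 + brightness], which is my understanding of torchvision's documented behavior; contrast, saturation, and hue are left out. Only the image is jittered: color changes don't move any pixels, so the annotation can stay as it is.

```python
import random


def jitter_brightness(pixels, brightness=0.25, rng=random):
    """Brightness-only sketch of color jitter: draw one factor uniformly from
    [max(0, 1 - brightness), 1 + brightness] and scale every RGB value by it,
    clipping to 0..255. Returns (new_pixels, factor)."""
    factor = rng.uniform(max(0.0, 1 - brightness), 1 + brightness)
    new_pixels = [
        tuple(min(255, max(0, round(c * factor))) for c in pixel)
        for pixel in pixels
    ]
    return new_pixels, factor


# Each call draws a new factor, so the same image looks different every epoch.
pixels = [(0, 0, 0), (200, 100, 50)]
augmented, factor = jitter_brightness(pixels, rng=random.Random(0))
```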

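Now create two preprocessing functions to prepare the images and annotations for the model. These functions convert the images into `pixel_values` and the annotations into `labels`. For the training set, `jitter` is applied before the images are passed to the image processor. For the test set, the image processor crops and normalizes the `images`, and only crops the `labels`, because no data augmentation is applied during testing.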

>>> def train_transforms(example_batch):
...     images = [jitter(x) for x in example_batch["image"]]
...     labels = [x for x in example_batch["annotation"]]
...     inputs = image_processor(images, labels)
...     return inputs


>>> def val_transforms(example_batch):
...     images = [x for x in example_batch["image"]]
...     labels = [x for x in example_batch["annotation"]]
...     inputs = image_processor(images, labels)
...     return inputs

To apply `jitter` over the entire dataset, use the 🤗 Datasets set_transform function. The transform is applied on the fly, which is faster and consumes less disk space:

>>> train_ds.set_transform(train_transforms)
>>> test_ds.set_transform(val_transforms)
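The class below is a hypothetical, heavily simplified stand-in (not the 🤗 Datasets implementation) that shows what "on the fly" means in practice: nothing is precomputed or written to disk, and the transform reruns on every access, which is also why a random augmentation such as `jitter` differs from epoch to epoch. Passing the transform a batch of one (a dict of lists) mirrors the batch-style functions above and is an assumption about how a single-item access is handled.

```python
class LazyTransformDataset:
    """Hypothetical minimal dataset with an on-the-fly transform."""

    def __init__(self, rows):
        self.rows = rows        # raw examples, never modified
        self.transform = None
        self.calls = 0          # how many times the transform actually ran

    def set_transform(self, fn):
        self.transform = fn

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        if self.transform is None:
            return row
        self.calls += 1
        batch = {key: [value] for key, value in row.items()}  # batch of one
        out = self.transform(batch)
        return {key: values[0] for key, values in out.items()}


ds = LazyTransformDataset([{"x": 1}, {"x": 2}])
ds.set_transform(lambda batch: {"x10": [v * 10 for v in batch["x"]]})
first_access = ds[1]   # transform runs now, not when set_transform is called
second_access = ds[1]  # and runs again here
```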

TensorFlow

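It is common to apply some data augmentation to an image dataset to make a model more robust against overfitting. In this guide, you'll use tf.image to randomly change the color properties of an image, but you can also use any image library you like. Define two separate transformation functions:

training data transformations that include image augmentation
validation data transformations that only transpose the images, since computer vision models in 🤗 Transformers expect a channels-first layout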

>>> import tensorflow as tf


>>> def aug_transforms(image):
...     image = tf.keras.utils.img_to_array(image)
...     image = tf.image.random_brightness(image, 0.25)
...     image = tf.image.random_contrast(image, 0.5, 2.0)
...     image = tf.image.random_saturation(image, 0.75, 1.25)
...     image = tf.image.random_hue(image, 0.1)
...     image = tf.transpose(image, (2, 0, 1))
...     return image


>>> def transforms(image):
...     image = tf.keras.utils.img_to_array(image)
...     image = tf.transpose(image, (2, 0, 1))
...     return image
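The `tf.transpose(image, (2, 0, 1))` call above moves the channel axis from last (height, width, channels) to first (channels, height, width). A dependency-free sketch of the same reordering on nested lists, where `hwc_to_chw` is a hypothetical helper used only for illustration:

```python
def hwc_to_chw(image):
    """Reorder image[row][col][channel] (channels-last, as img_to_array returns)
    into image[channel][row][col] (channels-first)."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


# A 1x2 RGB image: one row with two pixels.
chw = hwc_to_chw([[(1, 2, 3), (4, 5, 6)]])
# chw[0] is the red plane, chw[1] green, chw[2] blue.
```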

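Next, create two preprocessing functions to prepare batches of images and annotations for the model. These functions apply the image transformations and use the previously loaded `image_processor` to convert the images into `pixel_values` and the annotations into `labels`. The `ImageProcessor` also takes care of resizing and normalizing the images.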

>>> def train_transforms(example_batch):
...     images = [aug_transforms(x.convert("RGB")) for x in example_batch["image"]]
...     labels = [x for x in example_batch["annotation"]]
...     inputs = image_processor(images, labels)
...     return inputs


>>> def val_transforms(example_batch):
...     images = [transforms(x.convert("RGB")) for x in example_batch["image"]]
...     labels = [x for x in example_batch["annotation"]]
...     inputs = image_processor(images, labels)
...     return inputs

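To apply the preprocessing transformations over the entire dataset, use the 🤗 Datasets set_transform function. The transform is applied on the fly, which is faster and consumes less disk space: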

>>> train_ds.set_transform(train_transforms)
>>> test_ds.set_transform(val_transforms)

Evaluate

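Including a metric during training is often helpful for evaluating your model's performance. You can quickly load an evaluation method with the 🤗 Evaluate library. For this task, load the mean Intersection over Union (IoU) metric (see the 🤗 Evaluate quick tour to learn more about how to load and compute a metric):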

>>> import evaluate

>>> metric = evaluate.load("mean_iou")
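Mean IoU is the average, over classes, of |prediction ∩ reference| / |prediction ∪ reference| counted in pixels. The pure-Python sketch below skips pixels whose reference is `ignore_index` (255, the value the PyTorch compute_metrics below passes) and averages only over classes that occur in at least one of the two maps. I believe 🤗 Evaluate's mean_iou handles absent classes similarly (as NaN, excluded from its mean), but treat that detail as an assumption; `mean_iou` here is a hypothetical helper, not the library function.

```python
def mean_iou(pred, ref, num_labels, ignore_index=255):
    """Average per-class IoU between two label maps (nested lists of ints)."""
    intersection = [0] * num_labels
    union = [0] * num_labels
    for p_row, r_row in zip(pred, ref):
        for p, r in zip(p_row, r_row):
            if r == ignore_index:      # unlabeled pixel: not scored
                continue
            if p == r:                 # pixel counts once for its class
                intersection[p] += 1
                union[p] += 1
            else:                      # pixel is in the union of both classes
                union[p] += 1
                union[r] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)


pred = [[0, 0, 1],
        [1, 1, 2]]
ref = [[0, 1, 1],
       [1, 255, 2]]
score = mean_iou(pred, ref, num_labels=3)  # (1/2 + 2/3 + 1) / 3 = 13/18
```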

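Then create a function to compute the metrics. The model's predictions are logits at a lower resolution than the labels, so you need to upsample them to the size of the labels and take the argmax over the class dimension before you can call compute.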
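The "upsample, then argmax" order matters: upsampling the per-class scores first and picking the best class afterwards keeps the class decision at full resolution. The sketch swaps in nearest-neighbor upsampling for the bilinear interpolation used in the framework code below, purely to stay dependency-free; `upsample_nearest` and `argmax_over_classes` are hypothetical helpers operating on `logits[class][y][x]` nested lists.

```python
def upsample_nearest(logits, out_h, out_w):
    """Resize logits[c][y][x] to (out_h, out_w) by copying the nearest source pixel
    (a stand-in for bilinear interpolation)."""
    in_h, in_w = len(logits[0]), len(logits[0][0])
    return [
        [[channel[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
         for y in range(out_h)]
        for channel in logits
    ]


def argmax_over_classes(logits):
    """For every pixel, return the index of the class with the highest score."""
    num_classes, h, w = len(logits), len(logits[0]), len(logits[0][0])
    return [
        [max(range(num_classes), key=lambda c: logits[c][y][x]) for x in range(w)]
        for y in range(h)
    ]


# Two classes, predicted at 1x2 resolution; the labels are 2x4.
low_res_logits = [[[0.9, 0.1]],   # class 0 scores
                  [[0.2, 0.8]]]   # class 1 scores
pred_labels = argmax_over_classes(upsample_nearest(low_res_logits, 2, 4))
```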

Pytorch

>>> import numpy as np
>>> import torch
>>> from torch import nn

>>> def compute_metrics(eval_pred):
...     with torch.no_grad():
...         logits, labels = eval_pred
...         logits_tensor = torch.from_numpy(logits)
...         logits_tensor = nn.functional.interpolate(
...             logits_tensor,
...             size=labels.shape[-2:],
...             mode="bilinear",
...             align_corners=False,
...         ).argmax(dim=1)
...
...         pred_labels = logits_tensor.detach().cpu().numpy()
...         metrics = metric.compute(
...             predictions=pred_labels,
...             references=labels,
...             num_labels=num_labels,
...             ignore_index=255,
...             reduce_labels=False,
...         )
...         for key, value in metrics.items():
...             if isinstance(value, np.ndarray):
...                 metrics[key] = value.tolist()
...         return metrics

TensorFlow

>>> def compute_metrics(eval_pred):
...     logits, labels = eval_pred
...     logits = tf.transpose(logits, perm=[0, 2, 3, 1])
...     logits_resized = tf.image.resize(
...         logits,
...         size=tf.shape(labels)[1:],
...         method="bilinear",
...     )
...
...     pred_labels = tf.argmax(logits_resized, axis=-1)
...     metrics = metric.compute(
...         predictions=pred_labels,
...         references=labels,
...         num_labels=num_labels,
...         ignore_index=-1,
...         reduce_labels=image_processor.do_reduce_labels,
...     )
...
...     per_category_accuracy = metrics.pop("per_category_accuracy").tolist()
...     per_category_iou = metrics.pop("per_category_iou").tolist()
...
...     metrics.update({f"accuracy_{id2label[i]}": v for i, v in enumerate(per_category_accuracy)})
...     metrics.update({f"iou_{id2label[i]}": v for i, v in enumerate(per_category_iou)})
...     return {"val_" + k: v for k, v in metrics.items()}

Your compute_metrics function is ready to go now, and you'll return to it when you set up your training.

Train

Pytorch

If you aren't familiar with fine-tuning a model with the Trainer, take a look at the basic tutorial here!

You're ready to start training your model now! Load SegFormer with AutoModelForSemanticSegmentation, and pass the model the mapping between label ids and label classes:

>>> from transformers import AutoModelForSemanticSegmentation, TrainingArguments, Trainer

>>> model = AutoModelForSemanticSegmentation.from_pretrained(checkpoint, id2label=id2label, label2id=label2id)

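At this point, only three steps remain:

Define your training hyperparameters in TrainingArguments. It is important that you don't remove unused columns, because this drops the `image` column, and without the `image` column you can't create `pixel_values`. Set `remove_unused_columns=False` to prevent this behavior! The other key parameter is `output_dir`, which specifies where to save your model. You'll push this model to the Hub by setting `push_to_hub=True` (you need to be signed in to Hugging Face to upload your model). Because `eval_strategy="steps"` is combined with `eval_steps=20` and `save_steps=20`, the Trainer evaluates the IoU metric and saves a training checkpoint every 20 steps.
Pass the training arguments to Trainer along with the model, the training and evaluation datasets, and the compute_metrics function.
Call train() to fine-tune your model.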

>>> training_args = TrainingArguments(
...     output_dir="segformer-b0-scene-parse-150",
...     learning_rate=6e-5,
...     num_train_epochs=50,
...     per_device_train_batch_size=2,
...     per_device_eval_batch_size=2,
...     save_total_limit=3,
...     eval_strategy="steps",
...     save_strategy="steps",
...     save_steps=20,
...     eval_steps=20,
...     logging_steps=1,
...     eval_accumulation_steps=5,
...     remove_unused_columns=False,
...     push_to_hub=True,
... )

>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     train_dataset=train_ds,
...     eval_dataset=test_ds,
...     compute_metrics=compute_metrics,
... )

>>> trainer.train()
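To see what the step-based settings above mean for this small subset, here is the arithmetic as plain Python. It assumes a single device, no gradient accumulation, and that `train_test_split` rounds the test size up (as scikit-learn does); with 50 examples the rounding makes no difference.

```python
import math

# Values from this guide; single device and no gradient accumulation are assumptions.
num_examples = 50                  # load_dataset(..., split="train[:50]")
test_size = 0.2                    # train_test_split(test_size=0.2)
per_device_train_batch_size = 2
num_train_epochs = 50
eval_steps = 20

num_test = math.ceil(num_examples * test_size)   # 10 examples held out
num_train = num_examples - num_test                # 40 training examples

steps_per_epoch = math.ceil(num_train / per_device_train_batch_size)  # 20
total_steps = steps_per_epoch * num_train_epochs                        # 1000
num_evaluations = total_steps // eval_steps                             # 50
```

With these numbers, `eval_steps=20` happens to coincide with one evaluation (and one checkpoint) per epoch; on a larger dataset the two would diverge.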

Once training is completed, share your model to the Hub with the push_to_hub() method so everyone can use your model:

>>> trainer.push_to_hub()

TensorFlow

If you are unfamiliar with fine-tuning a model with Keras, check out the basic tutorial first!

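To fine-tune a model in TensorFlow, follow these steps:

Define the training hyperparameters, and set up an optimizer and a learning rate schedule.
Instantiate a pretrained model.
Convert a 🤗 Dataset to a tf.data.Dataset.
Compile your model.
Add callbacks to calculate metrics and upload your model to the 🤗 Hub.
Use the fit() method to run the training.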

Start by defining the hyperparameters, optimizer, and learning rate schedule:

>>> from transformers import create_optimizer

>>> batch_size = 2
>>> num_epochs = 50
>>> num_train_steps = (len(train_ds) // batch_size) * num_epochs  # optimizer steps (batches), not examples
>>> learning_rate = 6e-5
>>> weight_decay_rate = 0.01

>>> optimizer, lr_schedule = create_optimizer(
...     init_lr=learning_rate,
...     num_train_steps=num_train_steps,
...     weight_decay_rate=weight_decay_rate,
...     num_warmup_steps=0,
... )
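As far as I know, create_optimizer with `num_warmup_steps=0` and its default settings decays the learning rate linearly (a power-1 polynomial decay) from `init_lr` to 0 over `num_train_steps`; treat that as an assumption. Under it, the schedule can be sketched as below. The sketch also shows why `num_train_steps` should count optimizer steps (batches) rather than examples: with 40 training examples, `batch_size = 2` and 50 epochs, training runs for 1,000 steps, so a schedule sized for 2,000 steps would still be at half the initial learning rate when training ends.

```python
def linear_decay_lr(step, init_lr=6e-5, num_train_steps=1000):
    """Learning rate at an optimizer step under linear decay from init_lr to 0
    over num_train_steps, with no warmup; stays at 0 afterwards."""
    step = min(step, num_train_steps)
    return init_lr * (1 - step / num_train_steps)


lr_start = linear_decay_lr(0)                                 # 6e-5
lr_end = linear_decay_lr(1000)                                # 0.0
lr_end_if_oversized = linear_decay_lr(1000, num_train_steps=2000)  # 3e-5
```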

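Then, load SegFormer with TFAutoModelForSemanticSegmentation along with the label mappings, and compile it with the optimizer. Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to: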

>>> from transformers import TFAutoModelForSemanticSegmentation

>>> model = TFAutoModelForSemanticSegmentation.from_pretrained(
...     checkpoint,
...     id2label=id2label,
...     label2id=label2id,
... )
>>> model.compile(optimizer=optimizer)  # No loss argument!

Convert your datasets to the tf.data.Dataset format using to_tf_dataset and the DefaultDataCollator:

>>> from transformers import DefaultDataCollator

>>> data_collator = DefaultDataCollator(return_tensors="tf")

>>> tf_train_dataset = train_ds.to_tf_dataset(
...     columns=["pixel_values", "label"],
...     shuffle=True,
...     batch_size=batch_size,
...     collate_fn=data_collator,
... )

>>> tf_eval_dataset = test_ds.to_tf_dataset(
...     columns=["pixel_values", "label"],
...     shuffle=True,
...     batch_size=batch_size,
...     collate_fn=data_collator,
... )

To compute the metrics from the predictions and push your model to the 🤗 Hub, use Keras callbacks. Pass your compute_metrics function to KerasMetricCallback, and use PushToHubCallback to upload the model:

>>> from transformers.keras_callbacks import KerasMetricCallback, PushToHubCallback

>>> metric_callback = KerasMetricCallback(
...     metric_fn=compute_metrics, eval_dataset=tf_eval_dataset, batch_size=batch_size, label_cols=["labels"]
... )

>>> push_to_hub_callback = PushToHubCallback(output_dir="scene_segmentation", tokenizer=image_processor)

>>> callbacks = [metric_callback, push_to_hub_callback]

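Finally, you are ready to train your model! Call fit() with your training and validation datasets, the number of epochs, and your callbacks to fine-tune the model: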

>>> model.fit(
...     tf_train_dataset,
...     validation_data=tf_eval_dataset,
...     callbacks=callbacks,
...     epochs=num_epochs,
... )

Congratulations! You have fine-tuned your model and shared it on the 🤗 Hub. You can now use it for inference!

Inference

Great, now that you've fine-tuned a model, you can use it for inference!

Reload the dataset and load an image for inference.

>>> from datasets import load_dataset

>>> ds = load_dataset("scene_parse_150", split="train[:50]")
>>> ds = ds.train_test_split(test_size=0.2)
>>> test_ds = ds["test"]
>>> image = ds["test"][0]["image"]
>>> image

Pytorch

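We will now see how to run inference without a pipeline. Process the image with an image processor and place the `pixel_values` on the detected device (a GPU, if one is available):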

>>> from accelerate.test_utils.testing import get_backend
# automatically detects the underlying device type (CUDA, CPU, XPU, MPS, etc.)
>>> device, _, _ = get_backend()
>>> encoding = image_processor(image, return_tensors="pt")
>>> pixel_values = encoding.pixel_values.to(device)

Pass your input to the model and return the logits:

>>> outputs = model(pixel_values=pixel_values)
>>> logits = outputs.logits.cpu()

Next, rescale the logits to the original image size:

>>> upsampled_logits = nn.functional.interpolate(
...     logits,
...     size=image.size[::-1],
...     mode="bilinear",
...     align_corners=False,
... )

>>> pred_seg = upsampled_logits.argmax(dim=1)[0]

TensorFlow

Load an image processor to preprocess the image and return the input as TensorFlow tensors:

>>> from transformers import AutoImageProcessor

>>> image_processor = AutoImageProcessor.from_pretrained("MariaK/scene_segmentation")
>>> inputs = image_processor(image, return_tensors="tf")

Pass your input to the model and return the logits:

>>> from transformers import TFAutoModelForSemanticSegmentation

>>> model = TFAutoModelForSemanticSegmentation.from_pretrained("MariaK/scene_segmentation")
>>> logits = model(**inputs).logits

Next, rescale the logits to the original image size and apply argmax on the class dimension:

>>> logits = tf.transpose(logits, [0, 2, 3, 1])

>>> upsampled_logits = tf.image.resize(
...     logits,
...     # We reverse the shape of `image` because `image.size` returns width and height.
...     image.size[::-1],
... )

>>> pred_seg = tf.math.argmax(upsampled_logits, axis=-1)[0]

To visualize the results, load the dataset color palette as `ade_palette()`, which maps each class to its RGB values.

import numpy as np


def ade_palette():
  return np.asarray([
      [0, 0, 0],
      [120, 120, 120],
      [180, 120, 120],
      [6, 230, 230],
      [80, 50, 50],
      [4, 200, 3],
      [120, 120, 80],
      [140, 140, 140],
      [204, 5, 255],
      [230, 230, 230],
      [4, 250, 7],
      [224, 5, 255],
      [235, 255, 7],
      [150, 5, 61],
      [120, 120, 70],
      [8, 255, 51],
      [255, 6, 82],
      [143, 255, 140],
      [204, 255, 4],
      [255, 51, 7],
      [204, 70, 3],
      [0, 102, 200],
      [61, 230, 250],
      [255, 6, 51],
      [11, 102, 255],
      [255, 7, 71],
      [255, 9, 224],
      [9, 7, 230],
      [220, 220, 220],
      [255, 9, 92],
      [112, 9, 255],
      [8, 255, 214],
      [7, 255, 224],
      [255, 184, 6],
      [10, 255, 71],
      [255, 41, 10],
      [7, 255, 255],
      [224, 255, 8],
      [102, 8, 255],
      [255, 61, 6],
      [255, 194, 7],
      [255, 122, 8],
      [0, 255, 20],
      [255, 8, 41],
      [255, 5, 153],
      [6, 51, 255],
      [235, 12, 255],
      [160, 150, 20],
      [0, 163, 255],
      [140, 140, 140],
      [250, 10, 15],
      [20, 255, 0],
      [31, 255, 0],
      [255, 31, 0],
      [255, 224, 0],
      [153, 255, 0],
      [0, 0, 255],
      [255, 71, 0],
      [0, 235, 255],
      [0, 173, 255],
      [31, 0, 255],
      [11, 200, 200],
      [255, 82, 0],
      [0, 255, 245],
      [0, 61, 255],
      [0, 255, 112],
      [0, 255, 133],
      [255, 0, 0],
      [255, 163, 0],
      [255, 102, 0],
      [194, 255, 0],
      [0, 143, 255],
      [51, 255, 0],
      [0, 82, 255],
      [0, 255, 41],
      [0, 255, 173],
      [10, 0, 255],
      [173, 255, 0],
      [0, 255, 153],
      [255, 92, 0],
      [255, 0, 255],
      [255, 0, 245],
      [255, 0, 102],
      [255, 173, 0],
      [255, 0, 20],
      [255, 184, 184],
      [0, 31, 255],
      [0, 255, 61],
      [0, 71, 255],
      [255, 0, 204],
      [0, 255, 194],
      [0, 255, 82],
      [0, 10, 255],
      [0, 112, 255],
      [51, 0, 255],
      [0, 194, 255],
      [0, 122, 255],
      [0, 255, 163],
      [255, 153, 0],
      [0, 255, 10],
      [255, 112, 0],
      [143, 255, 0],
      [82, 0, 255],
      [163, 255, 0],
      [255, 235, 0],
      [8, 184, 170],
      [133, 0, 255],
      [0, 255, 92],
      [184, 0, 255],
      [255, 0, 31],
      [0, 184, 255],
      [0, 214, 255],
      [255, 0, 112],
      [92, 255, 0],
      [0, 224, 255],
      [112, 224, 255],
      [70, 184, 160],
      [163, 0, 255],
      [153, 0, 255],
      [71, 255, 0],
      [255, 0, 163],
      [255, 204, 0],
      [255, 0, 143],
      [0, 255, 235],
      [133, 255, 0],
      [255, 0, 235],
      [245, 0, 255],
      [255, 0, 122],
      [255, 245, 0],
      [10, 190, 212],
      [214, 255, 0],
      [0, 204, 255],
      [20, 0, 255],
      [255, 255, 0],
      [0, 153, 255],
      [0, 41, 255],
      [0, 255, 204],
      [41, 0, 255],
      [41, 255, 0],
      [173, 0, 255],
      [0, 245, 255],
      [71, 0, 255],
      [122, 0, 255],
      [0, 255, 184],
      [0, 92, 255],
      [184, 255, 0],
      [0, 133, 255],
      [255, 214, 0],
      [25, 194, 194],
      [102, 255, 0],
      [92, 0, 255],
  ])

Then you can combine and plot your image and the predicted segmentation map:

>>> import matplotlib.pyplot as plt
>>> import numpy as np

>>> color_seg = np.zeros((pred_seg.shape[0], pred_seg.shape[1], 3), dtype=np.uint8)
>>> palette = np.array(ade_palette())
>>> for label, color in enumerate(palette):
...     color_seg[pred_seg == label, :] = color
>>> color_seg = color_seg[..., ::-1]  # convert to BGR

>>> img = np.array(image) * 0.5 + color_seg * 0.5  # plot the image with the segmentation map
>>> img = img.astype(np.uint8)

>>> plt.figure(figsize=(15, 10))
>>> plt.imshow(img)
>>> plt.show()
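The overlay above can be sketched per pixel in pure Python: each pixel is painted with its class color from the palette and mixed 50/50 with the original pixel, then truncated to an integer as `astype(np.uint8)` does for these non-negative values. This sketch keeps RGB order throughout (it does not reproduce the BGR flip), and `overlay` and the two-entry palette are hypothetical.

```python
def overlay(image, seg, palette, alpha=0.5):
    """Blend an RGB image with its color-coded segmentation map:
    out = (1 - alpha) * pixel + alpha * palette[label], truncated to int."""
    return [
        [
            tuple(int((1 - alpha) * c + alpha * p) for c, p in zip(pixel, palette[label]))
            for pixel, label in zip(img_row, seg_row)
        ]
        for img_row, seg_row in zip(image, seg)
    ]


image = [[(200, 100, 0), (10, 20, 30)]]   # one row, two RGB pixels
seg = [[1, 0]]                            # predicted class per pixel
palette = [(0, 0, 0), (120, 120, 120)]    # hypothetical class colors
blended = overlay(image, seg, palette)
```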

Image of bedroom overlaid with segmentation map
