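Image Processor

An image processor is in charge of preparing input features for vision models and post-processing their outputs. This includes transformations such as resizing, normalization, and conversion to PyTorch, TensorFlow, Flax and Numpy tensors. It may also include model-specific post-processing such as converting logits to segmentation masks.

Fast image processors are available for a few models, and more will be added in the future. They are based on the torchvision library and provide a significant speed-up, especially when processing on GPU. They have the same API as the base image processors and can be used as drop-in replacements. To use a fast image processor, you need to install the `torchvision` library and set the `use_fast` argument to `True` when instantiating the image processor: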

from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50", use_fast=True)

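Note that `use_fast` will be set to `True` by default in a future release.

When using a fast image processor, you can also set the `device` argument to specify the device on which the processing should be done. By default, the processing is done on the same device as the inputs if the inputs are tensors, and on the CPU otherwise.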

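from torchvision.io import read_image
from transformers import DetrImageProcessorFast

# Read the image file into a uint8 tensor of shape (channels, height, width)
images = read_image("image.jpg")
processor = DetrImageProcessorFast.from_pretrained("facebook/detr-resnet-50")
# Run preprocessing on the GPU (requires a CUDA-capable device)
images_processed = processor(images, return_tensors="pt", device="cuda")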

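Benchmark figures (not reproduced in this text) compare the speed of the base and fast image processors for the `DETR` and `RT-DETR` models and show how they impact overall inference time.

These benchmarks were run on an AWS EC2 g5.2xlarge instance, using an NVIDIA A10G Tensor Core GPU.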

ImageProcessingMixin

class transformers.ImageProcessingMixin

( **kwargs )

This is an image processor mixin used to provide saving/loading functionality for sequential and image feature extractors.

from_pretrained

( pretrained_model_name_or_path: typing.Union[str, os.PathLike] cache_dir: typing.Union[str, os.PathLike, NoneType] = None force_download: bool = False local_files_only: bool = False token: typing.Union[str, bool, NoneType] = None revision: str = 'main' **kwargs )

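Parameters

pretrained_model_name_or_path (str or os.PathLike) — This can be either:
- a string, the *model id* of a pretrained image_processor hosted inside a model repo on huggingface.co.
- a path to a *directory* containing an image processor file saved using the save_pretrained() method, e.g., `./my_model_directory/`.
- a path or URL to a saved image processor JSON *file*, e.g., `./my_model_directory/preprocessor_config.json`.
cache_dir (str or os.PathLike, *optional*) — Path to a directory in which a downloaded pretrained model image processor should be cached if the standard cache should not be used.
force_download (bool, *optional*, defaults to False) — Whether or not to force (re-)downloading the image processor files and overriding the cached versions if they exist.
resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
proxies (dict[str, str], *optional*) — A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request.
token (str or bool, *optional*) — The token to use as HTTP bearer authorization for remote files. If `True`, or not specified, the token generated when running `huggingface-cli login` (stored in `~/.huggingface`) is used.
revision (str, *optional*, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id. Since models and other artifacts on huggingface.co are stored in a git-based system, `revision` can be any identifier allowed by git.

Instantiate a type of ImageProcessingMixin from an image processor.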

Examples

from transformers import CLIPImageProcessor

# We can't instantiate the base class *ImageProcessingMixin* directly, so the examples use a
# derived class: *CLIPImageProcessor*
image_processor = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32"
)  # Download image_processing_config from huggingface.co and cache.
image_processor = CLIPImageProcessor.from_pretrained(
    "./test/saved_model/"
)  # E.g. image processor (or model) was saved using *save_pretrained('./test/saved_model/')*
image_processor = CLIPImageProcessor.from_pretrained("./test/saved_model/preprocessor_config.json")
image_processor = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32", do_normalize=False, foo=False
)
assert image_processor.do_normalize is False
image_processor, unused_kwargs = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-base-patch32", do_normalize=False, foo=False, return_unused_kwargs=True
)
assert image_processor.do_normalize is False
assert unused_kwargs == {"foo": False}

save_pretrained

( save_directory: typing.Union[str, os.PathLike] push_to_hub: bool = False **kwargs )

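Parameters

save_directory (str or os.PathLike) — Directory where the image processor JSON file will be saved (it will be created if it does not exist).
push_to_hub (bool, *optional*, defaults to False) — Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with `repo_id` (defaults to the name of `save_directory` in your namespace).
kwargs (dict[str, Any], *optional*) — Additional keyword arguments passed along to the push_to_hub() method.

Save an image processor object to the directory `save_directory`, so that it can be re-loaded using the from_pretrained() class method.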

BatchFeature

class transformers.BatchFeature

( data: typing.Optional[dict[str, typing.Any]] = None tensor_type: typing.Union[NoneType, str, transformers.utils.generic.TensorType] = None )

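Parameters

data (dict, *optional*) — Dictionary of lists/arrays/tensors returned by the __call__/pad methods ('input_values', 'attention_mask', etc.).
tensor_type (Union[None, str, TensorType], *optional*) — You can give a tensor_type here to convert the lists of integers into PyTorch/TensorFlow/Numpy tensors at initialization.

Holds the output of the pad() method and of the feature-extractor-specific `__call__` methods.

This class is derived from a Python dictionary and can be used as a dictionary.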

convert_to_tensors

( tensor_type: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None )

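Parameters

tensor_type (str or TensorType, *optional*) — The type of tensors to use. If `str`, it should be one of the values of the enum TensorType. If `None`, no modification is done.

Convert the inner content to tensors.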
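To make the dictionary-like behaviour and the `tensor_type` conversion concrete, here is a minimal sketch built on a toy container. `ToyBatch` is a hypothetical stand-in written for illustration, not the library class, and it only handles the NumPy case (`"np"`).

```python
from collections import UserDict

import numpy as np


class ToyBatch(UserDict):
    """Hypothetical dict-like batch container (illustration only)."""

    def convert_to_tensors(self, tensor_type=None):
        # None means "leave the content untouched", mirroring the documented behaviour.
        if tensor_type is None:
            return self
        if tensor_type != "np":
            raise ValueError("this sketch only supports tensor_type='np'")
        for key, value in self.data.items():
            self.data[key] = np.asarray(value)
        return self


batch = ToyBatch({"pixel_values": [[0.0, 1.0], [2.0, 3.0]]})
batch.convert_to_tensors("np")
print(type(batch["pixel_values"]), batch["pixel_values"].shape)
```

The container is still indexed by key like a regular dictionary after conversion.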

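to

( *args **kwargs ) → BatchFeature

Parameters

args (Tuple) — Will be passed to the `to(...)` function of the tensors.
kwargs (Dict, *optional*) — Will be passed to the `to(...)` function of the tensors. To enable asynchronous data transfer, set the `non_blocking` flag in `kwargs` (defaults to `False`).

BatchFeature

The same instance after modification.

Send all values to the target device by calling `v.to(*args, **kwargs)` (PyTorch only). This should support casting to different `dtypes` and sending the `BatchFeature` to a different `device`.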

BaseImageProcessor

class transformers.BaseImageProcessor

( **kwargs )

center_crop

( image: ndarray size: dict data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None **kwargs )

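Parameters

image (np.ndarray) — Image to center crop.
size (dict[str, int]) — Size of the output image.
data_format (str or ChannelDimension, *optional*) — The channel dimension format for the output image. If unset, the channel dimension format of the input image is used. Can be one of:
- `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
- `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
input_data_format (ChannelDimension or str, *optional*) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
- `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.

Center crop an image to `(size["height"], size["width"])`. If the input is smaller than the requested crop size along any edge, the image is padded with 0's and then center cropped.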
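The pad-then-crop behaviour described above can be sketched in plain NumPy. This is an illustrative sketch, not the library's implementation: the helper name `center_crop_chw` is hypothetical, it assumes a channels-first array, and the way odd padding is split between the two sides is an assumption.

```python
import numpy as np


def center_crop_chw(image, size):
    """Center crop a (channels, height, width) array, zero-padding if it is too small."""
    crop_h, crop_w = size["height"], size["width"]
    _, height, width = image.shape
    # Pad with zeros so the canvas is at least as large as the requested crop.
    pad_h, pad_w = max(crop_h - height, 0), max(crop_w - width, 0)
    top_pad, left_pad = pad_h // 2, pad_w // 2
    padded = np.pad(
        image,
        ((0, 0), (top_pad, pad_h - top_pad), (left_pad, pad_w - left_pad)),
    )
    # Take the central window of the (possibly padded) image.
    _, padded_h, padded_w = padded.shape
    top, left = (padded_h - crop_h) // 2, (padded_w - crop_w) // 2
    return padded[:, top:top + crop_h, left:left + crop_w]


large = np.arange(3 * 4 * 6).reshape(3, 4, 6)
cropped = center_crop_chw(large, {"height": 2, "width": 2})

small = np.ones((1, 2, 2))
padded_crop = center_crop_chw(small, {"height": 4, "width": 4})
print(cropped.shape, padded_crop.shape)
```

In the second call the 2x2 input is smaller than the 4x4 target, so the result is the original pixels surrounded by a border of zeros.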

normalize

( image: ndarray mean: typing.Union[float, collections.abc.Iterable[float]] std: typing.Union[float, collections.abc.Iterable[float]] data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None **kwargs ) → np.ndarray

Parameters

image (np.ndarray) — Image to normalize.
mean (float or Iterable[float]) — Image mean to use for normalization.
std (float or Iterable[float]) — Image standard deviation to use for normalization.
data_format (str or ChannelDimension, *optional*) — The channel dimension format for the output image. If unset, the channel dimension format of the input image is used. Can be one of:
- `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
- `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
input_data_format (ChannelDimension or str, *optional*) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
- `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.

np.ndarray

The normalized image.

Normalize an image. `image = (image - image_mean) / image_std`.
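The formula `image = (image - image_mean) / image_std` is applied per channel when `mean` and `std` are iterables. The following NumPy sketch shows how that broadcasting works for both channel layouts; the helper `normalize_array` and its `channels_first` flag are hypothetical names used for illustration, not the library's signature.

```python
import numpy as np


def normalize_array(image, mean, std, channels_first=True):
    """Apply (image - mean) / std with per-channel or scalar statistics."""
    image = image.astype(np.float32)
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    if channels_first and mean.ndim == 1:
        # Reshape (C,) to (C, 1, 1) so each statistic lines up with its channel.
        mean = mean[:, None, None]
        std = std[:, None, None]
    # For channels-last input, a (C,) vector already broadcasts over the last axis.
    return (image - mean) / std


# Three constant channels with values 0.0, 0.5 and 1.0.
chw = np.stack([np.full((2, 2), value) for value in (0.0, 0.5, 1.0)])
hwc = chw.transpose(1, 2, 0)

out_first = normalize_array(chw, mean=[0.5, 0.5, 0.5], std=[0.25, 0.25, 0.25])
out_last = normalize_array(hwc, [0.5, 0.5, 0.5], [0.25, 0.25, 0.25], channels_first=False)
print(out_first[:, 0, 0])
```

Both layouts yield the same values, only arranged along different axes.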

rescale

( image: ndarray scale: float data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None **kwargs ) → np.ndarray

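Parameters

image (np.ndarray) — Image to rescale.
scale (float) — The scaling factor to rescale pixel values by.
data_format (str or ChannelDimension, *optional*) — The channel dimension format for the output image. If unset, the channel dimension format of the input image is used. Can be one of:
- `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
- `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
input_data_format (ChannelDimension or str, *optional*) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
- `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.

np.ndarray

The rescaled image.

Rescale an image by a scale factor. `image = image * scale`.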
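A common use of `image = image * scale` is mapping 8-bit pixel values from the 0 to 255 range into the 0 to 1 range with `scale = 1 / 255`. The NumPy sketch below illustrates this; the helper name `rescale_array` and the choice of a float32 output are assumptions made for the example.

```python
import numpy as np


def rescale_array(image, scale, dtype=np.float32):
    """Multiply pixel values by `scale` and cast the result to `dtype`."""
    return (image.astype(np.float64) * scale).astype(dtype)


pixels = np.array([[0, 51, 255]], dtype=np.uint8)
scaled = rescale_array(pixels, 1 / 255)
print(scaled)
```

Casting to a floating-point type before multiplying avoids the truncation that integer arithmetic on a uint8 array would cause.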

BaseImageProcessorFast

class transformers.BaseImageProcessorFast

( **kwargs: typing_extensions.Unpack[transformers.image_processing_utils_fast.DefaultFastImageProcessorKwargs] )

center_crop

( image: torch.Tensor size: dict **kwargs ) → torch.Tensor

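Parameters

image ("torch.Tensor") — Image to center crop.
size (dict[str, int]) — Size of the output image.

torch.Tensor

The center cropped image.

Center crop an image to `(size["height"], size["width"])`. If the input is smaller than the requested crop size along any edge, the image is padded with 0's and then center cropped.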

compile_friendly_resize

( image: torch.Tensor new_size: tuple interpolation: typing.Optional[ForwardRef('F.InterpolationMode')] = None antialias: bool = True )

A wrapper around `F.resize` that makes it compatible with torch.compile when the image is a uint8 tensor.

convert_to_rgb

( image: typing.Union[ForwardRef('PIL.Image.Image'), numpy.ndarray, ForwardRef('torch.Tensor'), list['PIL.Image.Image'], list[numpy.ndarray], list['torch.Tensor']] ) → ImageInput

Parameters

image (ImageInput) — The image to convert.

ImageInput

The converted image.

Converts an image to RGB format. The conversion is applied only if the image is of type PIL.Image.Image; otherwise the image is returned as is.

filter_out_unused_kwargs

( kwargs: dict )

Filter out the unused kwargs from the kwargs dictionary.
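The idea of dropping keyword arguments that a processor does not use can be sketched in plain Python. The function `drop_unused_kwargs` and its `unused_names` parameter are hypothetical names chosen for this illustration; how the library decides which names count as unused is not described here.

```python
def drop_unused_kwargs(kwargs, unused_names):
    """Return a copy of `kwargs` without the keys listed in `unused_names`."""
    if not unused_names:
        return dict(kwargs)
    return {key: value for key, value in kwargs.items() if key not in unused_names}


call_kwargs = {
    "do_resize": True,
    "do_pad": False,
    "size": {"height": 224, "width": 224},
}
filtered = drop_unused_kwargs(call_kwargs, unused_names=("do_pad",))
print(sorted(filtered))
```

This sketch returns a new dictionary and leaves the caller's dictionary unchanged.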

normalize

( image: torch.Tensor mean: typing.Union[float, collections.abc.Iterable[float]] std: typing.Union[float, collections.abc.Iterable[float]] **kwargs ) → torch.Tensor

Parameters

image (torch.Tensor) — Image to normalize.
mean (torch.Tensor, float or Iterable[float]) — Image mean to use for normalization.
std (torch.Tensor, float or Iterable[float]) — Image standard deviation to use for normalization.

torch.Tensor

The normalized image.

Normalize an image. `image = (image - image_mean) / image_std`.

preprocess

( images: typing.Union[ForwardRef('PIL.Image.Image'), numpy.ndarray, ForwardRef('torch.Tensor'), list['PIL.Image.Image'], list[numpy.ndarray], list['torch.Tensor']] *args **kwargs: typing_extensions.Unpack[transformers.image_processing_utils_fast.DefaultFastImageProcessorKwargs] ) → <class 'transformers.image_processing_base.BatchFeature'>

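Parameters

images (Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, list['PIL.Image.Image'], list[numpy.ndarray], list['torch.Tensor']]) — Image to preprocess. Expects a single image or a batch of images with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set do_rescale=False.
do_resize (bool, optional) — Whether to resize the image.
size (dict[str, int], optional) — Describes the maximum input dimensions to the model.
default_to_square (bool, optional) — Whether to default to a square image when resizing, if `size` is an int.
resample (Union[PILImageResampling, F.InterpolationMode, NoneType]) — Resampling filter to use when resizing the image. This can be one of the enum `PILImageResampling`. Only has an effect if `do_resize` is set to `True`.
do_center_crop (bool, optional) — Whether to center crop the image.
crop_size (dict[str, int], optional) — Size of the output image after applying `center_crop`.
do_rescale (bool, optional) — Whether to rescale the image.
rescale_factor (Union[int, float, NoneType]) — Scale factor to use when rescaling the image. Only has an effect if `do_rescale` is set to `True`.
do_normalize (bool, optional) — Whether to normalize the image.
image_mean (Union[float, list[float], NoneType]) — Image mean to use for normalization. Only has an effect if `do_normalize` is set to `True`.
image_std (Union[float, list[float], NoneType]) — Image standard deviation to use for normalization. Only has an effect if `do_normalize` is set to `True`.
do_convert_rgb (bool, optional) — Whether to convert the image to RGB.
return_tensors (Union[str, ~utils.generic.TensorType, NoneType]) — Returns stacked tensors if set to `pt`, otherwise returns a list of tensors.
data_format (~image_utils.ChannelDimension, optional) — Only `ChannelDimension.FIRST` is supported. Added for compatibility with slow processors.
input_data_format (Union[str, ~image_utils.ChannelDimension, NoneType]) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.
device (torch.device, optional) — The device to process the images on. If unset, the device is inferred from the input images.
disable_grouping (bool, optional) — Whether to disable grouping of images by size, so that they are processed individually rather than in batches. If None, it is set to True if the images are on CPU, and False otherwise. This choice is based on empirical observations, as detailed here: https://github.com/huggingface/transformers/pull/38157

<class 'transformers.image_processing_base.BatchFeature'>

data (dict) — Dictionary of lists/arrays/tensors returned by the call method ("pixel_values", etc.).
tensor_type (Union[None, str, TensorType], *optional*) — You can give a `tensor_type` here to convert the lists of integers into PyTorch/TensorFlow/Numpy tensors at initialization.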
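The `disable_grouping` parameter refers to grouping images by size so that same-sized images can be processed together as one batch. The NumPy sketch below illustrates that idea under stated assumptions: `group_by_shape` and `restore_order` are hypothetical helpers written for this example, and the doubling step merely stands in for real preprocessing.

```python
from collections import defaultdict

import numpy as np


def group_by_shape(images):
    """Stack images that share a shape and remember where each one came from."""
    groups = defaultdict(list)
    positions = defaultdict(list)
    for index, image in enumerate(images):
        groups[image.shape].append(image)
        positions[image.shape].append(index)
    stacked = {shape: np.stack(items) for shape, items in groups.items()}
    return stacked, dict(positions)


def restore_order(processed, positions, count):
    """Put the processed images back in their original order."""
    output = [None] * count
    for shape, indices in positions.items():
        for row, index in enumerate(indices):
            output[index] = processed[shape][row]
    return output


images = [np.zeros((3, 4, 4)), np.ones((3, 2, 2)), np.full((3, 4, 4), 2.0)]
stacked, positions = group_by_shape(images)
# Placeholder for batched preprocessing: one vectorized operation per shape group.
processed = {shape: batch * 2 for shape, batch in stacked.items()}
restored = restore_order(processed, positions, len(images))
print({shape: batch.shape for shape, batch in stacked.items()})
```

Grouping trades some bookkeeping for fewer, larger operations, which is why the documented default differs between CPU and other devices.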

rescale

( image: torch.Tensor scale: float **kwargs ) → torch.Tensor

Parameters

image (torch.Tensor) — Image to rescale.
scale (float) — The scaling factor to rescale pixel values by.

torch.Tensor

The rescaled image.

Rescale an image by a scale factor. `image = image * scale`.

rescale_and_normalize

( images: torch.Tensor do_rescale: bool rescale_factor: float do_normalize: bool image_mean: typing.Union[float, list[float]] image_std: typing.Union[float, list[float]] )

Rescale and normalize images.
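Rescaling followed by normalization computes `(image * rescale_factor - image_mean) / image_std`, which can be rewritten as a single normalization with adjusted statistics. The NumPy sketch below checks that equivalence numerically. Both function names are hypothetical, and whether the library combines the two steps this way is not stated in this text.

```python
import numpy as np


def rescale_then_normalize(images, rescale_factor, mean, std):
    """Two explicit steps: rescale, then normalize."""
    return (images * rescale_factor - mean) / std


def fused_rescale_and_normalize(images, rescale_factor, mean, std):
    """One step, using (x * s - m) / d == (x - m / s) / (d / s)."""
    return (images - mean / rescale_factor) / (std / rescale_factor)


# A batch of two 3-channel images in (batch, channels, height, width) layout.
images = np.arange(48, dtype=np.float64).reshape(2, 3, 2, 4)
mean = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
std = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)

two_step = rescale_then_normalize(images, 1 / 255, mean, std)
fused = fused_rescale_and_normalize(images, 1 / 255, mean, std)
print(np.allclose(two_step, fused))
```

The mean and standard deviation values here are example numbers chosen for the demonstration.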

resize

( image: torch.Tensor size: SizeDict interpolation: F.InterpolationMode = None antialias: bool = True **kwargs ) → torch.Tensor

Parameters

image (torch.Tensor) — Image to resize.
size (SizeDict) — Dictionary in the format `{"height": int, "width": int}` specifying the size of the output image.
interpolation (InterpolationMode, optional, defaults to InterpolationMode.BILINEAR) — `InterpolationMode` filter to use when resizing the image, e.g. `InterpolationMode.BICUBIC`.

torch.Tensor

The resized image.

Resize an image to `(size["height"], size["width"])`.
