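VAE Image Processor

The VaeImageProcessor provides a unified API for StableDiffusionPipelines to prepare image inputs for VAE encoding and to postprocess outputs once they are decoded. This includes transformations such as resizing, normalization, and conversion between PIL images, PyTorch tensors, and NumPy arrays.

All pipelines with a VaeImageProcessor accept PIL images, PyTorch tensors, or NumPy arrays as image inputs and return outputs in the format selected by the user-provided output_type argument. You can also pass encoded image latents directly to a pipeline, and return latents from a pipeline by setting the output_type argument (for example, output_type="latent"). This lets you take the latents generated by one pipeline and pass them to another pipeline as input without leaving the latent space. It also makes it easier to use multiple pipelines together, by passing PyTorch tensors directly between them.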

VaeImageProcessor

class diffusers.image_processor.VaeImageProcessor

( do_resize: bool = True vae_scale_factor: int = 8 vae_latent_channels: int = 4 resample: str = 'lanczos' reducing_gap: int = None do_normalize: bool = True do_binarize: bool = False do_convert_rgb: bool = False do_convert_grayscale: bool = False )

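Parameters

do_resize (bool, optional, defaults to True) — Whether to downscale the image's (height, width) dimensions to multiples of vae_scale_factor. Can accept the height and width arguments from the image_processor.VaeImageProcessor.preprocess() method.
vae_scale_factor (int, optional, defaults to 8) — VAE scale factor. If do_resize is True, the image is automatically resized to multiples of this factor.
resample (str, optional, defaults to lanczos) — Resampling filter to use when resizing the image.
do_normalize (bool, optional, defaults to True) — Whether to normalize the image to [-1,1].
do_binarize (bool, optional, defaults to False) — Whether to binarize the image to 0/1.
do_convert_rgb (bool, optional, defaults to False) — Whether to convert the images to RGB format.
do_convert_grayscale (bool, optional, defaults to False) — Whether to convert the images to grayscale format.

Image processor for VAE.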

apply_overlay

( mask: Image init_image: Image image: Image crop_coords: typing.Optional[typing.Tuple[int, int, int, int]] = None ) → PIL.Image.Image

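Parameters

mask (PIL.Image.Image) — The mask image that highlights the regions to overlay.
init_image (PIL.Image.Image) — The original image to which the overlay is applied.
image (PIL.Image.Image) — The image to overlay onto the original image.
crop_coords (Tuple[int, int, int, int], optional) — Coordinates to crop the image. If provided, the image is cropped accordingly.

Returns

PIL.Image.Image

The final image with the overlay applied.

Overlays the mask and the inpainted image on the original image.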

binarize

( image: Image ) → PIL.Image.Image

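Parameters

image (PIL.Image.Image) — The image input, which should be a PIL image.

Returns

PIL.Image.Image

The binarized image. Values less than 0.5 are set to 0, and values of 0.5 or greater are set to 1.

Creates a mask.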
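The thresholding rule can be illustrated without any imaging library. The sketch below is a hypothetical pure-Python stand-in (binarize_values is not part of Diffusers) that operates on a nested list of floats in [0, 1]. It assumes that a value exactly equal to the threshold maps to 1.

```python
from typing import List


def binarize_values(pixels: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Map each value below `threshold` to 0 and every other value to 1.

    Hypothetical helper for illustration only; it is not the Diffusers method.
    Assumption: a value exactly equal to the threshold becomes 1.
    """
    return [[0 if value < threshold else 1 for value in row] for row in pixels]


# A small "soft" mask with values between 0 and 1.
soft_mask = [
    [0.0, 0.2, 0.7],
    [0.49, 0.5, 1.0],
]
hard_mask = binarize_values(soft_mask)
print(hard_mask)  # [[0, 0, 1], [0, 1, 1]]
```

A hard 0/1 mask like this is what the do_binarize option is meant to produce from a soft mask image.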

blur

( image: Image blur_factor: int = 4 ) → PIL.Image.Image

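Parameters

image (PIL.Image.Image) — The PIL image to blur.
blur_factor (int, optional, defaults to 4) — The strength of the Gaussian blur.

Returns

PIL.Image.Image

The blurred PIL image.

Applies Gaussian blur to an image.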

convert_to_grayscale

( image: Image ) → PIL.Image.Image

Parameters

image (PIL.Image.Image) — The input image to convert.

Returns

PIL.Image.Image

The image converted to grayscale.

Converts a given PIL image to grayscale.

convert_to_rgb

( image: Image ) → PIL.Image.Image

Parameters

image (PIL.Image.Image) — The PIL image to convert to RGB.

Returns

PIL.Image.Image

The RGB-converted PIL image.

Converts a PIL image to RGB format.

denormalize

( images: typing.Union[numpy.ndarray, torch.Tensor] ) → np.ndarray or torch.Tensor

Parameters

images (np.ndarray or torch.Tensor) — The image array to denormalize.

Returns

np.ndarray or torch.Tensor

The denormalized image array.

Denormalizes an image array to [0,1].

get_crop_region

( mask_image: Image width: int height: int pad = 0 ) → tuple

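Parameters

mask_image (PIL.Image.Image) — The mask image.
width (int) — Width of the image to be processed.
height (int) — Height of the image to be processed.
pad (int, optional) — Padding to add to the crop region. Defaults to 0.

Returns

tuple

(x1, y1, x2, y2), a rectangular region that contains all masked areas in the image and matches the original aspect ratio.

Finds a rectangular region that contains all masked areas in an image, and expands the region to match the aspect ratio of the original image. For example, if the user drew a mask in a 128x32 region and the processing dimensions are 512x512, the region is expanded to 128x128.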
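The aspect-ratio expansion in the 128x32 example can be worked through in a few lines. The following is a simplified sketch, not the library's implementation: expand_to_aspect_ratio is a hypothetical helper that starts from an already-computed bounding box, ignores the pad argument, and assumes the box should grow symmetrically and then be shifted back inside the image if it crosses an edge.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def expand_to_aspect_ratio(box: Box, target_width: int, target_height: int,
                           image_width: int, image_height: int) -> Box:
    """Grow `box` until its aspect ratio matches target_width / target_height.

    Hypothetical helper for illustration; not the Diffusers implementation.
    """
    x1, y1, x2, y2 = box
    box_w, box_h = x2 - x1, y2 - y1
    target_ratio = target_width / target_height

    if box_w / box_h > target_ratio:
        # Box is too wide for the target ratio: grow it vertically.
        grow = round(box_w / target_ratio) - box_h
        y1 -= grow // 2
        y2 += grow - grow // 2
    else:
        # Box is too tall (or already matches): grow it horizontally.
        grow = round(box_h * target_ratio) - box_w
        x1 -= grow // 2
        x2 += grow - grow // 2

    # Shift the box back inside the image if growing pushed it past an edge.
    if x1 < 0:
        x1, x2 = 0, min(image_width, x2 - x1)
    if x2 > image_width:
        x1, x2 = max(0, x1 - (x2 - image_width)), image_width
    if y1 < 0:
        y1, y2 = 0, min(image_height, y2 - y1)
    if y2 > image_height:
        y1, y2 = max(0, y1 - (y2 - image_height)), image_height
    return x1, y1, x2, y2


# A 128x32 masked region inside a 512x512 image, processed at 512x512.
region = expand_to_aspect_ratio((100, 200, 228, 232), 512, 512, 512, 512)
print(region)  # (100, 152, 228, 280), i.e. 128x128
```

Matching the aspect ratio before cropping means the cropped region can later be resized to the processing dimensions without distortion.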

get_default_height_width

( image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor] height: typing.Optional[int] = None width: typing.Optional[int] = None ) → Tuple[int, int]

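Parameters

image (Union[PIL.Image.Image, np.ndarray, torch.Tensor]) — The image input, which can be a PIL image, NumPy array, or PyTorch tensor. A NumPy array should have shape [batch, height, width] or [batch, height, width, channels]. A PyTorch tensor should have shape [batch, channels, height, width].
height (Optional[int], optional, defaults to None) — The height of the preprocessed image. If None, the height of the image input is used.
width (Optional[int], optional, defaults to None) — The width of the preprocessed image. If None, the width of the image input is used.

Returns

Tuple[int, int]

A tuple containing the height and width, each rounded down to an integer multiple of vae_scale_factor.

Returns the height and width of the image, each rounded down to an integer multiple of vae_scale_factor.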
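The rounding rule is simple modular arithmetic. As a minimal sketch (round_down_to_multiple is a hypothetical name, not a Diffusers function), each dimension has its remainder modulo vae_scale_factor subtracted:

```python
from typing import Tuple


def round_down_to_multiple(height: int, width: int, vae_scale_factor: int = 8) -> Tuple[int, int]:
    """Round both dimensions down to a multiple of `vae_scale_factor`.

    Hypothetical helper for illustration; not part of the Diffusers API.
    """
    return height - height % vae_scale_factor, width - width % vae_scale_factor


print(round_down_to_multiple(517, 770))  # (512, 768)
print(round_down_to_multiple(512, 512))  # (512, 512), already a multiple
```

Rounding down rather than to the nearest multiple guarantees that the result never exceeds the input size.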

normalize

( images: typing.Union[numpy.ndarray, torch.Tensor] ) → np.ndarray or torch.Tensor

Parameters

images (np.ndarray or torch.Tensor) — The image array to normalize.

Returns

np.ndarray or torch.Tensor

The normalized image array.

Normalizes an image array to [-1,1].
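The two ranges are related by a linear map. The sketch below shows the arithmetic on plain Python floats instead of arrays or tensors. The function names mirror the methods for readability, but this is an illustrative re-implementation, and the clamping in denormalize is an assumption about how out-of-range values are handled.

```python
from typing import List


def normalize(values: List[float]) -> List[float]:
    """Map values from [0, 1] to [-1, 1]."""
    return [2.0 * v - 1.0 for v in values]


def denormalize(values: List[float]) -> List[float]:
    """Map values from [-1, 1] back to [0, 1].

    Assumption: values outside [-1, 1] are clamped into [0, 1].
    """
    return [min(1.0, max(0.0, v * 0.5 + 0.5)) for v in values]


pixels = [0.0, 0.25, 0.5, 1.0]
scaled = normalize(pixels)
restored = denormalize(scaled)
print(scaled)    # [-1.0, -0.5, 0.0, 1.0]
print(restored)  # [0.0, 0.25, 0.5, 1.0]
```

For in-range inputs the two functions are inverses, so a normalize followed by a denormalize returns the original values.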

numpy_to_pil

( images: ndarray ) → List[PIL.Image.Image]

Parameters

images (np.ndarray) — The image array to convert to PIL format.

Returns

List[PIL.Image.Image]

A list of PIL images.

Converts a NumPy image or a batch of images to a list of PIL images.

numpy_to_pt

( images: ndarray ) → torch.Tensor

Parameters

images (np.ndarray) — The NumPy image array to convert to PyTorch format.

Returns

torch.Tensor

A PyTorch tensor representation of the images.

Converts a NumPy image to a PyTorch tensor.

pil_to_numpy

( images: typing.Union[typing.List[PIL.Image.Image], PIL.Image.Image] ) → np.ndarray

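Parameters

images (PIL.Image.Image or List[PIL.Image.Image]) — The PIL image or list of images to convert to NumPy format.

Returns

np.ndarray

A NumPy array representation of the images.

Converts a PIL image or a list of PIL images to NumPy arrays.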

postprocess

( image: Tensor output_type: str = 'pil' do_denormalize: typing.Optional[typing.List[bool]] = None ) → PIL.Image.Image, np.ndarray or torch.Tensor

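Parameters

image (torch.Tensor) — The image input, which should be a PyTorch tensor with shape B x C x H x W.
output_type (str, optional, defaults to pil) — The output type of the image, one of pil, np, pt, or latent.
do_denormalize (List[bool], optional, defaults to None) — Whether to denormalize the image to [0,1]. If None, the value of do_normalize in the VaeImageProcessor config is used.

Returns

PIL.Image.Image, np.ndarray or torch.Tensor

The postprocessed image.

Postprocesses the image output from a tensor to output_type.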
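Because do_denormalize is a list, denormalization can be switched on or off per image in the batch. The sketch below illustrates that fallback and per-image behavior with plain lists standing in for tensors. apply_denormalize_flags is a hypothetical helper, and the clamped linear map is an assumption rather than the library's code.

```python
from typing import List, Optional

Image = List[float]  # a flat list of pixel values stands in for an image tensor


def denormalize(image: Image) -> Image:
    """Map values from [-1, 1] to [0, 1], clamping out-of-range values (assumed)."""
    return [min(1.0, max(0.0, v * 0.5 + 0.5)) for v in image]


def apply_denormalize_flags(batch: List[Image], do_normalize: bool = True,
                            do_denormalize: Optional[List[bool]] = None) -> List[Image]:
    """Denormalize each image whose flag is True.

    Hypothetical helper: when `do_denormalize` is None, every image falls back
    to the processor-level `do_normalize` setting.
    """
    if do_denormalize is None:
        do_denormalize = [do_normalize] * len(batch)
    return [denormalize(img) if flag else img for img, flag in zip(batch, do_denormalize)]


batch = [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
all_denormalized = apply_denormalize_flags(batch)
mixed = apply_denormalize_flags(batch, do_denormalize=[True, False])
print(all_denormalized)  # [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
print(mixed)             # [[0.0, 0.5, 1.0], [-1.0, 0.0, 1.0]]
```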

preprocess

( image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]] height: typing.Optional[int] = None width: typing.Optional[int] = None resize_mode: str = 'default' crops_coords: typing.Optional[typing.Tuple[int, int, int, int]] = None ) → torch.Tensor

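Parameters

image (PipelineImageInput) — The image input. Accepted formats are PIL images, NumPy arrays, and PyTorch tensors; a list of any supported format is also accepted.
height (int, optional) — The height of the preprocessed image. If None, get_default_height_width() is used to get the default height.
width (int, optional) — The width of the preprocessed image. If None, get_default_height_width() is used to get the default width.
resize_mode (str, optional, defaults to default) — The resize mode, one of default, fill, or crop. If default, the image is resized to fit within the specified width and height, and the original aspect ratio may not be maintained. If fill, the image is resized to fit within the specified width and height while maintaining the aspect ratio, then centered within the dimensions, with the empty space filled with data from the image. If crop, the image is resized to fit within the specified width and height while maintaining the aspect ratio, then centered within the dimensions, with the excess cropped. Note that the fill and crop resize modes are only supported for PIL image input.
crops_coords (List[Tuple[int, int, int, int]], optional, defaults to None) — The crop coordinates for each image in the batch. If None, the image is not cropped.

Returns

torch.Tensor

The preprocessed image.

Preprocesses the image input.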

pt_to_numpy

( images: Tensor ) → np.ndarray

Parameters

images (torch.Tensor) — The PyTorch tensor to convert to NumPy format.

Returns

np.ndarray

A NumPy array representation of the images.

Converts a PyTorch tensor to a NumPy image.

resize

( image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor] height: int width: int resize_mode: str = 'default' ) → PIL.Image.Image, np.ndarray or torch.Tensor

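Parameters

image (PIL.Image.Image, np.ndarray or torch.Tensor) — The image input, which can be a PIL image, NumPy array, or PyTorch tensor.
height (int) — The height to resize to.
width (int) — The width to resize to.
resize_mode (str, optional, defaults to default) — The resize mode to use, one of default, fill, or crop. If default, the image is resized to fit within the specified width and height, and the original aspect ratio may not be maintained. If fill, the image is resized to fit within the specified width and height while maintaining the aspect ratio, then centered within the dimensions, with the empty space filled with data from the image. If crop, the image is resized to fit within the specified width and height while maintaining the aspect ratio, then centered within the dimensions, with the excess cropped. Note that the fill and crop resize modes are only supported for PIL image input.

Returns

PIL.Image.Image, np.ndarray or torch.Tensor

The resized image.

Resizes the image.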
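The difference between the three resize modes comes down to the intermediate size an image is scaled to before any padding or cropping. The sketch below computes that size only. scaled_size is a hypothetical helper, and the integer rounding is an assumption, so the library's results may differ by a pixel.

```python
from typing import Tuple


def scaled_size(src_width: int, src_height: int, width: int, height: int,
                resize_mode: str = "default") -> Tuple[int, int]:
    """Return the (width, height) an image is scaled to before padding or cropping.

    Hypothetical helper for illustration; not part of the Diffusers API.
    """
    if resize_mode == "default":
        # Stretch to the target size; the aspect ratio may change.
        return width, height

    target_ratio = width / height
    src_ratio = src_width / src_height
    if resize_mode == "fill":
        # Fit inside the target: match the width when the source is relatively wider.
        match_width = src_ratio > target_ratio
    elif resize_mode == "crop":
        # Cover the target: match the width when the source is relatively taller.
        match_width = src_ratio < target_ratio
    else:
        raise ValueError(f"unsupported resize_mode: {resize_mode}")

    if match_width:
        return width, src_height * width // src_width
    return src_width * height // src_height, height


# A 400x200 source resized for a 512x512 target.
print(scaled_size(400, 200, 512, 512, "default"))  # (512, 512)
print(scaled_size(400, 200, 512, 512, "fill"))     # (512, 256)
print(scaled_size(400, 200, 512, 512, "crop"))     # (1024, 512)
```

With fill, the 512x256 result leaves 256 rows to be filled in. With crop, the 1024x512 result overflows by 512 columns, which are cropped away around the center.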

VaeImageProcessorLDM3D

The VaeImageProcessorLDM3D accepts RGB and depth inputs and returns RGB and depth outputs.

class diffusers.image_processor.VaeImageProcessorLDM3D

( do_resize: bool = True vae_scale_factor: int = 8 resample: str = 'lanczos' do_normalize: bool = True )

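Parameters

do_resize (bool, optional, defaults to True) — Whether to downscale the image's (height, width) dimensions to multiples of vae_scale_factor.
vae_scale_factor (int, optional, defaults to 8) — VAE scale factor. If do_resize is True, the image is automatically resized to multiples of this factor.
resample (str, optional, defaults to lanczos) — Resampling filter to use when resizing the image.
do_normalize (bool, optional, defaults to True) — Whether to normalize the image to [-1,1].

Image processor for VAE LDM3D.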

depth_pil_to_numpy

( images: typing.Union[typing.List[PIL.Image.Image], PIL.Image.Image] ) → np.ndarray

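Parameters

images (Union[List[PIL.Image.Image], PIL.Image.Image]) — The input image or list of images to convert.

Returns

np.ndarray

A NumPy array of the converted images.

Converts a PIL image or a list of PIL images to NumPy arrays.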

numpy_to_depth

( images: ndarray ) → List[PIL.Image.Image]

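Parameters

images (np.ndarray) — The input NumPy array of depth images, which can be a single image or a batch.

Returns

List[PIL.Image.Image]

A list of PIL images converted from the input NumPy depth images.

Converts a NumPy depth image or a batch of images to a list of PIL images.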

numpy_to_pil

( images: ndarray ) → List[PIL.Image.Image]

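Parameters

images (np.ndarray) — The input NumPy array of images, which can be a single image or a batch.

Returns

List[PIL.Image.Image]

A list of PIL images converted from the input NumPy array.

Converts a NumPy image or a batch of images to a list of PIL images.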

preprocess

( rgb: typing.Union[torch.Tensor, PIL.Image.Image, numpy.ndarray] depth: typing.Union[torch.Tensor, PIL.Image.Image, numpy.ndarray] height: typing.Optional[int] = None width: typing.Optional[int] = None target_res: typing.Optional[int] = None ) → Tuple[torch.Tensor, torch.Tensor]

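Parameters

rgb (Union[torch.Tensor, PIL.Image.Image, np.ndarray]) — The RGB input image, which can be a single image or a batch.
depth (Union[torch.Tensor, PIL.Image.Image, np.ndarray]) — The depth input image, which can be a single image or a batch.
height (Optional[int], optional, defaults to None) — The desired height of the processed image. If None, defaults to the height of the input image.
width (Optional[int], optional, defaults to None) — The desired width of the processed image. If None, defaults to the width of the input image.
target_res (Optional[int], optional, defaults to None) — The target resolution for resizing the images. If specified, it overrides height and width.

Returns

Tuple[torch.Tensor, torch.Tensor]

A tuple containing the processed RGB and depth images as PyTorch tensors.

Preprocesses the image input. Accepted formats are PIL images, NumPy arrays, or PyTorch tensors.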

rgblike_to_depthmap

( image: typing.Union[numpy.ndarray, torch.Tensor] ) → Union[np.ndarray, torch.Tensor]

Parameters

image (Union[np.ndarray, torch.Tensor]) — The RGB-like depth image to convert.

Returns

Union[np.ndarray, torch.Tensor]

The corresponding depth map.

Converts an RGB-like depth image to a depth map.

PixArtImageProcessor

class diffusers.image_processor.PixArtImageProcessor

( do_resize: bool = True vae_scale_factor: int = 8 resample: str = 'lanczos' do_normalize: bool = True do_binarize: bool = False do_convert_grayscale: bool = False )

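Parameters

do_resize (bool, optional, defaults to True) — Whether to downscale the image's (height, width) dimensions to multiples of vae_scale_factor. Can accept the height and width arguments from the image_processor.VaeImageProcessor.preprocess() method.
vae_scale_factor (int, optional, defaults to 8) — VAE scale factor. If do_resize is True, the image is automatically resized to multiples of this factor.
resample (str, optional, defaults to lanczos) — Resampling filter to use when resizing the image.
do_normalize (bool, optional, defaults to True) — Whether to normalize the image to [-1,1].
do_binarize (bool, optional, defaults to False) — Whether to binarize the image to 0/1.
do_convert_rgb (bool, optional, defaults to False) — Whether to convert the images to RGB format.
do_convert_grayscale (bool, optional, defaults to False) — Whether to convert the images to grayscale format.

Image processor for PixArt image resizing and cropping.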

classify_height_width_bin

( height: int width: int ratios: dict ) → Tuple[int, int]

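Parameters

height (int) — The height of the image.
width (int) — The width of the image.
ratios (dict) — A dictionary where the keys are aspect ratios and the values are tuples of (height, width).

Returns

Tuple[int, int]

The closest binned height and width.

Returns the binned height and width based on the aspect ratio.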
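Binning amounts to a nearest-key lookup in the ratios dictionary. The sketch below is illustrative only: closest_bin and example_ratios are hypothetical, the keys are assumed to be strings that parse as floats, and the aspect ratio is assumed to be height divided by width.

```python
from typing import Dict, Tuple


def closest_bin(height: int, width: int, ratios: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Return the (height, width) bin whose aspect-ratio key is closest to the input.

    Hypothetical helper. Assumptions: keys parse as floats, and the aspect
    ratio is defined as height / width.
    """
    aspect_ratio = height / width
    closest_key = min(ratios, key=lambda key: abs(float(key) - aspect_ratio))
    bin_height, bin_width = ratios[closest_key]
    return int(bin_height), int(bin_width)


# Made-up bins for illustration; real models ship their own tables.
example_ratios = {
    "0.5": (704, 1408),
    "1.0": (1024, 1024),
    "2.0": (1408, 704),
}
print(closest_bin(600, 1000, example_ratios))  # (704, 1408), since 0.6 is nearest to 0.5
```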

resize_and_crop_tensor

( samples: Tensor new_width: int new_height: int ) → torch.Tensor

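Parameters

samples (torch.Tensor) — A tensor of shape (N, C, H, W), where N is the batch size, C is the number of channels, H is the height, and W is the width.
new_width (int) — The desired width of the output images.
new_height (int) — The desired height of the output images.

Returns

torch.Tensor

A tensor containing the resized and cropped images.

Resizes and crops a tensor of images to the specified dimensions.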
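A resize followed by a crop can be planned with plain arithmetic before any tensor is touched. The sketch below assumes a scale-to-cover followed by a center crop, which is one common way to implement this operation and not necessarily the library's exact procedure. resize_and_crop_geometry is a hypothetical helper.

```python
from typing import Tuple


def resize_and_crop_geometry(
    orig_height: int, orig_width: int, new_height: int, new_width: int
) -> Tuple[Tuple[int, int], Tuple[int, int, int, int]]:
    """Return the intermediate (height, width) and the center-crop box.

    Hypothetical helper. The crop box is (top, left, bottom, right) in the
    resized image. Assumption: the image is scaled so that it covers the
    target in both dimensions, then cropped around its center.
    """
    scale = max(new_height / orig_height, new_width / orig_width)
    resized_height = int(orig_height * scale)
    resized_width = int(orig_width * scale)
    top = (resized_height - new_height) // 2
    left = (resized_width - new_width) // 2
    return (resized_height, resized_width), (top, left, top + new_height, left + new_width)


# A 1024x2048 (H x W) sample resized and cropped to 512x512.
resized, crop_box = resize_and_crop_geometry(1024, 2048, 512, 512)
print(resized)   # (512, 1024)
print(crop_box)  # (0, 256, 512, 768)
```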

IPAdapterMaskProcessor

class diffusers.image_processor.IPAdapterMaskProcessor

( do_resize: bool = True vae_scale_factor: int = 8 resample: str = 'lanczos' do_normalize: bool = False do_binarize: bool = True do_convert_grayscale: bool = True )

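Parameters

do_resize (bool, optional, defaults to True) — Whether to downscale the image's (height, width) dimensions to multiples of vae_scale_factor.
vae_scale_factor (int, optional, defaults to 8) — VAE scale factor. If do_resize is True, the image is automatically resized to multiples of this factor.
resample (str, optional, defaults to 'lanczos') — Resampling filter to use when resizing the image.
do_normalize (bool, optional, defaults to False) — Whether to normalize the image to [-1,1].
do_binarize (bool, optional, defaults to True) — Whether to binarize the image to 0/1.
do_convert_grayscale (bool, optional, defaults to True) — Whether to convert the images to grayscale format.

Image processor for IP Adapter image masks.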

downsample

( mask: Tensor batch_size: int num_queries: int value_embed_dim: int ) → torch.Tensor

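Parameters

mask (torch.Tensor) — The input mask tensor generated with IPAdapterMaskProcessor.preprocess().
batch_size (int) — The batch size.
num_queries (int) — The number of queries.
value_embed_dim (int) — The dimensionality of the value embeddings.

Returns

torch.Tensor

The downsampled mask tensor.

Downsamples the provided mask tensor to match the expected dimensions for scaled dot-product attention. If the aspect ratio of the mask does not match the aspect ratio of the output image, a warning is issued.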
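To match attention, the mask has to end up with one value per query, so its downsampled height and width must multiply to num_queries while keeping roughly the mask's aspect ratio. The sketch below shows one plausible way to derive that grid. mask_grid_for_queries is a hypothetical helper, and the rounding strategy is an assumption rather than a description of the library's code.

```python
import math
from typing import Tuple


def mask_grid_for_queries(mask_height: int, mask_width: int, num_queries: int) -> Tuple[int, int]:
    """Pick a (height, width) grid with about `num_queries` cells.

    Hypothetical helper. The grid keeps the aspect ratio of the input mask
    as closely as integer arithmetic allows.
    """
    ratio = mask_width / mask_height
    grid_height = int(math.sqrt(num_queries / ratio))
    # Bump the height by one if it does not divide num_queries evenly.
    if num_queries % grid_height != 0:
        grid_height += 1
    grid_width = num_queries // grid_height
    return grid_height, grid_width


print(mask_grid_for_queries(512, 512, 4096))  # (64, 64)
print(mask_grid_for_queries(512, 768, 6144))  # (64, 96)
```

When the product of the resulting height and width differs from num_queries, the mask's aspect ratio cannot be matched exactly, which corresponds to the mismatch case described above.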
