Marigold Computer Vision

[Figure: Marigold teaser]

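Marigold was proposed in the CVPR 2024 Oral paper "Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation" by Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, and Konrad Schindler. The core idea is to **repurpose the generative prior of text-to-image Latent Diffusion Models (LDMs) for traditional computer vision tasks**. This approach was explored by fine-tuning Stable Diffusion for **Monocular Depth Estimation**, as demonstrated in the teaser above.

Marigold was later extended in the follow-up paper "Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis" by Bingxin Ke, Kevin Qu, Tianfu Wang, Nando Metzger, Shengyu Huang, Bo Li, Anton Obukhov, and Konrad Schindler. This work expanded Marigold to support new modalities such as **Surface Normals** and **Intrinsic Image Decomposition** (IID), introduced a training protocol for **Latent Consistency Models** (LCM), and demonstrated **High-Resolution** (HR) processing capability.

The early Marigold models (v1-0 and earlier) were optimized for best results with at least 10 inference steps. LCM models were later developed to enable high-quality inference in just 1 to 4 steps. Marigold models v1-1 and later use the DDIM scheduler and achieve optimal results in 1 to 4 steps.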

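Available Pipelines

Each pipeline is tailored to a specific computer vision task: it processes an input RGB image and generates the corresponding prediction. Currently, the following computer vision tasks are implemented:

Pipeline	Recommended Model Checkpoints	Spaces (Interactive Apps)	Predicted Modalities
MarigoldDepthPipeline	prs-eth/marigold-depth-v1-1	Depth Estimation	Depth, Disparity
MarigoldNormalsPipeline	prs-eth/marigold-normals-v1-1	Surface Normals Estimation	Surface normals
MarigoldIntrinsicsPipeline	prs-eth/marigold-iid-appearance-v1-1, prs-eth/marigold-iid-lighting-v1-1	Intrinsic Image Decomposition	Albedo, Materials, Lighting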

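Available Checkpoints

All original checkpoints are available under the PRS-ETH organization on Hugging Face. They are designed for use with the Diffusers pipelines and with the original codebase, which can also be used to train new model checkpoints. The following is a summary of the recommended checkpoints, all of which produce reliable results in 1 to 4 steps.

Checkpoint	Modality	Comment
prs-eth/marigold-depth-v1-1	Depth	Affine-invariant depth prediction assigns each pixel a value between 0 (near plane) and 1 (far plane), with both planes determined by the model during inference.
prs-eth/marigold-normals-v1-1	Normals	Surface normals predictions are unit-length 3D vectors in the screen-space camera frame, with values in the range from -1 to 1.
prs-eth/marigold-iid-appearance-v1-1	Intrinsics	InteriorVerse decomposition comprises albedo and two BRDF material properties: roughness and metallicity.
prs-eth/marigold-iid-lighting-v1-1	Intrinsics	HyperSim decomposition of an image $I$ comprises albedo $A$, diffuse shading $S$, and non-diffuse residual $R$: $I = A \cdot S + R$.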
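
As a minimal sketch of the lighting decomposition described in the last table row, the following NumPy snippet recomposes an image from albedo, diffuse shading, and a non-diffuse residual. The function name `recompose_image` and the toy values are illustrative assumptions, not pipeline outputs, and the product is assumed to be element-wise.

```python
import numpy as np

def recompose_image(albedo, shading, residual):
    """Recompose an image as I = A * S + R (element-wise); all arrays are (H, W, 3)."""
    return albedo * shading + residual

# Toy 2x2 RGB components (hypothetical values, not model predictions).
albedo = np.full((2, 2, 3), 0.5)
shading = np.full((2, 2, 3), 0.8)
residual = np.full((2, 2, 3), 0.1)

image = recompose_image(albedo, shading, residual)
print(image.shape)
```

Comparing such a recomposition with the input image is one simple way to sanity-check a predicted decomposition.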

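Make sure to check out the Schedulers guide to learn how to explore the tradeoff between scheduler speed and quality, and see the "reuse components across pipelines" section to learn how to efficiently load the same components into multiple pipelines. To learn more about reducing the memory usage of this pipeline, refer to the "Reduce memory usage" section.

Marigold pipelines were designed and tested with the scheduler embedded in the model checkpoint. The optimal number of inference steps varies by scheduler, and no single value works best in all cases. To accommodate this, the `num_inference_steps` parameter of the pipeline's `__call__` method defaults to `None` (see the API reference). Unless set explicitly, it inherits the value of the `default_denoising_steps` field in the checkpoint configuration file (`model_index.json`). This ensures high-quality predictions when the pipeline is invoked with only the `image` argument.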
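
The fallback described above can be sketched in plain Python. This is an illustration under stated assumptions: the helper `resolve_num_inference_steps` is hypothetical, and the JSON excerpt only mimics the `default_denoising_steps` field of `model_index.json` with made-up values.

```python
import json

# Hypothetical excerpt of a checkpoint's model_index.json; the values are illustrative.
MODEL_INDEX_JSON = '{"default_denoising_steps": 4, "default_processing_resolution": 768}'

def resolve_num_inference_steps(num_inference_steps, config):
    """Return the explicit step count, or fall back to the checkpoint default."""
    if num_inference_steps is None:
        num_inference_steps = config.get("default_denoising_steps")
    if num_inference_steps is None:
        raise ValueError("num_inference_steps is unset and the config has no default.")
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be positive.")
    return num_inference_steps

config = json.loads(MODEL_INDEX_JSON)
steps_default = resolve_num_inference_steps(None, config)   # inherits the checkpoint default
steps_explicit = resolve_num_inference_steps(10, config)    # an explicit value takes precedence
print(steps_default, steps_explicit)
```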

See also the Marigold usage examples.

Marigold Depth Prediction API

class diffusers.MarigoldDepthPipeline

( unet: UNet2DConditionModel vae: AutoencoderKL scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_lcm.LCMScheduler] text_encoder: CLIPTextModel tokenizer: CLIPTokenizer prediction_type: typing.Optional[str] = None scale_invariant: typing.Optional[bool] = True shift_invariant: typing.Optional[bool] = True default_denoising_steps: typing.Optional[int] = None default_processing_resolution: typing.Optional[int] = None )

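Parameters

unet (UNet2DConditionModel): Conditional U-Net to denoise the depth latent, conditioned on the image latent.
vae (AutoencoderKL): Variational Auto-Encoder (VAE) model to encode and decode images and predictions to and from latent representations.
scheduler (DDIMScheduler or LCMScheduler): A scheduler to be used in combination with unet to denoise the encoded image latents.
text_encoder (CLIPTextModel): Text encoder, used for the empty text embedding.
tokenizer (CLIPTokenizer): CLIP tokenizer.
prediction_type (str, optional): Type of predictions made by the model.
scale_invariant (bool, optional): A model property specifying whether the predicted depth maps are scale-invariant. This value must be set in the model config. When used together with the shift_invariant=True flag, the model is also called "affine-invariant". Note: overriding this value is not supported.
shift_invariant (bool, optional): A model property specifying whether the predicted depth maps are shift-invariant. This value must be set in the model config. When used together with the scale_invariant=True flag, the model is also called "affine-invariant". Note: overriding this value is not supported.
default_denoising_steps (int, optional): The minimum number of denoising diffusion steps required to produce a high-quality prediction with the given model. This value must be set in the model config. It is used when the pipeline is called without explicitly setting num_inference_steps. This is required to ensure reasonable results with the various model flavors compatible with the pipeline, such as those relying on very short denoising schedules (LCMScheduler) and those with full diffusion schedules (DDIMScheduler).
default_processing_resolution (int, optional): The recommended value of the pipeline's processing_resolution parameter. This value must be set in the model config. It is used when the pipeline is called without explicitly setting processing_resolution. This is required to ensure reasonable results with various model flavors that have different optimal processing resolutions.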
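
Because an affine-invariant prediction is defined only up to an unknown scale and shift, comparing it with a reference depth map usually calls for an alignment step first. The sketch below fits both values by least squares with NumPy. The helper `align_scale_shift` and the synthetic reference are assumptions made for illustration and are not part of the pipeline API.

```python
import numpy as np

def align_scale_shift(prediction, reference):
    """Least-squares fit of scale s and shift t so that s * prediction + t ~ reference."""
    x = prediction.reshape(-1)
    y = reference.reshape(-1)
    design = np.stack([x, np.ones_like(x)], axis=1)  # columns: [prediction, 1]
    (scale, shift), *_ = np.linalg.lstsq(design, y, rcond=None)
    return scale, shift

rng = np.random.default_rng(0)
prediction = rng.random((4, 4))       # stand-in for an affine-invariant depth map in [0, 1]
reference = 3.0 * prediction + 1.5    # hypothetical reference with a known scale and shift

scale, shift = align_scale_shift(prediction, reference)
aligned = scale * prediction + shift
print(float(scale), float(shift))
```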

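Pipeline for monocular depth estimation using the Marigold method: https://marigoldmonodepth.github.io.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all pipelines (such as downloading or saving, running on a particular device, etc.).

__call__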

( image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]] num_inference_steps: typing.Optional[int] = None ensemble_size: int = 1 processing_resolution: typing.Optional[int] = None match_input_resolution: bool = True resample_method_input: str = 'bilinear' resample_method_output: str = 'bilinear' batch_size: int = 1 ensembling_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None latents: typing.Union[torch.Tensor, typing.List[torch.Tensor], NoneType] = None generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None output_type: str = 'np' output_uncertainty: bool = False output_latent: bool = False return_dict: bool = True ) → MarigoldDepthOutput or tuple

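Parameters

image (PIL.Image.Image, np.ndarray, torch.Tensor, List[PIL.Image.Image], List[np.ndarray], or List[torch.Tensor]): Input image or images for the depth estimation task. For arrays and tensors, the expected value range is [0, 1]. A batch of images can be passed by providing a four-dimensional array or tensor. Alternatively, a list of two- or three-dimensional arrays or tensors can be passed; in that case, all list elements must have the same width and height.
num_inference_steps (int, optional, defaults to None): Number of denoising diffusion steps during inference. The default value None results in automatic selection.
ensemble_size (int, defaults to 1): Number of ensemble predictions. Higher values result in measurable improvements and visual degradation.
processing_resolution (int, optional, defaults to None): Effective processing resolution. When set to 0, it matches the larger input image dimension. This produces crisper predictions, but may also lead to an overall loss of global context. The default value None resolves to the optimal value from the model config.
match_input_resolution (bool, optional, defaults to True): When enabled, the output prediction is resized to match the input dimensions. When disabled, the longer side of the output equals processing_resolution.
resample_method_input (str, optional, defaults to "bilinear"): Resampling method used to resize input images to processing_resolution. The accepted values are: "nearest", "nearest-exact", "bilinear", "bicubic", or "area".
resample_method_output (str, optional, defaults to "bilinear"): Resampling method used to resize output predictions to match the input resolution. The accepted values are: "nearest", "nearest-exact", "bilinear", "bicubic", or "area".
batch_size (int, optional, defaults to 1): Batch size; only matters when setting ensemble_size or passing a tensor of images.
ensembling_kwargs (dict, optional, defaults to None): Extra dictionary with arguments for precise ensembling control. The following options are available:
- reduction (str, optional, defaults to "median"): Defines the ensembling function applied at every pixel location; can be either "median" or "mean".
- regularizer_strength (float, optional, defaults to 0.02): Strength of the regularizer that pulls the aligned predictions to the unit range from 0 to 1.
- max_iter (int, optional, defaults to 2): Maximum number of alignment solver steps. Refer to the options argument of the scipy.optimize.minimize function.
- tol (float, optional, defaults to 1e-3): Alignment solver tolerance. The solver stops when the tolerance is reached.
- max_res (int, optional, defaults to None): Resolution at which the alignment is performed; None matches processing_resolution.
latents (torch.Tensor or List[torch.Tensor], optional, defaults to None): Latent noise tensors to replace the random initialization. These can be taken from the output of a previous function call.
generator (torch.Generator or List[torch.Generator], optional, defaults to None): Random number generator object to ensure reproducibility.
output_type (str, optional, defaults to "np"): Preferred format of the output's prediction and the optional uncertainty fields. The accepted values are: "np" (numpy array) or "pt" (torch tensor).
output_uncertainty (bool, optional, defaults to False): When enabled, the output's uncertainty field contains the predictive uncertainty map, provided that the ensemble_size argument is set to a value greater than 2.
output_latent (bool, optional, defaults to False): When enabled, the output's latent field contains the latent codes corresponding to the predictions within the ensemble. These codes can be saved, modified, and passed to subsequent calls through the latents argument.
return_dict (bool, optional, defaults to True): Whether or not to return a MarigoldDepthOutput instead of a plain tuple.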
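
To make the `reduction` option concrete, here is a minimal NumPy sketch of a per-pixel reduction over ensemble members that are assumed to be already aligned. The alignment controlled by `regularizer_strength`, `max_iter`, and `tol` is omitted. The helper `reduce_ensemble` is hypothetical, and the spread measures used as an uncertainty proxy (median absolute deviation for "median", standard deviation for "mean") are assumptions chosen for illustration.

```python
import numpy as np

def reduce_ensemble(aligned, reduction="median"):
    """Reduce aligned predictions of shape (E, H, W) per pixel.

    Returns the reduced map and a per-pixel spread, used here as an uncertainty proxy.
    """
    if reduction == "median":
        prediction = np.median(aligned, axis=0)
        spread = np.median(np.abs(aligned - prediction), axis=0)  # median absolute deviation
    elif reduction == "mean":
        prediction = np.mean(aligned, axis=0)
        spread = np.std(aligned, axis=0)
    else:
        raise ValueError(f"Unrecognized reduction: {reduction}")
    return prediction, spread

# Three hypothetical ensemble members for a 1x2 image.
ensemble = np.array([[[0.2, 0.5]], [[0.4, 0.5]], [[0.9, 0.5]]])
median_pred, median_unc = reduce_ensemble(ensemble, "median")
mean_pred, mean_unc = reduce_ensemble(ensemble, "mean")
print(median_pred, mean_pred)
```

In this toy input the median ignores the outlying value 0.9 in the first pixel, whereas the mean is pulled toward it.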

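Returns: MarigoldDepthOutput or tuple

If return_dict is True, a MarigoldDepthOutput is returned; otherwise a tuple is returned, where the first element is the prediction, the second element is the uncertainty (or None), and the third is the latent (or None).

Function invoked when calling the pipeline.

Example: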

>>> import diffusers
>>> import torch

>>> pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
...     "prs-eth/marigold-depth-v1-1", variant="fp16", torch_dtype=torch.float16
... ).to("cuda")

>>> image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
>>> depth = pipe(image)

>>> vis = pipe.image_processor.visualize_depth(depth.prediction)
>>> vis[0].save("einstein_depth.png")

>>> depth_16bit = pipe.image_processor.export_depth_to_16bit_png(depth.prediction)
>>> depth_16bit[0].save("einstein_depth_16bit.png")

class diffusers.pipelines.marigold.MarigoldDepthOutput

( prediction: typing.Union[numpy.ndarray, torch.Tensor] uncertainty: typing.Union[NoneType, numpy.ndarray, torch.Tensor] latent: typing.Optional[torch.Tensor] )

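Parameters

prediction (np.ndarray, torch.Tensor): Predicted depth maps with values in the range [0, 1]. The shape is $numimages \times 1 \times height \times width$ for torch.Tensor or $numimages \times height \times width \times 1$ for np.ndarray.
uncertainty (None, np.ndarray, torch.Tensor): Uncertainty maps computed from the ensemble, with values in the range [0, 1]. The shape is $numimages \times 1 \times height \times width$ for torch.Tensor or $numimages \times height \times width \times 1$ for np.ndarray.
latent (None, torch.Tensor): Latent features corresponding to the predictions, compatible with the latents argument of the pipeline. The shape is $(numimages * numensemble) \times 4 \times latentheight \times latentwidth$.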
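
The two layouts above differ only in where the channel axis sits. The sketch below converts between them with NumPy, which stands in for both formats so that the snippet runs without PyTorch; with a real torch.Tensor, `permute` would play the same role. The helper names are hypothetical.

```python
import numpy as np

def channels_first_to_last(batch):
    """Convert (N, C, H, W) to (N, H, W, C)."""
    return np.transpose(batch, (0, 2, 3, 1))

def channels_last_to_first(batch):
    """Convert (N, H, W, C) to (N, C, H, W)."""
    return np.transpose(batch, (0, 3, 1, 2))

depth_pt_layout = np.zeros((2, 1, 48, 64))   # numimages=2, height=48, width=64
depth_np_layout = channels_first_to_last(depth_pt_layout)
round_trip = channels_last_to_first(depth_np_layout)
print(depth_np_layout.shape, round_trip.shape)
```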

Output class for the Marigold monocular depth prediction pipeline.

diffusers.pipelines.marigold.MarigoldImageProcessor.visualize_depth

( depth: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]] val_min: float = 0.0 val_max: float = 1.0 color_map: str = 'Spectral' )

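Parameters

depth (Union[PIL.Image.Image, np.ndarray, torch.Tensor, List[PIL.Image.Image], List[np.ndarray], List[torch.Tensor]]): Depth maps.
val_min (float, optional, defaults to 0.0): Minimum value of the visualized depth range.
val_max (float, optional, defaults to 1.0): Maximum value of the visualized depth range.
color_map (str, optional, defaults to "Spectral"): Color map used to convert a single-channel depth prediction into a colored representation.

Visualizes depth maps, such as predictions of the MarigoldDepthPipeline.

Returns: List[PIL.Image.Image] with the depth map visualizations.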
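
The role of `val_min` and `val_max` can be illustrated with a small NumPy sketch that rescales depth to the unit range before any coloring, and then quantizes it to 16 bits in the spirit of the 16-bit PNG export used in the depth example. The helper names and the clipping and rounding choices are assumptions for illustration and may differ from the library's implementation.

```python
import numpy as np

def normalize_depth(depth, val_min=0.0, val_max=1.0):
    """Map depth from [val_min, val_max] to [0, 1], clipping values outside the range."""
    if val_max <= val_min:
        raise ValueError("val_max must be greater than val_min.")
    return np.clip((depth - val_min) / (val_max - val_min), 0.0, 1.0)

def quantize_to_uint16(depth, val_min=0.0, val_max=1.0):
    """Quantize normalized depth to the 16-bit integer range [0, 65535]."""
    normalized = normalize_depth(depth, val_min, val_max)
    return np.round(normalized * 65535.0).astype(np.uint16)

depth = np.array([[0.0, 0.25], [0.5, 1.0]])   # toy depth map, not a model prediction
depth_u16 = quantize_to_uint16(depth)
print(depth_u16.dtype, depth_u16.min(), depth_u16.max())
```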

Marigold Normals Estimation API

class diffusers.MarigoldNormalsPipeline

( unet: UNet2DConditionModel vae: AutoencoderKL scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_lcm.LCMScheduler] text_encoder: CLIPTextModel tokenizer: CLIPTokenizer prediction_type: typing.Optional[str] = None use_full_z_range: typing.Optional[bool] = True default_denoising_steps: typing.Optional[int] = None default_processing_resolution: typing.Optional[int] = None )

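Parameters

unet (UNet2DConditionModel): Conditional U-Net to denoise the normals latent, conditioned on the image latent.
vae (AutoencoderKL): Variational Auto-Encoder (VAE) model to encode and decode images and predictions to and from latent representations.
scheduler (DDIMScheduler or LCMScheduler): A scheduler to be used in combination with unet to denoise the encoded image latents.
text_encoder (CLIPTextModel): Text encoder, used for the empty text embedding.
tokenizer (CLIPTokenizer): CLIP tokenizer.
prediction_type (str, optional): Type of predictions made by the model.
use_full_z_range (bool, optional): Whether the normals predicted by this model use the full range of the Z dimension or only its positive half.
default_denoising_steps (int, optional): The minimum number of denoising diffusion steps required to produce a prediction of reasonable quality with the given model. This value must be set in the model config. It is used when the pipeline is called without explicitly setting num_inference_steps. This is required to ensure reasonable results with the various model flavors compatible with the pipeline, such as those relying on very short denoising schedules (LCMScheduler) and those with full diffusion schedules (DDIMScheduler).
default_processing_resolution (int, optional): The recommended value of the pipeline's processing_resolution parameter. This value must be set in the model config. It is used when the pipeline is called without explicitly setting processing_resolution. This is required to ensure reasonable results with various model flavors trained with different optimal processing resolutions.

Pipeline for monocular normals estimation using the Marigold method: https://marigoldmonodepth.github.io.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all pipelines (such as downloading or saving, running on a particular device, etc.).

__call__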

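Parameters

image (PIL.Image.Image, np.ndarray, torch.Tensor, List[PIL.Image.Image], List[np.ndarray], or List[torch.Tensor]): Input image or images for the normals estimation task. For arrays and tensors, the expected value range is [0, 1]. A batch of images can be passed by providing a four-dimensional array or tensor. Alternatively, a list of two- or three-dimensional arrays or tensors can be passed; in that case, all list elements must have the same width and height.
num_inference_steps (int, optional, defaults to None): Number of denoising diffusion steps during inference. The default value None results in automatic selection.
ensemble_size (int, defaults to 1): Number of ensemble predictions. Higher values result in measurable improvements and visual degradation.
processing_resolution (int, optional, defaults to None): Effective processing resolution. When set to 0, it matches the larger input image dimension. This produces crisper predictions, but may also lead to an overall loss of global context. The default value None resolves to the optimal value from the model config.
match_input_resolution (bool, optional, defaults to True): When enabled, the output prediction is resized to match the input dimensions. When disabled, the longer side of the output equals processing_resolution.
resample_method_input (str, optional, defaults to "bilinear"): Resampling method used to resize input images to processing_resolution. The accepted values are: "nearest", "nearest-exact", "bilinear", "bicubic", or "area".
resample_method_output (str, optional, defaults to "bilinear"): Resampling method used to resize output predictions to match the input resolution. The accepted values are: "nearest", "nearest-exact", "bilinear", "bicubic", or "area".
batch_size (int, optional, defaults to 1): Batch size; only matters when setting ensemble_size or passing a tensor of images.
ensembling_kwargs (dict, optional, defaults to None): Extra dictionary with arguments for precise ensembling control. The following options are available:
- reduction (str, optional, defaults to "closest"): Defines the ensembling function applied at every pixel location; can be either "closest" or "mean".
latents (torch.Tensor, optional, defaults to None): Latent noise tensors to replace the random initialization. These can be taken from the output of a previous function call.
generator (torch.Generator or List[torch.Generator], optional, defaults to None): Random number generator object to ensure reproducibility.
output_type (str, optional, defaults to "np"): Preferred format of the output's prediction and the optional uncertainty fields. The accepted values are: "np" (numpy array) or "pt" (torch tensor).
output_uncertainty (bool, optional, defaults to False): When enabled, the output's uncertainty field contains the predictive uncertainty map, provided that the ensemble_size argument is set to a value greater than 2.
output_latent (bool, optional, defaults to False): When enabled, the output's latent field contains the latent codes corresponding to the predictions within the ensemble. These codes can be saved, modified, and passed to subsequent calls through the latents argument.
return_dict (bool, optional, defaults to True): Whether or not to return a MarigoldNormalsOutput instead of a plain tuple.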
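
One plausible reading of the "closest" and "mean" reductions for unit normals is sketched below with NumPy: the ensemble is averaged and re-normalized, and "closest" then picks, per pixel, the ensemble member with the highest cosine similarity to that average. This interpretation and the helper `reduce_normals` are assumptions for illustration; the library's exact procedure may differ.

```python
import numpy as np

def reduce_normals(ensemble, reduction="closest"):
    """Reduce unit normals of shape (E, H, W, 3) to a single (H, W, 3) map."""
    mean = ensemble.mean(axis=0)
    norm = np.linalg.norm(mean, axis=-1, keepdims=True)
    mean = mean / np.clip(norm, 1e-6, None)               # re-normalize to unit length
    if reduction == "mean":
        return mean
    if reduction != "closest":
        raise ValueError(f"Unrecognized reduction: {reduction}")
    similarity = (ensemble * mean[None]).sum(axis=-1)     # cosine similarity, shape (E, H, W)
    best = similarity.argmax(axis=0)                      # index of the closest member per pixel
    return np.take_along_axis(ensemble, best[None, ..., None], axis=0)[0]

# Three hypothetical ensemble members for a single pixel, shape (3, 1, 1, 3).
ensemble = np.array([[[[0.0, 0.0, 1.0]]], [[[0.0, 0.0, 1.0]]], [[[1.0, 0.0, 0.0]]]])
mean_normals = reduce_normals(ensemble, "mean")
closest_normals = reduce_normals(ensemble, "closest")
print(mean_normals[0, 0], closest_normals[0, 0])
```

Unlike "mean", the "closest" variant always returns a vector that one of the ensemble members actually predicted.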

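Returns: MarigoldNormalsOutput or tuple

If return_dict is True, a MarigoldNormalsOutput is returned; otherwise a tuple is returned, where the first element is the prediction, the second element is the uncertainty (or None), and the third is the latent (or None).

Function invoked when calling the pipeline.

Example: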

>>> import diffusers
>>> import torch

>>> pipe = diffusers.MarigoldNormalsPipeline.from_pretrained(
...     "prs-eth/marigold-normals-v1-1", variant="fp16", torch_dtype=torch.float16
... ).to("cuda")

>>> image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
>>> normals = pipe(image)

>>> vis = pipe.image_processor.visualize_normals(normals.prediction)
>>> vis[0].save("einstein_normals.png")

class diffusers.pipelines.marigold.MarigoldNormalsOutput

( prediction: typing.Union[numpy.ndarray, torch.Tensor] uncertainty: typing.Union[NoneType, numpy.ndarray, torch.Tensor] latent: typing.Optional[torch.Tensor] )

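Parameters

prediction (np.ndarray, torch.Tensor): Predicted normals with values in the range [-1, 1]. The shape is $numimages \times 3 \times height \times width$ for torch.Tensor or $numimages \times height \times width \times 3$ for np.ndarray.
uncertainty (None, np.ndarray, torch.Tensor): Uncertainty maps computed from the ensemble, with values in the range [0, 1]. The shape is $numimages \times 1 \times height \times width$ for torch.Tensor or $numimages \times height \times width \times 1$ for np.ndarray.
latent (None, torch.Tensor): Latent features corresponding to the predictions, compatible with the latents argument of the pipeline. The shape is $(numimages * numensemble) \times 4 \times latentheight \times latentwidth$.

Output class for the Marigold monocular normals prediction pipeline.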

diffusers.pipelines.marigold.MarigoldImageProcessor.visualize_normals

( normals: typing.Union[numpy.ndarray, torch.Tensor, typing.List[numpy.ndarray], typing.List[torch.Tensor]] flip_x: bool = False flip_y: bool = False flip_z: bool = False )

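Parameters

normals (Union[np.ndarray, torch.Tensor, List[np.ndarray], List[torch.Tensor]]): Surface normals.
flip_x (bool, optional, defaults to False): Flips the X axis of the normals frame of reference. The default direction is right.
flip_y (bool, optional, defaults to False): Flips the Y axis of the normals frame of reference. The default direction is up.
flip_z (bool, optional, defaults to False): Flips the Z axis of the normals frame of reference. The default direction is facing the observer.

Visualizes surface normals, such as predictions of the MarigoldNormalsPipeline.

Returns: List[PIL.Image.Image] with the surface normals visualizations.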
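
A common way to display normals is to map each component from [-1, 1] to an 8-bit color channel, with the flip flags negating the corresponding axis first. The NumPy sketch below shows that mapping. The helper `normals_to_rgb` and the rounding choice are assumptions for illustration, not necessarily what visualize_normals does internally.

```python
import numpy as np

def normals_to_rgb(normals, flip_x=False, flip_y=False, flip_z=False):
    """Map unit normals in [-1, 1], shape (H, W, 3), to uint8 RGB in [0, 255]."""
    signs = np.array([-1.0 if flip_x else 1.0,
                      -1.0 if flip_y else 1.0,
                      -1.0 if flip_z else 1.0])
    rgb = (normals * signs + 1.0) * 0.5 * 255.0
    return np.clip(np.round(rgb), 0, 255).astype(np.uint8)

facing_viewer = np.array([[[0.0, 0.0, 1.0]]])    # a normal pointing toward the observer
rgb = normals_to_rgb(facing_viewer)
rgb_flipped = normals_to_rgb(facing_viewer, flip_z=True)
print(rgb[0, 0], rgb_flipped[0, 0])
```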

Marigold Intrinsic Image Decomposition API

class diffusers.MarigoldIntrinsicsPipeline

( unet: UNet2DConditionModel vae: AutoencoderKL scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_lcm.LCMScheduler] text_encoder: CLIPTextModel tokenizer: CLIPTokenizer prediction_type: typing.Optional[str] = None target_properties: typing.Optional[typing.Dict[str, typing.Any]] = None default_denoising_steps: typing.Optional[int] = None default_processing_resolution: typing.Optional[int] = None )

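Parameters

unet (UNet2DConditionModel): Conditional U-Net to denoise the targets latent, conditioned on the image latent.
vae (AutoencoderKL): Variational Auto-Encoder (VAE) model to encode and decode images and predictions to and from latent representations.
scheduler (DDIMScheduler or LCMScheduler): A scheduler to be used in combination with unet to denoise the encoded image latents.
text_encoder (CLIPTextModel): Text encoder, used for the empty text embedding.
tokenizer (CLIPTokenizer): CLIP tokenizer.
prediction_type (str, optional): Type of predictions made by the model.
target_properties (Dict[str, Any], optional): Properties of the predicted modalities, such as target_names, a List[str] that defines the number, order, and names of the predicted modalities, plus any other metadata that may be required to interpret the predictions.
default_denoising_steps (int, optional): The minimum number of denoising diffusion steps required to produce a prediction of reasonable quality with the given model. This value must be set in the model config. It is used when the pipeline is called without explicitly setting num_inference_steps. This is required to ensure reasonable results with the various model flavors compatible with the pipeline, such as those relying on very short denoising schedules (LCMScheduler) and those with full diffusion schedules (DDIMScheduler).
default_processing_resolution (int, optional): The recommended value of the pipeline's processing_resolution parameter. This value must be set in the model config. It is used when the pipeline is called without explicitly setting processing_resolution. This is required to ensure reasonable results with various model flavors trained with different optimal processing resolutions.

Pipeline for Intrinsic Image Decomposition (IID) using the Marigold method: https://marigoldcomputervision.github.io.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all pipelines (such as downloading or saving, running on a particular device, etc.).

__call__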

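Parameters

image (PIL.Image.Image, np.ndarray, torch.Tensor, List[PIL.Image.Image], List[np.ndarray], or List[torch.Tensor]): Input image or images for the intrinsic decomposition task. For arrays and tensors, the expected value range is [0, 1]. A batch of images can be passed by providing a four-dimensional array or tensor. Alternatively, a list of two- or three-dimensional arrays or tensors can be passed; in that case, all list elements must have the same width and height.
num_inference_steps (int, optional, defaults to None): Number of denoising diffusion steps during inference. The default value None results in automatic selection.
ensemble_size (int, defaults to 1): Number of ensemble predictions. Higher values result in measurable improvements and visual degradation.
processing_resolution (int, optional, defaults to None): Effective processing resolution. When set to 0, it matches the larger input image dimension. This produces crisper predictions, but may also lead to an overall loss of global context. The default value None resolves to the optimal value from the model config.
match_input_resolution (bool, optional, defaults to True): When enabled, the output prediction is resized to match the input dimensions. When disabled, the longer side of the output equals processing_resolution.
resample_method_input (str, optional, defaults to "bilinear"): Resampling method used to resize input images to processing_resolution. The accepted values are: "nearest", "nearest-exact", "bilinear", "bicubic", or "area".
resample_method_output (str, optional, defaults to "bilinear"): Resampling method used to resize output predictions to match the input resolution. The accepted values are: "nearest", "nearest-exact", "bilinear", "bicubic", or "area".
batch_size (int, optional, defaults to 1): Batch size; only matters when setting ensemble_size or passing a tensor of images.
ensembling_kwargs (dict, optional, defaults to None): Extra dictionary with arguments for precise ensembling control. The following options are available:
- reduction (str, optional, defaults to "median"): Defines the ensembling function applied at every pixel location; can be either "median" or "mean".
latents (torch.Tensor, optional, defaults to None): Latent noise tensors to replace the random initialization. These can be taken from the output of a previous function call.
generator (torch.Generator or List[torch.Generator], optional, defaults to None): Random number generator object to ensure reproducibility.
output_type (str, optional, defaults to "np"): Preferred format of the output's prediction and the optional uncertainty fields. The accepted values are: "np" (numpy array) or "pt" (torch tensor).
output_uncertainty (bool, optional, defaults to False): When enabled, the output's uncertainty field contains the predictive uncertainty map, provided that the ensemble_size argument is set to a value greater than 2.
output_latent (bool, optional, defaults to False): When enabled, the output's latent field contains the latent codes corresponding to the predictions within the ensemble. These codes can be saved, modified, and passed to subsequent calls through the latents argument.
return_dict (bool, optional, defaults to True): Whether or not to return a MarigoldIntrinsicsOutput instead of a plain tuple.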
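
The interplay of `processing_resolution` and `match_input_resolution` can be sketched as simple size arithmetic in plain Python. Both helpers are hypothetical, and the rounding rule is an assumption; the library may round or pad differently.

```python
def processing_size(height, width, processing_resolution):
    """Return the (height, width) used for processing.

    A value of 0 keeps the native size; otherwise the longer side is scaled
    to processing_resolution and the aspect ratio is preserved.
    """
    if processing_resolution < 0:
        raise ValueError("processing_resolution must be non-negative.")
    if processing_resolution == 0:
        return height, width
    longer = max(height, width)
    new_height = max(1, round(height * processing_resolution / longer))
    new_width = max(1, round(width * processing_resolution / longer))
    return new_height, new_width

def output_size(height, width, processing_resolution, match_input_resolution=True):
    """Return the size of the returned prediction."""
    if match_input_resolution:
        return height, width
    return processing_size(height, width, processing_resolution)

native = processing_size(1080, 1920, 0)           # processed at the input size
downscaled = processing_size(1080, 1920, 768)     # longer side becomes 768
unmatched = output_size(1080, 1920, 768, match_input_resolution=False)
print(native, downscaled, unmatched)
```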

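Returns: MarigoldIntrinsicsOutput or tuple

If return_dict is True, a MarigoldIntrinsicsOutput is returned; otherwise a tuple is returned, where the first element is the prediction, the second element is the uncertainty (or None), and the third is the latent (or None).

Function invoked when calling the pipeline.

Examples: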

>>> import diffusers
>>> import torch

>>> pipe = diffusers.MarigoldIntrinsicsPipeline.from_pretrained(
...     "prs-eth/marigold-iid-appearance-v1-1", variant="fp16", torch_dtype=torch.float16
... ).to("cuda")

>>> image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
>>> intrinsics = pipe(image)

>>> vis = pipe.image_processor.visualize_intrinsics(intrinsics.prediction, pipe.target_properties)
>>> vis[0]["albedo"].save("einstein_albedo.png")
>>> vis[0]["roughness"].save("einstein_roughness.png")
>>> vis[0]["metallicity"].save("einstein_metallicity.png")

>>> import diffusers
>>> import torch

>>> pipe = diffusers.MarigoldIntrinsicsPipeline.from_pretrained(
...     "prs-eth/marigold-iid-lighting-v1-1", variant="fp16", torch_dtype=torch.float16
... ).to("cuda")

>>> image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
>>> intrinsics = pipe(image)

>>> vis = pipe.image_processor.visualize_intrinsics(intrinsics.prediction, pipe.target_properties)
>>> vis[0]["albedo"].save("einstein_albedo.png")
>>> vis[0]["shading"].save("einstein_shading.png")
>>> vis[0]["residual"].save("einstein_residual.png")

class diffusers.pipelines.marigold.MarigoldIntrinsicsOutput

( prediction: typing.Union[numpy.ndarray, torch.Tensor] uncertainty: typing.Union[NoneType, numpy.ndarray, torch.Tensor] latent: typing.Optional[torch.Tensor] )

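Parameters

prediction (np.ndarray, torch.Tensor): Predicted image intrinsics with values in the range [0, 1]. The shape is $(numimages * numtargets) \times 3 \times height \times width$ for torch.Tensor or $(numimages * numtargets) \times height \times width \times 3$ for np.ndarray, where numtargets corresponds to the number of predicted target modalities of the intrinsic image decomposition.
uncertainty (None, np.ndarray, torch.Tensor): Uncertainty maps computed from the ensemble, with values in the range [0, 1]. The shape is $(numimages * numtargets) \times 3 \times height \times width$ for torch.Tensor or $(numimages * numtargets) \times height \times width \times 3$ for np.ndarray.
latent (None, torch.Tensor): Latent features corresponding to the predictions, compatible with the latents argument of the pipeline. The shape is $(numimages * numensemble) \times (numtargets * 4) \times latentheight \times latentwidth$.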
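
Since the leading dimension of `prediction` stacks images and targets together, a small regrouping step helps when accessing individual modalities. The NumPy sketch below assumes image-major ordering (all targets of the first image, then all targets of the second, and so on). That ordering and the helper `split_targets` are assumptions for illustration.

```python
import numpy as np

def split_targets(prediction, target_names):
    """Split (numimages * numtargets, H, W, 3) into one dict of targets per image.

    Assumes image-major ordering: all targets of image 0, then all targets of image 1, ...
    """
    num_targets = len(target_names)
    if prediction.shape[0] % num_targets != 0:
        raise ValueError("Leading dimension is not a multiple of the number of targets.")
    num_images = prediction.shape[0] // num_targets
    grouped = prediction.reshape(num_images, num_targets, *prediction.shape[1:])
    return [dict(zip(target_names, grouped[i])) for i in range(num_images)]

target_names = ["albedo", "shading", "residual"]   # names used in the lighting example
# Synthetic stack for 2 images x 3 targets; slice k is filled with the value k.
prediction = np.arange(6, dtype=np.float32).reshape(6, 1, 1, 1) * np.ones((1, 4, 5, 3), dtype=np.float32)
per_image = split_targets(prediction, target_names)
print(len(per_image), per_image[0]["shading"].shape)
```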

Output class for the Marigold Intrinsic Image Decomposition pipeline.

diffusers.pipelines.marigold.MarigoldImageProcessor.visualize_intrinsics

( prediction: typing.Union[numpy.ndarray, torch.Tensor, typing.List[numpy.ndarray], typing.List[torch.Tensor]] target_properties: typing.Dict[str, typing.Any] color_map: typing.Union[str, typing.Dict[str, str]] = 'binary' )

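Parameters

prediction (Union[np.ndarray, torch.Tensor, List[np.ndarray], List[torch.Tensor]]): Intrinsic image decomposition.
target_properties (Dict[str, Any]): Decomposition properties. Expected entries: target_names: List[str], and, for each target and sub-target, a dictionary with the keys prediction_space: str, sub_target_names: List[Union[str, Null]] (must have 3 entries, with null for missing modalities), and up_to_scale: bool.
color_map (Union[str, Dict[str, str]], optional, defaults to "binary"): Color map used to convert a single-channel prediction into a colored representation. When a dictionary is passed, each modality can be colored with its own color map.

Visualizes an intrinsic image decomposition, such as predictions of the MarigoldIntrinsicsPipeline.

Returns: List[Dict[str, PIL.Image.Image]] with the intrinsic image decomposition visualizations.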
