Diffusers

UNet2DModel

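The UNet model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. 🤗 Diffusers provides several UNet variants, which differ in their number of dimensions and in whether they are conditional models. This is a 2D UNet model.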

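The abstract from the paper is:

There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net.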

UNet2DModel

class diffusers.UNet2DModel

( sample_size: typing.Union[int, typing.Tuple[int, int], NoneType] = None in_channels: int = 3 out_channels: int = 3 center_input_sample: bool = False time_embedding_type: str = 'positional' time_embedding_dim: typing.Optional[int] = None freq_shift: int = 0 flip_sin_to_cos: bool = True down_block_types: typing.Tuple[str, ...] = ('DownBlock2D', 'AttnDownBlock2D', 'AttnDownBlock2D', 'AttnDownBlock2D') mid_block_type: typing.Optional[str] = 'UNetMidBlock2D' up_block_types: typing.Tuple[str, ...] = ('AttnUpBlock2D', 'AttnUpBlock2D', 'AttnUpBlock2D', 'UpBlock2D') block_out_channels: typing.Tuple[int, ...] = (224, 448, 672, 896) layers_per_block: int = 2 mid_block_scale_factor: float = 1 downsample_padding: int = 1 downsample_type: str = 'conv' upsample_type: str = 'conv' dropout: float = 0.0 act_fn: str = 'silu' attention_head_dim: typing.Optional[int] = 8 norm_num_groups: int = 32 attn_norm_num_groups: typing.Optional[int] = None norm_eps: float = 1e-05 resnet_time_scale_shift: str = 'default' add_attention: bool = True class_embed_type: typing.Optional[str] = None num_class_embeds: typing.Optional[int] = None num_train_timesteps: typing.Optional[int] = None )

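Parameters

sample_size (int or Tuple[int, int], optional, defaults to None): Height and width of the input/output sample. Dimensions must be a multiple of 2 ** (len(block_out_channels) - 1).
in_channels (int, optional, defaults to 3): Number of channels in the input sample.
out_channels (int, optional, defaults to 3): Number of channels in the output.
center_input_sample (bool, optional, defaults to False): Whether to center the input sample.
time_embedding_type (str, optional, defaults to "positional"): Type of time embedding to use.
freq_shift (int, optional, defaults to 0): Frequency shift for the Fourier time embedding.
flip_sin_to_cos (bool, optional, defaults to True): Whether to flip sin to cos for the Fourier time embedding.
down_block_types (Tuple[str], optional, defaults to ("DownBlock2D", "AttnDownBlock2D", "AttnDownBlock2D", "AttnDownBlock2D")): Tuple of downsample block types.
mid_block_type (str, optional, defaults to "UNetMidBlock2D"): Block type for the middle of the UNet; it can be either UNetMidBlock2D or None.
up_block_types (Tuple[str], optional, defaults to ("AttnUpBlock2D", "AttnUpBlock2D", "AttnUpBlock2D", "UpBlock2D")): Tuple of upsample block types.
block_out_channels (Tuple[int], optional, defaults to (224, 448, 672, 896)): Tuple of block output channels.
layers_per_block (int, optional, defaults to 2): The number of layers per block.
mid_block_scale_factor (float, optional, defaults to 1): The scale factor for the mid block.
downsample_padding (int, optional, defaults to 1): The padding for the downsample convolution.
downsample_type (str, optional, defaults to "conv"): The downsample type for downsampling layers. Choose between "conv" and "resnet".
upsample_type (str, optional, defaults to "conv"): The upsample type for upsampling layers. Choose between "conv" and "resnet".
dropout (float, optional, defaults to 0.0): The dropout probability to use.
act_fn (str, optional, defaults to "silu"): The activation function to use.
attention_head_dim (int, optional, defaults to 8): The attention head dimension.
norm_num_groups (int, optional, defaults to 32): The number of groups for normalization.
attn_norm_num_groups (int, optional, defaults to None): If set to an integer, a group norm layer is created in the mid block's Attention layer with the given number of groups. If left as None, the group norm layer is created only if resnet_time_scale_shift is set to default, and if created it has norm_num_groups groups.
norm_eps (float, optional, defaults to 1e-5): The epsilon for normalization.
resnet_time_scale_shift (str, optional, defaults to "default"): Time scale shift config for ResNet blocks (see ResnetBlock2D). Choose from default or scale_shift.
class_embed_type (str, optional, defaults to None): The type of class embedding to use, which is ultimately summed with the time embeddings. Choose from None, "timestep", or "identity".
num_class_embeds (int, optional, defaults to None): Input dimension of the learnable embedding matrix to be projected to time_embed_dim when performing class conditioning with class_embed_type equal to None.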
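
The sample_size constraint above is simple arithmetic, and a small check can make it concrete. The sketch below is plain Python and does not call into Diffusers; `check_sample_size` is a hypothetical helper name introduced only for illustration. It assumes the rule exactly as stated in the parameter description: each spatial dimension must be a multiple of 2 ** (len(block_out_channels) - 1).

```python
from typing import Tuple, Union


def check_sample_size(
    sample_size: Union[int, Tuple[int, int]],
    block_out_channels: Tuple[int, ...],
) -> bool:
    """Hypothetical helper: True if every spatial dimension is a multiple
    of 2 ** (len(block_out_channels) - 1), as the parameter docs require."""
    factor = 2 ** (len(block_out_channels) - 1)
    # An int means a square sample; a tuple is (height, width).
    dims = (sample_size, sample_size) if isinstance(sample_size, int) else tuple(sample_size)
    return all(dim % factor == 0 for dim in dims)


# With the default four blocks, the required factor is 2 ** 3 = 8.
default_channels = (224, 448, 672, 896)
ok_square = check_sample_size(32, default_channels)      # 32 % 8 == 0
bad_rect = check_sample_size((64, 36), default_channels)  # 36 % 8 == 4
print(ok_square)  # True
print(bad_rect)   # False
```

With only two blocks the factor drops to 2, so any even height and width would pass this check.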

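A 2D UNet model that takes a noisy sample and a timestep and returns a sample-shaped output.

This model inherits from ModelMixin. Check the superclass documentation for the generic methods implemented for all models (such as downloading or saving).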

forward

( sample: Tensor timestep: typing.Union[torch.Tensor, float, int] class_labels: typing.Optional[torch.Tensor] = None return_dict: bool = True ) → UNet2DOutput or tuple

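Parameters

sample (torch.Tensor): The noisy input tensor with shape (batch, channel, height, width).
timestep (torch.Tensor or float or int): The number of timesteps to denoise an input.
class_labels (torch.Tensor, optional, defaults to None): Optional class labels for conditioning. Their embeddings are summed with the timestep embeddings.
return_dict (bool, optional, defaults to True): Whether to return a UNet2DOutput instead of a plain tuple.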

UNet2DOutput or tuple

If return_dict is True, a UNet2DOutput is returned; otherwise a tuple is returned whose first element is the sample tensor.

The UNet2DModel forward method.

UNet2DOutput

class diffusers.models.unets.unet_2d.UNet2DOutput

( sample: Tensor )

Parameters

sample (torch.Tensor of shape (batch_size, num_channels, height, width)): The hidden states output from the last layer of the model.

The output of UNet2DModel.
