Diffusers documentation
Prompt techniques
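Prompts are important because they describe what you want a diffusion model to generate. The best prompts are detailed, specific, and well structured, which helps the model realize your vision. But crafting a good prompt takes time and effort, and sometimes even that is not enough because language and words can be imprecise. This is where you need to boost your prompt with other techniques, such as prompt enhancing and prompt weighting, to get the results you want.
This guide will show you how to use these prompt techniques to generate high-quality images with less effort and to adjust the weight of certain keywords in a prompt.
Prompt engineering
This is not an exhaustive guide to prompt engineering, but it will help you understand the necessary parts of a good prompt. We encourage you to keep experimenting with different prompts and combining them in new ways to see what works best. As you write more prompts, you'll develop an intuition for what works and what doesn't!
Newer diffusion models do a good job of generating high-quality images from a basic prompt, but a well-written prompt is still important for getting the best results. Here are a few tips for writing a good prompt:
- What is the image _medium_? Is it a photo, a painting, a 3D illustration, or something else?
- What is the image _subject_? Is it a person, an animal, an object, or a scene?
- What _details_ would you like to see in the image? This is where you can get creative and try different words to bring your image to life. For example, what is the lighting like? What is the vibe and aesthetic? What kind of art or illustration style are you looking for? The more specific and precise your words are, the better the model will understand what you want to generate.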
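The three questions above can be turned into a small prompt template. The following is a minimal sketch and not part of Diffusers: `build_prompt` and its parameters are hypothetical names chosen for illustration, and the "medium of subject, details" ordering is only one reasonable convention.

```python
def build_prompt(medium: str, subject: str, details: list[str]) -> str:
    """Join the medium, subject, and detail keywords into one comma-separated prompt."""
    # Skip empty or whitespace-only detail entries so no stray commas appear
    cleaned = [detail.strip() for detail in details if detail.strip()]
    return ", ".join([f"{medium} of {subject}"] + cleaned)


prompt = build_prompt(
    medium="oil painting",
    subject="a lighthouse on a rocky coast",
    details=["golden hour lighting", "stormy atmosphere", "impressionist style"],
)
print(prompt)
```

Keeping the three parts separate makes it easy to vary one of them (for example, only the lighting) while holding the rest of the prompt fixed.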


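Prompt enhancing with GPT2
Prompt enhancing is a technique for quickly improving prompt quality without spending too much effort constructing one. It uses a model like GPT2, pretrained on Stable Diffusion text prompts, to automatically enrich a prompt with additional important keywords to generate high-quality images.
The technique works by curating a list of specific keywords and forcing the model to generate those words to enhance the original prompt. This way, your prompt can be "a cat" and GPT2 can enhance it to "cinematic film still of a cat basking in the sun on a roof in Turkey, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain quality sharp focus beautiful detailed intricate stunning amazing epic".
You should also use an offset noise LoRA to improve the contrast in bright and dark images and create better lighting overall. This LoRA is available from stabilityai/stable-diffusion-xl-base-1.0.
Start by defining certain styles and a list of words (you can check out the more comprehensive list of words and styles used by Fooocus) to enhance a prompt with.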
import torch
from transformers import GenerationConfig, GPT2LMHeadModel, GPT2Tokenizer, LogitsProcessor, LogitsProcessorList
from diffusers import StableDiffusionXLPipeline
styles = {
    "cinematic": "cinematic film still of {prompt}, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain",
    "anime": "anime artwork of {prompt}, anime style, key visual, vibrant, studio anime, highly detailed",
    "photographic": "cinematic photo of {prompt}, 35mm photograph, film, professional, 4k, highly detailed",
    "comic": "comic of {prompt}, graphic illustration, comic art, graphic novel art, vibrant, highly detailed",
    "lineart": "line art drawing {prompt}, professional, sleek, modern, minimalist, graphic, line art, vector graphics",
    "pixelart": " pixel-art {prompt}, low-res, blocky, pixel art style, 8-bit graphics",
}
words = [
    "aesthetic", "astonishing", "beautiful", "breathtaking", "composition", "contrasted", "epic", "moody", "enhanced",
    "exceptional", "fascinating", "flawless", "glamorous", "glorious", "illumination", "impressive", "improved",
    "inspirational", "magnificent", "majestic", "hyperrealistic", "smooth", "sharp", "focus", "stunning", "detailed",
    "intricate", "dramatic", "high", "quality", "perfect", "light", "ultra", "highly", "radiant", "satisfying",
    "soothing", "sophisticated", "stylish", "sublime", "terrific", "touching", "timeless", "wonderful", "unbelievable",
    "elegant", "awesome", "amazing", "dynamic", "trendy",
]
You may have noticed that in the `words` list, certain words can be paired together to create something more meaningful. For example, "high" and "quality" can be combined to form "high quality". Let's pair these words together and remove the words that can't be paired.
word_pairs = ["highly detailed", "high quality", "enhanced quality", "perfect composition", "dynamic light"]

def find_and_order_pairs(s, pairs):
    words = s.split()
    found_pairs = []
    # Keep a pair only when both of its words were generated, and consume those words
    for pair in pairs:
        pair_words = pair.split()
        if pair_words[0] in words and pair_words[1] in words:
            found_pairs.append(pair)
            words.remove(pair_words[0])
            words.remove(pair_words[1])

    # Drop leftover words that belong to a pair but were generated without their partner
    for word in words[:]:
        for pair in pairs:
            if word in pair.split():
                words.remove(word)
                break
    ordered_pairs = ", ".join(found_pairs)
    remaining_s = ", ".join(words)
    return ordered_pairs, remaining_s
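Next, implement a custom LogitsProcessor class that assigns tokens in the `words` list a value of 0 and assigns tokens not in the `words` list a negative value so they aren't picked during generation. This way, generation is biased towards words in the `words` list. After a word from the list is used, it is also assigned a negative value so it isn't picked again.
class CustomLogitsProcessor(LogitsProcessor):
    def __init__(self, bias):
        super().__init__()
        self.bias = bias

    def __call__(self, input_ids, scores):
        if len(input_ids.shape) == 2:
            # Penalize the most recently generated token so it isn't picked again
            last_token_id = input_ids[0, -1]
            self.bias[last_token_id] = -1e10
        return scores + self.bias

# Note: `tokenizer` is the GPT2 tokenizer loaded in a later step; load it before running these lines
word_ids = [tokenizer.encode(word, add_prefix_space=True)[0] for word in words]
# -inf for every token outside `words`, 0 for the first token of each word in `words`
bias = torch.full((tokenizer.vocab_size,), -float("Inf")).to("cuda")
bias[word_ids] = 0
processor = CustomLogitsProcessor(bias)
processor_list = LogitsProcessorList([processor])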
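To see the biasing idea in isolation, here is a minimal pure-Python sketch with no model involved. The five-word vocabulary, the raw scores, and the helper names `make_bias` and `pick_next` are all made up for illustration, and greedy selection is used instead of sampling to keep the behavior deterministic.

```python
NEG_INF = float("-inf")


def make_bias(vocab_size, allowed_ids):
    """Return a bias vector: 0 for allowed token ids, -inf for everything else."""
    bias = [NEG_INF] * vocab_size
    for token_id in allowed_ids:
        bias[token_id] = 0.0
    return bias


def pick_next(scores, bias):
    """Greedy choice of the highest-scoring token after adding the bias."""
    biased = [score + b for score, b in zip(scores, bias)]
    return max(range(len(biased)), key=biased.__getitem__)


vocab = ["the", "epic", "cat", "moody", "sharp"]  # hypothetical toy vocabulary
allowed = [1, 3, 4]  # ids of "epic", "moody", "sharp"
scores = [5.0, 1.0, 4.0, 2.0, 0.5]  # hypothetical raw model scores, reused at every step

bias = make_bias(len(vocab), allowed)
chosen = []
for _ in range(3):
    token_id = pick_next(scores, bias)
    chosen.append(vocab[token_id])
    bias[token_id] = -1e10  # a used token gets a large negative bias so it is not repeated

print(chosen)
```

Even though "the" and "cat" have the highest raw scores, they are never selected, and each allowed word is selected at most once.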
Combine the prompt with the `cinematic` style prompt defined earlier in the `styles` dictionary.
prompt = "a cat basking in the sun on a roof in Turkey"
style = "cinematic"
prompt = styles[style].format(prompt=prompt)
prompt
"cinematic film still of a cat basking in the sun on a roof in Turkey, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain"
Load a GPT2 tokenizer and model from the Gustavosta/MagicPrompt-Stable-Diffusion checkpoint (this specific checkpoint is trained to generate prompts) to enhance the prompt.
tokenizer = GPT2Tokenizer.from_pretrained("Gustavosta/MagicPrompt-Stable-Diffusion")
model = GPT2LMHeadModel.from_pretrained("Gustavosta/MagicPrompt-Stable-Diffusion", torch_dtype=torch.float16).to(
    "cuda"
)
model.eval()
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
# Cap the combined length of the styled prompt and the generated keywords at 50 tokens
token_count = inputs["input_ids"].shape[1]
max_new_tokens = 50 - token_count
generation_config = GenerationConfig(
    penalty_alpha=0.7,
    top_k=50,
    eos_token_id=model.config.eos_token_id,
    pad_token_id=model.config.eos_token_id,
    do_sample=True,
)
with torch.no_grad():
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        generation_config=generation_config,
        logits_processor=processor_list,
    )
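The snippet above computes `max_new_tokens` as `50 - token_count`, which becomes zero or negative when the styled prompt already uses 50 or more tokens. The following sketch shows one way to guard against that. The helper name `new_token_budget` is hypothetical, and skipping enhancement when no budget remains is an assumption of this sketch rather than something the guide prescribes.

```python
def new_token_budget(prompt_token_count: int, total_limit: int = 50) -> int:
    """Return how many tokens may still be generated, or 0 if the prompt fills the limit."""
    return max(total_limit - prompt_token_count, 0)


for count in (32, 50, 64):
    budget = new_token_budget(count)
    action = f"generate up to {budget} tokens" if budget > 0 else "skip enhancement"
    print(count, "->", action)
```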
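You can then combine the input prompt and the generated prompt. Feel free to take a look at the generated prompt (`generated_part`), the word pairs that were found (`pairs`), and the remaining words (`words`). These are all packed together in `enhanced_prompt`.
output_tokens = [tokenizer.decode(generated_id, skip_special_tokens=True) for generated_id in generated_ids]
input_part, generated_part = output_tokens[0][: len(prompt)], output_tokens[0][len(prompt) :]
pairs, words = find_and_order_pairs(generated_part, word_pairs)
# Join only the non-empty parts so an empty `pairs` or `words` doesn't leave stray commas
formatted_generated_part = ", ".join(part for part in (pairs, words) if part)
enhanced_prompt = ", ".join(part for part in (input_part, formatted_generated_part) if part)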
enhanced_prompt
["cinematic film still of a cat basking in the sun on a roof in Turkey, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain quality sharp focus beautiful detailed intricate stunning amazing epic"]
Finally, load a pipeline and the offset noise LoRA with a _low weight_ to generate an image with the enhanced prompt.
pipeline = StableDiffusionXLPipeline.from_pretrained(
    "RunDiffusion/Juggernaut-XL-v9", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipeline.load_lora_weights(
    "stabilityai/stable-diffusion-xl-base-1.0",
    weight_name="sd_xl_offset_example-lora_1.0.safetensors",
    adapter_name="offset",
)
# Apply the offset noise LoRA at a low weight (0.2)
pipeline.set_adapters(["offset"], adapter_weights=[0.2])
image = pipeline(
    enhanced_prompt,
    width=1152,
    height=896,
    guidance_scale=7.5,
    num_inference_steps=25,
).images[0]
image


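Prompt weighting
Prompt weighting provides a way to emphasize or de-emphasize certain parts of a prompt, allowing for more control over the generated image. A prompt can include several concepts, which get turned into contextualized text embeddings. The model uses these embeddings to condition its cross-attention layers to generate an image (read the Stable Diffusion blog post to learn more about how it works).
Prompt weighting works by increasing or decreasing the scale of the text embedding vectors that correspond to the concepts in a prompt, because you may not necessarily want the model to focus on all concepts equally. The easiest way to prepare prompt embeddings is to use Stable Diffusion Long Prompt Weighted Embedding (sd_embed). Once you have the weighted prompt embeddings, you can pass them to any pipeline that has a prompt_embeds (and optionally negative_prompt_embeds) parameter, such as StableDiffusionPipeline, StableDiffusionControlNetPipeline, and StableDiffusionXLPipeline.
If your favorite pipeline doesn't have a `prompt_embeds` parameter, please open an issue so we can add it!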
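The scaling idea can be illustrated with a toy example. This is a minimal pure-Python sketch under simplifying assumptions: the two-dimensional vectors and the helper `scale_embeddings` are invented for illustration, and a real library such as sd_embed may apply extra steps (for example, handling long prompts or renormalizing) that are not shown here.

```python
def scale_embeddings(token_embeddings, weights):
    """Multiply each token's embedding vector by that token's weight."""
    return [[weight * x for x in vector] for vector, weight in zip(token_embeddings, weights)]


# Hypothetical embeddings for the two tokens of the prompt "waffle hippo"
embeddings = [[1.0, 2.0], [3.0, 4.0]]
# Leave "waffle" unchanged and emphasize "hippo" by 1.5x
weighted = scale_embeddings(embeddings, [1.0, 1.5])
print(weighted)
```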
This guide will show you how to weight your prompts with sd_embed.
Before you begin, make sure you have the latest version of sd_embed installed:
pip install git+https://github.com/xhinker/sd_embed.git@main
For this example, let's use StableDiffusionXLPipeline.
from diffusers import StableDiffusionXLPipeline, UniPCMultistepScheduler
import torch
pipe = StableDiffusionXLPipeline.from_pretrained("Lykon/dreamshaper-xl-1-0", torch_dtype=torch.float16)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")
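To upweight or downweight a concept, surround the text with parentheses. More parentheses apply a heavier weight to the text. You can also append a numerical multiplier to the text to indicate how much you want to increase or decrease its weight.
format | multiplier |
---|---|
(hippo) | increase by 1.1x |
((hippo)) | increase by 1.21x |
(hippo:1.5) | increase by 1.5x |
(hippo:0.5) | decrease to 0.5x |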
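The arithmetic behind this syntax can be sketched in a few lines. This is an illustrative helper only, not sd_embed's actual parser: the function name `effective_weight` is hypothetical, it handles a single fragment at a time, and it assumes each level of parentheses contributes a factor of 1.1 unless an explicit multiplier is given.

```python
def effective_weight(fragment: str, base: float = 1.1):
    """Return (text, multiplier) for one fragment such as "((hippo))" or "(hippo:1.5)"."""
    depth = 0
    # Peel off matching outer parentheses and count how many levels there were
    while fragment.startswith("(") and fragment.endswith(")"):
        fragment = fragment[1:-1]
        depth += 1
    text, separator, number = fragment.rpartition(":")
    if depth and separator:
        # An explicit multiplier replaces the default 1.1 of the innermost parentheses
        return text, float(number) * base ** (depth - 1)
    return fragment, base ** depth


for fragment in ["hippo", "(hippo)", "((hippo))", "(hippo:1.5)", "(hippo:0.5)"]:
    text, weight = effective_weight(fragment)
    print(f"{fragment} -> {text} x {round(weight, 2)}")
```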
Create a prompt and use a combination of parentheses and numerical multipliers to upweight various text.
from sd_embed.embedding_funcs import get_weighted_text_embeddings_sdxl
prompt = """A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus.
This imaginative creature features the distinctive, bulky body of a hippo,
but with a texture and appearance resembling a golden-brown, crispy waffle.
The creature might have elements like waffle squares across its skin and a syrup-like sheen.
It's set in a surreal environment that playfully combines a natural water habitat of a hippo with elements of a breakfast table setting,
possibly including oversized utensils or plates in the background.
The image should evoke a sense of playful absurdity and culinary fantasy.
"""
neg_prompt = """\
skin spots,acnes,skin blemishes,age spot,(ugly:1.2),(duplicate:1.2),(morbid:1.21),(mutilated:1.2),\
(tranny:1.2),mutated hands,(poorly drawn hands:1.5),blurry,(bad anatomy:1.2),(bad proportions:1.3),\
extra limbs,(disfigured:1.2),(missing arms:1.2),(extra legs:1.2),(fused fingers:1.5),\
(too many fingers:1.5),(unclear eyes:1.2),lowers,bad hands,missing fingers,extra digit,\
bad hands,missing fingers,(extra arms and legs),(worst quality:2),(low quality:2),\
(normal quality:2),lowres,((monochrome)),((grayscale))
"""
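Use the `get_weighted_text_embeddings_sdxl` function to generate the prompt embeddings and the negative prompt embeddings. Because you're using an SDXL model, it also generates the pooled and negative pooled prompt embeddings.
You can safely ignore the message below, which says that the token index sequence length exceeds the model's maximum sequence length. All of your tokens will be used in the embedding process.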
Token indices sequence length is longer than the specified maximum sequence length for this model
(
    prompt_embeds,
    prompt_neg_embeds,
    pooled_prompt_embeds,
    negative_pooled_prompt_embeds
) = get_weighted_text_embeddings_sdxl(
    pipe,
    prompt=prompt,
    neg_prompt=neg_prompt
)
image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=prompt_neg_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
    num_inference_steps=30,
    height=1024,
    width=1024 + 512,
    guidance_scale=4.0,
    generator=torch.Generator("cuda").manual_seed(2)
).images[0]
image

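For more details about long prompt weighting for FLUX.1, Stable Cascade, and Stable Diffusion 1.5, check out the sd_embed repository.
Textual inversion
Textual inversion is a technique for learning a specific concept from a few images, which you can then use to generate new images conditioned on that concept.
Create a pipeline and use the load_textual_inversion() function to load the textual inversion embeddings (feel free to browse the Stable Diffusion Conceptualizer for 100+ trained concepts):
import torch
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")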
pipe.load_textual_inversion("sd-concepts-library/midjourney-style")
Add the `<midjourney-style>` text to the prompt to trigger the textual inversion.
from sd_embed.embedding_funcs import get_weighted_text_embeddings_sd15
prompt = """<midjourney-style> A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus.
This imaginative creature features the distinctive, bulky body of a hippo,
but with a texture and appearance resembling a golden-brown, crispy waffle.
The creature might have elements like waffle squares across its skin and a syrup-like sheen.
It's set in a surreal environment that playfully combines a natural water habitat of a hippo with elements of a breakfast table setting,
possibly including oversized utensils or plates in the background.
The image should evoke a sense of playful absurdity and culinary fantasy.
"""
neg_prompt = """\
skin spots,acnes,skin blemishes,age spot,(ugly:1.2),(duplicate:1.2),(morbid:1.21),(mutilated:1.2),\
(tranny:1.2),mutated hands,(poorly drawn hands:1.5),blurry,(bad anatomy:1.2),(bad proportions:1.3),\
extra limbs,(disfigured:1.2),(missing arms:1.2),(extra legs:1.2),(fused fingers:1.5),\
(too many fingers:1.5),(unclear eyes:1.2),lowers,bad hands,missing fingers,extra digit,\
bad hands,missing fingers,(extra arms and legs),(worst quality:2),(low quality:2),\
(normal quality:2),lowres,((monochrome)),((grayscale))
"""
Use the `get_weighted_text_embeddings_sd15` function to generate the prompt embeddings and the negative prompt embeddings.
(
    prompt_embeds,
    prompt_neg_embeds,
) = get_weighted_text_embeddings_sd15(
    pipe,
    prompt=prompt,
    neg_prompt=neg_prompt
)
image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=prompt_neg_embeds,
    height=768,
    width=896,
    guidance_scale=4.0,
    generator=torch.Generator("cuda").manual_seed(2)
).images[0]
image

DreamBooth
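DreamBooth is a technique for training on a subject from just a few images and then generating contextualized images of that subject. It is similar to textual inversion, but DreamBooth trains the full model whereas textual inversion only fine-tunes the text embeddings. This means you should use from_pretrained() to load the DreamBooth model (feel free to browse the Stable Diffusion Dreambooth Concepts Library for 100+ trained models):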
import torch
from diffusers import DiffusionPipeline, UniPCMultistepScheduler
pipe = DiffusionPipeline.from_pretrained("sd-dreambooth-library/dndcoverart-v1", torch_dtype=torch.float16).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
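Depending on the model you use, you'll need to include the model's unique identifier in your prompt. For example, the `dndcoverart-v1` model uses the identifier `dndcoverart`: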
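Because a missing identifier is easy to overlook, a small check can help. The sketch below is only an illustration: `add_identifier` is a hypothetical helper, and the "identifier of prompt" phrasing is an assumption that mirrors the example prompt in this section rather than a requirement of any particular model.

```python
def add_identifier(prompt: str, identifier: str) -> str:
    """Prefix the prompt with the model's identifier unless it is already present."""
    if identifier.lower() in prompt.lower():
        return prompt
    return f"{identifier} of {prompt}"


print(add_identifier("a dragon guarding treasure", "dndcoverart"))
print(add_identifier("dndcoverart of a dragon guarding treasure", "dndcoverart"))
```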
from sd_embed.embedding_funcs import get_weighted_text_embeddings_sd15
prompt = """dndcoverart of A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus.
This imaginative creature features the distinctive, bulky body of a hippo,
but with a texture and appearance resembling a golden-brown, crispy waffle.
The creature might have elements like waffle squares across its skin and a syrup-like sheen.
It's set in a surreal environment that playfully combines a natural water habitat of a hippo with elements of a breakfast table setting,
possibly including oversized utensils or plates in the background.
The image should evoke a sense of playful absurdity and culinary fantasy.
"""
neg_prompt = """\
skin spots,acnes,skin blemishes,age spot,(ugly:1.2),(duplicate:1.2),(morbid:1.21),(mutilated:1.2),\
(tranny:1.2),mutated hands,(poorly drawn hands:1.5),blurry,(bad anatomy:1.2),(bad proportions:1.3),\
extra limbs,(disfigured:1.2),(missing arms:1.2),(extra legs:1.2),(fused fingers:1.5),\
(too many fingers:1.5),(unclear eyes:1.2),lowers,bad hands,missing fingers,extra digit,\
bad hands,missing fingers,(extra arms and legs),(worst quality:2),(low quality:2),\
(normal quality:2),lowres,((monochrome)),((grayscale))
"""
(
    prompt_embeds,
    prompt_neg_embeds,
) = get_weighted_text_embeddings_sd15(
    pipe,
    prompt=prompt,
    neg_prompt=neg_prompt
)
