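Do you want to generate exactly the images you have in mind in ComfyUI, but find that it never quite works out? Are you getting the most out of TextToImage (t2i)? Let's understand how Stable Diffusion works internally, learn Text to Image techniques in ComfyUI, and get to the point where you can generate the images you want!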
face focus, dog ears, 1boy, best quality, 16K, eating hair, dog nose, looking at looking at viewer, short hair, skinny, big eyes, dark skin, tall, naked overalls, 20-year-old,
beautiful, bold black outline, pink, pastel colors, Light Pink, scowl, brown hair, golden eyes, simple line drawing, animal ears, solo, brown hair, overalls, cat ears, dark skin, short hair, dark-skinned female, simple background, choker, sweater, yellow shirt, long sleeves, pink choker, white background, closed mouth, extra ears, animal ear fluff
We'll generate with the prompt "face focus, dog ears, 1boy, best quality, 16K, eating hair, dog nose, looking at looking at viewer, short hair, skinny, big eyes, dark skin, tall, naked overalls, 20-year-old, beautiful, bold black outline, pink, pastel colors, Light Pink, scowl, brown hair, golden eyes, simple line drawing, animal ears, solo, brown hair, overalls, cat ears, dark skin, short hair, dark-skinned female, simple background, choker, sweater, yellow shirt, long sleeves, pink choker, white background, closed mouth, extra ears, animal ear fluff" and the negative prompt "text, watermark" (which uses 2 tokens).
First half: "face focus, dog ears, 1boy, best quality, 16K, eating hair, dog nose, looking at looking at viewer, short hair, skinny, big eyes, dark skin, tall, naked overalls, 20-year-old" Negative: "text, watermark", SDXL, 1344×768, seed=13
CLIP1-Conditioning_to: "face focus, dog ears, 1boy, best quality, 16K, eating hair, dog nose, looking at looking at viewer, short hair, skinny, big eyes, dark skin, tall, naked overalls, 20-year-old" CLIP2-Conditioning_from: "beautiful, bold black outline, pink, pastel colors, Light Pink, scowl, brown hair, golden eyes, simple line drawing, animal ears, solo, brown hair, overalls, cat ears, dark skin, short hair, dark-skinned female, simple background, choker, sweater, yellow shirt, long sleeves, pink choker, white background, closed mouth, extra ears, animal ear fluff" Negative: "text, watermark", SDXL, 1344×768, seed=13
CLIP1-Conditioning_to: "beautiful, bold black outline, pink, pastel colors, Light Pink, scowl, brown hair, golden eyes, simple line drawing, animal ears, solo, brown hair, overalls, cat ears, dark skin, short hair, dark-skinned female, simple background, choker, sweater, yellow shirt, long sleeves, pink choker, white background, closed mouth, extra ears, animal ear fluff" CLIP2-Conditioning_from: "face focus, dog ears, 1boy, best quality, 16K, eating hair, dog nose, looking at looking at viewer, short hair, skinny, big eyes, dark skin, tall, naked overalls, 20-year-old" Negative: "text, watermark", SDXL, 1344×768, seed=13
Next, we set the prompt for Character 2. From the prompt we were originally given, we pick the tags that seem to describe one consistent character, "1boy, dog ears, dog nose, short hair, skinny, big eyes, looking at viewer, dark skin, tall, naked overalls, 20-year-old, brown hair", and put them in place of the earlier Character 1.
Conditioning_to: "1boy, dog ears, dog nose, short hair, skinny, big eyes, looking at viewer, dark skin, tall, naked overalls, 20-year-old, brown hair" Conditioning_from: "beautiful, bold black outline, simple line drawing, simple background, white background, best quality, 16K"
Now let's swap them.
Conditioning_to: "beautiful, bold black outline, simple line drawing, simple background, white background, best quality, 16K" Conditioning_from: "1boy, dog ears, dog nose, short hair, skinny, big eyes, looking at viewer, dark skin, tall, naked overalls, 20-year-old, brown hair"
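Why can swapping the two inputs change the picture? The sketch below is a simplified pure-Python model of the Conditioning (Concat) node, assuming a conditioning is just a list of per-token vectors plus a dict of extra data. As far as we can tell, ComfyUI's node joins the token sequence of conditioning_from after that of conditioning_to and copies the extra data (such as the pooled output SDXL uses) from conditioning_to only. So swapping the inputs keeps the same set of tokens but changes which pooled output survives, which is one plausible reason the two SDXL results differ. The name conditioning_concat and the toy numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Vec = List[float]
# Simplified conditioning: (per-token embeddings, extra data such as a pooled vector)
Cond = Tuple[List[Vec], Dict[str, Vec]]


def conditioning_concat(cond_to: Cond, cond_from: Cond) -> Cond:
    tokens_to, extras_to = cond_to
    tokens_from, _ = cond_from  # cond_from's extra data is not carried over
    # Token sequences are joined end to end along the token axis
    return tokens_to + tokens_from, dict(extras_to)


# Toy conditionings with 2-dimensional "embeddings" (made-up numbers)
character: Cond = ([[1.0, 0.0], [0.0, 1.0]], {"pooled_output": [0.9, 0.1]})
style: Cond = ([[0.5, 0.5]], {"pooled_output": [0.2, 0.8]})

a = conditioning_concat(character, style)
b = conditioning_concat(style, character)
print(len(a[0]), len(b[0]))                          # 3 3
print(a[1]["pooled_output"], b[1]["pooled_output"])  # [0.9, 0.1] [0.2, 0.8]
```

Real conditionings are tensors shaped roughly (batch, tokens, channels); lists keep the sketch dependency-free.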
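If you build it from just two CLIP prompts joined with Concat, it looks like this. The crucial technique is to start both prompts with the same text, "[CLIP1] best quality, beautiful, 1girl and 1boy," and "[CLIP2] best quality, beautiful, 1girl and 1boy,". With it, you can create images like this one.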
[CLIP1] best quality, beautiful, 1girl and 1boy, 1girl, female, 20-year-old, cat ears, pink choker, sweater, yellow shirt, long sleeves, white background, closed mouth, extra ears, animal ear fluff, solo, brown hair, overalls, pink, pastel colors, Light Pink, face focus,
[CLIP2] best quality, beautiful, 1girl and 1boy, blue choker, eating hair, dog ears,16K, dog nose, looking at looking at viewer, short hair, skinny, big eyes, dark skin, tall, naked overalls, bold black outline, scowl, brown hair, golden eyes, simple line drawing, animal ears, simple background, dark skin, short hair, dark-skinned
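When you make many variations, it is easy to let the shared opening drift between the CLIP prompts. A minimal sketch of a helper that prepends the same shared tags to each character's tags; build_clip_prompts is a hypothetical function, not a ComfyUI API, and its output strings would simply be pasted into the separate CLIP Text Encode nodes.

```python
from typing import List


def build_clip_prompts(shared: str, per_character: List[str]) -> List[str]:
    """Prefix every per-character tag list with the same shared scene tags."""
    shared = shared.strip().rstrip(",")  # normalize a trailing comma
    return [f"{shared}, {tags.strip()}" for tags in per_character]


prompts = build_clip_prompts(
    "best quality, beautiful, 1girl and 1boy,",
    ["1girl, female, cat ears, pink choker", "dog ears, dog nose, short hair"],
)
for p in prompts:
    print(p)
```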
CLIP0(conditioning_to) : 1boy and 1girl looking each other, beautiful, bold black outline, simple line drawing, simple background, white background, best quality, 16K
CLIP1(Conditioning_from) : 1boy and 1girl looking each other,1boy face focus, dog ears, 1boy, best quality, 16K,, dog nose, looking at looking at viewer, short hair, skinny, big eyes, dark skin, tall, naked overalls, 20-year-old
CLIP2(Conditioning_from) : 1boy and 1girl looking each other, 1girl , pink, pastel colors, Light Pink, scowl, golden eyes, animal ears, solo, overalls, cat ears, dark skin, short hair, dark-skinned female, choker, sweater, yellow shirt, long sleeves, pink choker, closed mouth, extra ears, animal ear fluff
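Over these four installments, we have covered prompting in ComfyUI with SD1.5 and SDXL, from the basics to advanced techniques. This time we experimented with "controlling multiple characters at once" in illustrations. We hope this gives you a foundation for exploring image-making that only ComfyUI offers, working not just with prompts but with CLIP internals and Conditioning. In fact, there are many Conditioning nodes: besides Combine and Concat, introduced here, there are Average, Set Area, and others. Average, for example, linearly interpolates between conditionings instead of joining them.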
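To make "interpolates instead of joining" concrete, here is a simplified pure-Python sketch of what an Average node computes, assuming a conditioning is a list of per-token vectors. As far as we understand ComfyUI's Conditioning (Average), its conditioning_to_strength parameter weights conditioning_to, and conditioning_from is first truncated or zero-padded to the same token length; Concat instead lengthens the sequence, and Combine keeps both conditionings as separate entries for the sampler. The function name conditioning_average is hypothetical.

```python
from typing import List

Vec = List[float]
Tokens = List[Vec]


def conditioning_average(cond_to: Tokens, cond_from: Tokens, to_strength: float) -> Tokens:
    dim = len(cond_to[0])
    # Match cond_from to cond_to's length: truncate, or pad with zero vectors
    aligned = cond_from[:len(cond_to)]
    aligned += [[0.0] * dim for _ in range(len(cond_to) - len(aligned))]
    # Position-by-position linear interpolation: s * to + (1 - s) * from
    return [
        [to_strength * a + (1.0 - to_strength) * b for a, b in zip(v_to, v_from)]
        for v_to, v_from in zip(cond_to, aligned)
    ]


print(conditioning_average([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0]], 0.5))
# [[0.5, 0.5], [0.0, 0.5]]
```

Unlike Concat, the output keeps conditioning_to's length, so the "from" prompt can only nudge each position rather than add new tokens.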
extract this stable diffusion prompt longer to 75 tokens. “A beautiful garden with red roses and white lilies blooming under a clear blue sky” make more detail for each words.
A stunning, vibrant garden filled with deep red roses, their petals velvety and rich, alongside pristine white lilies, their petals delicate and gracefully open, all flourishing under a vast, clear blue sky with soft clouds gently drifting in the distance.
“A breathtaking, lush garden teeming with radiant, deep crimson roses, each petal exquisitely velvety and glowing under the sunlight. The roses stand tall and proud, surrounded by elegant, snow-white lilies with intricate, soft petals that curve gracefully outward. The lilies’ golden stamens catch the light, contrasting with their pure white petals. The garden is bathed in warm sunlight, the vibrant greenery of the foliage complementing the vivid flowers. Above, the expansive sky is a brilliant, azure blue, with delicate wisps of clouds painted across the horizon, creating a serene and peaceful atmosphere, where every bloom flourishes in perfect harmony with nature.”
SD1Tokenizer token details
Token range
- Minimum token value: 0 (the pad token when `pad_with_end` is False)
- Maximum token value: vocabulary size - 1
Special tokens
- Start token: typically 49406
- End token: typically 49407
Maximum length
- `max_length`: 77 (default; can be set in the constructor)
- `max_tokens_per_section`: 75 (`max_length` - 2)
Note: The actual vocabulary size and special token values may vary depending on the specific CLIP tokenizer used.
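These numbers imply that a long prompt is cut into 75-token sections, each wrapped with a start and an end token and padded to 77. A simplified sketch under those assumptions: the input is already-tokenized IDs, padding uses the end token when `pad_with_end` is True (as the note about 0 suggests), and to_sections is a hypothetical name. The real SD1Tokenizer also tracks per-token weights and may choose split points more carefully, which this ignores.

```python
from typing import List

START_TOKEN = 49406           # "typically 49406"
END_TOKEN = 49407             # "typically 49407"
MAX_LENGTH = 77               # default max_length
PER_SECTION = MAX_LENGTH - 2  # max_tokens_per_section = 75


def to_sections(token_ids: List[int], pad_with_end: bool = True) -> List[List[int]]:
    """Split token IDs into 77-long sections: START + up to 75 tokens + END + padding."""
    pad = END_TOKEN if pad_with_end else 0
    sections = []
    # max(..., 1) so that an empty prompt still yields one section
    for i in range(0, max(len(token_ids), 1), PER_SECTION):
        chunk = token_ids[i:i + PER_SECTION]
        section = [START_TOKEN] + chunk + [END_TOKEN]
        section += [pad] * (MAX_LENGTH - len(section))
        sections.append(section)
    return sections


# 80 dummy token IDs -> two sections (75 + 5 tokens)
sections = to_sections(list(range(1, 81)))
print(len(sections), [len(s) for s in sections])  # 2 [77, 77]
```

This is why a prompt slightly over 75 tokens still costs a whole extra 77-token section.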
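```python
# parse_parentheses (not shown here) splits the prompt into top-level segments,
# keeping each balanced "( ... )" group as a single segment.
def token_weights(string, current_weight):
    a = parse_parentheses(string)
    out = []
    for x in a:
        weight = current_weight
        # A segment wrapped in parentheses gets an emphasis weight
        if len(x) >= 2 and x[-1] == ')' and x[0] == '(':
            x = x[1:-1]
            xx = x.rfind(":")
            weight *= 1.1  # each level of "( )" multiplies the weight by 1.1
            if xx > 0:
                try:
                    # An explicit "(text:1.5)" replaces the 1.1 multiplier
                    weight = float(x[xx+1:])
                    x = x[:xx]
                except ValueError:
                    pass  # not a number after ":", keep the text unchanged
            # Recurse so nested groups such as "((text))" are multiplied again
            out += token_weights(x, weight)
        else:
            out += [(x, current_weight)]
    return out
```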
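token_weights depends on parse_parentheses, which the excerpt does not show. Below is our reconstruction of what it has to do, written to match how token_weights consumes its output (ComfyUI's own helper may differ in details): walk the string, track nesting depth, and emit plain text and each balanced top-level "( ... )" group as separate segments, so token_weights can strip one layer and recurse.

```python
from typing import List


def parse_parentheses(string: str) -> List[str]:
    """Split a prompt into top-level segments; each balanced "( ... )" stays whole."""
    result: List[str] = []
    current = ""
    depth = 0
    for char in string:
        if char == "(":
            if depth == 0 and current:
                result.append(current)  # flush plain text before a new group
                current = ""
            current += char
            depth += 1
        elif char == ")":
            depth -= 1
            current += char
            if depth == 0:
                result.append(current)  # a top-level group just closed
                current = ""
        else:
            current += char
    if current:
        result.append(current)
    return result


print(parse_parentheses("a (b) ((c)) (d:1.5)"))
# ['a ', '(b)', ' ', '((c))', ' ', '(d:1.5)']
```

Fed through token_weights with a starting weight of 1.0, those segments give a weight of 1.0 for a, 1.1 for b, about 1.21 (1.1 × 1.1) for c, and 1.5 for d, since an explicit number replaces the 1.1 multiplier rather than stacking with it.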
cute anime girl with massive fluffy fennec ears and a big fluffy tail blonde messy long hair blue eyes wearing a maid outfit with a long black gold leaf pattern dress and a white apron mouth open holding a fancy black forest cake with candles on top in the kitchen of an old dark Victorian mansion lit by candlelight with a bright window to the foggy forest and very expensive stuff everywhere
A cute anime girl wearing a maid outfit with a long black gold-leaf-patterned dress and a white apron, mouth open, holding a black forest cake with candles standing on top of candles, in the kitchen of an old, dark Victorian-style mansion lit by candlelight
A cinematic image capturing a Japanese woman with long black hair, performing a dramatic dive from a helicopter into the vast open sky. The background features a breathtaking view of the sky filled with soaring birds, accentuating a sense of freedom and exhilaration. The woman's expression is focused and fearless, her hair flowing dramatically behind her as she dives. The helicopter is visible in the upper part of the frame, adding a touch of adventure and scale to the scene. The lighting is dynamic, highlighting the action and the expansive atmosphere.
A bearded, muscular man dashing
A cinematic image depicting a rugged Japanese man with a beard, sprinting through the bustling streets of Shibuya, Tokyo. He is portrayed as muscular and intense, with his strong physique evident even through his clothing. The scene captures him mid-dash, with the iconic Shibuya crossing in the background blurred by the motion. Neon lights and the vibrant city life add to the dynamic and energetic atmosphere of the image. The lighting is urban and dramatic, emphasizing the man's determined expression and the fast-paced action of the scene.
A dragon and a hero
A cinematic fantasy image inspired by RPG themes, featuring a heroic scene with a dragon, a warrior, a wizard, a martial artist, and a cleric. Set in a mystical landscape, the dragon looms large in the background, spewing fire into the sky. The warrior, clad in armor, stands boldly in the foreground with a sword raised. Beside him, a wizard prepares a spell, glowing with magical energy. The martial artist, in dynamic pose, is ready to strike, and the cleric, with a staff in hand, invokes a protective spell. The scene is bathed in the ethereal light of magic and fire, creating a dramatic and epic atmosphere.
A couple fleeing from zombies
A cinematic image depicting a male and female couple frantically running from a massive horde of zombies. The scene is set in a chaotic urban environment with the army in the background, engaged in a fierce battle to contain the zombie outbreak. The couple appears desperate and terrified, dodging between abandoned cars and debris. Soldiers can be seen in the periphery, firing at the advancing zombies, providing a grim backdrop. The atmosphere is tense and suspenseful, with dark, ominous lighting amplifying the sense of impending danger.
Foot generation test
A cinematic image of a Japanese woman casually displaying the soles of her feet, seated on a park bench. The scene captures her in a relaxed pose, perhaps during a leisurely afternoon in a tranquil urban park. The focus is on her bare feet, crossed elegantly as she enjoys a book or the peaceful surroundings. The background is softly blurred, emphasizing her and the detail of her feet. The lighting is warm and natural, highlighting the simplicity and quiet mood of the moment.
A military landing operation
A cinematic image depicting a military landing at a beachfront during a defensive operation. The scene captures the intensity of the moment with troops disembarking from landing craft under the cover of smoke and gunfire. The ocean is rough, reflecting the turmoil of battle, with waves crashing against the shore. Soldiers in full gear advance onto the beach, facing resistance from defensive positions in the distance. The sky is overcast, adding a dramatic and somber tone to the scene, emphasizing the gravity of the military engagement.
Several women playing in a pool
A cinematic image featuring multiple Japanese women in swimsuits, enjoying a playful moment in a pool, surrounded by splashing water that creates a fantastical atmosphere. The scene captures them laughing and splashing water at each other, with the sun casting a shimmering glow on the droplets, creating a sparkling effect. The background shows a beautifully designed pool area that enhances the dreamlike quality of the image. The overall mood is joyful and ethereal, with soft, diffused lighting that adds a magical touch to the setting.
Illustration: a wizard casting an explosion spell
A cinematic image inspired by anime, depicting a dramatic scene of magical alchemy leading to an explosion. The setting is a dark, mystic chamber filled with ancient symbols and glowing artifacts. In the center, a character performs a complex magical ritual, hands raised as they channel energy into a vibrant, swirling mass of light that culminates in a sudden, intense explosion. The explosion sends colorful magical energies radiating outward, casting vivid shadows and illuminating the room with a spectrum of light. The atmosphere is tense and charged with the power of unleashed magic.
Illustration: a programmer reincarnated into another world
A cinematic image blending realistic and anime styles, featuring a programmer who has been reincarnated into a fantastical other world. The scene shows the programmer sitting at a magical, glowing workstation filled with ancient scrolls and futuristic screens, coding to manipulate the laws of this new world. Around him, elements of a traditional fantasy setting—enchanted forests, distant castles, and mythical creatures—merge with digital effects to symbolize his unique role in this realm. The lighting is dynamic, highlighting the contrast between the old world's mystique and the new digital influence he brings.
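Since it is not available for commercial use, it sits at the level where hobbyists enjoy one-shot Text to Image generation. In other words, my impression is that its quality is good enough as a Midjourney alternative. However, commercial licensing is still at the waitlist stage, and the output images feel like a catch-all blend of MidJourney and NijiJourney.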
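As noted above, BFL appears to be collaborating with the ComfyUI community. Negative prompts, Image to Image, and ControlNet are not available yet, but just as ComfyUI support for Kolors was released by its official developers (added 2024/8/6), development may move forward through collaboration with the open-source community. It will be interesting to see where new workflows come from.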
Made from the most advanced models, including Stable Diffusion 3, Ultra offers the best of the Stable Diffusion ecosystem.
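Today we released the "FLUX.1" suite of Text to Image models. With their powerful creative capabilities, these models form a strong foundation for our competitive generative Text-to-Video system. Our video models will enable precise creation and editing at high definition and unprecedented speed. We are committed to continuing to pioneer the future of generative media.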