Verifying 'Flux.1' with ComfyUI: Installation and Results


An open-source image generation model called Flux.1 was released recently, so I will try it out with ComfyUI.

About Flux.1

Overview of Flux.1

It is a new image generation model developed by Black Forest Labs. The source code is published at the link below, but no paper seems to have been released yet (as of 2024/08/17).

GitHub - black-forest-labs/flux: Official inference repo for FLUX.1 models (github.com)

There are three types of pre-trained models:

  • pro: The highest performance model. Weights are not public.
  • dev: Distilled from the Pro model using Guidance Distillation. Weights are public.
  • schnell: Distilled using Guidance Distillation and Step Distillation. A version of dev that can generate images in fewer steps. Weights are public.

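Note: Guidance Distillation trains a student model to reproduce the guided output of a teacher model. In this case, the output of pro acts as the ground truth, and dev is trained to approximate it. In the usual formulation, the student learns to produce in a single forward pass what classifier-free guidance computes from two, though how Flux.1 applies it has not been published.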
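As a rough illustration only, guidance distillation is often written as below in the literature. This is a generic formulation, not a confirmed description of how Flux.1 was trained, and all symbols are my own notation: the teacher is \epsilon_\theta, the student is \epsilon_\phi, c is the text condition, \varnothing is the empty condition, and w is the guidance scale. The first line is the teacher's classifier-free-guided prediction (two forward passes); the second is the loss that trains the student to match it in one pass.

```latex
\hat{\epsilon}(x_t, c, w) = \epsilon_\theta(x_t, \varnothing) + w \left( \epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing) \right)

\mathcal{L}(\phi) = \mathbb{E}_{x_t,\, c,\, w} \left\| \epsilon_\phi(x_t, c, w) - \hat{\epsilon}(x_t, c, w) \right\|^2
```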

Note: Step Distillation trains a student model to reach, in only a few sampling steps, results that the teacher model needs many steps to produce. Since there is no paper explaining the procedure actually used, the concrete details are unknown.

Flux.1 Architecture

Described relative to Stable Diffusion: the text encoder is replaced with T5+CLIP (as in SD3), and the reverse-diffusion (denoising) network is replaced with its own DiT (Diffusion Transformer).

Someone on Reddit created a diagram from the source code, so I’ll link it. Since there is no paper, the design philosophy is unknown, but unlike SD3, where simple blocks of a single type are stacked, it appears to combine and stack multiple types of blocks.

Reddit: architecture diagram post (www.reddit.com)

Steps to Try in ComfyUI

I will try the official ComfyUI example below.

Flux Examples: Examples of ComfyUI workflows (comfyanonymous.github.io)

1. Install ComfyUI

Install ComfyUI. If you have already installed it, please update ComfyUI to the latest version.

[Stable Diffusion] Let's Play with Image Generation AI Using ComfyUI [Setup Guide] (blog.otama-playground.com)

2. Import Workflow

Drag and drop one of the images on the page below into the ComfyUI web application, and the workflow will be loaded.

Flux Examples: Examples of ComfyUI workflows (comfyanonymous.github.io)
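Drag and drop works because ComfyUI stores the workflow as JSON inside the PNG's text metadata. Below is a minimal sketch of reading it back with only the standard library. It assumes the JSON sits in an uncompressed tEXt chunk under the keyword "workflow", which is my understanding of how ComfyUI saves images; the function names are hypothetical and chosen for illustration.

```python
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict:
    """Return the tEXt chunks of a PNG as a {keyword: text} dict."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            # tEXt data is "keyword\0text", both Latin-1 encoded.
            keyword, _, text = body.partition(b"\x00")
            chunks[keyword.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"IEND":
            break
        pos += 8 + length + 4  # skip header, data, and CRC (CRC is not verified)
    return chunks


def embedded_workflow(data: bytes):
    """Return the embedded workflow as a dict, or None if the image has none."""
    text = read_png_text_chunks(data).get("workflow")  # assumed keyword
    return json.loads(text) if text is not None else None
```

Calling `embedded_workflow(open("image.png", "rb").read())` on a downloaded example image should return the node graph if the assumption about the chunk keyword holds; images re-encoded by a website or chat app usually lose this metadata.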

There are 4 images, one workflow each for:

  • Flux Dev(fp16)
  • Flux Schnell(fp16)
  • Flux Dev(fp8)
  • Flux Schnell(fp8)

fp8 stores and processes weights in 8-bit floating point and fp16 in 16-bit, so fp8 has a lower computational cost and a smaller model size. An exceptionally well-equipped home setup might handle fp16 (the model size is an astonishing 24GB), but for those without such high specs, let’s go with fp8 (which is still 16GB).
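The sizes follow from simple arithmetic on parameter count and bits per weight. A small sketch, assuming roughly 12 billion parameters for the Flux.1 transformer (a figure from the public model card, not from this article):

```python
def weight_size_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the raw weights in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


FLUX_PARAMS = 12e9  # assumption: ~12B parameters, per the public model card

fp16_gb = weight_size_gb(FLUX_PARAMS, 16)
fp8_gb = weight_size_gb(FLUX_PARAMS, 8)
print(f"fp16: ~{fp16_gb:.0f} GB, fp8: ~{fp8_gb:.0f} GB")
# prints "fp16: ~24 GB, fp8: ~12 GB"
```

The fp16 estimate matches the 24GB figure. The fp8 file used here is 16GB rather than 12GB, presumably because the single-file fp8 checkpoint also bundles the text encoders and VAE.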

Flux Dev(fp16) Workflow
Flux Dev(fp8) Workflow

3. Download Models

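Download the models the workflow requires and place each one in the matching folder under the ComfyUI models directory. The Flux Examples page lists the download links and where each file goes.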
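Before launching, it can help to confirm the files landed where ComfyUI expects them. The sketch below is a hypothetical helper, not part of ComfyUI. The folder and file names in `EXPECTED_FP8` are assumptions based on the fp8 checkpoint workflow at the time of writing, so check them against the example page for your version.

```python
from pathlib import Path

# Assumption: folder and file names for the fp8 workflow; adjust to the example page.
EXPECTED_FP8 = {"checkpoints": ["flux1-dev-fp8.safetensors"]}


def missing_models(comfy_root, expected=EXPECTED_FP8):
    """Return expected model files (relative to <comfy_root>/models) that are absent."""
    models_dir = Path(comfy_root) / "models"
    missing = []
    for subfolder, names in expected.items():
        for name in names:
            if not (models_dir / subfolder / name).is_file():
                missing.append(Path(subfolder) / name)
    return missing
```

For example, `missing_models("ComfyUI")` returns an empty list once every expected file is in place.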

4. Generate

Select the downloaded model in the workflow's loader node, then queue the prompt to generate.
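Generation can also be queued without the browser, through ComfyUI's HTTP endpoint. The following is a minimal sketch under a few assumptions: the server runs at the default address `127.0.0.1:8188`, the workflow was exported in API format from the UI, and `build_queue_request` is a hypothetical helper name.

```python
import json
from urllib import request


def build_queue_request(workflow: dict, server: str = "http://127.0.0.1:8188") -> request.Request:
    """Build a POST request that queues an API-format workflow on a ComfyUI server."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    return request.Request(
        f"{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )


# To actually queue the job (requires a running ComfyUI server):
# with open("workflow_api.json") as f:   # hypothetical file name
#     request.urlopen(build_queue_request(json.load(f)))
```

The request is built separately from sending it so the payload can be inspected first.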

Generation Results

Generation Result 1 – Flux Schnell(fp8)

Prompt:

Two anime-style schoolgirls standing together in a school setting. The first girl has her hair tied in a high ponytail with a red ribbon and is wearing a sailor uniform. She is holding a piece of paper with the text ‘Flux.1’ written on it. The second girl has her hair in a side tail and is dressed in a sweater. She is holding a signboard with the text ‘Future of Diffusion Model’. The background features a typical Japanese school with classrooms and hallways, filled with natural light. The atmosphere is lively and full of school spirit.

Schnell seems to have sufficient generation capability.

Flux Schnell(fp8)

Generation Result 2 – Flux Dev(fp8)

Prompt (same as above):

Two anime-style schoolgirls standing together in a school setting. The first girl has her hair tied in a high ponytail with a red ribbon and is wearing a sailor uniform. She is holding a piece of paper with the text ‘Flux.1’ written on it. The second girl has her hair in a side tail and is dressed in a sweater. She is holding a signboard with the text ‘Future of Diffusion Model’. The background features a typical Japanese school with classrooms and hallways, filled with natural light. The atmosphere is lively and full of school spirit.

Dev takes longer because it uses more sampling steps, but it renders the text perfectly.

Flux Dev(fp8)

Generation Result 3 – Flux Dev(fp8)

Prompt:

Two photorealistic schoolgirls standing together in a Japanese school setting. The first girl has her hair tied in a high ponytail with a red ribbon, wearing a sailor uniform. She is holding a piece of paper with the text ‘Photorealistic’ written on it. The second girl has her hair styled in a side tail and is dressed in a cozy sweater. She is holding a signboard with the text ‘Future of Diffusion Model’. The background showcases a modern Japanese school with detailed classrooms, polished floors, and bright natural light streaming through large windows. The scene captures a realistic and vibrant school atmosphere.

This is a failure case. Wait, what is written there…

Flux Dev(fp8)

Generation Result 4 – Flux Dev(fp8)

Prompt:

A grand, ethereal landscape bathed in the golden light of a setting sun. Towering, ancient mountains draped in snow loom in the background, their peaks touching the clouds. In the foreground, a crystal-clear river winds through a lush, verdant valley, reflecting the vibrant colors of the sky. Majestic trees with twisted, ancient trunks and branches full of colorful autumn leaves line the riverbank. A solitary figure stands on a large rock in the river, wearing flowing, ornate robes that ripple in the gentle breeze. The figure holds a staff crowned with a glowing gemstone, casting a soft light. The sky above is filled with swirling clouds and distant stars beginning to twinkle as dusk approaches. The overall atmosphere is serene, yet awe-inspiring, with intricate details in every element, from the texture of the rocks and trees to the shimmering reflections in the water.

Flux Dev(fp8)

Generation Result 5 – Flux Dev(fp8)

Prompt:

A highly detailed and realistic portrait of a stunningly beautiful woman with long, flowing hair, soft and flawless skin, and captivating eyes. She is looking slightly to the side, with a gentle smile on her lips. The background is soft and blurred, focusing all attention on her. A speech bubble is placed near her mouth, with the text “spare me a 4090 plz” in a playful and stylish font.

Flux Dev(fp8)

Generation Result 6 – Flux Dev(fp8)

Prompt:

A sleek and modern logo with the text “Flux.1” prominently displayed in bold, futuristic font. The text “Flux.1” should be the main focus, large and centered. In the corner or along the edge, include the text “4090 exclusive” in a smaller, subtle font. The overall style should be minimalistic, with a high-tech, cutting-edge design.

It can also make logos that look the part.

Flux Dev(fp8)

Conclusion

The fp16 model size of 24GB really feels like it’s saying “use a 4090”…

I was able to run the fp8 model (16GB) in my environment (VRAM 8GB + RAM 16GB), so if you have similar specs, don’t be discouraged by the requirements for fp16 and try the fp8 model.