Speeding Up Stable Diffusion: A Guide to Using the Turbo, Lightning, LCM, and Hyper LoRAs

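As far as I know, there are four main approaches to speeding up Stable Diffusion inference: Turbo, Lightning, LCM, and Hyper.

These models are usually applied as LoRAs. I was unsure which one to pick, so this post summarizes how to use each of them and what the generated results look like.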

Turbo

Overview of the Turbo model

We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1-4 steps while maintaining high image quality. We use score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal in combination with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps. Our analyses show that our model clearly outperforms existing few-step methods (GANs, Latent Consistency Models) in a single step and reaches the performance of state-of-the-art diffusion models (SDXL) in only four steps. ADD is the first method to unlock single-step, real-time image synthesis with foundation models. Code and weights available under https://github.com/Stability-AI/generative-models and https://huggingface.co/stabilityai/ .

arxiv.org

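Paper published by Stability AI in November 2023.
The idea is to combine the generation quality of diffusion models with the fast sampling of GANs.
A distillation method called Adversarial Diffusion Distillation makes denoising more efficient.
Specifically, a pretrained diffusion model acts as the teacher, and a GAN-style setup is used to score-distill it into a student diffusion model.
The student model's weights are also initialized from the pretrained model's weights.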
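
As a rough illustration of how the two training signals described above could be combined, the sketch below adds a generator-side adversarial term to a distillation term. It is a toy under stated assumptions: the function names, the plain mean squared distance, and the weight `lam` are illustrative and not taken from the released training code, and the discriminator is represented only by a list of scores.

```python
from typing import Sequence


def adversarial_generator_loss(disc_scores: Sequence[float]) -> float:
    """Generator side of a hinge-style GAN loss: higher discriminator scores lower the loss."""
    return -sum(disc_scores) / len(disc_scores)


def distillation_loss(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean squared distance between the student output and the teacher's denoised target."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def add_style_loss(disc_scores, student, teacher, lam=1.0):
    """Adversarial term plus a weighted distillation term (lam is a hypothetical weight)."""
    return adversarial_generator_loss(disc_scores) + lam * distillation_loss(student, teacher)


# Toy numbers only: two discriminator scores and two-element "images".
loss = add_style_loss(disc_scores=[0.5, 1.5], student=[1.0, 2.0], teacher=[1.0, 4.0], lam=0.5)
print(loss)  # 0.0
```

The adversarial term rewards outputs the discriminator scores highly, while the distillation term keeps the student close to what the teacher would have produced.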

How to use the Turbo-LoRA

CFG: 1 to 2.5
Steps: 4 or more
Sampler: lcm
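
For context on what a CFG value near 1 means: in the standard classifier-free guidance formula, the scale extrapolates from the unconditional noise prediction toward the prompt-conditioned one, so a scale of 1 simply returns the conditioned prediction with no extra guidance. The sketch below shows that formula on plain lists; the function name and the sample numbers are illustrative only.

```python
def apply_cfg(uncond, cond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element by element."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]  # toy unconditional prediction
cond = [1.0, 3.0]    # toy prompt-conditioned prediction

print(apply_cfg(uncond, cond, 1.0))  # [1.0, 3.0]
print(apply_cfg(uncond, cond, 2.5))  # [2.5, 6.0]
```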

For example, it can be downloaded from the following page.

SDXL Turbo-LoRA-Stable Diffusion XL faster than light - v1-64dim | Stable Diffusion XL LoRA | Civitai

v1: 128 dim v1-64dim: half dimension, similar quality v1-16dim: 94MB, similar quality(but different results in some prompts... I suggest the 128 d...

civitai.com

Turbo-LoRA generation results (SDXL)

Following the instructions

CFG: 1.5
Steps: 4
sampler: lcm
scheduler: sgm_uniform
model: Animagine XL
lora-strength: 1
prompt: 1 girl, running, smiling, mouth open, semi side view

The output clearly has many artifacts.

After parameter tuning

CFG: 1.5
Steps: 6
sampler: euler
scheduler: sgm_uniform
model: Animagine XL
lora-strength: 0.6
prompt: 1 girl, running, smiling, mouth open, semi side view

Even after tuning, artifacts still appear easily.

Lightning

Overview of the Lightning model

ByteDance/SDXL-Lightning · Hugging Face

huggingface.co

SDXL-Lightning: Progressive Adversarial Diffusion Distillation

We propose a diffusion distillation method that achieves new state-of-the-art in one-step/few-step 1024px text-to-image generation based on SDXL. Our method combines progressive and adversarial distillation to achieve a balance between quality and mode coverage. In this paper, we discuss the theoretical analysis, discriminator design, model formulation, and training techniques. We open-source our distilled SDXL-Lightning models both as LoRA and full UNet weights.

arxiv.org

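Paper published by ByteDance in February 2024.
It combines Adversarial Distillation with Progressive Distillation.
The student model is trained to produce directly, in one step, what the teacher model produces over multiple steps.
Training starts with an MSE loss to establish the basics, and an adversarial loss is then added to improve image quality.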
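
The "one student step matches several teacher steps" idea can be sketched on a toy problem. Below, the teacher takes two Euler steps of the ODE dx/dt = -x, and a one-parameter student `x -> a * x` is fit to that two-step result with an MSE loss. This is only an illustration under simplifying assumptions: the ODE, the linear student, and the hyperparameters are made up for the example, and the later adversarial stage is omitted entirely.

```python
def teacher_step(x: float, h: float) -> float:
    """One Euler step of the toy ODE dx/dt = -x."""
    return x - h * x


def train_student(h: float, epochs: int = 200, lr: float = 0.1) -> float:
    """Fit a one-step student x -> a * x to the teacher's two-step result via MSE."""
    a = 1.0  # start from the identity map
    samples = [0.5, 1.0, 1.5, 2.0]
    for _ in range(epochs):
        grad = 0.0
        for x in samples:
            target = teacher_step(teacher_step(x, h), h)  # two teacher steps
            grad += 2 * (a * x - target) * x              # d/da of (a * x - target)^2
        a -= lr * grad / len(samples)
    return a


a = train_student(h=0.1)
print(round(a, 4))  # 0.81, i.e. (1 - 0.1) ** 2
```

In progressive distillation this halving is repeated, with the trained student becoming the next teacher.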

How to use the Lightning-LoRA

CFG: 1 to 3 (not confirmed)
Steps: 1, 2, 4, 8
Sampler: euler

For example, it can be downloaded from the following page.

ByteDance/SDXL-Lightning at main

huggingface.co

Lightning-LoRA generation results (SDXL, 4 steps)

CFG: 2
Steps: 4
sampler: euler
scheduler: sgm_uniform
model: Animagine XL
lora-strength: 1.0
prompt: 1 girl, running, smiling, mouth open, semi side view

The results look fairly good.

LCM

Overview of the LCM model

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Latent Diffusion models (LDMs) have achieved remarkable results in synthesizing high-resolution images. However, the iterative sampling process is computationally intensive and leads to slow generation. Inspired by Consistency Models (song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs, including Stable Diffusion (rombach et al). Viewing the guided reverse diffusion process as solving an augmented probability flow ODE (PF-ODE), LCMs are designed to directly predict the solution of such ODE in latent space, mitigating the need for numerous iterations and allowing rapid, high-fidelity sampling. Efficiently distilled from pre-trained classifier-free guided diffusion models, a high-quality 768 x 768 2~4-step LCM takes only 32 A100 GPU hours for training. Furthermore, we introduce Latent Consistency Fine-tuning (LCF), a novel method that is tailored for fine-tuning LCMs on customized image datasets. Evaluation on the LAION-5B-Aesthetics dataset demonstrates that LCMs achieve state-of-the-art text-to-image generation performance with few-step inference. Project Page: https://latent-consistency-models.github.io/

arxiv.org

LCM-LoRA: A Universal Stable-Diffusion Acceleration Module

Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks, producing high-quality images with minimal inference steps. LCMs are distilled from pre-trained latent diffusion models (LDMs), requiring only ~32 A100 GPU training hours. This report further extends LCMs' potential in two aspects: First, by applying LoRA distillation to Stable-Diffusion models including SD-V1.5, SSD-1B, and SDXL, we have expanded LCM's scope to larger models with significantly less memory consumption, achieving superior image generation quality. Second, we identify the LoRA parameters obtained through LCM distillation as a universal Stable-Diffusion acceleration module, named LCM-LoRA. LCM-LoRA can be directly plugged into various Stable-Diffusion fine-tuned models or LoRAs without training, thus representing a universally applicable accelerator for diverse image generation tasks. Compared with previous numerical PF-ODE solvers such as DDIM, DPM-Solver, LCM-LoRA can be viewed as a plug-in neural PF-ODE solver that possesses strong generalization abilities. Project page: https://github.com/luosiallen/latent-consistency-model.

arxiv.org

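Paper published in October 2023.
It recasts the internal diffusion process as predicting the solution of a probability flow ordinary differential equation (PF-ODE), which greatly reduces the number of iterations.
- Because the reverse diffusion process can be carried out efficiently in numerical terms, images can be generated in few steps.
A paper on LCM-LoRA was also published in November 2023.
With LCM-LoRA, the LoRA's low-rank matrices play the role of the PF-ODE solver.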
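
To make "predicting the ODE solution directly" concrete, here is a toy comparison on the ODE dx/dt = x, whose trajectories are x(t) = x0 * e^t. An iterative Euler solver needs many steps to get back to x0, while a consistency function maps any point on the trajectory to its origin in one call. This is a sketch, not the LCM method itself: in a real LCM the consistency function is a learned network over latents, whereas here it is written analytically, and all names are illustrative.

```python
import math


def euler_solve(x_t: float, t: float, steps: int) -> float:
    """Integrate dx/dt = x backward from time t to 0 with `steps` Euler steps."""
    h = t / steps
    x = x_t
    for _ in range(steps):
        x -= h * x
    return x


def consistency_fn(x_t: float, t: float) -> float:
    """Map any point on a trajectory straight to its t = 0 origin."""
    return x_t * math.exp(-t)


x0 = 2.0
x_half = x0 * math.exp(0.5)  # point on the trajectory at t = 0.5
x_one = x0 * math.exp(1.0)   # point on the same trajectory at t = 1.0

coarse = euler_solve(x_one, 1.0, steps=4)    # few steps: noticeably off
fine = euler_solve(x_one, 1.0, steps=1000)   # many steps: close to x0
direct = consistency_fn(x_one, 1.0)          # one evaluation

print(coarse, fine, direct)
```

The self-consistency property is that `consistency_fn(x_half, 0.5)` and `consistency_fn(x_one, 1.0)` return the same origin, which is what consistency distillation trains a network to satisfy.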

How to use the LCM-LoRA

CFG: up to 1.5
Steps: 3 or more

For example, it is available from the following page.

Latent Consistency Models LoRAs - a latent-consistency Collection

Latent Consistency Models for Stable Diffusion - LoRAs and full fine-tuned weights

huggingface.co

LCM-LoRA generation results (SDXL)

CFG: 1.5
Steps: 6
sampler: lcm
scheduler: sgm_uniform
model: DreamShaper XL
lora-strength: 1.0
prompt: 1 girl, running, smiling, mouth open, semi side view, anime style

Animagine did not generate well with this LoRA, so DreamShaper XL is used here.

LCM-LoRA generation results (SD1.5)

CFG: 1.5
Steps: 6
sampler: lcm
scheduler: sgm_uniform
model: Mistoon_Anime
lora-strength: 1.0
prompt: 1 girl, running, smiling, mouth open, semi side view

Hyper

Overview of Hyper

ByteDance/Hyper-SD · Hugging Face

huggingface.co

Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

Recently, a series of diffusion-aware distillation algorithms have emerged to alleviate the computational overhead associated with the multi-step inference process of Diffusion Models (DMs). Current distillation techniques often dichotomize into two distinct aspects: i) ODE Trajectory Preservation; and ii) ODE Trajectory Reformulation. However, these approaches suffer from severe performance degradation or domain shifts. To address these limitations, we propose Hyper-SD, a novel framework that synergistically amalgamates the advantages of ODE Trajectory Preservation and Reformulation, while maintaining near-lossless performance during step compression. Firstly, we introduce Trajectory Segmented Consistency Distillation to progressively perform consistent distillation within pre-defined time-step segments, which facilitates the preservation of the original ODE trajectory from a higher-order perspective. Secondly, we incorporate human feedback learning to boost the performance of the model in a low-step regime and mitigate the performance loss incurred by the distillation process. Thirdly, we integrate score distillation to further improve the low-step generation capability of the model and offer the first attempt to leverage a unified LoRA to support the inference process at all steps. Extensive experiments and user studies demonstrate that Hyper-SD achieves SOTA performance from 1 to 8 inference steps for both SDXL and SD1.5. For example, Hyper-SDXL surpasses SDXL-Lightning by +0.68 in CLIP Score and +0.51 in Aes Score in the 1-step inference.

arxiv.org

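Paper published by ByteDance in April 2024.
Trajectory Segmented Consistency Distillation (Section 3.1)
- The training process is reportedly split into fine-grained timestep segments, and distillation is performed within each segment.
- The model is trained to generate accurately within each segment, and the number of segments is gradually reduced.
- In the end, the model can generate high-quality images even with few steps.
Human feedback learning (Section 3.2)
- A technique for improving the model's performance based on human preferences.
- A reward model is trained on human feedback data.
- During training, this reward model is used as the loss function.
I am not very confident in the explanation above, so read the original paper if you want to understand it properly.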
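
The segment idea above can be pictured with a small helper that splits the timestep range into k equal segments and reports the segment boundary a given timestep would be pulled toward. This is a sketch of my reading of the description, not the paper's code: the 1000-timestep range, the 8, 4, 2, 1 schedule, and the choice of the lower boundary as the target are assumptions made for illustration.

```python
from typing import List


def segment_bounds(total_steps: int, k: int) -> List[int]:
    """Boundaries that split [0, total_steps] into k equal segments."""
    return [total_steps * i // k for i in range(k + 1)]


def segment_start(t: int, total_steps: int, k: int) -> int:
    """Lower boundary of the segment containing timestep t (assumed consistency target)."""
    bounds = segment_bounds(total_steps, k)
    for lo, hi in zip(bounds, bounds[1:]):
        if lo < t <= hi:
            return lo
    return 0  # t == 0 is already at the origin


# Assumed schedule: start with many short segments, end with a single one.
for k in (8, 4, 2, 1):
    print(k, segment_bounds(1000, k), segment_start(600, 1000, k))
```

With one segment left, the target for every timestep is 0, which corresponds to ordinary consistency distillation over the whole trajectory.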

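How to use the Hyper-LoRA

CFG: low (with the CFG-preserved LoRAs, values around 8 reportedly work)
Steps: 1, 2, 4, 8, 12 (match the step count in the downloaded LoRA's file name)

It is available from the following page.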

ByteDance/Hyper-SD at main

huggingface.co
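
Since the step count is taken from the LoRA's file name, a small helper can read it out instead of relying on memory. The sketch below is a convenience under assumptions: the file names in the usage lines are examples of the naming pattern and should be checked against the actual repository listing, and the "cfg" substring check is a guess at how the CFG-preserved variants are labeled.

```python
import re
from typing import Optional


def steps_from_filename(name: str) -> Optional[int]:
    """Return the step count embedded in a LoRA file name, or None if absent."""
    match = re.search(r"(\d+)[-_ ]?steps?", name, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


def uses_preserved_cfg(name: str) -> bool:
    """Heuristic: treat file names containing 'cfg' as CFG-preserved variants."""
    return "cfg" in name.lower()


# Example file names (assumed; verify against the downloaded files).
print(steps_from_filename("Hyper-SDXL-8steps-CFG-lora.safetensors"))  # 8
print(steps_from_filename("Hyper-SD15-4steps-lora.safetensors"))      # 4
print(uses_preserved_cfg("Hyper-SDXL-8steps-CFG-lora.safetensors"))   # True
```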

Hyper-LoRA generation results (SDXL, 4 steps)

CFG: 1.5
Steps: 4
sampler: euler
scheduler: normal
model: DreamShaper XL
lora-strength: 1.0
prompt: 1 girl, running, smiling, mouth open, semi side view, anime style

I used the 4-step LoRA, but 4 steps was a little rough, so I generated with 6 steps.

Hyper-LoRA generation results (SDXL, 8 steps, CFG preserved)

CFG: 8
Steps: 8
sampler: euler
scheduler: normal
model: DreamShaper XL
lora-strength: 1.0
prompt: 1 girl, running, smiling, mouth open, semi side view, anime style

Hyper-LoRA generation results (SD1.5, 4 steps)

CFG: 1.5
Steps: 6
sampler: euler
scheduler: normal
model: Mistoon_Anime
lora-strength: 1.0
prompt: 1 girl, running, smiling, mouth open, semi side view, anime style

Hyper-LoRA generation results (SD1.5, 8 steps, CFG preserved)

CFG: 8
Steps: 8
sampler: euler
scheduler: normal
model: flat2DAnimerge
lora-strength: 1.0
prompt: 1 girl, running, smiling, mouth open, semi side view, anime style

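Closing thoughts

This is not a rigorous comparison, but after trying each of them, my impressions are as follows.

Turbo is off the list.
For SD1.5, LCM is the stable choice.
For SDXL, any of Hyper, Lightning, or LCM works, though Hyper seems slightly better overall.
As with ordinary LoRAs, compatibility with the base model varies, so try another one if generation does not go well.
Overall quality is not great, so these will probably be used temporarily, mainly for things like parameter tuning.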
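
For quick reference, the settings used in the runs in this post can be collected into a small lookup table. The values are copied from the parameter lists in each results section; the preset names and the `Preset` structure are my own labels, not anything defined by the tools.

```python
from typing import Dict, List, NamedTuple


class Preset(NamedTuple):
    cfg: float
    steps: int
    sampler: str
    scheduler: str
    model: str
    lora_strength: float


PRESETS: Dict[str, Preset] = {
    "turbo-sdxl-as-instructed": Preset(1.5, 4, "lcm", "sgm_uniform", "Animagine XL", 1.0),
    "turbo-sdxl-tuned": Preset(1.5, 6, "euler", "sgm_uniform", "Animagine XL", 0.6),
    "lightning-sdxl-4step": Preset(2.0, 4, "euler", "sgm_uniform", "Animagine XL", 1.0),
    "lcm-sdxl": Preset(1.5, 6, "lcm", "sgm_uniform", "DreamShaper XL", 1.0),
    "lcm-sd15": Preset(1.5, 6, "lcm", "sgm_uniform", "Mistoon_Anime", 1.0),
    # Steps as listed; the accompanying note mentions generating with 6 steps.
    "hyper-sdxl-4step": Preset(1.5, 4, "euler", "normal", "DreamShaper XL", 1.0),
    "hyper-sdxl-8step-cfg": Preset(8.0, 8, "euler", "normal", "DreamShaper XL", 1.0),
    "hyper-sd15-4step": Preset(1.5, 6, "euler", "normal", "Mistoon_Anime", 1.0),
    "hyper-sd15-8step-cfg": Preset(8.0, 8, "euler", "normal", "flat2DAnimerge", 1.0),
}


def presets_for(prefix: str) -> List[str]:
    """Names of all presets whose label starts with the given prefix."""
    return sorted(name for name in PRESETS if name.startswith(prefix))


print(presets_for("lcm"))            # ['lcm-sd15', 'lcm-sdxl']
print(PRESETS["turbo-sdxl-tuned"])
```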
