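I think there are four main approaches to speeding up Stable Diffusion inference: Turbo, Lightning, LCM, and Hyper.
These models are often used in LoRA form, and I was unsure which one to pick, so here I summarize how to use each and what the generated results look like.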
Turbo
Overview of the Turbo model
We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1-4 steps while maintaining high image quality. We use score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal in combination with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps. Our analyses show that our model clearly outperforms existing few-step methods (GANs, Latent Consistency Models) in a single step and reaches the performance of state-of-the-art diffusion models (SDXL) in only four steps. ADD is the first method to unlock single-step, real-time image synthesis with foundation models. Code and weights available under https://github.com/Stability-AI/generative-models and https://huggingface.co/stabilityai/ .
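- A paper published by stability.ai in November 2023
- The idea is to combine the generation quality of diffusion models with the fast generation of GANs
- It improves denoising efficiency with a distillation method called Adversarial Diffusion Distillation
- Specifically, a pretrained diffusion model serves as the teacher, and score distillation into a student diffusion model is performed using a GAN-style mechanism
- The student model's weights are also initialized from the pretrained model's weights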
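As a rough illustration of the objective described above, here is a sketch that combines an adversarial term with a distillation term on toy scalar "images". The function names, the simple generator-side adversarial term, the plain squared-error distillation term, and the weight `lam` are all assumptions made for illustration, not the paper's exact formulation.

```python
from typing import Sequence


def adversarial_term(disc_scores: Sequence[float]) -> float:
    """Generator-side adversarial loss (assumed form): it gets lower when the
    discriminator gives the student's samples higher 'realness' scores."""
    return -sum(disc_scores) / len(disc_scores)


def distillation_term(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean squared error between the student's output and the teacher's
    denoised prediction (a simplified stand-in for score distillation)."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def add_style_loss(student, teacher, disc_scores, lam: float = 1.0) -> float:
    """Total student loss = adversarial term + lam * distillation term."""
    return adversarial_term(disc_scores) + lam * distillation_term(student, teacher)


# Toy values, purely hypothetical.
student_out = [0.2, 0.8]
teacher_out = [0.0, 1.0]
disc_scores = [0.5, 0.5]
loss = add_style_loss(student_out, teacher_out, disc_scores, lam=2.0)
print(f"total loss: {loss:.4f}")
```

The point is only that the student is pulled toward the teacher's output while also being pushed to fool a discriminator; a real implementation would use image tensors and learned networks for both.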
How to use Turbo-LoRA
- CFG: 1 to 2.5
- Steps: 4 or more
- Sampler: lcm
For example, it can be downloaded from the following.
v1: 128 dim v1-64dim: half dimension, similar quality v1-16dim: 94MB, similar quality(but different results in some prompts... I suggest the 128 d...
Turbo-LoRA generation results (SDXL)
Following the recommended settings
- CFG: 1.5
- Steps: 4
- sampler: lcm
- scheduler: sgm_uniform
- model: Animagine XL
- lora-strength: 1
- prompt: 1 girl, running, smiling, mouth open, semi side view
There are clearly many artifacts.

After tuning the parameters
- CFG: 1.5
- Steps: 6
- sampler: euler
- scheduler: sgm_uniform
- model: Animagine XL
- lora-strength: 0.6
- prompt: 1 girl, running, smiling, mouth open, semi side view
Even after tuning, artifacts still appear easily.

Lightning
Overview of the Lightning model
We propose a diffusion distillation method that achieves new state-of-the-art in one-step/few-step 1024px text-to-image generation based on SDXL. Our method combines progressive and adversarial distillation to achieve a balance between quality and mode coverage. In this paper, we discuss the theoretical analysis, discriminator design, model formulation, and training techniques. We open-source our distilled SDXL-Lightning models both as LoRA and full UNet weights.
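- A paper published by ByteDance in February 2024
- It combines Adversarial Distillation and Progressive Distillation
- The student model is trained to directly reproduce in one step what the teacher model produces over multiple steps
- Training starts with an MSE loss to learn the basics, and an adversarial loss is added afterwards to improve image quality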
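To make the progressive part a bit more concrete, here is a minimal sketch of a stage schedule in which each student learns to cover the teacher's steps in fewer steps, and then becomes the next teacher. Halving at each stage, starting from 8 steps, and using MSE for exactly one warm-up stage are assumptions for illustration; the schedule actually used in the paper may differ.

```python
from typing import List, Tuple


def progressive_schedule(start_steps: int, final_steps: int = 1) -> List[Tuple[int, int]]:
    """Return (teacher_steps, student_steps) pairs, halving at each stage."""
    if final_steps < 1 or start_steps < final_steps:
        raise ValueError("need start_steps >= final_steps >= 1")
    stages = []
    teacher = start_steps
    while teacher > final_steps:
        student = max(teacher // 2, final_steps)
        stages.append((teacher, student))
        teacher = student  # the distilled student becomes the next teacher
    return stages


def loss_for_stage(stage_index: int, mse_warmup_stages: int = 1) -> str:
    """Assumed labeling: MSE for the warm-up stage(s), adversarial afterwards."""
    return "mse" if stage_index < mse_warmup_stages else "adversarial"


for i, (teacher, student) in enumerate(progressive_schedule(8)):
    print(f"stage {i}: teacher {teacher} steps -> student {student} steps, "
          f"loss = {loss_for_stage(i)}")
```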
How to use Lightning-LoRA
- CFG: 1 to 3?
- Steps: 1, 2, 4, 8
- Sampler: euler
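Since the step count has to be one of a few fixed values, a small guard like the following can catch mismatched settings before a slow generation run. This is a hypothetical helper of my own, not part of any tool; the CFG range is the uncertain one listed above, so treat that check as a soft guideline.

```python
SUPPORTED_STEPS = (1, 2, 4, 8)   # step counts listed above
CFG_RANGE = (1.0, 3.0)           # uncertain range taken from the list above


def lightning_settings(steps: int, cfg: float = 2.0) -> dict:
    """Validate settings for a Lightning LoRA and return them as a dict."""
    if steps not in SUPPORTED_STEPS:
        raise ValueError(f"steps must be one of {SUPPORTED_STEPS}, got {steps}")
    low, high = CFG_RANGE
    if not low <= cfg <= high:
        raise ValueError(f"cfg should be between {low} and {high}, got {cfg}")
    return {"steps": steps, "cfg": cfg, "sampler": "euler"}


print(lightning_settings(4))
```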
For example, it can be downloaded from Hugging Face.
Lightning-LoRA generation results (SDXL, 4-step)
- CFG: 2
- Steps: 4
- sampler: euler
- scheduler: sgm_uniform
- model: Animagine XL
- lora-strength: 1.0
- prompt: 1 girl, running, smiling, mouth open, semi side view
The result looks fairly good.

LCM
Overview of the LCM model
Latent Diffusion models (LDMs) have achieved remarkable results in synthesizing high-resolution images. However, the iterative sampling process is computationally intensive and leads to slow generation. Inspired by Consistency Models (song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs, including Stable Diffusion (rombach et al). Viewing the guided reverse diffusion process as solving an augmented probability flow ODE (PF-ODE), LCMs are designed to directly predict the solution of such ODE in latent space, mitigating the need for numerous iterations and allowing rapid, high-fidelity sampling. Efficiently distilled from pre-trained classifier-free guided diffusion models, a high-quality 768 x 768 2~4-step LCM takes only 32 A100 GPU hours for training. Furthermore, we introduce Latent Consistency Fine-tuning (LCF), a novel method that is tailored for fine-tuning LCMs on customized image datasets. Evaluation on the LAION-5B-Aesthetics dataset demonstrates that LCMs achieve state-of-the-art text-to-image generation performance with few-step inference. Project Page: https://latent-consistency-models.github.io/
Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks, producing high-quality images with minimal inference steps. LCMs are distilled from pre-trained latent diffusion models (LDMs), requiring only ~32 A100 GPU training hours. This report further extends LCMs' potential in two aspects: First, by applying LoRA distillation to Stable-Diffusion models including SD-V1.5, SSD-1B, and SDXL, we have expanded LCM's scope to larger models with significantly less memory consumption, achieving superior image generation quality. Second, we identify the LoRA parameters obtained through LCM distillation as a universal Stable-Diffusion acceleration module, named LCM-LoRA. LCM-LoRA can be directly plugged into various Stable-Diffusion fine-tuned models or LoRAs without training, thus representing a universally applicable accelerator for diverse image generation tasks. Compared with previous numerical PF-ODE solvers such as DDIM, DPM-Solver, LCM-LoRA can be viewed as a plug-in neural PF-ODE solver that possesses strong generalization abilities. Project page: https://github.com/luosiallen/latent-consistency-model.
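- A paper published in October 2023
- It recasts the internal diffusion process as predicting the solution of a probability flow ordinary differential equation (PF-ODE), which greatly reduces the number of iterations
- Because the reverse diffusion process can be carried out efficiently in numerical terms, images can be generated in few steps
- A paper on LCM-LoRA was also published in November 2023
- In LCM-LoRA, the LoRA's low-rank matrices take on the role of the PF-ODE solver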
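The difference between stepping through an ODE and predicting its solution directly can be seen on a toy problem. The sketch below uses the ODE dx/dt = x, whose trajectories are x(t) = x0 * exp(t), as a stand-in; this is an assumption chosen because the exact answer is known, and it is not the actual PF-ODE of a diffusion model. In a real LCM the "consistency function" is a neural network learned by distillation rather than a closed-form formula.

```python
import math


def euler_solve(x_t: float, t: float, n_steps: int) -> float:
    """Integrate dx/dt = x backwards from time t to 0 using n_steps Euler steps."""
    dt = t / n_steps
    x = x_t
    for _ in range(n_steps):
        x = x - dt * x
    return x


def consistency_fn(x_t: float, t: float) -> float:
    """Map any point on the trajectory x(t) = x0 * exp(t) straight back to x0."""
    return x_t * math.exp(-t)


x0 = 1.5                     # the "clean" value we want to recover
T = 2.0                      # the "noisy" end of the trajectory
x_T = x0 * math.exp(T)

few = euler_solve(x_T, T, 4)        # few solver steps: large error
many = euler_solve(x_T, T, 1000)    # many solver steps: small error, but slow
one_shot = consistency_fn(x_T, T)   # a single evaluation

print(f"Euler, 4 steps:    error {abs(few - x0):.4f}")
print(f"Euler, 1000 steps: error {abs(many - x0):.4f}")
print(f"Consistency fn:    error {abs(one_shot - x0):.2e}")
```

A conventional solver needs many small steps to stay accurate, while a function that maps every point of a trajectory to the same endpoint gets there in one evaluation, which is the property LCM tries to learn.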
How to use LCM-LoRA
- CFG: up to 1.5
- Steps: 3 or more
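For readers who use the Hugging Face `diffusers` library instead of a node-based UI, the settings above could be applied roughly as follows. This is a sketch under assumptions: the repository IDs (`stabilityai/stable-diffusion-xl-base-1.0`, `latent-consistency/lcm-lora-sdxl`) and the use of `diffusers` itself are my choices for illustration and differ from the models used in the results in this post. The heavy imports are kept inside the function so that the small settings helper can be used on its own.

```python
def lcm_call_kwargs(prompt: str, steps: int = 6, cfg: float = 1.5) -> dict:
    """Build pipeline call arguments that respect the ranges listed above."""
    if steps < 3:
        raise ValueError("use 3 or more steps")
    if cfg > 1.5:
        raise ValueError("keep CFG at 1.5 or below")
    return {"prompt": prompt, "num_inference_steps": steps, "guidance_scale": cfg}


def generate_with_lcm_lora(prompt: str):
    """Load an SDXL pipeline, attach LCM-LoRA, and generate one image.

    Requires torch, diffusers, and a CUDA GPU; the model IDs are assumptions.
    """
    import torch
    from diffusers import DiffusionPipeline, LCMScheduler

    pipe = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")
    # LCM-LoRA is meant to be sampled with the LCM scheduler.
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
    return pipe(**lcm_call_kwargs(prompt)).images[0]


print(lcm_call_kwargs("1 girl, running, smiling"))
```

Calling `generate_with_lcm_lora(...)` downloads several gigabytes of weights on first use, so it is not invoked in the snippet itself.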
For example, it is available from the following.
Latent Consistency Models for Stable Diffusion - LoRAs and full fine-tuned weights
LCM-LoRA generation results (SDXL)
- CFG: 1.5
- Steps: 6
- sampler: lcm
- scheduler: sgm_uniform
- model: DreamShaper XL
- lora-strength: 1.0
- prompt: 1 girl, running, smiling, mouth open, semi side view, anime style
Animagine did not generate well with this LoRA, so DreamShaper XL is used here.

LCM-LoRA generation results (SD1.5)
- CFG: 1.5
- Steps: 6
- sampler: lcm
- scheduler: sgm_uniform
- model: Mistoon_Anime
- lora-strength: 1.0
- prompt: 1 girl, running, smiling, mouth open, semi side view

Hyper
Overview of Hyper
Recently, a series of diffusion-aware distillation algorithms have emerged to alleviate the computational overhead associated with the multi-step inference process of Diffusion Models (DMs). Current distillation techniques often dichotomize into two distinct aspects: i) ODE Trajectory Preservation; and ii) ODE Trajectory Reformulation. However, these approaches suffer from severe performance degradation or domain shifts. To address these limitations, we propose Hyper-SD, a novel framework that synergistically amalgamates the advantages of ODE Trajectory Preservation and Reformulation, while maintaining near-lossless performance during step compression. Firstly, we introduce Trajectory Segmented Consistency Distillation to progressively perform consistent distillation within pre-defined time-step segments, which facilitates the preservation of the original ODE trajectory from a higher-order perspective. Secondly, we incorporate human feedback learning to boost the performance of the model in a low-step regime and mitigate the performance loss incurred by the distillation process. Thirdly, we integrate score distillation to further improve the low-step generation capability of the model and offer the first attempt to leverage a unified LoRA to support the inference process at all steps. Extensive experiments and user studies demonstrate that Hyper-SD achieves SOTA performance from 1 to 8 inference steps for both SDXL and SD1.5. For example, Hyper-SDXL surpasses SDXL-Lightning by +0.68 in CLIP Score and +0.51 in Aes Score in the 1-step inference.
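- A paper published by ByteDance in April 2024
- Trajectory Segmented Consistency Distillation (Section 3.1)
  - The training process is split into fine-grained timestep segments, and distillation is reportedly performed within each segment
  - The model is trained to generate accurately within each segment, and the number of segments is gradually reduced
  - In the end, it can generate high-quality images even with few steps
- Human feedback learning (Section 3.2)
  - A technique for improving model performance based on human preferences
  - A reward model is trained on human feedback data
  - During training, this reward model is used as the loss function
- I am not very confident in the explanation above, so if you really want to understand it, you should read the original paper.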
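The segment idea can be sketched as splitting the timestep range into equal pieces and shrinking the number of pieces stage by stage. The total of 1000 timesteps and the segment counts 8, 4, 2, 1 are assumptions for illustration, as is the one-line "reward as loss" function; none of this is taken from the paper's code.

```python
from typing import Dict, List, Tuple


def segment_boundaries(total_timesteps: int, num_segments: int) -> List[Tuple[int, int]]:
    """Split [0, total_timesteps] into equal (start, end) segments."""
    if num_segments < 1 or total_timesteps % num_segments != 0:
        raise ValueError("total_timesteps must be divisible by num_segments")
    width = total_timesteps // num_segments
    return [(i * width, (i + 1) * width) for i in range(num_segments)]


def training_stages(total_timesteps: int = 1000,
                    segment_counts: Tuple[int, ...] = (8, 4, 2, 1)
                    ) -> Dict[int, List[Tuple[int, int]]]:
    """Segments used at each stage; consistency is enforced only inside a segment."""
    return {k: segment_boundaries(total_timesteps, k) for k in segment_counts}


def reward_loss(reward_score: float) -> float:
    """Using a reward model as a loss: a higher reward means a lower loss."""
    return -reward_score


for k, segments in training_stages().items():
    print(f"{k} segment(s): {segments}")
```

With a single segment left, the whole range is covered at once, which corresponds to the few-step behavior the method is aiming for.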
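How to use Hyper-LoRA
- CFG: on the low side (with the CFG-preserved LoRA, values around 8 reportedly work too)
- Steps: 1, 2, 4, 8, 12 (follow the step count in the filename of the LoRA you downloaded)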
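Because the step count is tied to the file you downloaded, it can be handy to read it from the filename instead of remembering it. The helper below is a hypothetical sketch, and the example filenames are assumptions about the naming pattern; check them against the files you actually have.

```python
import re


def steps_from_filename(filename: str) -> int:
    """Extract the step count from a name containing e.g. '4steps' or '1step'."""
    match = re.search(r"(\d+)steps?", filename, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"no step count found in {filename!r}")
    return int(match.group(1))


def is_cfg_preserved(filename: str) -> bool:
    """Guess whether the file is a CFG-preserved variant from its name."""
    return "cfg" in filename.lower()


# Hypothetical filenames following the assumed naming pattern.
for name in ["Hyper-SDXL-4steps-lora.safetensors",
             "Hyper-SD15-8steps-CFG-lora.safetensors"]:
    print(name, "->", steps_from_filename(name), "steps,",
          "CFG-preserved" if is_cfg_preserved(name) else "low CFG")
```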
It is available on Hugging Face.
Hyper-LoRA generation results (SDXL, 4-step)
- CFG: 1.5
- Steps: 6 (with the 4-step LoRA)
- sampler: euler
- scheduler: normal
- model: DreamShaper XL
- lora-strength: 1.0
- prompt: 1 girl, running, smiling, mouth open, semi side view, anime style
I used the 4-step LoRA, but 4 steps was a little rough, so this image was generated with 6 steps.

Hyper-LoRA generation results (SDXL, 8-step, CFG-preserved)
- CFG: 8
- Steps: 8
- sampler: euler
- scheduler: normal
- model: DreamShaper XL
- lora-strength: 1.0
- prompt: 1 girl, running, smiling, mouth open, semi side view, anime style

Hyper-LoRA generation results (SD1.5, 4-step)
- CFG: 1.5
- Steps: 6
- sampler: euler
- scheduler: normal
- model: Mistoon_Anime
- lora-strength: 1.0
- prompt: 1 girl, running, smiling, mouth open, semi side view, anime style

Hyper-LoRA generation results (SD1.5, 8-step, CFG-preserved)
- CFG: 8
- Steps: 8
- sampler: euler
- scheduler: normal
- model: flat2DAnimerge
- lora-strength: 1.0
- prompt: 1 girl, running, smiling, mouth open, semi side view, anime style

Closing thoughts
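This is not quite a proper comparison, but after trying each of them, these are my impressions:
- Turbo is off the list
- For SD1.5, LCM is stable
- For SDXL, any of Hyper, Lightning, or LCM works, but Hyper seems slightly better
- As with ordinary LoRAs, compatibility with the base model varies, so if generation does not go well, try another one
- Overall quality is not great, so I expect to enable these only temporarily, mainly while tuning parameters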
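The impressions above can be written down as a small fallback lookup: start with the preferred LoRA for the base model family and move on to the next one if it does not work with your checkpoint. The ordering for SD1.5 beyond LCM is my own assumption, and the whole thing just encodes personal impressions, not a benchmark.

```python
from typing import Iterable, Optional

# Preference order per base model family; Turbo is left out on purpose.
CANDIDATES = {
    "sd1.5": ["LCM", "Hyper"],
    "sdxl": ["Hyper", "Lightning", "LCM"],
}


def next_candidate(base: str, already_failed: Iterable[str] = ()) -> Optional[str]:
    """Return the next acceleration LoRA to try, or None if all were tried."""
    failed = set(already_failed)
    for name in CANDIDATES[base.lower()]:
        if name not in failed:
            return name
    return None


print(next_candidate("SDXL"))             # first choice for SDXL
print(next_candidate("SDXL", ["Hyper"]))  # fallback when Hyper does not suit the model
```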








