Stable Diffusion Speedup: LoRA for Turbo, Lightning, LCM, Hyper

As far as I know, there are four methods for speeding up Stable Diffusion inference: Turbo, Lightning, LCM, and Hyper.

These models are commonly used after being converted to LoRA, but I was unsure which one to pick, so this post summarizes how to use each of them and what the generated results look like.

Turbo

Overview of Turbo Model

stabilityai/sdxl-turbo · Hugging Face

huggingface.co

Adversarial Diffusion Distillation

We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1-4 steps while maintaining high image quality. We use score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal in combination with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps. Our analyses show that our model clearly outperforms existing few-step methods (GANs, Latent Consistency Models) in a single step and reaches the performance of state-of-the-art diffusion models (SDXL) in only four steps. ADD is the first method to unlock single-step, real-time image synthesis with foundation models. Code and weights available under https://github.com/Stability-AI/generative-models and https://huggingface.co/stabilityai/ .

arxiv.org

Paper published by Stability AI in November 2023
The idea is to combine the output quality of diffusion models with the fast generation of GANs
Denoising is made more efficient with a distillation method called Adversarial Diffusion Distillation (ADD)
Specifically, a pretrained diffusion model serves as the teacher, and score distillation into a student diffusion model is combined with a GAN-style adversarial loss
The student model's weights are initialized from the weights of the pretrained model
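
The points above can be summarized as a single training objective. The form below is written from my reading of the ADD paper, not taken from any code, so treat the notation as a sketch: θ are the student weights, φ the discriminator, ψ the frozen teacher, x_s a noised input at timestep s, and λ a weighting factor.

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{adv}}^{G}\big(\hat{x}_\theta(x_s, s), \phi\big) + \lambda \, \mathcal{L}_{\mathrm{distill}}\big(\hat{x}_\theta(x_s, s), \psi\big)
```

The first term is the GAN part (the student tries to fool the discriminator), and the second is the score distillation part that pulls the student's output toward what the teacher predicts.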

Usage of Turbo-LoRA

CFG: 1 to 2.5
Steps: 4 or more
Sampler: lcm

It can be downloaded from, for example, the page below.

SDXL Turbo-LoRA-Stable Diffusion XL faster than light - v1-64dim | Stable Diffusion XL LoRA | Civitai

v1: 128 dim v1-64dim: half dimension, similar quality v1-16dim: 94MB, similar quality (but different results in some prompts... I suggest the 128 d...

civitai.com
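
For readers unsure what the CFG value in the settings above controls, here is a minimal sketch of the usual classifier-free guidance formula on toy numbers. The function name and the three-element "predictions" are made up for illustration. Real samplers apply the same arithmetic to full noise-prediction tensors.

```python
def apply_cfg(uncond, cond, cfg_scale):
    """Classifier-free guidance: start from the unconditional prediction and
    move toward the conditional one, scaled by cfg_scale."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


# Toy stand-ins for the model's two noise predictions (hypothetical values).
uncond_pred = [0.0, 1.0, -1.0]  # prediction for the empty / negative prompt
cond_pred = [1.0, 1.0, 0.0]     # prediction for the actual prompt

at_one = apply_cfg(uncond_pred, cond_pred, 1.0)   # lower end of the Turbo range
at_low = apply_cfg(uncond_pred, cond_pred, 1.5)   # value used in the tests below
at_high = apply_cfg(uncond_pred, cond_pred, 8.0)  # a typical non-distilled value

print(at_one, at_low, at_high)
```

At CFG 1 the formula returns the conditional prediction unchanged, so the negative prompt has no influence. Larger values push the result further away from the unconditional prediction.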

Generation Result of Turbo-LoRA (SDXL)

With the recommended settings

CFG: 1.5
Steps: 4
sampler: lcm
scheduler: sgm_uniform
model: Animagine XL
lora-strength: 1
prompt: 1 girl, running, smiling, mouth open, semi side view

The image clearly contains many artifacts.

After Parameter Adjustment

CFG: 1.5
Steps: 6
sampler: euler
scheduler: sgm_uniform
model: Animagine XL
lora-strength: 0.6
prompt: 1 girl, running, smiling, mouth open, semi side view

Even after adjusting the parameters, artifacts were just as likely to appear.

Turbo-LoRA Generation Result (After Parameter Adjustment)
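
The lora-strength value that was lowered from 1 to 0.6 above scales how much of the LoRA's low-rank update is added to the base model's weights. The sketch below shows this on a tiny 2x2 matrix using plain Python lists. It assumes the common W + strength * (up @ down) form and leaves out the alpha/rank scaling factor that real LoRA files also carry. The function and variable names are hypothetical.

```python
def apply_lora(weight, lora_down, lora_up, strength):
    """Return weight + strength * (lora_up @ lora_down) for nested-list matrices."""
    rank = len(lora_down)
    merged = [row[:] for row in weight]  # copy so the base weights stay untouched
    for i in range(len(weight)):
        for j in range(len(weight[0])):
            delta = sum(lora_up[i][r] * lora_down[r][j] for r in range(rank))
            merged[i][j] += strength * delta
    return merged


base = [[1.0, 0.0], [0.0, 1.0]]  # toy base weight matrix
down = [[1.0, 2.0]]              # shape (rank=1, in=2)
up = [[0.5], [1.0]]              # shape (out=2, rank=1)

full = apply_lora(base, down, up, 1.0)     # lora-strength: 1
partial = apply_lora(base, down, up, 0.6)  # lora-strength: 0.6

print(full)
print(partial)
```

With strength 0.6 every entry moves only 60% of the way from the base weights to the fully merged weights, which is why lowering it weakens the speedup LoRA's influence on the base model.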

Lightning

Overview of Lightning Model

ByteDance/SDXL-Lightning · Hugging Face

huggingface.co

SDXL-Lightning: Progressive Adversarial Diffusion Distillation

We propose a diffusion distillation method that achieves new state-of-the-art in one-step/few-step 1024px text-to-image generation based on SDXL. Our method combines progressive and adversarial distillation to achieve a balance between quality and mode coverage. In this paper, we discuss the theoretical analysis, discriminator design, model formulation, and training techniques. We open-source our distilled SDXL-Lightning models both as LoRA and full UNet weights.

arxiv.org

Paper published by ByteDance in February 2024
Combines adversarial distillation with progressive distillation
The student model is trained to infer directly, in one step, the result that the teacher model infers over multiple steps
Basic training is done first with an MSE loss, then image quality is improved by adding an adversarial loss
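
The idea of one student step matching several teacher steps can be illustrated with a toy example. This is not the SDXL-Lightning training code: a scalar ODE dx/dt = -x stands in for denoising, the "student" has a single parameter fitted in closed form, and all names are made up for illustration. Only the MSE stage is shown. The adversarial stage is omitted.

```python
def euler_step(x, dt):
    """One Euler step of the toy ODE dx/dt = -x (stand-in for one denoising step)."""
    return x + dt * (-x)


def teacher_two_steps(x, dt):
    """The teacher covers an interval of length dt with two half-size steps."""
    half = dt / 2
    return euler_step(euler_step(x, half), half)


def student_one_step(x, scale):
    """The student covers the same interval in one step; `scale` is its only parameter."""
    return x * scale


x0, dt = 2.0, 0.5
target = teacher_two_steps(x0, dt)  # what the student is trained to reproduce
naive = euler_step(x0, dt)          # an untrained single step over the same interval

# Minimizing the MSE (x0 * scale - target)^2 has the closed-form solution below.
scale = target / x0
student = student_one_step(x0, scale)

mse_before = (naive - target) ** 2
mse_after = (student - target) ** 2
print(target, naive, student, mse_before, mse_after)
```

After fitting, the student's single step lands where the teacher needed two. Repeating the procedure with the student as the new teacher is what makes the distillation progressive.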

Usage of Lightning-LoRA

CFG: 1 to 3?
Steps: 1, 2, 4, 8
Sampler: euler

It can be downloaded from, for example, the page below.

ByteDance/SDXL-Lightning at main

huggingface.co

Generation Result of Lightning-LoRA (SDXL, 4step)

CFG: 2
Steps: 4
sampler: euler
scheduler: sgm_uniform
model: Animagine XL
lora-strength: 1.0
prompt: 1 girl, running, smiling, mouth open, semi side view

The result looks relatively good.

LCM

Overview of LCM Model

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Latent Diffusion models (LDMs) have achieved remarkable results in synthesizing high-resolution images. However, the iterative sampling process is computationally intensive and leads to slow generation. Inspired by Consistency Models (song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs, including Stable Diffusion (rombach et al). Viewing the guided reverse diffusion process as solving an augmented probability flow ODE (PF-ODE), LCMs are designed to directly predict the solution of such ODE in latent space, mitigating the need for numerous iterations and allowing rapid, high-fidelity sampling. Efficiently distilled from pre-trained classifier-free guided diffusion models, a high-quality 768 x 768 2~4-step LCM takes only 32 A100 GPU hours for training. Furthermore, we introduce Latent Consistency Fine-tuning (LCF), a novel method that is tailored for fine-tuning LCMs on customized image datasets. Evaluation on the LAION-5B-Aesthetics dataset demonstrates that LCMs achieve state-of-the-art text-to-image generation performance with few-step inference. Project Page: https://latent-consistency-models.github.io/

arxiv.org

LCM-LoRA: A Universal Stable-Diffusion Acceleration Module

Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks, producing high-quality images with minimal inference steps. LCMs are distilled from pre-trained latent diffusion models (LDMs), requiring only ~32 A100 GPU training hours. This report further extends LCMs' potential in two aspects: First, by applying LoRA distillation to Stable-Diffusion models including SD-V1.5, SSD-1B, and SDXL, we have expanded LCM's scope to larger models with significantly less memory consumption, achieving superior image generation quality. Second, we identify the LoRA parameters obtained through LCM distillation as a universal Stable-Diffusion acceleration module, named LCM-LoRA. LCM-LoRA can be directly plugged into various Stable-Diffusion fine-tuned models or LoRAs without training, thus representing a universally applicable accelerator for diverse image generation tasks. Compared with previous numerical PF-ODE solvers such as DDIM, DPM-Solver, LCM-LoRA can be viewed as a plug-in neural PF-ODE solver that possesses strong generalization abilities. Project page: https://github.com/luosiallen/latent-consistency-model.

arxiv.org

Paper published in October 2023
Reformulates the reverse diffusion process as directly predicting the solution of the Probability Flow ODE (PF-ODE), which greatly reduces the number of iterations
- Because the reverse diffusion process can be computed efficiently, images can be generated in only a few steps
A paper on LCM-LoRA was also published in November 2023
In LCM-LoRA, the low-rank LoRA matrices play the role of the PF-ODE solver
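
The property that makes few-step sampling possible can be written as the self-consistency condition from the Consistency Models line of work. The notation here (z_t for the latent at time t, ε for a small starting time, T for the final time) is chosen for illustration and may differ from the papers.

```latex
f_\theta(z_t, t) = f_\theta(z_{t'}, t') \quad \text{for all } t, t' \in [\epsilon, T] \text{ on the same PF-ODE trajectory}, \qquad f_\theta(z_\epsilon, \epsilon) = z_\epsilon
```

In words: every point on one trajectory maps to the same clean latent, so in principle a single evaluation of f_θ jumps from noise to the result, and a few extra steps only refine it.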

Usage of LCM-LoRA

CFG: up to 1.5
Steps: 3 or more

It can be obtained from, for example, the page below.

Latent Consistency Models LoRAs - a latent-consistency Collection

Latent Consistency Models for Stable Diffusion - LoRAs and full fine-tuned weights

huggingface.co

Generation Result of LCM-LoRA (SDXL)

CFG: 1.5
Steps: 6
sampler: lcm
scheduler: sgm_uniform
model: DreamShaper XL
lora-strength: 1.0
prompt: 1 girl, running, smiling, mouth open, semi side view, anime style

I used DreamShaper XL here because Animagine did not generate well with this LoRA.

Generation Result of LCM-LoRA (SD1.5)

CFG: 1.5
Steps: 6
sampler: lcm
scheduler: sgm_uniform
model: Mistoon_Anime
lora-strength: 1.0
prompt: 1 girl, running, smiling, mouth open, semi side view

Hyper

Overview of Hyper

ByteDance/Hyper-SD · Hugging Face

huggingface.co

Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

Recently, a series of diffusion-aware distillation algorithms have emerged to alleviate the computational overhead associated with the multi-step inference process of Diffusion Models (DMs). Current distillation techniques often dichotomize into two distinct aspects: i) ODE Trajectory Preservation; and ii) ODE Trajectory Reformulation. However, these approaches suffer from severe performance degradation or domain shifts. To address these limitations, we propose Hyper-SD, a novel framework that synergistically amalgamates the advantages of ODE Trajectory Preservation and Reformulation, while maintaining near-lossless performance during step compression. Firstly, we introduce Trajectory Segmented Consistency Distillation to progressively perform consistent distillation within pre-defined time-step segments, which facilitates the preservation of the original ODE trajectory from a higher-order perspective. Secondly, we incorporate human feedback learning to boost the performance of the model in a low-step regime and mitigate the performance loss incurred by the distillation process. Thirdly, we integrate score distillation to further improve the low-step generation capability of the model and offer the first attempt to leverage a unified LoRA to support the inference process at all steps. Extensive experiments and user studies demonstrate that Hyper-SD achieves SOTA performance from 1 to 8 inference steps for both SDXL and SD1.5. For example, Hyper-SDXL surpasses SDXL-Lightning by +0.68 in CLIP Score and +0.51 in Aes Score in the 1-step inference.

arxiv.org

Paper published by ByteDance in April 2024
Trajectory Segmented Consistency Distillation (Chapter 3-1)
- According to the paper, the training process is divided finely by time step, and distillation is performed within each segment
- The model is trained to generate accurately within each segment, and the number of segments is gradually reduced
- In the end, it can generate high-quality images even with few steps
Human feedback learning (Chapter 3-2)
- A method for improving model performance based on human preferences
- A reward model is trained on human feedback data
- This reward model is then used as a loss function during training
I am not confident in the explanation above, so please read the original paper if you want to understand it properly.
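
The segment idea in Trajectory Segmented Consistency Distillation can be sketched as follows. This only shows how the time axis is split and how the target time changes as the number of segments shrinks. It is not the Hyper-SD training code. The total of 1000 timesteps and the 8, 4, 2, 1 segment schedule are assumptions for illustration, and the function names are hypothetical.

```python
def segment_bounds(total_timesteps, num_segments):
    """Split [0, total_timesteps] into equal segments, returned as (start, end) pairs."""
    size = total_timesteps // num_segments
    return [(i * size, (i + 1) * size) for i in range(num_segments)]


def segment_target(t, total_timesteps, num_segments):
    """Return the start of the segment containing timestep t.

    Within a segment, the model is trained to be consistent only up to this
    boundary instead of all the way back to timestep 0.
    """
    for start, end in segment_bounds(total_timesteps, num_segments):
        if start < t <= end:
            return start
    raise ValueError("t must be in (0, total_timesteps]")


T = 1000                 # assumed number of training timesteps
schedule = [8, 4, 2, 1]  # assumed stages: the number of segments is reduced gradually

targets = [segment_target(700, T, k) for k in schedule]
print(targets)
```

With many segments, timestep 700 only has to agree with a nearby boundary. With a single segment it has to map all the way to 0, which is the ordinary consistency distillation case.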

Usage of Hyper-LoRA

CFG: small (with the CFG-preserved LoRA, values such as 8 seem to be fine)
Steps: 1, 2, 4, 8, 12 (match the step count in the filename of the downloaded LoRA)

It can be obtained from the page below.

ByteDance/Hyper-SD at main

huggingface.co

Generation Result of Hyper-LoRA (SDXL, 4step)

CFG: 1.5
Steps: 4
sampler: euler
scheduler: normal
model: DreamShaper XL
lora-strength: 1.0
prompt: 1 girl, running, smiling, mouth open, semi side view, anime style

I used the 4-step LoRA, but generated with 6 steps because the results at 4 steps were slightly poor.

Hyper-LoRA Generation Result (SDXL, 4step)

Generation Result of Hyper-LoRA (SDXL, 8step, cfg preserved)

CFG: 8
Steps: 8
sampler: euler
scheduler: normal
model: DreamShaper XL
lora-strength: 1.0
prompt: 1 girl, running, smiling, mouth open, semi side view, anime style

Hyper-LoRA Generation Result (SDXL, 8step, cfg preserved)

Generation Result of Hyper-LoRA (SD1.5, 4step)

CFG: 1.5
Steps: 6
sampler: euler
scheduler: normal
model: Mistoon_Anime
lora-strength: 1.0
prompt: 1 girl, running, smiling, mouth open, semi side view, anime style

Hyper-LoRA Generation Result (SD1.5, 4step)

Generation Result of Hyper-LoRA (SD1.5, 8step, cfg preserved)

CFG: 8
Steps: 8
sampler: euler
scheduler: normal
model: flat2DAnimerge
lora-strength: 1.0
prompt: 1 girl, running, smiling, mouth open, semi side view, anime style

Hyper-LoRA Generation Result (SD1.5, 8step, cfg preserved)

Conclusion

This was not much of a comparison, but my impressions after trying each one are as follows.

Turbo can be ruled out
LCM is stable for SD1.5
For SDXL, any of Hyper, Lightning, or LCM is fine, but Hyper seems to give the nicest overall look
As with ordinary LoRAs, compatibility with the base model varies, so try another one if the results are poor
Since overall quality is not very high, these seem best suited to temporary use, mainly for quick parameter tuning
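
For quick reference, the usage values listed in each section can be collected into one lookup with a small checker. This is only a convenience sketch: the numbers are copied from this post, the dictionary layout and function name are hypothetical, and a warning does not mean a setting cannot work.

```python
# Values copied from the "Usage" lists in this post. None means the post gives no value.
RECOMMENDED = {
    "turbo": {"cfg": (1.0, 2.5), "min_steps": 4, "step_choices": None},
    # The post marks the Lightning CFG range as uncertain.
    "lightning": {"cfg": (1.0, 3.0), "min_steps": None, "step_choices": (1, 2, 4, 8)},
    "lcm": {"cfg": (None, 1.5), "min_steps": 3, "step_choices": None},
    # Hyper: the post only says "small" CFG, so no numeric range is recorded.
    "hyper": {"cfg": None, "min_steps": None, "step_choices": (1, 2, 4, 8, 12)},
}


def check_settings(kind, cfg, steps):
    """Return a list of warnings for settings outside the values listed in the post."""
    rec = RECOMMENDED[kind]
    warnings = []
    if rec["cfg"] is not None:
        low, high = rec["cfg"]
        if low is not None and cfg < low:
            warnings.append(f"cfg {cfg} is below {low}")
        if high is not None and cfg > high:
            warnings.append(f"cfg {cfg} is above {high}")
    if rec["min_steps"] is not None and steps < rec["min_steps"]:
        warnings.append(f"steps {steps} is below {rec['min_steps']}")
    if rec["step_choices"] is not None and steps not in rec["step_choices"]:
        warnings.append(f"steps {steps} is not one of {rec['step_choices']}")
    return warnings


print(check_settings("turbo", 1.5, 4))      # settings from the first Turbo test
print(check_settings("lightning", 2.0, 4))  # settings from the Lightning test
print(check_settings("turbo", 8.0, 2))      # a high CFG with too few steps
```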
