Stable Diffusion Guide: Paper Reading Links

This is a collection of links to explanatory articles on papers related to Stable Diffusion.

Image Generation

Stable Diffusion

Everyone’s favorite Stable Diffusion model. It combines three components: a text encoder, a UNet, and a VAE.
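To make the division of labor concrete, here is a toy sketch of how data might flow between the three components. Every function below is a hypothetical stand-in written for this illustration (no real model or library is involved), and the update rule is a deliberately simplified placeholder for a real sampler.

```python
import random

def encode_text(prompt):
    """Toy text encoder: maps a prompt to a small conditioning vector."""
    return [len(word) / 10.0 for word in prompt.split()]

def unet_predict_noise(latent, conditioning, step):
    """Toy UNet: 'predicts' noise as the gap between the latent and a prompt-dependent target."""
    target = sum(conditioning) / len(conditioning)
    return [v - target for v in latent]

def vae_decode(latent):
    """Toy VAE decoder: expands each latent value into two 'pixels'."""
    return [v for v in latent for _ in range(2)]

def generate(prompt, latent_size=4, num_steps=20, seed=0):
    rng = random.Random(seed)
    conditioning = encode_text(prompt)                      # 1. text encoder
    latent = [rng.gauss(0, 1) for _ in range(latent_size)]  # start from pure noise
    for step in range(num_steps):                           # 2. UNet denoising loop
        noise = unet_predict_noise(latent, conditioning, step)
        latent = [v - 0.5 * n for v, n in zip(latent, noise)]
    return vae_decode(latent)                               # 3. VAE decoder

image = generate("a photo of a cat")
print(len(image))
```

The point is only the ordering: the prompt is encoded once, the UNet is called repeatedly in a small latent space, and the VAE decoder runs once at the end.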

Stable Diffusion 3

An update that brings the Stable Diffusion model up to date with the latest techniques. It replaces the UNet with a Transformer-based model called MMDiT and includes further changes, such as an optimized noise schedule and an increase to three text encoders.

Stable Diffusion 3 Paper Reading: It Looks Like It Has Finally Graduated from UNet

blog.otama-playground.com

LoRA

Everyone’s favorite LoRA. It works like an add-on module for the model: the model can be fine-tuned by adding only a small number of parameters. (It was originally proposed for language models.)

What Is LoRA (Low-Rank Adaptation)? A Method for Fine-Tuning Large Models at Low Cost, and Its Benefits

blog.otama-playground.com
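As a rough illustration of why LoRA needs so few extra parameters, the sketch below adds a low-rank update to a frozen weight matrix, following the W + (alpha / r) * B A form from the LoRA paper. The matrix sizes, the helper names, and the pure-Python matrix multiply are assumptions made for this example, not anyone's actual implementation.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * B @ A without modifying the frozen W."""
    rank = len(lora_a)               # A has shape (r, k)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)   # (d, r) @ (r, k) -> (d, k)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def param_counts(d, k, rank):
    """Trainable parameters: full fine-tuning vs. the two LoRA factors."""
    return d * k, rank * (d + k)

random.seed(0)
d, k, rank = 6, 4, 2
w = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]             # frozen base weight
lora_a = [[random.gauss(0, 0.01) for _ in range(k)] for _ in range(rank)]  # small random init
lora_b = [[0.0] * rank for _ in range(d)]                                  # zero init

w_eff = lora_effective_weight(w, lora_a, lora_b, alpha=2.0)
print(w_eff == w)                 # B starts at zero, so the model is unchanged at first
print(param_counts(768, 768, 4))  # full fine-tuning vs. LoRA for one 768x768 layer
```

Because B starts at zero, training begins from exactly the original model, and only the two small factors A and B are updated.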

ControlNet

ControlNet is a model extension for controlling image structure, such as the pose of a person in the image. It duplicates the UNet and trains the copy on pose data, while the original UNet stays frozen.

The Basics of ControlNet: Learning and Controlling Pose with Stable Diffusion Models

blog.otama-playground.com
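One detail worth seeing in code is how the trained copy is attached to the frozen UNet. The ControlNet paper connects the two through zero-initialized convolutions; the sketch below reduces that idea to a single scalar weight, and both block functions are hypothetical toy stand-ins rather than real network layers.

```python
def frozen_block(x):
    """Stand-in for a frozen UNet block (hypothetical toy function)."""
    return [2.0 * v + 1.0 for v in x]

def trainable_copy(x, condition):
    """Stand-in for the trainable copy, which also sees the condition (e.g. a pose map)."""
    return [2.0 * (v + c) + 1.0 for v, c in zip(x, condition)]

def controlled_block(x, condition, zero_weight):
    """Frozen output plus the copy's output, scaled by a weight that starts at zero."""
    base = frozen_block(x)
    control = trainable_copy(x, condition)
    return [b + zero_weight * c for b, c in zip(base, control)]

x = [0.5, -1.0, 2.0]
pose = [1.0, 0.0, -1.0]
print(controlled_block(x, pose, zero_weight=0.0))  # identical to frozen_block(x)
print(controlled_block(x, pose, zero_weight=0.1))  # the condition now shifts the output
```

Starting the connecting weight at zero means the combined model initially behaves exactly like the original, so training the copy cannot break the base model at step one.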

Textual Inversion

A method that adjusts the text encoder side instead of the UNet, by learning an embedding for a new token. Because it works outside the UNet, which is the main body of the model, its effect is relatively weak, but it has advantages: it adds almost no parameters, and training is easier than tuning the UNet.

An Easy-to-Understand Explanation of Textual Inversion: A Control Method for Stable Diffusion

blog.otama-playground.com
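The core idea, optimizing a single token embedding while everything else stays frozen, can be shown with a toy example. The linear `frozen_model`, the target vector, and the learning rate below are all assumptions chosen to keep the sketch tiny; a real setup would backpropagate the diffusion loss through the frozen text encoder and UNet.

```python
WEIGHTS = [[1.0, 0.5], [-0.5, 1.0]]  # frozen; never updated below

def frozen_model(embedding):
    """Hypothetical frozen network: a fixed linear map standing in for the real model."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in WEIGHTS]

def loss_and_grad(embedding, target):
    """Squared error and its gradient with respect to the embedding only."""
    residual = [o - t for o, t in zip(frozen_model(embedding), target)]
    loss = sum(r * r for r in residual)
    # dL/de = 2 * W^T @ residual
    grad = [2.0 * sum(WEIGHTS[i][j] * residual[i] for i in range(len(residual)))
            for j in range(len(embedding))]
    return loss, grad

vocab = {"a": [0.1, 0.2], "photo": [0.3, -0.1]}  # existing embeddings stay fixed
vocab["<my-concept>"] = [0.0, 0.0]                # the only trainable vector
target = [1.0, 2.0]                               # stand-in for the training signal

for _ in range(100):
    loss, grad = loss_and_grad(vocab["<my-concept>"], target)
    vocab["<my-concept>"] = [e - 0.1 * g for e, g in zip(vocab["<my-concept>"], grad)]

final_loss, _ = loss_and_grad(vocab["<my-concept>"], target)
print(final_loss)
```

Only the vector stored under `<my-concept>` changes during the loop; the model weights and the other vocabulary entries are untouched.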

IPAdapter

A method that adds an image encoder in parallel with the text encoder so that the model can accept image prompts.

A Quick Explanation of IPAdapter: Use Images as Prompts!? [Stable Diffusion]

blog.otama-playground.com
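The "in parallel" part can be sketched as two attention passes whose results are summed, which the IP-Adapter paper describes as decoupled cross-attention. This is a minimal pure-Python sketch: it reuses each token as both key and value for brevity (a real layer has separate learned projections), and the function names and toy vectors are assumptions.

```python
import math

def attention(query, keys, values):
    """Single-query scaled dot-product attention over a list of tokens."""
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query)) for key in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

def decoupled_cross_attention(query, text_tokens, image_tokens, scale):
    """Text attention plus a separately computed image attention, weighted by scale."""
    text_out = attention(query, text_tokens, text_tokens)
    image_out = attention(query, image_tokens, image_tokens)
    return [t + scale * i for t, i in zip(text_out, image_out)]

query = [1.0, 0.0]
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_tokens = [[0.0, 2.0], [2.0, 0.0]]

print(decoupled_cross_attention(query, text_tokens, image_tokens, scale=0.0))  # text only
print(decoupled_cross_attention(query, text_tokens, image_tokens, scale=1.0))  # text + image
```

With `scale=0.0` the result is the plain text-conditioned output, so the image prompt behaves like an adjustable add-on rather than a replacement for the text prompt.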

Video Generation

AnimateDiff

A model that adds temporal layers to Stable Diffusion so that it can learn motion along the time axis.

AnimateDiff: How a Lightweight Video Generation Model That Extends Stable Diffusion Works

blog.otama-playground.com

RIFE

A technique that generates intermediate frames to raise a video's frame rate.

A Technique for Raising Video Frame Rates: RIFE and Its Architecture

blog.otama-playground.com
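To show what "generating intermediate frames" does to a clip, here is a naive baseline that inserts a linear blend between each pair of neighboring frames. This is explicitly not RIFE: RIFE uses a neural network to estimate the intermediate flow, whereas plain blending produces ghosting on moving objects. Frames are modeled as flat lists of pixel values, which is an assumption for brevity.

```python
def blend(frame_a, frame_b, t=0.5):
    """Linear blend of two frames at time t in [0, 1] (naive baseline, not RIFE)."""
    return [(1.0 - t) * a + t * b for a, b in zip(frame_a, frame_b)]

def double_frame_rate(frames):
    """Insert one blended frame between each pair of neighboring frames."""
    if not frames:
        return []
    out = [frames[0]]
    for prev, nxt in zip(frames, frames[1:]):
        out.append(blend(prev, nxt))  # the new intermediate frame
        out.append(nxt)
    return out

frames = [[0.0, 0.0], [1.0, 2.0], [3.0, 2.0]]
result = double_frame_rate(frames)
print(len(frames), "->", len(result))
print(result)
```

A clip of n frames becomes 2n - 1 frames over the same duration, which is where the higher frame rate comes from; a learned interpolator replaces `blend` with something that follows the motion.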

Stream Diffusion

A model that enables ultra-high-speed image generation by introducing pipelined processing and other speed-up techniques. It reaches up to 91.07 fps on an RTX 4090.

Stream Diffusion: A New Technology That Enables Real-Time Video Generation

blog.otama-playground.com
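The pipelining idea can be simulated without any model at all. In the sketch below, each tick stands for one batched denoising call that advances every in-flight frame by one step, so frames at different denoising stages share the same call. The function name, the tick counter, and the frame labels are assumptions for illustration, and the simulation ignores the fact that a batched call costs more than a single-frame call.

```python
from collections import deque

def stream_batch(frame_ids, num_steps):
    """Simulate pipelined denoising: one batched step per tick advances every in-flight frame."""
    pending = deque(frame_ids)
    in_flight = deque()  # entries are [frame_id, steps_done]
    finished = []        # (tick, frame_id) in completion order
    tick = 0
    while pending or in_flight:
        if pending:
            in_flight.append([pending.popleft(), 0])  # a new frame enters the pipeline
        for item in in_flight:                        # stands in for one batched UNet call
            item[1] += 1
        tick += 1
        if in_flight[0][1] == num_steps:              # the oldest frame is fully denoised
            finished.append((tick, in_flight.popleft()[0]))
    return finished

schedule = stream_batch(["f0", "f1", "f2", "f3"], num_steps=3)
print(schedule)
```

After a warm-up of `num_steps` ticks, one finished frame leaves the pipeline on every tick, instead of one frame every `num_steps` calls as in a strictly sequential loop.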
