Stable Diffusion Guide: Paper Reading Links

This is a collection of links to explanatory articles on papers related to Stable Diffusion.

Image Generation

Stable Diffusion

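Everyone’s favorite Stable Diffusion model. It combines three components: a Text Encoder that embeds the prompt, a UNet that denoises in latent space, and a VAE that decodes the latents into an image.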

[Stable Diffusion] Understanding How the Image Generation Model Works

blog.otama-playground.com

Stable Diffusion 3

An update of Stable Diffusion that adopts newer techniques. It replaces the UNet with a Transformer-based model called MMDiT, and includes other changes such as an optimized noise schedule and an increase to three Text Encoders.

Reading the Stable Diffusion 3 Paper: It Seems to Have Finally Graduated from UNet

blog.otama-playground.com

LoRA

Everyone’s favorite LoRA. It works like an add-on part for the model: additional training can be done by adding only a small number of parameters. (It was originally proposed for language models.)

What Is LoRA (Low-Rank Adaptation)? A Method for Fine-Tuning Large Models at Low Cost, and Its Benefits

blog.otama-playground.com
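
To make the "small number of added parameters" in LoRA concrete, here is a minimal pure-Python sketch of a LoRA-style linear layer. It assumes the usual formulation in which a frozen weight W is augmented with a low-rank product A·B, with B starting at zero. The class name `LoRALinear`, the helper functions, and the default values are illustrative assumptions, not taken from any library.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Multiply every element of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]


class LoRALinear:
    """Hypothetical layer computing x @ W + (alpha / rank) * x @ A @ B.

    W is frozen; only A (d_in x rank) and B (rank x d_out) would be trained.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_in, d_out = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, never updated
        self.rank = rank
        self.alpha = alpha
        # Down-projection starts as small random values.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(rank)] for _ in range(d_in)]
        # Up-projection starts at zero, so the layer initially matches the base model.
        self.B = [[0.0] * d_out for _ in range(rank)]

    def forward(self, x):
        base = matmul(x, self.weight)
        update = scale(matmul(matmul(x, self.A), self.B), self.alpha / self.rank)
        return add(base, update)

    def num_trainable(self):
        """Number of parameters in A and B combined."""
        return len(self.A) * self.rank + self.rank * len(self.B[0])
```

For an 8×8 weight and rank 2, `num_trainable()` is 8·2 + 2·8 = 32, compared with 64 parameters in the full matrix. The gap widens quickly as the layer gets larger while the rank stays small.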

ControlNet

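ControlNet is a model extension for controlling spatial structure, such as the pose of a person in the image. It makes a trainable copy of part of the UNet (the encoder and middle blocks) and trains that copy on the conditioning data, such as pose maps, while the original UNet weights stay frozen.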

ControlNet Basics: Learning and Controlling Pose with Stable Diffusion Models

blog.otama-playground.com

Textual Inversion

A method that adjusts the Text Encoder side instead of the UNet: it learns an embedding for a new token while the model weights stay frozen. Because the change is made outside the UNet, which is the main body of the model, the effect is relatively weak. The advantages are that almost no parameters are added and that training is easier than tuning the UNet.

An Easy-to-Understand Explanation of Textual Inversion: A Control Method for Stable Diffusion

blog.otama-playground.com
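
As a toy sketch of the idea behind Textual Inversion, the following assumes the commonly described setup in which a new placeholder token is added to the embedding table and only that token's vector is updated during training. `EmbeddingTable`, its methods, and the token names are hypothetical names for illustration. A real implementation would get gradients from the diffusion loss; here they are passed in by hand.

```python
class EmbeddingTable:
    """Hypothetical token-embedding table where only placeholder tokens are trainable."""

    def __init__(self, vectors):
        # Copy the vectors so the caller's data is not modified.
        self.vectors = {token: list(vec) for token, vec in vectors.items()}
        self.trainable = set()

    def add_placeholder(self, token, init_from):
        """Add a new token, initialised from an existing token's embedding."""
        self.vectors[token] = list(self.vectors[init_from])
        self.trainable.add(token)

    def lookup(self, tokens):
        """Return the embedding vectors for a tokenised prompt."""
        return [self.vectors[t] for t in tokens]

    def sgd_step(self, grads, lr=0.1):
        """Apply one gradient step, skipping every token that is frozen."""
        for token, grad in grads.items():
            if token not in self.trainable:
                continue  # pretrained embeddings stay untouched
            self.vectors[token] = [
                v - lr * g for v, g in zip(self.vectors[token], grad)
            ]
```

Only the single vector for the placeholder token changes, which is why the learned result is so small compared with a fine-tuned UNet.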

IPAdapter

A method that adds an Image Encoder in parallel with the Text Encoder so that the model can accept image prompts.

A Simple Explanation of IPAdapter: Using Images as Prompts!? [Stable Diffusion]

blog.otama-playground.com

Video Generation

AnimateDiff

A model that adds layers to Stable Diffusion so that it can learn the time axis, that is, motion across frames.

AnimateDiff: How a Lightweight Video Generation Model That Extends Stable Diffusion Works

blog.otama-playground.com

RIFE

A technique that raises a video's frame rate by generating intermediate frames.

Technology for Raising Video Frame Rates: RIFE and Its Architecture

blog.otama-playground.com
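
The effect of inserting intermediate frames on frame count can be sketched as follows. This is not RIFE itself: `interpolate_pair` is a placeholder that linearly blends two frames, whereas RIFE uses a neural network to estimate the intermediate frame. Frames are represented as flat lists of pixel values purely for illustration, and both function names are hypothetical.

```python
def interpolate_pair(frame_a, frame_b, t=0.5):
    """Placeholder interpolator: a linear blend, standing in for a learned model."""
    return [(1 - t) * a + t * b for a, b in zip(frame_a, frame_b)]


def double_frame_rate(frames):
    """Insert one generated frame between every pair of consecutive frames."""
    if not frames:
        return []
    out = []
    for prev, nxt in zip(frames, frames[1:]):
        out.append(prev)
        out.append(interpolate_pair(prev, nxt))
    out.append(frames[-1])  # the last original frame has no successor
    return out
```

A clip of n frames becomes 2n − 1 frames over the same duration, so one pass roughly doubles the frame rate and applying it again roughly quadruples it.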

Stream Diffusion

A model that enables ultra-high-speed image generation by introducing pipeline processing and other speed-up techniques. It reaches up to 91.07 fps on an RTX 4090.

Stream Diffusion: A New Technology That Enables Real-Time Video Generation

blog.otama-playground.com