StableDiffusion i2i Comparison: Latent vs ControlNet vs IPAdapter

When I wanted to generate images from images (image-to-image) with Stable Diffusion, I wasn’t sure which method to use, so I decided to actually generate and compare them.

The three main image-to-image methods are as follows. I will focus on these in this article.

Method 1: Input the image as a Latent Image
Method 2: Use ControlNet
Method 3: Input the image as a prompt using IPAdapter

Method 1: Input the Image as a Latent Image

Overview

The image is encoded with VAE to obtain a latent variable (Latent Image), which is then used as input to Stable Diffusion for image generation. At this point, set the Denoise strength slightly lower than 1.

The color tone and rough position of colors are preserved, but depending on the prompt and parameters, the result can end up quite different from the original image — that is the main drawback.

In ComfyUI, you can generate images by connecting the nodes as shown below:

Generation Result

prompt: (blank)
model: flat2DAnimerge
denoise: 0.65

The original image is in the top-left.

Method 2: Use ControlNet

Overview

ControlNet is a method for specifying the composition of the generated image. ControlNet uses data such as line drawings and OpenPose to define the image composition. The flow works as follows: the input image is converted into line drawings or OpenPose data, which is then passed to ControlNet for image-to-image generation.

Using this method, the composition closely follows the original image, but the color tone is not preserved — you specify the colors via the text prompt instead.

In ComfyUI, the workflow looks like this:

For instructions on how to set up ControlNet, please refer to the following article:

ComfyUIでControlNet：姿勢指定した画像生成AIの基本

blog.otama-playground.com

Generation Results

prompt: (blank)
model: flat2DAnimerge

The original image is in the top-left.

Method 3: Use IPAdapter to Input the Image as a Prompt

Overview

IPAdapter is a method that uses an image as a prompt. The image is interpreted by an AI model (ViT), and that interpretation is passed to the Stable Diffusion model, allowing generation of images similar to the original. Since it operates via a mechanism separate from the Text Encoder, it can be used in combination with text prompts as well.

For instructions on how to use IPAdapter in ComfyUI, see the following article:

【Stable Diffusion】ComfyUIを使って画像生成AIで遊んでみよう【IPAdapter編】

blog.otama-playground.com

Generation Result

prompt: (blank)
model: flat2DAnimerge

The original image is in the top-left.

Comparison

Here is a summary of the advantages and disadvantages of each method:

Method	Advantages	Disadvantages
Method 1: Latent	No additional extensions needed, simple to use	Parameter adjustment needed, easy to fail
Method 2: ControlNet	Can reproduce composition almost faithfully	Inflexible (especially line art types), generation becomes slightly heavy, additional extensions needed
Method 3: IPAdapter	Roughly reproduces composition while allowing adjustment, adjustment can also be done with text prompts	Reproducibility of original image is mixed, additional extensions needed

Conclusion

This time I generated separately to capture the characteristics of each method, but it seems possible to use them in combination. I think it would be good to choose based on the merits needed at the time.

StableDiffusion i2i Comparison: Latent vs ControlNet vs IPAdapter

Method 1: Input the Image as a Latent Image

Overview

Generation Result

Method 2: Use ControlNet

Overview

Generation Results

Method 3: Use IPAdapter to Input the Image as a Prompt

Overview

Generation Result

Comparison

Conclusion

Related Posts

Image Generation AI with ComfyUI: Face Detailer Edition

Image Generation AI with ComfyUI: Inpaint Edition

Image Generation AI with ComfyUI: Outpaint Edition

Image Generation AI with ComfyUI: Textual Inversion Edition

Real-Time Image Generation from Scribbles with ComfyUI

Generate Images from Scribbles with Krita & StableDiffusion

Vid2Vid with ComfyUI: AnimateDiff, ControlNet, and FaceID

Image Generation AI with ComfyUI: ESRGAN Upscaling Edition

SD-Turbo & SDXL-Turbo: Overview and Local Demo Guide

Stable Video Diffusion: High-Precision AI Video Generation