Image Generation AI with ComfyUI: ESRGAN Upscaling Edition

When you start exploring the world of AI image generation, the first wall you’ll likely hit is “resolution.” Images generated at standard sizes such as 512×512 often lack fine detail, and they can look quite disappointing when enlarged.

This is where ESRGAN (Enhanced Super-Resolution Generative Adversarial Networks) comes in. ESRGAN is a deep learning-based Super-Resolution model that can take a low-resolution image and output a high-resolution version while maintaining sharp details.

In this article, I’ll walk you through the specific steps to integrate ESRGAN into your ComfyUI workflow to upscale your generated images quickly and effectively.

ESRGAN vs. Hires.fix: Which One to Use?

While “Hires.fix” is a popular upscaling method, it works fundamentally differently from the Image-space Upscale (ESRGAN) we’re discussing here.

  • Hires.fix (Latent-space Upscale): This method increases the resolution in the Latent Space and performs a second pass of denoising. It doesn’t just enlarge the image; it adds new details. However, it consumes significantly more VRAM and carries the risk of slightly changing your composition.
  • ESRGAN (Image-space Upscale): This method runs the decoded final image through an upscaling neural network, with no second denoising pass. It’s ideal when you want to increase resolution without changing the original composition or when you prioritize generation speed.

If your composition is already perfect and you just want it to be “crisper,” ESRGAN is often the less stressful choice.
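
To make the difference concrete, here is a minimal sketch of where each method does its work. It assumes a Stable Diffusion-style VAE that shrinks each side by a factor of 8 (an assumption, not something specific to this article), and the function names are hypothetical helpers for illustration only.

```python
# Assumption: a Stable Diffusion-style VAE that downsamples each side by 8.
VAE_DOWNSCALE = 8


def hires_fix_plan(width, height, scale):
    """Latent-space upscale: enlarge the latent, then denoise a second time."""
    latent_w, latent_h = width // VAE_DOWNSCALE, height // VAE_DOWNSCALE
    return {
        "works_on": "latent",
        "from": (latent_w, latent_h),
        "to": (int(latent_w * scale), int(latent_h * scale)),
        "second_denoise_pass": True,  # this is where new detail (and drift) comes from
    }


def esrgan_plan(width, height, scale):
    """Image-space upscale: enlarge the decoded pixels, no extra sampling."""
    return {
        "works_on": "image",
        "from": (width, height),
        "to": (width * scale, height * scale),
        "second_denoise_pass": False,  # composition stays as generated
    }


if __name__ == "__main__":
    print(hires_fix_plan(512, 512, 2))
    print(esrgan_plan(512, 512, 4))
```

The second denoising pass is the part that costs extra VRAM and time, and it is also the part that can nudge the composition.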

Implementation: Integrating into your Workflow

With ComfyUI, adding upscaling to your workflow takes just two nodes: one to load the upscale model and one to apply it to the image.

1. Placing the Node

Place an Upscale Image (using Model) node so that it receives the image output from your VAE Decode node.

A full view of the workflow. Simply connect the Upscale node after VAE Decode.
A close-up of the node. You can see it's a very straightforward connection.

2. Setting Up the Models

To use ESRGAN, you’ll need the trained model files. Here are some highly recommended ones:

  • R-ESRGAN 4x+ (Real-ESRGAN x4plus): A solid, versatile standard.
  • 4x-UltraSharp: Renowned for producing extremely clean results across both realistic and illustrative styles.
  • 4x-AnimeSharp: Specifically tuned for anime-style images.

Download these models (usually in .pth format) and place them in the following directory: ComfyUI/models/upscale_models

If you’re looking for these models, HuggingFace is a great place to start.
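
If a model doesn’t show up in the node’s dropdown, it can help to confirm that the file actually landed in the right folder. The following is a minimal sketch, assuming your install lives in a folder called ComfyUI relative to where you run the script. The function name is hypothetical, and treating .safetensors as a valid extension alongside .pth is my assumption.

```python
from pathlib import Path

# The article mentions .pth; .safetensors is included here as an assumption.
MODEL_EXTENSIONS = {".pth", ".safetensors"}


def list_upscale_models(comfyui_root):
    """Return the sorted file names found in <root>/models/upscale_models."""
    models_dir = Path(comfyui_root) / "models" / "upscale_models"
    if not models_dir.is_dir():
        return []  # wrong root path, or the folder has not been created yet
    return sorted(
        p.name
        for p in models_dir.iterdir()
        if p.is_file() and p.suffix.lower() in MODEL_EXTENSIONS
    )


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point this at your own install directory.
    for name in list_upscale_models("ComfyUI"):
        print(name)
```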

3. Execution

Select your chosen model in the Load Upscale Model node, connect it to the upscale_model input of the Upscale Image (using Model) node, and you’re ready to go.
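
For those who drive ComfyUI from scripts, the same wiring can be sketched as a fragment of an API-format workflow. Treat this as a sketch under assumptions: the class_type strings and input names below are what I would expect to see in an API-format export for these two nodes, so compare them against a workflow exported from your own install. The node ids, the helper function, and the model file name are all hypothetical.

```python
import json


def upscale_fragment(model_name, image_source, first_id=100):
    """Build the two upscaling nodes as an API-format workflow fragment.

    image_source is [node_id, output_index] of the node that produces the
    image, typically VAE Decode.
    """
    loader_id = str(first_id)
    upscale_id = str(first_id + 1)
    return {
        # Load Upscale Model node (class name assumed)
        loader_id: {
            "class_type": "UpscaleModelLoader",
            "inputs": {"model_name": model_name},
        },
        # Upscale Image (using Model) node (class name assumed)
        upscale_id: {
            "class_type": "ImageUpscaleWithModel",
            "inputs": {
                "upscale_model": [loader_id, 0],
                "image": list(image_source),
            },
        },
    }


if __name__ == "__main__":
    # "8" is a hypothetical id for the VAE Decode node in your workflow.
    fragment = upscale_fragment("4x-UltraSharp.pth", ["8", 0])
    print(json.dumps(fragment, indent=2))
```

Merging this fragment into an existing workflow still requires an output node (such as a save or preview node) that reads from the upscaler.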

For instance, with 4x-UltraSharp, a 512×512 image becomes a 2048×2048 image in just a few moments.
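
It is easy to underestimate how much bigger a 4x result is, because the factor applies to each side. A quick back-of-the-envelope sketch follows; the memory figure assumes uncompressed 8-bit RGB, which is my assumption for illustration, and saved files will usually be smaller.

```python
def upscaled_size(width, height, scale=4):
    """Output dimensions after an image-space upscale by `scale`."""
    return width * scale, height * scale


def raw_rgb_bytes(width, height):
    """Bytes needed for uncompressed 8-bit RGB (3 bytes per pixel)."""
    return width * height * 3


w, h = upscaled_size(512, 512)
growth = (w * h) // (512 * 512)
mib = raw_rgb_bytes(w, h) / (1024 * 1024)

print(f"{w}x{h}")        # 2048x2048
print(growth)            # 16 (times as many pixels)
print(f"{mib:.1f} MiB")  # 12.0 MiB
```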

Results Comparison

The difference is clear when you see the results side-by-side. While it doesn’t “re-draw” the image like Hires.fix, the edges become significantly sharper, and fine artifacts are cleaned up.

Before ESRGAN (512×512)
After ESRGAN (2048×2048)

Conclusion: Finding the Right Balance

While Hires.fix might be the mainstream choice these days, the sheer simplicity and speed of ESRGAN are hard to beat.

My personal preference is to use ESRGAN for quick batches to find the best composition, and then run Hires.fix or Ultimate SD Upscale on that specific seed for the final polish. It’s all about finding the balance that works for your project.

If you find your generated images are looking a bit “soft,” definitely give ESRGAN a try in your next workflow.

For more techniques related to image generation, check out the link collection below:

Stable Diffusion Guide: A Collection of Useful Links for Image Generation

blog.otama-playground.com