Stable Video Diffusion: High-Precision AI Video Generation

ChatGPT’s video generation is disappointing: for the most part it just adds narration to a still image. So I looked into other ways to generate actual “footage”. There are several options, and this time I will try running “Stable Video Diffusion” locally.

Overview of Stable Video Diffusion

Know your opponent before getting to work. The official description is linked below.

stabilityai/stable-video-diffusion-img2vid-xt · Hugging Face

Stable Video Diffusion is a latent diffusion model developed by Stability AI that generates a short video from a single image (image-to-video). It can generate 25-frame videos at a resolution of 576x1024.
According to the user surveys cited in the official description, its image-to-video output was rated higher for video quality than alternatives such as GEN-2 and PikaLabs.
The model has limitations: it can only generate short videos (less than 4 seconds), motion may be limited, and people may not be rendered accurately.

Try the Official Demo

Prerequisites

Hugging Face account
- Hugging Face (huggingface.co)
Hugging Face login token
- Generate one at the link below
- Hugging Face (huggingface.co)
Access rights to the repository below
- Agree to the terms and get access rights at the link below
- stabilityai/stable-video-diffusion-img2vid-xt-1-1 · Hugging Face (huggingface.co)
NVIDIA GPU (Probably required for execution)
CUDA (Might be required)
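
A quick way to check the last two prerequisites is to see whether the relevant tools are on the PATH. This is a minimal sketch, not an official check: the helper name check_tool is made up for this example, and it only looks for the commands, not for working drivers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# docker is needed for the steps below; nvidia-smi ships with the NVIDIA driver.
check_tool docker || true
check_tool nvidia-smi || true

# If both are found, the sample workload from NVIDIA's Container Toolkit docs
# checks that containers can see the GPU (uncomment to run):
# docker run --rm --gpus all ubuntu nvidia-smi
```

If nvidia-smi is reported missing, the GPU driver is probably not installed on the host, and the container launch later on is unlikely to work.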

Procedure

I don’t want to pollute my local environment, so I will do everything in Docker.

Create the Docker image
- Set the demo to run automatically when the container starts
Launch the container (run the demo)

1. Create docker image

First, create a Dockerfile.

 FROM python:3.10.14-bookworm

 # huggingface login token (will error if not specified at build)
 ARG token=""

 # clone repo
 WORKDIR /home
 RUN git clone https://github.com/Stability-AI/generative-models.git .

 # install requirements
 RUN pip install --no-cache-dir -r ./requirements/pt2.txt

 # Install OpenCV's system libraries via apt to work around a libGL.so error.
 RUN apt-get -y update && apt-get -y upgrade && apt-get -y install libopencv-dev

 # Register Hugging Face credentials (token is mandatory at build).
 # Note: the token is saved inside the image, so do not share or push this image.
 RUN huggingface-cli login --token "$token"

 # Clear cache
 RUN apt-get autoremove -y &&\
   apt-get clean &&\
   rm -rf /usr/local/src/*

 # Specify command at startup
 CMD ["python", "-u", "-m",  "scripts.demo.gradio_app"]

Next, build the image. It takes quite a while, so have some tea while waiting.

# Build the image and name it svd-image
# Run this in the directory that contains the Dockerfile created above
# Replace {HuggingFace_Login_Token} with your own token
docker image build -t svd-image --build-arg token={HuggingFace_Login_Token} .
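
Because the build is slow and the token is mandatory, it may be worth checking that the token is set before starting. The following is a small sketch under assumptions: HF_TOKEN is an environment variable name chosen for this example, and require_token is a made-up helper, not part of Docker or the Hugging Face CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail early if the token argument is empty.
require_token() {
  if [ -z "${1:-}" ]; then
    echo "token is empty; set HF_TOKEN before building" >&2
    return 1
  fi
  echo "token looks set"
}

# Usage (uncomment to run; HF_TOKEN is an assumed variable name):
# require_token "${HF_TOKEN:-}" && \
#   docker image build -t svd-image --build-arg token="${HF_TOKEN:-}" .
```

This only checks that the value is non-empty. It does not verify that the token is valid or that access to the repository has been granted.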

2. Launch container

Passing --gpus all gives the container access to the host’s GPU resources.
Startup takes a long time, and for some reason very little is logged, which is unnerving, but please wait patiently.
Once startup finishes, open the displayed public URL in a browser to see the demo application.

# Execute specifying the image created earlier
docker run --rm -i --gpus all svd-image

ENJOY!

In my case, all that effort went down the drain because of an out-of-memory error… Apparently 8 GB of VRAM plus 8 GB of RAM assigned to Docker is not enough.
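
To see how much memory is actually available before waiting through the startup, something like the following can help. It is a sketch that assumes nvidia-smi and docker are installed on the host. report_memory is a name made up here, and it only prints the numbers without judging whether they are enough.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print GPU memory and the memory visible to Docker.
report_memory() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    # Total and free VRAM per GPU, as CSV.
    nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv \
      || echo "nvidia-smi failed"
  else
    echo "nvidia-smi not found"
  fi
  if command -v docker >/dev/null 2>&1; then
    # Total memory available to the Docker daemon, in bytes.
    docker info --format '{{.MemTotal}}' || echo "docker info failed"
  else
    echo "docker not found"
  fi
}

report_memory
```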

[Update: 2024/05/20] I later got Stable Video Diffusion running in my environment using the method in the article linked below.

[Stable Video Diffusion] Let’s Play with Video Generation AI Using ComfyUI

blog.otama-playground.com
