ChatGPT’s video generation is not very interesting, because it mostly just adds narration to a still image! So I looked into ways of generating proper “footage” without ChatGPT. There are several options, but this time I will try running “Stable Video Diffusion” locally.
Overview of Stable Video Diffusion
First, know your opponent before getting to work. Below is the official explanation.
- Stable Video Diffusion is a Latent Diffusion Model developed by Stability AI that generates short videos from a single image (image-to-video). It can generate 25-frame videos at a resolution of 576x1024.
- According to user surveys, Stable Video Diffusion’s Image-to-Video is rated as having superior video quality compared to alternative models like GEN-2 and PikaLabs.
- The model can only generate short videos (less than 4 seconds), and motion may be limited. People may also not be rendered accurately.
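To put the frame count and the "less than 4 seconds" limit side by side: clip length is simply frames divided by playback rate, so any rate above 6.25 fps keeps 25 frames under 4 seconds. The explanation above does not give a frame rate, so the rates in this rough sketch (7 and 10 fps) are assumptions, and `clip_seconds` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: clip length in seconds = frames / fps.
# The fps values are assumptions; only the frame count (25) comes from the description.
clip_seconds() {
    awk -v frames="$1" -v fps="$2" 'BEGIN { printf "%.2f\n", frames / fps }'
}

for fps in 7 10; do
    echo "25 frames at ${fps} fps: $(clip_seconds 25 "$fps") s"
done
```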
Try the Official Demo
Prerequisites
- HuggingFace Account
- HuggingFace Login Token
- Access rights to the repository below (agree to the terms on its page to be granted access)
  - stabilityai/stable-video-diffusion-img2vid-xt-1-1 · Hugging Face
- NVIDIA GPU (Probably required for execution)
- CUDA (Might be required)
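Since it is not yet clear which of these are strictly required, a quick check before building can save some waiting. The sketch below only looks for the `docker` and `nvidia-smi` commands on the PATH and for a token in an environment variable. `check_tool` and the `HF_TOKEN` variable name are choices made for this example, and the script does not verify access rights to the repository.

```shell
#!/bin/sh
# Report whether a command is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

check_tool docker       # needed to build and run the image
check_tool nvidia-smi   # ships with the NVIDIA driver; a hint that a GPU is usable

# HF_TOKEN is an assumed variable name for holding the login token.
if [ -n "${HF_TOKEN:-}" ]; then
    echo "ok: HF_TOKEN is set"
else
    echo "missing: HF_TOKEN is not set"
fi
```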
Steps
I don’t want to pollute my local environment, so I will use Docker.
- Create a Docker image
- Make the demo run automatically when the container starts
- Launch the container (run the demo)
1. Create Docker image
First, create a Dockerfile.
    FROM python:3.10.14-bookworm
    # Hugging Face login token (the build fails if it is not specified)
    ARG token=""
    # Clone the repo
    WORKDIR /home
    RUN git clone https://github.com/Stability-AI/generative-models.git .
    # Install requirements
    RUN pip install --no-cache-dir -r ./requirements/pt2.txt
    # Reinstall OpenCV via apt because of an error with libGL.so
    RUN apt -y update && apt -y upgrade && apt -y install libopencv-dev
    # Register Hugging Face credentials (the token is mandatory at build time)
    RUN huggingface-cli login --token $token
    # Clear cache
    RUN apt-get autoremove -y &&\
        apt-get clean &&\
        rm -rf /usr/local/src/*
    # Command to run at startup
    CMD ["python", "-u", "-m", "scripts.demo.gradio_app"]
Next, build the image. It takes quite a while, so have some tea while waiting.
    # Build the image with the name svd-image
    # Specify the path where the Dockerfile created earlier is located
    docker image build -t svd-image --build-arg token={HuggingFace_Login_Token} .
2. Launch container
- Specifying --gpus gives the container access to the GPU resources.
- Startup takes a long time, and for some reason very little is logged, which is worrying, but please wait patiently.
- Once startup is complete, open the displayed public URL in a browser to see the demo application.
    # Run, specifying the image created earlier
    docker run --rm -i --gpus all svd-image
ENJOY!
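One thing to be aware of with this setup: running `huggingface-cli login` during the build stores the token inside the image, so anyone who gets the image can read it. A possible alternative, untested here, is to drop that `RUN` line from the Dockerfile and pass the token when the container starts. This assumes the demo downloads the weights through the `huggingface_hub` library and that the installed version reads the `HF_TOKEN` environment variable. `svd_run_command` is a hypothetical helper that only prints the command so it can be checked before running.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a docker command that forwards the
# host's HF_TOKEN variable into the container. "-e HF_TOKEN" with no value
# tells docker to copy the variable from the calling environment.
svd_run_command() {
    image="${1:-svd-image}"
    echo "docker run --rm -i --gpus all -e HF_TOKEN ${image}"
}

svd_run_command svd-image
```

With `HF_TOKEN` exported in the shell, the printed line can be pasted and run as-is.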
In my case, all that effort went down the drain because of an out-of-memory error… It seems that 8 GB of VRAM plus 8 GB of RAM assigned to Docker is not enough.
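Before retrying, it helps to see how much GPU memory is actually available on the host. A minimal sketch, assuming `nvidia-smi` is installed; `free_vram_mib` is a hypothetical helper that subtracts used from total memory (both reported in MiB). How much memory the demo really needs is not measured here.

```shell
#!/bin/sh
# Hypothetical helper: take one CSV line "total, used" (MiB) and print total - used.
free_vram_mib() {
    echo "$1" | awk -F', *' '{ print $1 - $2 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # One output line per GPU, e.g. "8192, 1024"
    nvidia-smi --query-gpu=memory.total,memory.used --format=csv,noheader,nounits |
    while IFS= read -r line; do
        echo "free VRAM: $(free_vram_mib "$line") MiB"
    done
else
    echo "nvidia-smi not found; skipping VRAM check"
fi
```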
[Update: 2024/05/20] I was able to run Stable Video Diffusion in my environment with the method below, so I’m attaching the link to the article.









