Stable Audio Open: Free Model Released for Local Execution

Stable Audio Open is a variant of the Stable Audio model trained exclusively on royalty-free audio sources, specifically to address copyright concerns around music generation.

With its free public release, I decided to set up and run the official demo locally. Here is how to do it.

Requirements

PyTorch 2.0 or later
A GPU with CUDA support is recommended (it is quite slow on CPU)
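
To check both requirements before going further, a quick script along these lines might help. It is only a sketch: the helper name version_ge is my own, it relies on GNU sort with -V support, and it assumes python3 is the interpreter you plan to use for the demo.

```shell
# Succeeds when version $1 is greater than or equal to version $2 (uses GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Ask PyTorch for its version and whether it can see a CUDA device.
# Prints a hint instead of failing if PyTorch is not installed yet.
if torch_info=$(python3 -c 'import torch; print(torch.__version__, torch.cuda.is_available())' 2>/dev/null); then
    torch_version=${torch_info%% *}     # first field, e.g. a version with a +cu suffix
    cuda_available=${torch_info##* }    # second field: True or False
    if version_ge "${torch_version%%+*}" "2.0"; then
        echo "PyTorch $torch_version satisfies the 2.0 requirement"
    else
        echo "PyTorch $torch_version is older than 2.0"
    fi
    echo "CUDA available: $cuda_available"
else
    echo "PyTorch is not installed in this Python environment"
fi
```

If CUDA shows up as False, the demo should still start, but generation will fall back to the slow CPU path mentioned above.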

Execution Method

Step 1: HuggingFace Setup

Request access to Stable-Audio-Open on HuggingFace

You need to be logged in to accept the terms and request access.

stabilityai/stable-audio-open-1.0 · Hugging Face

Generate an Access Token

Once logged in, generate a token on the Access Tokens page of your Hugging Face account settings. You’ll need this to authenticate with the HuggingFace CLI.

Step 2: Prepare Environment and Run the Demo

I used Python 3.10.14.
If you don’t want to clutter your system-wide Python installation, run everything inside a venv.
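
A minimal sketch of the venv setup, assuming python3 is on your PATH and the venv module is installed (on Debian or Ubuntu it may need the python3-venv package). The directory name .venv is an arbitrary choice.

```shell
# Create an isolated environment in ./.venv
python3 -m venv .venv

# Activate it for the current shell session
. .venv/bin/activate

# Confirm that python now points into the venv
command -v python
```

Run the install commands in the same shell session so that pip installs into the venv rather than into the system Python.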

# Install dependencies (alternatively: pip install . inside the repo)
pip install stable-audio-tools

# Additional packages I needed in my environment
sudo apt install libsndfile1
sudo apt install nvidia-cuda-toolkit
pip install flash-attn

# Authenticate with HuggingFace
huggingface-cli login

# Clone and run the demo
git clone https://github.com/Stability-AI/stable-audio-tools.git
python ./stable-audio-tools/run_gradio.py --pretrained-name stabilityai/stable-audio-open-1.0

Step 3: Access the Demo

Once the demo starts successfully, a URL will be displayed. Open it in your browser.

Bonus: Disabling the Public URL

By default, Gradio creates a public URL that is accessible from anywhere on the internet.

If you don’t want to expose the URL publicly, change the share parameter in interface.launch (line 18 of run_gradio.py) to False.

Alternatively, you can keep the public URL and pass the --username and --password options to run_gradio.py, which protects the demo with a login prompt.
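
If you would rather not edit the file by hand, the share change could be scripted roughly like this. It assumes the call is literally written as share=True and that the repository was cloned into the current directory as in Step 2. The function name disable_gradio_share is made up for this example.

```shell
# Rewrite share=True to share=False in the given file.
# The original is kept next to it with a .bak suffix.
disable_gradio_share() {
    sed -i.bak 's/share=True/share=False/' "$1"
}

# Only touch the file if the clone from Step 2 is present.
if [ -f ./stable-audio-tools/run_gradio.py ]; then
    disable_gradio_share ./stable-audio-tools/run_gradio.py
    # Show the edited line so the change can be verified by eye
    grep -n 'share=' ./stable-audio-tools/run_gradio.py
fi
```

After this change the demo should only be reachable through the local URL. To undo it, restore run_gradio.py.bak over run_gradio.py.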

Generation Result

positive prompt:

Trance, Progressive, Rock, EDM

negative prompt:

harsh, loud, chaotic, aggressive, dissonant, jarring, abrupt, noisy, overpowering, unsettling, atonal, disruptive

Conclusion

I’m not very knowledgeable about music, but as an amateur I think the output roughly matched what I had in mind.

The stable-audio-tools repository also includes training scripts, so if you want to generate melodies similar to a specific song, that might be worth trying. That would most likely mean fine-tuning the model, though, so expect to need substantial GPU memory and training data.

Personally I was happy to get it running successfully, so I’ll leave it here for now.
