Stable Audio Open: Free Model Released for Local Execution

Stable Audio Open is a variant of the Stable Audio model trained on royalty-free sound sources to address copyright concerns.

With the recent free release of Stable Audio Open, I decided to set up and run the official demo locally. Here’s how you can do it too.

Requirements

  • PyTorch 2.0 or later
  • A CUDA-capable GPU, if possible (generation is quite slow on CPU)
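
Before going further, it can help to confirm that the machine meets these two requirements. The following is a minimal sketch, not part of the official demo: the helper names `meets_min_version` and `describe_environment` are my own, and the check only looks at the major and minor version numbers.

```python
import re


def meets_min_version(version: str, minimum: tuple = (2, 0)) -> bool:
    """Return True if a version string such as '2.1.0+cu118' is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


def describe_environment() -> str:
    """Summarize the installed PyTorch version and whether CUDA is usable."""
    try:
        import torch  # imported lazily so the helper above works without it
    except ImportError:
        return "PyTorch is not installed"
    version_ok = meets_min_version(torch.__version__)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return f"torch {torch.__version__} (2.0 or later: {version_ok}), device: {device}"


if __name__ == "__main__":
    print(describe_environment())
```

If the reported device is `cpu`, the demo should still run, just slowly.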

How to Run It

Step 1: Hugging Face Setup

Request access to Stable Audio Open on Hugging Face

You need to be logged in to do this.

stabilityai/stable-audio-open-1.0 · Hugging Face

Generating an Access Token

Once you are logged in, you can generate a token from the page linked below.

Hugging Face – The AI community building the future.

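The token created on that page can also be registered from Python rather than typed into an interactive prompt, which is handy for scripted setups. This is a minimal sketch under a few assumptions: `huggingface_hub` is installed, the token is stored in an environment variable that I have named `MY_HF_TOKEN` purely for illustration, and the helper names are hypothetical.

```python
import os


def read_token(env=None, var_name="MY_HF_TOKEN"):
    """Return the access token stored in `var_name`, stripped of whitespace.

    `MY_HF_TOKEN` is a made-up variable name for this example.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"set {var_name} to your Hugging Face access token")
    return token


def register_token():
    """Register the token with huggingface_hub so gated model downloads work."""
    from huggingface_hub import login  # requires `pip install huggingface_hub`
    login(token=read_token())
```

Calling `register_token()` once should have the same effect as logging in from the command line. Keep the token out of source control either way.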

Step 2: Set Up the Environment and Run the Demo

  • Python: I used 3.10.14
  • If you don’t want to pollute your local environment, run everything inside a venv or similar.
Terminal window
# Install dependencies (running `pip install .` inside the cloned repository also works)
pip install stable-audio-tools
# Additional packages I had to install in my environment
sudo apt install libsndfile1
sudo apt install nvidia-cuda-toolkit
pip install flash-attn
# Register your Hugging Face credentials (paste the access token from Step 1)
huggingface-cli login
# Clone the repository and run the demo
git clone https://github.com/Stability-AI/stable-audio-tools.git
python ./stable-audio-tools/run_gradio.py --pretrained-name stabilityai/stable-audio-open-1.0

Step 3: Open the Demo

Once the demo starts successfully, it prints a URL. Open that URL in your browser.

Demo Screen

Bonus: I Don’t Want a Public URL

By default, the demo generates a public URL that can be reached from anywhere in the world.

If you don’t want to expose the demo on the network, change the share parameter passed to interface.launch in run_gradio.py (line 18) to False.

Alternatively, if you pass a username and password through the options of run_gradio.py, the demo will require a login.
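
To make the two options concrete, here is a small sketch of how the arguments to Gradio's `launch()` could be assembled. `share` and `auth` are real `launch()` parameters; the helper `build_launch_kwargs` is my own invention, and the names `interface` and `args` in the comment are what I understand run_gradio.py to use, so check them against your checkout.

```python
def build_launch_kwargs(share=False, username=None, password=None):
    """Build keyword arguments for Gradio's `launch()`.

    share=False keeps the demo off the public tunnel; supplying both a
    username and a password puts the demo behind a login prompt.
    """
    kwargs = {"share": share}
    if username is not None and password is not None:
        kwargs["auth"] = (username, password)
    return kwargs


# Hypothetical use inside run_gradio.py (names assumed, not verified):
# interface.launch(**build_launch_kwargs(username=args.username,
#                                        password=args.password))
```

With `share=False` the printed URL should only be a local address, which is enough when the browser runs on the same machine.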

Generation Result

Positive prompt

Trance, Progressive, Rock, EDM

Negative prompt

harsh, loud, chaotic, aggressive, dissonant, jarring, abrupt, noisy, overpowering, unsettling, atonal, disruptive
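
The same pair of prompts can also be fed to the model from a script instead of the Gradio screen. The sketch below follows the usage example on the model card as I remember it; the `negative_conditioning` argument, the sampler settings, and the 30-second length are assumptions on my part, and `build_conditioning` and `generate` are helper names I made up. I did not use this script to produce the result above.

```python
def build_conditioning(prompt, seconds_total=30, seconds_start=0):
    """Wrap a text prompt in the conditioning format the model expects."""
    return [{
        "prompt": prompt,
        "seconds_start": seconds_start,
        "seconds_total": seconds_total,
    }]


def generate(positive, negative, out_path="output.wav", steps=100):
    """Generate a clip from a positive and a negative prompt and save it as WAV."""
    # Heavy dependencies are imported here so build_conditioning stays usable
    # without them.
    import torch
    import torchaudio
    from einops import rearrange
    from stable_audio_tools import get_pretrained_model
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
    model = model.to(device)

    audio = generate_diffusion_cond(
        model,
        steps=steps,
        cfg_scale=7,
        conditioning=build_conditioning(positive),
        negative_conditioning=build_conditioning(negative),  # assumed argument
        sample_size=model_config["sample_size"],
        sigma_min=0.3,
        sigma_max=500,
        sampler_type="dpmpp-3m-sde",
        device=device,
    )

    # Merge the batch into one stereo sequence, peak-normalize, convert to int16.
    audio = rearrange(audio, "b d n -> d (b n)").to(torch.float32)
    audio = audio.div(audio.abs().max()).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
    torchaudio.save(out_path, audio, model_config["sample_rate"])
    return out_path
```

For example, `generate("Trance, Progressive, Rock, EDM", "harsh, loud, chaotic")` would write `output.wav` to the current directory, provided the model access and login steps have been completed.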

Conclusion

So, how was it? I’m not very familiar with music, but as an amateur I think the melody came out close to what I had imagined.

The training scripts are also published in stable-audio-tools, so anyone who wants to generate melodies similar to a specific song might want to try them as well. (That would probably mean fine-tuning, so expect to need a lot of memory and training data.)

Personally, I’m satisfied that I got generation working, so I’ll stop here for now.