Stability AI Shares Open-Source Generative AI Audio Model for Creative Sound Design

Synthetic media startup Stability AI has unveiled an open-source generative AI model aimed at musicians and sound engineers. The new Stable Audio Open model can produce short audio samples based on text prompts, including sound effects and production elements.

Stable Audio Open can generate up to 47 seconds of high-quality audio from a text prompt. Stability pitches the model as particularly useful for creating drum beats, instrument riffs, ambient sounds, and foley recordings, all of which are staples of music production and sound design. The model is also customizable: users can fine-tune it on their own audio data. The company suggested, for example, that a drummer could train the model on samples of their own drum recordings to generate new and unique beats. This lets users tailor the output to their specific needs and preferences.

Stable Audio Open has potential applications across several creative fields. In film and television, for instance, sound designers can quickly generate bespoke sound effects tailored to specific scenes. Musicians and producers can use the tool to experiment with new sounds and incorporate unique audio elements into their compositions. In gaming, developers can create dynamic audio effects that make the experience more immersive for players.

“This release marks a key milestone as we further open portions of our generative audio capabilities to empower sound designers, musicians and creative communities,” Stability AI wrote in a blog post. “We encourage sound designers, musicians, developers and audio enthusiasts to download the model, explore its capabilities and provide feedback. While an exciting step forward, this is still just the beginning for open and responsible audio generation capabilities. We look forward to continuing research and prioritizing development hand-in-hand with creative communities.”

Stability AI trained Stable Audio Open on audio data from Freesound and the Free Music Archive, an approach intended to produce an open audio model that respects the rights of the original creators. The Stable Audio Open model weights are available on Hugging Face.

By comparison, Stability AI's commercial product, Stable Audio, can produce high-quality, coherent musical tracks up to three minutes long and includes other features such as audio-to-audio generation. Stable Audio Open is tailored to shorter audio clips and sound effects. While it can generate brief musical segments, it is not optimized for full songs, melodies, or vocals. This distinction keeps Stable Audio Open focused on sound design rather than complete musical compositions.

You can hear two samples below. The first clip was made from the prompt "Blackbird song, summer, dusk in the forest."

The model generated the second audio clip from the prompt "Rock beat played in a treated studio, session drumming on an acoustic kit."

