Highlights:

  • Stability AI’s popularity lies in its image generation models, namely Stable Diffusion and Stable Doodle, designed to interpret text instructions and user-provided doodles as inputs.
  • Stability AI trained U-Net on over 800,000 audio files from AudioSparx, a stock music provider.

Stability AI Ltd. unveiled Stable Audio, a software platform employing a latent diffusion model to produce audio based on user text prompts.

The platform can produce audio clips lasting up to 95 seconds, spanning diverse music genres. Stability AI also notes that Stable Audio can be utilized for crafting various audio forms, including sound effects.

Stability AI, headquartered in London, has secured over USD 100 million in venture funding. The company’s popularity lies in its image generation models, namely Stable Diffusion and Stable Doodle, designed to interpret text instructions and user-provided doodles as inputs. Stability AI has also introduced an open-source language model capable of generating code and text.

Typically, artificial intelligence audio generators, such as Stability AI’s newly launched Stable Audio platform, employ what are known as diffusion models. These neural networks are trained on datasets to which Gaussian noise has been deliberately added. By learning to reverse that noising process, a diffusion model learns to analyze the audio files in its training dataset and autonomously produce similar files.
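The noise-based training setup can be sketched in a few lines. This is a toy illustration of the forward (noising) step only, not Stability AI's implementation; the mixing formula and noise levels are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(audio: np.ndarray, noise_level: float) -> np.ndarray:
    """Mix a clean signal with Gaussian noise; noise_level runs from 0 to 1."""
    noise = rng.standard_normal(audio.shape)
    return np.sqrt(1.0 - noise_level) * audio + np.sqrt(noise_level) * noise

# A one-second 440 Hz tone stands in for a training audio file.
clean = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 44100))

slightly_noisy = add_noise(clean, 0.1)  # early noising step
mostly_noise = add_noise(clean, 0.9)    # late noising step
```

During training, the model sees corrupted samples like these and learns to predict and remove the added noise; at generation time it runs that denoising process in reverse, starting from pure noise.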

Stability AI contends that AI systems of this nature possess two primary limitations.

According to the company, diffusion models can typically generate audio snippets of only a fixed length. For instance, an AI trained on 30-second sound snippets cannot generate files 40 or 20 seconds in duration. Moreover, clips produced by such models frequently begin in the middle or toward the end of a musical phrase, negatively impacting their overall quality.

Stable Audio employs a specialized variant of diffusion models called a latent diffusion model to surmount these constraints. What distinguishes these models from the conventional ones is that they are invariably employed in tandem with a second neural network known as an autoencoder.

An autoencoder is an AI system that takes a data input and removes extraneous information. For instance, such a model could take in an audio file that includes background noise and filter that noise out. The autoencoder stores the retained information in a compressed mathematical representation called a latent space.
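The compression-and-reconstruction idea can be illustrated with a toy stand-in. A real autoencoder is a learned neural network; here, simple block averaging plays the role of the encoder and sample repetition the role of the decoder, which is enough to show how a smaller latent representation can shed background noise.

```python
import numpy as np

def encode(signal: np.ndarray, factor: int = 8) -> np.ndarray:
    """Compress the signal into a smaller 'latent' by averaging blocks of samples."""
    trimmed = signal[: len(signal) // factor * factor]
    return trimmed.reshape(-1, factor).mean(axis=1)

def decode(latent: np.ndarray, factor: int = 8) -> np.ndarray:
    """Reconstruct an approximation of the original signal from the latent."""
    return np.repeat(latent, factor)

rng = np.random.default_rng(1)
tone = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 4096))  # slowly varying signal
noisy = tone + 0.3 * rng.standard_normal(tone.size)      # add background noise

latent = encode(noisy)     # representation 8x smaller than the input
restored = decode(latent)  # much of the noise is averaged away
```

A latent diffusion model operates on representations like `latent` rather than on the raw samples, which is what makes the refined training data cheaper to model.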

A standard diffusion model is developed using raw training datasets. In contrast, a latent diffusion model is constructed using an enhanced version of the same training datasets, from which an autoencoder has eliminated superfluous information. Due to the improved quality of the refined datasets, the latent diffusion model trained on them can produce higher-quality output.

Stability AI’s innovative Stable Audio platform comprises not just one but three neural networks. At its core is U-Net, a latent diffusion model boasting an impressive 907 million parameters. It represents an improved iteration of a pre-existing neural network, Moûsai, introduced earlier this year.

Stability AI trained U-Net on over 800,000 audio files from AudioSparx, a stock music provider. The company reports that these files contain approximately 19,500 hours of audio. Stability AI incorporated text-based metadata or contextual information to optimize the AI training process.

Stable Audio incorporates U-Net and two additional neural networks. One is an autoencoder, while the other translates user prompts describing the audio to be generated into a format U-Net can comprehend.
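The division of labor among the three networks could be wired together roughly as follows. Every function body here is a hypothetical stand-in, not Stability AI's code; only the overall flow (text encoder, conditioned latent denoising, audio decoding) reflects the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, AUDIO_LEN = 64, 1024  # toy sizes, not the real model's

def encode_text(prompt: str) -> np.ndarray:
    """Stand-in text encoder: hash prompt bytes into a fixed-size vector."""
    vec = np.zeros(LATENT_DIM)
    for i, byte in enumerate(prompt.encode()):
        vec[i % LATENT_DIM] += byte
    return vec / max(np.linalg.norm(vec), 1e-8)

def denoise_step(latent: np.ndarray, cond: np.ndarray) -> np.ndarray:
    """Stand-in for one U-Net denoising step: nudge the latent toward the condition."""
    return 0.9 * latent + 0.1 * cond

def decode_audio(latent: np.ndarray) -> np.ndarray:
    """Stand-in autoencoder decoder: expand the latent into an audio-length signal."""
    return np.interp(np.linspace(0, LATENT_DIM - 1, AUDIO_LEN),
                     np.arange(LATENT_DIM), latent)

def generate(prompt: str, steps: int = 50) -> np.ndarray:
    cond = encode_text(prompt)                 # network 1: text encoder
    latent = rng.standard_normal(LATENT_DIM)   # start from pure noise
    for _ in range(steps):
        latent = denoise_step(latent, cond)    # network 2: latent diffusion
    return decode_audio(latent)                # network 3: autoencoder decoder

clip = generate("calm ambient pad")
```

The key point is that the diffusion loop runs entirely in the compact latent space, and the audio waveform only appears at the final decoding step.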

When running on an A100 graphics processing unit, the platform can produce 95 seconds of audio with a sample rate of 44.1 kHz in less than one second. The A100 was Nvidia Corporation’s flagship data center GPU until the H100 replaced it last year.

Stability AI intends to improve both its audio generation models and the dataset used to train them. The company also plans to release open-source models based on Stable Audio.