Meta AI Releases Self-supervised Speech Recognition Model Training Tools

Published by: Insights Desk | Released: May 23, 2023 | Source: DemandTalk

Highlights:

Wav2vec 2.0 is a self-supervised learning algorithm that lets computers pick up new skills without depending on labeled training data.
In a direct comparison with OpenAI LP’s Whisper speech recognition model, Meta’s researchers found that models trained on MMS data had roughly half the word error rate.

The artificial intelligence research group at Meta Platforms Inc. announced that it has open-sourced a new project called Massively Multilingual Speech (MMS), which seeks to address the difficulties of developing accurate and reliable speech recognition models.

AI models that can recognize and respond to human speech have great potential, particularly for people who rely on voice access to get information. However, developing high-quality models typically requires a vast amount of data, including thousands of hours of audio along with transcriptions of the spoken words. For many languages, particularly the less widely spoken ones, that data simply does not exist.

Meta’s MMS project addresses this shortage by combining a self-supervised learning algorithm named wav2vec 2.0 with a new dataset that offers labeled data for over 1,100 languages and unlabeled data for almost 4,000 languages.

To overcome the lack of data for some languages, Meta’s researchers turned to the Bible, which, unlike most other books, has already been translated into thousands of languages. Its translations are frequently studied in text-based language translation research, and for many of them, audio recordings of people reading the text are also freely accessible.

Meta’s researchers added, “As part of this project, we created a dataset of readings of the New Testament in over 1,100 languages, which provided on average 32 hours of data per language.”

Thirty-two hours of data per language is not enough to train a traditional supervised speech recognition model, which is why wav2vec 2.0 was employed. As a self-supervised learning algorithm, wav2vec 2.0 learns representations of speech from unlabeled audio rather than relying on labeled training data.

The approach makes it possible to train speech recognition models on a much smaller amount of labeled data. The MMS project used approximately 500,000 hours of speech data in over 1,400 languages to train multiple self-supervised models, which were then fine-tuned for a particular speech task, such as multilingual speech recognition or language identification.

According to Meta, the resulting models outperformed other speech recognition models on standard benchmarks such as FLEURS.

Meta’s researchers explained, “We trained multilingual speech recognition models on over 1,100 languages using a 1B parameter wav2vec 2.0 model. As the number of languages increases, performance does decrease, but only very slightly: Moving from 61 to 1,107 languages increases the character error rate by only about 0.4% but increases the language coverage by over 17 times.”

According to Meta’s researchers, models trained on MMS data achieved roughly half the word error rate of OpenAI LP’s Whisper speech recognition model. The researchers said, “This demonstrates that our model can perform very well compared with the best current speech models.”
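For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how word error rate and character error rate are commonly computed: the edit distance between a reference transcript and a model's output, divided by the length of the reference. This is not Meta's evaluation code. The function names are hypothetical, and the sketch assumes plain whitespace tokenization with no text normalization, which published evaluations usually apply.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # reference token missing from the output (deletion)
                curr[j - 1] + 1,     # extra token in the output (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # Made-up sentences for illustration only: one substituted word and
    # one missing word, so 2 errors over 6 reference words.
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
    print(f"CER: {char_error_rate(reference, hypothesis):.3f}")
```

Under this definition, halving the word error rate means the model makes about half as many word-level mistakes per reference word on the same test set.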

So that other members of the AI research community can build on this work, Meta said it is now sharing its MMS dataset and the tools used to develop and train its models. Meta’s goals for MMS include extending support to more languages and improving its handling of dialects, which remain a significant challenge for current speech technologies.

The researchers added, “Our goal is to make it easier for people to access information and to use devices in their preferred language. We also envision a future where a single model can solve several speech tasks for all languages. While we trained separate models for speech recognition, speech synthesis, and language identification, we believe that in the future, a single model will be able to accomplish all these tasks and more, leading to better overall performance.”
