Cohere for AI Introduces Aya, a Multilingual Open-source AI Featuring 101 Languages

Published by: Insights Desk | Released: Feb 14, 2024 | Source: DemandTalk

Highlights:

Along with Aya, Cohere is launching the most extensive multilingual instruction dataset to date, comprising 513 million data points across 114 diverse languages.
The Aya model is derived from the Aya Project, a massive project that was started in January 2023 with the participation of over 3,000 academics from 119 different countries.

Cohere for AI, the nonprofit research lab run by artificial intelligence startup Cohere Inc., has unveiled Aya, an open-source, "massively multilingual" large language model that can operate in 101 languages.

According to the company, by covering more than 100 languages, Aya more than doubles the number of languages supported by existing open-source models.

"Aya helps researchers unlock the powerful potential of LLMs for dozens of languages and cultures largely ignored by most advanced models on the market today," the Cohere for AI team said.

Alongside Aya, the lab is releasing the largest multilingual instruction dataset to date, comprising 513 million data points across 114 languages. Researchers will be able to use this dataset to train their own models. To help AI technology serve broader audiences, the dataset includes rare annotations from speakers around the world, including speakers of underserved languages.

The Aya model grew out of the Aya Project, a large-scale effort launched in January 2023 with the participation of more than 3,000 researchers from 119 countries. The project's goal is to build a multilingual generative AI model that draws on contributions from people worldwide. Although many models focus on English, only about 5% of the world's population speaks it at home, which means that current AI technologies underserve many other languages.

“As LLMs, and AI generally, have changed the global technological landscape, many communities across the world have been left unsupported due to the language limitations of existing models. This gap hinders the applicability and usefulness of generative AI for a global audience, and it has the potential to further widen existing disparities that already exist from previous waves of technological development,” said the Cohere for AI team.

The public dataset also includes 204,000 rare, human-curated annotations in 67 languages, covering a wide range of linguistic tasks. Annotations give AI models context for language data, which improves learning outcomes and enables more accurate classification and comprehension. As a result, researchers gain access to a high-quality dataset they can use to build reliable language models, including models for linguistic analysis and language preservation.

According to the language research center Ethnologue, more than 7,000 languages are spoken worldwide today. About 40% of them are endangered, many with fewer than 1,000 speakers. Meanwhile, just 23 languages, including English, account for more than half of the world's population.

Initiatives like Aya, which add new languages to a massively multilingual dataset, can benefit research and development. They make AI more inclusive and accessible to more communities and allow researchers working in those languages to use AI technology.

The dataset also extends coverage to more than 50 previously underrepresented languages, such as Somali and Uzbek, that are rarely included in proprietary models. While widely spoken languages like English, French, and Russian are well covered by both commercial and open-source models, the Aya developers made a concerted effort to include a large number of underrepresented languages in their dataset.

According to the researchers, Aya outperforms existing open-source massively multilingual models such as BigScience's mT0 and BLOOMZ on standard benchmarks. They also reported that Aya consistently outperformed other "leading open-source models" in human evaluations, scoring 75–90%, and achieved simulated win rates of 80–90%.
