Highlights:

  • Cohere offers a larger version of Aya with 35 billion parameters for developers with more advanced requirements.
  • Aya 23 was trained by Cohere on a multilingual training dataset, also known as Aya, which was open-sourced earlier this year.

Cohere Inc. recently introduced Aya 23, a new family of open-source large language models capable of understanding 23 languages.

Toronto-based Cohere, an OpenAI competitor, is backed by more than USD 400 million in funding from Nvidia Corp., Oracle Corp., and other investors. It provides a set of large language models (LLMs) optimized for the enterprise market. Cohere also offers Embed, a neural network designed to convert data into mathematical structures that language models can more easily understand.

The Aya 23 series comprises two models at launch. The first features 8 billion parameters and is designed for use cases that require a balance between response quality and performance. Cohere also offers a larger version of Aya with 35 billion parameters for developers with more advanced requirements.

The latter edition, Aya-23-35B, is based on an LLM called Command R that Cohere introduced last March. Command R served as Cohere’s flagship AI model until this past April, when the company debuted a more advanced algorithm. It supports prompts of up to 128,000 tokens, features a built-in retrieval-augmented generation (RAG) capability, and can automatically carry out tasks in external applications.
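In general terms, retrieval-augmented generation means fetching relevant documents at query time and injecting them into the prompt so the model answers from that material rather than from memory alone. The sketch below illustrates that flow with toy stand-ins for the retriever and the generator; it is not Cohere’s SDK or Command R’s actual implementation.

```python
# Illustrative retrieval-augmented generation (RAG) flow with toy stand-ins
# for the retriever and the generator -- not Cohere's SDK or Command R itself.

DOCUMENTS = [
    "Aya 23 is a family of multilingual open-source LLMs from Cohere.",
    "Command R supports prompts of up to 128,000 tokens.",
    "Grouped query attention reduces the memory footprint of inference.",
]

def retrieve_documents(query: str, top_k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query.
    A production system would use an embedding model and a vector store."""
    q_words = set(query.lower().split())
    ranked = sorted(DOCUMENTS, key=lambda d: -len(q_words & set(d.lower().split())))
    return ranked[:top_k]

def generate(prompt: str) -> str:
    """Toy generator: a real system would call the LLM here."""
    return f"[model answer grounded in]\n{prompt}"

def answer_with_rag(question: str) -> str:
    context = "\n".join(retrieve_documents(question))      # inject retrieved text into the prompt
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)

print(answer_with_rag("How long can Command R prompts be?"))
```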

Under the hood, Aya-23-35B relies on a widely used LLM design known as the decoder-only Transformer architecture. Models built on this architecture determine the meaning of each word in a user prompt by analyzing its context, specifically the text that precedes it. This approach can produce more accurate output than many earlier neural network designs.
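To make that concrete, the toy snippet below shows the causal-masking idea at the heart of decoder-only Transformers: each token can attend only to itself and the tokens before it. The names and shapes are illustrative and are not drawn from Cohere’s implementation.

```python
import numpy as np

def causal_attention(q, k, v):
    """Toy single-head attention with a causal mask: each position may
    only attend to itself and earlier positions, mirroring how a
    decoder-only model reads a prompt left to right."""
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)                      # (seq_len, seq_len) similarity scores
    mask = np.triu(np.ones((seq_len, seq_len)), k=1)   # 1s above the diagonal mark future tokens
    scores = np.where(mask == 1, -np.inf, scores)      # block attention to future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v                                 # context-weighted mixture of values

# Example: 4 tokens with 8-dimensional toy embeddings
rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(4, 8))
print(causal_attention(q, k, v).shape)  # (4, 8)
```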

According to Cohere, Aya-23-35B enhances several aspects of the standard decoder-only Transformer architecture. The company’s enhancements have made the model more proficient in understanding user prompts.

The mechanism that enables an LLM to discern the meaning of a word from its context isn’t typically built as a single software module. Instead, it comprises multiple modules, each interpreting the text in a slightly different way. Aya-23-35B implements these components using a technique known as grouped query attention, which reduces their RAM usage and thereby speeds up inference.
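The minimal sketch below illustrates the general grouped-query-attention technique: several query heads share each key/value head, so far fewer key and value tensors need to be cached during inference. The head counts and dimensions are arbitrary examples, not Aya-23-35B’s actual configuration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def grouped_query_attention(q, k, v):
    """Toy grouped-query attention: many query heads share a small number
    of key/value heads, shrinking the KV cache held in memory at inference."""
    n_q_heads, seq_len, d_head = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads                 # query heads per shared KV head
    outputs = []
    for h in range(n_q_heads):
        kv = h // group                             # pick the shared KV head for this query head
        scores = q[h] @ k[kv].T / np.sqrt(d_head)
        outputs.append(softmax(scores) @ v[kv])
    return np.stack(outputs)                        # (n_q_heads, seq_len, d_head)

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4, 16))   # 8 query heads, 4 tokens, 16-dim heads
k = rng.normal(size=(2, 4, 16))   # only 2 key heads need to be cached
v = rng.normal(size=(2, 4, 16))   # only 2 value heads need to be cached
print(grouped_query_attention(q, k, v).shape)  # (8, 4, 16)
```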

Aya-23-35B also incorporates a technology known as rotary positional embeddings. An LLM considers both the meaning of words and their position within a sentence when interpreting text. Rotary positional embeddings allow LLMs to process word-location information more effectively, improving the quality of their output.
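The snippet below sketches the general rotary-embedding idea: each pair of dimensions in a query or key vector is rotated by an angle proportional to the token’s position, so relative position is encoded directly in the attention dot products. It is a generic illustration, not Cohere’s implementation.

```python
import numpy as np

def rotary_embed(x, base=10000.0):
    """Toy rotary positional embedding: rotate each pair of dimensions in a
    query/key vector by an angle proportional to the token's position."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)      # one frequency per dimension pair
    angles = np.outer(np.arange(seq_len), freqs)   # (seq_len, half): position * frequency
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,    # standard 2-D rotation applied pairwise
                           x1 * sin + x2 * cos], axis=-1)

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 16))      # 4 tokens, 16-dim query vectors
print(rotary_embed(q).shape)      # (4, 16) -- same shape, position now baked in
```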

Cohere trained Aya 23 on a multilingual dataset, likewise named Aya, that the company open-sourced earlier this year. The dataset comprises 513 million prompts and corresponding answers for large language models across 114 languages. It was developed through an open-source initiative with contributions from approximately 3,000 collaborators.

Additionally, as part of the initiative, Cohere released Aya-101, a large language model capable of understanding 101 languages. The company says its new Aya-23-35B model outperformed that earlier algorithm in a series of internal evaluations and proved more capable than other open-source LLMs at multilingual text-processing tasks.