Cerebras Systems Announces Seven New GPT Models Trained on CS-2 Wafer-scale Systems Available

Published by: Insights Desk | Released: Apr 03, 2023 | Source: DemandTalk

Highlights:

This is the first time a company has trained LLMs with up to 13 billion parameters on non-GPU-based AI systems. The models, weights, and training recipes are being shared under the standard Apache 2.0 license.
As a first for AI hardware firms, Cerebras researchers trained a set of seven GPT models with 111 million, 256 million, 590 million, 1.3 billion, 2.7 billion, 6.7 billion, and 13 billion parameters on the Andromeda supercomputer.

Cerebras Systems Inc., a manufacturer of artificial intelligence chips, recently announced that it has trained seven GPT-based large language models for generative AI and made them available to the wider research community.

The new LLMs are noteworthy because they are the first to have been trained on the CS-2 systems in the Cerebras Andromeda AI supercluster, which are powered by the Cerebras WSE-2 chip, built specifically to run AI software. In other words, they are among the first LLMs to be trained without GPU-based systems. According to Cerebras, the models, the weights, and the training recipes will all be shared under a standard Apache 2.0 license.

Cerebras, based in Sunnyvale, California, has raised more than USD 720 million. The company sells the WSE-2 processor, which is built specifically to run AI software. The WSE-2 powers the Cerebras Andromeda supercomputer, which has more than 13.5 million AI-optimized cores and is designed to run AI applications.

According to Cerebras, the rise of generative AI, led by OpenAI LP’s ChatGPT, has prompted a race among AI hardware manufacturers to develop more powerful, specialized chips for the task. Several companies have offered alternatives to Nvidia Corporation’s GPUs, but none has demonstrated the capacity to train large-scale models and open-source the results under permissive licenses.

Instead, according to Cerebras, competitive pressures have made it less likely that LLMs will be released publicly, so most of them remain unavailable.

With this release, Cerebras hopes to change that. It is open-sourcing seven GPT models, with 111 million, 256 million, 590 million, 1.3 billion, 2.7 billion, 6.7 billion, and 13 billion parameters, and making them available on GitHub and Hugging Face. Cerebras said that the speed of the CS-2 systems in Andromeda, along with a unique weight-streaming architecture, helped cut the training time for these models to just a few weeks, as opposed to the several months it would typically take.

According to Sean Lie, co-founder and chief software architect of Cerebras, only a few firms can train genuinely large-scale models by themselves. He said, “Releasing seven fully trained GPT models into the open-source community shows just how efficient clusters of Cerebras CS-2 systems can be and how they can rapidly solve the largest scale AI problems – problems that typically require hundreds or thousands of GPUs.”

According to the company, this release is the first time a complete set of GPT models trained with cutting-edge efficiency techniques has been made available to the general public. Cerebras said that, compared to other LLMs already on the market, the models require less energy, less time, and less money to train.

Because they are open source, the Cerebras LLMs can be used for both research and commercial purposes, according to the company. Their trained weights provide a highly accurate pre-trained model that can be fine-tuned for different tasks with modest amounts of custom data, enabling anyone to create a powerful generative AI application with minimal effort.
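As a rough illustration of what using one of the released checkpoints might look like, the sketch below loads a model from Hugging Face with the `transformers` library and generates a short continuation. It assumes the checkpoints are published under repository names of the form `cerebras/Cerebras-GPT-<size>`; that naming pattern, and the helper names `repo_id` and `generate`, are assumptions for illustration and are not taken from the announcement, so the names should be checked on the hub before use.

```python
# Parameter-count labels from the announcement.
MODEL_SIZES = ["111M", "256M", "590M", "1.3B", "2.7B", "6.7B", "13B"]


def repo_id(size: str) -> str:
    """Build a Hugging Face repo ID (assumed naming pattern, verify on the hub)."""
    if size not in MODEL_SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    return f"cerebras/Cerebras-GPT-{size}"


def generate(size: str, prompt: str, max_new_tokens: int = 30) -> str:
    """Download a checkpoint and return the prompt plus a generated continuation."""
    # Imported lazily so repo_id() works without transformers/torch installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id(size))
    model = AutoModelForCausalLM.from_pretrained(repo_id(size))
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("111M", "Generative AI is ")` would download the smallest checkpoint on first use; the larger models need correspondingly more memory and disk space.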

The release also shows the value of a “simple, data-parallel only method to training,” as described by Cerebras. Traditional LLM training on GPUs requires a complicated mixture of pipeline, model, and data parallelism. By contrast, the Cerebras weight-streaming architecture shows that it is possible to scale to very large models with a more straightforward, data-parallel-only approach that requires no code changes.
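To make the idea of data parallelism concrete, here is a toy sketch in plain Python. It is not Cerebras's weight-streaming implementation; it only illustrates the general technique: every worker holds the same weights, computes a gradient on its own shard of the batch, and the per-worker gradients are averaged before the update. The one-parameter model, the function names, and the learning rate are all made up for the example.

```python
import random


def gradient(w, batch):
    """Mean-squared-error gradient of the 1-D model y = w * x over a batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_step(w, batch, num_workers, lr=0.5):
    """One gradient-descent step with the batch sharded evenly across workers."""
    shard_size = len(batch) // num_workers
    assert shard_size * num_workers == len(batch), "batch must split evenly"
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]
    # Each worker uses the same weight but sees only its own shard.
    local_grads = [gradient(w, shard) for shard in shards]
    # Stand-in for an all-reduce: average the per-worker gradients.
    avg_grad = sum(local_grads) / num_workers
    return w - lr * avg_grad


random.seed(0)
# Synthetic data generated by the "true" weight 3.0.
data = [(x, 3.0 * x) for x in (random.uniform(-1, 1) for _ in range(64))]

w_single, w_parallel = 0.0, 0.0
for _ in range(200):
    w_single = data_parallel_step(w_single, data, num_workers=1)
    w_parallel = data_parallel_step(w_parallel, data, num_workers=4)
```

Because the shards are equal in size, averaging the four per-shard gradients gives the same result as one gradient over the full batch, so `w_single` and `w_parallel` agree up to floating-point rounding. The training logic does not change with the number of workers, which is the property the data-parallel-only approach relies on.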

According to Cambrian AI analyst Karl Freund, the release demonstrates the capabilities of Cerebras’ CS-2 systems as a top platform for AI training and propels the company into the top echelon of AI practitioners.

Karl Freund said, “There are a handful of companies in the world capable of deploying end-to-end AI training infrastructure and training the largest LLMs to state-of-the-art accuracy. Cerebras must now be counted among them. Moreover, by releasing these models into the open-source community with the permissive Apache 2.0 license, Cerebras shows commitment to ensuring that AI remains an open technology that broadly benefits humanity.”
