Highlights:

  • This is the first time a business has trained LLMs of up to 13 billion parameters on non-GPU-based AI systems. The models, weights, and training recipes are being shared under the standard Apache 2.0 license.
  • In a first for AI hardware firms, Cerebras researchers trained a set of seven GPT models with 111 million, 256 million, 590 million, 1.3 billion, 2.7 billion, 6.7 billion, and 13 billion parameters on the Andromeda supercomputer.

Cerebras Systems Inc., a manufacturer of artificial intelligence chips, recently announced that it has trained seven GPT-based large language models (LLMs) for generative AI and made them available to the wider research community.

The new LLMs are noteworthy because they are the first to have been trained on the CS-2 systems of the Cerebras Andromeda AI supercluster, which are driven by the Cerebras WSE-2 chip, a processor built exclusively to run AI software. In other words, they are some of the first LLMs trained entirely without graphics processing units. According to Cerebras, not only the models but also the weights and training procedures used to build them will be shared under the standard Apache 2.0 license.

The Sunnyvale, California-based firm has raised more than USD 720 million. Its flagship WSE-2 processor powers the Cerebras Andromeda supercomputer, which has more than 13.5 million AI-optimized compute cores and is designed to execute AI applications.

According to Cerebras, the emergence of generative AI, led by OpenAI LP’s ChatGPT, has prompted a race among AI hardware manufacturers to develop more powerful and specialized processors for the task. However, although several businesses have offered alternatives to Nvidia Corporation’s GPUs, none have demonstrated the capacity to train large-scale models and open-source those efforts under permissive licenses.

At the same time, according to Cerebras, market pressures have made companies less likely to release LLMs publicly, so the largest models remain mostly inaccessible.

With this release, Cerebras hopes to fix that. It is open-sourcing seven GPT models, with 111 million, 256 million, 590 million, 1.3 billion, 2.7 billion, 6.7 billion, and 13 billion parameters, and making them accessible on GitHub and Hugging Face. Cerebras said that the speed of the Cerebras CS-2 systems in Andromeda, along with a unique weight streaming architecture, helped cut the training time for these models to just a few weeks, as opposed to the several months it would typically take.
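As a rough illustration of how the published checkpoints can be used, a model could be loaded from Hugging Face with the `transformers` library along the following lines. The repository id `cerebras/Cerebras-GPT-111M` is an assumption based on this release; the Cerebras organization page on Hugging Face lists the published names.

```python
# Sketch: loading the smallest Cerebras-GPT checkpoint from Hugging Face.
# The repo id "cerebras/Cerebras-GPT-111M" is an assumption based on the
# announced release; check the Cerebras org page for the exact names.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cerebras/Cerebras-GPT-111M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Greedy decoding for a short, deterministic continuation.
inputs = tokenizer("Generative AI is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```

The same pattern applies to the larger checkpoints, subject to available memory.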

According to Sean Lie, co-founder of Cerebras and its chief software architect, only a handful of firms can train genuinely large-scale models on their own. He said, “Releasing seven fully trained GPT models into the open-source community shows just how efficient clusters of Cerebras CS-2 systems can be and how they can rapidly solve the largest scale AI problems – problems that typically require hundreds or thousands of GPUs.”

According to the business, this release marks the first time a complete set of GPT models trained with cutting-edge efficiency techniques has been made available to the general public. Compared with other LLMs already on the market, the company said, they require less energy, less time, and less money to train.

The Cerebras LLMs can be used for both research and commercial purposes because they are open source, according to the business. The released training weights provide highly accurate pre-trained models that can be fine-tuned for different tasks with modest amounts of custom data, enabling anyone to create a powerful generative AI application with minimal effort.

The release also shows the value of a “simple, data-parallel only approach to training,” as described by Cerebras. Traditional LLM training on GPUs requires a complicated mixture of pipeline, model, and data parallelism techniques. The Cerebras weight-streaming architecture, by contrast, demonstrates that it is possible to scale to very large models using a more straightforward, data-parallel-only approach without programming changes.
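To make the distinction concrete, here is a minimal NumPy sketch of the data-parallel idea on a toy linear model (purely illustrative, not Cerebras code): every worker holds a full replica of the weights, computes gradients on its own shard of the batch, and the gradients are averaged, with no pipeline stages or model partitioning involved.

```python
# Data-parallel training sketch (illustrative toy model, not Cerebras code).
# Each "worker" sees only its shard of the data; the averaged gradients
# play the role of an all-reduce across workers.
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model y = X @ w with squared-error loss.
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(64, 2))
y = X @ true_w

n_workers = 4
w = np.zeros(2)        # weights replicated on every worker
lr = 0.1

for step in range(200):
    shards = np.array_split(np.arange(len(X)), n_workers)
    grads = []
    for idx in shards:                      # each worker's local shard
        Xs, ys = X[idx], y[idx]
        err = Xs @ w - ys
        grads.append(2 * Xs.T @ err / len(idx))   # local gradient
    w -= lr * np.mean(grads, axis=0)        # "all-reduce": average grads

print(np.round(w, 3))
```

Pipeline or tensor/model parallelism, by contrast, would require splitting the model itself across devices and coordinating activations between them, which is the programming complexity the data-parallel-only approach avoids.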

According to Cambrian AI analyst Karl Freund, the recent release demonstrates the capabilities of Cerebras’ CS-2 systems as a top platform for AI training, propelling the business into the top echelon of AI practitioners.

Karl Freund said, “There are a handful of companies in the world capable of deploying end-to-end AI training infrastructure and training the largest LLMs to state-of-the-art accuracy. Cerebras must now be counted among them. Moreover, by releasing these models into the open-source community with the permissive Apache 2.0 license, Cerebras shows commitment to ensuring that AI remains an open technology that broadly benefits humanity.”