Highlights:

  • The Gaudi 3 is the third installment in a processor series that Intel gained through a USD 2 billion startup acquisition in 2019.
  • Intel claims that the Gaudi 3 surpasses both its previous-generation silicon and Nvidia Corp.’s H100 graphics card in performance.

Recently, Intel Corp. launched the Gaudi 3 AI chip, which the company claims will deliver up to four times the performance of its previous generation of silicon.

During its annual Intel Vision 2024 conference, the chipmaker provided an update on its AI strategy in addition to product details. Intel intends to collaborate with partners to make AI hardware systems that combine parts from several vendors more widely accessible. Coinciding with the event, competitor Advanced Micro Devices Inc. expanded its AI processor lineup with new systems-on-chip designed for the connected device market.

Gaudi 3

The Gaudi 3 is the third installment in a processor series that Intel gained through a USD 2 billion purchase of a startup in 2019. Compared with its predecessor, the new chip promises a fourfold performance increase when processing data in BF16, a number format commonly used in AI applications. Increased network bandwidth also lets Gaudi 3 chips in the same AI cluster communicate with one another more quickly.
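BF16 (bfloat16) is essentially a float32 with its low 16 mantissa bits dropped: it keeps float32's full exponent range while halving storage, a trade-off that suits AI workloads. A minimal Python sketch of the truncation (illustrative only, not Intel code):

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Truncate a float32 to bfloat16 by keeping only the top 16 bits.

    bfloat16 keeps float32's sign bit and all 8 exponent bits but only
    7 of the 23 mantissa bits, trading precision for dynamic range.
    """
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits32 >> 16  # simple truncation (round-toward-zero)

def bfloat16_bits_to_float32(b: int) -> float:
    """Widen bfloat16 bits back to float32 by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]

# 1.0 survives the round trip exactly; pi loses its low-order mantissa bits.
print(bfloat16_bits_to_float32(float32_to_bfloat16_bits(1.0)))         # 1.0
print(bfloat16_bits_to_float32(float32_to_bfloat16_bits(3.14159265)))  # 3.140625
```

Hardware implementations also round rather than truncate, but the storage format is the same.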

The chip performs calculations using two sets of onboard cores. The first type, the TPC (Tensor Processor Core), is tuned to accelerate computations that deep learning models execute frequently while processing data. Those computations include batch normalization, a procedure that standardizes the data flowing through a neural network, making training faster and more stable.
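As a concrete illustration of what batch normalization computes (a generic NumPy sketch, not Gaudi-specific code): each feature is shifted and scaled so that, across the batch, it has zero mean and unit variance.

```python
import numpy as np

def batch_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each feature (column) over the batch dimension (rows)
    to zero mean and unit variance -- the core of batch normalization.
    eps guards against division by zero for constant features."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

batch = np.array([[1.0, 2.0],
                  [3.0, 6.0]])
normed = batch_norm(batch)
# Each column of `normed` now has mean ~0 and variance ~1.
```

A full batch-norm layer additionally learns a per-feature scale and shift, omitted here for brevity.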

The Gaudi 3 also integrates what are known as MME (Matrix Multiplication Engine) cores. These cores likewise speed up the computations AI models rely on when processing data, but with a different emphasis than the TPC cores. Among the tasks the MME circuits accelerate is the execution of convolutional layers, fundamental software components frequently used in image recognition models.
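A convolutional layer slides a small kernel over its input and sums elementwise products at each position (technically cross-correlation, as deep learning frameworks implement it). A naive NumPy sketch of the operation, illustrative only and not how the MME hardware realizes it:

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive 2-D convolution, no padding, stride 1. At each output
    position, multiply the kernel against the patch under it and sum."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A [-1, 1] kernel acts as a vertical-edge detector: it fires where
# brightness jumps between adjacent columns.
img = np.array([[0.0, 0.0, 1.0, 1.0]] * 4)
edge = conv2d(img, np.array([[-1.0, 1.0]]))
# Each row of `edge` is [0, 1, 0]: the jump sits between columns 1 and 2.
```

Hardware like the MME accelerates this by recasting the sliding-window sums as large matrix multiplications.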

The Gaudi 3 features 64 TPC cores and eight MME cores distributed across two dies, or semiconductor modules, that are interconnected to operate as a single chip. Backed by 128 gigabytes of onboard HBM2e memory, a high-speed RAM variant, the chip can rapidly access the data AI models need to perform calculations efficiently.

Intel utilized Taiwan Semiconductor Manufacturing Co. Ltd.’s seven-nanometer process for the earlier iteration of the Gaudi chip. However, with the introduction of the Gaudi 3, the company has transitioned to a more advanced five-nanometer node. This updated technology enables the production of quicker and more power-efficient transistors.

Intel states that a single server can accommodate eight Gaudi 3 chips. Each chip has 21 Ethernet networking links for exchanging data with the adjacent Gaudi 3 units, plus three more links, for a total of 24, that connect it to chips beyond its host server.
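The per-chip link count works out neatly if the 21 internal links are split evenly across the seven peer chips in an eight-chip server. A small sketch under that assumption (the even three-per-peer split is our inference, not a stated Intel specification):

```python
CHIPS_PER_SERVER = 8
LINKS_PER_PEER = 3   # assumption: internal links divide evenly among peers
SCALE_OUT_LINKS = 3  # per the article: links for traffic leaving the server

# All-to-all wiring inside the box: 3 links to each of the 7 other chips.
internal_links = LINKS_PER_PEER * (CHIPS_PER_SERVER - 1)
total_links = internal_links + SCALE_OUT_LINKS
print(internal_links, total_links)  # 21 24
```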

Intel claims that the Gaudi 3 surpasses not only its predecessor but also Nvidia Corp.’s H100 graphics card. In an internal assessment, the chipmaker found that the Gaudi 3 can train certain versions of the widely used Llama 2 large language model up to 50% faster. Intel also promises up to 30% faster inference than Nvidia’s H200, an upgraded version of the H100 optimized for large language models (LLMs).

“Enterprises weigh considerations such as availability, scalability, performance, cost and energy efficiency. Intel Gaudi 3 stands out as the GenAI alternative presenting a compelling combination of price performance, system scalability and time-to-value advantage,” said Justin Hotard, executive vice president and general manager of Intel’s Data Center and AI Group.

At the Intel Vision event where it unveiled Gaudi 3, the chipmaker also shared insights into its AI strategy. Intel announced collaborations with over a dozen partners, including Red Hat and SAP SE, to develop an ‘open platform for enterprise AI.’ The initiative aims to give businesses access to AI-optimized systems that integrate hardware and software from various suppliers.

Intel says these systems will be optimized to run AI models with RAG capabilities. RAG, or retrieval-augmented generation, is a machine learning technique that lets an LLM draw on new information and incorporate it into its responses without extensive retraining.
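The mechanics of RAG are simple to sketch: fetch the documents most relevant to a query, then prepend them to the model's prompt so the LLM answers from fresh context rather than from its frozen training data. The toy retriever and corpus below are hypothetical stand-ins, not Intel or partner APIs:

```python
def retrieve(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query.
    Real systems use vector embeddings instead."""
    query_words = set(query.lower().split())
    def overlap(doc: str) -> int:
        return len(query_words & set(doc.lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:top_k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Prepend retrieved context so the LLM can use new information
    without retraining."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "Gaudi 3 ships with 128 GB of HBM2e memory.",
    "The cafeteria closes at 3 pm.",
]
prompt = build_prompt("How much memory does Gaudi 3 have?", corpus)
# The prompt now carries the relevant document; a real system would
# pass it to an LLM to generate the final answer.
```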

As part of the initiative, Intel will release reference implementations that show how servers featuring Gaudi and Xeon chips can run AI workloads. It will also expand the infrastructure capacity of its Tiber Developer Cloud, a cloud platform through which customers can train and deploy AI models on the chipmaker’s processors.

AMD’s Latest AI Chips

Coinciding with Intel Vision, competitor AMD unveiled two new chip lineups aimed primarily at powering edge computing devices such as smart car subsystems. The additions extend AMD’s existing Versal product portfolio, which it gained through its USD 50 billion acquisition of Xilinx in 2022.

Every processor in the Versal portfolio incorporates two types of circuits. One set is tailored for specific tasks such as executing AI models or handling sensor data. The other consists of adaptable compute modules that customers can customize to their needs. These modules are based on FPGA (field-programmable gate array) technology, which Xilinx pioneered.

The first of the two new Versal lineups is the AI Edge Series Gen 2. Each processor in this family combines three sets of compute modules: central processing unit cores designed by Arm Holdings plc, customizable FPGA modules, and AI-optimized circuits. The FPGA circuits can transform data from a connected device’s sensors into a format the device’s onboard AI models can process more easily.

Subaru Corp., an early adopter of the Versal AI Edge Series Gen 2, intends to incorporate chips from this lineup into multiple vehicles. These chips will serve as the backbone for an advanced driver-assistance system named EyeSight. The system offers safety functionalities, including adaptive cruise control and automated braking.

Alongside the AI Edge Series Gen 2, AMD introduced the Prime Series Gen 2, another new chip line. Its design resembles that of the AI Edge family but omits the AI-optimized compute modules. Every chip in the Prime Series Gen 2 lineup combines configurable FPGA circuits, modules designed to process video streams, and Arm-based CPU cores.