Cerebras Challenges Nvidia with Launch of World’s Fastest AI Inference Service

Key Takeaways

  • Cerebras Systems introduces "Cerebras Inference," claiming it's 20 times faster than Nvidia's Hopper chips at AI inference.
  • The new service is based on the CS-3 chip, the size of a dinner plate.
  • The company aims to disrupt Nvidia's stronghold in the AI chip market.

Cerebras Systems has launched what it claims is the world’s fastest AI inference service, directly challenging industry leader Nvidia.

Cerebras Systems is a Silicon Valley-based AI computing startup, and its new “Cerebras Inference” platform is built on its CS-3 chip. The platform is available in the cloud and as part of computing systems that data center operators can purchase and run on their own infrastructure.

Cerebras Inference Performance and Architecture

In an August 27 press release, Cerebras said its new platform delivers performance that is 20 times faster than Nvidia’s current generation of Hopper chips for AI inference tasks.

The company references evaluations conducted by Artificial Analysis, a benchmarking firm, to validate its performance claims.

The CS-3 chip, the size of a dinner plate, employs a unique architecture that integrates memory directly into the chip wafer, rather than pairing the processor with the separate high-bandwidth memory (HBM) chips that Nvidia’s GPUs rely on.

AI inference is the stage at which a trained machine learning model evaluates new data and draws conclusions from it. It differs from training, the earlier and more compute-intensive stage in which a model learns patterns from large sets of examples; training is what equips a model to generate accurate inferences later.
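To make the distinction concrete, here is a minimal sketch using scikit-learn (a toy example, unrelated to Cerebras’ stack): the `fit` call is training, while `predict` is inference on data the model has never seen.

```python
from sklearn.linear_model import LogisticRegression

# Training: the model learns patterns from labeled examples.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0, 0, 1, 1]
model = LogisticRegression().fit(X_train, y_train)

# Inference: the trained model draws conclusions about new, unseen data.
print(model.predict([[0.5], [2.5]]))  # -> [0 1]
```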

According to Verified Market Research, the AI inference chip market is booming, valued at $15.8 billion in 2023 and expected to reach $90.6 billion by 2030. This rapid growth reflects the increasing adoption of AI inference across industries and applications.

The CS-3 chip sits at the forefront of this trend. According to Cerebras, it processes 1,800 tokens per second on the open-source Llama 3.1 8B model and 450 tokens per second on the larger Llama 3.1 70B model.
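For a rough sense of what those throughput figures mean in practice, the back-of-the-envelope calculation below converts them into response latency. It assumes the quoted rates apply to a single generation stream and uses a hypothetical 500-token response; the press release does not specify these details.

```python
# Rough latency estimate from the quoted throughput figures.
# Assumes single-stream generation; the response length is hypothetical.
RESPONSE_TOKENS = 500

for model, tokens_per_sec in [("Llama 3.1 8B", 1800), ("Llama 3.1 70B", 450)]:
    seconds = RESPONSE_TOKENS / tokens_per_sec
    print(f"{model}: ~{seconds:.2f} s for a {RESPONSE_TOKENS}-token response")
# Llama 3.1 8B: ~0.28 s; Llama 3.1 70B: ~1.11 s
```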

Micah Hill-Smith, co-founder and CEO of Artificial Analysis Inc., confirmed that these models running on Cerebras Inference achieve “quality evaluation results” that align with Meta’s official versions.

In addition to its performance claims, Cerebras is positioning the service as a cost-effective alternative to existing solutions. The company says pricing starts at 10 cents per million tokens, which it claims amounts to 100 times better price-performance for AI inference workloads.
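At that entry price, the cost of a given workload is straightforward to estimate. The sketch below uses hypothetical request sizes and volumes purely for illustration:

```python
# Rough cost estimate at the quoted entry price of $0.10 per million tokens.
PRICE_PER_MILLION = 0.10  # USD, Cerebras' stated starting price

tokens_per_request = 1_000   # hypothetical prompt + response size
requests = 10_000_000        # hypothetical monthly volume
cost = tokens_per_request * requests / 1_000_000 * PRICE_PER_MILLION
print(f"~${cost:,.0f} for {requests:,} requests")  # ~$1,000
```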

Cerebras CEO Andrew Feldman emphasized the company’s strategy, stating, “The way you beat the 800-pound gorilla is by bringing a vastly better product to market.” He went on to claim Cerebras is already taking customers from Nvidia.

Industry Context and Nvidia AI Competition

Cerebras is part of a group of smaller companies, including Groq, aiming to capture a portion of the multibillion-dollar AI chip market currently dominated by Nvidia. These companies are capitalizing on the growing demand for AI inference capabilities, which are essential for powering applications like ChatGPT and Google’s Gemini.

While Nvidia’s Hopper GPUs have become a hotly sought-after commodity for training top AI models, Cerebras and its competitors focus on more specialized chips designed to run those trained models efficiently.

Notably, the AI chip startup landscape continues to see significant activity: Groq, another AI inference competitor, raised $640 million this month at a $2.8 billion valuation.

However, the sector is not without its challenges, as evidenced by chipmaker Graphcore’s recent acquisition by SoftBank for less than the total venture capital it had raised since its founding.

Still, the launch of Cerebras Inference represents a significant development in the AI computing landscape.

As demand for AI inference continues to grow, particularly for real-time and high-volume applications, Cerebras’ solution could disrupt the market.