Cerebras Systems has launched what it claims is the world’s fastest AI inference service, directly challenging industry leader Nvidia.
Cerebras Systems is a Silicon Valley-based AI computing startup, and its new “Cerebras Inference” platform runs on its CS-3 hardware. The service is available in the cloud and in systems that data center operators can purchase and run independently.
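For developers, the cloud service is reachable through a web API. The snippet below is a minimal sketch of what a call might look like, assuming an OpenAI-compatible chat-completions endpoint; the base URL, model identifier, and client library shown are illustrative assumptions, not confirmed details of Cerebras’ service.

from openai import OpenAI  # reusing the OpenAI client against an assumed compatible endpoint

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",  # assumed endpoint, for illustration only
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="llama3.1-70b",  # assumed model identifier
    messages=[{"role": "user", "content": "Explain wafer-scale inference in one sentence."}],
)
print(response.choices[0].message.content)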
Cerebras Inference Performance and Architecture
In an August 27 press release, Cerebras said its new platform delivers AI inference performance 20 times faster than Nvidia’s current-generation Hopper chips.
Introducing Cerebras Inference
‣ Llama3.1-70B at 450 tokens/s – 20x faster than GPUs
‣ 60c per M tokens – a fifth the price of hyperscalers
‣ Full 16-bit precision for full model accuracy
‣ Generous rate limits for devs
Try now: https://t.co/50vsHCl8LM
— Cerebras (@CerebrasSystems) August 27, 2024
The company references evaluations conducted by Artificial Analysis, a benchmarking firm, to validate its performance claims.
The CS-3 is built around the Wafer Scale Engine 3 (WSE-3), a chip the size of a dinner plate whose architecture integrates memory directly into the wafer, rather than relying on the separate high-bandwidth memory (HBM) chips that Nvidia pairs with its GPUs.
AI inference is the stage at which a trained machine learning model evaluates new data and draws conclusions from what it has already learned. It differs from training, the preceding phase in which a model learns from large example datasets; training is what equips a model to generate accurate inferences later.
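To make the distinction concrete, here is a minimal, self-contained Python sketch (a toy NumPy model, not Cerebras’ stack): training fits the weights from example data, while inference simply applies those frozen weights to new input.

import numpy as np

# Training: learn weights from labeled examples (a least-squares fit to y = 2x + 1).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
X_b = np.hstack([X, np.ones_like(X)])            # add a bias column
weights, *_ = np.linalg.lstsq(X_b, y, rcond=None)

# Inference: apply the frozen weights to unseen data; no further learning happens.
x_new = np.array([[10.0, 1.0]])                  # new input (with bias term)
print(x_new @ weights)                           # ≈ [21.0]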
According to Verified Market Research, the AI inference chip market is booming, valued at $15.8 billion in 2023 and expected to reach $90.6 billion by 2030. This rapid growth reflects the increasing adoption of AI inference across industries and applications.
Cerebras’ innovative CS-3 chip is at the forefront of this trend, delivering exceptional performance for AI inference workloads. Specifically, it can process 1,800 tokens per second for the open-source Llama 3.1 8B model and 450 tokens per second for the larger Llama 3.1 70B model.
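A back-of-envelope calculation shows why the on-wafer memory design matters for these figures. Generating each token requires streaming the model’s weights through the processor, so single-stream throughput is roughly bounded by memory bandwidth. The sketch below, which ignores batching, caching, and other optimizations, estimates the bandwidth implied by the reported 70B figure:

# Rough single-stream estimate; real systems batch requests and optimize further.
params = 70e9             # Llama 3.1 70B parameter count
bytes_per_param = 2       # 16-bit precision, per Cerebras' announcement
model_bytes = params * bytes_per_param           # ~140 GB of weights

tokens_per_s = 450        # Cerebras' reported Llama 3.1 70B throughput
implied_bandwidth = model_bytes * tokens_per_s   # bytes per second
print(f"{implied_bandwidth / 1e12:.0f} TB/s")    # ~63 TB/s

Sixty-three terabytes per second is far beyond the few terabytes per second that a GPU’s off-chip HBM delivers, which is why keeping weights in on-wafer memory is central to Cerebras’ speed claims.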
Micah Hill-Smith, co-founder and CEO of Artificial Analysis Inc., confirmed that these models running on Cerebras Inference achieve “quality evaluation results” that align with Meta’s official versions.
In addition to its performance claims, Cerebras is positioning the service as a cost-effective alternative to existing solutions. The company says pricing starts at 10 cents per million tokens, with the larger 70B model priced at 60 cents per million, which it claims amounts to 100 times better price-performance for AI inference workloads.
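At those advertised rates, estimating the cost of a workload is simple arithmetic; the 500-million-token workload below is an arbitrary example.

# Cost estimate at Cerebras' advertised prices (USD per million tokens).
PRICE_8B = 0.10    # Llama 3.1 8B, the stated starting price
PRICE_70B = 0.60   # Llama 3.1 70B, per the company's announcement

def cost(tokens: int, price_per_million: float) -> float:
    return tokens / 1_000_000 * price_per_million

print(cost(500_000_000, PRICE_8B))    # 50.0  -> $50 on the 8B model
print(cost(500_000_000, PRICE_70B))   # 300.0 -> $300 on the 70B model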
AI gets cheaper and faster every day.
Meet Cerebras Systems, a strong Groq competitor.
Also using custom AI chips for inference, instead of expensive (multi-purpose) GPUs.
• 𝗳𝗮𝘀𝘁: Llama3.1-70B at 450 tokens/s – 20x faster than GPUs
• 𝗰𝗵𝗲𝗮𝗽: 60c per M tokens – a…
— Christoph C. Cemper 🧡 AIPRM (@cemper) August 28, 2024
Cerebras CEO Andrew Feldman emphasized the company’s strategy, stating, “The way you beat the 800-pound gorilla is by bringing a vastly better product to market.” He went on to claim Cerebras is already taking customers from Nvidia.
Industry Context and Nvidia AI Competition
Cerebras is part of a group of smaller companies, including Groq, aiming to capture a portion of the multibillion-dollar AI chip market currently dominated by Nvidia. These companies are capitalizing on the growing demand for AI inference capabilities, which are essential for powering applications like ChatGPT and Google’s Gemini.
While Nvidia’s Hopper GPUs have become a sought-after commodity for training top AI models, Cerebras and its competitors focus on more specialized chips designed to run those models efficiently.
NVIDIA just announced new GPUs @NVIDIAGTC delivering up to 20 petaFLOPS per chip.
Many don't realize what a staggering, absurd amount of compute that is, so I'd like to provide some perspective:
* With a few gigaFLOPS, you can run basic image processing operations on…
1/x
— Rafael Spring (@Rafael_L_Spring) March 20, 2024
Notably, the AI chip startup landscape continues to see significant activity, with Groq, another AI inference competitor, raising $640 million this month at a $2.8 billion valuation.
However, the sector is not without its challenges, as evidenced by chipmaker Graphcore’s recent acquisition by SoftBank for less than the total venture capital it had raised since its founding.
Nonetheless, the launch of Cerebras Inference represents a significant development in the AI computing landscape.
As demand for AI inference capabilities continues to grow, particularly for real-time and high-volume applications, Cerebras’ solution could potentially disrupt the market.