NVIDIA’s powerful new Blackwell AI chips are facing overheating issues in data center racks, which may cause delays for major customers.
NVIDIA has apparently run into some technical snags with its Blackwell AI chips. The cutting-edge processors, already delayed from their initial launch window, are now overheating when installed in data center server racks, according to a report from The Information.
Sources told the publication that the Blackwell GPUs overheat when packed together in server racks designed to hold up to 72 of the chips at once. Apparently, NVIDIA has had to go back to its suppliers multiple times requesting redesigns of these racks in an attempt to get the heat situation under control.
A company spokesperson suggested the issues weren’t uncommon, telling Reuters that “NVIDIA is working with leading cloud service providers as an integral part of our engineering team and process. The engineering iterations are normal and expected.”
Whatever the circumstances, the overheating setbacks could stymie rollout plans for major Blackwell customers like Meta, Google, and Microsoft. NVIDIA had originally said the chips would ship a few months after their announcement in March, a schedule that has since slipped.
The overheating issue isn’t totally surprising given the monstrous performance NVIDIA is promising with Blackwell. These new GPUs essentially glue together two chips the size of NVIDIA’s previous offerings into one mega processor that the company says can be up to 30 times faster for AI workloads like large language models.
More performance means more power draw and more heat generation. And getting that heat dissipation equation right will be critical for NVIDIA’s data center partners to maximize AI capabilities.
While NVIDIA works through these growing pains, competitors like AMD will be looking to capitalize on any opening in the white-hot AI hardware market.
This is not the first time NVIDIA has run into delays with Blackwell. In August, the company reportedly encountered “design flaws” that were expected to push back the release of the chips by at least three months. CEO Jensen Huang himself acknowledged the issue in October, announcing the company was back on track.