Nvidia’s new Blackwell chips are running into fresh problems with the servers that house them, with overheating leading to further complications.
As initially reported by The Information, the advanced GPUs are overheating when connected to server racks designed to accommodate up to 72 chips, with the outlet citing sources close to the issue.
The market leader has pressed the suppliers providing the racks to come up with different solutions, following several attempts to get to the bottom of the problem. The situation is causing consternation among customers, who fear they will not be able to set up data centers on their existing schedules.
A previous design flaw had held up Nvidia’s production of the B200 chips, but speaking to the press during a visit to Denmark last month, CEO Jensen Huang said it had been fixed.
He admitted that a “design flaw caused the yield to be low,” calling it “100% Nvidia’s fault,” but said a solution was found with assistance from manufacturing partner TSMC.
Nvidia Appears to Downplay the Urgency of the Situation
In a statement issued to Reuters in response to the overheating issue, Nvidia appeared to play down the extent of the setback.
The company confirmed it continued to work with leading cloud service providers as an important aspect of its processes, adding “the engineering iterations are normal and expected.”
Nvidia’s advanced Blackwell chip takes two squares of silicon, each the size of the company’s previous offering, and brings them together into a single component that performs up to 30 times faster in tasks such as providing chatbot responses.
The chips are in high demand at the cutting edge of AI development.