NVIDIA Blackwell Platform — First Look as World Prepares for Trillion Parameter AI Models

KEY TAKEAWAYS

  • NVIDIA’s Blackwell platform allows organizations to build and run real-time generative AI on (yet to launch) trillion-parameter large language models.
  • The platform effectively acts as a single GPU and can reduce data center facility power by 28%.
  • NVIDIA's focus on energy efficiency, including the use of liquid cooling technologies, is crucial for addressing the growing environmental impact of AI.
  • The platform's scalable architecture allows for the deployment of massive AI models on large-scale infrastructure, enabling increasingly complex AI workloads.

NVIDIA has given a select group of tech journalists a first look at its Blackwell platform.

Techopedia attended NVIDIA’s pre-briefing as the AI giant prepares to reveal the Blackwell platform at the Hot Chips 2024 event, held August 25 to 27.

We take a first look as NVIDIA unveils how AI will operate at a hardware level over the next few years. The Blackwell platform will work as a ‘single GPU’, bringing together multiple chips, systems, and software to power the next generation of artificial intelligence.

And with AI expected to add more pressure to the world’s energy needs, NVIDIA will unveil a liquid-cooled rack designed to scale as a solution to tackle AI’s energy demand problem.

NVIDIA’s New Blackwell Platform: Next-Gen AI Hardware

At the heart of the technologies needed to push AI to the next era lie the three big problems the industry has been constrained by — computing power, energy efficiency, and latency.

Dave Salvator, Director of Accelerated Computing Products at NVIDIA, explained how different chips, switches, GPUs, and systems converge in the new Blackwell platform.

Salvator said that Blackwell is both a system and a platform, engineered at the data center level, specifically designed for the challenges of future AI models.

An ‘exploded’ view of the Blackwell — effectively functioning as one giant GPU. Source: NVIDIA

To upgrade Blackwell, the company is turning to its NVSwitch technology. NVIDIA said that NVSwitch has allowed it to improve throughput and deliver low-latency inference as generative AI advancements drive computing demands higher.

Integrated into Blackwell, the switches allow GPUs to communicate more efficiently even when several GPUs are ‘talking’ with each other simultaneously.

The NVIDIA NVSwitch: Now entering its fifth generation. Source: NVIDIA
The NVSwitch tray, designed for low latency. Source: NVIDIA

Salvator explained the problem at the system level: models keep growing in size over time, and generative AI applications are expected to run in real time, so the requirements for inference have risen dramatically over the last several years.

“One of the things that real-time, large language model inferencing needs is multiple GPUs, and in the not too distant future, multiple server nodes.”

“The challenge is, this huge balancing act between getting great performance out of the GPUs, great utilization on the GPUs, and delivering great user experiences to the end users using those AI-powered services.”

NVIDIA said that with NVSwitch, GPUs can communicate with each other much faster than over conventional network links. According to the company, integrating NVSwitch into Blackwell can increase performance by up to 50%. NVIDIA also announced that it is doubling the bandwidth of NVSwitch, from 900 gigabytes per second to 1.8 terabytes per second.

“Every GPU gets that rate of connectivity for its communication, even when multiple GPUs are talking to each other at the same time,” Salvator said.
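To put the bandwidth doubling in perspective, here is a rough back-of-the-envelope sketch. Only the two bandwidth figures come from NVIDIA's briefing; the 90 GB payload is a hypothetical value chosen for illustration, and the calculation assumes an ideal link with no latency, protocol overhead, or contention.

```python
# Illustrative arithmetic only: ignores latency, protocol overhead, and contention.
PREVIOUS_GB_PER_S = 900.0    # earlier NVSwitch bandwidth cited in the article
BLACKWELL_GB_PER_S = 1800.0  # 1.8 TB/s, the doubled figure cited for Blackwell


def transfer_time_ms(payload_gb: float, bandwidth_gb_per_s: float) -> float:
    """Ideal time in milliseconds to move payload_gb at the given bandwidth."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return payload_gb * 1000.0 / bandwidth_gb_per_s


# Hypothetical amount of data exchanged between GPUs (not an NVIDIA figure).
payload_gb = 90.0

before = transfer_time_ms(payload_gb, PREVIOUS_GB_PER_S)
after = transfer_time_ms(payload_gb, BLACKWELL_GB_PER_S)
print(f"900 GB/s: {before:.0f} ms, 1.8 TB/s: {after:.0f} ms")
```

Under these idealized assumptions, doubling the bandwidth halves the time needed to move the same amount of data. Real-world gains depend on the workload and how much of it is communication-bound.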

First look: The NVIDIA GB200: Claims to deliver 30x faster real-time large language model inference.
Source: NVIDIA

Combining FP Precision with Algorithms To Preserve Accuracy

Another feature coming to Blackwell is FP [Floating Point] core precision. FP core precision is all about preserving accuracy with reduced precision — a concept easier said than done.

“When you reduce precision, you get into the big, fancy word quantization, which basically means rounding.

“In other words, if you use fewer bits to describe the number, you are ultimately rounding off and losing some precision, which can cause accuracy loss.”
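The rounding effect described above can be seen in a minimal sketch. This is plain uniform quantization over a handful of made-up weights, not NVIDIA's FP4 format or its Quasar system; the function names and sample values are assumptions for illustration only.

```python
def quantize(values, bits):
    """Round each value to the nearest of 2**bits evenly spaced levels
    between the smallest and largest value (simple uniform quantization)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (2 ** bits - 1)
    if step == 0:
        return list(values)  # all values identical, nothing to round
    return [lo + round((v - lo) / step) * step for v in values]


def max_error(values, bits):
    """Largest absolute difference between a value and its quantized version."""
    quantized = quantize(values, bits)
    return max(abs(v - q) for v, q in zip(values, quantized))


# Hypothetical model weights, chosen only to illustrate the effect.
weights = [-1.0, -0.62, -0.13, 0.0, 0.27, 0.58, 1.0]

for bits in (8, 4, 2):
    print(f"{bits} bits: max rounding error = {max_error(weights, bits):.4f}")
```

Running it shows the worst-case rounding error growing as the bit count drops, which is the accuracy loss that reduced-precision schemes have to compensate for.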

NVIDIA’s workaround for this paradox is what the company calls a ‘Quasar quantization system’. This system combines existing NVIDIA technologies with continually developing algorithmic research.

With these algorithms, NVIDIA said it can now run with reduced precision while not sacrificing accuracy.

Salvator explained that while many other companies offer FP4 precision support in their chips, those solutions do not efficiently preserve accuracy.

“Without (our algorithms), you could try to use FP4, but you will not really be able to preserve accuracy.”

Liquid Cooling: Warm Water Tech Can Cut Data Center Facility Power By Up To 28%

It’s no secret that AI is driving energy consumption to an all-time high across the tech industry. The International Energy Agency’s (IEA) recent report claims electricity consumption from data centers, artificial intelligence (AI) and the cryptocurrency sector could double by 2026.

Data centers and technology companies are investing in liquid cooling technology to mitigate energy demands. Advancements in liquid cooling could bring significant benefits not just for the sustainable side of business but also for the bottom line.

On August 21, SeekingAlpha reported that direct liquid cooling (DLC) systems used to cool NVIDIA’s Blackwell AI chips could positively impact the company’s quarterly earnings reports.

There is no single liquid cooling technology on the market. Several approaches are already in operation, while some more advanced ones are still in research. At the press briefing, NVIDIA highlighted the potential of ‘warm water cooling’.

Unlike traditional water cooling, warm water cooling requires no chillers or compressors to lower the water temperature, which dramatically cuts the energy that would otherwise be needed to power them.

Additionally, removing compressors and water cooling components from data center GPU liquid cooling systems significantly reduces maintenance and operational costs and extends system lifespan.

NVIDIA also announced research into reusing the heat pulled away from the servers by recycling it back into the data center. According to NVIDIA estimates, warm water cooling can reduce data center facility power by up to 28%.
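As a rough illustration of what an up-to-28% reduction could mean, the sketch below applies NVIDIA's estimate to a hypothetical facility. The 10 MW baseline and the assumption of constant, year-round load are invented for the example and do not come from NVIDIA.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming constant load all year


def facility_power_after_reduction(baseline_mw: float, reduction: float) -> float:
    """Facility power in MW after applying a fractional reduction (0.28 = 28%)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be a fraction between 0 and 1")
    return baseline_mw * (1.0 - reduction)


baseline_mw = 10.0    # hypothetical facility power draw, not an NVIDIA figure
max_reduction = 0.28  # upper bound of NVIDIA's estimate cited in the article

reduced_mw = facility_power_after_reduction(baseline_mw, max_reduction)
saved_mwh_per_year = (baseline_mw - reduced_mw) * HOURS_PER_YEAR

print(f"Facility power: {baseline_mw:.1f} MW -> {reduced_mw:.1f} MW")
print(f"Energy saved per year: {saved_mwh_per_year:,.0f} MWh")
```

Because 28% is the top of NVIDIA's estimate, the result is a best-case figure; actual savings would depend on the facility's climate, design, and workload.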

The Bottom Line: Why the New Blackwell Is a Big Deal

The Blackwell architecture is perhaps the most complex architecture ever built for AI. It goes well beyond the demands of today’s models and prepares the infrastructure, engineering, and platform that organizations will need when trillion-parameter models inevitably hit the market next year.

To meet the demands of these new models, NVIDIA is not only working on raw computing power. It has also set a strong focus on three of the biggest roadblocks limiting AI today: energy consumption, latency, and preserving accuracy at reduced precision.

While Blackwell’s architecture and hardware are an impressive leap, they will operate at the data center level. This leaves us today with many questions that will need to be resolved in the future.

One example is how these massive AI cloud operations will be optimized for lightweight edge environments and end-user devices, such as smartphones, the Internet of Things (IoT), laptops, and others.

That said, the new Blackwell will be a significant breakthrough in AI computing. By combining innovative architecture with technologies like NVSwitch and FP core precision, it will deliver unprecedented performance with better energy efficiency.

NVIDIA is once again poised not only to pave the way for the most cutting-edge AI projects, such as drug discovery and materials science, but also to ensure accessibility, enable rapid deployment, and simplify the work of developers at small, medium, and large companies entering the AI era.
