Breaking Records: Nvidia’s MLPerf Inference V5.0 Results Are Out!


Key Points

  • Nvidia releases MLPerf Inference V5.0 benchmark results for its Blackwell GPU, showcasing its GB200 NVL72 system’s performance records.
  • The system delivers up to 30 times higher throughput on the Llama 3.1 405B workload compared to the company’s H200 NVL8, based on the Hopper architecture.
  • The company’s Blackwell platform achieves a nearly 5x performance gain on DeepSeek R1 inference in just one month.

A recent report has shed light on Nvidia’s latest developments in its Blackwell GPU, a successor to Hopper. According to the report, Nvidia’s GB200 NVL72 system, a rack-scale offering designed for AI reasoning, set a series of performance records in the MLPerf Inference V5.0 benchmark results.

The benchmarks, which included the latest updates to MLPerf Inference, featured the addition of Llama 3.1 405B, a model considered "one of the largest and most challenging-to-run open-weight models". Additionally, the new Llama 2 70B Interactive benchmark was included, which features much stricter latency requirements, closely modeling how chatbots work.

Dave Salvator, director of accelerated computing products at Nvidia, noted that the system, which connects 72 Blackwell GPUs to act as a single massive GPU, delivered up to 30 times higher throughput on the Llama 3.1 405B workload than the company’s H200 NVL8, based on the Hopper architecture.

The report also highlighted continued performance gains from the company’s Hopper architecture. Despite being on the market for three years, Hopper achieved a 60% performance boost over last year’s results on the Llama 2 70B workload, showing that even older architectures still have room for improvement.

The article also touched on the company’s Dynamo open-source inference software, introduced at GTC 2025. According to Salvator, Dynamo will further increase performance. Jim McGregor, principal analyst at Tirias Research, described Dynamo as "significant, thinking of an OS for an entire data center. This gets to [Nvidia CEO Jensen Huang’s] view that the new unit of compute is the data center, or to put it another way, you have to think of the entire data center as a single server."

The report also discussed the issue of power efficiency in data centers. McGregor described it as "an ongoing problem. We have to make everything operate more efficiently to maximize the power consumption and reduce costs, and we have to find better power solutions like small modular reactors." An Nvidia spokesperson explained that the company approaches power efficiency in two ways: "The Blackwell architecture is a more efficient architecture than the previous generation, meaning we get more performance within a given power budget with the Blackwell-based GPUs. In addition, reduced precisions like FP8 and now FP4 with Blackwell bring performance increases that allow more work to be done using less infrastructure, also increasing overall efficiency."
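To see why reduced precision saves infrastructure, it helps to look at the storage side of the trade-off. The sketch below is a generic, illustrative 4-bit integer quantization in NumPy, not Nvidia's actual FP4 hardware format (which is a floating-point encoding with its own scaling scheme); it simply shows how dropping from 32-bit to 4-bit weights cuts memory by 8x while keeping the round-off error bounded by half a quantization step.

```python
import numpy as np

def quantize_4bit(weights: np.ndarray):
    """Symmetric per-tensor quantization to 4-bit integers in [-8, 7]."""
    scale = np.abs(weights).max() / 7.0              # one scale for the tensor
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 weights from the 4-bit codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.standard_normal((1024, 1024)).astype(np.float32)
q, scale = quantize_4bit(weights)

# Two 4-bit codes pack into one byte, so storage drops 8x vs FP32.
fp32_bytes = weights.nbytes        # 1024*1024*4 bytes
int4_bytes = q.size // 2           # packed 4-bit storage
print(f"compression: {fp32_bytes / int4_bytes:.0f}x")   # prints "compression: 8x"

# Round-trip error never exceeds half a quantization step.
err = np.abs(dequantize(q, scale) - weights).max()
print(f"max error: {err:.4f} (bound: {scale / 2:.4f})")
```

Less memory per weight means fewer GPUs are needed to hold a given model, which is the "more work using less infrastructure" effect the spokesperson describes; real deployments pair such formats with finer-grained scaling to limit the accuracy cost.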




