Accelerate AI and HPC with Math Libraries on NVIDIA Blackwell GPUs | NVIDIA GTC 2025
Babak Hejazi, Azi Riahi
NVIDIA GTC 2025 · Session
This talk, presented by Azi Riahi and Babak Hejazi from NVIDIA's math libraries team, provides a comprehensive overview of how NVIDIA's **CUDA-X math libraries** are optimized to exploit the capabilities of the newly introduced **Blackwell GPU architecture**. The core message emphasizes the critical role these libraries play in accelerating both AI and high-performance computing (HPC) workloads by abstracting hardware complexities, delivering peak performance, and offering seamless hardware portability. With the introduction of Blackwell, the libraries add support for new features such as **FP4 precision**, higher memory bandwidth, and the **Grace CPU** as a host, letting users migrate their code with minimal changes and see performance gains from day one. The presentation highlights specific library enhancements and benchmarks across a range of applications, demonstrating Blackwell's strengths on compute-intensive, memory-bandwidth-bound, and large-scale distributed problems.
AI review
A technically competent product overview of NVIDIA's math library updates for Blackwell — honest about what the hardware does, reasonably specific about implementation techniques like FP64 emulation via INT8 tensor cores, and backed by benchmark numbers across a meaningful range of workloads. Not a talk about how to build something, but a credible account of what the libraries do and why. Engineers who need to understand the Blackwell performance envelope before making infrastructure decisions will get value here. Engineers who want to know how to use any of this will leave needing to open a…
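The review's mention of FP64 emulation via INT8 tensor cores refers to Ozaki-style splitting schemes. A rough, illustrative sketch of the core idea (not NVIDIA's actual implementation, which handles signs, exponents, and scaling far more carefully): split each double into 8-bit integer slices, compute exact integer partial products — the part the tensor cores accelerate — and recombine with the appropriate power-of-two scale factors. The function names here are hypothetical, and the sketch assumes inputs in [0, 1) for simplicity.

```python
import numpy as np

def to_int8_slices(x, num_slices=7):
    """Split float64 values in [0, 1) into 8-bit integer slices so that
    x ~= sum(slices[i] * 256**-(i+1)). Each step is exact in float64
    because multiplying by 256 only shifts the exponent."""
    slices = []
    r = np.asarray(x, dtype=np.float64).copy()
    for _ in range(num_slices):
        s = np.floor(r * 256.0)           # top 8 remaining bits
        slices.append(s.astype(np.int64))
        r = r * 256.0 - s                  # exact remainder
    return slices

def emulated_dot(a, b, num_slices=7):
    """Emulate a float64 dot product using only exact integer dot
    products of 8-bit slices, then scale and accumulate."""
    A = to_int8_slices(a, num_slices)
    B = to_int8_slices(b, num_slices)
    total = 0.0
    for i, ai in enumerate(A):
        for j, bj in enumerate(B):
            # each partial product is an exact small-integer dot product:
            # on Blackwell this is the work offloaded to INT8 tensor cores
            partial = int(np.dot(ai, bj))
            total += partial * (256.0 ** -(i + j + 2))
    return total
```

With seven slices per operand, the recombined result carries more mantissa bits than native float64, which is why such schemes can match (or exceed) native FP64 accuracy while running on integer hardware.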