Multi-GPU Communication Libraries for Scaling HPC and AI Workloads

Jiri Kraus

NVIDIA GTC 2025 · Session

In this talk from NVIDIA GTC, Jiri Kraus, a Principal Developer Technology Engineer at NVIDIA, examines the role of multi-GPU communication libraries in meeting the escalating demands of High-Performance Computing (HPC) and Artificial Intelligence (AI) workloads. The presentation shows how the **Message Passing Interface (MPI)**, the **NVIDIA Collective Communications Library (NCCL)**, and **NVSHMEM** (NVIDIA's GPU implementation of the OpenSHMEM programming model) serve as foundational technologies for orchestrating data movement across multiple GPUs, both within a single node and across distributed systems. Kraus situates these libraries within the broader multi-GPU programming landscape, emphasizing how each balances generality (the range of algorithms it can express) against developer productivity.
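To make the kind of data movement these libraries orchestrate concrete, here is a minimal sketch (not taken from the presentation) of a single-process NCCL all-reduce across all visible GPUs, the collective pattern NCCL is built around. The error-check macros, buffer size, and build line are illustrative choices; only documented NCCL and CUDA runtime calls are used.

```c
// Minimal single-process NCCL all-reduce across all visible GPUs.
// Illustrative build line: nvcc nccl_allreduce.c -lnccl -o nccl_allreduce
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <nccl.h>

#define CHECK_CUDA(cmd) do { cudaError_t e = (cmd); if (e != cudaSuccess) { \
  fprintf(stderr, "CUDA error %s:%d: %s\n", __FILE__, __LINE__, \
          cudaGetErrorString(e)); exit(1); } } while (0)
#define CHECK_NCCL(cmd) do { ncclResult_t r = (cmd); if (r != ncclSuccess) { \
  fprintf(stderr, "NCCL error %s:%d: %s\n", __FILE__, __LINE__, \
          ncclGetErrorString(r)); exit(1); } } while (0)

int main(void) {
  int ndev = 0;
  CHECK_CUDA(cudaGetDeviceCount(&ndev));
  if (ndev < 1) { fprintf(stderr, "no CUDA devices found\n"); return 1; }

  const size_t count = 1 << 20;  // elements per GPU (arbitrary demo size)
  float **sendbuf = (float **)malloc(ndev * sizeof(float *));
  float **recvbuf = (float **)malloc(ndev * sizeof(float *));
  cudaStream_t *streams = (cudaStream_t *)malloc(ndev * sizeof(cudaStream_t));
  ncclComm_t *comms = (ncclComm_t *)malloc(ndev * sizeof(ncclComm_t));

  // One communicator per GPU, all owned by this single process
  // (NULL device list means devices 0..ndev-1).
  CHECK_NCCL(ncclCommInitAll(comms, ndev, NULL));

  for (int i = 0; i < ndev; ++i) {
    CHECK_CUDA(cudaSetDevice(i));
    CHECK_CUDA(cudaMalloc((void **)&sendbuf[i], count * sizeof(float)));
    CHECK_CUDA(cudaMalloc((void **)&recvbuf[i], count * sizeof(float)));
    // Byte-pattern fill; the values are irrelevant for the demo.
    CHECK_CUDA(cudaMemset(sendbuf[i], 1, count * sizeof(float)));
    CHECK_CUDA(cudaStreamCreate(&streams[i]));
  }

  // Group the per-GPU calls so NCCL launches them as one collective.
  CHECK_NCCL(ncclGroupStart());
  for (int i = 0; i < ndev; ++i)
    CHECK_NCCL(ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat,
                             ncclSum, comms[i], streams[i]));
  CHECK_NCCL(ncclGroupEnd());

  // Wait for the collective to finish on every device.
  for (int i = 0; i < ndev; ++i) {
    CHECK_CUDA(cudaSetDevice(i));
    CHECK_CUDA(cudaStreamSynchronize(streams[i]));
  }

  for (int i = 0; i < ndev; ++i) {
    CHECK_CUDA(cudaSetDevice(i));
    CHECK_CUDA(cudaFree(sendbuf[i]));
    CHECK_CUDA(cudaFree(recvbuf[i]));
    CHECK_CUDA(cudaStreamDestroy(streams[i]));
    ncclCommDestroy(comms[i]);
  }
  free(sendbuf); free(recvbuf); free(streams); free(comms);
  printf("all-reduce across %d GPU(s) complete\n", ndev);
  return 0;
}
```

The same all-reduce could be expressed with `MPI_Allreduce` over CUDA-aware MPI or with NVSHMEM's symmetric memory operations; the trade-offs between those expressions are exactly the generality-versus-productivity balance the talk addresses.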

AI review

A competent, technically honest survey of NVIDIA's multi-GPU communication stack — MPI, NCCL, and NVSHMEM — with real benchmark results from production HPC codes. The speaker clearly knows this material from the inside, and the three-way comparison across ICON, VASP, and QUDA is useful. But this is primarily a vendor talk that documents existing NVIDIA technologies rather than advancing how engineers think about the problem. The engineering is real; the novelty is low.

Watch on YouTube