Extreme PyTorch: Inside the Most Demanding ML Workloads—and the Open Challenges in Building AI Agents to Democratize Them
Soumith Chintala
Conference on Machine Learning and Systems 2025 · Day 1 · Invited Talk
At MLSys 2025, Soumith Chintala, a Scientist-Engineer at Meta and NYU, gave an invited talk on **PyTorch** and the most demanding machine learning workloads it serves. The talk, assembled with contributions from many Meta colleagues, set out to explain why PyTorch, despite its simple "Hello World" interface, has grown into a multi-million-line codebase shipping as a multi-gigabyte binary. Chintala laid out the core challenges and design philosophy driving PyTorch's development, particularly for extreme-scale AI training such as the **Llama** series of large language models.
AI review
Soumith Chintala is one of the few people on earth who could give this talk with genuine authority, and the content is legitimately interesting — the internals of Llama training at 100k GPUs, the SPMD breakdown, the checkpointing wins, KernelBot. But the write-up reads like a well-structured press release about PyTorch rather than an engineering talk you can act on. The ratio of named things to explained things is too high, and the stuff that would actually change how I think — the Pathways-style controller model, the MAAS scheduler design, the NCCL flight recorder mechanics — gets…