An AI Stack: From Scaling AI Workloads to Evaluating LLMs
Ion Stoica
Conference on Machine Learning and Systems 2025 · Day 2 · Invited Talk
In this talk at MLSys 2025, Professor Ion Stoica of UC Berkeley traced the evolution of the AI/ML stack through three open-source projects he has been deeply involved with: **Ray**, **vLLM**, and **Chatbot Arena**. Together, these projects address critical challenges in scaling AI workloads, optimizing large language model (LLM) inference, and establishing reliable evaluation methodologies for generative AI. Stoica framed them as a cohesive "AI stack": Ray provides the foundational distributed compute framework, vLLM (and its sibling SGLang) delivers high-performance LLM serving, and Chatbot Arena offers a dynamic, human-preference-driven approach to model evaluation.
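The "human-preference-driven" evaluation in Chatbot Arena works by collecting votes from head-to-head model comparisons and aggregating them into ratings. The sketch below illustrates that idea with a simple online Elo update; this is a simplification for intuition (Arena's published methodology fits a Bradley-Terry model over all battles), and the model names and battle log are made up.

```python
# Aggregate pairwise human preferences into ratings, Elo-style.
# Simplified illustration of preference-based leaderboards; Chatbot
# Arena itself uses a Bradley-Terry fit rather than online Elo.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo/logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Shift both ratings toward the observed pairwise preference."""
    e_w = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1.0 - e_w)
    ratings[loser] -= k * (1.0 - e_w)

# Hypothetical battle log: (winner, loser) votes from human raters.
battles = [("model-a", "model-b")] * 7 + [("model-b", "model-a")] * 3
ratings = {"model-a": 1000.0, "model-b": 1000.0}
for w, l in battles:
    update(ratings, w, l)

# model-a, preferred in 7 of 10 battles, ends up rated higher.
print(sorted(ratings, key=ratings.get, reverse=True))
```

The key property is that ratings emerge from live human judgments rather than a fixed benchmark, which is what lets the leaderboard track a fast-moving field.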
AI review
Ion Stoica covers three genuinely important projects — Ray, vLLM/SGLang, and Chatbot Arena — and the underlying engineering on all three is real and battle-tested. PagedAttention in particular is one of the most consequential systems ideas in LLM infrastructure of the last few years, and Stoica was there. But this talk reads more like a retrospective overview than a deep technical session: the implementation details are present but shallow, and anyone already familiar with these systems won't find much new. It's a strong 'greatest hits' talk, not a 'here's what we learned that changes how…
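For readers unfamiliar with why PagedAttention matters: it manages the KV cache like virtual memory, storing each sequence's cache in fixed-size blocks tracked by a per-sequence block table, so memory is allocated on demand instead of reserving a contiguous max-length buffer per request. The toy allocator below sketches that idea; the class and names are illustrative, not vLLM's actual internals.

```python
# Toy paged KV-cache allocator in the spirit of vLLM's PagedAttention.
# Physical blocks are grabbed from a shared pool only when a sequence
# crosses a block boundary, and returned when the sequence finishes.

BLOCK_SIZE = 16  # tokens per physical block (illustrative choice)

class PagedKVCache:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))  # shared physical pool
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> tokens stored so far

    def append_token(self, seq_id: str) -> None:
        """Record one token's KV entry, allocating a new physical
        block only when the current block is full."""
        n = self.seq_lens.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:  # first token, or current block full
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1

    def free(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(20):              # 20 tokens -> ceil(20 / 16) = 2 blocks
    cache.append_token("req-0")
print(len(cache.block_tables["req-0"]))  # 2 blocks, not a max-length buffer
```

Because unused capacity is never reserved, many more concurrent requests fit in the same GPU memory, which is the source of vLLM's throughput gains.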