Photon: Federated LLM Pre-Training

Lorenzo Sani, Alex Iacob, Zeyu Cao, Royson Lee, Nicholas D. Lane

Conference on Machine Learning and Systems 2025 · Day 4 · Session 11: Federated Learning

This article covers **Photon**, a system for **federated LLM pre-training**, as presented by Lorenzo Sani and his collaborators from Flower Labs, the Machine Learning Systems Group at the University of Cambridge, and visiting researchers from BUPT and Zhejiang University. The talk argues for a departure from conventional large language model (LLM) training: rather than relying on a monolithic, tightly coupled data center, Photon pre-trains across globally distributed compute and decentralized data sources at internet scale, targeting the escalating challenges of compute infrastructure, training robustness, and data accessibility.
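To make the training pattern concrete, here is a minimal sketch of the federated-averaging loop that underlies systems like Photon: stateless clients pull the current global weights, run a burst of local steps on their own data, and return only a weight delta for the server to aggregate. The toy model and all helper names are illustrative assumptions, not Photon's actual API.

```python
import torch
import torch.nn.functional as F


def make_model():
    # Toy stand-in for the LLM being pre-trained; the pattern is what matters.
    return torch.nn.Linear(32, 4)


def client_round(global_state, data, local_steps=8, lr=0.1):
    """Stateless client: load the global weights, run local SGD, return a delta."""
    model = make_model()
    model.load_state_dict(global_state)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    x, y = data
    for _ in range(local_steps):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    # Pseudo-gradient: how far local training moved away from the global model.
    return {k: (v - global_state[k]).detach() for k, v in model.state_dict().items()}


def server_round(global_state, deltas, server_lr=1.0):
    """Server step: average the client deltas into the global model."""
    return {
        k: global_state[k] + server_lr * torch.stack([d[k] for d in deltas]).mean(0)
        for k in global_state
    }


# One federated round over two simulated clients with random toy data.
state = make_model().state_dict()
clients = [(torch.randn(16, 32), torch.randint(0, 4, (16,))) for _ in range(2)]
state = server_round(state, [client_round(state, c) for c in clients])
```

Because each client receives everything it needs at the start of a round and keeps no persistent state, the server can hand the same round to any available worker, which is what decouples training from any single machine or data center.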

AI review

Photon presents a genuinely interesting systems contribution: federated pre-training of LLMs with real throughput numbers on real hardware. The article, however, reads more like a polished abstract than an engineering report. The headline results (2x throughput at 13B parameters, 90% GPU utilization, a fault-tolerant stateless client design) are credible and worth attention, but the implementation is described mostly at the level of design principles rather than reproducible engineering. Good enough to follow up on, not yet enough to build from.
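The fault-tolerance claim follows directly from the stateless design sketched above: a crashed or preempted worker holds nothing the server cannot recreate, so a round can simply proceed without it. A hedged illustration, reusing the hypothetical helpers from the earlier sketch:

```python
def fault_tolerant_round(state, clients, min_deltas=1):
    """Drop crashed workers and aggregate whoever finished (illustrative only)."""
    deltas = []
    for data in clients:
        try:
            deltas.append(client_round(state, data))  # hypothetical helper above
        except RuntimeError:
            # A failed worker holds no state the server needs, so the
            # round simply proceeds without its contribution.
            continue
    if len(deltas) >= min_deltas:
        state = server_round(state, deltas)
    return state
```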