rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Xinyu Guan, Li Lyna Zhang, Yifei Liu, Ning Shang, Youran Sun, Yi Zhu, Fan Yang, Mao Yang

International Conference on Machine Learning 2025 · Oral

The talk introduces **rStar-Math**, a framework that equips small LLMs with advanced mathematical reasoning through self-evolved deep thinking. The work addresses a central difficulty for autoregressive LLMs: complex, multi-step reasoning, particularly in mathematics. Where human cognition can switch to a deliberate, "System 2" mode for hard problems, LLMs typically generate answers in a fast, "System 1" manner that is prone to errors and hallucinations, a serious drawback when precision is paramount, as in mathematical problem solving.

AI review

rStar-Math is a competent engineering system that combines MCTS, code-execution filtering, and preference-based reward modeling to bootstrap math reasoning in small LLMs. The empirical results are noteworthy — a 7B model approaching o1-class performance is a real number worth paying attention to — but the talk, as described, offers no theoretical grounding for why this works, presents no controlled ablations that isolate the contribution of each component, and makes benchmark comparisons against a moving and poorly characterized target. The "self-evolved deep thinking" framing is rhetoric…
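The pipeline the review names — MCTS guided by a reward signal, with code execution used to filter candidate reasoning steps — can be sketched roughly as follows. This is a toy illustration under assumed simplifications, not rStar-Math's actual implementation; every class, function, and candidate step here is invented for exposition.

```python
import math
import random

random.seed(0)

class Node:
    """One candidate reasoning step (a Python snippet) in the search tree."""
    def __init__(self, code, parent=None):
        self.code = code          # code-augmented chain-of-thought step
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0          # accumulated reward from rollouts

def executes_cleanly(code):
    """Code-execution filtering: keep only steps that run without error."""
    try:
        exec(code, {})
        return True
    except Exception:
        return False

def ucb(child, parent_visits, c=1.4):
    """Upper-confidence bound used to balance exploration and exploitation."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(
        math.log(parent_visits) / child.visits)

def mcts(root, propose_steps, reward, iterations=20):
    for _ in range(iterations):
        # 1. Selection: descend the tree by UCB.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch: ucb(ch, node.visits))
        # 2. Expansion: sample candidate steps, keeping only ones that execute.
        for code in propose_steps(node):
            if executes_cleanly(code):
                node.children.append(Node(code, parent=node))
        # 3. Evaluation and backpropagation: score a leaf, propagate upward.
        leaf = random.choice(node.children) if node.children else node
        r = reward(leaf)
        while leaf is not None:
            leaf.visits += 1
            leaf.value += r
            leaf = leaf.parent
    return max(root.children, key=lambda ch: ch.visits) if root.children else None

# Toy demo: two candidate steps for "3 * 4 + 5"; the broken one is filtered out.
def propose(node):
    return ["ans = 3 * 4 + 5", "ans = 3 * 4 +"]   # second raises SyntaxError

best = mcts(Node("start"), propose,
            reward=lambda n: 1.0 if "ans" in n.code else 0.0)
print(best.code)  # only the step that executed cleanly survives
```

In the real system, `propose_steps` would be the small LLM sampling code-augmented steps, and `reward` would be the learned process preference model rather than a keyword check; the structural point is only that execution failure prunes a branch before the reward model ever scores it.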