CodeIO: Condensing Reasoning Patterns via Code Input-Output Prediction
Junlong Li, Daya Guo, Dejian Yang, Runxin Xu, Yu Wu, Junxian He
International Conference on Machine Learning 2025 · Oral
The talk introduces **CodeIO**, a novel framework designed to enhance the general reasoning capabilities of large language models (LLMs) by leveraging "code in the wild." Presented by a proxy on behalf of the authors, including first author Junlong Li, CodeIO addresses a critical imbalance in current LLMs: they are disproportionately strong at mathematical and code-generation reasoning, thanks to the abundance of high-quality, structured datasets in those domains (e.g., Art of Problem Solving, LeetCode), while reasoning in other crucial areas, such as logical, symbolic, scientific, and natural language understanding, remains underdeveloped because of data scarcity.
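The core data-construction idea can be illustrated with a toy sketch: take an ordinary function, execute it to obtain ground-truth input-output pairs, and turn each pair into two prediction tasks (predict the output from the input, or a feasible input from the output). The function, field names, and task format below are illustrative assumptions, not the paper's actual pipeline:

```python
# Hypothetical sketch of CodeIO-style task construction. The function,
# dict schema, and task names are illustrative, not from the paper.

def count_inversions(nums):
    """Count pairs (i, j) with i < j and nums[i] > nums[j]."""
    return sum(
        1
        for i in range(len(nums))
        for j in range(i + 1, len(nums))
        if nums[i] > nums[j]
    )

def make_io_tasks(func, sample_input):
    """Build an output-prediction and an input-prediction task for one function."""
    expected_output = func(sample_input)  # ground truth obtained by execution
    output_task = {
        "task": "predict_output",     # model sees code + input, predicts output
        "code": func.__name__,
        "input": sample_input,
        "answer": expected_output,
    }
    input_task = {
        "task": "predict_input",      # model sees code + output, predicts an input
        "code": func.__name__,
        "output": expected_output,
        "answer": sample_input,       # one feasible input, verifiable by re-execution
    }
    return output_task, input_task

out_task, in_task = make_io_tasks(count_inversions, [3, 1, 2])
print(out_task["answer"])  # prints 2: the inversions are (3,1) and (3,2)
```

Because the answer is obtained by actually running the code, candidate model responses can be verified by re-execution, which is what makes such data cheap to scale and check.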
AI review
CodeIO presents a competent and clearly motivated data construction pipeline for improving general reasoning in LLMs by training models to predict code inputs and outputs in natural language chain-of-thought. The empirical results across 14 benchmarks are reasonably broad and the ablations are well-structured. However, the paper's core theoretical claims — that code I/O prediction "condenses reasoning patterns" and transfers them across domains — remain at the level of motivated intuition rather than formal characterization. The contribution is primarily an engineering framework with strong…