Machine Learning meets Algebraic Combinatorics: A Suite of Datasets Capturing Research-level Conjecturing Ability in Pure Mathematics

Herman Chau, Helen Jenne, Davis Brown, Jesse He, Mark Raugas, Sara Billey, Henry Kvinge

International Conference on Machine Learning 2025 · Oral

This talk, presented by Herman Chau, introduces a collection of datasets designed to advance the application of AI in pure mathematics, focusing on aspects of mathematical discovery that are often overlooked: intuition, exploration, and conjecture generation. Developed by a large interdisciplinary team of AI researchers and mathematicians, the initiative addresses gaps in existing AI-for-math datasets, which predominantly focus on known mathematics and proof generation. The suite of nine datasets, rooted in **algebraic combinatorics**, provides machine learning models with raw mathematical data corresponding to both open problems and foundational results, with the aim of enabling AI to assist mathematicians in forming new hypotheses and uncovering hidden patterns.

AI review

A well-motivated dataset contribution that targets a genuine gap, the conjecture-formation phase of mathematical research, and is grounded in algebraic combinatorics. The work is honest about what it is: infrastructure, not theory. The headline result (99.8% accuracy on Schubert structure constants) is intriguing but sits closer to a proof of concept than a mathematical finding. The real contribution is the dataset suite itself, and its value will depend almost entirely on whether the community can extract interpretable mathematical content from the models that succeed on it. Solid infrastructure…