Position: Principles of Animal Cognition to Improve LLM Evaluations

Sunayana Rane, Cyrus Kirkman, Graham Todd, Amanda Royka, Ryan Law, Erica Cartmill, Jacob Foster

International Conference on Machine Learning 2025 · Oral

As large language models (LLMs) exhibit increasingly sophisticated and emergent behaviors, evaluating their actual cognitive capabilities remains difficult. This talk, presented by Sunayana Rane at ICML 2025, proposes a framework for LLM evaluation that draws on decades of research in **animal cognition**. It argues that animal cognition researchers have long grappled with the same problems: anthropomorphism, confounding variables, and the distinction between surface behavior and underlying cognitive mechanisms.

AI review

A competent interdisciplinary position paper that imports five evaluation principles from animal cognition into LLM assessment, supported by a small empirical case study on transitive inference. The framing is intellectually honest and the Clever Hans analogy is apt. The core contribution — that LLM benchmark performance on linguistic tasks may not reflect the underlying cognitive mechanism the task was designed to probe — is genuine and worth saying clearly. However, the theoretical contribution is limited: these principles are largely a repackaging of well-understood ideas in behavioral…