Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards

Yangsibo Huang, Milad Nasr, Anastasios Angelopoulos, Nicholas Carlini, Wei-Lin Chiang, Christopher A. Choquette-Choo, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Ken Ziyu Liu, Ion Stoica, Florian Tramèr, Chiyuan Zhang

International Conference on Machine Learning 2025 · Oral

This talk, presented by Christopher A. Choquette-Choo on behalf of lead author Yangsibo Huang and a large team of collaborators, examines vulnerabilities in **voting-based benchmarks**, a rapidly growing paradigm for evaluating large language models (LLMs). Traditional static benchmarks such as GSM8K and MMLU suffer from limited diversity, small example sets, and data-contamination risks because their questions are fixed and published. Voting-based platforms offer an interactive, online alternative in which users directly compare and rank model responses. The most prominent example, **Chatbot Arena** (also known as **LMSYS Arena**), has collected over 3.5 million votes to date and relies heavily on model anonymity to keep evaluations unbiased. This work demonstrates practical methods for breaking that anonymity and then manipulating model rankings, posing significant challenges to the integrity of these popular evaluation systems.
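To make the de-anonymization idea concrete, here is a minimal sketch of a stylistic classifier: logistic regression over bag-of-words features of model responses, which fingerprints which anonymized model wrote a new reply. The model names, example responses, and the stylistic tic they contain are all invented for illustration; the paper's actual attack setup is more elaborate.

```python
import math
import re

# Hypothetical responses from two anonymized models. The stylistic tic
# ("Certainly!" vs "Sure,") is the fingerprint the classifier learns.
MODEL_A = [
    "Certainly! The answer is 4.",
    "Certainly! Paris is the capital.",
    "Certainly! Water boils at 100 C.",
]
MODEL_B = [
    "Sure, the answer is 4.",
    "Sure, Paris is the capital.",
    "Sure, water boils at 100 degrees.",
]

def tokenize(text):
    return re.findall(r"[a-z!,]+", text.lower())

VOCAB = sorted({w for t in MODEL_A + MODEL_B for w in tokenize(t)})

def bow(text):
    # Bag-of-words count vector over the shared vocabulary.
    toks = tokenize(text)
    return [toks.count(w) for w in VOCAB]

X = [bow(t) for t in MODEL_A + MODEL_B]
y = [0] * len(MODEL_A) + [1] * len(MODEL_B)

# Logistic regression trained by plain SGD (no ML libraries needed).
w = [0.0] * len(VOCAB)
b = 0.0
for _ in range(500):
    for xi, yi in zip(X, y):
        z = sum(wi * v for wi, v in zip(w, xi)) + b
        grad = 1 / (1 + math.exp(-z)) - yi  # dLoss/dz for log loss
        w = [wi - 0.1 * grad * v for wi, v in zip(w, xi)]
        b -= 0.1 * grad

def deanonymize(text):
    """Guess which anonymized model wrote `text`."""
    z = sum(wi * v for wi, v in zip(w, bow(text))) + b
    return "model_B" if z > 0 else "model_A"
```

Even this toy classifier separates the two "models" from a single unseen response, which is the core of why response-level anonymity is fragile.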

AI review

A competent and practically important security audit of voting-based LLM evaluation platforms, with clear empirical findings and honest enumeration of tradeoffs. The work is well-motivated and addresses a real problem that the community should care about, but it operates primarily in the security/measurement space rather than the theoretical ML space, and its techniques — logistic regression on bag-of-words, Elo simulation, cost modeling — are standard. The headline results are credible and useful, but the work closes a specific concern rather than opening a new theoretical direction.
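The Elo-simulation angle the review mentions can be illustrated with a toy experiment (not the paper's methodology): two equally strong models compete, and some fraction of ballots is cast adversarially for one of them. The K-factor, vote counts, and adversarial fraction below are invented for illustration only.

```python
import random

def elo_update(r_a, r_b, a_wins, k=4):
    """One standard Elo update from a single pairwise vote."""
    e_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))  # expected score of A
    delta = k * ((1.0 if a_wins else 0.0) - e_a)
    return r_a + delta, r_b - delta

def simulate(n_votes, true_win_prob, adv_frac, seed=0):
    """Rating gap (A minus B) after n_votes, where a fraction
    adv_frac of ballots always favours model A regardless of quality."""
    rng = random.Random(seed)
    r_a = r_b = 1000.0
    for _ in range(n_votes):
        if rng.random() < adv_frac:
            a_wins = True  # rigged ballot for the target model
        else:
            a_wins = rng.random() < true_win_prob  # honest, noisy vote
        r_a, r_b = elo_update(r_a, r_b, a_wins)
    return r_a - r_b

# Equally matched models (true win prob 0.5): honest voting keeps the
# gap near zero, while 30% adversarial ballots open a large rating gap.
honest = simulate(10_000, 0.5, adv_frac=0.0)
rigged = simulate(10_000, 0.5, adv_frac=0.3)
```

The equilibrium gap under adversarial voting is easy to check analytically: with adversarial fraction f, model A's long-run expected score is f + (1 - f) * 0.5, and the Elo gap settles where the expected-score curve matches it, independent of K.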