Strategy Coopetition Explains the Emergence and Transience of In-Context Learning

Aaditya Singh, Ted Moskovitz, Sara Dragutinović, Felix Hill, Stephanie Chan, Andrew Saxe

International Conference on Machine Learning 2025 · Oral

The talk "Strategy Coopetition Explains the Emergence and Transience of In-Context Learning" by Aaditya Singh and collaborators examines one of the most intriguing and foundational phenomena in large language models (LLMs): **in-context learning (ICL)**. ICL, defined as a transformer's ability to adapt its behavior by learning from inputs at test time, emerged prominently with models like **GPT-3** and is a cornerstone of their remarkable versatility. This capability is often contrasted with **in-weights learning (IWL)**, where knowledge is encoded directly into the model's parameters during pre-training. A central puzzle in the field is not just why ICL emerges when models are trained only for next-token prediction, but also why this powerful ability has been observed to be transient, fading or weakening with extended training.

AI review

Singh et al. offer a carefully designed mechanistic study of in-context learning transience in small transformers, introducing the useful construct of Context-Constrained In-Weights Learning (CIWL) and the "strategy coopetition" framework. The layer-patching experiments are clean, and the identification of CIWL as the asymptotic strategy is a genuine empirical contribution. The toy model is a nice touch. But the theoretical backbone is thin: "coopetition" is a narrative label more than a formal framework, the toy model's gradient dynamics are not analyzed in any rigorous sense, and the…