LASSP & AEP Seminar: Gautam Reddy (Princeton)
In-Context Learning: Insights from the Analysis of Tiny Transformers
Transformer models pretrained on large amounts of language data display a powerful feature known as in-context learning: the ability to parse new information presented in the context with no additional weight updates. In-context learning contrasts with traditional weight-based learning paradigms in neuroscience, which usually involve learning rules designed to solve specific problems, and it motivates new models of rapid learning. In this talk, I will present a detailed, quantitative analysis of small transformer models trained on simplified tasks. I will discuss how these models implement in-context learning, how this ability emerges during training, and why it appears even in scenarios where memorizing the dataset is optimal.
Bio:
Gautam Reddy is an assistant professor of physics at Princeton University. He studied engineering physics at the Indian Institute of Technology Bombay, received his Ph.D. in physics from the University of California, San Diego, and then served as an NSF-Simons Fellow at Harvard and as a research scientist at NTT Research’s Physics and Informatics Labs. Drawing upon a diverse set of problems in neuroscience, evolution, and machine learning, his research focuses on understanding how living and artificial systems process high-dimensional information to solve goal-oriented tasks. He runs the Reddy Lab, which develops novel physics-inspired theory and tools to build phenomenological models of learning and decision-making in collaboration with experimental biologists, and uses machine learning models as ‘experimental systems’ to motivate new theory.