
Cog Lunch: Cheng Tang
Description
Zoom Link: https://mit.zoom.us/j/99672193351
Speaker: Cheng Tang
Affiliation: Jazayeri lab, 4th-year PhD candidate (systems neuroscience)
Title: An explainable transformer circuit for compositional generalization
Abstract: Compositional generalization, the systematic combination of known elements into novel ensembles, is a hallmark of human cognition, enabling flexible problem-solving beyond rote memorization. While transformer models exhibit surprising proficiency in such tasks (Lake et al., 2023), the underlying mechanisms remain poorly understood. In this case study, we reverse-engineer how a transformer achieves compositional generalization at the circuit level, focusing on a function-primitive composition task. In this task, the model infers functions from teaching examples (e.g., interpreting “apple kiki → apple apple” to deduce that “kiki” means double) and generalizes them to new primitives (e.g., applying “kiki” to “tree” to produce “tree tree”). Our trained transformer achieves high test accuracy (~98%), demonstrating robust generalization.

In the first half of the presentation, I will introduce the basics of the transformer architecture and give an intuitive account of how attention operations route information between tokens using a slot-like data structure. I will then present the human-interpretable algorithm implemented by the model, walk through the circuit-discovery procedure, and highlight the correspondence between attention heads and the algorithm's steps. Lastly, I will show causal perturbation experiments that validate the reverse-engineered circuit. This presentation aims to demystify the black-box impression of transformers for a neuroscience audience and to invite discussion on the interplay between model understanding and model control.
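
To make the task structure concrete, here is a minimal, illustrative sketch of how function-primitive episodes of the kind described above could be generated. The vocabulary, the function set (including the contrast word "dax"), and the episode format are assumptions invented for illustration, not the actual dataset from the study.

    import random

    PRIMITIVES = ["apple", "tree", "rock", "bird"]
    FUNCTIONS = {
        "kiki": lambda w: [w, w],   # "kiki" doubles its argument, per the abstract
        "dax": lambda w: [w],       # hypothetical contrast function (identity)
    }

    def make_episode(rng):
        """One episode: a teaching example plus a query with a new primitive."""
        fname, f = rng.choice(list(FUNCTIONS.items()))
        taught, new = rng.sample(PRIMITIVES, 2)
        # Teaching example shows the function applied to one primitive...
        teaching = (f"{taught} {fname}", " ".join(f(taught)))  # "apple kiki" -> "apple apple"
        # ...and the query asks for the same function on an unseen primitive.
        query = (f"{new} {fname}", " ".join(f(new)))           # "tree kiki" -> "tree tree"
        return teaching, query

    rng = random.Random(0)
    print(make_episode(rng))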
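The slot-like routing picture from the first half of the talk corresponds to the standard attention computation: each token occupies a slot (its residual-stream vector), and the softmax attention pattern determines which slots' values are copied where. A minimal single-head NumPy version:

    import numpy as np

    def attention(X, Wq, Wk, Wv):
        """One attention head: tokens in X read from each other's slots."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])        # query-key affinities
        A = np.exp(scores - scores.max(-1, keepdims=True))
        A /= A.sum(-1, keepdims=True)                  # softmax over source tokens
        return A @ V, A                                # routed values, attention pattern

    rng = np.random.default_rng(0)
    d = 8
    X = rng.normal(size=(4, d))                        # 4 token slots of width d
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    out, pattern = attention(X, Wq, Wk, Wv)
    print(pattern.round(2))                            # row i: where token i reads from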
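The abstract does not specify which causal perturbation method is used; activation patching is one common recipe in circuit analysis, and the toy two-head "model" below is invented purely to illustrate its patch-and-measure logic: cache an activation from a corrupted run, overwrite it during a clean run, and measure the change in the output.

    import numpy as np

    rng = np.random.default_rng(1)
    d = 8
    # Two random attention heads stand in for a trained model's components.
    heads = [tuple(rng.normal(size=(d, d)) for _ in range(3)) for _ in range(2)]

    def head_out(X, Wq, Wk, Wv):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = np.exp(Q @ K.T / np.sqrt(d))
        A /= A.sum(-1, keepdims=True)
        return A @ V

    def model(X, patch=None):
        """Sum of head outputs; `patch` maps head index -> cached activation."""
        outs = [head_out(X, *w) for w in heads]
        for i, cached in (patch or {}).items():
            outs[i] = cached              # overwrite with the corrupted-run value
        return sum(outs).sum()            # scalar stand-in for an answer logit

    clean = rng.normal(size=(4, d))       # e.g., embedding of the "kiki" prompt
    corrupt = rng.normal(size=(4, d))     # e.g., the function word swapped out
    cached = {0: head_out(corrupt, *heads[0])}
    effect = model(clean) - model(clean, patch=cached)
    print(f"patching head 0 shifts the output by {effect:.3f}")

A head whose patched-in corrupted activation strongly shifts the answer is a candidate causal participant in the circuit; heads with negligible effect can be ruled out.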