CogLunch: Chengxu Zhuang "Towards a More Human-like Language Learning Strategy through Lexicon-level Contrastive Visual Grounding"
Description
Speaker: Chengxu Zhuang
Affiliation: ICoN Postdoctoral Fellow, MIT
Title: Towards a More Human-like Language Learning Strategy through Lexicon-level Contrastive Visual Grounding
Abstract:
Today’s most accurate language models are trained on orders of magnitude more language data than human language learners receive—but with no supervision from other sensory modalities that play a crucial role in human learning. Can we make language models' representations and predictions more accurate and human-like with more ecologically plausible supervision?
In this talk, I will first demonstrate that existing vision-grounded language models yield only limited improvements in language learning. This is illustrated by training a diverse set of language learning algorithms, both with and without visual supervision, and evaluating them on multiple word-level semantic and syntactic benchmarks.
I will then present a new language learning algorithm we have developed to better leverage visual grounding. This algorithm, LexiContrastive Grounding (LCG), integrates a cross-modality contrastive learning objective on lexicon-level representations with a next-word prediction objective. Beyond the typical grounded-only learning scenario, LCG also outperforms existing algorithms on multiple benchmarks in a more child-like learning scenario that mixes grounded and ungrounded learning.
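To make the combined objective concrete, below is a minimal sketch, not the speaker's actual implementation, of how a lexicon-level cross-modal contrastive loss might be mixed with a standard next-word prediction loss. The model, loss names, and the weighting term are illustrative assumptions.

# Sketch of mixing a lexicon-level (word-level) cross-modal contrastive
# objective with next-word prediction, in the spirit of the LCG idea above.
# All module names and hyperparameters are illustrative, not the talk's code.
import torch
import torch.nn.functional as F
from torch import nn

class ToyGroundedLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128, img_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.lm = nn.GRU(d_model, d_model, batch_first=True)  # stand-in for a Transformer LM
        self.lm_head = nn.Linear(d_model, vocab_size)
        self.img_proj = nn.Linear(img_dim, d_model)  # map image features into the word space

    def forward(self, tokens):
        hidden, _ = self.lm(self.embed(tokens))
        return hidden, self.lm_head(hidden)

def next_word_loss(logits, tokens):
    # Standard causal LM loss: predict token t+1 from positions up to t.
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        tokens[:, 1:].reshape(-1),
    )

def lexicon_contrastive_loss(word_reps, img_reps, temperature=0.07):
    # InfoNCE between word-level representations and the image features of
    # the scene each word occurred in; matched pairs lie on the diagonal.
    w = F.normalize(word_reps, dim=-1)
    v = F.normalize(img_reps, dim=-1)
    sims = w @ v.t() / temperature
    targets = torch.arange(sims.size(0), device=sims.device)
    return 0.5 * (F.cross_entropy(sims, targets) + F.cross_entropy(sims.t(), targets))

# Toy training step mixing the two objectives (the 1.0 weight is a guess).
model = ToyGroundedLM()
tokens = torch.randint(0, 1000, (8, 12))         # batch of token sequences
img_feats = torch.randn(8, 512)                  # one image per utterance
grounded_word_idx = torch.randint(0, 12, (8,))   # which word each image grounds

hidden, logits = model(tokens)
word_reps = hidden[torch.arange(8), grounded_word_idx]  # lexicon-level representations
loss = next_word_loss(logits, tokens) + 1.0 * lexicon_contrastive_loss(
    word_reps, model.img_proj(img_feats)
)
loss.backward()

In an ungrounded batch (text without images), only the next-word term would apply, which is one way such a mixed grounded/ungrounded training regime could be handled.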
Finally, I will discuss preliminary results showing that LCG yields improvements in modeling the neuronal responses of a multimodal cortical area. I will also discuss next steps in modeling children’s language learning behaviors and brain responses.
Location: 46-3310