Cog Lunch: Selina Guter & Yudi Xie
Description
Zoom link: https://mit.zoom.us/j/2711902511
***
Speaker: Selina Guter
Title: Do humans see faces as beautiful?
Abstract: My paper argues that humans see facial beauty, meaning that human face perception is rich. I discuss aftereffect studies as empirical evidence that impressions of facial beauty are perceptual, and then argue that those aftereffects are not well explained as aftereffects on low-level correlates of facial beauty. To this end, I clarify the distinction between “low-level” and “high-level” properties. Two interpretations are possible: properties can be “low-level” either because they are represented at early stages of visual processing or because they have superior causal power in triggering visual responses. I show that aftereffects on impressions of facial beauty cannot be explained as aftereffects on a low-level correlate of facial beauty under either interpretation of the term.
***
Speaker: Yudi Xie
Title: Vision models trained to estimate spatial latents learned similar ventral-stream-aligned representations
Abstract: Studies of the functional role of the primate ventral visual stream have traditionally focused on object categorization, often ignoring -- despite much prior evidence -- its role in estimating "spatial" latents such as object position and pose. Most leading ventral stream models are derived by optimizing networks for object categorization, which seems to imply that the ventral stream itself is optimized for such an objective. Here, we explore an alternative hypothesis: Might the ventral stream be optimized for estimating spatial latents? And a closely related question: How different -- if at all -- are the representations learned from spatial latent estimation compared to categorization? To address these questions, we leveraged synthetic image datasets generated by a 3D graphics engine and trained convolutional neural networks (CNNs) to estimate different combinations of spatial and category latents. We found that models trained to estimate just a few spatial latents achieve neural alignment scores comparable to those of models trained on hundreds of categories, and that models' spatial latent performance correlates strongly with their neural alignment. Models trained on spatial latents and models trained on categories develop very similar -- but not identical -- internal representations, especially in their early and middle layers. We provide evidence that this convergence is partly driven by non-target latent variability in the training data, which facilitates the implicit learning of representations of those non-target latents. Taken together, these results suggest that many training objectives, including spatial latent estimation, can yield models similarly aligned with the ventral stream. Thus, one should not assume that the ventral stream is optimized for object categorization alone. As a field, we need to continue sharpening our measures for comparing models to brains to better understand the functional roles of the ventral stream.
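For readers curious about the kind of training objective described above, here is a minimal, hypothetical sketch: a small CNN regressing a few spatial latents (e.g., object x/y position and a pose angle) with a mean-squared-error loss instead of a classification loss. The architecture, latent choices, and random placeholder tensors are illustrative assumptions for this announcement, not the authors' actual pipeline, which trains on images rendered by a 3D graphics engine.

```python
# Hypothetical sketch: train a CNN to regress spatial latents rather
# than predict categories. Random tensors stand in for the synthetic
# 3D-graphics-engine images; all names and shapes are assumptions.
import torch
import torch.nn as nn

class SpatialLatentCNN(nn.Module):
    def __init__(self, n_latents: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Regression head: a few spatial latents instead of class logits.
        self.head = nn.Linear(64, n_latents)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = SpatialLatentCNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder batch: 16 "synthetic" RGB images and their spatial
# latents (x position, y position, pose angle), randomly generated.
images = torch.randn(16, 3, 64, 64)
latents = torch.randn(16, 3)

for step in range(10):
    opt.zero_grad()
    loss = loss_fn(model(images), latents)
    loss.backward()
    opt.step()
```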