Hierarchy and invariance in auditory cortical computation
Description
With ease, we recognize a friend’s voice in a crowd, or pick out the first violin in a concerto. But the effortlessness of everyday perception masks its computational challenge. Perception does not occur in the eyes and ears – indeed, nearly half of primate cortex is dedicated to it.
While much is known about peripheral auditory processing, auditory cortex remains poorly understood. This thesis addresses basic questions about the functional and computational organization of human auditory cortex through three studies.
In the first study we show that a hierarchical neural network model optimized to recognize speech and music does so at human levels, exhibits a similar pattern of behavioral errors, and predicts cortical responses, as measured with fMRI. The multi-task optimization procedure we introduce produces separate music and speech pathways after a shared front end, potentially recapitulating aspects of auditory cortical functional organization. Within the model, different layers best predict primary and non-primary voxels, revealing a hierarchical organization in human auditory cortex.
We then seek to characterize the representational transformations that occur across stages of the putative cortical hierarchy, probing for one candidate: invariance to real-world background noise. To measure invariance, we correlate voxel responses to natural sounds with and without real-world background noise. Non-primary responses are substantially more noise-invariant than primary responses. These results illustrate a representational consequence of the potential hierarchical organization of the auditory system.
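As an illustration of the invariance measure described above, the sketch below computes, for each voxel, the Pearson correlation across sounds between its responses to clean sounds and to the same sounds embedded in background noise. It is a minimal sketch and not the thesis's actual analysis pipeline: the function name `noise_invariance`, the array layout (sounds by voxels), and the synthetic data are all assumptions made for illustration, and any correction for measurement reliability is omitted.

```python
import numpy as np


def noise_invariance(clean, noisy):
    """Per-voxel Pearson correlation between responses to clean and noisy sounds.

    clean, noisy: arrays of shape (n_sounds, n_voxels), where row i holds the
    responses to sound i (hypothetical layout, assumed for this sketch).
    Returns an array of shape (n_voxels,); values near 1 indicate that the
    voxel's response pattern across sounds is largely unchanged by the noise.
    """
    clean = np.asarray(clean, dtype=float)
    noisy = np.asarray(noisy, dtype=float)
    if clean.shape != noisy.shape:
        raise ValueError("clean and noisy must have the same shape")

    # Center each voxel's responses across sounds.
    c = clean - clean.mean(axis=0)
    n = noisy - noisy.mean(axis=0)

    # Pearson r = covariance / (product of standard deviations), per voxel.
    denom = np.sqrt((c ** 2).sum(axis=0) * (n ** 2).sum(axis=0))
    return (c * n).sum(axis=0) / denom


# Synthetic example (not real fMRI data): voxel 0 is perturbed by noise,
# voxel 1 responds identically with and without it.
rng = np.random.default_rng(0)
clean = rng.normal(size=(30, 2))
noisy = clean.copy()
noisy[:, 0] += rng.normal(scale=2.0, size=30)

r = noise_invariance(clean, noisy)
```

In this toy example the second voxel is perfectly invariant by construction, so its correlation is 1 up to floating-point error, while the first voxel's correlation is lower. Comparing such per-voxel values between primary and non-primary regions is the kind of contrast the paragraph above describes.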
Lastly, we explore the generality of deep neural networks as models of human hearing by simulating many psychophysical and fMRI experiments on the neural network model described above. The results provide an extensive comparison of the performance characteristics and internal representations of a deep neural network with those of humans. We observe many similarities, which suggest that the model replicates a broad variety of aspects of auditory perception. However, we also find discrepancies that suggest targets for future modeling efforts.
The thesis can be found here: https://www.dropbox.com/s/kqyq7q2ejv777o1/kell.thesis.for-committee.pdf?dl=0