Exploring the efficacy of normalization in the acquisition and processing of Japanese vowels
Infants must learn the sound categories of their language, and adults must map each acoustic production they hear onto one of those learned categories. These tasks can be difficult because the acoustic realizations of different categories often overlap substantially, which can obscure which sounds should be grouped together. Previous work has proposed that this overlap is caused, at least in part, by systematic and predictable sources of variability, and that listeners could learn the structure of this variability and normalize it out to help them learn from and process incoming sounds. In this work, we further explore this idea of normalization by applying it to the Japanese vowel length contrast, a contrast that current computational models fail to learn because of the high overlap between short and long vowels. We find that, at least as implemented here, normalizing out systematic variability does not substantially improve categorization performance over leaving the acoustics unnormalized. We then present an alternative path forward by showing that a strategy that uses both acoustic cues and non-acoustic top-down information in categorization is better able to separate short and long vowels.