Leveraging automatic speech recognition technology to model cross-linguistic speech perception in humans
Existing theories of cross-linguistic phonetic category perception agree that listeners perceive foreign sounds by mapping them onto their native phonetic categories. Yet none of the available theories specifies how to compute this mapping; as a result, they cannot provide systematic quantitative predictions and remain largely descriptive. In this talk, I will present a new approach that leverages Automatic Speech Recognition (ASR) technology to obtain a fully specified mapping between foreign and native sounds. Using the machine ABX evaluation method, we derive quantitative predictions from ASR systems and compare them to empirical observations of human cross-linguistic phonetic category perception. I will present results both where the proposed model successfully predicts empirical effects (for example, on the American English /r/-/l/ distinction) and where it fails (for example, on Japanese vowel length contrasts), and discuss possible interpretations.
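The machine ABX evaluation mentioned above can be illustrated with a minimal sketch. In an ABX discrimination test, a model hears two tokens A and B from contrasting categories plus a probe X from the same category as A, and is scored on how often X is closer to A than to B under some distance over the model's representations. The function below is a simplified, hypothetical illustration of that scoring logic (the actual distance, representations, and aggregation used with ASR systems are assumptions, not the talk's implementation):

```python
import numpy as np

def abx_score(a_tokens, b_tokens, x_tokens, dist):
    """Fraction of (A, B, X) triples where the probe X (drawn from the
    same category as A) is closer to A than to B under `dist`."""
    correct = total = 0
    for a in a_tokens:
        for b in b_tokens:
            for x in x_tokens:
                if np.array_equal(x, a):  # skip comparing a token with itself
                    continue
                correct += dist(x, a) < dist(x, b)
                total += 1
    return correct / total

# Toy 1-D "model embeddings": three tokens per category (hypothetical values).
cat_A = [np.array([0.10]), np.array([0.20]), np.array([0.15])]
cat_B = [np.array([0.90]), np.array([1.00]), np.array([0.95])]
euclid = lambda u, v: float(np.linalg.norm(u - v))

# Probes drawn from category A: a score near 1.0 means the model's
# representations cleanly separate the two categories.
score = abx_score(cat_A, cat_B, cat_A, euclid)
print(score)  # 1.0 for these well-separated toy embeddings
```

In practice, the same machinery can be run on frame-level representations extracted from an ASR system (with a sequence distance such as DTW in place of the Euclidean toy distance), yielding discriminability predictions that can be compared to human perceptual data.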