Using Structure to Predict the Next Word: What RNN Language Models Learn about Syntax
Description
Recurrent Neural Networks (RNNs) have achieved state-of-the-art performance on numerous linguistic tasks, such as language modeling and translation. However, the nature of the representations they learn remains unclear, which poses a problem for both their controllability and their interpretability. In this work, I employ methodology from psycholinguistics to show that RNNs trained on a language modeling objective (predicting the next word given a context) exhibit behavior consistent with multiple aspects of human syntactic representation, including constituency and hierarchy. Treating the networks as one would subjects in a psycholinguistic experiment, I provide evidence that they maintain syntactic state across subordinate clauses and are sensitive to garden-path effects. Turning to the filler-gap dependency, I show that the models are sensitive to the hierarchical relationship between the two elements implicated in the dependency, and to a number of "islands", syntactic structures in which the dependency is blocked. This work demonstrates how some human-like syntactic organization can arise in a model that processes input sequentially, with no obvious hierarchical bias, trained on a relatively simple objective function.
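The sketch below illustrates, under stated assumptions, how this kind of psycholinguistic probing is typically operationalized: reading per-word surprisal, -log2 P(w_t | w_<t), off a language model and comparing it at a critical word across minimal-pair sentences (e.g., a garden-path sentence versus an unambiguous control). The description above does not specify the measure or the toolkit; the use of surprisal, the PyTorch LSTM, and the toy vocabulary here are illustrative assumptions, and the model is untrained, so the printed values are placeholders rather than results.

```python
# Illustrative sketch (not the author's code): computing per-word surprisal,
# -log2 P(w_t | w_<t), from an RNN language model, the quantity one would
# compare across conditions in a psycholinguistics-style test.
# The model is untrained here; in practice one would load a trained LM.
import math
import torch
import torch.nn as nn

# Hypothetical toy vocabulary and word-level LSTM language model.
vocab = ["<bos>", "the", "horse", "raced", "past", "barn", "fell", "."]
stoi = {w: i for i, w in enumerate(vocab)}

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, ids):
        hidden, _ = self.lstm(self.embed(ids))
        return self.out(hidden)  # logits over the next word at each position

def surprisals(model, words):
    """Return (word, surprisal in bits) for each word, conditioned on its prefix."""
    ids = torch.tensor([[stoi[w] for w in ["<bos>"] + words]])
    with torch.no_grad():
        log_probs = torch.log_softmax(model(ids), dim=-1)
    results = []
    for t, w in enumerate(words):
        # P(w_t | w_<t) is the prediction made at the preceding position.
        lp = log_probs[0, t, stoi[w]].item()
        results.append((w, -lp / math.log(2)))
    return results

model = LSTMLanguageModel(len(vocab)).eval()
# Surprisal at the disambiguating word ("fell") is what one would compare
# against the same word in a non-garden-path control sentence.
for w, s in surprisals(model, ["the", "horse", "raced", "past", "the", "barn", "fell", "."]):
    print(f"{w:>6s}  {s:6.2f} bits")
```

A higher surprisal in the garden-path condition than in the control, at the disambiguating region, is the behavioral signature one would take as evidence that the model maintained (and then had to revise) syntactic state.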