I explore the hypothesis that the universal properties of human languages can be explained in terms of efficient communication subject to fixed human information-processing constraints, such as incremental processing and noisy memory representations. I argue that, because of these constraints, languages should exhibit information locality: words that depend on each other should be close to each other in linear order. In support of this idea, I present corpus evidence from over 40 languages that word order in both grammar and usage is shaped by working memory constraints in the form of dependency locality: a pressure for syntactically linked words to be close. Next, I develop a new formal model of language processing cost, based on rational inference over noisy memory representations, that unifies surprisal and memory effects and derives dependency locality effects as a special case of information locality effects. Finally, I show that the new processing model resolves a long-standing paradox in the psycholinguistic literature, structural forgetting, in which the effects of memory appear to be language-dependent.
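To make the notion of dependency locality concrete, the sketch below computes total dependency length, a common proxy for the memory cost of a word order: the sum of linear distances between each word and its syntactic head. The toy sentences, their head assignments, and the function name are illustrative assumptions, not material from the thesis.

```python
def total_dependency_length(heads):
    """Sum of linear distances between each word and its head.

    `heads` maps a dependent's position to its head's position
    (the root is excluded). Positions are 0-indexed.
    """
    return sum(abs(dep - head) for dep, head in heads.items())

# "John threw out the trash": the particle "out" is adjacent to "threw".
# Deps: John->threw, out->threw, the->trash, trash->threw.
adjacent = {0: 1, 2: 1, 3: 4, 4: 1}

# "John threw the trash out": "out" is separated from "threw".
# Deps: John->threw, the->trash, trash->threw, out->threw.
separated = {0: 1, 2: 3, 3: 1, 4: 1}

print(total_dependency_length(adjacent))   # → 6
print(total_dependency_length(separated))  # → 7
```

On this measure the order keeping the verb and particle adjacent is cheaper (6 vs. 7), matching the prediction that usage should favor word orders with shorter dependencies.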