From Theory to Methodology and Data Analysis: A Study of Model (Mis)specification
Description
Statistical models can help discover structure in data, such as connectivity patterns in neuronal networks. The data scientist, however, faces the possibility of model misspecification, as well as the danger of drawing invalid conclusions from evaluating too many models on the same data. We will discuss three vignettes that attempt to address some of these concerns.
First, we argue that cross-validation, a method often used for model selection, is not the best statistical procedure. Our theoretical analysis suggests a different (and optimal) method, and we demonstrate its effectiveness in practice. Second, we discuss procedures for estimating patterns in network and genomics data under several natural statistical models. Finally, we describe a sequential prediction paradigm in which data arrive in an online fashion, and we propose a method for choosing dynamic treatment regimes that is robust to model misspecification.
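For readers unfamiliar with the baseline procedure the talk critiques, the following is a minimal sketch of k-fold cross-validation used for model selection. The models (a constant predictor vs. a least-squares line), the synthetic data, and all function names are illustrative assumptions, not the speaker's method.

```python
import random

def fit_constant(xs, ys):
    # Model 1: predict the mean of the training responses.
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    # Model 2: ordinary least-squares line y = a + b*x.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def cv_error(fit, xs, ys, k=5):
    # k-fold cross-validation: average squared error on held-out folds.
    n = len(xs)
    folds = [set(range(i, n, k)) for i in range(k)]
    total = 0.0
    for held in folds:
        train = [i for i in range(n) if i not in held]
        f = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - f(xs[i])) ** 2 for i in held)
    return total / n

# Synthetic data from a linear trend plus noise.
random.seed(0)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + random.gauss(0, 0.3) for x in xs]

errs = {name: cv_error(fit, xs, ys)
        for name, fit in [("constant", fit_constant), ("line", fit_line)]}
best = min(errs, key=errs.get)
print(best)
```

On data with a genuine linear trend, the held-out error of the line model should be far below that of the constant model, so cross-validation selects the line. The talk's first vignette concerns when this familiar procedure falls short of the theoretically optimal selection rule.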
Speaker Bio
Alexander (Sasha) Rakhlin is an Associate Professor in the Department of Statistics at the University of Pennsylvania. He received his bachelor's degrees in Mathematics and Computer Science from Cornell University, and a doctoral degree in Neuroscience from the Brain and Cognitive Sciences Department at MIT. He was a postdoc at UC Berkeley before joining the University of Pennsylvania. Sasha is a recipient of the NSF CAREER Award, the IBM Research Best Paper Award, the Machine Learning Journal Award, and the COLT Best Paper Award. He is an Associate Editor of the Annals of Statistics and the Journal of Machine Learning Research.