About
Alexander Rakhlin is the Distinguished Professor in Data, Systems, and Society, IDSS and Brain and Cognitive Sciences. His research is in machine learning, with an emphasis on statistics and computation. He is interested in formalizing the process of learning, in analyzing learning models, and in deriving and implementing emerging learning methods. A significant thrust of his research is in developing theoretical and algorithmic tools for online prediction, a learning framework where data arrives in a sequential fashion.
Prof. Rakhlin received his bachelor’s degrees in mathematics and computer science from Cornell University and his doctoral degree from MIT. He was a postdoc at UC Berkeley EECS before joining the University of Pennsylvania, where he was an associate professor in the Department of Statistics and co-director of the Penn Research in Machine Learning (PRiML) center. He was also a Visiting Professor in IDSS’s Statistics and Data Science Center (2016). He is a recipient of the NSF CAREER Award, IBM Research Best Paper Award, Machine Learning Journal Award, and COLT Best Paper Award.
Research
My research is at the interface of Machine Learning and Statistics. I am interested in formalizing the process of learning, in analyzing learning models, and in deriving and implementing emerging learning methods. A significant thrust of my research is in developing theoretical and algorithmic tools for online prediction, a learning framework where data arrives in a sequential fashion. My recent interests include understanding neural networks and, more generally, interpolation methods.
A high-level description of a few research areas:
- Statistical Learning: We study the problem of building a good predictor based on an i.i.d. sample. While much is understood in this classical setting, our current focus is on Deep Learning models. In particular, we study various measures of complexity of neural networks that govern their out-of-sample performance. We aim to understand the "geometry" (in an appropriate sense) of neural networks and its relation to the prediction ability of trained models. Our recent focus is on statistical and computational aspects of interpolation methods, as well as understanding the bias-variance tradeoff in over-parametrized models.
- Non-Convex Landscapes: Here we are interested in understanding properties of high-dimensional empirical landscapes that arise when one attempts to fit a model with many parameters (such as a multi-layer neural network or a latent variable model) to data. Some of the questions that arise are: (a) What is the behavior of optimization methods on such landscapes? (b) What salient features of the landscape arise from its random nature? (c) How can one exploit randomness in the optimization method to analyze its convergence?
- High-Dimensional Statistics: This setting is centered around the problem of recovering high-dimensional, structured signals hidden in noise. Since standard statistical methods are often computationally intractable, the question of the interplay between computation and statistical optimality arises. Examples: estimation of communities in networks, recovery of a few relevant genes in a large set of gene expression data, etc. We are also interested in understanding the optimality of maximum likelihood methods in such rich models.
- Online Learning: We aim to develop robust prediction methods that do not rely on the i.i.d. or stationary nature of data. In contrast to the well-studied setting of Statistical Learning, methods that predict in an online fashion are arguably more complex and nontrivial. Major questions that arise in this setting are: (a) How to model the problem at hand? (b) How many examples are required to achieve a certain level of performance, and which methods are computationally efficient? (c) How to deal with incomplete feedback and the exploration-exploitation dilemma? Examples: sequentially predicting users' preferences, classifying nodes in a social network, sequentially selecting medical treatment strategies while observing limited feedback about past decisions, etc.
Teaching
Fall semester: 9.520: Statistical Learning Theory and Applications
Spring semester: 9.521/IDS 160: Mathematical Statistics: A Non-Asymptotic Approach
Publications
Recent publications:
* Consistency of Interpolation with Laplace Kernels is a High-Dimensional Phenomenon (with X. Zhai). COLT 2019.
* Just Interpolate: Kernel "Ridgeless" Regression Can Generalize (with T. Liang). The Annals of Statistics, to appear.
* Online Learning: Sufficient Statistics and the Burkholder Method (with D. Foster and K. Sridharan). COLT 2018.
* Optimality of Maximum Likelihood for Log-Concave Density Estimation and Bounded Convex Regression (with G. Kur and Y. Dagan). In submission.