
Akhilan Boopathy Thesis Defense: Towards High-Dimensional Generalization in Neural Networks
Description
Date & Time: July 3rd, 1 PM
Location: 46-3002
Zoom: https://mit.zoom.us/my/akhilan
Title: Towards High-Dimensional Generalization in Neural Networks
Abstract:
Neural networks excel in a wide range of applications due to their ability to generalize beyond their training data. However, their performance on high-dimensional tasks degrades unless large-scale training data is available, a challenge known as the curse of dimensionality. This thesis addresses this limitation by pursuing three key objectives aimed at understanding and improving neural network generalization.
1. We aim to investigate the scaling laws underlying generalization in neural networks, including double descent, a phenomenon in which test error temporarily rises before falling again as a model's capacity or training data grows (see the first sketch following this list). In particular, we pursue two goals: 1) a better understanding of when double descent can and cannot be observed empirically, and 2) a better understanding of scaling laws with respect to training time.
2. Inductive bias refers to the set of assumptions a learning algorithm uses to predict outputs for inputs it has not encountered. We propose quantifying the amount of inductive bias required for a model to generalize well from a fixed amount of training data (see the second sketch following this list). By developing methods to measure inductive bias, we can assess how much information model designers need to build into neural networks to improve their generalization. This quantification can also guide the design of harder tasks that better test a model's ability to generalize.
3. Finally, we aim to develop new methods that enhance neural network generalization, with a focus on reducing the exponentially many training samples otherwise required for high-dimensional tasks. This involves creating algorithms and architectures that learn effectively from limited data by incorporating stronger inductive biases. We will focus on two such biases: 1) exploiting features of the training loss landscape that correlate with generalization and 2) using modular neural network architectures (see the third and fourth sketches following this list). We expect these techniques to improve generalization, particularly on high-dimensional tasks.
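A minimal sketch of double descent (an illustrative setup of our own, not an experiment from the thesis): with random ReLU features fit by minimum-norm least squares, test error typically spikes when the number of features approaches the number of training samples (the interpolation threshold) and then falls again as capacity grows further.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 500, 20

# Linear teacher with label noise.
w_true = rng.standard_normal(d)
X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
y_train = X_train @ w_true + 0.5 * rng.standard_normal(n_train)
y_test = X_test @ w_true

for width in [10, 50, 90, 100, 110, 200, 1000, 5000]:
    W = rng.standard_normal((d, width)) / np.sqrt(d)   # fixed random projection
    phi = lambda X: np.maximum(X @ W, 0.0)             # random ReLU features
    # Minimum-norm least-squares fit; interpolates training data once width >= n_train.
    coef = np.linalg.pinv(phi(X_train)) @ y_train
    test_mse = np.mean((phi(X_test) @ coef - y_test) ** 2)
    print(f"width={width:5d}  test MSE={test_mse:.3f}")
```

Sweeping the width traces the double-descent curve: test error first falls, rises sharply near width = n_train, then falls again in the overparameterized regime.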
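The thesis's proposed measure of inductive bias is not detailed in this abstract. As a loose illustration of the data-versus-bias tradeoff such a measure would quantify, the sketch below (a hypothetical task and predictor of our own choosing) compares nearest-neighbor regression with and without a hard-coded shift invariance on a shift-invariant task; the predictor encoding the correct invariance typically reaches a given test error with far less training data.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16

def target(X):
    # Shift-invariant label: largest magnitude in the Fourier spectrum.
    return np.abs(np.fft.rfft(X, axis=-1)).max(axis=-1)

def nn_predict(X_tr, y_tr, X_te, invariant):
    preds = np.empty(len(X_te))
    for i, x in enumerate(X_te):
        if invariant:
            # Distance = min over all cyclic shifts of the query (built-in invariance).
            shifts = np.stack([np.roll(x, s) for s in range(d)])
            dists = np.linalg.norm(X_tr[:, None] - shifts[None], axis=-1).min(axis=1)
        else:
            dists = np.linalg.norm(X_tr - x, axis=-1)
        preds[i] = y_tr[np.argmin(dists)]  # 1-nearest-neighbor prediction
    return preds

X_te = rng.standard_normal((200, d))
y_te = target(X_te)
for n in [50, 200, 1000]:
    X_tr = rng.standard_normal((n, d))
    y_tr = target(X_tr)
    for inv in (False, True):
        mse = np.mean((nn_predict(X_tr, y_tr, X_te, inv) - y_te) ** 2)
        print(f"n={n:4d}  shift-invariant={inv}  test MSE={mse:.3f}")
```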
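For the first of these biases, one well-known existing technique that exploits a loss-landscape feature (flatness) correlated with generalization is sharpness-aware minimization (SAM; Foret et al., 2021). The PyTorch sketch below renders a SAM-style update as a generic illustration, not necessarily the method the thesis will develop; `sam_step` and `rho` are our own names.

```python
import torch

def sam_step(model, loss_fn, x, y, opt, rho=0.05):
    # Pass 1: gradient at the current weights.
    loss = loss_fn(model(x), y)
    loss.backward()
    with torch.no_grad():
        grad_norm = torch.sqrt(sum((p.grad ** 2).sum()
                                   for p in model.parameters() if p.grad is not None))
        eps = []
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)                      # ascend to nearby worst-case weights
            eps.append(e)
    opt.zero_grad()
    # Pass 2: gradient at the perturbed weights (the sharpness-aware gradient).
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)                  # restore the original weights
    opt.step()                             # descend toward a flatter minimum
    opt.zero_grad()
    return loss.item()

# Toy usage:
model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
for _ in range(5):
    sam_step(model, torch.nn.functional.mse_loss, x, y, opt)
```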
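For the second bias, a common architectural instantiation of modularity is a mixture-of-experts layer, in which a learned router softly combines small expert sub-networks so that different modules can specialize on different parts of the input space. The sketch below is a generic, hypothetical example of such a modular architecture, not the thesis's design.

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    def __init__(self, d_in, d_out, n_experts=4, d_hidden=32):
        super().__init__()
        self.router = nn.Linear(d_in, n_experts)  # learns which module handles which input
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_out))
            for _ in range(n_experts)
        )

    def forward(self, x):
        weights = torch.softmax(self.router(x), dim=-1)           # (batch, n_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, n_experts, d_out)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)          # soft module combination

model = MixtureOfExperts(d_in=8, d_out=1)
print(model(torch.randn(5, 8)).shape)  # torch.Size([5, 1])
```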
Together, these contributions aim to deepen our theoretical understanding and develop practical tools for enabling neural networks to generalize effectively from limited data.
Thesis Committee: Ila Fiete (supervisor), Leslie Kaelbling, Paul Liang