Skip to main content

Main navigation

  • About BCS
    • Mission
    • History
    • Building 46
      • Building 46 Room Reservations
    • Leadership
    • Employment
    • Contact
      • BCS Spot Awards
      • Building 46 Email and Slack
    • Directory
  • Faculty + Research
    • Faculty
    • Areas of Research
    • Postdoctoral Research
      • Postdoctoral Association and Committees
    • Core Facilities
    • InBrain
      • InBRAIN Collaboration Data Sharing Policy
  • Academics
    • Course 9: Brain and Cognitive Sciences
    • Course 6-9: Computation and Cognition
      • Course 6-9 MEng
    • Brain and Cognitive Sciences PhD
      • How to Apply
      • Program Details
      • Classes
      • Research
      • Student Life
      • For Current Students
    • Molecular and Cellular Neuroscience Program
      • How to Apply to MCN
      • MCN Faculty and Research Areas
      • MCN Curriculum
      • Model Systems
      • MCN Events
      • MCN FAQ
      • MCN Contacts
    • Research Scholars Program
    • Course Offerings
  • News + Events
    • News
    • Events
    • Recordings
    • Newsletter
  • Community + Culture
    • Community + Culture
    • Community Stories
    • Outreach
      • MIT Summer Research Program (MSRP)
      • Conferences, Outreach and Networking Opportunities
    • Post-Baccalaureate Research Scholars Program
    • Get Involved (MIT login required)
    • Resources (MIT login Required)
  • Give to BCS
    • Join the Champions of the Brain Fellows Society
    • Meet Our Donors

Utility Menu

  • Directory
  • Apply to BCS
  • Contact Us

Footer

  • Contact Us
  • Employment
  • Be a Test Subject
  • Login

Footer 2

  • McGovern
  • Picower

Utility Menu

  • Directory
  • Apply to BCS
  • Contact Us
Brain and Cognitive Sciences
Menu
MIT

Main navigation

  • About BCS
    • Mission
    • History
    • Building 46
    • Leadership
    • Employment
    • Contact
    • Directory
  • Faculty + Research
    • Faculty
    • Areas of Research
    • Postdoctoral Research
    • Core Facilities
    • InBrain
  • Academics
    • Course 9: Brain and Cognitive Sciences
    • Course 6-9: Computation and Cognition
    • Brain and Cognitive Sciences PhD
    • Molecular and Cellular Neuroscience Program
    • Research Scholars Program
    • Course Offerings
  • News + Events
    • News
    • Events
    • Recordings
    • Newsletter
  • Community + Culture
    • Community + Culture
    • Community Stories
    • Outreach
    • Post-Baccalaureate Research Scholars Program
    • Get Involved (MIT login required)
    • Resources (MIT login Required)
  • Give to BCS
    • Join the Champions of the Brain Fellows Society
    • Meet Our Donors

News

News Menu

  • News
  • Events
  • Newsletters

Breadcrumb

  1. Home
  2. News
  3. Demystifying the world of deep networks
February 28, 2020

Demystifying the world of deep networks

by
Kris Brewer | Center for Brains, Minds and Machines
Image
tommy-andrzej-qianli-00_0.png
MIT researchers (left to right) Qianli Liao, Tomaso Poggio, and Andrzej Banburski stand with their equations.

Introductory statistics courses teach us that, when fitting a model to some data, we should have more data than free parameters to avoid the danger of overfitting — fitting noisy data too closely, and thereby failing to fit new data. It is surprising, then, that in modern deep learning the practice is to have orders of magnitude more parameters than data. Despite this, deep networks show good predictive performance, and in fact do better the more parameters they have. Why would that be?

It has been known for some time that good performance in machine learning comes from controlling the complexity of networks, which is not just a simple function of the number of free parameters. The complexity of a classifier, such as a neural network, depends on measuring the “size” of the space of functions that this network represents, with multiple technical measures previously suggested: Vapnik–Chervonenkis dimension, covering numbers, or Rademacher complexity, to name a few. Complexity, as measured by these notions, can be controlled during the learning process by imposing a constraint on the norm of the parameters — in short, on how “big” they can get. The surprising fact is that no such explicit constraint seems to be needed in training deep networks. Does deep learning lie outside of the classical learning theory? Do we need to rethink the foundations?

In a new Nature Communications paper, “Complexity Control by Gradient Descent in Deep Networks,” a team from the Center for Brains, Minds, and Machines led by Director Tomaso Poggio, the Eugene McDermott Professor in the MIT Department of Brain and Cognitive Sciences, has shed some light on this puzzle by addressing the most practical and successful applications of modern deep learning: classification problems.

“For classification problems, we observe that in fact the parameters of the model do not seem to converge, but rather grow in size indefinitely during gradient descent. However, in classification problems only the normalized parameters matter — i.e., the direction they define, not their size,” says co-author and MIT PhD candidate Qianli Liao. “The not-so-obvious thing we showed is that the commonly used gradient descent on the unnormalized parameters induces the desired complexity control on the normalized ones.”

“We have known for some time in the case of regression for shallow linear networks, such as kernel machines, that iterations of gradient descent provide an implicit, vanishing regularization effect,” Poggio says. “In fact, in this simple case we probably know that we get the best-behaving maximum-margin, minimum-norm solution. The question we asked ourselves, then, was: Can something similar happen for deep networks?”

The researchers found that it does. As co-author and MIT postdoc Andrzej Banburski explains, “Understanding convergence in deep networks shows that there are clear directions for improving our algorithms. In fact, we have already seen hints that controlling the rate at which these unnormalized parameters diverge allows us to find better performing solutions and find them faster.”

What does this mean for machine learning? There is no magic behind deep networks. The same theory behind all linear models is at play here as well. This work suggests ways to improve deep networks, making them more accurate and faster to train.

Read the Original Article
Don't miss our next newsletter!
Sign Up

Footer menu

  • Contact Us
  • Employment
  • Be a Test Subject
  • Login

Footer 2

  • McGovern
  • Picower
Brain and Cognitive Sciences

MIT Department of Brain and Cognitive Sciences

Massachusetts Institute of Technology

77 Massachusetts Avenue, Room 46-2005

Cambridge, MA 02139-4307 | (617) 253-5748

For Emergencies | Accessibility

Massachusetts Institute of Technology