
Learning to see the physical world
Description
Human intelligence goes beyond pattern recognition. From a single image, we're able to explain what we see, reconstruct the scene in 3D, predict what's going to happen, and plan our actions accordingly. In this talk, I will present our recent work on physical scene understanding: building versatile, data-efficient, and generalizable machines that learn to see, reason about, and interact with the physical world. Analogous to the nature vs. nurture debate in psychology, when building machine intelligence systems we would also like to understand what needs to be built in, and how to learn the rest. The core idea behind my research is to exploit the generic, causal structure of the world, including knowledge from computer graphics, physics, and language, and to integrate it with deep learning. Here, learning plays two major roles: first, it inverts causal models for efficient inference; second, it augments them for powerful forward simulation. I'll focus on a few topics to demonstrate this idea: building scene representations for both object geometry and physics; learning expressive dynamics models for planning and control; and perception and reasoning beyond vision.
Speaker Bio
Jiajun Wu is a Ph.D. student in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology. He received his B.Eng. from Tsinghua University in 2014. His research interests lie at the intersection of computer vision, machine learning, robotics, and computational cognitive science. His research has been recognized with the IROS Best Paper Award on Cognitive Robotics and fellowships from Facebook, Nvidia, Samsung, Baidu, and Adobe, and his work has been covered by major media outlets including CNN, BBC, WIRED, and MIT Technology Review.