How much do deep neural networks know about natural scenes?
Description
A striking feature of biological vision is the ability to recognize new objects under a large variety of conditions after seeing only a single example. Recently, artificial neural networks have become highly successful at mimicking this skill, demonstrating remarkable generalization performance. First, I will present our most recent results on exploiting this generalization power for bottom-up saliency modeling, i.e., for predicting where people look in images. Second, I will show two examples of how deep neural networks can be used for generative image modeling. In the first example, our goal was to maximize the likelihood of the model on natural image data using a recurrent neural network. In the second example, we use a high-performing object-recognition network to generate images. Together, the examples indicate an important discrepancy between likelihood, which measures how much a model knows about natural scenes in an information-theoretic sense, and the perceptual quality of images synthesized from a model. I conclude with a comparison of the two tasks and ask which benchmarking problems would be most useful for making progress on generative image modeling.
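
For readers unfamiliar with the information-theoretic reading of likelihood mentioned above, a standard way to report it is the average negative log-likelihood of the model on held-out images, in bits per pixel. The notation below is a generic formulation, not taken from the talk:

```latex
% Average negative log-likelihood of a density model p_theta on held-out images,
% reported in bits per pixel (lower = the model assigns more probability to the data).
\[
  \mathrm{NLL} \;=\; -\frac{1}{N D} \sum_{n=1}^{N} \log_2 p_\theta\!\left(x^{(n)}\right)
  \quad \text{bits per pixel},
\]
% where x^{(1)}, ..., x^{(N)} are held-out images with D pixels each.
```

Lower values mean the model assigns higher probability to natural scenes, i.e., it "knows more" about them in Shannon's sense; the talk's point is that this quantity can diverge from the perceptual quality of samples.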
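
The second example refers to generating images from an object-recognition network. One common technique for this is activation maximization: gradient ascent on a class score with respect to the pixels. The sketch below illustrates that general idea only, not the speaker's method; the network (VGG-16), class index, learning rate, and L2 regularizer are all assumptions made for the example:

```python
# Minimal sketch: synthesize an image by gradient ascent on a class score
# of a pretrained recognition network. Illustrative assumptions throughout.
import torch
import torchvision.models as models

# Frozen, pretrained recognition network (VGG-16 is an arbitrary choice here).
model = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
for p in model.parameters():
    p.requires_grad_(False)

target_class = 130  # hypothetical ImageNet class index
img = torch.zeros(1, 3, 224, 224, requires_grad=True)  # start from a blank image
optimizer = torch.optim.Adam([img], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    score = model(img)[0, target_class]
    # Maximize the class score; a small L2 penalty acts as a crude image prior.
    loss = -score + 1e-4 * img.pow(2).sum()
    loss.backward()
    optimizer.step()
```

In practice such pipelines also normalize the input to the network's expected statistics and use stronger image priors; without them, pure activation maximization tends to produce high-frequency patterns of low perceptual quality, which is one face of the likelihood-versus-quality discrepancy the talk discusses.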