2016 Poster Sessions : Visual Relationship Detection with Language Priors

Student Name : Ranjay Krishna
Advisor : Fei-Fei Li
Research Areas: Artificial Intelligence
Visual relationships capture a wide variety of interactions between pairs of objects in images (e.g. “man riding bicycle” and “man pushing bicycle”). Consequently, the set of possible relationships is extremely large, and it is difficult to obtain sufficient training examples for all of them. Because of this limitation, previous work on visual relationship detection has concentrated on predicting only a small set of relationships. Even though most relationships are infrequent, their objects (e.g. “man” and “bicycle”) and predicates (e.g. “riding” and “pushing”) independently occur more frequently. We propose a model that uses this insight to train visual models for objects and predicates individually and later combines them to predict multiple relationships per image. We improve on prior work by leveraging language priors from semantic word embeddings to fine-tune the likelihood of a predicted relationship. Our model can scale to predict thousands of types of relationships from a few examples. Additionally, we localize the objects in the predicted relationships as bounding boxes in the image. We further demonstrate that understanding relationships can improve content-based image retrieval.
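
The idea of scoring objects and predicates separately and then reweighting each combination with a language prior can be illustrated with a minimal sketch. This is not the authors' implementation: the two-dimensional embeddings, the per-predicate weights, the multiplicative scoring rule, and the names `EMBED`, `PRED_WEIGHTS`, `language_prior`, and `rank_relationships` are all assumptions made up for illustration.

```python
# Toy 2-d word embeddings (made-up values standing in for real word vectors).
EMBED = {
    "man": [1.0, 0.0],
    "bicycle": [0.0, 1.0],
}

# Hypothetical per-predicate weights applied to the concatenated
# [subject embedding, object embedding] vector.
PRED_WEIGHTS = {
    "riding": [0.9, 0.1, 0.1, 0.9],
    "pushing": [0.5, 0.1, 0.1, 0.5],
}


def language_prior(subj, pred, obj):
    """Score how plausible (subj, pred, obj) is from word embeddings alone."""
    pair = EMBED[subj] + EMBED[obj]  # list concatenation, length 4
    return sum(w * x for w, x in zip(PRED_WEIGHTS[pred], pair))


def rank_relationships(subj, obj, object_scores, predicate_scores):
    """Combine independent object and predicate scores with the language prior.

    object_scores: detector confidence per object label (assumed given).
    predicate_scores: visual predicate confidence for this object pair
    (assumed given). Returns (score, relationship) tuples, best first.
    """
    ranked = []
    for pred, visual in predicate_scores.items():
        score = (object_scores[subj] * object_scores[obj]
                 * visual * language_prior(subj, pred, obj))
        ranked.append((score, (subj, pred, obj)))
    return sorted(ranked, reverse=True)


if __name__ == "__main__":
    objects = {"man": 0.9, "bicycle": 0.8}
    predicates = {"riding": 0.6, "pushing": 0.4}
    for score, rel in rank_relationships("man", "bicycle", objects, predicates):
        print(rel, round(score, 4))
```

Because the object and predicate scores are produced independently, a relationship that was rarely or never seen during training can still be ranked, provided its objects and predicate each have a score and an embedding.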

Ranjay Krishna is a PhD student at Stanford co-advised by Professor Fei-Fei Li and Professor Michael Bernstein. His research interests lie at the intersection of computer vision, machine learning, and human-computer interaction. He is exploring ways of building models that can effectively utilize human intelligence to understand interactions between objects in images and perform other cognitive computer vision tasks. He is currently leading the Visual Genome project, which aims to create a knowledge representation of the visual world.