2011 Poster Sessions: Identifying Human-Object Interaction in Range and Video Data

Student Name: Benjamin Packer
Advisor: Daphne Koller
Research Areas: Artificial Intelligence
The goal of this work is to understand actions that involve human-object interaction over time in realistic settings. We take advantage of recent progress in sensor technology and the corresponding pose tracking systems to reasonably estimate both human pose and object location, which lets us model human-object interactions in action sequences. We propose a generative model of the interactions between humans and objects during an action over time. The model uses "interaction primitives" such as "in the hand" and "object moving farther from the foot" to represent a variety of actions in which an object is manipulated. Unlike a discriminative method, this generative representation not only recognizes actions in sequences but also provides improved localization of the object of interest. We introduce a new Human-Object Interaction Dataset of sensor and video sequences of human-object actions, captured with a Kinect depth sensor, and show that the learned model achieves better action recognition and object localization performance than methods that do not use an understanding of human-object interaction.
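
To make the idea of interaction primitives concrete, here is a minimal sketch of how two of the primitives named above, "in the hand" and "object moving farther from the foot", might be computed from tracked 3D positions. It is not the model described in the abstract: the function names, the distance thresholds, and the assumption that positions arrive as per-frame (x, y, z) tuples in meters are all hypothetical choices made for illustration.

```python
import math

# Hypothetical thresholds, in meters; not taken from the original work.
IN_HAND_RADIUS_M = 0.10
MIN_DISTANCE_CHANGE_M = 0.01


def in_hand(hand_pos, obj_pos, radius=IN_HAND_RADIUS_M):
    """Primitive "in the hand": the object lies within `radius` of the hand."""
    return math.dist(hand_pos, obj_pos) <= radius


def moving_farther(joint_track, obj_track, t, min_delta=MIN_DISTANCE_CHANGE_M):
    """Primitive "object moving farther from a joint" at frame t.

    True when the joint-object distance grew by more than `min_delta`
    between frame t-1 and frame t.
    """
    before = math.dist(joint_track[t - 1], obj_track[t - 1])
    after = math.dist(joint_track[t], obj_track[t])
    return after - before > min_delta


def primitives_per_frame(hand_track, foot_track, obj_track):
    """Return one dict of boolean primitives per frame of the sequence."""
    frames = []
    for t in range(len(obj_track)):
        frames.append({
            "in_hand": in_hand(hand_track[t], obj_track[t]),
            # The motion primitive needs a previous frame, so it is False at t = 0.
            "farther_from_foot": t > 0 and moving_farther(foot_track, obj_track, t),
        })
    return frames
```

Under these assumptions, a kick-like sequence (a static foot with the object's distance from it growing each frame) would switch on "farther_from_foot" from the second frame onward while "in_hand" stays off. A generative model such as the one proposed would reason over sequences of such primitives rather than over fixed thresholds like these.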

Ben Packer is a Ph.D. student in the Artificial Intelligence lab of the Computer Science Department at Stanford University. His research focuses on using probabilistic methods for high-level scene understanding in computer vision applications. He received his bachelor's and master's degrees from the University of Pennsylvania in 2004.