2012 Poster Sessions : Understanding Actions and Interactions in Range and Video Data

Student Name : Benjamin Packer
Advisor : Daphne Koller
Research Areas : Artificial Intelligence
Understanding natural human activity involves not only identifying the action being performed, but also locating the semantic elements of the scene and describing the person's interaction with them. We present a system that recognizes complex, fine-grained human actions involving the manipulation of objects in realistic action sequences. Our method takes advantage of recent advances in sensors and pose trackers to learn an action model that draws on successful discriminative techniques while explicitly modeling both pose trajectories and object manipulations. By combining these elements in a single model, we are able to simultaneously recognize actions and track the location and manipulation of objects. To showcase this ability, we introduce a novel Cooking Action Dataset that contains video, depth readings, and pose tracks from a Kinect sensor. We show that our model outperforms existing state-of-the-art techniques both on this dataset and on a recently produced action dataset that contains only video sequences.

Ben Packer is a Ph.D. student in the Artificial Intelligence Lab of the Computer Science Department at Stanford University. His research focuses on using probabilistic methods for high-level scene understanding in computer vision applications. He received his bachelor's and master's degrees from the University of Pennsylvania in 2004.