Jennifer Widom: 2010 InfoLab Workshop


Thursday, April 29, 2010
Location: Fisher Conference Center, Arrillaga Alumni Center

"Panda: A System for Provenance and Data"


Panda (for Provenance and Data) is a new project whose goal is to develop a general-purpose system for modeling, capturing, storing, exploiting, and querying data provenance in a wide range of applications. Abstractly, provenance (also referred to as lineage) describes where data came from and how it has been processed over time. Our specific plans for Panda are based on fully integrating data-based and process-based provenance, while spanning the spectrum from fine-grained to coarse-grained provenance. We envision a set of built-in operators for exploiting provenance after it has been captured, and an ad-hoc declarative query language over provenance together with data. We expect many optimization problems to present themselves, including approximation schemes and the choice between lazy and eager provenance capture, storage, and evaluation. In addition to outlining our vision and plans for the project, this talk will touch on some concrete results to date, including provenance-based refresh in data-oriented workflows and version 0.1 of the Panda prototype.

Jennifer Widom is the Fletcher Jones Professor and Chair of the Computer Science Department at Stanford University. She received her Bachelor's degree from the Indiana University School of Music in 1982 and her Ph.D. in Computer Science from Cornell University in 1987. She was a Research Staff Member at the IBM Almaden Research Center before joining the Stanford faculty in 1993. Her research interests span many aspects of nontraditional data management. She is an ACM Fellow and a member of the National Academy of Engineering and the American Academy of Arts & Sciences. She received the ACM SIGMOD Edgar F. Codd Innovations Award in 2007 and was a Guggenheim Fellow in 2000, and she has served on a variety of program committees, advisory boards, and editorial boards.