2010 Poster Sessions : Provenance-Based Refresh in Data-Oriented Workflows

Student Name : Robert Ikeda
Advisor : Jennifer Widom
Research Areas: Information Systems
We consider a general workflow setting in which input data sets are processed by a graph of transformations to produce output results. Our goal is to perform efficient selective refresh of elements in the output data, i.e., to compute the latest values of specific output elements when the input data may have changed. Our approach is based on capturing one-level data provenance at each transformation when the workflow is first run. Then, at refresh time, provenance is used to determine (transitively) which input elements are responsible for the given output elements, and the workflow is rerun only on the portion of the data needed for refresh. The primary challenges are to formalize the problem setting and the problem itself, to specify the properties of transformations and provenance that are required for efficient refresh, and to provide algorithms that apply to a wide class of transformations and workflows.
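
The refresh idea described above can be illustrated with a minimal Python sketch. It is not the Panda implementation: it assumes a linear chain of transformations rather than a general graph, assumes each transformation returns its output together with a one-level provenance map (output element id to the set of input element ids it was derived from), and assumes that provenance recorded in the initial run still identifies the relevant inputs at refresh time. All names (`double`, `sum_by_prefix`, `run`, `trace`, `refresh`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

Data = Dict[str, float]                 # element id -> value
Prov = Dict[str, Set[str]]              # output id -> ids of the inputs it came from
Transform = Callable[[Data], Tuple[Data, Prov]]


def double(data: Data) -> Tuple[Data, Prov]:
    """One-to-one transformation: each output depends on exactly one input."""
    out = {k: 2 * v for k, v in data.items()}
    prov = {k: {k} for k in data}
    return out, prov


def sum_by_prefix(data: Data) -> Tuple[Data, Prov]:
    """Many-to-one transformation: sum values grouped by the id text before ':'."""
    out: Data = {}
    prov: Prov = {}
    for k, v in data.items():
        group = k.split(":")[0]
        out[group] = out.get(group, 0) + v
        prov.setdefault(group, set()).add(k)
    return out, prov


def run(workflow: List[Transform], inputs: Data) -> Tuple[Data, List[Prov]]:
    """Run every transformation in order, recording one-level provenance per step."""
    data, provs = inputs, []
    for transform in workflow:
        data, prov = transform(data)
        provs.append(prov)
    return data, provs


def trace(provs: List[Prov], output_id: str) -> Set[str]:
    """Follow provenance backward, step by step, to the responsible input ids."""
    needed = {output_id}
    for prov in reversed(provs):
        needed = set().union(*(prov[e] for e in needed))
    return needed


def refresh(workflow: List[Transform], provs: List[Prov],
            current_inputs: Data, output_id: str):
    """Rerun the workflow only on the inputs that the output element depends on."""
    needed = trace(provs, output_id)
    subset = {k: current_inputs[k] for k in needed if k in current_inputs}
    result, _ = run(workflow, subset)
    return result.get(output_id)
```

For example, after an initial `run([double, sum_by_prefix], inputs)`, calling `refresh` for a single group reruns both transformations only on the input elements whose ids were traced back from that group, leaving all other input elements untouched.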

Robert Ikeda is a PhD student in Computer Science at Stanford University working on the Panda project led by Professor Jennifer Widom. His research interests include data provenance and information extraction. He received a B.S. in Electrical Engineering from UC San Diego.