Steven Whang: 2010 InfoLab Workshop


Thursday, April 29, 2010
Location: Fisher Conference Center, Arrillaga Alumni Center

"Entity-Resolution: Beyond the Basics"


The goal of the SERF project is to develop a generic infrastructure for Entity Resolution (ER). ER (also known as deduplication or record linkage) is an important information integration problem: the same "real-world entities" (e.g., customers or products) are referred to in different ways in multiple data records. For instance, two records on the same person may provide different name spellings, and addresses may differ. The goal of ER is to "resolve" entities by identifying the records that represent the same entity and reconciling them to obtain one record per entity. In our approach, the functions that "match" records (i.e., decide whether they represent the same entity) and "merge" them are viewed as black boxes, which permits generic, extensible ER solutions. In this talk, I will present results on efficiently processing records and give an overview of the SERF project.
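To make the black-box idea concrete, here is a minimal Python sketch of a resolver that knows nothing about the records except what the supplied "match" and "merge" functions tell it. It is an illustration only, not the SERF implementation: the record layout (attribute name mapped to a set of values), the function names resolve, share_email, and union_values, and the sample data are all assumptions made for this example.

```python
from typing import Callable, Dict, FrozenSet, List

# Assumed record layout for this sketch: attribute name -> set of observed values.
Record = Dict[str, FrozenSet[str]]


def resolve(
    records: List[Record],
    match: Callable[[Record, Record], bool],
    merge: Callable[[Record, Record], Record],
) -> List[Record]:
    """Repeatedly merge matching records until no two remaining records match.

    `match` and `merge` are treated as black boxes: the loop never inspects
    record contents itself.
    """
    pending = list(records)
    resolved: List[Record] = []
    while pending:
        current = pending.pop()
        # Look for any already-resolved record that matches the current one.
        partner = next((r for r in resolved if match(current, r)), None)
        if partner is None:
            resolved.append(current)
        else:
            # Replace the pair by its merge and re-check the merged record,
            # since it may now match records that neither original matched.
            resolved.remove(partner)
            pending.append(merge(current, partner))
    return resolved


# Hypothetical black boxes: match on a shared email, merge by unioning values.
def share_email(a: Record, b: Record) -> bool:
    return bool(a.get("email", frozenset()) & b.get("email", frozenset()))


def union_values(a: Record, b: Record) -> Record:
    return {
        key: a.get(key, frozenset()) | b.get(key, frozenset())
        for key in a.keys() | b.keys()
    }


records = [
    {"name": frozenset({"Jon Smith"}), "email": frozenset({"js@example.com"})},
    {"name": frozenset({"John Smith"}), "email": frozenset({"js@example.com"})},
    {"name": frozenset({"Ann Lee"}), "email": frozenset({"ann@example.com"})},
]
entities = resolve(records, share_email, union_values)
```

Swapping in a different pair of functions (for example, a fuzzy name comparison) changes the resolution behavior without touching resolve, which is the extensibility the black-box view is meant to provide.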


Steven Whang is a Computer Science Ph.D. candidate at Stanford University, advised by Prof. Hector Garcia-Molina. Steven received a B.S. in Computer Science from the Korea Advanced Institute of Science and Technology (KAIST) in 2003 and an M.S. in Computer Science from Stanford University in 2007. Steven's current research interests include Entity Resolution and Data Privacy.