2017 Poster Sessions : Understanding Black-box Predictions via Influence Functions

Student Name : Pang Wei Koh
Advisor : Percy Liang
Research Areas : Artificial Intelligence
How can we explain the predictions of a black-box model? In this paper, we use influence functions -- a classic technique from robust statistics -- to trace a model's prediction through the learning algorithm and back to its training data, identifying the points most responsible for a given prediction. Applying ideas from second-order optimization, we scale up influence functions to modern machine learning settings and show that they can be applied to high-dimensional black-box models, even in non-convex and non-differentiable settings. We give a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. On linear models and convolutional neural networks, we demonstrate that influence functions are useful for many different purposes: to understand model behavior, debug models and detect dataset errors, and even identify and exploit vulnerabilities to adversarial training-set attacks.
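To make the gradient and Hessian-vector-product recipe concrete, here is a minimal sketch, not the authors' implementation. It assumes an L2-regularized logistic regression on synthetic data and uses the standard influence-function approximation: the effect of upweighting a training point z on the loss at a test point is -grad L(z_test)^T H^{-1} grad L(z), where H is the Hessian of the training objective. All function names, the toy data, and the choice of conjugate gradient as the solver are assumptions made for illustration.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grad_loss(theta, x, y):
    # Gradient of the logistic loss log(1 + exp(-y * theta.x)) at one point, y in {-1, +1}.
    return -y * sigmoid(-y * (x @ theta)) * x

def hvp(theta, X, v, lam):
    # Hessian-vector product of the regularized mean training loss.
    # The Hessian matrix itself is never formed.
    p = sigmoid(X @ theta)
    return X.T @ (p * (1.0 - p) * (X @ v)) / len(X) + lam * v

def conjugate_gradient(matvec, b, iters=100, tol=1e-20):
    # Solve A s = b for symmetric positive-definite A using only products A @ v.
    s = np.zeros_like(b)
    r = b.copy()
    d = r.copy()
    rs = r @ r
    for _ in range(iters):
        if rs < tol:
            break
        Ad = matvec(d)
        alpha = rs / (d @ Ad)
        s += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return s

def fit(X, y, lam, steps=2000, lr=0.5):
    # Plain gradient descent on the regularized mean loss (toy training loop).
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        g = -(X.T @ (y * sigmoid(-y * (X @ theta)))) / len(X) + lam * theta
        theta -= lr * g
    return theta

def influences(theta, X, y, x_test, y_test, lam):
    # s_test = H^{-1} grad L(z_test), computed once and reused for every training point.
    s_test = conjugate_gradient(lambda v: hvp(theta, X, v, lam),
                                grad_loss(theta, x_test, y_test))
    grads = np.stack([grad_loss(theta, xi, yi) for xi, yi in zip(X, y)])
    return -grads @ s_test

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = np.where(X @ np.array([1.0, -2.0, 0.5]) + 0.5 * rng.normal(size=50) > 0, 1.0, -1.0)
x_test, y_test = rng.normal(size=3), 1.0
lam = 0.1

theta = fit(X, y, lam)
scores = influences(theta, X, y, x_test, y_test, lam)
most_harmful = np.argsort(scores)[::-1][:5]   # upweighting these raises the test loss most
most_helpful = np.argsort(scores)[:5]         # upweighting these lowers the test loss most
```

Under this sign convention, a positive score means that upweighting the training point would increase the loss on the test point, and a negative score means it would decrease it. For a larger model the same structure would apply in principle, with `hvp` supplied by automatic differentiation rather than the closed form used here.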

Pang Wei Koh is a first-year CS PhD student, advised by Percy Liang. His research interests are in machine learning and its application to biomedical problems. Before grad school, Pang Wei was an early employee at Coursera and received his BS and MS degrees in CS from Stanford, where he worked with Andrew Ng, Anshul Kundaje, and Daphne Koller on deep learning and computational biology.