2017 Poster Sessions: Linear Regression with Shuffled Labels

Student Name: Abubakar Abid
Advisor: Ada Poon
Research Areas: Information Systems
Is it possible to apply machine learning to datasets whose inputs are not in the same order as their outputs? We answer this question by studying the problem of inferring the weights of a noisy linear model, in which the output labels are additionally shuffled by an unknown permutation. We prove that the analog of the ordinary least squares estimator is inconsistent in this setting, and introduce a consistent estimator based on the self-moments of the input features and labels. We study the regimes in which each estimator excels, and generalize the estimators to the setting where partial ordering information is available in the form of multiple independent experiments. The result is a framework that enables robust recovery of the weights, as we demonstrate by extensive experiments on both synthetic and standard datasets. These results show that inference in the absence of complete ordering information is possible and can be of practical interest, particularly in experiments that characterize populations of particles, such as cytometry.
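
The contrast between the two estimators can be illustrated with a small simulation. The sketch below is not the authors' implementation; it assumes a one-dimensional model y = w*x + noise with zero-mean noise and inputs whose mean is nonzero, and all function names and parameter values are hypothetical. In one dimension, the least-squares fit over all permutations reduces to pairing sorted inputs with sorted (or reverse-sorted) labels, and the simplest moment-matching estimate equates the first moments of inputs and labels.

```python
import numpy as np


def simulate_shuffled(n, w_true, noise_std, rng):
    """Draw (x, y) from y = w_true * x + noise, then shuffle y (illustrative setup)."""
    x = rng.normal(loc=2.0, scale=1.0, size=n)  # nonzero mean is an assumption
    y = w_true * x + rng.normal(scale=noise_std, size=n)
    return x, rng.permutation(y)


def first_moment_estimate(x, y_shuffled):
    """Match first moments: E[y] = w * E[x]. The mean of y ignores label order."""
    return y_shuffled.mean() / x.mean()


def least_squares_over_permutations(x, y_shuffled):
    """Minimize the squared error jointly over w and the permutation (1-D only).

    By the rearrangement inequality, the best permutation pairs sorted x with
    sorted y (if w > 0) or with reverse-sorted y (if w < 0), so only two
    candidate alignments need to be checked.
    """
    xs = np.sort(x)
    ys = np.sort(y_shuffled)
    best_w, best_rss = 0.0, np.inf
    for y_aligned in (ys, ys[::-1]):
        w = (xs @ y_aligned) / (xs @ xs)  # no-intercept least squares
        rss = np.sum((y_aligned - w * xs) ** 2)
        if rss < best_rss:
            best_w, best_rss = w, rss
    return best_w


rng = np.random.default_rng(0)
w_true = 1.5
x, y_shuffled = simulate_shuffled(n=20000, w_true=w_true, noise_std=1.0, rng=rng)

w_moment = first_moment_estimate(x, y_shuffled)
w_ls = least_squares_over_permutations(x, y_shuffled)
print(f"true w: {w_true}, moment estimate: {w_moment:.3f}, least squares: {w_ls:.3f}")
```

In this setup the sorted pairing absorbs part of the noise variance into the fitted slope, so the least-squares estimate should stay offset from the true weight even with many samples, while the moment-based estimate should approach it. Higher-dimensional weights would need additional moments, which this sketch does not attempt.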

Abubakar is a 1st-year PhD student at Stanford in the Department of Electrical Engineering. His research interests include machine learning theory and algorithms, with particular application to datasets in medicine and biology. Abubakar is a recipient of the Stanford Graduate Fellowship and is a 2016 Paul and Daisy Soros Fellow.