2017 Poster Sessions: Conservative Contextual Linear Bandits

Student Name : Abbas Kazerouni
Advisor : Benjamin Van Roy
Research Areas: Information Systems
Safety is a desirable property that can immensely increase the applicability of learning algorithms in real-world decision-making problems. It is much easier for a company to deploy an algorithm that is safe, i.e., guaranteed to perform at least as well as a baseline. In this paper, we study the issue of safety in contextual linear bandits, which have applications in many fields, including personalized ad recommendation in online marketing. We formulate a notion of safety for this class of algorithms. We develop a safe contextual linear bandit algorithm, called conservative linear UCB (CLUCB), that simultaneously minimizes its regret and satisfies the safety constraint, i.e., maintains its performance above a fixed percentage of the performance of a baseline strategy, uniformly over time. We prove an upper bound on the regret of CLUCB and show that it can be decomposed into two terms: 1) an upper bound for the regret of the standard linear UCB algorithm, which grows with the time horizon, and 2) a constant term, independent of the time horizon, that accounts for the loss incurred by being conservative in order to satisfy the safety constraint. We empirically show that our algorithm is safe and validate our theoretical analysis.
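The core idea behind the safety constraint can be illustrated with a toy sketch: play the optimistic (UCB) action only when a pessimistic (lower-confidence) estimate of the cumulative reward still keeps performance above a fixed fraction of the baseline's cumulative reward; otherwise fall back to the baseline. The sketch below is a minimal, simplified illustration of this check, not the paper's exact algorithm: the feature dimension, confidence radius, noise level, and the choice of baseline are all assumptions made for the demo, and the baseline's mean reward is treated as known.

```python
import numpy as np

rng = np.random.default_rng(0)

d, T = 5, 2000          # feature dimension and horizon (assumed for the demo)
alpha = 0.1             # safety margin: stay within (1 - alpha) of the baseline
lam, beta = 1.0, 1.0    # ridge regularizer and confidence radius (assumed constants)

theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)          # unknown reward parameter (hidden from the learner)

# Regularized least-squares statistics for the linear reward model.
V = lam * np.eye(d)
b = np.zeros(d)

cum_lower = 0.0   # conservative lower bound on reward collected so far
cum_base = 0.0    # cumulative mean reward of the baseline strategy

for t in range(T):
    actions = rng.normal(size=(10, d))  # 10 random action feature vectors per round
    theta_hat = np.linalg.solve(V, b)
    Vinv = np.linalg.inv(V)

    # Optimistic (UCB) action: highest upper confidence bound on reward.
    widths = np.sqrt(np.einsum('ij,jk,ik->i', actions, Vinv, actions))
    ucb = actions @ theta_hat + beta * widths
    x_ucb = actions[np.argmax(ucb)]

    # Hypothetical baseline: always the first action; mean reward treated as known.
    x_base = actions[0]
    r_base = x_base @ theta

    # Pessimistic estimate of the UCB action's reward.
    lcb = x_ucb @ theta_hat - beta * np.sqrt(x_ucb @ Vinv @ x_ucb)

    # Safety check: would playing the UCB action keep the (lower-bounded)
    # cumulative reward above (1 - alpha) times the baseline's cumulative reward?
    cum_base_next = cum_base + r_base
    if cum_lower + lcb >= (1 - alpha) * cum_base_next:
        x = x_ucb
        cum_lower += lcb
    else:
        x = x_base
        cum_lower += r_base             # baseline's mean reward assumed known exactly

    # Observe a noisy reward and update the least-squares statistics.
    r = x @ theta + 0.1 * rng.normal()
    V += np.outer(x, x)
    b += r * x
    cum_base = cum_base_next
```

By construction, the conservative lower bound on accumulated reward never drops below the `(1 - alpha)` fraction of the baseline's cumulative reward, which is the sketch's analogue of the paper's safety guarantee.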

I am a 4th-year PhD student at Stanford University pursuing my degree under the joint supervision of Lawrence M. Wein and Benjamin Van Roy. My research lies at the intersection of Reinforcement Learning and Operations Management, with applications in Healthcare and Economics. Specifically, I use Statistical Machine Learning techniques to design algorithms that can learn to make decisions under uncertainty.

I earned an MSc degree in Electrical Engineering from Stanford University in 2015. Before joining Stanford, I received a BSc in Electrical Engineering and a BSc in Mathematics from Sharif University of Technology in 2013.