2017 Poster Sessions : Why Information-Directed Approaches Should be Better than Thompson Sampling and Optimism?

Student Name : Jonathan Lacotte
Advisor : Benjamin Van Roy
Research Areas: Information Systems

Reinforcement learning addresses the critical problem of balancing exploration of an unknown environment against exploitation of what has already been learned. Beyond naive approaches such as epsilon-greedy, Thompson sampling and optimism are two of the most popular design principles. However, even in simple settings such as linear bandits, they fail to account for the specific information structure of a problem and are suboptimal as a result. We present some recent algorithms that offer a promising direction for overcoming these limitations. They aim to acquire relevant information about the problem at each time step while minimizing the expected regret. We discuss their design and some statistical guarantees in bandit learning.

Jonathan Lacotte is currently a first-year PhD student in Electrical Engineering. He is interested in reinforcement learning, design principles for exploration and, in particular, how to exploit the specific information structure of a problem to design efficient algorithms. He is fortunate to have worked with Prof. Benjamin Van Roy and to be currently rotating with Prof. Emma Brunskill.

Prior to Stanford, Jonathan completed his undergraduate studies at the École Polytechnique in France. He then earned the Part III in Mathematics from the University of Cambridge. He also worked with Prof. Laurent El Ghaoui at UC Berkeley on optimization and topic detection in large text corpora.