2010 Poster Sessions : Sampling Propagation Cascades

Student Names : Maria Montserrat Medina Martinez, Eldar Sadikov
Advisor : Jure Leskovec
Research Areas: Artificial Intelligence, Computer Systems
When information, influence, or recommendations diffuse in a social network, one is often interested in the resulting propagation cascades. For example, a cascade forms among a group of people who influenced each other to buy a product or join a trend. However, information about who performed a particular action, and when, may not be fully available due to privacy concerns or scaling issues. Social networks often provide only a sample of the full data. As a result, the observed cascades may be perturbed. In this work, we study how sampling affects properties of cascades, such as their size, their depth, and the number and size of their connected components. We first analyze cascades by themselves (without the network), showing how their properties change as a function of the sampling rate. We then study cascades in the context of synthetic networks: Erdős-Rényi, Power-Law, and Forest-Fire networks. Finally, we simulate cascades on a real Twitter network with more than 60 million nodes. We find an interesting interplay between sampling and network structure. We conclude by looking at real cascades formed by retweets on Twitter and at how sampling affects the observed properties of these cascades. Our results show how the true properties of real cascades, unbiased by sampling, can be recovered from the sampled data.
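To make the effect of sampling concrete, the following minimal sketch drops nodes from a toy cascade and recomputes the properties named above. It is an illustration under stated assumptions, not the authors' method: it assumes each cascade is a tree in which every node has at most one influencer, and that each node is observed independently with probability equal to the sampling rate. The function names `sample_cascade` and `cascade_stats` and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def sample_cascade(parent, rate, rng):
    """Keep each node independently with probability `rate` (assumed scheme)."""
    kept = {v for v in parent if rng.random() < rate}
    # An edge survives only if both the node and its influencer were kept;
    # otherwise the node becomes the root of its own observed component.
    return {v: (parent[v] if parent[v] in kept else None) for v in kept}

def cascade_stats(parent):
    """Size, component count, largest component, and depth of a forest."""
    children = defaultdict(list)
    roots = []
    for v, p in parent.items():
        if p is None:
            roots.append(v)
        else:
            children[p].append(v)
    largest = depth = 0
    for r in roots:
        # Iterative traversal of one component, tracking distance from its root.
        size, stack = 0, [(r, 0)]
        while stack:
            v, d = stack.pop()
            size += 1
            depth = max(depth, d)
            stack.extend((c, d + 1) for c in children[v])
        largest = max(largest, size)
    return {"size": len(parent), "components": len(roots),
            "largest_component": largest, "depth": depth}

# Toy cascade (hypothetical data): chain 0 -> 1 -> 2 -> 3,
# plus nodes 4 and 5 that were also influenced by node 1.
full = {0: None, 1: 0, 2: 1, 3: 2, 4: 1, 5: 1}
print("full", cascade_stats(full))

rng = random.Random(0)  # fixed seed so runs are repeatable
for rate in (0.25, 0.5, 0.75):
    print(rate, cascade_stats(sample_cascade(full, rate, rng)))
```

In this sketch, removing an interior node splits the cascade into several smaller observed components, which is one way a sampled cascade can appear shallower and more fragmented than the true one.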

Eldar Sadikov is a 3rd year Ph.D. student in the Computer Science Department of Stanford University. He is advised by Hector Garcia-Molina and co-advised by Jure Leskovec. Eldar has previously worked on search engine query log mining and information provenance. His current interests lie primarily in social networks and information diffusion over them.

Montse Medina is a 2nd year Ph.D. student at the Institute for Computational and Mathematical Engineering of Stanford University. She is advised by Pat Hanrahan. She is working on Liszt, a domain-specific language for scientific computing. Montse's broad interests include scientific and parallel computing and data mining. Montse holds an MS in Aeronautics and Astronautics from Stanford University.