2012 Poster Sessions : Partitioned Multi-Indexing: Bringing Order to Social Search

Student Name : Bahman Bahmani
Advisor : Ashish Goel
Research Areas: Information Systems
To answer search queries on a social network rich with user-generated content, it is desirable to give a higher ranking to content that is closer to the individual issuing the query. Queries occur at nodes in the network, documents are also created by nodes in the same network, and the goal is to find the document that matches the query and is closest in network distance to the node issuing the query. In this paper, we present the ``Partitioned Multi-Indexing'' scheme, which provides an approximate solution to this problem. With $m$ links in the network, after an offline $tilde{O}(m)$ pre-processing time, our scheme allows for social index operations (i.e., social search queries, as well as insertion and deletion of words into and from a document at any node), all in time $tilde{O}(1)$. Further, our scheme can be implemented on open source distributed streaming systems such as Yahoo! S4 or Twitter's Storm so that every social index operation takes $tilde{O}(1)$ processing time and network queries in the worst case, and just two network queries in the common case where the reverse index corresponding to the query keyword is much smaller than the memory available at any distributed compute node.

Building on Das Sarma et al.'s approximate distance oracle, the worst-case approximation ratio of our scheme is $tilde{O}(1)$ for undirected networks. Our simulations on the social network Twitter as well as synthetic networks show that in practice, the approximation ratio is actually close to 1 for both directed and undirected networks. We believe that this work is the first demonstration of the feasibility of social search with real-time text updates at large scales.

Bahman is a PhD student working with Prof. Ashish Goel. His research interests are in the algorithmic aspects of massive scale data applications, including search, personalization, recommender systems, and large scale data mining.