2013 Poster Sessions : Copysets: Reducing the Frequency of Data Loss in Cloud Storage

Student Name : Asaf Cidon
Advisor : Mendel Rosenblum
Research Areas: Computer Systems
Random replication is widely used in data center storage systems to prevent data loss. However, it is particularly vulnerable to the common scenario of correlated node failures caused by cluster-wide power outages. Because each incident of data loss carries a high fixed cost, data center operators prefer to minimize the frequency of such events, even at the expense of losing more data in each event. We present Copyset Replication, a novel general-purpose replication technique that significantly reduces the frequency of data loss events. Storage systems require that each node’s data be scattered across several nodes for parallel data recovery and access. Copyset Replication presents an optimal tradeoff between the number of nodes on which the data is scattered and the probability of data loss. For example, in a 5000-node RAMCloud cluster under a power outage, Copyset Replication reduces the probability of any data loss from 99.99% to 0.15%. For Facebook’s HDFS cluster, it reduces the probability of data loss from 22.8% to 0.78%. We implemented and evaluated Copyset Replication on two open source data center storage systems, HDFS and RAMCloud, and show that it incurs a low overhead on all operations.
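
The abstract above does not spell out how copysets are chosen. The following is a minimal sketch, assuming copysets are formed by splitting a few random permutations of the node list into groups of R nodes, so that each node's data is scattered over roughly the requested number of other nodes. The function names, parameters, and the Monte Carlo loss estimate are hypothetical illustrations, not the HDFS or RAMCloud implementation.

```python
import random


def build_copysets(nodes, replication=3, scatter_width=4, seed=0):
    """Sketch: derive copysets from random permutations of the node list.

    Assumption: ceil(scatter_width / (replication - 1)) permutations are
    used, and each one is cut into consecutive groups of `replication`
    nodes. An incomplete trailing group is dropped.
    """
    rng = random.Random(seed)
    num_permutations = -(-scatter_width // (replication - 1))  # ceiling division
    copysets = set()
    for _ in range(num_permutations):
        perm = list(nodes)
        rng.shuffle(perm)
        for i in range(0, len(perm) - replication + 1, replication):
            copysets.add(frozenset(perm[i:i + replication]))
    return copysets


def random_copysets(nodes, replication=3, num_chunks=1000, seed=0):
    """Baseline: every chunk independently picks `replication` random nodes."""
    rng = random.Random(seed)
    nodes = list(nodes)
    return {frozenset(rng.sample(nodes, replication)) for _ in range(num_chunks)}


def loss_probability(copysets, nodes, failed_count, trials=2000, seed=1):
    """Estimate how often a random simultaneous failure of `failed_count`
    nodes wipes out every replica of at least one copyset."""
    rng = random.Random(seed)
    nodes = list(nodes)
    losses = 0
    for _ in range(trials):
        failed = set(rng.sample(nodes, failed_count))
        if any(cs <= failed for cs in copysets):
            losses += 1
    return losses / trials


if __name__ == "__main__":
    cluster = range(300)
    few = build_copysets(cluster)
    many = random_copysets(cluster, num_chunks=20000)
    print(len(few), "copysets (permutation-based) vs", len(many), "(random)")
    print("estimated loss probability, 3 nodes down:",
          loss_probability(few, cluster, 3),
          "vs", loss_probability(many, cluster, 3))
```

The sketch illustrates the tradeoff described in the abstract: data is lost only when all nodes of some copyset fail together, so limiting the number of distinct copysets lowers the chance that a correlated failure hits one, while the number of permutations controls how widely each node's data is scattered.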

Asaf Cidon is an Electrical Engineering PhD candidate at Stanford University, where he conducts research on data center and mobile systems under Professors Mendel Rosenblum and Sachin Katti. He worked at Google Israel on the web search team and previously served as a Product Manager for two start-ups. Prior to his studies, Asaf served as a team leader in an elite unit in the Israeli Intelligence Forces. He received his MS in Electrical Engineering from Stanford and his BSc in Computer and Software Engineering, cum laude, from the Technion. He is a recipient of the Stanford Graduate Fellowship and the Sohnis Promising Scientist Award.