2013 Poster Sessions : Managing Interference for High Utilization and QoS in Modern Datacenters

Student Name : Christina Delimitrou
Advisor : Christos Kozyrakis
Research Areas: Computer Systems
An increasing amount of compute is now performed in the cloud, primarily due to the cost benefits for both end users and datacenter (DC) operators. Ideally, cloud computing should offer high performance at low cost. Past efforts have attempted to achieve this by switching to commodity servers, reducing power delivery and cooling infrastructure costs, and building more and larger DCs. However, these techniques are reaching the end of the road. To further improve DC compute capabilities, we must ensure that we are efficiently using the tens of thousands of servers in each large-scale DC. Examining typical server utilizations in modern DCs shows that they are notoriously low. While many factors contribute to this (dynamic load, unknown applications, platform heterogeneity), the underlying cause of low utilization is the performance loss due to interference between co-scheduled applications. To improve utilization without hurting QoS, we must understand, reduce, and manage interference in these large-scale platforms.

A large part of the potential for improving utilization lies in the system’s cluster manager, which orchestrates how applications are scheduled across servers. We present Paragon, a QoS-aware cluster manager for large-scale DCs. Paragon schedules applications to servers, taking into account interference between co-scheduled workloads and platform heterogeneity. To keep scheduling overheads low, Paragon does not rely on detailed application profiling. Instead, it uses a minimal signal about a new application and leverages the large amount of information the system already has about previously scheduled applications. It is designed as an online recommendation system, similar to those used by e-commerce sites and Netflix, that classifies incoming applications and finds similarities between new and known workloads to determine where an application should run. Quasar, the resource manager in Paragon, determines how many resources each application should be allocated. Quasar also relies on robust collaborative filtering techniques to decide the minimum amount of resources an application needs to satisfy its QoS guarantees. We evaluate Paragon on 1,000 EC2 servers and show that it preserves QoS for most workloads while significantly improving system utilization.
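
The recommendation-style classification described above can be illustrated with a minimal sketch. It is not Paragon's actual algorithm: it swaps in a simple similarity-weighted average as a stand-in for collaborative filtering, and the application names, server configurations, scores, and function names are all hypothetical. A new application is measured on only two server configurations, and its scores on the remaining configurations are estimated from previously scheduled applications that behaved similarly.

```python
from math import sqrt

# Hypothetical performance scores (higher is better) of previously scheduled
# applications on four server configurations. All values are made up.
KNOWN_APPS = {
    "webserver": [0.9, 0.7, 0.4, 0.2],
    "analytics": [0.3, 0.5, 0.8, 0.9],
    "cache":     [0.8, 0.8, 0.5, 0.3],
}


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length score vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_scores(partial, known=KNOWN_APPS):
    """Fill in the unmeasured configurations of a new application.

    `partial` maps a configuration index to a measured score (the "minimal
    signal"). Missing entries are a similarity-weighted average over the
    known applications, compared only on the measured configurations.
    """
    observed = sorted(partial)
    weights = {
        name: cosine_similarity(
            [partial[i] for i in observed], [row[i] for i in observed]
        )
        for name, row in known.items()
    }
    total = sum(weights.values())
    n_configs = len(next(iter(known.values())))
    predicted = []
    for i in range(n_configs):
        if i in partial:
            predicted.append(partial[i])  # keep measured values as-is
        elif total > 0:
            predicted.append(
                sum(w * known[name][i] for name, w in weights.items()) / total
            )
        else:
            predicted.append(0.0)  # no usable similarity information
    return predicted


def best_config(partial):
    """Index of the configuration with the highest (measured or predicted) score."""
    scores = predict_scores(partial)
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, `best_config({0: 0.35, 1: 0.5})` profiles the new application on configurations 0 and 1 only and returns the configuration the sketch expects to perform best. A production system would need a far more robust estimator and would also have to account for interference from co-scheduled workloads, which this sketch ignores.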

Christina Delimitrou is a fourth-year Ph.D. student in the Electrical Engineering Department at Stanford University. She works with Professor Christos Kozyrakis in the areas of Computer Architecture and Computer Systems. Specifically, she works on QoS-aware techniques for scheduling and resource management in large-scale datacenters. She is also interested in scalable techniques for application and system modeling. She is affiliated with EPIC (Efficiency and Proportionality In the Cloud) and SEDCL (Stanford Experimental Data Center Laboratory). Prior to coming to Stanford, she graduated with a diploma in Electrical and Computer Engineering from the National Technical University of Athens. More information at: http://www.stanford.edu/~cdel/