2011 Poster Sessions : Efficient Super Computing

Student Name : Vishal Parikh
Advisor : William Dally
Research Areas: Computer Systems
Computing systems today are power limited and undergoing an unprecedented shift toward parallelism. Currently, commodity processors have six to eight cores, and vendor roadmaps show the number of cores doubling every 18 months for the next decade. This shift toward multi-core processors is driven by two factors: instruction-level parallelism (ILP) reaching its practical limit, and the end of voltage scaling. Without voltage scaling, each technology generation becomes more power dense than the previous one, making energy efficiency (operations per Joule) across the chip a major issue. One source of energy overhead is data and instruction supply, which is exacerbated by complex cache hierarchies and coherence protocols.

The Efficient Supercomputing (ESC) project aims to significantly increase the energy efficiency and scalability of many-core chips running high-performance applications. Currently, the energy spent on the arithmetic of a floating-point add, for example, is an order of magnitude less than the total energy needed to execute a floating-point add instruction. This overhead, along with the complexity of cache hierarchies and coherence protocols, will increase as more cores are added to a system. Our goal is to reduce this overhead in the memory subsystem by exposing existing architectural features and adding new ones, while maintaining roughly the same programming model that is used in scientific computing today. The ESC architecture maintains a global, shared address space with novel hardware coherence techniques, but exposes more of the memory hierarchy than is currently typical. We support block, stride, and gather operations within the memory hierarchy. We also add active messaging support, which relocates instructions to the data instead of moving data to the instructions. This allows for efficient atomic operations and thread creation. Exposing the concept of processor locality enables threads that share the same data to be colocated, minimizing communication costs. Through these features, we hope to significantly reduce the energy per flop while maintaining programmability.
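
The block, stride, and gather operations mentioned above can be illustrated with a minimal sketch. This is not the ESC interface: the function names (block, stride, gather, fetch) and the model of memory as a word-addressed Python list are assumptions made purely for illustration. The sketch only shows which addresses each kind of transfer would touch.

```python
from typing import List


def block(base: int, count: int) -> List[int]:
    """Hypothetical block transfer: `count` consecutive words from `base`."""
    return [base + i for i in range(count)]


def stride(base: int, count: int, step: int) -> List[int]:
    """Hypothetical strided transfer: `count` words spaced `step` apart."""
    return [base + i * step for i in range(count)]


def gather(base: int, offsets: List[int]) -> List[int]:
    """Hypothetical gather: words at arbitrary offsets from `base`."""
    return [base + off for off in offsets]


def fetch(memory: List[int], addresses: List[int]) -> List[int]:
    """Read the words at the given addresses from a toy memory."""
    return [memory[a] for a in addresses]


# Toy memory of 64 words, where the word at address a holds 100 + a.
# Viewed as an 8x8 row-major matrix, a row is a block and a column is a stride.
memory = [100 + a for a in range(64)]

row = fetch(memory, block(8, 4))               # addresses 8, 9, 10, 11
column = fetch(memory, stride(2, 4, 8))        # addresses 2, 10, 18, 26
picked = fetch(memory, gather(0, [5, 1, 40]))  # addresses 5, 1, 40
```

In this toy model, each pattern is described by a short descriptor (base, count, step, or an offset list) rather than by one request per word, which is one way to picture how such operations could cut down on per-access overhead.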

Vishal Parikh is a 4th-year PhD student in the Concurrent VLSI Architecture group under Bill Dally. Vishal is interested in energy-efficient processing and scalable memory systems. Vishal completed his MSEE at Stanford in 2008 and his BSEE and BA in Math at the University of Texas at Austin in 2006.