2016 Poster Sessions : Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Student Name : Song Han
Advisor : William Dally
Research Areas: Computer Systems
Neural networks are both computationally intensive and memory intensive, which makes them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce "deep compression", which reduces the storage requirement of neural networks without affecting their accuracy. On the ImageNet dataset, our method reduced the storage required by AlexNet by 35x, from 240MB to 6.9MB, and by VGG-16 by 49x, from 552MB to 11.3MB, in both cases with no loss of accuracy. The compression also facilitates the use of complex neural networks in mobile applications, where application size and download bandwidth are constrained, and it allows the model to fit in on-chip SRAM rather than off-chip DRAM.
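
The three stages named in the title (pruning, quantization, Huffman coding) can be illustrated on a toy weight vector. What follows is only a minimal sketch under stated assumptions, not the authors' implementation: the weights are random, the pruning threshold and the 16-level codebook are arbitrary, the centroids are evenly spaced rather than trained, and the bit counts ignore the cost of storing the codebook and the positions of the surviving weights. All function names are hypothetical.

```python
import heapq
import random
from collections import Counter


def prune(weights, threshold):
    """Magnitude pruning: zero out weights with absolute value below threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize(weights, n_levels):
    """Map each nonzero weight to the index of the nearest of n_levels
    evenly spaced centroids (a simplification: nothing is trained here)."""
    nonzero = [w for w in weights if w != 0.0]
    lo, hi = min(nonzero), max(nonzero)
    step = (hi - lo) / (n_levels - 1) or 1.0  # avoid division by zero
    codebook = [lo + i * step for i in range(n_levels)]
    indices = [min(n_levels - 1, round((w - lo) / step)) for w in nonzero]
    return codebook, indices


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over symbols."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (count, unique tie-breaker, symbols in this subtree).
    heap = [(count, i, [sym]) for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:  # every symbol under the merged node gets one bit deeper
            lengths[s] += 1
        heapq.heappush(heap, (c1 + c2, tie, s1 + s2))
        tie += 1
    return lengths


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # toy data, not a real model

pruned = prune(weights, threshold=1.0)             # assumed threshold
codebook, indices = quantize(pruned, n_levels=16)  # assumed 4-bit codebook
lengths = huffman_code_lengths(indices)

original_bits = 32 * len(weights)                  # 32-bit floats
fixed_bits = 4 * len(indices)                      # 4-bit index per surviving weight
huffman_bits = sum(lengths[i] for i in indices)    # variable-length indices

print(f"after pruning + quantization: {original_bits / fixed_bits:.1f}x smaller")
print(f"after Huffman coding as well: {original_bits / huffman_bits:.1f}x smaller")
```

The printed ratios describe this toy vector only and say nothing about AlexNet or VGG-16; the point is that each stage reduces the bit count further, because Huffman coding can never use more bits than the fixed 4-bit code it replaces.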

Song Han is a fourth-year PhD student working with Prof. Bill Dally at Stanford University. His research interest is computer architecture for deep learning, and his current work focuses on improving the energy efficiency of neural networks for mobile and embedded systems. His work on model compression and on a hardware accelerator for the compressed model, which together fit state-of-the-art DNN models fully on-chip, has been covered by TheNextPlatform and O'Reilly. Before joining Stanford, Song Han graduated from the Institute of Microelectronics, Tsinghua University.