Anshul Kundaje: 2017 Plenary Session


Tuesday, April 11, 2017
Location: McCaw Hall, Arrillaga Alumni Center

"Interpretable, Integrative Deep Learning for Decoding the Human Genome and Disease-Associated Genetic Variation"



The human genome contains the fundamental code that defines the identity and function of all the cell types and tissues in the human body. Genes are functional sequence units that encode proteins, but they account for only about 2% of the 3-billion-nucleotide human genome sequence. What does the rest of the genome encode? How is gene activity controlled in each cell type? Where do the regulatory control elements lie, and what is their sequence composition? How do variants and mutations in the genome sequence affect cellular function and disease? These are fundamental questions that remain largely unanswered. The regulatory code that controls gene activity is made up of complex genome sequence grammars representing hierarchically organized units of regulatory elements. These functional words and grammars are sparsely distributed across billions of nucleotides of genomic sequence and remain largely elusive. Deep learning has revolutionized our understanding of natural language, speech, and vision. We strongly believe it has the potential to revolutionize our understanding of the regulatory language of the genome. We have developed deep learning frameworks to learn how genomic sequence encodes dynamic activation profiles of millions of experimentally measured regulatory genomic events across hundreds of cell types and tissues. We have also developed novel methods to interpret our models and extract local and global predictive patterns, revealing many insights into the regulatory code. We demonstrate how our deep learning models can reveal the regulatory code that controls the differentiation and identity of diverse blood cell types. Our models also allow us to predict the effects of natural and disease-associated genetic variation, i.e., how differences in DNA sequence across healthy and diseased individuals are likely to affect molecular mechanisms associated with complex traits and diseases.


Anshul Kundaje is an Assistant Professor of Genetics and Computer Science at Stanford University. His primary research area is large-scale computational regulatory genomics. He specializes in developing statistical and machine learning methods for large-scale integrative analysis of heterogeneous, high-throughput functional genomic and genetic data. The goals of this work are to decipher regulatory elements and long-range regulatory interactions; to learn predictive regulatory network models across individuals, cell types, and species; and to improve the detection and interpretation of natural and disease-associated genetic variation. Previously, as a postdoc at Stanford and a Research Scientist at MIT, Anshul was the lead computational analyst of the ENCODE Project and the Roadmap Epigenomics Project. Anshul is also a recipient of the 2014 Alfred Sloan Fellowship.