2015 Data Science Workshop


Wed, April 29, 2015
Location: Fisher Conference Center, Arrillaga Alumni Center

"Data Science for Personalized Health"


Recent technological advances have enabled the collection of diverse health data at an unprecedented scale. Omics data (genomes, transcriptomes, proteomes, metabolomes, DNA methylomes, and microbiomes), together with electronic medical records and data from sensors and wearable devices, provide a detailed view of disease state and of physiological and behavioral parameters at the individual level. The availability of such a massive-scale digital footprint of an individual’s health opens the door to numerous opportunities for monitoring and accurately predicting that individual’s health outcomes, and for customizing treatments at the individual level, hence realizing the goal of personalized medicine. A major challenge is how to efficiently collect, store, secure, and, most importantly, analyze such massive-scale and highly private data without compromising the accuracy of outcome predictions and treatment analysis. The “Data Science for Personalized Health” flagship project will design a system that addresses this challenge and validate it on several personalized medicine tasks. Specifically, we will 1) devise new algorithms for sampling, for imputation of missing data, and for joint processing of multiple measurements; 2) build novel frameworks to house and manage complex data in a useful and secure fashion; and 3) devise new tools for analysis of and prediction from high-dimensional, complex, longitudinal data. Using a unique dataset on 70 pre-diabetic participants, we will devise a personalized and highly accurate early detection method for diabetes and analyze the consequences of weight change, physical activity, stress, and respiratory viral infection on individuals’ digital health footprints, ultimately predicting the effect of such perturbations on individuals’ health outcomes.
The research is led by an interdisciplinary team of faculty with expertise in medicine, genetics, machine learning, security, and information theory, and the tools developed will be broadly applicable to other data science problems as well.