Drug targets for which there is human data (e.g., genetics) linking them to the disease are more likely to successfully complete clinical development and be approved as new drugs. However, the considerable challenge of assembling large-scale human cohorts has limited the collection of such data to national health organizations, and even these cohorts provide limited phenotyping and omics data because of the high cost of the tests.
To address this challenge, our lab initiated Project 10K, a large-scale, longitudinal, deeply phenotyped, multi-omics human cohort. We aim to find novel diagnostic, prognostic, and therapeutic biomarkers for diseases by applying state-of-the-art machine learning methods to deep phenotypic and multi-omics measurements of 10,000 human volunteers over a 10-year period.
The goals of Project 10K include:
- Create the most deeply phenotyped human cohort globally
- Develop personalized algorithms that accurately predict the likelihood of a person developing a particular condition or disease within 5-10 years
- Characterize diseases at the molecular level across multiple omics layers
- Identify novel therapeutic targets for disease
- Develop machine learning algorithms and tools to model the disease continuum and disease progression
We apply machine learning methods to both the baseline variation in disease risk and the longitudinal data in order to identify novel therapeutic targets, as well as ways to modulate them through diet, lifestyle, the microbiome, and small molecules. We also devise algorithms that predict the future onset of various diseases from baseline measurements.
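Because participants are followed for up to 10 years, some leave the cohort or reach the end of follow-up without a diagnosis; their records are censored, not disease-free. As a minimal sketch (not Project 10K code), assuming hypothetical follow-up times in years and a binary diagnosis indicator, the standard Kaplan–Meier estimator gives the cohort-level probability of disease onset within a chosen horizon while accounting for censoring:

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of disease-free survival.

    times  -- follow-up time per participant (hypothetical, in years)
    events -- 1 if the disease was diagnosed at that time, 0 if censored
    Returns (time, S(time)) at each distinct time with at least one diagnosis.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)  # participants still under observation
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        diagnosed = 0
        leaving = 0  # diagnosed or censored at exactly time t
        while i < len(data) and data[i][0] == t:
            diagnosed += data[i][1]
            leaving += 1
            i += 1
        if diagnosed:
            # Participants censored at t still count in the risk set at t.
            survival *= 1.0 - diagnosed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def cumulative_incidence(curve: List[Tuple[float, float]], horizon: float) -> float:
    """Estimated probability of onset by `horizon`, i.e. 1 - S(horizon)."""
    survival = 1.0
    for t, s in curve:
        if t > horizon:
            break
        survival = s
    return 1.0 - survival


# Hypothetical toy cohort: 10 participants, 4 diagnoses, 6 censored.
times = [1, 2, 2, 3, 4, 5, 6, 7, 8, 10]
events = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]
curve = kaplan_meier(times, events)
print(round(cumulative_incidence(curve, 5), 3))  # 0.333
```

This is a population-level baseline. Individual-level prediction from baseline measurements would condition on each person's features (for example, with a Cox proportional hazards model), and such models can be compared against this curve.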
Big Data in Healthcare
Health data are increasingly being generated at massive scale, at various levels of phenotyping, and from different types of sources. We are using nationwide electronic health record data on millions of individuals from several countries to develop machine learning algorithms that predict the future onset of disease, identify causal drivers of disease, and unravel personalized responses to drugs. We aim to understand the health trajectories of different people: how they unfold along different pathways, how past health affects present and future health, and how different determinants of health interact over time.
Analyses of such large-scale medical data can reveal new associations, patterns, and trends that may pave the way to scientific discoveries in the pathogenesis, classification, diagnosis, treatment, and progression of disease. This work includes using the data to construct computational models that accurately predict clinical outcomes and disease progression; such models can identify people at high risk and prioritize them for early intervention, and can help evaluate the effects of public health policies using ‘real-world’ data. Our prediction of gestational diabetes is one such example.
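One common way to check that a risk model generalizes is to score individuals it was not trained on and measure how well those scores rank future cases above non-cases, i.e., the area under the ROC curve (AUC). The minimal sketch below assumes a model has already produced risk scores for a held-out set; the participant IDs, scores, and outcomes are hypothetical, and the `prioritize` helper only illustrates selecting the highest-risk fraction for early intervention, not a clinical protocol.

```python
import math
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the Mann-Whitney statistic.

    The probability that a randomly chosen case (label 1) is scored higher
    than a randomly chosen non-case (label 0); tied scores count as 1/2.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def prioritize(ids: Sequence[str], scores: Sequence[float], fraction: float) -> List[str]:
    """IDs in the top `fraction` of predicted risk, highest risk first."""
    k = max(1, math.ceil(len(ids) * fraction))
    ranked = sorted(zip(scores, ids), reverse=True)
    return [pid for _, pid in ranked[:k]]


# Hypothetical held-out set: model scores and observed outcomes.
ids = ["p1", "p2", "p3", "p4", "p5", "p6"]
scores = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6]
labels = [1, 0, 0, 1, 0, 1]
print(round(auc(scores, labels), 3))  # 0.778
print(prioritize(ids, scores, 0.3))   # ['p1', 'p3']
```

The pairwise loop is quadratic in the size of the held-out set, which is fine for illustration; at EHR scale one would use a rank-based formula or a library routine such as scikit-learn's `roc_auc_score`.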
We use state-of-the-art data science methods to analyze these large datasets, including:
- Descriptive analysis. Such approaches are useful for unbiased exploratory study of the data and for finding interesting patterns in the data, which may lead to testable hypotheses.
- Prediction analysis. Prediction analysis aims to learn a mapping from a set of inputs to an outcome of interest, so that the mapping can later be used to predict the outcome from the inputs in a different, unseen dataset. Prediction analysis has the potential to improve disease diagnosis and prognosis.
- Counterfactual prediction. A major limitation of any observational study is its inability to answer causal questions, because observational data may be heavily confounded and suffer from other biases. Counterfactual prediction therefore aims to construct models that correct for confounding and the other flaws inherent to observational data, so that causal effects can be inferred.
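Inverse probability weighting (IPW) is one standard technique for this kind of counterfactual estimation; it is used here purely as an illustration, not as a description of our own models. The sketch below assumes a single measured confounder with discrete levels (for example, a hypothetical age group), estimates each person's probability of treatment within that level, and reweights outcomes so that the treated and untreated groups each resemble the full population. It is valid only under the usual assumptions of no unmeasured confounding and positivity (every level contains both treated and untreated people).

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def naive_difference(t: Sequence[int], y: Sequence[float]) -> float:
    """Mean outcome among treated minus untreated, ignoring confounding."""
    treated = [yi for ti, yi in zip(t, y) if ti == 1]
    untreated = [yi for ti, yi in zip(t, y) if ti == 0]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


def ipw_ate(c: Sequence[Hashable], t: Sequence[int], y: Sequence[float]) -> float:
    """Average treatment effect estimated by inverse probability weighting.

    c -- discrete confounder level per person (hypothetical)
    t -- 1 if treated/exposed, 0 otherwise
    y -- observed outcome
    """
    n_level: Dict[Hashable, int] = defaultdict(int)
    n_treated: Dict[Hashable, int] = defaultdict(int)
    for ci, ti in zip(c, t):
        n_level[ci] += 1
        n_treated[ci] += ti
    # Propensity score e(c) = P(T = 1 | C = c), estimated by stratum frequency.
    e = {level: n_treated[level] / n_level[level] for level in n_level}
    if any(p in (0.0, 1.0) for p in e.values()):
        raise ValueError("positivity violated: a level has no treated or no untreated")
    n = len(y)
    mean_if_treated = sum(ti * yi / e[ci] for ci, ti, yi in zip(c, t, y)) / n
    mean_if_untreated = sum((1 - ti) * yi / (1 - e[ci]) for ci, ti, yi in zip(c, t, y)) / n
    return mean_if_treated - mean_if_untreated


# Toy data generated by y = 2*c + 1*t, so the true effect of t is exactly 1,
# but the c = 1 group is treated more often, which confounds the comparison.
c = [0, 0, 0, 0, 1, 1, 1, 1]
t = [1, 0, 0, 0, 1, 1, 1, 0]
y = [2 * ci + ti for ci, ti in zip(c, t)]
print(naive_difference(t, y))  # 2.0 (biased)
print(ipw_ate(c, t, y))        # 1.0
```

With a single discrete confounder, IPW is equivalent to standardizing over the confounder levels; with many or continuous confounders, the propensity score would instead be estimated by a model such as logistic regression.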
Microbiome in Health and Disease
Another rich source of potential disease risk factors is the human microbiome, the collective genome of the trillions of microbes (including bacteria, viruses, fungi, and parasites) that reside in the human gastrointestinal tract. The microbiome contains 100-fold more genes than the human genome and is considered a bona fide ‘second genome’, with fundamental roles in multiple aspects of human physiology and health, including obesity, non-alcoholic fatty liver disease, inflammatory diseases, cancer, metabolic diseases, cardiovascular disease, aging, and neurodegenerative disorders. As such, it should capture aspects of disease that existing risk factors do not, and combining the two could lead to earlier and more robust disease detection. However, very few microbiome-based markers that predict disease onset and progression have been found to date, and none is currently used by healthcare systems. Discovery of microbiome-based risk factors is thus a promising yet largely unexplored research area.
Growing evidence supports a causal role for the microbiome in obesity, diabetes, metabolic disorders, cardiovascular disease, and immune-mediated disease. For example, transplanting microbiota from human subjects discordant for obesity into germ-free mice induced the corresponding phenotype in the recipient mice, and cohousing mice harboring the obese human microbiota with mice harboring the lean human microbiota prevented obesity. Atherosclerosis susceptibility has also been shown to be transmissible by gut microbiota transfer. We previously showed that weight gain and glucose intolerance are induced in recipient mice following transplantation of microbiota from mice that had consumed artificial sweeteners, had a history of obesity, or had altered feeding patterns or host mutations in circadian genes. We also showed that microbiota transplantation in humans improved clinical outcomes in subjects with atopic dermatitis, a severe skin disease.
Our goal is to find novel microbiome-based disease risk factors that predict, more accurately than existing ones, the likelihood that a person will develop a particular condition or disease within 5-10 years. We work on numerous conditions, including diabetes, cardiovascular disease, obesity, inflammatory bowel disease, fatty liver disease, multiple sclerosis, Alzheimer’s disease, Parkinson’s disease, and cancer. In each setting, we collaborate with clinicians to assemble cohorts for which we obtain clinical profiles and microbiome data, and we develop algorithms that use microbiome features measured at recruitment to unravel the role of the microbiome in each of these conditions.
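Microbiome profiles are compositional: sequencing depth differs between samples, so only the relative abundances of taxa carry information. A common preprocessing step before using such profiles as model inputs is the centered log-ratio (CLR) transform. The sketch below is an illustration rather than our pipeline; the taxon names, counts, and pseudocount value are hypothetical, and the resulting per-participant vectors stand in for the recruitment-time features that a downstream predictive model would consume.

```python
import math
from typing import Dict, List, Sequence


def clr(counts: Sequence[float], pseudocount: float = 0.5) -> List[float]:
    """Centered log-ratio transform of one sample's taxon counts.

    A pseudocount is added because log(0) is undefined and most taxa are
    absent from any single sample. Each output is log(x_i) minus the mean log.
    """
    logs = [math.log(x + pseudocount) for x in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def feature_matrix(
    samples: Dict[str, Dict[str, float]],
    taxa: Sequence[str],
    pseudocount: float = 0.5,
) -> Dict[str, List[float]]:
    """CLR features per participant over a fixed taxon order.

    Taxa not observed in a sample are treated as zero counts.
    """
    return {
        pid: clr([counts.get(taxon, 0.0) for taxon in taxa], pseudocount)
        for pid, counts in samples.items()
    }


# Hypothetical recruitment-time counts for two participants.
taxa = ["taxon_A", "taxon_B", "taxon_C"]
samples = {
    "p1": {"taxon_A": 10, "taxon_B": 20, "taxon_C": 40},
    "p2": {"taxon_A": 100, "taxon_B": 200, "taxon_C": 400},  # same composition, 10x depth
}
features = feature_matrix(samples, taxa, pseudocount=0.0)
print(all(abs(a - b) < 1e-9 for a, b in zip(features["p1"], features["p2"])))  # True
```

With `pseudocount=0.0`, two samples with the same composition at different sequencing depths map to identical features, which is what makes CLR suitable for this setting; the default pseudocount tolerates zero counts at the cost of a slight dependence on depth.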
Our research identifies microbiome-derived features that are predictive of disease and that may be causal for disease, paving the way towards diagnostic and prognostic microbiome applications and towards microbiome-based therapeutics.