The following papers describe the model learning methodology and the web-based prediction tool:


Robust Prediction of Expression Differences Among Human Individuals Using Only Genotype Information (PLoS Genetics, 2013)

Many genetic variants that are significantly correlated to gene expression changes across human individuals have been identified, but the ability of these variants to predict expression of unseen individuals has rarely been evaluated. Here, we devise an algorithm that given training expression and genotype data for a set of individuals, predicts the expression of genes of unseen test individuals given only their genotype in the local genomic vicinity of the predicted gene.

Notably, the resulting predictions are remarkably robust in that they agree well between the training and test set, even when the training and test set consist of individuals from distinct populations. Thus, although the overall number of genes that can be predicted is relatively small, as expected from our choice to ignore effects such as environmental factors and trans sequence variation, the robust nature of the predictions means that the identity and quantitative degree to which genes can be predicted is known in advance. We also present an extension that incorporates heterogeneous types of genomic annotations to differentially weigh the importance of the various genetic variants, and show that assigning higher weights to variants with particular annotations such as proximity to genes and high regional G/C content can further improve the predictions. Finally, genes that are successfully predicted have, on average, higher expression and more variability across individuals, providing insight into the characteristics of the types of genes that can be predicted from their cis genetic variation.
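
As a rough illustration of the train-then-predict setup described in this abstract, the sketch below predicts one gene's expression for held-out individuals from their cis-SNP dosages alone. It is not the published algorithm: plain k-nearest-neighbour averaging is swapped in for simplicity, the data are simulated, and every name and parameter (`knn_predict`, `simulate`, the effect sizes, the allele frequency) is an assumption made for this example.

```python
import random


def manhattan(a, b):
    """Sum of absolute dosage differences between two individuals."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_genotypes, train_expression, query, k=5):
    """Average the expression of the k training individuals whose
    cis-SNP dosages are closest to the query individual's."""
    ranked = sorted(
        range(len(train_genotypes)),
        key=lambda i: manhattan(train_genotypes[i], query),
    )
    nearest = ranked[:k]
    return sum(train_expression[i] for i in nearest) / k


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_individuals, effects, rng, noise_sd=0.3, allele_freq=0.3):
    """Simulated data only: dosages in {0, 1, 2} per SNP and an expression
    value driven by hypothetical cis effects plus Gaussian noise."""
    genotypes, expression = [], []
    for _ in range(n_individuals):
        dosages = [
            sum(rng.random() < allele_freq for _copy in range(2))
            for _snp in effects
        ]
        signal = sum(d * e for d, e in zip(dosages, effects))
        genotypes.append(dosages)
        expression.append(signal + rng.gauss(0.0, noise_sd))
    return genotypes, expression


rng = random.Random(0)
effects = [1.0, 0.0, -0.8, 0.0, 0.0]  # hypothetical: two of five SNPs matter
train_g, train_e = simulate(300, effects, rng)
test_g, test_e = simulate(100, effects, rng)

# Test individuals contribute genotypes only; their expression is held out.
predictions = [knn_predict(train_g, train_e, g) for g in test_g]
test_r = pearson(predictions, test_e)
print(f"held-out Pearson r = {test_r:.2f}")
```

The quantity to watch is the agreement between predicted and measured expression on the held-out individuals, summarised here by the Pearson correlation. A gene whose expression is mostly driven by factors outside the cis region would show a correlation near zero under the same procedure.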

A web tool for predicting gene expression levels from single nucleotide polymorphisms (submitted)

Understanding the effects of single nucleotide polymorphisms (SNPs) on human health is an important goal. In genome-wide association studies (GWAS), SNPs are tested for association with a given disease. This results in many such associated SNPs, yet in most cases the gene or process that they affect is unknown. Expression quantitative trait loci (eQTL) studies examine the correlation of SNPs with the expression profile of a specific gene. However, combinations of SNPs are typically not tested in these studies, and thus, their ability to predict the expression of a gene in a test individual is not clear. We recently published a study in which we devised a multi-SNP predictive model for gene expression in lymphoblastoid cell lines (LCLs), and showed that it can robustly predict the expression of a small number of genes in test individuals. Here, we validate the generality of our models by predicting expression profiles for genes in LCLs in an independent study, and also extend the pool of predictable genes to 232 genes across 14 different cell types. As the number of people who have obtained their SNP profiles through companies such as 23andMe is rising rapidly, we developed GenoExp, a web-based tool in which users can upload their individual SNP data and obtain predicted expression levels for our set of predictable genes across the 14 different cell types. Our tool thus allows users with biological knowledge to study the possible effects of the genes that are predicted to be over- or under-expressed in their own profile.
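
To make the upload step more concrete, the sketch below reads a personal SNP profile and turns each genotype call into an allele count, the kind of numeric input a multi-SNP model needs. It assumes the tab-separated raw-data layout exported by 23andMe (comment lines starting with `#`, then rsid, chromosome, position, genotype). The input format GenoExp actually accepts is not stated here, and the rsIDs, the `counted_alleles` table, and the function names are made up for illustration.

```python
import io


def read_genotype_calls(handle):
    """Return {rsid: genotype string} from a 23andMe-style raw data export
    (assumed layout: rsid, chromosome, position, genotype; tab-separated)."""
    calls = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        rsid, _chromosome, _position, genotype = line.split("\t")
        calls[rsid] = genotype
    return calls


def allele_dosage(genotype, counted_allele):
    """Count copies of counted_allele in a genotype call such as 'AG'.
    No-calls (written with '-') yield None."""
    if "-" in genotype:
        return None
    return genotype.count(counted_allele)


# Made-up example data; real exports contain hundreds of thousands of rows.
example_export = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAG\n"
    "rs0000002\t1\t2000\tCC\n"
    "rs0000003\t1\t3000\t--\n"
)

# Hypothetical table saying which allele a model counts at each SNP.
counted_alleles = {"rs0000001": "G", "rs0000002": "C", "rs0000003": "T"}

calls = read_genotype_calls(io.StringIO(example_export))
dosages = {
    rsid: allele_dosage(calls[rsid], allele)
    for rsid, allele in counted_alleles.items()
    if rsid in calls
}
print(dosages)  # {'rs0000001': 1, 'rs0000002': 2, 'rs0000003': None}
```

No-calls are kept as `None` rather than silently set to zero, so that a downstream model can decide how to handle missing SNPs.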