The complex functions of a living cell are carried out through the coordinated activity of many genes. Cells achieve this coordination by tightly regulating when and where essential biological processes occur. Regulation occurs at every level: at the transcriptional level, in which the DNA sequence of a gene is copied (transcribed) into an mRNA sequence; at the translational level, in which the mRNA sequence is translated into its protein product; and at the post-translational level, in which the activity of each gene's protein product is controlled. This coordinated control is key to nearly all cellular activities. For example, the development of eukaryotic organisms such as ourselves depends on the establishment of complex patterns of gene activity (expression) at precise times and spatial locations, and many diseases, such as cancer, are caused by defects in these control mechanisms.
Our lab develops computational models aimed at understanding how the various molecular components interact to carry out increasingly complex functions, such as development. To achieve this goal, we integrate heterogeneous sources of genomic data, such as DNA sequence data and available expression data, into unified statistical models based on the language of probabilistic graphical models, with the models designed to capture the physical principles of the various interactions. Broadly put, we aim to "reverse-engineer" the regulatory model underlying these complex systems. We expect that our integrated computational framework will be generally applicable and that the resulting models will provide insights into the basic mechanisms and design principles of gene regulation.
Our current research is focused on three main areas:
The development of higher eukaryotes depends on the establishment of complex patterns of gene expression at precise times and spatial locations. The instructions for establishing such expression patterns are encoded in the DNA sequence by a regulatory network that specifies, for each gene, a small number of regulatory proteins responsible for controlling its expression. Despite many efforts, we are far from having mechanistic models that can explain how cells integrate these components to execute precise expression programs. Even in the best-understood developmental system, that of the fruit fly Drosophila melanogaster, where many of the key regulators have been identified, several of the regulators and sequence elements involved are still missing, and no attempt has been made to construct a unified model that integrates all regulatory components (sequence, regulators, and expression). Similarly, although the cell cycle is among the most heavily studied cellular processes, the precise programs that control the activity of many of its genes are unknown, even in single-celled organisms such as yeast.
MicroRNAs are small non-coding RNAs, about 22 nucleotides in length, that target the mRNA products of protein-coding genes to inhibit their translation. These recently discovered molecules have been shown to play important roles in many biological functions, including development, cell proliferation, and cell death, and it is estimated that about 20% of all protein-coding genes in the human genome are targeted by microRNAs. MicroRNAs are complementary to regions in the mRNA products of their target genes, and thus the interaction between microRNAs and mRNAs occurs through RNA:RNA hybridization. Although many attempts have been made to understand the nature of these interactions, we are still far from a level of understanding that would allow us to predict and model them with high accuracy, and very few validated interactions between microRNAs and target genes are known.
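To make the notion of complementarity concrete, the following is a minimal illustrative sketch in Python, not the model developed in our lab. It counts Watson-Crick base pairs between a microRNA and each equal-length window of an mRNA, with the two strands aligned antiparallel, and reports the best-matching window. The function names `count_pairs` and `best_site` and the toy sequences are hypothetical, and the sketch assumes that only A:U and G:C pairs count, ignoring G:U wobble pairs, mismatches, bulges, and hybridization thermodynamics.

```python
# Watson-Crick pairs only (assumption: G:U wobble pairs are ignored here).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def count_pairs(mirna, site):
    """Count paired positions between a microRNA and an equal-length
    mRNA site, both written 5'->3' and aligned antiparallel."""
    if len(mirna) != len(site):
        raise ValueError("microRNA and site must have the same length")
    # Reversing the site pairs the microRNA 5' end with the site 3' end.
    return sum((m, s) in PAIRS for m, s in zip(mirna, reversed(site)))


def best_site(mirna, mrna):
    """Slide a window along the mRNA and return (start, pairs) for the
    window with the most paired positions (first window wins ties)."""
    n = len(mirna)
    if len(mrna) < n:
        raise ValueError("mRNA is shorter than the microRNA")
    best_start, best_pairs = 0, -1
    for start in range(len(mrna) - n + 1):
        pairs = count_pairs(mirna, mrna[start:start + n])
        if pairs > best_pairs:
            best_start, best_pairs = start, pairs
    return best_start, best_pairs


# Toy sequences for illustration only, not validated targets.
toy_mirna = "UGAGGUAG"
toy_mrna = "AAAACUACCUCAGGGG"
print(best_site(toy_mirna, toy_mrna))  # prints (4, 8)
```

A simple pair count of this kind captures only perfect-register complementarity, which is one reason accurate target prediction remains difficult.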
In addition to gene-specific regulatory signals, cells must also control the detailed molecular structure of the DNA within the nucleus, since eukaryotic DNA does not exist as a naked molecule but is instead highly compacted into repeated protein-DNA nucleosome complexes, collectively known as chromatin. Because nucleosomes occlude the DNA they compact from most other DNA-binding proteins, and because nucleosomes compact ~75-90% of all genomic DNA, they act as general repressors and hence have important implications for all aspects of gene regulation. To date, the driving forces that control the genomic positions of nucleosomes are unknown, and no attempt has been made to jointly model the interaction between chromatin and transcriptional regulation.