Supplementary Data

Supplemental info for:

A Feature-Based Approach to Modeling Protein-DNA Interactions

Eilon Sharon1*, Shai Lubliner1*, Eran Segal1,2†

Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is the position-specific scoring matrix (PSSM), which assumes independence between binding positions. In many cases, this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic method for modeling TF-DNA interactions, based on Markov networks. Our approach uses sequence features to represent TF binding specificities, where each feature may span multiple positions. We develop the mathematical formulation of our model and devise an algorithm for learning its structural features from binding-site data. We also develop a discriminative motif finder, which discovers de novo FMMs that are enriched in a set of target sequences compared to a background set. We evaluate our approach on synthetic data and on the widely used TF chromatin immunoprecipitation (ChIP) dataset of Harbison et al. [1]. We then apply our algorithms to high-throughput TF chromatin immunoprecipitation data from mouse and human, reveal sequence features that are present in the binding specificities of mouse and human TFs, and show that FMMs explain TF binding significantly better than PSSMs.

1 Dept. of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel.
2 Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel.
* These authors contributed equally to this work.
† Correspondence should be addressed to E.S.
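
The contrast drawn in the abstract between a PSSM, which scores each position independently, and a feature-based model, in which one feature may span several positions, can be illustrated with a toy sketch. This is not the authors' implementation: the 3-bp motif, the feature weights, and the function names below are all hypothetical, and the log-linear form P(x) = exp(sum of weights of matching features) / Z is assumed here as the generic Markov-network parameterization.

```python
import itertools
import math

BASES = "ACGT"

# Hypothetical toy PSSM over 3 positions (each column sums to 1).
PSSM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.3, "G": 0.1, "T": 0.5},
]

# Hypothetical features: ({position: base, ...}, weight).
# The last feature spans two positions, which a PSSM cannot express.
FEATURES = [
    ({0: "A"}, 1.5),
    ({1: "G"}, 1.5),
    ({1: "G", 2: "T"}, 1.0),
]


def pssm_log_prob(site):
    """Log-probability under the PSSM: a sum of independent per-position terms."""
    return sum(math.log(PSSM[i][base]) for i, base in enumerate(site))


def feature_score(site):
    """Sum the weights of all features whose pattern matches the site."""
    return sum(
        weight
        for pattern, weight in FEATURES
        if all(site[pos] == base for pos, base in pattern.items())
    )


def all_sites(length=3):
    """Enumerate every DNA sequence of the given length."""
    return ("".join(p) for p in itertools.product(BASES, repeat=length))


# Partition function by brute-force enumeration (feasible only for tiny motifs).
LOG_Z = math.log(sum(math.exp(feature_score(s)) for s in all_sites()))


def feature_model_log_prob(site):
    """Log-probability under the assumed log-linear feature model."""
    return feature_score(site) - LOG_Z


if __name__ == "__main__":
    for site in ("AGT", "AGC", "CAT", "CAC"):
        print(site, pssm_log_prob(site), feature_model_log_prob(site))
```

Under the toy PSSM, swapping T for C at the last position changes the log-probability by the same amount regardless of the other positions. Under the feature model, the change depends on whether position 1 is G, which is the kind of dependency between positions that the abstract says PSSMs cannot capture.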