Learn Motifs from Unaligned Sequences

This tool finds motifs that are discriminatively enriched in a positive sequence set relative to a negative sequence set. Its main novelty is that it can learn a "Feature Motif Model" (FMM) representation of the motif, which captures dependencies between positions through di-nucleotide features (for example, "G at position 3 and T at position 9"). Mono-nucleotide features are also used, so the FMM formalism includes the PSSM formalism as a special case. FMMs are represented by a clear, intuitive logo that highlights the important di-nucleotide features. The height of a feature in the logo is linear in its expected occurrence. For a comprehensive description of the FMM and of this tool (the FMM Motif Finder), see Sharon & Lubliner et al., A Feature-Based Approach to Modeling Protein-DNA Interactions, PLoS Comput Biol, Aug. 2008. Note that the tool can also learn standard PSSM models without using the FMM formalism.
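To make the idea of mono- and di-nucleotide features concrete, here is a minimal sketch of scoring a sequence with such features. It assumes the features combine additively in log space (a log-linear form). The weights, the dictionary layout, and the function name `log_score` are hypothetical illustrations and are not taken from the tool; real weights would be learned from data, and the score is left unnormalized.

```python
# Hypothetical feature weights, for illustration only.
# Positions are 0-based here, so "position 3" in the text is index 2.
MONO_WEIGHTS = {(0, "A"): 1.2, (2, "G"): 0.8}   # (position, base) -> weight
PAIR_WEIGHTS = {((2, "G"), (8, "T")): 1.5}      # "G at position 3 and T at position 9"

def log_score(seq, mono=MONO_WEIGHTS, pair=PAIR_WEIGHTS):
    """Sum the weights of all features that are present in seq (unnormalized)."""
    score = 0.0
    # Mono-nucleotide features: a single base at a single position.
    for (pos, base), weight in mono.items():
        if pos < len(seq) and seq[pos] == base:
            score += weight
    # Di-nucleotide features: two bases at two (not necessarily adjacent) positions.
    for ((p1, b1), (p2, b2)), weight in pair.items():
        if max(p1, p2) < len(seq) and seq[p1] == b1 and seq[p2] == b2:
            score += weight
    return score
```

With an empty `pair` dictionary only the per-position terms remain, which is the sense in which a PSSM-style model is a special case of this sketch.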

Online usage of this tool is limited to input data of up to 500 kb. To run on larger sets, download our standalone executable and run it on your own machine. The standalone version allows customization of various run parameters.

The software uses WebLogo to generate the PSSM logos, under its license.

Paste positive sequences (FASTA format) or upload a file

Use randomization of input sequences as negative set
Upload negative sequences
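The page does not say how the input sequences are randomized to build the negative set. One simple possibility, assumed here purely for illustration, is to permute the bases of each positive sequence, which preserves its length and base composition while destroying positional motifs. The function name `shuffled_negatives` and the `seed` parameter are hypothetical, and the tool's actual procedure may differ.

```python
import random

def shuffled_negatives(sequences, seed=0):
    """Return one shuffled copy of each input sequence (assumed randomization scheme)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    negatives = []
    for seq in sequences:
        bases = list(seq)
        rng.shuffle(bases)     # in-place permutation of the bases
        negatives.append("".join(bases))
    return negatives
```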

Consider reverse complement of input sequences
Maximum number of Motifs to Learn:
Models to learn: FMM, PSSM
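The reverse-complement option above presumably means that motif occurrences on the opposite strand are also considered. As a small sketch of what the reverse complement of a DNA sequence is (not of how the tool uses it internally), the hypothetical helper below complements each base and reverses the result. It handles only the four standard bases in either case and leaves other characters unchanged.

```python
# Translation table mapping each base to its complement (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]
```

For example, `reverse_complement("ATGC")` gives `"GCAT"`, and applying the function twice returns the original sequence.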

For problems and questions, please e-mail Eran Segal.