Predict a motif for your own set of RNAs


This executable allows you to identify and score RNA motifs in sets of RNAs. It is provided as an executable wrapped in Perl scripts, as described below.



NOTE: THIS SOFTWARE IS FREELY AVAILABLE FOR ACADEMIC USE ONLY, and is distributed under the GNU Lesser General Public License.


Requirements

  1. The executable is compiled for 64-bit Linux machines and is wrapped by Perl scripts.
  2. The executable uses two RNA folding packages:
    • The Vienna RNA package, which is developed by Ivo Hofacker at the Institute for Theoretical Chemistry of the University of Vienna.
      Latest version, Copyright information
    • The CONTRAfold software, distributed under the BSD license. See Do, C.B., Woods, D.A., and Batzoglou, S. (2006) CONTRAfold: RNA Secondary Structure Prediction without Energy-Based Models. Bioinformatics, 22(14): e90-e98.

Version History

The latest version is available here for download.
  • Executable version 1 (01-Jul-08): Initial release.
  • Executable version 2 (20-Apr-09): Added a new script, rnamotifs08_motif_match.pl, and fixed some bugs.
  • Executable version 3 (04-Jan-10): Fixed a bug (introduced in a minor release) that produced incorrect PNG files.

Download Instructions

  1. Download the archive file (64-bit version) into a folder of your choice.
  2. Unzip and extract all its files (tar xvfz 64bit_exe_rnamotifs08_motif_finder.tar.gz).
  3. Type make install

Usage


The main script is rnamotifs08_motif_finder.pl. To see its usage, type: rnamotifs08_motif_finder.pl --help

The only mandatory parameter the script expects is:
  • Positive sequences - a FASTA-format file containing the sequences on which to predict motifs.
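
Before running the script, it can help to sanity-check the input file. The sketch below is not part of the distribution. It assumes a plain FASTA layout (a ">" header line followed by one or more sequence lines) and flags any character outside A, C, G, U, T, N, which is an assumption about what the tool accepts rather than a documented rule. The function names are hypothetical.

```python
ALLOWED = set("ACGUTN")  # assumed alphabet; not documented by the tool


def read_fasta(lines):
    """Parse FASTA text lines into a list of (id, sequence) pairs."""
    records = []
    seq_id, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            chunks = []
        elif seq_id is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line.upper())
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def suspicious_records(records):
    """Return ids of records that are empty or contain unexpected characters."""
    return [sid for sid, seq in records if not seq or set(seq) - ALLOWED]
```

For a file on disk, something like read_fasta(open("input_pos_seq.fa")) followed by suspicious_records(...) would list the records worth inspecting by hand.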

The downloaded archive contains example files:
  • input_pos_seq.fa - example sequences file
  • input_pos_struct.tab - example positive structures file
  • Output - directory containing output files
These output files were produced by this command:
./rnamotifs08_motif_finder.pl -positive_seq input_pos_seq.fa -output_dir Output


After learning a motif, you can search a database of sequences for positions that match it. To do so, first assign a secondary structure to each sequence in your database, either with an existing structure prediction algorithm or from other information. The database is then specified in the following format: <id> <sequence> <structure>
The motif itself is saved in a .cm file, which is produced as output by the learning algorithm and passed to the search with the -cm option. The output of the search is in the format: <id> <score>
That is, for each database sequence, the likelihood score of finding the motif in that sequence is calculated. It is also possible to calculate the actual motif position in each database sequence. In this mode, the algorithm searches for the **best** match of the motif in each database sequence and outputs its position, its likelihood score, its structure, and its sequence.
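
A database file in this format could be assembled along the following lines. This is an illustration, not part of the distribution. It assumes that the three fields are tab-separated (suggested by the .tab extension, but not stated here) and that structures are written in dot-bracket notation. The function names are hypothetical.

```python
def is_balanced(structure):
    """Check that a dot-bracket string uses only '(', ')', '.' and is balanced."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # closing bracket with no partner
        elif ch != ".":
            return False  # unexpected character
    return depth == 0


def format_database(records):
    """Turn (id, sequence, structure) triples into database text, one line each.

    The tab separator is an assumption, not a documented requirement.
    """
    lines = []
    for seq_id, seq, struct in records:
        if len(seq) != len(struct):
            raise ValueError(f"{seq_id}: sequence and structure lengths differ")
        if not is_balanced(struct):
            raise ValueError(f"{seq_id}: structure is not balanced dot-bracket")
        lines.append("\t".join((seq_id, seq, struct)))
    return "".join(line + "\n" for line in lines)
```

The returned text can be written to a file such as database.tab and then passed to the search script.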

For example:

  1. likelihood:

    rnamotifs08_motif_match.pl database.tab -cm model.cm
    This produces a likelihood score for each sequence in the database:

    
    seq_2:0    19.7698
    seq_7:0    19.3706
    seq_3:0    19.1064
    seq_1:0    18.073
    seq_5:0    16.5508
    seq_9:0    14.5906
    seq_4:0    10.3685
    seq_10:0   9.15077
    seq_6:0    6.81294
    seq_8:0    0.233537

  2. positions:

    rnamotifs08_motif_match.pl database.tab -cm model.cm -b
    This produces a likelihood score for the best motif position in each sequence in the database, together with the position itself:

    
    seq_2:0    19.7698    33      48      UUCAACAGUGUUUGGA        (((((......)))))        <<<<<,,,,,,>>>>>
    seq_7:0    19.3706    104     119     GGGAGCAGUGUCUUCC        (((((......)))))        <<<<<,,,,,,>>>>>
    seq_3:0    19.1064    16      31      GUCCUCAGUGCAGGGC        (((((......)))))        <<<<<,,,,,,>>>>>
    seq_1:0    18.073     30      52      GACGUUCUUCGCCGAGAGUCGUC (((((((((....))))).)))) <<<<<<<<<,--->>>>>,>>>>
    seq_5:0    16.5508    104     119     AGCUACAGUGUUAGCU        (((((......)))))        <<<<<,,,,,,>>>>>
    seq_9:0    14.5906    32      47      GAGCCAGUGUGUUUCU        ((((......))))..        <<<-,,,,,,->>>,,
    seq_4:0    10.3685    7       21      UUGUCAGUGCACAAA         ((((......)))).         <<<<,,,,,,>>>>,
    seq_10:0   9.15077    133     152     CAACCUCCACCUUCUGGGUU    .(((((.........)))))    ,<----,,,,,,,,,---->
    seq_6:0    6.81294    1       16      UAUGGAGAUUUCCAUA        (((((......)))))        <<<<<,,,,,,>>>>>
    seq_8:0    0.233537   95      115     ACACCCCAGCCCUGCAGUGUA   ((((..((....))..)))).   <<<<,,--,,,,---->>>>,
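
The output of both modes can be post-processed with a short script. The sketch below assumes whitespace-separated columns, as in the listings above. For -b output it further assumes that the two integers are 1-based inclusive start and end positions; this is consistent with the lengths of the example matches, but it is an inference rather than a documented fact. The seventh column is kept as an opaque string, and all field names are chosen for illustration only.

```python
from collections import namedtuple

# Field names are hypothetical; the tool's output has no header line.
Match = namedtuple(
    "Match", "seq_id score start end sequence structure annotation"
)


def parse_match_line(line):
    """Parse one output line; position fields are None in likelihood-only mode."""
    fields = line.split()
    if len(fields) == 2:
        return Match(fields[0], float(fields[1]), None, None, None, None, None)
    if len(fields) == 7:
        seq_id, score, start, end, seq, struct, annot = fields
        return Match(seq_id, float(score), int(start), int(end), seq, struct, annot)
    raise ValueError(f"unexpected number of columns: {len(fields)}")


def span_is_consistent(m):
    """True if end - start + 1 equals the match length (assumes 1-based inclusive)."""
    return m.end - m.start + 1 == len(m.sequence) == len(m.structure)


def above_threshold(matches, cutoff):
    """Keep matches scoring at least `cutoff`, highest score first."""
    kept = (m for m in matches if m.score >= cutoff)
    return sorted(kept, key=lambda m: m.score, reverse=True)
```

Any cutoff passed to above_threshold is the user's choice; this page does not recommend a particular score threshold.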
    


For troubleshooting and FAQs, please visit our executables FAQ page.