Predict a motif for your own set of RNAs
This software identifies and scores RNA motifs in sets of RNA sequences. It is distributed as a compiled executable wrapped in Perl scripts, as described below.
Requirements
- The executable is compiled for 64-bit Linux machines and is wrapped by Perl scripts.
- The executable uses two RNA folding packages:
  - The Vienna RNA package, developed by Ivo Hofacker at the Institute for Theoretical Chemistry of the University of Vienna. See the package website for the latest version and copyright information.
  - The CONTRAfold software, distributed under the BSD license. See Do, C.B., Woods, D.A., and Batzoglou, S. (2006) CONTRAfold: RNA Secondary Structure Prediction without Energy-Based Models. Bioinformatics, 22(14): e90-e98.
Version History
The latest version is available for download.
- Executable version 1 (01-Jul-08): Initial release.
- Executable version 2 (20-Apr-09): Added a new script, rnamotifs08_motif_match.pl, and fixed some bugs.
- Executable version 3 (04-Jan-10): Fixed a bug (introduced in a minor release) that produced incorrect PNG images.
Download Instructions
- Download the archive file (64-bit version) into a folder of your choice.
- Unzip and extract all of its files: tar xvfz 64bit_exe_rnamotifs08_motif_finder.tar.gz
- Run make install.
Usage
The main script is rnamotifs08_motif_finder.pl; run rnamotifs08_motif_finder.pl --help to see its usage.
The only mandatory parameter the script expects is:
- Positive sequences: a FASTA-format file containing the sequences on which to predict motifs.
The downloaded archive contains example files:
- input_pos_seq.fa: example sequence file
- input_pos_struct.tab: example positive structures file
- Output: directory containing the output files
The files in Output were produced by this command:
./rnamotifs08_motif_finder.pl -positive_seq input_pos_seq.fa -output_dir Output
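If you prefer to drive the motif finder from Python, the same command can be built and run with the standard subprocess module. This is a minimal sketch: the script path ./rnamotifs08_motif_finder.pl assumes you run from the extracted folder, only the two options used in the example command are included, and finder_command and run_finder are hypothetical helper names, not part of the distribution.

```python
import subprocess

# Assumed location of the wrapper script: the folder the archive was extracted into.
FINDER_SCRIPT = "./rnamotifs08_motif_finder.pl"


def finder_command(positive_seq: str, output_dir: str,
                   script: str = FINDER_SCRIPT) -> list[str]:
    """Build the argument list for the example command on this page.

    Only -positive_seq and -output_dir are included; run the script with
    --help to see its other options.
    """
    return [script, "-positive_seq", positive_seq, "-output_dir", output_dir]


def run_finder(positive_seq: str, output_dir: str,
               script: str = FINDER_SCRIPT) -> None:
    """Run the motif finder; raises CalledProcessError if it exits non-zero."""
    subprocess.run(finder_command(positive_seq, output_dir, script), check=True)
```

Calling run_finder("input_pos_seq.fa", "Output") reproduces the example command. Passing an argument list rather than a shell string avoids quoting problems with file names that contain spaces.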
After learning a motif, you can search a database of sequences for positions that match it. First, assign a secondary structure to each sequence in the database, either with an existing structure prediction algorithm or from other information. The database is then specified in the following format:
<id> <sequence> <structure>
The motif itself is saved in a .cm file, produced as output by the learning algorithm, and is passed to the search with the -cm option. The output has the format:
<id> <score>
That is, for each database sequence, the script calculates the likelihood score of the motif occurring in that sequence. It can also report the actual motif position in each database sequence: with the -b option, the algorithm searches for the **best** match of the motif in each database sequence and outputs its likelihood score, its start and end positions, its sequence, and its structure.
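One way to assign structures to the database sequences is to fold them with RNAfold from the Vienna package (for example, RNAfold --noPS < my_db.fa > my_db.fold, where my_db.fa is a hypothetical FASTA file) and convert its output into the database format. The sketch below assumes FASTA input with headers, so that each RNAfold record is three lines (header, sequence, structure followed by its free energy), and it assumes the database fields are tab-separated. rnafold_to_database is a hypothetical helper, not part of this distribution.

```python
def rnafold_to_database(rnafold_output: str) -> list[str]:
    """Convert RNAfold output into '<id> <sequence> <structure>' database lines.

    Assumes every RNAfold record is three lines:
        >identifier [optional description]
        SEQUENCE
        DOT-BRACKET-STRUCTURE (free energy)
    The free-energy field is dropped. Fields are joined with tabs, which is an
    assumption about the database file format.
    """
    lines = [ln.strip() for ln in rnafold_output.splitlines() if ln.strip()]
    records = []
    for i in range(0, len(lines), 3):
        if i + 2 >= len(lines):
            raise ValueError(f"incomplete record number {i // 3 + 1}")
        header, sequence, fold = lines[i], lines[i + 1], lines[i + 2]
        if not header.startswith(">"):
            raise ValueError(f"expected a '>' header, got {header!r}")
        seq_id = header[1:].split()[0]       # keep the identifier, drop any description
        structure = fold.split()[0]          # drop the trailing free-energy field
        if len(structure) != len(sequence):
            raise ValueError(f"{seq_id}: structure length differs from sequence length")
        records.append(f"{seq_id}\t{sequence}\t{structure}")
    return records
```

The resulting lines can be written, one per line, to the database file passed to rnamotifs08_motif_match.pl. The length check catches records that were misaligned, for example when RNAfold was run on input without headers.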
For example, to search database.tab with the motif saved in model.cm:
- Likelihood scores:
rnamotifs08_motif_match.pl database.tab -cm model.cm
produces a likelihood score for each sequence in the database:
seq_2:0 19.7698
seq_7:0 19.3706
seq_3:0 19.1064
seq_1:0 18.073
seq_5:0 16.5508
seq_9:0 14.5906
seq_4:0 10.3685
seq_10:0 9.15077
seq_6:0 6.81294
seq_8:0 0.233537
- Best-match positions:
rnamotifs08_motif_match.pl database.tab -cm model.cm -b
produces, for each sequence in the database, the likelihood score of the best motif match together with its position:
seq_2:0 19.7698 33 48 UUCAACAGUGUUUGGA (((((......))))) <<<<<,,,,,,>>>>>
seq_7:0 19.3706 104 119 GGGAGCAGUGUCUUCC (((((......))))) <<<<<,,,,,,>>>>>
seq_3:0 19.1064 16 31 GUCCUCAGUGCAGGGC (((((......))))) <<<<<,,,,,,>>>>>
seq_1:0 18.073 30 52 GACGUUCUUCGCCGAGAGUCGUC (((((((((....))))).)))) <<<<<<<<<,--->>>>>,>>>>
seq_5:0 16.5508 104 119 AGCUACAGUGUUAGCU (((((......))))) <<<<<,,,,,,>>>>>
seq_9:0 14.5906 32 47 GAGCCAGUGUGUUUCU ((((......)))).. <<<-,,,,,,->>>,,
seq_4:0 10.3685 7 21 UUGUCAGUGCACAAA ((((......)))). <<<<,,,,,,>>>>,
seq_10:0 9.15077 133 152 CAACCUCCACCUUCUGGGUU .(((((.........))))) ,<----,,,,,,,,,---->
seq_6:0 6.81294 1 16 UAUGGAGAUUUCCAUA (((((......))))) <<<<<,,,,,,>>>>>
seq_8:0 0.233537 95 115 ACACCCCAGCCCUGCAGUGUA ((((..((....))..)))). <<<<,,--,,,,---->>>>,
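To post-process these results, for example to keep only sequences above a score cutoff, a small parser can help. The sketch below infers the column layout from the example output above (identifier and score, then, with -b, start and end positions, the matched subsequence, and the structure columns); it is not based on a published format specification, and MotifHit, parse_hit and hits_at_least are hypothetical names. This page does not say which scores are significant, so the cutoff is left to you.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class MotifHit:
    seq_id: str
    score: float
    start: Optional[int] = None          # present only in -b output
    end: Optional[int] = None            # inclusive, judging by the example output
    subsequence: Optional[str] = None
    structures: tuple = ()               # remaining structure columns, left unparsed


def parse_hit(line: str) -> MotifHit:
    """Parse one output line of rnamotifs08_motif_match.pl (with or without -b)."""
    fields = line.split()
    if len(fields) == 2:                 # <id> <score>
        return MotifHit(fields[0], float(fields[1]))
    if len(fields) >= 5:                 # <id> <score> <start> <end> <sequence> <structures...>
        seq_id, score, start, end, subseq, *structs = fields
        return MotifHit(seq_id, float(score), int(start), int(end),
                        subseq, tuple(structs))
    raise ValueError(f"unrecognized output line: {line!r}")


def hits_at_least(lines: Iterable[str], cutoff: float) -> list[MotifHit]:
    """Keep hits scoring at least `cutoff`, best first; blank lines are skipped."""
    hits = [parse_hit(ln) for ln in lines if ln.strip()]
    return sorted((h for h in hits if h.score >= cutoff),
                  key=lambda h: h.score, reverse=True)
```

Assuming the results are printed to standard output, you could redirect them to a file (for example, hits.txt) and call hits_at_least(open("hits.txt"), cutoff) with a cutoff of your choosing.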
For troubleshooting and frequently asked questions, please visit our executables FAQ page.