- What is the difference between PITA sites and PITA targets? Why isn't the target score equal to the site ΔΔG value?
A PITA site denotes a single microRNA binding site prediction on a given UTR (and has a ΔΔG value). There can be (and in many cases are) many sites for the same microRNA along a given UTR. The target score represents the overall effect of all those sites combined together on the given UTR. If a microRNA has a single site on the UTR then the target score is equal to that site's ΔΔG. However, if more than one site exists for a given microRNA on a given UTR then the target score appropriately sums up the ΔΔG energies of all those sites (see methods in the article for the summation formula).
- Running PITA predictions on my UTR of interest yields many results. How should they be interpreted? How can the number of results be narrowed down?
PITA assigns a score to every potential site it identifies (based on the selected seed criteria), including very weak ones. The thresholds and filters to apply to this initial list of sites depend greatly on your specific application. For example, if you are looking for "sure candidates", then inspecting seeds of size 7-8 with no mismatches and requiring some minimal conservation makes sense (grab the "TOP" catalogs in this case). If you have a UTR of interest and want the full picture, including sites whose functionality is not certain, then going down to 6-mer seeds with no conservation filters makes more sense (grab the "ALL" catalog).
- How should the ΔΔG score be interpreted? Which score is generally considered to be a "good" microRNA site?
Since ΔΔG is an energetic score, the lower (more negative) its value, the stronger the binding of the microRNA to the given site is expected to be. As a rough rule of thumb, sites with ΔΔG values below -10 are likely to be functional at endogenous microRNA expression levels. However, because the actual level of repression depends on microRNA concentration, even sites with ΔΔG values above -10 may be functional at high microRNA expression levels.
- What does the notation "X:Y:Z" (for example "8:0:1") in the seed column mean? Which types of seeds should I consider?
PITA's "X:Y:Z" notation for describing the seed represents the size of the seed (X), the number of mismatches (Y) and the number of G:U wobble pairs (Z). For example, 8:0:1 means that the seed is an 8-mer, has no mismatches but contains a single G:U wobble.
We suggest following standard seed parameter settings and considering seeds of length 6-8 bases, with no mismatches and up to one G:U wobble in 7- or 8-mers.
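The "X:Y:Z" notation and the suggested seed settings can be illustrated with a short Python sketch. It is not part of PITA: the names `Seed`, `parse_seed` and `is_standard_seed` are hypothetical, and the sketch assumes that "up to one G:U wobble in 7- or 8-mers" means 6-mers are accepted only with no wobbles.

```python
from typing import NamedTuple


class Seed(NamedTuple):
    size: int        # X: seed length in bases
    mismatches: int  # Y: number of mismatches
    wobbles: int     # Z: number of G:U wobble pairs


def parse_seed(notation: str) -> Seed:
    """Parse an 'X:Y:Z' seed string such as '8:0:1'."""
    parts = notation.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected 'X:Y:Z', got {notation!r}")
    size, mismatches, wobbles = (int(p) for p in parts)
    return Seed(size, mismatches, wobbles)


def is_standard_seed(seed: Seed) -> bool:
    """Check a seed against the suggested settings.

    Assumed reading: length 6-8, no mismatches, and at most one
    G:U wobble, which is tolerated only in 7- or 8-mers.
    """
    if not 6 <= seed.size <= 8 or seed.mismatches != 0:
        return False
    max_wobbles = 1 if seed.size >= 7 else 0
    return seed.wobbles <= max_wobbles
```

For example, `is_standard_seed(parse_seed("8:0:1"))` accepts the 8-mer with a single wobble, while the same check rejects "6:0:1" and "7:1:0".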
- What modifications did you introduce to the Vienna Package in order to compute the ΔΔG score?
To compute the ΔGduplex value, we modified RNAduplex such that it takes into account the explicit seed pairings upon which the target was chosen and considers only structures that comply with these constraints. Those changes are reflected in the file RNAduplex.c (found under Bin/ViennaRNA/ViennaRNA-1.6/Progs when downloading the PITA executable) and the accompanying library, duplex.c (found under Bin/ViennaRNA/ViennaRNA-1.6/lib) where the duplexfold() function was changed to take the given seed matching constraints into account.
To compute the ΔGopen value, a new program was introduced to the Vienna Package which uses the original Vienna fold() function to compute both the constrained and non-constrained structures of the target area. This new program is implemented in RNAddG4.c (found under Bin/ViennaRNA/ViennaRNA-1.6/Progs).
- Which sets of 3' UTRs and microRNAs were used to compute the PITA catalogs?
The RefSeq annotations for worm (ce6), fly (dm3), mouse (mm9) and human (hg18) were downloaded from UCSC. For each organism, the set of 3' UTRs was constructed by extracting the sequences from the appropriate genome and removing any introns. FASTA files containing the input sets of 3' UTRs can be downloaded here: worm, fly, mouse and human.
MicroRNA sequences were downloaded from miRBase. Version 11.0 (available here) was used.
- How is the summation of single sites into a complete microRNA-gene target score done?
To assign an overall microRNA-gene target score, we compute the statistical weight of all configurations in which exactly one of the sites is bound by the microRNA. For negative values of ΔΔG, the summation formula is:
TARGET_SCORE = -log( Σ_i exp(-ΔΔG_i) )
Please note the negations, in contrast to the published formula in the original paper.
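As an illustration of the summation formula above, here is a minimal Python sketch. It is not taken from the PITA source code. The function name `target_score` is hypothetical, and the sketch assumes a natural logarithm, which is the choice consistent with a single-site target score being equal to that site's ΔΔG.

```python
import math
from typing import Iterable


def target_score(site_ddg: Iterable[float]) -> float:
    """Combine per-site ddG values as -log(sum_i exp(-ddG_i)).

    Assumes a natural logarithm and ddG values used as given
    (no additional scaling).
    """
    values = list(site_ddg)
    if not values:
        raise ValueError("at least one site is required")
    # Log-sum-exp trick: factor out the largest exponent so that
    # strongly negative ddG values do not overflow exp().
    largest = max(-v for v in values)
    total = sum(math.exp(-v - largest) for v in values)
    return -(largest + math.log(total))
```

With a single site, `target_score([-12.0])` returns that site's ΔΔG unchanged. With several sites, the result is more negative than the strongest individual site, so additional sites always strengthen the overall score.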
- MicroRNA sites in proximity to the 3' UTR boundary
Computing ΔGopen for sites located close to the start or end of the 3' UTR may require folding sequence that lies outside the UTR. This is handled by pasting the end of the coding region of the matching gene before the UTR and a stretch of poly(A) after it.
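The padding described above can be sketched in Python as follows. This is an illustration only, not PITA's implementation: the function name `pad_utr` and the default flank length of 70 bases are assumptions made for the example, since the text does not state how much flanking sequence is added.

```python
from typing import Tuple


def pad_utr(cds: str, utr: str, flank: int = 70) -> Tuple[str, int]:
    """Return (padded_sequence, utr_offset).

    The end of the coding region is placed before the UTR and a
    poly(A) stretch after it. `flank` is a hypothetical parameter;
    the actual length used by PITA is not given in the text.
    """
    if flank < 0:
        raise ValueError("flank must be non-negative")
    # cds[-0:] would return the whole string, so handle flank == 0 explicitly.
    upstream = cds[-flank:] if flank > 0 else ""
    downstream = "A" * flank
    # utr_offset converts UTR coordinates into padded-sequence coordinates.
    return upstream + utr + downstream, len(upstream)
```

The returned offset lets a site position within the UTR be mapped onto the padded sequence, so that a folding window around a site near either boundary still has sequence on both sides.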