Download Source
This file contains the MATLAB source code for the following functions:
  • CalcDpFromSeq - the main function. Computes the average occupancy of each DNA binding molecule along a region of a query promoter
  • LogOfSumOfExps - auxiliary function for calculations in log scale
  • addlog - auxiliary function for calculations in log scale
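
To illustrate why such log-scale helpers are needed, here is a minimal sketch of the usual technique, written in Python rather than MATLAB. The names add_log and log_of_sum_of_exps are hypothetical analogues of addlog and LogOfSumOfExps; the actual signatures and behavior of the supplied functions may differ.

```python
import math

def add_log(a, b):
    """Return log(exp(a) + exp(b)) without leaving log space."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = (a, b) if a >= b else (b, a)
    # exp(lo - hi) <= 1, so the exponential cannot overflow.
    return hi + math.log1p(math.exp(lo - hi))

def log_of_sum_of_exps(values):
    """Return log(sum(exp(v))) by factoring out the largest value."""
    values = list(values)
    if not values:
        return -math.inf
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))
```

Working this way keeps products of many small probabilities representable: the terms are summed as logarithms, and only differences from the largest term are ever exponentiated.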

In our framework, binding configurations along the sequence are modeled as an HMM. The hidden state of each sequence location indicates whether this location is bound by some molecule, such as a nucleosome or a transcription factor, or unbound (we use the binding of a "background factor" to model unbound sequence locations).

The function CalcDpFromSeq supplied here implements the Forward-Backward algorithm (a dynamic-programming algorithm) for our HMM. Given a sequence and a set of DNA binding molecules, it efficiently computes the average occupancy of each molecule along any region of the query sequence.
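
The general idea of such a forward-backward computation can be sketched as follows. This is a Python illustration under stated assumptions, not the CalcDpFromSeq code: it assumes that a site starting at position i carries the weight concentration times likelihood (passed in log space), that the background factor covers one position with weight 1, and that bound sites may not overlap. The function name occupancy and its argument layout are hypothetical.

```python
import math

def _logsumexp(values):
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def occupancy(seq_len, molecules):
    """Per-position occupancy of each molecule (0-based positions).

    molecules maps a name to (site_len, log_weights), where log_weights[i]
    is the assumed log(concentration * likelihood) of a site starting at i.
    """
    # fwd[i]: log total weight of all configurations of positions 0..i-1.
    fwd = [0.0] + [-math.inf] * seq_len
    for i in range(1, seq_len + 1):
        terms = [fwd[i - 1]]  # position i-1 held by the background factor
        for site_len, logw in molecules.values():
            start = i - site_len
            if start >= 0:  # a site that ends exactly at position i-1
                terms.append(fwd[start] + logw[start])
        fwd[i] = _logsumexp(terms)

    # bwd[i]: log total weight of all configurations of positions i..end.
    bwd = [-math.inf] * seq_len + [0.0]
    for i in range(seq_len - 1, -1, -1):
        terms = [bwd[i + 1]]  # position i held by the background factor
        for site_len, logw in molecules.values():
            if i + site_len <= seq_len:  # a site that starts at position i
                terms.append(logw[i] + bwd[i + site_len])
        bwd[i] = _logsumexp(terms)

    log_z = fwd[seq_len]  # log of the sum over all configurations
    result = {}
    for name, (site_len, logw) in molecules.items():
        occ = [0.0] * seq_len
        for start in range(seq_len - site_len + 1):
            # Probability that this molecule binds with its site at `start`.
            p = math.exp(fwd[start] + logw[start] + bwd[start + site_len] - log_z)
            for pos in range(start, start + site_len):
                occ[pos] += p
        result[name] = occ
    return result
```

As a sanity check, take a sequence of 3 positions and a single molecule with a 2-position site and weight 1 at both possible starts. There are three equally weighted configurations (unbound, bound at the first start, bound at the second start), so the occupancies of the three positions are 1/3, 2/3 and 1/3, which is what occupancy(3, {"M": (2, [0.0, 0.0])}) returns up to rounding.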

The binding probability of each molecule to a specific location along the sequence depends on the molecule's concentration and binding affinity, and thus this information is required in order to perform the calculation in CalcDpFromSeq.
The user should therefore supply the following information regarding each of the considered DNA binding molecules:
  • The name of the molecule (for example: Nucleosome, GCN4).
  • The concentration of the molecule.
  • The length of the molecule's binding site.
  • A vector holding the likelihood of the molecule binding at each sequence location (such vectors can easily be produced by scanning the query sequence with the known binding preferences of each molecule under consideration, represented for example by the molecule's PSSM).
  • The first position within the considered sequence for which binding of the molecule is allowed.
  • The last position within the considered sequence for which binding of the molecule is allowed.
  • A minimal likelihood threshold for molecule binding (when no threshold is employed, this field should hold '-realmax').
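
As an illustration of how the per-molecule inputs listed above might be prepared, the following Python sketch scans a sequence with a PSSM and then applies the allowed range and the threshold. It is a sketch under assumptions, not part of the supplied code: scan_pssm and restrict are hypothetical names, the likelihoods are kept in log space, positions are 0-based and refer to the start of the site, and every PSSM probability is assumed to be positive (for example after adding pseudocounts).

```python
import math

def scan_pssm(sequence, pssm):
    """Return the log-likelihood of a site starting at each valid position.

    pssm is a list with one dict per site column, mapping base -> probability.
    """
    site_len = len(pssm)
    scores = []
    for start in range(len(sequence) - site_len + 1):
        window = sequence[start:start + site_len]
        scores.append(sum(math.log(col[base]) for col, base in zip(pssm, window)))
    return scores

def restrict(scores, first_allowed, last_allowed, min_score=-math.inf):
    """Forbid binding outside [first_allowed, last_allowed] (inclusive)
    or below min_score by setting those entries to -inf."""
    return [
        s if first_allowed <= i <= last_allowed and s >= min_score else -math.inf
        for i, s in enumerate(scores)
    ]

# Hypothetical two-column PSSM that prefers the site "AG".
pssm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
scores = scan_pssm("AGAT", pssm)        # one score per possible site start
allowed = restrict(scores, 1, 2, math.log(0.05))
```

The default min_score of -inf plays the role of "no threshold": every finite score passes, so only the allowed range restricts binding.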
The user should also supply:
  • The length of the entire sequence considered.
  • Start and end indices indicating the region within the full sequence for which the average occupancy of each molecule should be computed.
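
Once a per-position occupancy profile is available, the average over the requested region is a plain mean. A minimal Python sketch, assuming 1-based inclusive start and end indices (the MATLAB convention) and a hypothetical function name:

```python
def average_occupancy(per_position, region_start, region_end):
    """Mean of per_position over [region_start, region_end], 1-based inclusive."""
    if not 1 <= region_start <= region_end <= len(per_position):
        raise ValueError("region must lie within the sequence")
    # Convert the 1-based inclusive range to a 0-based half-open slice.
    window = per_position[region_start - 1:region_end]
    return sum(window) / len(window)
```

For example, average_occupancy([0.0, 0.5, 1.0, 0.5], 2, 3) averages the second and third positions and gives 0.75.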

For a detailed description of the modeling framework, see the main text and supplementary information.

For problems and questions, please e-mail Tali Raveh-Sadka or Michal Levo.