Wednesday, June 22, 2011

MOODS: fast search for position weight matrix matches in DNA sequences

http://bioinformatics.oxfordjournals.org/content/25/23/3181.full
http://www.cs.helsinki.fi/group/pssmfind/
http://cs.helsinki.fi/u/prastas/pdf/BIRD-fullC.pdf

J. Korhonen, P. Martinmäki, C. Pizzi, P. Rastas and E. Ukkonen. MOODS: fast search for position weight matrix matches in DNA sequences. Bioinformatics 25(23), pages 3181-3182. (2009)


Summary: MOODS (MOtif Occurrence Detection Suite) is a software package for matching position weight matrices against DNA sequences. MOODS implements state-of-the-art online matching algorithms, achieving considerably faster scanning speed than with a simple brute-force search. MOODS is written in C++, with bindings for the popular BioPerl and Biopython toolkits. It can easily be adapted for different purposes and integrated into existing workflows. It can also be used as a C++ library.

Availability: The package with documentation and examples of usage is available at http://www.cs.helsinki.fi/group/pssmfind. The source code is also available under the terms of a GNU General Public License (GPL).                

  • based on Pizzi et al. 2007, 2009; standard PWM scoring (log-odds scores against a background distribution, which can be user-defined)
  • online algorithm: a simple sequential search over the target sequence, using string matching with lookahead filtration
    • "we know without seeing the remaining symbols that the m-segment cannot be a match as the total score will be less than k for any choice of the sequence beyond the location h."
  • written in C++, with bindings for BioPerl and Biopython
  • return log-odds score or absolute score thresholds
  • seems to work well with strict (small) p-value thresholds on short to average-length matrices (4-20 bases) over DNA
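The log-odds scoring in these notes turns motif counts into per-base scores relative to a background distribution. Below is a minimal sketch of that conversion. Everything specific in it is an assumption and does not come from MOODS: the input format (a list of per-column dicts of counts), the pseudocount of 0.25, base-2 logarithms, and a uniform default background.

```python
import math

ALPHABET = "ACGT"


def log_odds_pwm(counts, background=None, pseudocount=0.25):
    """Turn a count matrix into a log-odds PWM (log2 of observed/background).

    counts: list of columns, each a dict mapping a base to its count
            (assumed input format; missing bases count as 0).
    background: dict mapping each base to its background probability;
                defaults to uniform 0.25 per base (user-definable).
    pseudocount: added to every cell so unseen bases do not give log(0).
    """
    if background is None:
        background = {b: 0.25 for b in ALPHABET}
    pwm = []
    for col in counts:
        # Column total including pseudocounts, so the smoothed
        # frequencies of A, C, G, T sum to 1.
        total = sum(col.get(b, 0) for b in ALPHABET) + pseudocount * len(ALPHABET)
        pwm.append({
            b: math.log2(((col.get(b, 0) + pseudocount) / total) / background[b])
            for b in ALPHABET
        })
    return pwm


if __name__ == "__main__":
    # Hypothetical 3-column motif scored against an AT-rich background.
    counts = [{"A": 8, "C": 1, "G": 1}, {"C": 10}, {"G": 6, "T": 4}]
    bg = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
    for i, col in enumerate(log_odds_pwm(counts, bg)):
        print(i, {b: round(s, 2) for b, s in col.items()})
```

Positive scores mark bases that are more frequent in the motif than in the background. Changing `background` shifts every score in a column, which is why a user-defined background matters.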
Comments
  • doesn't look like it incorporates any conservation information
  • even though it's significantly faster than TFBS and can handle more data, I wonder how similar the results are in terms of predicted positions
TFBS BioPerl extension (Lenhard and Wasserman, 2002): uses a naive (brute-force) algorithm
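For comparison, a naive scan scores every length-m window in full, costing O(nm) for a sequence of length n. The sketch below is a generic brute-force scan, not TFBS's actual code. It assumes the PWM is a list of per-column dicts mapping A/C/G/T to scores, and that the sequence is uppercase and contains only A, C, G, T. It does not handle N.

```python
def naive_scan(pwm, seq, threshold):
    """Brute-force PWM scan: score every window, report those >= threshold.

    pwm: list of dicts mapping 'A','C','G','T' to scores (assumed format).
    seq: uppercase DNA string containing only A, C, G, T (no N handling).
    Returns a list of (start_position, score) pairs, 0-based.
    """
    m = len(pwm)
    hits = []
    for start in range(len(seq) - m + 1):
        # Always sums all m columns, so the cost does not depend on the threshold.
        score = sum(pwm[i][seq[start + i]] for i in range(m))
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Toy two-column matrix that prefers "AC" (hypothetical values).
    toy = [
        {"A": 1, "C": -1, "G": -1, "T": -1},
        {"A": -1, "C": 1, "G": -1, "T": -1},
    ]
    print(naive_scan(toy, "ACGAC", 2))  # [(0, 2), (3, 2)]
```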


Lookahead scoring search: the algorithm of [30] already utilizes the filtering idea.
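The lookahead idea quoted in the notes works as follows. Precompute, for every column position, the best score that the remaining columns could still add. While scoring a window, abandon it as soon as the partial score plus that bound falls below the threshold. The sketch below shows only this basic early-abandoning step. MOODS's own algorithms add further filtering on top of it, so treat this as an illustration, not the package's implementation. It assumes a PWM stored as a list of per-column dicts mapping A/C/G/T to scores, and a sequence over A, C, G, T only.

```python
def lookahead_scan(pwm, seq, threshold):
    """PWM scan with lookahead: stop scoring a window once it cannot reach threshold.

    pwm: list of dicts mapping 'A','C','G','T' to scores (assumed format).
    seq: uppercase DNA string over A, C, G, T only.
    Returns (start_position, score) pairs for windows scoring >= threshold.
    """
    m = len(pwm)
    # best_rest[i] = maximum score that columns i..m-1 can still contribute.
    best_rest = [0] * (m + 1)
    for i in range(m - 1, -1, -1):
        best_rest[i] = best_rest[i + 1] + max(pwm[i].values())

    hits = []
    for start in range(len(seq) - m + 1):
        score = 0
        for i in range(m):
            score += pwm[i][seq[start + i]]
            # Even the best possible remaining bases cannot lift this window
            # to the threshold, so skip the rest of it.
            if score + best_rest[i + 1] < threshold:
                break
        else:
            # No break: the last check was score + 0 >= threshold, so it matches.
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Toy two-column matrix that prefers "AC" (hypothetical values).
    toy = [
        {"A": 1, "C": -1, "G": -1, "T": -1},
        {"A": -1, "C": 1, "G": -1, "T": -1},
    ]
    print(lookahead_scan(toy, "ACGAC", 2))  # [(0, 2), (3, 2)]
```

The hits are identical to those of a full brute-force scan, because a window is discarded only when no completion could reach the threshold. The savings grow with stricter thresholds, since more windows are abandoned after their first few columns.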
