Thursday, March 25, 2021

Boyer-Moor Exact Pattern Matching Algorithm

https://www.youtube.com/watch?v=4Xyhb72LCX4

Boyer–Moore string-search algorithm - Wikipedia

O(n + m) where n and m are lengths of p (search pattern) and t (text to search)


advantages

  1. skip alignments
  2. doesn't compare all the characters

Try alignments from left-to-right and compare characters in the alignment from right-to-left 


Bad character rule - if a mismatched character is found going from right-to-left then shift the pattern to the right until a matching character in the pattern is found

Good suffix rule - shift until the good matching suffix matches

Try both rules and shift by the largest amount of shift


number of characters to shift is stored in a lookup table which is created using only the search pattern


 The algorithm preprocesses the string being searched for (the pattern), but not the string being searched in (the text). It is thus well-suited for applications in which the pattern is much shorter than the text or where it persists across multiple searches. The Boyer–Moore algorithm uses information gathered during the preprocess step to skip sections of the text, resulting in a lower constant factor than many other string search algorithms. In general, the algorithm runs faster as the pattern length increases. The key features of the algorithm are to match on the tail of the pattern rather than the head, and to skip along the text in jumps of multiple characters rather than searching every single character in the text.

No comments: