Multidimensonal dynamic programming

The dynamic programming algorithms used for pairwise sequences alignment can theoretically be extended to any number of sequences. However, the time and memory requirements of this algorithm increase exponentially with the number of sequences.

The only assumption necessary to make multidimensional dynamic programming to work is that column scores are independent.

A common approach to multiple sequence alignment is to progressively align pairs of sequences. The general strategy is:

  1. A starting pair of sequences is selected and aligned
  2. Each subsequent sequence is aligned to the previous alignment

This is a greedy heuristic algorithm. A greedy algorithm decomposes a problem into pieces, and then chooses the best solution to each piece without paying attention to the problem as a whole. Since it is a heuristic algorithm, progressive alignment is not guaranteed to find the best solution. In practice, however, progressive alignment methods are efficient and produce biologically meaningful results.

MSA uses a clever heuristic multidimensional dynamic programming algorithm. It assumes an SP scoring system for both residues and gaps. We assume that the score of a multiple alignment is the sum of the scores of all pairwise alignments defined by the multiple alignment. The score of the complete alignment is given by:

Let âkl be the optimal pairwise alignment of k,l, which we can calculate in O(L2) time by standard dynamic programing. Obviously, S(akl ≤ S(âkl).

Combining this simple observation and the definition of the SP scoring system, we obtain a lower bound on the score of any pairwise alignment that can occur in the optimal multiple alignment. Now we only need to consider pairwise alignment better than the lower bound. This significantly reduces the complexity.

Note: This post is a summary of chapter 6.3 of Durbin