Profile HMMs can be used in place of standard profiles in progressive or iterative alignment methods. The profile HMM formalism offers some advantages; for example, the sum-of-pairs (SP) scoring scheme can be replaced by the profile HMM assumption that sequences are generated independently from a single 'root' probability distribution.
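To make the contrast concrete, here is a small Python sketch that scores one alignment column both ways. The substitution scores (+1 for identical residues, -1 otherwise) and the column distribution `emission` are hypothetical values chosen for illustration. The SP score sums over all N(N-1)/2 pairs of residues in the column, while the independence assumption turns the column score into a sum of N per-residue log-probabilities.

```python
import math
from itertools import combinations

def pair_score(a: str, b: str) -> int:
    """Hypothetical substitution score: +1 for identical residues, -1 otherwise."""
    return 1 if a == b else -1

def sp_score(column: str) -> int:
    """Sum-of-pairs score: add the score of every pair of residues in the column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))

def independent_log_prob(column: str, emission: dict) -> float:
    """Profile HMM view: each residue is drawn independently from one column
    distribution, so the column log-probability is a sum of per-residue terms."""
    return sum(math.log(emission[x]) for x in column)

column = "LLLI"
emission = {"L": 0.7, "I": 0.2, "V": 0.1}  # hypothetical match-state distribution
print(sp_score(column))                     # 3 matching pairs, 3 mismatching pairs
print(round(independent_log_prob(column, emission), 3))
```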
Profile HMMs can also be trained from initially unaligned sequences using the Baum-Welch expectation maximization (EM) algorithm.
Before tackling the problem of estimating a model and a multiple alignment simultaneously from initially unaligned training sequences, we consider the simpler problem of obtaining a multiple alignment from a known model. To align a sequence to a profile HMM, we find the most probable path of the sequence through the model, which is computed with the Viterbi algorithm. Constructing a multiple alignment then just requires calculating a Viterbi alignment for each individual sequence: residues aligned to the same match state of the profile HMM are placed in the same column (see fig. 456).
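The Viterbi step can be sketched in Python as below. This is a minimal, unoptimized log-space implementation under an assumed parameter layout: `match_emit`, `insert_emit` (one emission distribution shared by all insert states) and `trans` (transitions per node, keyed by source and target state type) are hypothetical names, and the three-node DNA model with consensus "ACG" is made-up example data. Plain log probabilities are used rather than log-odds scores, and ties between equally good predecessors are broken arbitrarily.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """log(p), with log(0) mapped to -inf so impossible moves are never chosen."""
    return math.log(p) if p > 0 else NEG_INF

def viterbi_path(seq, M, match_emit, insert_emit, trans):
    """Return (log-probability, state path) of the most probable path of seq.

    match_emit[k]: emission probabilities of match state M_k, k = 1..M.
    insert_emit:   emission probabilities shared by all insert states (assumed).
    trans[k]:      transitions out of node k, k = 0..M, keyed like 'MD' (source
                   type at node k, then target type). Targets 'M'/'D' lie in
                   node k+1, target 'I' is I_k. M_0 is the begin state and 'M'
                   out of node M means the end state.
    """
    def t(k, key):
        return safe_log(trans[k].get(key, 0.0))

    L = len(seq)
    # V[s][k][i]: best log-probability of a path ending in state type s of node k
    # after emitting the first i residues; ptr stores the previous state type.
    V = {s: [[NEG_INF] * (L + 1) for _ in range(M + 1)] for s in "MID"}
    ptr = {s: [[None] * (L + 1) for _ in range(M + 1)] for s in "MID"}
    V["M"][0][0] = 0.0  # begin state, nothing emitted yet
    for i in range(L + 1):
        for k in range(M + 1):
            if i > 0 and k > 0:  # match: emits x_i, advances from node k-1
                score, prev = max((V[s][k - 1][i - 1] + t(k - 1, s + "M"), s) for s in "MID")
                V["M"][k][i] = score + safe_log(match_emit[k].get(seq[i - 1], 0.0))
                ptr["M"][k][i] = prev
            if i > 0:  # insert: emits x_i, stays at node k
                score, prev = max((V[s][k][i - 1] + t(k, s + "I"), s) for s in "MID")
                V["I"][k][i] = score + safe_log(insert_emit.get(seq[i - 1], 0.0))
                ptr["I"][k][i] = prev
            if k > 0:  # delete: silent, advances from node k-1
                score, prev = max((V[s][k - 1][i] + t(k - 1, s + "D"), s) for s in "MID")
                V["D"][k][i] = score
                ptr["D"][k][i] = prev
    # Final transition from the last node into the end state.
    best, state = max((V[s][M][L] + t(M, s + "M"), s) for s in "MID")
    if best == NEG_INF:
        raise ValueError("sequence has zero probability under this model")
    path, k, i = [], M, L
    while (state, k, i) != ("M", 0, 0):  # trace back to the begin state
        path.append(f"{state}{k}")
        prev = ptr[state][k][i]
        if state == "M":
            k, i = k - 1, i - 1
        elif state == "I":
            i -= 1
        else:
            k -= 1
        state = prev
    return best, path[::-1]

# Hypothetical 3-node DNA model whose consensus is "ACG".
M = 3
match_emit = {k: {b: (0.85 if b == c else 0.05) for b in "ACGT"}
              for k, c in zip(range(1, M + 1), "ACG")}
insert_emit = {b: 0.25 for b in "ACGT"}
middle = {"MM": 0.9, "MI": 0.05, "MD": 0.05, "IM": 0.6, "II": 0.3, "ID": 0.1,
          "DM": 0.6, "DI": 0.1, "DD": 0.3}
trans = {0: {"MM": 0.9, "MI": 0.05, "MD": 0.05, "IM": 0.6, "II": 0.3, "ID": 0.1},
         1: middle, 2: middle,
         3: {"MM": 0.95, "MI": 0.05, "IM": 0.9, "II": 0.1, "DM": 0.9, "DI": 0.1}}
for s in ["ACG", "ACTG", "AG"]:
    print(s, viterbi_path(s, M, match_emit, insert_emit, trans)[1])
```

Running each sequence through the same model gives one state path per sequence; those paths are all that is needed to lay out the alignment columns.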
Suppose we align 5 sequences: we compute the Viterbi-optimal path for each one and then realign the sequences accordingly. In the resulting alignment, residues emitted by insert states are written in lowercase ([a-z]) and residues emitted by match states in uppercase ([A-Z]). The profile HMM does not align the insert-state residues to one another; these residues represent parts of the sequences that are atypical, unconserved, and not meaningfully alignable.
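The column-building step can be sketched as follows, taking the Viterbi state paths as given. The function `build_alignment` and the example paths are hypothetical, the example uses three sequences instead of five, and the display conventions are assumptions: deletions are shown as '-', and unaligned insert residues are left-justified and padded with '.' so that it is visible they are not aligned to each other.

```python
from collections import defaultdict

def build_alignment(seqs, paths, M):
    """Turn per-sequence Viterbi state paths (e.g. ['M1', 'I1', 'D2', ...]) into rows.

    Match-state residues -> uppercase, one column per match state M_1..M_M.
    Delete states        -> '-' in that match column.
    Insert-state residues-> lowercase, left-justified and padded with '.'.
    """
    parsed = []
    for seq, path in zip(seqs, paths):
        inserts = defaultdict(str)  # k -> residues emitted by insert state I_k
        match_cols = {}             # k -> symbol shown in match column k
        pos = 0
        for state in path:
            kind, k = state[0], int(state[1:])
            if kind == "M":
                match_cols[k] = seq[pos].upper()
                pos += 1
            elif kind == "I":
                inserts[k] += seq[pos].lower()
                pos += 1
            else:  # "D": silent state, no residue consumed
                match_cols[k] = "-"
        assert pos == len(seq), "path must emit every residue exactly once"
        parsed.append((inserts, match_cols))
    # The insert region after match column k is as wide as its longest insertion.
    width = [max(len(ins[k]) for ins, _ in parsed) for k in range(M + 1)]
    rows = []
    for inserts, match_cols in parsed:
        row = inserts[0].ljust(width[0], ".")
        for k in range(1, M + 1):
            row += match_cols[k] + inserts[k].ljust(width[k], ".")
        rows.append(row)
    return rows

seqs = ["ACG", "ACTG", "AG"]
paths = [["M1", "M2", "M3"], ["M1", "M2", "I2", "M3"], ["M1", "D2", "M3"]]
rows = build_alignment(seqs, paths, M=3)
for r in rows:
    print(r)
```

With these paths the A, C and G residues line up in the three match columns, the extra T of the second sequence appears in lowercase in an insert column, and the third sequence shows a '-' where it passes through a delete state.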
Now we turn to estimating a model and a multiple alignment simultaneously from initially unaligned sequences. The procedure has three steps:
Initialization: Choose the length of the profile HMM and initialize its parameters.
Training: Estimate the model using the Baum-Welch algorithm or the Viterbi algorithm. Because both can get stuck in local optima, a heuristic method for avoiding them is necessary.
Multiple Alignment: Align all sequences to the final model using the Viterbi algorithm and build a multiple alignment.
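When the Viterbi algorithm is used for training, the Training step alternates between aligning every sequence to the current model and re-estimating the parameters from the resulting state paths. The re-estimation half can be sketched as below. It assumes a DNA alphabet, add-one (Laplace) pseudocounts, and the hypothetical state-label format 'M1', 'I2', 'D3'; insert emissions are not re-estimated here, on the assumption that they stay fixed at a background distribution. Baum-Welch would instead use expected counts computed with the forward-backward algorithm.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumed DNA alphabet

def reestimate(seqs, paths, M, pseudocount=1.0):
    """Re-estimate profile HMM parameters from Viterbi state paths (a sketch).

    Counts match-state emissions and transitions along each path, then
    normalizes with add-`pseudocount` smoothing so no probability is zero.
    """
    emit = [Counter() for _ in range(M + 1)]   # emit[k]: residue counts of M_k
    trans = [Counter() for _ in range(M + 1)]  # trans[k]: transition counts out of node k
    for seq, path in zip(seqs, paths):
        pos, prev_kind, prev_k = 0, "M", 0     # start in the begin state M_0
        for state in path + [f"M{M + 1}"]:     # M_{M+1} stands for the end state
            kind, k = state[0], int(state[1:])
            trans[prev_k][prev_kind + kind] += 1
            if kind in "MI" and k <= M:        # emitting states consume a residue
                if kind == "M":
                    emit[k][seq[pos]] += 1
                pos += 1
            prev_kind, prev_k = kind, k
    match_emit = {}
    for k in range(1, M + 1):
        total = sum(emit[k].values()) + pseudocount * len(ALPHABET)
        match_emit[k] = {a: (emit[k][a] + pseudocount) / total for a in ALPHABET}
    transitions = {}
    for k in range(M + 1):
        sources = "MI" if k == 0 else "MID"    # there is no D_0
        targets = "MI" if k == M else "MID"    # there is no D_{M+1}
        transitions[k] = {}
        for s in sources:
            total = sum(trans[k][s + t] for t in targets) + pseudocount * len(targets)
            for t in targets:
                transitions[k][s + t] = (trans[k][s + t] + pseudocount) / total
    return match_emit, transitions

seqs = ["ACG", "ACTG", "AG"]
paths = [["M1", "M2", "M3"], ["M1", "M2", "I2", "M3"], ["M1", "D2", "M3"]]
match_emit, transitions = reestimate(seqs, paths, M=3)
print(round(match_emit[1]["A"], 3), round(transitions[1]["MD"], 3))
```

A full Viterbi-training loop would feed these parameters back into a Viterbi aligner, recompute the paths, and repeat until the paths stop changing.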
A profile HMM is a repeating linear structure of three types of state (match, delete, and insert). The only decision that must be made in choosing an initial architecture for Baum-Welch estimation is the length of the model, M. Here M is the number of match states in the profile HMM rather than the total number of states; it is usually set to the average length of the training sequences or chosen based on prior knowledge.
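A minimal sketch of the two choices above, assuming the common architecture with a begin state, an end state, M match states, M delete states and M+1 insert states (I0 to IM), for 3M + 3 states in total. The function names and the three training sequences are hypothetical.

```python
def choose_model_length(seqs):
    """M = average training-sequence length, rounded to the nearest integer
    (Python's round() sends exact halves to the even neighbour)."""
    return round(sum(len(s) for s in seqs) / len(seqs))

def state_names(M):
    """List the states of a profile HMM with M match states: begin, I0,
    then (M_k, D_k, I_k) for k = 1..M, then end, i.e. 3M + 3 states."""
    states = ["B", "I0"]
    for k in range(1, M + 1):
        states += [f"M{k}", f"D{k}", f"I{k}"]
    states.append("E")
    return states

seqs = ["ACG", "ACTG", "AG"]  # hypothetical training set
M = choose_model_length(seqs)
print(M, len(state_names(M)))
```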
Since Baum-Welch only finds local optima, it is important to choose initial models carefully. The model should be encouraged to use 'sensible' transitions; for instance, transitions into match states should be large compared to other transition probabilities. At the same time, we want to start Baum-Welch from multiple different points to see whether they all converge to approximately the same optimum, so we want some randomness in the choice of initial model parameters.
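One way to combine both requirements is to draw each transition distribution from a Dirichlet distribution whose mean favours moves into the next match state, using a different random seed for each restart. The sketch below does this for transitions only. The prior means in `PRIOR_MEANS`, the `strength` parameter and the function names are hypothetical choices; emission probabilities could be initialized in a similar way, and each starting model would then be handed to a Baum-Welch implementation (not shown).

```python
import random

# Hypothetical prior means: transitions into the next match state dominate.
PRIOR_MEANS = {
    "M": {"M": 0.9, "I": 0.05, "D": 0.05},
    "I": {"M": 0.7, "I": 0.2, "D": 0.1},
    "D": {"M": 0.7, "I": 0.1, "D": 0.2},
}

def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def random_initial_transitions(M, rng, strength=20.0):
    """Random transition tables whose expected values follow PRIOR_MEANS.

    Larger `strength` keeps samples closer to the means (less randomness).
    """
    model = {}
    for k in range(M + 1):
        sources = "MI" if k == 0 else "MID"   # there is no D_0
        targets = "MI" if k == M else "MID"   # there is no D_{M+1}
        model[k] = {}
        for s in sources:
            alphas = [strength * PRIOR_MEANS[s][t] for t in targets]
            for t, p in zip(targets, sample_dirichlet(alphas, rng)):
                model[k][s + t] = p
    return model

# Several starting points for Baum-Welch, one per seed.
starts = [random_initial_transitions(M=3, rng=random.Random(seed)) for seed in range(5)]
print(round(starts[0][1]["MM"], 3), round(starts[1][1]["MM"], 3))
```

Comparing the likelihoods reached from these different starts gives a rough check of whether Baum-Welch is converging to the same optimum.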
Note: This post is a summary of chapter 6.5 of Durbin.