Leaves of a phylogenetic tree represent objects being compared. Objects can be genes, protein, species, etc. Internal nodes are hypothetical ancestral objects. In a rooted tree, the path from root to a node corresponds to a path in evolutionary time. An unrooted tree specifies relationships among objects, but not evolutionary time.
Parsimony is probably the most widely used of all tree building algorithms. It works by finding the tree which can explain the observed sequences with a minimal number of substitutions. Instead of building a tree, it assigns a cost to a given tree, and it is necessary to search through all topologies, or to pursue a more efficient strategy that achieves this effect in order to identify the best tree.
Parsimony takes aligned sequences as input (character data) and outputs a labeled tree which explains the data with a minimal number of changes across edges.
Important Points
- Parsimony does not build trees, it rearranges them
- Parsimony assigns a cost to a given tree
- Parsimony requires a search through all topologies
- Parsimony treats each site of the tree independently
The parsimony algorithm has two important components:
- compute the cost of a given tree
- search through all trees to find the tree with the minimal cost
In plain words, we would need to build the tree from the bottom up. At each step, we compute the cost of connect a leaf or node to some site. Site is the node we wish to connect our leaf or node to. We the tree is complete, we calculate the cost of the tree by summing the cost of all connections. We repeat these steps for all possible trees. When we have computed the cost for all trees, we look for the tree which has the minimum cost. This cost is the best solution.
Weighted Parsimony
Weighted parsimony is an extension of traditional parsimony. It uses substitution costs S(a,b) for each substitution of a by b. It reduces to traditional parsimony when S(a,a) = 0 and S(a,b) = 1 ∀ a ≠ b.

Suppose we have the following aligned sequences:
AAG
AAA
GGA
AGA
7. A?A
5. A?A 6. A?A
1. AAG 2. AGA 3. AAA 4. GGA
We need to do a post-order traversal to compute the tree. In simple terms, we start from the leaves and work our way up to the node.
When k = {1,2,3,4} which are leaf nodes:
k = 1: S1(A) = 0, S1(C) = S1(G) = S1(T) = ∞
k = 2: S1(G) = 0, S1(C) = S1(A) = S1(T) = ∞
k = 3: S1(A) = 0, S1(C) = S1(G) = S1(T) = ∞
k = 4: S1(G) = 0, S1(C) = S1(A) = S1(T) = ∞
∞ forces the algorithm to start at the actually observed residue.
k = 5:
S5(A) = minb(S1(b) + S(a,b)) + minb(S1(b) + S(a,b))
S5(A) = S1(A) + S(a;A) + S2(G) + S(a;G)
S5(A) = S(a;A) + S(a;G)
This translates to:
S5(A) = S(A;A) + S(A;G)
S5(T) = S(T;A) + S(T;G)
S5(C) = S(C;A) + S(C;G)
S5(G) = S(G;A) + S(G;G)
If this were traditional parsimony, we would get:
S5(A) = 0 + 1 = 1
S5(T) = 1 + 1 = 2
S5(C) = 1 + 1 = 2
S5(G) = 1 + 0 = 1
k = 6: Same as k = 5
k = 7:
S7(a) = minb(S5(b) + S(a,b)) + minb(S6(b) + S(a,b))
S7(a) = 2 min((S5(A) + S(a;A)); (S5(T) + S(a; T)); (S5(C) + S(a;C)); (S5(G) + S(a;G)); )
If this were traditional parsimony, we would get:
S7(a) = 2 min((1 + S(a;A)); (2 + S(a; T));(2 + S(a;C)); (1 + S(a;G)))
S7(A) = 2 min((1 + 0); (2 + 1); (2 + 1); (1 + 1)) = 2
S7(T) = 2 min((1 + 1); (2 + 0); (2 + 1); (1 + 1)) = 4
S7(C) = 2 min((1 + 1); (2 + 1); (2 + 0); (1 + 1)) = 4
S7(G) = 2 min((1 + 1); (2 + 1); (2 + 1); (1 + 0)) = 2
The the minimal cost of tree = minaS7(a) = 2
Therefore, the two solutions are:
7. AGA
/ \
5. AGA 6. AGA
/ \ / \
1. AAG 2. AGA 3. AAA 4. GGA
7. AAA
/ \
5. AAA 6. AAA
/ \ / \
1. AAG 2. AGA 3. AAA 4. GGA
It may sometimes be of interest to keep track of minimizing residues (residue that returned minimal costs) in recursion. By assigning the following pointers to the left and right daughters at the end of each recursion block, we can track the minimizing residues:
lk(a) = argminb(Si(b) + S(a,b))
rk(a) = argminb(Sj(b) + S(a,b))
To obtain an assignment of ancestral residues, we pick residue a at the root that gives minimal cost, follow pointers, and choose arbitrarily if there are more than one possible targets.
Traditional Parsimony
In traditional parsimony only the number of substitutions is counted. List Rk of minimal cost residues at is kept at each node along with current cost c.

There is a traceback procedure for finding ancestral assignments in traditional parsimony.
- Choose a residue from the root (R2n-1) and proceed down the tree.
- Having chosen a residue in Rk, pick the same residue in Ri and Rj if possible, otherwise at random.
An assignment not obtained by traceback with Rk can be found by keeping a set Qk of residues at node k where cost is one more than that of the residues in Rk at Qk is set to one higher than Rk.
So far, we only looked at parsimony for rooted trees. The minimum cost for traditional parsimony is independent of root location. Minimum cost for weighted parsimony is independent of root location if S(a,b) is metric:
S(a,a) = 0
S(a,b) = S(b,a)
S(a,c) ≤ S(a,b) + S(b,c)
When cost is independent of root location, we economize in terms of number of possible topologies.

Branch and Bound
In parsimony, the number of possible topologies increases rapidly as the number of leaves increase. Therefore, a more efficient strategy is required. There are several search strategies which are quite efficient but they do not guarantee to find the best tree. Branch and bound is an algorithm which guarantees to find the best tree. The idea is to systematically enumerate trees while abandoning a search venue when the incomplete tree is more expensive than the complete tree.
The tree can be enumerated like an odometer.

[i3] lists the possible edges, where a new edge for x4 can be added.
[i5] lists the possible edges, where a new edge for x5 can be added.
[i2n-5] lists the possible edges, where a new edge for xn can be added.
Accessing tree with bootstrap
Once a tree is build using an algorithm of choice, how much to trust it? How can we access its accuracy? Bootstrap is a method which allows us to access the significance of a phylogenetic feature.
- Dataset is an alignment of sequences
- Generate artificial dataset by selecting columns with replacement
- Build tree for artificial dataset
- Repeat the previous 2 steps about a 1000 times
- Frequency of appearance of a feature is the measure of confidence in this feature
Source
[1] Durbin
[2] Haari Jaalinoja's slides