Tuesday, February 15, 2011

Phylogeny, evolution

HKY (Hasegawa, Kishino and Yano 1985)
- Rate distance matrix
- substitution matrix
- distinguishes between the rate of transitions and transversions { rate matrix, Q, }
- allows unequal base frequencies { π, an equilibrium vector }
The diagonals of the Q matrix are chosen so that the rows sum to zero:
http://en.wikipedia.org/wiki/Models_of_DNA_evolution
http://en.wikipedia.org/wiki/Substitution_model
www.bioportal.uio.no/onlinemat/phylcourse/MaximumLikelihood.pdf

BLOSUM62
- Amino acid log-odds substitution model

Neighbourhood Joining
- hierarchical pairwise clustering tree-building
- uses rate matrix as measure of distance
- produce unrooted tree
- based on the Minimum Evolution criterion for phylogenetic trees, i.e. the topology that gives the least total branch length is preferred at each step of the algorithm.
- greedy so fast
- does not assume that all lineages evolve at the same rate ( molecular clock hypothesis) (assumed by UPGMA)
http://www.economicexpert.com/a/Neighbor:joining.htm

Likelihood (like statistics, based on observations) vs Probability (based on perceived parameter, eg. we know coin is fair, p=0.5)
- P(HH|ph=0.5) = 0.25 - what's the probability of seeing two heads if the probability of getting a tails is 0.5
- Likelihood L(ph=0.5 | HH) - what's the likelihood that ph = 0.5 (parameter), given that we see two heads (observed data) 0.25 of the times
- HH = Observation
http://stats.stackexchange.com/questions/2641/what-is-the-difference-between-likelihood-and-probability
http://en.wikipedia.org/wiki/Likelihood_function

Maximum Likelihood Estimate
- assume observations are iid (independent and identically distributed) (x1, x2, ..., xn)
- L(theta | x1, x2, ..., xn)

Felsenstein
- Compute the likelihood for a given tree
- finding maximum likelihood estimates for evolutionary trees from nucleic acid sequence data
- The key to the pruning algorithm is that once the four numbers are computed, they don't need to to be recomputed again (using dynamic programming). The algorithm is a recursion that computes at each node of the tree from the same quantities in the immediate descendent node. The algorithm is applied starting at the node which has all of its immediate descendents being tips. Then it is applied successively further down the tree, not applying it to any node until all of its descendents have been processed.
http://www.stat.berkeley.edu/users/terry/Classes/s260.1998/Week13b/week13b/node8.html

No comments: