Monday, January 24, 2011

Dirichlet Mixture for estimating expected amino acid probability at each position

http://compbio.soe.ucsc.edu/dirichlets/dirichlet-papers.html

Sjolander, K., Karplus, K., Brown, M., Hughey, R., Krogh, A., Mian, I.S., and Haussler, D. Dirichlet Mixtures: A Method for Improving Detection of Weak but Significant Protein Sequence Homology. CABIOS, 12(4): 327-345, Aug 1996.

We present a method for condensing the information in a protein database into a mixture of Dirichlet densities. These mixtures are designed to be combined with observed amino acid frequencies, to form estimates of expected amino acid probabilities at each position in a profile, hidden Markov model, or other statistical model.
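To make "combined with observed amino acid frequencies" concrete, here is a minimal sketch of the usual posterior-mean estimate under a Dirichlet mixture prior. Each component contributes its pseudocount estimate (n_i + alpha_i) / (|n| + |alpha|), weighted by how well that component explains the observed counts. I have not checked this against the paper's exact formulation. The function names, the 3-letter toy alphabet, and all parameter values are made up for illustration.

```python
import math


def log_count_likelihood(counts, alpha):
    """Log-probability of the counts under one Dirichlet component.

    The multinomial coefficient is omitted because it is identical for
    every component and cancels when the weights are normalized.
    """
    n = sum(counts)
    a = sum(alpha)
    result = math.lgamma(a) - math.lgamma(n + a)
    for c, x in zip(counts, alpha):
        result += math.lgamma(c + x) - math.lgamma(x)
    return result


def estimate_probabilities(counts, coefficients, alphas):
    """Posterior-mean letter probabilities given counts and a mixture prior."""
    # Posterior weight of each component: prior coefficient times likelihood.
    logs = [math.log(q) + log_count_likelihood(counts, alpha)
            for q, alpha in zip(coefficients, alphas)]
    biggest = max(logs)  # subtract the max for numerical stability
    weights = [math.exp(v - biggest) for v in logs]
    total = sum(weights)
    posteriors = [w / total for w in weights]

    n = sum(counts)
    estimate = [0.0] * len(counts)
    for post, alpha in zip(posteriors, alphas):
        a = sum(alpha)
        for i, (c, x) in enumerate(zip(counts, alpha)):
            estimate[i] += post * (c + x) / (n + a)
    return estimate


# Hypothetical toy example: 3-letter alphabet, two components.
toy_coefficients = [0.5, 0.5]
toy_alphas = [[1.0, 1.0, 1.0], [2.0, 1.0, 1.0]]
toy_estimate = estimate_probabilities([2, 0, 1], toy_coefficients, toy_alphas)
```

With no observations the estimate falls back to the mixture's prior mean. As counts accumulate, the observed frequencies dominate.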

1.3 What is a Dirichlet density?
A Dirichlet density (Berger, 1985; Santner and Duffy, 1989) is a probability density over the set of all probability vectors $\vec{p}$ (i.e., $p_i \ge 0$ and $\sum_i p_i = 1$). Proteins have a 20-letter alphabet, with $p_i = \mathrm{Prob}(\text{amino acid } i)$.

Each vector $\vec{p}$ represents a possible probability distribution over the 20 amino acids.
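As a rough illustration, the density of a single Dirichlet with parameters alpha at a probability vector p is proportional to the product of p_i^(alpha_i - 1), with normalizer Gamma(sum of alpha) / product of Gamma(alpha_i). The sketch below evaluates it in log space. The function name and the example parameters are my own, not from the paper.

```python
import math


def dirichlet_log_density(p, alpha):
    """Log of the Dirichlet density with parameters alpha, evaluated at p."""
    if len(p) != len(alpha):
        raise ValueError("p and alpha must have the same length")
    if any(x <= 0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a strictly positive probability vector")
    # Log normalizer: log Gamma(|alpha|) - sum_i log Gamma(alpha_i)
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(x) for a, x in zip(alpha, p))


# Hypothetical example: the uniform distribution over 20 amino acids,
# scored under a flat Dirichlet (all alpha_i = 1).
uniform = [1.0 / 20] * 20
flat_alpha = [1.0] * 20
flat_log_density = dirichlet_log_density(uniform, flat_alpha)
```

With all alpha_i = 1 the density is constant over the whole simplex, so every distribution of amino acids is treated as equally likely a priori.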

1.4 What is a Dirichlet Mixture?
A mixture of Dirichlet densities is a collection of individual Dirichlet densities that function jointly to assign probabilities to distributions. For any distribution of amino acids, the mixture as a whole assigns a probability to the distribution by using a weighted combination of the probabilities assigned to the distribution by each of the components in the mixture. These weights are called mixture coefficients.
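The weighted combination can be sketched in a few lines: evaluate each component's Dirichlet density at the distribution, multiply by its mixture coefficient, and sum. This is only a sketch under my own naming, and the toy two-component mixture over a 3-letter alphabet is invented for illustration (a real protein mixture would use 20-dimensional components).

```python
import math


def dirichlet_log_density(p, alpha):
    """Log of the Dirichlet density with parameters alpha, evaluated at p."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(x) for a, x in zip(alpha, p))


def mixture_density(p, coefficients, alphas):
    """Density the mixture assigns to p: sum_j q_j * Dirichlet(p; alpha_j)."""
    if abs(sum(coefficients) - 1.0) > 1e-9:
        raise ValueError("mixture coefficients must sum to 1")
    return sum(q * math.exp(dirichlet_log_density(p, alpha))
               for q, alpha in zip(coefficients, alphas))


# Hypothetical toy mixture: one flat component, one peaked at the center.
toy_p = [1.0 / 3, 1.0 / 3, 1.0 / 3]
toy_coefficients = [0.5, 0.5]
toy_alphas = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
toy_density = mixture_density(toy_p, toy_coefficients, toy_alphas)
```

Because the coefficients sum to 1 and each component integrates to 1, the mixture is itself a proper density over probability vectors.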
