Wednesday, February 2, 2011

Baum-Welch Algorithm, expectation maximization (EM), HMM

Baum-Welch Algorithm
Intuition
To solve Problem 3 we need a method of adjusting the lambda parameters (the transition probabilities A, the emission probabilities B, and the initial-state distribution pi) to maximize the likelihood of the training set.

Suppose that the outputs (observations) are in one-to-one correspondence with the states, so that N = M, each state q_i always emits the output v_i, and b_i(j) = 1 if j = i and 0 otherwise. Now the Markov process is not hidden at all and the HMM is just a Markov chain. To estimate the lambda parameters for this Markov chain it is enough to calculate the appropriate frequencies from the observed sequence of outputs. These frequencies constitute sufficient statistics for the underlying distributions.
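For example, in this fully observed case the maximum-likelihood estimate of each transition probability is just a normalized count. A minimal sketch in Python (the state sequence and sizes are made up for illustration):

```python
from collections import Counter

def estimate_transitions(states, n_states):
    # a[i][j] estimates P(q_j at t+1 | q_i at t) by counting observed transitions
    counts = Counter(zip(states[:-1], states[1:]))
    a = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        total = sum(counts[(i, j)] for j in range(n_states))
        for j in range(n_states):
            # fall back to uniform if state i was never left (an arbitrary choice)
            a[i][j] = counts[(i, j)] / total if total else 1.0 / n_states
    return a

print(estimate_transitions([0, 1, 1, 0, 2, 1, 0, 0, 1], n_states=3))
```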

In the more general case, we can't observe the states directly so we can't calculate the required frequencies. In the hidden case, we use expectation maximization (EM) as described in [Dempster et al., 1977].

Instead of calculating the required frequencies directly from the observed outputs, we iteratively estimate the parameters. We start by choosing arbitrary values for the parameters (just make sure that each distribution is valid: non-negative entries that sum to 1).

We then compute the expected frequencies given the current model and the observations. The expected frequencies are obtained by weighting the observed transitions by the probabilities specified in the current model. These expected frequencies are then substituted for the old parameters, and we iterate until there is no improvement. Each iteration increases (or leaves unchanged) the probability of O being observed from the model, until some limiting probability is reached. This iterative procedure is guaranteed to converge to a local maximum of the likelihood, which is equivalent to a local minimum of the cross-entropy (Kullback-Leibler divergence) performance measure.
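As a concrete illustration, here is a minimal sketch of this re-estimation loop in Python/NumPy. It assumes a single observation sequence and standard HMM notation (A = transitions, B = emissions, pi = initial distribution; alpha and beta come from the forward and backward recursions described in the next section); it is not taken from the linked pages. No scaling is applied, so it only works for short sequences.

```python
import numpy as np

def forward(pi, A, B, obs):
    # alpha[t, i] = P(X_1..X_t, theta_t = i): the "left" recursion
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(A, B, obs):
    # beta[t, i] = P(X_{t+1}..X_n | theta_t = i): the "right" recursion
    beta = np.ones((len(obs), A.shape[0]))
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def baum_welch_step(pi, A, B, obs):
    # One EM iteration: expected frequencies under the current model
    # become the new parameters.
    obs = np.asarray(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    likelihood = alpha[-1].sum()               # P(O | current model)
    gamma = alpha * beta / likelihood          # P(theta_t = i | O)
    # xi[t, i, j] = P(theta_t = i, theta_{t+1} = j | O): expected transitions
    xi = (alpha[:-1, :, None] * A[None, :, :] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):                # expected emission counts
        new_B[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B, likelihood

# Start from arbitrary row-stochastic guesses and iterate until no improvement.
rng = np.random.default_rng(0)
N, M = 2, 3                                    # states, output symbols (toy sizes)
pi = np.full(N, 1.0 / N)
A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)
B = rng.random((N, M)); B /= B.sum(axis=1, keepdims=True)
obs = [0, 1, 2, 2, 1, 0, 0, 2, 1, 1]           # made-up training sequence
prev = 0.0
for _ in range(200):
    pi, A, B, likelihood = baum_welch_step(pi, A, B, obs)
    if likelihood - prev < 1e-12:              # limiting probability reached
        break
    prev = likelihood
print(likelihood)
```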

http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/gen_patterns/s1_pg1.html
http://vimeo.com/7303679

Forward and Backward
- Goal is to calculate the probability of the hidden state theta_t at a specific time t, given the observations
- Splitting the Markov chain at time t reduces the full chain-rule expansion of the joint probability to two recursions
- X = emission (observation) sequence
- theta = hidden state
- P(theta_t, X) = P(theta_t, X_{1..t}) * P(X_{t+1..n} | theta_t)
- Use the forward algorithm to calculate the left factor (alpha)
- And the backward algorithm to calculate the right factor (beta); the recursions make this computationally efficient, O(N^2 * n) dynamic programming instead of summing over all N^n state paths (see the sketch below)
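To see why the split matters, here is a small sanity check (a sketch assuming NumPy and the forward() and backward() helpers from the Baum-Welch sketch above; the toy model is made up): alpha[t, i] * beta[t, i] should equal P(theta_t = i, X) computed by brute force over all N^n state paths, which is exactly the exponential sum the recursions avoid.

```python
import itertools
import numpy as np

def brute_force(pi, A, B, obs, t, i):
    # Sum P(path, X) over every one of the N^n state paths with theta_t = i.
    total = 0.0
    for path in itertools.product(range(len(pi)), repeat=len(obs)):
        if path[t] != i:
            continue
        p = pi[path[0]] * B[path[0], obs[0]]
        for s in range(1, len(obs)):
            p *= A[path[s - 1], path[s]] * B[path[s], obs[s]]
        total += p
    return total

pi = np.array([0.6, 0.4])                    # toy 2-state, 2-symbol model
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
obs = np.array([0, 1, 1, 0])
alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
print(alpha[2, 0] * beta[2, 0])              # P(theta_2 = 0, X) via the recursions
print(brute_force(pi, A, B, obs, t=2, i=0))  # same value by exhaustive summation
```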
