Wednesday, February 9, 2011

Training parameters for HMM

Manually adjusting the parameters of an HMM in order to get a high prediction accuracy can be a very time-consuming task which, moreover, is not guaranteed to improve the performance accuracy. A variety of training algorithms have therefore been devised to address this challenge. These training algorithms require as input and starting point a so-called training set of (typically partly annotated) data. Starting with a set of (typically user-chosen) initial parameter values, the training algorithm employs an iterative procedure which successively derives new, more refined parameter values. The iterations are stopped when a termination criterion is met, e.g. when a maximum number of iterations has been completed or when the change of the log-likelihood from one iteration to the next becomes sufficiently small. The model with the final set of parameters is then used to test whether the performance accuracy has been improved. This is typically done by comparing the predicted to the known annotation on a test set of annotated data which has no overlap with the training set.
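
To make this generic procedure concrete, here is a minimal Python sketch of such an iterative training loop with the two termination criteria mentioned above (a cap on the number of iterations and a threshold on the change in log-likelihood). The arguments reestimate and log_likelihood are hypothetical placeholders of my own, not part of any particular library; they stand for whichever re-estimation algorithm is chosen (e.g. Baum-Welch or Viterbi training).

```python
def train_hmm(initial_params, training_set, reestimate, log_likelihood,
              max_iterations=100, tolerance=1e-4):
    """Generic iterative HMM training loop (sketch).

    reestimate(params, training_set)     -> new, refined parameter values
    log_likelihood(params, training_set) -> log P(training_set | params)
    Both callables are supplied by the chosen training algorithm.
    """
    params = initial_params
    prev_ll = float("-inf")
    for _ in range(max_iterations):                # criterion 1: iteration cap
        params = reestimate(params, training_set)  # one refinement step
        ll = log_likelihood(params, training_set)
        if abs(ll - prev_ll) < tolerance:          # criterion 2: log-likelihood change
            break
        prev_ll = ll
    return params
```

The final parameters would then be evaluated on a separate, annotated test set as described above.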

Another well-known training algorithm for HMMs is Baum-Welch training [21], which is an expectation maximization (EM) algorithm [22]. In each iteration, a new set of parameter values is derived from the expected counts of emissions and transitions, computed by considering all possible state paths (rather than only a single Viterbi path) for every training sequence. The iterations are typically stopped after a fixed number of iterations or as soon as the change in the log-likelihood is sufficiently small. For Baum-Welch training, the likelihood P(X|ϕ) [13] can be shown to converge (under some conditions) to a stationary point which is either a local optimum or a saddle point. Baum-Welch training using the traditional combination of forward and backward algorithm [13] is, for example, implemented in the prokaryotic gene prediction method EASYGENE [23] and the HMM-compiler HMMoC [15]. As with Viterbi training, the outcome of Baum-Welch training may strongly depend on the chosen set of initial parameter values. As Jensen [24] and Khreich et al. [25] describe, computationally more efficient algorithms for Baum-Welch training which render the memory requirement independent of the sequence length have been proposed, first in the communication field [26-28] and later, independently, in bioinformatics by Miklós and Meyer [29], see also [30]. The advantage of this linear-memory algorithm is that it is comparatively easy to implement, as it requires only a one-step rather than a two-step procedure and scans the sequence uni-directionally rather than bi-directionally. This algorithm was employed by Hobolth and Jensen [31] for comparative gene prediction and has also been implemented, albeit in a modified version, by Churbanov and Winters-Hilt [30], who also compare it to other implementations of Viterbi and Baum-Welch training, including checkpointing implementations.
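
For illustration, the following Python sketch shows one Baum-Welch iteration for a simple discrete HMM, using the traditional combination of forward and backward algorithm (not the linear-memory variant discussed above). The names pi, A, B and baum_welch_iteration are my own notation, not taken from any of the cited implementations, and the recursions are left unscaled for readability; real implementations scale or work in log space to avoid underflow on long sequences.

```python
import numpy as np

def baum_welch_iteration(pi, A, B, obs):
    """One Baum-Welch iteration on a single observation sequence (sketch).

    pi : (S,)  initial state distribution
    A  : (S,S) transition probabilities, A[i, j] = P(state j | state i)
    B  : (S,M) emission probabilities,  B[i, k] = P(symbol k | state i)
    obs: list of symbol indices
    """
    S, T = len(pi), len(obs)

    # Forward pass: alpha[t, i] = P(obs[0..t], state_t = i)
    alpha = np.zeros((T, S))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # Backward pass: beta[t, i] = P(obs[t+1..T-1] | state_t = i)
    beta = np.zeros((T, S))
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    likelihood = alpha[T - 1].sum()      # P(obs | current parameters)

    # Expected counts over all state paths
    gamma = alpha * beta / likelihood    # gamma[t, i] = P(state_t = i | obs)
    xi = np.zeros((S, S))                # expected transition counts
    for t in range(T - 1):
        xi += (alpha[t][:, None] * A
               * (B[:, obs[t + 1]] * beta[t + 1])[None, :]) / likelihood

    # Re-estimate parameters from the expected counts
    new_pi = gamma[0]
    new_A = xi / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]

    return new_pi, new_A, new_B, np.log(likelihood)
```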

Stochastic expectation maximization (EM) training or Monte Carlo EM training [32] is another iterative procedure for training the parameters of HMMs. Instead of considering only a single Viterbi state path for a given training sequence as in Viterbi training, or all state paths as in Baum-Welch training, stochastic EM training considers a fixed number K of state paths which are sampled from the posterior distribution P(Π|X) for every training sequence X in every iteration. Sampled state paths have already been used in several bioinformatics applications for sequence decoding, see e.g. [2,33], where they are used in the context of gene prediction to detect alternative splice variants.
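
A core ingredient of this approach is drawing state paths from the posterior P(Π|X), which can be done with a forward pass followed by a stochastic backtrace. Below is a small Python sketch of this sampling step for a discrete HMM, reusing the pi/A/B notation from the previous example; the function name sample_state_paths is my own, and the forward recursion is again left unscaled for readability. In stochastic EM training, the emission and transition counts observed along the K sampled paths would then take the place of the expected counts used in Baum-Welch when deriving new parameter values.

```python
import numpy as np

def sample_state_paths(pi, A, B, obs, K, rng=None):
    """Sample K state paths from the posterior P(path | obs) (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    S, T = len(pi), len(obs)

    # Forward pass: alpha[t, i] = P(obs[0..t], state_t = i)
    alpha = np.zeros((T, S))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    paths = []
    for _ in range(K):
        path = np.zeros(T, dtype=int)
        # Sample the last state proportionally to alpha[T-1]
        path[T - 1] = rng.choice(S, p=alpha[T - 1] / alpha[T - 1].sum())
        # Backward sampling: P(state_t = i | state_{t+1} = j, obs) ∝ alpha[t, i] * A[i, j]
        for t in range(T - 2, -1, -1):
            w = alpha[t] * A[:, path[t + 1]]
            path[t] = rng.choice(S, p=w / w.sum())
        paths.append(path)
    return paths
```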

http://www.ncbi.nlm.nih.gov.proxy.lib.sfu.ca/pmc/articles/PMC3019189/?tool=pubmed
