Saturday, February 4, 2012

Conditional Random Fields applied to protein fold recognition

http://online.liebertpub.com/doi/pdf/10.1089/cmb.2006.13.394


CRFs are “undirected” graphical models (also known as random fields, in contrast to directed graphical models such as HMMs) that model the conditional probability P(y|x) directly.
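
To make "modeling P(y|x) directly" concrete, here is a minimal sketch of a linear-chain CRF over a toy label set. The labels, the `emit` and `trans` weight tables, and the sequence are invented for illustration and are not taken from the paper; the normalizer Z(x) is summed by brute force, which is only feasible for very short sequences.

```python
import itertools
import math

LABELS = ["H", "E", "C"]  # helix, strand, coil (illustrative label set)

def score(x, y, emit, trans):
    """Unnormalized log-score of labeling y for the observed sequence x."""
    s = sum(emit.get((xi, yi), 0.0) for xi, yi in zip(x, y))   # node features
    s += sum(trans.get((a, b), 0.0) for a, b in zip(y, y[1:]))  # edge features
    return s

def conditional_prob(x, y, emit, trans):
    """P(y | x) = exp(score(x, y)) / Z(x), with Z(x) enumerated exhaustively."""
    z = sum(math.exp(score(x, cand, emit, trans))
            for cand in itertools.product(LABELS, repeat=len(x)))
    return math.exp(score(x, y, emit, trans)) / z

# Hypothetical weights: each residue type favors one label, and
# staying in a helix or strand is rewarded.
emit = {("A", "H"): 1.0, ("V", "E"): 1.0, ("G", "C"): 1.0}
trans = {("H", "H"): 0.5, ("E", "E"): 0.5}
x = "AAVVG"

print(conditional_prob(x, "HHEEC", emit, trans))
```

Note that nothing here models P(x): the weights only score labelings given the observed sequence, which is the difference from a generative model such as an HMM.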


Protein folds are frequently occurring arrangement patterns of several secondary structure elements: some elements
are quite conserved in sequences or prefer a specific length, while others might form hydrogen bonds with
each other, such as two β-strands in a parallel β-sheet. To model the protein fold better, it would be natural
to think of each secondary structure element as one observation, corresponding to one node in the graph,
and the edges between elements as indicating their interactions in 3-D. Then, given a protein sequence,
we can search for the best segmentation defined by the graph and determine whether the protein adopts
the fold or not.
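
The "search for the best segmentation" step can be sketched with a small dynamic program. This is a simplified illustration, not the paper's method: it assumes the fold is an ordered chain of segment states and ignores the long-range edges between elements (such as strand pairing) that the graph described above is meant to capture. The state names, the `favored` table, and `seg_score` are hypothetical.

```python
def best_segmentation(seq, states, seg_score, max_len):
    """Split seq into len(states) consecutive non-empty segments, one per
    state in order, maximizing the sum of per-segment scores."""
    n, k = len(seq), len(states)
    neg = float("-inf")
    # best[j][i]: best score for covering seq[:i] with the first j states
    best = [[neg] * (n + 1) for _ in range(k + 1)]
    back = [[None] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(1, n + 1):
            for start in range(max(0, i - max_len), i):
                if best[j - 1][start] == neg:
                    continue
                cand = best[j - 1][start] + seg_score(states[j - 1], seq[start:i])
                if cand > best[j][i]:
                    best[j][i], back[j][i] = cand, start
    if best[k][n] == neg:
        return neg, []  # no valid segmentation under the length limit
    # Trace the segment boundaries back from the end of the sequence.
    bounds, i = [], n
    for j in range(k, 0, -1):
        start = back[j][i]
        bounds.append((states[j - 1], start, i))
        i = start
    return best[k][n], bounds[::-1]

# Hypothetical scoring: +1 for a residue the state favors, -1 otherwise.
favored = {"strand": set("VIY"), "loop": set("GNP")}

def seg_score(state, segment):
    return sum(1 if r in favored[state] else -1 for r in segment)

fold = ["strand", "loop", "strand"]
total, segments = best_segmentation("VIVGNGVYV", fold, seg_score, max_len=5)
print(total, segments)
```

Deciding whether the protein adopts the fold could then be a matter of comparing the best score against a threshold, though how that threshold is chosen is not covered here.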
