Friday, October 5, 2012

Random forest

http://www.nature.com/nrg/journal/v10/n6/box/nrg2579_BX2.html

Rather than using a single classification tree, substantial improvements in classification accuracy can result from growing an ensemble of trees and letting them 'vote' for the most popular outcome class, given a set of input variable values. Such ensemble approaches can be used to provide measures of variable importance, a feature that is of great interest in genetic studies and that is often lacking in machine-learning approaches. The most widely used ensemble tree approach is probably the random forests method [75]. A random forest is constructed by drawing with replacement several bootstrap samples of the same size (for example, the same number of cases and controls) from the original sample. An unpruned classification tree is grown for each bootstrap sample, but with the restriction that at each node, rather than considering all possible predictor variables, only a random subset of the possible predictor variables is considered.
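To make the construction concrete, here is a minimal sketch in Python (my language choice, not the article's), assuming numpy arrays for X and y, integer-coded class labels, and scikit-learn's DecisionTreeClassifier; the function names are illustrative. It grows unpruned trees on bootstrap samples of the original size, restricts each split to a random subset of predictors via max_features, and lets the trees vote.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_random_forest(X, y, n_trees=100, seed=None):
    """Grow an ensemble of unpruned trees on bootstrap samples."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    trees = []
    for _ in range(n_trees):
        # Draw a bootstrap sample of the same size, with replacement.
        idx = rng.integers(0, n_samples, size=n_samples)
        # max_features="sqrt": at each node only a random subset of the
        # predictors is considered; depth is unlimited, i.e. no pruning.
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(2**31)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_random_forest(trees, X):
    """Let the trees 'vote' for the most popular outcome class."""
    votes = np.stack([t.predict(X) for t in trees])  # (n_trees, n_samples)
    # Majority vote per sample; assumes nonnegative integer class labels.
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)

In practice, scikit-learn's sklearn.ensemble.RandomForestClassifier packages this same procedure and also exposes the variable-importance measures mentioned above through its feature_importances_ attribute.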
