Thursday, November 18, 2010

SVM, bagging, boosting, normalization

Sequential minimal optimization (SMO), a fast algorithm for training SVMs [26,27], was used to build the MC-SVM kernel function models, as implemented in WEKA.
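A minimal sketch of that kind of setup, assuming scikit-learn as a Python stand-in for WEKA's (Java) SMO classifier: SVC wraps libsvm, whose solver is also SMO-based, and the synthetic data and parameters below are illustrative, not from the paper.

# Sketch only: multi-class kernel SVM, roughly analogous to WEKA's SMO classifier.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic 3-class problem standing in for the annotation features used in PIPA.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # multi-class handled one-vs-one
print("5-fold CV accuracy: %.3f" % cross_val_score(clf, X, y, cv=5).mean())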

Bagging vs. Boosting (Freund and Schapire 1996). Bagging trains each model on a bootstrap resample of the training set and averages the results, which mainly reduces variance; boosting iteratively reweights the training examples toward the ones previously misclassified, which mainly reduces bias. A quick comparison is sketched below.
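For comparison's sake, a short sketch of both ensembles on the same synthetic data, using scikit-learn's BaggingClassifier and AdaBoostClassifier; the library, base learners, and parameters are my choices, not something from the post.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagging: each base tree is trained on an independent bootstrap resample,
# and predictions are averaged (variance reduction).
bagger = BaggingClassifier(n_estimators=50, random_state=0)

# Boosting (AdaBoost, Freund & Schapire 1996): each round reweights the training
# set toward previously misclassified examples (bias reduction).
booster = AdaBoostClassifier(n_estimators=50, random_state=0)

for name, model in [("bagging ", bagger), ("boosting", booster)]:
    print(name, "%.3f" % cross_val_score(model, X, y, cv=5).mean())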

The development of PIPA: an integrated and automated pipeline for genome-wide protein function annotation

-- And with microarrays, the results seem to depend largely on the data itself rather than on the particular algorithms / classifiers used (so by picking and choosing among classifiers you might squeeze out a little performance above the state of the art).

Nat Genet. 2002 Dec;32 Suppl:496-501.
Microarray data normalization and transformation.
Quackenbush J.

http://www.nature.com/ng/journal/v32/n4s/full/ng1032.html

The goal of most microarray experiments is to survey patterns of gene expression by assaying the expression levels of thousands to tens of thousands of genes in a single assay.

The hypothesis underlying microarray analysis is that the measured intensities for each arrayed gene represent its relative expression level. Biologically relevant patterns of expression are typically identified by comparing measured expression levels between different states on a gene-by-gene basis. But before the levels can be compared appropriately, a number of transformations must be carried out on the data to eliminate questionable or low-quality measurements, to adjust the measured intensities to facilitate comparisons, and to select genes that are significantly differentially expressed between classes of samples.
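A rough sketch of those three steps (filter low-quality measurements, adjust intensities, select differentially expressed genes) on synthetic data; the array shapes, cutoffs, median-centering, and per-gene t-test below are illustrative assumptions, not the paper's exact procedure.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
expr = rng.lognormal(mean=5, sigma=1, size=(1000, 12))   # 1000 genes x 12 arrays (synthetic)
labels = np.array([0] * 6 + [1] * 6)                     # two sample classes

# 1) Drop genes whose intensities are near background on most arrays (made-up cutoff).
keep = (expr > 50).sum(axis=1) >= 8
expr = expr[keep]

# 2) Simple per-array adjustment: median-center log2 intensities so arrays are comparable.
log_expr = np.log2(expr)
log_expr -= np.median(log_expr, axis=0)

# 3) Gene-by-gene comparison between the two classes (two-sample t-test).
t, p = stats.ttest_ind(log_expr[:, labels == 0], log_expr[:, labels == 1], axis=1)
print("genes with p < 0.01:", int((p < 0.01).sum()))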

Using this approach, a normalization factor is calculated by summing the measured intensities in both channels.
Locally weighted linear regression (lowess) [6] analysis has been proposed [4,5] as a normalization method that can remove such intensity-dependent effects in the log2(ratio) values.
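A hedged sketch of the two ideas in that excerpt, on synthetic two-channel intensities: compute a global factor from the summed intensities, then fit and subtract a lowess trend from the log2(ratio) values as a function of average intensity. The use of statsmodels' lowess and all of the numbers here are my assumptions, not code from the paper.

import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(2)
G = rng.lognormal(mean=8, sigma=1, size=5000)                      # "green" channel
R = G * 2 ** (0.3 * np.log2(G) - 2 + rng.normal(0, 0.3, G.size))   # "red" channel with intensity-dependent bias

# Total-intensity normalization factor: ratio of summed intensities in the two channels.
k = R.sum() / G.sum()
M = np.log2(R / G) - np.log2(k)     # globally adjusted log2(ratio)
A = 0.5 * np.log2(R * G)            # average log intensity (MA-plot x-axis)

# Lowess normalization: fit M as a smooth function of A, then subtract the trend
# to remove intensity-dependent effects from the log2(ratio) values.
trend = lowess(M, A, frac=0.3, return_sorted=False)
M_norm = M - trend

print("median |log2 ratio| before: %.3f  after: %.3f"
      % (np.median(np.abs(M)), np.median(np.abs(M_norm))))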
