Thursday, January 5, 2012

Caveats and pitfalls of ROC analysis in clinical microarray research (and how to avoid them)

http://bib.oxfordjournals.org/content/13/1/83.full

Abstract

The receiver operating characteristic (ROC) has emerged as the gold standard for assessing and comparing the performance of classifiers in a wide range of disciplines including the life sciences. ROC curves are frequently summarized in a single scalar, the area under the curve (AUC). This article discusses the caveats and pitfalls of ROC analysis in clinical microarray research, particularly in relation to (i) the interpretation of AUC (especially a value close to 0.5); (ii) model comparisons based on AUC; (iii) the differences between ranking and classification; (iv) effects due to multiple hypotheses testing; (v) the importance of confidence intervals for AUC; and (vi) the choice of the appropriate performance metric. With a discussion of illustrative examples and concrete real-world studies, this article highlights critical misconceptions that can profoundly impact the conclusions about the observed performance.
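Point (v) is easy to underestimate: with the small cohorts typical of clinical microarray studies, an AUC well above 0.5 can still come with a confidence interval that reaches down to chance. A quick base-R sketch of a percentile bootstrap, using made-up labels and simulated scores rather than anything from the article:

set.seed(1)
y <- c(rep(1, 10), rep(0, 10))                      # 10 cases, 10 controls
s <- c(rnorm(10, mean = 0.8), rnorm(10, mean = 0))  # simulated classifier scores

# AUC via the Mann-Whitney rank formula:
# the probability that a random positive is scored above a random negative
auc <- function(y, s) {
  r <- rank(s)
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

boot <- replicate(2000, {
  i <- sample(seq_along(y), replace = TRUE)
  auc(y[i], s[i])
})
auc(y, s)                                      # point estimate
quantile(boot, c(0.025, 0.975), na.rm = TRUE)  # percentile bootstrap 95% CI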

ROC analysis measures a model's ability to rank positive and negative cases relative to each other.
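In other words, the AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. Counting concordant (positive, negative) pairs directly gives the same number as the Mann-Whitney statistic; a small base-R check with made-up scores:

pos <- c(0.9, 0.8, 0.6, 0.4)   # scores of positive cases
neg <- c(0.7, 0.3, 0.2, 0.1)   # scores of negative cases

# Proportion of pairs ranked correctly (a tie would count as half):
mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))   # 0.875 (14 of 16 pairs)

# The same value from the Mann-Whitney W statistic:
wilcox.test(pos, neg)$statistic / (length(pos) * length(neg))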


http://rss.acs.unt.edu/Rdoc/library/verification/html/roc.plot.html

# Example data from Mason and Graham (2002), as used on the roc.plot help page:
# observed events and two sets of forecast probabilities.
a <- c(0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1)
b <- c(.8, .8, 0, 1, 1, .6, .4, .8, 0, 0, .2, 0, 0, 1, 1)
c <- c(.928, .576, .008, .944, .832, .816, .136, .584, .032, .016, .28, .024, 0, .984, .952)
A <- data.frame(a, b, c)
names(A) <- c("event", "p1", "p2")

Mason and Graham (2002):
http://www.inmet.gov.br/documentos/cursoI_INMET_IRI/Climate_Information_Course/References/Mason%2BGraham_2002.pdf
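To draw the curves and get the scalar AUC for both forecasts, something like the following should work (the roc.plot calls mirror the help page linked above; roc.area() is my addition from the same verification package, assuming the same obs, pred argument order):

library(verification)

roc.plot(A$event, A$p1)   # forecast with ties among the probabilities
roc.plot(A$event, A$p2)   # forecast without ties

roc.area(A$event, A$p1)$A   # AUC as a single number
roc.area(A$event, A$p2)$A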
