Monday, January 9, 2012

Chi-square test

http://www.itl.nist.gov/div898/handbook/eda/section3/eda358.htm

Engineering Statistics Handbook

A chi-square test (Snedecor and Cochran, 1983) can be used to test whether the standard deviation of a population is equal to a specified value.
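
As a rough sketch of how this test could be carried out in R: the statistic is T = (N - 1) * s^2 / sigma0^2, which is compared with a chi-square distribution with N - 1 degrees of freedom. The helper name var_chisq_test, the simulated sample, and the choice of a two-sided p-value (twice the smaller tail) are assumptions made for illustration; base R has no built-in function for this test.

```r
# Chi-square test that a population standard deviation equals sigma0.
# Assumes the data are a random sample from a normal distribution.
# `var_chisq_test` is a hypothetical helper, not a base R function.
var_chisq_test <- function(x, sigma0) {
  n <- length(x)
  df <- n - 1
  stat <- df * var(x) / sigma0^2            # T = (N - 1) * s^2 / sigma0^2
  p_lower <- pchisq(stat, df)               # P(chi-square <= T)
  p_upper <- pchisq(stat, df, lower.tail = FALSE)
  # Two-sided p-value: twice the smaller tail probability, capped at 1.
  p_value <- min(1, 2 * min(p_lower, p_upper))
  list(statistic = stat, df = df, p.value = p_value)
}

set.seed(1)                                 # illustrative simulated data
x <- rnorm(30, mean = 10, sd = 2)
var_chisq_test(x, sigma0 = 2)
```

A one-sided version would use only p_lower or p_upper, depending on the alternative of interest.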

The chi-square distribution with ν (nu) degrees of freedom results when ν independent variables with standard normal distributions are squared and summed.
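
This definition can be checked informally by simulation. The sketch below draws standard normal values, squares and sums them, and compares the result with R's chi-square quantiles; the value nu = 5, the seed, and the number of simulations are arbitrary choices for illustration.

```r
set.seed(42)
nu <- 5                                     # degrees of freedom (illustrative)
n_sim <- 10000
z <- matrix(rnorm(n_sim * nu), ncol = nu)   # each row: nu standard normal draws
q <- rowSums(z^2)                           # squared and summed

mean(q)                                     # should be close to nu
var(q)                                      # should be close to 2 * nu
# Empirical versus theoretical quantiles
quantile(q, c(0.5, 0.95))
qchisq(c(0.5, 0.95), df = nu)
```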

When items are classified according to two or more criteria, it is often of interest to decide whether these criteria act independently of one another.

Statistic
X1 = Σ (Oi - Ei)^2 / Ei, summed over the categories i = 1, ..., n

If Oi are the observed values and Ei are the expected values according to some hypothesis, then X1 approximately follows a chi-square distribution (X1 ~ chi-square). The statistic X1 is known as a goodness-of-fit statistic and has n-1 degrees of freedom (df), where n is the number of categories, if no parameters are estimated from the observed data.
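
A minimal sketch of the statistic computed directly from its definition, using made-up counts from 60 rolls of a die and the hypothesis that all six faces are equally likely. The counts and variable names are hypothetical.

```r
# Hypothetical counts from 60 rolls of a die; H0: all six faces equally likely.
observed <- c(8, 12, 9, 11, 10, 10)
probs <- rep(1 / 6, 6)
expected <- sum(observed) * probs           # Ei = N * (probability of category i)

x1 <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1                  # no parameters estimated from the data
p_value <- pchisq(x1, df, lower.tail = FALSE)

c(statistic = x1, df = df, p.value = p_value)
```

The same numbers should come out of chisq.test(observed, p = probs), which wraps this calculation.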

http://67.159.209.94/Courses/Genetics_and_statistics.doc
http://67.159.209.94/Courses/Statistics_and_math_notation.doc

Pearson's chi-squared is used to assess two types of comparison: tests of goodness of fit and tests of independence.

  • A test of goodness of fit establishes whether or not an observed frequency distribution differs from a theoretical distribution.
  • A test of independence assesses whether paired observations on two variables, expressed in a contingency table, are independent of each other—for example, whether people from different regions differ in the frequency with which they report that they support a political candidate.
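
For the test of independence, a sketch along the lines of the regional example might look as follows. The survey counts and region names are invented for illustration; chisq.test applied to a matrix performs Pearson's test of independence.

```r
# Hypothetical survey counts: rows are regions, columns are responses.
support <- matrix(c(90, 60,
                    70, 80,
                    50, 50),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(region = c("North", "South", "West"),
                                  supports = c("yes", "no")))

res <- chisq.test(support)    # Pearson's test of independence
res$expected                  # expected counts if region and support were independent
res$statistic
res$parameter                 # df = (rows - 1) * (cols - 1)
res$p.value
```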



Use Fisher's exact test for small contingency tables, where the counts are too small for the chi-square approximation to be reliable
http://en.wikipedia.org/wiki/Fisher%27s_exact_test

So in Fisher's original example, one criterion of classification could be whether milk or tea was put in the cup first; the other could be whether Dr Bristol thinks that the milk or tea was put in first. We want to know whether these two classifications are associated – that is, whether Dr Bristol really can tell whether milk or tea was poured in first.
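
A sketch of the tea-tasting setup in R, assuming eight cups (four of each kind) and a taster who classifies three of each kind correctly. These counts are hypothetical and chosen only to illustrate the call to fisher.test.

```r
# Hypothetical 2x2 table: eight cups, four of each kind;
# the taster classifies three of each kind correctly.
tea <- matrix(c(3, 1,
                1, 3),
              nrow = 2,
              dimnames = list(guess = c("milk", "tea"),
                              truth = c("milk", "tea")))

# One-sided test: is the taster more likely to say "milk" when milk went in first?
res <- fisher.test(tea, alternative = "greater")
res$p.value
```

With counts this small the exact test is preferable to the chi-square approximation, which is the point of the recommendation above.
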

To determine if the distribution of differentially expressed genes across the functional groups within each taxonomy differs significantly from the distribution of detected genes, the sum of the χ² distances between the two distributions was calculated and compared with the sums calculated for 10,000 sets of genes randomly selected from all genes with detectable expression. According to this criterion, all three taxonomies were significantly changed (p < 0.0001).
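
The resampling procedure described in this passage could be sketched roughly as below. Everything here is a stand-in: the gene-to-group assignments are simulated, the set sizes are arbitrary, and the "+1" convention for the empirical p-value is an assumption rather than the method of the cited paper.

```r
set.seed(123)
# Simulated stand-in data (all values hypothetical): 2000 detected genes,
# each assigned to one of five functional groups.
groups <- factor(sample(paste0("G", 1:5), 2000, replace = TRUE))
de_idx <- sample(2000, 150)                 # indices of "differentially expressed" genes
group_props <- prop.table(table(groups))    # distribution over all detected genes

# Sum of chi-square distances between a gene set's group counts and the
# counts expected from the distribution over all detected genes.
chisq_distance <- function(idx) {
  observed <- table(groups[idx])            # factor keeps all five groups, even if empty
  expected <- length(idx) * group_props
  sum((observed - expected)^2 / expected)
}

obs_stat <- chisq_distance(de_idx)
# Null distribution: the same statistic for 10,000 random gene sets of equal size.
null_stats <- replicate(10000, chisq_distance(sample(2000, length(de_idx))))
# Empirical p-value; the +1 terms keep it from being reported as exactly zero.
p_value <- (sum(null_stats >= obs_stat) + 1) / (length(null_stats) + 1)
p_value
```

Because the "differentially expressed" set here is itself a random draw, the p-value from this sketch should usually be unremarkable; with real data a small value would indicate that the set is distributed differently across groups.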

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC509255/?tool=pubmed


Applied Statistics for Bioinformatics using R
Wim P. Krijnen
November 10, 2009



Example 2. In the year 1866 Mendel observed, in a large number of experiments, frequencies of characteristics of different kinds of seed and their offspring. In particular, this yielded the frequencies 5474 and 1850 for the seed shape of ornamental sweet peas. A crossing of B and b yields offspring BB, Bb and bb with probability 0.25, 0.50, 0.25. Since Mendel could not distinguish Bb from BB, his observations theoretically occur with probability 0.75 (BB and Bb) and 0.25 (bb). To test the null hypothesis H0 : (π1, π2) = (0.75, 0.25) against H1 : (π1, π2) ≠ (0.75, 0.25), we use the chi-squared test, as follows.
> pi <- c(0.75, 0.25)    # proportions under H0 (this masks R's built-in constant pi)
> x <- c(5474, 1850)     # observed frequencies
> chisq.test(x, p = pi)  # goodness-of-fit test against the given proportions
Chi-squared test for given probabilities
data: x
X-squared = 0.2629, df = 1, p-value = 0.6081
Since the p-value 0.6081 is well above conventional significance levels such as 0.05, we do not reject the null hypothesis.
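
As a check on the output above, the statistic and p-value can be recomputed by hand from the observed and expected frequencies. The variable names below are chosen for illustration.

```r
x <- c(5474, 1850)                          # observed frequencies from the example
probs <- c(0.75, 0.25)                      # proportions under H0
expected <- sum(x) * probs                  # 5493 and 1831

stat <- sum((x - expected)^2 / expected)
p_value <- pchisq(stat, df = length(x) - 1, lower.tail = FALSE)
round(c(statistic = stat, p.value = p_value), 4)
```

The result should agree with the X-squared and p-value printed by chisq.test, up to rounding.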


