Monday, January 10, 2011

MGED - Microarray Gene Expression Data Society (now FGED)

http://www.mged.org/

Our goal is to assure that investment in functional genomics data generates the maximum public benefit. Our work on defining minimum information specifications for reporting data in functional genomics papers has already enabled large data sets to be used and reused to their greater potential in biological and medical research.

MIAME describes the Minimum Information About a Microarray Experiment that is needed to enable the interpretation of the results of the experiment unambiguously and potentially to reproduce the experiment. [Brazma et al., Nature Genetics]

TM4 - Microarray Software Suite
http://www.tm4.org/

MAGE-ML (XML), MAGE-OM (object model)

Bioconductor
http://www.bioconductor.org/
Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data. Bioconductor uses the R statistical programming language, and is open source and open development.

functional genomics = gene expression = microarrays

ArrayExpress
http://www.ebi.ac.uk/arrayexpress/

NCBI's Gene Expression Omnibus
http://www.ncbi.nlm.nih.gov/geo/

1. select microarray design
2. image processing software (TIGR Spotfinder in TM4; MADAM is the suite's data manager) - estimates of expression, background noise
3. measure expression as a ratio, usually on a log2 scale, because it treats up- and down-regulation symmetrically: log2(2) = 1 and log2(1/2) = -log2(2) = -1
4. normalize expression measurements, adjusted to the reference gene
5. assume the arrayed elements are a random assortment of genes, to avoid bias
6. assume a finite RNA sample, so when expression of one gene goes up, expression of others must go down
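
A minimal Python sketch of the log2 ratio from step 3. The function name and the intensity values are made up for illustration; they do not come from any of the tools listed above.

```python
import math

def log2_ratio(red, green):
    """Return log2(red / green) for one spot; both intensities must be positive."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(red / green)

# Hypothetical background-corrected (red, green) intensities for three spots:
# twofold up, unchanged, twofold down.
spots = [(2000.0, 1000.0), (1000.0, 1000.0), (500.0, 1000.0)]
ratios = [log2_ratio(r, g) for r, g in spots]
print(ratios)
```

A twofold increase and a twofold decrease land at the same distance from zero, which is the point of using the log2 scale.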

N_total = (sum_i R_i) / (sum_i G_i), summed over all elements i in the microarray (R_i, G_i = red and green intensity of element i)
T_i' = (1/N_total) * (R_i / G_i)
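
The total-intensity normalization above can be sketched in a few lines of Python. This is only an illustration under the assumptions in steps 5 and 6; the function name and the intensities are hypothetical.

```python
def total_intensity_normalize(red, green):
    """Return (n_total, normalized ratios) for paired red/green intensities."""
    if len(red) != len(green) or not red:
        raise ValueError("red and green must be non-empty and the same length")
    # Normalization factor: total red signal over total green signal.
    n_total = sum(red) / sum(green)
    # Rescale green by n_total, i.e. divide each raw ratio R_i / G_i by n_total.
    ratios = [r / (g * n_total) for r, g in zip(red, green)]
    return n_total, ratios

# Hypothetical intensities for a three-element array.
red = [200.0, 400.0, 600.0]
green = [100.0, 100.0, 400.0]
n_total, normalized = total_intensity_normalize(red, green)
print(n_total, normalized)
```

After rescaling, the total green intensity equals the total red intensity, so a gene whose raw ratio equals N_total ends up with a normalized ratio of 1.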

or use lowess (locally weighted linear regression) to estimate systematic bias in the data
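
A rough Python sketch of the idea behind lowess: at each point, fit a straight line to the nearest neighbours with tricube weights and take the fitted value as the estimated bias. It leaves out the robustness iterations of full lowess, and it assumes the log ratio is fitted against a log intensity; all names and data here are hypothetical.

```python
import math

def local_linear_fit(x, y, frac=0.5):
    """Fitted y at every x[i] from a tricube-weighted local straight-line fit."""
    n = len(x)
    k = min(n, max(3, math.ceil(frac * n)))  # points in each neighbourhood
    fitted = []
    for xi in x:
        dist = [abs(xj - xi) for xj in x]
        h = sorted(dist)[k - 1]  # bandwidth: distance to the k-th nearest point
        if h == 0.0:
            # All k nearest points share this x: weight them equally.
            w = [1.0 if d == 0.0 else 0.0 for d in dist]
        else:
            w = [(1.0 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dist]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0.0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted

# Hypothetical data: a log ratio that drifts linearly with log intensity.
log_intensity = [float(a) for a in range(6, 16)]
log_ratio = [0.1 * a - 1.0 for a in log_intensity]
bias = local_linear_fit(log_intensity, log_ratio)
corrected = [m - b for m, b in zip(log_ratio, bias)]
```

Subtracting the fitted curve from each log ratio removes the intensity-dependent trend. In this toy case the trend is the whole signal, so the corrected values are essentially zero.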

semimetric distance - doesn't obey the triangle inequality (d_ik <= d_ij + d_jk), e.g. a distance based on the Pearson correlation coefficient r: r = -1 (opposite, perfect anticorrelation), r = +1 (identical, perfect correlation), r = 0 (orthogonal, uncorrelated)

squared Pearson: 0 <= rsq <= 1, and distance d = 1 - rsq (since strong correlation or anticorrelation gives rsq close to 1, where the distance should be close to d = 0)
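
A short Python sketch of both ideas, assuming the plain Pearson distance is defined as d = 1 - r (the notes only give the squared form explicitly). The profiles are made up so that the triangle inequality visibly fails.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pearson_distance(x, y):
    """Assumed definition: d = 1 - r."""
    return 1.0 - pearson(x, y)

def squared_pearson_distance(x, y):
    """d = 1 - r^2, so correlated and anticorrelated profiles are both close."""
    return 1.0 - pearson(x, y) ** 2

# Hypothetical profiles: x and z are uncorrelated, y sits halfway between them.
x = [1.0, -1.0, 0.0, 0.0]
y = [1.0, -1.0, 1.0, -1.0]
z = [0.0, 0.0, 1.0, -1.0]
direct = pearson_distance(x, z)                            # 1 - 0
via_y = pearson_distance(x, y) + pearson_distance(y, z)    # about 0.59

# A perfectly anticorrelated pair has squared-Pearson distance 0.
up = [1.0, 2.0, 3.0, 4.0]
down = [4.0, 3.0, 2.0, 1.0]
anti = squared_pearson_distance(up, down)
```

Here the direct distance from x to z is larger than the detour through y, which is exactly what a true metric forbids.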


Algorithms
K-means (Tavazoie et al., 1999) and self-organizing maps (SOMs; Tamayo et al., 1999; Törönen et al., 1999).
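
For orientation, a bare-bones k-means loop in Python. This is a generic sketch, not the procedure of Tavazoie et al.: it assumes Euclidean distance and caller-supplied starting centroids, and the expression profiles are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the labels stop changing."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical log-ratio profiles: two up-regulated and two down-regulated genes.
profiles = [
    [1.0, 1.1, 0.9],
    [0.9, 1.0, 1.2],
    [-1.0, -0.9, -1.1],
    [-1.2, -1.0, -0.8],
]
labels, centers = kmeans(profiles, [profiles[0], profiles[2]])
print(labels)
```

In practice the starting centroids are usually chosen at random and the run is repeated, because the result depends on the initialization.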

1. Michael B. Eisen et al., “Cluster analysis and display of genome-wide expression patterns,” Proceedings of the National Academy of Sciences of the United States of America 95, no. 25 (December 8, 1998): 14863-14868.

2. Sandrine Dudoit, Robert C Gentleman, and John Quackenbush, “Open source software for the analysis of microarray data,” BioTechniques Suppl (March 2003): 45-51.

3. M K Kerr and G A Churchill, “Statistical design and the analysis of gene expression microarray data,” Genetical Research 77, no. 2 (April 2001): 123-128.

4. Junbai Wang et al., “Clustering of the SOM easily reveals distinct gene expression patterns: results of a reanalysis of lymphoma study,” BMC Bioinformatics 3 (November 24, 2002): 36.

5. Ivana V Yang et al., “Within the fold: assessing differential expression measures and reproducibility in microarray assays,” Genome Biology 3, no. 11 (October 24, 2002): research0062.

6. P Pavlidis and W S Noble, “Analysis of strain and regional variation in gene expression in mouse brain,” Genome Biology 2, no. 10 (2001): RESEARCH0042.
