Friday, August 17, 2012

A global map of human gene expression.


The experiment contains a systematically annotated and consistently normalized human gene expression data matrix of 5372 samples integrated from 206 public experiments on the HG-U133A array platform. The dataset is a subset of a larger pool of 9004 samples gathered from the ArrayExpress and GEO websites and checked for quality and suitability for data co-analysis as described by Bolstad et al. 2005. The sample annotations have been subject to semiautomatic curation and manual generalization into 369 biological groups, which have been additionally organized into blood/non-blood, 14 and 15 meta-classes.
Lukk et al. (2010), Nature Biotech.

The experiment has been slightly modified by adding biological variables (cell line, disease state, organism part, developmental stage) that are not present in the original publication (PubMed 20379172).
  • http://www.ebi.ac.uk/gxa/experiment/E-MTAB-62
    • Supplementary methods:
      • They pulled 1925 samples from ArrayExpress + 7079 from GEO (9004 in total)
      • After QC (see Barrett et al. 2009 and Bolstad et al. 2003), which removed duplicates and low-quality arrays (i.e. CEL files not consistent with the others tested), 5372 CEL files (samples) remained
      • Probesets were mapped to genes using the biomaRt Bioconductor package; 18,609 probesets mapped to 14,245 genes
      • Samples were annotated using the Whatizit tool (Rebholz-Schuhmann et al. 2007) developed at EBI, which converted GEO free text to MGED Ontology terms; manual curation then produced 369 terms (biological groups), which were further divided into 4 and 15 meta-classes
      • Performed standard analyses: co-expression using Pearson correlation with average linkage as the distance between groups, PCA, and differential expression with limma
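
To make the clustering step concrete, here is a minimal sketch of correlation-based clustering with average linkage. It is not the authors' pipeline (they worked in R/Bioconductor). It assumes the distance between two samples is 1 minus their Pearson correlation, and the sample names and expression values are made up for illustration. The helper names `pearson` and `average_linkage` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with distance 1 - r and average linkage.

    profiles: dict mapping a sample name to its expression values.
    Returns a sorted list of clusters, each a sorted list of sample names.
    """
    names = list(profiles)
    # Assumed distance: 1 - Pearson correlation between two samples.
    dist = {frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    clusters = [[name] for name in names]

    def between(c1, c2):
        # Average linkage: mean of all pairwise distances across two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    # Repeatedly merge the two closest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: between(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy, made-up expression profiles (4 samples x 5 genes); not data from E-MTAB-62.
toy = {
    "blood_1": [9.1, 8.7, 2.0, 2.2, 5.0],
    "blood_2": [8.8, 9.0, 2.3, 1.9, 5.2],
    "brain_1": [2.1, 2.4, 8.9, 9.2, 5.1],
    "brain_2": [1.8, 2.0, 9.3, 8.8, 4.9],
}
clusters = average_linkage(toy, n_clusters=2)
print(clusters)
```

On the toy data the two blood-like profiles end up in one cluster and the two brain-like profiles in the other. For a matrix the size of the real dataset (5372 samples), a library implementation of hierarchical clustering would be the practical choice; this pure-Python version only illustrates the idea.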
