Thursday, November 4, 2010

EST

To map ESTs and variable-length reads (supplied as multiple FASTA-format files) to the already-known genome of a related prokaryote.
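Because the input arrives as several FASTA files, the first step is usually to stream every record out of all of them. The Python below is a minimal sketch of that step, not a specific tool's API. The name `read_fasta` is hypothetical, and upper-casing the sequences is an assumption that makes later exact matching case-insensitive.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(paths: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from one or more FASTA files, in order.

    Multi-line sequences are joined, blank lines are skipped, and sequences
    are upper-cased (an assumption, so later exact matching ignores case).
    """
    for path in paths:
        header, chunks = None, []
        with open(path) as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                if line.startswith(">"):
                    # A new record starts: emit the previous one, if any.
                    if header is not None:
                        yield header, "".join(chunks).upper()
                    header, chunks = line[1:], []
                else:
                    chunks.append(line)
        # Emit the last record of this file.
        if header is not None:
            yield header, "".join(chunks).upper()
```

Usage might look like `for name, seq in read_fasta(["ests_1.fa", "ests_2.fa"]): ...`, where the file names are placeholders. The generator keeps only one record in memory at a time, which matters when there are millions of reads.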

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2789075/

http://en.wikipedia.org/wiki/Expressed_sequence_tag

The current commercially available high-throughput methodologies rely on primers or probes designed to detect each of the current reference miRNA sequences residing in miRBase, which acts as the central repository for known miRNAs (Griffiths-Jones 2006).

However, probe-based methodologies are generally restricted to the detection and profiling of only the known miRNA sequences previously identified by sequencing or homology searches.

Sequencing-based applications for identifying and profiling miRNAs have been hindered by laborious cloning techniques and the expense of capillary DNA sequencing (Pfeffer et al. 2005; Cummins et al. 2006).

In contrast with capillary sequencing, recently available “next-generation” sequencing technologies offer inexpensive increases in throughput, thereby providing a more complete view of the miRNA transcriptome.

Pluripotent human embryonic stem cells (hESCs) can be cultured under nonadherent conditions that induce them to differentiate into cells belonging to all three germ layers and form cell aggregates termed embryoid bodies (EBs) (Itskovitz-Eldor et al. 2000; Bhattacharya et al. 2004).

Samples of undifferentiated hESCs and differentiated cells from EBs were chosen for miRNA profiling, first because the pluripotency of ESCs is known to require the presence of miRNAs (Bernstein et al. 2003; Song and Tuan 2006; Wang et al. 2007) and second because specific changes in miRNA expression are thought to accompany differentiation (Chen et al. 2007).

These reads were mapped to the genome by forcing perfect alignments beginning at the first nucleotide and retaining the longest region of each read that could be aligned to the reference genome, along with all alignment positions. After mapping, a total of 766,199 (hESC) and 724,091 (EB) unique error-free trimmed small RNA sequences were represented by 4,351,479 and 3,886,865 reads.
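The distinction between "unique sequences" and "reads" above comes from collapsing identical reads into one entry with a count. The following is a minimal sketch of that bookkeeping with `collections.Counter`. The function names are hypothetical and this is not the authors' code.

```python
from collections import Counter
from typing import Iterable, Tuple


def collapse_reads(reads: Iterable[str]) -> Counter:
    """Collapse identical sequences: map each unique sequence to its read count."""
    return Counter(reads)


def summarize(counts: Counter) -> Tuple[int, int, float]:
    """Return (unique sequences, total reads, mean reads per unique sequence)."""
    n_unique = len(counts)
    n_reads = sum(counts.values())
    mean = n_reads / n_unique if n_unique else 0.0
    return n_unique, n_reads, mean
```

Applied to the figures quoted above, the mean works out to 4,351,479 / 766,199 ≈ 5.68 reads per unique sequence for hESC and 3,886,865 / 724,091 ≈ 5.37 for EB.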

Sequences deriving from 334 distinct miRNA genes were identified. The miRNAs were the most abundant class of small RNAs on average, but spanned the entire range of expression, with sequence counts up to ~120,000 (Fig. 1A).

Virtually no reads aligned to the genome after position 28, so we trimmed all reads at 30 nt to reduce the number of unique sequences.
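Trimming shrinks the set of unique sequences because reads that differ only after the cut point become identical. A small sketch of that effect follows. The 30 nt default mirrors the quoted text, and the function names are hypothetical.

```python
from typing import Iterable, List


def trim_reads(reads: Iterable[str], max_len: int = 30) -> List[str]:
    """Truncate every read to at most max_len nt; shorter reads are unchanged."""
    return [read[:max_len] for read in reads]


def count_unique(reads: Iterable[str]) -> int:
    """Number of distinct sequences among the reads."""
    return len(set(reads))


reads = ["A" * 30 + "C", "A" * 30 + "GG", "C" * 25]
print(count_unique(reads), count_unique(trim_reads(reads)))  # 3 before, 2 after
```

Since alignments essentially never extend past position 28, cutting at 30 nt should discard sequence that would not have aligned anyway.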

For every read, the longest alignment was determined, and this subsequence, as well as the positions for every alignment of this length, was stored in a database (to a maximum of 100 alignments). 
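A naive way to find "the longest alignment" anchored at the first nucleotide is to look for the longest read prefix that occurs exactly in the genome. If a prefix of length k matches somewhere, every shorter prefix matches too, so the length can be found by binary search. The sketch below makes several assumptions that are not from the quoted text: it searches both strands, reports 0-based forward-strand start positions with a `+`/`-` strand tag, and uses plain `str.find`. It is fine for a small prokaryotic genome, but it is not how an indexed aligner works. The cap of 100 positions follows the quoted text. All names are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(genome: str, pattern: str, limit: int) -> List[int]:
    """Up to `limit` 0-based start positions of exact (possibly overlapping) matches."""
    hits, start = [], genome.find(pattern)
    while start != -1 and len(hits) < limit:
        hits.append(start)
        start = genome.find(pattern, start + 1)
    return hits


def prefix_hits(genome: str, read: str, k: int, limit: int) -> List[Tuple[int, str]]:
    """Exact hits of read[:k] on both strands (assumption), as (position, strand)."""
    prefix = read[:k]
    hits = [(pos, "+") for pos in find_all(genome, prefix, limit)]
    if len(hits) < limit:
        # A minus-strand hit is where the reverse complement of the prefix lies
        # on the forward strand; the position is that region's forward start.
        rc = reverse_complement(prefix)
        hits += [(pos, "-") for pos in find_all(genome, rc, limit - len(hits))]
    return hits


def longest_prefix_alignment(genome: str, read: str, max_hits: int = 100):
    """Return (longest exactly matching prefix, its positions capped at max_hits)."""
    lo, hi = 1, len(read)
    best_len, best_hits = 0, []
    # Matching is monotone in prefix length, so binary search is valid.
    while lo <= hi:
        mid = (lo + hi) // 2
        hits = prefix_hits(genome, read, mid, max_hits)
        if hits:
            best_len, best_hits = mid, hits
            lo = mid + 1
        else:
            hi = mid - 1
    return read[:best_len], best_hits
```

For example, `longest_prefix_alignment("TTTTACGTACGGAAAA", "ACGTACGC")` keeps the 7 nt prefix `ACGTACG` at forward position 4. A read with no matching prefix returns an empty subsequence and no positions.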

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2279248/?tool=pubmed
