http://arxiv.org/abs/1304.0817
The study of functional genomics, particularly in non-model organisms, has
been dramatically improved over the last few years by the use of transcriptomes
and RNAseq. While these studies are potentially extremely powerful, a
computationally intensive procedure, the de novo construction of a reference
transcriptome, must be completed as a prerequisite to further analyses. An
accurate reference is critically important because all downstream steps,
including estimating transcript abundance, depend on it. Though a substantial
amount of research has been done on
assembly, only recently have the pre-assembly procedures been studied in
detail. Specifically, several stand-alone error correction modules have been
reported, and while they have been shown to be effective in reducing errors at
the level of sequencing reads, how error correction impacts assembly accuracy
is largely unknown. Here, we show, using a simulated dataset, that applying
error correction to sequencing reads significantly improves assembly accuracy,
reducing assembly error by nearly 50%, and should therefore be applied to all
datasets.