Velvet
de Bruijn / Eulerian - converts the Hamiltonian path problem (NP-complete) into an Eulerian path problem, because an Eulerian path can be found exactly in linear time; the graph takes a lot of memory
greedy
overlap, layout, consensus (OLC) - used for Sanger reads; can't handle a large number of sequences, so not good for gigabytes of short reads, but theoretically better at assembling because it allows more parameters to be configured
hybrid approaches - use overlap-layout-consensus on Sanger reads to build scaffolds, then extend them with de Bruijn
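As a rough illustration of the de Bruijn / Eulerian idea above, here is a minimal Python sketch. It is not how Velvet or ABySS are implemented: the function names (`build_de_bruijn`, `eulerian_path`, `spell`) and the toy reads are made up for this note, the start node is passed in by hand, and duplicate k-mers are collapsed, whereas real assemblers keep k-mer multiplicity and deal with sequencing errors.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer contributes one edge."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the toy deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph

def eulerian_path(graph, start):
    """Hierholzer's algorithm: use every edge exactly once."""
    edges = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if edges.get(node):
            stack.append(edges[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())         # dead end: emit the node
    return path[::-1]

def spell(path):
    """Join overlapping (k-1)-mers back into one sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])

# Toy reads (hypothetical) that overlap along ATGGCGTGCA.
reads = ["ATGGCG", "GCGTGC", "GTGCA"]
graph = build_de_bruijn(reads, k=4)
contig = spell(eulerian_path(graph, start="ATG"))
print(contig)  # ATGGCGTGCA
```

The point of the conversion is visible in `eulerian_path`: each edge is pushed and popped once, so the walk is linear in the number of k-mers, whereas visiting every read exactly once in an overlap graph is a Hamiltonian path problem.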
ABySS (Assembly By Short Sequences) (http://genome.cshlp.org/content/19/6/1117.long, http://seqanswers.com/wiki/ABySS)
- parallelized
- Uniform coverage is key
Coverage comes in two types: expected / theoretical coverage and actual coverage
Coverage bias, from lowest to highest: 3rd-gen sequencing (single molecule only, no PCR amplification needed) < Illumina ~ 454 < SOLiD < Sanger
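To make the expected vs. actual distinction concrete, here is a small Python sketch. It assumes the usual estimate of expected coverage as (number of reads x read length) / genome size, and computes actual coverage as per-base depth from read start positions. The function names and the toy numbers are invented for illustration.

```python
def expected_coverage(num_reads, read_length, genome_size):
    """Theoretical coverage: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size

def actual_coverage(read_starts, read_length, genome_size):
    """Per-base depth, given the 0-based start position of each read."""
    depth = [0] * genome_size
    for start in read_starts:
        for pos in range(start, min(start + read_length, genome_size)):
            depth[pos] += 1
    return depth

# Toy example (hypothetical): 4 reads of 5 bp on a 20 bp genome.
starts = [0, 0, 2, 10]
depth = actual_coverage(starts, read_length=5, genome_size=20)
print(expected_coverage(len(starts), 5, 20))  # 1.0
print(max(depth), depth.count(0))             # 3 8
```

Both views agree on an average of 1x, but the actual depth ranges from 0 to 3 and leaves 8 bases uncovered, which is the kind of unevenness that coverage bias produces.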
Trans-ABySS
- http://www.nature.com/nmeth/journal/v7/n11/full/nmeth.1517.html
- http://www.bcgsc.ca/platform/bioinfo/software/trans-abyss
- assembles transcriptomes (RNA-seq), where coverage is non-uniform
- uses a range of k values (26-50 bp) to handle variable transcript expression
- k optimization by iteratively decreasing k, subtracting out matched reads at each step
- the number of unique k-mers levels off at roughly the length of the genome
- Assembly N50 (the contig length such that 50% of the sequence in an assembly is in contigs of that size or larger) was highest for intermediate k values, with a maximum of 1,458 bp at k = 39 bp
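The N50 definition is easier to follow as code. A minimal Python sketch, with a made-up list of contig lengths (the function name `n50` is just for this note):

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:  # reached 50% of the assembled bases
            return length
    return 0  # empty assembly

# Toy contig lengths (hypothetical), 300 bp in total.
print(n50([100, 80, 50, 30, 20, 20]))  # 80
```

Here the two longest contigs (100 + 80 = 180 bp) already cover more than half of the 300 bp total, so N50 is 80.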
One of the challenges with most assemblers is figuring out which parameters to use, in particular picking the right k (the length in bp of the k-mer, i.e. the overlapping substring)
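One simple way to get a feel for what k does is to count the distinct k-mers in the reads for several values of k. The Python sketch below does only that. It is a toy, not the k-selection procedure of any particular assembler, and the names `distinct_kmers` and `sweep_k` are made up here.

```python
def distinct_kmers(reads, k):
    """Set of all distinct substrings of length k across the reads."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers

def sweep_k(reads, k_values):
    """Number of distinct k-mers for each candidate k."""
    return {k: len(distinct_kmers(reads, k)) for k in k_values}

# Toy read (hypothetical) containing a repeat of ACGT.
print(sweep_k(["ACGTACGT"], [2, 4, 8]))  # {2: 4, 4: 4, 8: 1}
```

A small k collapses repeats onto the same k-mers, while a k close to the read length leaves very few k-mers per read to overlap on, which is one reason intermediate values tend to assemble best.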