Friday, October 14, 2011

N50 - a statistical measure of average length of a set of sequences.

http://assemblathon.org/assemblathon-2-basic-assembly-metrics
N50 scaffold/contig length is calculated by summing lengths of scaffolds/contigs from the longest to the shortest and determining at what point you reach 50% of the total assembly size. The length of the scaffold/contig at that point is the N50 length.

http://en.wikipedia.org/wiki/Contig
A sequence contig is a contiguous, overlapping sequence read resulting from the reassembly of the small DNA fragments generated by bottom-up sequencing strategies.

http://en.wikipedia.org/wiki/N50_statistic
The N50 size is computed by sorting all contigs from largest to smallest and by determining the minimum set of contigs whose sizes total 50% of the entire genome. For example, for a genome of 600Mb, if the assembled sequences add up to 500Mb, the N50 would be calculated by sorting the contigs from largest to smallest and finding the length of the contig where the cumulative size is 250Mb.
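The sort-and-accumulate procedure quoted above can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the linked pages: the function name `n50` and the example lengths are made up, and it takes 50% of the total assembled length (as in the 250Mb example), not of an estimated genome size.

```python
def n50(lengths):
    """N50 by the cumulative-sum definition (hypothetical helper name)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    half_total = sum(lengths) / 2  # 50% of the total assembled length
    running = 0
    # Walk from the longest sequence to the shortest.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            # This sequence is the one that crosses the 50% mark.
            return length


# Made-up contig lengths: total 290, so the halfway point is 145.
example = [80, 70, 50, 40, 30, 20]
print(n50(example))  # 70, because 80 + 70 = 150 >= 145
```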

http://seqanswers.com/forums/showthread.php?t=2332
https://www.broad.harvard.edu/crd/wiki/index.php/N50

Given a set of sequences of varying lengths, the N50 length is defined as the length N for which 50% of all bases in the sequences are in a sequence of length L < N.
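One way to read this definition is as a length-weighted median: replace every length n with n copies of itself and take the ordinary median of the expanded list. The sketch below assumes that reading. The function name `n50_weighted_median` is hypothetical, and expanding the list is only practical for toy inputs, not for real assemblies.

```python
from statistics import median


def n50_weighted_median(lengths):
    """N50 as the median of a list where each length n appears n times."""
    # Each base contributes one entry, labelled with the length of its sequence.
    expanded = [n for n in lengths for _ in range(n)]
    return median(expanded)


print(n50_weighted_median([2, 2, 2, 3, 3, 4, 8, 8]))
```

When the 50% mark falls exactly between two sequences, this version averages their lengths, so it can return a value that is not the length of any actual sequence. The cumulative-sum definition always returns an existing length.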

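The N50 is a length-weighted median of the contig lengths in the assembly, not the plain median of the list of contig lengths. L50 usually refers to the number of contigs needed to reach that point, not to a length.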

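N50 of {2, 2, 2, 3, 3, 4, 8, 8} is 6 under the definition above (the median of the list in which each length n is repeated n times: six 2s, six 3s, four 4s and sixteen 8s). Summing from the longest sequence down, as in the Assemblathon definition, gives 8 instead, because 8 + 8 = 16 is already 50% of the 32 total bases.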
