n0b3l1a: High throughput sequencing

Tuesday, April 5, 2011

High throughput sequencing

A.V. Dalca, M. Brudno, Briefings in Bioinformatics (2010) 11(1):3-14.

While
GenomeMapper [57] is the first tool to allow for
the simultaneous mapping of HTS reads to multiple
genomes, identifying variants—both SNPs and
SVs—based on many low-coverage individuals is
another important research area, and one which
may prove key to enabling the $1000 genome and
the full promise of personal genomics.

An alternate method for copy-number variation
(CNV) discovery relies on the ‘depth-of-coverage’
(DOC) signal. If a certain genomic region is present
multiple times in the donor genome, more reads will
likely be generated from it, and consequently the
corresponding region in the reference will have
higher coverage (Figure 4D). (My note: this assumes uniform coverage across the genome, though!)
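
To make the depth-of-coverage idea concrete, here is a minimal sketch of my own, not the method from the paper. It averages read depth over fixed-size windows and labels a window by its ratio to the genome-wide median. The function names, the window size and the 1.5/0.5 thresholds are all assumptions chosen for illustration, and the whole approach relies on coverage being roughly uniform outside of real copy-number changes.

```python
from statistics import median


def window_coverage(read_starts, read_len, genome_len, window):
    """Count per-base read depth, then average it over fixed-size windows."""
    depth = [0] * genome_len
    for start in read_starts:
        for pos in range(start, min(start + read_len, genome_len)):
            depth[pos] += 1
    means = []
    for i in range(0, genome_len, window):
        chunk = depth[i:i + window]
        means.append(sum(chunk) / len(chunk))
    return means


def flag_copy_number(window_means, gain=1.5, loss=0.5):
    """Label each window by its depth relative to the median window depth.

    The gain/loss thresholds are illustrative assumptions, not published values.
    """
    baseline = median(window_means)
    labels = []
    for mean in window_means:
        ratio = mean / baseline if baseline else 0.0
        if ratio >= gain:
            labels.append("gain")
        elif ratio <= loss:
            labels.append("loss")
        else:
            labels.append("normal")
    return labels


# Toy data: the second window (positions 10-19) is sampled twice as often.
reads = [0, 5, 10, 10, 15, 15, 20, 25, 30, 35]
means = window_coverage(reads, read_len=5, genome_len=40, window=10)
print(flag_copy_number(means))
```

Real data would also need correction for mappability and GC content before the ratio means much, which is exactly where the uniform-coverage assumption bites.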

While SNPs and small indels can be located by
analyzing the mappings of unpaired reads, the
identification of structural variants (SVs), where the
genome is drastically altered, is more difficult with
short reads. For example, a large deletion in the
donor’s genome (i.e. a segment of the reference
not present in the donor) may create split-reads
that cover the location of the deletion (the
breakpoint), and map to the reference with their two
halves on opposite sides of the deleted segment.
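
The split-read picture can be sketched in a few lines. This is a toy illustration under strong assumptions: exact matching only, no sequencing errors, and a reference without repeats. The function name and the `min_anchor` parameter are hypothetical, and real split-read mappers are far more involved.

```python
def find_deletion_by_split_read(reference, read, min_anchor=4):
    """Try every split point of the read and return (deletion_start,
    deletion_length) in reference coordinates, or None if no split fits.

    Assumes exact matches and a repeat-free reference (illustration only).
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = reference.find(left)
        if left_pos < 0:
            continue
        left_end = left_pos + len(left)
        # The right half must map downstream of the left half, leaving a gap.
        right_pos = reference.find(right, left_end)
        if right_pos > left_end:
            return left_end, right_pos - left_end
    return None


reference = "GATTACAGGCTTCCCCCCAGTGACTCAA"
# The donor lacks the six C's at reference positions 12-17;
# this read was sampled across the resulting breakpoint.
read = "AGGCTTAGTGAC"
print(find_deletion_by_split_read(reference, read))
```

With short reads each half is only a few bases long, so the halves often map ambiguously; that is one way to see why the excerpt calls SV detection from unpaired reads difficult.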

Accordingly, the discovery
of SVs in a genome is typically based on paired-end
sequencing approaches [19]. The two reads are
mapped to the reference genome, with the distance
between them referred to as ‘mapped distance’. This
mapped distance and the relative orientations of the
mapping are then compared to the expected insert
size: if the distance is similar and the orientations are
unchanged, the mate pair is termed ‘concordant’, and
is thought to be unlikely to overlap an SV. If, on the
other hand, either the distance or the orientation differs (the
mate pair is called ‘discordant’), it likely overlaps a
variant, such as an insertion (the mapped distance
will be smaller than the expected insert size), deletion
(it will be larger) or inversion (the orientation of
one of the two mappings will be opposite from the
expected).
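
The concordant/discordant rule above maps directly onto a small classifier. This is a sketch under my own assumptions: the `tolerance` parameter (how far the mapped distance may stray from the expected insert size) and the function name are hypothetical, and in practice the tolerance would come from the observed insert-size distribution of the library.

```python
def classify_mate_pair(mapped_distance, expected_insert, tolerance,
                       orientation_as_expected=True):
    """Classify one mate pair using the rule described in the excerpt.

    tolerance is an assumed cutoff, e.g. a few standard deviations of the
    library's insert-size distribution.
    """
    if not orientation_as_expected:
        return "discordant: possible inversion"
    if mapped_distance < expected_insert - tolerance:
        # Extra sequence in the donor makes the mates map closer together.
        return "discordant: possible insertion"
    if mapped_distance > expected_insert + tolerance:
        # Sequence missing from the donor makes the mates map farther apart.
        return "discordant: possible deletion"
    return "concordant"


for distance in (2000, 1200, 3500):
    print(distance, classify_mate_pair(distance, expected_insert=2000, tolerance=300))
```

A single discordant pair is weak evidence on its own, so a caller built on this rule would presumably require several pairs supporting the same event.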

Methods for SV detection with mate pairs can
identify many, but not all, SVs. For example,
insertions (in the donor) larger than the insert size
cannot be discovered by these methods, as no mate
pair will completely span the insertion event.
