Review
Brief Bioinform. 2010 Sep;11(5):484-98.
doi: 10.1093/bib/bbq016. Epub 2010 Jun 2.

Challenges of sequencing human genomes

Daniel C Koboldt et al. Brief Bioinform. 2010 Sep.

Abstract

Massively parallel sequencing technologies continue to alter the study of human genetics. As the cost of sequencing declines, next-generation sequencing (NGS) instruments and datasets will become increasingly accessible to the wider research community. Investigators are understandably eager to harness the power of these new technologies. Sequencing human genomes on these platforms, however, presents numerous production and bioinformatics challenges. Production issues such as sample contamination, library chimaeras and variable run quality have become increasingly problematic in the transition from technology development lab to production floor. Analysis of NGS data, too, remains challenging, particularly given the short read lengths (35-250 bp) and sheer volume of data. The development of streamlined, highly automated pipelines for data analysis is critical for the transition from technology adoption to accelerated research and publication. This review aims to describe the state of current NGS technologies, as well as the strategies that enable NGS users to characterize the full spectrum of DNA sequence variation in humans.

Figures

Figure 1:
Growth of the public database dbSNP from 2002 to 2010. Note the exponential growth in submissions following the first genome sequenced on next-generation technology (Watson) in 2007.
Figure 2:
Distribution of NGS instruments by country (March 2010). Courtesy of next-generation sequencing maps maintained by Nick Loman [70] and James Hadfield [71].
Figure 3:
The intersection of WGS, Target-Seq and RNA-Seq for the characterization of human genomes. Target-Seq of specific regions (selected by PCR or capture) serves primarily for the identification of SNPs and small indels. WGS enables detection not only of SNPs and indels, but also of CNVs and SV (often aided by de novo assembly). RNA-Seq provides digital gene expression information that can be used to validate SNP/indel calls in coding regions and assess the impact of genetic variation (CNV, SNPs and indels) on gene expression. RNA-Seq with paired-end libraries also enables the identification of chimeric transcripts, which serve to validate gene fusion events resulting from genomic structural variation.
Figure 4:
Performance metrics for sequence data quality. (A) Genotype quality control of sequencing runs. Concordance of per-lane SNP calls with high-density SNP array genotypes for 65 lanes of Illumina data. The low concordance of randomly mismatched controls (left) helps distinguish low-quality data (top right) from true sample mix-ups (right). (B) Error and mapping rates for five real flowcells sequenced on the Illumina platform (1 × 50 bp). Note the increased error rates and decreased alignment rates for poor-performing lanes 1 and 2 on flowcell 1.
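
The per-lane concordance check in panel (A) can be illustrated with a minimal sketch. It assumes genotypes are available as unphased two-letter strings keyed by (chromosome, position). The function name, the data layout and the example values are hypothetical, and the review's own pipeline may compute concordance differently.

```python
from typing import Dict, Optional, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position); an assumed layout


def genotype_concordance(seq_calls: Dict[Site, str],
                         array_calls: Dict[Site, str]) -> Optional[float]:
    """Fraction of shared sites where the sequencing call matches the array call.

    Genotypes are unphased strings such as "AG"; allele order is ignored.
    Returns None when the two call sets share no sites.
    """
    shared = [site for site in seq_calls if site in array_calls]
    if not shared:
        return None
    matches = sum(
        1 for site in shared
        if sorted(seq_calls[site]) == sorted(array_calls[site])
    )
    return matches / len(shared)


# Made-up example: three sites are shared, two of them agree.
lane_calls = {("chr1", 100): "AG", ("chr1", 200): "CC",
              ("chr2", 50): "TT", ("chr3", 5): "AA"}
array_genotypes = {("chr1", 100): "GA", ("chr1", 200): "CT",
                   ("chr2", 50): "TT"}
concordance = genotype_concordance(lane_calls, array_genotypes)
```

Comparing a lane against the array genotypes of a deliberately mismatched sample gives the baseline the caption describes: a true sample mix-up should score near that baseline, while a low-quality lane from the correct sample would be expected to score above it.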
Figure 5:
Basic workflows for next-generation sequencing. (A) Sequencing and alignment. Libraries constructed from genomic DNA or RNA are sequenced on massively parallel instruments (e.g. Illumina or SOLiD). The resulting NGS reads are mapped to a reference sequence. Mapped and unmapped reads are imported into SAM/BAM format and marked for PCR/optical duplicates. (B) Post-BAM downstream analysis. The FLAG field of the BAM file indicates the mapping status for each read. Mapped, properly paired reads (or mapped fragment-end reads) are used for SNP/indel detection and copy number estimation. Aberrantly mapped reads, in which reads in a pair map with unexpected distance or orientations, are mined for evidence of structural variation. Finally, de novo assembly of unmapped reads yields predictions of structural variants and novel insertions.
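
The FLAG-based routing in panel (B) can be sketched as follows. The bit values are those defined in the SAM format specification. The category labels, the function name and the decision to set duplicate-marked reads aside are assumptions made for illustration, not the authors' pipeline.

```python
# FLAG bits from the SAM format specification.
PAIRED = 0x1         # read is part of a pair
PROPER_PAIR = 0x2    # pair aligned with expected distance and orientation
UNMAPPED = 0x4       # read itself is unmapped
DUPLICATE = 0x400    # marked as a PCR or optical duplicate


def route_read(flag: int) -> str:
    """Assign a read to a downstream analysis based on its SAM FLAG.

    The returned labels are hypothetical names for the three branches
    described in the caption, plus a bucket for marked duplicates.
    """
    if flag & DUPLICATE:
        return "duplicate"              # assumption: duplicates are set aside
    if flag & UNMAPPED:
        return "de_novo_assembly"       # unmapped reads go to assembly
    if flag & PAIRED and not flag & PROPER_PAIR:
        return "structural_variation"   # aberrantly mapped pair
    # Properly paired reads and mapped fragment-end (unpaired) reads.
    return "snp_indel_cnv"


# FLAG values as they would appear in column 2 of a SAM record.
examples = {flag: route_read(flag) for flag in (0, 4, 65, 77, 99, 1123)}
```

In this sketch a mapped read whose mate is unmapped falls into the structural-variation branch, because the pair is not flagged as proper; a production pipeline might treat that case separately.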

References

    1. Mardis ER. The impact of next-generation sequencing technology on genetics. Trends Genet. 2008;24(3):133–41.
    2. Ahn SM, Kim TH, Lee S, et al. The first Korean genome sequence and analysis: full genome sequencing for a socio-ethnic group. Genome Res. 2009;19(9):1622–9.
    3. Bentley DR, Balasubramanian S, Swerdlow HP, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008;456(7218):53–9.
    4. Drmanac R, Sparks AB, Callow MJ, et al. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science. 2010;327(5961):78–81.
    5. Kim JI, Ju YS, Park H, et al. A highly annotated whole-genome sequence of a Korean individual. Nature. 2009;460(7258):1011–5.
