Sci Data. 2016 Jun 7;3:160025.
doi: 10.1038/sdata.2016.25.

Extensive sequencing of seven human genomes to characterize benchmark reference materials


Justin M Zook et al. Sci Data. 2016.

Abstract

The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.

Conflict of interest statement

F.C., E.J., and A.M. are employees of Illumina. K.P., W.S., T.L., M.S., Z.D., A.H., and H.C. are employees of BioNano Genomics. P.M., S.K.-P., G.S.Y.Z., M.S.-L., H.S.O., and P.A.M. are employees of 10X Genomics. R.M.T., C.C.C., and N.G. are employees of BGI-Complete Genomics. K.Z., S.G., F.H., and Y.F. are employees of Thermo Fisher Scientific.

Figures

Figure 1. Overview of the study design.
Data and analyses included in this manuscript are above the dotted line, and ongoing analyses of these data by the Genome in a Bottle Analysis Group are below the dotted line.

Figure 2. Moleculo sequencing characteristics.
For HG002 (left: a, c, e) and HG003 (right: b, d, f), these are distributions of (a, b) coverage of the genome by short reads, (c, d) read coverage per cloud, and (e, f) cloud length for clouds with reads that contain markers at 0, 1, or 2 ends of the cloud.

Figure 3. PacBio coverage for AJ Trio.
Histogram of coverage from PacBio for the AJ trio generated using bedtools genomecov (raw data available at https://plot.ly/~justinzook/122/coverage-of-aj-trio-by-pacbio/).

Figure 4. Statistics from first Oxford Nanopore run of the AJ Son (HG002).
SQK-MAP-004 sequenced read length distribution (a), then alignment length and alignment error type for BWA (b) and GraphMap (c).

Figure 5. Statistics from second Oxford Nanopore run of the AJ Son (HG002).
SQK-MAP-006 sequenced read length distribution (a), then alignment length and alignment error type for BWA (b) and GraphMap (c).
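
The Figure 3 caption says the coverage histogram was produced with bedtools genomecov. As a rough illustration of how such output might be summarized, the sketch below parses the tool's default histogram format (tab-separated columns: chromosome or "genome", depth, number of bases at that depth, chromosome size, fraction of bases) and computes mean depth and the fraction of bases covered at or above a threshold. The function names and the toy input are hypothetical and are not taken from the paper; the authors' actual processing may differ.

```python
from collections import defaultdict


def parse_genomecov(lines):
    """Parse default `bedtools genomecov` histogram lines.

    Returns {chrom: {depth: bases_at_that_depth}}. The key "genome"
    holds the genome-wide summary rows that bedtools appends.
    """
    hist = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        chrom, depth, bases, _size, _fraction = line.split("\t")
        hist[chrom][int(depth)] = int(bases)
    return dict(hist)


def mean_depth(depth_to_bases):
    """Base-weighted mean coverage depth."""
    total = sum(depth_to_bases.values())
    if total == 0:
        return 0.0
    return sum(d * n for d, n in depth_to_bases.items()) / total


def fraction_at_least(depth_to_bases, min_depth):
    """Fraction of bases covered at `min_depth` or more."""
    total = sum(depth_to_bases.values())
    if total == 0:
        return 0.0
    covered = sum(n for d, n in depth_to_bases.items() if d >= min_depth)
    return covered / total


# Hypothetical toy input (not real data from the study).
example = [
    "genome\t0\t10\t100\t0.1",
    "genome\t1\t30\t100\t0.3",
    "genome\t2\t60\t100\t0.6",
]
hist = parse_genomecov(example)
print(mean_depth(hist["genome"]))
print(fraction_at_least(hist["genome"], 1))
```

In practice the lines would come from a file written by a command such as `bedtools genomecov -ibam sample.bam > sample.hist.txt`, where `sample.bam` is a placeholder file name.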

References

Data Citations

    1. Zook J. M. 2015. NCBI SRA. SRX1049768–SRX1049855
    2. Zook J. M. 2015. NCBI SRA. SRX847862–SRX848317
    3. Zook J. M. 2015. NCBI SRA. SRX1388368–SRX1388459
    4. Zook J. M. 2015. NCBI SRA. SRX1388732–SRX138874359
    5. Sheng Y. 2015. NCBI SRA. SRP047086
