[Preprint]. 2025 Jun 14:2024.09.18.613544.
doi: 10.1101/2024.09.18.613544.

Development and extensive sequencing of a broadly-consented Genome in a Bottle matched tumor-normal pair

Jennifer H McDaniel 1, Vaidehi Patel 1, Nathan D Olson 1, Hua-Jun He 1, Zhiyong He 1, Kenneth D Cole 1, Alexander A Gooden 1, Anthony Schmitt 2, Kristin Sikkink 2, Fritz J Sedlazeck 3, Harsha Doddapaneni 4, Shalini N Jhangiani 4, Donna M Muzny 4, Marie-Claude Gingras 4, Heer Mehta 4, Sairam Behera 4, Luis F Paulin 4, Alex R Hastie 5, Hung-Chun Yu 5, Victor Weigman 6, Alison Rojas 6, Katie Kennedy 6, Jamie Remington 6, Isai Salas-González 6, Mitch Sudkamp 7, Kelly Wiseman 7, Bryan R Lajoie 7, Shawn Levy 7, Miten Jain 8, Stuart Akeson 8, Giuseppe Narzisi 9, Zoe Steinsnyder 9, Catherine Reeves 9, Jennifer Shelton 9, Sarah B Kingan 10, Christine Lambert 10, Primo Baybayan 10, Aaron M Wenger 10, Ian J McLaughlin 10, Aaron Adamson 10, Christopher Kingsley 10, Melanie Wescott 10, Young Kim 10, Benedict Paten 11, Jimin Park 11, Ivo Violich 11, Karen H Miga 11, Joshua Gardner 11, Brandy McNulty 11, Gail L Rosen 12, Rajiv McCoy 13, Francesco Brundu 14, Erfan Sayyari 14, Konrad Scheffler 14, Sean Truong 14, Severine Catreux 14, Lesley Chapman Hannah 15, Doron Lipson 16, Hila Benjamin 16, Nika Iremadze 16, Ilya Soifer 16, Gat Krieger 16, Stephen Eacker 17, Mary Wood 17, Erin Cross 18, Greg Husar 18, Stephen Gross 18, Michael Vernich 18, Mikhail Kolmogorov 19, Tanveer Ahmad 19, Ayse Keskus 19, Asher Bryant 19, Francoise Thibaud-Nissen 20, Jonathan Trow 20, Jacqueline Proszynski 21, Jeremy Wain Hirschberg 21, Krista Ryon 21, Christopher E Mason 21, Mital S Bhakta 22, J Zachary Sanborn 22, Elizabeth M Munding 22, Justin Wagner 1, Chunlin Xiao 20, Andrew S Liss 23, Justin M Zook 1

Jennifer H McDaniel et al. bioRxiv. 2025.

Update in

  • Development and extensive sequencing of a broadly-consented Genome in a Bottle matched tumor-normal pair.
    McDaniel JH, Patel V, Olson ND, He HJ, He Z, Cole KD, Gooden AA, Schmitt A, Sikkink K, Sedlazeck FJ, Doddapaneni H, Jhangiani SN, Muzny DM, Gingras MC, Mehta H, Behera S, Paulin LF, Hastie AR, Yu HC, Weigman V, Rojas A, Kennedy K, Remington J, Salas-González I, Sudkamp M, Wiseman K, Lajoie BR, Levy S, Jain M, Akeson S, Narzisi G, Steinsnyder Z, Reeves C, Shelton J, Kingan SB, Lambert C, Baybayan P, Wenger AM, McLaughlin IJ, Adamson A, Kingsley C, Wescott M, Kim Y, Paten B, Park J, Violich I, Miga KH, Gardner J, McNulty B, Rosen GL, McCoy R, Brundu F, Sayyari E, Scheffler K, Truong S, Catreux S, Hannah LC, Lipson D, Benjamin H, Iremadze N, Soifer I, Krieger G, Eacker S, Wood M, Cross E, Husar G, Gross S, Vernich M, Kolmogorov M, Ahmad T, Keskus AG, Bryant A, Thibaud-Nissen F, Trow J, Proszynski J, Hirschberg JW, Ryon K, Mason CE, Bhakta MS, Sanborn JZ, Munding EM, Wagner J, Xiao C, Liss AS, Zook JM. Sci Data. 2025 Jul 16;12(1):1195. doi: 10.1038/s41597-025-05438-2. PMID: 40670386. Free PMC article.

Abstract

The Genome in a Bottle Consortium (GIAB), hosted by the National Institute of Standards and Technology (NIST), is developing new matched tumor-normal samples, the first to be explicitly consented for public dissemination of genomic data and cell lines. Here, we describe a comprehensive genomic dataset from the first individual, HG008, including DNA from an adherent, epithelial-like pancreatic ductal adenocarcinoma (PDAC) tumor cell line and matched normal cells from duodenal and pancreatic tissues. Data for the matched tumor-normal samples come from seventeen distinct state-of-the-art whole genome measurement technologies, including high-depth short- and long-read bulk whole genome sequencing (WGS), single-cell WGS, Hi-C, and karyotyping. In future publications, the GIAB Consortium will use these data to develop matched tumor-normal benchmarks for somatic variant detection. We expect these data to facilitate innovation in whole genome measurement technologies, de novo assembly of tumor and normal genomes, and bioinformatic tools to identify small and structural somatic mutations. This first-of-its-kind, broadly consented, open-access resource will facilitate further understanding of sequencing methods used for cancer biology.

Conflict of interest statement

Competing interests: A.S. and K.S. are employees of Arima Genomics. L.F.P., from BCM, was sponsored by Genentech Inc. until September 2023. F.J.S., from BCM, received research support from Illumina, ONT, and PacBio. A.R.H. and H-C.Y. are employees of Bionano Genomics and own stock shares and options of Bionano Genomics, Inc. V.W., K.K., J.R., and I.G. are employees of BioSkryb Genomics. M.S., K.B., B.R.L., and S.L. are employees of Element Biosciences. S.B.K., C.L., P.B., A.M.W., I.J.M., A.A., C.K., M.W., and Y.K. are employees and shareholders of PacBio, Inc. D.L., H.B., N.I., I.S., and G.K. are employees and shareholders of Ultima Genomics. S.E. and M.W. are employees of Phase Genomics. E.C., G.H., S.G., and M.V. are employees of KROMATID, Inc.; E.C. is also a shareholder. F.B., E.S., K.S., S.T., and S.C. are employees of Illumina, Inc. M.S.B., J.Z.S., and E.M.M. are employees of Cantata Bio. All other authors have no competing interests.

Figures

Figure 1
HG008 Passaging and Measurements. The PDAC tumor and normal duodenal and pancreatic tissues were resected from the HG008 individual in 2020. The HG008-T cell line was established by MGH and a bulk growth of tumor cells was produced for most measurements. This bulk growth and harvest is known as batch 0823p23. Tumor and normal samples were sent for measurements on multiple technologies denoted by the colored ovals; empty ovals are where no corresponding measurement was made. To characterize the HG008-T cell line during passaging, preliminary measurements were made and noted below the passage points denoted by “p#”. In 2024, an aliquot of the cell line was transferred to NIST by MGH for additional culturing and measurements.
Figure 2
Directional Genomic Hybridization (dGH) and G-banded karyograms for three passages of the HG008-T tumor cell line. Karyograms show examples of cells with and without whole genome doubling. (a-b) dGH karyograms from NIST passage 21 (2024) have each chromosome colored by one of five dyes; information from G-banded karyograms and copy number alterations was used to make preliminary chromosome assignments. (c-d) G-banded karyotypes from MGH passage 31 (2022) with and without whole genome doubling. In 25 spreads, the chromosome count varied from 29 to 71 chromosomes per spread, with the exception of two polyploid spreads of over 100 chromosomes each. While the ploidy varied, the karyotype was relatively consistent, with minor variations from spread to spread. (e) G-banded karyotype from NIST passage 18 (2024) without whole genome doubling, representative of 17 of 20 spreads that had 34 to 36 chromosomes. While not shown, 3 of 20 spreads showed whole genome doubling, with counts ranging from 68 to 70 chromosomes.
Figure 3
Haplotype Specific Coverage Plot. Plot shows coverage of haplotype 1 (HP-1, red) and haplotype 2 (HP-2, blue) for all chromosomes using HG008-T HiFi reads (Dataset ID: PB-HiFi-1) visualized using Wakhan (https://github.com/KolmogorovLab/Wakhan).
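As a rough illustration of what a haplotype-specific coverage track represents, the sketch below bins aligned bases separately for reads assigned to haplotype 1 and haplotype 2. It is not Wakhan's implementation: the function name, the `(start, end, hp)` tuple layout, and the half-open coordinate convention are assumptions made for this example, and the reads are assumed to have been phased already.

```python
def haplotype_bin_coverage(reads, bin_size, chrom_length):
    """Mean per-base coverage in fixed-size bins, split by haplotype.

    reads: iterable of (start, end, hp) with half-open [start, end)
    coordinates and hp equal to 1 or 2. Reads with any other hp value
    (for example, unphased reads) are ignored. Hypothetical layout.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    bases = {1: [0] * n_bins, 2: [0] * n_bins}
    for start, end, hp in reads:
        if hp not in bases:
            continue
        end = min(end, chrom_length)
        if end <= start:
            continue
        b = start // bin_size
        # Walk across every bin the read overlaps and add the overlap.
        while b < n_bins and b * bin_size < end:
            lo = max(start, b * bin_size)
            hi = min(end, (b + 1) * bin_size)
            bases[hp][b] += hi - lo
            b += 1
    coverage = {}
    for hp, per_bin in bases.items():
        coverage[hp] = []
        for b, total in enumerate(per_bin):
            # The final bin may be shorter than bin_size.
            width = min(bin_size, chrom_length - b * bin_size)
            coverage[hp].append(total / width)
    return coverage


# Toy reads, invented for illustration only.
toy_reads = [(0, 150, 1), (50, 100, 2), (100, 200, 1), (0, 10, 0)]
toy_coverage = haplotype_bin_coverage(toy_reads, bin_size=100, chrom_length=200)
```

In a tumor genome, bins where the two haplotype tracks diverge suggest allele-specific copy number changes or loss of heterozygosity.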
Figure 4
Ancestry Principal Component Analysis (PCA). PCA plot for HG008 using continental super populations of the 1000 Genomes reference samples. The PCA was computed with PLINK to visualize clusters of genetic similarity; we show the first three principal component axes, which distinguish the super populations. Yellow points indicate that variants from both the normal and tumor cells of the HG008 individual are most similar to those of individuals with European ancestry, based on the first three principal components.
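One simple way to make "most similar" concrete is to compare a sample's principal component coordinates with the centroid of each reference super population. The sketch below assumes the coordinates have already been computed elsewhere (for example, by PLINK); the function names and all coordinate values are invented for illustration, and nearest-centroid assignment is an assumption of this example rather than the authors' stated method.

```python
from collections import defaultdict
from math import dist


def population_centroids(reference):
    """Average the PC coordinates of each labeled reference group.

    reference: iterable of (label, (pc1, pc2, pc3)) pairs.
    """
    groups = defaultdict(list)
    for label, coords in reference:
        groups[label].append(coords)
    return {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in groups.items()
    }


def nearest_population(sample, reference):
    """Return the label whose centroid is closest in Euclidean distance."""
    cents = population_centroids(reference)
    return min(cents, key=lambda label: dist(sample, cents[label]))


# Toy coordinates, invented for illustration only (not real 1000 Genomes data).
toy_reference = [
    ("EUR", (0.00, 0.10, 0.00)),
    ("EUR", (0.02, 0.12, 0.01)),
    ("AFR", (-0.10, -0.05, 0.00)),
    ("AFR", (-0.12, -0.03, 0.01)),
    ("EAS", (0.05, -0.10, 0.05)),
    ("EAS", (0.06, -0.12, 0.04)),
]
toy_label = nearest_population((0.01, 0.11, 0.00), toy_reference)
```

A centroid comparison ignores the spread within each cluster, so visual inspection of the PCA plot remains a useful check.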
Figure 5
Long-read Coverage Plots. Read length and coverage distributions for long-read datasets. (a) Mapped read length distribution, weighted by aligned read length. (b) Coverage distribution showing the number of genome positions (Mb) at each integer coverage. (c) Inverse cumulative distribution showing coverage of apparently diploid regions by aligned reads longer than the length on the x axis (this assumes no whole genome doubling).
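The quantities in panels (a) and (c) can be sketched from a list of aligned read lengths. The example below is a minimal illustration, not the authors' plotting code: the function names are hypothetical, `region_size` stands in for the total length of the regions being covered, and the read lengths are made up.

```python
from bisect import bisect_right


def length_weighted_histogram(aligned_lengths, bin_size):
    """Sum aligned bases per read-length bin, so each read contributes its
    own length rather than a count of one (the weighting in panel a)."""
    hist = {}
    for n in aligned_lengths:
        bin_start = (n // bin_size) * bin_size
        hist[bin_start] = hist.get(bin_start, 0) + n
    return dict(sorted(hist.items()))


def coverage_by_longer_reads(aligned_lengths, thresholds, region_size):
    """Mean coverage contributed by reads strictly longer than each
    threshold (the inverse cumulative distribution in panel c)."""
    ordered = sorted(aligned_lengths)
    # suffix[i] holds the total bases in ordered[i:].
    suffix = [0] * (len(ordered) + 1)
    for i in range(len(ordered) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + ordered[i]
    return [suffix[bisect_right(ordered, t)] / region_size for t in thresholds]


# Toy aligned read lengths in bases, invented for illustration only.
toy_lengths = [1000, 2000, 2000, 5000]
toy_hist = length_weighted_histogram(toy_lengths, bin_size=1000)
toy_curve = coverage_by_longer_reads(toy_lengths, [0, 1000, 2000, 5000], region_size=1000)
```

Weighting by aligned length shows where the sequenced bases sit rather than where the read counts sit, which is why long reads dominate panel (a) even when short reads are more numerous.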

