Hum Mutat. 2022 Dec;43(12):1979-1993.
doi: 10.1002/humu.24455. Epub 2022 Sep 10.

de novo variant calling identifies cancer mutation signatures in the 1000 Genomes Project

Jeffrey K Ng et al. Hum Mutat. 2022 Dec.

Abstract

Detection of de novo variants (DNVs) is critical for studies of disease-related variation and mutation rates. To accelerate DNV calling, we developed a graphics processing unit (GPU)-based workflow. We applied our workflow to whole-genome sequencing data from three sequenced parent-child cohorts: the Simons Simplex Collection (SSC), Simons Foundation Powering Autism Research (SPARK), and the 1000 Genomes Project (1000G), which were sequenced using DNA from blood, saliva, and lymphoblastoid cell lines (LCLs), respectively. The SSC and SPARK DNV callsets were within expectations for number of DNVs, percent at CpG sites, phasing to the paternal chromosome of origin, and average allele balance. However, the 1000G DNV callset was not within expectations and contained excessive DNVs that are likely cell line artifacts. Mutation signature analysis revealed that 30% of 1000G DNV signatures matched B-cell lymphoma. Furthermore, we found variants in DNA repair genes and at ClinVar pathogenic or likely-pathogenic sites, and a significant excess of protein-coding DNVs in IGLL5, a gene known to be involved in B-cell lymphomas. Our study provides a new rapid DNV caller for the field and elucidates important implications of using sequencing data from LCLs for reference building and disease-related projects.

Keywords: 1000 Genomes Project; GPU accelerated workflow; Simons Simplex Collection; cell line artifacts; de novo variants.
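
The abstract names four callset-level checks: number of DNVs, percent at CpG sites, percent phased to the paternal chromosome, and average allele balance. The following is a minimal sketch of how such summary metrics could be computed for one child, not the authors' workflow. The `DNV` record, its field names, and the `summarize` function are hypothetical, and the example values are made up for illustration only.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class DNV:
    # Hypothetical record layout; the real workflow's output format is not given in the source.
    in_cpg: bool                      # True if the variant falls at a CpG site
    parent_of_origin: Optional[str]   # "paternal", "maternal", or None if unphased
    alt_reads: int                    # reads supporting the alternate allele in the child
    total_reads: int                  # total read depth at the site in the child


def summarize(dnvs: List[DNV]) -> dict:
    """Compute per-child summary metrics for a list of candidate DNVs."""
    n = len(dnvs)
    phased = [d for d in dnvs if d.parent_of_origin is not None]
    balances = [d.alt_reads / d.total_reads for d in dnvs if d.total_reads > 0]
    return {
        "count": n,
        # Percent of all DNVs at CpG sites
        "pct_cpg": 100.0 * sum(d.in_cpg for d in dnvs) / n if n else None,
        # Percent of phased DNVs assigned to the paternal chromosome
        "pct_paternal": (
            100.0 * sum(d.parent_of_origin == "paternal" for d in phased) / len(phased)
            if phased else None
        ),
        # Mean fraction of reads carrying the alternate allele
        "mean_allele_balance": mean(balances) if balances else None,
    }


# Illustrative, made-up input (not data from the study)
example = [
    DNV(in_cpg=True, parent_of_origin="paternal", alt_reads=10, total_reads=20),
    DNV(in_cpg=False, parent_of_origin="paternal", alt_reads=12, total_reads=24),
    DNV(in_cpg=False, parent_of_origin="maternal", alt_reads=15, total_reads=30),
    DNV(in_cpg=False, parent_of_origin=None, alt_reads=8, total_reads=16),
]
summary = summarize(example)
print(summary)
```

Computing the paternal percentage over phased variants only is a design choice of this sketch; unphased DNVs still count toward the total and the CpG percentage.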

Conflict of interest statement

Pankaj Vats, Marc A. West, George Vacek, and Timothy T. Harkins are full-time employees of NVIDIA.

Figures

Figure 1
De novo variant calling in short‐read whole‐genome sequencing data. (a) De novo workflow for detection of DNVs from aligned read files (CRAMs); (b) and (c) benchmarking of the DNV workflow in a monozygotic twin pair sequenced from DNA derived from blood; (d) DNV detection in four trios in the 1000 Genomes Project (1000G). DNVs, de novo variants.
Figure 2
Comparison of characteristics of DNVs detected in 1000G, Simons Simplex Collection (SSC), and Simons Foundation Powering Autism Research (SPARK) callsets. (a) Histogram of DNV counts from 1000G in 602 trios; (b) histogram of DNV counts from SSC in 4,216 trios; (c) histogram of DNV counts from SPARK in 1,326 trios; (d) percent of DNVs found within CpG sites versus the total number of DNVs for 1000G; (e) percent of DNVs found within CpG sites versus the total number of DNVs found for SSC; (f) percent of DNVs found within CpG sites versus the total number of DNVs found for SPARK; (g) percent of autosomal DNVs with paternal parent of origin versus the total number of DNVs for 1000G; (h) percent of autosomal DNVs with paternal parent of origin versus the total number of DNVs for SSC; (i) percent of autosomal DNVs with paternal parent of origin versus the total number of DNVs for SPARK. 1000G, 1000 Genomes Project; DNVs, de novo variants.
Figure 3
Assessment of five replicates of NA12878. (a) Population distribution of 1000G data set. (b) UpSet plot demonstrating the number of variants detected in the replicates (at the bottom of the plot the percent of true DNVs is listed for each category). (c) Phylogenetic tree of the five replicates. 1000G, 1000 Genomes Project; DNVs, de novo variants.
Figure 4
Mutational properties of DNVs. (a) Mutation signature analysis showing the total number of DNVs and the individuals with each signature type; (b) heatmap of individuals based on their mutational signatures; (c) mutations in the DNA repair gene RAD18 shown on their 3D structure (modeled using mupit). Also shown are known cancer mutations from The Cancer Genome Atlas; (d) location of DNVs based on their phased parent‐of‐origin in NA07048. Most notably, there is a cluster of mutations on the maternal copy of chromosome 2; (e) DNVs in IGLL5 shown on their 3D structure (modeled using mupit). The image on the left models variants discovered in 1000G; the image on the right models variants discovered in SSC. 1000G, 1000 Genomes Project; DNVs, de novo variants; SSC, Simons Simplex Collection.
