Discovery of Novel Sequences in 1,000 Swedish Genomes
- PMID: 31560401
- PMCID: PMC6984370
- DOI: 10.1093/molbev/msz176
Abstract
Novel sequences (NSs), which are absent from the human reference genome, are abundant and remain largely unexplored. Here, we use de novo assembly to study NSs in 1,000 Swedish individuals first sequenced as part of the SweGen project, revealing a total of 46 Mb in 61,044 distinct contigs of sequence not present in GRCh38. The contigs were aligned to recently published catalogs of Icelandic and Pan-African NSs, as well as to the chimpanzee genome, revealing a great diversity of shared sequences. Analyzing the positions of NSs across the chimpanzee genome, we find that 2,807 NSs align confidently within 143 chimpanzee orthologs of human genes. By aligning the whole-genome sequencing data to the chimpanzee genome, we discover ancestral NSs that are common throughout the Swedish population. The NSs were searched for repeats and repeat elements, revealing a majority of repetitive sequence (56%) and an enrichment of simple repeats (28%) and satellites (15%). Lastly, we align the unmappable reads of a subset of the 1000 Genomes Project data to our collection of NSs, as well as to the previously published Pan-African NSs, revealing that both the Swedish and Pan-African NSs are widespread and that the Swedish NSs are largely a subset of the Pan-African NSs. Overall, these results highlight the importance of creating a more diverse reference genome and illustrate that significant amounts of the NSs may be of ancestral origin.
Keywords: ancestral deletion; de novo assembly; novel sequences; population genomics.
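The abstract summarizes the assembly as a contig count and a total sequence length (61,044 contigs, 46 Mb). As a minimal sketch of how such a summary could be tallied from a FASTA file of assembled contigs, the following uses a hypothetical helper, summarize_contigs, and toy input. Neither the function name nor the input comes from the study, and this is not the authors' pipeline.

```python
from io import StringIO


def summarize_contigs(fasta_handle):
    """Return (contig_count, total_bases) for a FASTA-formatted text stream.

    Hypothetical helper for illustration only; not part of the published method.
    """
    count = 0
    total = 0
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            count += 1  # each header line starts a new contig
        else:
            total += len(line)  # sequence lines add to the total length
    return count, total


# Toy input (assumption): two short contigs, not data from the study.
toy = StringIO(">contig_1\nACGTACGT\nACGT\n>contig_2\nGGCC\n")
n_contigs, n_bases = summarize_contigs(toy)
print(n_contigs, n_bases)
```

In practice the same function could be given an open file handle instead of the in-memory toy stream.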
© The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Comment in
- Improved Mapping of Swedish Genes. Mol Biol Evol. 2020 Jan 1;37(1):306. doi: 10.1093/molbev/msz247. PMID: 31880781. No abstract available.
Similar articles
- De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data. Genes (Basel). 2018 Oct 9;9(10):486. doi: 10.3390/genes9100486. PMID: 30304863. Free PMC article.
- A 3-way hybrid approach to generate a new high-quality chimpanzee reference genome (Pan_tro_3.0). Gigascience. 2017 Nov 1;6(11):1-6. doi: 10.1093/gigascience/gix098. PMID: 29092041. Free PMC article.
- Large tandem, higher order repeats and regularly dispersed repeat units contribute substantially to divergence between human and chimpanzee Y chromosomes. J Mol Evol. 2011 Jan;72(1):34-55. doi: 10.1007/s00239-010-9401-8. Epub 2010 Nov 20. PMID: 21103868.
- [Comparative studies on human and chimpanzee genomes]. Tanpakushitsu Kakusan Koso. 2005 Dec;50(16 Suppl):2072-7. PMID: 16411432. Review. Japanese. No abstract available.
- A map of the common chimpanzee genome. Bioessays. 2002 Jun;24(6):490-3. doi: 10.1002/bies.10103. PMID: 12111730. Review.
Cited by
- Hybrid sequencing resolves two germline ultra-complex chromosomal rearrangements consisting of 137 breakpoint junctions in a single carrier. Hum Genet. 2021 May;140(5):775-790. doi: 10.1007/s00439-020-02242-3. Epub 2020 Dec 14. PMID: 33315133. Free PMC article.
- Developing CIRdb as a catalog of natural genetic variation in the Canary Islanders. Sci Rep. 2022 Sep 27;12(1):16132. doi: 10.1038/s41598-022-20442-x. PMID: 36168029. Free PMC article.
- Human pangenome analysis of sequences missing from the reference genome reveals their widespread evolutionary, phenotypic, and functional roles. Nucleic Acids Res. 2024 Mar 21;52(5):2212-2230. doi: 10.1093/nar/gkae086. PMID: 38364871. Free PMC article.
- Complex genomic rearrangements: an underestimated cause of rare diseases. Trends Genet. 2022 Nov;38(11):1134-1146. doi: 10.1016/j.tig.2022.06.003. Epub 2022 Jul 9. PMID: 35820967. Free PMC article. Review.
- Structural variant identification and characterization. Chromosome Res. 2020 Mar;28(1):31-47. doi: 10.1007/s10577-019-09623-z. Epub 2020 Jan 6. PMID: 31907725. Free PMC article.