[Preprint]. 2024 Jul 8:rs.3.rs-4485126.
doi: 10.21203/rs.3.rs-4485126/v1.

An assessment of the genomic structural variation landscape in Sub-Saharan African populations

Emma Wiener et al. Res Sq.

Abstract

Structural variants are responsible for a large part of genomic variation between individuals and play a role in both common and rare diseases. Databases cataloguing structural variants do not represent the full spectrum of global diversity and, in particular, lack information from most African populations. To address this representation gap, we analysed 1,091 high-coverage African genomes, 545 of which are public data sets and 546 of which have been analysed for structural variants for the first time. Variants were called using five different tools, and the resulting call sets were merged and jointly called using SURVIVOR. We identified 67,795 structural variants throughout the genome, with 10,421 genes having at least one variant. Using a conservative overlap in merged data, 6,414 of the structural variants (9.5%) are novel compared to the Database of Genomic Variants. This study contributes to knowledge of the landscape of structural variant diversity in Africa and presents a reliable dataset for potential applications in population genetics and health-related research.

Keywords: African diversity; Structural variants; copy number variants; genomic variation.

Conflict of interest statement

Additional Declarations: There is NO Competing Interest.

Figures

Fig. 1:
Overview of the calls found by each SV caller and of their combination using SURVIVOR. See the Methods section for details of the choices made. Note that this graph departs from the standard UpSet plot representation: for example, 49,000 variants are detected by both Delly and Manta, and these may or may not also be detected by other algorithms (in a standard UpSet plot, the implication would be that they are not detected by any other approach). Note also that the anomalous results for gridss and smoove are caused by the fact that these tools do not perform joint calling, which makes estimating the actual number of SVs very difficult.
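The difference between the inclusive counts used in this figure and the exclusive counts of a standard UpSet plot can be illustrated with a toy example. The Python sketch below is only an illustration: the function names are hypothetical and the call sets are invented, not taken from the study.

```python
def inclusive_count(calls, tools):
    """Count variants detected by every tool in `tools`,
    whether or not other tools also detected them (the convention in Fig. 1)."""
    wanted = set(tools)
    return sum(1 for callers in calls if wanted <= callers)


def exclusive_count(calls, tools):
    """Count variants detected by exactly the tools in `tools` and no others
    (the convention of a standard UpSet plot)."""
    wanted = set(tools)
    return sum(1 for callers in calls if callers == wanted)


# Invented toy data: each entry is the set of callers supporting one variant.
toy_calls = [
    {"delly", "manta"},
    {"delly", "manta", "gridss"},
    {"delly"},
    {"delly", "manta", "smoove"},
]

both_inclusive = inclusive_count(toy_calls, ["delly", "manta"])
both_exclusive = exclusive_count(toy_calls, ["delly", "manta"])
```

In this toy data, three variants are supported by both Delly and Manta under the inclusive convention, but only one is supported by those two tools alone.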
Fig. 2:
Violin plot of the distribution of SV lengths.
Fig. 3:
Overview of the number of variants per individual.
Fig. 4:
Principal component analysis of the SVs less than 10,000 bp in length. Some populations are omitted for clarity. Key: ACB – 1000G African Caribbean in Barbados; Benin – H3Africa; BF/GH – H3Africa Burkina Faso and Ghana; BWR – H3Africa Botswana; CAM – H3Africa Cameroon; GWD – 1000G Gambian Western Districts; LWK – 1000G Luhya from Kenya; MSL – 1000G Mandinka Sierra Leone; NG – H3Africa Berom from Nigeria; SA – CBRL, H3Africa, Bantu-speakers from SAHGP South Africa; SAC – SA Coloureds in SAHGP; YRI – 1000G Yoruba; Zam – H3Africa Zambians.
Fig. 5:
Overview of SV regions between 10,000 bp and 200,000 bp in length.
Fig. 6:
Overview of the merging process. Variants less than 10,000 bp in length must be supported by at least 3 tools. Variants greater than 10,000 bp must be supported by CNVpytor, a depth-based method, and one other tool.
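The support rule in the Fig. 6 caption can be written as a small predicate. The following is a minimal Python sketch, not the authors' pipeline: the `MergedSV` record, the lowercase tool labels, and the handling of a variant of exactly 10,000 bp (grouped here with the larger class, since the caption does not specify it) are all assumptions made for illustration.

```python
from dataclasses import dataclass

SIZE_THRESHOLD_BP = 10_000   # length boundary stated in the caption
DEPTH_CALLER = "cnvpytor"    # assumed label for the depth-based caller


@dataclass(frozen=True)
class MergedSV:
    """Hypothetical record for one merged variant."""
    length_bp: int
    callers: frozenset  # names of the tools that support this variant


def passes_support_filter(sv: MergedSV) -> bool:
    """Apply the caption's rule to a merged variant."""
    if sv.length_bp < SIZE_THRESHOLD_BP:
        # Shorter variants: at least 3 supporting tools.
        return len(sv.callers) >= 3
    # Longer variants: the depth-based caller plus at least one other tool.
    # Assumption: a variant of exactly 10,000 bp is treated as "longer".
    return DEPTH_CALLER in sv.callers and len(sv.callers) >= 2


small_ok = MergedSV(800, frozenset({"delly", "manta", "smoove"}))
small_weak = MergedSV(800, frozenset({"delly", "manta"}))
large_ok = MergedSV(50_000, frozenset({"cnvpytor", "delly"}))
large_no_depth = MergedSV(50_000, frozenset({"delly", "manta", "gridss"}))

kept = [sv for sv in (small_ok, small_weak, large_ok, large_no_depth)
        if passes_support_filter(sv)]
```

Under these assumptions, a large variant seen by three breakpoint-based callers but not by the depth-based caller is still rejected, which reflects the caption's requirement that long variants have read-depth support.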
