Am J Hum Genet. 2022 Sep 1;109(9):1667-1679.
doi: 10.1016/j.ajhg.2022.07.013.

Genetic structure correlates with ethnolinguistic diversity in eastern and southern Africa

Elizabeth G Atkinson et al. Am J Hum Genet. 2022.

Abstract

African populations are the most diverse in the world yet are sorely underrepresented in medical genetics research. Here, we examine the structure of African populations using genetic and comprehensive multi-generational ethnolinguistic data from the Neuropsychiatric Genetics of African Populations-Psychosis study (NeuroGAP-Psychosis) consisting of 900 individuals from Ethiopia, Kenya, South Africa, and Uganda. We find that self-reported language classifications meaningfully tag underlying genetic variation that would be missed with consideration of geography alone, highlighting the importance of culture in shaping genetic diversity. Leveraging our uniquely rich multi-generational ethnolinguistic metadata, we track language transmission through the pedigree, observing the disappearance of several languages in our cohort as well as notable shifts in frequency over three generations. We find suggestive evidence that the rate of language transmission has been higher in matrilineal groups than in patrilineal ones. We highlight both the diversity of variation within Africa and how within-Africa variation can be informative for broader variant interpretation; many variants that are rare elsewhere are common in parts of Africa. The work presented here improves the understanding of the spectrum of genetic variation in African populations and highlights the enormous and complex genetic and ethnolinguistic diversity across Africa.

Keywords: Africa; diverse populations; genotypes; linguistics; population genetics; population structure.

Conflict of interest statement

Declaration of interests: A.R.M. has consulted for 23andMe and Illumina and received speaker fees from Genentech, Pfizer, and Illumina. B.M.N. is a member of the Deep Genomics Scientific Advisory Board. He also serves as a consultant for the Camp4 Therapeutics Corporation, Takeda Pharmaceutical, and Biogen. M.J.D. is a founder of Maze Therapeutics. The remaining authors declare no competing interests.

Figures

Graphical abstract
Figure 1
Genetic and admixture composition of the NeuroGAP-Psychosis samples against a global reference. (A) First two principal components showing NeuroGAP-Psychosis samples as projected onto global variation of the full 1000 Genomes, HGDP, and AGVP. While most samples fall on a cline of African genetic variation, some South African samples exhibit high amounts of admixture and European genetic ancestry. Color scheme for global PCA plot: Latin American, yellow; East Asian, dark orange; European, tan; South Asian, fuchsia; West African, green/blue; East African, red/orange; South African, purple; NeuroGAP-Psychosis collections, gray. (B) ADMIXTURE plot at the best-fit k (k = 10) of all African samples as well as three representative non-African populations from the 1000 Genomes Project. The GIH, CHB, and GBR populations were included to capture South Asian, East Asian, and European admixture, respectively. Individuals are represented as bar charts sorted by population, and ancestry components for each person are visualized with different colors. A key describing the country of origin for all populations can be found in Table S1.
Figure 2
Genetic composition of subcontinental African structure in the NeuroGAP-Psychosis samples. PCA plots for PCs 1–8 with an African reference panel. A map of collection locations is shown to the left of the PCA plots. Points are colored by region to assist in interpretation: green, west; blue, west central/central; red, east; orange, Ethiopia; purple, south. See Figures S2–S6 for plots highlighting each cohort individually.
Figure 3
Primary self-reported language shifts over three generations. (A) Individual languages were re-classified into broader language families for comparable granularity. Note that while all languages in the legend are represented in the plot, not all are visible because some are at low frequency in the data. (B) All languages reported with at least 3% frequency in any generation are shown across the generations. Note the increase in endorsement of English and drop in Oromiffa/Oromigna in the present generation. (C) Primary language reported by the individuals within each NeuroGAP-Psychosis study country.
Figure 4
Procrustes correlations between genetics, geography, and language. Procrustes correlations (all p < 5E−5) are shown between geography and genetics (A and B), geography and language (C and D), and genetics and language (E and F). The left column includes results for the entire NeuroGAP-Psychosis collection. The right column contains results restricted to the four cohorts in East Africa. For linguistic analyses, linguistic variation is measured by the first three PCs of phoneme inventories from languages reported by individuals as spoken by themselves and their relatives. Matrilineal relatives include the mother and maternal grandmother. Patrilineal relatives include the father and paternal grandfather. Familial refers to a weighted average of all reported family members. Note that y-axis labels vary between plots.
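
For readers unfamiliar with the statistic in Figure 4, the following is a minimal sketch of a symmetric Procrustes correlation with a permutation p-value. It is not the authors' pipeline. It assumes each data set has already been reduced to two coordinates per individual (for example longitude/latitude and two genetic PCs), whereas the caption describes three linguistic PCs; the function names, the two-column restriction, and the permutation scheme are illustrative assumptions. For two-column inputs the sum of singular values of the 2x2 cross-product matrix has a closed form, so only the standard library is needed.

```python
import math
import random


def _standardize(points):
    """Center a list of (x, y) points and scale it to unit Frobenius norm."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(p[0] - mean_x, p[1] - mean_y) for p in points]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered))
    if norm == 0:
        raise ValueError("all points are identical")
    return [(x / norm, y / norm) for x, y in centered]


def procrustes_correlation(a, b):
    """Symmetric Procrustes correlation between two 2-D configurations.

    Rows of `a` and `b` must describe the same individuals in the same order.
    The result is invariant to translation, uniform scaling, rotation, and
    reflection of either configuration.
    """
    if len(a) != len(b):
        raise ValueError("configurations must have the same number of rows")
    a_std, b_std = _standardize(a), _standardize(b)
    # Entries of the 2x2 cross-product matrix M = A^T B.
    m00 = sum(p[0] * q[0] for p, q in zip(a_std, b_std))
    m01 = sum(p[0] * q[1] for p, q in zip(a_std, b_std))
    m10 = sum(p[1] * q[0] for p, q in zip(a_std, b_std))
    m11 = sum(p[1] * q[1] for p, q in zip(a_std, b_std))
    frob_sq = m00 ** 2 + m01 ** 2 + m10 ** 2 + m11 ** 2
    det = m00 * m11 - m01 * m10
    # For a 2x2 matrix, (s1 + s2)^2 = ||M||_F^2 + 2 * |det M|.
    return math.sqrt(frob_sq + 2 * abs(det))


def permutation_p_value(a, b, n_perm=999, seed=0):
    """Return (observed correlation, p-value) by shuffling the rows of `b`."""
    rng = random.Random(seed)
    observed = procrustes_correlation(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if procrustes_correlation(a, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

The correlation equals the square root of one minus the standardized Procrustes sum of squares, so a value near 1 means one configuration can be superimposed almost exactly on the other. With this add-one scheme the smallest attainable p-value is 1 / (n_perm + 1), so resolving a threshold such as the caption's 5E−5 would need at least 19,999 permutations.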
