Review
Trends Genet. 2022 Feb;38(2):152-168.
doi: 10.1016/j.tig.2021.09.013. Epub 2021 Nov 2.

Advances in integrative African genomics

Chao Zhang et al. Trends Genet. 2022 Feb.

Abstract

There has been a rapid increase in human genome sequencing in the past two decades, resulting in the identification of millions of previously unknown genetic variants. However, African populations are under-represented in sequencing efforts. Additional sequencing from diverse African populations and the construction of African-specific reference genomes is needed to better characterize the full spectrum of variation in humans. However, sequencing alone is insufficient to address the molecular and cellular mechanisms underlying variable phenotypes and disease risks. Determining functional consequences of genetic variation using multi-omics approaches is a fundamental post-genomic challenge. We discuss approaches to close the knowledge gaps about African genomic diversity and review advances in African integrative genomic studies and their implications for precision medicine.

Keywords: Africans; genomic diversity; integrative genomics; intermediate phenotype; omics; population-specific reference genome.

Conflict of interest statement

Declaration of interests

No interests are declared.

Figures

Figure 1. Illustration of graph-based population-specific reference genomes.
Constructing population-specific reference genomes requires sequencing reads from the ancestry of interest. Both short- and long-read sequences are informative for graph reference genomes. Short-read sequencing is relatively inexpensive (~$1000–2000/genome at 30×), allowing greater sampling per total cost. Long-read sequencing, though more expensive (~$5000–20 000/genome at 30×), is required to detect complex genetic variation such as structural variants. Reads can be used for de novo assembly of scaffolds or can be mapped to a reference sequence for variant detection. De novo genome assembly is computationally intensive, and the resulting assemblies are difficult to compare between populations because they lack genomic coordinates. Mapping reads to a reference genome that lacks genetic variation present in the ancestry of interest can bias mapping and variant calling, so a considerable proportion of reads may remain unmapped. The traditional linear reference genome has limited power to accommodate genetic variation within a specific population, while a graph-based reference genome can accommodate complex individual genetic variation (dark orange boxes) as paths (pink dashed lines) through a graph. Detected variation can easily be added to the original graph-based reference genome. The updated graph-based reference genome could then serve as a reference for mapping and variant calling in future sequencing efforts. Unmapped reads could be assembled into contigs, which would be included in the graph-based reference genome to ensure its completeness. Abbreviations: CNV, copy number variant; SNV, single-nucleotide variant.
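
The idea of representing individual variation as paths through a graph can be illustrated with a toy data structure. The sketch below is not the method used in the review or in any published graph-genome tool; the class `SequenceGraph`, its methods, and the short sequences are hypothetical and chosen only to show how a newly detected SNV or insertion adds a node and a path without changing the existing reference path.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy sequence graph (hypothetical): nodes hold DNA segments,
    edges link consecutive segments, paths are ordered walks."""
    nodes: dict = field(default_factory=dict)   # node id -> sequence segment
    edges: set = field(default_factory=set)     # (from node id, to node id)
    paths: dict = field(default_factory=dict)   # path name -> list of node ids

    def add_path(self, name, segments):
        """Register a haplotype as a walk, creating nodes and edges as needed."""
        walk = []
        for node_id, seq in segments:
            existing = self.nodes.setdefault(node_id, seq)
            if existing != seq:
                raise ValueError(f"node {node_id} already holds {existing!r}")
            walk.append(node_id)
        # Connect each node in the walk to its successor.
        self.edges.update(zip(walk, walk[1:]))
        self.paths[name] = walk

    def spell(self, name):
        """Return the sequence spelled out by a named path."""
        return "".join(self.nodes[n] for n in self.paths[name])


graph = SequenceGraph()
# The linear reference is just one path through the graph.
graph.add_path("linear_reference", [(1, "ACGT"), (2, "A"), (4, "GGC")])
# An SNV (A>G) becomes an alternative node that bypasses node 2.
graph.add_path("sample_snv", [(1, "ACGT"), (3, "G"), (4, "GGC")])
# An insertion becomes an extra node between existing nodes 2 and 4.
graph.add_path("sample_insertion",
               [(1, "ACGT"), (2, "A"), (5, "TTTT"), (4, "GGC")])

for name in graph.paths:
    print(name, graph.spell(name))
```

In this sketch, updating the reference with a new variant only appends nodes, edges, and a path; earlier paths stay valid, which mirrors the caption's point that detected variation can be added to the existing graph.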
Figure 2. Integrative genomics approach to study variant function.
The relationships between genetic ancestry, the environment, fitness, individual genetics, omics data, intermediate phenotypes, and endpoint phenotypes, including diseases, are illustrated. Omics data generally represent measurements of intermediate phenotypes (dark gray box in the middle panel) that link underlying genetic variation to outcome phenotypes or disease. Integrating genomic data with intermediate phenotypes enables identification of quantitative trait loci (QTLs), such as expression QTLs (eQTLs), protein QTLs (pQTLs), and methylation QTLs (mQTLs). Both genetic ancestry and environment can impact intermediate phenotypes and associations with specific variants, either because of gene–gene epistatic effects (G×G) or because of genetic effects that are only relevant in certain environmental contexts or given specific environmental triggers (G×E). Population differences in linkage disequilibrium and allele frequency could also result in decreased transferability of polygenic risk scores (Box 2). Evolutionary forces, such as drift and natural selection, can shape the genomic diversity of populations (right panel). Detecting signatures of natural selection can help identify functional variants, as selection only acts upon functional variation (Box 1). Abbreviations: ChIP-seq, chromatin immunoprecipitation sequencing; DNase-seq, DNase I hypersensitive sites sequencing; LC-MS, liquid chromatography–mass spectrometry; lncRNA-seq, long noncoding RNA sequencing; NMR, nuclear magnetic resonance; scRNA-seq, single-cell RNA sequencing; smRNA-seq, small RNA sequencing; WES, whole-exome sequencing; WGBS, whole-genome bisulfite sequencing; WGS, whole-genome sequencing.
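
As a rough illustration of how a QTL links a variant to an intermediate phenotype, the sketch below fits an ordinary least-squares line of expression level on genotype dosage (0, 1, or 2 copies of the alternate allele). This simple additive model is an assumption made here for illustration; the review does not prescribe a specific model. The function `qtl_effect` and all data values are hypothetical, and a real analysis would add covariates such as ancestry, batch, and environment.

```python
def qtl_effect(dosages, phenotype):
    """Least-squares slope and intercept of an intermediate phenotype
    (e.g. expression) on alternate-allele dosage (0, 1, or 2)."""
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, phenotype))
    slope = sxy / sxx                     # per-allele effect on the phenotype
    intercept = mean_y - slope * mean_g   # expected value at dosage 0
    return slope, intercept


# Hypothetical data: six individuals, one variant, one gene.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]

slope, intercept = qtl_effect(dosages, expression)
print(f"per-allele effect: {slope:.2f}, baseline: {intercept:.2f}")
```

The same function applies to protein or methylation measurements by swapping the phenotype vector, which is why eQTLs, pQTLs, and mQTLs are often analyzed within one framework.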
Figure I. Factors that shape the genetic architecture of traits and disease risk.
The different demographic histories of populations lead to genetic differences due to drift and local adaptation, which in turn shapes population patterns of linkage disequilibrium (LD) (A) and allele frequencies (B). In trait mapping studies, causative variants (dark yellow triangle in A) may be tagged by different variants in different populations (yellow triangles in A). Intermediate phenotypes are the building blocks of overarching complex traits and are impacted by underlying genetic variation (C). Environmental and lifestyle factors can also interact with underlying genetic variation to influence phenotypes (D).
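
The point that a causative variant may be tagged by different variants in different populations can be made concrete with the standard pairwise LD statistic r², computed from haplotype frequencies as D² / (pA(1−pA)pB(1−pB)) with D = pAB − pA·pB. The sketch below is illustrative only: the function `ld_r2` and both sets of haplotypes are hypothetical and do not come from the review.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic sites, given haplotypes as (allele_a, allele_b)
    tuples coded 0 (reference) or 1 (alternate)."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n           # alt frequency at site A
    p_b = sum(b for _, b in haplotypes) / n           # alt frequency at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                              # disequilibrium coefficient
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("at least one site is monomorphic")
    return d * d / denom


# Hypothetical haplotypes: site A is the causative variant, site B a marker.
population_1 = [(1, 1), (1, 1), (0, 0), (0, 0)]   # alleles always co-occur
population_2 = [(1, 1), (1, 0), (0, 1), (0, 0)]   # alleles are independent

print("r2 in population 1:", ld_r2(population_1))
print("r2 in population 2:", ld_r2(population_2))
```

In this toy case the marker tags the causative variant perfectly in one population and not at all in the other, even though allele frequencies are identical, which is one reason association signals and polygenic scores may transfer poorly between populations.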
