Review
Nat Rev Genet. 2024 Aug;25(8):563-577.
doi: 10.1038/s41576-024-00691-4. Epub 2024 Feb 20.

Plant pangenomes for crop improvement, biodiversity and evolution

Mona Schreiber et al. Nat Rev Genet. 2024 Aug.

Abstract

Plant genome sequences catalogue genes and the genetic elements that regulate their expression. Such inventories further research aims as diverse as mapping the molecular basis of trait diversity in domesticated plants or inquiries into the origin of evolutionary innovations in flowering plants millions of years ago. The transformative technological progress of DNA sequencing in the past two decades has enabled researchers to sequence ever more genomes with greater ease. Pangenomes - complete sequences of multiple individuals of a species or higher taxonomic unit - have now entered the geneticists' toolkit. The genomes of crop plants and their wild relatives are being studied with translational applications in breeding in mind. But pangenomes are applicable also in ecological and evolutionary studies, as they help classify and monitor biodiversity across the tree of life, deepen our understanding of how plant species diverged and show how plants adapt to changing environments or new selection pressures exerted by human beings.

Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

Figure 1. Pangenomics: assembly and comparison of genome sequences.
(a) Sequence reads, these days mostly long (>15 kb) reads, are assembled into contigs, which are arranged into chromosome-scale scaffolds (pseudomolecules) with the help of genetic and physical linkage information (dashed lines). (b) Comparisons of sequence assemblies reveal the full spectrum of sequence variation in the assembled genomes. (c) Pangenome graphs are computational representations of the assemblies and the differences between them. In this example, colour bands represent genomes as paths through the pangenome graph. Graphs with single base pair resolution are still challenging to construct at the whole-genome level. (d) A gene-centric view reduces complexity, as do (e) pairwise alignments of genome sequences. (f) Short-read data (red bars), which are used for population-scale resequencing, can be integrated with pangenomes, for example, by aligning the reads to pangenome graphs.
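Panel (c) describes genomes as paths through a pangenome graph. The sketch below is a toy illustration of that idea, not the representation used by any particular graph-building tool: the node IDs, genome names and the helper functions `node_support` and `classify_nodes` are all hypothetical. It records each genome as an ordered list of node IDs and separates nodes that every genome traverses from nodes that only some genomes traverse.

```python
from collections import Counter

# Hypothetical toy data: each genome is a path, i.e. an ordered list of node IDs.
paths = {
    "genome_A": ["n1", "n2", "n3", "n5"],
    "genome_B": ["n1", "n2", "n4", "n5"],
    "genome_C": ["n1", "n3", "n5"],
}


def node_support(paths):
    """Count how many genomes traverse each node (once per genome)."""
    counts = Counter()
    for nodes in paths.values():
        counts.update(set(nodes))
    return counts


def classify_nodes(paths):
    """Split nodes into those on every path and those on only some paths."""
    support = node_support(paths)
    n_genomes = len(paths)
    shared = sorted(node for node, c in support.items() if c == n_genomes)
    variable = sorted(node for node, c in support.items() if c < n_genomes)
    return shared, variable


shared, variable = classify_nodes(paths)
print("on every path:", shared)
print("on some paths:", variable)
```

Nodes found on only some paths mark the positions where the toy genomes differ; a real graph would also store the sequence of each node and the edges between nodes.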
Figure 2. Pangenomics in crop plants.
Most pangenome studies to date have focused on crops. The varieties under investigation are selected based on different criteria. (a) Cultivars of great ‘importance’ include those that are widely grown or used in genetic research. (b) Surveys of population structure enable the selection of core sets that represent the genetic diversity of a given crop as well as possible with a limited number of samples. The diversity space of a species is often represented by principal component analysis (PCA). Population structure is reflected in clusters (shown in different colours) that correspond to geographic origins or infraspecific taxonomy. (c) Crop-wild relatives (wild progenitors and more distant relatives) are studied because they broaden allelic diversity in cultivated varieties. (d) Pangenomes have diverse applications in crop genetics. Genome sequences of the parents of experimental populations assist in mapping traits to single genetic factors (coloured bar). (e) Catalogues of resistance genes enrich the toolkit of plant pathology and may be represented in matrices that record the presence (blue square) or absence (grey square) of genes in the sequenced individuals. (f) Thanks to genome sequences, geneticists can include structural variants in their search for causal polymorphisms under GWAS peaks.
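The presence/absence matrix in panel (e) can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the gene names (`R1` to `R3`), accession names and helper functions are hypothetical, and the text rendering uses `#` for presence and `.` for absence in place of the blue and grey squares of the figure.

```python
# Hypothetical toy data: the set of resistance genes found in each accession.
genes = ["R1", "R2", "R3"]
accessions = ["acc1", "acc2", "acc3"]
presence = {
    "acc1": {"R1", "R2"},
    "acc2": {"R1"},
    "acc3": {"R1", "R3"},
}


def build_matrix(presence, genes, accessions):
    """Rows are genes, columns are accessions; 1 = present, 0 = absent."""
    return [[1 if g in presence[a] else 0 for a in accessions] for g in genes]


def genes_in_all(matrix, genes):
    """Genes present in every sequenced accession."""
    return [g for g, row in zip(genes, matrix) if all(row)]


def render(matrix, genes):
    """Text version of the matrix: '#' = present, '.' = absent."""
    return "\n".join(
        f"{g}: " + "".join("#" if cell else "." for cell in row)
        for g, row in zip(genes, matrix)
    )


matrix = build_matrix(presence, genes, accessions)
print(render(matrix, genes))
print("present in all accessions:", genes_in_all(matrix, genes))
```

Genes missing from some rows are the ones a single reference genome could fail to capture, which is one reason such catalogues are built from several assemblies.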
Figure 3. A tiered strategy for pangenomics.
Different sequencing strategies (levels of the pyramid) are suitable for different panel sizes (represented by leaf numbers). Reduced representation sequencing is done on as many genotypes as possible, sampled in situ or from genebank collections. Representative core sets, sequenced to ever greater depth, are selected for different applications. Low-coverage (1- to 5-fold coverage) short-read whole genome sequencing aided by imputation is useful for genome-wide association scans and for genotyping known SVs. High-coverage (>10-fold for inbred, >30-fold for heterozygous genomes) short-read sequencing underpins selection scans, haplotype definition and demographic analyses. Genome assemblies based on long-read sequencing and chromosome-scale mapping catalogue the full spectrum of structural variation. Potentially extraordinary effort will be expended on a small number of genotypes to close gaps in difficult-to-assemble regions such as long tandem repeat arrays and centromeres to obtain telomere-to-telomere (T2T) assemblies. As technology progresses, the pyramid may turn into a cube and long-read sequencing may be employed in the bottom layers as well.
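The coverage thresholds in this caption can be related to sequencing output with simple arithmetic: fold coverage is the total number of sequenced bases divided by the genome size. The sketch below assumes that standard definition; the function names, the 5 Gb genome size and the read counts are hypothetical, and the label returned for values outside the ranges named in the caption is an illustrative choice rather than a category from the figure.

```python
def fold_coverage(total_bases, genome_size):
    """Average fold coverage = sequenced bases / genome size."""
    return total_bases / genome_size


def bases_needed(target_coverage, genome_size):
    """Sequenced bases required to reach a target fold coverage."""
    return target_coverage * genome_size


def tier(coverage, heterozygous=False):
    """Map a short-read coverage value onto the ranges given in the caption."""
    high_threshold = 30 if heterozygous else 10  # >30-fold vs >10-fold
    if coverage > high_threshold:
        return "high-coverage"
    if 1 <= coverage <= 5:
        return "low-coverage"
    return "outside the ranges named in the caption"  # illustrative label


# Hypothetical example: 100 million reads of 150 bp on a 5 Gb genome.
genome_size = 5_000_000_000
total_bases = 100_000_000 * 150
cov = fold_coverage(total_bases, genome_size)
print(cov, tier(cov))
print(bases_needed(30, genome_size))
```

For this hypothetical genome, moving from the low-coverage tier to the high-coverage tier for a heterozygous sample multiplies the required sequencing output roughly tenfold, which is why the deeper tiers are reserved for smaller panels.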
Figure 4. Pangenomics at different taxonomic levels.
Reference sequences can be assembled for the genomes of both wild and domesticated plants. Diversity panels employed in pangenome studies may span different taxonomic levels, from single species to the tree of life. The term ‘super-pangenome’ is a useful shorthand to refer to pangenomics beyond the species level. Analysis methods differ according to whether the observed genomic variants segregate in a population of interfertile individuals or represent fixed differences between reproductively isolated species. Broadly speaking, intraspecific diversity fuels genetic mapping and breeding, whereas super-pangenomes hold answers to taxonomic and evolutionary questions. At higher taxonomic levels, taxon sampling cannot but look beyond crops, as the species that farmers attend to are in a minority.
