Review
DNA Res. 2021 Jan 19;28(1):dsaa030.
doi: 10.1093/dnares/dsaa030.

Building pan-genome infrastructures for crop plants and their use in association genetics

Murukarthick Jayakodi et al. DNA Res.

Abstract

Pan-genomic studies aim at representing the entire sequence diversity within a species to provide useful resources for evolutionary studies, functional genomics and breeding of cultivated plants. Cost reductions in high-throughput sequencing and advances in sequence assembly algorithms have made it possible to create multiple reference genomes along with a catalogue of all forms of genetic variations in plant species with large and complex or polyploid genomes. In this review, we summarize the current approaches to building pan-genomes as an in silico representation of plant sequence diversity and outline relevant methods for their effective utilization in linking structural with phenotypic variation. We propose as future research avenues (i) transcriptomic and epigenomic studies across multiple reference genomes and (ii) the development of user-friendly and feature-rich pan-genome browsers.

Keywords: association genetics; crop plants; genome sequencing; genomics; pan-genome.

Figures

Figure 1
Pan-genome selection and construction. Representative genotypes are chosen from genetically diverse populations based on genome-wide genotypic data for ex situ germplasm collections. Chromosome-scale genome assemblies are built for a small but representative core set. The pan-genome compartments, namely core (i.e. genomic sequences present in all individuals of a species) and variable (i.e. sequences found in only some or a few individuals), are identified from the de novo assemblies.
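The core/variable split described in this caption can be illustrated with a minimal sketch. It assumes that each assembly has already been reduced to a set of sequence or gene-cluster identifiers; the function name `split_compartments`, the accession labels and the identifiers are hypothetical and do not come from the review.

```python
from typing import Dict, Set, Tuple


def split_compartments(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split sequence IDs into core and variable compartments.

    `presence` maps an accession name to the set of sequence (or gene
    cluster) IDs detected in its de novo assembly. This input format is
    an assumption made for illustration.
    """
    if not presence:
        return set(), set()
    id_sets = list(presence.values())
    # Pan-genome: every ID seen in at least one accession.
    pan = set().union(*id_sets)
    # Core: IDs present in all accessions.
    core = set.intersection(*id_sets)
    # Variable: IDs missing from at least one accession.
    variable = pan - core
    return core, variable


# Toy input with three hypothetical accessions.
assemblies = {
    "acc_A": {"g1", "g2", "g3"},
    "acc_B": {"g1", "g2", "g4"},
    "acc_C": {"g1", "g3", "g4"},
}
core, variable = split_compartments(assemblies)
print(sorted(core), sorted(variable))
```

In practice, deciding whether a sequence is "present" in an assembly depends on alignment or clustering thresholds, which this sketch leaves out.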
Figure 2
A pan-genome workflow. (a) A representative core set of accessions is selected from the domesticated and wild gene pools. Accessions from secondary and tertiary gene pools are added to build the pan-genome at the genus level. (b) Reference-quality genomes (represented as coloured hexagons) are generated for a small set of accessions and aligned to each other to catalogue small, medium and large variants, the latter being structural variants (SVs) that include insertions, deletions, inversions and translocations. (c) Binary SVs (large insertions and deletions) are genotyped (see Fig. 3 for the genotyping strategy) in a wider panel of germplasm using short-read sequencing. Each ordered series of hexagons represents the genome of a distinct accession. (d) A combination of assemblies and resequencing data underpins genetic analyses such as GWAS and population genetic inquiries into pan-genome complexity. Accessory functional data on gene expression and gene profiles will decorate pan-genomes to assist hypothesis generation. All information is provided to the research community through a user-friendly web interface (browser).
Figure 3
Pan-genome representation and GWAS with SVs. (a) A pan-genome graph is constructed from the alignment of chromosome-scale sequence assemblies. This graph represents all types of genetic variants. Sections of the genome are shown as coloured hexagons. Each colour represents one genotype. SVs are represented by different paths through the graph. Tools for constructing and working with pan-genome graphs are under active development. Two alternative approaches to capturing pan-genomic information in genetic analyses are currently in use. (b) SVs between these genomes are detected from alignments against a common reference genome. Single-copy regions are extracted from the assemblies (mauve colour) and overlapped with SVs (orange colour). Single-copy k-mers residing in SVs are extracted and their abundance is ascertained in short-read data from a diversity panel to genotype the underlying SV. (c) Reference-free approaches select k-mers directly from short-read data of a diversity panel without the need for genome assemblies. Matrices of k-mer counts from either the single-copy or the reference-free approach are used as markers in GWAS.
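The single-copy k-mer strategy in panel (b) can be sketched as follows. This is a toy illustration under strong simplifying assumptions, not the authors' pipeline: sequences are plain strings, reverse complements and sequencing errors are ignored, and the function names, the `min_fraction` threshold and the example sequences are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def kmers(seq: str, k: int) -> List[str]:
    """Return all overlapping k-mers of `seq` in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def sv_marker_kmers(assembly: str, sv_start: int, sv_end: int, k: int) -> Set[str]:
    """k-mers lying inside the SV interval [sv_start, sv_end) that occur
    exactly once in the whole assembly (single-copy k-mers)."""
    counts = Counter(kmers(assembly, k))
    return {km for km in kmers(assembly[sv_start:sv_end], k) if counts[km] == 1}


def marker_counts(reads: Iterable[str], markers: Set[str], k: int) -> Counter:
    """Count how often each marker k-mer is observed in the short reads."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(km for km in kmers(read, k) if km in markers)
    return counts


def call_presence(reads: Iterable[str], markers: Set[str], k: int,
                  min_fraction: float = 0.5) -> bool:
    """Call the SV present if enough marker k-mers are seen at least once.
    The 0.5 threshold is an arbitrary illustrative choice."""
    if not markers:
        return False
    seen = marker_counts(reads, markers, k)
    return len(seen) / len(markers) >= min_fraction


# Toy assembly: flank + 8-bp insertion (positions 6..13) + flank.
assembly = "ACGTAC" + "GGATCCTA" + "TTGCAA"
k = 4
markers = sv_marker_kmers(assembly, 6, 14, k)

reads_with_insertion = ["TACGGATCC", "ATCCTATTG"]
reads_without_insertion = ["ACGTACTTGC", "TACTTGCAA"]

print(call_presence(reads_with_insertion, markers, k))
print(call_presence(reads_without_insertion, markers, k))
```

Repeating `marker_counts` for every accession in a diversity panel would yield one row of k-mer counts per accession, which corresponds to the kind of count matrix the caption describes as GWAS markers.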
