Building pan-genome infrastructures for crop plants and their use in association genetics

Murukarthick Jayakodi¹, Mona Schreiber¹, Nils Stein¹,², Martin Mascher¹,³

Affiliations

¹ Department of Genebank, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany.
² Center for Integrated Breeding Research (CiBreed), Georg-August-University Göttingen, Göttingen, Germany.
³ German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Saxony, Germany.

PMID: 33484244
PMCID: PMC7934568
DOI: 10.1093/dnares/dsaa030

Review

Murukarthick Jayakodi et al. DNA Res. 2021 Jan 19;28(1):dsaa030. doi: 10.1093/dnares/dsaa030.

Abstract

Pan-genomic studies aim to represent the entire sequence diversity within a species and thereby provide resources for evolutionary studies, functional genomics and the breeding of cultivated plants. Cost reductions in high-throughput sequencing and advances in sequence assembly algorithms have made it possible to create multiple reference genomes, along with a catalogue of all forms of genetic variation, in plant species with large, complex or polyploid genomes. In this review, we summarize current approaches to building pan-genomes as an in silico representation of plant sequence diversity and outline methods for using them effectively to link structural with phenotypic variation. We propose two future research avenues: (i) transcriptomic and epigenomic studies across multiple reference genomes and (ii) the development of user-friendly and feature-rich pan-genome browsers.

Keywords: association genetics; crop plants; genome sequencing; genomics; pan-genome.

Figures

**Figure 1**
Pan-genome selection and construction. Representative genotypes are chosen from genetically diverse populations on the basis of genome-wide genotypic data from *ex situ* germplasm collections. Chromosome-scale genome assemblies are built for a small but representative core set. Pan-genome compartments, such as the core (i.e. genomic sequences present in all individuals of a species) and the variable compartment (i.e. sequences found in some or few individuals), are identified from the *de novo* assemblies.
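
Once each assembly has been summarized as a set of sequence features, the core/variable split in Figure 1 reduces to set operations. The sketch below is a minimal illustration under assumed inputs: a hypothetical mapping from accession name to the set of gene or sequence-cluster IDs detected in that assembly. The function name `partition_pan_genome` and the input format are illustrative only. Real pipelines must first decide which sequences count as "the same" across assemblies (e.g. by orthology clustering or alignment), and that step is not shown here.

```python
from typing import Dict, Set, Tuple


def partition_pan_genome(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split sequence IDs into core and variable compartments.

    `presence` (hypothetical input format) maps each accession to the set of
    sequence IDs, e.g. genes or orthologous groups, detected in its de novo
    assembly. Core IDs occur in every accession; variable IDs occur in at
    least one accession but not in all of them.
    """
    if not presence:
        raise ValueError("need at least one accession")
    sets = list(presence.values())
    all_ids: Set[str] = set().union(*sets)
    core: Set[str] = set.intersection(*sets)
    variable: Set[str] = all_ids - core
    return core, variable


# Toy presence/absence data for three accessions (illustrative only).
presence = {
    "acc1": {"g1", "g2", "g3"},
    "acc2": {"g1", "g2"},
    "acc3": {"g1", "g3", "g4"},
}
core, variable = partition_pan_genome(presence)
print(sorted(core), sorted(variable))  # ['g1'] ['g2', 'g3', 'g4']
```

Only `g1` is found in every assembly, so it alone is core. The other IDs fall into the variable compartment, which could be subdivided further (e.g. by how many accessions carry each ID) if a finer classification is needed.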

**Figure 2**
A pan-genome workflow. (a) A representative core set of accessions is selected from the domesticated and wild gene pools. Accessions from the secondary and tertiary gene pools are added to build a genus-level pan-genome. (b) Reference-quality genomes (coloured hexagons) are generated for a small set of accessions and aligned to each other to catalogue small, medium and large (structural) variants (SVs), including insertions, deletions, inversions and translocations. (c) Binary SVs (large insertions and deletions) are genotyped (see Fig. 3 for the genotyping strategy) in a wider germplasm panel using short-read sequencing. Each ordered series of hexagons represents the genome of a distinct accession. (d) A combination of assemblies and resequencing data underpins genetic analyses such as genome-wide association studies (GWAS) and population-genetic inquiries into pan-genome complexity. Accessory functional data on gene expression and gene profiles will decorate pan-genomes to assist hypothesis generation. All information is provided to the research community through a user-friendly web interface (browser).
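
Step (a) does not say how the representative core set is chosen. One common heuristic is greedy max-min (farthest-point) selection on pairwise genetic distances computed from genome-wide genotypes. It is used here purely for illustration and is not prescribed by the review. The sketch assumes SNP genotypes coded as allele dosages 0/1/2 with no missing data and uses a simple distance: the mean absolute dosage difference, scaled to [0, 1]. The functions `genetic_distance` and `select_core_set` are hypothetical names.

```python
from typing import Dict, List, Sequence


def genetic_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute allele-dosage difference, scaled to [0, 1].

    Genotypes are assumed to be coded 0/1/2 at the same SNPs, with no missing data.
    """
    return sum(abs(x - y) for x, y in zip(a, b)) / (2 * len(a))


def select_core_set(genotypes: Dict[str, Sequence[int]], k: int) -> List[str]:
    """Greedy max-min (farthest-point) selection of k representative accessions."""
    names = list(genotypes)
    if k <= 0:
        return []
    if k >= len(names):
        return names
    dist = {(a, b): genetic_distance(genotypes[a], genotypes[b])
            for a in names for b in names}
    # Seed with the accession that is, on average, most distant from all others.
    chosen = [max(names, key=lambda n: sum(dist[n, m] for m in names))]
    while len(chosen) < k:
        # Add the accession whose nearest already-chosen accession is farthest away.
        candidates = [n for n in names if n not in chosen]
        chosen.append(max(candidates, key=lambda n: min(dist[n, c] for c in chosen)))
    return chosen


# Toy genotypes at four SNPs (illustrative only): two pairs of similar accessions.
genotypes = {
    "A": [0, 0, 0, 0],
    "B": [0, 0, 0, 1],
    "C": [2, 2, 2, 2],
    "D": [2, 2, 2, 1],
}
print(select_core_set(genotypes, 2))  # ['A', 'C']
```

The selection picks one accession from each pair of similar genotypes. This is the intended behaviour: the chosen set spans the diversity of the panel instead of oversampling its densest part. Computing all pairwise distances is quadratic in the number of accessions, which may need a more scalable approach for large genebank collections.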

**Figure 3**
Pan-genome representation and GWAS with SVs. (a) A pan-genome graph is constructed from the alignment of chromosome-scale sequence assemblies. This graph represents all types of genetic variants. Sections of the genome are shown as coloured hexagons, each colour representing one genotype. SVs are represented by different paths through the graph. Tools for constructing and working with pan-genome graphs are under active development. Two alternative approaches are currently used to capture pan-genomic information in genetic analyses. (b) SVs between the genomes are detected from alignments against a common reference genome. Single-copy regions are extracted from the assemblies (mauve) and intersected with SVs (orange). Single-copy k-mers residing in SVs are extracted, and their abundance in short-read data from a diversity panel is ascertained to genotype the underlying SV. (c) Reference-free approaches select k-mers directly from the short-read data of a diversity panel, without the need for genome assemblies. Matrices of k-mer counts from either the single-copy or the reference-free approach are used as markers in GWAS.
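
Panel (b) turns SV genotyping into k-mer counting. The sketch below illustrates the single-copy k-mer idea on toy data. It keeps the k-mers of an SV sequence that occur exactly once in an assembly, counts them in an accession's short reads, calls the SV present or absent, and then compares phenotype means between carriers and non-carriers. Several parts are our own assumptions, not the review's: the use of canonical k-mers (a k-mer and its reverse complement are treated as one), the thresholds `min_count` and `min_fraction`, and all helper names. The mean difference is only a placeholder for a proper GWAS model, which would also account for population structure and kinship. A real pipeline would also use a dedicated k-mer counter rather than Python dictionaries.

```python
from collections import Counter
from statistics import mean
from typing import Iterable, List, Sequence, Set

# Complement table for canonical (strand-independent) k-mers.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(seq: str, k: int) -> Iterable[str]:
    """Yield canonical k-mers from all positions of a sequence."""
    for i in range(len(seq) - k + 1):
        yield canonical(seq[i:i + k])


def single_copy_kmers(sv_seq: str, assembly: str, k: int) -> Set[str]:
    """k-mers of an SV sequence that occur exactly once in the assembly."""
    counts = Counter(kmers(assembly, k))
    return {km for km in kmers(sv_seq, k) if counts[km] == 1}


def genotype_sv(markers: Set[str], reads: Iterable[str], k: int,
                min_count: int = 2, min_fraction: float = 0.5) -> int:
    """Call an SV present (1) or absent (0) in one accession from its reads.

    The SV counts as present if at least `min_fraction` of its marker k-mers
    are seen at least `min_count` times (both thresholds are assumptions).
    """
    if not markers:
        raise ValueError("SV has no single-copy k-mers to genotype with")
    counts = Counter(km for read in reads for km in kmers(read, k) if km in markers)
    supported = sum(1 for km in markers if counts[km] >= min_count)
    return int(supported >= min_fraction * len(markers))


def mean_difference(genotypes: Sequence[int], phenotypes: Sequence[float]) -> float:
    """Phenotype mean of carriers minus non-carriers (a crude stand-in for a GWAS test)."""
    carriers = [p for g, p in zip(genotypes, phenotypes) if g == 1]
    others = [p for g, p in zip(genotypes, phenotypes) if g == 0]
    return mean(carriers) - mean(others)


# Toy example (all sequences are made up): one SV embedded in the assembly.
k = 4
sv = "GATTACAGG"
assembly = "CCCCCC" + sv + "TTTTTT"
markers = single_copy_kmers(sv, assembly, k)

reads_by_accession = {
    "carrier": ["CCGATTACAGGT", "ACCTGTAATCGG"],  # forward and reverse-complement reads
    "non_carrier": ["CCCCCCTTTTTT", "CCCCCCTTTTTT"],
}
calls: List[int] = [genotype_sv(markers, reads, k) for reads in reads_by_accession.values()]
print(calls)  # [1, 0]
print(mean_difference([1, 1, 0, 0], [5.0, 7.0, 2.0, 4.0]))  # 3.0
```

The same counting step applies to the reference-free route of panel (c). There, the marker set would be k-mers chosen directly from the reads of the diversity panel instead of from SV sequences, and each accession's counts would form one row of the k-mer matrix used in GWAS.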
