Genetics. 2024 Apr 3;226(4):iyae012.
doi: 10.1093/genetics/iyae012.

Distinct genomic contexts predict gene presence-absence variation in different pathotypes of Magnaporthe oryzae

Pierre M Joubert et al. Genetics.

Abstract

Fungi use the accessory gene content of their pangenomes to adapt to their environments. While gene presence-absence variation contributes to shaping accessory gene reservoirs, the genomic contexts that shape these events remain unclear. Since pangenome studies are typically species-wide and do not analyze different populations separately, it is yet to be uncovered whether presence-absence variation patterns and mechanisms are consistent across populations. Fungal plant pathogens are useful models for studying presence-absence variation because they rely on it to adapt to their hosts, and members of a species often infect distinct hosts. We analyzed gene presence-absence variation in the blast fungus, Magnaporthe oryzae (syn. Pyricularia oryzae), and found that presence-absence variation genes involved in host-pathogen and microbe-microbe interactions may drive the adaptation of the fungus to its environment. We then analyzed genomic and epigenomic features of presence-absence variation and observed that proximity to transposable elements, gene GC content, gene length, expression level in the host, and histone H3K27me3 marks were different between presence-absence variation genes and conserved genes. We used these features to construct a model that was able to predict whether a gene is likely to experience presence-absence variation with high precision (86.06%) and recall (92.88%) in M. oryzae. Finally, we found that presence-absence variation genes in the rice and wheat pathotypes of M. oryzae differed in their number and their genomic context. Our results suggest that genomic and epigenomic features of gene presence-absence variation can be used to better understand and predict fungal pangenome evolution. We also show that substantial intra-species variation can exist in these features.
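The precision and recall quoted above follow the usual definitions for a binary classifier whose positive class is "PAV gene". As a minimal sketch, the function below computes both, plus the F1 statistic, from confusion-matrix counts. The function name and the counts are hypothetical and chosen for illustration only; they are not the paper's data.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for the positive class.

    tp: PAV genes correctly predicted as PAV
    fp: conserved genes wrongly predicted as PAV
    fn: PAV genes wrongly predicted as conserved
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, not taken from the study.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
# prints: precision=0.90 recall=0.75 F1=0.82
```

Precision penalizes conserved genes flagged as PAV, while recall penalizes PAV genes that are missed, which is why the abstract reports both.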

Keywords: Magnaporthe oryzae; Pyricularia oryzae; comparative genomics; evolution; fungi; machine learning; plant pathogen; population genetics; presence–absence variation; structural variation.

Conflict of interest statement

The author(s) declare no conflicts of interest.

Figures

Fig. 1.
PAV of effector and non-effector orthogroups differentiate the clonal lineages of MoO. a) Scatter plot of values for principal components (PCs) 1 and 2 resulting from a PCA of orthogroup PAV. Each point represents 1 isolate. b) Scatter plot of values for PCs 1 and 2 resulting from a PCA of effector orthogroup PAV. Each point represents 1 isolate. c) Heat map representing which lineage-differentiating PAV orthogroups are present (color) or absent (white) in each genome. Effector orthogroups are separated from non-effector orthogroups by a black box. The phylogeny was generated using a multiple-sequence alignment of SCOs, fasttree and the full MoO phylogeny generated from our data, with lineage 1 omitted (Supplementary Fig. 2). In all panels, colors represent the clonal lineages of MoO. Blue represents lineage 2, orange represents lineage 3, and pink represents lineage 4. Lineages were named as previously described (Gladieux, Ravel, et al. 2018b).
Fig. 2.
Lineage-differentiating PAV orthogroups in MoO contain many genes related to antibiotic production and non-self-recognition. a) Gene ontology (GO) enrichment analysis of lineage-differentiating PAV orthogroups. b) Protein family (PFAM) domain enrichment analysis of lineage-differentiating PAV orthogroups. P-values shown are the results of Fisher's exact tests. Only GO terms and domains that were assigned to 3 or more lineage-differentiating PAV orthogroups and with enrichment P-values less than 0.05 were reported in this figure.
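The enrichment P-values in this caption come from Fisher's exact tests. The sketch below shows one common form of that test for over-representation, a one-sided hypergeometric tail probability. Whether the authors used a one-sided test, and which software they used, is not stated here, so treat both as assumptions. The function name and the counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test P-value for over-representation.

    N: total orthogroups in the background
    K: background orthogroups annotated with the term
    n: orthogroups in the set of interest
    k: orthogroups in the set of interest annotated with the term
    Returns P(X >= k) under the hypergeometric distribution.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical example: 3 of 5 orthogroups of interest carry a term
# that annotates 5 of 20 background orthogroups.
p_value = fisher_enrichment_p(k=3, n=5, K=5, N=20)
print(f"P = {p_value:.4f}")
# prints: P = 0.0726
```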
Fig. 3.
PAV genes are more common and more spread out throughout the genome in MoT than in MoO. a) Stacked barplot comparing the number of PAV orthogroups (OGs) and conserved orthogroups in MoO and MoT. “Other OGs” denote orthogroups that did not satisfy our definitions for either category. b) Distribution of the lengths of large indels (>50 bp) in MoO and MoT. c) Density plot showing the distribution of the distances to the nearest PAV gene for conserved and PAV genes in MoO and MoT. Dashed lines in density plots represent the median values for all genes in both pathotypes. d) Violin plot showing the distribution of the distances to the nearest PAV gene for conserved and PAV genes in MoO and MoT. e) Percentages and proportions of PAV and conserved genes that are within 1,000 bp of a PAV gene in MoO and MoT. Rectangles within violin plots represent interquartile ranges, dark lines represent medians, and dots represent the means with outliers removed. Statistics and statistical comparisons for data shown in panels b) through e) are listed in Supplementary Files 6, 7, 8, 9, and 10.
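Panels c) through e) rely on the distance from each gene to the nearest PAV gene. The sketch below shows one way such a distance could be computed for genes on a single chromosome, assuming genes are (start, end) intervals and that distance means the gap between interval edges. That definition, the function names, and the coordinates are assumptions made for illustration and may differ from the authors' method.

```python
def gap(a, b):
    """Distance in bp between two (start, end) intervals; 0 if they overlap."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def nearest_distance(gene, others):
    """Smallest gap from `gene` to any interval in `others` (same chromosome)."""
    return min(gap(gene, other) for other in others)


# Hypothetical coordinates on one chromosome.
pav_genes = [(1000, 2000), (10000, 11000)]
conserved_genes = [(2500, 3000), (5000, 6000), (11200, 12000)]

distances = [nearest_distance(g, pav_genes) for g in conserved_genes]
within_1kb = sum(d <= 1000 for d in distances) / len(distances)
print(distances, f"{within_1kb:.2f}")
# prints: [500, 3000, 200] 0.67
```

When the query gene is itself a PAV gene, it would need to be excluded from `others` first, otherwise every distance would be 0.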
Fig. 4.
PAV genes are more likely to be found near transposable elements (TEs) than conserved genes. a) Density plots showing the distribution of the distances to the nearest TE for conserved and PAV genes in MoO and MoT. b) Violin plot showing the distribution of the distances to the nearest TE for conserved and PAV genes in MoO and MoT. c) Percentages and proportions of PAV and conserved genes that are within 5,000 bp of a TE in MoO and MoT. Dashed lines in density plots represent the median values for all genes in both pathotypes. Rectangles within violin plots represent interquartile ranges, dark lines represent medians, and dots represent the means with outliers removed. Statistics and statistical comparisons for data shown are listed in Supplementary Files 7, 8, 9, and 10.
Fig. 5.
PAV genes are distinct from conserved genes in many ways beyond their proximity to TEs. Violin plots showing the distributions of a) gene GC content, b) gene lengths, c) expression in culture, d) expression in planta, and e) normalized H3K27me3 histone mark ChIP-Seq signal for PAV and conserved genes in MoO and MoT. In panel e), MoT genes were not included as these data are not available for MoT. Rectangles within violin plots represent interquartile ranges, dark lines represent medians, and dots represent the means with outliers removed. Statistics describing the distributions shown and statistical comparisons between these statistics are listed in Supplementary Files 11 and 12.
Fig. 6.
Random forest classifiers accurately identify PAV genes in MoO and MoT, but the models perform poorly on genes from the host they were not trained on. a) Confusion matrix showing average percentages for each classification outcome of the MoO random forest classifier when tested on MoO genes that it was not trained on. b) Decrease in the F1 statistic of the MoO random forest classifier when each feature is permuted in the testing data. Features described as questions are binary, all other features are continuous. c) Decrease in the F1 statistic of the MoT random forest classifier when each feature is permuted in the testing data. d) Decrease in the F1 statistic of the MoO random forest classifier trained on a subset of features (reduced MoO model) when each variable is permuted in the testing data. e) Confusion matrix showing average percentages for each classification outcome of the MoT random forest classifier when tested on MoO genes. f) Confusion matrix showing average percentages for each classification outcome of the MoO random forest classifier trained on a subset of features (reduced MoO model) when tested on MoT genes. g) Density plots showing the distribution of the distances to the nearest PAV gene for false positive and true positive predictions by the MoT random forest classifier when tested on MoO genes. h) Density plots showing the distribution of the distances to the nearest PAV gene for false negative and true positive predictions by the MoO random forest classifier trained on a subset of features (reduced MoO model) when tested on MoT genes.
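Panels b) through d) report the decrease in F1 when a feature is permuted in the testing data. The sketch below illustrates that measurement in general terms. A hand-written threshold rule stands in for the trained random forest, and the feature names, values, and function names are hypothetical, so this is not the authors' pipeline.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for the positive class, from paired boolean labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def permutation_f1_drop(predict, rows, labels, feature, n_repeats=20, seed=0):
    """Mean decrease in F1 when column `feature` is shuffled across test rows."""
    rng = random.Random(seed)
    baseline = f1_score(labels, [predict(r) for r in rows])
    drops = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        # Rebuild each row with the shuffled value for this one feature.
        shuffled = [dict(r, **{feature: v}) for r, v in zip(rows, column)]
        drops.append(baseline - f1_score(labels, [predict(r) for r in shuffled]))
    return sum(drops) / n_repeats


# Hypothetical test set: the stand-in model only looks at TE distance.
te = [500, 1200, 3000, 4500, 8000, 15000, 22000, 40000]
gc = [0.48, 0.51, 0.47, 0.50, 0.55, 0.56, 0.54, 0.57]
rows = [{"te_distance": t, "gc_content": g} for t, g in zip(te, gc)]
labels = [t < 5000 for t in te]  # True means "PAV gene"


def predict(row):
    """Stand-in classifier: call a gene PAV if it lies within 5,000 bp of a TE."""
    return row["te_distance"] < 5000


drop_te = permutation_f1_drop(predict, rows, labels, "te_distance")
drop_gc = permutation_f1_drop(predict, rows, labels, "gc_content")
print(f"te_distance drop={drop_te:.3f}, gc_content drop={drop_gc:.3f}")
```

A feature the model ignores, here `gc_content`, shows a drop of exactly zero, while shuffling a feature the model depends on lowers F1. Permuting only the testing data measures how much the fitted model relies on a feature without retraining it.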

References

    1. Alexa A, Rahnenfuhrer J. 2023. topGO: enrichment analysis for gene ontology. doi:10.18129/B9.bioc.topGO.
    2. Badet T, Fouché S, Hartmann FE, Zala M, Croll D. 2021. Machine-learning predicts genomic determinants of meiosis-driven structural variation in a eukaryotic pathogen. Nat Commun. 12(1):3551. doi:10.1038/s41467-021-23862-x.
    3. Badet T, Oggenfuss U, Abraham L, McDonald BA, Croll D. 2020. A 19-isolate reference-quality global pangenome for the fungal wheat pathogen Zymoseptoria tritici. BMC Biol. 18(1):12. doi:10.1186/s12915-020-0744-3.
    4. Bao W, Kojima KK, Kohany O. 2015. Repbase update, a database of repetitive elements in eukaryotic genomes. Mob DNA. 6(1):4–9. doi:10.1186/s13100-015-0041-9.
    5. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. 2009. BLAST+: architecture and applications. BMC Bioinformatics. 10(1):421. doi:10.1186/1471-2105-10-421.