Elife. 2021 May 5;10:e67784.
doi: 10.7554/eLife.67784.

Genetic variation, environment and demography intersect to shape Arabidopsis defense metabolite variation across Europe

Ella Katz et al. Elife.

Abstract

Plants produce diverse metabolites to cope with the challenges presented by complex and ever-changing environments. These challenges drive the diversification of specialized metabolites within and between plant species. However, we are just beginning to understand how frequently new alleles controlling specialized metabolite diversity arise and how the geographic distribution of these alleles may be structured by ecological and demographic pressures. Here, we measure the variation in specialized metabolites across a population of 797 natural Arabidopsis thaliana accessions. We show that geography, environmental parameters, demography and different genetic processes all combine to influence the specific chemotypes and their distribution. This analysis showed that causal loci in specialized metabolism contain frequent, independently generated alleles, with patterns suggesting potential within-species convergence. These findings provide a new perspective on the complexity of the selective forces and mechanisms that shape the generation and distribution of allelic variation that may influence local adaptation.

Keywords: A. thaliana; Arabidopsis thaliana; convergent evolution; ecology; glucosinolates; parallel evolution; plant biology; specialized metabolites.

Plain language summary

Since plants cannot move, they have evolved chemical defenses to help them respond to changes in their surroundings. For example, where animals run from predators, plants may produce toxins to deter them. This is why plants are such a rich source of drugs, poisons, dyes and other useful substances. The chemicals plants produce are known as specialized metabolites, and they can vary a lot between, and even within, plant species.

The variety of specialized metabolites is the result of genetic changes and evolution over millions of years. Evolution is a slow process, yet plants are able to rapidly develop new specialized metabolites to protect themselves from new threats. Even different populations of the same species produce many distinct metabolites that help them survive in their surroundings. However, the factors that lead plants to produce new metabolites are not well understood, and it is not known how this affects genetic variation.

To better understand this process, Katz et al. studied 797 European variants of Arabidopsis thaliana, a widely studied common weed. The investigation found that many factors affect the range of specialized metabolites in each variant. These included local geography and environment, as well as genetics and population history (demography). Katz et al. revealed a pattern of relationships between the variants that could mirror their evolutionary history as the species spread and adapted to new locations.

These results highlight the complex network of factors that affect plant evolution. Rapid diversification is key to plant survival in new and changing environments and has resulted in a wide range of specialized metabolites. These metabolites are therefore of interest both for studying plant evolution and for understanding plant ecology. Extending similar work to more populations and other species will broaden our ability to understand how plants adapt to their surroundings.

Conflict of interest statement

EK, JL, BJ, HA, SA, CB, SH, CP, RA: No competing interests declared. DK: Reviewing editor, eLife.

Figures

Figure 1. Parallel and convergent evolution.
The schema describes our use of parallel (A) and convergent (B) evolution for within-species chemotypic variation. The letters in the blue box represent the state of the source/ancestral haplotypes. The letters within the yellow box represent the newly derived haplotypes that arose by genetic mutation in the source haplotype. Finally, X, Y and Z show the chemotypes that arise from each haplotype. Blue and red arrows represent parallel or convergent genetic changes (respectively), while mustard arrows represent the enzymatic result.
Figure 2. Aliphatic glucosinolate (GSL) biosynthesis pathway.
Short names and structures of the GSLs are in black. Genes encoding the causal enzyme for each reaction (arrow) are in gray. GS-OX is a gene family of five or more genes. OH-But: 2-OH-3-Butenyl.
Figure 3. Glucosinolate variation across Europe is dominated by two loci.
(A) The accessions are plotted on the map based on their collection site and colored based on their principal component 1 (PC1) score. (B) Manhattan plot of genome-wide association analyses using PC1. Horizontal lines represent 5% significance thresholds using Bonferroni (red) and Benjamini–Hochberg (blue).
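For readers less familiar with the two horizontal lines in panel B, the following is a minimal Python sketch of how a 5% Bonferroni cutoff and a 5% Benjamini–Hochberg (BH) cutoff can be derived from a list of SNP p-values. It is an illustration on toy numbers under standard textbook definitions, not the pipeline used for this figure; the function names and p-values are hypothetical.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value cutoff that controls the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg_threshold(pvalues, alpha=0.05):
    """BH step-up cutoff controlling the false discovery rate at alpha.

    Returns the largest sorted p-value p_(k) with p_(k) <= (k / m) * alpha;
    every test with p <= that value is declared significant. Returns None
    when no test passes.
    """
    m = len(pvalues)
    cutoff = None
    for k, p in enumerate(sorted(pvalues), start=1):
        if p <= k / m * alpha:
            cutoff = p  # keep updating so we end with the largest passing rank
    return cutoff


def manhattan_height(threshold):
    """Height of a threshold line on the -log10(p) axis of a Manhattan plot."""
    return -math.log10(threshold)


if __name__ == "__main__":
    # Ten hypothetical SNP p-values.
    pvals = [1e-9, 2e-6, 0.004, 0.01, 0.02, 0.3, 0.5, 0.7, 0.8, 0.9]
    bonf = bonferroni_threshold(len(pvals))
    bh = benjamini_hochberg_threshold(pvals)
    print(f"Bonferroni: p <= {bonf:.3g} (-log10 = {manhattan_height(bonf):.2f})")
    print(f"BH:         p <= {bh:.3g} (-log10 = {manhattan_height(bh):.2f})")
```

Because any p-value below alpha/m also passes the BH criterion at its rank, BH declares at least as many SNPs significant as Bonferroni; it is the less conservative of the two.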
Figure 3—figure supplement 1. Glucosinolate (GSL)-based principal component (PC) analysis.
(A) Percentage of variance explained by each PC. (B, C) Contribution of the individual GSLs to PC1 (B) and PC2 (C). Red bars: contribution of four-carbon GSLs; blue bars: contribution of three-carbon GSLs. The ± sign above each bar indicates whether the contribution of the variable is positive or negative. (D) Linear model for PC1 and PC2 scores with the geographic parameters. Lat: latitude; Long: longitude.
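The quantities in panels A to D can be sketched with NumPy as below. This is a hedged illustration, not the authors' analysis: standardizing each GSL before the PCA (correlation-matrix PCA) is an assumption, "contribution" is taken as one common definition (the squared loading of a variable on a PC, as a percentage), and the geographic model is assumed to be ordinary least squares of a PC score on latitude and longitude. All function names and the random data are hypothetical.

```python
import numpy as np


def pca(x):
    """PCA of an accessions-by-GSLs matrix via SVD of the standardized data.

    Returns (scores, loadings, explained_pct, contrib_pct). The sign of each
    PC is arbitrary, so the sign of a loading only matters relative to the
    other loadings on the same PC.
    """
    x = np.asarray(x, dtype=float)
    # Assumption: each GSL column is centered and scaled to unit variance.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u * s                        # PC scores, one row per accession
    loadings = vt.T                       # column j = unit eigenvector of PC j
    eigenvalues = s**2 / (x.shape[0] - 1)
    explained_pct = 100 * eigenvalues / eigenvalues.sum()   # panel A
    contrib_pct = 100 * loadings**2       # panels B, C; each column sums to 100
    return scores, loadings, explained_pct, contrib_pct


def fit_geography(pc_scores, lat, lon):
    """OLS fit of PC score ~ intercept + latitude + longitude (panel D)."""
    design = np.column_stack([np.ones_like(lat), lat, lon])
    coef, *_ = np.linalg.lstsq(design, pc_scores, rcond=None)
    return coef  # [intercept, slope for latitude, slope for longitude]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gsl = rng.lognormal(size=(50, 6))     # 50 hypothetical accessions x 6 GSLs
    scores, loadings, explained, contrib = pca(gsl)
    print("variance explained (%):", np.round(explained, 1))
    print("signed PC1 contributions (%):",
          np.round(np.sign(loadings[:, 0]) * contrib[:, 0], 1))
    lat = rng.uniform(36, 65, size=50)
    lon = rng.uniform(-10, 30, size=50)
    print("PC1 ~ lat + lon:", np.round(fit_geography(scores[:, 0], lat, lon), 3))
```

Using the SVD of the standardized matrix avoids forming the covariance matrix explicitly and returns the components already sorted by variance explained.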
Figure 3—figure supplement 2. Glucosinolate variation across Europe is dominated by two loci.
(A) The accessions were plotted on the map based on their collection site and colored based on their principal component 2 (PC2) score. (B) Manhattan plot of genome-wide association analyses using PC2. Horizontal lines represent 5% significance thresholds using Bonferroni (red) and Benjamini–Hochberg (blue).
Figure 3—figure supplement 3. Manhattan plots of genome-wide association analyses performed using individual glucosinolate amounts as traits.
Horizontal lines represent 5% significance thresholds using Bonferroni (red) and Benjamini–Hochberg (blue).
Figure 4. Phenotypic classification based on glucosinolate (GSL) content.
(A) Using GSL accumulation, each accession was classified into one of seven aliphatic short-chained GSL chemotypes based on enzyme function, as follows. MAM2, AOP null: classified as 3MSO dominant, colored in yellow. MAM1, AOP null: classified as 4MSO dominant, colored in pink. MAM2, AOP3: classified as 3OHP dominant, colored in green. MAM1, AOP3: classified as 4OHB dominant, colored in light blue. MAM2, AOP2: classified as Allyl dominant, colored in blue. MAM1, AOP2, GS-OH non-functional: classified as 3-Butenyl dominant, colored in black. MAM1, AOP2, GS-OH functional: classified as 2-OH-3-Butenyl dominant, colored in red. The accessions were plotted on a map based on their collection sites and colored based on their dominant chemotype. (B) The coloring scheme with the functional GSL enzymes in the aliphatic GSL pathway, with the percentage of accessions in each chemotype (out of the total 797 accessions) shown in each box.
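The seven rules in panel A amount to a small decision function over three enzyme states. The Python sketch below encodes exactly those rules and the panel A colors; it assumes the MAM, AOP and GS-OH states have already been inferred for each accession, and the enum, dictionary and function names are hypothetical.

```python
from enum import Enum


class Mam(Enum):
    MAM1 = "MAM1"  # four-carbon side chains dominate
    MAM2 = "MAM2"  # three-carbon side chains dominate


class Aop(Enum):
    NULL = "AOP null"  # methylsulfinyl (MSO) GSLs accumulate
    AOP2 = "AOP2"      # alkenyl GSLs
    AOP3 = "AOP3"      # hydroxyalkyl GSLs


# Colors used for each dominant chemotype in Figure 4A.
CHEMOTYPE_COLORS = {
    "3MSO": "yellow", "4MSO": "pink", "3OHP": "green", "4OHB": "light blue",
    "Allyl": "blue", "3-Butenyl": "black", "2-OH-3-Butenyl": "red",
}


def chemotype(mam: Mam, aop: Aop, gs_oh_functional: bool = False) -> str:
    """Dominant short-chained aliphatic GSL chemotype under the Figure 4A rules.

    GS-OH status is only consulted for MAM1 accessions with a functional AOP2,
    because 3-Butenyl is the only substrate it modifies in this scheme.
    """
    if aop is Aop.NULL:
        return "3MSO" if mam is Mam.MAM2 else "4MSO"
    if aop is Aop.AOP3:
        return "3OHP" if mam is Mam.MAM2 else "4OHB"
    # AOP2 (alkenyl) branch.
    if mam is Mam.MAM2:
        return "Allyl"
    return "2-OH-3-Butenyl" if gs_oh_functional else "3-Butenyl"


if __name__ == "__main__":
    for mam in Mam:
        for aop in Aop:
            for gs_oh in (False, True):
                name = chemotype(mam, aop, gs_oh)
                print(mam.value, aop.value, f"GS-OH={gs_oh}", "->",
                      name, f"({CHEMOTYPE_COLORS[name]})")
```

Enumerating all twelve state combinations yields exactly the seven chemotypes in the caption, since GS-OH status changes the outcome only on the MAM1/AOP2 branch.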
Figure 4—figure supplement 1. Phenotypic classification based on the dominant MAM enzyme.
Accessions were classified based on the side chain length of the aliphatic short-chained glucosinolates (GSLs). Accessions with a majority of GSLs containing three carbons in their side chains are classified as MAM2 dominant and colored in blue. Accessions with the majority of aliphatic short-chained GSLs containing four carbons in their side chains are classified as MAM1 dominant and colored in red. The accessions were plotted on a map based on their original collection sites.
Figure 4—figure supplement 2. Phenotypic classification based on the dominant AOP enzymes.
Relative amounts of alkenyl glucosinolates (GSLs), alkyl GSLs and methylsulfinyl (MSO) GSLs were calculated with respect to the total short-chained aliphatic GSLs, as described in the Methods section. Accessions with high amounts of alkenyl GSLs were classified as AOP2 dominant and colored in pink. Accessions with high amounts of alkyl GSLs were classified as AOP3 dominant and colored in orange. Accessions with high amounts of MSO GSLs were classified as AOP null and colored in green. The accessions were plotted on a map based on their original collection sites.
Figure 4—figure supplement 3. Phenotypic classification based on GS-OH enzyme activity.
The ratio of 2-OH-3-Butenyl to 3-Butenyl glucosinolate (GSL) was calculated only for MAM1-dominant accessions (accessions whose GSLs contain four carbons in their side chain). Accessions with high amounts of 2-OH-3-Butenyl were classified as GS-OH functional and colored in black. Accessions with mostly 3-Butenyl were classified as GS-OH non-functional and colored in brown. The accessions were plotted on a map based on their original collection sites.
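The three supplements above derive the MAM, AOP and GS-OH states from measured GSL amounts, and those states could then feed the chemotype rules of Figure 4A. The Python sketch below strings the three steps together for one accession. Only the C3/C4 majority rule is stated in the captions; treating the largest relative share as the dominant AOP state, and a 2-OH-3-Butenyl:3-Butenyl ratio above a cutoff of 1 as a functional GS-OH, are assumptions standing in for the Methods cutoffs, which are not reproduced here. The grouping of GSLs into MSO, hydroxyalkyl and alkenyl classes is an interpretation of Figure 2, and all names are hypothetical.

```python
# Side-chain length groups (GSL names as used in Figures 2 and 4).
C3_GSLS = {"3MSO", "3OHP", "Allyl"}
C4_GSLS = {"4MSO", "4OHB", "3-Butenyl", "2-OH-3-Butenyl"}

# Side-chain modification groups used for the AOP call (interpretation of Figure 2).
AOP_GROUPS = {
    "AOP null": {"3MSO", "4MSO"},                      # methylsulfinyl (MSO) GSLs
    "AOP3": {"3OHP", "4OHB"},                          # hydroxyalkyl GSLs
    "AOP2": {"Allyl", "3-Butenyl", "2-OH-3-Butenyl"},  # alkenyl GSLs
}


def _sum(amounts, names):
    """Total amount of the listed GSLs; unmeasured GSLs count as zero."""
    return sum(amounts.get(name, 0.0) for name in names)


def infer_enzyme_states(amounts, gs_oh_ratio_cutoff=1.0):
    """Infer MAM, AOP and GS-OH states for one accession from GSL amounts.

    Hypothetical rules (not the published cutoffs):
    - MAM2 if three-carbon GSLs are the majority, otherwise MAM1;
    - AOP state = group with the largest share of short-chained aliphatic GSLs;
    - GS-OH functional if 2-OH-3-Butenyl / 3-Butenyl > gs_oh_ratio_cutoff,
      evaluated only for MAM1-dominant accessions.
    """
    total = _sum(amounts, C3_GSLS | C4_GSLS)
    if total <= 0:
        raise ValueError("no short-chained aliphatic GSLs measured")

    mam = "MAM2" if _sum(amounts, C3_GSLS) / total > 0.5 else "MAM1"

    # Relative amounts with respect to the total short-chained aliphatic GSLs.
    shares = {state: _sum(amounts, names) / total
              for state, names in AOP_GROUPS.items()}
    aop = max(shares, key=shares.get)

    gs_oh = None  # undefined for MAM2 accessions or when no butenyl GSL is present
    if mam == "MAM1":
        oh_but = amounts.get("2-OH-3-Butenyl", 0.0)
        butenyl = amounts.get("3-Butenyl", 0.0)
        if oh_but + butenyl > 0:
            # Compare without dividing so a zero 3-Butenyl amount is safe.
            gs_oh = oh_but > gs_oh_ratio_cutoff * butenyl
    return {"MAM": mam, "AOP": aop, "GS-OH functional": gs_oh}


if __name__ == "__main__":
    sample = {"2-OH-3-Butenyl": 8.0, "3-Butenyl": 2.0, "4MSO": 1.0}
    print(infer_enzyme_states(sample))
    # {'MAM': 'MAM1', 'AOP': 'AOP2', 'GS-OH functional': True}
```

Keeping the ratio cutoff as a parameter makes it easy to see how sensitive the GS-OH call is to the threshold chosen.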
Figure 4—figure supplement 4. Geographic partitioning of the collection.
(A) The accessions were divided into two collections using the following mountain ranges: the Pyrenees between Spain and France, the Alps between Italy and Germany, and the Carpathians in the Balkans. The accessions located north of these mountains are referred to as the northern accessions and colored in green. The accessions located south of these mountains are referred to as the southern accessions and colored in pink. (B) The percentage of each chemotype was calculated independently for the south and the north. Butenyl: 3-Butenyl; OH-But: 2-OH-3-Butenyl.
Figure 5. MAM3 phylogeny.
(A) MAM3 phylogeny of Arabidopsis thaliana accessions, rooted by Arabidopsis lyrata MAMb, which is not shown because of its distance. Tree tips are colored based on the accession chemotype. (B) The genomic structure of the GS-Elong regions in the previously sequenced accessions is shown based on Kroymann et al., 2003. The structures in the box are based on sequences obtained in this work. The numbers to the left of the structures indicate the number of accessions sequenced in this work (left) or by Kroymann et al., 2003 (right). The numbers are colored based on their clades. Bright gray arrows represent MAM1 sequences, and dashed arrows represent MAM2 sequences. Dark gray arrows represent MAM3 sequences. The number to the right of each genomic cartoon represents the number of carbons in the side chain. (C) Collection sites of the accessions, colored by their clade classification (from panel A) and shaped based on the side chain length of the aliphatic short-chained glucosinolates (circles for C3, triangles for C4).
Figure 5—figure supplement 1. Support for the MAM3 tree clade classification.
(A) MAM3 phylogeny of 637 Arabidopsis thaliana accessions, rooted by Arabidopsis lyrata MAMb, excluding accessions with low-quality sequences. (B–E) MAM3 phylogenies of different randomly chosen combinations of A. thaliana accessions, rooted by A. lyrata MAMb. Bootstrap values >60 are indicated. In these trees, clade 2 was divided into two sub-clades. MAM classification of each clade: clade 1, MAM2; clade 2, MAM1; clade 3, MAM2; clade 4, MAM1; clade 5, MAM1; clade 6, MAM2; clade 7, MAM1; clade 8, MAM1. (F) MYB37 phylogeny of A. thaliana accessions, rooted by the A. lyrata homologue (Al_scaffold_0006_2171, on chromosome 6), which is not shown because of its distance. Tree tips are colored based on the MAM3 clade classification as indicated in Figure 5A.
Figure 5—figure supplement 2. Genomic structure of the GS-Elong regions.
The GS-Elong locus from different accessions was sequenced, and the MAM1 and MAM2 structures were analyzed. The table indicates the dominant chemotype of each accession, the MAM status of each accession, and the number of accessions with the same structure that were sequenced in this work. (A) Clade 1. (B) Clade 2. (C) Clade 3. (D) Clade 4. (E) Clade 5. (F) Clade 6. (G) Clade 7. T-1: truncated MAM1 (contains ~66% of 3’ MAM1). *2/1: chimeric version that contains ~50% of 5’ MAM2 and ~50% of 3’ MAM1. **2/1: chimeric version that contains ~20% of 5’ MAM2 and ~80% of 3’ MAM1.
Figure 5—figure supplement 3. MAM2 is an Arabidopsis thaliana-specific gene.
Domain (A) and full-sequence (B) amino acid phylogenies of the MAM/IPMS gene family. Sequences were taken from Abrahams et al., 2020, which uses the Arabidopsis thaliana Col-0 genome, and the MAM2 amino acid sequence 1006452109 was taken from the Arabidopsis Information Resource (TAIR) database. Both MAM1 and MAM2 fall within the MAMa domain clade. MAM1 and MAM2 are sister to each other in both trees, with the next closest MAM gene belonging to Arabidopsis lyrata. These results support a recent duplication event specific to Arabidopsis thaliana, followed by divergence and specialization.
Figure 5—figure supplement 4. The Iberian Peninsula presents low phenotypic variability and high genetic variation.
(A) All accessions from Iberia were plotted, colored and shaped based on the side chain length of the aliphatic short-chained GSLs. Accessions with a majority of GSLs containing three carbons in their side chains are classified as MAM2 dominant, colored in blue and indicated as circles. Accessions with a majority of aliphatic short-chained GSLs containing four carbons in their side chains are classified as MAM1 dominant, colored in red and indicated as triangles. (B) All accessions from Iberia were plotted, colored based on their MAM3 clade classification, and shaped based on the side chain length. (C) All accessions from Iberia except clades 6 and 7 were plotted, colored based on their MAM3 clade classification, and shaped based on the side chain length. (D) Clades 6 and 7 from Iberia were plotted, colored based on their MAM3 clade classification, and shaped based on the side chain length.
Figure 5—figure supplement 5. Geographic distribution of MAM haplotypes.
The MAM phylogeny is split by the major clades/haplotypes, and each sub-clade's phylogeny is reflected on the map. Tree tips are colored based on the accessions' chemotype.
Figure 6. AOP genomic structure.
The genomic structure and causality of the major AOP2/AOP3 haplotypes are illustrated. Pink arrows show the AOP2 gene, while yellow arrows represent AOP3. The black arrows represent the direction of transcription from the AOP2 promoter as defined in the Col-0 reference genome; the promoter's position does not change in any of the regions. A–F represent the different structures. The black lines in C and F represent theoretical positions of independent variants creating premature stop codons. The GSL chemotype for each haplotype is listed to the right, with the number of accessions in brackets. The maps show the geographic distribution of the accessions with each structure.
Figure 6—figure supplement 1. AOP phylogeny.
Separate phylogenies of AOP2 (A) and AOP3 (B) across Arabidopsis thaliana accessions. The trees are rooted by the matching gene in Arabidopsis lyrata, which is not shown because of distance. Tree tips are colored based on accessions’ chemotype. The first column of each heatmap represents the dominant chemotype’s identity. The second column of each heatmap represents the AOP functionality: pink for alkenyl (AOP2 dominant), orange for hydroxy (AOP3 dominant) and green for MSO (null). The third column represents the accessions’ geographic location: dark green for the north and dark pink for the south. The named accessions represent the haplotypes described in Figure 6.

References

    1. Abdi H, Williams LJ. Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics. 2010;2:433–459. doi: 10.1002/wics.101.
    1. Abrahams RS, Pires JC, Schranz ME. Genomic origin and diversification of the glucosinolate MAM locus. Frontiers in Plant Science. 2020;11:711. doi: 10.3389/fpls.2020.00711.
    1. Agrawal AA. Overcompensation of plants in response to herbivory and the by-product benefits of mutualism. Trends in Plant Science. 2000;5:309–313. doi: 10.1016/s1360-1385(00)01679-4.
    1. Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, Li Y, Nordborg M. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature. 2010;465:627–631. doi: 10.1038/nature08800.
    1. Bakker EG, Traw MB, Toomajian C, Kreitman M, Bergelson J. Low levels of polymorphism in genes that control the activation of defense response in Arabidopsis thaliana. Genetics. 2008;178:2031–2043. doi: 10.1534/genetics.107.083279.
