Plant Physiol. 2017 Jun;174(2):904-921.
doi: 10.1104/pp.17.00295. Epub 2017 Apr 26.

Insights into the Evolution of Hydroxyproline-Rich Glycoproteins from 1000 Plant Transcriptomes

Kim L Johnson, Andrew M Cassin, Andrew Lonsdale, Gane Ka-Shu Wong, Douglas E Soltis, Nicholas W Miles, Michael Melkonian, Barbara Melkonian, Michael K Deyholos, James Leebens-Mack, Carl J Rothfels, Dennis W Stevenson, Sean W Graham, Xumin Wang, Shuangxiu Wu, J Chris Pires, Patrick P Edger, Eric J Carpenter, Antony Bacic, Monika S Doblin, Carolyn J Schultz

Kim L Johnson et al. Plant Physiol. 2017 Jun.

Abstract

The carbohydrate-rich cell walls of land plants and algae have been the focus of much interest given the value of cell wall-based products to our current and future economies. Hydroxyproline-rich glycoproteins (HRGPs), a major group of wall glycoproteins, play important roles in plant growth and development, yet little is known about how they have evolved in parallel with the polysaccharide components of walls. We investigate the origins and evolution of the HRGP superfamily, which is commonly divided into three major multigene families: the arabinogalactan proteins (AGPs), extensins (EXTs), and proline-rich proteins. Using motif and amino acid bias (MAAB), a newly developed bioinformatics pipeline, we identified HRGPs in sequences from the 1000 Plants transcriptome project (www.onekp.com). Our analyses provide new insights into the evolution of HRGPs across major evolutionary milestones, including the transition to land and the early radiation of angiosperms. Significantly, data mining reveals the origin of glycosylphosphatidylinositol (GPI)-anchored AGPs in green algae and a 3- to 4-fold increase in GPI-AGPs in liverworts and mosses. The first detection of cross-linking (CL)-EXTs is observed in bryophytes, which suggests that CL-EXTs arose through the juxtaposition of preexisting SPn EXT glycomotifs with refined Y-based motifs. We also detected the loss of CL-EXT in a few lineages, including the grass family (Poaceae), that have a cell wall composition distinct from other monocots and eudicots. A key challenge in HRGP research is tracking individual HRGPs throughout evolution. Using the 1000 Plants output, we were able to find putative orthologs of Arabidopsis pollen-specific GPI-AGPs in basal eudicots.
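As a rough illustration of what "amino acid bias" and "motif" mean in this context, the sketch below computes the percentage of a protein sequence made up of a chosen residue set and counts runs of a simple SPn-like glycomotif. It is not the authors' pipeline: the function names, the regular expression, and the toy sequence are all assumptions made for this example, and no classification thresholds are implied.

```python
import re


def percent_bias(seq: str, residues: str) -> float:
    """Percentage of positions in `seq` occupied by any residue in `residues`."""
    seq = seq.upper()
    if not seq:
        return 0.0
    hits = sum(1 for aa in seq if aa in residues)
    return 100.0 * hits / len(seq)


def count_motif(seq: str, pattern: str) -> int:
    """Number of non-overlapping regex matches of `pattern` in `seq`."""
    return len(re.findall(pattern, seq.upper()))


# Hypothetical toy sequence, invented for illustration only.
toy = "SPPPPYVYKSPPPPKHYSPPPPVYK"

past = percent_bias(toy, "PAST")       # bias toward Pro, Ala, Ser, Thr
psky = percent_bias(toy, "PSKY")       # bias toward Pro, Ser, Lys, Tyr
sp_runs = count_motif(toy, r"SP{3,}")  # Ser followed by three or more Pro

print(f"%PAST={past:.1f} %PSKY={psky:.1f} SPn runs={sp_runs}")
```

A real analysis would also need to handle signal peptides, GPI-anchor signals, and ambiguous residues, none of which are addressed here.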

Figures

Figure 1.
Number of HRGP sequences identified by MAAB in each HRGP class by 1KP group. The number of data sets per clade is shown in parentheses after the 1KP group name. All MAAB classes are reported for multiple k-mers (multk: 39, 49, 59, 69; unshaded columns). No sequences were detected for HRGP class 14. Selected data are reported for assemblies with k-mer = 25 (k25; shaded columns). Red numbers indicate that the sequences detected within this class and 1KP group are likely to be contaminants (see “Results and Discussion”).
Figure 2.
Summary of HRGP sequences identified using the MAAB pipeline. A, Percentage of total HRGP sequences (by 1KP group) that are found in the classical HRGP classes (1–4), hybrid HRGP classes (5–23), and non-HRGPs (class 24). B, Number of orders analyzed and number of samples per order for each 1KP group. The monocot group excludes the commelinid monocots. C, Overview of the mean number of HRGPs for each MAAB class in the 1KP data set (by 1KP group). Shading of boxes represents the detection rate, indicated as the percentage of orders with hits (Supplemental Table S2). A shaded box showing a detection of HRGPs with a value of zero indicates an average number of sequences between 0 and 0.1. Red numbers indicate that the sequences detected within this HRGP class and 1KP group are likely to be contaminants (see “Results and Discussion”).
Figure 3.
Distribution and number of CL-EXTs (class 2) in the bryophytes. MAAB detected CL-EXTs in all hornwort species in the 1KP data, whereas the distribution in mosses and liverworts shows either loss of, or multiple independent gains of, CL-EXTs. TRAL of repeats identified in class 2 and class 24 (control) sequences were used to search the MAAB input sequences and identify repeats present in the bryophyte lineages. There is good correlation between the species with CL-EXTs identified by MAAB and those with SPn/Y-based repeats identified by TRAL using class 2 repeats. No CL-EXTs were identified using TRAL repeats from class 24 sequences. The phylogenetic tree is based on Cole and Hilger (2013) and was selected because it includes all bryophyte orders.
Figure 4.
A, Mean number of MAAB class 2 CL-EXT sequences identified using the MAAB pipeline within each family of monocots/commelinids. B, Many families lack CL-EXTs, but this is not due to poor-quality transcriptomes, as BUSCO analysis shows that the percentage of 956 single-copy orthologous genes found in the monocot/commelinid families is high (average ∼80%). Family data (k-mer 39 only) are summarized in a box-and-whisker plot after the removal of outliers (two Poaceae spp., 59.4 and 65.7; and one Araceae sp., 18.1). C, The mean number of class 1 GPI-AGPs was evaluated, and these are present in all families that lack CL-EXTs. Only Cannaceae and Restionaceae have both GPI-AGPs and CL-EXTs, although for most families, the number of data sets is low (one to three). The total number of data sets per family is indicated in parentheses.
Figure 5.
Maximum likelihood (ML) tree of Brassicales 1KP and Arabidopsis GPI-AGPs (A) and CL-EXTs (B). ML trees generally show strong support for subclades with Arabidopsis sequences and 1KP sequences from family Brassicaceae. Putative GPI-AGP subclades (A) are separated by a horizontal dotted line, and the Arabidopsis orthologs are indicated by boxes (right of tree). Sequences from the Brassicaceae are in green, with Arabidopsis sequences in larger, boldface font. AtEXT3 was included with CL-EXTs (B) even though it was classified as class 20 because of its shared bias (less than 2% difference [Δ]) between percentage PSKY (74.2%) and percentage PVYK (73.1%; Johnson et al., 2017). Numbers on the nodes represent support with 100 bootstrap replicates (70 or greater, green; 60–69, orange; 40–59, black). 1KP data sets (1KP locus identifier and family name) are shown on the Brassicales tree (inset [Stevens, 2001]; version 12, July 2012, http://www.mobot.org/mobot/research/apweb/). Scale bars for branch length measure the number of substitutions per site. Sequences used to generate the trees are shown in: GPI-AGPs Brassicales (Supplemental Fig. S5A) and CL-EXTs Brassicales (Supplemental Fig. S5B).
Figure 6.
Phylogenetic analysis of class 1 GPI-AGPs to identify AtAGP6/11 orthologs. The ML tree (MEGA) was constructed using putative AtAGP6/11 orthologs identified predominantly from 1KP transcriptomes and HMMER model 1 (Supplemental Table S3), with a few sequences from other sources to increase the breadth of sampling (see “Materials and Methods”; Supplemental Table S4). The tree also includes at least one sequence from each Arabidopsis GPI-AGP subclade, representative rice GPI-AGPs identified by MAAB (see Fig. 2 in Johnson et al., 2017), and four GPI-AGPs from A. trichopoda (Amtri_ERM96654, Amtri_ERM95342, Amtri_ERN01113, and Amtri_ERN06202). Sequence names are colored by 1KP group: basal angiosperms (gray), noncommelinid monocots (purple), commelinid monocots (pink), basal eudicots (aqua), core eudicots (green), asterids (red), and rosids (orange). Numbers on the nodes represent support with 100 bootstrap replicates (70 or greater, green; 60–69, orange; 40–59, black). 1KP sequences are identified by Group_Order_1KP identifier_sequence locus. Genomic sequences and other sequences from NCBI follow a similar format but include a five-letter abbreviation of genus and species. Symbols next to sequence names are used to indicate the source and MAAB class of sequences and other relevant information. An asterisk after the sequence name indicates that additional information is provided in Supplemental Table S4; for example, the YNXR (Poales) data set is contaminated with Asparagales, although the ML tree suggests that these sequences are indeed Poales. The scale bar for branch length measures the number of substitutions per site.
Figure 7.
Schematic representation of the GPI-AGP subclades showing the presence and distribution of specific amino acids in the PAST-rich protein backbone. The order of GPI-AGPs is based on clades in the angiosperm GPI-AGP tree (Supplemental Fig. S3); figure parts A to K are based on alignments (Supplemental Fig. S4), and the AtAGP5/10 (G) subclade is included for comparison (based on the AGP-c subclade; see Fig. 2C in Johnson et al., 2017). A, Putative orthologs of Lys-rich AtAGP17/18 have scattered Lys (K) residues throughout the sequence and contain a short K-rich domain near the C terminus. The glycomotifs are predominantly XP1-2. B, The AGP9 subclade also has a short Lys-rich region with three or more K residues and scattered K residues; however, the glycomotifs are distinct from AtAGP17/18, being predominantly [S/T]P3. C, D, and G to I, AGP4/7/10 (C), AGP2/3 (D), AGP5/10 (G), AGP5 (H), and AGP1 (I) subclades have a classical PAST-rich backbone with no obvious bias toward other amino acids. E, Most of the putative orthologs of AtAGP58 are relatively Met (M) rich, with scattered M residues between AGP glycomotifs and a Q residue at the presumed N terminus. This Q residue at the presumed mature N terminus also is found in putative orthologs of Lys-rich AtAGP17/18 (A), AtAGP9 (B), AtAGP1/2/3/4/7/5/10 (C, D, and G–I), and AtAGP58 (E) but not AtAGP25/26/27 (F) or AGP6/11/59 (J/K). J/K, Putative orthologs of AtAGP6/11/59 share scattered K residues, concentrated in the first half of the protein and also a small cluster of acidic residues, either Asp (D) or Glu (E).
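Two conventions from the captions for Figures 5 and 6 can be written as small helpers: the color bands used for bootstrap support, and the "shared bias" comparison applied to AtEXT3 (less than 2% difference between percentage PSKY and percentage PVYK). This is a minimal sketch; the function names are hypothetical, and returning `None` for support below 40 is an assumption, since the captions do not say how such nodes are shown.

```python
from typing import Optional


def support_colour(bootstrap: int) -> Optional[str]:
    """Map a bootstrap value (out of 100 replicates) to the caption's color band."""
    if bootstrap >= 70:
        return "green"
    if bootstrap >= 60:
        return "orange"
    if bootstrap >= 40:
        return "black"
    return None  # assumption: the captions give no color below 40


def shared_bias(pct_a: float, pct_b: float, max_delta: float = 2.0) -> bool:
    """True when two bias percentages differ by less than `max_delta` points."""
    return abs(pct_a - pct_b) < max_delta


# AtEXT3 values quoted in the Figure 5 caption: %PSKY 74.2, %PVYK 73.1.
print(shared_bias(74.2, 73.1))
print(support_colour(65))
```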
