Proc Natl Acad Sci U S A. 2009 Aug 11;106(32):13421-6.
doi: 10.1073/pnas.0905818106. Epub 2009 Jul 28.

Darwinian selection for sites of Asn-linked glycosylation in phylogenetically disparate eukaryotes and viruses

Jike Cui et al. Proc Natl Acad Sci U S A. 2009.

Abstract

Numerous protists and rare fungi have truncated Asn-linked glycan precursors and lack N-glycan-dependent quality control (QC) systems for glycoprotein folding in the endoplasmic reticulum. Here, we show that the abundance of sequons (NXT or NXS), which are sites for N-glycosylation of secreted and membrane proteins, varies by more than a factor of 4 among phylogenetically diverse eukaryotes, based on a few variables. There is positive correlation between the density of sequons and the AT content of coding regions, although no causality can be inferred. In contrast, there appears to be Darwinian selection for sequons containing Thr, but not Ser, in eukaryotes that have N-glycan-dependent QC systems. Selection for sequons with Thr, which nearly doubles the sequon density in human secreted and membrane proteins, occurs by an increased conditional probability that Asn and Thr are present in sequons rather than elsewhere. Increasing sequon densities of the hemagglutinin (HA) of influenza viruses A/H3N2 and A/H1N1 during the past few decades of human infection also result from an increased conditional probability that Asn, Thr, and Ser are present in sequons rather than elsewhere. In contrast, there is no selection on sequons by this mechanism in HA of A/H5N1 or 2009 A/H1N1 (Swine flu). Very strong selection for sequons with both Thr and Ser in glycoprotein of M(r) 120,000 (gp120) of HIV and related retroviruses results from this same mechanism, as well as amino acid composition bias and increases in AT content. We conclude that there is Darwinian selection for sequons in phylogenetically disparate eukaryotes and viruses.
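As an illustration of the quantity discussed in the abstract, the following minimal sketch counts sequons (NXT or NXS) in a protein sequence and reports their density per 500 aa. It assumes that X may be any residue except Pro, as stated in the legend to Fig. 1E; the function name `sequon_density` and the overlapping-match convention are illustrative assumptions, not the authors' published pipeline.

```python
import re

# Asn, then any residue except Pro, then Thr or Ser.
# The lookahead lets overlapping sequons (e.g. "NNTS") be counted separately.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_density(protein: str, per: int = 500) -> float:
    """Return the number of sequons per `per` residues (hypothetical helper)."""
    seq = protein.upper()
    if not seq:
        return 0.0
    count = sum(1 for _ in SEQUON.finditer(seq))
    return count * per / len(seq)
```

For example, `sequon_density("NATNPSNGS")` counts NAT and NGS but skips NPS, giving 2 sequons in 9 residues.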


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
The density of sequons is positively correlated with the AT richness of the coding regions of each eukaryote. (A) Sequon densities (average number of sequons per 500 aa plus SD) for the secreted and membrane proteins are similar among metazoans and fungi but are much more variable among protists. (B) AT contents of coding regions of predicted secreted and membrane proteins are also similar among metazoans and fungi but are much more variable among protists. (C) Sequon density is not directly related to length of N-glycan precursors, where metazoans are numbered [Anopheles gambiae (1), Caenorhabditis elegans (2), Canis familiaris (3), Ciona intestinalis (4), Danio rerio (6), Drosophila melanogaster (5), Homo sapiens (7), Mus musculus (8), and Tetraodon nigroviridis (9)]. Fungi are in lowercase [Antonospora locustae (a), Aspergillus nidulans (b), Candida albicans (c), Cryptococcus neoformans (d), Encephalitozoon cuniculi (e), Gibberella zeae (f), Kluyveromyces lactis (g), Magnaporthe grisea (h), Neurospora crassa (i), Saccharomyces cerevisiae (j), Schizosaccharomyces pombe (k), Ustilago maydis (m), and Yarrowia lipolytica (n)]. Protists are in uppercase [Cryptosporidium parvum (A), Dictyostelium discoideum (B), Entamoeba histolytica (C), Giardia lamblia (D), Leishmania major (E), Plasmodium falciparum (F), Theileria annulata (G), Trypanosoma cruzi (H), and Trichomonas vaginalis (J)]. One plant (Arabidopsis thaliana) is marked with a plus sign. Eukaryotes that have N-glycan-dependent QC of glycoprotein folding are marked in blue. Eukaryotes that lack N-glycan-dependent QC of glycoprotein folding are marked in red. (D) Sequon density is positively correlated with AT content of secreted and membrane proteins of all eukaryotes (R² values are 0.68 and 0.89 for blue and red lines, respectively). An analysis of variance shows that AT content accounts for 63% of the variance, whereas N-glycan-dependent QC accounts for 11%. The percentage of predicted secreted and membrane proteins with at least 1 sequon is also correlated with the AT richness (Fig. S1). In addition, when AT content is ≤55%, the sequon densities of secreted proteins of eukaryotes with N-glycan-dependent QC (marked in blue) are significantly greater than those of eukaryotes without QC (marked in red) by a rank-sum test at α = 5%. (E) Sequon density is positively correlated with AT content, because Asn is encoded by AA(TC), whereas Pro, which cannot be in sequons, is encoded by CC(AGCT) (R² values are 0.91 and 0.71 for Asn and Pro, respectively).
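The codon argument in panel E can be checked with a short sketch that computes the AT fraction of a coding sequence and compares the Asn codons AA(TC) with the Pro codons CC(AGCT). The helper names `at_content` and `mean_codon_at` are hypothetical, and the unweighted mean over synonymous codons is a simplifying assumption (real genomes show codon usage bias).

```python
def at_content(cds: str) -> float:
    """Fraction of A or T among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = cds.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("A") + seq.count("T")) / acgt


# Codons as given in the legend: Asn = AA(TC), Pro = CC(AGCT).
ASN_CODONS = ("AAT", "AAC")
PRO_CODONS = ("CCA", "CCG", "CCC", "CCT")


def mean_codon_at(codons) -> float:
    """Unweighted mean AT fraction over a set of synonymous codons (an assumption)."""
    return sum(at_content(codon) for codon in codons) / len(codons)
```

Under this unweighted assumption, Asn codons average 5/6 AT and Pro codons average 1/6 AT, which is consistent with AT-rich coding regions favoring Asn over Pro.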
Fig. 2.
Darwinian selection for sequons with Thr in the secreted and membrane proteins of eukaryotes with N-glycan-dependent QC of protein folding is based for the most part on an increased conditional probability that Asn and Thr will be present in sequons rather than elsewhere in these proteins. (A) Density of sequons with Thr (number per 500 aa) in secreted and membrane proteins versus nucleocytosolic proteins (negative controls) for each organism; organisms are abbreviated as in Fig. 1. In eukaryotes that have N-glycan-dependent QC of glycoprotein folding (marked in blue), there is moderately strong selection for sequons with Thr, so that the slope of the blue line is 1.7 (R² = 0.46) rather than 1. In contrast, eukaryotes that lack N-glycan-dependent QC of glycoprotein folding (marked in red) show no selection, so the slope of the red line is 1.1 (R² = 0.91). (B) Density of sequons with Ser (number per 500 aa) in secreted and membrane proteins versus nucleocytosolic proteins for each organism. There is no selection for or against sequons with Ser in eukaryotes with or without N-glycan-dependent QC, so that all points fall on the dotted line with a slope of 1 and an intercept of 0. (C) The mechanism of positive selection for sequons with Thr is shown by plotting the actual density of sequons with Thr in secreted and membrane proteins versus that calculated from the Asn, Thr, and Pro frequencies for each organism (i.e., the expected density). In eukaryotes with N-glycan-dependent QC (marked in blue), there is an increased conditional probability that Asn and Thr will be in sequons rather than elsewhere in secreted proteins, so that the slope of the blue line is 1.5 (R² = 0.74). In contrast, there is no increased conditional probability that Asn and Thr will be in sequons rather than elsewhere in secreted proteins of eukaryotes without N-glycan-dependent QC, so that the slope of the red line is 1.0 (R² = 0.98). In Fig. S2, a small positive selection for amino acid composition bias and negative results for AT-content bias are shown.
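The comparison of actual versus expected density in panel C can be sketched as follows. The legend does not give the formula used, so this example assumes the simplest independence model: the expected density of sequons with Thr is the product of the Asn frequency, the non-Pro frequency, and the Thr frequency, scaled to 500 aa. Both function names are hypothetical.

```python
import re
from collections import Counter


def expected_thr_sequon_density(protein: str, per: int = 500) -> float:
    """Expected N-X-T density if residues were placed independently (assumed model)."""
    seq = protein.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq)
    f_asn = counts["N"] / n
    f_thr = counts["T"] / n
    f_pro = counts["P"] / n
    return per * f_asn * (1 - f_pro) * f_thr


def observed_thr_sequon_density(protein: str, per: int = 500) -> float:
    """Counted N-X-T density, with X any residue except Pro; overlaps are counted."""
    seq = protein.upper()
    if not seq:
        return 0.0
    count = len(re.findall(r"(?=N[^P]T)", seq))
    return count * per / len(seq)
```

An observed density above the expected one for a set of proteins would correspond to the "increased conditional probability" described in the legend; a ratio near 1 would correspond to no selection by this mechanism.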
Fig. 3.
Increasing sequon densities of HA of A/H3N2 (Left) and A/H1N1 (Center) strains of influenza virus with antigenic drift result from an increased conditional probability that Asn, Thr, and Ser will be present in sequons rather than elsewhere in HA. Selection for sequons (solid arrow) based on this mechanism, which is determined by comparing actual (solid pink triangles) versus calculated or expected (open pink triangles) sequon densities for HA, increases with time. As a control, there is no selection for sequons in viral capsid and polymerases (capsidic proteins), where the observed density of sequons (solid blue circles) equals the expected density (open blue circles). In contrast, amino acid composition bias (white arrow), which is determined by comparing the expected sequon density of HA (open pink triangles) with that of capsid and polymerases of influenza viruses (open blue circles), remains the same with time. The HA proteins of A/H5N1 and 2009 A/H1N1 (Right) show modest selection based on amino acid composition bias but do not show selection based on an increased conditional probability of Asn, Thr, and Ser being present in sequons rather than elsewhere in HA (20, 25). Changes in the amino acid sequences of A/H3N2 influenza proteins with time are shown in Fig. S3.
Fig. 4.
Very strong positive selection for sequons with Thr and Ser in gp120 of HIV and other retroviruses results from an increased conditional probability that Asn, Thr, and Ser will be present in sequons rather than elsewhere in gp120, from amino acid composition bias, and from changes in AT content. (A) Density of sequons per 500 aa in gp120 versus capsid proteins and enzymes (negative controls) for various retroviruses, which are abbreviated and color-coded as follows: HIV-1 strains (A–O) are marked with lowercase blue letters, HIV-2 strains are marked with uppercase red letters, and other lentiviruses are marked with green numbers. Very strong selection for sequons in gp120 of all of the retroviruses dwarfs the relatively modest selection for sequons in host-secreted and membrane proteins (marked with a red plus sign). (B) An important mechanism for positive selection for sequons in gp120 of retroviruses, which is based on an increased conditional probability that Asn, Thr, and Ser will be present in sequons rather than elsewhere in gp120, is shown by plotting the counted density of sequons in gp120 versus that calculated (expected value) from the Asn, Thr, Ser, and Pro content of gp120. (C) Amino acid composition bias, which increases the number of sequons in gp120, is shown by plotting the calculated sequon density of gp120 versus that of capsid and enzymes. Although Asn, Ser, and Thr are relatively increased in gp120 versus retroviral capsid and enzymes, Pro is decreased (Fig. S4). (D) AT content of the gp120 coding sequence versus the rest of the coding sequence of the retrovirus shows there is moderate positive selection for AT in gp120s of all retroviruses examined. Fig. S5 shows there is no change in sequon densities of gp120 of HIV strains A1, B, C, and D with time.

References

    1. Helenius A, Aebi M. Roles of N-linked glycans in the endoplasmic reticulum. Annu Rev Biochem. 2004;73:1019–1049.
    2. Samuelson J, et al. The diversity of protist and fungal dolichol-linked precursors to Asn-linked glycans likely results from secondary loss of sets of glycosyltransferases. Proc Natl Acad Sci USA. 2005;102:1548–1553.
    3. Kornfeld R, Kornfeld S. Assembly of asparagine-linked oligosaccharides. Annu Rev Biochem. 1985;54:631–664.
    4. Apweiler R, Hermjakob H, Sharon N. On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim Biophys Acta. 1999;1473:4–8.
    5. Ben-Dor S, Esterman N, Rubin E, Sharon N. Biases and complex patterns in the residues flanking protein N-glycosylation sites. Glycobiology. 2004;14:95–101.
