Front Microbiol. 2019 Apr 26:10:781.
doi: 10.3389/fmicb.2019.00781. eCollection 2019.

Agglutinin-Like Sequence (ALS) Genes in the Candida parapsilosis Species Complex: Blurring the Boundaries Between Gene Families That Encode Cell-Wall Proteins

Soon-Hwan Oh et al. Front Microbiol. 2019.

Abstract

The agglutinin-like sequence (Als) proteins are best characterized in Candida albicans and are known for their role in adhesion of the fungal cell to host and abiotic surfaces. ALS sequences are often misassembled in whole-genome sequence data because each species has multiple ALS loci that contain similar sequences, most notably tandem copies of highly conserved repeated sequences. The Candida parapsilosis species complex includes Candida parapsilosis, Candida orthopsilosis, and Candida metapsilosis, three distinct but closely related species. Using publicly available genome resources, de novo genome assemblies, and laboratory experimentation including Sanger sequencing, five ALS genes were characterized in C. parapsilosis strain CDC317, three in C. orthopsilosis strain 90-125, and four in C. metapsilosis strain ATCC 96143. The newly characterized ALS genes shared similar features with the well-known C. albicans ALS family, but also displayed unique attributes such as novel short, imperfect repeat sequences that were found in other genes encoding fungal cell-wall proteins. Evidence of recombination between ALS sequences and other genes was most obvious in CmALS2265, which had the 5' end of an ALS gene and the repeated sequences and 3' end from the IFF/HYR family. Together, these results blur the boundaries between the fungal cell-wall families that were defined in C. albicans. TaqMan assays were used to quantify relative expression for each ALS gene. Some measurements were complicated by the assay location within the ALS gene. Considerable variation was noted in relative gene expression for isolates of the same species. Overall, however, there was a trend toward higher relative gene expression in saturated cultures than in younger cultures. This work provides a complete description of the ALS genes in the C. parapsilosis species complex and a toolkit that promotes further investigations into the role of the Als proteins in host-fungal interactions.

Keywords: ALS family; Candida species; adhesion; agglutinin-like sequence genes; cell-wall proteins; fungi.

Figures

FIGURE 1
Schematic of chromosomal locations of ALS genes in C. parapsilosis, C. orthopsilosis, and C. metapsilosis, drawn to scale. (A) Location of contiguous ALS genes on C. parapsilosis contig 006372, C. orthopsilosis chromosome 3, and contig 3 from the C. metapsilosis genome assembly PQNC00000000. ALS gene names are in bold italics type. Arrows show the direction of transcription for each gene. Flanking sequences were identified to indicate gene order conservation among the three species. Names of the flanking ORFs were taken from the Candida Genome Database (www.candidagenome.org) and/or GenBank. (B) Location of genes CpALS0660 (C. parapsilosis contig 006139), CoALS800 (C. orthopsilosis chromosome 2), and CmALS800 (C. metapsilosis contig 2). Features were drawn similarly to (A). (C) Location of the fourth C. metapsilosis ALS gene (CmALS2265) on contig 6. This region was orthologous to C. parapsilosis contig 005504 and C. orthopsilosis chromosome 6 but contained approximately 16 kb of extra sequence. The region contained the reading frame for CmIff61 (GenBank accession number MK205443) that encoded a 341-amino acid protein not present in the other species. Gene names were shown above other coding regions. ??? encoded a hypothetical protein (CPAR2_601270; CORT_0F02270) that was conserved in all three species, but shorter in C. metapsilosis. MAP2 = CPAR2_601280, CORT_0F02280; SBA1 = CPAR2_601260, CORT_0F02260; MRP4 = CPAR2_601250, CORT_0F02250. Some genes in this region (ALA1, MRP4, SBA1) were reminiscent of those that are located near CaALS5, CaALS1, and CaALS9 on C. albicans chromosome 6 (Zhao et al., 2003).
FIGURE 2
Schematic of proteins predicted from the ALS genes of the C. parapsilosis species complex. Each predicted protein had a secretory signal peptide, a classical NT-Als domain with eight conserved Cys residues to direct folding, an amyloid-forming region (AFR; Garcia et al., 2011), a Thr-rich sequence (T domain), a C-terminal domain rich in Ser/Thr, and a signal for addition of a GPI anchor that directs the mature protein to a final localization linked to β-1,6-glucan in the fungal cell wall (Lu et al., 1994). Only 5 of the 12 proteins included a central domain of tandemly repeated sequences like those found in C. albicans Als proteins (Hoyer et al., 2008). Different colors of the repeated units indicated differences in consensus sequence; shading within the same protein indicated repeat units that varied in the number of amino acids. The other proteins had regions of short, imperfect repeated sequences such as Ser-Ser-Ser-Glu-Pro-Pro (SSSEPP) and/or Gly-Ser-Gly-Asn (GSGN). CmAls2265 had the NT-Als domain attached to tandemly repeated sequences from the Iff/Hyr family (Bates et al., 2007; Boisramé et al., 2011) and a C-terminal region more-characteristic of Iff/Hyr proteins than C. albicans Als proteins.
FIGURE 3
Phylogenetic tree of ALS sequences from C. albicans and the C. parapsilosis species complex. The tree was drawn using a ML analysis after removing ambiguous regions using Gblocks. Thickened branches indicated Bayesian posterior probabilities >95% while the numbers above the branches were bootstrap values >50%. The tree showed strong support for many branches.
FIGURE 4
Structural comparison among peptide-binding cavities (PBCs) of a model Als protein and selected proteins from the C. parapsilosis species complex. (A) Side-by-side comparison of the C1 binding pocket of CaAls9-2 (left, green carbons; Protein Data Bank ID 2Y7N; Salgado et al., 2011) and a structural model of CpAls4790 (right, gray carbons) bound to fibrinogen gamma peptide (magenta carbons). The pocket was more constrained in CpAls4790 by the Val22Arg substitution, suggesting a more-selective binding activity than present in CaAls9-2. (B) Schematic of the NT-Als PBC showing the pockets surrounding amino acids of the bound peptide ligand (C1 through C5, with C1 proximal to the invariant Lys at the bottom of the PBC). Only residues which form the binding pocket were shown. Example proteins (CpAls4790, CoAls4210, CmAls2265) were selected from diverse parts of the phylogenetic tree in Figure 3. Side chain residues were indicated as circles and color-coded green (medium to large hydrophobic), light green (small hydrophobic), dark blue (positively charged), orange (negatively charged), or light blue (hydrophilic), and numbered to indicate their position within each protein. Residues with an asterisk were part of two pockets and the sequence was repeated for each pocket. The schematic highlights conserved positions (e.g., Y21, R171, W294) and illustrates variability found throughout the PBC.
FIGURE 5
Conservation of tandemly repeated sequence units and short, imperfect repeats in various C. albicans and C. parapsilosis species complex proteins. (A) Consensus tandem repeat sequences were derived for C. albicans Als3 (CaAls3; GenBank accession number AY223552), CaAls5 (AY227440), and CaAls9-1 (AY269423). In C. albicans, each tandemly repeated unit was a perfect 36-aa length. Consensus sequences required that 80% or more of the amino acids were identical at each position. A similar approach was taken to aligning the consensus tandem repeat units from proteins from C. parapsilosis, C. orthopsilosis, and C. metapsilosis. GenBank accession numbers were shown in Table 3. In these proteins, the 36-amino acid repeat unit length was not necessarily conserved in each copy as color coded in Figure 2. The consensus repeat unit from CmAls2265 was 42 amino acids and different from the other Als proteins. (B) Alignment of consensus tandemly repeated sequence units from CmAls2265 and proteins from the Iff family. Repeat unit length was not necessarily conserved among the proteins, but a consensus sequence emerged from comparison among multiple proteins. GenBank accession numbers for the proteins were shown in Supplementary Table S4. (C) Top panel: C. parapsilosis species complex proteins that encode the short, imperfect SSSEPP (blue)/SSEPP (green)/SSEP (gray) repeated motif. Lower panel: Proteins that encode the GSGN (yellow) repeated motif. Protein identifiers were noted at the left of each sequence. The sequence labeled CmContig4 (GenBank accession number MK215077) most closely matched CORT_0E05950, which was annotated as Cdc24 GDP-GTP exchange factor, an unlikely identification of its function. Other proteins were labeled with identifiers from GenBank sequences or from the proteins listed in Supplementary Table S4. Names for proteins that had both SSEP/SSSEPP and GSGN short, imperfect repeats were listed in bold type in the top and lower panels.
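The 80% identity rule described in (A) can be illustrated with a minimal sketch. This is not the authors' procedure or software: the function name `consensus`, the placeholder character `x` for positions that fall below the threshold, and the toy repeat units are all assumptions made for illustration, and the units are assumed to be pre-aligned to equal length.

```python
from collections import Counter


def consensus(repeats, min_percent=80, unknown="x"):
    """Derive a consensus from aligned, equal-length repeat units.

    A residue is reported at a position only if at least `min_percent`
    of the units carry it there; otherwise `unknown` is emitted.
    (Hypothetical helper; the placeholder character is an assumption.)
    """
    if not repeats:
        raise ValueError("need at least one repeat unit")
    length = len(repeats[0])
    if any(len(unit) != length for unit in repeats):
        raise ValueError("repeat units must be aligned to the same length")

    out = []
    for column in zip(*repeats):
        residue, count = Counter(column).most_common(1)[0]
        # Integer comparison avoids floating-point edge cases at exactly 80%.
        if count * 100 >= min_percent * len(column):
            out.append(residue)
        else:
            out.append(unknown)
    return "".join(out)


# Invented toy units (not sequences from the paper): 4 of 5 agree at position 2.
toy_units = ["ACDE", "ACDE", "ACDE", "ACDE", "AQDE"]
print(consensus(toy_units))                  # ACDE
print(consensus(toy_units, min_percent=90))  # AxDE
```

Repeat units whose length is not conserved, as noted for the C. parapsilosis species complex proteins, would need to be aligned with gaps before a column-wise rule like this could be applied.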
FIGURE 6
Examples of allelic sequence variation among ALS genes in the C. parapsilosis species complex. (A) The representative genome sequence for C. parapsilosis CDC317 (GenBank accession ASM18276v2) presented ALS genes in a highly accurate manner. Sanger sequencing in the current study revealed many sequence polymorphisms in the diploid C. parapsilosis ALS genes that were not reflected in the haploid GenBank assembly. However, there was only one notable example where Sanger sequence varied from the GenBank genome: CpALS4780 had 12 fewer nucleotides in the region of the gene that encodes the GSGN short, imperfect repeat. (B) Regions of repeated sequences were often variable in length as shown in this example of CpALS4770 alleles from various C. parapsilosis isolates (949, CDC317, 1125, 950). Manual alignment of the amino acid sequences underscored the different length of CpALS4770 alleles in this region, leading to differing numbers of copies of the SSSEPP short, imperfect repeat. (C) The new C. metapsilosis genome assembly resolved into alleles and revealed previously unrecognized allelic variation that affected data interpretation. Primers to amplify C. metapsilosis ALS genes initially were designed from the representative genome sequence in GenBank (CBZN0200000000). The original forward primer to amplify CmALS4210 (highlighted in green) amplified only allele 1 due to mismatches and extra nucleotides in the sequence for allele 2. Re-design of the primer (highlighted in blue) led to successful amplification of the CmALS4210-2 allele, distinguishing it from CmALS4210-1. The CmALS4210 start codon was highlighted in yellow. (D) Alignment between amino acid sequences predicted from the CmALS4220 alleles. Considerable sequence variation between the alleles was observed including an expanded repeated sequence (Gly-Ala-Thr; GAT) in CmAls4220-1. These differences were predicted by the new C. metapsilosis genome sequence and verified by Sanger sequencing. 
(E) Allelic variation between strains of C. metapsilosis that affected interpretation of TaqMan assay data (see below). TaqMan assays were designed from the sequence of C. metapsilosis strain ATCC 96143. Nucleotide polymorphisms at the 3′ end of CmALS2265 in strain 61 resulted in reduced detection by the TaqMan assay. The stop codon for CmALS2265 was highlighted in yellow. The mismatches between the sequences for the forward primer (green), reverse primer (blue), and probe (gray) were sufficient to severely under-estimate relative gene expression for CmALS2265 in strain 61, suggesting that the gene was barely transcribed.
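The kind of check described in (C) and (E), comparing a primer or probe against the target site in another allele or strain, can be sketched as a position-by-position mismatch count. The function names and the toy sequences below are hypothetical and are not taken from the paper. The sketch assumes the primer and the target site are already aligned and of equal length, so it does not handle the extra nucleotides mentioned in (C).

```python
def mismatch_positions(primer, target_site):
    """Return zero-based positions where primer and target site differ.

    Both sequences are assumed to be pre-aligned, in the same orientation,
    and of equal length (hypothetical helper for illustration).
    """
    if len(primer) != len(target_site):
        raise ValueError("primer and target site must have the same length")
    return [
        i
        for i, (p, t) in enumerate(zip(primer.upper(), target_site.upper()))
        if p != t
    ]


def count_mismatches(primer, target_site):
    """Number of mismatched positions between primer and target site."""
    return len(mismatch_positions(primer, target_site))


# Invented sequences, for illustration only.
primer = "ACGTACGTAC"
site_in_other_strain = "ACGTTCGTAA"
print(mismatch_positions(primer, site_in_other_strain))  # [4, 9]
print(count_mismatches(primer, site_in_other_strain))    # 2
```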
FIGURE 7
Demonstration of TaqMan assay specificity. TaqMan assays were selected for quantification of relative gene expression because of their exquisite specificity, which was needed to distinguish between highly similar loci in the same species. The most-extreme example was from C. orthopsilosis where sequences in the 5′ end of the gene were >75% identical. (A) Nucleotide sequences that were the most dissimilar in the 5′ end of CoALS4210, CoALS4220, and CoALS800. Yellow highlighting marked positions where at least two of the three sequences were identical. Asterisks marked the positions that were identical in all three sequences. In each sequence, the forward TaqMan primer sequence was underlined and shown in bold type. The TaqMan probe sequence was double underlined; each was in the sense orientation. The TaqMan reverse primer was underlined in bold, red type. (B) Ct values from TaqMan assays using cloned plasmid DNA as the reaction template. Dilutions of plasmid DNA were added to replicate TaqMan assays where the assay and template matched or were mismatched to gauge assay specificity. Ct values were recorded in the table and demonstrated recognition of the plasmid only when matched with the correct assay. For example, 5 ng of the cloned 5′ end of CoALS4210 gave a Ct value of 11.3 when assayed with the CoALS4210 TaqMan primers and probe. Undetectable signal (U/D) or a signal that was nearly undetectable (cycle 36.1) were recorded when the same amount of plasmid DNA was tested with the CoALS4220 or CoALS800 TaqMan assays. The data clearly demonstrated specificity of the TaqMan assays for their intended targets, even among genes as similar as the ALS genes of C. orthopsilosis.
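To put the Ct values quoted in (B) into perspective, the sketch below converts the difference between the matched assay (Ct 11.3) and the nearly undetectable mismatched signal (Ct 36.1) into an approximate fold difference in detection. It assumes ideal amplification, a doubling of product per cycle, which is a simplifying assumption and not a value reported in the caption. The function name is hypothetical.

```python
def fold_difference(ct_matched, ct_mismatched, efficiency=2.0):
    """Return (delta Ct, approximate fold difference in detection).

    `efficiency` is the per-cycle amplification factor; 2.0 means perfect
    doubling, which is an idealized assumption.
    """
    delta_ct = ct_mismatched - ct_matched
    return delta_ct, efficiency ** delta_ct


# Ct values quoted in the caption for 5 ng of cloned CoALS4210 5' end.
delta_ct, fold = fold_difference(ct_matched=11.3, ct_mismatched=36.1)
print(f"delta Ct = {delta_ct:.1f} cycles")
print(f"approximate fold difference = {fold:.2e}")
```

Under that assumption, a gap of 24.8 cycles corresponds to a detection difference on the order of 10^7, which is consistent with the caption's description of the mismatched signal as nearly undetectable.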
FIGURE 8
Relative gene expression measured by TaqMan assays. (A) C. orthopsilosis strain 90-125 was grown in YPD for 1 h. RNA preparations from these cells were assayed by TaqMan or the SYBR Green method of Lombardi et al. (2019). Mean (± standard error of the mean) Ct values were reported. Results were shaded to indicate the intensity of relative gene expression. The color key at the bottom of the diagram was shaded from red (high relative gene expression) to purple (low relative gene expression) to facilitate comparison between results. CoALS4220 was expressed more highly than either of the other genes in the growth conditions tested. The SYBR Green method suggested a higher relative expression level for CoALS4210 than did the TaqMan assay. (B) CoALS4220 was used as a model gene to test the effect of cDNA synthesis priming method and TaqMan assay location on estimates of relative gene expression. Placement of the TaqMan assay at the 3′ end of the gene produced higher expression estimates than placement at the 5′ end of the gene. Priming with random hexamers produced a higher gene expression estimate than oligo dT priming. (C) Themes illustrated in (A,B) carried forward to a more-extensive analysis of C. orthopsilosis ALS gene expression. Strain 90-125 was grown for 1 h and 24 h in YPD medium. CoALS4220 was more-highly expressed than the other genes. Estimates of relative gene expression were higher for TaqMan assays at the 3′ end of the gene than at the 5′ end of the gene. (D) Relative expression of C. parapsilosis ALS genes in two strains (CDC317, ATCC 22019) grown in two different conditions (YPD for 24 h; SC with fetal bovine serum for 2 h). CpALS4790 was more-highly expressed than the other genes, even CpALS4800 which was proposed to arise from duplication of CpALS4790. (E) Relative expression of C. metapsilosis ALS genes in four strains (ATCC 96143, 61, 397, 482) in two different growth conditions (YPD for 24 h; RPMI 1640 for 1 h). 
Considerable strain variation was observed with genes that were relatively quiet in one isolate (e.g., CmALS4210 in strain 482) but extremely highly expressed in another (strain 397). In many instances, gene expression estimates from the 5′ end and 3′ end TaqMan assays provided similar results, although several notable exceptions were present. The apparent lack of CmALS2265 expression in strain 61 (as measured with the 3′ end assay) prompted examination of the gene sequence at the site of the TaqMan assay (Figure 6E). Sequence variation between the TaqMan assay and gene sequence in this region explained the falsely low estimate.

References

    1. Alfaro M. E., Zoller S., Lutzoni F. (2003). Bayes or bootstrap? A simulation study comparing the performance of Bayesian Markov chain Monte Carlo sampling and bootstrapping in assessing phylogenetic confidence. Mol. Biol. Evol. 20, 255–266. doi: 10.1093/molbev/msg028
    2. Applied Biosystems Inc. (2008). Guide to Performing Relative Quantitation of Gene Expression using Real-Time Quantitative PCR. Available at: https://www.thermofisher.com/document-connect/document-connect.html?url=... (accessed December 20, 2017).
    3. Bailey D. A., Feldmann P. J. F., Bovey M., Gow N. A. R., Brown A. J. P. (1996). The Candida albicans HYR1 gene, which is activated in response to hyphal development, belongs to a gene family encoding yeast cell wall proteins. J. Bacteriol. 178, 5353–5360.
    4. Bates S., de la Rosa J. M., MacCallum D. M., Brown A. J. P., Gow N. A. R., Odds F. C. (2007). Candida albicans Iff1, a secreted protein required for cell wall structure and virulence. Infect. Immun. 75, 2922–2928. doi: 10.1128/IAI.00102-07
    5. Bertini A., De Bernardis F., Hensgens L. A. M., Sandini S., Senesi S., Tavanti A. (2013). Comparison of Candida parapsilosis, Candida orthopsilosis, and Candida metapsilosis adhesive properties and pathogenicity. Int. J. Med. Microbiol. 303, 98–103. doi: 10.1016/j.ijmm.2012.12.006