Front Cell Infect Microbiol. 2021 Dec 14;11:794529.
doi: 10.3389/fcimb.2021.794529. eCollection 2021.

Using Genomics to Shape the Definition of the Agglutinin-Like Sequence (ALS) Family in the Saccharomycetales

Soon-Hwan Oh et al. Front Cell Infect Microbiol.

Abstract

The Candida albicans agglutinin-like sequence (ALS) family is studied because of its contribution to cell adhesion, fungal colonization, and polymicrobial biofilm formation. The goal of this work was to derive an accurate census and sequence for ALS genes in pathogenic yeasts and other closely related species, while probing the boundaries of the ALS family within the Order Saccharomycetales. Bioinformatic methods were combined with laboratory experimentation to characterize 47 novel ALS loci from 8 fungal species. AlphaFold predictions suggested the presence of a conserved N-terminal adhesive domain (NT-Als) structure in all Als proteins reported to date, as well as in S. cerevisiae alpha-agglutinin (Sag1). Lodderomyces elongisporus, Meyerozyma guilliermondii, and Scheffersomyces stipitis were notable because each species had genes with C. albicans ALS features, as well as at least one that encoded a Sag1-like protein. Detection of recombination events between the ALS family and gene families encoding other cell-surface proteins such as Iff/Hyr and Flo suggests widespread domain swapping with the potential to create cell-surface diversity among yeast species. Results from the analysis also revealed subtelomeric ALS genes, ALS pseudogenes, and the potential for yeast species to secrete their own soluble adhesion inhibitors. Information presented here supports the inclusion of SAG1 in the ALS family and yields many experimental hypotheses to pursue to further reveal the nature of the ALS family.

Keywords: ALS genes; adhesion; comparative genomics; fungi; protein structure; repeated sequences.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Phylogenetic tree showing relationships between fungal species used in this study. The tree was pruned from the genome-scale phylogeny of the kingdom Fungi developed by Li et al. (2021). The phylogeny was based on 290 concatenated sequences in 1644 species. C. metapsilosis was not part of the original analysis and was added to the tree based on its close relationship with C. parapsilosis and C. orthopsilosis (Tavanti et al., 2005). All species listed were Phylum Ascomycota, Subphylum Saccharomycotina, Class Saccharomycetes, Order Saccharomycetales. Vertical bars on the right of the image indicate Family designations according to the NCBI Taxonomy Database (https://www.ncbi.nlm.nih.gov/taxonomy). Brown = Family Saccharomycetaceae, Green = Family Pichiaceae, Purple = Family Debaryomycetaceae, and Blue = Family Metschnikowiaceae.
Figure 2
Schematic of the genome region that included LeALS2716 and LeALS2721 from (A) assembly ASM14968v1 and (B) assembly ASM1362098v1. (C) shows the region as represented in the Candida Genome Order Browser (CGOB; https://cgob.ucd.ie; Maguire et al., 2013) based on ASM14968v1 data. L. elongisporus information is circled in red; ORF numbers are shown in each rectangle and the direction of transcription indicated by the arrow below. ORFs 02718, 02719, and 02720 featured the IFF/HYR repeated sequences that were also found in LeALS2716 suggesting that ORF 02716 was longer than initially annotated. The large number of repeated sequences complicated genome sequence assembly in this region. The predicted size of LeALS2716 in ASM1362098v1 was greater than the final PCR-amplified/Sanger-sequenced fragment that was deposited into GenBank (accession number MN893370; green arrow).
Figure 3
Experimental and AlphaFold-predicted protein structures. (A) Crystallographic structure of C. albicans NT-Als3 (Protein Data Bank accession 4LE8) visualized using PyMOL. (B) C. albicans NT-Als3 structure predicted by AlphaFold from the 4LE8 amino acid sequence. AlphaFold structural predictions for the corresponding region of C. albicans NT-Als1 (C; 83% identical to NT-Als3), ClAls3274 (D; 33% identical to NT-Als3), ScSag1 (E; 25% identity), CauAls4498 (F; 31% identity), and CAGL0G04125g (G; 21% identity). An AlphaFold structural prediction was also completed for the N-terminal functional domain of C. albicans Hyr1 (H) and S. cerevisiae Flo1 (I) to demonstrate structural diversity among cell-surface proteins that contain a central domain of repeated sequences. The AlphaFold prediction for C. albicans NT-Als3 (B) recapitulated the known experimental structure (A; RMSD = 0.53 Å as calculated using PyMOL align). Predictions for molecules (B–G) produced the same general structure suggesting that all should be included in the Als protein family. Supplementary File S12 shows the structures of C. albicans NT-Als3 (B) and ScSag1 (E) aligned with the disulfide bonds highlighted.
Figure 4
Visualization of NT-Als sequence features from Supplementary Table S2. Each spot represents an Als protein; color coding matches Supplementary Table S2. C. albicans NT-Als3 has 8 Cys that create four disulfide bonds (Lin et al., 2014) while S. cerevisiae Sag1 has only 6 and is missing the C57-C133 disulfide bond that is present in NT-Als3 (Salgado et al., 2011). Most NT-Als proteins had 8 Cys like NT-Als3 (center column), some had 6 Cys like ScSag1 (left column), and others had varying numbers of Cys (range of 4 to 14; right column). Presence of the amyloid-forming region (AFR) in C. albicans Als proteins promotes protein aggregation (Ho et al., 2019). While many Als proteins had the expected AFR (strength and location; top row), others had a predicted weak AFR (second row) or none (bottom row). Some S. passalidarum Als proteins had a strong AFR 20-30 amino acids C-terminal to the expected location known in C. albicans (third row). It was unknown whether this alternative location contributed to aggregative potential. An invariant Lys in C. albicans NT-Als establishes a salt bridge with the C-terminal carboxylic acid of an incoming peptide ligand (Salgado et al., 2011). Arg (R) in this location may serve a similar function. Some predicted proteins did not have a positively charged amino acid in this position (X). A red asterisk indicates proteins featured in the structural predictions (Figure 3).
Figure 5
Relationships between NT-Als sequences depicted in a tree format. Information from Figure 4 (number of Cys in NT-Als, presence of the invariant Lys, nature of the AFR) is included using symbols at the left of the tree. Yellow hexagons on the right of the branches depict ortholog group information from Table 3. Colored dots indicating species follow the color scheme from Supplementary Table S2 and Figure 4. The scale bar represents substitutions per site.
Figure 6
Phylogenetic tree of Families in the Order Saccharomycetales. The tree was traced from the data of Li et al. (2021) to highlight the region of interest for the current study. Family names were those used by Li et al. (2021). Table 4 placed these names into the context of Family designations used on the NCBI Taxonomy Database (Schoch et al., 2020). The red dot denotes the common ancestor of the CUG-Ser1 clade and Saccharomycetaceae, the two groups in which ALS genes were most common (indicated by larger font). ALS genes were also detected in the Phaffomycetaceae.
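
The sequence features summarized in the Figure 4 caption (Cys count per NT-Als domain and the residue at the ligand-anchoring position) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names, the toy sequences, and the position passed to `ligand_anchor` are all hypothetical and chosen only for illustration; in practice the position would have to come from an alignment against C. albicans NT-Als.

```python
def cys_group(seq: str) -> str:
    """Group a sequence by Cys count, mirroring the three columns described for Figure 4."""
    n_cys = seq.upper().count("C")
    if n_cys == 8:
        return "8 Cys (NT-Als3-like)"
    if n_cys == 6:
        return "6 Cys (ScSag1-like)"
    return f"{n_cys} Cys (other)"


def ligand_anchor(seq: str, position: int) -> str:
    """Return K or R if the residue at a 1-based position is Lys or Arg, else X."""
    residue = seq.upper()[position - 1]
    return residue if residue in ("K", "R") else "X"


# Toy sequences invented for illustration only; they are not real Als proteins.
toy_a = "MCKACDCECFCGCHCIC"  # 8 Cys, Lys at hypothetical position 3
toy_b = "MCRACDCECFCGC"      # 6 Cys, Arg at hypothetical position 3
toy_c = "MCAACDCEC"          # 4 Cys, no positive charge at position 3

for name, seq in [("toy_a", toy_a), ("toy_b", toy_b), ("toy_c", toy_c)]:
    print(name, cys_group(seq), ligand_anchor(seq, 3))
```

Counting Cys residues only indicates how many disulfide bonds are possible; it does not show which bonds actually form, which is why the caption relies on structural data for that point.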
