Front Microbiol. 2021 Jan 20;11:594531.
doi: 10.3389/fmicb.2020.594531. eCollection 2020.

Pursuing Advances in DNA Sequencing Technology to Solve a Complex Genomic Jigsaw Puzzle: The Agglutinin-Like Sequence (ALS) Genes of Candida tropicalis


Soon-Hwan Oh et al. Front Microbiol.

Abstract

The agglutinin-like sequence (ALS) gene family encodes cell-surface adhesins that interact with host and abiotic surfaces, promoting colonization by opportunistic fungal pathogens such as Candida tropicalis. Studies of Als protein contribution to C. tropicalis adhesion would benefit from an accurate catalog of ALS gene sequences as well as insight into relative gene expression levels. Even in the genomics era, this information has been elusive: genome assemblies are often broken within ALS genes because of their extensive regions of highly conserved, repeated DNA sequences and because there are many similar ALS genes at different chromosomal locations. Here, we describe the benefit of long-read DNA sequencing technology to facilitate characterization of C. tropicalis ALS loci. Thirteen ALS loci in C. tropicalis strain MYA-3404 were deduced from a genome assembly constructed from Illumina MiSeq and Oxford Nanopore MinION data. Although the MinION data were valuable, PCR amplification and Sanger sequencing of ALS loci were still required to complete and verify the gene sequences. Each predicted Als protein featured an N-terminal binding domain, a central domain of tandemly repeated sequences, and a C-terminal domain rich in Ser and Thr. The presence of a secretory signal peptide and consensus sequence for addition of a glycosylphosphatidylinositol (GPI) anchor was consistent with predicted protein localization to the cell surface. TaqMan assays were designed to recognize each ALS gene, as well as both alleles at the divergent CtrALS3882 locus. C. tropicalis cells grown in five different in vitro conditions showed differential expression of various ALS genes. To place the C. tropicalis data into a larger context, TaqMan assays were also designed and validated for analysis of ALS gene expression in Candida albicans and Candida dubliniensis. These comparisons identified the subset of highly expressed C. tropicalis ALS genes that were predicted to encode proteins with the most abundant cell-surface presence, prioritizing them for subsequent functional analysis. Data presented here provide a solid foundation for future experimentation to deduce ALS family contributions to C. tropicalis adhesion and pathogenesis.

Keywords: ALS genes; Candida tropicalis; fungal adhesion; gene expression; genome.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Schematic of ALS loci in Candida tropicalis MYA-3404 as deduced using assembly ASM694213v1, PCR amplification, and Sanger sequencing. Contig numbers from assembly ASM694213v1 were noted at the left of each diagram, together with chromosome assignments from assembly ASM1317755v1 (Guin et al., 2020). Arrows were drawn to scale and represent the size of one allele at each ALS locus. Because ALS sequences were finalized using PCR amplification and Sanger sequencing, the alleles shown were likely biased toward the smaller allele where the two alleles differed in size. Final gene names were indicated above each locus; the ASM633v3 open reading frames (ORFs) that were combined into each final gene were designated below. Intergenic distances were drawn to scale except where very large; those were indicated by a dashed line and approximate length (in kb). Sequences were deposited in GenBank, and accession numbers are listed in Table 2. Where the tandem repeat sequence had an exceptionally large number of copies, unique primers for PCR amplification and Sanger sequencing could not be designed; instead, the reported sequence was the consensus from the 105× coverage of the Illumina MiSeq/Oxford Nanopore MinION data.
FIGURE 2
Schematic of predicted C. tropicalis MYA-3404 Als proteins drawn to scale. Protein sequences showed common Als features (reviewed in Hoyer and Cota, 2016), including a secretory signal sequence, an NT + T domain, copies of tandemly repeated sequences, and a C-terminal (CT) domain rich in Ser/Thr. C. tropicalis Als proteins also had some novel repeated sequences, such as those rich in ESS or EST, or the GTASTP motif. Protein features were color coded to show similarity across the family. Red blocks representing NT + T domains were further modified (e.g., with speckles or dashed lines) to denote similarity between sequences, as reflected by the percent identity values in Figure 3. Blocks denoting tandemly repeated sequences were drawn in different shades of green to indicate which sequences were more similar to each other. CtrAls2293 was drawn to include a predicted GPI anchor domain, although the putative signal is much weaker in this sequence than in others in the family. Extremely long tandem-repeat domains (e.g., CtrALS941 and CtrALS3797) could not be verified by PCR amplification and Sanger sequencing because the highly conserved repeat units did not permit the design of unique primers. The reported sequence relied on the 105× coverage from assembly ASM694213v1, which for CtrALS3797 required insertion of “XXX” (gray) to make the size of the gene match the fragment sizes generated by PCR.
FIGURE 3
Percent identity values between the nucleotide sequences of the 5′ domain of the C. tropicalis ALS genes (above the diagonal) and between their predicted amino acid sequences (below the diagonal). The region compared corresponds to the NT-Als domain of each protein. C. albicans Als3 (GenBank AY223552.1) was included for comparison. Boxes were shaded by percent identity, with warmer colors (red, yellow) indicating higher identity than cooler colors (green, blue, and gray). Both alleles of CtrALS3882 were included to highlight their lack of identity to each other despite high sequence conservation with other loci.
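Percent identity values such as those in this matrix are computed from aligned sequences. The sketch below is a minimal illustration, assuming the sequences are already aligned to equal length and that gap columns ("-") are excluded from the denominator; the caption does not state which convention was used, and the function names and toy sequences are hypothetical. The same function applies to both nucleotide and amino acid alignments.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped,
    and identity is reported over the remaining aligned columns.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def identity_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Percent identity for every unordered pair of named, aligned sequences."""
    return {
        (n1, n2): percent_identity(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


# Hypothetical toy alignment, not ALS data.
toy = {"geneA": "ATGC-TTA", "geneB": "ATGCATTA", "geneC": "TTGCATCA"}
for pair, pid in identity_matrix(toy).items():
    print(pair, round(pid, 1))
```

Excluding gap columns is only one convention; counting gaps as mismatches lowers identity for sequences with indels, so published values depend on the tool and settings used.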
FIGURE 4
Schematic of the CtrALS3882 locus to support the conclusion that CtrALS3882-1 and CtrALS3882-2 occupied the same relative physical location on the diploid chromosomes of C. tropicalis MYA-3404. Hash marks in the diagram indicated an approximate distance; an accurate distance was shown in Figure 1. CtrALS3882-1 and CtrALS3882-2 were only 52% identical in the 5′ domain of the ORF (Figure 3). ORFs were represented by arrows that indicate the orientation of each gene. Orange arrows indicate CTRG_03872 and CTRG_03881, which were 97% identical. Primers used to demonstrate the physical location of each ORF are shown; primer sequences are recorded in Supplementary Table S1 to indicate whether they recognized both alleles or were specific for CtrALS3882-1 or CtrALS3882-2.
FIGURE 5
Analysis of tandem repeat consensus sequences to assess similarity among repeat units in Als proteins from various fungal species. Most repeat units were 36 amino acids long, although exceptions were noted. Consensus sequences required that 80% or more of the amino acids be identical at each position. (A) Consensus sequences derived from protein translation of the C. tropicalis ALS sequences identified in Table 2. The repeat unit in CtrAls1030 was 37 amino acids. Asterisks mark sequences with only one or two tandem repeat copies; because so few copies were available for comparison, these consensus sequences may give a false impression of a high level of sequence conservation. ALS tandem repeat consensus sequences for C. albicans (Ca; Oh et al., 2019), C. dubliniensis (Cd), and the C. parapsilosis species complex (CpCoCm; Oh et al., 2019) were included for comparison. (B) Consensus sequence for the novel tandem repeat from the C-terminal domain of CtrAls3797 (see Figure 2).
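The 80% rule can be read as a column-wise consensus over aligned repeat units. The following is a minimal sketch under that reading, assuming the units are already aligned to equal length and that a position failing the threshold is written as a placeholder ("x" here, a hypothetical choice). The 8-residue units are invented for brevity; real Als repeat units are about 36 amino acids. The last line shows why asterisked entries can mislead: a single copy agrees with itself at every position.

```python
from collections import Counter
from fractions import Fraction


def repeat_consensus(units: list[str], threshold: float = 0.8, ambiguous: str = "x") -> str:
    """Column-wise consensus of aligned, equal-length tandem repeat units.

    A residue is reported at a position only if it occupies at least
    `threshold` of the units there; otherwise `ambiguous` is written.
    """
    if not units:
        raise ValueError("need at least one repeat unit")
    length = len(units[0])
    if any(len(u) != length for u in units):
        raise ValueError("repeat units must be aligned to the same length")
    # Exact rational cutoff, so 4 of 5 units counts as exactly 80%.
    cutoff = Fraction(str(threshold))
    consensus = []
    for column in zip(*units):
        residue, count = Counter(column).most_common(1)[0]
        share = Fraction(count, len(units))
        consensus.append(residue if share >= cutoff else ambiguous)
    return "".join(consensus)


# Hypothetical repeat units, not real Als sequence.
units = ["GTTSSAAT", "GTTSSAAT", "GTTNSAET", "GTTSSVET", "GTTSSAAT"]
print(repeat_consensus(units))        # position 7 is A in only 3 of 5 units
print(repeat_consensus(["GTTNSAET"]))  # one copy: every position trivially passes
```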
FIGURE 6
Phylogeny of Als N-terminal domain sequences from C. albicans (Ca), C. dubliniensis (Cd), C. tropicalis (Ctr), C. parapsilosis (Cp), C. orthopsilosis (Co), and C. metapsilosis (Cm). The best-scoring maximum likelihood tree is shown with maximum likelihood bootstrap values and Bayesian posterior probabilities at each node; only support values greater than 70% (bootstrap) and 0.90 (posterior probability) were shown. Branch lengths were proportional to evolutionary change and measured in substitutions per site.
FIGURE 7
Relative expression of C. tropicalis ALS genes as measured by TaqMan assays. Expression of the 13 C. tropicalis ALS genes was measured for strain MYA-3404 grown in five different culture conditions. Micrographs of representative cells from each culture condition were captured to show cellular morphology that corresponded to the gene expression data (scale bar = 10 μm). ΔCt values and standard errors of the mean were reported. The color bar below the data tables was shaded to represent high (red) to low (purple) expression levels.
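The caption reports ΔCt values with standard errors of the mean. The reference gene used for normalization is not named in this caption, so the sketch below assumes the common convention ΔCt = Ct(target) − Ct(reference) with paired replicates; all Ct values are invented for illustration. Under this convention a lower ΔCt means higher expression, and 2^−ΔCt converts ΔCt into a level relative to the reference only if amplification efficiency is close to 100%.

```python
import math
from statistics import mean, stdev


def delta_ct(target_ct: list[float], reference_ct: list[float]) -> list[float]:
    """Per-replicate ΔCt = Ct(target) - Ct(reference), paired by replicate."""
    if len(target_ct) != len(reference_ct):
        raise ValueError("replicates must be paired")
    return [t - r for t, r in zip(target_ct, reference_ct)]


def mean_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("SEM needs at least two replicates")
    return mean(values), stdev(values) / math.sqrt(len(values))


def relative_level(dct: float) -> float:
    """2 ** (-ΔCt): level relative to the reference, assuming ~100% efficiency."""
    return 2.0 ** (-dct)


# Hypothetical Ct values for one ALS gene and an assumed reference gene.
dcts = delta_ct([25.0, 25.5, 24.5], [19.0, 19.0, 19.0])
m, s = mean_sem(dcts)
print(f"ΔCt = {m:.2f} ± {s:.2f} (SEM); relative level = {relative_level(m):.4f}")
```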
FIGURE 8
TaqMan measurement of ALS gene expression in C. albicans (A) and C. dubliniensis (B). C. albicans SC5314 and C. dubliniensis CD36 were grown in various culture conditions known to display differential expression of C. albicans ALS genes. Images were captured for representative cells from each culture (scale bar = 10 μm). ΔCt values and standard errors of the mean were reported. The color-coded scale bar ranged from high expression (red) to low (purple). Two C. dubliniensis ALS loci had identical sequences and were detected by the same TaqMan assay (CdALS64800 and CdALS65010). C. dubliniensis ALS genes were orthologous to C. albicans ALS loci as designated in (B). Prior to availability of a C. dubliniensis genome sequence, ALS genes were identified using consensus PCR primers derived from C. albicans ALS sequences (Hoyer et al., 2001). These were designated ALSD1 (now recognized as CdALS86290), ALSD2 (CdALS64210), and ALSD3 (CdALS64610).
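The shared assay for CdALS64800 and CdALS65010 follows from a simple constraint: an assay can distinguish genes only at sequence that is unique to one of them, and identical loci have none. A minimal sketch of that uniqueness check follows, assuming exact k-mer matching on one strand only; real primer and probe design would also consider reverse complements, near matches, and melting temperature. The locus names and sequences are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(genes: dict[str, str], k: int) -> dict[str, set[str]]:
    """For each gene, the k-mers that occur in no other gene of the family."""
    all_sets = {name: kmers(seq, k) for name, seq in genes.items()}
    result = {}
    for name, own in all_sets.items():
        others = set().union(*(s for n, s in all_sets.items() if n != name))
        result[name] = own - others
    return result


# Hypothetical family: locusA and locusB are identical, so neither has a unique target.
toy_family = {
    "locusA": "ATGGCATTCAGT",
    "locusB": "ATGGCATTCAGT",
    "locusC": "ATGGCAGGCAGT",
}
for name, uniq in unique_kmers(toy_family, k=4).items():
    print(name, sorted(uniq))
```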
FIGURE 9
Effect of nucleotide sequence mismatch on TaqMan assay measurement of nucleic acid abundance. TaqMan assays were designed using gene sequences from C. tropicalis strain MYA-3404. The primary consideration in assay design was selecting target regions unique to each gene in the ALS family. Sequencing of the 5′ domain of each ALS gene in six other C. tropicalis isolates revealed some mismatches in the TaqMan primer and/or probe sequences. Examples were selected to titrate the effect of an increasing number of mismatches, some at key primer/probe positions. Sequences in each panel were taken from Supplementary File S3. The forward primer was highlighted in green, the reverse primer in blue, and the probe in gray; mismatches were marked in yellow. Target sequences from each strain were PCR amplified, cloned, and verified by Sanger sequencing to confirm that the mismatches were present. Cloned DNA was purified, diluted, and used in the assays reported here. (A) The 5′ domain of CtrALS3791 in strain 1020 showed a single nucleotide mismatch in the middle of the reverse primer. Ct values from TaqMan assays on 10-fold dilutions of the cloned construct showed that the assay detected both strains almost equally (no difference at 5 pg DNA and negligible differences at subsequent dilutions). (B) C. tropicalis strain 951 showed one or two mismatches in each of the primer and probe sequences for gene CtrALS1028. Detection of strain 951 lagged approximately 0.6 cycle behind detection of strain MYA-3404, underestimating DNA abundance. (C) More mismatches between the target and the primer and probe sequences produced larger underestimates of DNA abundance for strain 951, this time in CtrALS3797, where the Ct difference was nearly two cycles. (D) Mismatches in CtrALS1038 in strain 951 were so extensive that the TaqMan assay was unable to detect genomic DNA, even at the highest concentration. These data provide the foundation necessary to adapt the TaqMan assays for use with C. tropicalis isolates in which primer and probe mismatches may exist.
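A Ct lag can be translated into an approximate quantification error. The sketch below assumes that both templates amplify with the same efficiency once amplification is under way (1.0 meaning perfect doubling per cycle), so that the lag acts as a constant offset; mismatches may violate this assumption, and the function names are hypothetical. Under that assumption, the roughly 0.6-cycle lag in panel B corresponds to reporting about two thirds of the true amount, and a two-cycle lag corresponds to about a 4-fold underestimate. The second function gives the ideal Ct spacing for a 10-fold dilution series, about 3.32 cycles at 100% efficiency.

```python
import math


def apparent_fraction(ct_lag: float, efficiency: float = 1.0) -> float:
    """Fraction of the true template amount reported when Ct lags by ct_lag cycles.

    Assumes the mismatched and matched templates share the same amplification
    efficiency (1.0 = doubling every cycle), so the lag is a constant offset.
    """
    return (1.0 + efficiency) ** (-ct_lag)


def cycles_per_tenfold(efficiency: float = 1.0) -> float:
    """Expected Ct spacing between successive 10-fold dilutions."""
    return math.log(10) / math.log(1.0 + efficiency)


for lag in (0.6, 2.0):
    frac = apparent_fraction(lag)
    print(f"lag {lag} cycles -> reports {frac:.0%} of true amount "
          f"({1 / frac:.1f}-fold underestimate)")
print(f"ideal spacing for 10-fold dilutions: {cycles_per_tenfold():.2f} cycles")
```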


References

Bolger A. M., Lohse M., Usadel B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30 2114–2120. doi: 10.1093/bioinformatics/btu170
Butler G., Rasmussen M. D., Lin M. F., Santos M. A. S., Sakthikumar S., Munro C. A., et al. (2009). Evolution of pathogenicity and sexual reproduction in eight Candida genomes. Nature 459 657–662. doi: 10.1038/nature08064
Castresana J. (2000). Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. 17 540–542. doi: 10.1093/oxfordjournals.molbev.a026334
Coleman D. A., Oh S.-H., Manfra-Maretta S. L., Hoyer L. L. (2012). A monoclonal antibody specific for Candida albicans Als4 demonstrates overlapping localization of Als family proteins on the fungal cell surface and highlights differences between Als localization in vitro and in vivo. FEMS Immunol. Med. Microbiol. 64 321–333. doi: 10.1111/j.1574-695X.2011.00914.x
Coleman D. A., Oh S.-H., Zhao X., Hoyer L. L. (2010). Heterogeneous distribution of Candida albicans cell-surface antigens demonstrated with an Als1-specific monoclonal antibody. Microbiology 156 3645–3659. doi: 10.1099/mic.0.043851-0
