Comparative Study
Microb Genom. 2023 Apr;9(4):mgen000979.
doi: 10.1099/mgen.0.000979.

Comparing genomic variant identification protocols for Candida auris

Xiao Li et al. Microb Genom. 2023 Apr.

Abstract

Genomic analyses are widely applied to epidemiological, population genetic and experimental studies of pathogenic fungi. A wide range of methods are employed to carry out these analyses, typically without including controls that gauge the accuracy of variant prediction. The importance of tracking outbreaks at a global scale has raised the urgency of establishing high-accuracy pipelines that generate consistent results between research groups. To evaluate currently employed methods for whole-genome variant detection and elaborate best practices for fungal pathogens, we compared how 14 independent variant calling pipelines performed across 35 Candida auris isolates from 4 distinct clades and evaluated the performance of variant calling, single-nucleotide polymorphism (SNP) counts and phylogenetic inference results. Although these pipelines used different variant callers and filtering criteria, we found high overall agreement of SNPs from each pipeline. This concordance correlated with site quality, as SNPs discovered by a few pipelines tended to show lower mapping quality scores and depth of coverage than those recovered by all pipelines. We observed that the major differences between pipelines were due to variation in read trimming strategies, SNP calling methods and parameters, and downstream filtration criteria. We calculated specificity and sensitivity for each pipeline by aligning three isolates with chromosomal level assemblies and found that the GATK-based pipelines were well balanced between these metrics. Selection of trimming methods had a greater impact on SAMtools-based pipelines than those using GATK. Phylogenetic trees inferred by each pipeline showed high consistency at the clade level, but there was more variability between isolates from a single outbreak, with pipelines that used more stringent cutoffs having lower resolution. This project generated two truth datasets useful for routine benchmarking of C. auris variant calling: a consensus VCF of genotypes discovered by 10 or more pipelines across these 35 diverse isolates, and variants for 2 samples identified from whole-genome alignments. This study provides a foundation for evaluating SNP calling pipelines and developing best practices for future fungal genomic studies.
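
The consensus truth set described in the abstract (genotypes discovered by 10 or more pipelines) can be illustrated with a minimal sketch. It assumes each pipeline's calls have already been reduced to a set of (contig, position, alternate allele) tuples; the function name, pipeline names and toy sites are hypothetical and are not taken from the study's code.

```python
from collections import Counter

def consensus_sites(pipeline_calls, min_support=10):
    """Return the sites reported by at least `min_support` pipelines.

    pipeline_calls maps a pipeline name to an iterable of hashable site keys
    (assumed here to be (contig, position, alt_allele) tuples).
    """
    support = Counter()
    for sites in pipeline_calls.values():
        support.update(set(sites))  # each pipeline counts once per site
    return {site for site, n in support.items() if n >= min_support}

# Toy data (hypothetical): 12 pipelines, three candidate sites.
calls = {f"pipeline_{i}": {("chr1", 100, "T")} for i in range(1, 13)}
for i in range(1, 10):
    calls[f"pipeline_{i}"].add(("chr1", 250, "G"))  # supported by 9 pipelines
calls["pipeline_3"].add(("chr2", 42, "A"))          # private to one pipeline

consensus = consensus_sites(calls, min_support=10)
print(consensus)
```

Only the site reported by all 12 toy pipelines passes the threshold of 10; the site with 9 supporting pipelines and the private site are excluded.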

Keywords: Candida; benchmarking; fungal genomics; variant calling pipelines; whole-genome sequencing.

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

Fig. 1.
Single-nucleotide polymorphisms (SNPs) called by each pipeline per clade. Each plot depicts the total number of SNPs identified for each pipeline (datasets 1 to 12). The 35 samples are summarized in 4 plots by clade. (a) Clade I (n=20), (b) clade II (n=4), (c) clade III (n=3) and (d) clade IV (n=8).
Fig. 2.
Sensitivity, specificity and their harmonic mean for each pipeline. Panels (a) and (b) depict the sensitivity (y-axis) and specificity (x-axis) for each pipeline (pipelines 1 to 12). Specificity and sensitivity were calculated by comparing variant calls from (a) CA05 (B11221; clade III) and (b) CA06 (B11245; clade IV) to the truth set of SNPs identified between the genome assemblies of these isolates and that of B8441 (clade I). (c) Bar plot showing the harmonic mean (F1 score) of sensitivity and specificity for each pipeline.
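
As a rough illustration of the three metrics in this figure, the sketch below computes sensitivity, specificity and their harmonic mean from sets of site positions. It is written under stated assumptions rather than as the study's method: true negatives are counted against a hypothetical `universe` of comparable sites, since the page does not say how negatives were defined, and all names and toy values are invented for the example.

```python
def classification_metrics(called, truth, universe):
    """Sensitivity, specificity and their harmonic mean for one pipeline.

    called:   set of sites the pipeline reported as SNPs
    truth:    set of sites that are SNPs in the truth set
    universe: set of all sites considered comparable (an assumption made
              here so that true negatives can be counted)
    """
    tp = len(called & truth)             # true positives
    fp = len(called - truth)             # false positives
    fn = len(truth - called)             # false negatives
    tn = len(universe - called - truth)  # true negatives
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    total = sensitivity + specificity
    harmonic = 2 * sensitivity * specificity / total if total else 0.0
    return sensitivity, specificity, harmonic

# Toy example (hypothetical positions 1..100 on a single contig).
universe = set(range(1, 101))
truth = {10, 20, 30, 40}
called = {10, 20, 30, 55}

sens, spec, hmean = classification_metrics(called, truth, universe)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} harmonic_mean={hmean:.3f}")
```

The harmonic mean here follows the caption's definition (sensitivity and specificity); an F1 score is more commonly defined from precision and recall, which would give a different value.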
Fig. 3.
Comparison of SNPs called across datasets from 12 pipelines. (a) The percentage of all detected sites called by a given number of pipelines, from 1 to all 12, is shown. (b) The percentage of all sites that represent private SNPs is shown for each pipeline.
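
The two summaries in this figure can be sketched as follows. This assumes that "all sites" means the union of sites detected by any pipeline and that a private SNP is a site reported by exactly one pipeline; both the denominator and the names used are assumptions for illustration.

```python
from collections import Counter

def support_summary(pipeline_calls):
    """Summarize cross-pipeline support for detected sites.

    Returns (pct_by_support, pct_private):
      pct_by_support[k] = % of all detected sites called by exactly k pipelines
      pct_private[name] = % of all detected sites called only by that pipeline
    """
    support = Counter()
    for sites in pipeline_calls.values():
        support.update(set(sites))
    total = len(support)
    if total == 0:
        return {}, {name: 0.0 for name in pipeline_calls}

    by_k = Counter(support.values())
    pct_by_support = {k: 100 * n / total for k, n in sorted(by_k.items())}
    pct_private = {
        name: 100 * sum(1 for s in set(sites) if support[s] == 1) / total
        for name, sites in pipeline_calls.items()
    }
    return pct_by_support, pct_private

# Toy data (hypothetical): three pipelines, sites given as integer positions.
toy_calls = {"p1": {1, 2, 3}, "p2": {1, 2}, "p3": {1, 4}}
pct_by_support, pct_private = support_summary(toy_calls)
print(pct_by_support)
print(pct_private)
```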
Fig. 4.
High-confidence SNPs missed by each pipeline. For sites found in (a) 11, (b) 10, or (c) 9 datasets, the number missed by each pipeline is summarized.
Fig. 5.
Sample-level SNP concordance by clade. Sites were compared for each sample across the datasets produced by 12 pipelines. The number of SNP sites identified for each sample in between 1 and 12 datasets is summarized by clade: (a) clade I (n=20), (b) clade II (n=4), (c) clade III (n=3) and (d) clade IV (n=8).
Fig. 6.
Pairwise differences between control pairs of isolates reported by each pipeline. (a) Pairwise differences reported in matrix files for all 14 pipelines. (b) Pairwise differences excluding pipelines with high reported differences (7, 8 and 13).
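
A pairwise-difference matrix of the kind summarized here could be computed along the following lines. This is a minimal sketch that assumes each sample is represented by a string of alleles at the same ordered variant sites, with `N` marking a missing call that is skipped; the sample names and sequences are hypothetical, and the format of the pipelines' actual matrix files is not described on this page.

```python
from itertools import combinations

def pairwise_differences(genotypes, missing="N"):
    """Count differing sites for every pair of samples.

    genotypes maps a sample name to a string of alleles, one character per
    variant site, in the same site order for all samples. Sites where either
    sample has the `missing` symbol are not counted.
    """
    diffs = {}
    for a, b in combinations(sorted(genotypes), 2):
        diffs[(a, b)] = sum(
            1
            for x, y in zip(genotypes[a], genotypes[b])
            if x != y and missing not in (x, y)
        )
    return diffs

# Toy data (hypothetical): three samples genotyped at five sites.
toy_genotypes = {
    "sample_a": "ACGTN",
    "sample_b": "ACGTA",
    "sample_c": "TCGAA",
}
diffs = pairwise_differences(toy_genotypes)
for pair, n in diffs.items():
    print(pair, n)
```

How missing calls are treated is a design choice: skipping them, as here, avoids inflating differences between closely related isolates, but pipelines may handle them differently.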
Fig. 7.
Consensus tree from maximum-parsimony trees generated by each pipeline. Consensus support across trees provided for 10 pipelines is shown for nodes with at least 50 % consensus support for all isolates (a) and for clade I isolates (b). Nodes without support have taxa disagreement between the trees from different pipelines. Taxa labels (CA01–CA35) are coloured by clade (legend). Vertical lines next to taxa labels indicate control sample pairs shown in Fig. 6.
Fig. 8.
Workflow and recommendations for genomic variant identification protocols in fungi. The workflow is divided into four main colour-coded sections that are meant to be performed sequentially. Recommendations are listed within each step. Footnote 1: the B8441 C. auris genome assembly is available in the Candida Genome Database (http://www.candidagenome.org) and NCBI (PEKT00000000.2). Footnote 2: Ti/Tv ratio, transition/transversion ratio.
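
The Ti/Tv ratio named in the second footnote can be computed from a list of SNPs as in the sketch below. Transitions are purine-to-purine (A<->G) or pyrimidine-to-pyrimidine (C<->T) changes, and all other single-base changes are transversions. The function name and toy SNP list are illustrative assumptions, and the page gives no expected value for C. auris.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def ti_tv_ratio(snps):
    """Return the transition/transversion ratio for (ref, alt) base pairs.

    Non-ACGT alleles and entries where ref equals alt are ignored.
    Returns None when there are no transversions to divide by.
    """
    ti = tv = 0
    for ref, alt in snps:
        pair = {ref.upper(), alt.upper()}
        if len(pair) != 2 or not pair <= (PURINES | PYRIMIDINES):
            continue  # skip non-SNP or ambiguous entries
        if pair <= PURINES or pair <= PYRIMIDINES:
            ti += 1   # A<->G or C<->T
        else:
            tv += 1   # purine <-> pyrimidine
    return ti / tv if tv else None

# Toy data (hypothetical): three transitions and two transversions.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
ratio = ti_tv_ratio(toy_snps)
print(ratio)
```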
