PLoS One. 2017 Dec 4;12(12):e0188104.
doi: 10.1371/journal.pone.0188104. eCollection 2017.

Merging FT-IR and NGS for simultaneous phenotypic and genotypic identification of pathogenic Candida species

Claudia Colabella et al. PLoS One.

Abstract

The rapid and accurate identification of pathogenic yeast species is crucial for clinical diagnosis because of the high mortality and morbidity they cause, even after antifungal therapy. New rapid, high-throughput and reliable identification methods are therefore required. In this work we describe a combined approach based on two high-throughput techniques to improve the identification of pathogenic yeast strains. Next Generation Sequencing (NGS) of the ITS and D1/D2 LSU marker regions, together with FT-IR spectroscopy, was applied to identify 256 strains belonging to the genus Candida isolated in nosocomial environments. Multivariate data analysis (MVA) was carried out on the NGS and FT-IR data sets separately. Strains of Candida albicans, C. parapsilosis, C. glabrata and C. tropicalis were identified first by high-throughput NGS of the ITS and LSU markers and then by FT-IR. Inter- and intra-species variability was investigated by consensus principal component analysis (CPCA), which combines the high-dimensional data of the two complementary analytical approaches in concatenated PCA blocks normalized to the same weight. The percentage of correct identifications reached about 97.4% for C. albicans and 74% for C. parapsilosis, while the other two species showed lower identification rates. The results suggest that identification success increases with the number of strains used in the PLS analysis. The absence of reliable FT-IR libraries is currently the major limitation of FT-IR-based strain identification, although this metabolomic fingerprint represents a valid and affordable aid to rapid, high-throughput clinical diagnosis. According to our data, FT-IR libraries should include some tens of certified strains per species, possibly more than 50, derived from diverse sources and collected over an extended period. This implies a multidisciplinary effort by specialists working in strain isolation and maintenance, molecular taxonomy, FT-IR techniques and chemometrics, data management and databasing.

Conflict of interest statement

Competing Interests: In the past 2 years, CT has been paid for lectures on behalf of Pfizer, Novartis, Merck Astra, Angelini, Gilead and Astellas.

Figures

Fig 1. For CPCA of FT-IR and NGS data a row-to-row correspondence needs to be obtained.
To integrate the NGS and FT-IR data in one data model, consensus principal component analysis (CPCA) is applied. For CPCA, the data are organized so that the different data blocks have a row-to-row correspondence. The NGS distance matrix contains the N samples as rows and, as columns, the distances to all samples. The FT-IR data contain the N samples as rows and, as columns, the absorbance values at different wavenumbers. The FT-IR data are further split into four data blocks according to groups of chemicals.
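The data layout described in this caption can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the function names `split_ftir_blocks` and `block_scaled_pca`, the half-open handling of block boundaries, and the choice of scaling each centered block to unit Frobenius norm are all assumptions made here. The wavenumber ranges are those given in the captions of Figs 2 to 4. The sketch also uses an ordinary PCA (via SVD) of the equally weighted, concatenated blocks rather than an iterative CPCA algorithm, so it only approximates the idea of "concatenated PCA blocks normalized to the same weight".

```python
import numpy as np

# Wavenumber ranges (cm^-1) of the four FT-IR blocks, taken from the figure captions.
FTIR_BLOCKS = {
    "lipid": (2800, 3050),
    "lipid_protein": (1500, 1800),
    "lipid_protein_polysaccharide": (1200, 1500),
    "polysaccharide": (700, 1200),
}


def split_ftir_blocks(spectra, wavenumbers):
    """Split an (N, n_wavenumbers) absorbance matrix into the four blocks.

    Assumption: half-open intervals [lo, hi) are used so that the shared
    boundaries (1200 and 1500 cm^-1) end up in exactly one block.
    """
    blocks = {}
    for name, (lo, hi) in FTIR_BLOCKS.items():
        mask = (wavenumbers >= lo) & (wavenumbers < hi)
        blocks[name] = spectra[:, mask]
    return blocks


def block_scaled_pca(blocks, n_components=2):
    """PCA scores of the concatenated blocks, each block given the same weight.

    Every block is column-centered and divided by its Frobenius norm
    (an assumed weighting), then all blocks are stacked side by side.
    """
    scaled = []
    for X in blocks:
        Xc = X - X.mean(axis=0)
        norm = np.linalg.norm(Xc)
        scaled.append(Xc / norm if norm > 0 else Xc)
    superblock = np.hstack(scaled)  # rows stay aligned: one row per sample
    U, s, _ = np.linalg.svd(superblock, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


# Synthetic stand-ins for the real measurements (hypothetical values).
rng = np.random.default_rng(0)
n_samples = 6
wavenumbers = np.arange(700, 3051, 50).astype(float)
spectra = rng.random((n_samples, wavenumbers.size))

# NGS block: N x N matrix of pairwise distances, one row per sample.
points = rng.random((n_samples, 3))
ngs_dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)

ftir = split_ftir_blocks(spectra, wavenumbers)
all_blocks = list(ftir.values()) + [ngs_dist]
global_scores = block_scaled_pca(all_blocks)
print(global_scores.shape)  # (6, 2)
```

The row-to-row correspondence is what makes the horizontal stacking valid: row i of every block must describe the same strain.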
Fig 2. Correlation loading plot (PC1 and PC2) of NGS and FT-IR (blocks 1, 2, 3 and 4) with global scores.
The correlation loading plots show the correlation between the global scores of the CPCA and the four FT-IR blocks and the distance matrix of the genetic data. In addition, the correlations between the global scores and the genetic distance matrix and the indicator variables for each species are visualized in each plot. (A) Correlations between the global scores and the lipid region (block 1, 3050–2800 cm⁻¹), the genetic distance matrix and the species indicator variables; (B) correlations between the global scores and the mixed lipid and protein region (block 2, 1800–1500 cm⁻¹), the genetic distance matrix and the species indicator variables; (C) correlations between the global scores and the mixed lipid, protein and polysaccharide region (block 3, 1500–1200 cm⁻¹), the genetic distance matrix and the species indicator variables; and (D) correlations between the global scores and the polysaccharide region (block 4, 1200–700 cm⁻¹), the genetic distance matrix and the species indicator variables. Blue dots represent FT-IR wavenumbers; black dots (TS Cand alb, TS Cand para, TS Cand glab and TS Cand trop) represent the distances to the type strains (TS) of the four species; green dots (ALB, PAR, GLA and TRO) represent the group (indicator) variables.
Fig 3. Score plots of CPCA (PC1 and PC2) analysis of genetic-NGS and phenotypic-FT-IR spectroscopic data of strains from four Candida species: C. albicans, C. parapsilosis, C. glabrata and C. tropicalis.
The score plots of blocks 1–4 show the CPCA of the FT-IR spectroscopy data, where block 1 is the lipid region (3050–2800 cm⁻¹), block 2 the mixed lipid and protein region (1800–1500 cm⁻¹), block 3 the mixed lipid, protein and polysaccharide region (1500–1200 cm⁻¹) and block 4 the polysaccharide region (1200–700 cm⁻¹). The score plot of block 5 shows the NGS data. The score plot of block 6 is the global score plot of CPCA components one and two, indicating the consensus of all blocks.
Fig 4. Score plots of CPCA (PC3 and PC4) analysis of genetic-NGS and phenotypic-FT-IR spectroscopic data of strains from four Candida species: C. albicans, C. parapsilosis, C. glabrata and C. tropicalis.
The score plots of blocks 1–4 show the FT-IR spectroscopy data, where block 1 refers to the lipid region (3050–2800 cm⁻¹), block 2 to the mixed lipid and protein region (1800–1500 cm⁻¹), block 3 to the mixed lipid, protein and polysaccharide region (1500–1200 cm⁻¹) and block 4 to the polysaccharide region (1200–700 cm⁻¹). The score plot of block 5 refers to the NGS data. The score plot of block 6 is the global score plot of CPCA components three and four, indicating the consensus of all blocks.
Fig 5. Confusion matrix for the cross-validated classification model.
Errors are given as the misclassification rate (MCR), which is the fraction of misclassified samples over the total number of samples. The success rate (SR) is given as a percentage, SR = (1 − MCR) × 100. The number of samples in each group is specified in the left column, together with the true group affiliations. The predicted group is specified at the top of the matrix.
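The two quantities defined in this caption can be computed directly from a confusion matrix. The sketch below is a minimal illustration: the function names are hypothetical, the matrix is assumed to be square with true groups as rows and predicted groups as columns (as the caption describes), and the counts are made up rather than taken from Fig 5.

```python
import numpy as np


def misclassification_rate(confusion):
    """MCR: misclassified samples divided by the total number of samples."""
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    correct = np.trace(confusion)  # diagonal = correctly classified samples
    return (total - correct) / total


def success_rate(confusion):
    """SR in percent, SR = (1 - MCR) * 100."""
    return (1.0 - misclassification_rate(confusion)) * 100.0


# Made-up counts for two groups, for illustration only.
example = [
    [18, 2],   # true group 1: 18 correct, 2 predicted as group 2
    [1, 19],   # true group 2: 1 predicted as group 1, 19 correct
]
mcr = misclassification_rate(example)
sr = success_rate(example)
print(f"MCR = {mcr:.3f}, SR = {sr:.1f}%")
```

Here 3 of the 40 samples lie off the diagonal, so the MCR is 3/40 and the SR follows from the formula in the caption.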
Fig 6. Distribution of the strain distances to the TS (type strain) and the CS (central strain).
Distribution of the distances between the strain spectra and the reference spectra of the four Candida species, with respect to the TS (A, C, E and G) and the CS (B, D, F and H).
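As a rough illustration of comparing strains against two different reference spectra, the sketch below computes the distance of every spectrum to a type strain and to a central strain. Everything here is an assumption made for illustration: the page does not say how the central strain is chosen or which distance is used, so the sketch takes the CS to be the medoid (the spectrum with the smallest summed distance to all others) and uses Euclidean distance. The function names and the toy data are hypothetical.

```python
import numpy as np


def pairwise_distances(spectra):
    """Euclidean distance between every pair of rows in an (N, p) matrix."""
    diff = spectra[:, None, :] - spectra[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def medoid_index(spectra):
    """Assumed CS definition: row with the smallest summed distance to all rows."""
    return int(np.argmin(pairwise_distances(spectra).sum(axis=1)))


def distances_to_reference(spectra, ref_index):
    """Distance of every spectrum to the chosen reference spectrum."""
    return np.linalg.norm(spectra - spectra[ref_index], axis=1)


# Toy "spectra" with two variables; the last row plays an atypical type strain.
spectra = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [2.0, 0.0],
    [3.0, 0.0],
    [10.0, 0.0],
])
ts_index = 4                      # hypothetical position of the type strain
cs_index = medoid_index(spectra)  # central strain under the medoid assumption

d_ts = distances_to_reference(spectra, ts_index)
d_cs = distances_to_reference(spectra, cs_index)
print(cs_index, d_ts.mean(), d_cs.mean())
```

In this toy case the type strain lies far from the bulk of the group, so distances to it are larger on average than distances to the medoid. Whether that holds for real collections depends on the data.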
Fig 7. Comparison of single and double match approaches in classifying Candida strains.
(A) Single match analysis with CHROMagar. (B) Double match analysis with CHROMagar and MALDI-TOF. Black columns represent the percentage of matches to the type strain (TS); white columns report the matches to the central strain (CS). Green columns report the percentage of the sum of matches in the single match analysis (A) and the maximum obtainable percentage of correct matches in the double match analysis (B).
