Machine learning-based detection of adventitious microbes in T-cell therapy cultures using long-read sequencing

Affiliations

¹ Singapore-MIT Alliance for Research and Technology, Singapore, Singapore.
² MIT Center for Biomedical Innovation, Massachusetts Institute of Technology, Boston, USA.
³ Singapore Centre for Environmental Life Sciences Engineering, Life Sciences Institute, National University of Singapore, Singapore, Singapore.
⁴ Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore.
⁵ CSIRO Microbiomes for One Systems Health, Agriculture and Food, Westmead, Australia.

PMID: 37646508
PMCID: PMC10580871
DOI: 10.1128/spectrum.01350-23

James P B Strutt et al. Microbiol Spectr. 2023.

Microbiol Spectr. 2023 Aug 30;11(5):e0135023.

doi: 10.1128/spectrum.01350-23. Online ahead of print.

Abstract

Assuring that cell therapy products are safe before releasing them for use in patients is critical. Currently, compendial sterility testing for bacteria and fungi can take 7-14 days. The goal of this work was to develop a rapid, untargeted approach for the sensitive detection of low-abundance microbial contaminants from low-volume samples during the manufacturing of cell therapies. We developed a long-read sequencing methodology using the Oxford Nanopore Technologies MinION platform with 16S and 18S amplicon sequencing to detect USP <71> organisms and other microbial species. Reads are classified metagenomically to predict the microbial species. We used an extreme gradient boosting machine learning algorithm (XGBoost) first to assess whether a sample is contaminated and second to determine whether the predicted contaminant is correctly classified or misclassified. The model was used to make a final decision on the sterility status of the input sample. An optimized experimental and bioinformatics pipeline, from spiked species through to sequenced reads, allowed for the detection of microbial samples at 10 colony-forming units (CFU)/mL using metagenomic classification. Machine learning can be coupled with long-read sequencing to determine sample sterility status and identify the microbial species present in T-cell cultures, including the USP <71> organisms, down to 10 CFU/mL.

IMPORTANCE This research presents a novel method for rapidly and accurately detecting microbial contaminants in cell therapy products, which is essential for ensuring patient safety. Traditional testing methods are time-consuming, taking 7-14 days, while our approach can significantly reduce this time. By combining long-read nanopore sequencing and machine learning, we can effectively identify the presence and types of microbial contaminants at low abundance levels. This breakthrough has the potential to improve the safety and efficiency of cell therapy manufacturing, leading to better patient outcomes and a more streamlined production process.
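
The two-stage decision described in the abstract can be illustrated with a minimal sketch. It assumes each stage has already produced a probability (for instance from a trained XGBoost classifier); the thresholds, the `SterilityCall` type, and the `decide` function are hypothetical names chosen for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the abstract does not state the values used.
CONTAMINATED_THRESHOLD = 0.7
STERILE_THRESHOLD = 0.3
SPECIES_TRUST_THRESHOLD = 0.5


@dataclass
class SterilityCall:
    status: str             # "sterile", "contaminated", or "uncertain"
    species: Optional[str]  # reported only when the species call is trusted


def decide(p_contaminated: float, species: str, p_species_correct: float) -> SterilityCall:
    """Combine the outputs of the two classifiers into one final call.

    p_contaminated: stage 1 probability that the sample is contaminated.
    p_species_correct: stage 2 probability that the predicted species is
        a correct classification rather than a misclassification.
    """
    if p_contaminated <= STERILE_THRESHOLD:
        return SterilityCall("sterile", None)
    if p_contaminated < CONTAMINATED_THRESHOLD:
        # Neither threshold is met, so no firm call is made.
        return SterilityCall("uncertain", None)
    # Sample is called contaminated; name the species only if stage 2 trusts it.
    trusted = species if p_species_correct >= SPECIES_TRUST_THRESHOLD else None
    return SterilityCall("contaminated", trusted)


if __name__ == "__main__":
    print(decide(0.95, "Staphylococcus aureus", 0.9))
    print(decide(0.05, "Staphylococcus aureus", 0.9))
```

Keeping an explicit "uncertain" outcome is a design choice in this sketch: a borderline sample is flagged for follow-up rather than forced into a sterile or contaminated call.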

Keywords: T-cells; adventitious agents; machine learning; sterility.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Fig 1**
Pipeline workflow. (A) Machine learning pipeline overview: microbial contaminants were prepared and sequenced, and the reads were processed. The metagenomic classification data, overall run read quality data, predicted species quality data, and time-to-next-read data were combined into a single table of features. XGBoost, a gradient-boosted decision tree classifier, was deployed to assess sterility status; for more information, see "Machine learning pipeline" in Materials and Methods or GitHub. (B) Bacteria (gram positive or gram negative) or fungus (yeast) are spiked into PBS-washed cultured T-cells. The process is repeated threefold with and without T-cells, using cells from a different passage and separately cultured microbes. DNA is extracted using mechanical lysis, buffers, and magnetic beads. DNA is amplified using targeted rDNA primers for the 16S region and the 18S–28S region. (C) Sequencing analysis pipeline: sequenced, basecalled reads were cleaned and host reads removed. Remaining reads were classified against the combined viral, fungal, and bacterial database using Centrifuge and High-Speed BLAST. Classified reads, along with other data, were provided to the machine learning pipeline for analysis of sample contamination status.
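
As a rough illustration of the feature table described in panel A, the sketch below merges four per-sample feature groups into one row and then aligns rows into a matrix. All feature names and values are hypothetical; only the four group categories come from the caption. Missing values are filled with NaN, which XGBoost treats as missing by default.

```python
import math
from typing import Dict, List, Tuple

FeatureGroup = Dict[str, float]


def build_feature_row(
    classification: FeatureGroup,
    run_quality: FeatureGroup,
    species_quality: FeatureGroup,
    read_timing: FeatureGroup,
) -> Dict[str, float]:
    """Flatten the four feature groups into one prefixed row for a sample."""
    row: Dict[str, float] = {}
    groups = (
        ("cls", classification),    # metagenomic classification data
        ("run", run_quality),       # overall run read quality data
        ("spp", species_quality),   # predicted species quality data
        ("time", read_timing),      # time-to-next-read data
    )
    for prefix, group in groups:
        for name, value in group.items():
            row[f"{prefix}_{name}"] = float(value)
    return row


def to_matrix(rows: List[Dict[str, float]]) -> Tuple[List[str], List[List[float]]]:
    """Align rows on a shared, sorted column list; absent features become NaN."""
    columns = sorted({key for row in rows for key in row})
    matrix = [[row.get(col, math.nan) for col in columns] for row in rows]
    return columns, matrix


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    sample = build_feature_row(
        {"top_hit_reads": 120},
        {"mean_q": 11.5},
        {"mean_q": 12.1},
        {"median_gap_s": 0.8},
    )
    print(to_matrix([sample]))
```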

**Fig 2**
Assessment of incorporating low-quality reads in sequence classification. (A) Additional low-quality reads incorporated into data analysis (N = 3) for 10 CFU/mL samples. Mean read lengths for correctly classified (true positive) reads with and without use of the low-quality reads are depicted in blue and black, respectively. Incorrectly assessed (misclassified) reads are depicted in cyan and gray. (B) Summary table of high-quality reads compared to all reads for the microbes alone and for microbes spiked into T-cells at 10 CFU/mL. Read numbers are for reads assigned to the correct spiked organism and represent a subset of all sequenced samples.

**Fig 3**
Microbes spiked into T-cell cultures. All samples were amplified with either 16S primers for bacterial species or 18S–28S primers for fungal species. Samples were prepared for analysis as microbial cultures as well as simulated microbial contamination by the addition of microbes to activated T-cells at 100 CFU/mL (A) and 10 CFU/mL (B); pure culture spikes (gray) are compared to contaminants spiked into T-cells (blue). Species tested were *K. pneumoniae*, *P. aeruginosa* PAO1, *C. acnes*, and the USP <71> species: *C. albicans*, *B. subtilis*, *Clostridium sporogenes*, *S. aureus*, and *P. aeruginosa* 9027. Error bars represent biological replicates (N = 3).

**Fig 4**
Machine learning XGBoost model performance, sample, and prediction contaminant status. (A–C) Model performance statistics for data from the Centrifuge metagenomic classifier, used to generate two XGBoost classification models. (D–F) Model performance statistics for data from the high-speed BLASTn metagenomic classifier, used to generate two XGBoost classification models. (A and D) Confusion matrix and classification report for the model assessing sample sterility status. (B and E) Confusion matrix and classification report for the model assessing whether a predicted contaminant is correctly classified or misclassified. (C and F) All spike and negative control model predictions were assessed for prediction accuracy regarding whether the sample assayed is sterile. Black bars depict samples assigned as likely contaminated, blue bars depict samples identified as sterile, and gray bars depict samples where the algorithm had difficulty assigning a decision of either sterile or contaminated. CFU: colony-forming units. Sample status is defined as sterile (true negative) or contaminated (true positive). Correct contaminant classification is defined as a true positive contaminant vs a misclassified contaminant.
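
For readers unfamiliar with the statistics named in this caption, the sketch below shows how a binary confusion matrix and the usual classification-report metrics (precision, recall, F1) are computed. The labels and the toy predictions are illustrative assumptions, not results from the study, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def confusion_counts(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "contaminated"
) -> Counter:
    """Count true/false positives and negatives for a binary labelling."""
    counts: Counter = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            counts["tp"] += 1
        elif truth != positive and pred == positive:
            counts["fp"] += 1
        elif truth == positive and pred != positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def report(counts: Counter) -> Dict[str, float]:
    """Derive precision, recall, and F1 for the positive class."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = ["contaminated", "contaminated", "contaminated", "sterile", "sterile"]
    preds = ["contaminated", "contaminated", "sterile", "sterile", "contaminated"]
    counts = confusion_counts(truth, preds)
    print(dict(counts), report(counts))
```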
