Mol Genet Genomics. 2022 Nov;297(6):1741-1754.
doi: 10.1007/s00438-022-01951-w. Epub 2022 Sep 20.

A computational approach to biological pathogenicity

Max Garzon et al. Mol Genet Genomics. 2022 Nov.

Abstract

The current pandemic (COVID-19) has made evident the need to approach pathogenicity from a deeper and more systematic perspective, one that might lead to methodologies for quickly predicting which new strains of microbes could be pathogenic to humans. Here we propose a general and principled definition of pathogenicity that can be implemented operationally in a framework for characterizing and assessing the (degree of) potential pathogenicity of a microbe to a given host (e.g., a human individual) based only on DNA biomarkers, to the point of predicting its impact on a host a priori with a meaningful degree of accuracy. The definition is based on basic biochemistry, namely the Gibbs free energy of duplex formation between oligonucleotides, and on some deep structural properties of DNA revealed by an approximation with certain properties. We propose two operational tests based on the nearest neighbor (NN) model of the Gibbs energy and an approximating metric (the h-distance). Quality assessments demonstrate that these tests predict pathogenicity with an accuracy of over 80%, and sensitivity and specificity over 90%. Other tests, obtained by training machine learning models on deep features extracted from DNA sequences, yield scores of 90% for accuracy, 100% for sensitivity and 80% for specificity. These results hint at the possibility of an operational, objective, and general conceptual framework for prior identification of pathogens and their impact without the cost of death or sickness in a host (e.g., humans). Consequently, a reasonable prediction of possible pathogens might pave the way to eventually transforming how we handle and prepare for future pandemic events and mitigate their adverse impact on human health, while reducing the number of clinical trials needed to obtain similar results.
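The three quality scores quoted in the abstract are standard confusion-matrix ratios. The sketch below shows how they are computed. The function name and the counts are hypothetical and are not taken from the paper; the counts were chosen only because a balanced test set of that shape happens to give 90% accuracy, 100% sensitivity and 80% specificity.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp: pathogens predicted pathogenic      fn: pathogens predicted nonpathogenic
    tn: nonpathogens predicted nonpathogenic fp: nonpathogens predicted pathogenic
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total      # fraction of all predictions that are correct
    sensitivity = tp / (tp + fn)      # fraction of true pathogens that are flagged
    specificity = tn / (tn + fp)      # fraction of true nonpathogens that are cleared
    return accuracy, sensitivity, specificity


# Hypothetical counts (assumption, not data from the paper):
# 10 pathogens, all detected; 10 nonpathogens, 2 of them wrongly flagged.
acc, sens, spec = confusion_metrics(tp=10, fn=0, tn=8, fp=2)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
# prints: accuracy=0.90 sensitivity=1.00 specificity=0.80
```

Accuracy is a weighted average of sensitivity and specificity, with weights given by the share of pathogens and nonpathogens in the test set, so the three numbers are not independent of one another.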

Keywords: Digital genomic signature; Gibbs energy; Hybridization; Machine learning; Pathogenic relationship; Pathogens/nonpathogens; h-distance.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1
A DNA sequence x is shredded into fragments of the same length n as the probes in an nxh basis, so that the number of fragments hybridizing with each probe can be counted to obtain a feature vector from x. The oligos for the basis are judiciously selected in such a way that no cross hybridization occurs among probes in the basis itself and, moreover, that every random fragment hybridizes to (ideally exactly) one probe. An ideal basis thus produces feature vectors that are fully reproducible and contain much of the information in the original sequence x.
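The counting procedure in the Fig. 1 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The paper decides hybridization with the Gibbs energy or the h-distance; here a much cruder stand-in is swapped in, namely the Hamming distance between a fragment and the Watson-Crick reverse complement of a probe. Shredding by a sliding window of step 1, the function names, the `max_mismatches` parameter and the toy probes are all assumptions made for illustration.

```python
# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the Watson-Crick reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def signature(x, probes, max_mismatches=0):
    """Count, for each probe, the length-n fragments of x that 'hybridize' to it.

    Hybridization is approximated (assumption) as: the fragment is within
    max_mismatches of the probe's reverse complement.
    """
    n = len(probes[0])
    targets = [reverse_complement(p) for p in probes]
    counts = [0] * len(probes)
    # Shred x into all overlapping fragments of length n (sliding window).
    for i in range(len(x) - n + 1):
        fragment = x[i:i + n]
        for k, target in enumerate(targets):
            if hamming(fragment, target) <= max_mismatches:
                counts[k] += 1
    return counts


# Toy basis of two 4-mers (hypothetical; a real nxh basis is selected far more carefully).
probes = ["AAAC", "GGCA"]
print(signature("GTTTTGCCGTTT", probes))  # prints: [2, 1]
```

With this toy input each matching fragment is counted for exactly one probe, which is the behavior the caption describes for an ideal basis. Loosening `max_mismatches` makes the stand-in criterion more permissive and can let one fragment count toward several probes.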
Fig. 2
Performance assessment of the definition of pathogenicity of bacteria and fungi using thresholding methods, based on the decision about hybridization events between oligos in the proxies of a host and a microorganism (top: based on Gibbs energy; bottom: based on h-distance). The x-axis represents different data sets for proxies and grids (IDs are in Table 4).
Fig. 3
Performance assessment of the definition of pathogenicity of bacteria (top), fungi (middle) and combined (bottom) obtained using machine learning models trained on genomic signatures.
