ORF1ab codon frequency model predicts host-pathogen relationship in orthocoronavirinae

doi:10.3389/fbinf.2025.1562668

Front Bioinform. 2025 Mar 18:5:1562668. eCollection 2025.

Phillip E Davis¹, Joseph A Russell¹

PMID: 40170904
PMCID: PMC11958986
DOI: 10.3389/fbinf.2025.1562668

Phillip E Davis et al. Front Bioinform. 2025.

Affiliation

¹ MRIGlobal, Gaithersburg, MD, United States.

Abstract

Predicting phenotypic properties of a virus directly from its sequence data is an attractive goal for viral epidemiology. Here, we focus narrowly on the Orthocoronavirinae clade and demonstrate models that are powerfully predictive for a human-pathogen phenotype with 76.74% average precision and 85.96% average recall on the withheld test set groups, using only Orf1ab codon frequencies. We show alternative examples for other viral coding sequences and feature representations that do not perform well and discuss what distinguishes the models that are performant. These models point to a small subset of features, specifically 5 codons, that are critical to the success of the models. We discuss and contextualize how this observation may fit within a larger model for the role of translation in virus-host agreement.
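To make the feature representation concrete, here is a minimal sketch of how a codon frequency vector could be computed from a coding sequence. It is not the authors' pipeline: the function name `codon_frequencies`, the choice to count stop codons, and the choice to skip codons containing ambiguous bases are all assumptions made for illustration. The input is assumed to be an already joined, in-frame Orf1ab CDS (Orf1ab is translated through a ribosomal frameshift, so a raw genomic span would not be in a single frame).

```python
from collections import Counter
from itertools import product

# All 64 DNA codons in a fixed order, so every sequence yields the same feature layout.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_frequencies(cds: str) -> dict[str, float]:
    """Return the relative frequency of each of the 64 codons in an in-frame CDS.

    Hypothetical helper for illustration; the normalization used in the paper
    may differ (for example in how stop codons are handled).
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    # Assumption: codons with ambiguous bases (e.g. N) are skipped, not imputed.
    counts = Counter(c for c in codons if c in CODON_SET)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous codons")
    return {c: counts[c] / total for c in CODONS}


# Toy sequence (not viral data): codons ATG, TGG, TGG, TAA.
toy = codon_frequencies("ATGTGGTGGTAA")
print(toy["TGG"], toy["ATG"])
```

Each sequence then becomes a 64-element vector whose entries sum to 1, which is the kind of input a classifier could be fit on.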

Keywords: bioinformatics; feature selection; genotype-to-phenotype; machine learning; viruses.

Conflict of interest statement

Authors PD and JR were employed by MRIGlobal.

Figures

**FIGURE 1**
Average performance metrics across the one hundred test set splits for each combination of viral CDS and feature representation. The codon frequency model is the top performer, with higher average performance on every metric than relative synonymous codon usage (RSCU). Error bars represent the 95% confidence interval.

**FIGURE 2**
Number of Non-Zero (NNZ) coefficients for the top 15 codons in the codon frequency model across all one hundred models fit on Orf1ab for codon frequency and RSCU features. The Trp^TGG codon is used in 96 of 100 codon frequency models but is not available as a feature in the RSCU models.
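A likely reason Trp^TGG cannot serve as an RSCU feature is that tryptophan has only one codon in the standard genetic code. Under the usual RSCU definition (observed codon count divided by the count expected if all synonymous codons for that amino acid were used equally), a single-codon amino acid always scores exactly 1, so the feature carries no variation between sequences. The sketch below illustrates this under stated assumptions: the helper name `rscu`, the use of the standard genetic code, the grouping of the three stop codons as one synonymous family, and the convention of returning 0.0 when an amino acid is absent are all illustrative choices, not details taken from the paper.

```python
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group codons into synonymous families keyed by amino acid.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def rscu(cds: str) -> dict[str, float]:
    """Relative synonymous codon usage for each codon of an in-frame CDS.

    RSCU = count(codon) * n_synonyms / count(all synonyms of that amino acid).
    Assumption: an absent amino acid yields 0.0 for all of its codons.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for family in SYNONYMS.values():
        total = sum(counts[c] for c in family)
        for c in family:
            values[c] = counts[c] * len(family) / total if total else 0.0
    return values


# Toy sequences (not viral data) with different amounts of Trp.
few_trp = rscu("ATGTGGGCTGCTGCCTAA")
many_trp = rscu("ATGTGGTGGTGGGCTTAA")
print(few_trp["TGG"], many_trp["TGG"])  # both 1.0: TGG is constant under RSCU
```

By contrast, the raw codon frequency of TGG does change with how often tryptophan appears, which is consistent with the caption's observation that the codon is usable in the codon frequency models.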
