PeerJ. 2022 Oct 13;10:e14055.
doi: 10.7717/peerj.14055. eCollection 2022.

Ribovirus classification by a polymerase barcode sequence


Artem Babaian et al. PeerJ. 2022.


Abstract

RNA viruses encoding a polymerase gene (riboviruses) dominate the known eukaryotic virome. High-throughput sequencing is revealing a wealth of new riboviruses known only from sequence, precluding classification by traditional taxonomic methods. Sequence classification is often based on polymerase sequences, but standardised methods to support this approach are currently lacking. To address this need, we describe the polymerase palmprint, a segment of the palm sub-domain robustly delineated by well-conserved catalytic motifs. We present an algorithm, Palmscan, which identifies palmprints in nucleotide and amino acid sequences; PALMdb, a collection of palmprints derived from public sequence databases; and palmID, a public website implementing palmprint identification, search, and annotation. Together, these methods demonstrate a proof-of-concept workflow for high-throughput characterisation of RNA viruses, paving the path for the continued rapid growth in RNA virus discovery anticipated in the coming decade.

Keywords: RNA virus; RNA-dependent RNA polymerase; Virus classification; Virus evolution.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1. Conservation of catalytic motifs in three divergent RdRp structures in the Protein Data Bank.
Coronaviridae (COV, virus: SARS-CoV-2, pdb: 7CYQ) is from a positive-strand RNA virus, Reoviridae (REO, virus: Mammalian orthoreovirus 3 Dearing, pdb: 1N1H) is from a double-stranded RNA virus, and Permutotetraviridae (PER, virus: Thosea asigna virus, pdb: 5CYR) is from a positive-strand RNA virus with permutation of motif C. RdRp domains defined by PFAM (COV: RdRP_1, TAV: DNA/RNA pol_sf, and REO: RdRP_5) are shown within the open reading frame for REO and PER, and within the mature polyprotein cleavage peptide from the 7,096 aa ORF1ab for COV.
Figure 2. (A) The palmprint segment is a ~100 aa region in the active site of the polymerase domain.
Motifs A, B and C are well-conserved; the intervening V1 and V2 regions are more variable. (B) Sequence logos for the five established RdRp-containing ribovirus phyla (Duplorna=Duplornaviricota, etc.) and one for RTs in Artverviricota; a PSSM is constructed corresponding to each logo. (C) Palmscan alignment for NC_009224.1 palmprint showing motifs in canonical ABC order. (D) Alignment for NC_040813.1 with permuted motifs in CAB order.
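
As a rough illustration of the canonical (ABC) and permuted (CAB) motif orders in panels C and D, the Python sketch below sorts motif hits by position and reports their order and the span they delimit. It is not the Palmscan implementation: the `MotifHit` type, the function names, and every coordinate and motif length are hypothetical, and motif detection (for example by PSSM scoring) is assumed to have been done already.

```python
from typing import List, NamedTuple, Tuple


class MotifHit(NamedTuple):
    """One detected catalytic motif (hypothetical record, for illustration)."""
    name: str    # "A", "B", or "C"
    start: int   # 0-based start position in the protein sequence
    length: int  # motif length in residues


def motif_order(hits: List[MotifHit]) -> str:
    """Return motif names in sequence order, e.g. 'ABC' or 'CAB'."""
    return "".join(h.name for h in sorted(hits, key=lambda h: h.start))


def motif_span(hits: List[MotifHit]) -> Tuple[int, int]:
    """Return (start, end) from the first motif through the last motif.

    The end coordinate is exclusive, so sequence[start:end] is the segment.
    """
    ordered = sorted(hits, key=lambda h: h.start)
    last = ordered[-1]
    return ordered[0].start, last.start + last.length


# Made-up coordinates: one canonical and one permuted arrangement.
canonical = [MotifHit("A", 10, 12), MotifHit("B", 60, 14), MotifHit("C", 100, 8)]
permuted = [MotifHit("C", 5, 8), MotifHit("A", 40, 12), MotifHit("B", 90, 14)]

print(motif_order(canonical), motif_span(canonical))
print(motif_order(permuted), motif_span(permuted))
```

Sorting by start position keeps the sketch independent of the order in which hits are reported, so the same code handles both arrangements.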
Figure 3. Defining RdRp boundaries for sequence-based classification.
Schematic depiction of methods for defining RdRp segment boundaries for sequence analysis. As shown at the top, RdRp may be embedded in a multi-gene ORF (see also Fig. 1). Below are three alternative RdRp boundary schemes defined by Wolf et al. (2018) (“Wolf2018”), Zayed et al. (2022) (“Zayed2022”), and Edgar et al. (2022) (“Edgar2022”), respectively. Wolf2018 attempted to identify approximately full-length genes, discarding fragments unless they were close to full-length. This scheme is problematic because RdRp is often found in a longer ORF with other functional domains, and in such cases the boundary of the RdRp is often unclear. Zayed2022 used a similar scheme while additionally allowing fragments. Permitting fragments lets more sequences be included, but it is problematic for classification because pairs of fragments with little or no overlap may be assigned to different vOTUs even if they belong to the same species. Edgar2022 used palmprints, a short segment of RdRp with well-defined boundaries.
Figure 4. Overview of palmID and procedurally generated figures (interactive version: https://serratus.io/palmid?hash=ruby).
(A) Workflow, (B) quality control, (C) geospatial map, and (D) matching palmprints in PALMdb.
Figure 5. Lengths of the V1 and V2 variable regions and palmprint segment.
Distributions were measured on full-length RefSeq Orthornavirae genomes.
Figure 6. Identity threshold tuning.
(A) Number of clusters obtained by clustering RdRP palmprints of 2,048 recognised ICTV species at identity thresholds 97%, 96% … 85%. (B) Number of species that are split over multiple OTUs, lumped together with one or more other species into a single OTU, both lumped and split (Lmp+Spl), or pure (not lumped or split). The best fit of number of clusters to number of species is obtained at 90% identity.
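
The split, lumped, Lmp+Spl, and pure categories in panel B can be made concrete with a short Python sketch. It assumes each palmprint has already been assigned to an OTU at some identity threshold; the function name, the label strings, and the species and OTU identifiers are hypothetical, and this is not the authors' code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def classify_species(assignments: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Label each species as 'pure', 'split', 'lumped', or 'lumped+split'.

    assignments: (species, otu) pairs, one per clustered palmprint.
    split  = the species is spread over more than one OTU.
    lumped = the species shares at least one OTU with another species.
    """
    otus_of = defaultdict(set)     # species -> OTUs it appears in
    species_in = defaultdict(set)  # OTU -> species it contains
    for species, otu in assignments:
        otus_of[species].add(otu)
        species_in[otu].add(species)

    labels = {}
    for species, otus in otus_of.items():
        split = len(otus) > 1
        lumped = any(len(species_in[otu]) > 1 for otu in otus)
        if split and lumped:
            labels[species] = "lumped+split"
        elif split:
            labels[species] = "split"
        elif lumped:
            labels[species] = "lumped"
        else:
            labels[species] = "pure"
    return labels


# Made-up assignments covering all four categories.
example = [
    ("sp1", "otu1"), ("sp1", "otu2"),  # split, and shares otu1 with sp5
    ("sp2", "otu3"), ("sp3", "otu3"),  # two species lumped into one OTU
    ("sp4", "otu4"),                   # pure
    ("sp5", "otu1"),                   # lumped with sp1
    ("sp6", "otu5"), ("sp6", "otu6"),  # split only
]
labels = classify_species(example)
print(labels)
```

Repeating this tally for clusterings made at each candidate threshold would give the per-threshold counts that a plot like panel B summarises.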
Figure 7. High-confidence palmprint score threshold.
Distribution of RdRP palmprint scores on non-RdRP decoy set (top) and full PFAM RdRP alignments (bottom). The score threshold was set to 20 to discriminate RdRP polymerases with high confidence.
Figure 8. Rubi- and rubi-like viruses identified by palmID.
(A) Genome synteny among Rubiviruses (RV) and related Matonaviruses (MV) showing significant (E < 10⁻⁴) protein domain matches. (B) Parallel phylogenetic trees created from RNA-dependent RNA polymerase (RdRP) or from concatenated capsid and E2/E1 glycoproteins; the inlay shows the unrooted RdRP tree. (C) Protein sequence alignment of the common RdRP fragment with motifs A, B, and C highlighted.

