SDT: a virus classification tool based on pairwise sequence alignment and identity calculation

Brejnev Muhizi Muhire¹, Arvind Varsani², Darren Patrick Martin¹

Affiliations

¹ Department of Clinical Laboratory Sciences, University of Cape Town, Cape Town, South Africa.
² Department of Clinical Laboratory Sciences, University of Cape Town, Cape Town, South Africa; School of Biological Sciences and Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand; Department of Plant Pathology and Emerging Pathogens Institute, University of Florida, Gainesville, Florida, United States of America.

PMID: 25259891
PMCID: PMC4178126
DOI: 10.1371/journal.pone.0108277

PLoS One. 2014 Sep 26;9(9):e108277. doi: 10.1371/journal.pone.0108277. eCollection 2014.

Abstract

The perpetually increasing rate at which viral full-genome sequences are being determined is creating a pressing demand for computational tools that will aid the objective classification of these genome sequences. Taxonomic classification approaches that are based on pairwise genetic identity measures are potentially highly automatable and are progressively gaining favour with the International Committee on Taxonomy of Viruses (ICTV). There are, however, various issues with the calculation of such measures that could potentially undermine the accuracy and consistency with which they can be applied to virus classification. Firstly, pairwise sequence identities computed from multiple sequence alignments rather than from multiple independent pairwise alignments can lead to the deflation of identity scores with increasing dataset sizes. Secondly, when gap characters need to be introduced during sequence alignments to account for insertions and deletions, methodological variations in the way that these characters are introduced and handled during pairwise genetic identity calculations can cause high degrees of inconsistency in the way that different methods classify the same sets of sequences. Here we present Sequence Demarcation Tool (SDT), a free user-friendly computer program that aims to provide a robust and highly reproducible means of objectively using pairwise genetic identity calculations to classify any set of nucleotide or amino acid sequences. SDT can produce publication-quality pairwise identity plots and colour-coded distance matrices to further aid the classification of sequences according to ICTV-approved taxonomic demarcation criteria. Besides a graphical interface version of the program for Windows computers, command-line versions of the program are available for a variety of different operating systems (including a parallel version for cluster computing platforms).
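To make the gap-handling issue concrete, the following is a minimal sketch of how the identity score of one aligned pair changes depending on whether gap columns are ignored or counted as mismatches. The function name `pairwise_identity` and the `count_gaps` flag are hypothetical and chosen for illustration only; the abstract does not state which convention SDT itself applies.

```python
def pairwise_identity(a: str, b: str, count_gaps: bool = False) -> float:
    """Fraction of identical positions between two already-aligned sequences.

    count_gaps=False: columns where either sequence has a gap are ignored.
    count_gaps=True:  columns where exactly one sequence has a gap are
                      counted as mismatches.
    Columns where both sequences have a gap are always ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(a.upper(), b.upper()):
        x_gap, y_gap = x == "-", y == "-"
        if x_gap and y_gap:
            continue  # gap-only column carries no information for this pair
        if x_gap or y_gap:
            if count_gaps:
                compared += 1  # gap treated as a mismatch
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return matches / compared


# Illustrative aligned pair with one gap column and one substitution.
seq_a = "ACGT-ACGT"
seq_b = "ACGTTACGA"
ignoring_gaps = pairwise_identity(seq_a, seq_b)                   # 7 of 8 columns
counting_gaps = pairwise_identity(seq_a, seq_b, count_gaps=True)  # 7 of 9 columns
print(f"{ignoring_gaps:.3f} vs {counting_gaps:.3f}")
```

The same alignment yields two different scores, so a demarcation threshold that sits between them would classify the pair differently under the two conventions.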

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. The SDT interface.**
(A) Colour-coded pairwise identity matrix generated from 29 *Chickpea chlorotic dwarf virus* genomes. Each coloured cell represents a percentage identity score between two sequences (one indicated horizontally to the left and the other vertically at the bottom). A coloured key indicates the correspondence between pairwise identities and the colours displayed in the matrix. (B) Pairwise identity frequency distribution plot. The horizontal axis indicates percentage pairwise identities, and the vertical axis indicates proportions of these identities within the distribution. Peaks on the graph indicate pairwise sequence identity thresholds that would yield the most ambiguous classifications, whereas troughs indicate thresholds that would yield the least ambiguous classifications and could therefore be tentatively used as relatively conflict-free demarcation cut-offs for operational taxonomic units.
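The idea behind panel (B) can be sketched as follows: bin the pairwise identities, then look for the least populated bins lying between the lowest and highest occupied bins. This is only an illustration under simple assumptions (fixed-width bins, percentages between 0 and 100); the function names `identity_histogram` and `candidate_thresholds` are hypothetical and do not describe SDT's internals.

```python
def identity_histogram(identities, bin_width=1.0, lo=0.0, hi=100.0):
    """Proportion of pairwise identities (in percent) falling in each bin."""
    if not identities:
        raise ValueError("no identities supplied")
    n_bins = int(round((hi - lo) / bin_width))
    counts = [0] * n_bins
    for value in identities:
        if not lo <= value <= hi:
            raise ValueError(f"identity {value} outside [{lo}, {hi}]")
        # A value equal to `hi` is placed in the last bin.
        index = min(int((value - lo) / bin_width), n_bins - 1)
        counts[index] += 1
    total = len(identities)
    return [count / total for count in counts]


def candidate_thresholds(identities, bin_width=1.0, lo=0.0, hi=100.0):
    """Lower edges of the emptiest bins between the occupied extremes.

    These correspond to troughs in the distribution, i.e. cut-offs that
    few or no sequence pairs sit close to.
    """
    proportions = identity_histogram(identities, bin_width, lo, hi)
    occupied = [i for i, p in enumerate(proportions) if p > 0]
    interior = range(occupied[0] + 1, occupied[-1])
    if not interior:
        return []
    lowest = min(proportions[i] for i in interior)
    return [lo + i * bin_width for i in interior if proportions[i] == lowest]


# Illustrative identities (percent) forming two well-separated groups.
example = [99.2, 98.5, 97.9, 98.8, 78.1, 77.4, 76.8]
troughs = candidate_thresholds(example)
print(troughs[0], troughs[-1])
```

In this made-up example every bin between the two groups is empty, so any cut-off in that range separates them without conflict. Real datasets usually give shallower troughs that need visual inspection.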

**Figure 2. Distribution of pairwise genetic/evolutionary distances of the same set of 25 mastrevirus full genome sequences in the context of progressively larger sequence datasets.**
The constant frequency distribution (represented by the red graph) illustrates the consistency of pairwise distance calculations based on pairwise alignments, while the changing frequency distributions (represented by the blue and green graphs) indicate how pairwise distance scores based on multiple sequence alignments tend to become inflated as dataset sizes get larger.
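The stability shown by the red graph follows from aligning every pair independently: the score for a given pair cannot depend on which other sequences are in the dataset. A minimal sketch of that property is given below. The toy Needleman-Wunsch aligner with a linear gap penalty is a stand-in chosen for brevity, not the aligner used by SDT, and all names and scoring values are assumptions.

```python
from itertools import combinations


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Toy Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def identity(aln_a, aln_b):
    """Identity over columns where neither sequence has a gap (one convention)."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x == y for x, y in pairs) / len(pairs)


def identity_matrix(seqs):
    """Align each pair on its own; returns {(name_a, name_b): identity}."""
    return {
        (name_a, name_b): identity(*global_align(seq_a, seq_b))
        for (name_a, seq_a), (name_b, seq_b) in combinations(seqs.items(), 2)
    }


# Made-up sequences: adding s3 leaves the s1/s2 score untouched.
small = {"s1": "ACGTACGT", "s2": "ACGACGT"}
large = dict(small, s3="TTGTACGTAA")
print(identity_matrix(small)[("s1", "s2")] == identity_matrix(large)[("s1", "s2")])
```

With a multiple sequence alignment, by contrast, the columns assigned to s1 and s2 can shift whenever further sequences are added, which is the effect the blue and green graphs illustrate.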
