PeerJ. 2023 May 24;11:e15339.
doi: 10.7717/peerj.15339. eCollection 2023.

minSNPs: an R package for the derivation of resolution-optimised SNP sets from microbial genomic data

Kian Soon Hoon et al. PeerJ.

Abstract

Here, we present the R package minSNPs, a re-development of a previously described Java application named Minimum SNPs. MinSNPs assembles resolution-optimised sets of single nucleotide polymorphisms (SNPs) from sequence alignments such as genome-wide orthologous SNP matrices. MinSNPs can derive sets of SNPs optimised for discriminating any user-defined combination of sequences from all others. Alternatively, SNP sets may be optimised to discriminate all sequences from all other sequences, i.e., to maximise diversity. MinSNPs encompasses functions that facilitate rapid and flexible SNP mining, and clear and comprehensive presentation of the results. The running time of minSNPs scales linearly with input data volume and with the numbers of SNPs and SNP sets specified in the output. MinSNPs was tested using a previously reported orthologous SNP matrix of Staphylococcus aureus and an orthologous SNP matrix of 3,279 genomes with 164,335 SNPs assembled from four S. aureus short read genomic data sets. MinSNPs was shown to be effective for deriving discriminatory SNP sets for potential surveillance targets and for identifying SNP sets optimised to discriminate isolates from different clonal complexes. MinSNPs was also tested with a large Plasmodium vivax orthologous SNP matrix. A set of five SNPs was derived that reliably indicated the country of origin within three south-east Asian countries. In summary, we report the capacity to assemble comprehensive SNP matrices that effectively capture microbial genomic diversity, and to rapidly and flexibly mine these entities for optimised marker sets.
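
To make the idea of a diversity-maximising SNP set concrete, the following is a minimal Python sketch, not the minSNPs R interface. It assumes that resolution is measured with Simpson's index of diversity and that SNPs are added one at a time in a greedy stepwise search; both are assumptions for illustration. The function names, the toy alignment and the isolate labels are hypothetical.

```python
from collections import Counter


def simpson_index(profiles):
    """Simpson's index of diversity: the probability that two distinct
    sequences drawn at random have different allele profiles."""
    n = len(profiles)
    if n < 2:
        return 0.0
    counts = Counter(profiles)
    same_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_pairs / (n * (n - 1))


def greedy_snp_set(alignment, max_snps):
    """Greedily pick alignment columns (0-based) that raise Simpson's index.

    This is an illustrative stepwise search, assumed here for the sketch;
    it is not taken from the minSNPs source code.
    """
    sequences = list(alignment.values())
    n_positions = len(sequences[0])
    selected = []
    score = 0.0
    for _ in range(max_snps):
        best_pos, best_score = None, score
        for pos in range(n_positions):
            if pos in selected:
                continue
            cols = selected + [pos]
            # Allele profile of each sequence at the candidate SNP set.
            profiles = [tuple(seq[p] for p in cols) for seq in sequences]
            candidate = simpson_index(profiles)
            if candidate > best_score:
                best_pos, best_score = pos, candidate
        if best_pos is None:
            # No remaining position improves resolution; stop early.
            break
        selected.append(best_pos)
        score = best_score
    return selected, score


# Hypothetical toy alignment: four isolates, four aligned positions.
toy_alignment = {
    "isolate_a": "AACG",
    "isolate_b": "AATG",
    "isolate_c": "ACCT",
    "isolate_d": "ACTT",
}
snps, d = greedy_snp_set(toy_alignment, max_snps=3)
print(snps, round(d, 3))  # prints "[1, 2] 1.0"
```

In this toy case two positions are enough to separate all four isolates, so the search stops before reaching the requested three SNPs. A greedy search of this kind is not guaranteed to find the globally best set; it is used here only because it is short and easy to follow.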

Keywords: Genome; Genome alignments; Microbial; Plasmodium; Resolution optimised; SNP genotyping; SNP matrices; SNP mining; SNPs; Staphylococcus.

Conflict of interest statement

The authors declare there are no competing interests. Peter Shaw is employed by Oujian Laboratory.

Figures

Figure 1. A summary of how to use minSNPs, and the SNP search algorithm.
Figure 2. Correspondence between SNP allele genotypes and phylogeny for the S. aureus STARRS data. The phylogenetic tree was reproduced from (15) and labelled with two newly identified high-D SNP sets (https://microreact.org/project/minsnps-starrs). High-diversity index SNP sets 1 and 11 comprise positions 111760, 1925985, 2663300, 2683490, 124088 and positions 539419, 1413096, 1146945, 2184528, 1577370, respectively, of the Mu50 reference genome.

References

    1. Adam I, Alam MS, Alemu S, Amaratunga C, Amato R, Andrianaranjaka V, Anstey NM, Aseffa A, Ashley E, Assefa A, Auburn S, Barber BE, Barry A, Pereira DB, Cao J, Chau NH, Chotivanich K, Chu C, Dondorp AM, Drury E, Echeverry DF, Erko B, Espino F, Fairhurst R, Faiz A, Villegas MAF, Gao Q, Golassa L, Goncalves S, Grigg MJ, Hamedi Y, Hien TT, Htut Y, Johnson KJ, Karunaweera N, Khan W, Krudsood S, Kwiatkowski DP, Lacerda M, Ley B, Lim P, Liu Y, Llanos-Cuentas A, Lon C, Lopera-Mesa T, Marfurt J, Michon P, Miotto O, Mohammed R, Mueller I, Namaik-larp C, Newton PN, Nguyen T-N, Nosten F, Noviyanti R, Pava Z, Pearson RD, Petros B, Phyo AP, Price RN, Pukrittayakamee S, Rahim AG, Randrianarivelojosia M, Rayner JC, Rumaseb A, Siegel SV, Simpson VJ, Thriemer K, Tobon-Castano A, Trimarsanto H, Ferreira MU, Vélez ID, Wangchuk S, Wellems TE, White NJ, William T, Yasnot MF, Yilma D. An open dataset of Plasmodium vivax genome variation in 1,895 worldwide samples. Wellcome Open Research. 2022;7:136. doi: 10.12688/wellcomeopenres.17795.1.
    2. Auburn S, Benavente ED, Miotto O, Pearson RD, Amato R, Grigg MJ, Barber BE, William T, Handayuni I, Marfurt J, Trimarsanto H, Noviyanti R, Sriprawat K, Nosten F, Campino S, Clark TG, Anstey NM, Kwiatkowski DP, Price RN. Genomic analysis of a pre-elimination Malaysian Plasmodium vivax population reveals selective pressures and changing transmission dynamics. Nature Communications. 2018;9:2585. doi: 10.1038/s41467-018-04965-4.
    3. Coll F, Raven KE, Knight GM, Blane B, Harrison EM, Leek D, Enoch DA, Brown NM, Parkhill J, Peacock SJ. Definition of a genetic relatedness cutoff to exclude recent transmission of meticillin-resistant Staphylococcus aureus: a genomic epidemiology analysis. The Lancet Microbe. 2020;1:e328–e335.
    4. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–2158. doi: 10.1093/bioinformatics/btr330.
    5. Diez Benavente E, Campos M, Phelan J, Nolder D, Dombrowski JG, Marinho CRF, Sriprawat K, Taylor AR, Watson J, Roper C, Nosten F, Sutherland CJ, Campino S, Clark TG. A molecular barcode to inform the geographical origin and transmission dynamics of Plasmodium vivax malaria. PLOS Genetics. 2020;16:e1008576. doi: 10.1371/journal.pgen.1008576.
