Front Genet. 2023 Aug 24:14:1228552.
doi: 10.3389/fgene.2023.1228552. eCollection 2023.

EasySSR: a user-friendly web application with full command-line features for large-scale batch microsatellite mining and samples comparison

Sandy Ingrid Aguiar Alves et al. Front Genet.

Abstract

Microsatellites, also known as simple sequence repeats (SSRs) or short tandem repeats (STRs), are polymorphic DNA regions consisting of tandem repetitions of a nucleotide motif 1-6 base pairs long. They have a broad range of applications in fields such as comparative genomics, molecular biology, and forensics. However, the majority of researchers do not have computational training and struggle with command-line tools or with very limited web tools for their SSR research. They spend considerable time learning how to run the software and tabulating the post-processed data in other tools or by hand, time that could instead go directly to data analysis. We present EasySSR, a user-friendly web tool with full command-line functionality, designed for practical batch identification and comparison of SSRs in sequences and in draft or complete genomes, with no prior bioinformatics skills required. EasySSR requires only a FASTA file and an optional GenBank file for one or more genomes to identify and compare SSRs. The tool can automatically analyze and compare SSRs in whole genomes; convert GenBank files to PTT files; identify perfect and imperfect SSRs and coding and non-coding regions; and compare SSR frequencies, abundance, motifs, flanking sequences, and iterations. It produces many outputs ready for download, such as PTT files, interactive charts, and Excel tables, giving the user data ready for further analysis in minutes. EasySSR was implemented as a web application that runs in any browser and is freely available at https://computationalbiology.ufpa.br/easyssr/. Tutorials, usage notes, and download links to the source code can be found at https://github.com/engbiopct/EasySSR.
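To make the definition above concrete, the following is a minimal Python sketch of perfect SSR detection and per-sample motif counting. It is not the EasySSR implementation (the Figure 1 caption states that EasySSR runs IMEx for mining); it is a simple regular-expression stand-in. The function names and the minimum repeat thresholds in `MIN_REPEATS` are hypothetical values chosen for illustration, and the sketch does not handle imperfect repeats or merge cyclically equivalent motifs such as CAG and AGC.

```python
import re
from collections import Counter

# Hypothetical minimum iteration counts per motif length (1-6 bp).
# These are illustrative values, not EasySSR or IMEx defaults.
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def find_perfect_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, iterations) tuples for perfect SSRs.

    Coordinates are 0-based, with `end` exclusive.
    """
    seq = sequence.upper()
    hits = []
    for size, min_rep in sorted(min_repeats.items()):
        # A motif of `size` bases followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT"),
            # since the shorter motif length already covers them.
            if any(size % k == 0 and motif == motif[:k] * (size // k)
                   for k in range(1, size)):
                continue
            iterations = (match.end() - match.start()) // size
            hits.append((match.start(), match.end(), motif, iterations))
    return sorted(hits)


def motif_frequencies(samples):
    """Map each sample name to a Counter of motif -> number of SSR loci."""
    return {name: Counter(hit[2] for hit in find_perfect_ssrs(seq))
            for name, seq in samples.items()}


if __name__ == "__main__":
    # Toy sequences, not real genomes.
    samples = {
        "sample_a": "GGC" + "AT" * 7 + "TTT" + "CAG" * 5 + "TT",
        "sample_b": "CC" + "AT" * 6 + "GG",
    }
    for name, counts in motif_frequencies(samples).items():
        print(name, dict(counts))
```

A table of such per-sample counters is the kind of summary that a motif frequency comparison across genomes builds on; a real analysis would read the sequences from FASTA files rather than from inline strings.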

Keywords: batch; bioinformatics; comparison; genome; large scale; microsatellites; motifs; web tool.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
EasySSR workflow from user input to output. (A) In input, EasySSR receives user information, user, and parameters. (B) In Step 1, it receives the input, verifies the data, and converts GenBank files to PTT files. (C) With each FASTA/PTT file pair ready, EasySSR starts Step 2 by analyzing every file with IMEx, repeating the process until all files have been processed. (D) Then, in Step 3, EasySSR processes all IMEx outputs, stores the data in a new project in the database, and processes the summarized data into sheets and charts. (E) The output is displayed on an HTML page, and the data are made available for download.
FIGURE 2
(A) EasySSR input screen. (B) EasySSR loading screen.
FIGURE 3
Custom parameters interface.
FIGURE 4
EasySSR output screen part 1, with time of analysis, download report folder, and donut comparison charts. Demonstration of EasySSR reports from the batch comparison of perfect and imperfect SSRs in 54 complete genomes of Corynebacterium pseudotuberculosis with gene annotation.
FIGURE 5
EasySSR output screen part 2, from the large-scale analysis and comparison of perfect and imperfect SSRs in 54 complete genomes of Corynebacterium pseudotuberculosis with gene annotation. (A) Interactive stacked bar chart summarizing the top 10 motifs with iteration present in most genomes, with their frequency per genome. (B) Interactive stacked bar chart summarizing the top 10 motifs present in most genomes, with their frequency per genome.
FIGURE 6
EasySSR output screen part 3, from the large-scale analysis and comparison of perfect and imperfect SSRs in 54 complete genomes of Corynebacterium pseudotuberculosis with gene annotation. (A) Data table, (B) Frequency of Motifs per Genome table, and (C) Statistics table ordered by sequence name.
FIGURE 7
Demonstration of EasySSR reports from the batch comparison of perfect SSRs in 10 BAC genomes without gene annotation. (A) EasySSR comparison charts. The graphs for imperfect SSRs are blank because the parameters were set to mine perfect SSRs only, and the coding/non-coding graphs are all in one color because no annotation file was input. (B) EasySSR statistics table reports in web mode, with all coding information shown as 0 because no annotation file was input.
FIGURE 8
Demonstration of EasySSR reports from the batch comparison of perfect and imperfect SSRs in five sequences with gene annotation: human atrophin1 gene, Plasmodium falciparum chromosome IV, yeast chromosome IV, Mycobacterium tuberculosis H37Rv, and Escherichia coli K12. (A) Comparison charts and (B) statistics table reports in print mode.
FIGURE 9
EasySSR output screen from the large-scale analysis and comparison of perfect SSRs in 54 complete genomes of Corynebacterium pseudotuberculosis with gene annotation. (A) Comparison charts and (B) statistics table reports ordered by total SSRs.
FIGURE 10
Demonstration of EasySSR reports from the batch comparison of perfect and imperfect SSRs in 54 sequences of Corynebacterium pseudotuberculosis with annotation. Statistics table reports in Excel mode, optimized for viewing the complete output with all columns and rows.
