BMC Bioinformatics. 2022 Jul 8;23(1):268.
doi: 10.1186/s12859-022-04809-5.

getSequenceInfo: a suite of tools allowing to get genome sequence information from public repositories

Vincent Moco et al. BMC Bioinformatics. 2022.

Abstract

Background: The volume of biological sequence data is increasing exponentially worldwide. Nucleotide sequence databases play an important role in providing meaningful genomic information on a variety of biological organisms.

Results: The getSequenceInfo software tool provides access to sequence information from several public repositories (GenBank, RefSeq, and the European Nucleotide Archive). It runs on Linux, macOS, and Microsoft Windows, either programmatically (command line) or through a graphical user interface. getSequenceInfo (gSeqI) v1.0 helps users retrieve information on queried sequences that may be useful for specific studies (e.g. the country of origin/isolation or the release date of queried sequences). Queries can retrieve sequence data for a given kingdom and species, or from a given date. The program separates chromosomes from plasmids (or other genetic elements/components) by placing each component in a dedicated folder. It also computes some basic statistics, such as the GC content of queried assemblies. An empirically designed nucleotide ratio is calculated from nucleotide information to tentatively provide a "NucleScore" for the genome assemblies studied. Besides the main gSeqI tool, additional tools have been developed to perform various tasks related to sequence analysis.
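
As an illustration of two of the operations described above (computing GC content and separating chromosomes from plasmids), the following is a minimal Python sketch. It is not the getSequenceInfo implementation: the function names, the example records, and the rule of labelling a record as a plasmid when its FASTA header contains the word "plasmid" are all assumptions made for this example. The NucleScore is not sketched here because the abstract does not give its formula.

```python
from collections import Counter


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(sequence):
    """Return the GC percentage, counting only unambiguous bases (A, C, G, T)."""
    counts = Counter(sequence.upper())
    gc = counts["G"] + counts["C"]
    total = gc + counts["A"] + counts["T"]
    return 100.0 * gc / total if total else 0.0


def component_type(header):
    """Hypothetical rule (an assumption): classify a record by a header keyword."""
    return "plasmid" if "plasmid" in header.lower() else "chromosome"


# Made-up records used purely for demonstration.
example = """>seq1 Example organism chromosome
ATGCGCGCTA
GGCC
>seq2 Example organism plasmid pX
ATATATGC
"""

for header, seq in parse_fasta(example):
    print(component_type(header), round(gc_content(seq), 1))
```

In a real workflow the component label could be used to choose the output folder for each record. Ambiguous bases such as N are excluded from the denominator here, which is one common convention but not necessarily the one used by the tool.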

Conclusion: The aim of this study is to democratize the programmatic use of public repositories and to facilitate sequence data analysis from a pedagogical perspective. Output results are available in FASTA, FASTQ, Excel/TSV or HTML formats. The program is freely available at https://github.com/karubiotools/getSequenceInfo. getSequenceInfo and supplementary tools are partly available through the recently released Galaxy KaruBioNet platform (http://calamar.univ-ag.fr/c3i/galaxy_karubionet.html).

Keywords: Assembly; DNA; Genome sequences; Metadata; Nucleotide diversity; Repository.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1. Overview of the GUI of getSequenceInfo
Fig. 2. Galaxy KaruBioNet screenshot with gSeqI and additional tools framed in red
Fig. 3. NucleScore value of 50 genomes belonging to 7 different bacterial species (Acinetobacter baumannii, Bacillus cereus, Bacillus subtilis, Escherichia coli, Klebsiella pneumoniae, Salmonella enterica, Staphylococcus aureus)

