BMC Bioinformatics. 2010 Dec 10;11:595.
doi: 10.1186/1471-2105-11-595.

BIGSdb: Scalable analysis of bacterial genome variation at the population level

Keith A Jolley et al. BMC Bioinformatics. 2010.

Abstract

Background: The opportunities for bacterial population genomics that are being realised by the application of parallel nucleotide sequencing require novel bioinformatics platforms. These must be capable of the storage, retrieval, and analysis of linked phenotypic and genotypic information in an accessible, scalable and computationally efficient manner.

Results: The Bacterial Isolate Genome Sequence Database (BIGSdb) is a scalable, open source, web-accessible database system that meets these needs. It enables phenotype and sequence data, which can range from a single sequence read to whole genome data, to be efficiently linked for a limitless number of bacterial specimens. The system builds on the widely used mlstdbNet software, developed for the storage and distribution of multilocus sequence typing (MLST) data, and incorporates the capacity to define and identify any number of loci and genetic variants at those loci within the stored nucleotide sequences. These loci can be further organised into 'schemes' for isolate characterisation or for evolutionary or functional analyses. Isolates and loci can be indexed by multiple names and any number of alternative schemes can be accommodated, enabling cross-referencing of different studies and approaches. Laboratory information management system (LIMS) functionality in the software enables linkage to and organisation of laboratory samples. The data are easily linked to external databases, and fine-grained authentication of access permits multiple users to participate in community annotation by setting up or contributing to different schemes within the database. Some of the applications of BIGSdb are illustrated with the genera Neisseria and Streptococcus. The BIGSdb source code and documentation are available at http://pubmlst.org/software/database/bigsdb/.
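The relationship between loci, allele designations and schemes described above can be illustrated with a minimal Python sketch. This is not the BIGSdb schema or API; the class name, locus names, allele numbers and profile labels below are all hypothetical and chosen only to show how a scheme might map a combination of allele numbers to a profile identifier, in the manner of an MLST sequence type.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def identify_allele(sequence: str, known_alleles: Dict[str, int]) -> Optional[int]:
    """Return the allele number for an exact sequence match, or None if unknown."""
    return known_alleles.get(sequence.upper())


@dataclass
class Scheme:
    """A named, ordered set of loci plus the allele profiles defined for it (hypothetical model)."""

    name: str
    loci: Tuple[str, ...]
    # Maps a tuple of allele numbers (in locus order) to a profile label.
    profiles: Dict[Tuple[int, ...], str] = field(default_factory=dict)

    def designate(self, alleles: Dict[str, int]) -> Optional[str]:
        """Look up the profile label for an isolate's allele designations.

        Returns None if any locus in the scheme is undesignated or the
        combination of alleles has no defined profile.
        """
        try:
            key = tuple(alleles[locus] for locus in self.loci)
        except KeyError:
            return None
        return self.profiles.get(key)


# Toy data: two loci, each with two known allele sequences.
known = {
    "locusA": {"ACGT": 1, "ACGA": 2},
    "locusB": {"TTGA": 1, "TTGC": 2},
}
scheme = Scheme(
    name="toy-scheme",
    loci=("locusA", "locusB"),
    profiles={(1, 1): "ST-1", (2, 1): "ST-2"},
)

# Sequences extracted from one isolate, keyed by locus.
isolate_sequences = {"locusA": "acga", "locusB": "TTGA"}
designations = {}
for locus, seq in isolate_sequences.items():
    allele = identify_allele(seq, known[locus])
    if allele is not None:
        designations[locus] = allele

profile = scheme.designate(designations)
```

Because a scheme is only a named grouping of loci, the same allele designations could be reused by any number of alternative schemes without storing the sequences again.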

Conclusions: Genomic data can be used to characterise bacterial isolates in many different ways, but they can also be efficiently exploited for evolutionary or functional studies. BIGSdb represents a freely available resource that will assist the broader community in the elucidation of the structure and function of bacteria by means of a population genomics approach.

Figures

Figure 1
BIGSdb links bacterial isolate provenance, phenotypic and genotypic data. Sequences from multiple sources such as single dye-terminator reaction reads, contigs generated from parallel sequencing technologies or complete assembled genomes can be associated with an isolate record. Following locus tagging, sequences can be readily extracted and exported in formats suitable for various analyses.
Figure 2
Iterative process of analysing loci. Since detailed phenotypic information can be included with the isolate record, the correlation between enzyme sequence diversity and phenotype can be examined using the integrated analysis tools.
Figure 3
The Genome Comparator plugin can identify loci shared among genomes. ClonalFrame trees were generated from 43 streptococcal genome sequences using A) seven MLSA gene fragment loci and B) 77 complete genes, identified by BIGSdb, that were found to be present throughout the genus. Aligned sequences were exported from the database and 50% consensus trees generated from six independent runs with 50 k iterations, 50 k burn-in iterations, and a thinning interval of 100.
Figure 4
The BIGSdb database platform is highly scalable. The system can be used to analyse a few isolates up to many thousands, each with full genome data attached. This compares favourably with existing database and analysis resources.
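The idea in the Figure 3 caption of finding loci present in every genome can be sketched as a set intersection. This is an illustrative simplification, not the Genome Comparator plugin's implementation; the function name, genome identifiers and locus names are hypothetical, and real locus detection would involve sequence similarity searches rather than pre-computed name sets.

```python
from typing import Dict, Iterable, Set


def shared_loci(genome_loci: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the loci found in every genome (empty set if no genomes are given)."""
    locus_sets = [set(loci) for loci in genome_loci.values()]
    if not locus_sets:
        return set()
    return set.intersection(*locus_sets)


# Toy input: loci detected in each of three hypothetical genomes.
detected = {
    "genome1": {"locusA", "locusB", "locusC"},
    "genome2": {"locusA", "locusC", "locusD"},
    "genome3": {"locusA", "locusC"},
}

core = shared_loci(detected)
```

Only loci in the resulting set would be candidates for alignment and export across all genomes in the comparison.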
