BMC Bioinformatics. 2018 Aug 22;19(1):303.
doi: 10.1186/s12859-018-2315-y.

bioSyntax: syntax highlighting for computational biology

Artem Babaian et al. BMC Bioinformatics. 2018.

Abstract

Background: Computational biology requires the reading and comprehension of biological data files. Plain-text formats such as SAM, VCF, GTF, PDB and FASTA often contain critical information that is obscured by the complexity of the data structure.

Results: bioSyntax (https://biosyntax.org/) is a freely available suite of biological syntax highlighting packages for vim, gedit, Sublime, VSCode, and less. bioSyntax improves the legibility of low-level biological data in the bioinformatics workspace.

Conclusion: bioSyntax supports computational scientists in parsing and comprehending their data efficiently and thus can accelerate research output.

Keywords: Command line interface; FASTA; FASTQ; SAM; Sublime; Syntax highlighting; VCF; Vim.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Syntax highlighting for the sequence alignment map (.sam) file format. a Terminal screenshot of the ‘less HG00128_hgr1.sam’ command run i. normally or ii. with bioSyntax. Related information in the header and data sections is grouped by colour (genomic coordinates, green; sample information, pale blue ...) to improve legibility. Each data row is an individual sequencing read. iii. CIGAR alignment strings in particular can be highlighted such that they become substantially easier to read. b A broad view of the nucleotides and PHRED scores for 30 reads i. before and ii. after syntax highlighting. Underlying information about the data becomes intuitively visible, such as PCR duplicates (black arrow) and poor-quality areas and reads (blue arrow), based on iii. PHRED score
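The two fields singled out in Fig. 1, CIGAR strings and PHRED quality strings, are compact encodings that are hard to read raw. The following is a minimal sketch of how they decode, not code from bioSyntax. It assumes the CIGAR operation letters of the SAM specification and the Phred+33 (Sanger) quality offset; the function names `parse_cigar` and `decode_phred` are hypothetical.

```python
import re

# CIGAR operation letters as defined in the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) tuples."""
    if cigar == "*":  # SAM uses '*' when no alignment is available
        return []
    ops = CIGAR_RE.findall(cigar)
    # Reject strings with characters the pattern did not consume.
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in ops]


def decode_phred(quality, offset=33):
    """Convert a quality string to integer PHRED scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


print(parse_cigar("76M2I22M"))  # [(76, 'M'), (2, 'I'), (22, 'M')]
print(decode_phred("II5#"))     # [40, 40, 20, 2]
```

A highlighter can then assign one colour per operation letter or per score range, which is the kind of grouping the caption describes.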
Fig. 2
bioSyntax nucleotide colour scheme. a The four primary bases are coloured in two pairs of contrasting colours. IUPAC ambiguous bases are then coloured in increasingly lighter tones of the approximately mixed colours. To accommodate 4-dimensional bases in 3-dimensional colours, aMino (A or C) and Keto (G or T) bases are darker. b A comparison of nucleotide colour schemes in the literature. c bioSyntax colouring allows for approximation of a sequence's GC-content by how warm (high GC) or cool (high AT) it appears
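The warm/cool idea in panel c can be sketched in a terminal with ANSI background colours. This is an illustration only: the palette below is a hypothetical stand-in (warm for G/C, cool for A/T), not the actual bioSyntax colour values, and `colour_sequence` and `gc_content` are names invented for this example.

```python
# Hypothetical palette, NOT the bioSyntax colours:
# warm backgrounds for G/C, cool backgrounds for A/T.
ANSI = {
    "A": "\033[44m",  # blue background
    "T": "\033[46m",  # cyan background
    "G": "\033[43m",  # yellow background
    "C": "\033[41m",  # red background
}
RESET = "\033[0m"


def colour_sequence(seq):
    """Wrap each A/C/G/T in an ANSI colour; leave other symbols unstyled."""
    return "".join(
        f"{ANSI[base]}{base}{RESET}" if base in ANSI else base
        for base in seq.upper()
    )


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / total


print(colour_sequence("GGCCAATTN"))
print(gc_content("GGCCAATTN"))  # 0.5
```

With a palette of this kind, a GC-rich read prints as mostly warm colours, so the numeric value returned by `gc_content` is roughly what the eye estimates from the coloured line.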
