Bioinformatics. 2020 Apr 1;36(7):2272-2274.
doi: 10.1093/bioinformatics/btz921.

Logomaker: beautiful sequence logos in Python

Ammar Tareen et al. Bioinformatics. 2020.

Abstract

Summary: Sequence logos are visually compelling ways of illustrating the biological properties of DNA, RNA and protein sequences, yet it is currently difficult to generate and customize such logos within the Python programming environment. Here we introduce Logomaker, a Python API for creating publication-quality sequence logos. Logomaker can produce both standard and highly customized logos from either a matrix-like array of numbers or a multiple-sequence alignment. Logos are rendered as native matplotlib objects that are easy to stylize and incorporate into multi-panel figures.
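Logomaker can start from either a matrix or a multiple-sequence alignment. The alignment route requires tallying each aligned column into per-character frequencies. The stdlib-only sketch below illustrates that step and lays the result out as positions (rows) by characters (columns), the same layout as the matrix in Fig. 1A.

Some caveats, since this is an illustration rather than Logomaker's own code:
- The function name `alignment_to_probability_matrix` and its `pseudocount` parameter are hypothetical.
- Gap characters and other symbols outside the alphabet are simply skipped, which is an assumption.

```python
from collections import Counter


def alignment_to_probability_matrix(sequences, alphabet="ACGT", pseudocount=0.0):
    """Hypothetical helper: turn aligned sequences into per-position frequencies.

    Returns one dict per alignment column (a row of the matrix), mapping each
    alphabet character to its frequency in that column.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all aligned sequences must have the same length")

    matrix = []
    for i in range(length):
        # Count characters in column i, case-insensitively.
        counts = Counter(seq[i].upper() for seq in sequences)
        # Assumption: characters outside the alphabet (e.g. gaps '-') are ignored.
        total = sum(counts[c] for c in alphabet) + pseudocount * len(alphabet)
        if total == 0:
            # An all-gap column with no pseudocount has no defined frequencies.
            matrix.append({c: 0.0 for c in alphabet})
            continue
        matrix.append({c: (counts[c] + pseudocount) / total for c in alphabet})
    return matrix


alignment = ["ACGT", "ACGA", "ACTT", "GCGT"]
probs = alignment_to_probability_matrix(alignment)
print(probs[0])  # {'A': 0.75, 'C': 0.0, 'G': 0.25, 'T': 0.0}
```

The matrix is kept as plain dicts here so the sketch runs without dependencies. In practice a position-by-character pandas DataFrame like the one in Fig. 1A is the natural container to hand to a plotting library.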

Availability and implementation: Logomaker can be installed using the pip package manager and is compatible with both Python 2.7 and Python 3.6. Documentation is provided at http://logomaker.readthedocs.io; source code is available at http://github.com/jbkinney/logomaker.

Figures

Fig. 1.
Logomaker logos can represent diverse types of data. (A) Example input to Logomaker. Shown is an energy matrix for the transcription factor CRP; the elements of this pandas DataFrame represent −ΔΔG values contributed by each possible base (columns) at each nucleotide position (rows). Data are from Kinney et al. (2010). (B) An energy logo for CRP created by passing the DataFrame in panel A to Logomaker. The structural context of each nucleotide position is indicated [PDB 1CGP (Parkinson et al., 1996)]. (C) A probability logo computed from all annotated 5′ splice sites in the human genome (Frankish et al., 2019). The dashed line indicates the exon/intron boundary. (D) An information logo computed from a multiple alignment of WW domain sequences [PFAM RP15 (Finn et al., 2014)], with the eponymous positions of this domain highlighted. (E) An enrichment logo representing the effects of mutations within the ARS1 replication origin of S. cerevisiae. Orange characters indicate the ARS1 wild-type sequence; highlighted regions correspond (from left to right) to the A, B1 and B2 elements of this sequence (Rao and Stillman, 1995). Data (unpublished; collected by J.B.K.) are from a mutARS-seq experiment analogous to the one reported by Liachko et al. (2013). (F) A masked logo (Shrikumar et al., 2017) representing the importance scores of nucleotides in the vicinity of U2SURP exon 9, as predicted by a deep neural network model of splice site selection. Logo adapted (with permission) from Fig. 1D of Jaganathan et al. (2019). The script used to make this figure is posted on the Logomaker GitHub page at logomaker/examples/figure.ipynb.
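An information logo, as in panel D, scales each character by how much information its position carries. The stdlib-only sketch below uses the textbook calculation:
- A position's information is log2(alphabet size) minus its Shannon entropy in bits.
- Each character's height is its probability times that information.

Some caveats, since Logomaker's own transformation may differ:
- The sketch assumes a uniform background and applies no small-sample correction.
- `information_heights` is a hypothetical helper.
- Each probability row is assumed to sum to 1 and to contain only alphabet characters.

```python
import math


def information_heights(prob_rows, alphabet="ACGT"):
    """Hypothetical helper: convert per-position probabilities to letter heights in bits.

    Assumes a uniform background (max information = log2 of alphabet size)
    and no small-sample correction.
    """
    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    heights = []
    for row in prob_rows:
        # Shannon entropy in bits; terms with p == 0 contribute nothing.
        entropy = -sum(p * math.log2(p) for p in row.values() if p > 0)
        info = max_bits - entropy  # information content of this position
        # Each character's height is its probability scaled by the position's information.
        heights.append({c: row.get(c, 0.0) * info for c in alphabet})
    return heights


rows = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},      # fully conserved position
    {"A": 0.5, "C": 0.0, "G": 0.5, "T": 0.0},      # two equally likely bases
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # uniform position
]
for h in information_heights(rows):
    print(h)
# A reaches 2.0 bits at the conserved position, A and G get 0.5 each at the
# second position, and every height is 0.0 at the uniform position.
```

For protein alignments such as the WW domain in panel D, passing a 20-letter amino-acid alphabet raises the maximum to log2(20) ≈ 4.32 bits per position.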

References

    1. Bailey T.L. et al. (2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res., 37, W202–W208.
    2. Barnes S.L. et al. (2019) Mapping DNA sequence to transcription factor binding energy in vivo. PLoS Comput. Biol., 15, e1006226.
    3. Belliveau N.M. et al. (2018) Systematic approach for dissecting the molecular mechanisms of transcriptional regulation in bacteria. Proc. Natl. Acad. Sci. USA, 115, E4796–E4805.
    4. Colaert N. et al. (2009) Improved visualization of protein consensus sequences by iceLogo. Nat. Methods, 6, 786–787.
    5. Crooks G.E. et al. (2004) WebLogo: a sequence logo generator. Genome Res., 14, 1188–1190.
