MAVL/StickWRLD for protein: visualizing protein sequence families to detect non-consensus features

William C Ray¹

Affiliation

¹ Children's Research Institute and The Department of Pediatrics, The Ohio State University, 700 Children's Drive, Columbus, OH 43205, USA. ray@biosci.ohio-state.edu

PMID: 15980480
PMCID: PMC1160135
DOI: 10.1093/nar/gki374

William C Ray. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W315-9.

Abstract

A fundamental problem with applying consensus, weight-matrix or hidden Markov models as search tools for biosequences is that the model alone cannot reveal whether the modeled sequences display any dependencies between positional identities. In some instances, these dependencies are crucial for correctly accepting or rejecting other sequences as members of the family. MAVL (multiple alignment variation linker) and StickWRLD provide a web-based method to visually survey the model-training sequences to discover and characterize possible dependencies. MAVL/StickWRLD was initially introduced for nucleic acid sequences, where it makes it easy to distinguish typical DNA or RNA structural dependencies in input families, identify mixed populations of distinct subfamilies, or discover novel dependencies that result from binding interactions or other selective pressures [W. Ray (2004) Nucleic Acids Res., 32, W59-W63]. Since the announcement of MAVL/StickWRLD for nucleic acids, one of the most requested new features has been the extension of this visualization method to support protein alignments. We are pleased to report that this extension has been successful, that the basic visualization has been augmented in several ways to enhance protein viewing, and that the results with protein alignments are even more dramatic than with nucleic acid (NA) alignments. MAVL/StickWRLD can be accessed at http://www.microbial-pathogenesis.org/stickwrld/.
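
To make the idea of a dependency between positional identities concrete, the following is a minimal sketch, not MAVL/StickWRLD's actual statistic, which this abstract does not specify. It assumes a simple residual: the observed co-occurrence frequency of a residue pair minus the frequency expected if the two columns were independent. The function name `positional_residuals`, the `threshold` parameter and the toy alignment are all hypothetical.

```python
from collections import Counter
from itertools import combinations, product


def positional_residuals(alignment, threshold=0.2):
    """Score residue pairs across column pairs of an alignment.

    Assumed statistic (illustrative only): observed joint frequency
    minus the product of the two per-column frequencies.
    Returns (col_i, res_a, col_j, res_b, residual) tuples whose
    absolute residual is at least `threshold`, largest first.
    """
    n = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    # Per-column residue counts.
    col_counts = [Counter(seq[i] for seq in alignment) for i in range(length)]

    hits = []
    for i, j in combinations(range(length), 2):
        pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
        # Score every combination of residues seen in the two columns,
        # including pairs that never co-occur (observed frequency 0).
        for a, b in product(col_counts[i], col_counts[j]):
            observed = pair_counts[(a, b)] / n
            expected = (col_counts[i][a] / n) * (col_counts[j][b] / n)
            residual = observed - expected
            if abs(residual) >= threshold:
                hits.append((i, a, j, b, residual))
    return sorted(hits, key=lambda h: -abs(h[4]))


# Invented toy alignment: columns 0 and 2 vary together, column 1 does not.
toy = ["CAC", "CGC", "CAC", "HAT", "HGT", "HAT"]
hits = positional_residuals(toy, threshold=0.2)
for i, a, j, b, r in hits:
    print(f"{a}{i} / {b}{j}: residual {r:+.2f}")
```

In the toy data, a per-column model sees only that columns 0 and 2 are each half one residue and half another; the pairwise residuals show that the two columns change together, which is the kind of relationship a consensus or weight matrix cannot express.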

Figures

**Figure 1**
The residue properties and colors that are available through the ‘Sort Residues By’ and ‘Color Residues By’ interface options. The AA column lists amino acid residues by their single-letter code, and ‘b’ for StickWRLD's gap representation. The GV, GC and GP columns list Grantham (9) volume, composition and polarity values respectively. The KDH column lists Kyte–Doolittle (10) hydropathy. The BH column displays classic Branden–Tooze hydrophobicity-associated colors. The RA and RS columns show RasMol ‘Amino’ and ‘Shapely’ color schemes, and the CX column shows the CLUSTAL X default color scheme. Note that computer, monitor and VRML browser settings all affect the colors actually displayed.

**Figure 2**
A StickWRLD diagram showing related positions in the ADK_lid domain. The positively related cysteines at 4, 8, 35 and 38 stabilize the domain structure using bound zinc in a conformation analogous to a zinc finger, while the positively related histidine, serine, aspartic acid and threonine in the same positions stabilize the same domain structure using a network of hydrogen bonds. The amino acid identities are arranged vertically by their Kyte–Doolittle hydropathy score, and are colored using Branden and Tooze's classical coloring and grouping of residues by hydrophobic character. Consensus identities in each position are highlighted by a transparent unit cube. In a live VRML browser, this diagram is completely navigable and the viewer can rotate, move and zoom the 3D diagram to examine details of any portion.

**Figure 3**
A LogoMat-M HMM-Logo visualizing the HMM defined by the Pfam ADK_lid training sequence set, renumbered to match the complete 299-member predicted family. The logo provides a convenient visualization of the identity probabilities at each position, and to what extent each position contributes to the information content of the model, but it does not suggest the apparent requirement for either of the alternate C4, C8, C35, C38 or H4, H8, R10, D35, T38, E41 motifs that are found in the actual sequences.

**Figure 4**
A RasMol rendition of the bovine ADK_lid structure. Residues 4, 8, 35 and 38 are mutated to cysteines in Gram-positive bacterial ADK_lid domains. Residues 10 and 41 are implicated in a structural role in variants lacking the cysteines, because of strong positive relationships between their identities and those of the 4, 8, 35, 38 quadruple.
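
The ordering of residues by Kyte–Doolittle hydropathy mentioned in the Figure 1 and Figure 2 captions can be sketched as follows. This is an illustrative sketch only: the values are the published Kyte–Doolittle scale, but the helper name `sort_by_hydropathy`, the most-hydrophobic-first direction and the omission of the gap symbol ‘b’ are assumptions, since the captions do not state how StickWRLD orders the axis or places gaps.

```python
# Kyte-Doolittle hydropathy values, keyed by single-letter residue code.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def sort_by_hydropathy(residues):
    """Return residues ordered from most to least hydrophobic.

    Assumption: descending order. The gap symbol 'b' has no value in
    this table and would raise KeyError; gap placement is not modeled.
    """
    return sorted(residues, key=lambda r: KYTE_DOOLITTLE[r], reverse=True)


# The residues named for positions 4, 8, 35 and 38 in the Figure 2 caption.
order = sort_by_hydropathy("CHSDT")
print(order)
```

On such an axis, cysteine sits well apart from histidine, serine, aspartic acid and threonine, so the two alternative motifs described in Figure 2 would occupy visibly different heights.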

Cited by

MAVL/StickWRLD: analyzing structural constraints using interpositional dependencies in biomolecular sequence alignments.
Ozer HG, Ray WC. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W133-6. doi: 10.1093/nar/gkl251. PMID: 16844976.
Optimization of Synthetic Proteins: Identification of Interpositional Dependencies Indicating Structurally and/or Functionally Linked Residues.
Rumpf RW, Ray WC. J Vis Exp. 2015 Jul 14;(101):e52878. doi: 10.3791/52878. PMID: 26274377.
FvatfA regulates growth, stress tolerance as well as mycotoxin and pigment productions in Fusarium verticillioides.
Szabó Z, Pákozdi K, Murvai K, Pusztahelyi T, Kecskeméti Á, Gáspár A, Logrieco AF, Emri T, Ádám AL, Leiter É, Hornok L, Pócsi I. Appl Microbiol Biotechnol. 2020 Sep;104(18):7879-7899. doi: 10.1007/s00253-020-10717-6. Epub 2020 Jul 27. PMID: 32719911.
Understanding the sequence requirements of protein families: insights from the BioVis 2013 contests.
Ray WC, Rumpf RW, Sullivan B, Callahan N, Magliery T, Machiraju R, Wong B, Krzywinski M, Bartlett CW. BMC Proc. 2014 Aug 28;8(Suppl 2 Proceedings of the 3rd Annual Symposium on Biologica):S1. doi: 10.1186/1753-6561-8-S2-S1. eCollection 2014. PMID: 25237388.
Addressing the unmet need for visualizing conditional random fields in biological data.
Ray WC, Wolock SL, Callahan NW, Dong M, Li QQ, Liang C, Magliery TJ, Bartlett CW. BMC Bioinformatics. 2014 Jul 10;15:202. doi: 10.1186/1471-2105-15-202. PMID: 25000815.

References

1. Bailey T.L., Gribskov M. Combining evidence using p-values: application to sequence homology searches. Bioinformatics. 1998;14:48–54.
2. Durbin R., Eddy S., Krogh A., Mitchison G. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge: Cambridge University Press; 1998.
3. Gregoret L.M., Sauer R.T. Additivity of mutant effects assessed by binomial mutagenesis. Proc. Natl Acad. Sci. USA. 1993;90:4246–4250.
4. Taylor W.R., Hatrick K. Compensating changes in protein multiple sequence alignments. Protein Eng. 1994;7:341–348.
5. Afonnikov D.A., Kolchanov N.A. CRASP: a program for analysis of coordinated substitutions in multiple alignments of protein sequences. Nucleic Acids Res. 2004;32:W64–W68.
