Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W315-9.
doi: 10.1093/nar/gki374.

MAVL/StickWRLD for protein: visualizing protein sequence families to detect non-consensus features

William C Ray. Nucleic Acids Res.

Abstract

A fundamental problem with applying Consensus, Weight-Matrix or hidden Markov models as search tools for biosequences is that there is no way to know, from the model, if the modeled sequences display any dependencies between positional identities. In some instances, these dependencies are crucial in correctly accepting or rejecting other sequences as members of the family. MAVL (multiple alignment variation linker) and StickWRLD provide a web-based method to visually survey the model-training sequences to discover and characterize possible dependencies. MAVL/StickWRLD was initially introduced for nucleic acid sequences, where it makes it easy to distinguish typical DNA or RNA structural dependencies in input families, identify mixed populations of distinct subfamilies, or discover novel dependencies that result from binding interactions or other selective pressures [W. Ray (2004) Nucleic Acids Res., 32, W59-W63]. Since the announcement of MAVL/StickWRLD for nucleic acids, one of the most requested new features has been the extension of this visualization method to support protein alignments. We are pleased to report that this extension has been successful, that the basic visualization has been augmented in several ways to enhance protein viewing, and that the results with protein alignments are even more dramatic than with nucleic acid (NA) alignments. MAVL/StickWRLD can be accessed at http://www.microbial-pathogenesis.org/stickwrld/.
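The kind of positional dependency described in the abstract can be illustrated with a small calculation. The sketch below is not the MAVL/StickWRLD algorithm, whose statistic is not given in this abstract. It assumes a simple stand-in measure: the observed joint frequency of a residue pair at two alignment columns minus the frequency expected if the columns were independent. The function name `positional_residuals`, the `threshold` parameter and the toy alignment are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def positional_residuals(alignment, threshold=0.2):
    """Flag residue pairs whose co-occurrence departs from independence.

    Assumed measure (not taken from the paper): observed joint frequency
    minus the product of the two per-column frequencies.
    Returns tuples (i, a, j, b, residual) using zero-based column indices.
    """
    n = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    # Per-column residue counts, the information a weight matrix keeps.
    marginals = [Counter(seq[i] for seq in alignment) for i in range(length)]

    hits = []
    for i, j in combinations(range(length), 2):
        # Pair counts, the information a per-column model discards.
        joint = Counter((seq[i], seq[j]) for seq in alignment)
        for a in marginals[i]:
            for b in marginals[j]:
                observed = joint[(a, b)] / n
                expected = (marginals[i][a] / n) * (marginals[j][b] / n)
                residual = observed - expected
                if abs(residual) > threshold:
                    hits.append((i, a, j, b, residual))
    return hits


# Toy alignment: columns 0 and 3 covary (C with C, H with T).
toy = ["CAAC", "CGAC", "HAAT", "HGAT"]
for hit in positional_residuals(toy):
    print(hit)
```

In the toy alignment, each column taken alone looks like an even mix of two residues, so a per-column model would accept the sequence "CAAT" as readily as "CAAC"; the pairwise residuals are what reveal that C at column 0 only occurs with C at column 3.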

Figures

Figure 1
The residue properties and colors that are available through the ‘Sort Residues By’ and ‘Color Residues By’ interface options. The AA column lists amino acid residues by their single-letter code, and ‘b’ for StickWRLD's gap representation. The GV, GC and GP columns list Grantham (9) volume, composition and polarity values, respectively. The KDH column lists Kyte–Doolittle (10) hydropathy. The BH column displays classic Branden–Tooze hydrophobicity-associated colors. The RA and RS columns show RasMol ‘Amino’ and ‘Shapely’ color schemes, and the CX column shows the CLUSTAL X default color scheme. Please remember that computer, monitor and VRML browser settings will all affect the actual displayed colors.
Figure 2
A StickWRLD diagram showing related positions in the ADK_lid domain. The positively related cysteines at 4, 8, 35 and 38 stabilize the domain structure using bound zinc in a conformation analogous to a zinc finger, while the positively related histidine, serine, aspartic acid and threonine in the same positions stabilize the same domain structure using a network of hydrogen bonds. The amino acid identities are arranged vertically by their Kyte–Doolittle hydropathy score, and are colored using Branden and Tooze's classical coloring and grouping of residues by hydrophobic character. Consensus identities in each position are highlighted by a transparent unit cube. In a live VRML browser, this diagram is completely navigable and the viewer can rotate, move and zoom the 3D diagram to examine details of any portion.
Figure 3
A LogoMat-M HMM-Logo visualizing the HMM defined by the Pfam ADK_lid training sequence set, renumbered to match the complete 299-member predicted family. The logo provides a convenient visualization of the identity probabilities at each position, and to what extent each position contributes to the information content of the model, but it does not suggest the apparent requirement for either of the alternate C4, C8, C35, C38 or H4, H8, R10, D35, T38, E41 motifs that are found in the actual sequences.
Figure 4
A RasMol rendition of the bovine ADK_lid structure. Residues 4, 8, 35 and 38 are mutated to cysteines in Gram-positive bacterial ADK_lid domains. Residues 10 and 41 are implicated in a structural capacity in variants lacking the cysteines, due to strong positive relationships between their identities, and those of the 4, 8, 35, 38 quadruple.
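The vertical arrangement by Kyte–Doolittle hydropathy mentioned in the Figure 1 and Figure 2 captions can be sketched as a simple sort. The values below are the standard published Kyte–Doolittle scale; the function name `sort_by_hydropathy` is hypothetical, and how StickWRLD itself orders residues or places its gap symbol ‘b’ is not described here, so gaps are left out of this sketch.

```python
# Standard Kyte-Doolittle hydropathy values, keyed by one-letter residue code.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sort_by_hydropathy(residues):
    """Order residues from most hydrophobic to most hydrophilic.

    Hypothetical helper; raises KeyError for symbols without a
    Kyte-Doolittle value (for example a gap character).
    """
    return sorted(residues, key=lambda r: KYTE_DOOLITTLE[r], reverse=True)


# Residues named in the Figure 2 caption for positions 4, 8, 35 and 38.
print(sort_by_hydropathy("CHSDT"))
```

Under this ordering, cysteine sits well above histidine, serine, aspartic acid and threonine, which is one reason the two alternative motifs would separate visually along the hydropathy axis.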

References

    1. Bailey T.L., Gribskov M. Combining evidence using p-values: application to sequence homology searches. Bioinformatics. 1998;14:48–54.
    2. Durbin R., Eddy S., Krogh A., Mitchison G. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge: Cambridge University Press; 1998.
    3. Gregoret L.M., Sauer R.T. Additivity of mutant effects assessed by binomial mutagenesis. Proc. Natl Acad. Sci. USA. 1993;90:4246–4250.
    4. Taylor W.R., Hatrick K. Compensating changes in protein multiple sequence alignments. Protein Eng. 1994;7:341–348.
    5. Afonnikov D.A., Kolchanov N.A. CRASP: a program for analysis of coordinated substitutions in multiple alignments of protein sequences. Nucleic Acids Res. 2004;32:W64–W68.