Comparative Study
BMC Bioinformatics. 2008 Dec 22;9:554.
doi: 10.1186/1471-2105-9-554.

ProfileGrids as a new visual representation of large multiple sequence alignments: a case study of the RecA protein family

Alberto I Roca et al. BMC Bioinformatics.

Abstract

Background: Multiple sequence alignments are a fundamental tool for the comparative analysis of proteins and nucleic acids. However, large data sets are no longer manageable for visualization and investigation using the traditional stacked sequence alignment representation.

Results: We introduce ProfileGrids that represent a multiple sequence alignment as a matrix color-coded according to the residue frequency occurring at each column position. JProfileGrid is a Java application for computing and analyzing ProfileGrids. A dynamic interaction with the alignment information is achieved by changing the ProfileGrid color scheme, by extracting sequence subsets at selected residues of interest, and by relating alignment information to residue physical properties. Conserved family motifs can be identified by the overlay of similarity plot calculations on a ProfileGrid. Figures suitable for publication can be generated from the saved spreadsheet output of the colored matrices as well as by the export of conservation information for use in the PyMOL molecular visualization program.

We demonstrate the utility of ProfileGrids on 300 bacterial homologs of the RecA family, a universally conserved protein involved in DNA recombination and repair. Careful attention was paid to curating the collected RecA sequences since ProfileGrids allow the easy identification of rare residues in an alignment. We relate the RecA alignment sequence conservation to the following three topics: the recently identified DNA binding residues, the unexplored MAW motif, and a unique Bacillus subtilis RecA homolog sequence feature.

Conclusion: ProfileGrids allow large protein families to be visualized more effectively than the traditional stacked sequence alignment form. This new graphical representation facilitates the determination of the sequence conservation at residue positions of interest, enables the examination of structural patterns by using residue physical properties, and permits the display of rare sequence features within the context of an entire alignment. JProfileGrid is free for non-commercial use and is available from http://www.profilegrid.org. Furthermore, we present a curated RecA protein collection that is more diverse than previous data sets; this RecA ProfileGrid is therefore a rich source of information for nanoanatomy analysis.
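
The core idea in the Results, a matrix of character frequencies per alignment column, can be illustrated with a minimal Python sketch. This is not the JProfileGrid implementation (which is a Java application); the toy alignment and the function names `profile_grid` and `majority_consensus` are assumptions made up for illustration.

```python
from collections import Counter

# Toy alignment: hypothetical sequences, not taken from the RecA data set.
alignment = [
    "MAIDE",
    "MAVDE",
    "MSIDE",
    "MA-DQ",
]

def profile_grid(seqs):
    """Return one dict per alignment column mapping character -> frequency (0..1)."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    n = len(seqs)
    grid = []
    for column in zip(*seqs):  # iterate over alignment columns
        counts = Counter(column)  # gap characters are counted like residues
        grid.append({ch: c / n for ch, c in counts.items()})
    return grid

def majority_consensus(grid):
    """Pick the most frequent character in each column (first seen wins ties)."""
    return "".join(max(col, key=col.get) for col in grid)

grid = profile_grid(alignment)
consensus = majority_consensus(grid)
print(consensus)
for position, col in enumerate(grid, start=1):
    print(position, col)
```

Each column dictionary corresponds to one column of ProfileGrid cells; a character that is absent from the dictionary has a frequency of zero at that position.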

Figures

Figure 1
A screen shot of the JProfileGrid parameter settings window.
Figure 2
The ProfileGrid viewer showing the RecA protein family results. The first 3 rows of the ProfileGrid are a position ruler (Posn), a majority consensus (Major), and a template sequence (here of the E. coli RecA homolog). The remaining rows tabulate the frequency of the amino acid and gap characters at each position of the alignment. Cells are color shaded according to the frequency value (Figure 3). The top-left corner identifies the character and the frequency of the ProfileGrid cell currently selected by the cursor.
Figure 3
The frequency settings determining a ProfileGrid cell color.
Figure 4
B. subtilis RecA highlight sequence example with frequency colors and values turned off.
Figure 5
The alignment viewer showing sequences from the currently selected ProfileGrid cell.
Figure 6
Similarity plot of the RecA protein family. Similarity values over the first 150 residues of the alignment were calculated using the BLOSUM62 scoring matrix and a window size of 9. A threshold value of 0.8 is indicated by the dashed line. A complete plot using a smaller RecA data set has been previously published [17].
Figure 7
ProfileGrid of 300 bacterial RecA protein sequences. The first row is the E. coli RecA protein sequence. The ProfileGrid cells are colored according to the following bins: <10% (white), ≥10% (gray), ≥25% (yellow), ≥50% (orange), ≥70% (green), ≥90% (red). The boxed regions (potential motifs) were drawn by JProfileGrid from the similarity plot calculations using an 80% threshold cutoff. For visual clarity, only the first 150 residues of the alignment are shown, and the frequency values are omitted. Additional File 2 is the entire RecA ProfileGrid including frequency values. This figure was generated from the JProfileGrid spreadsheet output.
Figure 8
Visualization of PyMOL script output. JProfileGrid can write a ".pml" file that will define the following named selections based upon the ProfileGrid information: identical residues (black sidechains); conserved motifs ("mot#") colored from most amino terminal (red) to most carboxyl terminal (green); and connecting variable ("var#") regions (gray). These different selections are mapped onto the E. coli RecA crystal structure [PDB:2REB]. This orientation is defined as the anterior view of the RecA monomer anatomical position. Some of the named selections are indicated by arrows in this PyMOL screen shot.
Figure 9
Structural analysis of MAW and P-loop motif regions. The MAW and P-loop motifs are highly conserved parts of the RecA protein family found at E. coli homolog positions 40–65 and 66–73, respectively. Labels denote the locations of α-helix B and β-strand 1 from the E. coli RecA crystal structure. Sorting the ProfileGrid rows by various amino acid physical constants reveals structural patterns within the context of the entire MSA. (A) Sorting by decreasing helical propensity shows that residues which do not favor helical formation (circled) immediately follow a helix in the MAW motif. (B) Sorting by decreasing volume displays the pattern (blue lines) that large amino acids are flanked by residues smaller than threonine. Whereas these panels were generated from the spreadsheet output, the JProfileGrid software allows an interactive analysis by switching between residue properties and color schemes.
Figure 10
Representing a unique B. subtilis RecA sequence feature. In this ProfileGrid where the residues are sorted by volume, the B. subtilis RecA homolog is chosen as the "highlight sequence" and appears in the row immediately under the E. coli RecA template sequence. JProfileGrid performs a pair-wise comparison and represents any differences between the two sequences with blue boxes. It is clear within the context of the entire MSA that B. subtilis has a rarely occurring sequence from residues 85 to 90 (E. coli RecA numbering).
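
The frequency bins listed in the Figure 7 caption can be expressed as a simple lookup. The sketch below is a hedged Python illustration, not JProfileGrid code: the thresholds and colors are copied from that caption, while the names `BINS` and `cell_color` are hypothetical. Figure 3 indicates the frequency settings are adjustable in the application, so these values should be read as one example configuration.

```python
# Lower bounds and colors as given in the Figure 7 caption, highest bin first.
BINS = [
    (0.90, "red"),
    (0.70, "green"),
    (0.50, "orange"),
    (0.25, "yellow"),
    (0.10, "gray"),
]

def cell_color(freq):
    """Map a frequency in [0, 1] to the color of its bin; below 10% is white."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    for lower_bound, color in BINS:
        if freq >= lower_bound:  # first matching bin is the highest one reached
            return color
    return "white"

for f in (0.95, 0.70, 0.30, 0.05):
    print(f, cell_color(f))
```

Ordering the bins from highest to lowest threshold means the first match is always the most specific bin, which keeps the lookup to a single pass.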

References

    1. Devereux J, Haeberli PH, Smithies OS. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 1984;12:387–395.
    2. Parry-Smith DJ, Attwood TK. SOMAP: a novel interactive approach to multiple protein sequences alignment. Comput Appl Biosci. 1991;7:233–235.
    3. Barton GJ. ALSCRIPT: a tool to format multiple sequence alignments. Protein Eng. 1993;6:37–40.
    4. Smith DK, Xue H. A major component approach to presenting consensus sequences. Bioinformatics. 1998;14:151–156.
    5. Schneider TD, Stephens RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990;18:6097–6100.
