BMC Bioinformatics. 2005 Jun 1;6:137.

doi: 10.1186/1471-2105-6-137.

SplitTester: software to identify domains responsible for functional divergence in protein family

Xiang Gao¹, Kent A Vander Velden, Daniel F Voytas, Xun Gu

PMID: 15929795
PMCID: PMC1181622
DOI: 10.1186/1471-2105-6-137

Xiang Gao et al. BMC Bioinformatics. 2005.

Affiliation

¹ Department of Genetics, Development & Cell Biology, Iowa State University, Ames, Iowa 50011, USA. gaoxiang@iastate.edu

Abstract

Background: Many protein families have undergone functional divergence after gene duplication, such that current subgroups of the family carry out overlapping but distinct biological roles. For protein families with known functional subtypes (a functional split), we developed the software SplitTester to identify regions potentially responsible for the distinct functional subtypes observed within the same protein family.

Results: SplitTester takes as input a multiple protein sequence alignment generated from members of two subgroups with known functional divergence. In a process called split-clustering, it constructs a neighbor-joining tree (a split cluster) for each of a series of variable-sized sliding windows across the alignment. SplitTester identifies the regions whose split cluster is consistent with the functional split but may be inconsistent with the phylogeny of the protein family. We hypothesize that at least some of these regions, which do not follow a random mutation process, are responsible for the observed functional split. To test our method, we used reverse transcriptase from a group of Pseudoviridae retrotransposons to identify residues specific for diverged primer recognition. Candidate regions were then mapped onto the three-dimensional structure of reverse transcriptase. The locations of these amino acids within the enzyme are consistent with their biological roles.
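The split-clustering test for a single window can be sketched as follows. This is a minimal illustration, not the SplitTester implementation: the function names, the use of p-distance with pairwise gap deletion, and the toy sequences are all assumptions made for the example. It runs standard neighbor joining on one window and asks whether the resulting tree contains an internal branch that separates the two functional subgroups.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs) if pairs else 0.0


def nj_clusters(seqs):
    """Neighbor joining on p-distances; returns every leaf cluster joined along the way."""
    leaves = {name: frozenset([name]) for name in seqs}  # active node -> its leaf set
    d = {frozenset(p): p_distance(seqs[p[0]], seqs[p[1]])
         for p in combinations(seqs, 2)}
    clusters = set()
    next_id = 0
    while len(leaves) > 3:
        n = len(leaves)
        active = list(leaves)
        r = {i: sum(d[frozenset((i, k))] for k in active if k != i) for i in active}
        # Join the pair that minimises the standard NJ criterion Q(i, j).
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        u = ("internal", next_id)  # tuple label cannot collide with string names
        next_id += 1
        for k in active:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - d[frozenset((i, j))])
        leaves[u] = leaves.pop(i) | leaves.pop(j)
        clusters.add(leaves[u])
    return clusters


def supports_split(seqs, group_a):
    """True if the NJ tree has an internal branch separating group_a from the rest."""
    group_a = frozenset(group_a)
    group_b = frozenset(seqs) - group_a
    clusters = nj_clusters(seqs)
    return group_a in clusters or group_b in clusters


# Toy window (invented sequences): a1/a2 form one subgroup, b1-b3 the other.
toy_window = {"a1": "MKLVE", "a2": "MKLVD", "b1": "TRIAE", "b2": "TRIAD", "b3": "TRIGD"}
print(supports_split(toy_window, {"a1", "a2"}))
```

Ties in the joining criterion are broken here by iteration order; a fuller treatment would enumerate the equivalent trees rather than pick one.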

Conclusion: SplitTester aims to identify specific domain sequences responsible for the functional divergence of subgroups within a protein family. In our analysis of the retroelement reverse transcriptase family, we identified regions that split the family according to primer specificity, implying that they function in specific primer selection.

Figures

**Figure 1**
**The algorithm for the tree-based method to identify protein functional domains.** A multiple amino acid sequence alignment is used as the input file. Phylogenetic trees are generated from different windows of the alignment by the neighbor-joining method. For each window, the program determines whether the tree from the local sequence (split cluster) matches a predefined functional split. If the split cluster is consistent with the functional split, the sequence window is a candidate for carrying out that function. The program is iterative: it starts with very small windows (i.e. three amino acids) and gradually increases the window size until it equals the length of the protein alignment.
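The iteration over window sizes described in this legend might look like the sketch below. It is an illustration under stated assumptions: `scan_windows`, `groups_separate`, the step of one column, and the toy alignment are hypothetical, and the consistency test is a simple distance-based stand-in (every within-group distance smaller than every between-group distance) rather than the neighbor-joining comparison the program performs.

```python
from itertools import combinations, product


def mismatch_fraction(a, b):
    """Fraction of columns at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def groups_separate(window, group_a):
    """Stand-in consistency test: all within-group distances < all between-group ones."""
    group_b = [n for n in window if n not in group_a]
    within = [mismatch_fraction(window[p], window[q])
              for g in (sorted(group_a), group_b) for p, q in combinations(g, 2)]
    between = [mismatch_fraction(window[p], window[q])
               for p, q in product(group_a, group_b)]
    return max(within, default=0.0) < min(between)


def scan_windows(alignment, is_consistent, min_size=3):
    """Try every window from min_size columns up to the full alignment length.

    Returns (start, size) pairs, 0-based start, for windows passing the test.
    """
    length = len(next(iter(alignment.values())))
    hits = []
    for size in range(min_size, length + 1):
        for start in range(length - size + 1):
            window = {name: seq[start:start + size] for name, seq in alignment.items()}
            if is_consistent(window):
                hits.append((start, size))
    return hits


# Toy alignment (invented): only columns 4-6 distinguish the two subgroups.
toy = {"a1": "GGGGMKLGGGG", "a2": "GGGGMKLGGGG",
       "b1": "GGGGTRIGGGG", "b2": "GGGGTRIGGGG"}
hits = scan_windows(toy, lambda w: groups_separate(w, {"a1", "a2"}))
print(hits[:5])
```

Passing the consistency test in as a function keeps the window loop independent of how a split cluster is actually evaluated, so a tree-based test can be substituted without changing the scan.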

**Figure 2**
**A snapshot of the SplitTester software.** A distance matrix was selected to compute phylogenetic relationships of the aligned input sequence data. The regions of the alignment that support the functional split are then plotted in the top window. The X-axis represents the length of the aligned sequences; the Y-axis represents increasing window size. After the computation is complete, the user can select a specific window for analysis by clicking on the left end of the colored horizontal bars. The colors indicate the degree of confidence that a given window supports the predefined functional split (red = 100%; yellow = 75%; green = 50%; blue = 25%). The two panels on the lower right show all equivalent NJ trees generated from the selected window. The lower left window shows the actual sequences that support the predefined phylogenetic relationship within the selected window.

**Figure 3**
**Functional divergence in reverse transcriptase.** (A) The *SplitTester* output for the reverse transcriptase dataset. Windows supporting the functional split are shown as colored lines in the plot. The X-axis represents the length of the aligned sequences; the Y-axis represents increasing window size (see legend to Fig. 2 for additional detail). (B) The X-ray structure of the HIV reverse transcriptase/primer/template complex (1RTD). The reverse transcriptase protein is represented by the yellow strand. The two green regions are domains identified by *SplitTester*. All residue numbers correspond to HIV sequence positions in 1RTD. Residues 166–215 and 280–311 in the aligned retrotransposon sequences correspond to 167–210 and 267–297 in the HIV 1RTD sequence, respectively.
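The correspondence between alignment positions and HIV residue numbers in this legend arises because alignment columns include gaps; for example, the 50-column window 166–215 spans only 44 HIV residues (167–210). A minimal sketch of that coordinate conversion is given below. The function names and the toy gapped sequence are hypothetical, and the real numbering additionally depends on where the 1RTD chain starts, which is represented here by the assumed `first_residue` parameter.

```python
def column_to_residue(gapped_ref, first_residue=1):
    """Map each 1-based alignment column to a reference residue number (None at gaps)."""
    mapping = {}
    residue = first_residue
    for col, char in enumerate(gapped_ref, start=1):
        if char == "-":
            mapping[col] = None
        else:
            mapping[col] = residue
            residue += 1
    return mapping


def window_to_residue_range(gapped_ref, start_col, end_col, first_residue=1):
    """Convert an inclusive column window to the (first, last) reference residues it covers."""
    mapping = column_to_residue(gapped_ref, first_residue)
    residues = [mapping[c] for c in range(start_col, end_col + 1)
                if mapping[c] is not None]
    return (residues[0], residues[-1]) if residues else None


# Toy reference row with gaps (invented, not the HIV sequence).
toy_ref = "MK--LVE-T"
print(window_to_residue_range(toy_ref, 3, 7))  # columns 3-7 cover residues L, V, E
```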

**Figure 4**
**Phylogenetic relationships from the full-length multiple sequence alignment and the predicted region.** (A) The NJ phylogeny (MEGA3.0 [32]) from the full-length reverse transcriptase sequence alignment clusters genes from the same host: *Osser*, Tnt1, *SIRE*-1 and *Opie*-2 are from plant hosts, *Copia* and *1731* are from *Drosophila*, and Ty1 and Ty5 are from *Saccharomyces cerevisiae*. (B) The split cluster from a 50 aa window (positions 166–215) in predicted region 1 supports the functional subtype split. (C) Functional signal (measured by the bootstrap value of node β in panel B) and evolutionary background (measured by the bootstrap value of node α in panel A), plotted against window size. For window lengths below 90 aa, split-clustering supports the functional subtype split, and the bootstrap value peaks at a window length of around 50 aa. A mixed topology, measured by the bootstrap value (γ) between the two major subtrees, is detected when the window is longer than 90 aa. As more amino acid sites are included, the bootstrap value converges to that of node α in panel A.
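The bootstrap values in panel C measure how often a node is recovered when alignment columns are resampled with replacement. A minimal sketch of that resampling for one window follows. The names `bootstrap_support` and `split_test`, the replicate count, and the toy window are assumptions, and the test passed in the example is a trivial placeholder for a tree-based check of the node of interest.

```python
import random


def bootstrap_support(window, split_test, replicates=100, seed=0):
    """Fraction of column-resampled replicates for which split_test(replicate) is True."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same value
    names = list(window)
    length = len(window[names[0]])
    hits = 0
    for _ in range(replicates):
        # Draw `length` columns with replacement, the same columns for every sequence.
        cols = [rng.randrange(length) for _ in range(length)]
        replicate = {n: "".join(window[n][c] for c in cols) for n in names}
        hits += bool(split_test(replicate))
    return hits / replicates


# Toy window (invented): every column separates the a-group from the b-group.
toy = {"a1": "MKLV", "a2": "MKLV", "b1": "TRIA", "b2": "TRIA"}


def placeholder_test(w):
    """Placeholder for a tree-based test of the node of interest."""
    return w["a1"] == w["a2"] and w["a1"] != w["b1"]


print(bootstrap_support(toy, placeholder_test))
```

Repeating this calculation for windows of increasing length would give a support-versus-window-size curve of the kind plotted in panel C.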

Cited by

Divergent evolution of the chloroplast small heat shock protein gene in the genera Rhododendron (Ericaceae) and Machilus (Lauraceae).
Wu ML, Lin TP, Lin MY, Cheng YP, Hwang SY. Ann Bot. 2007 Mar;99(3):461-75. doi: 10.1093/aob/mcl288. Epub 2007 Feb 9. PMID: 17293350.
Genome-wide functional divergence after the symbiosis of proteobacteria with insects unraveled through a novel computational approach.
Toft C, Williams TA, Fares MA. PLoS Comput Biol. 2009 Apr;5(4):e1000344. doi: 10.1371/journal.pcbi.1000344. Epub 2009 Apr 3. PMID: 19343224.

References

1. Tatusov RL, Koonin EV, Lipman DJ. A genomic perspective on protein families. Science. 1997;278:631–637. doi: 10.1126/science.278.5338.631.
2. Golding GB, Dean AM. The structural basis of molecular adaptation. Mol Biol Evol. 1998;15:355–369.
3. Yang Z, Bielawski JP. Statistical methods for detecting molecular adaptation. Trends in Ecology and Evolution. 2000;15:496–503. doi: 10.1016/S0169-5347(00)01994-7.
4. Gu X. Statistical methods for testing functional divergence after gene duplication. Mol Biol Evol. 1999;16:1664–1674.
5. Gu X, Vander Velden K. DIVERGE: phylogeny-based analysis for functional-structural divergence of a protein family. Bioinformatics. 2002;18:500–501. doi: 10.1093/bioinformatics/18.3.500.
