BMC Bioinformatics. 2005 Jun 1:6:137.
doi: 10.1186/1471-2105-6-137.

SplitTester: software to identify domains responsible for functional divergence in protein family

Xiang Gao et al. BMC Bioinformatics.

Abstract

Background: Many protein families have undergone functional divergence after gene duplication, so that current subgroups of a family carry out overlapping but distinct biological roles. For protein families with known functional subtypes (a functional split), we developed SplitTester, software that identifies regions potentially responsible for the distinct functional subtypes observed within the same family.

Results: SplitTester takes as input a multiple protein sequence alignment generated from members of two subgroups with known functional divergence. It constructs a neighbor-joining tree (a split cluster) from variable-sized sliding windows across the alignment, a process we call split-clustering. SplitTester identifies the regions whose split cluster is consistent with the functional split but may be inconsistent with the phylogeny of the protein family. We hypothesize that at least some of these regions, which do not follow a random mutation process, are responsible for the observed functional split. To test the method, we used reverse transcriptase from a group of Pseudoviridae retrotransposons to identify residues specific for diverged primer recognition. Candidate regions were then mapped onto the three-dimensional structures of reverse transcriptase. The locations of these amino acids within the enzyme are consistent with their biological roles.

Conclusion: SplitTester aims to identify the specific domain sequences responsible for functional divergence between subgroups of a protein family. In our analysis of the retroelement reverse transcriptase family, we identified the regions that split this family according to primer specificity, which implies that these regions function in primer selection.

Figures

Figure 1
The algorithm for the tree-based method to identify protein functional domains. A multiple amino acid sequence alignment is used as the input file. Phylogenetic trees are generated from different windows of the alignment by the neighbor-joining method. For each window, the program determines whether the tree from the local sequence (the split cluster) matches a predefined functional split. If the split cluster is consistent with the functional split, the sequence window is a candidate for carrying out that function. The program is iterative: it starts with very small windows (i.e. three amino acids) and gradually increases the window size until it equals the length of the protein alignment.
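The windowed procedure in this caption can be sketched in Python. This is an illustrative sketch under stated assumptions, not the SplitTester implementation: the function names (`p_distance`, `nj_clades`, `window_supports_split`, `scan_windows`) are hypothetical, the distance is a plain p-distance (the source does not say which distance matrices the program offers), only the topology of the neighbor-joining tree is computed, and ties between equally good joins are broken arbitrarily rather than enumerated as equivalent trees.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)

def nj_clades(dist):
    """Neighbor joining (topology only): return the leaf sets created by each join."""
    n = len(dist)
    nodes = {i: frozenset([i]) for i in range(n)}
    d = {(i, j): dist[i][j] for i in range(n) for j in range(n) if i != j}
    next_id = n
    clades = set()
    while len(nodes) > 3:
        ids = sorted(nodes)
        m = len(ids)
        r = {i: sum(d[i, k] for k in ids if k != i) for i in ids}
        # Join the pair minimising Q(i, j) = (m - 2) * d(i, j) - r(i) - r(j).
        i, j = min(combinations(ids, 2),
                   key=lambda p: (m - 2) * d[p] - r[p[0]] - r[p[1]])
        u = next_id
        next_id += 1
        for k in ids:
            if k not in (i, j):
                d[u, k] = d[k, u] = 0.5 * (d[i, k] + d[j, k] - d[i, j])
        nodes[u] = nodes.pop(i) | nodes.pop(j)
        clades.add(nodes[u])
    return clades

def window_supports_split(window, group_a):
    """True if the NJ tree of this window separates group_a from the other sequences."""
    group_a = frozenset(group_a)
    group_b = frozenset(range(len(window))) - group_a
    dist = [[p_distance(x, y) for y in window] for x in window]
    clades = nj_clades(dist)
    return group_a in clades or group_b in clades

def scan_windows(alignment, group_a, min_size=3):
    """Try every window from min_size up to the full alignment length.

    Returns (start, size) pairs whose split cluster matches the functional split.
    """
    length = len(alignment[0])
    hits = []
    for size in range(min_size, length + 1):
        for start in range(length - size + 1):
            window = [seq[start:start + size] for seq in alignment]
            if window_supports_split(window, group_a):
                hits.append((start, size))
    return hits
```

With a toy alignment such as `["KKKLLL", "KKRVVV", "EEELLI", "EEDVVA"]` and `group_a = {0, 1}`, the first three columns group sequences 0 and 1 against 2 and 3, while the last three columns group 0 with 2 instead. Fully conserved windows give all-zero distances, so the arbitrary tie-breaking here can report them as supporting the split; a real analysis would need to handle such ties explicitly.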
Figure 2
A snapshot of the SplitTester software. A distance matrix is selected to compute the phylogenetic relationships of the aligned input sequences. The regions of the alignment that support the functional split are then plotted in the top window. The X-axis represents the length of the aligned sequences; the Y-axis represents increasing window size. After the computation is complete, the user can select a specific window for analysis by clicking on the left end of a colored horizontal bar. The colors indicate the degree of confidence that a given window supports the predefined functional split (red = 100%; yellow = 75%; green = 50%; blue = 25%). The two panels on the lower right show all equivalent NJ trees generated from the selected window. The lower left window shows the actual sequences that support the predefined phylogenetic relationship within the selected window.
Figure 3
Functional divergence in reverse transcriptase. (A) The SplitTester output for the reverse transcriptase dataset. Windows supporting the functional split are shown as colored lines in the plot. The X-axis represents the length of the aligned sequences; the Y-axis represents increasing window size (see legend to Fig. 2 for additional detail). (B) The X-ray structure of the HIV reverse transcriptase/primer/template complex (1RTD). The reverse transcriptase protein is represented by the yellow strand. The two green regions are domains identified by SplitTester. All residue numbers correspond to HIV sequence positions in 1RTD. Residues 166–215 and 280–311 in the aligned retrotransposon sequences correspond to 167–210 and 267–297 in the HIV 1RTD sequence, respectively.
Figure 4
Phylogenetic relationships from the full-length multiple sequence alignment and from the predicted region. (A) The NJ phylogeny (MEGA3.0 [32]) from the full-length reverse transcriptase alignment clusters genes from the same host: Osser, Tnt1, SIRE-1 and Opie-2 are from plant hosts, Copia and 1731 are from Drosophila, and Ty1 and Ty5 are from Saccharomyces cerevisiae. (B) The split cluster from a 50 aa window (positions 166–215) in predicted region 1 supports the functional subtype split. (C) Functional signal (measured by the bootstrap of node β in panel B) and evolutionary background (measured by the bootstrap of node α in panel A), plotted against window size. For window lengths below 90 aa, the split-clustering supports the functional subtype split, and the bootstrap value peaks at a window length of around 50 aa. A mixed topology is detected when the window is longer than 90 aa, measured by the bootstrap (γ) between the two major subtrees. As more amino acid sites are included, the bootstrap value converges to that of node α in panel A.
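The bootstrap values plotted in panel C can be illustrated with a generic column-resampling bootstrap. The sketch below is an assumption about the general technique, not the procedure used by the authors or by MEGA: `bootstrap_support` and its parameters are hypothetical names, and `supports_split` stands in for whatever tree-building test decides whether a resampled window recovers the node of interest.

```python
import random

def bootstrap_support(window, supports_split, replicates=100, seed=0):
    """Fraction of column-resampled replicates in which supports_split(...) is true.

    window         -- list of equal-length aligned sequences (one window)
    supports_split -- caller-supplied predicate taking a list of sequences
    """
    rng = random.Random(seed)
    length = len(window[0])
    hits = 0
    for _ in range(replicates):
        # Draw alignment columns with replacement, keeping each column intact.
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = ["".join(seq[c] for c in cols) for seq in window]
        if supports_split(resampled):
            hits += 1
    return hits / replicates
```

Calling this for each window size and plotting the result against the size would give a curve of the kind described in panel C, with the caveat that the support values depend entirely on the predicate supplied.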

References

    1. Tatusov RL, Koonin EV, Lipman DJ. A genomic perspective on protein families. Science. 1997;278:631–637. doi: 10.1126/science.278.5338.631.
    2. Golding GB, Dean AM. The structural basis of molecular adaptation. Mol Biol Evol. 1998;15:355–369.
    3. Yang Z, Bielawski JP. Statistical methods for detecting molecular adaptation. Trends in Ecology and Evolution. 2000;15:496–503. doi: 10.1016/S0169-5347(00)01994-7.
    4. Gu X. Statistical methods for testing functional divergence after gene duplication. Mol Biol Evol. 1999;16:1664–1674.
    5. Gu X, Vander Velden K. DIVERGE: phylogeny-based analysis for functional-structural divergence of a protein family. Bioinformatics. 2002;18:500–501. doi: 10.1093/bioinformatics/18.3.500.