Bioinformatics. 2007 Apr 1;23(7):809-14.
doi: 10.1093/bioinformatics/btm034. Epub 2007 Feb 3.

SCOOP: a simple method for identification of novel protein superfamily relationships

Alex Bateman et al. Bioinformatics.

Abstract

Motivation: Profile searches of sequence databases are a sensitive way to detect sequence relationships. Sophisticated profile-profile comparison algorithms that have been recently introduced increase search sensitivity even further.

Results: This article presents an approach that is simpler than profile-profile comparison yet performs comparably to state-of-the-art tools such as COMPASS, HHsearch and PRC. The approach, called SCOOP (Simple Comparison Of Outputs Program), is shown to find known relationships between families in the Pfam database and to detect novel distant relationships between families. Several novel discoveries are presented, including the finding that a domain of unknown function (DUF283) found in Dicer proteins is related to double-stranded RNA-binding domains.

Availability: SCOOP is freely available under a GNU GPL license from http://www.sanger.ac.uk/Users/agb/SCOOP/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1
ROC curves comparing SCOOP to profile-profile comparison tools. Each graph shows the cumulative number of true family relationships found as the number of false relationships increases. (a and b) Conservative definition of false positives, at high and low numbers of false positives; (c and d) liberal definition of false positives, at high and low numbers of false positives.
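The kind of curve described in the Fig. 1 caption can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name, the `(score, label)` input format and the label strings are hypothetical. The caption does not say how pairs of unknown status are handled under the conservative and liberal definitions, so this sketch simply skips them, which is an assumption.

```python
from typing import Iterable, List, Tuple


def cumulative_true_vs_false(
    matches: Iterable[Tuple[float, str]],
) -> List[Tuple[int, int]]:
    """Return (false_count, true_count) points for a ranked match list.

    `matches` holds (score, label) pairs, where label is "true", "false"
    or "unknown" (hypothetical labels chosen for this sketch).
    """
    # Rank matches from best to worst score.
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)

    points = [(0, 0)]
    n_true = 0
    n_false = 0
    for _, label in ranked:
        if label == "true":
            n_true += 1
        elif label == "false":
            n_false += 1
        else:
            # Assumption: pairs of unknown status do not move the curve.
            continue
        points.append((n_false, n_true))
    return points


# Toy data, invented purely for illustration.
example_matches = [
    (95.0, "true"),
    (80.0, "true"),
    (60.0, "false"),
    (55.0, "unknown"),
    (40.0, "true"),
    (10.0, "false"),
]
curve = cumulative_true_vs_false(example_matches)
```

Plotting the first coordinate of each point on the x-axis and the second on the y-axis gives a curve of the form described in the caption; a better method reaches more true relationships before accumulating false ones.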
Fig. 2
ROC curves comparing SCOOP results when matches are included at different E-value thresholds. All curves use the conservative definition of false positives.
Fig. 3
The fraction of matches above a given SCOOP score that are known to be true or false, or are unknown.
Fig. 4
A pairwise HMM logo (Schuster-Bockler et al., 2005) of the highest scoring false match from the SCOOP results, between the WW domain and the MHC II alpha domain.
Fig. 5
A pairwise HMM logo (Schuster-Bockler et al., 2005) of the CUE and DMA domains.
Fig. 6
A Venn diagram showing the degree of overlap between the top 1000 matches of SCOOP, PRC and HHsearch.
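The region counts behind a three-way Venn diagram like the one in Fig. 6 can be computed with plain set operations. The sketch below is illustrative only: the helper names and the family names in the toy data are hypothetical, and treating a family pair as unordered (so that A-B equals B-A) is an assumption rather than something stated in the caption.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Treat each family pair as unordered by sorting its two names."""
    return {(min(a, b), max(a, b)) for a, b in pairs}


def venn3_counts(x: Set[Pair], y: Set[Pair], z: Set[Pair]) -> Dict[str, int]:
    """Count the seven regions of a three-set Venn diagram."""
    return {
        "x_only": len(x - y - z),
        "y_only": len(y - x - z),
        "z_only": len(z - x - y),
        "x_and_y_only": len((x & y) - z),
        "x_and_z_only": len((x & z) - y),
        "y_and_z_only": len((y & z) - x),
        "all_three": len(x & y & z),
    }


# Toy top-match lists with made-up family names, for illustration only.
scoop_top = normalize([("FamA", "FamB"), ("FamC", "FamD"), ("FamE", "FamF")])
prc_top = normalize([("FamB", "FamA"), ("FamC", "FamD"), ("FamG", "FamH")])
hhsearch_top = normalize([("FamA", "FamB"), ("FamG", "FamH"), ("FamI", "FamJ")])

counts = venn3_counts(scoop_top, prc_top, hhsearch_top)
```

With real data, each input set would hold the top 1000 matches reported by one tool, and the seven counts would sum to the size of the union of the three lists.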
