Bioinformatics. 2016 Feb 1;32(3):345-53.
doi: 10.1093/bioinformatics/btv582. Epub 2015 Oct 12.

A multi-objective optimization approach accurately resolves protein domain architectures


J S Bernardes et al. Bioinformatics.

Abstract

Motivation: Given a protein sequence and a number of potential domains matching it, what are the domain content and the most likely domain architecture for the sequence? This problem is of fundamental importance in protein annotation and constitutes one of the main steps of all predictive annotation strategies. However, when several potential domains conflict because their boundaries overlap, finding a solution can become difficult. An accurate prediction of the domain architecture of a multi-domain protein provides important information for function prediction, comparative genomics and molecular evolution.

Results: We developed DAMA (Domain Annotation by a Multi-objective Approach), a novel approach that identifies architectures through a multi-objective optimization algorithm combining scores of domain matches, previously observed multi-domain co-occurrence and domain overlapping. DAMA has been validated on a known benchmark dataset based on CATH structural domain assignments and on the set of Plasmodium falciparum proteins. When compared with existing tools on both datasets, it outperforms all of them.
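As a rough illustration of what selecting among candidates under several objectives can look like, the sketch below applies a generic Pareto-dominance filter. This is not DAMA's selection procedure, which the abstract does not spell out. The candidate names, the three objectives (match score, co-occurrence support, negated overlap) and all numbers are hypothetical.

```python
def dominates(u, v):
    """True if score vector u is at least as good as v on every objective
    and strictly better on at least one (all objectives are maximized)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
    }


# Hypothetical architectures scored as
# (match score, co-occurrence support, negated overlap length).
candidates = {
    "abb": (10.0, 3, -2),
    "abc": (9.0, 2, -5),
    "ab": (10.0, 1, 0),
}

front = pareto_front(candidates)
print(sorted(front))
```

Here "abc" is dropped because "abb" is at least as good on all three objectives, while "abb" and "ab" both survive because each is better than the other on one objective. A further tie-breaking rule would be needed to pick a single architecture.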

Availability and implementation: DAMA software is implemented in C++ and the source code can be found at http://www.lcqb.upmc.fr/DAMA.

Contact: juliana.silva_bernardes@upmc.fr or alessandra.carbone@lip6.fr

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Main steps in the DAMA algorithm. (A) Potential domains (P) for a query sequence are ranked from bottom to top by decreasing E value. Overlaps between domains are allowed (see the domain pairs (d, e) or (d, g)). Domains are denoted with different letters. The same letter code is used in (B–E). (B) Domains in (A) are represented as nodes of an ‘interval’ graph G, where edges connect all overlapping domains with the exception of domains with a very small overlap or domains that appear as overlapping in the list of known protein architectures in (C) (see the overlapping condition). Edges not included in G are (g, d), because the overlap is too small, and (a, c), because the overlap is known (see C). (C) List of known domain architectures sharing domains with P. (D) List of all Maximal Independent Subsets associated with some domain (MSI) for the graph in (B) that satisfy pairwise domain constraints according to the list in (C). (E) List of feasible architectures obtained by crossing the information from (C) and (D). (F) Filtering of the architectures in (E) with 5 optimization functions and selection of the best architecture.
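Steps (B) and (D) of the caption can be sketched as follows: build a conflict graph over overlapping domain hits, skipping edges for very small or already known overlaps, then enumerate its maximal independent sets. This is a minimal brute-force sketch, not DAMA's C++ implementation. The coordinates, the `min_overlap` threshold and the `known_pairs` set are hypothetical, and the pairwise co-occurrence constraints from (C) are not applied.

```python
from itertools import combinations


def overlap(a, b):
    """Number of residues shared by two closed intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def build_conflict_graph(domains, min_overlap, known_pairs):
    """Connect two domains when they overlap by at least min_overlap
    residues and the pair is not a known (tolerated) overlap."""
    edges = set()
    for x, y in combinations(sorted(domains), 2):
        pair = frozenset((x, y))
        if overlap(domains[x], domains[y]) >= min_overlap and pair not in known_pairs:
            edges.add(pair)
    return edges


def maximal_independent_sets(nodes, edges):
    """Brute force: list every independent set, then keep those that are
    not a proper subset of another one. Fine for a handful of domains."""
    nodes = sorted(nodes)
    independent = []
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            if all(frozenset(p) not in edges for p in combinations(subset, 2)):
                independent.append(frozenset(subset))
    maximal = [s for s in independent if not any(s < t for t in independent)]
    return sorted(sorted(s) for s in maximal)


# Hypothetical domain hits on a query sequence: name -> (start, end).
domains = {"a": (1, 50), "c": (45, 100), "d": (90, 150), "e": (120, 200), "g": (148, 210)}
known_pairs = {frozenset(("a", "c"))}  # overlap seen in known architectures

edges = build_conflict_graph(domains, min_overlap=5, known_pairs=known_pairs)
mis = maximal_independent_sets(domains, edges)
print(mis)
```

With these made-up inputs, the edge (a, c) is dropped because the pair is listed as known, and (d, g) is dropped because only 3 residues overlap. For real inputs with many hits, a dedicated enumeration algorithm would be preferable to this exponential search.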
Fig. 2.
Selection of architectures with a multiple domain co-occurrence. (A) Two architectures, A1 and A2, where A1 is composed of a double occurrence of domain b. Domains are denoted with different letters, and the same letter code is used in (B) and (C). (B) List of architectures that allow for the selection of A1 with the objective function F2, because F2(abb)=3 and F2(abc)=2. (C) List of architectures that allow for the selection of A2 with the objective function F4, because F4(abb)=2 and F4(abc)=3. Note that F2(abb)=F2(abc)=2, F3(abb)=F3(abc)=3.
Fig. 3.
Performance of DAMA and other tools on the P. falciparum proteins annotation. (A) The y-axis is the number of predicted domains per protein (‘signal’), while the x-axis is the FDR (‘noise’), so better performing methods have higher curves (more signal for a given noise threshold). On the 1-mer and the 4-mer hypotheses, DAMA (red) outperforms all hmmscan variations tested (black) and the methods MDA (pink), dPUC (blue) and CODD (green). (B) The y-axis is the number of certified domains (‘signal’) obtained by a method, while the x-axis is the FDR (‘noise’) computed over domain architectures. Colors as in (A). (C) Distribution of the number of proteins with a fixed number of predicted domains, for each tool at FDR = 2e-4 (see vertical bar in A) for 1-mer and at FDR = 9e-4 for 4-mer. (D) Distribution of the number of proteins with a fixed number of predicted domains, for each tool at FDR = 0.05 (see vertical bar in B).
Fig. 4.
Three examples of domain predictions on P. falciparum proteins. Architectures of P. falciparum proteins identified by DAMA, CODD, dPUC and MDA, run with default parameters. DAMA identifies more domains than the other tools, and its predictions are based on known co-occurrence in Pfam27: (A) DEAD, SPRY, Helicase_C co-occur in 113 proteins; (B) RRM_5 (2 occurrences), RRM_1, RRM_6 co-occur in 34 proteins; (C) CDC48_N, CDC48_2, AAA x 2, Vps4_C co-occur in 159 proteins. The name of the protein (PlasmoDB id) is followed by its length (number of amino acids). DAMA was run with an FDR threshold fixed at 2e-04, as in the 1-mer experiment reported in Figure 3C.
