Comparative Study

Nucleic Acids Res. 2003 Feb 1;31(3):944-52.

doi: 10.1093/nar/gkg189.

Improving the performance of DomainParser for structural domain partition using neural network

Jun-tao Guo¹, Dong Xu, Dongsup Kim, Ying Xu

PMID: 12560490
PMCID: PMC149209
DOI: 10.1093/nar/gkg189

Jun-tao Guo et al. Nucleic Acids Res. 2003.

Affiliation

¹ Protein Informatics Group, Life Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830-6480, USA.

Abstract

Structural domains are considered the basic units of protein folding, evolution, function and design. Despite many years of investigation, automatic decomposition of protein structures into structural domains remains a challenging and unsolved problem, and manual inspection still plays a key role in domain decomposition. We previously developed a computer program, DomainParser, that uses network flow algorithms. The algorithm partitions a protein structure into domains accurately when the number of domains is known. However, its performance drops when this number is not known in advance (overall accuracy of 74.5% on a set of 1317 protein chains). By using various types of structural information, including the hydrophobic moment profile, we have developed an effective method for assessing the most probable number of domains in a structure. The core of this method is a neural network trained to discriminate correctly partitioned domains from incorrectly partitioned ones. Measured against the manual decomposition results given in the SCOP database, the new algorithm achieves a higher decomposition accuracy (81.9%) on the same data set.
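To make the network flow idea in the abstract concrete, the following is a minimal sketch of splitting a contact graph into two groups with a minimum s-t cut. It is an illustration under stated assumptions, not DomainParser's implementation: the function name `min_cut_partition`, the toy graph, the edge capacities and the choice of source and sink nodes are all hypothetical, and the abstract does not say how the actual program defines them. The max-flow routine used here is the standard Edmonds-Karp algorithm.

```python
from collections import deque


def min_cut_partition(n, edges, source, sink):
    """Split nodes 0..n-1 into two groups via a minimum s-t cut (Edmonds-Karp)."""
    # Residual capacities; an undirected contact gets capacity in both directions.
    cap = [[0] * n for _ in range(n)]
    for u, v, c in edges:
        cap[u][v] += c
        cap[v][u] += c

    def bfs_parents():
        # Breadth-first search over edges that still have residual capacity.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = bfs_parents()
        if parent[sink] == -1:
            break  # no augmenting path left, so the flow is maximal
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while v != source:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while v != source:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

    # Nodes still reachable from the source form one side of the minimum cut.
    parent = bfs_parents()
    side_a = [i for i in range(n) if parent[i] != -1]
    side_b = [i for i in range(n) if parent[i] == -1]
    return flow, side_a, side_b


# Hypothetical toy graph: two tightly connected clusters joined by one weak contact.
toy_edges = [
    (0, 1, 5), (1, 2, 5), (0, 2, 5),  # cluster A
    (3, 4, 5), (4, 5, 5), (3, 5, 5),  # cluster B
    (2, 3, 1),                        # weak inter-cluster contact
]
cut_value, domain_a, domain_b = min_cut_partition(6, toy_edges, source=0, sink=5)
```

In this toy case the cut falls on the single weak contact, which mirrors the intuition that domains are densely connected internally and sparsely connected to each other.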

Figures

**Figure 1**
Protein structure of 2bb2 (β B2-crystallin) and its schematic representation of a flow network. (A) The ball–stick representation of the protein structure. (B) Schematic representation of the flow network of 2bb2.

**Figure 2**
The zero- and second-order spherical moment profiles of the protein 2i1b and the second-order spherical moment profile of the decoy structure of 2i1b. The zero-order moment shown has been multiplied by 30.

**Figure 3**
The distributions of the values of parameters Cd, Id and Td in well-partitioned domains and overcut domains. (Top) Cd, compactness of a domain. (Middle) Id, interface size relative to the domain volume. (Bottom) Td, relative motion between domains of each partition.

**Figure 4**
Domain size and the number of segments of each domain in well-partitioned and overcut domains. (Top) Distributions of the number of segments. (Bottom) Distributions of domain sizes.

**Figure 5**
Neural network architecture for evaluation of decomposed individual domains. This network has nine input nodes, six hidden nodes and one output node. The nine input parameters are shown on the left.

**Figure 6**
The frequency of true positive assignments plotted against the output of the neural network.

**Figure 7**
CPU time of the new DomainParser as a function of the number of residues for 278 two-domain chains in FSSP.

**Figure 8**
Domain decompositions of 2adma and 1hzda by DomainParser. SCOP assigns both 2adma and 1hzda as single-domain proteins. The thick ribbons and thin strands show different domains. (A) 2adma (21–243/244–413). (B) 1hzda (74–279/280–339).

**Figure 9**
Single α-helix and simple structures are assigned as separate domains by SCOP. (A) 1aaya (103–131/132–159/160–187). (B) 6prch (1–36/37–258). (C) 1d0ab (150–501/334–349). (D) 1zmec (31–66/67–100). DomainParser defines them as single-domain proteins.

**Figure 10**
Domain assignments of 1plq, 1ig3a and 1b8sa by SCOP. All three are defined as two-domain proteins. Thick ribbons and thin strands show different domains assigned by SCOP. (A) 1plq (1–126/127–258). (B) 1ig3a (179–263/10–178). (C) 1b8sa (319–450/9–318;451–506).
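The Figure 5 caption describes a network with nine input nodes, six hidden nodes and one output node. As a rough illustration of that shape only, here is a minimal forward-pass sketch. Everything beyond the 9-6-1 layout is an assumption: the sigmoid activation, the random untrained weights, the names `make_network` and `score_domain`, and the placeholder feature vector are hypothetical, and the nine input parameters themselves are not listed in the text above.

```python
import math
import random


def sigmoid(x):
    """Logistic activation (assumed; the activation function is not given in the text)."""
    return 1.0 / (1.0 + math.exp(-x))


def make_network(n_in=9, n_hidden=6, seed=0):
    """Build a 9-6-1 network with random placeholder weights (not trained values)."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_out = 0.0
    return w_hidden, b_hidden, w_out, b_out


def score_domain(features, network):
    """Return a score in (0, 1) for one candidate domain's feature vector."""
    w_hidden, b_hidden, w_out, b_out = network
    if len(features) != len(w_hidden[0]):
        raise ValueError("expected %d input features" % len(w_hidden[0]))
    # Hidden layer: weighted sum of the inputs plus bias, then sigmoid.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Output layer: a single node combining the six hidden activations.
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


network = make_network()
# Placeholder feature vector; real inputs would be the nine structural parameters.
example_features = [0.5] * 9
example_score = score_domain(example_features, network)
```

With trained weights, a score near 1 would be read as a well-partitioned domain and a score near 0 as an incorrect partition; with the random weights used here the value carries no meaning.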
