BMC Bioinformatics. 2009 Aug 27;10 Suppl 8(Suppl 8):S2.
doi: 10.1186/1471-2105-10-S8-S2.

EnzyMiner: automatic identification of protein level mutations and their impact on target enzymes from PubMed abstracts

Süveyda Yeniterzi et al. BMC Bioinformatics.

Abstract

Background: A better understanding of the mechanisms underlying an enzyme's functionality and stability, together with knowledge of mutations and their impact, is crucial for researchers working with enzymes. Although several enzyme databases are currently available, the scientific literature largely remains the up-to-date source for learning the effects of a mutation on an enzyme. However, going through vast numbers of scientific documents to extract information on a desired mutation is a time-consuming process. In this paper, we therefore describe a unique method, termed EnzyMiner, which automatically identifies the PubMed abstracts that contain information on the impact of a protein-level mutation on the stability and/or the activity of a given enzyme.

Results: We present an automated system that identifies the abstracts containing an amino-acid-level mutation and then classifies them according to the mutation's effect on the enzyme. For mutation identification, MuGeX, an automated mutation-gene extraction system, has an accuracy of 93.1% with an F-measure of 91.5. For impact analysis, document classification is performed to identify the abstracts that report a change in the enzyme's stability or activity resulting from the mutation. The system was trained on lipases and tested on amylases, with an accuracy of 85%.
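As an illustration of what identifying an amino-acid-level mutation in an abstract can involve, the following is a minimal sketch that matches point-mutation mentions written as wild-type residue, position, mutant residue (for example "D176N" or "Asp176Asn"). It is not the MuGeX implementation, which is not described here; the pattern, the function names `find_mutations` and `has_mutation`, and the sample sentence are all assumptions made for this example. A pattern this simple will also match some non-mutation tokens of the same shape, so a real system needs further filtering.

```python
import re

# One-letter codes for the 20 standard amino acids.
ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"
# Three-letter codes, matched case-sensitively as they are usually written.
THREE_LETTER = (
    "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
    "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
)

# Wild-type residue, position, mutant residue: "D176N" or "Asp176Asn".
# This pattern is an assumption for illustration, not the MuGeX rule set.
MUTATION_RE = re.compile(
    rf"\b(?:([{ONE_LETTER}])([1-9]\d*)([{ONE_LETTER}])"
    rf"|({THREE_LETTER})([1-9]\d*)({THREE_LETTER}))\b"
)


def find_mutations(text):
    """Return (wild_type, position, mutant) tuples mentioned in text."""
    hits = []
    for m in MUTATION_RE.finditer(text):
        # Groups 1-3 hold a one-letter match, groups 4-6 a three-letter match.
        wt, pos, mt = m.group(1, 2, 3) if m.group(1) else m.group(4, 5, 6)
        if wt != mt:  # skip tokens such as "A1A" that are not substitutions
            hits.append((wt, int(pos), mt))
    return hits


def has_mutation(abstract):
    """True if the abstract mentions at least one point mutation."""
    return bool(find_mutations(abstract))
```

For instance, `find_mutations("The D176N and Asp231Ala variants were more stable.")` picks up both mentions, while a sentence with no token of this shape yields an empty list.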

Conclusion: EnzyMiner uses information extraction and machine learning techniques to identify the abstracts that contain a protein mutation for a given enzyme and to check whether each abstract is related to a disease. For disease-related abstracts, the mutation list and direct links to the abstracts are retrieved from the system and displayed on the Web. Abstracts that are not disease related are, in addition to being shown with the mutation list, categorized into two groups according to whether the mutation affects the enzyme's stability or its functionality, and these groups are also displayed on the Web.

Figures

Figure 1
A schematic illustration of the EnzyMiner system. DB: database.
Figure 2
Abstracts that report a change in the catalytic activity of amylase resulting from a protein mutation.
Figure 3
Abstracts that report a change in the stability of amylase resulting from a protein mutation.
Figure 4
Performance of four classification algorithms in the Stability vs. Catalytic Classification Module. Model 1: White space tokenizer, no stemming, unigram. Model 2: Alphanumeric tokenizer, no stemming, unigram. Model 3: White space tokenizer, stemming, unigram. Model 4: White space tokenizer, no stemming, bigram. Model 5: Alphanumeric tokenizer, stemming, unigram. Model 6: Alphanumeric tokenizer, no stemming, bigram. Model 7: White space tokenizer, stemming, bigram. Model 8: Alphanumeric tokenizer, stemming, bigram.
Figure 5
Performance of Probabilistic Indexing with different processing options. Model 1: White space tokenizer, no stemming, unigram. Model 2: Alphanumeric tokenizer, no stemming, unigram. Model 3: White space tokenizer, stemming, unigram. Model 4: White space tokenizer, no stemming, bigram. Model 5: Alphanumeric tokenizer, stemming, unigram. Model 6: Alphanumeric tokenizer, no stemming, bigram. Model 7: White space tokenizer, stemming, bigram. Model 8: Alphanumeric tokenizer, stemming, bigram.
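The eight models listed in the captions of Figures 4 and 5 are the combinations of two tokenizers, stemming on or off, and unigram or bigram features. The sketch below shows one way such preprocessing options could be wired together, using the model numbering from the captions. It is an illustration under assumptions, not the authors' pipeline: the function names are hypothetical, and `crude_stem` is a simplistic suffix stripper standing in for whatever stemmer the system actually used.

```python
import re


def whitespace_tokens(text):
    # Split on runs of whitespace; punctuation stays attached to tokens.
    return text.split()


def alphanumeric_tokens(text):
    # Keep runs of letters and digits; punctuation acts as a separator.
    return re.findall(r"[A-Za-z0-9]+", text)


def crude_stem(token):
    # Stand-in for a real stemmer (an assumption for this sketch):
    # strips a few common English suffixes from longer tokens.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def ngrams(tokens, n):
    # Contiguous n-token sequences joined by a single space.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, tokenizer, stem, n):
    tokens = tokenizer(text)
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    return ngrams(tokens, n)


# Model number -> (tokenizer, stemming, n-gram size), as in the captions.
MODELS = {
    1: (whitespace_tokens, False, 1),
    2: (alphanumeric_tokens, False, 1),
    3: (whitespace_tokens, True, 1),
    4: (whitespace_tokens, False, 2),
    5: (alphanumeric_tokens, True, 1),
    6: (alphanumeric_tokens, False, 2),
    7: (whitespace_tokens, True, 2),
    8: (alphanumeric_tokens, True, 2),
}
```

Calling `extract_features(abstract_text, *MODELS[6])`, for example, yields bigrams over alphanumeric tokens without stemming; the resulting feature lists would then be passed to whichever classifier is being compared.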
