Comparative Study
Proc Int Conf Intell Syst Mol Biol. 1997;5:25-32.

Automatic annotation for biological sequences by extraction of keywords from MEDLINE abstracts. Development of a prototype system

  • PMID: 9322011
M A Andrade et al. Proc Int Conf Intell Syst Mol Biol. 1997.

Abstract

We have developed a prototype for the automatic annotation of functional characteristics in protein families. The system extracts biological information directly from the scientific literature, in the form of MEDLINE abstracts. Keywords are selected as relevant according to the difference between their frequency in the abstracts associated with the protein family under study and their frequency in the abstracts of other, unrelated protein families. The concept of functional information associated with protein families is the key feature of our system, and it brings evolutionary information into the problem of functional annotation of biological sequences. The system has been tested in two scenarios: first, a large set of protein families with a small number of abstracts per family, and second, selected protein families with a large number of abstracts attached to each one. In both cases the system's performance is compared with annotations provided by human experts, showing a clear relation between the amount of information provided to the system and the quality of the annotations. The automatic annotations are in many cases of similar quality to those contained in current databases. The possibilities and difficulties to be encountered during the development of a full system for automatic annotation are discussed.
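The selection criterion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how frequency is measured or how the unrelated families are combined, so the sketch assumes that a word's frequency is the fraction of a family's abstracts in which it occurs, and that the background value is the mean of that fraction over the other families. The function names (`tokenize`, `word_frequencies`, `keyword_scores`, `top_keywords`) are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Return the set of distinct lowercase alphabetic words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def word_frequencies(abstracts):
    """Fraction of abstracts in which each word occurs at least once.

    Assumption: "frequency" is counted per abstract, not per token.
    """
    counts = Counter()
    for abstract in abstracts:
        counts.update(tokenize(abstract))
    total = len(abstracts)
    return {word: n / total for word, n in counts.items()}


def keyword_scores(family_abstracts, other_families):
    """Score each word in the family under study.

    Score = frequency in the family's abstracts minus the mean frequency
    across the other (unrelated) families. Averaging over families is an
    assumption made for this sketch.
    """
    target = word_frequencies(family_abstracts)
    background = [word_frequencies(abstracts) for abstracts in other_families]
    scores = {}
    for word, freq in target.items():
        mean_bg = sum(b.get(word, 0.0) for b in background) / len(background)
        scores[word] = freq - mean_bg
    return scores


def top_keywords(family_abstracts, other_families, n=3):
    """Return the n highest-scoring words, ties broken alphabetically."""
    scores = keyword_scores(family_abstracts, other_families)
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]


if __name__ == "__main__":
    # Toy abstracts, invented purely for illustration.
    kinase_family = [
        "The kinase phosphorylates the substrate protein",
        "A kinase domain binds ATP in the protein",
    ]
    unrelated = [
        ["The protein transports ions across the membrane",
         "The channel protein is gated"],
        ["The protease cleaves the protein",
         "The protease has an active site in the protein"],
    ]
    print(top_keywords(kinase_family, unrelated, n=1))
```

In this toy setting, words such as "protein" that appear in every family score zero, while a word confined to the family under study scores highest. A simple difference also lets family-specific function words through, so a fuller system would likely need a stop list or a statistical normalisation, neither of which is specified in the abstract.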
