A method for automatically extracting infectious disease-related primers and probes from the literature
- PMID: 20682041
- PMCID: PMC2923139
- DOI: 10.1186/1471-2105-11-410
Abstract
Background: Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis of infectious diseases and the prescription of treatments for them. The biological literature is the main source of empirically validated primer and probe sequences, so it is becoming increasingly important for researchers to be able to navigate this information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. The phases are: (1) convert each document into a tree of paper sections, (2) detect candidate sequences using a set of finite state machine-based recognizers, (3) refine problem sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information.
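Phase (2) can be illustrated with a minimal sketch, assuming a single two-state recognizer over IUPAC nucleotide codes. This is not the authors' implementation: the function name find_candidate_sequences, the min_len threshold of 15, and the choice of separators are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; names and thresholds are assumptions,
# not taken from the paper.
IUPAC = set("ACGTURYSWKMBDHVN")   # uppercase nucleotide and ambiguity codes
SEPARATORS = set(" -")            # characters tolerated inside a sequence


def find_candidate_sequences(text, min_len=15):
    """Scan text with a two-state machine (OUTSIDE / IN_SEQUENCE)
    and return runs of nucleotide codes at least min_len long."""
    candidates = []
    state = "OUTSIDE"
    buffer = []
    # A trailing newline acts as a sentinel that flushes the last run.
    for ch in text + "\n":
        if state == "OUTSIDE":
            if ch in IUPAC:
                state = "IN_SEQUENCE"
                buffer = [ch]
        else:  # state == "IN_SEQUENCE"
            if ch in IUPAC:
                buffer.append(ch)
            elif ch in SEPARATORS:
                continue  # skip grouping spaces and hyphens
            else:
                # Any other character ends the run.
                if len(buffer) >= min_len:
                    candidates.append("".join(buffer))
                state = "OUTSIDE"
                buffer = []
    return candidates


example = "Forward primer 5'-ACGTACGTTAGCCTAGGA-3' was used with DNA."
print(find_candidate_sequences(example))
```

Because spaces are tolerated inside a run, an uppercase word made only of nucleotide letters that sits directly next to a sequence would be merged into it. That is the kind of problem sequence a later refinement step would need to handle.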
Results: We tested our approach on a set of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The evaluation shows that our approach is suitable for automatically extracting DNA sequences, achieving precision and recall rates of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name.
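For readers unfamiliar with the two metrics, the following sketch shows how precision and recall are conventionally computed from counts of true positives, false positives, and false negatives. The counts in the example are hypothetical and are not the paper's data; the abstract reports only the resulting percentages.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from raw evaluation counts."""
    # Precision: share of extracted sequences that are correct.
    precision = true_pos / (true_pos + false_pos)
    # Recall: share of sequences in the documents that were extracted.
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
p, r = precision_recall(true_pos=90, false_pos=10, false_neg=30)
print(f"precision={p:.2%}, recall={r:.2%}")
```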
Conclusions: We believe that the proposed method can facilitate routine tasks for biomedical researchers who use molecular methods to diagnose infectious diseases and prescribe treatments for them. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new databases from scratch.