Construction of customized sub-databases from NCBI-nr database for rapid annotation of huge metagenomic datasets using a combined BLAST and MEGAN approach
- PMID: 23573212
- PMCID: PMC3613424
- DOI: 10.1371/journal.pone.0059831
Abstract
We developed a fast method to construct local sub-databases from the NCBI-nr database for quick similarity search and annotation of huge metagenomic datasets based on the BLAST-MEGAN approach. We further propose a three-step sub-database annotation pipeline (SAP) that performs the annotation in much less time and with far less computational capacity than the direct NCBI-nr database BLAST-MEGAN approach. In the first BLAST of SAP, the original metagenomic dataset is searched against the constructed sub-database to quickly screen for candidate target sequences. The candidate target sequences identified in the first BLAST are then subjected to a second BLAST against the whole NCBI-nr database. Finally, the results of the second BLAST are annotated using MEGAN, which filters out sequences mistakenly selected in the first BLAST and so guarantees the accuracy of the results. In the tests conducted in this study, SAP achieved a speedup of ~150-385 times at a BLAST e-value of 1e-5 compared to the direct BLAST against the NCBI-nr database. The annotation results of SAP are in exact agreement with those of the direct NCBI-nr database BLAST-MEGAN approach, which is very time-consuming and computationally intensive. Selecting more stringent thresholds (e.g. an e-value of 1e-10) would further accelerate the SAP process. SAP may also be coupled with novel similarity search tools other than BLAST (e.g. RAPsearch) to achieve even faster annotation of huge metagenomic datasets. Above all, this sub-database construction method and the SAP pipeline provide a new time-efficient and convenient similarity search and annotation strategy for laboratories without access to high-performance computing facilities. SAP also allows high-performance computing facilities to process more similarity search tasks.
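The hand-off between the first and second BLAST can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the first BLAST was written in the standard 12-column tabular format (query ID in column 1, e-value in column 11), and the function names `candidate_ids` and `subset_fasta` are hypothetical helpers introduced only for illustration. The sketch keeps the reads that hit the sub-database at or below the e-value threshold, so that only those reads are searched against the whole NCBI-nr database.

```python
def candidate_ids(tabular_hits, max_evalue=1e-5):
    """Return query IDs with at least one hit at or below max_evalue.

    Assumption: each line is BLAST tabular output with 12 tab-separated
    columns, where column 1 is the query ID and column 11 the e-value.
    """
    ids = set()
    for line in tabular_hits:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            ids.add(fields[0])
    return ids


def subset_fasta(fasta_lines, keep_ids):
    """Yield the FASTA lines of records whose ID is in keep_ids.

    The record ID is taken as the first whitespace-delimited token
    after '>' on the header line.
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            keep = bool(header) and header[0] in keep_ids
        if keep:
            yield line
```

The lines yielded by `subset_fasta` would form the query file for the second BLAST. Lowering `max_evalue` (for example to 1e-10) shrinks the candidate set, which is consistent with the statement that more stringent thresholds further accelerate the process.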
Similar articles
- Using AnnoTree to Get More Assignments, Faster, in DIAMOND+MEGAN Microbiome Analysis. mSystems. 2022 Feb 22;7(1):e0140821. doi: 10.1128/msystems.01408-21. Epub 2022 Feb 22. PMID: 35191776. Free PMC article.
- [Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes]. Yi Chuan Xue Bao. 2004 May;31(5):431-43. PMID: 15478601. Chinese.
- GHOSTX: A Fast Sequence Homology Search Tool for Functional Annotation of Metagenomic Data. Methods Mol Biol. 2017;1611:15-25. doi: 10.1007/978-1-4939-7015-5_2. PMID: 28451968.
- Metagenomic search strategies for interactions among plants and multiple microbes. Front Plant Sci. 2014 Jun 11;5:268. doi: 10.3389/fpls.2014.00268. eCollection 2014. PMID: 24966863. Free PMC article. Review.
- New Tools in Orthology Analysis: A Brief Review of Promising Perspectives. Front Genet. 2017 Oct 31;8:165. doi: 10.3389/fgene.2017.00165. eCollection 2017. PMID: 29163633. Free PMC article. Review.
Cited by
- Genome survey sequencing for the characterization of genetic background of Dracaena cambodiana and its defense response during dragon's blood formation. PLoS One. 2018 Dec 14;13(12):e0209258. doi: 10.1371/journal.pone.0209258. eCollection 2018. PMID: 30550595. Free PMC article.
- Comparative transcriptome analysis of Haematococcus pluvialis on astaxanthin biosynthesis in response to irradiation with red or blue LED wavelength. World J Microbiol Biotechnol. 2018 Jun 18;34(7):96. doi: 10.1007/s11274-018-2459-y. PMID: 29916185.
- Chromosome-level genome assembly of the tetraploid medicinal and natural dye plant Persicaria tinctoria. Sci Data. 2024 Dec 27;11(1):1440. doi: 10.1038/s41597-024-04317-6. PMID: 39730378. Free PMC article.
- Loss of Pathogenicity and Evidence of Horizontal Gene Transfer in Colletotrichum gloeosporioides From a Medicinal Plant. Mol Plant Pathol. 2025 Jun;26(6):e70098. doi: 10.1111/mpp.70098. PMID: 40451789. Free PMC article.
- Evaluation of a hybrid approach using UBLAST and BLASTX for metagenomic sequences annotation of specific functional genes. PLoS One. 2014 Oct 27;9(10):e110947. doi: 10.1371/journal.pone.0110947. eCollection 2014. PMID: 25347677. Free PMC article.