CNN-MGP: Convolutional Neural Networks for Metagenomics Gene Prediction
- PMID: 30588558
- PMCID: PMC6841655
- DOI: 10.1007/s12539-018-0313-4
Abstract
Accurate gene prediction in metagenomics fragments is computationally challenging because the reads are short and the data are incomplete and fragmented. Most gene-prediction programs extract a large number of features and then apply statistical or supervised classification approaches to predict genes. In our study, we introduce a convolutional neural network for metagenomics gene prediction (CNN-MGP), a program that predicts genes in metagenomics fragments directly from raw DNA sequences, without manual feature extraction or feature selection stages. CNN-MGP learns the characteristics of coding and non-coding regions and distinguishes coding from non-coding open reading frames (ORFs). We train 10 CNN models on 10 mutually exclusive datasets defined by pre-defined GC content ranges. We extract ORFs from each fragment; the ORFs are then encoded numerically and fed into the appropriate CNN model, chosen according to the GC content of the fragment. The output of the CNN is the probability that an ORF encodes a gene. Finally, a greedy algorithm selects the final gene list. Overall, CNN-MGP is effective and achieves 91% accuracy on the testing dataset. CNN-MGP demonstrates that deep learning can predict genes in metagenomics fragments, and it achieves an accuracy higher than or comparable to that of state-of-the-art gene-prediction programs that use pre-defined features.
Keywords: Convolutional neural network; Deep learning; Gene prediction; Metagenomics; ORF.
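The pipeline described in the abstract (ORF extraction, numerical encoding, GC-based model selection, greedy selection) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the CNN-MGP implementation: the function names are hypothetical, ORFs are searched on the forward strand only and must be complete (ATG to in-frame stop), the encoding is a plain one-hot scheme, the ten GC ranges are taken to be equal-width bins, and the greedy step simply keeps the highest-probability ORFs that do not overlap. The CNN itself is omitted; its output is represented by the probability attached to each ORF.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_len: int = 60) -> List[Tuple[int, int]]:
    """Forward-strand ORFs as (start, end), end exclusive, ATG to in-frame stop.
    Assumption: complete ORFs only; real fragments also contain partial ORFs."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3))
                start = None
    return orfs

def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the fragment."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def model_index(gc: float, n_models: int = 10) -> int:
    """Pick one of n_models by GC content (assumed equal-width bins)."""
    return min(int(gc * n_models), n_models - 1)

def one_hot(seq: str) -> List[List[int]]:
    """Assumed encoding: one 4-element row per base; unknown bases map to zeros."""
    table = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
             "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}
    return [table.get(base, [0, 0, 0, 0]) for base in seq.upper()]

def greedy_select(scored: List[Tuple[int, int, float]],
                  threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Keep ORFs in descending probability order, skipping any that overlap
    an ORF already kept. scored holds (start, end, probability) tuples."""
    chosen: List[Tuple[int, int, float]] = []
    for start, end, prob in sorted(scored, key=lambda t: t[2], reverse=True):
        if prob < threshold:
            break
        if all(end <= s or start >= e for s, e, _ in chosen):
            chosen.append((start, end, prob))
    return sorted(chosen)
```

In use, each fragment would be passed through `find_orfs`, each ORF through `one_hot`, and the encoded ORF scored by the model picked with `model_index(gc_content(fragment))`; `greedy_select` then reduces the scored ORFs to a final list. The 0.5 probability threshold and the 60-base minimum length are placeholders, not values reported in the abstract.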
Similar articles
- MGC: a metagenomic gene caller. BMC Bioinformatics. 2013;14 Suppl 9(Suppl 9):S6. doi: 10.1186/1471-2105-14-S9-S6. Epub 2013 Jun 28. PMID: 23901840. Free PMC article.
- deepNEC: a novel alignment-free tool for the identification and classification of nitrogen biochemical network-related enzymes using deep learning. Brief Bioinform. 2022 May 13;23(3):bbac071. doi: 10.1093/bib/bbac071. PMID: 35325031.
- Feature selection for gene prediction in metagenomic fragments. BioData Min. 2018 Jun 7;11:9. doi: 10.1186/s13040-018-0170-z. eCollection 2018. PMID: 30026811. Free PMC article.
- A survey on protein-DNA-binding sites in computational biology. Brief Funct Genomics. 2022 Sep 16;21(5):357-375. doi: 10.1093/bfgp/elac009. PMID: 35652477. Review.
- Predicting Host Phenotype Based on Gut Microbiome Using a Convolutional Neural Network Approach. Methods Mol Biol. 2021;2190:249-266. doi: 10.1007/978-1-0716-0826-5_12. PMID: 32804370. Review.
Cited by
- Machine learning applications in RNA modification sites prediction. Comput Struct Biotechnol J. 2021 Sep 29;19:5510-5524. doi: 10.1016/j.csbj.2021.09.025. eCollection 2021. PMID: 34712397. Free PMC article. Review.
- Analysis of metagenomic data. Nat Rev Methods Primers. 2025;5:5. doi: 10.1038/s43586-024-00376-6. Epub 2025 Jan 23. PMID: 40688383. Free PMC article.
- A toolbox of machine learning software to support microbiome analysis. Front Microbiol. 2023 Nov 22;14:1250806. doi: 10.3389/fmicb.2023.1250806. eCollection 2023. PMID: 38075858. Free PMC article. Review.
- Application and Comparison of Supervised Learning Strategies to Classify Polarity of Epithelial Cell Spheroids in 3D Culture. Front Genet. 2020 Mar 27;11:248. doi: 10.3389/fgene.2020.00248. eCollection 2020. PMID: 32292417. Free PMC article.
- Genomic language models (gLMs) decode bacterial genomes for improved gene prediction and translation initiation site identification. Brief Bioinform. 2025 Jul 2;26(4):bbaf311. doi: 10.1093/bib/bbaf311. PMID: 40605274. Free PMC article.