mSystems. 2025 Apr 22;10(4):e0125824.
doi: 10.1128/msystems.01258-24. Epub 2025 Mar 10.

deep-Sep: a deep learning-based method for fast and accurate prediction of selenoprotein genes in bacteria

Yao Xiao et al. mSystems.

Abstract

Selenoproteins are a special group of proteins with major roles in cellular antioxidant defense. They contain the 21st amino acid selenocysteine (Sec) in the active sites, which is encoded by an in-frame UGA codon. Compared to eukaryotes, identification of selenoprotein genes in bacteria remains challenging due to the absence of an effective strategy for distinguishing the Sec-encoding UGA codon from a normal stop signal. In this study, we have developed a deep learning-based algorithm, deep-Sep, for quickly and precisely identifying selenoprotein genes in bacterial genomic sequences. This algorithm uses a Transformer-based neural network architecture to construct an optimal model for detecting Sec-encoding UGA codons and a homology search-based strategy to remove additional false positives. During the training and testing stages, deep-Sep has demonstrated commendable performance, including an F1 score of 0.939 and an area under the receiver operating characteristic curve of 0.987. Furthermore, when applied to 20 bacterial genomes as independent test data sets, deep-Sep exhibited remarkable capability in identifying both known and new selenoprotein genes, which significantly outperforms the existing state-of-the-art method. Our algorithm has proved to be a powerful tool for comprehensively characterizing selenoprotein genes in bacterial genomes, which should not only assist in accurate annotation of selenoprotein genes in genome sequencing projects but also provide new insights for a deeper understanding of the roles of selenium in bacteria.

IMPORTANCE

Selenium is an essential micronutrient present in selenoproteins in the form of Sec, which is a rare amino acid encoded by the opal stop codon UGA. Identification of all selenoproteins is of vital importance for investigating the functions of selenium in nature. Previous strategies for predicting selenoprotein genes mainly relied on the identification of a special cis-acting Sec insertion sequence (SECIS) element within mRNAs. However, due to the complexity and variability of SECIS elements, recognition of all selenoprotein genes in bacteria is still a major challenge in the annotation of bacterial genomes. We have developed a deep learning-based algorithm to predict selenoprotein genes in bacterial genomic sequences, which demonstrates superior performance compared to currently available methods. This algorithm can be utilized in either web-based or local (standalone) modes, serving as a promising tool for identifying the complete set of selenoprotein genes in bacteria.
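To make the core problem concrete, the sketch below lists every in-frame TGA codon (the DNA form of UGA) in a reading frame together with its flanking sequence. Windows of this kind are the sort of input a classifier would need in order to decide whether a given UGA encodes Sec or terminates translation. This is an illustration only and not the deep-Sep implementation: the function name `inframe_tga_candidates`, the `UgaCandidate` record, and the `flank` width are all assumptions introduced here.

```python
from typing import Iterator, NamedTuple


class UgaCandidate(NamedTuple):
    """Hypothetical record for one in-frame TGA codon and its context."""
    position: int    # 0-based index of the "T" of the TGA codon
    upstream: str    # up to `flank` nucleotides before the codon
    downstream: str  # up to `flank` nucleotides after the codon


def inframe_tga_candidates(
    seq: str, start: int = 0, flank: int = 30
) -> Iterator[UgaCandidate]:
    """Yield each TGA codon that lies in the reading frame beginning at `start`.

    The flank width is an illustrative assumption, not a value from the paper.
    """
    seq = seq.upper()
    # Step codon by codon so that out-of-frame TGA triplets are ignored.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] == "TGA":
            yield UgaCandidate(
                position=i,
                upstream=seq[max(0, i - flank):i],
                downstream=seq[i + 3:i + 3 + flank],
            )


# Toy sequence: codons are ATG AAA TGA CCC TGA TTT.
# The "TGA" at index 1 (inside "ATGA") is out of frame and is skipped.
toy = "ATGAAATGACCCTGATTT"
hits = list(inframe_tga_candidates(toy, start=0, flank=3))
print([h.position for h in hits])  # [6, 12]
```

Each candidate would still have to be scored by a trained model and, as the abstract describes, checked by a homology search before being reported as a selenoprotein gene.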

Keywords: UGA codon; bacteria; deep learning; selenium; selenoprotein.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig 1
A schematic overview of the deep-Sep algorithm. The procedure consists of two modules: BERT-based deep neural network module and homology search-based module. Details of the search process are introduced in the text.
Fig 2
Multiple sequence alignments of new selenoproteins and their Cys-containing homologs. The alignments show Sec-flanking regions in new selenoprotein sequences predicted in examined bacterial genomes and their Sec-/Cys-containing homologs in other organisms. Predicted Sec (U) and the corresponding Cys (C) residues are shown in red and blue backgrounds, respectively. Other residues shown in white on black or grey are conserved in these proteins. (A) TonB-dependent receptor (COG1629, CirA); (B) selenoprotein O-like (COG0397, SelO); (C) glutamine amidotransferase (COG0504, PyrG); (D) NAD(P)/FAD-dependent oxidoreductase (COG1252, Ndh); (E) DUF523 domain-containing protein (COG1683, YbbK); (F) formate dehydrogenase FDH3 subunit beta (COG0437, HybA); (G) hypothetical protein DG; (H) hypothetical protein DW; (I) hypothetical protein DF.
Fig 3
Alignment of SECIS elements present in new bacterial selenoprotein genes. The SECIS elements were predicted by using the bSECISearch program. Conserved nucleotides in the majority of known bacterial SECIS elements are highlighted in black.
