BMC Genomics. 2024 Dec 27;25(1):1252.
doi: 10.1186/s12864-024-11173-6.

EDCLoc: a prediction model for mRNA subcellular localization using improved focal loss to address multi-label class imbalance

Yu Deng et al. BMC Genomics.

Abstract

Background: The subcellular localization of mRNA plays a crucial role in gene expression regulation and various cellular processes. However, existing wet lab techniques such as RNA-FISH are usually time-consuming, labor-intensive, and limited to specific tissue types. To address this, researchers have developed several computational methods to predict mRNA subcellular localization. These methods face the problem of class imbalance in multi-label classification, which causes models to favor majority classes and overlook minority classes during training. Additionally, traditional feature extraction methods are computationally expensive, produce incomplete features, and may lose critical information. Deep learning methods, in turn, are constrained by hardware performance and training time when handling complex sequences, and they may suffer from the curse of dimensionality and overfitting. Therefore, there is an urgent need for more efficient and accurate prediction models.

Results: To address these issues, we propose a multi-label classifier, EDCLoc, for predicting mRNA subcellular localization. EDCLoc reduces training pressure through a stepwise pooling strategy and applies grouped convolution blocks of varying sizes at different levels, combined with residual connections, to achieve efficient feature extraction and gradient propagation. The model employs global max pooling at the end to further reduce feature dimensions and highlight key features. To tackle class imbalance, we improved the focal loss function to enhance the model's focus on minority classes. Evaluation results show that EDCLoc outperforms existing methods in most subcellular regions. Additionally, the position weight matrix extracted by multi-scale CNN filters can match known RNA-binding protein motifs, demonstrating EDCLoc's effectiveness in capturing key sequence features.
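
The abstract does not spell out how the focal loss was improved, so the following is only a minimal sketch of the standard focal loss (as introduced by Lin et al. for object detection) applied independently to each label of a multi-label problem. The function names and the `gamma` and `alpha` values are assumptions for illustration, not the authors' settings; the EDCLoc-specific modification is not reproduced here.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Standard focal loss for a single label.

    p: predicted probability of the positive class, y: true label (0 or 1).
    gamma and alpha are illustrative defaults, not values from the paper.
    """
    p = min(max(p, eps), 1.0 - eps)           # avoid log(0)
    p_t = p if y == 1 else 1.0 - p            # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def multilabel_focal_loss(probs, targets, gamma=2.0, alpha=0.25):
    """Mean focal loss over all labels of one sample (one probability per label)."""
    losses = [binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, targets)]
    return sum(losses) / len(losses)


# Hypothetical predictions for six compartments of one mRNA
probs = [0.9, 0.2, 0.6, 0.1, 0.05, 0.7]
targets = [1, 0, 1, 0, 0, 0]
print(multilabel_focal_loss(probs, targets))
```

With `gamma = 0` and `alpha = 0.5` the expression reduces to half the usual binary cross-entropy, which makes the role of the modulating factor easy to check.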

Conclusions: EDCLoc outperforms existing prediction tools in most subcellular regions and effectively mitigates class imbalance issues in multi-label classification. These advantages make EDCLoc a reliable choice for multi-label mRNA subcellular localization. The dataset and source code used in this study are available at https://github.com/DellCode233/EDCLoc.

Keywords: Class imbalance; Deep learning; Focal loss; mRNA subcellular localization; Multi-label.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1. The detailed distribution of mRNA sequences across six subcellular compartments.
Fig. 2. Sequence length distribution for the entire dataset and each subcellular compartment.
Fig. 3. The overview of EDCLoc. (A) Vectorization: mRNA sequences of arbitrary lengths are truncated or padded to a uniform length T (T = 8000) and then converted into a 5×T matrix through one-hot encoding. (B) Model training: The encoded feature matrix is input into the network model, and the loss is calculated using the improved focal loss. This loss is then used to update the model weights during training. (C) Prediction: During the prediction phase, the model loads the weights that performed best for each subcellular compartment on the validation set to predict the probabilities for the corresponding compartments.
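
The vectorization step in panel A can be sketched as follows. The caption gives only the target length T = 8000 and the 5×T shape; the meaning of the fifth channel (here, unknown bases `N`), truncation from the start of the sequence, and all-zero padding columns are assumptions made for this illustration.

```python
ALPHABET = "ACGUN"  # assumption: fifth channel collects N / any unrecognized base


def one_hot_encode(seq, T=8000):
    """Truncate or pad an mRNA sequence to length T and return a 5 x T matrix.

    Assumptions (not stated in the caption): the first T bases are kept when
    truncating, and padded positions are left as all-zero columns.
    """
    seq = seq.upper().replace("T", "U")[:T]    # accept DNA-style input, then truncate
    matrix = [[0] * T for _ in ALPHABET]       # 5 rows, T columns, zero-initialized
    for pos, base in enumerate(seq):
        row = ALPHABET.find(base)
        if row == -1:                          # unrecognized symbol -> N channel
            row = ALPHABET.index("N")
        matrix[row][pos] = 1
    return matrix


encoded = one_hot_encode("AUGGCCN", T=10)
print(len(encoded), len(encoded[0]))           # rows, columns
```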
Fig. 4. The model architecture of EDCLoc. (A) The encoded feature matrix. (B) The multi-scale layer: Convolutional filters with varying kernel sizes extract features, which are then concatenated and projected to a fixed dimension. (C) The model body: Stacked grouped convolutional blocks with max-pooling reduce feature dimensions, with residual connections and global max-pooling refining the final feature vector. (D) The fully connected classifier: The final 64-dimensional vector is used for multi-label classification, predicting probabilities for six subcellular localizations.
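
To make the dimension reduction in panel C concrete, the sketch below shows how stepwise max-pooling shortens the sequence axis and how global max-pooling collapses it to one value per channel. It also includes the usual weight count of a grouped 1D convolution, (C_in / groups) × C_out × kernel. The number of blocks, the pooling stride, and the channel counts are hypothetical; only the length T = 8000 and the final 64-dimensional vector come from the captions.

```python
def max_pool1d(channels, kernel, stride):
    """Max-pool each channel (a list of floats) along the sequence axis."""
    return [
        [max(ch[i:i + kernel]) for i in range(0, len(ch) - kernel + 1, stride)]
        for ch in channels
    ]


def global_max_pool(channels):
    """Collapse the sequence axis: one maximum per channel."""
    return [max(ch) for ch in channels]


def grouped_conv_params(c_in, c_out, kernel, groups):
    """Weight count of a 1D convolution without bias."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * kernel


# Toy feature map: 4 channels over T = 8000 positions (values are arbitrary)
features = [[float((i * 7 + c) % 13) for i in range(8000)] for c in range(4)]

x = features
for block in range(3):                      # hypothetical: 3 body blocks, stride 4
    x = max_pool1d(x, kernel=4, stride=4)
    print("after block", block + 1, "length =", len(x[0]))

print(global_max_pool(x))                   # one value per channel
print(grouped_conv_params(64, 64, 3, groups=1), grouped_conv_params(64, 64, 3, groups=4))
```

Grouping divides the weight count by the number of groups, which is one reason grouped blocks are cheaper than dense convolutions of the same width.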
Fig. 5. Heatmap illustrating the effect of different numbers of body blocks and max-pooling strides on classification performance. Blue shades represent MCC values, and red shades represent AUC values. The horizontal axis indicates the pooling stride, while the vertical axis represents the number of body blocks.
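
For readers unfamiliar with the two metrics in the heatmap, here is a minimal sketch of how MCC (Matthews correlation coefficient) and AUC can be computed for a single compartment. Returning 0 when the MCC denominator vanishes is a common convention assumed here, and the example labels and scores are made up.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities for one compartment
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.7, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # 0.5 threshold is an assumption
print(mcc(y_true, y_pred), auc(y_true, scores))
```

In a multi-label setting the same functions would be applied to each compartment separately.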
Fig. 6. Visualization of representative CNN motifs mapped to known RBPs. Known RBPs (upper row) are from [38]. The CNN patterns (lower row) are generated by different CNN filters, with the filter index labeled below each pattern. (A) RNCMPT00172 (IGF2BP3) matched with CNN filter 12. (B) RNCMPT00173 (RBMS3) matched with CNN filter 16. (C) RNCMPT00157 (PABPN1) matched with CNN filter 17.
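
The page does not describe how the CNN patterns were derived from the filters. A common practice, assumed here purely for illustration, is to collect the subsequences that most strongly activate a filter, align them, and count base frequencies per position. The sketch below builds such a position frequency matrix (the probability form of a position weight matrix) from a hypothetical list of windows; the windows and the pseudocount are invented.

```python
BASES = "ACGU"


def position_frequency_matrix(windows, pseudocount=1.0):
    """Per-position base frequencies from equal-length aligned subsequences.

    windows: hypothetical top-activating subsequences for one filter.
    Returns a list with one {base: frequency} dict per position.
    """
    k = len(windows[0])
    counts = [{b: pseudocount for b in BASES} for _ in range(k)]
    for w in windows:
        for pos, base in enumerate(w):
            if base in counts[pos]:            # skip symbols outside A/C/G/U
                counts[pos][base] += 1
    pfm = []
    for col in counts:
        total = sum(col.values())
        pfm.append({b: col[b] / total for b in BASES})
    return pfm


def consensus(pfm):
    """Most frequent base at each position."""
    return "".join(max(col, key=col.get) for col in pfm)


windows = ["UUUUA", "UUUUC", "UUUUA", "AUUUA"]   # invented example windows
pfm = position_frequency_matrix(windows)
print(consensus(pfm))
```

A matrix of this kind is what motif comparison tools typically take as input when matching learned patterns against a database of known RBP motifs.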

References

    1. Medioni C, Mowry K, Besse F. Principles and roles of mRNA localization in animal development. Development. 2012;139:3263–76.
    2. Buccitelli C, Selbach M. mRNAs, proteins and the emerging principles of gene expression control. Nat Rev Genet. 2020;21:630–44.
    3. Long RM, Singer RH, Meng X, Gonzalez I, Nasmyth K, Jansen R-P. Mating type switching in yeast controlled by asymmetric localization of ASH1 mRNA. Science. 1997;277:383–7.
    4. Gonsalvez GB, Urbinati CR, Long RM. RNA localization in yeast: moving towards a mechanism. Biol Cell. 2005;97:75–86.
    5. Kugler J-M, Lasko P. Localization, anchoring and translational control of oskar, gurken, bicoid and nanos mRNA during Drosophila oogenesis. Fly. 2009;3:15–28.