Review

Sensors (Basel). 2022 Sep 28;22(19):7384. doi: 10.3390/s22197384

A Review on Multiscale-Deep-Learning Applications

Elizar Elizar et al.

Abstract

In general, most existing convolutional neural network (CNN)-based deep-learning models suffer from spatial-information loss and inadequate feature representation. This is due to their inability to capture multiscale-context information and the exclusion of semantic information throughout the pooling operations. In the early layers of a CNN, the network encodes simple semantic representations, such as edges and corners, while in the later layers it encodes more complex semantic features, such as complex geometric shapes. Theoretically, it is better for a CNN to extract features from different levels of semantic representation, because tasks such as classification and segmentation work better when both simple and complex feature maps are utilized. Hence, it is also crucial to embed multiscale capability throughout the network so that the various scales of the features can be optimally captured to represent the intended task. Multiscale representation enables the network to fuse low-level and high-level features from a restricted receptive field to enhance deep-model performance. The main novelty of this review is its comprehensive taxonomy of multiscale-deep-learning methods, which details several architectures and their strengths as implemented in existing works. Multiscale approaches in deep-learning networks can predominantly be classified into two categories: multiscale feature learning and multiscale feature fusion. Multiscale feature learning derives feature maps by applying kernels of several sizes to collect a larger range of relevant features and to predict the spatial mapping of the input images. Multiscale feature fusion combines features with different resolutions to find patterns over short and long distances without requiring a very deep network. Additionally, several examples of these techniques are discussed according to their applications in satellite imagery, medical imaging, agriculture, and industrial and manufacturing systems.
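As a concrete illustration of multiscale feature learning as described above, the minimal PyTorch sketch below applies convolution kernels of several sizes in parallel and concatenates the resulting maps; the module name, channel counts, and kernel sizes are illustrative assumptions and are not taken from the review.

import torch
import torch.nn as nn

class MultiKernelBlock(nn.Module):
    """Illustrative sketch: extracts features with several kernel sizes in
    parallel and concatenates them, so one layer captures both fine and
    coarse context. Channel counts and kernel sizes are assumed."""
    def __init__(self, in_ch, branch_ch, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2)
            for k in kernel_sizes
        ])

    def forward(self, x):
        # Each branch preserves the spatial size; channels are stacked.
        return torch.cat([b(x) for b in self.branches], dim=1)

# Example: a 3-branch block on a 64-channel feature map.
feats = torch.randn(1, 64, 56, 56)
out = MultiKernelBlock(64, 32)(feats)   # -> (1, 96, 56, 56)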

Keywords: artificial intelligence; convolutional neural network; deep learning; machine learning; multiscale features; neural network.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. The primary taxonomy of multiscale-deep-learning architecture used in classification and segmentation tasks.
Figure 2. Multiscale receptive fields of deep-feature maps that are used to activate the visual semantics and their contexts. Multiscale representations help in better segmenting the objects by combining low-level and high-level representations.
Figure 3. A multiscale CNN: multiple distinct CNN streams with different contextual input sizes run concurrently, and their outputs are combined at the end of the network to obtain rich multiscale semantic features.
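A minimal PyTorch sketch of the parallel-stream idea in Figure 3 is given below; the rescaling factors, stream depth, and global-average-pooling fusion are illustrative assumptions rather than the exact architecture reviewed.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleCNN(nn.Module):
    """Illustrative sketch: runs a small CNN per input scale and fuses the
    pooled outputs. Scales and channel counts are assumed."""
    def __init__(self, scales=(1.0, 0.5, 0.25), channels=32):
        super().__init__()
        self.scales = scales
        self.streams = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            )
            for _ in scales
        ])

    def forward(self, x):
        pooled = []
        for scale, stream in zip(self.scales, self.streams):
            xs = x if scale == 1.0 else F.interpolate(
                x, scale_factor=scale, mode="bilinear", align_corners=False)
            # Global average pooling makes the streams size-agnostic.
            pooled.append(stream(xs).mean(dim=(2, 3)))
        return torch.cat(pooled, dim=1)   # fused multiscale descriptor

img = torch.randn(1, 3, 128, 128)
desc = MultiScaleCNN()(img)              # -> (1, 96)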
Figure 4. The spatial-pyramid-pooling module extracts information at different scales from different subregions. With a four-level pyramid, the pooling kernels cover the whole image, half of it, and smaller portions. A more powerful representation is obtained by fusing the information from the different subregions within these receptive fields.
Figure 5. Multilevel spatial bins, with the example of bin size 6: the resulting feature map is segmented into 6 × 6 subsets.
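The pyramid pooling described in Figures 4 and 5 can be sketched in PyTorch with adaptive pooling over a set of bin sizes (here 1, 2, 3, and 6, echoing the bin-size-6 example); the use of average pooling and the exact bin sizes are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialPyramidPooling(nn.Module):
    """Illustrative sketch: pools a feature map into fixed grids (1x1, 2x2,
    3x3, 6x6) and concatenates the flattened bins into one fixed-length
    vector, regardless of the input's spatial size."""
    def __init__(self, bin_sizes=(1, 2, 3, 6)):
        super().__init__()
        self.bin_sizes = bin_sizes

    def forward(self, x):
        n, c = x.shape[:2]
        pooled = [F.adaptive_avg_pool2d(x, b).view(n, c * b * b)
                  for b in self.bin_sizes]
        return torch.cat(pooled, dim=1)

feats = torch.randn(1, 256, 13, 17)      # any spatial size works
vec = SpatialPyramidPooling()(feats)     # -> (1, 256 * (1 + 4 + 9 + 36))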
Figure 6. In ASPP, atrous convolution uses a parameter called the dilation rate that adjusts the field of view, allowing a wider receptive field for better semantic-segmentation results. By increasing the dilation rate in each block, the spatial resolution can be preserved and a deeper network can be built while capturing features at multiple scales.
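A minimal PyTorch sketch of the atrous-spatial-pyramid-pooling idea in Figure 6: parallel 3 × 3 convolutions with increasing dilation rates keep the spatial resolution while enlarging the receptive field. The dilation rates (1, 6, 12, 18) and the 1 × 1 projection are common choices assumed here, not prescribed by the review.

import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Illustrative sketch: parallel atrous (dilated) 3x3 convolutions with
    growing dilation rates widen the receptive field without reducing the
    spatial resolution. Rates and channel counts are assumed."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r)
            for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        # Every branch keeps the same H x W, so the outputs concatenate cleanly.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

feats = torch.randn(1, 512, 32, 32)
out = ASPP(512, 256)(feats)              # -> (1, 256, 32, 32)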
Figure 7. In early fusion, all local attributes (shapes and colors) are retrieved from identical regions and locally concatenated before encoding. In late fusion, image representations are derived independently for each attribute and concatenated afterward.
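The early-versus-late-fusion distinction in Figure 7 can be shown with two toy attribute descriptors; the feature dimensions and linear encoders below are hypothetical stand-ins for the shape and color attributes mentioned in the caption.

import torch
import torch.nn as nn

# Toy per-region descriptors for two attributes (assumed shape/color features).
shape_feat = torch.randn(8, 64)   # 8 regions x 64-dim shape descriptor
color_feat = torch.randn(8, 32)   # 8 regions x 32-dim color descriptor

# Early fusion: concatenate the raw attributes first, then encode jointly.
early_encoder = nn.Linear(64 + 32, 128)
early = early_encoder(torch.cat([shape_feat, color_feat], dim=1))

# Late fusion: encode each attribute independently, concatenate afterward.
shape_encoder = nn.Linear(64, 64)
color_encoder = nn.Linear(32, 64)
late = torch.cat([shape_encoder(shape_feat), color_encoder(color_feat)], dim=1)

print(early.shape, late.shape)    # torch.Size([8, 128]) torch.Size([8, 128])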
Figure 8. Feature-pyramid-network (FPN) model that combines low- and high-resolution features via a top-down pathway to enrich semantic features at all levels.
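A minimal PyTorch sketch of the top-down pathway in Figure 8: 1 × 1 lateral convolutions align channel counts, higher-level maps are upsampled and added to lower-level ones, and a 3 × 3 convolution smooths each merged level. The backbone channel counts below are illustrative assumptions, not the specific FPN configuration discussed in the review.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Illustrative sketch of an FPN top-down pathway; input channel counts
    are assumed backbone stage widths."""
    def __init__(self, in_channels=(256, 512, 1024), out_ch=256):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_ch, 1) for c in in_channels])
        self.smooth = nn.ModuleList([nn.Conv2d(out_ch, out_ch, 3, padding=1)
                                     for _ in in_channels])

    def forward(self, feats):                      # feats ordered low -> high level
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 2, -1, -1):  # top-down merge by addition
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [s(l) for s, l in zip(self.smooth, laterals)]

c3, c4, c5 = (torch.randn(1, 256, 64, 64),
              torch.randn(1, 512, 32, 32),
              torch.randn(1, 1024, 16, 16))
p3, p4, p5 = SimpleFPN()((c3, c4, c5))             # all levels now have 256 channels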
