Review
Sensors (Basel). 2024 May 23;24(11):3335. doi: 10.3390/s24113335.

Compatibility Review for Object Detection Enhancement through Super-Resolution

Daehee Kim et al. Sensors (Basel). 2024.

Abstract

With the introduction of deep learning, a significant amount of research has been conducted in computer vision over the past decade. In particular, research on object detection (OD) continues to progress rapidly. Despite these advances, however, several limitations must be overcome before deep learning-based OD models can be applied in the real world. One such limitation is inaccurate detection when image quality is poor or the target object is small. The performance degradation for small objects stems from fundamental limitations of OD models, such as the constraint of the receptive field, and is therefore difficult to resolve with an OD model alone. To address this issue, this study investigates the compatibility of super-resolution (SR) and OD techniques to improve detection, particularly for small objects. We analyze combinations of SR and OD models, classifying the SR models by their architectural characteristics. The experimental results show a substantial improvement when OD detectors are integrated with SR models. Overall, when the image-quality metrics (PSNR, SSIM) of an SR model are high, the resulting OD performance is correspondingly high. In particular, evaluations on the MS COCO dataset reveal that the improvement rate for small objects is 9.4% higher than for all objects. This work provides an analysis of SR and OD model compatibility, demonstrating the potential benefits of their synergistic combination. The experimental code can be found in our GitHub repository.
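The pipeline the abstract describes is straightforward to reproduce in outline. Below is a minimal, hedged sketch of the SR-then-detect flow, assuming PyTorch and torchvision; bicubic upsampling stands in for a trained SR network (the paper's gains come from learned models such as EDSR or ESRGAN, not this stand-in), and `low_res.jpg` is a placeholder path.

```python
# Minimal SR-then-detect sketch (assumptions: PyTorch + torchvision).
# Bicubic upsampling is a stand-in for a trained SR model such as EDSR
# or ESRGAN; the review's reported gains require a learned SR network.
import torch
import torch.nn.functional as F
from torchvision.io import read_image
from torchvision.models.detection import fasterrcnn_resnet50_fpn

detector = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

lr = read_image("low_res.jpg").float() / 255.0   # placeholder path; (C, H, W) in [0, 1]

# SR stage: x4 upscaling (replace with a learned SR model for real gains).
sr = F.interpolate(lr.unsqueeze(0), scale_factor=4,
                   mode="bicubic", align_corners=False).clamp(0, 1)

# OD stage: run the detector on the super-resolved image.
with torch.no_grad():
    det = detector([sr.squeeze(0)])[0]

for box, label, score in zip(det["boxes"], det["labels"], det["scores"]):
    if score > 0.5:                              # arbitrary confidence cutoff
        print(int(label), float(score), [round(v, 1) for v in box.tolist()])
```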

Keywords: deep learning; face recognition; neural networks; object detection; super-resolution.


Conflict of interest statement

Author Daehee Kim was employed by the company NAVER Cloud Corp. Author Sungmin Lee was employed by the company SK Telecom. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure A1. Examples of inference by applying DETR [94] to SR images. SR images are the result of applying each SR method to the LR COCO dataset [3] degraded by BI.
Figure A2. Examples of inference by applying DETR [94] to SR images. SR images are the result of applying each SR method to the LR COCO dataset [3] degraded by BD.
Figure A3. Examples of inference by applying DETR [94] to SR images. SR images are the result of applying each SR method to the LR COCO dataset [3] degraded by DN.
Figure A4. Examples of inference by applying YOLOv3 [88] to SR images. SR images are the result of applying each SR method to the LR COCO dataset [3] degraded by BI.
Figure A5. Examples of inference by applying YOLOv3 [88] to SR images. SR images are the result of applying each SR method to the LR COCO dataset [3] degraded by BD.
Figure A6. Examples of inference by applying YOLOv3 [88] to SR images. SR images are the result of applying each SR method to the LR COCO dataset [3] degraded by DN.
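For reference, the three degradations named in Figures A1–A6 (BI, BD, DN) are standard in the SR literature: BI is plain bicubic downsampling, BD blurs with a Gaussian kernel before downsampling, and DN adds Gaussian noise after downsampling. A hedged sketch follows; the kernel size (7), sigma (1.6), and noise level (30/255) are common literature defaults, not values confirmed by this paper.

```python
# Sketch of the BI / BD / DN degradations used to build LR images.
# Parameter values are common SR-literature defaults (assumptions);
# the paper's exact settings may differ. `hr` is (N, C, H, W) in [0, 1].
import torch
import torch.nn.functional as F
from torchvision.transforms.functional import gaussian_blur

def bi(hr: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """BI: plain bicubic downsampling."""
    return F.interpolate(hr, scale_factor=1 / scale,
                         mode="bicubic", align_corners=False).clamp(0, 1)

def bd(hr: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """BD: Gaussian blur, then downsampling."""
    blurred = gaussian_blur(hr, kernel_size=7, sigma=1.6)   # assumed kernel/sigma
    return bi(blurred, scale)

def dn(hr: torch.Tensor, scale: int = 4, sigma: float = 30 / 255) -> torch.Tensor:
    """DN: downsampling, then additive Gaussian noise."""
    lr = bi(hr, scale)
    return (lr + sigma * torch.randn_like(lr)).clamp(0, 1)  # assumed noise level
```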
Figure A7. The result of performing ×4 SR on the ‘13_Interview_Interview_On_Location_13_334.jpg’ image of the Wider Face validation set [13] using ESRGAN [54] trained on Wider Face [13] and DIV2K [43], respectively.
Figure A8. Object detection performance for COCO val2017 [3] of YOLOv3 [88], EfficientDet [4], RetinaNet [84], Faster R-CNN [6], DETR [94], DINO [96], and Co-DETR [97] according to each SR model.
Figure A9. Object detection performance for the Wider Face validation set of YOLOv3 [88], EfficientDet [4], RetinaNet [84], Faster R-CNN [6], DETR [94], DINO [96], and Co-DETR [97] according to each SR model.
Figure 1. Hierarchically structured taxonomy of representative deep learning-based SR models. * indicates a model used in the experiments. A dotted outline indicates a model that also belongs to another architectural style, and a black background indicates an architecture using a multi-scale receptive field.
Figure 2. Major architectural changes in SR models over time. * indicates a model used in the experiments.
Figure 3. Shape conventions used in the figures.
Figure 4. Sub-pixel convolution of ESPCN [34]. Each color represents a different feature map channel.
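Sub-pixel convolution is easy to state concretely: a convolution emits r²·C channels at LR resolution, and a pixel shuffle rearranges them into a C-channel image r times larger. A minimal sketch, assuming PyTorch; the 64 input feature channels are an arbitrary example, not ESPCN's exact configuration.

```python
# ESPCN-style sub-pixel convolution: conv to C * r^2 channels, then
# PixelShuffle rearranges (N, C*r^2, H, W) -> (N, C, H*r, W*r).
import torch
import torch.nn as nn

r, C = 4, 3                       # upscale factor, output (image) channels
sub_pixel = nn.Sequential(
    nn.Conv2d(64, C * r * r, kernel_size=3, padding=1),  # 64 in-channels: example value
    nn.PixelShuffle(r),
)

features = torch.randn(1, 64, 32, 32)   # dummy LR feature map
print(sub_pixel(features).shape)        # torch.Size([1, 3, 128, 128])
```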
Figure 5. (i) EDSR structure [42]. (ii) Residual block in EDSR.
Figure 6. Representative models of recursive architecture. (a): (i) MemNet structure [47]. (ii) Memory block in MemNet; the recursive unit applies the same residual block multiple times. (b): (i) SRFBN structure [48]. (ii) Feedback block in SRFBN.
Figure 7. Representative models of densely connected architecture. (a): (i) RDN structure [24]. (ii) Residual dense block in RDN. (b): (i) DBPN structure [36]. (ii) Up-projection unit in DBPN.
Figure 8. (i) SRResNet structure, the generator of SRGAN [41]. (ii) Discriminator of SRGAN.
Figure 9. (i) Overall architecture of SRNTT [58]. (ii) The process of texture transfer using the feature map of a reference image Fn(Ref) and an LR image Fn(LR).
Figure 10. LapSRN architecture [35]. The top blue arrows represent the feature extraction branch, and the bottom yellow arrows represent the image reconstruction branch.
Figure 11. Representative models of multi-path architecture. (a): (i) IDN structure [69]. (ii) Distillation block in IDN. (b): (i) MSRN structure [66]. (ii) Multi-scale residual block in MSRN, a schematic of a block consisting of parallel multi-scale receptive fields.
Figure 12. Representative models of attention architecture. (a): (i) Overall RCAN structure [71]. (ii) Residual group, including residual channel attention block (RCAB), in RCAN. (iii) RCAB. (b) Process of performing the non-local module in NLRN [50].
Figure 13. (i) Overall HAT structure [74]. (ii) Residual hybrid attention group (RHAG). (iii) Hybrid attention block (HAB) in RHAG; CAB and SAB denote channel attention and self-attention, respectively. (iv) Overlapping cross-attention block (OCAB) in RHAG.
Figure 14. (i) Overall DAT structure [75]. (ii) Dual spatial transformer block (DSTB) and dual channel transformer block (DCTB). (iii) Spatial-gate feed-forward network (SGFN) in DSTB and DCTB.
Figure 15. Overall zero-shot SR structure [76]. The top blue arrows represent the process of training a CNN using the input image, and the bottom blue arrow represents the SR process after training.
Figure 16. Overall structure of unpaired image super-resolution using pseudo-supervision [77]. The green arrows represent a process of learning with LR images generated from HR images, and the blue arrows represent a process of generating SR from a real image.
Figure 17. Tree of object detection models. OD models can be classified into two-stage and single-stage frameworks.
Figure 18. Relative OD-enhancing index of each SR model for each degradation method (i.e., BI, BD, and DN). Given ΔP_{SR;d} / ΔP_{max(SR);d} for d ∈ {BI, BD, DN} and SR ∈ {DRRN, MSRN, DBPN, RCAN, EDSR, RRDB, ESRGAN}, we compute the relative OD-enhancing index. (Dataset: MS COCO 2017 validation set [3]).
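The index in Figure 18 normalizes each SR model's detection gain by the best gain observed under the same degradation, so the top model per degradation scores 1.0. A minimal sketch with placeholder numbers (the real ΔP values are in the paper and are not reproduced here):

```python
# Relative OD-enhancing index: each model's AP gain over the LR baseline,
# divided by the largest gain among all SR models for the same degradation.
# The numbers below are placeholders, not the paper's results.
delta_ap = {
    "BI": {"DRRN": 1.2, "MSRN": 2.0, "DBPN": 2.4, "RCAN": 2.9,
           "EDSR": 2.7, "RRDB": 3.1, "ESRGAN": 2.5},
}

for d, gains in delta_ap.items():
    max_gain = max(gains.values())             # ΔP_{max(SR);d}
    for sr_model, g in gains.items():
        print(d, sr_model, round(g / max_gain, 3))
```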
Figure 19. Relative OD-enhancing index per PSNR index for each SR model. Note that ESRGAN [54] trains the RRDB [54] backbone with adversarial learning; with the adversarial loss, a higher OD enhancement rate can be obtained even when the PSNR indicator is low. (Dataset for OD: COCO 2017 validation set [3]; dataset for PSNR: Set5 [101] ×4).


References

    1. Deng J., Dong W., Socher R., Li L.J., Li K., Fei-Fei L. ImageNet: A large-scale hierarchical image database. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Miami, FL, USA, 20–25 June 2009.
    2. Everingham M., Van Gool L., Williams C.K., Winn J., Zisserman A. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010;88:303–338. doi: 10.1007/s11263-009-0275-4.
    3. Lin T.Y., Maire M., Belongie S., Hays J., Perona P., Ramanan D., Dollár P., Zitnick C.L. Microsoft COCO: Common objects in context. Computer Vision—ECCV 2014, Proceedings of the 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 740–755.
    4. Tan M., Pang R., Le Q.V. EfficientDet: Scalable and efficient object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
    5. Anwar A., Raychowdhury A. Masked face recognition for secure authentication. arXiv. 2020. arXiv:2008.11104.