Review
Sensors (Basel). 2024 May 23;24(11):3335. doi: 10.3390/s24113335.

Compatibility Review for Object Detection Enhancement through Super-Resolution

Daehee Kim et al. Sensors (Basel). 2024.

Abstract

With the introduction of deep learning, a significant amount of research has been conducted in computer vision over the past decade. In particular, research on object detection (OD) continues to progress rapidly. Despite these advances, however, several limitations must be overcome before deep learning-based OD models can be applied in the real world. One such limitation is inaccurate detection when image quality is poor or the target object is small. The performance degradation for small objects stems from fundamental limitations of OD models, such as the constraint of the receptive field, and is therefore difficult to resolve with an OD model alone. To address this issue, this study investigates the compatibility of super-resolution (SR) and OD techniques to improve detection, particularly for small objects. We analyze combinations of SR and OD models, classifying the SR models by their architectural characteristics. The experimental results show a substantial improvement when OD detectors are integrated with SR models. Overall, when the image-quality metrics (PSNR, SSIM) of an SR model are high, the resulting OD performance is correspondingly high. In particular, evaluations on the MS COCO dataset reveal that the improvement rate for small objects is 9.4% higher than for all objects. This work provides an analysis of SR and OD model compatibility, demonstrating the potential benefits of their synergistic combination. The experimental code can be found in our GitHub repository.
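The pipeline the abstract describes is straightforward to reproduce in outline. Below is a minimal, hedged sketch of the SR-then-detect flow, assuming PyTorch and torchvision; bicubic upsampling stands in for a trained SR network (the paper's gains come from learned models such as EDSR or ESRGAN, not this stand-in), and `low_res.jpg` is a placeholder path.

```python
# Minimal SR-then-detect sketch (assumptions: PyTorch + torchvision).
# Bicubic upsampling is a stand-in for a trained SR model such as EDSR
# or ESRGAN; the review's reported gains require a learned SR network.
import torch
import torch.nn.functional as F
from torchvision.io import read_image
from torchvision.models.detection import fasterrcnn_resnet50_fpn

detector = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

lr = read_image("low_res.jpg").float() / 255.0   # placeholder path; (C, H, W) in [0, 1]

# SR stage: x4 upscaling (replace with a learned SR model for real gains).
sr = F.interpolate(lr.unsqueeze(0), scale_factor=4,
                   mode="bicubic", align_corners=False).clamp(0, 1)

# OD stage: run the detector on the super-resolved image.
with torch.no_grad():
    det = detector([sr.squeeze(0)])[0]

for box, label, score in zip(det["boxes"], det["labels"], det["scores"]):
    if score > 0.5:                              # arbitrary confidence cutoff
        print(int(label), float(score), [round(v, 1) for v in box.tolist()])
```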

Keywords: deep learning; face recognition; neural networks; object detection; super-resolution.


Conflict of interest statement

Author Daehee Kim was employed by the company NAVER Cloud Corp. Author Sungmin Lee was employed by the company SK Telecom. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure A1. Examples of inference by applying DETR [94] to SR images. SR images are the result of applying each SR method to the LR COCO dataset [3] degraded by BI.
Figure A2. Examples of inference by applying DETR [94] to SR images. SR images are the result of applying each SR method to the LR COCO dataset [3] degraded by BD.
Figure A3. Examples of inference by applying DETR [94] to SR images. SR images are the result of applying each SR method to the LR COCO dataset [3] degraded by DN.
Figure A4. Examples of inference by applying YOLOv3 [88] to SR images. SR images are the result of applying each SR method to the LR COCO dataset [3] degraded by BI.
Figure A5. Examples of inference by applying YOLOv3 [88] to SR images. SR images are the result of applying each SR method to the LR COCO dataset [3] degraded by BD.
Figure A6. Examples of inference by applying YOLOv3 [88] to SR images. SR images are the result of applying each SR method to the LR COCO dataset [3] degraded by DN.
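For reference, the three degradations named in Figures A1–A6 (BI, BD, DN) are standard in the SR literature: BI is plain bicubic downsampling, BD blurs with a Gaussian kernel before downsampling, and DN adds Gaussian noise after downsampling. A hedged sketch follows; the kernel size (7), sigma (1.6), and noise level (30/255) are common literature defaults, not values confirmed by this paper.

```python
# Sketch of the BI / BD / DN degradations used to build LR images.
# Parameter values are common SR-literature defaults (assumptions);
# the paper's exact settings may differ. `hr` is (N, C, H, W) in [0, 1].
import torch
import torch.nn.functional as F
from torchvision.transforms.functional import gaussian_blur

def bi(hr: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """BI: plain bicubic downsampling."""
    return F.interpolate(hr, scale_factor=1 / scale,
                         mode="bicubic", align_corners=False).clamp(0, 1)

def bd(hr: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """BD: Gaussian blur, then downsampling."""
    blurred = gaussian_blur(hr, kernel_size=7, sigma=1.6)   # assumed kernel/sigma
    return bi(blurred, scale)

def dn(hr: torch.Tensor, scale: int = 4, sigma: float = 30 / 255) -> torch.Tensor:
    """DN: downsampling, then additive Gaussian noise."""
    lr = bi(hr, scale)
    return (lr + sigma * torch.randn_like(lr)).clamp(0, 1)  # assumed noise level
```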
Figure A7. The result of performing ×4 SR on the ‘13_Interview_Interview_On_Location_13_334.jpg’ image of the Wider Face validation set [13] using ESRGAN [54] trained on Wider Face [13] and DIV2K [43], respectively.
Figure A8. Object detection performance for COCO val2017 [3] of YOLOv3 [88], EfficientDet [4], RetinaNet [84], Faster R-CNN [6], DETR [94], DINO [96], and Co-DETR [97] according to each SR model.
Figure A9. Object detection performance for the Wider Face validation set of YOLOv3 [88], EfficientDet [4], RetinaNet [84], Faster R-CNN [6], DETR [94], DINO [96], and Co-DETR [97] according to each SR model.
Figure 1. Hierarchically structured taxonomy of representative deep learning-based SR models. * indicates a model used in the experiments. A dotted outline indicates a model that also belongs to another architectural style, and a black background indicates an architecture using a multi-scale receptive field.
Figure 2. Major architectural changes in SR models over time. * indicates a model used in the experiments.
Figure 3. Shape conventions used in the figures.
Figure 4. Sub-pixel convolution of ESPCN [34]. Each color represents a different feature map channel.
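Sub-pixel convolution is easy to state concretely: a convolution emits r²·C channels at LR resolution, and a pixel shuffle rearranges them into a C-channel image r times larger. A minimal sketch, assuming PyTorch; the 64 input feature channels are an arbitrary example, not ESPCN's exact configuration.

```python
# ESPCN-style sub-pixel convolution: conv to C * r^2 channels, then
# PixelShuffle rearranges (N, C*r^2, H, W) -> (N, C, H*r, W*r).
import torch
import torch.nn as nn

r, C = 4, 3                       # upscale factor, output (image) channels
sub_pixel = nn.Sequential(
    nn.Conv2d(64, C * r * r, kernel_size=3, padding=1),  # 64 in-channels: example value
    nn.PixelShuffle(r),
)

features = torch.randn(1, 64, 32, 32)   # dummy LR feature map
print(sub_pixel(features).shape)        # torch.Size([1, 3, 128, 128])
```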
Figure 5. (i) EDSR structure [42]. (ii) Residual block in EDSR.
Figure 6. Representative models of recursive architecture. (a): (i) MemNet structure [47]. (ii) Memory block in MemNet; the recursive unit applies the same residual block multiple times. (b): (i) SRFBN structure [48]. (ii) Feedback block in SRFBN.
Figure 7. Representative models of densely connected architecture. (a): (i) RDN structure [24]. (ii) Residual dense block in RDN. (b): (i) DBPN structure [36]. (ii) Up-projection unit in DBPN.
Figure 8. (i) SRResNet structure, the generator of SRGAN [41]. (ii) Discriminator of SRGAN.
Figure 9. (i) Overall architecture of SRNTT [58]. (ii) The process of texture transfer using the feature map of a reference image Fn(Ref) and an LR image Fn(LR).
Figure 10. LapSRN architecture [35]. The top blue arrows represent the feature extraction branch, and the bottom yellow arrows represent the image reconstruction branch.
Figure 11. Representative models of multi-path architecture. (a): (i) IDN structure [69]. (ii) Distillation block in IDN. (b): (i) MSRN structure [66]. (ii) Multi-scale residual block in MSRN, a schematic of a block consisting of parallel multi-scale receptive fields.
Figure 12. Representative models of attention architecture. (a): (i) Overall RCAN structure [71]. (ii) Residual group, including residual channel attention block (RCAB), in RCAN. (iii) RCAB. (b) Process of performing the non-local module in NLRN [50].
Figure 13. (i) Overall HAT structure [74]. (ii) Residual hybrid attention group (RHAG). (iii) Hybrid attention block (HAB) in RHAG; CAB and SAB denote channel attention and self-attention, respectively. (iv) Overlapping cross-attention block (OCAB) in RHAG.
Figure 14. (i) Overall DAT structure [75]. (ii) Dual spatial transformer block (DSTB) and dual channel transformer block (DCTB). (iii) Spatial-gate feed-forward network (SGFN) in DSTB and DCTB.
Figure 15. Overall zero-shot SR structure [76]. The top blue arrows represent the process of training a CNN using the input image, and the bottom blue arrow represents the SR process after training.
Figure 16. Overall structure of unpaired image super-resolution using pseudo-supervision [77]. The green arrows represent a process of learning with LR images generated from HR images, and the blue arrows represent a process of generating SR from a real image.
Figure 17. Tree of object detection models. OD models can be classified into two-stage and single-stage frameworks.
Figure 18. Relative OD-enhancing index of each SR model for each degradation method (i.e., BI, BD, and DN). Given ΔP_{SR;d} / ΔP_{max(SR);d} for d ∈ {BI, BD, DN} and SR ∈ {DRRN, MSRN, DBPN, RCAN, EDSR, RRDB, ESRGAN}, we compute the relative OD-enhancing index. (Dataset: MS COCO 2017 validation set [3]).
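The index in Figure 18 normalizes each SR model's detection gain by the best gain observed under the same degradation, so the top model per degradation scores 1.0. A minimal sketch with placeholder numbers (the real ΔP values are in the paper and are not reproduced here):

```python
# Relative OD-enhancing index: each model's AP gain over the LR baseline,
# divided by the largest gain among all SR models for the same degradation.
# The numbers below are placeholders, not the paper's results.
delta_ap = {
    "BI": {"DRRN": 1.2, "MSRN": 2.0, "DBPN": 2.4, "RCAN": 2.9,
           "EDSR": 2.7, "RRDB": 3.1, "ESRGAN": 2.5},
}

for d, gains in delta_ap.items():
    max_gain = max(gains.values())             # ΔP_{max(SR);d}
    for sr_model, g in gains.items():
        print(d, sr_model, round(g / max_gain, 3))
```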
Figure 19. Relative OD-enhancing index per PSNR index for each SR model. Note that ESRGAN [54] trains the RRDB [54] backbone with adversarial learning; with the adversarial loss, a higher OD enhancement rate can be obtained even when the PSNR indicator is low. (Dataset for OD: COCO 2017 validation set [3]; dataset for PSNR: Set5 [101] ×4).


References

    1. Deng J., Dong W., Socher R., Li L.J., Li K., Fei-Fei L. ImageNet: A large-scale hierarchical image database. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Miami, FL, USA, 20–25 June 2009.
    2. Everingham M., Van Gool L., Williams C.K., Winn J., Zisserman A. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010;88:303–338. doi: 10.1007/s11263-009-0275-4.
    3. Lin T.Y., Maire M., Belongie S., Hays J., Perona P., Ramanan D., Dollár P., Zitnick C.L. Microsoft COCO: Common objects in context. Computer Vision—ECCV 2014, Proceedings of the 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 740–755.
    4. Tan M., Pang R., Le Q.V. EfficientDet: Scalable and efficient object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
    5. Anwar A., Raychowdhury A. Masked face recognition for secure authentication. arXiv. 2020. arXiv:2008.11104.