Sensors (Basel). 2022 Apr 11;22(8):2932. doi: 10.3390/s22082932.

AGs-Unet: Building Extraction Model for High Resolution Remote Sensing Images Based on Attention Gates U Network

Mingyang Yu et al. Sensors (Basel). 2022.

Abstract

Building contour extraction from high-resolution remote sensing images is a fundamental task for the reasonable planning of regional construction. Building segmentation methods based on the U-Net network have recently become popular because they largely improve segmentation accuracy by applying 'skip connections' to combine high-level and low-level feature information more effectively. Researchers have also demonstrated that introducing an attention mechanism into U-Net can enhance local feature expression and improve the performance of building extraction from remote sensing images. In this paper, we explore the effectiveness of the original attention gate module and propose a novel Attention Gate module (AG) for the building extraction task, obtained by adjusting the position of the 'Resampler' in the attention gate relative to the Sigmoid function. Based on AG, we further propose a novel Attention Gates U network (AGs-Unet), which can automatically learn different forms of building structures in high-resolution remote sensing images and extract building contours efficiently. AGs-Unet integrates attention gates into a single U-Net network: a series of AG modules is added to the 'skip connections' to suppress irrelevant and noisy feature responses in the input image and highlight the dominant features of the buildings. AGs-Unet improves the feature selection of the attention map, which strengthens feature learning and attends to the feature information of small-scale buildings. We conducted experiments on the WHU building dataset and the INRIA Aerial Image Labeling dataset, comparing the proposed AGs-Unet model with four classic models (FCN8s, SegNet, U-Net, and DANet) and two state-of-the-art models (PISANet and ARC-Net). The extraction accuracy of each model is evaluated using three indexes: overall accuracy, precision, and intersection over union (IoU). Experimental results show that the proposed AGs-Unet model effectively improves the quality of building extraction from high-resolution remote sensing images, in terms of both prediction performance and result accuracy.
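To make the gated skip connection concrete, the following is a minimal PyTorch sketch of an additive attention gate of the kind AGs-Unet inserts into the U-Net 'skip connections'. It is an illustration, not the authors' implementation: the class and argument names (AttentionGate, in_ch, gate_ch, inter_ch) are ours, and because the abstract does not fully specify how the 'Resampler' is repositioned relative to the Sigmoid, this sketch simply resamples the gating signal before the additive attention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGate(nn.Module):
    """Additive attention gate for a U-Net skip connection (illustrative).

    x: encoder features arriving via the skip connection (fine resolution).
    g: gating features from the coarser decoder stage below.
    Returns x weighted by a learned spatial attention map in (0, 1),
    suppressing irrelevant/noisy responses and keeping building features.
    """

    def __init__(self, in_ch: int, gate_ch: int, inter_ch: int):
        super().__init__()
        self.theta = nn.Conv2d(in_ch, inter_ch, kernel_size=1)   # project x
        self.phi = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)   # project g
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)         # to 1 channel

    def forward(self, x: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        theta_x = self.theta(x)
        # 'Resampler': bring the gating signal to the skip-feature size.
        # Where this step sits relative to the Sigmoid is exactly what the
        # paper modifies; here it precedes the additive attention.
        phi_g = F.interpolate(self.phi(g), size=theta_x.shape[2:],
                              mode="bilinear", align_corners=False)
        alpha = torch.sigmoid(self.psi(F.relu(theta_x + phi_g)))
        return x * alpha

# Example: gate 64-channel skip features with a 128-channel gating signal.
gate = AttentionGate(in_ch=64, gate_ch=128, inter_ch=32)
x = torch.randn(1, 64, 128, 128)   # encoder skip features
g = torch.randn(1, 128, 64, 64)    # coarser decoder features
out = gate(x, g)                   # shape (1, 64, 128, 128), same as x
```

The per-pixel coefficient alpha is what lets the network suppress background responses in the skip features before they are concatenated into the decoder.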

Keywords: AGs-Unet model; WHU dataset; building extraction; deep learning; high resolution remote sensing image.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Model framework of AGs-Unet, consisting of three parts: the encoder (blue, blocks 1–4), the converter (navy blue and pink, block 5 and the 'skip connections'), and the decoder (blocks 6–9).
Figure 2. Structure of the AG module. The lower part shows the original AG module; the upper part shows the modified AG module with the position of the 'Resampler' changed.
Figure 3. Image and label selected from the WHU dataset: (a,b) show an original image in the dataset and its label, respectively; (c,d) show the image and label after random rotation, respectively.
Figure 4. Image and label selected from the INRIA dataset: (a,c) show the images and (b,d) show the corresponding labels; white and black pixels mark building and non-building, respectively.
Figure 5. Variation in training accuracy and loss value of AGs-Unet on: (a) the WHU dataset; and (b) the INRIA dataset.
Figure 6. Experimental visualization results for each group. For each group, two representative images were selected to test the trained U-Net and AGs-Unet models; green marks correctly extracted buildings, blue marks missed buildings, red marks incorrectly extracted buildings, and black marks background.
Figure 7. Comparison between the U-Net and AGs-Unet models: (a) IoU comparison; (b) Accuracy comparison.
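As a reference for how these comparisons are scored, the three indexes reported in the paper can be computed from pixel-level confusion counts. The sketch below uses the standard definitions (assumed here; the abstract does not state the formulas), with buildings as the positive class:

```python
import numpy as np

def building_metrics(pred: np.ndarray, gt: np.ndarray):
    """Overall accuracy, precision, and IoU for binary building masks
    (standard definitions, assumed; positive class = building)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)       # building predicted as building
    tn = np.sum(~pred & ~gt)     # background predicted as background
    fp = np.sum(pred & ~gt)      # background predicted as building
    fn = np.sum(~pred & gt)      # building predicted as background
    oa = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return oa, precision, iou
```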
Figure 8. Comparison of the building extraction results of each model on the test dataset. The first two rows show aerial images and ground truth, respectively. Rows 3–9 show the building extraction results of SegNet, FCN8s, DANet, U-Net, PISANet, ARC-Net, and our proposed AGs-Unet, respectively. The green and black pixels represent true-positive and true-negative predictions, respectively.
Figure 9. Statistics of the structural parameters and computations of AGs-Unet.
Figure 10. Comparison of ablation experiment results. The first two columns show aerial images and ground truth, respectively. Columns 3–5 show the building extraction results of U-Net, our proposed AGs-Unet-2 (U-Net with two AG modules), and AGs-Unet, respectively. The green, red, blue, and black pixels represent true-positive, false-positive, false-negative, and true-negative predictions, respectively.
