Sci Data. 2022 Sep 8;9(1):554.
doi: 10.1038/s41597-022-01674-y.

CatLC: Catalonia Multiresolution Land Cover Dataset

Carlos García et al. Sci Data. 2022.

Abstract

The availability of large annotated image datasets was one of the tipping points in the progress of object recognition for natural images, but other important visual domains still lack this asset. In remote sensing, only a few richly annotated datasets are available, and they cover small areas. In this paper, we present the Catalonia Multiresolution Land Cover Dataset (CatLC), a remote sensing dataset covering a mid-size geographical area that has been carefully annotated with a large variety of land cover classes. The dataset includes pre-processed images from the Cartographic and Geological Institute of Catalonia (ICGC) (https://www.icgc.cat/en/Downloads) and the European Space Agency (ESA) (https://scihub.copernicus.eu) catalogs, captured from both aircraft and satellites. Detailed topographic layers inferred from other sensors are also included. CatLC is a multiresolution, multimodal, multitemporal dataset that the machine learning community can readily use to explore new classification techniques for land cover mapping in scenarios such as area estimation in forest inventories, hydrologic studies involving microclimatic variables, or geologic hazard identification and assessment. Moreover, remote sensing data have specific characteristics that natural images do not share and that have seldom been explored. In this vein, the CatLC dataset aims to engage computer vision experts interested in remote sensing and to stimulate new research and development in machine learning.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Continuous area shown through different layers of the dataset, together with the ground truth labels.
Fig. 2
Location of the area of interest, Catalonia (Spain).
Fig. 3
CatLC dataset with 41 classes and legend.
Fig. 4
Class distribution in the CatLC dataset.
Fig. 5
The distribution of land covers within the mapped territory is heterogeneous. Some covers, such as herbaceous crops or dense coniferous forests, are much more common than airport areas or water bodies. Fig. 4 shows the histogram for the complete dataset.
Fig. 6
Orthophoto RGB samples.
Fig. 7
Orthophoto (Infrared, R, G) samples.
Fig. 8
Sentinel-1 (average image during 2018) samples.
Fig. 9
Sentinel-2 RGB April (a–c) and August (d–f) 2018 samples.
Fig. 10
Sentinel-2 processing with atmospheric and topographic corrections. Original image (top) and corrected image (bottom).
Fig. 11
Digital Elevation Model (DEM) samples.
Fig. 12
Digital Surface Model (DSM) samples.
Fig. 13
Canopy Height Model (CHM) samples.
Fig. 14
Distribution of the CatLC dataset in three sets: blue for the train set, red for the validation set and brown for the test set.
Fig. 15
Confusion matrices using different input data, all obtained with a U-Net neural network. The 41 classes have been grouped into 4 superclasses (1: agriculture, 2: forest, 3: urban, 4: water).
Fig. 16
Mean Intersection over Union using different input data for 4 superclasses.
Fig. 17
Mean Intersection over Union using different input data for 41 classes.
Fig. 18
Confusion matrix using the orthophoto (RGB-IR) as input data.
Fig. 19
Confusion matrix using Sentinel-2 (April + August) as input data.
Fig. 20
Confusion matrix using the complete CatLC dataset as input data.
Fig. 21
Example of U-Net segmentation: RGB orthophoto, land-cover ground truth, U-Net prediction.
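
The captions for Figs. 15–17 mention two evaluation steps: grouping a fine-grained confusion matrix into superclasses, and computing the mean Intersection over Union (mIoU). The following is a minimal sketch of both steps, not the authors' evaluation code. The function names, the toy 3-class matrix, and the class-to-superclass mapping are hypothetical; this page does not give the actual assignment of the 41 classes to the 4 superclasses. Rows are assumed to be ground truth and columns predictions.

```python
def collapse_confusion(cm, class_to_super, n_super):
    """Sum a fine-grained confusion matrix into superclass blocks.

    cm[i][j] counts pixels of true class i predicted as class j (assumed layout).
    class_to_super[i] is the superclass index of class i (hypothetical mapping).
    """
    out = [[0] * n_super for _ in range(n_super)]
    for i, row in enumerate(cm):
        for j, count in enumerate(row):
            out[class_to_super[i]][class_to_super[j]] += count
    return out


def mean_iou(cm):
    """Mean IoU over classes that appear in the ground truth or the prediction."""
    n = len(cm)
    ious = []
    for k in range(n):
        tp = cm[k][k]                                # correctly labelled pixels
        row_total = sum(cm[k])                       # all pixels truly in class k
        col_total = sum(cm[i][k] for i in range(n))  # all pixels predicted as k
        union = row_total + col_total - tp
        if union > 0:                                # skip classes that never occur
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: 3 fine classes, the first two share superclass 0.
toy_cm = [
    [5, 1, 0],
    [2, 4, 1],
    [0, 1, 6],
]
toy_mapping = [0, 0, 1]

collapsed = collapse_confusion(toy_cm, toy_mapping, n_super=2)
fine_miou = mean_iou(toy_cm)
super_miou = mean_iou(collapsed)
```

In this toy case, confusions between the two merged classes move onto the diagonal of the collapsed matrix, so the superclass mIoU is higher than the fine-grained one.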
