Sci Rep. 2025 Aug 11;15(1):29305.
doi: 10.1038/s41598-025-15268-2.

Ethical considerations and robustness of artificial neural networks in medical image analysis under data corruption

Michael Okunev et al. Sci Rep.

Abstract

Medicine is one of the most sensitive fields in which artificial intelligence (AI) is extensively used, with applications ranging from medical image analysis to clinical support. Because every decision in medicine may severely affect human lives, ensuring that AI systems operate ethically and produce results that align with ethical considerations is of great importance. In this work, we investigate how combinations of several key parameters affect the performance of artificial neural networks (ANNs) used for medical image analysis in the presence of data corruption or errors. For this purpose, we examined five ANN architectures (AlexNet, LeNet-5, VGG-16, ResNet-50, and Vision Transformer, ViT) and evaluated the performance of each under varying combinations of training dataset size and percentage of images corrupted through mislabeling. The image mislabeling simulates deliberate or nondeliberate changes to the dataset, which may cause the AI system to produce unreliable results. We found that the five ANN architectures produce different results for the same task, both with and without dataset modification, which implies that the choice of ANN architecture may have ethical aspects that need to be considered. We also found that label corruption produced mixed trends in the performance metrics, indicating that it is difficult to determine whether label corruption has occurred. Our findings demonstrate the relation between ethics in AI, the choice of ANN architecture, and the computational parameters used with it, and raise awareness of the need for appropriate ways to determine whether label corruption has occurred.

Keywords: Artificial neural network; Chest X-ray images; Dataset size; Ethics; Label corruption.
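
The two parameters varied in the abstract, training dataset size and the percentage of mislabeled images, can be illustrated with a minimal sketch. This is not the authors' code: the function names `corrupt_labels` and `subsample`, the uniform choice of a wrong class, and the placeholder class names `dx_a` to `dx_d` are all assumptions made for illustration only.

```python
import random


def corrupt_labels(labels, classes, fraction, seed=0):
    """Return a copy of `labels` in which `fraction` of the entries are
    reassigned to a different class.

    Assumption: the wrong class is drawn uniformly from the other classes.
    The source does not say how replacement labels were chosen.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)
    corrupted = list(labels)
    n_flip = round(fraction * len(corrupted))
    # Pick distinct positions so exactly n_flip labels change.
    for i in rng.sample(range(len(corrupted)), n_flip):
        alternatives = [c for c in classes if c != corrupted[i]]
        corrupted[i] = rng.choice(alternatives)
    return corrupted


def subsample(items, labels, keep_fraction, seed=0):
    """Keep a random `keep_fraction` of the (item, label) pairs."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    rng = random.Random(seed)
    n_keep = round(keep_fraction * len(items))
    idx = sorted(rng.sample(range(len(items)), n_keep))
    return [items[i] for i in idx], [labels[i] for i in idx]


# Hypothetical four-class toy dataset; the names are placeholders.
classes = ["dx_a", "dx_b", "dx_c", "dx_d"]
labels = classes * 25                        # 100 clean labels
images = [f"img_{i}" for i in range(100)]    # stand-ins for image files

noisy = corrupt_labels(labels, classes, fraction=0.35)
small_images, small_labels = subsample(images, noisy, keep_fraction=0.5)
```

Sweeping `fraction` and `keep_fraction` over a grid and retraining a model at each point would give one cell per combination, which is the layout the precision heatmaps in the figure captions describe.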

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
(a) Dataset balance for top four single-label diagnoses, (b) Visual examples of top four thorax diagnoses after rebalancing, (c) Diagnosis distribution in the rebalanced dataset. The x-axis variable is the relative number of diagnoses. Our main purpose is to show that we have an equal number of images per patient sex and diagnosis, with roughly the same distribution across age. The dotted lines represent the mean and SD.
Fig. 2
ResNet-50 precision heatmap based on dataset size and label corruption, analyzing male patients only, female patients only, and combined data per diagnosis.
Fig. 3
AlexNet precision heatmap based on dataset size and label corruption, analyzing male patients only, female patients only, and combined data per diagnosis.
Fig. 4
VGG-16 precision heatmap based on dataset size and label corruption, analyzing male patients only, female patients only, and combined data per diagnosis.
Fig. 5
LeNet-5 precision heatmap based on dataset size and label corruption, analyzing male patients only, female patients only, and combined data per diagnosis.
Fig. 6
ViT precision heatmap based on dataset size and label corruption, analyzing male patients only, female patients only, and combined data per diagnosis.
Fig. 7
Precision and recall metrics for ResNet-50 for each diagnosis and patient sex (a) without dataset modification, and (b) under 35% label corruption.
Fig. 8
Precision and recall metrics for AlexNet for each diagnosis and patient sex (a) without dataset modification, and (b) under 35% label corruption.
Fig. 9
Precision and recall metrics for VGG-16 for each diagnosis and patient sex (a) without dataset modification and (b) under 35% label corruption with 60% dataset reduction.
Fig. 10
Precision and recall metrics for LeNet-5 for each diagnosis and patient sex (a) without dataset modification and (b) under 50% dataset reduction.
Fig. 11
ResNet-50 precision heatmap based on dataset size and label corruption, analyzing male patients only, female patients only, and combined data per diagnosis.
Fig. 12
AlexNet precision heatmap based on dataset size and label corruption, analyzing male patients only, female patients only, and combined data per diagnosis.
Fig. 13
Precision and recall metrics for ResNet-50 for each diagnosis and patient sex (a) without dataset modification and (b) under 35% dataset corruption.
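
The captions above report precision and recall per diagnosis. As a reminder of what those two numbers measure, here is a minimal sketch of per-class precision and recall computed from true and predicted labels. The function name `precision_recall` and the toy labels are assumptions for illustration; the source does not describe how the authors computed their metrics.

```python
def precision_recall(y_true, y_pred, classes):
    """Return {class: (precision, recall)} using one-vs-rest counts.

    precision = TP / (TP + FP), recall = TP / (TP + FN).
    A metric whose denominator is zero is reported as 0.0 (an assumption;
    other conventions exist).
    """
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (precision, recall)
    return scores


# Toy example with two placeholder classes.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
scores = precision_recall(y_true, y_pred, ["a", "b"])
```

To break the metrics down by patient sex as the captions do, the same function could be called separately on the male-only and female-only subsets of the test labels.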
