Sci Rep. 2025 Jul 1;15(1):21963. doi: 10.1038/s41598-025-07515-3.

Use of a sperm morphology assessment standardisation training tool improves the accuracy of novice sperm morphologists


Katherine Rose Seymour et al. Sci Rep. 2025.

Abstract

Sperm morphology assessment is recognised as a critical, yet variable, test of male fertility. This variability is due in part to the lack of standardised training for morphologists. This study, which consisted of two experiments, used a bespoke 'Sperm Morphology Assessment Standardisation Training Tool' to train novice morphologists using machine learning principles. Experiment 1 assessed novice morphologists' (n = 22) accuracy across 2-category (normal; abnormal), 5-category (normal; head defect; midpiece defect; tail defect; cytoplasmic droplet), 8-category (normal; cytoplasmic droplet; midpiece defect; loose heads and abnormal tails; pyriform head; knobbed acrosomes; vacuoles and teratoids; swollen acrosomes) and 25-category (normal; all defects defined individually) classification systems, with untrained users achieving 81.0 ± 2.5%, 68 ± 3.59%, 64 ± 3.5% and 53 ± 3.69% accuracy, respectively. A second cohort (n = 16) given access to a visual aid and training video achieved significantly higher first-test accuracy (94.9 ± 0.66%, 92.9 ± 0.81%, 90 ± 0.91% and 82.7 ± 1.05%; p < 0.001). Experiment 2 evaluated repeated training over four weeks, which significantly improved accuracy (82 ± 1.05% to 90 ± 1.38%, p < 0.001) and diagnostic speed (7.0 ± 0.4 s to 4.9 ± 0.3 s, p < 0.001). Final accuracy reached 98 ± 0.43%, 97 ± 0.58%, 96 ± 0.81% and 90 ± 1.38% for the 2-, 5-, 8- and 25-category classification systems, respectively. Significant differences in accuracy and variation were observed between the classification systems. This tool effectively standardised sperm morphology assessment. Future research could explore its application in other species, including human andrology, given its accessibility and adaptability across classification systems.
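The abstract names the four classification tiers but does not spell out how the finer labels fold into the coarser systems. As a minimal Python sketch, assuming the 8-category labels collapse into the 5- and 2-category systems roughly as the category names suggest (the head/tail assignments marked below are assumptions, not taken from the paper):

    # Tiered classification systems described in the abstract.
    # The 8-category labels are verbatim from the text; the collapse
    # into 5 categories is an illustrative assumption.
    EIGHT_CATEGORIES = [
        "normal", "cytoplasmic droplet", "midpiece defect",
        "loose heads and abnormal tails", "pyriform head",
        "knobbed acrosomes", "vacuoles and teratoids", "swollen acrosomes",
    ]

    COLLAPSE_TO_5 = {
        "normal": "normal",
        "cytoplasmic droplet": "cytoplasmic droplet",
        "midpiece defect": "midpiece defect",
        "loose heads and abnormal tails": "tail defect",  # assumption
        "pyriform head": "head defect",
        "knobbed acrosomes": "head defect",
        "vacuoles and teratoids": "head defect",          # assumption
        "swollen acrosomes": "head defect",
    }

    def collapse_to_2(label: str) -> str:
        """2-category system: normal vs abnormal."""
        return "normal" if label == "normal" else "abnormal"

    def accuracy(user_labels, reference_labels) -> float:
        """Percentage agreement with the reference classification."""
        hits = sum(u == r for u, r in zip(user_labels, reference_labels))
        return 100.0 * hits / len(reference_labels)

Under such a hierarchy, any disagreement at 25 categories that stays within the same coarse class disappears when categories are merged, which is consistent with the accuracy gradient reported above.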

Keywords: Advanced semen assessment; Ram sperm morphology; Reproduction; Sperm morphology assessment; Standardised training tool; Subjective assessment.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Boxplot of accuracy results from experiment 1 (n = 22). Users with no prior experience in sperm morphology assessment attempted the 100-sperm test without access to the video or visual aid. Each user classified the same 100 ram sperm, shown in random order. Results are shown per classification system (25, 8, 5 and 2 morphological categories). * indicates results that were significantly different from the other classification systems.
Fig. 2
Heatmap of Dunn's test pairwise comparisons depicting significant differences in variation of accuracy across all tests in the 4-week study period. Results were taken from users classifying 100 sperm each (n = 1600) per test, and mean results per test were compared. There was no significant variation amongst results within day 1 (tests 1–4, red) or within days 2–5 (tests 5–14, red). Variation was significantly different when comparing day 1 results to those from days 2–5 (tests 5–14, blue).
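The statistical software behind these comparisons is not identified in this excerpt. As a non-authoritative sketch, a comparable pairwise p-value matrix can be produced in Python with the scikit-posthocs package; the file name, the column names 'accuracy' and 'test', and the Bonferroni adjustment below are all assumptions.

    # Sketch of a Dunn's test pairwise-comparison heatmap like Fig. 2.
    # Requires pandas, scikit-posthocs, seaborn and matplotlib.
    import pandas as pd
    import scikit_posthocs as sp
    import seaborn as sns
    import matplotlib.pyplot as plt

    # One row per user per test; file and column names are hypothetical.
    df = pd.read_csv("accuracy_per_user_per_test.csv")

    # Matrix of adjusted p-values for every test-vs-test comparison.
    pvals = sp.posthoc_dunn(df, val_col="accuracy", group_col="test",
                            p_adjust="bonferroni")  # adjustment is an assumption

    # Binary significance heatmap: True where p < 0.05.
    sns.heatmap(pvals < 0.05, cbar=False)
    plt.show()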
Fig. 3
Mean (± SEM) accuracy of assessors (n = 16) for each test across the 4-week study period when labelling with 25 categories. Users classified 100 ram sperm images, shown in random order, per test.
Fig. 4
Coefficient of variation (user standard deviation/user mean) per user (n = 16) for the first (test 1, week 1) and final (test 14, week 4) tests. Users classified 100 ram sperm each, using 25 morphological categories.
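Written out from the caption's own definition (the per-user subscript is notation added here, not the paper's):

    CV_i = s_i / \bar{x}_i

where s_i and \bar{x}_i are the standard deviation and mean of user i's accuracy scores; multiplying by 100 expresses it as a percentage.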
Fig. 5
Mean (± SEM) time users spent classifying each test of 100 ram sperm using 25 morphological categories (n = 1600; mm:ss.0), for the 14 tests across the 4-week study period.
Fig. 6
Heatmap of Dunn's test pairwise comparisons depicting significant differences in the duration (mm:ss.0) spent classifying images per test under the 25-category classification system. Results were taken from users classifying 100 sperm each (n = 1600) per test, and mean durations were compared. There was no significant difference (red) in classification duration amongst the first 4 tests (day 1). Results from day 1 were significantly different from all other tests (tests 5–14, days 2–5).
Fig. 7
Mean (± SEM) accuracy of users (n = 16) classifying 100 ram sperm per test using 25 morphological categories, compared to mean (± SEM) time spent labelling each image (n = 1600; mm:ss.0), for the first test of each testing day across the 4-week study period. Days 1–2 were the two intensive training days in week 1; days 3–5 were the follow-up tests in weeks 2–4.
Fig. 8
Mean accuracy of users (n = 16) in experiment 2 following morphological classification of 100 ram sperm on 14 occasions (i.e. n = 224 accuracy scores, N = 22,400 sperm classified) per classification system (25, 8, 5 and 2 morphological categories). Box plots sharing a common superscript are not significantly different (p < 0.05).
Fig. 9
Mean (± SEM) accuracy results from users classifying 100 ram sperm per test for the first test of each testing day across the 4-week testing period (n = 16). Results are shown per classification system (25, 8, 5 and 2 morphological categories).
Fig. 10
Comparison of mean (± SEM) user accuracy at classifying 100 ram sperm images per test when using 8 morphological categories (Normal, ‘Proximal cytoplasmic droplets’, ‘Midpiece abnormalities’, ‘Loose/multiple heads and abnormal tails’, ‘Knobbed acrosomes’, ‘Vacuoles and teratoids’ and ‘Swollen acrosomes’). Results are shown for the first test of each of the 5 testing days. The eighth category, ‘Pyriform heads’, was omitted from the data set because it occurred too infrequently in the population to allow statistical analysis.
Fig. 11
Comparison of mean (± SEM) user accuracy at classifying 100 ram sperm images per test when using 5 morphological categories (Normal, Head, Midpiece, Tail and Cytoplasmic droplets). Results are shown for the first test of each of the 5 testing days.
Fig. 12
Heatmap of Dunn's test pairwise comparisons depicting significant differences in variation of accuracy across the 8 morphological categories over the 4-week study period. Results were taken from users classifying 100 sperm each (n = 1600) per test, and mean accuracy at identifying each category was compared. There was no significant difference (red) between ‘Midpiece abnormalities’ and ‘Vacuoles and teratoids’, or between ‘Swollen acrosomes’ and ‘Loose/multiple heads and abnormal tails’. Accuracies for all other morphological categories were significantly different from each other (blue).
Fig. 13
Heatmap of Dunn's test pairwise comparisons depicting significant differences in variation of accuracy across the 5 morphological categories over the 4-week study period. Results were taken from users classifying 100 sperm each (n = 1600) per test, and mean accuracy at identifying each category was compared. There was no significant difference (red) between ‘Normal’ and ‘Tail’. Comparisons of ‘Midpiece’ to all other categories were significant (blue).
