Sci Rep. 2025 Jul 1;15(1):21963. doi: 10.1038/s41598-025-07515-3.

Use of a sperm morphology assessment standardisation training tool improves the accuracy of novice sperm morphologists


Katherine Rose Seymour et al. Sci Rep. 2025.

Abstract

Sperm morphology assessment is recognised as a critical, yet variable, test of male fertility. This variability is due in part to the lack of standardised training for morphologists. This study, which consisted of two experiments, used a bespoke 'Sperm Morphology Assessment Standardisation Training Tool' to train novice morphologists using machine learning principles. Experiment 1 assessed novice morphologists' (n = 22) accuracy across 2-category (normal; abnormal), 5-category (normal; head defect; midpiece defect; tail defect; cytoplasmic droplet), 8-category (normal; cytoplasmic droplet; midpiece defect; loose heads and abnormal tails; pyriform head; knobbed acrosomes; vacuoles and teratoids; swollen acrosomes) and 25-category (normal; all defects defined individually) classification systems, with untrained users achieving 81.0 ± 2.5%, 68 ± 3.59%, 64 ± 3.5% and 53 ± 3.69% accuracy, respectively. A second cohort (n = 16) given access to a visual aid and training video achieved significantly higher first-test accuracy (94.9 ± 0.66%, 92.9 ± 0.81%, 90 ± 0.91% and 82.7 ± 1.05%; p < 0.001). Experiment 2 evaluated repeated training over four weeks, which significantly improved accuracy (82 ± 1.05% to 90 ± 1.38%, p < 0.001) and diagnostic speed (7.0 ± 0.4 s to 4.9 ± 0.3 s, p < 0.001). Final accuracy reached 98 ± 0.43%, 97 ± 0.58%, 96 ± 0.81% and 90 ± 1.38% for the 2-, 5-, 8- and 25-category classification systems, respectively. Significant differences in accuracy and variation were observed between the classification systems. This tool effectively standardised sperm morphology assessment. Future research could explore its application in other species, including human andrology, given its accessibility and adaptability across classification systems.
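The abstract names the four classification tiers but does not spell out how the finer labels fold into the coarser systems. As a minimal Python sketch, assuming the 8-category labels collapse into the 5- and 2-category systems roughly as the category names suggest (the head/tail assignments marked below are assumptions, not taken from the paper):

    # Tiered classification systems described in the abstract.
    # The 8-category labels are verbatim from the text; the collapse
    # into 5 categories is an illustrative assumption.
    EIGHT_CATEGORIES = [
        "normal", "cytoplasmic droplet", "midpiece defect",
        "loose heads and abnormal tails", "pyriform head",
        "knobbed acrosomes", "vacuoles and teratoids", "swollen acrosomes",
    ]

    COLLAPSE_TO_5 = {
        "normal": "normal",
        "cytoplasmic droplet": "cytoplasmic droplet",
        "midpiece defect": "midpiece defect",
        "loose heads and abnormal tails": "tail defect",  # assumption
        "pyriform head": "head defect",
        "knobbed acrosomes": "head defect",
        "vacuoles and teratoids": "head defect",          # assumption
        "swollen acrosomes": "head defect",
    }

    def collapse_to_2(label: str) -> str:
        """2-category system: normal vs abnormal."""
        return "normal" if label == "normal" else "abnormal"

    def accuracy(user_labels, reference_labels) -> float:
        """Percentage agreement with the reference classification."""
        hits = sum(u == r for u, r in zip(user_labels, reference_labels))
        return 100.0 * hits / len(reference_labels)

Under such a hierarchy, any disagreement at 25 categories that stays within the same coarse class disappears when categories are merged, which is consistent with the accuracy gradient reported above.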

Keywords: Advanced semen assessment; Ram sperm morphology; Reproduction; Sperm morphology assessment; Standardised training tool; Subjective assessment.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Boxplot of accuracy results from experiment 1 (n = 22). Users with no prior experience in sperm morphology assessment attempted the 100-sperm test without access to the video or visual aid. Each user classified the same 100 ram sperm, shown in random order. Results are shown per classification system (25, 8, 5 and 2 morphological categories). * indicates results that were significantly different from the other classification systems.
Fig. 2
Heatmap of Dunn's test pairwise comparisons depicting significant differences in variation of accuracy across all tests in the 4-week study period. Results were taken from users classifying 100 sperm each (n = 1600) per test, and mean results per test were compared. There was no significant variation amongst results within day 1 (tests 1–4, red) or within days 2–5 (tests 5–14, red). Variation was significantly different when comparing day 1 results to those from days 2–5 (tests 5–14, blue).
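The statistical software behind these comparisons is not identified in this excerpt. As a non-authoritative sketch, a comparable pairwise p-value matrix can be produced in Python with the scikit-posthocs package; the file name, the column names 'accuracy' and 'test', and the Bonferroni adjustment below are all assumptions.

    # Sketch of a Dunn's test pairwise-comparison heatmap like Fig. 2.
    # Requires pandas, scikit-posthocs, seaborn and matplotlib.
    import pandas as pd
    import scikit_posthocs as sp
    import seaborn as sns
    import matplotlib.pyplot as plt

    # One row per user per test; file and column names are hypothetical.
    df = pd.read_csv("accuracy_per_user_per_test.csv")

    # Matrix of adjusted p-values for every test-vs-test comparison.
    pvals = sp.posthoc_dunn(df, val_col="accuracy", group_col="test",
                            p_adjust="bonferroni")  # adjustment is an assumption

    # Binary significance heatmap: True where p < 0.05.
    sns.heatmap(pvals < 0.05, cbar=False)
    plt.show()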
Fig. 3
Mean (± SEM) accuracy of assessors (n = 16) for each test across the 4-week study period when labelling with 25 categories. Users classified 100 ram sperm images, shown in random order, per test.
Fig. 4
Coefficient of variation (user standard deviation/user mean) per user (n = 16) for the first (test 1, week 1) and final (test 14, week 4) tests. Users classified 100 ram sperm each, using 25 morphological categories.
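Written out from the caption's own definition (the per-user subscript is notation added here, not the paper's):

    CV_i = s_i / \bar{x}_i

where s_i and \bar{x}_i are the standard deviation and mean of user i's accuracy scores; multiplying by 100 expresses it as a percentage.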
Fig. 5
Mean (± SEM) time users spent classifying each test of 100 ram sperm using 25 morphological categories (n = 1600; mm:ss.0), for the 14 tests across the 4-week study period.
Fig. 6
Heatmap of Dunn's test pairwise comparisons depicting significant differences in the duration (mm:ss.0) spent classifying images per test under the 25-category classification system. Results were taken from users classifying 100 sperm each (n = 1600) per test, and mean durations were compared. There was no significant difference (red) in classification duration amongst the first 4 tests (day 1). Results from day 1 were significantly different from all other tests (tests 5–14, days 2–5).
Fig. 7
Mean (± SEM) accuracy of users (n = 16) classifying 100 ram sperm per test using 25 morphological categories, compared to mean (± SEM) time spent labelling each image (n = 1600; mm:ss.0), for the first test of each testing day across the 4-week study period. Days 1–2 were the two intensive training days in week 1; days 3–5 were the follow-up tests in weeks 2–4.
Fig. 8
Mean accuracy of users (n = 16) in experiment 2 following morphological classification of 100 ram sperm on 14 occasions (i.e. n = 224 accuracy scores, N = 22,400 sperm classified) per classification system (25, 8, 5 and 2 morphological categories). Box plots sharing a common superscript are not significantly different (p < 0.05).
Fig. 9
Mean (± SEM) accuracy results from users classifying 100 ram sperm per test for the first test of each testing day across the 4-week testing period (n = 16). Results are shown per classification system (25, 8, 5 and 2 morphological categories).
Fig. 10
Comparison of mean (± SEM) user accuracy at classifying 100 ram sperm images per test when using 8 morphological categories (Normal, ‘Proximal cytoplasmic droplets’, ‘Midpiece abnormalities’, ‘Loose/multiple heads and abnormal tails’, ‘Knobbed acrosomes’, ‘Vacuoles and teratoids’ and ‘Swollen acrosomes’). Results are shown for the first test of each of the 5 testing days. The eighth category, ‘Pyriform heads’, was omitted from the data set because it occurred too infrequently in the population to allow statistical analysis.
Fig. 11
Comparison of mean (± SEM) user accuracy at classifying 100 ram sperm images per test when using 5 morphological categories (Normal, Head, Midpiece, Tail and Cytoplasmic droplets). Results are shown for the first test of each of the 5 testing days.
Fig. 12
Heatmap of Dunn's test pairwise comparisons depicting significant differences in variation of accuracy across the 8 morphological categories over the 4-week study period. Results were taken from users classifying 100 sperm each (n = 1600) per test, and mean accuracy at identifying each category was compared. There was no significant difference (red) between ‘Midpiece abnormalities’ and ‘Vacuoles and teratoids’, or between ‘Swollen acrosomes’ and ‘Loose/multiple heads and abnormal tails’. Accuracies for all other morphological categories were significantly different from each other (blue).
Fig. 13
Heatmap of Dunn's test pairwise comparisons depicting significant differences in variation of accuracy across the 5 morphological categories over the 4-week study period. Results were taken from users classifying 100 sperm each (n = 1600) per test, and mean accuracy at identifying each category was compared. There was no significant difference (red) between ‘Normal’ and ‘Tail’. Comparisons of ‘Midpiece’ to all other categories were significant (blue).
