Radiol Artif Intell. 2020 Jul 29;2(4):e190198.
doi: 10.1148/ryai.2020190198. eCollection 2020 Jul.

Rethinking Greulich and Pyle: A Deep Learning Approach to Pediatric Bone Age Assessment Using Pediatric Trauma Hand Radiographs


Ian Pan et al. Radiol Artif Intell. 2020.

Abstract

Purpose: To develop a deep learning approach to bone age assessment based on a training set of developmentally normal pediatric hand radiographs and to compare this approach with automated and manual bone age assessment methods based on Greulich and Pyle (GP).

Methods: In this retrospective study, a convolutional neural network (trauma hand radiograph-trained deep learning bone age assessment method [TDL-BAAM]) was trained on 15 129 frontal view pediatric trauma hand radiographs obtained between December 14, 2009, and May 31, 2017, from Children's Hospital of New York, to predict chronological age. A total of 214 trauma hand radiographs from Hasbro Children's Hospital were used as an independent test set. The test set was rated by the TDL-BAAM model as well as a GP-based deep learning model (GPDL-BAAM) and two pediatric radiologists (radiologists 1 and 2) using the GP method. All ratings were compared with chronological age using mean absolute error (MAE), and standard concordance analyses were performed.
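
The MAE comparison described above can be sketched in a few lines. This is only an illustration of the metric, not the authors' evaluation code; the function name `mean_absolute_error` and the sample values are assumptions made up for the example.

```python
from statistics import fmean


def mean_absolute_error(rating_months, chronological_months):
    """Mean absolute difference, in months, between paired ratings and true ages."""
    if len(rating_months) != len(chronological_months):
        raise ValueError("ratings and chronological ages must be paired")
    if not rating_months:
        raise ValueError("need at least one pair")
    return fmean([abs(r - c) for r, c in zip(rating_months, chronological_months)])


# Made-up values for illustration only; these are not data from the study.
chronological = [60, 96, 132, 168]
model_rating = [66, 90, 140, 160]
print(mean_absolute_error(model_rating, chronological))  # 7.0
```

Each rater (a model or a radiologist) would get its own MAE against the same chronological ages, which is what makes the values directly comparable.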

Results: The MAE of the TDL-BAAM model was 11.1 months, compared with 12.9 months for GPDL-BAAM (P = .0005), 14.6 months for radiologist 1 (P < .0001), and 16.0 months for radiologist 2 (P < .0001). For TDL-BAAM, 95.3% of predictions were within 24 months of chronological age, compared with 91.6% for GPDL-BAAM (P = .096), 86.0% for radiologist 1 (P < .0001), and 84.6% for radiologist 2 (P < .0001). Concordance was high between all methods and chronological age (intraclass correlation coefficient > 0.93). Deep learning models demonstrated a systematic bias, with a tendency to overpredict age for younger children, whereas radiologists showed a consistent mean bias.
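
The "within 24 months" figures are simple proportions over the test set. A minimal sketch follows; the function name `fraction_within` is hypothetical, the sample values are made up, and treating the threshold as inclusive is an assumption, since the abstract does not say how a difference of exactly 24 months was counted.

```python
def fraction_within(rating_months, chronological_months, threshold_months=24):
    """Share of ratings whose absolute error is at most threshold_months."""
    pairs = list(zip(rating_months, chronological_months))
    if not pairs:
        raise ValueError("need at least one pair")
    # Assumption: the boundary value counts as "within" the threshold.
    hits = sum(1 for r, c in pairs if abs(r - c) <= threshold_months)
    return hits / len(pairs)


# Made-up values for illustration only; these are not data from the study.
chronological = [36, 84, 120, 156, 192]
rating = [70, 80, 125, 150, 190]
print(fraction_within(rating, chronological))  # 0.8
```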

Conclusion: A deep learning model trained on pediatric trauma hand radiographs is on par with automated and manual GP-based methods for bone age assessment and provides a foundation for developing population-specific deep learning algorithms for bone age assessment in modern pediatric populations.

Supplemental material is available for this article.

© RSNA, 2020

See also the commentary by Halabi in this issue.

Conflict of interest statement

Disclosures of Conflicts of Interest: I.P. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: disclosed no relevant relationships. Other relationships: author is a consultant for MD.ai. G.L.B. disclosed no relevant relationships. S.M. disclosed no relevant relationships. D.M. disclosed no relevant relationships. C.R. disclosed no relevant relationships. D.W.S. disclosed no relevant relationships. R.S.A. disclosed no relevant relationships.

Figures

Figure 1:
Distribution of chronological age in Children’s Hospital of New York (CHONY) training set (top), CHONY test set (middle), and Hasbro Children’s Hospital (HCH) (bottom).
Figure 2:
Deming regression comparisons among chronological age (CA), trauma hand radiograph–trained deep learning algorithm (TDL-BAAM), Greulich and Pyle–based deep learning algorithm (GPDL-BAAM), radiologist 1 (RAD1), and radiologist 2 (RAD2). Perfect concordance is represented by a 45-degree line (black line), and the observed concordance is represented by the Deming regression slope (red line).
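
For readers unfamiliar with Deming regression, the slope in this caption can be computed in closed form. The sketch below uses the standard textbook formula and is not the authors' code. The function name `deming_fit` is hypothetical, and the error-variance ratio `delta` (variance of the y errors over variance of the x errors) defaults to 1 as an assumption, because the ratio used in the study is not given here.

```python
from math import sqrt
from statistics import fmean


def deming_fit(x, y, delta=1.0):
    """Return (intercept, slope) of a Deming regression of y on x.

    delta is the assumed ratio of error variances (y errors over x errors);
    delta = 1.0 corresponds to orthogonal regression.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = fmean(x), fmean(y)
    s_xx = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    if s_xy == 0:
        raise ValueError("slope is undefined when the covariance is zero")
    d = s_yy - delta * s_xx
    slope = (d + sqrt(d * d + 4 * delta * s_xy * s_xy)) / (2 * s_xy)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up points lying on y = 2x + 1, for illustration only.
intercept, slope = deming_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(round(intercept, 6), round(slope, 6))
```

A slope near 1 with an intercept near 0 corresponds to the 45-degree line of perfect concordance mentioned in the caption.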
Figure 3:
Bland-Altman comparisons among chronological age (CA), trauma hand radiograph–trained deep learning algorithm (TDL-BAAM), Greulich and Pyle–based deep learning algorithm (GPDL-BAAM), radiologist 1 (RAD1), and radiologist 2 (RAD2). The center dashed line represents the observed mean difference. The top and bottom dashed lines denote 1.96 standard deviations above and below the mean difference. The dotted line represents 95% confidence intervals for these three values. A black line at 0 is the reference representing no bias (mean or slope) exists. The blue line represents the estimated bias from 0 with respect to age with 95% confidence intervals (gray shaded area).
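
The three dashed lines in a Bland-Altman plot (mean difference and the limits at 1.96 standard deviations on either side) can be computed as below. This is a minimal sketch, not the authors' analysis: the function name `bland_altman_limits` is hypothetical, the sample values are made up, and taking the difference as rating minus chronological age is an assumed sign convention.

```python
from statistics import fmean, stdev


def bland_altman_limits(rating_months, reference_months):
    """Return (mean difference, lower limit, upper limit) of agreement."""
    if len(rating_months) != len(reference_months) or len(rating_months) < 2:
        raise ValueError("need at least two paired observations")
    # Assumed sign convention: positive difference means the rating is too old.
    diffs = [r - c for r, c in zip(rating_months, reference_months)]
    mean_diff = fmean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation of the differences
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Made-up values for illustration only; these are not data from the study.
mean_diff, lower, upper = bland_altman_limits([62, 100, 130, 170], [60, 96, 132, 168])
print(mean_diff, round(lower, 2), round(upper, 2))
```

A mean difference away from 0 indicates a constant bias, while a trend in the differences across age (the blue line in the figure) indicates a bias that depends on age.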
Figure 4:
Predicted bone age in months (y-axis) by chronological age (CA), trauma hand radiograph–trained deep learning algorithm (TDL-BAAM), Greulich and Pyle–based deep learning algorithm (GPDL-BAAM), radiologist 1 (RAD1), and radiologist 2 (RAD2) between male and female patients with 95% confidence intervals.

Comment in

  • Taking Matters into Your Own Hands.
    Halabi SS. Radiol Artif Intell. 2020 Jul 29;2(4):e200150. doi: 10.1148/ryai.2020200150. eCollection 2020 Jul. PMID: 33939791. No abstract available.

