Multicenter Study
PLoS One. 2021 Feb 26;16(2):e0247404. doi: 10.1371/journal.pone.0247404. eCollection 2021.

A natural language processing and deep learning approach to identify child abuse from pediatric electronic medical records

Akshaya V Annapragada et al. PLoS One. 2021.

Abstract

Child physical abuse is a leading cause of traumatic injury and death in children. In 2017, child abuse was responsible for 1688 fatalities in the United States; 3.5 million children were referred to Child Protective Services and 674,000 were substantiated victims. While large referral hospitals maintain teams trained in Child Abuse Pediatrics, smaller community hospitals often lack such dedicated resources to evaluate patients for potential abuse. Moreover, identification of abuse has a low margin of error: false positive identifications lead to unwarranted separations, while false negatives allow dangerous situations to continue. This context makes the consistent detection of and response to abuse difficult, particularly given subtle signs in young, non-verbal patients. Here, we describe the development of artificial intelligence algorithms that use unstructured free text in the electronic medical record, including notes from physicians, nurses, and social workers, to identify children who are suspected victims of physical abuse. Importantly, only the notes from the time of first encounter (e.g., birth, routine visit, sickness) to the last record before child protection team involvement were used. This allowed us to develop an algorithm using only information available prior to referral to the specialized child protection team. The study was performed at a multi-center referral pediatric hospital on patients screened for abuse at five different locations between 2015 and 2019. Of 1123 patients, 867 records were available after data cleaning and processing, and 55% were abuse-positive as determined by a multi-disciplinary team of clinical professionals. These electronic medical records were encoded with three natural language processing (NLP) algorithms, Bag of Words (BOW), Word Embeddings (WE), and Rules-Based (RB), and used to train multiple neural network architectures. The BOW and WE encodings utilize the full free text, while RB selects crucial phrases identified by physicians. The best architecture was selected by the average classification accuracy of the best performing model from each train-test split of a cross-validation experiment. Natural language processing coupled with neural networks detected cases of likely child abuse, using only information available to clinicians prior to child protection team referral, with an average accuracy of 0.90±0.02 and an average area under the receiver operator characteristic curve (ROC-AUC) of 0.93±0.02 for the best performing Bag of Words models. The best performing rules-based models achieved an average accuracy of 0.77±0.04 and an average ROC-AUC of 0.81±0.05, while the Word Embeddings strategy was severely limited by a lack of representative embeddings. Importantly, the best performing model had a false positive rate of 8%, compared to rates of 20% or higher in previously reported studies. This artificial intelligence approach can help screen patients for whom an abuse concern exists and streamline the identification of patients who may benefit from referral to a child protection team. Furthermore, this approach could be applied to develop computer-aided-diagnosis platforms for the challenging and often intractable problem of reliably identifying pediatric patients suffering from physical abuse.
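The abstract describes the modeling pipeline only at a high level. As a rough illustration of the Bag of Words branch, the sketch below encodes free text with TF-IDF and trains a small feed-forward network; it assumes scikit-learn, and the variables notes and labels are hypothetical placeholders, not the study's records or its actual network architectures.

```python
# Minimal sketch of a BOW (TF-IDF) encoding feeding a small neural network
# classifier, illustrating the general approach described in the abstract.
# Assumes scikit-learn; `notes` (pre-referral free-text strings) and
# `labels` (1 = abuse-positive, 0 = abuse-negative) are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

notes = ["free text of pre-referral clinical notes for one patient"] * 100  # placeholder
labels = [0, 1] * 50                                                        # placeholder

# Encode each record's concatenated free text as a TF-IDF vector.
vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
X = vectorizer.fit_transform(notes)

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=0)

# A small feed-forward network stands in for the architectures compared in the paper.
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
clf.fit(X_train, y_train)

probs = clf.predict_proba(X_test)[:, 1]
print("accuracy:", accuracy_score(y_test, (probs > 0.5).astype(int)))
print("ROC-AUC:", roc_auc_score(y_test, probs))
```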


Conflict of interest statement

No authors have competing interests.

Figures

Fig 1
Fig 1. Schematic of patient record selection and processing.
(a) The records were processed to extract only those notes written before the first note from a Child Abuse Pediatrics (CAP) team MD or NP, allowing prediction using only information available before the decision to refer a patient to the CAP team. (b) 1123 records for patients evaluated for suspected abuse between 1/1/2015 and 5/1/2019 were identified. Several were excluded for the reasons listed in the figure, leaving 867 records for deep learning. (c) Schematic of the cross-validation procedure used to create 10 distinct train-test splits.
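The exact splitting procedure is specified in the figure rather than the text. As a minimal sketch of producing 10 distinct train-test splits, the code below assumes scikit-learn's StratifiedKFold as one plausible implementation, with hypothetical placeholder arrays X and y standing in for the encoded records and abuse labels.

```python
# Sketch of generating 10 distinct train-test splits, analogous to the
# cross-validation procedure in Fig 1(c). StratifiedKFold is an assumption
# about the splitting strategy; X and y are placeholders.
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(867, 5000)           # placeholder encoded records
y = np.random.randint(0, 2, size=867)   # placeholder abuse labels

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
splits = [(train_idx, test_idx) for train_idx, test_idx in skf.split(X, y)]

print(len(splits), "train-test splits;",
      "first split sizes:", len(splits[0][0]), "train /", len(splits[0][1]), "test")
```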
Fig 2
Fig 2. Cross-validation.
Boxplots showing accuracy for n = 10 trials for each of 10 train-test splits, with our chosen model architecture in each strategy. The orange line shows the median, and the edges of the box show the 1st and 3rd quartiles. The whiskers extend to 1.5 times the interquartile range (IQR); points greater than the 3rd quartile + 1.5*IQR or less than the 1st quartile - 1.5*IQR are shown as discrete points.
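The convention described here (orange median line, whiskers at 1.5 times the IQR, outliers as discrete points) matches matplotlib's default Tukey boxplot. A minimal sketch with a hypothetical 10-splits-by-10-trials accuracy array follows.

```python
# Sketch of the accuracy boxplot described in Fig 2; `accuracies` is a
# hypothetical 10 x 10 array (splits x trials), not the study's results.
import numpy as np
import matplotlib.pyplot as plt

accuracies = np.random.normal(0.9, 0.02, size=(10, 10))  # placeholder

fig, ax = plt.subplots()
ax.boxplot(accuracies.T, whis=1.5, medianprops={"color": "orange"})  # one box per split
ax.set_xlabel("Train-test split")
ax.set_ylabel("Accuracy")
plt.show()
```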
Fig 3
Fig 3. Performance of the best model in each of 10 train-test splits.
(A) The average accuracy over ten repetitions; (B) the average area under the ROC curve (AUC) over ten repetitions.
Fig 4
Fig 4. ROC curves, AUC, accuracy, PPV, sensitivity, specificity, and F1 score for the best performing model in each train-test split for BOW-TFIDF and rules-based approach.
For each model category, the receiver operator characteristic (ROC) curve, AUC, accuracy, PPV, sensitivity, specificity, and F1 score for the best model in each train-test split are shown. The ROC curve shows the sensitivity-specificity tradeoff for different classification thresholds, while the tables show the AUC for the ROC curve, as well as the accuracy, PPV, sensitivity, specificity, and F1 score at the 0.5 threshold used in our classification algorithm. (a,c) BOW-TFIDF, (b,d) Rules-Based. The BOW models have the highest AUC, with a characteristic ROC plot shape, and high sensitivity, PPV, specificity, and F1 score.
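All of the metrics in this figure can be derived from predicted probabilities and true labels. Below is a minimal sketch of computing them at the 0.5 threshold with scikit-learn, using hypothetical y_true and probs arrays rather than the study's model outputs.

```python
# Sketch of computing the Fig 4 metrics (ROC curve, AUC, accuracy, PPV,
# sensitivity, specificity, F1) at the 0.5 threshold. y_true and probs
# are placeholder labels and predicted probabilities.
import numpy as np
from sklearn.metrics import (roc_curve, roc_auc_score, accuracy_score,
                             precision_score, recall_score, f1_score,
                             confusion_matrix)

y_true = np.random.randint(0, 2, size=200)                         # placeholder labels
probs = np.clip(0.3 * y_true + 0.7 * np.random.rand(200), 0, 1)    # placeholder scores

fpr, tpr, thresholds = roc_curve(y_true, probs)   # points on the ROC curve
auc = roc_auc_score(y_true, probs)

y_pred = (probs >= 0.5).astype(int)               # 0.5 classification threshold
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

metrics = {
    "AUC": auc,
    "Accuracy": accuracy_score(y_true, y_pred),
    "PPV": precision_score(y_true, y_pred),        # positive predictive value
    "Sensitivity": recall_score(y_true, y_pred),   # true positive rate
    "Specificity": tn / (tn + fp),                 # true negative rate
    "F1": f1_score(y_true, y_pred),
}
print(metrics)
```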
Fig 5
Fig 5. Leave-one-out sensitivity analysis of rules used in rules-based approach.
(a) The change in percentage accuracy that occurs when each rule is invalidated (set to -1 for each record), shown for the best performing model from the best train-test split by maximum accuracy and the best performing model from the worst train-test split by maximum accuracy. For the best split, invalidating a rule either has no effect or lowers the accuracy, with the phrase "history domestic violence" having the greatest impact, reducing accuracy by 0.11 from 0.82 to 0.7. For the worst split, invalidating a rule can have no effect, or can raise or lower the accuracy. The phrases "history domestic violence" and "rib fracture" have the largest negative impact and reduce accuracy by 0.05 from 0.7 to 0.65, while the phrases "Inconsistent", "unwitnessed", "altered mental status", "employment" and "witnessed" have the largest positive impact and increase accuracy by 0.02 to 0.72. (b) Alphabetical list of rules that do not change accuracy during the leave-one-out sensitivity analysis.
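A minimal sketch of this leave-one-out rule invalidation is shown below, assuming the rule features are columns of a numeric test matrix; model, X_test, y_test, and rule_names are hypothetical placeholders for a trained rules-based model and its inputs.

```python
# Sketch of the leave-one-out rule sensitivity analysis described in Fig 5:
# each rule's feature is invalidated (set to -1 for every record) in turn and
# the change in test accuracy is recorded. All inputs are hypothetical.
import numpy as np
from sklearn.metrics import accuracy_score

def rule_sensitivity(model, X_test, y_test, rule_names):
    baseline = accuracy_score(y_test, model.predict(X_test))
    deltas = {}
    for j, name in enumerate(rule_names):
        X_mod = X_test.copy()
        X_mod[:, j] = -1                       # invalidate rule j for all records
        acc = accuracy_score(y_test, model.predict(X_mod))
        deltas[name] = acc - baseline          # change in accuracy for this rule
    return baseline, deltas

# Usage (hypothetical): baseline, deltas = rule_sensitivity(model, X_test, y_test, rule_names)
```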

