Front Digit Health. 2021 Feb 17;3:602683.

doi: 10.3389/fdgth.2021.602683. eCollection 2021.

Utilizing Text Mining, Data Linkage and Deep Learning in Police and Health Records to Predict Future Offenses in Family and Domestic Violence

George Karystianis¹, Rina Carines Cabral², Soyeon Caren Han², Josiah Poon², Tony Butler¹

Affiliations

¹ School of Population Health, University of New South Wales, Sydney, NSW, Australia.
² School of Computer Science, University of Sydney, Sydney, NSW, Australia.

PMID: 34713088
PMCID: PMC8521947
DOI: 10.3389/fdgth.2021.602683

George Karystianis et al. Front Digit Health. 2021.

Abstract

Family and domestic violence (FDV) is a global problem with significant social, economic, and health consequences for victims, including increased health care costs, mental trauma, and social stigmatization. In Australia, the estimated annual cost of FDV is $22 billion, and one woman is murdered by a current or former partner every week. Despite this, tools that can predict future FDV based on the features of the person of interest (POI) and the victim are lacking. The New South Wales (NSW) Police Force attends thousands of FDV events each year and records details both as fixed fields (e.g., demographic information for individuals involved in the event) and as text narratives that describe abuse types, victim injuries, threats, and the mental health status of POIs and victims. The information within these narratives remains mostly untapped for research and reporting purposes. After applying a text mining methodology to extract information (abuse types, victim injuries, mental illness mentions) from 492,393 FDV event narratives, we linked these characteristics with the respective fixed fields and with actual mental health diagnoses obtained from the NSW Ministry of Health for the same cohort to form a comprehensive FDV dataset. These data were input into five deep learning models (MLP, LSTM, Bi-LSTM, Bi-GRU, BERT) to predict three FDV offense types ("hands-on," "hands-off," and "Apprehended Domestic Violence Order (ADVO) breach"). The transformer model with BERT embeddings returned the best performance for "ADVO breach" in a multilabel classification setup (69.00% accuracy; 66.76% ROC), while the binary classification setup generated similar results. "Hands-off" offenses proved the hardest offense type to predict (60.72% accuracy; 57.86% ROC using BERT) but showed potential to improve with fine-tuning of binary classification setups. "Hands-on" offenses benefitted least from the contextual information gained through BERT embeddings: MLP with categorical embeddings outperformed BERT in three out of four metrics (65.95% accuracy; 78.03% F1-score; 70.00% precision). These encouraging results indicate that future FDV offenses can be predicted using deep learning on a large corpus of police and health data. Incorporating additional data sources will likely increase performance, which can assist those working in FDV and law enforcement to improve outcomes and better manage FDV events.
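The abstract contrasts a multilabel setup (one model predicting all three offense types at once) with binary setups (one model per offense type). The sketch below shows one way the targets could be encoded in each case, assuming each event's offense types are available as a set of strings. The label strings, function names, and the per-label accuracy helper are hypothetical illustrations, not the authors' code, and the abstract does not specify exactly how its accuracy figures were computed.

```python
# Hypothetical label strings for the three offense types named in the abstract.
OFFENSE_TYPES = ["hands-on", "hands-off", "ADVO breach"]


def multilabel_target(offenses):
    """Multi-hot vector for the multilabel setup: one slot per offense type."""
    return [1 if label in offenses else 0 for label in OFFENSE_TYPES]


def binary_targets(events, label):
    """Binary setup: one 0/1 target per event for a single offense type."""
    return [1 if label in offenses else 0 for offenses in events]


def per_label_accuracy(y_true, y_pred):
    """Per offense type, the fraction of events whose predicted 0/1 matches the truth."""
    n = len(y_true)
    return {
        label: sum(t[i] == p[i] for t, p in zip(y_true, y_pred)) / n
        for i, label in enumerate(OFFENSE_TYPES)
    }


# Toy events (illustrative only, not study data).
events = [{"hands-on"}, {"hands-off", "ADVO breach"}, {"hands-on", "hands-off"}, {"ADVO breach"}]
Y = [multilabel_target(e) for e in events]
print(Y)                                      # [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
print(binary_targets(events, "ADVO breach"))  # [0, 1, 0, 1]

y_pred = [[1, 0, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1]]
print(per_label_accuracy(Y, y_pred))  # {'hands-on': 0.75, 'hands-off': 0.75, 'ADVO breach': 1.0}
```

In the multilabel case a single model outputs all three slots jointly; in the binary case the same events are relabelled three times, once per offense type, and a separate model is trained on each target list.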

Keywords: big data; data linkage; deep learning; family and domestic violence; health records; predictive analytics; text mining.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**Figure 1**
Distribution of the three offense types within the 416,441 police-recorded FDV events.

**Figure 2**
Distribution and intersection of the three offense types within the 416,441 police-recorded FDV events.

**Figure 3**
Highest-ranking feature categories returned from a chi-square test for the offense types to be predicted from police-recorded FDV events. P refers to fixed field data (e.g., premise type, general offense, POI sex), while TM refers to text-mined information (e.g., abuse types, victim injuries).

**Figure 4**
Distribution of the three offense types after data preprocessing in the training and test sets.

**Figure 5**
Base schematics for the deep learning models.

**Figure 6**
An overview of the methodology used to predict three FDV offense types utilizing a combination of different data sources through a previously used text mining approach, linkage to health records and five deep learning models.

**Figure 7**
Performance comparison for the ROC and accuracy measures using different subsets of the FDV event sequence dataset. P refers to the fixed field data, TM to the text-mined data, and ALL to the combined fixed field, text-mined, and NSW Health data; S refers to the setup number for each deep learning model.

**Figure 8**
Interpretable explanation from LIME for an instance of predicting a “hands-on” offense type based on two previous FDV events. Top left: the prediction probabilities generated by the BERT model. Left: a graph displaying the top 10 words that contribute to the negative class “not hands-on” (blue) and the positive class “hands-on” (orange), with their weights of influence. Right: the text data with the top 10 words highlighted (darker highlights indicate larger weights).
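Figure 3 ranks feature categories with a chi-square test against each offense type. A minimal sketch of that idea for binary (present/absent) features, using the uncorrected Pearson chi-square statistic on a 2×2 contingency table, is shown below. The caption does not state which chi-square variant was used or how multi-level categorical fields were handled; this version covers only 0/1 features, and the feature names (following the P/TM prefix convention of the figure) and toy data are hypothetical.

```python
def chi_square_2x2(feature, label):
    """Pearson chi-square statistic (no Yates correction) for two 0/1 variables.

    Builds the 2x2 table of observed counts and compares each cell with the
    count expected if feature and label were independent.
    """
    n = len(feature)
    observed = [[0, 0], [0, 0]]  # observed[feature value][label value]
    for f, y in zip(feature, label):
        observed[f][y] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip cells in an empty row or column
                stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def rank_features(features, label):
    """Sort feature names by chi-square score against the label, highest first."""
    scores = {name: chi_square_2x2(col, label) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (illustrative only): 1 = offense type present in the event.
label = [1, 1, 1, 0, 0, 0]
features = {
    "P:poi_male": [1, 0, 1, 0, 1, 0],        # hypothetical fixed field feature
    "TM:victim_injury": [1, 1, 1, 0, 0, 0],  # hypothetical text-mined feature
}
for name, score in rank_features(features, label):
    print(name, round(score, 3))
# TM:victim_injury 6.0
# P:poi_male 0.667
```

A higher statistic means the feature's distribution differs more from what independence would predict, so features that co-occur strongly with an offense type rise to the top of the ranking.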


Cited by

A Systematic Literature Review of the Use of Computational Text Analysis Methods in Intimate Partner Violence Research.
Neubauer L, Straw I, Mariconti E, Tanczer LM. J Fam Violence. 2023 Mar 21:1-20. doi: 10.1007/s10896-023-00517-7. Online ahead of print. PMID: 37358974. Review.
Mental Illness Concordance Between Hospital Clinical Records and Mentions in Domestic Violence Police Narratives: Data Linkage Study.
Karystianis G, Cabral RC, Adily A, Lukmanjaya W, Schofield P, Buchan I, Nenadic G, Butler T. JMIR Form Res. 2022 Oct 20;6(10):e39373. doi: 10.2196/39373. PMID: 36264613.
