Front Digit Health. 2021 Feb 17;3:602683.
doi: 10.3389/fdgth.2021.602683. eCollection 2021.

Utilizing Text Mining, Data Linkage and Deep Learning in Police and Health Records to Predict Future Offenses in Family and Domestic Violence

George Karystianis et al. Front Digit Health. 2021.

Abstract

Family and domestic violence (FDV) is a global problem with significant social, economic, and health consequences for victims, including increased health care costs, mental trauma, and social stigmatization. In Australia, the estimated annual cost of FDV is $22 billion, and one woman is murdered by a current or former partner every week. Despite this, tools that can predict future FDV from the features of the person of interest (POI) and the victim are lacking. The New South Wales (NSW) Police Force attends thousands of FDV events each year and records details both as fixed fields (e.g., demographic information for the individuals involved in the event) and as text narratives that describe abuse types, victim injuries, and threats, as well as the mental health status of POIs and victims. The information within the narratives is mostly untapped for research and reporting purposes. After applying a text mining methodology to extract information from 492,393 FDV event narratives (abuse types, victim injuries, mental illness mentions), we linked these characteristics with the respective fixed fields and with actual mental health diagnoses obtained from the NSW Ministry of Health for the same cohort to form a comprehensive FDV dataset. These data were input into five deep learning models (MLP, LSTM, Bi-LSTM, Bi-GRU, BERT) to predict three FDV offense types ("hands-on," "hands-off," "Apprehended Domestic Violence Order (ADVO) breach"). The transformer model with BERT embeddings returned the best performance (69.00% accuracy; 66.76% ROC) for "ADVO breach" in a multilabel classification setup, while the binary classification setup generated similar results. "Hands-off" offenses proved the hardest offense type to predict (60.72% accuracy; 57.86% ROC using BERT) but showed potential to improve with fine-tuning of binary classification setups. "Hands-on" offenses benefitted least from the contextual information gained through BERT embeddings: for this offense type, the MLP with categorical embeddings outperformed BERT in three out of four metrics (65.95% accuracy; 78.03% F1-score; 70.00% precision). These encouraging results indicate that future FDV offenses can be predicted using deep learning on a large corpus of police and health data. Incorporating additional data sources will likely increase performance, which can help those working on FDV and in law enforcement to improve outcomes and better manage FDV events.

Keywords: big data; data linkage; deep learning; family and domestic violence; health records; predictive analytics; text mining.
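
The abstract reports accuracy, precision, and F1-score separately for each of the three offense types in a multilabel setup. As a minimal sketch of how such per-label figures could be computed, the snippet below scores each label independently from 0/1 indicator vectors. The function name `per_label_metrics`, the label order in `LABELS`, and the toy data are illustrative assumptions, not the authors' evaluation code, and the ROC measure is not covered here.

```python
from typing import Dict, List

# Label order is an assumption for illustration; the three offense
# types themselves are the ones named in the abstract.
LABELS = ["hands-on", "hands-off", "ADVO breach"]


def per_label_metrics(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Dict[str, Dict[str, float]]:
    """Score each label independently from 0/1 indicator rows."""
    results: Dict[str, Dict[str, float]] = {}
    for j, label in enumerate(LABELS):
        tp = fp = tn = fn = 0
        for true_row, pred_row in zip(y_true, y_pred):
            t, p = true_row[j], pred_row[j]
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 0 and p == 0:
                tn += 1
            else:
                fn += 1
        total = tp + fp + tn + fn
        # Guard against empty denominators (e.g. a label never predicted).
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if (precision + recall)
            else 0.0
        )
        results[label] = {
            "accuracy": (tp + tn) / total if total else 0.0,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return results


if __name__ == "__main__":
    # Toy data (fabricated for the example): each row is one FDV event,
    # each column one offense type; an event may carry several labels.
    y_true = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1]]
    y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 1], [0, 0, 1]]
    for label, scores in per_label_metrics(y_true, y_pred).items():
        print(label, scores)
```

Scoring each label on its own makes it possible to see that one offense type is harder to predict than another, which a single pooled accuracy figure would hide.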

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1. Distribution of the three offense types within the 416,441 police recorded FDV events.
Figure 2. Distribution and intersection of the three offense types within the 416,441 police recorded FDV events.
Figure 3. Highest ranking feature categories returned from a chi-square test for the offense types to be predicted from police recorded FDV events. P refers to the fixed field (e.g., premise type, general offense, POI sex) data while TM refers to text mined information (e.g., abuse types, victim injuries).
Figure 4. Distribution of the three offense types after data preprocessing in the training and test sets.
Figure 5. Base schematics for the deep learning models.
Figure 6. An overview of the methodology used to predict three FDV offense types utilizing a combination of different data sources through a previously used text mining approach, linkage to health records and five deep learning models.
Figure 7. Performance comparison for the ROC and accuracy measures using different subsets of the FDV event sequence dataset; P refers to the fixed field data, TM refers to the text mining data and ALL refers to the fixed field, text mined and NSW Health dataset; S refers to the setup number for each deep learning model.
Figure 8. Interpretable explanation from LIME for an instance of predicting a “hands-on” offense type based on two previous FDV events. The prediction probabilities generated by the BERT model are shown at the top left; a graph displaying the top 10 words that contribute to the negative class “not hands-on” (blue) and the positive class “hands-on” (orange), together with their weights of influence, is shown on the left; the text data with the top 10 words highlighted (darker-colored highlights have larger weights) are on the right.
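
The Figure 3 caption mentions ranking feature categories with a chi-square test. As a rough sketch of that idea, the snippet below computes the Pearson chi-square statistic for a contingency table of feature value against offense label and sorts features by it. The function names, feature names, and counts are hypothetical and fabricated for illustration; the exact test configuration used in the paper is not stated here.

```python
from typing import Dict, List, Tuple


def chi_square(table: List[List[float]]) -> float:
    """Pearson chi-square statistic for an observed contingency table.

    Rows are feature values, columns are label values; expected counts
    come from the row and column totals under independence.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def rank_features(
    tables: Dict[str, List[List[float]]]
) -> List[Tuple[str, float]]:
    """Sort features by chi-square statistic, strongest association first."""
    scored = [(name, chi_square(t)) for name, t in tables.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical counts: rows = feature absent/present,
    # columns = offense label absent/present.
    tables = {
        "TM: victim injury mentioned": [[30, 10], [20, 40]],
        "P: premise type is residential": [[10, 10], [20, 20]],
    }
    for name, stat in rank_features(tables):
        print(f"{name}: chi2 = {stat:.2f}")
```

A larger statistic indicates a stronger departure from independence between the feature and the offense label, which is why it can serve as a simple ranking score.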
