J Biomed Inform. 2021 Mar:115:103668.
doi: 10.1016/j.jbi.2020.103668. Epub 2021 Jan 27.

Modified Needleman-Wunsch algorithm for clinical pathway clustering

Emma Aspland et al. J Biomed Inform. 2021 Mar.

Abstract

Clinical pathways are used to guide clinicians to provide a standardised delivery of care. Because of their standardisation, the aim of clinical pathways is to reduce variation in both care process and patient outcomes. When learning clinical pathways from data through data mining, it is common practice to represent each patient pathway as a string corresponding to their movements through activities. Clustering techniques are popular methods for pathway mining, and therefore this paper focuses on distance metrics applied to string data for k-medoids clustering. The two main aims are, firstly, to develop a technique that seamlessly integrates expert information with data and, secondly, to develop a string distance metric for the purpose of process data. The overall goal was to allow for more meaningful clustering results to be found by adding context into the string similarity calculation. Eight common distance metrics and their applicability are discussed. These distance metrics prove to give an arbitrary distance, without consideration for context, and each produces different results. As a result, this paper describes the development of a new distance metric, the modified Needleman-Wunsch algorithm, that allows for expert interaction with the calculation by assigning groupings and rankings to activities, which provide context to the strings. This algorithm has been developed in partnership with the UK's National Health Service (NHS) with the focus on a lung cancer pathway; however, the handling of the data and algorithm allows for application to any disease type. This method is contained within Sim.Pro.Flow, a publicly available decision support tool.
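
To make the idea of adding context to a string distance more concrete, the following is a minimal Python sketch of a Needleman-Wunsch-style global alignment distance in which the substitution penalty depends on expert-assigned activity groups. It is not the paper's modified algorithm, whose features are not given in this abstract. The activity codes, the `GROUPS` mapping, the penalty values and the function names are all assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical expert groupings (an assumption, not taken from the paper):
# each single-letter activity code is assigned to a named group.
GROUPS = {
    "G": "referral",
    "X": "imaging",
    "C": "imaging",
    "B": "diagnostic",
    "T": "treatment",
}


def substitution_cost(a, b, groups, same_group=1, diff_group=2):
    """Cost of aligning activity a with activity b (illustrative values)."""
    if a == b:
        return 0
    group_a = groups.get(a)
    if group_a is not None and group_a == groups.get(b):
        return same_group  # different activity, same expert group
    return diff_group  # different activity, different (or unknown) group


def nw_distance(s, t, groups, gap=1, same_group=1, diff_group=2):
    """Global alignment distance between two pathway strings.

    Standard Needleman-Wunsch dynamic programming, written as a
    minimisation of penalties rather than a maximisation of scores.
    """
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * gap  # align a prefix of s against gaps only
    for j in range(1, cols):
        d[0][j] = j * gap  # align a prefix of t against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            sub = d[i - 1][j - 1] + substitution_cost(
                s[i - 1], t[j - 1], groups, same_group, diff_group
            )
            d[i][j] = min(sub, d[i - 1][j] + gap, d[i][j - 1] + gap)
    return d[-1][-1]


def distance_matrix(pathways, groups):
    """Symmetric pairwise distance matrix for a list of pathway strings."""
    n = len(pathways)
    m = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = nw_distance(pathways[i], pathways[j], groups)
    return m


if __name__ == "__main__":
    sample = ["GXT", "GCT", "GBT", "GT"]  # made-up pathway strings
    for row in distance_matrix(sample, GROUPS):
        print(row)
```

With these illustrative penalties, swapping one activity for another in the same group costs less than swapping across groups, which is one simple way expert context could change the distance. A matrix built this way could then be passed to a k-medoids implementation that accepts precomputed distances.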

Keywords: Clinical pathways; Data mining; Lung cancer.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract
Fig. 1. All patient pathways displayed as a heatmap.
Fig. 2. Example of the calculation for the Levenshtein distance.
Fig. 3. Example of dynamic programming using the Levenshtein distance.
Fig. 4. Example of the calculation for the Damerau–Levenshtein distance.
Fig. 5. Example of the calculation for the Jaro distance. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 6. Example of the calculation for the Jaro–Winkler distance.
Fig. 7. Example of the calculation for the Needleman–Wunsch distance.
Fig. 8. Example of the Needleman–Wunsch algorithm.
Fig. 9. Example of bi-gram for Jaccard distance.
Fig. 10. Example of longest common subsequence.
Fig. 11. Example of longest common subsequence.
Fig. 12. Example of modified dynamic programming algorithm.
Fig. 13. Example of modified traceback.
Fig. 14. Example of feature five.
Fig. A.15. Simplified National Optimal Lung Cancer Pathway.
Fig. A.16. Modified Needleman–Wunsch Distance Matrix for Sample 2.
Fig. A.17. Comparison of the Ten Metrics Applied to Sample 1. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. A.18. Comparison of the Ten Metrics Applied to Sample 2. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
