J Healthc Inform Res. 2018;2(4):402-422.
doi: 10.1007/s41666-018-0029-6. Epub 2018 Jul 30.

Nearest Consensus Clustering Classification to Identify Subclasses and Predict Disease

Awad A Alyousef et al. J Healthc Inform Res. 2018.

Abstract

Disease subtyping, which helps to develop personalized treatments, remains a challenge in data analysis because there are many different ways to group patients based on their data. However, identifying subclasses of disease makes it possible to develop models that are more specific to individuals, which should improve prediction and understanding of the underlying characteristics of the disease in question. This paper proposes a new algorithm that integrates consensus clustering methods with classification in order to overcome issues with sample bias. The new algorithm combines K-means with consensus clustering to build cohort-specific decision trees that improve classification and aid understanding of the underlying differences between the discovered groups. The methods are tested on a freely available real-world breast cancer dataset and on data from a London hospital on systemic sclerosis, a rare, potentially fatal condition. Results show that "nearest consensus clustering classification" significantly improves accuracy and prediction compared with similar competing methods.
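The abstract does not spell out the algorithm's steps, so the following is only a minimal sketch of one way such a pipeline could look, not the authors' implementation. It assumes that consensus is built from K-means runs on random subsamples, that groups are the connected components of the co-clustering matrix above a 0.5 threshold, that a new patient is routed to the nearest group centroid, and that a depth-1 decision stump stands in for the cohort-specific decision tree. All function names, parameters, and the toy data are hypothetical.

```python
import random
from collections import Counter

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, rng, iters=20):
    # Plain Lloyd's K-means; returns one cluster label per point.
    centres = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centres[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centres[c] = centroid(members)
    return labels

def consensus_matrix(points, k, runs=30, frac=0.8, seed=0):
    # Fraction of subsampled runs in which each pair of points shares a cluster.
    rng, n = random.Random(seed), len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(runs):
        idx = rng.sample(range(n), max(k, int(frac * n)))
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def consensus_groups(cons, threshold=0.5):
    # Assumption: groups are connected components of pairs with consensus > threshold.
    n, group, g = len(cons), [-1] * len(cons), 0
    for start in range(n):
        if group[start] != -1:
            continue
        group[start], stack = g, [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if group[j] == -1 and cons[i][j] > threshold:
                    group[j] = g
                    stack.append(j)
        g += 1
    return group

def fit_stump(X, y):
    # Depth-1 stand-in for a decision tree: best (feature, threshold) by training error.
    majority = Counter(y).most_common(1)[0][0]
    best = (sum(l != majority for l in y), None, None, majority, majority)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [l for x, l in zip(X, y) if x[f] <= t]
            right = [l for x, l in zip(X, y) if x[f] > t]
            if not left or not right:
                continue
            ll = Counter(left).most_common(1)[0][0]
            rl = Counter(right).most_common(1)[0][0]
            err = sum(l != ll for l in left) + sum(l != rl for l in right)
            if err < best[0]:
                best = (err, f, t, ll, rl)
    return best[1:]

def predict_stump(stump, x):
    f, t, ll, rl = stump
    return ll if f is None or x[f] <= t else rl

def fit_nearest_cc(X, y, k, seed=0):
    # One (centroid, classifier) pair per consensus group.
    groups = consensus_groups(consensus_matrix(X, k, seed=seed))
    model = []
    for g in sorted(set(groups)):
        rows = [i for i, gi in enumerate(groups) if gi == g]
        Xg, yg = [X[i] for i in rows], [y[i] for i in rows]
        model.append((centroid(Xg), fit_stump(Xg, yg)))
    return model

def predict_nearest_cc(model, x):
    # Route the sample to the nearest group centroid, then use that group's classifier.
    _, stump = min(model, key=lambda m: dist2(x, m[0]))
    return predict_stump(stump, x)

# Toy data (made up): two cohorts in which the third feature predicts the class
# in opposite directions, so a single global stump cannot fit both.
X, y = [], []
for i in range(20):
    v = i / 19
    X.append([0.1 * (i % 3), 0.0, v]); y.append(int(v > 0.5))
    X.append([10.0 + 0.1 * (i % 3), 0.0, v]); y.append(int(v <= 0.5))

model = fit_nearest_cc(X, y, k=2)
print(len(model), [predict_nearest_cc(model, p) for p in ([0.1, 0.0, 0.9], [10.1, 0.0, 0.9])])
```

In a realistic setting the stump would be replaced by a full decision tree learner, and the grouping step by whichever consensus procedure the paper actually specifies; the routing of test samples to the nearest group is the part this sketch is meant to illustrate.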

Keywords: Classification; Consensus clustering; Disease subgroup discovery.

Figures

Fig. 1
Consensus clustering algorithm (schematic)
Fig. 2
Nearest consensus clustering classification: training and testing data (schematic)
Fig. 3
Comparison of K-means, decision tree, nearest K-means, and nearest CC for the time to develop pulmonary arterial hypertension class in the systemic sclerosis (SS) dataset
Fig. 4
Consensus clustering decision tree for group 1 in the SS dataset, time to develop pulmonary arterial hypertension class
Fig. 5
Consensus clustering decision tree for group 2 in the SS dataset, time to develop pulmonary arterial hypertension class
Fig. 6
Consensus clustering decision tree for group 3 in the SS dataset, time to develop pulmonary arterial hypertension class
Fig. 7
Kaplan-Meier curves by nearest consensus clustering on the time to develop pulmonary arterial hypertension dataset. The x-axis shows time to develop pulmonary arterial hypertension in months and the y-axis shows the percentage of patients who have not yet developed that organ complication. The graph shows the survival curves obtained by grouping patients with nearest consensus clustering
Fig. 8
Comparison of K-means, decision tree, nearest K-means, and nearest CC for the time to death class in the systemic sclerosis dataset
Fig. 9
Kaplan-Meier curves by nearest consensus clustering on the time to death dataset. The x-axis shows time to death in months and the y-axis shows the percentage of patients surviving. The graph shows the survival curves obtained by grouping patients with nearest consensus clustering
Fig. 10
Comparison of nearest CC classification for the time to develop pulmonary arterial hypertension class with different values of K
Fig. 11
Comparison of nearest CC classification for the time to death class with different values of K
