PLoS One. 2022 Jun 9;17(6):e0267714.
doi: 10.1371/journal.pone.0267714. eCollection 2022.

Discriminatory Gleason grade group signatures of prostate cancer: An application of machine learning methods

Mpho Mokoatle et al. PLoS One. 2022.

Abstract

One of the most precise methods of detecting prostate cancer is the evaluation of a stained biopsy by a pathologist under a microscope. Regions of the tissue are assessed and graded according to the observed histological pattern. However, this process is laborious, relies on the experience of the pathologist, and tends to suffer from poor reproducibility of biopsy outcomes across pathologists. As a result, computational approaches are being sought, and machine learning has been gaining momentum in the prediction of the Gleason grade group. To date, the machine learning literature has addressed this problem using features from magnetic resonance images, whole slide images, tissue microarrays, gene expression data, and clinical features. However, there is a gap with regard to predicting the Gleason grade group using DNA sequences as the only input to the machine learning models. In this work, using whole genome sequence data from South African prostate cancer patients, machine learning and biological experiments were combined to understand the challenges associated with the prediction of the Gleason grade group. A series of machine learning binary classifiers (XGBoost, LSTM, GRU, LR, RF) were created, relying only on DNA sequences as input features. None of the models was able to adequately discriminate between the DNA sequences of the studied Gleason grade groups (Gleason grade groups 1 and 5). The models were then evaluated on a second task: distinguishing tumor DNA sequences from matched-normal DNA sequences, again with DNA sequences as the only input. On this task the models performed better, with the XGBoost model achieving the highest accuracy of 74 ± 01, F1 score of 79 ± 01, recall of 99 ± 0.0, and precision of 66 ± 0.1.
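As a quick consistency check on the reported XGBoost figures, recall that the F1 score is the harmonic mean of precision and recall. The sketch below is not taken from the paper: the helper name `f1_from_precision_recall` is hypothetical, and it assumes the reported values are percentages (precision 66, recall 99). Under that assumption, the result rounds to the reported F1 score of 79.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0  # conventional value when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Assumption: the abstract's precision (66) and recall (99) are percentages.
reported_precision = 66.0
reported_recall = 99.0

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 implied by precision and recall: {f1:.1f}")
```

The near-perfect recall combined with a much lower precision suggests a classifier that labels most sequences as tumor, which is worth keeping in mind when reading the accuracy figure.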

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Blood DNA sequences x transformed into k-mers with their corresponding Gleason grade group y.
Fig 2. Architecture of an LSTM unit [59].
Fig 3. Architecture of a GRU unit [59].
Fig 4. Summary of all the methods executed in this work.
Fig 5. Visualisation of TF-IDF k-mers for BRCA 1.
Fig 6. Visualisation of TF-IDF k-mers for BRCA 2.
Fig 7. Confusion matrix of the Random Forest model for BRCA 1.
Fig 8. Confusion matrix of the GRU model for BRCA 2.
Fig 9. Confusion matrix of the XGBoost model for the APC gene.
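
The k-mer and TF-IDF representations named in Figs 1, 5 and 6 can be illustrated with a minimal sketch. This is not the authors' code: the function names `kmers` and `tf_idf`, the choice k = 3, and the toy sequences are all assumptions made for illustration, and the weighting is a plain textbook TF-IDF variant (term frequency times log of inverse document frequency) that may differ from what the paper used.

```python
import math
from collections import Counter


def kmers(sequence, k=3):
    """Return the overlapping k-mers of a DNA sequence, in order."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def tf_idf(documents):
    """Compute TF-IDF weights for a list of tokenised documents.

    Each document is a list of k-mers; the result is one dict per
    document mapping each k-mer to tf * log(N / df).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each k-mer once per document

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            token: (count / total) * math.log(n_docs / doc_freq[token])
            for token, count in counts.items()
        })
    return weights


# Toy sequences (hypothetical, not from the study's data).
toy_sequences = ["ATGCGA", "ATGTTT"]
toy_kmers = [kmers(seq, k=3) for seq in toy_sequences]
toy_weights = tf_idf(toy_kmers)

for seq, weights in zip(toy_sequences, toy_weights):
    print(seq, {token: round(w, 3) for token, w in weights.items()})
```

In this toy example the k-mer `ATG` occurs in both sequences, so its weight is zero, while k-mers unique to one sequence receive positive weight. This is the property that makes TF-IDF a candidate for highlighting sequence fragments that differ between classes.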

References

    1. Cassim N, Ahmad A, Wadee R, Rebbeck T, Glencross D, George J. Prostate cancer age-standardised incidence increase between 2006 and 2016 in Gauteng Province, South Africa: A laboratory data-based analysis. South African Medical Journal. 2021;111(1):26–32. doi: 10.7196/SAMJ.2020.v111i1.14850
    2. Pienta KJ, Esper PS. Risk factors for prostate cancer. Annals of internal medicine. 1993;118(10):793–803. doi: 10.7326/0003-4819-118-10-199305150-00007
    3. Heidenreich A, Bastian PJ, Bellmunt J, Bolla M, Joniau S, Mason M, et al. Guidelines on prostate cancer. European association of urology. 2012; p. 45.
    4. Gann PH. Risk factors for prostate cancer. Reviews in urology. 2002;4(Suppl 5):S3.
    5. van der Leest M, Cornel E, Israël B, Hendriks R, Padhani AR, Hoogenboom M, et al. Head-to-head comparison of transrectal ultrasound-guided prostate biopsy versus multiparametric prostate resonance imaging with subsequent magnetic resonance-guided biopsy in biopsy-naive men with elevated prostate-specific antigen: a large prospective multicenter clinical study. European urology. 2019;75(4):570–578.
