Comput Struct Biotechnol J. 2020 Mar 19;18:668-675.
doi: 10.1016/j.csbj.2020.03.007. eCollection 2020.

PreMSIm: An R package for predicting microsatellite instability from the expression profiling of a gene panel in cancer

Lin Li et al. Comput Struct Biotechnol J. 2020.

Abstract

Microsatellite instability (MSI) is a genomic property of cancers with defective DNA mismatch repair and is a useful marker for cancer diagnosis and treatment across diverse cancer types. In particular, MSI has been associated with response to immune checkpoint blockade therapy in cancer. Most computational methods for predicting MSI are based on DNA sequencing data, and only a few are based on mRNA expression data. Using RNA-Seq pan-cancer datasets for three cancer cohorts (colon, gastric, and endometrial cancers) from The Cancer Genome Atlas (TCGA) program, we developed an algorithm (PreMSIm) for predicting MSI from the expression profiling of a 15-gene panel in cancer. We demonstrated that PreMSIm achieved high performance in predicting MSI in most cases, using both RNA-Seq and microarray gene expression datasets. Moreover, PreMSIm displayed superior or comparable performance versus other DNA- or mRNA-based methods. We conclude that PreMSIm has the potential to provide an alternative approach for identifying MSI in cancer.
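The abstract does not name the classifier, but Fig 3 compares a k-NN classifier (k = 5) with RF, SVM, and XGBoost. As a rough illustration only, the Python sketch below shows a majority-vote k-NN call on a vector of gene expression values. It assumes Euclidean distance, already-normalized expression values, and the string labels "MSI-H" and "MSS"; the function name `knn_predict` and the three-gene toy data are hypothetical and are not the PreMSIm R package API.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=5):
    """Classify one sample by majority vote among its k nearest neighbours.

    Assumptions (not from the paper): Euclidean distance on already
    normalized expression values; labels are strings such as "MSI-H"/"MSS".
    """
    if not 0 < k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    # Pair each training sample's distance to the query with its label, nearest first.
    nearest = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )[:k]
    # Majority vote; with two classes and an odd k there can be no tie.
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical toy data: 3 "genes" instead of the 15-gene panel.
train_x = [
    [0.1, 0.2, 0.1], [0.2, 0.1, 0.3], [0.0, 0.3, 0.2],
    [0.9, 0.8, 1.0], [1.0, 0.9, 0.8], [0.8, 1.1, 0.9], [0.95, 0.85, 0.9],
]
train_y = ["MSI-H"] * 3 + ["MSS"] * 4

print(knn_predict(train_x, train_y, [0.15, 0.2, 0.2]))  # MSI-H
print(knn_predict(train_x, train_y, [0.9, 0.9, 0.9]))   # MSS
```

An odd k such as 5 is a convenient choice for a two-class problem because the vote can never split evenly.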

Keywords: ACC, adrenocortical carcinoma; AUC, area under the curve; Algorithm; BLCA, bladder urothelial carcinoma; BRCA, breast invasive carcinoma; CESC, cervical squamous cell carcinoma and endocervical adenocarcinoma; CHOL, cholangiocarcinoma; COAD, colon adenocarcinoma; CV, cross validation; Cancer; Classification; DLBC, lymphoid neoplasm diffuse large B-cell lymphoma; ESCA, esophageal carcinoma; GBM, glioblastoma multiforme; GEO, Gene Expression Omnibus; GO, gene ontology; Gene expression profiling; HNSC, head and neck squamous cell carcinoma; KICH, kidney chromophobe; KIRC, kidney renal clear cell carcinoma; KIRP, kidney renal papillary cell carcinoma; LAML, acute myeloid leukemia; LGG, brain lower grade glioma; LIHC, liver hepatocellular carcinoma; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; MESO, mesothelioma; MSI, microsatellite instability; MSS, microsatellite stability; Machine learning; Microsatellite instability; OV, ovarian serous cystadenocarcinoma; PAAD, pancreatic adenocarcinoma; PCPG, pheochromocytoma and paraganglioma; PPI, protein-protein interaction; PRAD, prostate adenocarcinoma; READ, rectum adenocarcinoma; RF, random forest; ROC, receiver operating characteristic; SARC, sarcoma; SKCM, skin cutaneous melanoma; STAD, stomach adenocarcinoma; SVM, support vector machine; TCGA, The Cancer Genome Atlas; TGCT, testicular germ cell tumors; THCA, thyroid carcinoma; THYM, thymoma; UCEC, uterine corpus endometrial carcinoma; UCS, uterine carcinosarcoma; UVM, uveal melanoma; XGBoost, extreme gradient boosting; k-NN, k-nearest neighbor.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract
Fig 1
A summary of the PreMSIm algorithm and the 15 selected gene signatures. A, Flowchart of the algorithm. B, Heatmap of the expression levels of the 15 PreMSIm gene signatures in the MSI-H and MSI-L/MSS subtypes of the TCGA pan-cancer cohort. MSI-H: MSI-high. MSI-L/MSS: MSI-low/microsatellite stability.
Fig 2
Comparisons of the MSI prediction results of PreMSIm with those of other algorithms. A, B, and C, Overlap rates between the MSI predictions of PreMSIm and those of MOSAIC (A), MANTIS (B), and MSIsensor (C) in the TCGA pan-cancer cohort and in multiple individual cancer types. Fisher's exact test P-values are shown. *P < 0.05, **P < 0.01, ***P < 0.001. D and E, Comparisons of the prediction performance of PreMSIm with that of two other mRNA-based methods, by Danaher et al. (D) and by Pacinkova et al. (E). BLCA: bladder urothelial carcinoma. BRCA: breast invasive carcinoma. CESC: cervical squamous cell carcinoma and endocervical adenocarcinoma. COAD: colon adenocarcinoma. ESCA: esophageal carcinoma. HNSC: head and neck squamous cell carcinoma. LUAD: lung adenocarcinoma. READ: rectum adenocarcinoma. STAD: stomach adenocarcinoma. UCEC: uterine corpus endometrial carcinoma. UCS: uterine carcinosarcoma.
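Fig 2 reports Fisher's exact test P-values for the overlap between PreMSIm's MSI calls and those of other tools. One plausible setup, assumed here rather than taken from the paper, is a 2×2 table counting samples by PreMSIm call (MSI or MSS) and by the other tool's call. The standard-library sketch below computes the two-sided P-value from the hypergeometric distribution; the function name `fisher_exact_two_sided` is hypothetical and the counts in the example are made up. In practice, `scipy.stats.fisher_exact` implements the same test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability is no larger than that of the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Made-up counts: rows = PreMSIm call (MSI, MSS), columns = other tool's call (MSI, MSS).
print(fisher_exact_two_sided(3, 1, 1, 3))  # approximately 0.4857 (34/70)
```

A small P-value here would indicate that the two tools' calls agree more (or less) often than expected if they were independent.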
Fig 3
Comparison of k-NN with other classifiers. A, Grid search with 10-fold CV in the TCGA pan-cancer cohort for the optimal k for k-NN. B, Performance of four k-NN classifiers (k = 5, 7, 9, and 11) in predicting MSI. C, Performance of k-NN (k = 5) compared with the RF, SVM, and XGBoost classifiers. RF: random forest. SVM: support vector machine. XGBoost: extreme gradient boosting.
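Panel A of Fig 3 describes a grid search over k using 10-fold CV. The sketch below shows the general shape of such a search in plain Python, under stated assumptions: unstratified folds, accuracy as the CV score (the paper's scoring metric is not given here), and ties broken toward the smaller k. All function names and the synthetic data are hypothetical. With scikit-learn, `GridSearchCV` over `KNeighborsClassifier` with `param_grid={"n_neighbors": [5, 7, 9, 11]}` and `cv=10` does the same job.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    # Majority vote among the k nearest training samples (Euclidean distance).
    nearest = sorted(zip((math.dist(x, query) for x in train_x), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_accuracy(xs, ys, k, n_folds=10, seed=0):
    """Accuracy of k-NN under n-fold cross-validation (unstratified folds)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        tr_x = [xs[i] for i in train]
        tr_y = [ys[i] for i in train]
        correct += sum(knn_predict(tr_x, tr_y, xs[i], k) == ys[i] for i in fold)
    return correct / len(xs)


def grid_search_k(xs, ys, candidate_ks, n_folds=10):
    """Return the k with the best CV accuracy (smallest k on ties) and all scores."""
    scores = {k: cv_accuracy(xs, ys, k, n_folds) for k in candidate_ks}
    best = max(candidate_ks, key=lambda k: (scores[k], -k))
    return best, scores


# Synthetic, well-separated two-class data (made up for illustration).
xs = [[i * 0.1, i * 0.05] for i in range(20)] + [[5 + i * 0.1, 5 + i * 0.05] for i in range(20)]
ys = ["MSI-H"] * 20 + ["MSS"] * 20
best_k, scores = grid_search_k(xs, ys, [5, 7, 9, 11])
print(best_k, scores)
```

Preferring the smallest k among equally scoring candidates is one reasonable tie-break; it keeps the classifier as local as the data allow.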
Fig 4
Performance of PreMSIm in predicting MSI. A, ROC curve analysis of TCGA colon cancer. B, ROC curve analysis of the pan-cancer cohort. All pan-cancer samples were split into a training set (80% of samples) and a test set (20% of samples); for the training set, the 10-fold CV AUC is shown. C, ROC curve analysis of TCGA gastric and colon cancers, using TCGA endometrial cancers as the training set. D and E, ROC curve analysis of two gastric (D) and two colorectal (E) cancer cohorts in which MSI was predicted with the PreMSIm R package. MSI: microsatellite instability. CV: cross validation. AUC: area under the ROC curve. COAD: colon adenocarcinoma. STAD: stomach adenocarcinoma.
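Fig 4 summarizes performance as the area under the ROC curve after an 80%/20% train/test split. The Python sketch below illustrates both steps under simple assumptions: a single seeded shuffle for the split, and one numeric score per sample (for a k-NN model, the fraction of MSI-H votes among the neighbours would be one assumed choice). The helper names are hypothetical; scikit-learn's `roc_auc_score` should return the same AUC for the same inputs.

```python
import random


def split_train_test(items, train_frac=0.8, seed=0):
    """Shuffle once and cut into training (train_frac) and test portions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(scores, labels):
    """Area under the ROC curve via its rank interpretation.

    AUC equals the probability that a randomly chosen positive (label 1, e.g. MSI-H)
    scores higher than a randomly chosen negative (label 0); ties count as 1/2.
    This O(P * N) pairwise version is fine for illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


train, test = split_train_test(range(10))
print(len(train), len(test))                         # 8 2
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

An AUC of 1.0 means every MSI-H sample outscores every MSS sample; 0.5 corresponds to random ranking.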
