Quantum processor-inspired machine learning in the biomedical sciences

Richard Y Li¹ ² ³, Sharvari Gujja⁴ ⁵, Sweta R Bajaj⁴ ⁵, Omar E Gamel⁴, Nicholas Cilfone⁴, Jeffrey R Gulcher⁶, Daniel A Lidar¹ ³ ⁷ ⁸, Thomas W Chittenden⁴ ⁵ ⁹

Affiliations

¹ Department of Chemistry, University of Southern California, 920 Bloom Walk, Los Angeles, CA 90089, USA.
² Computational Biology and Bioinformatics Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA.
³ Center for Quantum Information Science & Technology, University of Southern California, Los Angeles, CA, USA.
⁴ Computational Statistics and Bioinformatics Group, Genuity AI Research Institute, Genuity Science, 90 Canal Street, Suite 120, Boston, MA 02114, USA.
⁵ Complex Biological Systems Alliance, Medford, MA, USA.
⁶ Cancer Genetics Group, Genuity Science, Boston, MA, USA.
⁷ Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA.
⁸ Department of Physics and Astronomy, University of Southern California, Los Angeles, CA, USA.
⁹ Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA.

PMID: 34179840
PMCID: PMC8212142
DOI: 10.1016/j.patter.2021.100246

Richard Y Li et al. Patterns (N Y). 2021 Apr 28;2(6):100246. doi: 10.1016/j.patter.2021.100246. eCollection 2021 Jun 11.

Abstract

Recent advances in high-throughput genomic technologies, coupled with exponential increases in computer processing and memory, have allowed us to interrogate the complex molecular underpinnings of human disease from a genome-wide perspective. While the deluge of genomic information is expected to increase, a bottleneck in conventional high-performance computing is rapidly approaching. Inspired by recent advances in physical quantum processors, we evaluated several unconventional machine-learning (ML) strategies on actual human tumor data, namely "Ising-type" methods, whose objective function is formulated identically to that of simulated annealing and quantum annealing. We show the efficacy of multiple Ising-type ML algorithms for classification of multi-omics human cancer data from The Cancer Genome Atlas, comparing these classifiers to a variety of standard ML methods. Our results indicate that Ising-type ML offers superior classification performance with smaller training datasets, thus providing compelling empirical evidence for the potential future application of unconventional computing approaches in the biomedical sciences.
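
To make the term "Ising-type" concrete, the sketch below minimizes a generic Ising energy, E(s) = Σᵢ hᵢsᵢ + Σᵢ<ⱼ Jᵢⱼsᵢsⱼ with sᵢ ∈ {−1, +1}, by simulated annealing. It is only an illustration under stated assumptions: this record does not describe how the local fields `h` and couplings `J` are derived from the omics data, so the toy values, the geometric cooling schedule, and the function names are all hypothetical.

```python
import math
import random


def ising_energy(spins, h, J):
    """E(s) = sum_i h[i]*s[i] + sum_{i<j} J[i][j]*s[i]*s[j], with s[i] in {-1, +1}."""
    n = len(spins)
    energy = sum(h[i] * spins[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += J[i][j] * spins[i] * spins[j]
    return energy


def simulated_annealing(h, J, n_sweeps=200, t_start=2.0, t_end=0.05, seed=0):
    """Single-spin-flip Metropolis annealing; returns the best state visited."""
    rng = random.Random(seed)
    n = len(h)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    energy = ising_energy(spins, h, J)
    best_spins, best_energy = spins[:], energy
    for sweep in range(n_sweeps):
        # Geometric cooling from t_start to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (sweep / max(n_sweeps - 1, 1))
        for i in range(n):
            # Local field on spin i; J is stored upper-triangular (i < j).
            local = h[i] + sum(
                J[min(i, j)][max(i, j)] * spins[j] for j in range(n) if j != i
            )
            delta = -2 * spins[i] * local  # energy change if spin i is flipped
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                spins[i] = -spins[i]
                energy += delta
                if energy < best_energy:
                    best_spins, best_energy = spins[:], energy
    return best_spins, best_energy


# Toy three-spin problem with made-up fields and couplings.
h = [0.5, -1.0, 0.25]
J = [
    [0.0, -1.0, 0.5],
    [0.0, 0.0, -0.75],
    [0.0, 0.0, 0.0],
]
best_spins, best_energy = simulated_annealing(h, J)
print(best_spins, best_energy)
```

A quantum annealer would be handed the same `h` and `J` and asked for low-energy spin configurations; only the solver changes, not the objective.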

Keywords: The Cancer Genome Atlas; cancer genomics; machine learning.

Conflict of interest statement

O.E.G., S.G., J.R.G., N.C., S.R.B., and T.W.C. were employed by Genuity Science during the research project. R.Y.L. was the recipient of a research grant from Genuity Science during the research project. The work of D.A.L. is based upon work (partially) supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via a US Army Research Office contract. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, or US government. The US government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation thereon. The authors declare no other competing interests.

Figures

**Figure 1**
Overview of strategy and cancer types used in this study. (A) Overview of classification strategy. (i) Whole-exome sequencing, RNA-seq, miRNA-seq, DNA methylation array, and genotyping array (for CNVs) data were retrieved from The Cancer Genome Atlas for human cancer type and molecular subtype classification. Data were concatenated and transformed into a single scaled omics data matrix. The matrix was then repeatedly split into 100 unique training and independent test sets representing 80% and 20% of the total data, respectively. After each split, the training set was scaled to have zero mean and unit standard deviation, and the same scaling was then applied to the corresponding test set. (ii) Principal-component analysis (PCA) was performed separately on each individual training set, and the matched test set was then projected using the training-set-specific PCA loadings. (iii) Several standard classical machine-learning (ML) algorithms were compared with quantum annealing and with several classical algorithms that have the same objective function as quantum annealing. The standard classical ML methods assessed included least absolute shrinkage and selection operator (LASSO), ridge regression (RIDGE), random forest, naive Bayes, and support vector machine (SVM). Quantum annealing (D-Wave) was performed on D-Wave hardware by formulating the classification problem as an Ising problem (see experimental procedures). The classical Ising-type approaches include simulated annealing (SA), candidate solutions randomly generated and sorted according to the Ising energy (Random), and an approach that considers only the local fields of the Ising problem (Field). Hyperparameters were tuned on the training data using 10-fold cross-validation (see supplemental experimental procedures for a description of the ranges of hyperparameters used). (iv) After training, classification performance was validated with each corresponding test set (unseen during hyperparameter tuning and training) for a variety of statistical metrics, including balanced accuracy, area under the ROC curve (AUC), and F1 score. Classification performance metrics were averaged over the 100 test sets for each model to provide statistics on the mean performance. (B) The six human cancer types used for the multiclass classification models. Patient sample sizes are indicated in parentheses.
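
Steps (i) and (ii) of panel A can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the unstratified random split, the SVD-based PCA, the number of components, and the function name `split_scale_pca` are assumptions. The point it shows is that the scaling statistics and PCA loadings come from the training split only and are then reused on the test split.

```python
import numpy as np


def split_scale_pca(X, n_components, test_fraction=0.2, seed=0):
    """One 80/20 split, train-only standardization, and train-only PCA."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    order = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    test_idx, train_idx = order[:n_test], order[n_test:]
    X_train, X_test = X[train_idx], X[test_idx]

    # Scaling statistics are estimated on the training split only.
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    Z_train = (X_train - mean) / std
    Z_test = (X_test - mean) / std  # same scaling applied to the test split

    # PCA via SVD of the centered training matrix; the rows of vt are the
    # principal axes, so the first n_components of them form the loadings.
    _, _, vt = np.linalg.svd(Z_train, full_matrices=False)
    loadings = vt[:n_components].T

    # Project both splits with the training-set-specific loadings.
    return Z_train @ loadings, Z_test @ loadings, train_idx, test_idx


# Synthetic stand-in for an omics matrix (50 samples, 8 features).
X = np.random.default_rng(1).normal(size=(50, 8))
pcs_train, pcs_test, train_idx, test_idx = split_scale_pca(X, n_components=3)
print(pcs_train.shape, pcs_test.shape)
```

Repeating the call with 100 different seeds would give 100 train/test pairs in the spirit of the caption, each with its own scaling and loadings.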

**Figure 2**
Comparison of classification algorithms for five TCGA cancer datasets. Human cancer datasets assessed: breast invasive carcinoma (BRCA) versus matched normal tissue (normal), estrogen receptor positive (ERpos) versus estrogen receptor negative (ERneg) breast cancers, kidney renal clear cell carcinoma (KIRC) versus kidney renal papillary cell carcinoma (KIRP), lung adenocarcinoma (LUAD) versus lung squamous cell carcinoma (LUSC), and luminal A (LumA) versus luminal B (LumB) breast cancers. To address class imbalance for each comparison, algorithm performance is ranked by mean balanced accuracy on the x axis. By and large, the other metrics indicate the same performance ranking. Classification performance metrics were averaged for the 100 unique training and test sets for each model (see experimental procedures). Performance metrics: accuracy (red), AUC (green), balanced accuracy (blue), and F1 score (purple). Data are presented as the mean ± SEM.
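
As a small illustration of why balanced accuracy is the ranking metric under class imbalance, and of how a "mean ± SEM" summary can be formed, consider the sketch below. The labels and per-split scores are invented, and the SEM is computed from the sample standard deviation (n − 1 denominator), which is an assumption since this record does not state the estimator used.

```python
import math
import statistics


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def mean_and_sem(values):
    """Mean and standard error of the mean (sample std / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Invented imbalanced example: four samples of class 1, two of class 0.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_accuracy = balanced_accuracy(y_true, y_pred)

# Invented balanced accuracies from three hypothetical test splits.
split_scores = [0.8, 0.9, 1.0]
mean_score, sem_score = mean_and_sem(split_scores)
print(plain_accuracy, bal_accuracy, mean_score, sem_score)
```

Here the per-class recalls are 3/4 and 1/2, so the balanced accuracy is lower than the plain accuracy of 4/6, which is inflated by the majority class.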

**Figure 3**
Test set balanced accuracy for LumA versus LumB binomial classification with incremental decreases from 95% to 20% of the original training set. The algorithms evaluated are indicated in the legend. Averaged balanced accuracies were calculated for 50 independent training sets at each designated fraction of the original training data. Data are presented as the mean ± SEM.

**Figure 4**
Classification, hierarchical clustering, functional enrichment, and natural language processing of the top 44 genes of PC1 for LumA versus LumB binomial comparison. (A) Gene-level classification of LumA versus LumB human breast cancers based on the top 44 genes of PC1. Data are presented as the mean ± SEM. (B) Classical hierarchical clustering algorithm (see experimental procedures). Note: genes are presented in rows and samples in columns. (C) GOseq functional enrichment analysis of the top 44 genes for PC1 shows enriched GO terms ordered by p value. (D) Circos plot representing semantic search of full-text articles within the PubMed Central database identifying published associations of the top 44 genes for PC1 to the query terms *cancer* and *breast cancer*. The red and blue outer bands represent “mRNA” and “methylation” data types, respectively. The inner blue band represents genes with known functional annotation. The intensity of the inner purple ring indicates the total number of publications on cancer and breast cancer for the top 44 genes of PC1. This band has six colored bins, where white is the lowest and dark purple the highest number of publications at the time of analysis. The thickness and color of the Circos plot ribbons indicate the number of published gene-to-query term associations: green represents cancer and yellow designates breast cancer.
