Genet Epidemiol. 2017 Dec;41(8):779-789.
doi: 10.1002/gepi.22066. Epub 2017 Sep 14.

Analysis of cancer gene expression data with an assisted robust marker identification approach


Hao Chai et al. Genet Epidemiol. 2017 Dec.

Abstract

Gene expression (GE) studies play a critical role in cancer research. Despite tremendous effort, analysis results are often unsatisfactory because of weak signals and high data dimensionality. Analysis is further challenged by the long-tailed distributions of outcome variables. In recent multidimensional studies, data have been collected on GEs as well as on their regulators (e.g., copy number alterations (CNAs), methylation, and microRNAs), which can provide additional information on the associations between GEs and cancer outcomes. In this study, we develop an ARMI (assisted robust marker identification) approach for analyzing cancer studies with measurements on both GEs and regulators. The proposed approach borrows information from regulators and can be more effective than analyzing GE data alone. A robust objective function is adopted to accommodate long-tailed distributions, and marker identification is effectively realized using penalization. The proposed approach has an intuitive formulation and is computationally affordable. Simulations show satisfactory performance under a variety of settings. Analysis of TCGA (The Cancer Genome Atlas) data on melanoma and lung cancer leads to biologically plausible marker identification and superior prediction.

Keywords: assisted analysis; cancer; gene expression; robustness.
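The abstract does not state the ARMI objective, so the following is only a generic illustration of its two named ingredients: a robust loss for long-tailed outcomes and penalization for marker selection. It is a minimal sketch, assuming a Huber loss with an L1 (lasso) penalty fitted by proximal gradient descent (ISTA). The function names `huber_lasso` and `soft_threshold`, the parameters `delta` and `lam`, and the simulated data are hypothetical choices for illustration; this is not the authors' ARMI formulation and it does not borrow information from regulators.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def huber_psi(r, delta):
    """Derivative of the Huber loss: residuals clipped to [-delta, delta]."""
    return np.clip(r, -delta, delta)


def huber_lasso(X, y, lam, delta=1.0, n_iter=5000, tol=1e-8):
    """Minimize (1/n) * sum Huber_delta(y - b0 - X b) + lam * ||b||_1 via ISTA.

    Hypothetical illustration only; the intercept b0 is not penalized.
    """
    n, p = X.shape
    Xa = np.column_stack([np.ones(n), X])  # prepend an intercept column
    # The Huber gradient is Lipschitz with constant ||Xa||_2^2 / n since psi' <= 1,
    # so 1 / Lipschitz is a safe fixed step size.
    step = n / np.linalg.norm(Xa, 2) ** 2
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        r = y - Xa @ beta
        grad = -Xa.T @ huber_psi(r, delta) / n  # gradient of the robust loss
        z = beta - step * grad
        new = z.copy()
        new[1:] = soft_threshold(z[1:], step * lam)  # shrink slopes, not intercept
        if np.max(np.abs(new - beta)) < tol:
            beta = new
            break
        beta = new
    return beta[0], beta[1:]


# Simulated example (hypothetical data): 200 samples, 50 "genes",
# 3 true markers, and long-tailed Student-t noise with 2 degrees of freedom.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [2.0, -1.5, 1.0]
y = X @ true_beta + rng.standard_t(df=2, size=n)

b0, b = huber_lasso(X, y, lam=0.1)
selected = np.flatnonzero(b)  # indices of nonzero coefficients ("markers")
print(selected)
```

The Huber loss grows linearly rather than quadratically once a residual exceeds `delta`, so a few extreme outcome values cannot dominate the fit, while the L1 penalty sets many coefficients exactly to zero and the remaining nonzero entries act as the identified markers. Other robust losses, such as least absolute deviation, are also common choices for this role.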


Figures

Figure 1. Histograms of the outcome variables for the SKCM (skin cutaneous melanoma; left) and LUAD (lung adenocarcinoma; right) data.
