BMC Syst Biol. 2016 Dec 23;10(Suppl 4):118.
doi: 10.1186/s12918-016-0358-0.

High-dimensional omics data analysis using a variable screening protocol with prior knowledge integration (SKI)

Cong Liu et al. BMC Syst Biol. 2016.

Abstract

Background: High-throughput technology can generate thousands to millions of biomarker measurements in a single experiment. However, results from high-throughput analyses are often poorly reproducible because of small sample sizes. Various statistical methods have been proposed to tackle this "small n, large p" scenario; for example, different datasets can be pooled or integrated to improve reproducibility. However, the raw data are often either unavailable or hard to integrate because of differing experimental conditions, so there is an emerging need for a method of "knowledge integration" in high-throughput data analysis.

Results: In this study, we propose an integrative prescreening approach, SKI, for high-throughput data analysis. A new rank is generated for each variable from two initial ranks: (1) a knowledge-based rank; and (2) a marginal-correlation-based rank. Our simulations show that SKI outperforms methods without knowledge integration, achieving a higher true positive rate for the same number of selected variables. We also applied our method in a drug response study and found that it performed better than regular screening methods.

Conclusion: The proposed method provides an effective way to integrate knowledge for high-throughput analysis. It can be easily implemented with our provided R package, SKI.

Keywords: Dimension reduction; Knowledge integration; SKI; Sure independence screening; Variable selection.

Figures

Fig. 1
A brief description of the (i)SKI procedure. For each variable, two ranks are generated: one based on prior knowledge (R0), the other based on marginal correlation (R1). A predefined α (or one estimated from the dev. ratio) controls the weight of prior knowledge. Variables are then sorted by the weighted geometric mean of the two ranks. SKI first reduces the number of variables from p to d, and then a more sophisticated method such as SCAD is used to further refine the model to size d′ and estimate the parameters. iSKI updates the marginal-correlation-based rank (R1) by regressing the residuals on the remaining p − d′ variables. The procedure is repeated until the desired number of parameters is obtained.
Fig. 2
Boxplot of squared error for selumetinib response prediction using two methods. Whiskers indicate the minimum and maximum, the upper and lower box limits the 75th and 25th percentiles, and the line the median.
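
The screening step described in the Fig. 1 caption (two ranks per variable, combined by a weighted geometric mean, then the top d variables kept) can be illustrated with a minimal Python sketch. This is not the authors' R package. The function names, the use of absolute Pearson correlation as the marginal measure, the exponent convention (α on the knowledge-based rank, 1 − α on the correlation-based rank), and the tie-breaking rule are all assumptions made for illustration. The sketch also assumes no column of X is constant.

```python
import numpy as np


def rank_descending(scores):
    """Return ranks where 1 marks the largest score (ties broken by position)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranks = np.empty(len(order), dtype=int)
    ranks[order] = np.arange(1, len(order) + 1)
    return ranks


def ski_screen(X, y, prior_scores, alpha, d):
    """Hypothetical screening step: keep the d variables with the best combined rank.

    X            : (n, p) data matrix, assumed to have no constant columns
    y            : (n,) response
    prior_scores : (p,) prior-knowledge scores, larger = more plausible (assumed)
    alpha        : weight of prior knowledge in [0, 1] (assumed convention)
    d            : number of variables to keep
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Marginal correlation of each column with the response (absolute Pearson r).
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())

    r0 = rank_descending(prior_scores)  # knowledge-based rank
    r1 = rank_descending(corr)          # marginal-correlation-based rank

    # Log of the weighted geometric mean r0**alpha * r1**(1 - alpha);
    # sorting by the log gives the same order as sorting by the mean itself.
    combined = alpha * np.log(r0) + (1.0 - alpha) * np.log(r1)

    # Smaller combined rank is better; return the indices of the top d variables.
    return np.argsort(combined, kind="stable")[:d]
```

With alpha = 0 the sketch reduces to plain marginal-correlation screening, and with alpha = 1 it orders variables by prior knowledge alone. Intermediate values trade the two off.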
