Sci Rep. 2014 May 28;4:5081.
doi: 10.1038/srep05081.

Detecting a weak association by testing its multiple perturbations: a data mining approach

Min-Tzu Lo et al. Sci Rep.

Abstract

Many risk factors and interventions in epidemiologic and biomedical studies have minuscule effects. To detect such weak associations, one needs a study with a very large sample size (the number of subjects, n). The n of a study can be increased, but unfortunately only to an extent. Here, we propose a novel method that hinges on increasing sample size in a different direction: the total number of variables (p). We construct a p-based 'multiple perturbation test' (MPT), and conduct power calculations and computer simulations to show that it can achieve very high power to detect weak associations when p can be made very large. As a demonstration, we apply the method to a genome-wide association study of age-related macular degeneration and identify two novel genetic variants that are significantly associated with the disease. The p-based method may set the stage for a new paradigm of statistical tests.

Figures

Figure 1. Power of the MPT for the sharp null (solid lines: theoretical power assuming independent auxiliary variables, with perturbation proportions of, from left to right, π = 1.0, 0.2, 0.1 and 0.05) and of the conventional test for the crude null (dashed line), for different numbers of subjects (a: n = 500; b: n = 1,000; c: n = 5,000) and numbers of auxiliary variables. The power of the n-based conventional test increases with n, but the gain is only 30 percentage points, from 8% (n = 500, a) to 38% (n = 5,000, c). The power of the p-based MPT increases with p in all scenarios considered and surpasses that of the conventional test when p ≈ 3,000 for π = 1, p ≈ 60,000 for π = 0.2, p ≈ 250,000 for π = 0.1 and p ≈ 1,000,000 for π = 0.05. Under π = 1, the power of the MPT can reach nearly 100% when p is sufficiently large (p > ~1,000,000 when n = 500; p > ~100,000 when n = 1,000; p > ~10,000 when n = 5,000). Under π < 1, ~100% power is also possible if p can be made even larger.
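The slow growth of n-based power described in the Figure 1 caption can be illustrated with a textbook power formula. The sketch below is not the paper's calculation: it assumes a two-sided z-test at α = 0.05, and both the function name `z_test_power` and the standardized effect size `ASSUMED_EFFECT` are hypothetical, the latter chosen only so that the output lands near the 8% and 38% quoted in the caption.

```python
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided z-test for a standardized effect with n subjects.

    Illustrative only: this is a generic normal-approximation formula,
    not the test statistic used in the paper.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = effect_size * n ** 0.5              # mean of the statistic under the alternative
    # Probability of falling in either rejection tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical effect size (an assumption, not taken from the paper).
ASSUMED_EFFECT = 0.0234

for n in (500, 1000, 5000):
    print(n, round(z_test_power(ASSUMED_EFFECT, n), 3))
```

Because the shift grows only with the square root of n, a tenfold increase in subjects buys a modest gain in power for an effect this small, which is the limitation the p-based approach is meant to sidestep.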
Figure 2. Fixation (a–c, for the top three SNPs on chromosome 1, in rank order) and drifting (d–f, for three purposefully chosen middle-to-bottom ranking SNPs on chromosome 1) of the MPT P-values when only a certain number of perturbation SNPs are randomly incorporated, for the age-related macular degeneration data. Each panel includes three lines (solid, dashed and dotted) representing three random incorporation sequences.
Each P-value is obtained from 1,000,000 rounds of permutation. The P-values initially fluctuate considerably while the number of perturbation SNPs incorporated is small. Beyond a certain point, however, they become 'fixed' exactly on the abscissa (P-value = 0) (a and b), or almost so (P-value ≈ 0) (c). By comparison, the P-values of all three purposefully chosen middle-to-bottom ranking SNPs keep 'drifting' without showing any sign of fixation (d–f).
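For readers unfamiliar with permutation P-values, a minimal generic sketch follows. It is not the MPT: the statistic (a difference in group means) and the function name `permutation_p_value` are assumptions made for illustration. It does show why an estimated P-value can sit exactly at 0, as in panels a and b: the estimate is the fraction of permutation rounds whose statistic is at least as extreme as the observed one, and that fraction is 0 when no round qualifies.

```python
import random


def permutation_p_value(x, y, rounds=1000, seed=0):
    """Two-sided permutation P-value for the difference in mean of x
    between the groups y == 1 and y == 0.

    Illustrative statistic only; both groups must be non-empty.
    """
    def mean_diff(labels):
        g1 = [xi for xi, yi in zip(x, labels) if yi == 1]
        g0 = [xi for xi, yi in zip(x, labels) if yi == 0]
        return sum(g1) / len(g1) - sum(g0) / len(g0)

    rng = random.Random(seed)
    observed = abs(mean_diff(y))
    labels = list(y)  # copy so the caller's labels are not shuffled
    hits = 0
    for _ in range(rounds):
        rng.shuffle(labels)  # break any association between x and y
        if abs(mean_diff(labels)) >= observed:
            hits += 1
    # Plain proportion: this can be exactly 0 when no round is as extreme.
    return hits / rounds


# Toy data (hypothetical): x separates the two groups perfectly.
x = [0.0] * 10 + [1.0] * 10
y = [0] * 10 + [1] * 10
print(permutation_p_value(x, y, rounds=1000))
```

With a finite number of rounds, an estimate of 0 is better read as "smaller than about one over the number of rounds" than as a literal zero probability.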
Figure 3. Power curve when a researcher first includes the 100 informative variables (I = 0.02) known to them and then adds other low-informativity variables unselectively to the MPT (dotted lines, from left to right, for I = 0.001, 0.00025 and 0.0001, respectively).
