Bioinformatics. 2019 Jul 15;35(14):2441-2448.
doi: 10.1093/bioinformatics/bty1005.

Accurate and efficient estimation of small P-values with the cross-entropy method: applications in genomic data analysis

Yang Shi et al. Bioinformatics.

Abstract

Motivation: Small P-values are often required to be accurately estimated in large-scale genomic studies for the adjustment of multiple hypothesis tests and the ranking of genomic features based on their statistical significance. For those complicated test statistics whose cumulative distribution functions are analytically intractable, existing methods usually do not work well with small P-values due to lack of accuracy or computational restrictions. We propose a general approach for accurately and efficiently estimating small P-values for a broad range of complicated test statistics based on the principle of the cross-entropy method and Markov chain Monte Carlo sampling techniques.

Results: We evaluate the performance of the proposed algorithm through simulations and demonstrate its application to three real-world examples in genomic studies. The results show that our approach can accurately evaluate small to extremely small P-values (e.g. 10^-6 to 10^-100). The proposed algorithm is helpful for the improvement of some existing test procedures and the development of new test procedures in genomic studies.

Availability and implementation: R programs for implementing the algorithm and reproducing the results are available at: https://github.com/shilab2017/MCMC-CE-codes.

Supplementary information: Supplementary data are available at Bioinformatics online.
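
Illustration: the cross-entropy principle named in the abstract can be shown on a toy problem. The sketch below is not the authors' MCMC-CE algorithm (it uses no Markov chain Monte Carlo step and is written in Python rather than the R of their repository). It assumes a standard normal test statistic, chosen only because its exact tail probability is known, and a proposal family N(mu, 1). The function names, sample size and quantile fraction are hypothetical choices made for this example. The idea is to move the proposal mean toward the tail through a sequence of intermediate levels, then estimate the tail probability by importance sampling under the tuned proposal.

```python
import math
import random


def normal_tail_ce(gamma, n=20000, rho=0.1, seed=0, max_iter=50):
    """Estimate P(Z > gamma) for Z ~ N(0, 1) by cross-entropy importance sampling."""
    rng = random.Random(seed)
    mu = 0.0  # mean of the N(mu, 1) proposal; start at the nominal density

    def weight(x, m):
        # Likelihood ratio of the nominal N(0, 1) density to the N(m, 1) proposal.
        return math.exp(-m * x + 0.5 * m * m)

    for _ in range(max_iter):
        xs = sorted(rng.gauss(mu, 1.0) for _ in range(n))
        # Intermediate level: the (1 - rho) sample quantile, capped at gamma.
        level = min(gamma, xs[int((1.0 - rho) * n)])
        elite = [x for x in xs if x >= level]
        ws = [weight(x, mu) for x in elite]  # weights use the current proposal
        # Cross-entropy update for a normal mean: weighted average of elite samples.
        mu = sum(w * x for w, x in zip(ws, elite)) / sum(ws)
        if level >= gamma:
            break

    # Final importance-sampling estimate under the tuned proposal.
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu, 1.0)
        if x > gamma:
            total += weight(x, mu)
    return total / n


def normal_tail_exact(gamma):
    """Exact upper tail probability of the standard normal, for comparison."""
    return 0.5 * math.erfc(gamma / math.sqrt(2.0))


estimate = normal_tail_ce(6.0)
exact = normal_tail_exact(6.0)
print(f"CE estimate: {estimate:.3e}   exact: {exact:.3e}")
```

The exact value at gamma = 6 is roughly 9.87e-10. Plain Monte Carlo with the same 20,000 draws would almost surely observe no exceedances at that level, which is the difficulty the abstract describes for small P-values.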

Figures

Fig. 1. Concordance between the true tail probabilities of the chi-squared random variables and the results from MCMC-CE. In each figure panel, the solid line represents the true tail probabilities and the dots represent the ones estimated from MCMC-CE. The detailed results are presented in Supplementary Tables S1–S4.
