Bioinformatics. 2024 Dec 26;41(1):btaf017.

doi: 10.1093/bioinformatics/btaf017.

Funmap: integrating high-dimensional functional annotations to improve fine-mapping

Yuekai Li¹, Jiashun Xiao², Jingsi Ming³, Yicheng Zeng², Mingxuan Cai¹

Affiliations

¹ Department of Biostatistics, City University of Hong Kong, Hong Kong, China.
² Shenzhen International Center for Industrial and Applied Mathematics, Shenzhen Research Institute of Big Data, Shenzhen 518172, China.
³ Academy of Statistics and Interdisciplinary Sciences, KLATASDS-MOE, East China Normal University, Shanghai 200062, China.

PMID: 39799513
PMCID: PMC11769679
DOI: 10.1093/bioinformatics/btaf017

Yuekai Li et al. Bioinformatics. 2024.

Abstract

Motivation: Fine-mapping aims to prioritize causal variants underlying complex traits by accounting for linkage disequilibrium at genome-wide association study (GWAS) risk loci. The expanding resources of functional annotations serve as auxiliary evidence to improve the power of fine-mapping. However, existing fine-mapping methods tend to generate many false positive results when integrating a large number of annotations.

Results: In this study, we propose Funmap, a unified method that integrates high-dimensional functional annotations with fine-mapping. Funmap can effectively improve the power of fine-mapping by borrowing information from hundreds of functional annotations. Meanwhile, it relates the annotations to the causal probability through a random-effects model that avoids over-fitting, thereby producing a well-controlled false positive rate. Paired with a fast algorithm, Funmap enables scalable integration of a large number of annotations to facilitate prioritizing multiple causal single nucleotide polymorphisms (SNPs). Our comprehensive simulations across a wide range of annotation relevance settings demonstrate that Funmap is the only method that produces a well-calibrated false discovery rate (FDR) in the high-dimensional annotation setting, while achieving power gains better than or comparable to those of existing methods. By integrating genome-wide association studies of four lipid traits with 187 functional annotations, Funmap consistently identified more variants that could be replicated in an independent cohort, achieving a 15.5%–26.2% improvement over the runner-up in terms of replication rate.

Availability and implementation: The Funmap software and all analysis code are available at https://github.com/LeeHITsz/Funmap.
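The abstract does not spell out Funmap's model, so the following is only a minimal sketch under stated assumptions of how annotations can shift a fine-mapping prior. It assumes the prior causal probability of each SNP is a softmax of a linear combination of its annotations (weights viewed as random effects shrunk toward zero), and plugs that prior into a SuSiE-style single-effect posterior computed from z-scores with a Wakefield-style approximate Bayes factor. The softmax link, the variance ratio `r`, the toy data, and all function names are hypothetical; `p` SNPs by `m` annotations is this sketch's own notation.

```python
import math


def softmax(scores):
    """Numerically stable softmax: shift by the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def annotation_prior(annotations, weights):
    """Hypothetical annotation-informed prior: softmax of the linear score A @ w.

    annotations: one annotation vector per SNP (p rows, m columns).
    weights: m annotation effects. Treating them as random effects,
    w ~ N(0, tau^2 I), shrinks them toward zero and so pulls the prior
    toward uniform (assumed model, not necessarily Funmap's).
    """
    scores = [sum(a * w for a, w in zip(row, weights)) for row in annotations]
    return softmax(scores)


def log_abf(z, r):
    """Wakefield-style log approximate Bayes factor for one SNP's z-score.

    r is the assumed ratio of prior effect variance to sampling variance.
    """
    return 0.5 * math.log(1.0 / (1.0 + r)) + 0.5 * z * z * r / (1.0 + r)


def single_effect_posterior(z_scores, prior, r=10.0):
    """Posterior that each SNP is the single causal variant:
    alpha_j proportional to prior_j * BF_j, normalized in log space."""
    log_terms = [math.log(p) + log_abf(z, r) for p, z in zip(prior, z_scores)]
    return softmax(log_terms)


# Toy locus: SNPs 1 and 2 have similar z-scores; only SNP 2 carries annotation 0.
z = [1.0, 4.5, 4.3, 0.2]
A = [[0, 1], [0, 0], [1, 1], [0, 0]]

uniform = annotation_prior(A, [0.0, 0.0])   # w = 0: annotations ignored
informed = annotation_prior(A, [1.5, 0.0])  # hypothetical weight on annotation 0
alpha_uniform = single_effect_posterior(z, uniform)
alpha_informed = single_effect_posterior(z, informed)
print([round(a, 3) for a in alpha_uniform])
print([round(a, 3) for a in alpha_informed])
```

With `w = 0` the prior is uniform and SNP 1 (largest |z|) gets the most posterior mass; a positive weight on annotation 0 moves most of the mass to the annotated SNP 2. Keeping the weights shrunk toward zero limits how far many weakly relevant annotations can move the prior.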

Figures

**Figure 1.**
Comparison of FDR control in simulation studies. (a, b) Calibration of FDR with $n = 50\,000$, $m = 100$, where the number of causal SNPs is set to $L_{0} = 2$ (a) and $L_{0} = 3$ (b). Results are summarized from 500 replications across 10 regions. (c) An illustrative example generated by simulation. The first column shows the absolute correlation between the two candidate causal SNPs and their neighboring SNPs, together with the Manhattan plot. The second to fourth columns show the posterior inclusion probabilities (PIPs) obtained by the compared methods. Red dots represent causal SNPs. Dots with the same outline color represent SNPs in the same 95%-level credible set of a causal signal.
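The captions refer to 95%-level credible sets. In SuSiE-style fine-mapping, a level-ρ credible set for one signal is commonly built by ranking SNPs by their posterior probability of being that signal's causal variant and adding them until the cumulative probability reaches ρ. The sketch below shows only that construction; the function name and toy probabilities are illustrative, it is not necessarily how the Funmap software builds its sets, and SuSiE's additional purity filter (a minimum absolute correlation among set members) is omitted.

```python
def credible_set(alpha, level=0.95):
    """Smallest set of SNPs whose posterior probabilities sum to >= level.

    alpha: posterior probabilities (summing to 1) that each SNP is the causal
    variant for one signal, e.g. from a single-effect model.
    Returns SNP indices ordered from most to least probable.
    """
    order = sorted(range(len(alpha)), key=lambda j: alpha[j], reverse=True)
    members, total = [], 0.0
    for j in order:
        members.append(j)
        total += alpha[j]
        if total >= level:
            break
    return members


alpha = [0.02, 0.60, 0.30, 0.06, 0.02]
print(credible_set(alpha))             # 95%-level set
print(credible_set(alpha, level=0.8))  # a looser set for comparison
```

The size of these sets is what Figures 3(d) and 5(a) summarize: a sharper posterior concentrates mass on fewer SNPs and yields a smaller set.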

**Figure 2.**
Comparison of statistical power in simulation studies. (a, b) Statistical power of the compared methods with $n = 50\,000$, $m = 100$, where the number of causal SNPs is set to $L_{0} = 2$ (a) and $L_{0} = 3$ (b). (c) An illustrative example generated by simulation.
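Figures 1 and 2 assess FDR calibration and power, but the captions do not say how SNPs are declared causal. The sketch below assumes a simple PIP threshold (0.5 here, a hypothetical choice) and computes the realized FDR, the FDR implied by the PIPs themselves, and power as the fraction of true causal SNPs that are called. The function name and toy data are illustrative only.

```python
def fdr_and_power(pips, is_causal, threshold=0.5):
    """Call SNPs with PIP >= threshold causal and score the calls.

    Returns (empirical_fdr, expected_fdr, power), where expected_fdr is the
    mean of (1 - PIP) over called SNPs. For a well-calibrated method the
    empirical and expected FDR should be close on average.
    """
    called = [j for j, p in enumerate(pips) if p >= threshold]
    n_causal = sum(1 for c in is_causal if c)
    if not called:
        # No discoveries: FDR is conventionally taken as 0, and power is 0.
        return 0.0, 0.0, 0.0
    n_false = sum(1 for j in called if not is_causal[j])
    empirical_fdr = n_false / len(called)
    expected_fdr = sum(1.0 - pips[j] for j in called) / len(called)
    power = (len(called) - n_false) / n_causal if n_causal else 0.0
    return empirical_fdr, expected_fdr, power


pips = [0.95, 0.90, 0.60, 0.30, 0.05]
is_causal = [True, True, False, True, False]
print(fdr_and_power(pips, is_causal))
```

Averaging these quantities over many simulated replicates, rather than reading a single toy locus, is what makes a calibration comparison meaningful.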

**Figure 3.**
Comparison of PIPs and CPU timings. (a, b) Comparison of PIPs between Funmap and SuSiE (left panel), CARMA+anno (middle panel), and PAINTOR+anno (right panel) with $n = 50\,000$, $m = 100$, with $L_{0}$ set to 2 (a) and 3 (b). (c) CPU timings for increasing $p$ with $m = 100$ (left panel) and for increasing $m$ with $p = 1833$ (right panel). (d) Box plots of the size of the 95% credible sets from the simulation results with $n = 50\,000$, $m = 100$, $L_{0} \in \{2, 3\}$.

**Figure 4.**
Replication analysis of Funmap, CARMA+anno, and PAINTOR+anno. The bar charts on the top show the fraction and number of newly identified SNPs with $P < 5 \times 10^{-8}$ in the replication cohorts of the GLGC GWAS. The bar charts on the bottom show the fraction and number of newly identified SNPs that are included in the 95%-level credible sets generated by SuSiE from the GLGC GWAS.

**Figure 5.**
Comparison of credible set size and fine-mapping results from a region of the TC GWAS. (a) Box plots of credible set size across four lipid traits. (b) Fine-mapping results for TC at the 6–7 Mb locus on chromosome 8. The first column shows the heatmap of absolute correlation between rs2928617 and its neighboring SNPs, together with the Manhattan plot; the red dashed line marks the $5 \times 10^{-8}$ significance threshold. The second to fourth columns show the PIPs obtained by the compared methods. The purple square represents SNP rs2928617, and the color of each point represents its correlation with rs2928617. Dots with the same outline color represent SNPs in the same 95%-level credible set of a causal signal.

**Figure 6.**
Box plots of Funmap annotation importance scores across 864 genomic regions of four lipid traits (190–374 regions per trait).

Cited by

Genome-wide iterative fine-mapping for non-Gaussian phenotypes.
Xu S, Williams J, Tegge A, Ferreira MAR. Sci Rep. 2025 Aug 17;15(1):30080. doi: 10.1038/s41598-025-09270-x. PMID: 40820093.

Grants and funding

12301383/National Natural Science Foundation of China
