Am J Hum Genet. 2018 May 3;102(5):920-942.
doi: 10.1016/j.ajhg.2018.03.026.

FUN-LDA: A Latent Dirichlet Allocation Model for Predicting Tissue-Specific Functional Effects of Noncoding Variation: Methods and Applications

Daniel Backenroth et al. Am J Hum Genet.

Abstract

We describe a method based on a latent Dirichlet allocation model for predicting functional effects of noncoding genetic variants in a cell-type- and/or tissue-specific way (FUN-LDA). Using this unsupervised approach, we predict tissue-specific functional effects for every position in the human genome in 127 different tissues and cell types. We demonstrate the usefulness of our predictions by using several validation experiments. Using eQTL data from several sources, including the GTEx project, Geuvadis project, and TwinsUK cohort, we show that eQTLs in specific tissues tend to be most enriched among the predicted functional variants in relevant tissues in Roadmap. We further show how these integrated functional scores can be used for (1) deriving the most likely cell or tissue type causally implicated for a complex trait by using summary statistics from genome-wide association studies and (2) estimating a tissue-based correlation matrix of various complex traits. We found large enrichment of heritability in functional components of relevant tissues for various complex traits, and FUN-LDA yielded higher enrichment estimates than existing methods. Finally, using experimentally validated functional variants from the literature and variants possibly implicated in disease by previous studies, we rigorously compare FUN-LDA with state-of-the-art functional annotation methods and show that FUN-LDA has better prediction accuracy and higher resolution than these methods. In particular, our results suggest that tissue- and cell-type-specific functional prediction methods tend to have substantially better prediction accuracy than organism-level prediction methods. Scores for each position in the human genome and for each ENCODE and Roadmap tissue are available online (see Web Resources).

Keywords: functional genomics; noncoding variation; prediction of functional effect.

Figures

Figure 1
Jaccard Index of Overlap among Predicted Functional Variants in Different Cell Types and Tissues in Roadmap Epigenomics. Hierarchical clustering is used for clustering the different cell types and tissues.

Figure 2
Tissue Correlations for 21 Common Traits. Hierarchical clustering (average linkage method) is used for clustering diseases. The x symbol indicates that those correlations are not significant at the 0.0001 level.

Figure 3
Functional Predictions from Different Methods. Valley scores for four activating histone marks and DNase, posterior probabilities from FUN-LDA, GenoSkyline, and ChromHMM, and segmentations from ChromHMM, IDEAS, and Segway are shown in 2 kb windows centered around the lead SNPs. For clarity, we highlight in the segmentations only the type of states we consider functional (enhancer states in red and promoter states in blue) for the different segmentation approaches.
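
The Jaccard index named in the Figure 1 caption can be illustrated with a minimal sketch. It assumes each tissue's predicted functional variants are available as a set of genomic positions. The function names, tissue labels, and toy positions below are hypothetical and are not taken from the paper or its software.

```python
from itertools import combinations


def jaccard_index(a, b):
    """Return |a ∩ b| / |a ∪ b| for two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pairwise_jaccard(calls_by_tissue):
    """Map each unordered tissue pair to the Jaccard index of its call sets."""
    return {
        (t1, t2): jaccard_index(calls_by_tissue[t1], calls_by_tissue[t2])
        for t1, t2 in combinations(sorted(calls_by_tissue), 2)
    }


# Hypothetical toy data: positions predicted functional in each tissue.
toy_calls = {
    "tissue_A": {101, 102, 103, 104},
    "tissue_B": {103, 104, 105},
    "tissue_C": {900, 901},
}

overlaps = pairwise_jaccard(toy_calls)
for pair, value in overlaps.items():
    print(pair, round(value, 3))
```

One plausible way to feed such values into hierarchical clustering is to use 1 minus the Jaccard index as the distance between tissues. The caption does not state which distance or linkage was used for Figure 1, so that choice is an assumption here.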
