GARLIC: a bioinformatic toolkit for aetiologically connecting diseases and cell type-specific regulatory maps

Miloš Nikolic¹,², Argyris Papantonis¹, Alvaro Rada-Iglesias¹,²

Affiliations

¹ Center for Molecular Medicine Cologne (CMMC), Robert-Koch-Str. 21, 50931 Cologne, Germany.
² The Cologne Cluster of Excellence in Cellular Stress Responses in Aging-associated Diseases (CECAD), Joseph-Stelzmann-Straße 26, 50931 Cologne, Germany.

PMID: 28007912
PMCID: PMC5409087
DOI: 10.1093/hmg/ddw423

Hum Mol Genet. 2017 Feb 15;26(4):742-752.

Abstract

Genome-wide association studies (GWAS) have emerged as a powerful tool to uncover the genetic basis of common human diseases, which often show a complex, polygenic and multi-factorial aetiology. These studies have revealed that 70-90% of all single nucleotide polymorphisms (SNPs) associated with common complex diseases do not occur within genes (i.e. they are non-coding), making the discovery of disease-causative genetic variants and the elucidation of the underlying pathological mechanisms far from straightforward. Based on emerging evidence suggesting that disease-associated SNPs are frequently found within cell type-specific regulatory sequences, here we present GARLIC (GWAS-based Prediction Toolkit for Connecting Diseases and Cell Types), a user-friendly, multi-purpose software package with an associated database and online viewer that, using global maps of cis-regulatory elements, can aetiologically connect human diseases with relevant cell types. Additionally, GARLIC can be used to retrieve potential disease-causative genetic variants overlapping regulatory sequences of interest. Overall, GARLIC can satisfy several important needs within the field of medical genetics, thus potentially assisting in the ultimate goal of uncovering the elusive and complex genetic basis of common human disorders.

Figures

**Figure 1.**
Overview of the GARLIC rationale and the included datasets. (A) SNPs causally involved in common complex diseases are predicted to occur within cis-regulatory elements (CREs) present in disease-relevant cell types. These SNPs can alter the regulatory properties of CREs, which can lead to quantitative changes in gene expression and increased disease susceptibility. (B) The major hypothesis underlying GARLIC is that the regulatory maps from the cell types or tissues most relevant for a given disease should be preferentially enriched in disease-associated SNPs in comparison with non-relevant cell types.

**Figure 2.**
GARLIC can be used to aetiologically connect human complex diseases and cell type-specific CRE maps. (A) GARLIC results obtained for a selected subset of diseases (rows) and CRE maps (columns) are shown as a heat map. The statistical connection between each disease and cell type is color coded according to GARLIC P-values, with the most and least significant connections represented in red and blue, respectively. (B) Radial plot summarizing the statistical connection between four selected diseases (indicated in the bottom left corner) and all cell types included in the GARLIC DB. Only a subset of the investigated cell types is labeled by name. Peaks closer to the outer border of the radial plot represent more significant connections, while those closer to the center represent the least significant ones.

**Figure 3.**
Identification of SNPs located within CREs of interest. SNPs overlapping CREs, together with their associated diseases, can be retrieved using as input either (A) regulatory maps or (B) a single locus of interest.
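
The retrieval step described in this caption amounts to an interval-overlap query. The following is a minimal Python sketch of that idea, not GARLIC's own implementation: the function names, the tuple layouts, the half-open 0-based coordinate convention and the SNP identifiers in the usage note are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_regions(regions):
    """Merge overlapping (chrom, start, end) regions per chromosome.

    Coordinates are assumed to be 0-based and half-open: [start, end).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlapping or adjacent: extend the current interval.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = out
    return merged


def snps_in_regions(snps, regions):
    """Return IDs of SNPs (chrom, pos, snp_id) that fall inside any region."""
    merged = merge_regions(regions)
    starts = {c: [s for s, _ in ivs] for c, ivs in merged.items()}

    hits = []
    for chrom, pos, snp_id in snps:
        if chrom not in merged:
            continue
        # Index of the last merged interval starting at or before pos.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < merged[chrom][i][1]:
            hits.append(snp_id)
    return hits
```

For example, with hypothetical regions `[("chr1", 100, 200), ("chr1", 150, 300)]`, a SNP at `chr1:299` is reported and one at `chr1:300` is not, because the interval end is exclusive under the convention assumed here.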

**Figure 4.**
Graphical overview of GARLIC procedures. (A) As part of the data preprocessing, unique sets of GRRs for each disease and trait are generated. This is illustrated when either a single (Left) or multiple L-SNPs in LD (Right) are considered. (B) Each GRR gets assigned a GRR length and number of overlapping bp with a given regulatory map. The 20% shortest and longest GRRs are excluded and the remaining 60% of GRRs are used to calculate disease scores from which empirical P-values can be derived using the random sampling procedure. (C) Each seed map gets assigned a set of “complementary” regulatory maps, which are then used to generate “combined” maps. The number of overlapping bp with a given disease or trait of interest is calculated for each combination and only the one with the highest increase in coverage is kept for the next step of the procedure. (D) Seed combinations with the highest increase in bp coverage can then be tested for statistical connection with human diseases/traits using the same method employed with individual CRE maps. The number of seed combinations to be tested can be determined with input parameters based on coverage increase.
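
Panel B can be illustrated with a short Python sketch. It covers only the two steps the caption states: dropping the 20% shortest and 20% longest GRRs, and deriving an empirical P-value by random sampling. The caption does not say how the disease score is computed or what the random samples are drawn from, so the score used here (fraction of GRR base pairs overlapping the regulatory map), the background pool of regions, and all function and field names are assumptions for illustration only.

```python
import random


def trim_by_length(grrs, frac=0.2):
    """Drop the `frac` shortest and `frac` longest regions, keeping the middle."""
    ordered = sorted(grrs, key=lambda g: g["length"])
    k = int(len(ordered) * frac)
    return ordered[k:len(ordered) - k]


def overlap_score(grrs):
    """Assumed score: overlapping bp divided by total region length."""
    total = sum(g["length"] for g in grrs)
    if total == 0:
        return 0.0
    return sum(g["overlap_bp"] for g in grrs) / total


def empirical_p(observed, background, n_regions, n_iter=1000, seed=0):
    """Fraction of random region sets scoring at least as high as `observed`.

    `background` is an assumed pool of comparable regions; it must contain
    at least `n_regions` entries. One is added to the numerator and the
    denominator so that the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        sample = rng.sample(background, n_regions)
        if overlap_score(sample) >= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)
```

In use, one would call `trim_by_length` on the GRRs of a disease, score the retained regions with `overlap_score`, and pass that score to `empirical_p` together with a background pool of matching size. The fixed `seed` keeps the sketch reproducible; the number of iterations bounds the smallest P-value that can be reported.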
