Comparative Study

Theor Biol Med Model. 2008 Sep 10;5:21.

doi: 10.1186/1742-4682-5-21.

Feature context-dependency and complexity-reduction in probability landscapes for integrative genomics

Annick Lesne¹, Arndt Benecke

Affiliation

¹ Institut des Hautes Etudes Scientifiques, Bures-sur-Yvette, France. lesne@ihes.fr

PMID: 18783599
PMCID: PMC2559821
DOI: 10.1186/1742-4682-5-21

Abstract

Background: The question of how to integrate heterogeneous sources of biological information into a coherent framework that allows the gene regulatory code in eukaryotes to be systematically investigated is one of the major challenges faced by systems biology. Probability landscapes, which take the probabilistic representation of the genomic sequence as their reference set, have been proposed as a possible approach to the systematic discovery and analysis of correlations among initially heterogeneous and unrelatable descriptions and genome-wide measurements. Much of the available experimental sequence and genome activity information is de facto, but not necessarily obviously, context-dependent. Furthermore, the context-dependency of the relevant information itself depends on the biological question addressed. It is therefore necessary to develop a systematic way of discovering the context-dependency of functional genomics information in a flexible, question-dependent manner.

Results: We demonstrate here how feature context-dependency can be systematically investigated using probability landscapes. Furthermore, we show how different feature probability profiles can be conditionally collapsed to reduce the computational and formal mathematical complexity of probability landscapes. Interestingly, the possibility of complexity reduction can be linked directly to the analysis of context-dependency.

Conclusion: These two advances in our understanding of the properties of probability landscapes not only simplify subsequent cross-correlation analysis in hypothesis-driven model building and testing, but also provide additional insights into the biological gene regulatory problems studied. Furthermore, insights into the nature of individual features and a classification of features according to their minimal context-dependency are achieved. The formal structure proposed contributes to a concrete and tangible basis for attempting to formulate novel mathematical structures for describing gene regulation in eukaryotes on a genome-wide scale.


Figures

**Figure 1**
**Investigating context-dependency**. Point-wise comparison, at a given genome location (the box highlights location n+2), of the probability profiles of a feature X obtained in condition B and under various additional prescriptions $C_i$ ($i = 1, 2, 3$) with the joint profile constructed from the pooled data. For brevity, we denote $P_{n}^{(i)} = P_{n}^{(X | B, C_{i})}$ and $P_{P_{n}}^{(i)} = P_{P_{n}}^{(X | B, C_{i})}$; the latter are the 'probabilities of probability', i.e. the functional distributions describing the estimated variability of the distributions $P_{n}^{(i)}$. The comparison aims at determining whether the conditions $C_i$ provide additional information on X and decrease its indeterminacy, or whether they can be ignored and the analysis performed on the pooled data. Essential conditions define the 'context' of the feature X.
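The pooled ('joint') profile at a single position can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each sub-condition $C_i$ supplies raw counts of the states of X observed at position n, and the dictionary input format and the function names are hypothetical. Pooling the raw observations is equivalent to taking the count-weighted mixture of the conditional distributions $P_{n}^{(i)}$.

```python
from collections import Counter

def normalize(counts):
    """Turn a mapping {state: count} into a probability distribution."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations")
    return {state: c / total for state, c in counts.items()}

def conditional_profiles(counts_by_condition):
    """P_n^{(X | B, C_i)} for every sub-condition C_i (hypothetical input format)."""
    return {cond: normalize(c) for cond, c in counts_by_condition.items()}

def pooled_profile(counts_by_condition):
    """P_n^{(X | B)}: pool the raw observations of all sub-conditions C_i.

    Equivalent to the mixture of the conditional profiles weighted by the
    number of observations in each sub-condition."""
    pooled = Counter()
    for counts in counts_by_condition.values():
        pooled.update(counts)  # Counter.update adds counts
    return normalize(pooled)

counts = {"C1": {"A": 8, "C": 2}, "C2": {"A": 2, "C": 8}, "C3": {"A": 5, "C": 5}}
print(pooled_profile(counts))  # {'A': 0.5, 'C': 0.5}
```

With equal numbers of observations per condition the pooled profile is simply the average of the conditional ones. Here the opposed profiles of C1 and C2 average out, which is precisely the situation in which ignoring the conditions would discard information about X.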

**Figure 2**
**Defining local distance measures between probability profiles**. For the methodology to be valid and its results unambiguously interpretable, it is essential to proceed hierarchically and to compare the distributions obtained from restricted groups of data, in conditions $B \land C_1$ and $B \land C_2$ respectively, with the distribution obtained in the common biological condition $B$ (pooled data). Each comparison is based on the Kullback-Leibler divergence $D_{n}^{(X)} (B \land C_{i} | B)$ between the distributions $P_{n}^{(X | B \land C_{i})}$ and $P_{n}^{(X | B)}$. The significance of the comparison result depends on the variability of the distribution, described by the functional distribution $P_{P_{n}}^{(X | B \land C_{i})}$.
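The point-wise divergence in this caption is the standard Kullback-Leibler divergence, $D_{n}^{(X)}(B \land C_i | B) = \sum_x P_{n}^{(X | B \land C_i)}(x) \log \frac{P_{n}^{(X | B \land C_i)}(x)}{P_{n}^{(X | B)}(x)}$. A minimal sketch for discrete distributions stored as dictionaries follows; the base-2 logarithm (divergence in bits) and the function name are assumptions, since the caption fixes neither.

```python
import math

def kl_divergence(p, q, base=2.0):
    """D(p || q) for discrete distributions given as {state: probability}."""
    d = 0.0
    for state, px in p.items():
        if px == 0:
            continue  # 0 * log(0 / q) is taken as 0
        qx = q.get(state, 0.0)
        if qx == 0:
            return math.inf  # p puts mass where q has none
        d += px * math.log(px / qx, base)
    return d

# Sub-condition profile compared with the pooled profile at one position.
print(kl_divergence({"A": 0.8, "C": 0.2}, {"A": 0.5, "C": 0.5}))  # about 0.278 bits
```

Because the $B \land C_i$ data are a subset of the pooled $B$ data, every state observed under $B \land C_i$ is also observed under $B$, so for empirical (unsmoothed) profiles the infinite branch is never reached when comparing hierarchically. Comparing two disjoint sub-conditions directly offers no such guarantee.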

**Figure 3**
**Local and extended divergence**. From the point-wise distances $D_{n}^{(X)} (B \land C_{i} | B)$ (right box), an integrated comparison of the landscapes is performed by computing either an average distance or a cumulative distance ${\bar{D}}_{[n, n + \Delta n]}^{(X)} (B \land C_{i} | B)$ (left box) as a weighted sum of the distances ${[D_{j}^{(X)}]}_{n \leq j \leq n + \Delta n}$. This procedure allows extended sequence features (such as an exon, black bar above the nucleotide sequence) to be treated coherently. Individual nucleotide features (such as SNP data) are compared directly (right box).
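A sketch of the windowed comparison, assuming the point-wise divergences $D_j^{(X)}$ are already available as a Python list indexed by genome position. The function names and the optional weight vector are hypothetical; uniform weights are used when none are given, and the average is the cumulative distance divided by the sum of the weights.

```python
def cumulative_distance(d, n, delta_n, weights=None):
    """Weighted sum of the point-wise divergences d[j] for n <= j <= n + delta_n."""
    window = d[n:n + delta_n + 1]
    if n < 0 or len(window) != delta_n + 1:
        raise IndexError("window [n, n + delta_n] lies outside the profile")
    if weights is None:
        weights = [1.0] * len(window)  # assumption: uniform weights
    if len(weights) != len(window):
        raise ValueError("need one weight per position in the window")
    return sum(w * dj for w, dj in zip(weights, window))

def average_distance(d, n, delta_n, weights=None):
    """Weighted average of d[j] over the same window [n, n + delta_n]."""
    if weights is None:
        weights = [1.0] * (delta_n + 1)
    return cumulative_distance(d, n, delta_n, weights) / sum(weights)

# Toy divergence profile; positions 1..3 stand for a short extended feature.
d = [0.0, 0.1, 0.3, 0.2, 0.0]
print(cumulative_distance(d, 1, 2))  # about 0.6
print(average_distance(d, 1, 2))     # about 0.2
```

The weights are left as a parameter because the caption does not specify them.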

**Figure 4**
**Integration of distance profiles**. The local distance measure $D_{n}^{(X)} (B \land C_{i} | B)$ is computed over the entire profile length (genome). Unlike the individual feature probability profiles, the distance profile can be integrated to yield a meaningful genome-wide distance measure. The relevant integrated distance ${\bar{D}}_{I}^{(X)}$ might involve several genome intervals $I = [n_1, n_1 + \Delta n_1] \cup [n_2, n_2 + \Delta n_2]$ and/or an "infinite" interval $[n_3, +\infty[$. Other genome-wide measures of the divergence can also be defined, such as the mean, median, sup or min. Again, the divergence measure need not be computed over all nucleotides but might be restricted to any combination of non-overlapping intervals $I$ or individual positions $n$. In this way, the global divergence computation can be restricted to particular sequence features such as coding regions.
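The integration step can be sketched as a selection of positions followed by an aggregate. In this hypothetical helper, intervals are closed `(start, end)` pairs, `end=None` stands for the open-ended interval $[n_3, +\infty[$ (truncated at the end of the profile), and the aggregate is one of the measures named in the caption; for a finite profile, sup and min reduce to Python's `max` and `min`.

```python
import statistics

AGGREGATORS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "sup": max,
    "min": min,
}

def positions_in(intervals, length):
    """Expand closed intervals [start, end] into sorted positions.

    end=None encodes an open-ended interval [start, +inf[, truncated
    at the last position of the profile."""
    positions = set()
    for start, end in intervals:
        stop = length - 1 if end is None else min(end, length - 1)
        positions.update(range(start, stop + 1))
    return sorted(positions)

def integrated_distance(d, intervals=None, how="mean"):
    """Aggregate the divergence profile d over I (the whole genome if intervals is None)."""
    if intervals is None:
        selected = list(d)
    else:
        selected = [d[j] for j in positions_in(intervals, len(d))]
    if not selected:
        raise ValueError("no positions selected")
    return AGGREGATORS[how](selected)

d = [0.0, 0.5, 1.0, 0.25, 0.75, 0.0]
I = [(1, 2), (4, None)]  # [1, 2] union [4, +inf[
print(integrated_distance(d, I, "mean"))    # 0.5625
print(integrated_distance(d, I, "median"))  # 0.625
```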

**Figure 5**
**Feature probability quality profile construction for experimental data**. The set of conditions that are essential for feature X is determined hierarchically, either by considering more detailed prescriptions (additional disjoint conditions $(C_i)_i$) corresponding to a partition of the data when constructing the conditional profiles, or by aggregating the conditions if the conditions $(C_i)_i$ have no impact on the feature. This procedure can be performed recursively: once sub-conditions have been collapsed to a biological condition, that biological condition can be compared, using the same logic, to the next higher-level biological condition. For simplicity, only the two immediately concerned levels are considered explicitly in the notation. Imagine, for instance, data pertaining to the transcriptome of different types of blood cells $(C_i)_i$. One might want to consider every cell type individually, the red and white blood cells ($B_1$, $B_2$) jointly, or the entire compartment ($B_0$).
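The recursive collapse can be sketched on a condition tree like the blood-cell example. This is an illustration under explicit assumptions, not the authors' procedure: each leaf holds raw counts of the states of X at a single position (a genome-wide version would use an integrated divergence instead), the divergence is the Kullback-Leibler divergence in bits, and sub-conditions are merged when none of them diverges from the pooled parent by more than a user-chosen threshold `epsilon` (hypothetical). A node is collapsed only once all of its children have themselves been collapsed, so an essential distinction found at a lower level is never pooled away.

```python
import math
from collections import Counter

def normalize(counts):
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}

def kl(p, q):
    """D(p || q) in bits; q pools p's data, so it covers p's support."""
    return sum(px * math.log2(px / q[s]) for s, px in p.items() if px > 0)

def pooled_counts(node):
    """A leaf's own counts, or the pooled counts of all leaves below a node."""
    if "counts" in node:
        return Counter(node["counts"])
    total = Counter()
    for child in node["children"]:
        total.update(pooled_counts(child))
    return total

def collapse(node, epsilon):
    """Bottom-up collapse of sub-conditions that carry no extra information on X."""
    if "counts" in node:
        return node
    children = [collapse(c, epsilon) for c in node["children"]]
    parent = normalize(pooled_counts(node))
    divergences = {c["name"]: kl(normalize(pooled_counts(c)), parent) for c in children}
    if all("counts" in c for c in children) and max(divergences.values()) <= epsilon:
        # Non-essential sub-conditions: keep only the pooled data of this condition.
        return {"name": node["name"], "counts": dict(pooled_counts(node))}
    return {"name": node["name"], "children": children, "divergences": divergences}

# Hypothetical counts of an "on"/"off" state for four cell types C1..C4.
tree = {"name": "B0", "children": [
    {"name": "B1", "children": [{"name": "C1", "counts": {"on": 50, "off": 50}},
                                {"name": "C2", "counts": {"on": 48, "off": 52}}]},
    {"name": "B2", "children": [{"name": "C3", "counts": {"on": 90, "off": 10}},
                                {"name": "C4", "counts": {"on": 10, "off": 90}}]},
]}
result = collapse(tree, epsilon=0.05)
print(result["children"][0])  # {'name': 'B1', 'counts': {'on': 98, 'off': 102}}
```

In this toy tree the two B1 sub-conditions are nearly identical and are pooled, whereas the opposite profiles of C3 and C4 mark them as essential context within B2, which in turn keeps B0 from being collapsed.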

**Figure 6**
**Flexible, question-driven profile collapse**. The context-dependency analysis is question-dependent and hence needs to be performed individually for each question. Individual sub-conditions can thereby be combined in a non-exclusive manner, as a function of their circumstantial context.

**Figure 7**
**A theoretical example of circumstantial context**. (A) Let Px denote a subject from whom a blood sample has been drawn. CD4+CD25+ and CD4+CD25- indicate the T-cell subpopulations for which transcriptome profiles have been recorded. Subject P3 carries an unknown genetic variant with limited but functional implications for the expression of some genes. The technical variability of the experiments is small enough to warrant calculating mean expression profiles. Depending on the circumstantial context, either inter-cell-type comparisons can be performed in a context-dependent manner (B-D) or subject heterogeneities can be studied (E-H). In either case, the divergence between features, and therefore the context-dependency, determines to what degree the probability profiles can be collapsed upon one another. (See the Discussion section for a detailed description.)

**Figure 8**
**A concrete example of circumstantial context analysis using transcriptome data**. (A) Schematic representation of the two biological conditions (B+: p53+/+; B-: p53-/-) and the three biological replicates (C1, C2, C3) from the published study [8]. The small squares inside the rectangles for the biological replicates represent the 899 probes that are statistically significantly regulated between the two biological conditions and are therefore considered p53-regulated genes. (B) Median of the Kullback-Leibler divergence measures for the indicated comparisons. The mean of the medians over the three comparisons is also indicated. (C) Schematic illustration of the subsequent divergence analysis, in which the biological replicates of one biological condition are analyzed with respect to the other biological condition. (D) Data for the experiment illustrated in (C), shown as in (B). (E) The probability profiles of the 899 probes significantly regulated by p53 were swapped between the two biological conditions (compare the small squares inside the rectangles of the biological replicates). (F) Results for the experiment illustrated in (E), shown as in (B, D); these data should be compared with those in (B).
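The summary statistics of panels (B), (D) and (F), and the probe swap of panel (E), can be sketched as follows. The assumptions are ours, not the caption's: each condition or replicate is stored as a dictionary mapping probe identifiers to discrete probability profiles, each comparison pairs a replicate with a reference profile set probe by probe, and the divergence is the Kullback-Leibler divergence in bits. All function names are hypothetical.

```python
import math
import statistics

def kl(p, q):
    """D(p || q) in bits for {state: probability} profiles (q must cover p's support)."""
    return sum(px * math.log2(px / q[s]) for s, px in p.items() if px > 0)

def median_divergence(replicate, reference):
    """Median over probes of D(replicate[probe] || reference[probe])."""
    return statistics.median(kl(replicate[probe], reference[probe]) for probe in reference)

def summarize(replicates, reference):
    """Per-comparison medians and the mean of these medians (cf. panel B)."""
    medians = {name: median_divergence(rep, reference) for name, rep in replicates.items()}
    return medians, statistics.mean(list(medians.values()))

def swap_probes(profiles_a, profiles_b, probes):
    """Copies of two profile sets with the given probes exchanged (cf. panel E)."""
    a, b = dict(profiles_a), dict(profiles_b)
    for probe in probes:
        a[probe], b[probe] = profiles_b[probe], profiles_a[probe]
    return a, b

# Toy data: three probes, two expression states.
ref = {"g1": {"lo": 0.5, "hi": 0.5}, "g2": {"lo": 0.9, "hi": 0.1}, "g3": {"lo": 0.2, "hi": 0.8}}
c1 = dict(ref, g1={"lo": 0.8, "hi": 0.2})  # one probe deviates from the reference
medians, mean_of_medians = summarize({"C1": c1, "C2": ref}, ref)
print(medians)  # {'C1': 0.0, 'C2': 0.0}: the median ignores a single deviating probe
```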

**Figure 9**
**Selective versus global circumstantial context analysis on actual data**. (A) The Kullback-Leibler divergence measures were calculated twice: once for the 899 p53-sensitive probes in the B+ case, and once using the B- probability profiles compared with the B+ biological condition. (B) Data for (A), shown as in (Figure 8B, D, F); both tables can be compared directly. (C) Histogram of the Kullback-Leibler divergence distribution over the entire set of 31710 probes analyzed in the two indicated cases. Note that "C1mix" refers to the swapping experiment illustrated in (Figure 8E), and that the final bin encompasses the interval $]1, +\infty[$. (D) Histogram, similar to (C), for the 899 probes showing significant p53 regulation (compare (A)).
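Binning the divergences with an open-ended last bin, as in panels (C) and (D), can be sketched as follows. Ten equal bins on $[0, 1]$ are an assumption (the caption only specifies that the final bin is $]1, +\infty[$), and the function name is hypothetical. Infinite divergences, which arise when a profile puts mass where the reference has none, fall into that last bin.

```python
import math

def divergence_histogram(values, upper=1.0, n_bins=10):
    """Counts in n_bins equal bins covering [0, upper], plus a final bin ]upper, +inf[."""
    counts = [0] * (n_bins + 1)
    for v in values:
        if math.isnan(v) or v < 0:
            raise ValueError("Kullback-Leibler divergences are non-negative numbers")
        if v > upper:
            counts[-1] += 1  # open-ended last bin, also catches math.inf
        else:
            # v == upper goes to the last regular bin, which is closed on the right
            counts[min(int(v / upper * n_bins), n_bins - 1)] += 1
    return counts

print(divergence_histogram([0.05, 0.15, 0.95, 1.0, 1.5, math.inf]))
# [1, 1, 0, 0, 0, 0, 0, 0, 0, 2, 2]
```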


Cited by

The R2R3-MYB, bHLH, WD40, and related transcription factors in flavonoid biosynthesis.
Zhao L, Gao L, Wang H, Chen X, Wang Y, Yang H, Wei C, Wan X, Xia T. Funct Integr Genomics. 2013 Mar;13(1):75-98. doi: 10.1007/s10142-012-0301-4. Epub 2012 Nov 27. PMID: 23184474.
Critical dynamics in host-pathogen systems.
Benecke AG. Curr Top Microbiol Immunol. 2013;363:235-59. doi: 10.1007/82_2012_260. PMID: 22976347. Review.
Dynamics of DNA damage induced pathways to cancer.
Tian K, Rajendran R, Doddananjaiah M, Krstic-Demonacos M, Schwartz JM. PLoS One. 2013 Sep 4;8(9):e72303. doi: 10.1371/journal.pone.0072303. eCollection 2013. PMID: 24023735.

References

1. Benecke A. Genomic plasticity and information processing by transcription coregulators. ComPlexUs. 2003;1:65–76. doi: 10.1159/000070463.
1. Benecke A. Chromatin code, local non-equilibrium dynamics, and the emergence of transcription regulatory programs. Eur Phys J E. 2006;19:379–84. doi: 10.1140/epje/i2005-10068-8.
1. Berg J. Dynamics of gene expression and the regulatory inference problem. Eur Phys Lett. 2008;82:28010. doi: 10.1209/0295-5075/82/28010.
1. Benecke A. Gene regulatory network inference using out of equilibrium statistical mechanics. HFSP J. 2008;2:183–8. doi: 10.2976/1.2957743.
1. Lesne A, Benecke A. Probability landscapes for integrative genomics. Theor Biol Med Model. 2008;5:9. doi: 10.1186/1742-4682-5-9.
