BMC Bioinformatics. 2010 Apr 20;11:197.

doi: 10.1186/1471-2105-11-197.

Missing value imputation for epistatic MAPs

Colm Ryan¹, Derek Greene, Gerard Cagney, Pádraig Cunningham

PMID: 20406472
PMCID: PMC2873538
DOI: 10.1186/1471-2105-11-197

Colm Ryan et al. BMC Bioinformatics. 2010.

Affiliation

¹ School of Computer Science and Informatics, University College Dublin, Dublin, Ireland. colm.ryan@ucd.ie

Abstract

Background: Epistatic miniarray profiling (E-MAP) is a high-throughput approach capable of quantifying aggravating or alleviating genetic interactions between gene pairs. The datasets resulting from E-MAP experiments typically take the form of a symmetric pairwise matrix of interaction scores. These datasets have a significant number of missing values (up to 35%) that can reduce the effectiveness of some data analysis techniques and prevent the use of others. An effective method for imputing interactions would therefore increase the types of possible analysis, as well as increase the potential to identify novel functional interactions between gene pairs. Several methods have been developed to handle missing values in microarray data, but it is unclear how applicable these methods are to E-MAP data because of the pairwise nature of E-MAP data and the significantly larger number of missing values. Here we evaluate four alternative imputation strategies, three local (nearest neighbor-based) and one global (PCA-based), that have been modified to work with symmetric pairwise data.

Results: We identify different categories for the missing data based on their underlying cause, and show that values from the largest category can be imputed effectively. We compare local and global imputation approaches across a variety of distinct E-MAP datasets, showing that both are competitive and preferable to filling in with zeros. In addition, we show that these methods are effective in an E-MAP from a different species, suggesting that pairwise imputation techniques will be increasingly useful as analogous epistasis mapping techniques are developed in different species. We show that strongly alleviating interactions are significantly more difficult to predict than strongly aggravating interactions. Finally, we show that imputed interactions, generated using nearest neighbor methods, are enriched for annotations in the same manner as measured interactions. Therefore, our method potentially expands the number of mapped epistatic interactions. In addition, we make implementations of our algorithms available for use by other researchers.

Conclusions: We address the problem of missing value imputation for E-MAPs, and suggest the use of symmetric nearest neighbor-based approaches, as they offer consistently accurate imputations across multiple datasets in a tractable manner.
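As a rough illustration of the symmetric nearest neighbor idea (Figure 3 describes combining the values at (i', j) and (i, j')), the following Python sketch imputes each missing entry from the K most similar profiles on either side of the pair. It is not the authors' released implementation: the function names, the root-mean-square distance over co-observed entries, and the unweighted averaging of donor values are all assumptions made for this example.

```python
import numpy as np


def profile_distances(S):
    """Root-mean-square distance between rows over co-observed columns.

    Hypothetical helper: pairs of rows with no shared observed column get
    an infinite distance, as does every row with itself.
    """
    n = S.shape[0]
    obs = ~np.isnan(S)
    filled = np.where(obs, S, 0.0)
    D = np.full((n, n), np.inf)
    for a in range(n):
        shared = obs[a] & obs                      # row b: columns seen in both a and b
        counts = shared.sum(axis=1)
        sq = ((filled[a] - filled) ** 2 * shared).sum(axis=1)
        ok = counts > 0
        D[a, ok] = np.sqrt(sq[ok] / counts[ok])
    np.fill_diagonal(D, np.inf)                    # a gene is never its own neighbor
    return D


def symmetric_knn_impute(S, k=3):
    """Fill missing off-diagonal entries of a symmetric score matrix.

    For a missing (i, j), donors are S[i', j] for the k nearest neighbors i'
    of i, and S[j', i] (equal to S[i, j'] by symmetry) for the k nearest
    neighbors j' of j. Donor values are averaged without weighting.
    """
    S = np.asarray(S, dtype=float)
    D = profile_distances(S)
    out = S.copy()
    for i, j in zip(*np.where(np.isnan(S))):
        if i >= j:                                 # visit each pair once, skip the diagonal
            continue
        donors = []
        for gene, partner in ((i, j), (j, i)):
            usable = [
                g for g in np.argsort(D[gene])
                if np.isfinite(D[gene, g])
                and g != partner
                and not np.isnan(S[g, partner])
            ]
            donors.extend(S[g, partner] for g in usable[:k])
        if donors:                                 # entries with no donors stay missing
            out[i, j] = out[j, i] = float(np.mean(donors))
    return out
```

Measured values are left untouched, the output stays symmetric, and the diagonal is not imputed. A weighted variant would replace the plain mean with one that favors closer neighbors.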

Figures

**Figure 1**
**Swr1 cluster - genes displaying similar interaction profiles**. An example of coherence taken from the Chromosome Biology E-MAP. Members of the Swr1 complex display similar interaction profiles, and as a result are clustered together.

**Figure 2**
**E-MAP before and after imputation**. A visual representation of a pairwise symmetric E-MAP interaction matrix. On the left-hand side is shown an original E-MAP (Chromosome Biology), where gray points indicate missing values. On the right-hand side is the corresponding complete matrix, with all missing entries replaced by imputed values.

**Figure 3**
**Symmetric KNN**. Illustration of the symmetric KNN imputation process for parameter K = 1. To estimate the missing value (i, j), the values given by (i', j) and (i, j') would be combined.

**Figure 4**
**Effect of K on the accuracy of uKNN**. Impact of the choice of value for parameter K on imputation accuracy (in terms of correlation) for the **uKNN** approach.

**Figure 5**
**Effect of K on the accuracy of wNN**. Impact of the choice of value for parameter K on imputation accuracy (in terms of correlation) for the **wNN** approach.

**Figure 6**
**Effect of K on the accuracy of LLS**. Impact of the choice of value for parameter K on imputation accuracy (in terms of correlation) for the **LLS** approach.

**Figure 7**
**Effect of the number of axes used on the accuracy and runtime of BPCA (ESP dataset)**. Impact of the choice of value for the number of axes on the imputation accuracy (in terms of correlation) and runtime of the **BPCA** approach. Accuracy of LLS is shown for comparison, with K = 20. Running time is averaged across twenty runs. Note that these experiments were run on a 20-core machine with 128 GB RAM, using all cores at 100%. Computation on a standard desktop machine would therefore take substantially longer.

**Figure 8**
**Effect of the number of axes used on the accuracy and runtime of BPCA (Signalling dataset)**. Impact of the choice of value for the number of axes on the imputation accuracy (in terms of correlation) and runtime of the **BPCA** approach. Accuracy of LLS is shown for comparison, with K = 20. Running time is averaged across twenty runs. Note that these experiments were run on a 20-core machine with 128 GB RAM, using all cores at 100%. Computation on a standard desktop machine would therefore take substantially longer.

**Figure 9**
**Fraction of each class of interaction which share an annotation (Chromosome E-MAP using wNN)**. a. Measured interactions, b. All imputed interactions, c. Imputed Chromosomal Neighbors, d. Imputed DAmP-DAmP pairs.
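
Figures 4 to 8 report imputation accuracy in terms of correlation. One plausible way to obtain such a number is to hide a random subset of measured interactions, impute them, and correlate the imputed values with the hidden ones. The Python sketch below does this for any imputation function. The masking protocol, the `holdout_correlation` name, and its `fraction` and `seed` parameters are assumptions for illustration, not the evaluation procedure used in the paper.

```python
import numpy as np


def holdout_correlation(S, impute_fn, fraction=0.1, seed=0):
    """Pearson correlation between hidden measured scores and their imputed values.

    S is a symmetric matrix with NaN for missing entries. impute_fn is any
    callable that takes such a matrix and returns a same-shaped matrix.
    """
    S = np.asarray(S, dtype=float)
    rng = np.random.default_rng(seed)

    # Candidate pairs: measured entries in the strict upper triangle.
    iu, ju = np.triu_indices_from(S, k=1)
    measured = ~np.isnan(S[iu, ju])
    iu, ju = iu[measured], ju[measured]
    if iu.size < 2:
        raise ValueError("need at least two measured pairs to correlate")

    # Hide a random subset, masking both (i, j) and (j, i) to keep symmetry.
    n_hide = min(iu.size, max(2, int(round(fraction * iu.size))))
    pick = rng.choice(iu.size, size=n_hide, replace=False)
    hi, hj = iu[pick], ju[pick]
    masked = S.copy()
    masked[hi, hj] = np.nan
    masked[hj, hi] = np.nan

    imputed = np.asarray(impute_fn(masked), dtype=float)
    truth, guess = S[hi, hj], imputed[hi, hj]
    ok = ~np.isnan(guess)                  # ignore pairs the imputer left missing
    return float(np.corrcoef(truth[ok], guess[ok])[0, 1])
```

Any imputer with a matrix-in, matrix-out signature can be passed as `impute_fn`. A constant fill such as zeros has no variance, so this particular measure returns NaN for it and a different error measure would be needed for that baseline.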

Cited by

On protocols and measures for the validation of supervised methods for the inference of biological networks.
Schrynemackers M, Küffner R, Geurts P. Front Genet. 2013 Dec 3;4:262. doi: 10.3389/fgene.2013.00262. PMID: 24348517. Review.
Prediction of Genetic Interactions Using Machine Learning and Network Properties.
Madhukar NS, Elemento O, Pandey G. Front Bioeng Biotechnol. 2015 Oct 26;3:172. doi: 10.3389/fbioe.2015.00172. eCollection 2015. PMID: 26579514. Review.
A comprehensive survey on computational learning methods for analysis of gene expression data.
Bhandari N, Walambe R, Kotecha K, Khare SP. Front Mol Biosci. 2022 Nov 7;9:907150. doi: 10.3389/fmolb.2022.907150. eCollection 2022. PMID: 36458095. Review.
Data Imputation in Epistatic MAPs by Network-Guided Matrix Completion.
Žitnik M, Zupan B. J Comput Biol. 2015 Jun;22(6):595-608. doi: 10.1089/cmb.2014.0158. Epub 2015 Feb 6. PMID: 25658751.
Searching for synergies: matrix algebraic approaches for efficient pair screening.
Gerlee P, Schmidt L, Monsefi N, Kling T, Jörnsten R, Nelander S. PLoS One. 2013 Jul 25;8(7):e68598. doi: 10.1371/journal.pone.0068598. Print 2013. PMID: 23935877.
