Missing value imputation for epistatic MAPs

Colm Ryan et al. BMC Bioinformatics. 2010 Apr 20;11:197.
doi: 10.1186/1471-2105-11-197.

Abstract

Background: Epistatic miniarray profiling (E-MAP) is a high-throughput approach capable of quantifying aggravating or alleviating genetic interactions between gene pairs. The datasets resulting from E-MAP experiments typically take the form of a symmetric pairwise matrix of interaction scores. These datasets have a significant number of missing values (up to 35%), which can reduce the effectiveness of some data analysis techniques and prevent the use of others. An effective method for imputing interactions would therefore widen the range of possible analyses and increase the potential to identify novel functional interactions between gene pairs. Several methods have been developed to handle missing values in microarray data, but it is unclear how applicable these methods are to E-MAP data, given its pairwise nature and significantly larger number of missing values. Here we evaluate four alternative imputation strategies, three local (nearest-neighbor-based) and one global (PCA-based), that have been modified to work with symmetric pairwise data.
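
The symmetric pairwise layout described above can be illustrated with a small sketch. It assumes scores are held in a plain list-of-lists with `None` marking an unmeasured pair; the gene names and scores are toy values rather than E-MAP data, and the helper names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# None marks a gene pair with no measured interaction score.
Matrix = List[List[Optional[float]]]


def build_symmetric_matrix(genes: List[str],
                           scores: Dict[Tuple[str, str], float]) -> Matrix:
    """Place pairwise scores into a symmetric gene-by-gene matrix."""
    index = {g: k for k, g in enumerate(genes)}
    n = len(genes)
    m: Matrix = [[None] * n for _ in range(n)]
    for (a, b), s in scores.items():
        i, j = index[a], index[b]
        m[i][j] = s
        m[j][i] = s  # score(a, b) and score(b, a) are the same measurement
    return m


def missing_fraction(m: Matrix) -> float:
    """Fraction of distinct gene pairs (upper triangle, no diagonal) lacking a score."""
    n = len(m)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(m[i][j] is None for i, j in pairs) / len(pairs)


# Toy example: hypothetical genes and scores, not taken from any E-MAP.
genes = ["A", "B", "C", "D"]
scores = {("A", "B"): -2.5, ("A", "C"): 1.2, ("B", "D"): 0.4, ("C", "D"): -0.8}
m = build_symmetric_matrix(genes, scores)
print(missing_fraction(m))  # 2 of the 6 distinct pairs are unmeasured
```

Storing both halves keeps row and column lookups interchangeable, which the symmetric imputation methods rely on.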

Results: We identify different categories of missing data based on their underlying cause, and show that values from the largest category can be imputed effectively. We compare local and global imputation approaches across a variety of distinct E-MAP datasets, showing that both are competitive and preferable to filling in with zeros. In addition, we show that these methods are effective on an E-MAP from a different species, suggesting that pairwise imputation techniques will be increasingly useful as analogous epistasis mapping techniques are developed in other species. We show that strongly alleviating interactions are significantly more difficult to predict than strongly aggravating interactions. Finally, we show that imputed interactions generated using nearest neighbor methods are enriched for annotations in the same manner as measured interactions. Our method therefore has the potential to expand the number of mapped epistatic interactions. We also make implementations of our algorithms available for use by other researchers.
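
Comparing imputation strategies requires scoring imputed values against values that were actually measured. One common way to do this, sketched below as an assumption rather than the paper's exact protocol, is to hide a random subset of measured pairs, impute them, and compute the Pearson correlation between the hidden and imputed scores (correlation is the accuracy measure reported in the figures). The function names, hold-out fraction, and seed are illustrative.

```python
import math
import random
from typing import Callable, List, Optional

Matrix = List[List[Optional[float]]]
Imputer = Callable[[Matrix], Matrix]  # takes a matrix with None gaps, returns it filled


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; nan when either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def holdout_correlation(m: Matrix, impute: Imputer,
                        frac: float = 0.1, seed: int = 0) -> float:
    """Hide a random fraction of measured pairs, impute them, and correlate
    the imputed scores with the hidden true scores. Needs >= 2 measured pairs."""
    rng = random.Random(seed)
    n = len(m)
    measured = [(i, j) for i in range(n) for j in range(i + 1, n)
                if m[i][j] is not None]
    hidden = rng.sample(measured, max(2, int(frac * len(measured))))
    masked = [row[:] for row in m]
    for i, j in hidden:
        masked[i][j] = masked[j][i] = None  # mask both halves to keep symmetry
    filled = impute(masked)
    truth = [m[i][j] for i, j in hidden]
    guess = [filled[i][j] for i, j in hidden]
    return pearson(truth, guess)


def zero_fill(m: Matrix) -> Matrix:
    """Baseline imputer: every missing score becomes 0 (no interaction)."""
    return [[0.0 if v is None else v for v in row] for row in m]
```

Because `zero_fill` predicts the same value for every hidden pair, `pearson` returns nan for it in this sketch; comparing against that baseline here would need an error-based score such as mean absolute error.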

Conclusions: We address the problem of missing value imputation for E-MAPs, and suggest the use of symmetric nearest-neighbor-based approaches, as they offer consistently accurate imputations across multiple datasets in a tractable manner.

Figures

Figure 1
Swr1 cluster: genes displaying similar interaction profiles. An example of coherence taken from the Chromosome Biology E-MAP. Members of the Swr1 complex display similar interaction profiles and are therefore clustered together.
Figure 2
E-MAP before and after imputation. A visual representation of a pairwise symmetric E-MAP interaction matrix. The left-hand side shows an original E-MAP (Chromosome Biology), where gray points indicate missing values. The right-hand side shows the corresponding complete matrix, with all missing entries replaced by imputed values.
Figure 3
Symmetric KNN. Illustration of the symmetric KNN imputation process for parameter K = 1. To estimate the missing value (i, j), the values given by (i', j) and (i, j') would be combined.
Figure 4
Effect of K on the accuracy of uKNN. Impact of the choice of value for parameter K on imputation accuracy (in terms of correlation) for the uKNN approach.
Figure 5
Effect of K on the accuracy of wNN. Impact of the choice of value for parameter K on imputation accuracy (in terms of correlation) for the wNN approach.
Figure 6
Effect of K on the accuracy of LLS. Impact of the choice of value for parameter K on imputation accuracy (in terms of correlation) for the LLS approach.
Figure 7
Effect of the number of axes used on the accuracy and runtime of BPCA (ESP dataset). Impact of the choice of the number of axes on the imputation accuracy (in terms of correlation) and runtime of the BPCA approach. Accuracy of LLS is shown for comparison, with K = 20. Running time is averaged across twenty runs. Note that these experiments were run on a 20-core machine with 128 GB of RAM, using all cores at 100%. On a standard desktop machine, computation would therefore take substantially longer.
Figure 8
Effect of the number of axes used on the accuracy and runtime of BPCA (Signalling dataset). Impact of the choice of the number of axes on the imputation accuracy (in terms of correlation) and runtime of the BPCA approach. Accuracy of LLS is shown for comparison, with K = 20. Running time is averaged across twenty runs. Note that these experiments were run on a 20-core machine with 128 GB of RAM, using all cores at 100%. On a standard desktop machine, computation would therefore take substantially longer.
Figure 9
Fraction of each class of interaction that shares an annotation (Chromosome E-MAP using wNN). a. Measured interactions, b. All imputed interactions, c. Imputed Chromosomal Neighbors, d. Imputed DAmP-DAmP pairs.
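
Figure 3 describes symmetric KNN: to estimate a missing score (i, j), the values at (i', j) and (i, j') are combined, where i' and j' are taken to be nearest neighbors of genes i and j. The sketch below is a minimal pure-Python illustration under our own assumptions, not the authors' released implementation: profile similarity is the root-mean-square difference over co-measured positions, a neighbor must have a score at the target position, and the up to 2K neighbor values are averaged without weights. The weighted (wNN), LLS and BPCA variants are not shown, and all function names are hypothetical.

```python
import math
from typing import List, Optional

Matrix = List[List[Optional[float]]]  # symmetric; None marks a missing score


def profile_distance(m: Matrix, a: int, b: int) -> float:
    """RMS difference between the interaction profiles of genes a and b,
    over positions where both are measured (inf if they share none)."""
    diffs = [(x - y) ** 2 for x, y in zip(m[a], m[b])
             if x is not None and y is not None]
    return math.sqrt(sum(diffs) / len(diffs)) if diffs else math.inf


def nearest(m: Matrix, g: int, col: int, k: int) -> List[int]:
    """Up to k genes closest to gene g among those with a measured score at col."""
    scored = []
    for h in range(len(m)):
        if h == g or m[h][col] is None:
            continue
        d = profile_distance(m, g, h)
        if math.isfinite(d):
            scored.append((d, h))
    scored.sort()  # ties broken by gene index
    return [h for _, h in scored[:k]]


def symmetric_knn_impute(m: Matrix, k: int = 1) -> Matrix:
    """Fill each missing off-diagonal pair (i, j) from both sides of the pair."""
    n = len(m)
    out = [row[:] for row in m]
    for i in range(n):
        for j in range(i + 1, n):
            if m[i][j] is not None:
                continue
            # (i', j): neighbors i' of gene i that were measured against j
            values = [m[h][j] for h in nearest(m, i, j, k)]
            # (i, j'): neighbors j' of gene j measured against i; m[h][i] == m[i][h]
            values += [m[h][i] for h in nearest(m, j, i, k)]
            if values:  # leave the pair missing if no usable neighbor exists
                out[i][j] = out[j][i] = sum(values) / len(values)
    return out


# Toy matrix (not E-MAP data): only the pair (1, 2) is missing.
m = [
    [None, 1.0, -2.0, 0.0],
    [1.0, None, None, 0.5],
    [-2.0, None, None, 3.0],
    [0.0, 0.5, 3.0, None],
]
filled = symmetric_knn_impute(m, k=1)
print(filled[1][2])  # -0.75: mean of m[0][2] (via gene 1's neighbor) and m[3][1] (via gene 2's neighbor)
```

Only originally measured values feed each estimate, so the result does not depend on the order in which missing pairs are visited; the diagonal is left untouched.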
