BMC Bioinformatics. 2012 Oct 19;13:271.
doi: 10.1186/1471-2105-13-271.

dsPIG: a tool to predict imprinted genes from the deep sequencing of whole transcriptomes



Hua Li et al. BMC Bioinformatics.

Abstract

Background: Imprinted genes are expressed in a parent-of-origin-specific manner. Their dysregulation plays an important role in various human diseases, such as cancer and behavioral disorders. To date, however, fewer than 100 imprinted genes have been identified in the human genome. The recent availability of high-throughput technology makes large-scale prediction of imprinted genes possible. Here we propose a Bayesian model, dsPIG, that predicts imprinted genes from the allelic expression observed in mRNA-Seq data of independent human tissues.
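The Background describes dsPIG only as a Bayesian model of allelic expression. The sketch below is a rough illustration of how such a model can score one SNP in one sample. It is not the published dsPIG likelihood. The following are all our assumptions:

- Hardy-Weinberg genotype frequencies computed from the reference allele frequency `p_ref`.
- A per-read error rate `err`.
- The hypothetical function names.

Only the 0.01 prior is taken from the Figure 1 legend. The idea is that homozygotes look the same under both hypotheses. A biallelic heterozygote shows both alleles in roughly equal proportions. An imprinted heterozygote shows only one parental allele.

```python
import math

def log_binom(k, n, p):
    """log Binomial(k; n, p) for 0 < p < 1, computed in log space to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_likelihood(ref, alt, p_ref, imprinted, err=0.01):
    """log P(ref/alt read counts | model), summed over the unobserved genotype.

    Hypothetical model: Hardy-Weinberg genotypes from p_ref, and each read
    reports the wrong allele with probability err (both are assumptions).
    """
    n = ref + alt
    q = 1.0 - p_ref
    hom_ref = log_binom(ref, n, 1 - err)  # reads look like RR (reference only)
    hom_alt = log_binom(ref, n, err)      # reads look like AA (alternative only)
    terms = [2 * math.log(p_ref) + hom_ref, 2 * math.log(q) + hom_alt]
    het = math.log(2 * p_ref * q)
    if imprinted:
        # Imprinted heterozygote: only one parental allele is expressed, each with prob. 1/2.
        terms += [het + math.log(0.5) + hom_ref, het + math.log(0.5) + hom_alt]
    else:
        # Biallelic heterozygote: both alleles expressed equally.
        terms.append(het + log_binom(ref, n, 0.5))
    return log_sum_exp(terms)

def posterior_imprinted(ref, alt, p_ref, prior=0.01, err=0.01):
    """Posterior probability of imprinting for one SNP in one sample (Bayes' rule)."""
    li = math.log(prior) + log_likelihood(ref, alt, p_ref, True, err)
    lb = math.log1p(-prior) + log_likelihood(ref, alt, p_ref, False, err)
    return math.exp(li - log_sum_exp([li, lb]))
```

Under these assumptions, a sample with 30 reference reads and no alternative reads at `p_ref=0.5` pushes the posterior above the prior. A balanced 15/15 sample pushes it far below.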

Results: In simulations, dsPIG identified imprinted genes with high sensitivity, high specificity, and a low false discovery rate (FDR), provided the number of sequenced tissue samples was fairly large. Applied to mRNA-Seq data, dsPIG predicted 94 imprinted genes in 20 cerebellum samples and 57 imprinted genes in 9 diverse tissue samples, with low expected FDRs. We also assessed dsPIG using previously validated imprinted and non-imprinted genes. With further simulations, we analyzed how imbalanced allelic expression of non-imprinted genes and different minor allele frequencies affected dsPIG's predictions. Interestingly, at least 18 biallelically expressed genes expressed significantly more transcripts from one allele than from the other across different individuals and tissues.
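The abstract does not say which test flagged the 18 allelically imbalanced genes. The sketch below swaps in a common choice, an exact two-sided binomial test of the pooled reference-allele fraction against 0.5. It is not necessarily the authors' procedure. The helper names and the rule of pooling only samples that show both alleles are hypothetical. Across many genes, the resulting p-values would still need multiple-testing correction.

```python
import math

def binom_logpmf(k, n, p=0.5):
    """log of the Binomial(n, p) probability of exactly k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binom_test_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    observed = binom_logpmf(k, n, p)
    logs = (binom_logpmf(i, n, p) for i in range(n + 1))
    # Small tolerance so floating-point ties with the observed outcome are counted.
    total = sum(math.exp(lp) for lp in logs if lp <= observed + 1e-7)
    return min(1.0, total)

def pooled_imbalance(samples):
    """Pool (ref, alt) counts over samples showing both alleles; test ref fraction vs. 0.5.

    Hypothetical helper: samples with reads from only one allele are treated as
    uninformative (possibly homozygous) and skipped.
    """
    het = [(r, a) for r, a in samples if r > 0 and a > 0]
    ref = sum(r for r, _ in het)
    alt = sum(a for _, a in het)
    return ref, alt, binom_test_two_sided(ref, ref + alt)
```

For example, `pooled_imbalance([(40, 12), (35, 15), (50, 0)])` skips the third, apparently homozygous sample. It then tests 75 reference reads against 27 alternative reads.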

Conclusion: Given the prevalence of mRNA-Seq technology, dsPIG has become a useful tool for analyzing allelic expression and for large-scale prediction of imprinted genes. For ease of use, we have set up a web service and provided an R package for dsPIG at http://www.shoudanliang.com/dsPIG/.


Figures

Figure 1
Simulation-based performance analysis of dsPIG. a, b, Simulated (natural log-transformed) posteriors of (a) biallelically expressed genes and (b) imprinted genes. The dashed line in both panels marks the log-transformed prior (0.01). c, Sensitivity (black solid line) and specificity (red dashed line) of our model. d, FDR of our predictions. When the sample size was <5, the FDR could not be computed because sensitivity and specificity were both 0. All panels are based on 20,000 simulations, with posteriors averaged by geometric mean.
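The Figure 1 legend says that per-sample posteriors were averaged with the geometric mean and compared against a 0.01 prior. Sensitivity, specificity and FDR were then summarized. The sketch below shows that bookkeeping. The calling rule (averaged posterior greater than the prior) and the function names are our assumptions, not something the legend states. The FDR is returned as `None` when nothing is called, which mirrors the legend's note that the FDR was not computable for very small samples.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive values, computed in log space to avoid underflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def call_imprinted(per_sample_posteriors, prior=0.01):
    # Assumed decision rule: call imprinting when the averaged posterior exceeds the prior.
    return geometric_mean(per_sample_posteriors) > prior

def performance(calls, truth):
    """Sensitivity, specificity and FDR of boolean calls against boolean truth labels."""
    tp = sum(c and t for c, t in zip(calls, truth))
    fp = sum(c and not t for c, t in zip(calls, truth))
    fn = sum(not c and t for c, t in zip(calls, truth))
    tn = sum(not c and not t for c, t in zip(calls, truth))
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    fdr = fp / (tp + fp) if tp + fp else None  # undefined when no gene is called
    return sens, spec, fdr
```

The geometric mean multiplies per-sample evidence rather than adding it. As a result, a single sample that shows both alleles (posterior near zero) pulls the average down sharply.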
Figure 2
FDRs of our predictions at different allele frequencies. When the sample size was >10, the FDR generally increased as the minor allele frequency (mAF) decreased from 0.5 to 0.1. Results are based on 20,000 simulations. Detailed FDR values are given in Additional file 4: Table S2.
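Only heterozygous individuals can reveal monoallelic expression, because a homozygote shows a single allele whether or not the gene is imprinted. Under Hardy-Weinberg equilibrium, the heterozygote frequency 2pq shrinks as the mAF falls. This is one plausible contributor to the trend in Figure 2. It is our gloss, not an explanation given in the legend, and the function names are hypothetical. For example, with 20 samples, about 10 heterozygotes are expected at mAF 0.5 but only 3.6 at mAF 0.1.

```python
def heterozygote_fraction(maf):
    """Hardy-Weinberg heterozygote frequency 2pq for a biallelic SNP with minor allele frequency maf."""
    return 2 * maf * (1 - maf)

def expected_heterozygotes(n_samples, maf):
    """Expected number of heterozygous (informative) samples among n_samples individuals."""
    return n_samples * heterozygote_fraction(maf)

def prob_no_heterozygotes(n_samples, maf):
    """Probability that none of n_samples unrelated individuals is heterozygous."""
    return (1 - heterozygote_fraction(maf)) ** n_samples

for maf in (0.5, 0.3, 0.1):
    print(f"mAF={maf}: 2pq={heterozygote_fraction(maf):.2f}, "
          f"expected het in 20 samples={expected_heterozygotes(20, maf):.1f}, "
          f"P(no het in 10 samples)={prob_no_heterozygotes(10, maf):.3f}")
```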
Figure 3
Sample clustering based on imprinting-inclined SNPs. Spearman correlations between each pair of samples were calculated from the per-SNP posteriors computed by dsPIG in each sample. Hierarchical clustering used average linkage as the agglomerative method. Posterior probabilities for African American samples were computed with African American allele frequencies in panel (a) and with Caucasian allele frequencies in panel (b).
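The Figure 3 legend combines two standard steps: Spearman correlation between samples' per-SNP posteriors, and average-linkage hierarchical clustering. In practice, `scipy.stats.spearmanr` and `scipy.cluster.hierarchy.linkage(..., method="average")` cover both. The sketch below is a dependency-free version for small inputs. Two choices here are our assumptions: the distance 1 - rho, and treating a constant profile as uncorrelated. The legend names only the correlation and the linkage method.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0  # constant profile: treat as uncorrelated

def average_linkage(profiles):
    """Agglomerative clustering with distance 1 - rho and average linkage.

    `profiles` maps sample name -> per-SNP posteriors (same SNP order in every sample).
    Returns the merges as (cluster_a, cluster_b, distance) in the order they occur.
    """
    names = sorted(profiles)
    dist = {(a, b): 1 - spearman(profiles[a], profiles[b]) for a in names for b in names}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci, cj = clusters[i], clusters[j]
                # Average linkage: mean distance over all cross-cluster sample pairs.
                d = sum(dist[a, b] for a in ci for b in cj) / (len(ci) * len(cj))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

Recomputing the average over all leaf pairs at each step is O(n^3) or worse. That is fine for a few dozen samples but not for large cohorts, where the SciPy route is preferable.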
Figure 4
Effect of imbalanced transcript levels on the posteriors of biallelically expressed genes. Solid lines show simulated posteriors for imprinted genes (black line) and biallelically expressed genes (non-black lines). “Randomly imbalanced” means that, in each sample, one allele was randomly chosen to be expressed at a lower level than the other. FIC stands for “Fixed Imbalanced Coefficient”: in every sample, one allele is expressed at FIC times the level of the other. The dashed line shows the log-transformed prior. When FIC is low enough (typically <13%), the posteriors cannot distinguish imprinted expression (solid black line) from biallelic expression (green line).
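The FIC definition implies a simple read-level fact about a biallelic heterozygote. If the weaker allele is expressed at FIC times the level of the stronger one, it contributes a fraction FIC/(1 + FIC) of the reads. The sketch below computes how often such a sample shows at most a few weaker-allele reads, which would make it look monoallelic. It also simulates read counts for one sample.

Several details are our assumptions: binomial read sampling, the `max_minor_reads` cutoff, the `random_side` switch, and all function names. The switch only loosely mirrors the legend's "randomly imbalanced" versus fixed settings. The roughly 13% threshold in the legend comes from the authors' simulations and is not reproduced here.

```python
import math
import random

def minor_read_fraction(fic):
    """Expected share of reads from the weaker allele when it is expressed at fic x the other."""
    return fic / (1.0 + fic)

def prob_looks_monoallelic(depth, fic, max_minor_reads=0):
    """P(at most max_minor_reads of depth reads carry the weaker allele), binomial sampling."""
    f = minor_read_fraction(fic)
    return sum(math.comb(depth, k) * f ** k * (1 - f) ** (depth - k)
               for k in range(max_minor_reads + 1))

def simulate_sample(depth, fic, rng, random_side=True):
    """Simulate (ref, alt) read counts for one imbalanced biallelic heterozygote.

    random_side=True: the weaker allele is picked at random in each sample.
    random_side=False: the alt allele is always the weaker one (fixed direction).
    """
    f = minor_read_fraction(fic)
    weak_is_alt = (rng.random() < 0.5) if random_side else True
    alt_frac = f if weak_is_alt else 1 - f
    alt = sum(rng.random() < alt_frac for _ in range(depth))
    return depth - alt, alt

if __name__ == "__main__":
    for fic in (1.0, 0.5, 0.13, 0.05):
        print(fic, round(prob_looks_monoallelic(30, fic, max_minor_reads=1), 3))
```

As FIC falls, the weaker allele's share of reads shrinks toward the level of sequencing errors. A biallelic sample then increasingly resembles an imprinted one, which is the qualitative pattern the legend describes.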


