PLoS One. 2014 Oct 29;9(10):e110888.
doi: 10.1371/journal.pone.0110888. eCollection 2014.

Integrative data mining highlights candidate genes for monogenic myopathies


Osorio Abath Neto et al. PLoS One.

Erratum in

Abstract

Inherited myopathies are a heterogeneous group of disabling disorders whose pathological mechanisms remain poorly understood. Around 40% of affected patients remain without a molecular diagnosis after known genes have been excluded. The advent of high-throughput sequencing has opened avenues to the discovery of new disease genes, but a prioritized list of candidate genes is needed to cope with the complexity of analyzing large-scale sequencing data. Here we used an integrative data mining strategy to analyze the genetic network linked to myopathies, derive specific signatures for inherited myopathies and related disorders, and identify and rank candidate genes for these groups. Training sets of genes were selected after literature review and used in Manteia, a public web-based data mining system, to extract disease group signatures in the form of enriched descriptor terms, which include functional annotation, human and mouse phenotypes, biological pathways and protein interactions. These signatures were then used as input to mine and rank candidate genes, which were subsequently filtered by skeletal muscle expression and by association with known diseases. The signatures and candidate genes highlight both potential common pathological mechanisms and allelic disease groups. Recently reported disease genes, such as B3GALNT2, GMPPB and B3GNT1 for congenital muscular dystrophies, were prioritized in the ranked lists, suggesting a posteriori validation of our approach and predictions. We show an example of how the ranked lists can help analyze high-throughput sequencing data to identify candidate genes, and we highlight the best candidate genes matching genomic regions linked to myopathies without known causative genes. This strategy can be automated to generate fresh candidate gene lists that keep pace with database annotation updates as new knowledge is incorporated.
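The abstract describes using the ranked lists to help interpret sequencing data and to highlight candidates inside linked genomic regions. The Python sketch below shows one plausible form of that intersection step; the `Candidate` record, the `prioritize` helper, and all gene names, scores and coordinates are hypothetical toy values, not data or code from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One entry of a ranked candidate list (hypothetical record layout)."""
    gene: str
    score: float  # signature score; higher means a closer match
    chrom: str
    start: int    # gene start coordinate
    end: int      # gene end coordinate


def prioritize(candidates, variant_genes, region=None):
    """Keep candidates that carry variants, optionally inside a linked interval.

    region is an optional (chrom, start, end) tuple; a gene is kept only if
    it overlaps that interval. Results are sorted by descending score.
    """
    hits = []
    for c in candidates:
        if c.gene not in variant_genes:
            continue
        if region is not None:
            chrom, start, end = region
            if c.chrom != chrom or c.end < start or c.start > end:
                continue
        hits.append(c)
    return sorted(hits, key=lambda c: c.score, reverse=True)


# Toy data: gene names, scores and coordinates are invented for illustration.
candidates = [
    Candidate("GENE_A", 2100, "chr1", 1000, 5000),
    Candidate("GENE_B", 1500, "chr1", 8000, 9000),
    Candidate("GENE_C", 2600, "chr2", 100, 900),
    Candidate("GENE_D", 900, "chr1", 4000, 7000),
]
variant_genes = {"GENE_A", "GENE_C", "GENE_D"}  # genes with variants from sequencing

print([c.gene for c in prioritize(candidates, variant_genes)])
# → ['GENE_C', 'GENE_A', 'GENE_D']
print([c.gene for c in prioritize(candidates, variant_genes, ("chr1", 3000, 7500))])
# → ['GENE_A', 'GENE_D']
```

Restricting by region first and sorting afterwards keeps the ranking identical whether or not a linkage interval is supplied.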


Conflict of interest statement

Competing Interests: The authors have read the journal's policy and have the following competing interests: OT and OP are the intellectual proprietors of the Manteia data mining tool. This does not alter the authors' adherence to PLOS ONE policies on sharing data and materials.

Figures

Figure 1. Integrated data mining workflow.
A signature of a disease group, composed of weighted terms, is generated from statistical analyses of genes already implicated in diseases of the group. Terms come from three main annotation groups: GO (Gene Ontology), PO (Phenotype Ontology, an aggregate of the Human Phenotype Ontology and the Mammalian Phenotype Ontology) and IA (Interactions Annotation). They are mined using Manteia and receive weights proportional to their enrichment in the set of genes implicated in the disease group, compared with the set of all genes in the human genome. Weights are assigned so that the three annotation groups contribute equally to the signature. The signature is then used to mine the genome for additional genes: every gene receives a score equal to the sum of the weights of its annotation terms that match terms in the disease group signature, for a maximum possible score of 3000. Further filtering steps flag genes that have low relative skeletal muscle expression or are already annotated with known diseases.
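The caption outlines the signature and scoring procedure. The Python sketch below illustrates one way it could work under stated assumptions: simple fold enrichment stands in for Manteia's actual enrichment statistic, which is not described here; the equal 1000-point share per annotation group is inferred from the 3000-point maximum and the equal-contribution rule; only terms with fold enrichment above 1 enter the signature (an assumed cutoff); and the muscle expression ratio, its threshold and the toy annotations are hypothetical.

```python
from collections import Counter

GROUPS = ("GO", "PO", "IA")
GROUP_TOTAL = 1000.0  # assumed: equal share of the 3000-point maximum per group


def build_signature(training, genome):
    """Weight terms by fold enrichment in the training set vs. the genome.

    genome maps gene -> {group: set of terms}; training is a set of genes.
    Within each group, weights of enriched terms (fold > 1) are rescaled to
    sum to GROUP_TOTAL so that the groups contribute equally.
    """
    train = [g for g in training if g in genome]
    n_genome, n_train = len(genome), len(train)
    signature = {}
    for group in GROUPS:
        genome_counts = Counter(t for ann in genome.values() for t in ann.get(group, ()))
        train_counts = Counter(t for g in train for t in genome[g].get(group, ()))
        fold = {
            term: (k / n_train) / (genome_counts[term] / n_genome)
            for term, k in train_counts.items()
        }
        enriched = {t: f for t, f in fold.items() if f > 1.0}
        total = sum(enriched.values())
        for term, f in enriched.items():
            signature[(group, term)] = GROUP_TOTAL * f / total
    return signature


def score(annotations, signature):
    """Sum the weights of a gene's terms that appear in the signature."""
    return sum(signature.get((group, t), 0.0)
               for group, terms in annotations.items() for t in terms)


def rank(signature, genome, exclude=()):
    """Score every gene not in `exclude`; highest score first, ties by name."""
    scored = [(g, score(ann, signature)) for g, ann in genome.items() if g not in exclude]
    return sorted(scored, key=lambda gs: (-gs[1], gs[0]))


def flags(gene, muscle_ratio, disease_genes, min_ratio=1.0):
    """Mark (rather than drop) genes failing the post-ranking filters.

    muscle_ratio (hypothetical skeletal muscle vs. other tissue expression)
    and min_ratio are illustrative assumptions.
    """
    out = []
    if muscle_ratio.get(gene, 0.0) < min_ratio:
        out.append("low_muscle_expression")
    if gene in disease_genes:
        out.append("known_disease")
    return out


# Toy annotations; G1 and G2 play the role of known disease genes.
genome = {
    "G1": {"GO": {"t1", "t2"}, "PO": {"p1"}, "IA": {"i1"}},
    "G2": {"GO": {"t1"}, "PO": {"p1"}, "IA": {"i2"}},
    "G3": {"GO": {"t3"}, "PO": {"p2"}, "IA": {"i2"}},
    "G4": {"GO": {"t1", "t2"}, "PO": {"p1"}, "IA": {"i1"}},
    "G5": {"GO": {"t3"}, "PO": {"p2"}, "IA": {"i3"}},
    "G6": {"GO": {"t3"}, "PO": {"p2"}, "IA": {"i3"}},
}
training = {"G1", "G2"}
sig = build_signature(training, genome)
for gene, s in rank(sig, genome, exclude=training):
    print(gene, round(s), flags(gene, {"G4": 3.2, "G3": 0.4}, {"G3"}))
```

Flagging instead of deleting mirrors the caption's wording ("mark"), so a reviewer can still inspect a high-scoring gene that fails a filter.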
Figure 2. Graph representation of relationships of known genes.
All known genes for the different disease groups were analyzed together for matching terms in different ontologies. Nodes represent genes, and an edge is drawn between two nodes when the number of terms shared by the two genes exceeds a given threshold. Edge width is proportional to the number of terms shared between two genes, and node size and color (on a scale from green, lowest, to red, highest) are proportional to the number of connections of a gene in the graph. Closely related genes cluster together, and hubs appear centrally located. A: graph for combined terms from Gene Ontology (GO), Human Phenotype Ontology (HPO) and Interactions Annotation (IA), with a threshold of 30 matching terms. The cluster with a yellow background includes genes implicated in metabolic myopathies, the one with a red background groups congenital muscular dystrophy genes, and the cluster with a gray background represents genes associated with congenital myasthenic syndromes. B: graph for HPO terms, with a threshold of 20 matching terms. C: graph for GO terms, with a threshold of 10 matching terms. Background colors correspond to the clusters shown in A. D: graph for IA terms, with a threshold of 5 matching terms. The gray background highlights a cluster of genes encoding subunits of cholinergic receptors, implicated in congenital myasthenic syndromes; the green background groups components of collagen VI; and the cluster with a blue background links elements of the contractile apparatus.
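The edge rule in the caption (connect two genes when their shared-term count exceeds a threshold, weight edges by that count, size nodes by their number of connections) can be sketched in Python as follows. The gene and term names are toy placeholders, and drawing the graph (layout, colors) is left to a plotting tool.

```python
from itertools import combinations


def build_gene_graph(gene_terms, threshold):
    """Connect genes sharing more than `threshold` terms.

    gene_terms maps gene -> set of annotation terms. Returns edges as
    {(gene_a, gene_b): shared_count} (the edge width) and each gene's
    degree (the basis for node size and color).
    """
    edges = {}
    for a, b in combinations(sorted(gene_terms), 2):
        shared = len(gene_terms[a] & gene_terms[b])
        if shared > threshold:
            edges[(a, b)] = shared
    degree = {g: 0 for g in gene_terms}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return edges, degree


# Toy example with hypothetical genes and terms; threshold 1 means >= 2 shared terms.
gene_terms = {
    "A": {"t1", "t2", "t3"},
    "B": {"t1", "t2", "t4"},
    "C": {"t2", "t3", "t5"},
    "D": {"t6"},
}
edges, degree = build_gene_graph(gene_terms, threshold=1)
print(edges)   # {('A', 'B'): 2, ('A', 'C'): 2}
print(degree)  # {'A': 2, 'B': 1, 'C': 1, 'D': 0}
```

Here gene A acts as the hub, which is how a highly connected gene would end up centrally placed and colored toward red.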
Figure 3. Venn diagrams of gene set overlaps.
A: Venn diagram showing the overlap of training set genes between muscular dystrophies (MD), congenital myopathies (CM) and congenital muscular dystrophies (CMD). B: Venn diagram showing the overlap of genes found within the top 50 ranked candidate genes in the three disease groups.
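Each region of a three-set Venn diagram is the set of genes belonging to exactly one combination of groups. A minimal Python sketch follows; the gene sets are toy placeholders rather than the study's training sets, `top_n` is a hypothetical helper for the top-50 comparison in panel B, and only non-empty regions appear in the result.

```python
def venn3_regions(sets):
    """Split three named gene sets into their disjoint Venn regions.

    sets maps a label to a set of genes; each key of the result is the tuple
    of labels whose sets contain every gene in that region. Empty regions
    are omitted.
    """
    labels = list(sets)
    if len(labels) != 3:
        raise ValueError("expected exactly three sets")
    universe = set().union(*sets.values())
    regions = {}
    for gene in universe:
        key = tuple(label for label in labels if gene in sets[label])
        regions.setdefault(key, set()).add(gene)
    return regions


def top_n(ranked_genes, n=50):
    """Genes among the first n entries of a ranked candidate list."""
    return set(ranked_genes[:n])


# Toy training sets for the three disease groups.
groups = {
    "MD": {"G1", "G2", "G3", "G4"},
    "CM": {"G3", "G4", "G5"},
    "CMD": {"G1", "G4", "G6"},
}
regions = venn3_regions(groups)
for key in sorted(regions):
    print(" & ".join(key), sorted(regions[key]))
```

For panel B, the same function would be applied to `top_n(...)` of each group's ranked candidate list instead of the training sets.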
