Bioinformatics. 2010 Jun 15;26(12):i71-8.
doi: 10.1093/bioinformatics/btq173.

Using semantic web rules to reason on an ontology of pseudogenes


Matthew E Holford et al. Bioinformatics. 2010.

Abstract

Motivation: Recent years have seen the development of a wide range of biomedical ontologies. Notable among these is the Sequence Ontology (SO), which offers a rich hierarchy of terms and relationships that can be used to annotate genomic data. Well-designed formal ontologies allow data to be reasoned upon in a consistent and logically sound way and can lead to the discovery of new relationships. The Semantic Web Rule Language (SWRL) augments the capabilities of a reasoner by allowing the creation of conditional rules. To date, however, formal reasoning, and SWRL rules in particular, has not been widely used in biomedicine.

Results: We have built a knowledge base of human pseudogenes, extending the existing SO framework to incorporate additional attributes. In particular, we have defined the relationships between pseudogenes and segmental duplications. We then created a series of logical rules using SWRL to answer research questions and to annotate our pseudogenes appropriately. The result is a knowledge base that can be queried to discover information about human pseudogene evolution.

Availability: The fully populated knowledge base described in this document is available for download from http://ontology.pseudogene.org. A SPARQL endpoint from which to query the dataset is also available at this location.
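
As a rough illustration of how such an endpoint might be queried, the sketch below builds a standard SPARQL Protocol GET request using only the Python standard library. The endpoint path (`/sparql`) is an assumption, since the text gives only the site address, and the service may not be reachable. The query deliberately uses no ontology-specific terms, because the vocabulary of the knowledge base is not described here; it simply lists a few triples.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the text names only the site; the "/sparql" path is a guess.
ENDPOINT = "http://ontology.pseudogene.org/sparql"

# Vocabulary-free query: list a few triples to see what the dataset holds.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_request_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET URL (SPARQL Protocol 'query' parameter)."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> list:
    """Send the query and return the result bindings (performs a network call)."""
    request = urllib.request.Request(
        build_request_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]
```

Calling `run_query(ENDPOINT, QUERY)` would return a list of bindings if the endpoint answers with SPARQL JSON results; `build_request_url` can be checked offline.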


Figures

Fig. 1.
Diagram showing the relationships between some of the base classes of the ontology. Dashed lines are used to indicate subclass relationships. Regular lines indicate property relationships. Classes in SO are highlighted in gray, while those which were added to our ontology have a white background.
Fig. 2.
Diagram showing the hierarchy of annotation attributes for our pseudogene ontology. The dashed lines denote subclass relationships. Classes from SO are highlighted in gray, while classes added by our ontology have a white background.
Fig. 3.
Informal pseudocode description of the rules implemented in SWRL to traverse the flowchart.
Fig. 4.
The decision tree to be traversed by SWRL rules. Dashed lines indicate a ‘No’ answer; solid lines indicate a ‘Yes’ answer. The same convention is used in Figures 5 and 6.
Fig. 5.
The path traversed by Rule 5 on the decision tree. This path follows Case 1 in looking for examples of pseudogenes that evolve less rapidly than their parent genes.
Fig. 6.
The path traversed by Rule 7 on the decision tree. This path follows Case 2 in looking for pseudogenes that have arisen from the duplication of another pseudogene rather than of their parent gene.
Fig. 7.
A potential duplicated-processed pseudogene found by aligning one pseudogene with another on the same segment as the parent gene. The pseudogene, PGOHUM00000154773, is located on chromosome 8 of the reference sequence between bases 7199348 and 7200542. Its parent gene, ENSG00000205946 (USP17L6P), is found on chromosome 4 between bases 8978698 and 8979894. PGOHUM00000154773 is found on an SD segment located between 7199348 and 7200542 on chromosome 8. The parent gene is on the duplicate segment located between 8966987 and 9017856 on chromosome 4. The duplicate segment also contains another pseudogene, PGOHUM00000149316, between bases 8992177 and 8992537. Because this other pseudogene lies at a similar distance from the start of its segment as PGOHUM00000154773 does from the start of its own segment (25 190 bp versus 24 249 bp), while the parent gene lies in a different portion of the segment (11 711 bp from the start), the deduction that PGOHUM00000154773 is aligned to PGOHUM00000149316 rather than ENSG00000205946 makes sense. This was found by applying SWRL Rule 7.
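
The offset comparison in this caption can be reproduced with a few lines of arithmetic. The sketch below is not the authors' SWRL Rule 7; it only recomputes the chromosome 4 offsets from the coordinates quoted above and picks the feature whose offset is nearest to that of PGOHUM00000154773. The nearest-offset criterion and the helper names are assumptions made for illustration, and the 24 249 bp offset on chromosome 8 is taken as stated in the caption rather than recomputed.

```python
# Coordinates on chromosome 4, copied from the Fig. 7 caption.
SEGMENT_START = 8966987            # start of the duplicate SD segment
PARENT_GENE_START = 8978698        # ENSG00000205946 (USP17L6P)
OTHER_PSEUDOGENE_START = 8992177   # PGOHUM00000149316

# Offset of PGOHUM00000154773 within its segment on chromosome 8,
# used as stated in the caption (not recomputed here).
QUERY_PSEUDOGENE_OFFSET = 24249


def offset_in_segment(feature_start: int, segment_start: int) -> int:
    """Distance in bp from the start of a segment to the start of a feature."""
    return feature_start - segment_start


def closest_by_offset(query_offset: int, candidates: dict) -> str:
    """Return the candidate whose offset is nearest to the query offset.

    Illustrative criterion only (an assumption, not the published rule).
    """
    return min(candidates, key=lambda name: abs(candidates[name] - query_offset))


candidates = {
    "ENSG00000205946": offset_in_segment(PARENT_GENE_START, SEGMENT_START),
    "PGOHUM00000149316": offset_in_segment(OTHER_PSEUDOGENE_START, SEGMENT_START),
}
best = closest_by_offset(QUERY_PSEUDOGENE_OFFSET, candidates)

print(candidates)  # {'ENSG00000205946': 11711, 'PGOHUM00000149316': 25190}
print(best)        # PGOHUM00000149316
```

The other pseudogene differs from the query offset by 941 bp, against 12 538 bp for the parent gene, which matches the caption's conclusion.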
Fig. 8.
Informal depiction of the coverage provided by the current ontology, including portions derived from SO, as well as areas to be covered in future work. In the diagram, plain lines indicate class hierarchy (is-a) relationships, while dashed lines indicate property (has-a) relationships.

