PLoS One. 2013 Oct 8;8(10):e77090.
doi: 10.1371/journal.pone.0077090. eCollection 2013.

A DNA-based semantic fusion model for remote sensing data

Heng Sun et al. PLoS One.

Abstract

Semantic technology plays a key role in various domains, from conversation understanding to algorithm analysis. As the most efficient semantic tool, ontology can represent, process, and manage widespread knowledge. Many researchers now use ontologies to collect and organize the semantic information of data in order to maximize research productivity. In this paper, we first describe our work on the development of a remote sensing data ontology, with a primary focus on semantic fusion-driven research for big data. Our ontology is made up of 1,264 concepts and 2,030 semantic relationships. However, the growth of big data is straining the capacities of current semantic fusion and reasoning practices. Considering the massive parallelism of DNA strands, we propose a novel DNA-based semantic fusion model. In this model, a parallel strategy is developed to encode the semantic information in DNA for a large volume of remote sensing data. The semantic information is read in a parallel and bit-wise manner, and each individual bit is converted to a base. By doing so, a considerable amount of conversion time can be saved: for example, the cluster-based multi-process program reduces the conversion time from 81,536 seconds to 4,937 seconds for 4.34 GB of source data files. Moreover, the size of the result file recording the DNA sequences is 54.51 GB for the parallel C program compared with 57.89 GB for the sequential Perl program. This shows that our parallel method can also reduce the DNA synthesis cost. In addition, data types are encoded in our model, which is a basis for building a type system in our future DNA computer. Finally, we describe theoretically an algorithm for DNA-based semantic fusion. This algorithm enables the integration of knowledge from disparate remote sensing data sources into a consistent, accurate, and complete representation. This process depends solely on ligation reactions and screening operations instead of the ontology.
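
The bit-wise conversion described in the abstract can be sketched as follows. This is a minimal Python illustration, not the authors' C or Perl program: the 0-to-A and 1-to-T mapping, the chunk size, and the function names are assumptions made for the example, and a thread pool merely stands in for the cluster-based multi-process program.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical mapping: the abstract says one bit becomes one base,
# but does not state which base encodes which bit value.
BIT_TO_BASE = {"0": "A", "1": "T"}


def encode_chunk(chunk: bytes) -> str:
    """Convert every bit of a byte chunk (most significant bit first) to a base."""
    bits = "".join(f"{byte:08b}" for byte in chunk)
    return "".join(BIT_TO_BASE[b] for b in bits)


def encode_parallel(data: bytes, chunk_size: int = 4, workers: int = 4) -> str:
    """Split data into fixed-size chunks, encode them concurrently, rejoin in order."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the result matches a sequential run.
        return "".join(pool.map(encode_chunk, chunks))


if __name__ == "__main__":
    print(encode_parallel(b"RSI"))
```

Because each chunk is encoded independently, the work can be divided among workers without coordination; only the final concatenation depends on chunk order.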

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. RDF graph of the remote sensing data ontology.
This figure contains 1,264 nodes and 2,030 edges. Nodes are a set of classes and concepts in the remote sensing domain, such as Worldwide_Reference_System, Multiple_Image_Alignment, and Spatial_Domain. Edges are a set of specific properties that characterize these classes. Classes, properties, and domains are all considered ontology elements. All the elements are partitioned according to their namespaces. The namespaces in the ontology vocabulary give the Uniform Resource Identifier References (URIrefs) as the URLs of web resources that provide further information about this vocabulary. The namespaces xmlns:ersm (http://cs.jnu.edu.cn/sun/ontology/ersm), xmlns:rdfs (http://www.w3.org/2000/01/rdf-schema), and xmlns:rdf (http://www.w3.org/1999/02/22-rdf-syntax-ns) are the main ones used in our remote sensing data ontology. (For interpretation of the references to color in this figure, the reader is referred to the web version of this paper.)
Figure 2. RDF instance description and visualization of an RSI.
This figure includes three interactive parts: an RSI in A, an RDF annotation of the RSI in B, and a data instance visualization in C. (A) The example RSI has the ID 103001001E1EB700 and a resolution of 1.85 meters. (B) The RDF identifies the data instance using the URIref, and the image data can be described by making statements. A statement, such as “An RSI 103001001E1EB700 has a nomspres (Nominal Spatial Resolution) whose value is 1.85 meter”, is represented by these two RDF/XML statement blocks. File S2 provides the complete RDF code of the catalog ID 103001001E1EB700 imagery. (C) The 193 classes and concepts are partitioned into six colors according to their namespaces. Most of them (120 green nodes) represent blank nodes. Blank nodes provide a way to make statements about data more accurately, because constant values and most aggregate concepts may not have URIs. The other namespaces include xml:base (http://cs.jnu.edu.cn/sun/ontology/103001001E1EB700), xmlns:rdfs (http://www.w3.org/2000/01/rdf-schema), xmlns:ersm (http://cs.jnu.edu.cn/sun/ontology/ersm), xmlns:rdf (http://www.w3.org/1999/02/22-rdf-syntax-ns), and xmlns:owl (http://www.w3.org/2002/07/owl). (For interpretation of the references to color in this figure, the reader is referred to the web version of this paper.)
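
The statement quoted in panel B can be pictured as a small set of subject-predicate-object triples that pass through a blank node. The sketch below is an illustration in plain Python, not the RDF/XML in File S2: joining namespace URIs to local names with "#", the property name ersm:unit, and the blank-node label "_:b0" are assumptions; rdf:value is the standard RDF property for a structured value.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Namespace URIs quoted in the caption; the trailing "#" is an assumption.
ERSM = "http://cs.jnu.edu.cn/sun/ontology/ersm#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSI = "http://cs.jnu.edu.cn/sun/ontology/103001001E1EB700"

# "_:b0" is a blank node: it groups the number and its unit without a URI.
# ersm:unit is a hypothetical property name used only for this example.
statements = [
    Triple(RSI, ERSM + "nomspres", "_:b0"),
    Triple("_:b0", RDF + "value", "1.85"),
    Triple("_:b0", ERSM + "unit", "meter"),
]


def describe(triples, subject, predicate):
    """Follow subject --predicate--> node and collect that node's properties."""
    node = next(t.obj for t in triples
                if t.subject == subject and t.predicate == predicate)
    return {t.predicate.rsplit("#", 1)[-1]: t.obj
            for t in triples if t.subject == node}


resolution = describe(statements, RSI, ERSM + "nomspres")
```

Here `resolution` holds the value and unit attached to the blank node, which is the role the caption gives to the 120 blank nodes in panel C.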
Figure 3. The linear model of semantic properties in three RSIs.
Figure 4. Network diagram of semantic property set.
Figure 5. Conversion performance on the test dataset.
The result dataset contains the DNA sequence information corresponding to the test data. (A) The conversion times are about 4,937 seconds, 31,426 seconds, and 81,536 seconds for the three programs. Error bars depict the standard error of the mean. (B) The size of the result dataset is 54.51 GB for both the sequential C and the parallel C programs. The size is 57.89 GB for the Perl program because its code uses a different data block size.
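
As a quick check on what these figures imply, the ratios can be computed directly. The snippet uses only numbers quoted in the abstract and this caption; the variable names are illustrative.

```python
# Figures quoted in the abstract and the Figure 5 caption.
slowest_s, fastest_s = 81_536, 4_937   # conversion time, seconds
perl_gb, c_gb = 57.89, 54.51           # result file size, GB

speedup = slowest_s / fastest_s              # how many times faster
size_saving = (perl_gb - c_gb) / perl_gb     # fraction of output size saved

print(f"speedup: {speedup:.1f}x, size saving: {size_saving:.1%}")
```

The conversion is roughly 16.5 times faster, while the result file is only about 5.8% smaller, so the time saving is by far the larger effect.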
Figure 6. Semantic fusion pattern of an RSI.
(A) Two owners of the RSI E1EB7 select different properties to annotate it. One of them selects the properties cty and qa. The other selects the properties cty and cc. The property value null marks an unannotated property; in that case, both its data type and its unit are undefined. (B) The resulting property string after semantic fusion represents the complete semantic information of this RSI.
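
The fusion pattern in this caption can be mimicked in software as a merge of two partial annotations. The paper performs this step with ligation and screening on DNA, so the sketch below is only a software analogue: the placeholder values, the use of None for null, and the decision to raise an error on conflicting values are all assumptions.

```python
from typing import Dict, Optional

Annotation = Dict[str, Optional[str]]

PROPERTIES = ["cty", "qa", "cc"]  # property names taken from the caption


def fuse(a: Annotation, b: Annotation) -> Annotation:
    """Merge two annotations of one RSI; None marks an unannotated property."""
    fused: Annotation = {}
    for prop in PROPERTIES:
        va, vb = a.get(prop), b.get(prop)
        if va is not None and vb is not None and va != vb:
            # The caption does not describe conflicts; this sketch refuses to guess.
            raise ValueError(f"conflicting values for {prop}: {va!r} vs {vb!r}")
        fused[prop] = va if va is not None else vb
    return fused


# Placeholder values, invented for the example.
owner_1 = {"cty": "value_cty", "qa": "value_qa", "cc": None}
owner_2 = {"cty": "value_cty", "qa": None, "cc": "value_cc"}

fused = fuse(owner_1, owner_2)
```

After the merge, every property annotated by either owner has a value, which corresponds to the complete property string in panel B.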
Figure 7. The oligonucleotides in the hybridization and ligation reaction.
For each property i including the labels start and end, a 48-nt oligonucleotide Vi is generated. For each edge ij, an oligonucleotide eij is derived from the 3′ 24-nt of Vi and the 5′ 24-nt of Vj.
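
The construction of the edge oligonucleotides can be sketched as string slicing. This is an illustration under stated assumptions: sequences are written 5′ to 3′, the vertex sequences are drawn at random (a real design would also screen for unwanted hybridization), and the caption does not say whether a complement is taken, so none is applied here. The label list is hypothetical apart from start and end.

```python
import random

BASES = "ACGT"
OLIGO_LEN = 48
HALF = OLIGO_LEN // 2


def make_vertex_oligos(labels, seed=0):
    """Assign each label a random 48-nt sequence, written 5' to 3'."""
    rng = random.Random(seed)
    return {label: "".join(rng.choice(BASES) for _ in range(OLIGO_LEN))
            for label in labels}


def edge_oligo(vertex, i, j):
    """Join the 3' 24 nt of V_i to the 5' 24 nt of V_j."""
    return vertex[i][HALF:] + vertex[j][:HALF]


# "start" and "end" come from the caption; the property labels are examples.
labels = ["start", "cty", "qa", "cc", "end"]
V = make_vertex_oligos(labels)
e = edge_oligo(V, "cty", "qa")
```

Each edge oligonucleotide is again 48 nt long and overlaps half of each of the two property oligonucleotides it connects, which is what lets hybridization and ligation chain properties into one strand.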
Figure 8. DNA sequence representing the complete semantic information.
