Int J Big Data. 2015 Oct;2(2):43-56.
doi: 10.29268/stbd.2015.2.2.4.

Automated Predictive Big Data Analytics Using Ontology Based Semantics

Mustafa V Nural et al. Int J Big Data. 2015 Oct.

Abstract

Predictive analytics is taking on an increasingly important role in the big data era. Choosing a modeling technique and an estimation procedure (or algorithm), and executing them efficiently, can present significant challenges. For example, selecting appropriate and optimal models for big data analytics often requires careful investigation and considerable expertise, which might not always be readily available. In this paper, we propose to use semantic technology to assist data analysts and data scientists in selecting appropriate modeling techniques and building specific models, as well as in explaining the rationale for the techniques and models selected. To formally describe the modeling techniques, models and results, we developed the Analytics Ontology, which supports inferencing for semi-automated model selection. The SCALATION framework, which currently supports over thirty modeling techniques for predictive big data analytics, is used as a testbed for evaluating the use of semantic technology.

Keywords: big-data-analytics; model-selection; ontology; semantics.
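
The idea of inferring candidate techniques from formally described properties, together with a rationale, can be illustrated with a minimal Python sketch. It uses a plain rule table instead of an ontology or a reasoner, and every name in it (`DatasetProfile`, `RULES`, `suggest`) and every rule is a hypothetical illustration, not something taken from the Analytics Ontology or from ScalaTion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetProfile:
    """Hypothetical summary of a dataset (assumed fields, for illustration only)."""
    response_type: str   # "continuous", "binary", or "count"
    multicollinear: bool  # True if the predictors are strongly correlated


# Hypothetical rules: (condition on the profile, technique name, rationale).
RULES = [
    (lambda p: p.response_type == "continuous" and not p.multicollinear,
     "Multiple Linear Regression",
     "continuous response, no strong multicollinearity"),
    (lambda p: p.response_type == "continuous" and p.multicollinear,
     "Principal Component Regression",
     "continuous response with multicollinear predictors"),
    (lambda p: p.response_type == "binary",
     "Logistic Regression",
     "binary response"),
    (lambda p: p.response_type == "count",
     "Poisson Regression",
     "count response"),
]


def suggest(profile):
    """Return (technique, rationale) pairs for every rule the profile satisfies."""
    return [(name, why) for cond, name, why in RULES if cond(profile)]


if __name__ == "__main__":
    for name, why in suggest(DatasetProfile("continuous", True)):
        print(f"{name}: {why}")
```

An ontology-based approach would replace the hand-written rule table with class axioms evaluated by a reasoner; the sketch only shows the shape of the input (data properties) and output (suggestions with a rationale).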

Figures

Figure 1. Predictive Analytics Workflow.
Figure 2. Estimating Coefficients of a PCR Model Using SVD in R.
Figure 3. Estimating Coefficients of a PCR Model Using SVD in SAS.
Figure 4. Estimating Coefficients of a PCR Model Using SVD in ScalaTion.
Figure 5. Main Object & DataType Properties in the Analytics Ontology.
Figure 6. Partial View of the Analytics Ontology (Only Shows Class Hierarchy).
Figure 7. Some of the Equivalence Class Axioms from the Ontology.
Figure 8. Representation of AutoMPGModel in the Ontology.
Figure 9. Algorithm for Filtering Suggestions.
Figure 10. A Screenshot from SCALADASH Displaying Suggestions for AutoMPGModel.
