Sci Rep. 2019 Dec 9;9(1):18641.
doi: 10.1038/s41598-019-55035-8.

Applied Veterinary Informatics: Development of a Semantic and Domain-Specific Method to Construct a Canine Data Repository

Mary Regina Boland et al. Sci Rep.

Abstract

Animals are used to study the pathogenesis of various human diseases, but typically as animal models with induced disease. However, companion animals develop disease spontaneously in a way that mirrors disease development in humans. The purpose of this study is to develop a semantic and domain-specific method to enable construction of a data repository from a veterinary hospital that would be useful for future studies. We developed a two-phase method that combines semantic and domain-specific approaches to construct a canine data repository of clinical data collected during routine care at the Matthew J Ryan Veterinary Hospital of the University of Pennsylvania (PennVet). Our framework consists of two phases: (1) a semantic data-cleaning phase and (2) a domain-specific data-cleaning phase. We validated our data repository using a gold standard of known breed predispositions for certain diseases (i.e., mitral valve disease, atrial fibrillation and osteosarcoma). Our two-phase method allowed us to maximize data retention (99.8% of data retained), while ensuring the quality of our result. Our final population contained 84,405 dogs treated between 2000 and 2017 from 194 distinct dog breeds. We observed the expected breed associations with mitral valve disease, atrial fibrillation, and osteosarcoma (P < 0.05) after adjusting for multiple comparisons. Precision ranged from 60.0 to 83.3 for the three diseases (avg. 74.2) and recall ranged from 31.6 to 83.3 (avg. 53.3). Our study describes a two-phase method to construct a clinical data repository using canine data obtained during routine clinical care at a veterinary hospital.
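The precision and recall figures above compare the breeds flagged by the association analysis against the gold standard of known breed predispositions. The following is a minimal sketch of that comparison, not the authors' code. It assumes the reported values are percentages, and the breed names and the helper `precision_recall` are hypothetical placeholders.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) in percent for two collections of breed names.

    predicted: breeds flagged as significantly associated with a disease
    gold:      breeds with a known predisposition (the gold standard)
    """
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    # Guard against empty sets so the function never divides by zero.
    precision = 100.0 * true_pos / len(predicted) if predicted else 0.0
    recall = 100.0 * true_pos / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: four known predisposed breeds, three flagged breeds,
# two of which are correct.
gold = {"breed_a", "breed_b", "breed_c", "breed_d"}
predicted = {"breed_a", "breed_b", "breed_x"}
p, r = precision_recall(predicted, gold)
print(f"precision={p:.1f}%, recall={r:.1f}%")
```

Under this reading, precision is the share of flagged breeds that are known to be predisposed, and recall is the share of known predisposed breeds that were flagged.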

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Schematic Diagram Illustrating the Construction of the Canine Data Repository at PennVet. All dog icons (“pomeranian”, “dalmatian”, “poodle”) within the figure are by: parkjisun, from thenounproject.com.
Figure 2
Schematic Diagram Illustrating the Validation and Iterative Refinement of the Data Cleaning Method. If our age, breed and weight analysis or our age and breed analysis revealed outliers, we refined the data cleaning algorithm until the results were more in line with expectations. Likewise, if our disease-breed association analysis revealed peculiarities, we revisited the data cleaning algorithm until we achieved a cleaned and validated canine data repository. All dog icons (“pomeranian”, “dalmatian”, “poodle”) within the figure are by: parkjisun, from thenounproject.com.
Figure 3
Histogram of Weight Across All Dog Breeds and Ages in Our PennVet Canine Data Repository. Weight was missing for data obtained in 2012 and 2013 and was therefore set to 0, which accounts for the large spike at 0 in Fig. 3.
Figure 4
Average Age Across All Dog Breeds in Our PennVet Canine Data Repository.
Figure 5
Precision and Recall for PennVet vs. Random for the Three Diseases in Our Test Set: Mitral Valve Disease, Atrial Fibrillation and Osteosarcoma. Precision and recall were higher for all three diseases in our cleaned PennVet dataset than in the random set. For each disease, we developed a random cohort of patients that was the same size as the case population. For example, there were 717 mitral valve disease patients, so we labeled 717 random patients as having the disease. We then performed breed association analysis, adjusting the p-values for multiple hypotheses using the false discovery rate (FDR). This was performed 1000 times for each disease. Distributions of the precision and recall are shown in Fig. 5.
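
The random-cohort baseline described in the Figure 5 caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the caption does not name the association test, so a one-sided hypergeometric (Fisher-style) enrichment test is assumed here, and the FDR adjustment is assumed to be Benjamini-Hochberg. All function names, breed labels and the toy population are hypothetical.

```python
import random
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric draw without replacement."""
    total = comb(population, draws)
    upper = min(successes, draws)
    # Sum exact integer counts first, then divide once.
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_breeds(breeds, case_ids, alpha=0.05):
    """Breeds over-represented among cases after FDR adjustment.

    breeds:   dict mapping dog id -> breed name (whole population)
    case_ids: ids of dogs labeled as having the disease
    """
    names = sorted(set(breeds.values()))
    pvals = []
    for name in names:
        breed_total = sum(1 for b in breeds.values() if b == name)
        breed_cases = sum(1 for d in case_ids if breeds[d] == name)
        pvals.append(hypergeom_sf(breed_cases, len(breeds),
                                  breed_total, len(case_ids)))
    adjusted = benjamini_hochberg(pvals)
    return {n for n, q in zip(names, adjusted) if q < alpha}


def random_baseline(breeds, n_cases, gold, n_iter=1000, seed=0):
    """Precision and recall (as fractions) for randomly labeled cohorts."""
    rng = random.Random(seed)
    ids = list(breeds)
    results = []
    for _ in range(n_iter):
        fake_cases = rng.sample(ids, n_cases)  # same size as the real cohort
        found = enriched_breeds(breeds, fake_cases)
        tp = len(found & gold)
        precision = tp / len(found) if found else 0.0
        recall = tp / len(gold) if gold else 0.0
        results.append((precision, recall))
    return results


# Toy population: 300 dogs in three equally sized hypothetical breeds.
breeds = {i: ("breed_a", "breed_b", "breed_c")[i // 100] for i in range(300)}
# 30 "real" cases, 25 of them in breed_a.
cases = list(range(25)) + [100, 101, 102, 200, 201]
found = enriched_breeds(breeds, cases)
baseline = random_baseline(breeds, len(cases), {"breed_a"}, n_iter=20)
print(found, len(baseline))
```

In the toy data only `breed_a` is strongly over-represented among the cases, so it is the only breed flagged. The baseline list holds one (precision, recall) pair per random relabeling, which is the kind of distribution the caption describes plotting.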
