2018 Mar 1;25(3):230-238.
doi: 10.1093/jamia/ocx079.

Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record

Jason Walonoski et al. J Am Med Inform Assoc.

Abstract

Objective: Our objective is to create a source of synthetic electronic health records that is readily available; suited to industrial, innovation, research, and educational uses; and free of legal, privacy, security, and intellectual property restrictions.

Materials and methods: We developed Synthea, an open-source software package that simulates the lifespans of synthetic patients, modeling the 10 most frequent reasons for primary care encounters and the 10 chronic conditions with the highest morbidity in the United States.
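
As a rough illustration of what simulating a patient lifespan with a disease module can look like, the sketch below steps a synthetic patient through a small state machine, one year at a time. It is not Synthea's implementation: the `MODULE` layout, the state names, the transition probabilities, and the functions `simulate_patient` and `simulate_population` are all hypothetical and chosen only for illustration, with no epidemiological basis.

```python
import random

# Hypothetical, simplified disease module (NOT Synthea's real schema).
# Each state lists its possible transitions as (next_state, probability).
# The probabilities are made up for illustration only.
MODULE = {
    "Healthy": [("Healthy", 0.97), ("Prediabetes", 0.03)],
    "Prediabetes": [("Prediabetes", 0.90), ("Diabetes", 0.10)],
    "Diabetes": [("Diabetes", 1.0)],
}


def simulate_patient(rng, max_age=90):
    """Step one synthetic patient through the module, one year per step."""
    state = "Healthy"
    record = []  # (age, event) entries forming a toy health record
    for age in range(max_age + 1):
        targets, weights = zip(*MODULE[state])
        next_state = rng.choices(targets, weights=weights, k=1)[0]
        if next_state != state:
            # A state change is logged as an onset event at the current age.
            record.append((age, f"onset: {next_state}"))
        state = next_state
    return record


def simulate_population(n, seed=0):
    """Generate n toy records from one seeded generator for repeatability."""
    rng = random.Random(seed)
    return [simulate_patient(rng) for _ in range(n)]


if __name__ == "__main__":
    for i, rec in enumerate(simulate_population(3)):
        print(i, rec)
```

Seeding the generator makes a run repeatable, which is convenient when synthetic records are used as test fixtures.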

Results: Synthea adheres to a previously developed conceptual framework, scales via open-source deployment on the Internet, and may be extended with additional disease and treatment modules developed by its user community. One million synthetic patient records are now freely available online, encoded in standard formats (eg, Health Level-7 [HL7] Fast Healthcare Interoperability Resources [FHIR] and Consolidated-Clinical Document Architecture), and accessible through an HL7 FHIR application program interface.
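
To give a sense of what a record encoded in HL7 FHIR looks like, the following minimal sketch builds a bare-bones FHIR-style Patient resource as JSON. The helper name `to_fhir_patient` and the sample values are hypothetical; only a few basic Patient fields are shown, and the output is not meant to reproduce what Synthea actually exports.

```python
import json
import uuid


def to_fhir_patient(given, family, gender, birth_date):
    """Return a minimal FHIR-style Patient resource as a plain dict.

    Hypothetical helper for illustration; a real export would carry
    many more fields (identifiers, address, extensions, and so on).
    """
    return {
        "resourceType": "Patient",
        "id": str(uuid.uuid4()),  # random identifier for the synthetic patient
        "name": [{"family": family, "given": [given]}],
        "gender": gender,         # e.g. "male", "female", "other", "unknown"
        "birthDate": birth_date,  # ISO 8601 date string, YYYY-MM-DD
    }


if __name__ == "__main__":
    # Made-up sample values, not drawn from any dataset.
    patient = to_fhir_patient("Alex", "Example", "female", "1970-01-01")
    print(json.dumps(patient, indent=2))
```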

Discussion: Health care lags other industries in information technology, data exchange, and interoperability. The lack of freely distributable health records has long hindered innovation in health care. Approaches and tools are available to inexpensively generate synthetic health records at scale without accidental disclosure risk, lowering current barriers to entry for promising early-stage developments. By engaging a growing community of users, the synthetic data generated will become increasingly comprehensive, detailed, and realistic over time.

Conclusion: Synthetic patients can be simulated with models of disease progression and corresponding standards of care to produce risk-free realistic synthetic health care records at scale.

Keywords: RS-EHR; clinical pathways; computer simulation; electronic health records; patient-specific modeling.

Figures

Figure 1. PADARSER as the conceptual framework for Synthea.
Figure 2. Synthea software architecture.
Figure 3. Simplified example of a Synthea module.
Figure 4. Graph of age at diagnosis of type 2 diabetes.
Listing 1. Sample synthetic patient data (abridged).
Listing 2. Partial JSON representation of a Synthea module.
