Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation

iDASH: integrating data for analysis, anonymization, and sharing

Lucila Ohno-Machado et al. J Am Med Inform Assoc. 2012 Mar-Apr.

Abstract

iDASH (integrating data for analysis, anonymization, and sharing) is the newest National Center for Biomedical Computing funded by the NIH. It focuses on algorithms and tools for sharing data in a privacy-preserving manner. Foundational privacy technology research performed within iDASH is coupled with innovative engineering for collaborative tool development and data-sharing capabilities in a private Health Insurance Portability and Accountability Act (HIPAA)-certified cloud. Driving Biological Projects, which span different biological levels (from molecules to individuals to populations) and focus on various health conditions, help guide research and development within this Center. Furthermore, training and dissemination efforts connect the Center with its stakeholders and educate data owners and data consumers on how to share and use clinical and biological data. Through these various mechanisms, iDASH implements its goal of providing biomedical and behavioral researchers with access to data, software, and a high-performance computing environment, thus enabling them to generate and test new hypotheses.

PubMed Disclaimer

Conflict of interest statement

Competing interests: None.

Figures

Figure 1
Figure 1
Integrating data for analysis, anonymization, and sharing. iDASH, the newest National Center for Biomedical Computing, is a scientific team effort that brings together a multi-institutional team of quantitative scientists (information and computer scientists, biostatisticians, mathematicians, and software engineers) to develop algorithms and tools, services, and a biomedical cyber-infrastructure for use by biomedical and behavioral researchers. An illustration of the major components within each area is depicted in this figure. BCI, biomedical cyber-infrastructure.
Figure 2
Figure 2
iDASH natural-language processing (NLP) Ecosystem. Among the multiple resources that iDASH is developing is an NLP Ecosystem to allow collaborative development and testing of algorithms and tools, which structure data from clinical text. This Ecosystem will enable a diverse group of users (eg, researchers, students, and developers) to access public and private datasets, annotations on the data, downloadable tools (eg, algorithms and widgets), and web services. The cyber-infrastructure will provide a computing environment for evaluating code against existing datasets (execution, evaluation, and benchmarking environments), visualizing system output (visualization environment), and performing manual annotations on the textual data (annotation environment). EMR, electronic medical record.
Figure 3
Figure 3
iDASH cyber-infrastructure. iDASH is architecting a Health Insurance Portability and Accountability Act (HIPAA)-compliant biomedical cyber-infrastructure to enable data sharing and analysis with high-performance computing (Gordon and Triton HPC). Security is maintained through separation of layers (presentation, application, data, and management zones) and firewalls that secure communication between the zones. The presentation zone, which separates the user from the application, data, and management zones, provides the primary gateway to the overall system and contains the main components of the biomedical cyber-infrastructure framework, such as load balancers and web servers. The application zone provides the business applications and logic to the presentation zone and contains various application servers. The data zone contains all sensitive data repositories, which are accessed by the application zone. Finally, the management zone, which is only accessible to authorized administrators from approved workstations, provides system management and monitoring, security infrastructure, backup tools, and administration tools. BCI, biomedical cyber-infrastructure.

References

    1. Kanehisa M, Bork P. Bioinformatics in the post-sequence era. Nat Genet 2003;(33 Suppl):305–10 - PubMed
    1. Lander ES, Linton LM, Birren B, et al. Initial sequencing and analysis of the human genome. Nature 2001;409:860–921 - PubMed
    1. Venter JC, Adams MD, Myers EW, et al. The sequence of the human genome. Science 2001;291:1304–51 - PubMed
    1. Malin B, Benitez K, Masys D. Never too old for anonymity: a statistical standard for demographic data sharing via the HIPAA Privacy Rule. J Am Med Inform Assoc 2011;18:3–10 - PMC - PubMed
    1. Loukides G, Denny JC, Malin B. The disclosure of diagnosis codes can breach research participants' privacy. J Am Med Inform Assoc 2010;17:322–7 - PMC - PubMed

Publication types