BMC Genomics. 2021 Oct 2;22(1):712.
doi: 10.1186/s12864-021-07996-2.

Privacy-preserving storage of sequenced genomic data

Rastislav Hekel et al. BMC Genomics. 2021.

Abstract

Background: Current and future applications of genomic data may raise ethical and privacy concerns. Because the human genome contains sensitive personal information, processing and storing these data introduces a risk of abuse by potential offenders. For this reason, we have developed a privacy-preserving method, named Varlock, that provides secure storage of sequenced genomic data. We used a public set of population allele frequencies to mask the personal alleles detected in genomic reads. Each personal allele described by the public set is masked by a population allele selected at random according to its frequency. Masked alleles are preserved in an encrypted confidential file that can be shared in whole or in part using public-key cryptography.
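The masking step described above can be illustrated with a minimal sketch. It is not the Varlock implementation: the function name `mask_allele`, the dictionary layout of `population_freqs`, and the example frequencies are assumptions made for illustration, and the sketch draws a replacement for a single allele at a single position rather than rewriting alleles inside aligned reads.

```python
import random


def mask_allele(personal_allele, population_freqs, rng):
    """Return a replacement allele drawn according to population frequencies.

    population_freqs is a hypothetical mapping of allele -> population
    frequency at one variant position. Alleles that the public set does not
    describe are returned unchanged, mirroring the statement that only
    alleles described by the public set are masked.
    """
    if personal_allele not in population_freqs:
        return personal_allele
    alleles = list(population_freqs)
    weights = [population_freqs[a] for a in alleles]
    # Frequency-weighted random choice among the population alleles.
    return rng.choices(alleles, weights=weights, k=1)[0]


# Illustrative frequencies only; they are not taken from the paper.
rng = random.Random(0)
freqs = {"A": 0.9, "G": 0.1}
draws = [mask_allele("G", freqs, rng) for _ in range(10_000)]
share_of_a = draws.count("A") / len(draws)
```

In this sketch a rare personal allele such as `G` is usually replaced by the common allele `A`, which agrees with the reported observation that lower-frequency alternative alleles are masked more often.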

Results: Our method masked the personal variants and introduced new variants that were detected in the personal masked genome. Alternative alleles with lower population frequency were masked and introduced more often. A joint PCA of personal and masked VCFs showed that the VCFs in the two groups cannot be trivially mapped to each other. Moreover, the method is reversible, and personal alleles in specific genomic regions can be unmasked on demand.
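The reversibility claim can be sketched as follows, under stated assumptions. The functions `mask_positions` and `unmask_region`, the position-keyed dictionaries, and the half-open region bounds are hypothetical. The `diff` record only loosely plays the role of the confidential file that keeps the original alleles, and encryption is left out entirely.

```python
import random


def mask_positions(genotype, population_freqs, rng):
    """Mask alleles position by position and record the originals.

    genotype: hypothetical mapping position -> personal allele.
    population_freqs: hypothetical mapping position -> {allele: frequency}.
    Returns the masked genotype and a diff of position -> original allele
    for every position whose allele actually changed.
    """
    masked, diff = {}, {}
    for pos, allele in genotype.items():
        freqs = population_freqs.get(pos)
        if freqs is None or allele not in freqs:
            masked[pos] = allele  # not described by the public set
            continue
        alleles = list(freqs)
        new = rng.choices(alleles, weights=[freqs[a] for a in alleles], k=1)[0]
        masked[pos] = new
        if new != allele:
            diff[pos] = allele
    return masked, diff


def unmask_region(masked, diff, start, end):
    """Restore personal alleles only for positions in [start, end)."""
    restored = dict(masked)
    for pos, allele in diff.items():
        if start <= pos < end:
            restored[pos] = allele
    return restored


# Illustrative data only; positions and frequencies are made up.
personal = {100: "G", 250: "T", 900: "C"}
population = {100: {"A": 0.9, "G": 0.1}, 900: {"C": 0.5, "T": 0.5}}
masked, diff = mask_positions(personal, population, random.Random(1))
fully_restored = unmask_region(masked, diff, 0, 1000)
partly_restored = unmask_region(masked, diff, 0, 500)
```

Unmasking the whole range returns the personal genotype, while unmasking a sub-range restores only the positions inside it and leaves the rest masked, which is the on-demand behaviour the abstract describes.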

Conclusion: Our method masks personal alleles within genomic reads while preserving valuable non-sensitive properties of sequenced DNA fragments for further research. Personal alleles in the desired genomic regions may be restored and shared with patients, clinics, and researchers. We suggest that the method can provide an additional security layer for storing and sharing raw aligned reads.

Keywords: Genomic privacy; Genomic reads; Personal data.

Conflict of interest statement

The authors are employees of Geneton s.r.o., a company that participated in the development of the submitted patent: A computer implemented method for privacy-preserving storage of raw genome data based on population variants - PCT/EP2019/067336. The patent does not restrict research application methods.

Figures

Fig. 1. Intersections between the sets of positions with alternative alleles from three VCF files: population VCF, personal VCF, and masked VCF
Fig. 2. The distribution of alternative allele frequency reported by population VCF, personal VCF, and masked VCF
Fig. 3. The ratio of masked to not masked alleles and its relation to population allele frequency
Fig. 4. Personal VCFs are clearly shifted from the original local population (non-Finnish European) to VCFs masked with alleles from all gnomAD populations. Lines link the individual original BAMs (circles) with their masked counterparts (triangles)
Fig. 5. All masked VCFs, including outliers in their personal form, are clustered in the same region. The lines link the individual original BAMs (circles) with their masked counterparts (triangles). For details of the cluster, see Fig. 6
Fig. 6. The detail of the cluster from Fig. 5. The lines link the individual original BAMs (circles) with their masked counterparts (triangles)
Fig. 7. Personal and masked VCFs from African (red) and non-Finnish European (blue) population. Each of the two personal VCFs was masked with non-Finnish European (triangle) and African (square) population allele frequencies. The arrows point to a position of the masked version of the personal VCF, while the coloured arrow denotes masking with population allele frequencies matching the origin of the personal VCF
Fig. 8. Workflow of the masking method, where BAM file and VOF file are processed into the masked BAM and BDIFF files. The BDIFF file is subsequently encrypted
Fig. 9. Workflow of the unmasking method, where the BDIFF file is decrypted and used to unmask the masked BAM file to restore the personal BAM file
Fig. 10. Workflow of the sharing method showing decryption of BDIFF and encryption of its subrange intended for a specific user
Fig. 11. Flow of masking and unmasking alleles at a single variant position within covering alignments. The masking is represented as “mask alleles” in Fig. 8, and the unmasking is represented as “unmask alleles” in Fig. 9
