Nat Commun. 2021 Jul 13;12(1):4242.
doi: 10.1038/s41467-021-24496-9.

Data storage using peptide sequences

Cheuk Chi A Ng et al. Nat Commun.

Abstract

Humankind is generating digital data at an exponential rate. These data are typically stored using electronic, magnetic or optical devices, which require large physical spaces and cannot last for a very long time. Here we report the use of peptide sequences for data storage, which can be durable and of high storage density. With the selection of suitable constitutive amino acids, designs of address codes and error-correction schemes to protect the order and integrity of the stored data, optimization of the analytical protocol and development of software to effectively recover peptide sequences from the tandem mass spectra, we demonstrated the feasibility of this method by successfully storing and retrieving a text file and the music file Silent Night with 40 and 511 18-mer peptides, respectively. This method for the first time links data storage with the peptide synthesis industry and proteomics techniques, and is expected to stimulate the development of relevant fields.
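The storage idea in the abstract (amino acids as symbols, an address code to preserve order, and a check to protect integrity) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 8-letter alphabet `AEFLSTVY` carrying 3 bits per residue, the split of each 18-mer into 5 address residues, 12 payload residues and 1 checksum residue, and the simple sum-based checksum, which only detects a single substituted residue and stands in for the authors' error-correction scheme.

```python
# Hypothetical 8-letter alphabet: each amino acid carries 3 bits (an assumption).
ALPHABET = "AEFLSTVY"
BITS = 3
PEPTIDE_LEN = 18   # 18-mers, as stated in the abstract
ADDR_LEN = 5       # assumed: residues holding the peptide's position
CHECK_LEN = 1      # assumed: one checksum residue
PAYLOAD_LEN = PEPTIDE_LEN - ADDR_LEN - CHECK_LEN  # 12 residues = 36 bits
CHUNK_BITS = PAYLOAD_LEN * BITS


def int_to_residues(value: int, n: int) -> str:
    """Write `value` as n base-8 digits, most significant first."""
    return "".join(
        ALPHABET[(value >> (BITS * (n - 1 - i))) & 0b111] for i in range(n)
    )


def residues_to_int(residues: str) -> int:
    """Inverse of int_to_residues."""
    value = 0
    for r in residues:
        value = (value << BITS) | ALPHABET.index(r)
    return value


def checksum(residues: str) -> str:
    """Toy integrity check: sum of residue indices modulo 8."""
    return ALPHABET[sum(ALPHABET.index(r) for r in residues) % 8]


def encode(data: bytes) -> list[str]:
    """Split data into addressed, checksummed 18-mer peptide strings."""
    bits = "".join(f"{b:08b}" for b in data)
    bits += "0" * (-len(bits) % CHUNK_BITS)  # zero-pad the last chunk
    peptides = []
    for addr, start in enumerate(range(0, len(bits), CHUNK_BITS)):
        payload = int(bits[start:start + CHUNK_BITS], 2)
        body = int_to_residues(addr, ADDR_LEN) + int_to_residues(payload, PAYLOAD_LEN)
        peptides.append(body + checksum(body))
    return peptides


def decode(peptides: list[str], n_bytes: int) -> bytes:
    """Verify each peptide, reorder by address, and rebuild the bytes."""
    chunks = {}
    for p in peptides:
        body, check = p[:-CHECK_LEN], p[-CHECK_LEN:]
        if checksum(body) != check:
            raise ValueError(f"checksum mismatch in peptide {p}")
        chunks[residues_to_int(body[:ADDR_LEN])] = body[ADDR_LEN:]
    bits = "".join(
        f"{residues_to_int(chunks[a]):0{CHUNK_BITS}b}" for a in sorted(chunks)
    )
    return bytes(int(bits[i:i + 8], 2) for i in range(0, n_bytes * 8, 8))
```

Because each peptide carries its own address, `decode` can accept the peptides in any order, which mirrors the situation where a mixture is sequenced without positional information.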

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Overview of the process of storing and retrieving data into and from peptides.
The direction in blue represents the data storing process, while the direction in red represents the data retrieving process.
Fig. 2. Overview of data retrieval from dataset A.
(a) The message of dataset A. (b) The chromatogram for the analysis of the 40 peptides of dataset A. (c) A typical MS/MS spectrum from the analysis of peptides in dataset A, with the sequence of one data-bearing peptide read out from the spectrum. (d) The highest-intensity-tag-based sequencing method used in sequence recovery.
Fig. 3. The chromatogram for analysis of the peptide mixture encoding dataset B.
Fig. 4. A flowchart illustrating the method of highest-intensity-tag based sequencing.
i is the iteration number and V is the maximum number of iterations. W is the number of top-ranked masses used in tag finding, and w_i is that number for the i-th iteration. J is the intensity rank, and J_max is the maximum number of top-ranked-intensity masses allowed as the starting point for finding a tag.
Fig. 5. A flowchart illustrating the method of two-stage sequencing.
AAC stands for amino acid combinations.
Fig. 6. A flowchart illustrating the method of sequence grouping.
The procedure of grouping is shown in the dashed square.
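
The general idea behind tag-based sequencing named in the Fig. 4 caption can be illustrated with a rough sketch: keep only the most intense peaks, start from one of the highest-ranked ones, and read off residues wherever the gap between two peaks matches an amino acid residue mass. This is a simplified stand-in and not the authors' algorithm. The parameter names `W` and `J_max` are borrowed from the caption, while the greedy extension rule, the mass tolerance `tol`, the default values, the 8-letter residue set and the omission of the iteration loop over i up to V are all assumptions.

```python
# Monoisotopic residue masses (Da) for a hypothetical 8-letter alphabet.
RESIDUE_MASS = {
    "A": 71.03711, "S": 87.03203, "V": 99.06841, "T": 101.04768,
    "L": 113.08406, "E": 129.04259, "F": 147.06841, "Y": 163.06333,
}


def extend_tag(start_mass: float, masses: list[float], tol: float) -> str:
    """Greedily walk upward in mass from start_mass, one residue per step."""
    tag, current = "", start_mass
    while True:
        # First peak whose distance from the current peak equals a residue mass.
        step = next(
            (
                (aa, m)
                for m in masses
                for aa, dm in RESIDUE_MASS.items()
                if abs(m - current - dm) <= tol
            ),
            None,
        )
        if step is None:
            return tag
        tag += step[0]
        current = step[1]  # strictly larger than before, so the loop ends


def find_tag(peaks, W: int = 20, J_max: int = 5, tol: float = 0.01) -> str:
    """peaks: list of (mass, intensity). Returns the longest tag found.

    Only the W most intense peaks are considered, and only the J_max most
    intense of those are tried as starting points.
    """
    top = sorted(peaks, key=lambda p: p[1], reverse=True)[:W]
    masses = sorted(m for m, _ in top)
    best = ""
    for mass, _ in top[:J_max]:
        tag = extend_tag(mass, masses, tol)
        if len(tag) > len(best):
            best = tag
    return best
```

The returned tag is read in the direction of increasing mass. A real spectrum would also need handling for multiple ion series, charge states and ambiguous matches, none of which this sketch attempts.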
