Review
Nat Methods. 2021 Jun;18(6):604-617.
doi: 10.1038/s41592-021-01143-1. Epub 2021 Jun 7.

The emerging landscape of single-molecule protein sequencing technologies

Javier Antonio Alfaro et al. Nat Methods. 2021 Jun.

Abstract

Single-cell profiling methods have had a profound impact on the understanding of cellular heterogeneity. While genomes and transcriptomes can be explored at the single-cell level, single-cell profiling of proteomes is not yet established. Here we describe new single-molecule protein sequencing and identification technologies alongside innovations in mass spectrometry that will eventually enable broad sequence coverage in single-cell profiling. These technologies will in turn facilitate biological discovery and open new avenues for ultrasensitive disease diagnostics.

Competing interests

S.H. and C.M. are co-inventors on the patent application EP14158255. E.M.M. and E.V.A. are co-inventors on patent 9625469. D.S. is sponsored by Oxford Nanopore for his work on nanotip MS. E.M.M. and E.V.A. are co-founders and shareholders of Erisyon. B.K. and M. Wilhelm are founders and shareholders of OmicScouts and MSAID. They have no operational role in either company. M.D. and P.Y. are co-inventors on US patent 10006917. P.Y. is an inventor on US patent 10697974 and provisional patent and patent applications on various aspects of DNA nanotechnology–based protein sequencing methods described in this article. P.Y. is a co-founder, director and consultant of Ultivue Inc. and Spear Bio Inc. All remaining authors declare no competing interests. Some authors may be bound by confidentiality agreements that prevent them from disclosing their competing interests in this work; the corresponding authors are not aware of such cases.

Figures

Fig. 1 | The emerging landscape of single-molecule protein sequencing and fingerprinting technologies.
The new technologies address a range of analytes, methods of protein identification and target niches. Various techniques, particularly those involving complex readout signals, are suitable for characterizing short peptide sequences, while others are primed to characterize full-length proteins or larger complexes. The method of protein identification may fingerprint certain classes of amino acids (AA fingerprint) or reveal each amino acid down to its physicochemical class or better (AA sequencing). Technologies might characterize proteins by their mass or the mass of their fragments (mass spectrum). Other methods aim to characterize the properties of folded proteins (structure fingerprint). PTM, post-translational modification; PPI, protein–protein interaction; NEMS-MS, nanoelectromechanical systems MS.
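The difference between an AA fingerprint and full AA sequencing can be illustrated with a minimal Python sketch. It assumes, hypothetically, that only lysine (K) and cysteine (C) carry labels and that the readout reports the exact position of every labeled residue, with all other residues masked. Proteins that share a masked pattern cannot be distinguished, so identification depends on matching the observed pattern against a reference proteome. The function names and the three-entry toy proteome are invented for illustration and are not taken from any of the reviewed platforms.

```python
from collections import defaultdict

# Hypothetical toy proteome for illustration only (not real proteins).
TOY_PROTEOME = {
    "P1": "AKCGK",
    "P2": "GKCAK",
    "P3": "AKLMC",
}


def fingerprint(sequence: str, labeled: str = "KC") -> str:
    """Keep residues in `labeled`; mask every other residue with '.'."""
    return "".join(aa if aa in labeled else "." for aa in sequence)


def group_by_fingerprint(proteome: dict[str, str], labeled: str = "KC") -> dict[str, list[str]]:
    """Map each fingerprint to the IDs of all proteins that produce it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for protein_id, sequence in proteome.items():
        groups[fingerprint(sequence, labeled)].append(protein_id)
    return dict(groups)


def identify(observed: str, proteome: dict[str, str], labeled: str = "KC") -> list[str]:
    """Return every protein whose fingerprint matches the observed one."""
    return group_by_fingerprint(proteome, labeled).get(observed, [])


if __name__ == "__main__":
    print(identify(".K..C", TOY_PROTEOME))  # ['P3'] (unique match)
    print(identify(".KC.K", TOY_PROTEOME))  # ['P1', 'P2'] (ambiguous)
```

Labeling an additional residue type can only split such ambiguous groups further, never merge them, because the positions of the originally labeled residues remain visible in the richer fingerprint.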
Fig. 2 | The renaissance of classic techniques.
a,b, High-throughput fluorosequencing by Edman degradation featuring amino acid-specific chemical modification of peptides with fluorophores (a) and N-terminal amino acid recognition using a plurality of probes (b). c, Neutral-particle MS is a promising technique for characterizing proteoforms. Currently, the technology can be used to characterize large megadalton-scale complexes using silicon-based nanosensors. Graphene nanosensors and further developments may push the technology toward progressively smaller proteins and potentially lead to increased sequence coverage in global proteomics. ESI, electrospray ionization. d, Nanopore electrospray combines nanopores, classical electrospray and single-particle detection techniques to sequence single proteins by measuring amino acids one at a time. Panel a adapted with permission from ref., Springer Nature.
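The readout logic of fluorosequencing (panel a) can be sketched in an idealized form: each Edman cycle removes one N-terminal residue, so a labeled residue reveals its position as a one-step drop in its fluorophore's signal at the cycle that removes it. This sketch assumes perfect labeling, perfect Edman efficiency and no photobleaching, none of which hold experimentally; the channel names, the labeling scheme and the example peptide are hypothetical.

```python
def simulate_fluorosequencing(peptide: str, labels: dict[str, str]) -> dict[str, list[int]]:
    """Count labeled residues per channel before cycle 1 and after each Edman cycle.

    Idealized model: every target residue is labeled, every cycle removes
    exactly one N-terminal residue, and dyes never bleach.
    """
    channels = sorted(set(labels.values()))
    traces: dict[str, list[int]] = {channel: [] for channel in channels}
    remaining = peptide
    for _ in range(len(peptide) + 1):
        for channel in channels:
            traces[channel].append(sum(1 for aa in remaining if labels.get(aa) == channel))
        remaining = remaining[1:]  # one Edman cycle cleaves the N-terminal residue
    return traces


def decode_positions(trace: list[int]) -> list[int]:
    """Return 1-based residue positions at which the channel's signal drops."""
    return [cycle for cycle in range(1, len(trace)) if trace[cycle] < trace[cycle - 1]]


if __name__ == "__main__":
    # Hypothetical two-color scheme: lysine in one channel, cysteine in another.
    traces = simulate_fluorosequencing("GKAKCR", {"K": "red", "C": "green"})
    print(traces["red"])                      # [2, 2, 1, 1, 0, 0, 0]
    print(decode_positions(traces["red"]))    # [2, 4]
    print(decode_positions(traces["green"]))  # [5]
```

The decoded positions give only a partial sequence (here K at positions 2 and 4 and C at position 5), which would then be matched against a reference proteome to identify the peptide.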
Fig. 3 | DNA-facilitated protein sequencing.
a, Schematic of specific amino acid labeling on a denatured protein with DNA strands. Each DNA strand contains a barcode for the specific amino acid and (optionally) a UMI. b–e, Various readout strategies of DNA-labeled samples for protein identification. b, Protein kinetic fingerprinting using qPAINT. c, Protein linear barcoding using molecular-resolution DNA-PAINT. d, DNA proximity recording. e, Protein structural fingerprinting using FRET-X.
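Kinetic fingerprinting with qPAINT (panel b) counts binding sites from binding kinetics rather than from spatially resolved spots. In the standard qPAINT relation, N identical, independent docking sites probed by imager strands at concentration c with association rate constant k_on produce a mean dark time τ_d = 1/(N k_on c), so N = 1/(k_on c τ_d). The sketch below applies that relation to a binarized single-spot intensity trace. It assumes single-exponential kinetics and a known, calibrated k_on; the function names, the toy trace and the numeric values are hypothetical.

```python
def mean_dark_time(bound: list[bool], frame_time_s: float) -> float:
    """Mean length (in seconds) of dark intervals flanked by binding events.

    Leading and trailing dark frames are ignored because their true
    durations are censored by the start and end of the recording.
    """
    dark_runs: list[int] = []
    run = 0
    seen_binding = False
    for is_bound in bound:
        if is_bound:
            if seen_binding and run > 0:
                dark_runs.append(run)
            run = 0
            seen_binding = True
        else:
            run += 1
    if not dark_runs:
        raise ValueError("trace needs at least two separated binding events")
    return frame_time_s * sum(dark_runs) / len(dark_runs)


def qpaint_site_count(k_on: float, imager_conc_m: float, tau_dark_s: float) -> float:
    """qPAINT estimate N = 1 / (k_on * c * tau_d); k_on in 1/(M*s), c in M."""
    return 1.0 / (k_on * imager_conc_m * tau_dark_s)


if __name__ == "__main__":
    trace = [ch == "1" for ch in "01001000010"]  # toy binarized trace
    print(mean_dark_time(trace, frame_time_s=0.1))  # about 0.3 s
    # Hypothetical calibration: k_on = 1e6 /M/s, 5 nM imager, 50 s mean dark time.
    print(round(qpaint_site_count(1e6, 5e-9, 50.0), 6))  # 4.0
```

In a kinetic-fingerprinting setting, such a count would be made separately for each amino acid-specific barcode, yielding the number of labeled residues of each type.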
Fig. 4 | Three strategies of nanopore-based protein sequencing and sensing.
In all cases, a voltage bias is applied across an insulating membrane (left panels) and the analytes translocate through the nanopore from top to bottom (red arrows). a, Reading unlabeled proteins or peptides using a biological nanopore. b, Identification of whole proteins or peptides by fingerprinting with deep learning algorithms. Residue-specific fluorescent labels (for example, at lysine, cysteine and methionine) can be used to fingerprint proteins and peptides alongside electrical current sensing. c, Identification of folded proteins using lipid tethering. Other possible tethers include DNA carriers, DNA origami anchors and plasmonic trapping.
Sequence coverage in global proteomics studies.
MS-based global proteomics studies identify and quantify proteins with variable sequence coverage. The single best run from the 47 publications present in ProteomicsDB shows how sample-specific protein sequence coverage improves with sample preparation methods. Sequence coverage generally decreases with sample complexity and increases with time (cost) dedicated to studying the sample.
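Sequence coverage here is the fraction of a protein's residues that fall within at least one identified peptide. A minimal sketch of that calculation follows, assuming exact substring matching of peptides to the protein (no handling of I/L ambiguity, modifications or sequence variants); the example protein and peptide list are hypothetical.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Fraction of residues in `protein` covered by at least one peptide.

    Every exact occurrence of each peptide is counted; overlapping
    peptides cover a residue only once.
    """
    if not protein:
        raise ValueError("protein sequence is empty")
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue  # an empty string would "match" everywhere
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


if __name__ == "__main__":
    # Hypothetical 10-residue protein and two identified peptides.
    print(sequence_coverage("MKTAYIAKQR", ["TAYIAK", "QR"]))  # 0.8
```

Computing coverage per residue rather than by summing peptide lengths keeps overlapping peptides from inflating the value above the true covered fraction.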
Chemistry for protein sequencing.
a, Lysine labeling with NHS esters. b, Cysteine labeling with iodoacetamide reactive groups. c, Strategies for labeling the phenol ring of tyrosine. d, Aspartate/glutamate labeling. e, Tryptophan labeling with sulfenyl chlorides. f, C-terminal derivatization through monoalkylation of the insulin A chain (yield 41%).

References

    1. Breuza L et al. The UniProtKB guide to the human proteome. Database 2016, bav120 (2016).
    2. Smith LM et al. Proteoform: a single term describing protein complexity. Nat. Methods 10, 186–187 (2013).
    3. Seattle Times Business Staff. Seattle biotech startup Nautilus to get $350 million, stock listing in blank-check deal. The Seattle Times https://www.seattletimes.com/business/seattle-biotech-startup-nautilus-t... (8 February 2021).
    4. Reuters Staff. Protein sequencing firm Quantum-Si to go public via $1.46 billion SPAC merger. Reuters https://www.reuters.com/article/us-quantum-si-m-a-highcape-capital-idUSK... (18 February 2021).
    5. Cohen L & Walt DR. Single-molecule arrays for protein and nucleic acid analysis. Annu. Rev. Anal. Chem. 10, 345–363 (2017).
