Nat Commun. 2021 Apr 22;12(1):2366.
doi: 10.1038/s41467-021-22555-9.

Machine learning guided aptamer refinement and discovery

Ali Bashir et al. Nat Commun. 2021.

Abstract

Aptamers are single-stranded nucleic acid ligands that bind to target molecules with high affinity and specificity. They are typically discovered by searching large libraries for sequences with desirable binding properties. These libraries, however, are practically constrained to a fraction of the theoretical sequence space. Machine learning provides an opportunity to intelligently navigate this space to identify high-performing aptamers. Here, we propose an approach that employs particle display (PD) to partition a library of aptamers by affinity, and uses such data to train machine learning models to predict affinity in silico. Our model predicted high-affinity DNA aptamers from experimental candidates at a rate 11-fold higher than random perturbation and generated novel, high-affinity aptamers at a greater rate than observed by PD alone. Our approach also facilitated the design of truncated aptamers 70% shorter and with higher binding affinity (1.5 nM) than the best experimental candidate. This work demonstrates how combining machine learning and physical approaches can be used to expedite the discovery of better diagnostic and therapeutic agents.

Conflict of interest statement

Q.Y., W.C., Q.G., J.J., H.K., A.S., J.W., and B.S.F. are employees and/or shareholders of Aptitude Medical Systems Inc. A.B., S.H., C.M., G.D., Z.A., A.P., G.E.D., M.B., and M.D. are employees of Google, a technology company that sells machine learning services as part of its business.

Figures

Fig. 1. MLPD overview.
a Aptamer candidates from a physical pool (white dots) sample a small portion of the fitness landscape (dark gray line), each with a corresponding affinity level (light to dark green). b Particle display discerns the affinity level of each candidate by interrogating the library at multiple stringency levels. c Aptamer sequences and their corresponding affinity levels are used to train and validate a neural network ML model. d The ML model extrapolates new sequences on the fitness landscape in two ways: (1) mutating existing candidates (white dots) in a model-guided fashion (orange dots), and (2) nominating novel sequences in silico, predicting their position on the fitness landscape (white diamonds) and walking top-performing sequences to higher affinity levels (orange diamonds). The extrapolated candidates are synthesized and experimentally tested. e MLPD yields more candidates at each affinity level compared to the initial library, and enables sequence truncation without reduction in affinity.
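The model-guided mutation in panel d can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_score` is a hypothetical stand-in for a trained model (it only rewards similarity to the TGGATAG motif named in Fig. 4), and the greedy single-mutation strategy, function names, and step counts are assumptions chosen for illustration.

```python
import random

ALPHABET = "ACGT"
MOTIF = "TGGATAG"  # motif named in Fig. 4; used here only to build a toy scorer


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model.

    Returns the best fractional match to MOTIF over all windows of seq.
    """
    best = 0
    for i in range(len(seq) - len(MOTIF) + 1):
        window = seq[i:i + len(MOTIF)]
        best = max(best, sum(a == b for a, b in zip(window, MOTIF)))
    return best / len(MOTIF)


def single_mutants(seq: str):
    """Yield every sequence that differs from seq by one substitution."""
    for i, base in enumerate(seq):
        for alt in ALPHABET:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]


def guided_walk(seed: str, score, steps: int = 10):
    """Greedy walk: accept the best-scoring single mutant while it improves."""
    current, current_score = seed, score(seed)
    for _ in range(steps):
        candidate = max(single_mutants(current), key=score)
        candidate_score = score(candidate)
        if candidate_score <= current_score:
            break  # no single mutation improves the score
        current, current_score = candidate, candidate_score
    return current, current_score


def random_walk(seed: str, steps: int, rng: random.Random) -> str:
    """Baseline: apply `steps` random substitutions without consulting the score."""
    current = seed
    for _ in range(steps):
        i = rng.randrange(len(current))
        alt = rng.choice([b for b in ALPHABET if b != current[i]])
        current = current[:i] + alt + current[i + 1:]
    return current


if __name__ == "__main__":
    seed = "A" * 20
    walked, walked_score = guided_walk(seed, toy_score)
    print(walked, walked_score)
    print(random_walk(seed, 5, random.Random(0)))
```

In practice the scorer would be a model trained on particle display data, and the walked sequences would still need to be synthesized and tested experimentally, as the caption notes.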
Fig. 2. Design of particle display training data and concordance across experimental affinity thresholds.
a Two rounds (denoted R1 and R2, respectively) of particle display (PD) experiments were run with increasing stringency (decreasing protein concentrations) such that the lowest stringency (light green) should contain all aptamers observed at higher stringencies. At each stringency level, we obtained positive pools of aptamers with affinities that pass the affinity threshold (green shades) and negative pools of aptamers that do not pass the threshold. R1 positive pools were amplified and then mixed as the template for the R2 particle display experiment. All pools were sequenced by NGS. b, c Venn diagrams of unique aptamer clusters in (b) the original particle display experiment and (c) the machine learning guided particle display (MLPD) positive pools. Green-colored sections indicate sequences observed at a particular stringency and all lower stringencies. The dotted line and pie chart in (c) show the concordance (dark green) of the fourth and highest stringency run in the MLPD experiment (< 8 nM).
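The nesting expectation described in this caption (anything seen at a high stringency should also appear at every lower stringency) can be checked with simple set operations. The sketch below is illustrative only: the pool contents are made-up identifiers, and the function names and the exact definition of concordance are assumptions, since the caption does not spell out the calculation.

```python
def concordance(pools, level):
    """Fraction of items in pools[level] that also occur in every lower-stringency pool.

    `pools` is ordered from lowest to highest stringency.
    """
    target = pools[level]
    if not target:
        return 0.0
    lower = pools[:level]
    consistent = {s for s in target if all(s in p for p in lower)}
    return len(consistent) / len(target)


def affinity_level(item, pools):
    """Highest stringency index at which `item` is seen, or None if never seen."""
    levels = [i for i, pool in enumerate(pools) if item in pool]
    return max(levels) if levels else None


if __name__ == "__main__":
    # Hypothetical cluster identifiers, lowest stringency first.
    pools = [
        {"c1", "c2", "c3", "c4"},
        {"c1", "c2", "c3"},
        {"c1", "c2", "c9"},
        {"c1", "c7"},
    ]
    for level in range(len(pools)):
        print(level, concordance(pools, level))
```

A cluster such as the hypothetical `c7`, which appears only at the highest stringency, lowers the concordance of that pool and would correspond to the non-dark-green portion of the pie chart.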
Fig. 3. Experimental validation of machine learning predictions.
a Observed affinity for experimental seeds used in machine learning guided particle display (MLPD). Greens correspond to sequences at particular affinity thresholds (with darker greens indicating higher affinity); white corresponds to sequences below the lowest screened affinity, and gray corresponds to sequences with ambiguous affinity. b Candidates generated by three machine learning (ML) model walks (SuperBin (blue), Binned (orange), Counts (purple)) and random walks (red) as a fraction of the input pool size. The MLPD panels show ML-directed walks starting from (top) the original particle display experimental (expt.) seeds, (middle) randomly screened and model-ranked ML seeds, and (bottom) completely random seeds. Independent of the seed category, the ML-directed walks substantially outperform random walks and the original particle display.
Fig. 4. Performance of machine learning directed aptamer length truncation.
a, b Box plots and swarm plots showing model scores for candidate aptamers calculated across multiple sequence backgrounds. Each swarm/box plot corresponds to one core sequence: each point represents the core sequence in a different sequence background, each box represents the median, lower, and upper quartiles, and whiskers correspond to 1.5× the interquartile range. Sequence lengths with multiple swarm/box plots indicate cases where multiple different subsequences of the same length were tested experimentally. Particle display affinity level for a truncated sequence is shown by shade of green in the corresponding swarm. c, d KD curves showing the affinity of the full-length sequences (orange, purple) and 23 nt truncations (blue, teal) for G12 and G13, respectively. e, f Secondary structures of the full-length sequence and 23 nucleotide (nt) truncations for G12 and G13, respectively. Each nt is indicated by a small circle (A (maroon), C (blue), G (brown), T (green)). Covalent bonds in the phosphodiester backbone are shown in black and hydrogen bonds between bases are shown in magenta. The TGGATAG motif is outlined in blue.
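The scoring scheme in panels a and b (one core sequence scored in many sequence backgrounds, then summarized as a box plot) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the random-flank embedding, the total length, the quartile method, and all function names are hypothetical, and the whiskers follow the common Tukey convention of extending to the most extreme data point within 1.5× the interquartile range of the box.

```python
import random
import statistics

ALPHABET = "ACGT"


def embed_in_backgrounds(core, total_len, n, rng):
    """Place `core` at a random offset inside `n` random backgrounds of length `total_len`."""
    if len(core) > total_len:
        raise ValueError("core is longer than the requested total length")
    out = []
    for _ in range(n):
        offset = rng.randrange(total_len - len(core) + 1)
        left = "".join(rng.choice(ALPHABET) for _ in range(offset))
        right = "".join(rng.choice(ALPHABET) for _ in range(total_len - len(core) - offset))
        out.append(left + core + right)
    return out


def box_stats(scores):
    """Median, quartiles, and Tukey whiskers (1.5 x IQR) for a list of scores."""
    data = sorted(scores)
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    core = "TGGATAG"  # motif named in the caption, used as an example core
    variants = embed_in_backgrounds(core, total_len=40, n=5, rng=rng)
    # A real workflow would score each variant with the trained model;
    # GC fraction is used here purely as a placeholder score.
    scores = [sum(b in "GC" for b in v) / len(v) for v in variants]
    print(box_stats(scores))
```

Scoring the same core in many backgrounds separates the contribution of the core from that of its flanking sequence, which is what makes the per-core box plots comparable across truncation lengths.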
