Nat Commun. 2024 Aug 4;15(1):6602.
doi: 10.1038/s41467-024-50555-y.

Systematic multi-trait AAV capsid engineering for efficient gene delivery

Fatma-Elzahraa Eid et al.

Abstract

Broadening gene therapy applications requires manufacturable vectors that efficiently transduce target cells in humans and preclinical models. Conventional selections of adeno-associated virus (AAV) capsid libraries are inefficient at searching the vast sequence space for the small fraction of vectors possessing multiple traits essential for clinical translation. Here, we present Fit4Function, a generalizable machine learning (ML) approach for systematically engineering multi-trait AAV capsids. By leveraging a capsid library that uniformly samples the manufacturable sequence space, reproducible screening data are generated to train accurate sequence-to-function models. Combining six models, we designed a multi-trait (liver-targeted, manufacturable) capsid library and validated 88% of library variants on all six predetermined criteria. Furthermore, the models, trained only on mouse in vivo and human in vitro Fit4Function data, accurately predicted AAV capsid variant biodistribution in macaque. Top candidates exhibited production yields comparable to AAV9, efficient murine liver transduction, up to 1000-fold greater human hepatocyte transduction, and increased enrichment relative to AAV9 in a screen for liver transduction in macaques. The Fit4Function strategy ultimately makes it possible to predict cross-species traits of peptide-modified AAV capsids and is a critical step toward assembling an ML atlas that predicts AAV capsid performance across dozens of traits.

Conflict of interest statement

BED is a scientific founder at Apertura Gene Therapy and a scientific advisory board member at Tevard Biosciences. BED, FEE, and KYC are named inventors on patent applications filed by the Broad Institute of MIT and Harvard related to the design and use of Fit4Function libraries (WO2021222636) and AAV sequences developed as part of this study. The remaining authors declare that they have no competing interests.

Figures

Fig. 1. Systematic multi-trait protein optimization paradigm.
a An insertion-modified AAV library that uniformly samples the 7-mer sequence space (1.28 billion possible variants) is designed and used to produce AAV particles. Variant production fitness is measured via Next-Generation Sequencing (NGS) of nuclease-resistant Cap-containing genomes (VRPM) relative to the number of genomes in the DNA library (DRPM). b The production fitness data is used to train a sequence-to-production-fitness ML model that is then used to design the Fit4Function library, which uniformly and exclusively samples the production-fit sequence space. c The Fit4Function library can be screened in vitro or in vivo for functions of interest, and the data are used to derive ML models that predict these functions from random 7-mer sequences. d The production fitness and functional fitness models are used in combination to populate MultiFunction libraries consisting of variants predicted to perform well across the desired traits (see checkered areas that represent the overlap between the functional sequence spaces of interest). e The MultiFunction AAV libraries are produced and screened for all functions of interest. The top-performing variants are then individually validated.
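The quantities in panel a can be illustrated with a short sketch. It first checks the size of the 7-mer space (20 amino acids at 7 positions), then computes a production fitness score per variant. The caption only states that virus reads (VRPM) are measured relative to DNA library reads (DRPM); scoring this as a log2 ratio of reads per million, and the function names and toy counts used here, are assumptions made for illustration and may differ from the authors' pipeline.

```python
import math

# Size of the 7-mer insertion space: 20 amino acids at each of 7 positions.
SEQUENCE_SPACE = 20 ** 7  # 1,280,000,000, i.e. 1.28 billion


def reads_per_million(counts):
    """Scale raw NGS read counts so that they sum to one million."""
    total = sum(counts.values())
    return {variant: count * 1e6 / total for variant, count in counts.items()}


def production_fitness(virus_counts, dna_counts):
    """Hypothetical score: log2(VRPM / DRPM) for each variant in the DNA library.

    Variants with no virus reads get None (not detected) instead of -infinity.
    """
    vrpm = reads_per_million(virus_counts)
    drpm = reads_per_million(dna_counts)
    scores = {}
    for variant, d in drpm.items():
        v = vrpm.get(variant, 0.0)
        scores[variant] = math.log2(v / d) if v > 0 and d > 0 else None
    return scores


if __name__ == "__main__":
    # Toy counts, not data from the study.
    dna = {"AAAAAAA": 100, "CCCCCCC": 100, "DDDDDDD": 200}
    virus = {"AAAAAAA": 300, "CCCCCCC": 100}
    print(SEQUENCE_SPACE)
    print(production_fitness(virus, dna))
```

A variant that is over-represented in the virus pool relative to the DNA library receives a positive score under this assumed definition, and one that is under-represented receives a negative score.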
Fig. 2. Mapping and learning the 7-mer production fitness landscape.
a The correlation between the production fitness scores of codon replicate pairs is shown. The vertical and horizontal marginal histograms correspond to missing cases where only one codon replicate of a pair was detected. b The production fitness distribution of the modeling library represents the variants detected in at least one of the 24 replicates (92.4% of total variants). The distributions representing non-fit versus production-fit variants are depicted. c The amino acid distribution by position is shown for the variants in the 74.5K most abundant sequences in an NNK library versus the production-fit distribution of the modeling library (26.2K out of 74.5K). d The production fitness replication quality is shown for the control set (10K) that is shared between the modeling and assessment libraries. The Pearson correlations between the predicted versus measured production fitness scores are shown when the model is trained on a subset of the modeling library and e tested on another subset of the same modeling library (n = 30.6K) versus when f tested on the independent assessment library, not including the overlapping 10K set (n = 57.7K after removing the undetected variants). g The performance of the production fitness prediction model is shown across different training set sizes (n = 10 models, mean ± s.d.). Source data are provided in a Source Data file.
Fig. 3. Fit4Function libraries uniformly sample the production-fit space and enable more accurate functional screening and prediction.
a The distributions of the measured fitness scores are shown for 100K randomly sampled variants from the Fit4Function library versus the uniform modeling library. b The amino acid distribution by position for the variants in the Fit4Function library, production-fit distribution of the modeling library, and 240K most abundant sequences in an NNK library are shown. c The pairwise Pearson correlations among biological triplicates across functional screens (mean ± s.d.; one-tailed paired t-test, n = 5 screens; p = 0.0065) using the Fit4Function library (240K) versus an NNK library (top 240K variants) are shown. hCMEC/D3: human brain endothelial cell line, mBMVEC: primary mouse brain microvascular endothelial cells, hBMVEC: primary human brain microvascular endothelial cells, HEK293: HEK293T/17 cells. Binding and transduction are indicated by ‘b’ and ‘t’, respectively. d The measured versus predicted functional fitness (log2 enrichment scores) for models trained on Fit4Function versus NNK library data are shown. e The replication quality (mean ± s.d.) between pairs of animals (n = 6 pairs across four animals) for the Fit4Function library biodistribution in eight tissues is shown. f The prediction performance of models trained on the in vivo biodistribution of the Fit4Function library across eight organs is shown. Source data are provided as a Source Data file.
Fig. 4. Liver-targeted MultiFunction library design, validation in human cells and mice, and translation to macaque.
a The Pearson correlations of measured versus predicted enrichment are shown for production fitness and hepatocyte-targeting assays. b The enrichment distributions are shown across variants sampled from the Uniform, Fit4Function, Positive Control (Fit4Function variants satisfying the six conditions), and MultiFunction libraries. The histograms are density-normalized, including non-detected variants (ND). c The hit rates are shown for variants satisfying the six conditions in each listed variant set. d The on-target and off-target measurements for capsids BI151–157 and AAV9 in the MultiFunction library pool are shown as log2 enrichments of the selected capsid (two codon replicates) as compared to AAV9 (four codon replicates). The measured enrichment was linearly normalized according to the maximum and minimum values for each assay. Individual replicates are plotted as points. The average normalized enrichments across replicates are plotted as polygon vertices. e HEPG2 and THLE-2 transduction were assessed 24 h post-transduction at 3000 vg/cell using a luciferase assay (n = 4 transduction replicates per group, mean ± s.d., ****p < 1e−4, unpaired, one-sided t-tests on log-transformed values, and Bonferroni corrected for multiple hypotheses). The measurements were normalized to AAV9. f A 100K variant Fit4Function library was injected intravenously into a cynomolgus macaque and the vector genome distribution was assessed four hours later. Variants predicted to meet all six trait conditions were highly enriched in the cynomolgus macaque liver (biodistribution). The density plot shows the distribution of variants normalized to the sum of counts for each indicated set of variants. g The fraction of the indicated MultiFunction variants enriched in the cynomolgus macaque liver (defined as at least two-fold log2 enrichment greater than that of AAV9) is shown for each combination of predicted traits. Binding and transduction are indicated by ‘b’ and ‘t’, respectively. h The rhesus macaque liver transduction efficiency for the MultiFunction variants, measured by transcript levels 4 weeks post-administration, is shown (n = 2 rhesus macaques). Each variant was represented by two codon replicates while AAV9 was represented by three codon replicates. Source data are provided in a Source Data file.
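Two of the computations described in this caption can be sketched briefly: the per-assay linear (min-max) normalization from panel d, and the hit rate from panel c, taken here as the fraction of variants that meet every trait condition. The assay names, threshold values, and function names below are hypothetical; the caption does not give the six conditions, so this is a sketch under stated assumptions rather than the authors' code.

```python
def min_max_normalize(values):
    """Linearly rescale one assay's enrichments so min maps to 0 and max to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to rescale, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hit_rate(variants, thresholds):
    """Fraction of variants whose score meets the threshold in every assay.

    variants: list of dicts mapping assay name -> log2 enrichment
              (None means the variant was not detected in that assay).
    thresholds: dict mapping assay name -> minimum required enrichment
                (hypothetical values, chosen by the caller).
    """
    if not variants:
        return 0.0

    def passes(variant):
        for assay, cutoff in thresholds.items():
            score = variant.get(assay)
            if score is None or score < cutoff:
                return False
        return True

    return sum(1 for v in variants if passes(v)) / len(variants)


if __name__ == "__main__":
    # Toy values, not data from the study.
    print(min_max_normalize([2.0, 4.0, 6.0]))
    library = [
        {"production": 1.2, "liver_t": 0.8},
        {"production": -0.5, "liver_t": 2.0},
        {"production": 0.9, "liver_t": None},
    ]
    print(hit_rate(library, {"production": 0.0, "liver_t": 0.5}))
```

Treating a non-detected variant as a failure is one possible convention; it is chosen here so that missing measurements cannot count toward the hit rate.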
