eLife. 2022 Nov 8:11:e79777.

doi: 10.7554/eLife.79777.

Targeted genomic sequencing with probe capture for discovery and surveillance of coronaviruses in bats

Kevin S Kuchinski^{1,2}, Kara D Loos^{3,4}, Danae M Suchan^{3,4}, Jennifer N Russell^{3,4}, Ashton N Sies^{3,4}, Charles Kumakamba⁵, Francisca Muyembe⁵, Placide Mbala Kingebeni^{5,6}, Ipos Ngay Lukusa⁵, Frida N'Kawa⁵, Joseph Atibu Losoma⁵, Maria Makuwa^{5,7}, Amethyst Gillis^{8,9}, Matthew LeBreton¹⁰, James A Ayukekbong^{11,12}, Nicole A Lerminiaux^{3,4}, Corina Monagin^{8,13}, Damien O Joly^{11,14}, Karen Saylors^{7,8}, Nathan D Wolfe⁸, Edward M Rubin⁸, Jean J Muyembe Tamfum⁶, Natalie A Prystajecky^{1,2}, David J McIver^{11,15}, Christian E Lange^{7,11}, Andrew D S Cameron^{3,4}

Affiliations

¹ Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada.
² Public Health Laboratory, British Columbia Centre for Disease Control, Vancouver, Canada.
³ Department of Biology, Faculty of Science, University of Regina, Regina, Canada.
⁴ Institute for Microbial Systems and Society, Faculty of Science, University of Regina, Regina, Canada.
⁵ Metabiota Inc, Kinshasa, Democratic Republic of the Congo.
⁶ Institut National de Recherche Biomédicale, Kinshasa, Democratic Republic of the Congo.
⁷ Labyrinth Global Health Inc, St. Petersburg, United States.
⁸ Metabiota Inc, San Francisco, United States.
⁹ Development Alternatives, Washington, United States.
¹⁰ Mosaic, Yaoundé, Cameroon.
¹¹ Metabiota, Nanaimo, Canada.
¹² Southbridge Care, Cambridge, Canada.
¹³ One Health Institute, School of Veterinary Medicine, University of California, Davis, Davis, United States.
¹⁴ Nyati Health Consulting, Nanaimo, Canada.
¹⁵ Institute for Global Health Sciences, University of California, San Francisco, San Francisco, United States.

PMID: 36346652
PMCID: PMC9643004
DOI: 10.7554/eLife.79777

Kevin S Kuchinski et al. Elife. 2022.

Abstract

Public health emergencies like SARS, MERS, and COVID-19 have prioritized surveillance of zoonotic coronaviruses, resulting in extensive genomic characterization of coronavirus diversity in bats. Sequencing viral genomes directly from animal specimens remains a laboratory challenge, however, and most bat coronaviruses have been characterized solely by PCR amplification of small regions from the best-conserved gene. This has resulted in limited phylogenetic resolution and left viral genetic factors relevant to threat assessment undescribed. In this study, we evaluated whether a technique called hybridization probe capture can achieve more extensive genome recovery from surveillance specimens. Using a custom panel of 20,000 probes, we captured and sequenced coronavirus genomic material in 21 swab specimens collected from bats in the Democratic Republic of the Congo. For 15 of these specimens, probe capture recovered more genome sequence than had been previously generated with standard amplicon sequencing protocols, providing a median 6.1-fold improvement (ranging up to 69.1-fold). Probe capture data also identified five novel alpha- and betacoronaviruses in these specimens, and their full genomes were recovered with additional deep sequencing. Based on these experiences, we discuss how probe capture could be effectively operationalized alongside other sequencing technologies for high-throughput, genomics-based discovery and surveillance of bat coronaviruses.

Keywords: DNA sequencing; bat; coronavirus; genome; infectious disease; microbiology; probe capture; viruses.

Conflict of interest statement

KK, KL, DS, JR, AS, ML, NL, JM, NP, AC: No competing interests declared. CK, FM, PM, IN, FN, JA, JA, CM, ER, DM were employed by Metabiota Inc. MM, KS, CL are employees of Labyrinth Global Health Inc and were employed by Metabiota Inc. AG is an employee of Development Alternatives Inc and was employed by Metabiota Inc. DJ is an employee of Nyati Health Consulting and was employed by Metabiota Inc. NW is an employee of Metabiota Inc.

Figures

**Figure 1. Custom hybridization probe panel provided broadly inclusive coverage of known bat coronavirus diversity in silico.**
Bat coronavirus (CoV) sequences were obtained by downloading all available *alphacoronavirus*, *betacoronavirus*, and unclassified *coronaviridae* and *coronavirinae* sequences from GenBank on 4 October 2020 and searching for bat-related keywords in sequence headers. A custom panel of 20,000 probes was designed to target these sequences using the *makeprobes* module in the ProbeTools package. The ProbeTools *capture* and *stats* modules were used to assess probe coverage of bat CoV reference sequences. (A) Each bat CoV sequence is represented as a dot plotted according to its probe coverage, that is, the percentage of its nucleotide positions covered by at least one probe in the custom panel. (B) The same analysis was performed on the subset of sequences representing full-length genomes (>25 kb in length).
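
The probe coverage metric defined for panel A can be illustrated with a short sketch. This is not the ProbeTools implementation: the function name `probe_coverage` and the representation of probe alignments as half-open, 0-based `(start, end)` intervals are assumptions made only for illustration.

```python
def probe_coverage(seq_length, probe_intervals):
    """Percentage of positions in a sequence covered by at least one probe.

    probe_intervals: iterable of (start, end) pairs, 0-based and half-open
    (a hypothetical representation of where probes align to the sequence).
    """
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")
    covered = [False] * seq_length
    for start, end in probe_intervals:
        # Clip each interval to the sequence bounds before marking positions.
        for pos in range(max(start, 0), min(end, seq_length)):
            covered[pos] = True
    return 100.0 * sum(covered) / seq_length


# Two overlapping hypothetical probes on a 100 nt sequence cover positions 0-59.
print(probe_coverage(100, [(0, 50), (40, 60)]))
```

Overlapping probes are counted once per position, which is why the metric is a percentage of positions rather than a sum of probe lengths.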

**Figure 2. De novo assembly of probe captured libraries yielded more genome sequence than standard amplicon sequencing methods for most specimens.**
Reads from probe captured libraries were assembled de novo with coronaSPAdes, and coronavirus contigs were identified by local alignment against a database of all *coronaviridae* sequences in GenBank. (A) The size distribution of contigs from all libraries is shown. Dots are coloured to indicate whether the length of the contig exceeded partial RNA-dependent RNA polymerase (RdRP) gene amplicons previously sequenced from these specimens. (B) Total assembly size and assembly N50 distributions for all libraries. (C) Each contig is represented as a dot plotted according to its length. Assembly N50 sizes and total assembly sizes are indicated by the height of their bars.
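
For readers unfamiliar with the assembly N50 statistic reported in panels B and C, the sketch below shows the standard definition: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly size. The function name `n50` and the example contig lengths are hypothetical and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running total to half of the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    return 0  # empty assembly


# Hypothetical assembly: total size 1000, so the threshold is 500.
print(n50([100, 200, 300, 400]))  # → 300
```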

**Figure 3. Coverage of reference sequences by probe captured libraries was used to assess extent and location of recovery.**
Reference sequences were chosen for each previously identified phylogenetic group (indicated in panel titles). Coverage of these reference sequences was determined by mapping reads and aligning contigs from probe captured libraries. Dark grey profiles show depth of read coverage along reference sequences. Blue shading indicates spans where contigs aligned. The locations of spike and RNA-dependent RNA polymerase (RdRP) genes are indicated in each reference sequence and shaded light grey. This figure shows the six libraries with the most extensive reference sequence coverage. Similar plots are provided as figure supplements for all libraries where any coronavirus sequence was recovered (Figure 3—figure supplements 1–4).

**Figure 3—figure supplement 1. Coverage of reference sequence by probe captured libraries for specimens from phylogenetic group Q-Alpha-4.**
Coverage of reference sequence was determined by mapping reads and aligning contigs from probe captured libraries. Dark grey profiles show depth of read coverage along reference sequence. Blue shading indicates spans where contigs aligned. The locations of spike and RNA-dependent RNA polymerase (RdRP) genes are indicated and shaded light grey.

**Figure 3—figure supplement 2. Coverage of reference sequence by probe captured libraries for specimens from phylogenetic group W-Beta-2.**
Coverage of reference sequence was determined by mapping reads and aligning contigs from probe captured libraries. Dark grey profiles show depth of read coverage along reference sequence. Blue shading indicates spans where contigs aligned. The locations of spike and RNA-dependent RNA polymerase (RdRP) genes are indicated and shaded light grey.

**Figure 3—figure supplement 3. Coverage of reference sequence by probe captured libraries for specimens from phylogenetic group W-Beta-3.**
Coverage of reference sequence was determined by mapping reads and aligning contigs from probe captured libraries. Dark grey profiles show depth of read coverage along reference sequence. Blue shading indicates spans where contigs aligned. The locations of spike and RNA-dependent RNA polymerase (RdRP) genes are indicated and shaded light grey.

**Figure 3—figure supplement 4. Coverage of reference sequence by probe captured libraries for specimens from phylogenetic group W-Beta-4.**
Coverage of reference sequence was determined by mapping reads and aligning contigs from probe captured libraries. Dark grey profiles show depth of read coverage along reference sequence. Blue shading indicates spans where contigs aligned. The location of the spike gene is indicated and shaded light grey. Ambiguous bases (Ns) are shaded orange.

**Figure 4. Probe captured libraries provided more extensive coverage of reference genomes than standard amplicon sequencing protocols for most specimens.**
Reference sequences were selected for the previously identified phylogenetic groups to which these specimens had been assigned by Kumakamba et al., 2021. (A) Coverage of these reference sequences was determined by mapping reads and aligning contigs from probe captured libraries. Each library is represented as a dot, and dots are coloured according to whether reference sequence coverage exceeded the length of the partial RNA-dependent RNA polymerase (RdRP) gene sequence that had been previously generated by amplicon sequencing. (B) The number of reference sequence positions covered by probe captured libraries was divided by the length of the partial RdRP amplicon sequences from these specimens. This provided the fold-difference in recovery between probe capture and standard amplicon sequencing methods. (C) Percent coverage of the spike and RdRP genes was calculated for each specimen.
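
The fold-difference in panel B is a simple ratio, sketched below. The function name `fold_difference` and all input values are hypothetical; they are not the study's data and serve only to show how a per-specimen ratio and a median across specimens could be computed.

```python
from statistics import median


def fold_difference(covered_positions, amplicon_length):
    """Reference positions covered by probe capture per amplicon position."""
    if amplicon_length <= 0:
        raise ValueError("amplicon_length must be positive")
    return covered_positions / amplicon_length


# Hypothetical (covered positions, partial RdRP amplicon length) pairs.
specimens = [(3870, 387), (200, 400), (1161, 387)]
folds = [fold_difference(covered, amplicon) for covered, amplicon in specimens]

print(folds)          # one ratio per specimen
print(median(folds))  # summary across specimens
```

A ratio below 1 means the amplicon method recovered more sequence than probe capture for that specimen.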

**Figure 5. Recovery of coronavirus (CoV) genomic material was limited in vitro by method sensitivity.**
(A) Sensitivity was assessed by evaluating recovery of partial RNA-dependent RNA polymerase (RdRP) gene regions that had been previously sequenced in these specimens by amplicon sequencing. Probe coverage of partial RdRP sequences was assessed in silico to exclude insufficient probe design as an alternative explanation for incomplete recovery of these targets. (B) Input RNA concentration, RNA integrity numbers (RINs), and CoV genome abundance were measured for each specimen. The impact of these specimen characteristics on recovery by probe capture (as measured by reference sequence coverage) was assessed using Spearman’s rank correlation (test results stated in plots). An outlier was omitted from this analysis: RNA concentration for specimen CDAB0160R was recorded as 190 ng/μl, a value 4.7 SDs from the mean of the distribution.
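
Spearman's rank correlation, used in panel B, is the Pearson correlation computed on ranks. The sketch below is a minimal standard-library version and makes no claim about the software the authors used. The helper names `average_ranks` and `spearman_rho` and the example values are hypothetical. It returns only the coefficient, not a p-value, and it is undefined if either variable is constant.

```python
from statistics import mean


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values: a monotonic but non-linear relationship gives rho = 1.
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 100]))
```

Because the statistic depends only on ranks, it captures monotonic relationships without assuming linearity.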

**Figure 6. In silico assessment of probe panel coverage for reference genomes.**
Reference sequences were chosen for each previously identified phylogenetic group (indicated in panel titles). Blue profiles show the number of probes covering each nucleotide position along the reference sequence. Probe coverage, that is, the percentage of nucleotide positions covered by at least one probe, is stated in panel titles. Ambiguity nucleotides (Ns) are shaded in orange, and these positions were excluded from the probe coverage calculations. The locations of spike and RNA-dependent RNA polymerase (RdRP) genes are indicated in each reference sequence (where available) and shaded grey.

**Figure 7. Coronavirus (CoV) genomic material was low abundance in swab specimens but effectively enriched by probe capture.**
(A) Reads from uncaptured, deep metagenomic sequenced libraries were mapped to complete genomes recovered from these specimens to assess abundance of CoV genomic material. On-target rate was calculated as the percentage of total reads that mapped to the CoV genome sequence. (B) Reads from probe captured libraries were also mapped to assess enrichment and removal of background material. Most libraries used for probe capture (-PRE and -TRI) had insufficient volume remaining for deep metagenomic sequencing, so new libraries were prepared (-DEEP) from the same specimens.
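
The on-target rate described in panel A can be written as a one-line calculation. The function name `on_target_rate` and the read counts below are hypothetical and do not come from the study.

```python
def on_target_rate(mapped_reads, total_reads):
    """Percentage of all reads in a library that map to the CoV genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must be between 0 and total_reads")
    return 100.0 * mapped_reads / total_reads


# Hypothetical library: 50 of 1,000 reads map to the CoV genome.
print(on_target_rate(50, 1000))  # → 5.0
```

Comparing this rate between an uncaptured library and a probe captured library from the same specimen gives a rough sense of enrichment.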

**Figure 8. Phylogenetic tree of translated spike gene sequences from *alphacoronaviruses*.**
Spike sequences are coloured according to whether they were from study specimens (blue), human CoVs (red), RefSeq (black), or GenBank (grey). Only the 25 closest-matching spike sequences from GenBank were included, as determined by blastp bitscores. GenBank and RefSeq accession numbers are provided in parentheses. The scale bar measures amino acid substitutions per site.

**Figure 9. Phylogenetic tree of translated spike gene sequences from *betacoronaviruses*.**
Spike sequences are coloured according to whether they were from study specimens (blue), human coronaviruses (CoVs) (red), RefSeq (black), or GenBank (grey). Only the 25 closest-matching spike sequences from GenBank were included, as determined by blastp bitscores. GenBank and RefSeq accession numbers are provided in parentheses. The scale bar measures amino acid substitutions per site.
