Elife. 2022 Nov 8:11:e79777.
doi: 10.7554/eLife.79777.

Targeted genomic sequencing with probe capture for discovery and surveillance of coronaviruses in bats

Kevin S Kuchinski et al. Elife.

Abstract

Public health emergencies like SARS, MERS, and COVID-19 have prioritized surveillance of zoonotic coronaviruses, resulting in extensive genomic characterization of coronavirus diversity in bats. Sequencing viral genomes directly from animal specimens remains a laboratory challenge, however, and most bat coronaviruses have been characterized solely by PCR amplification of small regions from the best-conserved gene. This has resulted in limited phylogenetic resolution and left viral genetic factors relevant to threat assessment undescribed. In this study, we evaluated whether a technique called hybridization probe capture can achieve more extensive genome recovery from surveillance specimens. Using a custom panel of 20,000 probes, we captured and sequenced coronavirus genomic material in 21 swab specimens collected from bats in the Democratic Republic of the Congo. For 15 of these specimens, probe capture recovered more genome sequence than had been previously generated with standard amplicon sequencing protocols, providing a median 6.1-fold improvement (ranging up to 69.1-fold). Probe capture data also identified five novel alpha- and betacoronaviruses in these specimens, and their full genomes were recovered with additional deep sequencing. Based on these experiences, we discuss how probe capture could be effectively operationalized alongside other sequencing technologies for high-throughput, genomics-based discovery and surveillance of bat coronaviruses.

Keywords: DNA sequencing; bat; coronavirus; genome; infectious disease; microbiology; probe capture; viruses.

Conflict of interest statement

KK, KL, DS, JR, AS, ML, NL, JM, NP, AC: No competing interests declared. CK, FM, PM, IN, FN, JA, JA, CM, ER, DM were employed by Metabiota Inc. MM, KS, CL are employees of Labyrinth Global Health Inc and were employed by Metabiota Inc. AG is an employee of Development Alternatives Inc and was employed by Metabiota Inc. DJ is an employee of Nyati Health Consulting and was employed by Metabiota Inc. NW is an employee of Metabiota Inc.

Figures

Figure 1. Custom hybridization probe panel provided broadly inclusive coverage of known bat coronavirus diversity in silico.
Bat coronavirus (CoV) sequences were obtained by downloading all available alphacoronavirus, betacoronavirus, and unclassified Coronaviridae and Coronavirinae sequences from GenBank on 4 October 2020 and searching for bat-related keywords in sequence headers. A custom panel of 20,000 probes was designed to target these sequences using the makeprobes module in the ProbeTools package. The ProbeTools capture and stats modules were used to assess probe coverage of bat CoV reference sequences. (A) Each bat CoV sequence is represented as a dot plotted according to its probe coverage, that is, the percentage of its nucleotide positions covered by at least one probe in the custom panel. (B) The same analysis was performed on the subset of sequences representing full-length genomes (>25 kb in length).
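The probe coverage metric defined in this caption can be illustrated with a minimal sketch. This is not the ProbeTools implementation; the function name `probe_coverage` and the interval format (0-based, half-open start and end positions where a probe aligns) are assumptions made only for illustration.

```python
def probe_coverage(seq_length, probe_hits):
    """Percentage of positions covered by at least one probe.

    probe_hits is an iterable of (start, end) pairs, 0-based and
    half-open. This input format is an assumption of the sketch.
    """
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")
    covered = [False] * seq_length
    for start, end in probe_hits:
        # Clip each hit to the sequence bounds before marking positions.
        for pos in range(max(start, 0), min(end, seq_length)):
            covered[pos] = True
    return 100.0 * sum(covered) / seq_length


# Hypothetical example: three 120-nt probe hits on a 1,000-nt sequence,
# two of which overlap by 20 nt.
coverage = probe_coverage(1000, [(0, 120), (100, 220), (600, 720)])
print(f"{coverage:.1f}% of positions covered")
```

Overlapping probes are counted once per position, which matches the "at least one probe" wording of the caption.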
Figure 2. De novo assembly of probe captured libraries yielded more genome sequence than standard amplicon sequencing methods for most specimens.
Reads from probe captured libraries were assembled de novo with coronaSPAdes, and coronavirus contigs were identified by local alignment against a database of all Coronaviridae sequences in GenBank. (A) The size distribution of contigs from all libraries is shown. Dots are coloured to indicate whether the length of the contig exceeded partial RNA-dependent RNA polymerase (RdRP) gene amplicons previously sequenced from these specimens. (B) Total assembly size and assembly N50 distributions for all libraries. (C) Each contig is represented as a dot plotted according to its length. Assembly N50 sizes and total assembly sizes are indicated by the height of their bars.
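For readers unfamiliar with the assembly statistics in panels B and C, the sketch below computes total assembly size and N50 from a list of contig lengths, using the common definition of N50 (the length of the shortest contig in the smallest set of longest contigs that together hold at least half of the assembled bases). The function names and example lengths are hypothetical; the caption does not state which tool the authors used to compute these values.

```python
def total_assembly_size(contig_lengths):
    """Sum of all contig lengths in an assembly."""
    return sum(contig_lengths)


def n50(contig_lengths):
    """N50 under the common definition described in the lead-in."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


# Hypothetical contig lengths for one library.
contigs = [100, 200, 300, 400]
print(total_assembly_size(contigs), n50(contigs))
```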
Figure 3. Coverage of reference sequences by probe captured libraries was used to assess extent and location of recovery.
Reference sequences were chosen for each previously identified phylogenetic group (indicated in panel titles). Coverage of these reference sequences was determined by mapping reads and aligning contigs from probe captured libraries. Dark grey profiles show depth of read coverage along reference sequences. Blue shading indicates spans where contigs aligned. The locations of spike and RNA-dependent RNA polymerase (RdRP) genes are indicated in each reference sequence and shaded light grey. This figure shows the six libraries with the most extensive reference sequence coverage. Similar plots are provided as figure supplements for all libraries where any coronavirus sequence was recovered (Figure 3—figure supplements 1–4).
Figure 3—figure supplement 1. Coverage of reference sequence by probe captured libraries for specimens from phylogenetic group Q-Alpha-4.
Coverage of reference sequence was determined by mapping reads and aligning contigs from probe captured libraries. Dark grey profiles show depth of read coverage along reference sequence. Blue shading indicates spans where contigs aligned. The locations of spike and RNA-dependent RNA polymerase (RdRP) genes are indicated and shaded light grey.
Figure 3—figure supplement 2. Coverage of reference sequence by probe captured libraries for specimens from phylogenetic group W-Beta-2.
Coverage of reference sequence was determined by mapping reads and aligning contigs from probe captured libraries. Dark grey profiles show depth of read coverage along reference sequence. Blue shading indicates spans where contigs aligned. The locations of spike and RNA-dependent RNA polymerase (RdRP) genes are indicated and shaded light grey.
Figure 3—figure supplement 3. Coverage of reference sequence by probe captured libraries for specimens from phylogenetic group W-Beta-3.
Coverage of reference sequence was determined by mapping reads and aligning contigs from probe captured libraries. Dark grey profiles show depth of read coverage along reference sequence. Blue shading indicates spans where contigs aligned. The locations of spike and RNA-dependent RNA polymerase (RdRP) genes are indicated and shaded light grey.
Figure 3—figure supplement 4. Coverage of reference sequence by probe captured libraries for specimens from phylogenetic group W-Beta-4.
Coverage of reference sequence was determined by mapping reads and aligning contigs from probe captured libraries. Dark grey profiles show depth of read coverage along reference sequence. Blue shading indicates spans where contigs aligned. The location of the spike gene is indicated and shaded light grey. Ambiguous bases (Ns) are shaded orange.
Figure 4. Probe captured libraries provided more extensive coverage of reference genomes than standard amplicon sequencing protocols for most specimens.
Reference sequences were selected for the previously identified phylogenetic groups to which these specimens had been assigned by Kumakamba et al., 2021. (A) Coverage of these reference sequences was determined by mapping reads and aligning contigs from probe captured libraries. Each library is represented as a dot, and dots are coloured according to whether reference sequence coverage exceeded the length of the partial RNA-dependent RNA polymerase (RdRP) gene sequence that had been previously generated by amplicon sequencing. (B) The number of reference sequence positions covered by probe captured libraries was divided by the length of the partial RdRP amplicon sequences from these specimens. This provided the fold-difference in recovery between probe capture and standard amplicon sequencing methods. (C) Percent coverage of the spike and RdRP genes was calculated for each specimen.
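The fold-difference in panel B is a simple ratio, sketched below together with a median across specimens. The function name and all numeric inputs are hypothetical placeholders, not values from the study.

```python
from statistics import median


def fold_difference(covered_positions, amplicon_length):
    """Reference positions covered by probe capture divided by the
    length of the partial RdRP amplicon sequence."""
    if amplicon_length <= 0:
        raise ValueError("amplicon_length must be positive")
    return covered_positions / amplicon_length


# Hypothetical covered-position counts for three specimens, each compared
# against a hypothetical 400-nt amplicon sequence.
folds = [fold_difference(c, 400) for c in (800, 2400, 12000)]
print(folds, median(folds))
```

A value above 1 means probe capture recovered more sequence than the amplicon for that specimen.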
Figure 5. Recovery of coronavirus (CoV) genomic material was limited in vitro by method sensitivity.
(A) Sensitivity was assessed by evaluating recovery of partial RNA-dependent RNA polymerase (RdRP) gene regions that had been previously sequenced in these specimens by amplicon sequencing. Probe coverage of partial RdRP sequences was assessed in silico to exclude insufficient probe design as an alternate explanation for incomplete recovery of these targets. (B) Input RNA concentration, RNA integrity numbers (RINs), and CoV genome abundance were measured for each specimen. The impact of these specimen characteristics on recovery by probe capture (as measured by reference sequence coverage) was assessed using Spearman’s rank correlation (test results stated in plots). An outlier was omitted from this analysis: RNA concentration for specimen CDAB0160R was recorded as 190 ng/μl, a value 4.7 SDs from the mean of the distribution.
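Spearman's rank correlation, used in panel B, is the Pearson correlation of the ranks of two variables. The sketch below is a dependency-free illustration with tied values given their average rank; it returns only the coefficient (no p-value), and it is not the authors' analysis code. The function names and example data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: a monotonic but non-linear relationship.
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))
```

Because the statistic depends only on ranks, it captures monotonic relationships without assuming linearity, though extreme values can still distort plots and summaries.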
Figure 6. In silico assessment of probe panel coverage for reference genomes.
Reference sequences were chosen for each previously identified phylogenetic group (indicated in panel titles). Blue profiles show the number of probes covering each nucleotide position along the reference sequence. Probe coverage, that is, the percentage of nucleotide positions covered by at least one probe, is stated in panel titles. Ambiguity nucleotides (Ns) are shaded in orange, and these positions were excluded from the probe coverage calculations. The locations of spike and RNA-dependent RNA polymerase (RdRP) genes are indicated in each reference sequence (where available) and shaded grey.
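The two quantities described in this caption, per-position probe depth and probe coverage with ambiguous positions excluded, can be sketched as follows. This is an illustration under assumed inputs (a sequence string and 0-based, half-open probe hit intervals), not the ProbeTools code; the function name is hypothetical.

```python
def depth_and_coverage(sequence, probe_hits):
    """Return (per-position probe depth, percent coverage excluding Ns)."""
    depth = [0] * len(sequence)
    for start, end in probe_hits:
        for pos in range(max(start, 0), min(end, len(sequence))):
            depth[pos] += 1
    # Positions with ambiguity nucleotides (N) are left out of the denominator.
    valid = [i for i, base in enumerate(sequence) if base.upper() != "N"]
    if not valid:
        return depth, 0.0
    covered = sum(1 for i in valid if depth[i] > 0)
    return depth, 100.0 * covered / len(valid)


# Hypothetical 10-nt sequence with two ambiguous positions and two probe hits.
depth, coverage = depth_and_coverage("ACGTNNACGT", [(0, 4), (2, 6)])
print(depth, coverage)
```

Excluding Ns from the denominator keeps unresolved reference positions from lowering the reported coverage.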
Figure 7. Coronavirus (CoV) genomic material was low abundance in swab specimens but effectively enriched by probe capture.
(A) Reads from uncaptured, deep metagenomic sequenced libraries were mapped to complete genomes recovered from these specimens to assess abundance of CoV genomic material. On-target rate was calculated as the percentage of total reads that mapped to the CoV genome sequence. (B) Reads from probe captured libraries were also mapped to assess enrichment and removal of background material. Most libraries used for probe capture (-PRE and -TRI) had insufficient volume remaining for deep metagenomic sequencing, so new libraries were prepared (-DEEP) from the same specimens.
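The on-target rate defined in panel A is a percentage of reads, sketched below. The caption does not give a formula for enrichment, so the ratio of on-target rates shown here is an assumption made for illustration, as are the function names and read counts.

```python
def on_target_rate(mapped_reads, total_reads):
    """Percentage of total reads that mapped to the CoV genome sequence."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


def enrichment_fold(captured_rate, uncaptured_rate):
    """Assumed enrichment measure: ratio of on-target rates after and
    before probe capture. Not a formula stated in the caption."""
    if uncaptured_rate <= 0:
        raise ValueError("uncaptured_rate must be positive")
    return captured_rate / uncaptured_rate


# Hypothetical read counts for one specimen.
uncaptured = on_target_rate(50, 1_000_000)
captured = on_target_rate(40_000, 200_000)
print(uncaptured, captured, enrichment_fold(captured, uncaptured))
```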
Figure 8. Phylogenetic tree of translated spike gene sequences from alphacoronaviruses.
Spike sequences are coloured according to whether they were from study specimens (blue), human CoVs (red), RefSeq (black), or GenBank (grey). Only the 25 closest-matching spike sequences from GenBank were included, as determined by blastp bitscores. GenBank and RefSeq accession numbers are provided in parentheses. The scale bar measures amino acid substitutions per site.
Figure 9. Phylogenetic tree of translated spike gene sequences from betacoronaviruses.
Spike sequences are coloured according to whether they were from study specimens (blue), human coronaviruses (CoVs) (red), RefSeq (black), or GenBank (grey). Only the 25 closest-matching spike sequences from GenBank were included, as determined by blastp bitscores. GenBank and RefSeq accession numbers are provided in parentheses. The scale bar measures amino acid substitutions per site.
