PLoS Biol. 2024 Jul 18;22(7):e3002697.
doi: 10.1371/journal.pbio.3002697. eCollection 2024 Jul.

Single-fly genome assemblies fill major phylogenomic gaps across the Drosophilidae Tree of Life

Bernard Y Kim et al. PLoS Biol. 2024.

Abstract

Long-read sequencing is driving rapid progress in genome assembly across all major groups of life, including species of the family Drosophilidae, a longtime model system for genetics, genomics, and evolution. We previously developed a cost-effective hybrid Oxford Nanopore (ONT) long-read and Illumina short-read sequencing approach and used it to assemble 101 drosophilid genomes from laboratory cultures, greatly increasing the number of genome assemblies for this taxonomic group. The next major challenge is to address the laboratory culture bias in taxon sampling by sequencing genomes of species that cannot easily be reared in the lab. Here, we build upon our previous methods to perform amplification-free ONT sequencing of single wild flies obtained either directly from the field or from ethanol-preserved specimens in museum collections, greatly improving the representation of lesser-studied drosophilid taxa in whole-genome data. Using Illumina NovaSeq X Plus and ONT P2 sequencers with R10.4.1 chemistry, we set a new benchmark for inexpensive hybrid genome assembly at US $150 per genome while assembling genomes from as little as 35 ng of genomic DNA from a single fly. We present 183 new genome assemblies for 179 species as a resource for drosophilid systematics, phylogenetics, and comparative genomics. Of these genomes, 62 are from pooled lab strains and 121 from single adult flies. Despite the sample limitations of working with small insects, most single-fly diploid assemblies are comparable in contiguity (>1 Mb contig N50), completeness (>98% complete dipteran BUSCOs), and accuracy (>QV40 genome-wide with ONT R10.4.1) to assemblies from inbred lines. We present a well-resolved multi-locus phylogeny for 360 drosophilid and 4 outgroup species encompassing all publicly available (as of August 2023) genomes for this group. Finally, we present a Progressive Cactus whole-genome, reference-free alignment built from a subset of 298 suitably high-quality drosophilid genomes. The new assemblies and alignment, along with updated laboratory protocols and computational pipelines, are released as an open resource and as a tool for studying evolution at the scale of an entire insect family.
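The contiguity threshold quoted in the abstract (>1 Mb contig N50) uses the standard N50 statistic: the contig length at which contigs of that length or longer contain at least half of the assembled bases. The sketch below only illustrates that general definition. The function name and the toy contig lengths are hypothetical and are not taken from the authors' pipeline or data.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Standard definition: walk the contigs from longest to shortest and
    return the length of the contig at which the running total first
    reaches at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (10 Mb in total), not data from the paper.
toy_contigs = [4_000_000, 3_000_000, 1_500_000, 1_000_000, 500_000]
n50 = contig_n50(toy_contigs)
print(n50, n50 > 1_000_000)
```

For the toy lengths, the two longest contigs together hold 7 Mb of the 10 Mb total, so the N50 is the length of the second contig.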

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Cladogram of drosophilid species with whole-genome data, with some groups collapsed (gray triangles).
Species relationships were inferred from 1,000 orthologs (see Methods). Node values are the local posterior probabilities reported by ASTRAL-MP [26]. Counts of described species for each group were obtained from the TaxoDros database [22]. Values in the colored boxes indicate, as of August 2023, the number of species with whole-genome sequences for each taxon. Counts are shown separately for short-read and long-read datasets, and for data available before this study (including [8]) and new genomes presented here. *Note that Scaptodrosophila, Hirtodrosophila, Zaprionus, immigrans, and histrio groups are potentially rendered polyphyletic by these samples. The positions of nigrosparsa and pinicola groups are currently considered to be uncertain. The data underlying this figure can be found in https://doi.org/10.5281/zenodo.11200891.
Fig 2. High genome-wide consensus accuracy with Nanopore R10.4.1 sequencing.
The Phred-scaled consensus accuracy (left axis) and per-base consensus error rate (right axis) are shown for genomes built with 20× to 60× coverage of ONT reads. Dashed gray lines show consensus accuracy estimates for R9.4.1 + Illumina [8] and the dm6 reference genome [20]. The data underlying this figure can be found in https://doi.org/10.5281/zenodo.11200891.
Fig 3. Consensus accuracy for single-fly genomes is greatly improved by diploid assembly.
Phred-scaled consensus quality (QV) is shown for a subset of 25 R10.4.1 and Illumina hybrid single-fly genomes assembled with haploid (left) and diploid (right) pipelines. The data underlying this figure can be found in https://doi.org/10.5281/zenodo.11200891.
Fig 4. The distribution of genome quality metrics for 168 new long-read assemblies.
Distributions of genome N50, the percentage of complete dipteran BUSCOs [35], and Phred-scaled QV are plotted separately for R10.4.1 and R9.4.1 assemblies from lab strains and from single flies. The black dashed line is the value computed for the D. melanogaster dm6 reference genome. The 16 samples that were only sequenced with Illumina are omitted from these plots. Data underlying this figure and additional sample information are provided in S4 Table.
Fig 5. Proportion of D. melanogaster genomic elements aligning to other species as a function of 4-fold divergence from D. melanogaster.
Each dot represents 1 species. Alignment coverage is defined as the proportion of genomic elements in D. melanogaster that uniquely map to another species. The data underlying this figure can be found in https://doi.org/10.5281/zenodo.11200891.
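
The Phred-scaled consensus quality (QV) and the per-base consensus error rate reported in Figs 2 to 4 are two views of the same quantity, related by the standard Phred definition QV = -10 · log10(error rate). The following is a minimal sketch of that conversion only. The function names are hypothetical, and it does not reproduce how the authors estimated consensus accuracy.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error rate."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error rate (0 < rate <= 1) to a Phred-scaled QV."""
    if not 0 < error_rate <= 1:
        raise ValueError("error rate must be in the interval (0, 1]")
    return -10 * math.log10(error_rate)


# QV40, the accuracy threshold quoted in the abstract, corresponds to
# roughly one consensus error per 10,000 bases.
print(qv_to_error_rate(40))
print(error_rate_to_qv(1e-4))
```

Because the scale is logarithmic, each 10-point gain in QV corresponds to a tenfold drop in the per-base error rate.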

References

    1. Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, et al. The Genome Sequence of Drosophila melanogaster. Science. 2000;287:2185–2195. doi: 10.1126/science.287.5461.2185
    2. Richards S, Liu Y, Bettencourt BR, Hradecky P, Letovsky S, Nielsen R, et al. Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution. Genome Res. 2005;15:1–18. doi: 10.1101/gr.3059305
    3. Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, et al. Evolution of genes and genomes on the Drosophila phylogeny. Nature. 2007;450:203–218. doi: 10.1038/nature06341
    4. modENCODE Consortium T, Roy S, Ernst J, Kharchenko PV, Kheradpour P, Negre N, et al. Identification of Functional Elements and Regulatory Circuits by Drosophila modENCODE. Science. 2010;330:1787–1797. doi: 10.1126/science.1198374
    5. Mackay TFC, Richards S, Stone EA, Barbadilla A, Ayroles JF, Zhu D, et al. The Drosophila melanogaster Genetic Reference Panel. Nature. 2012;482:173–178. doi: 10.1038/nature10811