Science. 2017 Apr 7;356(6333):92-95.
doi: 10.1126/science.aal3327. Epub 2017 Mar 23.

De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds

Olga Dudchenko et al. Science.

Abstract

The Zika outbreak, spread by the Aedes aegypti mosquito, highlights the need to create high-quality assemblies of large genomes in a rapid and cost-effective way. Here we combine Hi-C data with existing draft assemblies to generate chromosome-length scaffolds. We validate this method by assembling a human genome, de novo, from short reads alone (67× coverage). We then combine our method with draft sequences to create genome assemblies of the mosquito disease vectors Ae. aegypti and Culex quinquefasciatus, each consisting of three scaffolds corresponding to the three chromosomes in each species. These assemblies indicate that almost all genomic rearrangements among these species occur within, rather than between, chromosome arms. The genome assembly procedure we describe is fast, inexpensive, and accurate, and can be applied to many species.

Figures

Fig. 1. Starting with a draft assembly, we used Hi-C data to correct mis-joins, scaffold, and merge overlaps, thereby generating an assembly of the Ae. aegypti mosquito genome with chromosome-length scaffolds
Here we show contact matrices generated by aligning a Hi-C data set to both the AaegL2 assembly (18) that we used as input (left) and the final AaegL4 assembly generated by our algorithm (right). Pixel intensity in the contact matrix indicates how often a pair of loci colocate in the nucleus. The loci corresponding to each row and column are illustrated using chromograms. The chromograms on the left depict the three linkage groups [Lnk1, Lnk2, Lnk3, or unassigned (U)] reported in AaegL2; the chromograms on the right depict the three chromosome-length scaffolds in AaegL4 (chr1, chr2, and chr3). To create the chromogram, we assigned each AaegL4 arm a linear color gradient, thereby specifying a color for each AaegL4 locus. The same colors are then used for the corresponding loci in AaegL2 (left) and in the illustration of our procedure (center, though with increased contrast). Chromogram discontinuities indicate differences with AaegL4. In the center, we illustrate our assembly algorithm using an input scaffold from Lnk1 of AaegL2 (“supercontig 1.12,” see bracket). First, the scaffold is examined for misjoins and split such that the resulting segments each exhibit a continuous Hi-C signal (center, top row). Next, the segments are used as input for iterative scaffolding. Ultimately, only one of the segments is assigned to chromosome 1 of AaegL4. The rest of supercontig 1.12 is assigned to 2q, in the vicinity of several scaffolds that were not anchored in AaegL2 (center, middle row). Finally, segments exhibiting a similar 3D signal are examined for evidence of overlapping sequence (green rectangle) and merged (center, bottom row). The final contact map is consistent with the Rabl configuration, i.e., the spatial clustering of centromeres and telomeres.
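As a rough illustration of what a contact matrix is, the sketch below bins Hi-C read-pair positions into a symmetric count matrix, so that each cell counts how often two bins were observed in contact. The function name, the bin size, and the toy coordinates are assumptions made for this example; this is not the pipeline used to produce the figure.

```python
def contact_matrix(pairs, genome_length, bin_size):
    """Count contacts between fixed-size bins along a single sequence.

    pairs: iterable of (pos_a, pos_b) base-pair positions, one tuple per
    Hi-C read pair (hypothetical input format for this sketch).
    """
    n_bins = (genome_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts to stay symmetric
    return matrix


# Toy data: five read pairs on a 300-bp sequence, binned at 100 bp.
toy_pairs = [(10, 20), (15, 120), (130, 140), (250, 260), (20, 110)]
toy_matrix = contact_matrix(toy_pairs, genome_length=300, bin_size=100)
for row in toy_matrix:
    print(row)
```

In a real map the bins would span kilobases to megabases, and the counts would typically be normalized before display.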
Fig. 2. Comparison of AaegL4 and CpipJ3 with genetic maps
(A) We compared AaegL4 with a genetic map of Ae. aegypti (19). Our assembly agreed with the genetic map on 1822 out of 1826 markers. The exceptions are due to misjoins in AaegL2 that were not corrected in AaegL4. (B) Similarly, CpipJ3 is in agreement with a genetic map of Cx. quinquefasciatus (21).
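For reference, the marker counts reported in panel A correspond to the following agreement rate. The variable names are illustrative only.

```python
# Marker counts taken from the Fig. 2A caption.
matching_markers = 1822
total_markers = 1826

agreement_pct = 100 * matching_markers / total_markers
discordant = total_markers - matching_markers
print(f"{agreement_pct:.2f}% agreement, {discordant} discordant markers")
```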
Fig. 3. The content of chromosome arms is strongly conserved across mosquitoes
Here each 100-kb locus in Ae. aegypti is assigned a color. For the other species, each 100-kb locus is assigned a combination of the colors of the corresponding DNA sequences in Ae. aegypti, weighted by length. MYA, million years ago.
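The length-weighted color assignment described in this caption can be sketched as a weighted average of RGB values. This is a minimal sketch under stated assumptions: the function name, the RGB representation, and the example lengths are hypothetical, and the caption does not specify the color space or blending rule the authors used.

```python
def blend_colors(segments):
    """Length-weighted average of RGB colors.

    segments: list of ((r, g, b), length_bp) tuples, one per stretch of a
    locus that corresponds to a given Ae. aegypti color (assumed format).
    """
    total = sum(length for _, length in segments)
    if total == 0:
        raise ValueError("segments must cover at least one base pair")
    return tuple(
        sum(color[channel] * length for color, length in segments) / total
        for channel in range(3)
    )


# A 100-kb locus where 75 kb matches a red region and 25 kb a blue region.
blended = blend_colors([((255, 0, 0), 75_000), ((0, 0, 255), 25_000)])
print(blended)
```

A locus that maps entirely to one region keeps that region's color, while a locus spanning a rearrangement breakpoint appears as a mixture.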

References

    1. Harmon A. Team of Rival Scientists Comes Together to Fight Zika. New York Times. 2016 Mar 30; www.nytimes.com/2016/03/31/us/mapping-a-genetic-strategy-to-fight-the-zi....
    2. Gnerre S, et al. Proc Natl Acad Sci USA. 2011;108:1513–1518.
    3. Williams LJS, et al. Genome Res. 2012;22:2241–2249.
    4. Lieberman-Aiden E, et al. Science. 2009;326:289–293.
    5. Rao SSP, et al. Cell. 2014;159:1665–1680.
