Mol Ecol Resour. 2019 Jul;19(4):1015-1026.
doi: 10.1111/1755-0998.13020. Epub 2019 May 17.

Improving Illumina assemblies with Hi-C and long reads: An example with the North African dromedary

Jean P Elbers et al. Mol Ecol Resour. 2019 Jul.

Abstract

Researchers have assembled thousands of eukaryotic genomes using Illumina reads, but traditional mate-pair libraries cannot span all repetitive elements, resulting in highly fragmented assemblies. However, chromosome conformation capture techniques (such as Hi-C and Dovetail Genomics Chicago libraries) and long-read sequencing (such as Pacific Biosciences and Oxford Nanopore) both help span and resolve repetitive regions and therefore improve genome assemblies. One important livestock species of arid regions that does not have a high-quality contiguous reference genome is the dromedary (Camelus dromedarius). Draft genomes exist but are highly fragmented, and a high-quality reference genome is needed to understand adaptation to desert environments and artificial selection during domestication. Dromedaries are among the last livestock species to have been domesticated, and together with wild and domestic Bactrian camels, they are the only representatives of the Camelini tribe, which highlights their evolutionary significance. Here we describe our efforts to improve the North African dromedary genome. We used Chicago and Hi-C sequencing libraries from Dovetail Genomics to resolve the order of previously assembled contigs, producing almost chromosome-level scaffolds. Remaining gaps were filled with Pacific Biosciences long reads, and the scaffolds were then comparatively mapped to chromosomes. Long reads added 99.32 Mbp to the total length of the new assembly. Dovetail Chicago and Hi-C libraries increased the longest scaffold more than 12-fold, from 9.71 Mbp to 124.99 Mbp, and the scaffold N50 more than 50-fold, from 1.48 Mbp to 75.02 Mbp. We demonstrate that Illumina de novo assemblies can be substantially upgraded by combining chromosome conformation capture and long-read sequencing.
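The scaffold N50 reported above is, by the usual definition, the length of the scaffold at which the running total of scaffold lengths, sorted longest first, first reaches half of the total assembly length. A minimal Python sketch of that computation follows; the toy lengths are hypothetical and are not data from this assembly. The last lines only re-check the fold changes from the values stated in the abstract.

```python
from typing import Sequence

def nx(lengths: Sequence[int], fraction: float = 0.5) -> int:
    """Return the Nx statistic (N50 by default) for a list of sequence lengths.

    Lengths are sorted longest first; the result is the first length at which
    the cumulative sum reaches at least `fraction` of the total length.
    """
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    return min(lengths)  # guard; not reached for 0 < fraction <= 1

# Hypothetical toy scaffold lengths (illustration only)
toy = [100, 80, 60, 40, 20]
print(nx(toy))  # total 300; 100 + 80 = 180 >= 150, so N50 = 80

# Fold changes recomputed from the values reported in the abstract (Mbp)
print(f"{124.99 / 9.71:.1f}-fold, {75.02 / 1.48:.1f}-fold")  # 12.9-fold, 50.7-fold
```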

Keywords: chromosome conformation capture; chromosome mapping; dromedary; genome annotation; genome assembly; scaffolding.

Figures

Figure 1
Dovetail Genomics’ Hi‐C linkage density plot for Hi‐C reads mapped to the Hi‐C assembly. The X‐ and Y‐axes give the cumulative mapping positions of the first and second read in a read pair, respectively, grouped into bins. The colour of each square gives the number of read pairs within that bin. Grey vertical and white horizontal lines mark the borders between scaffolds. Only scaffolds >1 Mbp are shown [Colour figure can be viewed at wileyonlinelibrary.com]
Figure 2
Cumulative assembly length for scaffolds of the original North African dromedary assembly (CamDro1; Fitak et al., 2016; GenBank accession: GCA_000803125.1), the North African dromedary assembly after improvement (CamDro2), and the Arabian dromedary assembly (Wu et al., 2014; GCA_000767585.1). Circles and triangles indicate L50 and L90 values, respectively. L50 and L90 are the smallest numbers of scaffolds that together make up at least 50% and 90%, respectively, of the total assembly length [Colour figure can be viewed at wileyonlinelibrary.com]
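The L50 and L90 definitions in this caption, and the cumulative curves themselves, follow from one running total over scaffold lengths sorted longest first. A minimal Python sketch, using hypothetical toy lengths rather than the CamDro1, CamDro2 or Arabian assembly data:

```python
from itertools import accumulate
from typing import List, Sequence

def cumulative_lengths(lengths: Sequence[int]) -> List[int]:
    """Running total of scaffold lengths, longest scaffold first (the plotted curve)."""
    return list(accumulate(sorted(lengths, reverse=True)))

def lx(lengths: Sequence[int], fraction: float) -> int:
    """Smallest number of scaffolds whose combined length is >= fraction of the total."""
    if not lengths:
        raise ValueError("no scaffolds")
    curve = cumulative_lengths(lengths)
    target = fraction * curve[-1]
    for count, running in enumerate(curve, start=1):
        if running >= target:
            return count
    return len(curve)

toy = [100, 80, 60, 40, 20]  # hypothetical; total 300
print(cumulative_lengths(toy))      # [100, 180, 240, 280, 300]
print(lx(toy, 0.5), lx(toy, 0.9))   # 2 4
```

On a plot such as this figure, the L50 marker sits at x = L50 on the curve, where the running total first crosses half of the assembly length.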
Figure 3
D‐GENIES (Cabanettes & Klopp, 2018) dot plot of the Minimap2 (Li, 2018) whole‐genome alignment between the CamDro1 and CamDro2 assemblies. Contigs are sorted, and matches are filtered out by size (≤0.001% of the dot plot width) and by identity (≤0.75) [Colour figure can be viewed at wileyonlinelibrary.com]
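The match filter in this caption can be sketched as below. This is one reading of the caption, not D‐GENIES's implementation: it assumes a match is dropped when its length is at most 0.001% of the plot width (taken here as the total length along the plot axis) or its identity is at most 0.75. The `Match` record and `filter_matches` function are hypothetical; the PAF column mapping in the comments follows the Minimap2 PAF format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Match:
    """Hypothetical alignment record, e.g. built from one Minimap2 PAF line."""
    query: str
    target: str
    length: int      # alignment block length (PAF column 11)
    identity: float  # residue matches / block length (PAF column 10 / column 11)

def filter_matches(matches: List[Match], plot_width: int,
                   size_frac: float = 0.001 / 100,  # 0.001% of the plot width
                   identity_cutoff: float = 0.75) -> List[Match]:
    """Keep matches longer than size_frac * plot_width AND above identity_cutoff."""
    size_cutoff = size_frac * plot_width
    return [m for m in matches
            if m.length > size_cutoff and m.identity > identity_cutoff]

width = 2_000_000_000  # hypothetical axis length in bp, so the size cutoff is ~20 kbp
matches = [Match("s1", "c1", 50_000, 0.95),
           Match("s2", "c2", 10_000, 0.99),  # dropped: too short
           Match("s3", "c3", 50_000, 0.70)]  # dropped: identity <= 0.75
print([m.query for m in filter_matches(matches, width)])  # ['s1']
```

If the intended rule was instead that both conditions must hold for a match to be dropped, the `and` in the comprehension becomes `or`.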
Figure 4
Frequency polygons of query sequence length (predicted proteins) divided by subject (UniProt/TrEMBL) sequence length, for MAKER (Holt & Yandell, 2011) predicted proteins mapped with DIAMOND (Buchfink et al., 2015) against the UniProt/TrEMBL release 2018_04 database, for: (red line) the original North African dromedary genome (CamDro1; Fitak et al., 2016 predicted protein sequences; GenBank accession: GCA_000803125.1); (green line) the North African dromedary genome after adding ~11× PacBio sequencing reads (CamDro2) for MAKER run 1; and (blue line) MAKER run 2. Values near 1.0 are desirable, indicating proteins that are not truncated by indels introduced from PacBio reads [Colour figure can be viewed at wileyonlinelibrary.com]
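The ratio plotted in this figure is each predicted protein's length divided by the length of its UniProt/TrEMBL hit. A minimal sketch of computing and binning those ratios, assuming DIAMOND tabular output requested with the fields `qseqid sseqid qlen slen` and assuming the first row for each query is its best hit (for example, when run with `--max-target-seqs 1`). The functions `length_ratios` and `frequency_polygon` and the bin widths are illustrative choices, not the authors' script.

```python
import math
from collections import Counter
from typing import Dict, Iterable

def length_ratios(rows: Iterable[str]) -> Dict[str, float]:
    """qlen / slen for the first hit of each query in tab-separated
    'qseqid sseqid qlen slen' rows; later rows for the same query are ignored."""
    ratios: Dict[str, float] = {}
    for row in rows:
        if not row.strip():
            continue
        qseqid, _sseqid, qlen, slen = row.rstrip("\n").split("\t")[:4]
        if qseqid not in ratios:
            ratios[qseqid] = int(qlen) / int(slen)
    return ratios

def frequency_polygon(values: Iterable[float], bin_width: float = 0.05) -> Dict[float, int]:
    """Bin values; return {bin midpoint: count} in bin order, ready to plot as a polygon."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {(b + 0.5) * bin_width: n for b, n in sorted(counts.items())}

rows = ["p1\tu1\t300\t300", "p1\tu2\t300\t600", "p2\tu3\t150\t300"]  # hypothetical
ratios = length_ratios(rows)
print(ratios)  # {'p1': 1.0, 'p2': 0.5}
print(frequency_polygon(ratios.values(), bin_width=0.25))  # {0.625: 1, 1.125: 1}
```

A protein truncated by a frameshifting indel would show up as a ratio well below 1.0, which is why a peak near 1.0 is the desired shape.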
Figure 5
Cumulative proportion of transcripts with a given or lower annotation edit distance (AED) for each MAKER run. MAKER run 1 (solid line) had AED ≤0.50 for 78.4% of transcripts, whilst MAKER run 2 (dashed line) had AED ≤0.50 for 39.2% of transcripts. The grey vertical line indicates AED = 0.50. Note that a larger proportion of low AED values indicates a genome annotation that is more congruent with the evidence used during the annotation process
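Each curve in this figure is an empirical cumulative distribution of per-transcript AED scores, and the percentages quoted are that curve read off at AED = 0.50. A minimal Python sketch on hypothetical AED values; extracting the scores from MAKER output is left out, and the function names are illustrative.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

def proportion_at_or_below(aed: Sequence[float], threshold: float) -> float:
    """Fraction of transcripts whose AED is <= threshold."""
    if not aed:
        raise ValueError("no AED values")
    return sum(1 for a in aed if a <= threshold) / len(aed)

def cumulative_curve(aed: Sequence[float]) -> List[Tuple[float, float]]:
    """(AED value, proportion of transcripts with AED <= that value) per distinct AED."""
    ordered = sorted(aed)
    n = len(ordered)
    return [(x, bisect_right(ordered, x) / n) for x in sorted(set(ordered))]

toy = [0.0, 0.1, 0.1, 0.4, 0.5, 0.7, 0.9, 1.0]  # hypothetical AED scores
print(proportion_at_or_below(toy, 0.50))  # 5 of 8 transcripts -> 0.625
print(cumulative_curve(toy)[:2])          # [(0.0, 0.125), (0.1, 0.375)]
```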

References

Abdussamad, A. M., Charruau, R., Kalla, D. J. U., & Burger, P. A. (2015). Validating local knowledge on camels: Colour phenotypes and genetic variation of dromedaries in the Nigeria‐Niger corridor. Livestock Science, 181, 131–136.
Alim, F. Z. D., Romanova, E. V., Tay, Y.‐L., Rahman, A. Y. B. A., Chan, K.‐G., Hong, K.‐W., … Hindmarch, C. C. T. (2019). Seasonal adaptations of the hypothalamo‐neurohypophyseal system of the dromedary camel. PLoS One.
Almathen, F., Charruau, P., Mohandesan, E., Mwacharo, J. M., Orozco‐ter Wengel, P., Pitt, D., … Burger, P. A. (2016). Ancient and modern DNA reveal dynamics of domestication and cross‐continental dispersal of the dromedary. Proceedings of the National Academy of Sciences of the United States of America, 113, 6707–6712.
Altschul, S., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215, 403–410.
Avila, F., Baily, M. P., Perelman, P., Das, P. J., Pontius, J., Chowdhary, R., … Raudsepp, T. (2014). A comprehensive whole‐genome integrated cytogenetic map for the alpaca (Lama pacos). Cytogenetic and Genome Research, 144, 196–207.
