G3 (Bethesda). 2020 Nov 5;10(11):4115-4128.
doi: 10.1534/g3.120.401485.

Chromonomer: A Tool Set for Repairing and Enhancing Assembled Genomes Through Integration of Genetic Maps and Conserved Synteny

Julian Catchen et al. G3 (Bethesda).

Abstract

The pace of the sequencing and computational assembly of novel reference genomes is accelerating. Though DNA sequencing technologies and assembly software tools continue to improve, biological features of genomes such as repetitive sequence as well as molecular artifacts that often accompany sequencing library preparation can lead to fragmented or chimeric assemblies. If left uncorrected, defects like these trammel progress on understanding genome structure and function, or worse, positively mislead this research. Fortunately, integration of additional, independent streams of information, such as a marker-dense genetic map and conserved orthologous gene order from related taxa, can be used to scaffold together unlinked, disordered fragments and to restructure a reference genome where it is incorrectly joined. We present a tool set for automating these processes, one that additionally tracks any changes to the assembly and to the genetic map, and which allows the user to scrutinize these changes with the help of web-based, graphical visualizations. Chromonomer takes a user-defined reference genome, a map of genetic markers, and, optionally, conserved synteny information to construct an improved reference genome of chromosome models: a "chromonome". We demonstrate Chromonomer's performance on genome assemblies and genetic maps that have disparate characteristics and levels of quality.

Keywords: Genome assembly; RADseq; conserved synteny; genetic map.

Figures

Figure 1
The primary Chromonomer algorithm. The algorithm takes a set of scaffolds (seen here as rectangles), a set of markers (typically DNA sequence, i.e., RAD markers; represented here as shapes within the rectangles), an assembly file (AGP file), which describes how contigs and gaps are formed into scaffolds in the assembly, and a genetic map, which provides order to the markers. (A-C) Scaffolds are first evaluated to identify sets of markers mapped to different linkage groups. Those scaffolds will be split at the nearest gap (B) or pruned out (C) if a consistent set of markers cannot be found. (D) Scaffolds are anchored to their positions in the genetic map; if a scaffold appears in two locations in the genetic map, it is anchored twice. (E) A consistent ordering of markers is determined, with inconsistent markers discarded. (F) Scaffolds are oriented or split at the nearest gap, as dictated by the genetic map.
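Steps (A-C) in the caption above start by finding scaffolds whose markers map to more than one linkage group. The following is a minimal sketch of that detection step only, not Chromonomer's implementation. The tuple layout `(scaffold_id, bp_position, linkage_group)` and both function names are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def linkage_groups_per_scaffold(markers):
    """Collect marker positions per scaffold and linkage group.

    `markers` is an iterable of (scaffold_id, bp_position, linkage_group)
    tuples. This layout is an assumption for the sketch, not a
    Chromonomer input format.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for scaffold, position, linkage_group in markers:
        groups[scaffold][linkage_group].append(position)
    return groups


def find_conflicting_scaffolds(markers):
    """Return scaffolds whose markers fall in two or more linkage groups.

    These are the candidates for splitting at a gap or for pruning.
    """
    groups = linkage_groups_per_scaffold(markers)
    return {
        scaffold: {lg: sorted(positions) for lg, positions in by_lg.items()}
        for scaffold, by_lg in groups.items()
        if len(by_lg) > 1
    }


example = [
    ("scaffold_1", 100, "LG1"),
    ("scaffold_1", 900, "LG2"),  # conflicts with the LG1 marker above
    ("scaffold_2", 50, "LG1"),
    ("scaffold_2", 70, "LG1"),
]
conflicts = find_conflicting_scaffolds(example)
```

Here only `scaffold_1` is reported, because `scaffold_2` carries markers from a single linkage group. Deciding where to split, or whether to prune instead, needs the gap coordinates from the AGP file, which this sketch does not model.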
Figure 2
Using depth of coverage to create virtual gaps, and rescaffolding the assembly. Assemblies constructed using long reads often consist purely of contigs, with no gaps. In these cases, we can input into Chromonomer depth of coverage data, generated by aligning raw reads back to the assembly, and we can identify anomalous values in depth of coverage to direct where to create virtual breakpoints in the assembly. Here, in (A) the markers clearly show a misassembly in the center region of the contig (red markers). With no gaps, the normal algorithm to split the contig will fail. (B) The scaffolding algorithm instead assumes the genetic map is the correct source of information and identifies where the contig should be broken, according to a consistent set of reordered map markers. Depth of coverage information (C) is incorporated to identify logical break points and (D) the contig is split into the respective pieces.
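The idea in panels (C) and (D) can be illustrated with a small sketch: between two markers that the genetic map places on opposite sides of a misassembly, choose the window whose read depth deviates most from the contig-wide median. The caption does not say how Chromonomer scores anomalous depth, so the median-deviation rule, the fixed-width windows, and the function name `pick_breakpoint` are all assumptions.

```python
from statistics import median


def pick_breakpoint(depths, window_size, left_bp, right_bp):
    """Pick a virtual breakpoint between two marker positions.

    depths      -- mean read depth per fixed-width window along the contig
    window_size -- window width in bp (hypothetical binning)
    left_bp     -- position of the last marker consistent with one side
    right_bp    -- position of the first marker consistent with the other

    Returns the start coordinate of the window, within the interval,
    whose depth is furthest from the contig-wide median.
    """
    if left_bp > right_bp:
        raise ValueError("left_bp must not exceed right_bp")
    contig_median = median(depths)
    first = left_bp // window_size
    last = min(right_bp // window_size, len(depths) - 1)
    if first > last:
        raise ValueError("interval lies outside the contig")
    best = max(
        range(first, last + 1),
        key=lambda i: abs(depths[i] - contig_median),
    )
    return best * window_size


# A contig with roughly 30x depth and a sharp drop in the fifth window.
depths = [30, 31, 29, 30, 4, 30, 32, 31]
breakpoint_bp = pick_breakpoint(depths, 1000, 2500, 6200)
```

With these values the function returns 4000, the start of the low-depth window. Restricting the search to the interval between the flanking markers keeps the genetic map as the arbiter of where a break is needed, with depth only refining its position.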
Figure 3
The Chromonomer algorithm as employed in the Gulf pipefish (Syngnathus scovelli) assembly. The figure shows all the numbered scaffolds belonging to LG5 before (A) and after (B) processing. In the diagrams, each marker in the linkage group (left) is connected by a line to its alignment position on each scaffold (right). In red in (A), scaffold_8 shows markers with conflicting physical and map orders. In (B), the order of markers has been resolved and some conflicting markers discarded. Scaffold_76 (green) and scaffold_12 (blue), which are each anchored in two map positions, are examples of scaffolds that need to be split so a third scaffold can be inserted into the rift.
Figure 4
Including conserved gene synteny in the Chromonomer algorithm, as employed in the Gulf pipefish (Syngnathus scovelli) assembly. The figure shows LG14 before (A) and after (B) processing. In this example we have incorporated conserved gene synteny from the close relative Syngnathus acus to order and orient scaffolds whose position and orientation are left ambiguous by the genetic map. Colored scaffolds indicate where synteny was employed.
Figure 5
Ortholog-directed scaffold rearrangements in the Gulf pipefish (Syngnathus scovelli). Potential improvements in the LG14 integrated assembly by incorporation of gene synteny between S. scovelli and S. acus. Colored scaffolds indicate where synteny was employed, and colors are consistent with Figure 4. In each panel, the S. scovelli genetic map is shown above, linking the scaffolds of the physical assembly together. Lines also connect each pair of gene orthologs between S. scovelli and S. acus.
Figure 6
Using virtual gaps and the rescaffold algorithm in platyfish (Xiphophorus maculatus). (A) The platyfish assembly shows a clear misassembly (an inverted segment between ~35-53 cM) when compared against the genetic map. (B) A consistent order of markers is found on the map, and depth of coverage is employed to split the CM008951.1 contig into two components that can then be independently reoriented.
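Once a contig has been split, each piece is oriented against the genetic map. One simple way to illustrate that decision is to check whether marker positions in bp rise or fall with their map positions in cM. The covariance-sign rule below is a stand-in chosen for this sketch, not the method Chromonomer is documented to use, and the function name `infer_orientation` is hypothetical.

```python
def infer_orientation(pairs):
    """Infer scaffold orientation from (bp_position, cM_position) pairs.

    Returns "+" if physical and map positions increase together,
    "-" if they run in opposite directions, and None if the markers
    carry no orientation signal (fewer than two markers, or all
    markers at the same map or physical position).
    """
    n = len(pairs)
    if n < 2:
        return None
    mean_bp = sum(bp for bp, _ in pairs) / n
    mean_cm = sum(cm for _, cm in pairs) / n
    # The sign of the covariance gives the direction of the relationship.
    covariance = sum((bp - mean_bp) * (cm - mean_cm) for bp, cm in pairs)
    if covariance > 0:
        return "+"
    if covariance < 0:
        return "-"
    return None


forward = infer_orientation([(100, 35.0), (5000, 41.2), (9000, 53.0)])
inverted = infer_orientation([(100, 53.0), (5000, 41.2), (9000, 35.0)])
```

A piece reported as "-" would be reverse-complemented before being placed in the chromosome model. Pieces that return None are the cases where the genetic map alone leaves orientation ambiguous, which is where conserved synteny can be brought in.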
Figure 7
Improvements in the platyfish (Xiphophorus maculatus) chromosome-level assembly. Conserved gene synteny between platyfish (Xma) and medaka (Oryzias latipes, Ola) illustrates improvements in the LG14 integrated assembly by application of the rescaffold algorithm. The top panel (A) shows synteny prior to correction; several inversions are present, including one associated with the platyfish assembly (orange, colored to match the scaffolds in Fig. 6). After correction (B), inversions and ordering are rectified.
Figure 8
The rockcod (Notothenia coriiceps) assembly. All of the large scaffolds in the rockcod assembly appear to be inter- and intra-chromosomal chimeras. When we examine LG1 in rockcod (A), we can see that orthologous rockcod genes are found scattered across the genome of blackfin icefish, a related taxon. The largest rockcod scaffold, KL668296.1, is highlighted by the dotted line, and we can see that it is composed of sizeable pieces from all over the genome. (B) After processing with Chromonomer, the scaffold is broken up and redistributed in the assembly. We can now clearly see the conserved, two-to-one gene synteny between the icefish and rockcod.
Figure 9
Chromonomer improves the rockcod (Notothenia coriiceps) assembly. The rockcod assembly can be chromonomed using the genetic map. (A) shows genome-wide conserved gene synteny prior to integrating the genome. (B) shows marked improvement genome-wide in the assembly after breaking down the largest scaffolds using the genetic map. However, smaller assembly errors remain.
