Commun Biol. 2018 Nov 16;1:197.
doi: 10.1038/s42003-018-0199-z. eCollection 2018.

Improved reference genome for the domestic horse increases assembly contiguity and composition

Theodore S Kalbfleisch et al. Commun Biol.

Erratum in

Abstract

Recent advances in genomic sequencing technology and computational assembly methods have allowed scientists to improve reference genome assemblies in terms of contiguity and composition. EquCab2, a reference genome for the domestic horse, was released in 2007. Although of equal or better quality compared to other first-generation Sanger assemblies, it had many of the shortcomings common to them. In 2014, the equine genomics research community began a project to improve the reference sequence for the horse, building upon the solid foundation of EquCab2 and incorporating new short-read data, long-read data, and proximity ligation data. Here, we present EquCab3. The count of non-N bases in the incorporated chromosomes is improved from 2.33 Gb in EquCab2 to 2.41 Gb in EquCab3. Contiguity has also been improved nearly 40-fold, with a contig N50 of 4.5 Mb, and scaffold contiguity has been enhanced to the point where all but one of the 32 chromosomes each consist of a single scaffold.
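
The abstract reports two assembly metrics: the count of non-N bases and the contig N50. As a rough illustration of how such numbers are commonly computed, the sketch below uses a tiny made-up scaffold string. It is not the authors' pipeline. The function names (`non_n_bases`, `split_contigs`, `n50`) and the toy sequence are hypothetical, and treating every run of N as a contig boundary is an assumption rather than a rule stated in the paper.

```python
import re


def non_n_bases(seq: str) -> int:
    """Count bases that are not the gap/unknown placeholder N (case-insensitive)."""
    return sum(1 for base in seq if base not in "Nn")


def split_contigs(scaffold: str) -> list[str]:
    """Split a scaffold into contigs at runs of N.

    Assumption for this sketch: any run of one or more N marks a gap.
    """
    return [contig for contig in re.split(r"[Nn]+", scaffold) if contig]


def n50(lengths) -> int:
    """Return the largest length L such that sequences of length >= L
    together cover at least half of the total length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


# Hypothetical toy scaffold: four contigs (8, 6, 4 and 2 bases) separated by gaps.
toy_scaffold = "A" * 8 + "NNN" + "C" * 6 + "N" + "G" * 4 + "NN" + "T" * 2

contig_lengths = [len(c) for c in split_contigs(toy_scaffold)]
print("non-N bases:", non_n_bases(toy_scaffold))
print("contig lengths:", contig_lengths)
print("contig N50:", n50(contig_lengths))
```

In this toy case the contigs total 20 bases, so the N50 is the length at which the running sum, taken from the longest contig down, first reaches 10 bases. Applying the same functions to scaffold lengths instead of contig lengths would give a scaffold N50.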

Conflict of interest statement

I.T.F. is an employee of 10× Genomics, Inc. R.E.G. is a co-founder and scientific adviser of Dovetail Genomics, LLC. The other authors declare no competing interests.

Figures

Fig. 1
Percentages of RNA-seq reads from eight tissues from two horses (designated 683610 and 686521) and genomic reads mapping to EquCab2 vs. EquCab3. We used sequence data from the Functional Annotation of Animal Genomes (FAANG) project for this mapping. More RNA-seq reads map to EquCab3 than to EquCab2 for every tissue in both horses. The percentage of genomic reads (last two rows; WGS) mapping to EquCab3 is also larger than those mapping to EquCab2, but the difference is not as large
Fig. 2
Number of reads from the Functional Annotation of Animal Genomes (FAANG) project WGS dataset mapping to EquCab2 and EquCab3. Significantly more reads map only to EquCab3 than only to EquCab2 (one-tailed two-sample binomial test, p < 2.2 × 10⁻¹⁶)
Fig. 3
A comparison of equine chromosome 31 between EquCab2 and EquCab3. a Average coverage per 10 kb window across chr31 in EquCab2 and EquCab3, with a large inversion between them highlighted. EquCab3 has fewer coverage drops and more total sequence than EquCab2. b An alignment of chr31 in EquCab2 and EquCab3 shows a large inversion between the two reference genomes. The radiation hybrid (RH) map (c) and Hi-C contact heat maps for EquCab2 (d) and EquCab3 (e) indicate that this discrepancy is the result of a misassembly in EquCab2
Fig. 4
Annotation of EquCab2 and EquCab3 with the Comparative Annotation Toolkit shows substantial improvement in EquCab3. a More genes found in related species were annotated in EquCab3 than in EquCab2. b Fewer genes were split between contigs in EquCab3 than in EquCab2. c The gene coverage distribution is better in EquCab3 than in EquCab2
