Genes (Basel). 2021 May 30;12(6):847.
doi: 10.3390/genes12060847.

Dog10K_Boxer_Tasha_1.0: A Long-Read Assembly of the Dog Reference Genome

Vidhya Jagannathan et al. Genes (Basel).

Abstract

The domestic dog has evolved to be an important biomedical model for studies regarding the genetic basis of disease, morphology and behavior. Genetic studies in the dog have relied on a draft reference genome of a purebred female boxer dog named "Tasha" initially published in 2005. Derived from a Sanger whole genome shotgun sequencing approach coupled with limited clone-based sequencing, the initial assembly and subsequent updates have served as the predominant resource for canine genetics for 15 years. While the initial assembly produced a good-quality draft, as with all assemblies produced at the time, it contained gaps, assembly errors and missing sequences, particularly in GC-rich regions, which are found at many promoters and in the first exons of protein-coding genes. Here, we present Dog10K_Boxer_Tasha_1.0, an improved chromosome-level highly contiguous genome assembly of Tasha created with long-read technologies that increases sequence contiguity >100-fold, closes >23,000 gaps of the CanFam3.1 reference assembly and improves gene annotation by identifying >1200 new protein-coding transcripts. The assembly and annotation are available at NCBI under the accession GCF_000002285.5.

Keywords: Canis lupus familiaris; Pacific Biosciences; annotation; contiguity; high quality; resource.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Dog10K_Boxer_Tasha_1.0 assembly. (a) Assembly workflow pipeline. The algorithms used in the pipeline are indicated. N50 is the contig or scaffold length such that contigs or scaffolds of that length or longer contain 50% of the assembly. L50 is the smallest number of contigs or scaffolds whose summed length reaches 50% of the assembly. (b) Ideogram showing chromosomes, contigs, and gaps. The grey regions indicate contigs smaller than 3 Mb.
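
The N50 and L50 definitions in the caption can be made concrete with a short calculation. The following is a minimal sketch, not the authors' pipeline: the function name `n50_l50` and the toy contig lengths are assumptions made purely for illustration and do not come from the Dog10K_Boxer_Tasha_1.0 assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths.

    N50: length of the contig at which the running sum of lengths,
         taken from longest to shortest, first reaches half the total.
    L50: number of contigs needed to reach that point.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Hypothetical toy lengths (arbitrary units), not real assembly data.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(n50, l50)  # 70 2
```

In the toy example the total is 300, so the two longest contigs (80 + 70 = 150) already cover half of it, giving an N50 of 70 and an L50 of 2.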
Figure 2
Size distribution of insertion–deletion differences identified between the Dog10K_Boxer_Tasha_1.0 and CanFam3.1 assemblies. The sizes of 22,330 sequences present in CanFam3.1 but absent in Dog10K_Boxer_Tasha_1.0 (red, deletions) and of 32,999 sequences present in Dog10K_Boxer_Tasha_1.0 but absent in CanFam3.1 (blue, insertions) are shown. The bins of each histogram are of equal size on a logarithmic scale.
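
Histogram bins of equal size on a logarithmic scale, as described in the caption, have edges whose consecutive ratio is constant. The sketch below shows one way such bins could be built; the helper names, the `bins_per_decade` parameter, and the example sizes are all hypothetical and are not taken from the paper or its data.

```python
import bisect
import math


def log_bin_edges(min_size, max_size, bins_per_decade=1):
    """Bin edges evenly spaced in log10 space that cover [min_size, max_size]."""
    first = math.floor(math.log10(min_size) * bins_per_decade)
    last = math.floor(math.log10(max_size) * bins_per_decade) + 1
    return [10 ** (k / bins_per_decade) for k in range(first, last + 1)]


def log_histogram(sizes, edges):
    """Count how many sizes fall in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for size in sizes:
        index = bisect.bisect_right(edges, size) - 1
        if 0 <= index < len(counts):
            counts[index] += 1
    return counts


# Invented indel sizes in bp, for illustration only.
example_sizes = [1, 5, 50, 500, 5000, 15000]
edges = log_bin_edges(min(example_sizes), max(example_sizes))
counts = log_histogram(example_sizes, edges)
print(edges)
print(counts)
```

With one bin per decade, each bin spans a tenfold size range, so small and large events remain visible on the same axis.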
Figure 3
Discovery of deletion variants using PacBio reads. Deletions were identified based on alignment of PacBio reads to the CanFam3.1 (left) or Dog10K_Boxer_Tasha_1.0 (right) assemblies. The bins of each histogram are of equal size on a logarithmic scale.
Figure 4
Structural variation at the amylase locus. A genome browser view illustrating structural variation at the amylase locus in Tasha is shown. The orange bars at the top indicate the locations of tandem duplications identified using the raw PacBio long-read data. These include a large, 1.9 Mbp duplication (chr6:47977592-49898283) as well as a 14.8 kbp duplication (chr6:49729008-49743863). A read depth profile showing copy number estimated from Illumina sequencing data is depicted as a bar plot across the interval. An elevated copy number of 3, corresponding to the 1.9 Mbp duplication, is observed, as well as a spike in copy number overlapping with the AMY2B gene. Mappings of discordant fosmid end sequences are shown in orange below the copy number profile. Each depicted clone has end sequences that align in an everted orientation consistent with the presence of a tandem duplication. The positions of gene models derived from the NCBI gene annotation, release 106, are shown at the bottom of the figure. The LOC607460 gene model corresponds to pancreatic α-amylase (AMY2B).
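
The duplication sizes quoted in the caption follow from the interval coordinates. The sketch below recomputes them and checks that the smaller duplication lies inside the larger one. The helper names are hypothetical, and treating the coordinates as 1-based and inclusive is an assumption; the caption does not state the convention, and the other convention changes each length by a single base.

```python
def parse_region(region):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    chrom, span = region.split(":")
    start_text, end_text = span.split("-")
    start, end = int(start_text), int(end_text)
    if end < start:
        raise ValueError(f"end precedes start in {region!r}")
    return chrom, start, end


def region_length(region, inclusive=True):
    """Length in bp; 'inclusive' assumes 1-based closed coordinates."""
    _, start, end = parse_region(region)
    return end - start + (1 if inclusive else 0)


def is_nested(inner, outer):
    """True if 'inner' lies entirely within 'outer' on the same chromosome."""
    in_chrom, in_start, in_end = parse_region(inner)
    out_chrom, out_start, out_end = parse_region(outer)
    return in_chrom == out_chrom and out_start <= in_start and in_end <= out_end


large = "chr6:47977592-49898283"  # coordinates as given in the caption
small = "chr6:49729008-49743863"  # coordinates as given in the caption

print(region_length(large))     # 1920692
print(region_length(small))     # 14856
print(is_nested(small, large))  # True
```

The computed lengths, about 1.92 Mbp and 14.86 kbp, are in line with the approximate sizes given in the caption.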
