Sci Data. 2024 Jul 11;11(1):762.
doi: 10.1038/s41597-024-03581-w.

A near complete genome assembly of the East Friesian sheep genome

Xiaoxiao You et al. Sci Data. 2024.

Erratum in

Abstract

Advances in sequencing have enabled the assembly of numerous sheep genomes, significantly improving our understanding of the link between genetic variation and phenotypic traits. However, the genome of East Friesian sheep (Ostfriesisches Milchschaf), a key high-yield milk breed, has not yet been fully assembled. Here, we constructed a near-complete, gap-free East Friesian genome assembly using PacBio HiFi, ultra-long ONT and Hi-C sequencing. The resulting assembly spans approximately 2.96 Gb, with a contig N50 of 104.1 Mb and only 164 unplaced sequences. Notably, the assembly captures 41 telomeres and 24 centromeres. The assembled sequence is of high quality in terms of completeness (BUSCO score: 97.1%) and correctness (QV: 69.1). In addition, a total of 24,580 protein-coding genes were predicted, of which 97.2% (23,891) carried at least one conserved functional domain. Collectively, this assembly provides not only a near-T2T, gap-free genome but also a valuable genetic resource for comparative genomic studies of sheep, and it will serve as an important tool for the sheep research community.
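
The two headline metrics in the abstract, contig N50 and QV, can be illustrated with a minimal Python sketch. It assumes the conventional definitions: N50 as the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length, and QV as a Phred-scaled per-base error probability. The function names and the toy contig lengths are hypothetical, and this is not necessarily how the authors computed their values.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    summed length of all contigs.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


# Toy contig lengths (hypothetical, not taken from the assembly).
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))  # the two longest contigs cover at least half of 290

# Under the Phred assumption, QV 69.1 corresponds to roughly
# 1.2e-7 errors per base, i.e. about one error per 8 million bases.
print(qv_to_error_rate(69.1))
```

Under these assumptions, a contig N50 of 104.1 Mb means that at least half of the assembled bases lie in contigs of 104.1 Mb or longer.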

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Circos plot of the EFS v2.0 genome. From inside to outside, I: GC content in nonoverlapping 1 Mb windows (histograms); II: percent coverage of repetitive sequences in nonoverlapping 1 Mb windows (heat maps); III: gene density calculated based on the number of genes in nonoverlapping 1 Mb windows (heat maps); IV: 27 super-scaffolds. Lengths are shown in Mb.
Fig. 2
Overview of the near T2T and gap-free EFS v2.0 reference genome. The box represents the 35 closed gaps identified from GCA_018804185.1. The triangle represents the telomere region, and the circle represents the centromere region.
Fig. 3
Quality assessment of the protein-coding genes in the EFS v2.0 assembly. (a) Comparison of exon length among four sheep gene sets. Window refers to the length of every point. (b) Comparison of exon number among four sheep gene sets. No unexpected differences are apparent among the four gene sets, indicating the high quality of the gene structure annotation. (c) BUSCO assessment results of protein-coding genes in the EFS v2.0 assembly. (d) Venn diagram of gene function annotation results from five public databases: NR, InterPro, KEGG, SwissProt and KOG.
Fig. 4
Heatmap representation of newly assembled genes. Rows represent newly assembled genes, and columns represent 5 different samples. The bar in the upper right corner represents log2-transformed TPM values. Blue and red boxes represent genes showing lower and higher expression levels, respectively. “Hea_t” represents heart, “Rum_n” represents rumen, “Sub_t” represents subcutaneous fat, “Lun_g” represents lung, and “Per_t” represents perirenal fat.
Fig. 5
IGV view of ONT and PacBio read coverage in the Gap 1 region. The IGV images for Gap 1 through Gap 8 are available through the Figshare database.
Fig. 6
The accuracy and completeness of the EFS v2.0 genome assembly. Whole-genome Hi-C heatmap of EFS v2.0 within and between 27 chromosomes.
Fig. 7
Syntenic regions among EFS v2.0, Rambouillet sheep and Tibetan sheep, identified by homology searches using MCScan (Python version) with a minimum of 30 genes per block. Macrosynteny connecting blocks of >30 one-to-one gene pairs is shown.
Fig. 8
BUSCO plot of several sheep genomes. C: Complete BUSCOs; S: Complete and single-copy BUSCOs; D: Complete and duplicated BUSCOs; F: Fragmented BUSCOs; M: Missing BUSCOs; n: Total BUSCO groups searched. East Friesian sheep (a): GCA_018804185.1; East Friesian sheep (b): EFS v2.0.

