Genet Sel Evol. 2024 Sep 3;56(1):60.
doi: 10.1186/s12711-024-00927-1.

Population structure and breed identification of Chinese indigenous sheep breeds using whole genome SNPs and InDels

Chang-Heng Zhao et al. Genet Sel Evol.

Abstract

Background: Accurate breed identification is essential for the conservation and sustainable use of indigenous farm animal genetic resources. In this study, we evaluated the phylogenetic relationships and genomic breed compositions of 13 sheep breeds using SNP and InDel data from whole genome sequencing. The breeds included 11 Chinese indigenous and 2 foreign commercial breeds. We compared different strategies for breed identification with respect to different marker types, i.e. SNPs, InDels, and a combination of SNPs and InDels (named SIs), different breed-informative marker detection methods, and different machine learning classification methods.

Results: Using WGS-based SNPs and InDels, we revealed the phylogenetic relationships between 11 Chinese indigenous and two foreign sheep breeds and quantified their purities through estimated genomic breed compositions. The optimal strategy for identifying these breeds combined DFI_union for breed-informative marker detection with KSR for breed assignment. DFI_union merges the breed-informative markers derived from three methods: Delta, Pairwise Wright's FST, and Informativeness for Assignment (together, DFI). KSR intersects the results of three classifiers: K-Nearest Neighbor, Support Vector Machine, and Random Forest. Using SI markers improved the identification accuracy compared to using SNPs or InDels alone. We achieved accuracies over 97.5% when using at least the 1000 most breed-informative (MBI) SI markers and even 100% when using 5000 SI markers.
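
The two integration steps named in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each of the three marker-detection methods has already produced a per-marker score, that "merging" means taking the union of each method's top-ranked markers, and that "intersecting" means assigning a breed only when all three classifiers agree. The function names, marker IDs, scores, and breed labels are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set


def top_markers(scores: Dict[str, float], n: int) -> Set[str]:
    """Return the IDs of the n highest-scoring markers (ties broken by ID)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    ranked = sorted(scores, key=lambda marker: (-scores[marker], marker))
    return set(ranked[:n])


def dfi_union(
    delta: Dict[str, float],
    fst: Dict[str, float],
    informativeness: Dict[str, float],
    n: int,
) -> Set[str]:
    """Merge the top-n markers from three scoring methods (assumed union rule)."""
    return (
        top_markers(delta, n)
        | top_markers(fst, n)
        | top_markers(informativeness, n)
    )


def ksr_consensus(
    knn: Sequence[str], svm: Sequence[str], rf: Sequence[str]
) -> List[Optional[str]]:
    """Keep a predicted breed only where all three classifiers agree.

    Individuals with any disagreement are returned as None (unassigned).
    This agreement rule is an assumption about what "intersecting" means.
    """
    if not (len(knn) == len(svm) == len(rf)):
        raise ValueError("prediction lists must have the same length")
    return [k if k == s == r else None for k, s, r in zip(knn, svm, rf)]


if __name__ == "__main__":
    # Toy scores for three hypothetical markers.
    delta = {"m1": 0.9, "m2": 0.1, "m3": 0.5}
    fst = {"m1": 0.2, "m2": 0.8, "m3": 0.3}
    info = {"m1": 0.7, "m2": 0.1, "m3": 0.6}
    print(sorted(dfi_union(delta, fst, info, n=1)))

    # Toy predictions for three hypothetical individuals.
    knn = ["breed_A", "breed_B", "breed_A"]
    svm = ["breed_A", "breed_B", "breed_B"]
    rf = ["breed_A", "breed_A", "breed_B"]
    print(ksr_consensus(knn, svm, rf))
```

Under these assumptions, a union keeps any marker that at least one method ranks highly, so the merged panel can be larger than n, while the agreement rule trades some unassigned individuals for fewer wrong assignments.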

Conclusions: Our results provide not only an important foundation for conservation of these Chinese local sheep breeds, but also general approaches for breed identification of indigenous farm animal breeds.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Phylogenetic relationships between 13 sheep breeds revealed using 822,488 SNPs. The breed names represented by the codes for different colors are presented in Table 1
Fig. 2
Genomic breed compositions estimated using supervised admixture analysis. The breed names represented by the codes for different colors are presented in Table 1
Fig. 3
Accuracies of genomic breed composition (GBC) estimation using different numbers of most breed-informative SNPs, InDels, and SIs revealed by DFI_union. a Overall correlations between GBCs using different numbers of most breed-informative markers and GBCs using 822,488 SNPs for all of the 347 individuals. b Correlations for the 314 most-likely purebred individuals. c Correlations for the remaining 33 individuals
Fig. 4
Breed identification accuracies using different numbers of most breed-informative SNPs under different scenarios. a–d refer to the machine learning classification methods KNN, RF, SVM and KSR, respectively. KNN: K-Nearest Neighbor; RF: Random Forest; SVM: Support Vector Machine; KSR: an integration of KNN, SVM and RF
Fig. 5
Breed identification accuracies using three different types of most breed-informative markers revealed using DFI_union. a–d refer to the machine learning classification methods KNN, RF, SVM and KSR, respectively. KNN: K-Nearest Neighbor; RF: Random Forest; SVM: Support Vector Machine; KSR: an integration of KNN, SVM and RF
