Front Genet. 2023 Jan 4;13:1060882.
doi: 10.3389/fgene.2022.1060882. eCollection 2022.

Using whole genome sequence to compare variant callers and breed differences of US sheep

Morgan R Stegemiller et al. Front Genet.

Abstract

As whole genome sequence (WGS) data sets have become abundant and widely available, the need for variant detection and scoring has grown accordingly. The aim of this study was to compare the accuracy of commonly used variant calling programs, Freebayes and GATK HaplotypeCaller (GATK-HC), and to use U.S. sheep WGS data sets to identify novel breed-associated SNPs. Sequence data from 145 sheep representing 14 U.S. breeds were filtered and biallelic single nucleotide polymorphisms (SNPs) were retained for genotyping analyses. Genotypes from both programs were compared to each other and to genotypes from bead arrays. To analyze genetic diversity, the SNPs from WGS were compared to the bead array data using breed heterozygosity, principal component analysis (PCA), and the identification of breed-associated SNPs. In Freebayes, the average sequence read depth was 2.78 reads greater and 6.11% more SNPs were identified than in GATK-HC. The genotype concordance of the variant callers to bead array data was 96.0% and 95.5% for Freebayes and GATK-HC, respectively. Genotyping with WGS identified 10.5 million SNPs from all 145 sheep. This resulted in an 8% increase in measured heterozygosity and greater breed separation in the PCA compared to the bead array analysis. There were 1,849 SNPs specific to the Romanov sheep, at which all 10 rams were homozygous for one allele and the remaining 135 sheep from 13 breeds were homozygous for the opposite allele. Both variant calling programs had greater than 95% concordance of SNPs with bead array data, and either was suitably accurate for ovine WGS data sets. The use of WGS SNPs improved the resolution of the PCA and was critical for identifying Romanov breed-associated SNPs. Subsets of such SNPs could be used to estimate germplasm composition in animals without pedigree information.
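Two of the quantities reported in the abstract, genotype concordance against bead array data and SNPs fixed for opposite alleles in one breed, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the 0/1/2 alternate-allele coding, the handling of missing calls, and the toy data are all assumptions made here for illustration.

```python
from typing import Dict, List, Optional

# Assumed coding: count of alternate alleles (0, 1, 2); None marks a missing call.
Genotype = Optional[int]


def genotype_concordance(calls: Dict[str, Genotype],
                         array: Dict[str, Genotype]) -> float:
    """Fraction of SNPs genotyped in both sources whose genotypes agree."""
    shared = [snp for snp in calls
              if snp in array
              and calls[snp] is not None
              and array[snp] is not None]
    if not shared:
        raise ValueError("no SNPs genotyped in both sources")
    matches = sum(1 for snp in shared if calls[snp] == array[snp])
    return matches / len(shared)


def fixed_difference_snps(genotypes: Dict[str, List[Genotype]],
                          breeds: List[str],
                          target: str) -> List[str]:
    """SNPs where every target-breed animal is homozygous for one allele and
    every other animal is homozygous for the opposite allele.

    Assumption: a SNP with any missing call is skipped, because None then
    appears in one of the genotype sets and the comparison fails.
    """
    hits = []
    for snp, row in genotypes.items():
        inside = {g for g, b in zip(row, breeds) if b == target}
        outside = {g for g, b in zip(row, breeds) if b != target}
        if (inside, outside) in (({0}, {2}), ({2}, {0})):
            hits.append(snp)
    return hits


# Toy data, invented for illustration only (not from the study).
breeds = ["Romanov", "Romanov", "Hampshire", "Composite"]
genotypes = {
    "snp1": [2, 2, 0, 0],  # Romanov fixed for alt, others fixed for ref
    "snp2": [2, 1, 0, 0],  # one heterozygous Romanov, so not fixed
    "snp3": [0, 0, 0, 0],  # monomorphic across all animals
    "snp4": [0, 0, 2, 2],  # Romanov fixed for ref, others fixed for alt
}
romanov_specific = fixed_difference_snps(genotypes, breeds, "Romanov")

# One animal's WGS calls versus its bead array genotypes (toy values).
wgs_calls = {"snp1": 2, "snp2": 1, "snp3": 0, "snp4": None}
array_calls = {"snp1": 2, "snp2": 0, "snp3": 0, "snp4": 0}
concordance = genotype_concordance(wgs_calls, array_calls)

print(romanov_specific)
print(round(concordance, 3))
```

In this sketch, concordance is computed only over SNPs called in both sources, which is one common convention; whether the study excluded missing calls in the same way is not stated in the abstract.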

Keywords: GATK HaplotypeCaller (HC); freebayes; sheep; variant callers; whole genome sequence.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Density distributions of the mean sequence read depth for SNPs identified by Freebayes and GATK-HC for the Hampshire breed. (A) The sequence read depth density curves for both variant callers. (B) The sequence read depth density curves overlapped with the medians centered.
FIGURE 2
Number of SNPs identified by Freebayes and GATK HaplotypeCaller for each breed cohort.
FIGURE 3
Total non-concordant SNPs that were common and unique between the two variant callers compared to the bead array.
FIGURE 4
Principal component analysis of variant data for the sheep breeds. (A) Bead array variant data. (B) The consensus WGS variant data from Freebayes. Note: MARC III Composite breed is labeled Composite.
