G3 (Bethesda). 2019 Mar 7;9(3):663-673.
doi: 10.1534/g3.118.200913.

polyRAD: Genotype Calling with Uncertainty from Sequencing Data in Polyploids and Diploids

Lindsay V Clark et al. G3 (Bethesda).

Abstract

Low or uneven read depth is a common limitation of genotyping-by-sequencing (GBS) and restriction site-associated DNA sequencing (RAD-seq), resulting in high missing data rates, heterozygotes miscalled as homozygotes, and uncertainty of allele copy number in heterozygous polyploids. Bayesian genotype calling can mitigate these issues, but previously has only been implemented in software that requires a reference genome or uses priors that may be inappropriate for the population. Here we present several novel Bayesian algorithms that estimate genotype posterior probabilities, all of which are implemented in a new R package, polyRAD. Appropriate priors can be specified for mapping populations, populations in Hardy-Weinberg equilibrium, or structured populations, and in each case can be informed by genotypes at linked markers. The polyRAD software imports read depth from several existing pipelines, and outputs continuous or discrete numerical genotypes suitable for analyses such as genome-wide association and genomic prediction.
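The general idea of Bayesian genotype calling under a Hardy-Weinberg prior can be illustrated with a minimal sketch. This is not polyRAD's implementation (polyRAD is an R package, and its models also account for contamination, population structure, and linkage). The function names, the binomial read-sampling likelihood, and the `error` parameter below are assumptions made for illustration only.

```python
from math import comb


def genotype_posteriors(ref_reads, alt_reads, alt_freq, ploidy=2, error=0.001):
    """Posterior probability of each alternative-allele copy number 0..ploidy.

    Illustrative assumptions (not taken from polyRAD):
      * prior: allele copies follow Binomial(ploidy, alt_freq), i.e. HWE
      * likelihood: reads are sampled binomially with a symmetric error rate
    """
    depth = ref_reads + alt_reads
    unnormalized = []
    for copies in range(ploidy + 1):
        # HWE prior for this copy number
        prior = (comb(ploidy, copies)
                 * alt_freq ** copies
                 * (1 - alt_freq) ** (ploidy - copies))
        # Probability that a single read shows the alternative allele
        dosage = copies / ploidy
        p_alt = dosage * (1 - error) + (1 - dosage) * error
        # Binomial likelihood of the observed read counts
        likelihood = (comb(depth, alt_reads)
                      * p_alt ** alt_reads
                      * (1 - p_alt) ** ref_reads)
        unnormalized.append(prior * likelihood)
    total = sum(unnormalized)
    return [value / total for value in unnormalized]


def posterior_mean(posteriors):
    """Continuous genotype: the posterior-weighted mean allele copy number."""
    return sum(copies * p for copies, p in enumerate(posteriors))


# Two reference reads and no alternative reads in a diploid: a naive call would
# be homozygous, but the heterozygote keeps substantial posterior probability.
low_depth = genotype_posteriors(ref_reads=2, alt_reads=0, alt_freq=0.5)

# With zero reads the likelihood is flat, so the posterior equals the prior.
no_reads = genotype_posteriors(ref_reads=0, alt_reads=0, alt_freq=0.5)
```

In this sketch a genotype with no reads still receives a probability distribution (the prior), which is how a Bayesian caller can supply an estimate where a threshold-based caller would report missing data.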

Keywords: Bayesian genotype calling; genotype imputation; next-generation DNA sequencing; polyploidy; single nucleotide polymorphism.

Figures

Figure 1
Overview of polyRAD algorithms for genotype estimation. Genotype posterior probabilities are estimated iteratively until allele frequencies converge, except in the case of mapping populations, where allele frequencies are only estimated once. Purple boxes indicate inputs to the pipeline (read depth, contamination rate, and optionally, genomic positions of loci). Blue boxes indicate estimated parameters (allele frequencies, genotype likelihoods and prior and posterior probabilities, linkage between alleles, and probability of sampling each allele). Green boxes indicate alternative methodologies for genotype prior probability estimation (mapping, HWE, and population structure). Priors for the HWE and population structure models can be adjusted for self-fertilization according to De Silva et al. (2005). Orange boxes indicate sample × allele matrices indicating approximate allele copy number. Dashed arrows indicate steps that happen only once at the beginning or end of the pipeline, whereas solid arrows indicate iterative steps. Circular arrows highlight cycles of iteration. Equations 1-4 are provided in the main manuscript, and Equations 5-19 are provided in Supplemental Materials.
Figure 2
Genotyping error of EBG, fitPoly, updog, polyRAD, LinkImpute, and rrBLUP in a diversity panel of 565 diploid Miscanthus sinensis. The benefits of incorporating population structure into the genotyping model and using continuous rather than discrete genotypes are illustrated. Genotypes were coded on a scale of 0 to 2. Root mean squared error (RMSE) was calculated between actual genotypes and genotypes ascertained from simulated RAD-seq reads at 395 SNP markers (lower RMSE = higher accuracy). Each point represents one SNP. Median read depth is indicated by color, including genotypes with zero reads. The RMSE for continuous genotypes output by the polyRAD PopStruct LD method is shown on the x-axis, and the RMSE of other methods and types of genotypes (continuous or discrete) is shown on the y-axis. The dashed line indicates the ordinary least-squares regression with slope and intercept estimates, with standard errors. The “norm” model was used with updog. (A) RMSE calculated using only genotypes with more than zero reads. (B) RMSE calculated using only genotypes with zero reads, by genotyping or imputation method and genotype type.
Figure 3
Genotyping error of EBG, fitPoly, updog, polyRAD, and rrBLUP in a simulated tetraploid diversity panel derived from genotypes of 565 diploid Miscanthus sinensis. The benefits of incorporating population structure into the genotyping model and using continuous rather than discrete genotypes are illustrated. Genotypes were coded on a scale of 0 to 4. Root mean squared error (RMSE) was calculated between actual genotypes and genotypes ascertained from simulated RAD-seq reads at 395 SNP markers (lower RMSE = higher accuracy). Each point represents one SNP. Median read depth is indicated by color, including genotypes with zero reads. The RMSE for continuous genotypes output by the polyRAD PopStruct LD method is shown on the x-axis, and the RMSE of other methods and types of genotypes (continuous or discrete) is shown on the y-axis. The dashed line indicates the ordinary least-squares regression with slope and intercept estimates, with standard errors. The “norm” model was used with updog. (A) RMSE calculated using only genotypes with more than zero reads. (B) RMSE calculated using only genotypes with zero reads, by genotyping or imputation method and genotype type. LinkImpute was not included given that it works for diploids only.
Figure 4
Genotyping error of EBG, fitPoly, updog, polyRAD, LinkImpute, and rrBLUP in an F1 mapping population of 83 diploid Miscanthus sinensis. The benefits of incorporating linkage into the genotyping model and using continuous rather than discrete genotypes are illustrated. Genotypes were coded on a scale of 0 to 2. Root mean squared error (RMSE) was calculated between actual genotypes and genotypes ascertained from simulated RAD-seq reads at 241 SNP markers (lower RMSE = higher accuracy). Each point represents one SNP. Median read depth is indicated by color, including genotypes with zero reads. The RMSE for continuous genotypes output by the polyRAD mapping method with linkage is shown on the x-axis, and the RMSE of other methods and types of genotypes (continuous or discrete) is shown on the y-axis. The dashed line indicates the ordinary least-squares regression with slope and intercept estimates, with standard errors. The “f1” model was used with updog. (A) RMSE calculated using only genotypes with more than zero reads. (B) RMSE calculated using only genotypes with zero reads, by genotyping or imputation method and genotype type.
Figure 5
Genotyping error of EBG, updog, polyRAD, and rrBLUP in an F1 mapping population of tetraploid potato with 238 progeny. The benefits of incorporating linkage into the genotyping model and using continuous rather than discrete genotypes are illustrated. Genotypes were coded on a scale of 0 to 4. Root mean squared error (RMSE) was calculated between actual genotypes and genotypes ascertained from simulated RAD-seq reads at 2538 SNP markers (lower RMSE = higher accuracy). Each point represents one SNP. Median read depth is indicated by color, including genotypes with zero reads. The RMSE for continuous genotypes output by the polyRAD mapping method with linkage is shown on the x-axis, and the RMSE of other methods and types of genotypes (continuous or discrete) is shown on the y-axis. The dashed line indicates the ordinary least-squares regression with slope and intercept estimates, with standard errors. The “f1” model was used with updog. fitPoly results are omitted since it failed for all markers, and LinkImpute was not run since LinkImpute is for diploids only. (A) RMSE calculated using only genotypes with more than zero reads. (B) RMSE calculated using only genotypes with zero reads, by genotyping or imputation method and genotype type.
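
The error metric used in Figures 2-5 can be sketched as follows. The `rmse` helper and the genotype values are made up for illustration and are not data from the study. They only show how RMSE is computed for continuous genotypes and for the discrete genotypes obtained by rounding them.

```python
from math import sqrt


def rmse(actual, estimated):
    """Root mean squared error between actual and estimated genotypes."""
    if not actual or len(actual) != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - e) ** 2 for a, e in zip(actual, estimated)]
    return sqrt(sum(squared_errors) / len(actual))


# Hypothetical diploid genotypes at one SNP, coded on a scale of 0 to 2
actual = [0, 1, 2, 1]
continuous = [0.1, 0.4, 1.9, 1.2]            # e.g. posterior mean genotypes
discrete = [round(g) for g in continuous]    # most likely whole-number call

rmse_continuous = rmse(actual, continuous)
rmse_discrete = rmse(actual, discrete)
```

In this toy example the second individual is rounded to the wrong discrete genotype, so the discrete RMSE exceeds the continuous one. Whether that holds for real data depends on the method and read depth, which is what the figures compare.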

