Automated tetraploid genotype calling by hierarchical clustering
- PMID: 28070610
- DOI: 10.1007/s00122-016-2845-5
Abstract
New software to make tetraploid genotype calls from SNP array data was developed, which uses hierarchical clustering and multiple F1 populations to calibrate the relationship between signal intensity and allele dosage. SNP arrays are transforming breeding and genetics research for autotetraploids. To fully utilize these arrays, the relationship between signal intensity and allele dosage must be calibrated for each marker. We developed an improved computational method to automate this process, which is provided as the R package ClusterCall. In the training phase of the algorithm, hierarchical clustering within an F1 population is used to group samples with similar intensity values, and allele dosages are assigned to clusters based on expected segregation ratios. In the prediction phase, multiple F1 populations and the prediction set are clustered together, and the genotype for each cluster is the mode of the training set samples. A concordance metric, defined as the proportion of training set samples equal to the mode, can be used to eliminate unreliable markers and compare different algorithms. Across three potato families genotyped with an 8K SNP array, ClusterCall scored 5729 markers with at least 0.95 concordance (94.6% of its total), compared to 5325 with the software fitTetra (82.5% of its total). The three families were used to predict genotypes for 5218 SNPs in the SolCAP diversity panel, compared with 3521 SNPs in a previous study in which genotypes were called manually. One of the additional markers produced a significant association for vine maturity near a well-known causal locus on chromosome 5. In conclusion, when multiple F1 populations are available, ClusterCall is an efficient method for accurate, autotetraploid genotype calling that enables the use of SNP data for research and plant breeding.
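The prediction-phase rule and the concordance metric described above can be sketched for a single marker as follows. This is a minimal illustration, not the ClusterCall implementation (which is an R package). It assumes the cluster labels have already been produced by some hierarchical clustering step, that the concordance is pooled over all clusters of the marker, and that ties for the mode are broken by first occurrence. The function name `call_clusters` and its arguments are hypothetical.

```python
from collections import Counter, defaultdict


def call_clusters(cluster_ids, training_dosages):
    """Assign a dosage to every sample of one marker from its cluster.

    cluster_ids      -- cluster label per sample (training and prediction
                        samples clustered together); assumed precomputed
    training_dosages -- allele dosage (0..4) for training samples,
                        None for prediction-set samples

    Returns (calls, concordance), where concordance is the proportion of
    training samples whose dosage equals the mode of their cluster.
    """
    # Collect the training dosages that fall into each cluster.
    by_cluster = defaultdict(list)
    for cid, dose in zip(cluster_ids, training_dosages):
        if dose is not None:
            by_cluster[cid].append(dose)

    # Genotype of a cluster = mode of its training samples.
    # Assumption: ties go to the dosage seen first (Counter ordering).
    modes = {
        cid: Counter(doses).most_common(1)[0][0]
        for cid, doses in by_cluster.items()
    }

    # Clusters containing no training sample get no call (None).
    calls = [modes.get(cid) for cid in cluster_ids]

    n_train = sum(len(doses) for doses in by_cluster.values())
    n_match = sum(doses.count(modes[cid]) for cid, doses in by_cluster.items())
    concordance = n_match / n_train if n_train else float("nan")
    return calls, concordance


# Toy marker: six training samples, two prediction-set samples (None).
cluster_ids = ["a", "a", "a", "b", "b", "b", "b", "c"]
training = [0, 0, 1, 2, 2, 2, None, None]
calls, concordance = call_clusters(cluster_ids, training)
print(calls)
print(round(concordance, 3))
```

In this toy marker, one training sample in cluster `a` disagrees with the cluster mode, so five of the six training samples match and the marker would fail a 0.95 concordance threshold.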