Genotype calling and phasing using next-generation sequencing reads and a haplotype scaffold
- PMID: 23093610
- DOI: 10.1093/bioinformatics/bts632
Abstract
Motivation: Given the current costs of next-generation sequencing, large studies carry out low-coverage sequencing followed by application of methods that leverage linkage disequilibrium to infer genotypes. We propose a novel method that assumes study samples are sequenced at low coverage and genotyped on a genome-wide microarray, as in the 1000 Genomes Project (1KGP). We assume polymorphic sites have been detected from the sequencing data and that genotype likelihoods are available at these sites. We also assume that the microarray genotypes have been phased to construct a haplotype scaffold. We then phase each polymorphic site using an MCMC algorithm that iteratively updates the unobserved alleles based on the genotype likelihoods at that site and local haplotype information. We use a multivariate normal model to capture both allele frequency and linkage disequilibrium information around each site. When sequencing data are available from trios, Mendelian transmission constraints are easily accommodated into the updates. The method is highly parallelizable, as it analyses one position at a time.
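The per-site update described above can be illustrated with a minimal Python sketch. It is not MVNcall's implementation and makes strong simplifying assumptions: a single scaffold site stands in for the local haplotype window, a bivariate normal replaces the full multivariate normal, the moments are estimated once from the current alleles rather than re-estimated during sampling, and a small ridge term (a hypothetical choice) keeps the variances positive. All function names are illustrative.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def estimate_moments(scaffold_alleles, target_alleles, ridge=0.01):
    """Means, variances and covariance of 0/1 alleles across haplotypes.
    The ridge is an assumption of this sketch, added to both variances."""
    n = len(scaffold_alleles)
    mu_s = sum(scaffold_alleles) / n
    mu_t = sum(target_alleles) / n
    var_s = sum((s - mu_s) ** 2 for s in scaffold_alleles) / n + ridge
    var_t = sum((t - mu_t) ** 2 for t in target_alleles) / n + ridge
    cov = sum((s - mu_s) * (t - mu_t)
              for s, t in zip(scaffold_alleles, target_alleles)) / n
    return mu_s, mu_t, var_s, var_t, cov

def alt_allele_prob(scaffold_allele, moments):
    """P(target allele = 1 | scaffold allele): the conditional normal
    density evaluated at 1, normalised against its value at 0."""
    mu_s, mu_t, var_s, var_t, cov = moments
    cond_mean = mu_t + cov / var_s * (scaffold_allele - mu_s)
    cond_var = var_t - cov ** 2 / var_s
    # Log density ratio N(1; m, v) / N(0; m, v) simplifies to (2m - 1) / (2v).
    return sigmoid((2.0 * cond_mean - 1.0) / (2.0 * cond_var))

def phased_posterior(scaffold_pair, genotype_likelihoods, moments):
    """Posterior over phased alleles (a1, a2) for one individual.
    genotype_likelihoods[g] is the likelihood of the reads given
    g = a1 + a2 alternate alleles."""
    p = [alt_allele_prob(s, moments) for s in scaffold_pair]
    weights = {}
    for a1 in (0, 1):
        for a2 in (0, 1):
            prior = (p[0] if a1 else 1.0 - p[0]) * (p[1] if a2 else 1.0 - p[1])
            weights[(a1, a2)] = prior * genotype_likelihoods[a1 + a2]
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def sample_phase(posterior, rng=random):
    """Draw one phased allele pair, as a single Gibbs-style update would."""
    keys = list(posterior)
    return rng.choices(keys, weights=[posterior[k] for k in keys], k=1)[0]

# Toy data: the target allele tracks the scaffold allele perfectly.
moments = estimate_moments([0, 0, 1, 1], [0, 0, 1, 1])
post = phased_posterior((1, 0), [0.2, 0.6, 0.2], moments)
draw = sample_phase(post, random.Random(0))
```

In this toy case the scaffold resolves the phase of a heterozygous call that the genotype likelihoods alone cannot orient. A fuller sampler would loop over individuals, refresh the moments from the updated alleles, and repeat for many iterations. Because each site is handled separately, sites can be processed in parallel.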
Results: We illustrate the performance of the method compared with other methods using data from Phase 1 of the 1KGP in terms of genotype accuracy, phasing accuracy and downstream imputation performance. We show that the haplotype panel we infer in African samples, which was based on a trio-phased scaffold, increases downstream imputation accuracy for rare variants (R² increases by >0.05 for minor allele frequency <1%), and this will translate into a boost in power to detect associations. These results highlight the value of incorporating microarray genotypes when calling variants from next-generation sequence data.
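For readers unfamiliar with the accuracy metric, the following Python sketch assumes it denotes the squared Pearson correlation between imputed allele dosages and true genotypes at a site, which is the usual convention in imputation evaluations. The abstract itself does not spell out the definition, and the function name is hypothetical.

```python
def dosage_r2(imputed_dosages, true_genotypes):
    """Squared Pearson correlation between imputed dosages (0..2)
    and true genotypes (0, 1 or 2) at one site."""
    n = len(true_genotypes)
    mean_x = sum(imputed_dosages) / n
    mean_y = sum(true_genotypes) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(imputed_dosages, true_genotypes))
    sxx = sum((x - mean_x) ** 2 for x in imputed_dosages)
    syy = sum((y - mean_y) ** 2 for y in true_genotypes)
    if sxx == 0 or syy == 0:
        # Undefined for a monomorphic site; this sketch returns 0.0.
        return 0.0
    return sxy * sxy / (sxx * syy)

perfect = dosage_r2([0, 1, 2], [0, 1, 2])
uninformative = dosage_r2([1, 1, 1], [0, 1, 2])
```

Under this definition, a gain of 0.05 for variants with minor allele frequency below 1% is an absolute increase in the squared correlation, not a relative one.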
Availability: The method (called MVNcall) is implemented in a C++ program and is available from http://www.stats.ox.ac.uk/~marchini/#software.