Am J Hum Genet. 2023 Jan 5;110(1):161-165.
doi: 10.1016/j.ajhg.2022.11.008. Epub 2022 Nov 29.

Statistical phasing of 150,119 sequenced genomes in the UK Biobank

Brian L Browning et al. Am J Hum Genet. 2023.

Abstract

The first release of UK Biobank whole-genome sequence data contains 150,119 genomes. We present an open-source pipeline for filtering, phasing, and indexing these genomes on the cloud-based UK Biobank Research Analysis Platform. This pipeline makes it possible to apply haplotype-based methods to UK Biobank whole-genome sequence data. The pipeline uses BCFtools for marker filtering, Beagle for genotype phasing, and Tabix for VCF indexing. We used the pipeline to phase 406 million single-nucleotide variants on chromosomes 1-22 and X at a cost of £2,309. The maximum time required to process a chromosome was 2.6 days. In order to assess phase accuracy, we modified the pipeline to exclude trio parents. We observed a switch error rate of 0.0016 on chromosome 20 in the White British trio offspring. If we exclude markers with nonmajor allele frequency < 0.1% after phasing, this switch error rate decreases by 80% to 0.00032.

Keywords: UK Biobank; genotype phasing; haplotype; haplotype phasing.
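The three stages named in the abstract (BCFtools filtering, Beagle phasing, Tabix indexing) can be sketched as a per-chromosome list of shell commands. This is only an illustrative sketch, not the authors' pipeline: the abstract does not state which filters were applied, so the `PASS`-only filter here is a placeholder assumption, and the file names, the `beagle.jar` path, and the thread count are hypothetical. The function only builds the commands; it does not run them.

```python
import shlex
from typing import List


def pipeline_commands(chrom: str, in_vcf: str,
                      beagle_jar: str = "beagle.jar",  # hypothetical path
                      threads: int = 8) -> List[List[str]]:
    """Build filter -> phase -> index commands for one chromosome.

    Assumptions (not from the source): the PASS-only marker filter and
    the output file naming scheme.
    """
    filtered = f"chr{chrom}.filtered.vcf.gz"
    phased_prefix = f"chr{chrom}.phased"
    return [
        # 1. Marker filtering: keep records whose FILTER is PASS (placeholder filter).
        ["bcftools", "view", "-f", "PASS", "-O", "z", "-o", filtered, in_vcf],
        # 2. Genotype phasing: Beagle writes <out>.vcf.gz.
        ["java", "-jar", beagle_jar, f"gt={filtered}",
         f"out={phased_prefix}", f"nthreads={threads}"],
        # 3. VCF indexing of the phased output.
        ["tabix", "-p", "vcf", f"{phased_prefix}.vcf.gz"],
    ]


if __name__ == "__main__":
    for cmd in pipeline_commands("20", "chr20.vcf.gz"):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists rather than one shell string makes each stage easy to inspect and to pass to a job runner without quoting problems.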

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1
Single and paired switch errors. Each column of squares represents a true or estimated haplotype at eight heterozygous genotypes. Tan and blue squares represent alleles inherited from the mother and father, respectively. The left pair of columns shows the true haplotypes. A switch error is a heterozygote that is phased incorrectly with respect to the preceding heterozygote. A single switch error is a switch error that is not immediately preceded or followed by another switch error. A paired switch error is immediately preceded or followed by another (paired) switch error. The two haplotypes in the middle pair of columns have one single switch error, since the fourth heterozygote is incorrectly phased with respect to the preceding heterozygote. The two haplotypes in the right pair of columns have two consecutive paired switch errors, since both the fourth and fifth heterozygotes are incorrectly phased with respect to the preceding heterozygote. This figure is based on a figure in Browning and Browning.
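The definitions in the caption can be checked with a small sketch. Each haplotype is encoded as a string over `M` (maternal allele) and `P` (paternal allele), one character per heterozygous site; this encoding and the function names are illustrative assumptions, not part of the paper. The rate function divides by the number of heterozygotes that have a preceding heterozygote, which is a common convention; the excerpt does not state the exact denominator the authors used.

```python
from typing import List, Tuple


def switch_errors(true_hap: str, est_hap: str) -> List[int]:
    """0-based indices of heterozygotes phased incorrectly relative to
    the preceding heterozygote."""
    if len(true_hap) != len(est_hap):
        raise ValueError("haplotypes must cover the same heterozygous sites")
    # True where the estimated haplotype carries the other parent's allele.
    mismatch = [t != e for t, e in zip(true_hap, est_hap)]
    # A switch occurs wherever the match/mismatch state flips between neighbors.
    return [i for i in range(1, len(mismatch)) if mismatch[i] != mismatch[i - 1]]


def classify_switches(switches: List[int]) -> Tuple[List[int], List[int]]:
    """Split switch errors into (single, paired) per the caption's definitions."""
    s = set(switches)
    single = [i for i in switches if (i - 1) not in s and (i + 1) not in s]
    paired = [i for i in switches if (i - 1) in s or (i + 1) in s]
    return single, paired


def switch_error_rate(true_hap: str, est_hap: str) -> float:
    """Switch errors per heterozygote that has a preceding heterozygote
    (assumed denominator)."""
    n = len(true_hap)
    return len(switch_errors(true_hap, est_hap)) / (n - 1) if n > 1 else 0.0


if __name__ == "__main__":
    truth = "MMMMMMMM"   # left pair of columns: true haplotype
    middle = "MMMPPPPP"  # fourth heterozygote phased incorrectly
    right = "MMMPMMMM"   # fourth and fifth heterozygotes phased incorrectly
    print(classify_switches(switch_errors(truth, middle)))
    print(classify_switches(switch_errors(truth, right)))
```

Comparing against either of the two true haplotypes gives the same switch positions, because swapping the reference haplotype flips every match to a mismatch and leaves the flip points unchanged.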
