Personalized pangenome references
- PMID: 39261641
- PMCID: PMC12643174
- DOI: 10.1038/s41592-024-02407-2
Abstract
Pangenomes reduce reference bias by representing genetic diversity better than a single reference sequence. Yet when comparing a sample to a pangenome, variants in the pangenome that are not part of the sample can be misleading, for example, by causing false read mappings. These irrelevant variants generally have lower allele frequencies, and have previously been dealt with by filtering out rare variants. However, this blunt heuristic both fails to remove some irrelevant variants and removes many relevant variants. We propose a new approach that imputes a personalized pangenome subgraph by sampling local haplotypes according to k-mer counts in the reads. We implement the approach in the vg toolkit (https://github.com/vgteam/vg) for the Giraffe short-read aligner and compare its accuracy to state-of-the-art methods using human pangenome graphs from the Human Pangenome Reference Consortium. The approach reduces small-variant genotyping errors fourfold relative to the Genome Analysis Toolkit and makes short-read structural variant genotyping of known variants competitive with long-read variant discovery methods.
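The idea of sampling local haplotypes according to k-mer counts in the reads can be illustrated with a toy sketch. This is not the vg implementation: the function names (`kmers`, `count_read_kmers`, `sample_haplotypes`), the +1/-1 scoring rule, and the example sequences are all assumptions made for illustration. The sketch scores each candidate haplotype in one local block by its haplotype-informative k-mers (those not shared by every candidate) and keeps the best-scoring ones.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_read_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def sample_haplotypes(haplotypes, read_counts, k, n):
    """Pick the n best-supported haplotypes in one local block.

    Hypothetical scoring rule (an assumption, not the published one):
    +1 for each informative k-mer seen in the reads, -1 for each one absent.
    """
    sets = {name: kmers(seq, k) for name, seq in haplotypes.items()}
    # k-mers present in every candidate cannot distinguish between them.
    shared = set.intersection(*sets.values())

    def score(name):
        informative = sets[name] - shared
        return sum(1 if read_counts[km] > 0 else -1 for km in informative)

    # Highest score first; ties broken by name for determinism.
    ranked = sorted(haplotypes, key=lambda name: (-score(name), name))
    return ranked[:n]


# Made-up candidates differing at a few bases; reads drawn from hapB.
haplotypes = {
    "hapA": "ACGTACGTTAGC",
    "hapB": "ACGTACCTTAGC",
    "hapC": "ACGTAGGTTAGC",
}
reads = ["ACGTACCT", "ACCTTAGC"]
read_counts = count_read_kmers(reads, k=4)
chosen = sample_haplotypes(haplotypes, read_counts, k=4, n=1)
print(chosen)  # ['hapB']
```

In this toy setting the sample's own haplotype is retained, while candidates carrying variants unsupported by the reads are dropped. A frequency filter, by contrast, would consider only how common each variant is in the population.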
© 2024. The Author(s), under exclusive licence to Springer Nature America, Inc.
Competing interests
P.-C.C. and A.C. are employees of Google LLC and own Alphabet stock as part of the standard compensation package. The other authors declare no competing interests.
Update of
- Personalized Pangenome References. bioRxiv [Preprint]. 2023 Dec 15:2023.12.13.571553. doi: 10.1101/2023.12.13.571553. Update in: Nat Methods. 2024 Nov;21(11):2017-2023. doi: 10.1038/s41592-024-02407-2. PMID: 38168361.
Grants and funding
- U01 HG010961/HG/NHGRI NIH HHS/United States
- U01 HG013748/HG/NHGRI NIH HHS/United States
- U24 HG010262/HG/NHGRI NIH HHS/United States
- U24 HG011853/HG/NHGRI NIH HHS/United States
- R01 HG010485/HG/NHGRI NIH HHS/United States
- OT3 HL142481/HL/NHLBI NIH HHS/United States
- OT2 OD033761/OD/NIH HHS/United States
