GeniePool 2.0: advancing variant analysis through CHM13-T2T, AlphaMissense, gnomAD V4 integration, and variant co-occurrence queries
- PMID: 39729312
- PMCID: PMC11673193
- DOI: 10.1093/database/baae130
Abstract
Originally developed to meet the challenges of the genomic data deluge, GeniePool emerged as a pioneering platform whose data lake architecture enables efficient storage, accessibility, and analysis of vast genomic datasets. Building on this foundation, GeniePool 2.0 advances genomic analysis by integrating cutting-edge variant resources, namely CHM13-T2T, AlphaMissense, and gnomAD V4, and by adding the capability for variant co-occurrence queries. This evolution offers an unprecedented level of granularity and scope in genomic analyses, from enhancing our understanding of variant pathogenicity and phenotypic associations to facilitating research collaborations. The introduction of CHM13-T2T provides a more accurate reference for human genetic variation, AlphaMissense enriches the platform with protein-level impact predictions for missense mutations, and gnomAD V4 offers a comprehensive view of human genetic diversity. Additionally, the new variant co-occurrence feature is pivotal for exploring the combined effects of genetic variations, advancing our comprehension of compound heterozygosity, epistasis, and polygenic risk factors in disease pathogenesis. GeniePool 2.0 is a comprehensive and scalable platform that aims to enhance genomic data analysis and contribute to genomic research, potentially supporting new discoveries and clinical innovations. Database URL: https://GeniePool.link.
© The Author(s) 2024. Published by Oxford University Press.
Conflict of interest statement
None declared.