2024 Dec 27:2024:baae130. doi: 10.1093/database/baae130.

GeniePool 2.0: advancing variant analysis through CHM13-T2T, AlphaMissense, gnomAD V4 integration, and variant co-occurrence queries

Grisha Weintraub et al. Database (Oxford).

Abstract

Originally developed to meet the challenges of the genomic data deluge, GeniePool emerged as a pioneering platform whose data lake architecture enables efficient storage, access, and analysis of vast genomic datasets. Building on this foundation, GeniePool 2.0 advances genomic analysis by integrating state-of-the-art resources, namely the CHM13-T2T reference genome, AlphaMissense, and gnomAD V4, together with support for variant co-occurrence queries. These additions broaden the granularity and scope of genomic analyses, from improving our understanding of variant pathogenicity and phenotypic associations to facilitating research collaborations. CHM13-T2T provides a more accurate reference for human genetic variation, AlphaMissense enriches the platform with protein-level impact predictions for missense variants, and gnomAD V4 offers a comprehensive view of human genetic diversity. In addition, the new variant co-occurrence feature enables exploration of the combined effects of genetic variants, advancing our understanding of compound heterozygosity, epistasis, and polygenic risk factors in disease pathogenesis. GeniePool 2.0 is a comprehensive and scalable platform that aims to enhance genomic data analysis and contribute to genomic research, potentially supporting new discoveries and clinical innovations. Database URL: https://GeniePool.link.
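The abstract does not describe how co-occurrence queries are executed. As a rough illustration only, such a query can be thought of as intersecting the sets of samples in which each of two variants was called. The sketch below assumes a hypothetical in-memory index keyed by chromosome, position, reference allele, and alternate allele; the names `Variant`, `build_index`, and `co_occurrence` are illustrative and are not part of GeniePool, which runs its queries over a cloud data lake.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical variant key: reference coordinates plus alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def build_index(calls: Iterable[Tuple[str, Variant]]) -> Dict[Variant, Set[str]]:
    """Map each variant to the set of sample IDs in which it was called."""
    index: Dict[Variant, Set[str]] = defaultdict(set)
    for sample_id, variant in calls:
        index[variant].add(sample_id)
    return index


def co_occurrence(index: Dict[Variant, Set[str]], v1: Variant, v2: Variant) -> Set[str]:
    """Return the samples that carry both v1 and v2 (set intersection).

    dict.get is used so that looking up an unseen variant does not
    insert an empty entry into the defaultdict.
    """
    return index.get(v1, set()) & index.get(v2, set())


if __name__ == "__main__":
    v1 = Variant("chr1", 1000, "A", "G")
    v2 = Variant("chr2", 2000, "C", "T")
    calls = [("sample_1", v1), ("sample_1", v2), ("sample_2", v1)]
    index = build_index(calls)
    print(co_occurrence(index, v1, v2))  # {'sample_1'}
```

Finding two variants in the same sample does not by itself show that they lie on different haplotypes, as compound heterozygosity requires; establishing that needs phasing information.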

Conflict of interest statement

None declared.

Figures

Figure 1.
GeniePool 2.0 user interface. Genomic coordinates for either a single variant or the co-occurrence of two variants are searched within next-generation sequencing (NGS) samples from the Sequence Read Archive (SRA). The resulting variants can be filtered by sample attributes, and selecting a variant displays information about that variant and the study it comes from.
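As a minimal sketch of the attribute-filtering step described in the caption, assume each search result is a flat mapping from attribute name to value. The attribute names used here (`sample`, `tissue`, `sex`) and the function `filter_by_attributes` are hypothetical; the actual GeniePool schema and interface may differ.

```python
from typing import Dict, List

# Hypothetical result row: one sample carrying the queried variant(s),
# plus free-text metadata attributes describing that sample.
Row = Dict[str, str]


def filter_by_attributes(rows: List[Row], **criteria: str) -> List[Row]:
    """Keep rows whose attributes match every criterion.

    Matching is exact but case-insensitive; a row lacking an attribute
    is treated as having an empty value for it.
    """
    def matches(row: Row) -> bool:
        return all(
            row.get(key, "").lower() == value.lower()
            for key, value in criteria.items()
        )

    return [row for row in rows if matches(row)]


if __name__ == "__main__":
    rows = [
        {"sample": "sample_1", "tissue": "blood", "sex": "female"},
        {"sample": "sample_2", "tissue": "Blood", "sex": "male"},
        {"sample": "sample_3", "tissue": "liver", "sex": "female"},
    ]
    hits = filter_by_attributes(rows, tissue="blood", sex="female")
    print([r["sample"] for r in hits])  # ['sample_1']
```

Case-insensitive matching is a choice made here because sample metadata is often entered as free text; a real system might instead normalize attribute values when they are ingested.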
