Review. 2021 Jan 18;22(1):55-65.
doi: 10.1093/bib/bbaa033.

GenoPheno: cataloging large-scale phenotypic and next-generation sequencing data within human datasets

Alba Gutiérrez-Sacristán et al. Brief Bioinform.

Abstract

Precision medicine promises to revolutionize treatment, shifting therapeutic approaches from the classical one-size-fits-all model to those tailored to the patient's individual genomic profile, lifestyle and environmental exposures. Yet, to advance precision medicine's main objective, ensuring the optimum diagnosis, treatment and prognosis for each individual, investigators need access to large-scale clinical and genomic data repositories. Despite the vast proliferation of these datasets, locating and obtaining access to many of them remains a challenge. We sought to provide an overview of available patient-level datasets that contain both genotypic data, obtained by next-generation sequencing, and phenotypic data, and to create a dynamic, online catalog for consultation, contribution and revision by the research community. Datasets included in this review meet six inclusion criteria: (i) they contain data from more than 500 human subjects; (ii) they contain both genotypic and phenotypic data from the same subjects; (iii) they include whole genome sequencing or whole exome sequencing data; (iv) they include at least 100 recorded phenotypic variables per subject; (v) they are accessible through a website or through collaboration with investigators; and (vi) their access information is available in English. Using these criteria, we identified 30 datasets, reviewed them and provided the results in the release version of a catalog, which is publicly available through a dynamic Web application and on GitHub. Users can review the catalog as well as contribute new datasets for inclusion (Web: https://avillachlab.shinyapps.io/genophenocatalog/; GitHub: https://github.com/hms-dbmi/GenoPheno-CatalogShiny).
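The six inclusion criteria above could be expressed as a simple screening function. The following is only an illustrative sketch, not the authors' curation code: the `DatasetEntry` class, its field names and the `failed_criteria` helper are hypothetical and do not reflect the actual columns of the catalog's CSV file. Only the thresholds (more than 500 subjects, at least 100 phenotypic variables) are taken from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """Hypothetical description of one candidate dataset (field names are assumptions)."""
    name: str
    n_subjects: int
    paired_geno_pheno: bool       # genotypic and phenotypic data from the same subjects
    sequencing: frozenset         # e.g. frozenset({"WGS"}) or frozenset({"WES"})
    n_phenotypic_variables: int   # recorded phenotypic variables per subject
    access_routes: frozenset      # e.g. frozenset({"website", "collaboration"})
    access_info_language: str     # language of the access information


def failed_criteria(entry: DatasetEntry) -> list:
    """Return the labels (i)-(vi) of the inclusion criteria the entry does not meet."""
    checks = {
        "i": entry.n_subjects > 500,
        "ii": entry.paired_geno_pheno,
        "iii": bool(entry.sequencing & {"WGS", "WES"}),
        "iv": entry.n_phenotypic_variables >= 100,
        "v": bool(entry.access_routes & {"website", "collaboration"}),
        "vi": entry.access_info_language.lower() == "english",
    }
    return [label for label, ok in checks.items() if not ok]
```

An entry qualifies for the catalog under this sketch when `failed_criteria(entry)` returns an empty list. Note that criterion (i) is a strict inequality, so a dataset with exactly 500 subjects would be excluded, whereas criterion (iv) accepts exactly 100 variables.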

Keywords: Large-scale datasets; biobanks; catalog; next-generation sequencing data; phenotypic data; precision medicine.


Figures

Figure 1. Curation and usage of the catalog. After rich datasets containing both phenotypic and genomic data from the same patients are located, descriptive information is extracted from each dataset, submitted through the Shiny app form and displayed in an interactive Shiny app, where users can run dynamic searches.
Figure 2. New submission and validation process. Users first search the information available in the current version of the catalog. Those interested in updating or adding information can edit the CSV file on GitHub. New information is reviewed, and accepted changes to the CSV file on GitHub automatically update the Web Shiny app.

