Review
J Hum Genet. 2021 Jan;66(1):61-65.
doi: 10.1038/s10038-020-0822-y. Epub 2020 Aug 11.

Artificial intelligence powered statistical genetics in biobanks


Akira Narita et al. J Hum Genet. 2021 Jan.

Abstract

Large-scale, sometimes nationwide, prospective genomic cohorts that biobank rich biological specimens such as blood, urine, and tissues have been established in several countries and have released vast amounts of data. These genetic and epidemiological resources are expected to allow investigators to disentangle the genetic and environmental components underlying common complex diseases. There are, however, two major challenges for statistical genetics in pursuing this goal: small sample size relative to high dimensionality, and multilayered, heterogeneous endophenotypes. Rather counterintuitively, biobank data generally have a small sample size relative to their dimensionality, which comprises genomic variation, lifestyle questionnaire items, and sometimes their interactions. This is a widely acknowledged difficulty in data analysis, the so-called "p ≫ n problem" in statistics or "curse of dimensionality" in the machine-learning field. On the other hand, there are too many measurements of individual health status, that is, endophenotypes, such as health check-up data, images, and psychological test scores, in addition to metabolomics and proteomics data. These endophenotypes are rich but not easily tractable because of their even higher dimensionality, their substantial correlation, and the sometimes confusing causal relationships among them. We have tried to overcome these problems inherent to biobank data using statistical machine-learning and deep-learning technologies.
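
As a rough illustration of why the "p ≫ n" setting is difficult, the following minimal sketch simulates a phenotype and many candidate features that are all pure noise, then tracks the largest absolute correlation found as more features are scanned. It is not taken from the review: the sample size, feature count, seed, and function names are hypothetical choices made only for this example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def running_max_null_correlation(n_samples=20, n_features=5000, seed=0):
    """Scan noise features against a noise phenotype.

    Returns a list whose k-th entry is the largest |r| seen among the
    first k+1 features. Every feature is independent of the phenotype,
    so any large correlation is spurious. All sizes are assumptions.
    """
    rng = random.Random(seed)
    phenotype = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    best = 0.0
    running = []
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, phenotype)))
        running.append(best)
    return running


running = running_max_null_correlation()
print("max |r| after 10 features:  ", round(running[9], 3))
print("max |r| after 5000 features:", round(running[-1], 3))
```

With only 20 samples, scanning thousands of unrelated features is very likely to turn up at least one that looks strongly associated with the phenotype, which is one reason analyses in this setting typically rely on regularization, dimensionality reduction, or multiple-testing correction.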


