Nature. 2024 Oct;634(8035):795-803.
doi: 10.1038/s41586-024-07721-5. Epub 2024 Oct 23.

The importance of family-based sampling for biobanks

Neil M Davies et al. Nature. 2024 Oct.

Abstract

Biobanks aim to improve our understanding of health and disease by collecting and analysing diverse biological and phenotypic information in large samples. So far, biobanks have largely pursued a population-based sampling strategy, where the individual is the unit of sampling, and familial relatedness occurs sporadically and by chance. This strategy has been remarkably efficient and successful, leading to thousands of scientific discoveries across multiple research domains, and plans for the next wave of biobanks are underway. In this Perspective, we discuss the strengths and limitations of a complementary sampling strategy for future biobanks based on oversampling of close genetic relatives. Such family-based samples facilitate research that clarifies causal relationships between putative risk factors and outcomes, particularly in estimates of genetic effects, because they enable analyses that reduce or eliminate confounding due to familial and demographic factors. Family-based biobank samples would also shed new light on fundamental questions across multiple fields that are often difficult to explore in population-based samples. Despite the potential for higher costs and greater analytical complexity, the many advantages of family-based samples should often outweigh their potential challenges.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Figure 1. Factors that contribute to genetic associations in GWAS.
We consider the genetic association between a focal SNP x and an example trait (height). A. Direct genetic effect: Absent confounders (green), the association between the phenotype and x will reflect the direct genetic effect of x (red). B. Population stratification: When trait means and allele frequencies differ between populations, genetic associations will also reflect any correlation between genotype and environment. Here, environmental influences in population 2 lead to shorter stature. If both populations are analysed together, the genetic association of x overestimates its direct genetic effect because height-increasing environmental factors and height-increasing alleles are both more frequent in population 1 (note that the bias is opposite for the bottommost SNP). C. Indirect genetic effect: A heritable parental trait that influences an offspring trait via parental behaviour creates an indirect causal path between parental alleles and the offspring trait mediated through the family environment. Here, x is associated with offspring height because of a direct genetic effect plus the effect of x on parental behaviour (e.g., dietary preferences), which in turn influences offspring height. D. Assortative mating: When parents assort on heritable traits, offspring inherit alleles with concordant effects from both parents. Over 2+ generations, trait-affecting alleles inherited from the same parents also correlate due to recombination. Here, the genetic association of height with x reflects its direct effect plus the partial effects of all other height-increasing alleles due to their assortative mating-induced correlations with x.
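The stratification mechanism in panel B can be illustrated with a small simulation. This is only a sketch under assumed values: the allele frequencies (0.7 and 0.3), the environmental shifts (+0.5 and -0.5), the direct effect (0.1), and the helper names `simulate_population` and `ols_slope` are all hypothetical and do not come from the paper. It compares the slope from pooling both populations with the average slope estimated within each population.

```python
import random


def simulate_population(n, allele_freq, env_shift, direct_effect, rng):
    """Return (genotype, trait) pairs for one population.

    Genotype is the count (0, 1 or 2) of the height-increasing allele.
    Trait = direct_effect * genotype + population-wide environment + noise.
    """
    rows = []
    for _ in range(n):
        g = (rng.random() < allele_freq) + (rng.random() < allele_freq)
        y = direct_effect * g + env_shift + rng.gauss(0.0, 1.0)
        rows.append((g, y))
    return rows


def ols_slope(rows):
    """Slope of a simple regression of trait on genotype."""
    n = len(rows)
    mean_g = sum(g for g, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    cov = sum((g - mean_g) * (y - mean_y) for g, y in rows)
    var = sum((g - mean_g) ** 2 for g, _ in rows)
    return cov / var


rng = random.Random(0)   # fixed seed so the sketch is reproducible
DIRECT_EFFECT = 0.1      # hypothetical true direct effect of SNP x

# Population 1: allele more frequent AND more favourable environment.
pop1 = simulate_population(20000, 0.7, +0.5, DIRECT_EFFECT, rng)
# Population 2: allele less frequent AND less favourable environment.
pop2 = simulate_population(20000, 0.3, -0.5, DIRECT_EFFECT, rng)

pooled_slope = ols_slope(pop1 + pop2)                    # ignores population
within_slope = (ols_slope(pop1) + ols_slope(pop2)) / 2   # stratified analysis

print(f"pooled slope:  {pooled_slope:.3f}")
print(f"within slope:  {within_slope:.3f}")
print(f"direct effect: {DIRECT_EFFECT:.3f}")
```

Under these assumptions the pooled slope is inflated well above the direct effect, while the within-population slope stays close to it. Comparing relatives within a family applies the same logic at a finer scale, since family members share the population-level environment.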
Figure 2. Sample sizes by relative type in biobank-scale datasets.
The y-axis in Panel a shows the number (in 1000s) of genotyped individuals in the corresponding biobank. The y-axes in Panels b, c and d represent the number (in 1000s) of full sibling pairs, parent-offspring pairs and trios (i.e., two parents and one child) in each biobank, respectively. The same individuals can be included multiple times across panels. For example, the 22K sibling pairs in the UK Biobank in Panel b are also included as ~44K individuals in Panel a, and the 44K MoBa trios in Panel d are also included as ~88K parent-offspring pairs in Panel c. Notes: MVP = Million Veterans Program; MoBa = The Norwegian Mother, Father and Child Cohort Study; MCPS = Mexico City Prospective Study; Chinese Biobank = Chinese Kadoorie Biobank; HUNT = The Trøndelag Health Study.
Figure 3. The statistical power and accuracy of GWAS estimates in family-based versus population-based designs.
All results are based on theoretical derivations confirmed through simulation (see Supplementary Materials). A. Shown is the effective sample size for estimating a parameter (either genetic association or the direct genetic effect) in either n/2 sibling pairs or n/3 trios (n total genotyped samples for both) compared to estimating genetic association using population-based GWAS in n unrelated individuals (relative n_effective; y-axis), as a function of the trait heritability (x-axis). B. Shown is the inaccuracy of GWAS estimates, quantified as the Mean Squared Error (MSE) of the estimates, as a function of log10 sample size (x-axis): either n unrelated individuals, n sibling individuals (in n/2 sibling pairs), or n trio individuals (in n/3 parent-offspring trios). The hypothetical bias (dashed orange line) of 0.0033 (bias^2 = 1e-5) represents about 1/3 of a typical genome-wide significant SNP association for height.
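The trade-off in panel B follows from the standard decomposition MSE = bias^2 + variance. The sketch below is not the authors' derivation. It assumes a sampling variance of 1/n per estimate for standardised variables, and it uses a hypothetical relative effective sample size of 0.5 for the family-based design; only the bias of 0.0033 is taken from the caption. The function name `mse` and its parameters are illustrative.

```python
def mse(n, bias=0.0, rel_neff=1.0, var_per_obs=1.0):
    """Mean squared error = bias^2 + sampling variance.

    Sampling variance is modelled as var_per_obs / (rel_neff * n), where
    rel_neff is the effective sample size relative to n unrelated
    individuals (an assumed value in this sketch, not a paper result).
    """
    return bias ** 2 + var_per_obs / (rel_neff * n)


BIAS = 0.0033        # hypothetical bias quoted in the caption
FAMILY_NEFF = 0.5    # assumed relative effective sample size (illustrative)

for log10_n in range(3, 8):
    n = 10 ** log10_n
    population = mse(n, bias=BIAS)           # biased, but lower variance
    family = mse(n, rel_neff=FAMILY_NEFF)    # unbiased, but higher variance
    better = "family" if family < population else "population"
    print(f"log10(n)={log10_n}  population={population:.2e}  "
          f"family={family:.2e}  lower MSE: {better}")
```

With these assumed values, the biased population-based estimate has the lower MSE in small samples, but its MSE cannot fall below bias^2, so the unbiased family-based estimate overtakes it once n is large enough.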
