Genome Med. 2020 Jun 23;12(1):53.
doi: 10.1186/s13073-020-00755-0.

Whole-genome sequence association analysis of blood proteins in a longitudinal wellness cohort

Wen Zhong et al. Genome Med.

Abstract

Background: The human plasma proteome is important for many biological processes and is a source of targets for diagnostics and therapy. It is therefore of great interest to understand how genetic and environmental factors together determine protein levels in individuals, and to gain deeper insight into how genetic architecture contributes to the individual variability of plasma protein levels during adult life.

Methods: We have combined whole-genome sequencing, multiplex plasma protein profiling, and extensive clinical phenotyping in a longitudinal 2-year wellness study of 101 healthy individuals with repeated sampling. Analyses of genetic and non-genetic associations related to the variability of blood levels of proteins in these individuals were performed.

Results: The analyses showed that each individual has a unique protein profile, and we report the intra-individual as well as the inter-individual variation for 794 plasma proteins. A genome-wide association study (GWAS) using 7.3 million genetic variants identified by whole-genome sequencing revealed 144 independent variants across 107 proteins that showed a strong association (P < 6 × 10⁻¹¹) between genetics and the inter-individual variability in protein levels. For 67 of these 107 proteins, a genetic effect on plasma levels had not been reported before. Our longitudinal analysis further demonstrates that these levels are stable during the 2-year study period. The variability of protein profiles as a consequence of environmental factors was also analyzed, with a focus on the effects of weight loss and infections.

Conclusions: We show that the adult blood levels of many proteins are determined at birth by genetics, which is important for efforts to understand the relationship between plasma proteome profiles and human biology and disease.

Keywords: Blood; Genetics; Genome-wide associations; Protein levels; Whole-genome sequence.
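
The threshold quoted in the Results (P < 6 × 10⁻¹¹) is consistent with a Bonferroni adjustment of the conventional genome-wide significance level (5 × 10⁻⁸) for the 794 proteins tested. The abstract does not say how the threshold was derived, so the sketch below is an assumption that only reproduces the arithmetic; the constant and function names are illustrative.

```python
# Assumption: the study-wide cutoff is the conventional genome-wide
# significance level divided by the number of proteins analyzed.
# The abstract reports only the resulting cutoff (P < 6e-11).

GENOME_WIDE_ALPHA = 5e-8   # conventional GWAS threshold (assumed starting point)
N_PROTEINS = 794           # plasma proteins analyzed, from the abstract


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance level after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


threshold = bonferroni_threshold(GENOME_WIDE_ALPHA, N_PROTEINS)
print(f"{threshold:.1e}")  # 6.3e-11
```

Under this assumption, the reported cutoff corresponds to 6.3 × 10⁻¹¹ rounded down to one significant figure.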

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Overview of the study. a In total, 101 subjects were included in the study. The upper part shows the number of individuals that came to each of the six visits (red, blue, green, purple, orange, and gray). The lower part shows the distribution of each visit for the subjects that completed the program across 2 years. b The rectangular plot shows the types of data collected in the study; see more details in Table S2. c The MDS plot shows the pairwise genetic distances between the 101 subjects based on whole-genome sequencing. The color code indicates the origin of the parents of each individual (upwards triangle, mother; inverted triangle, father)
Fig. 2
Longitudinal plasma protein profiling. a The distribution of the log2 fold change of protein concentration per sample versus the average protein concentration level, with FDA-approved drug targets highlighted. b The inter-individual and intra-individual variation of protein levels, calculated as the coefficient of variation (CV) for each protein within each visit and across all analyzed individuals (n = 90), and as the mean CV for each protein within each individual across all visits (n = 6), respectively, colored by the median concentration level of the protein. c, d The protein concentration across visits one to six, with each individual connected by a dotted line, for c growth hormone 2 (GH2) and d RAS p21 protein activator 1 (RASA1). The color code indicates females and males. e Hierarchical clustering based on pairwise Pearson correlation distance of the protein concentration in all 540 samples, with labels color coded by individual (see more details in Fig. S3). f Violin plot showing the distribution of inter- and intra-individual Pearson correlation for all samples. g Ternary plot based on two-factor ANOVA for all proteins, assessing the relative effect of the inter-individual variation, visits, and residuals. The color code indicates the median concentration level of the protein
Fig. 3
Global analysis of the genetic regulation of the proteome. a Chord diagram showing the distributions of cis- and trans-pQTLs in 23 chromosomes. Each link represents cis- or trans-pQTLs in a chromosome, respectively, with the ribbon width reflecting the number of pQTLs. b Genomic locations of the pQTL variants and the associated proteins, colored by cis- and trans-pQTLs. c The fractions of cis- and trans-pQTLs in different types of genomic regions. d Manhattan plot of the sentinel pQTL per protein. The color code indicates the cis- and trans-pQTLs for the 107 proteins with significant associations, and the gray dots represent the non-significant associations
Fig. 4
Three proteins with the most significant pQTLs. a Manhattan plot for protein FOLR3 showing the genomic locations of all associated pQTLs. b Bee-swarm and box plot showing the association between the genotype of rs71891516 and the median concentration of FOLR3. c Longitudinal protein concentration of FOLR3 across visits one to six, with each individual connected by a dotted line. d Manhattan plot for protein PDGFRB. e Bee-swarm and box plot showing the association between the genotype of rs3816018 and the median concentration of PDGFRB. f Longitudinal protein concentration levels of PDGFRB. g Manhattan plot for protein MEP1B. h Bee-swarm and box plot showing the association between the genotype of rs620982 and the median concentration of MEP1B. i Longitudinal protein concentration levels of MEP1B. The color indicates the genotypes of rs71891516, rs3816018 and rs620982, respectively
Fig. 5
Influence of genetic and environmental factors on the blood protein level variability. a Barplot of the fraction of variance explained by each component for 794 proteins (green: Genetics; purple: Environmental; gray: Sex; red: Visit), determined by a linear mixed model. b Barplot of the fraction of variance explained by each component for 107 proteins, color coded by different variables. c Barplot of the top 30 proteins most strongly associated with environmental components, with the most significant variables labeled and using the color code in (b). d Canonical correspondence analysis (CCA) triplot showing correlations between protein levels and the clinical or anthropometric variables, as well as all individual samples
Fig. 6
Dynamic molecular profiling changes and the impact of weight loss and infection. a Chord diagram of the 50 most significant proteins related to body composition (bioimpedance fat, bioimpedance muscle, bioimpedance bone, weight, waist and BMI). The size of the link is defined as the absolute value of the coefficient of the corresponding effect, and proteins are sorted based on the coefficient calculated using mixed-effect modeling. b A radar plot showing the protein profiles of the 37 most significant proteins positively related to body composition for subject W0010, who had a 15.4 kg weight loss in 3 months between visits 3 and 4 and a total weight loss of 16.6 kg during the 2 years. c Chord diagram of the 50 most significant proteins related to CRP, including the top six other parameters with a significant effect on the same proteins. The size of the link is defined as the absolute value of the coefficient of the corresponding effect, and proteins are sorted based on the coefficient calculated using mixed-effect modeling. d Radar plots of the positively correlated proteins (n = 44) showing the relative abundance level in subject W0022, who had an increased CRP of 79 between visits 1 and 2
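
The Fig. 2b caption distinguishes two coefficients of variation: inter-individual (for each protein, across individuals within a visit) and intra-individual (for each protein, across visits within an individual, then averaged over individuals). A minimal sketch of that calculation for a single protein follows. The concentrations are invented for illustration, the CV is assumed to be the sample standard deviation divided by the mean (the caption does not name the estimator), and averaging the per-visit CVs into one inter-individual number is likewise an assumption.

```python
from statistics import mean, stdev

# Hypothetical concentrations of one protein: one list per individual,
# one value per visit. These numbers are invented for illustration only.
levels = {
    "subject_A": [10.0, 10.5, 9.5],
    "subject_B": [20.0, 19.0, 21.0],
    "subject_C": [30.0, 31.5, 28.5],
}


def cv(values):
    """Coefficient of variation: sample standard deviation over the mean."""
    return stdev(values) / mean(values)


# Intra-individual: CV across visits for each subject, averaged over subjects.
intra_cv = mean(cv(visits) for visits in levels.values())

# Inter-individual: CV across subjects within each visit. Averaging over
# visits is an assumption made here to obtain one number per protein.
per_visit = zip(*levels.values())
inter_cv = mean(cv(list(visit)) for visit in per_visit)

print(f"intra-individual CV: {intra_cv:.3f}")
print(f"inter-individual CV: {inter_cv:.3f}")
```

In this toy data the inter-individual CV is far larger than the intra-individual CV, which is the pattern one would expect for a protein whose level differs between people but stays stable within each person over time.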
