Guidelines for Large-Scale Sequence-Based Complex Trait Association Studies: Lessons Learned from the NHLBI Exome Sequencing Project

Affiliations

¹ Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, WI 53205, USA; Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
² Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Department of Epidemiology, School of Public Health, University of Washington, Seattle, WA 98195, USA.
³ Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA.
⁴ Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.
⁵ Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Vertex Pharmaceuticals, Boston, MA 02210, USA.
⁶ Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Department of Pediatrics, University of Washington, Seattle, WA 98195, USA.
⁷ Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.
⁸ Department of Pathology, University of Vermont, Colchester, VT 05405, USA; Department of Biochemistry, University of Vermont, Burlington, VT 05405, USA.
⁹ Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA.
¹⁰ Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA. Electronic address: sleal@bcm.edu.

PMID: 27666372
PMCID: PMC5065683
DOI: 10.1016/j.ajhg.2016.08.012

Paul L Auer et al. Am J Hum Genet. 2016.

Am J Hum Genet. 2016 Oct 6;99(4):791-801. doi: 10.1016/j.ajhg.2016.08.012. Epub 2016 Sep 22.

Abstract

Massively parallel whole-genome sequencing (WGS) data have ushered in a new era in human genetics. These data are now being used to understand the role of rare variants in complex traits and to advance the goals of precision medicine. The technological and computing advances that have enabled us to generate WGS data on thousands of individuals have also outpaced our ability to perform analyses in scientifically and statistically rigorous and thoughtful ways. The past several years have witnessed the application of whole-exome sequencing (WES) to complex traits and diseases. From our analysis of NHLBI Exome Sequencing Project (ESP) data, not only have a number of important disease and complex trait association findings emerged, but our collective experience offers some valuable lessons for WGS initiatives. These include caveats associated with generating automated pipelines for quality control and analysis of rare variants; the importance of studying minority populations; sample size requirements and efficient study designs for identifying rare-variant associations; and the significance of incidental findings in population-based genetic research. With the ESP as an example, we offer guidance and a framework on how to conduct a large-scale association study in the era of WGS.

Figures

**Figure 1**
Schematic of the Work Flow for Sample Selection and Data Analysis in ESP. Primary traits were selected from large, population-based studies with widely available data on secondary traits. Both European and African American samples were selected for sequencing. Association analyses were conducted using both genes and single variants as units of analysis.

**Figure 2**
Coding Variants Observed in the NHLBI-ESP. (A) The average number of missense, synonymous, nonsense, and splice site variants per study subject for 2,307 African Americans, 4,392 European Americans, and all study subjects (n = 6,699) for the intersect of all four targets. The vertical lines display the smallest and largest number of variants of each type observed per person. (B) The number of missense, synonymous, nonsense, and splice site variants observed for NHLBI-ESP (n = 6,699) study subjects. Each pie chart shows the number of singletons, doubletons, and variant sites with an MAF of ≤1%, >1%–5%, and >5%. (C) The average number of unique missense, synonymous, nonsense, and splice site variants per individual. These variants are exclusive to the NHLBI-ESP and are not observed in either dbSNP or 1000 Genomes. (D) Comparison of the number of coding variant sites observed in African Americans (AAs) and European Americans (EAs). Shown are the numbers of missense, synonymous, nonsense, and splice site variants that are unique to each population, are observed in both populations, and have an MAF of ≥1%. The numbers displayed are exclusive to one category. In order to fairly compare the number of variant sites in African Americans and European Americans, equal numbers of African Americans (n = 2,312) and European Americans (n = 2,312) were studied.
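The frequency classes used in panel (B) can be illustrated with a minimal Python sketch. It assumes genotypes are coded as alternate-allele counts (0, 1, 2, or `None` for a missing call) and that the classes are mutually exclusive, with singletons and doubletons taking precedence over the MAF-based bins; the function names and bin labels are hypothetical and are not taken from the ESP pipeline.

```python
def minor_allele_count(genotypes):
    """Return (minor allele count, number of called chromosomes) for one site.

    genotypes: alt-allele counts per individual (0, 1, 2), None if missing.
    """
    called = [g for g in genotypes if g is not None]
    alt = sum(called)
    total = 2 * len(called)  # diploid: two chromosomes per called individual
    return min(alt, total - alt), total


def frequency_bin(genotypes):
    """Assign one variant site to a single, exclusive frequency class."""
    mac, total = minor_allele_count(genotypes)
    if total == 0 or mac == 0:
        return "monomorphic"
    if mac == 1:
        return "singleton"
    if mac == 2:
        return "doubleton"
    maf = mac / total
    if maf <= 0.01:
        return "MAF <= 1%"
    if maf <= 0.05:
        return "MAF > 1%-5%"
    return "MAF > 5%"
```

Counting by minor allele count first means that, in a small sample, a singleton is still reported as a singleton even when its sample MAF exceeds 1%.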

**Figure 3**
Triglyceride Rare Variant Association Analysis and Association of Rare Variants in *APOC3*. (A) QQ plot of the meta-analysis for African Americans and European Americans of rare variant burden analysis of triglyceride levels. The −log10 values of the observed p values are displayed versus their expected values. Rare variant association analysis was performed separately for African Americans (n = 1,654) and European Americans (n = 2,074) using the CMC, restricted to variant sites with a MAF ≤ 0.01. (B) Distribution of triglyceride levels for NHLBI-ESP study subjects and triglyceride levels for individuals with an *APOC3* variant. The quantitative trait distribution of triglycerides after natural log transformation is shown for African Americans and European Americans who are study subjects in the NHLBI-ESP. For the 27 individuals (8 African American and 19 European American) who are heterozygous for one of the 7 coding variants (3 splice, 1 stop-gain, and 3 missense), a tick represents their triglyceride levels after natural log transformation. For each variant site, a diamond (red for African Americans and blue for European Americans) represents the average triglyceride level for carriers of that variant. (C) Distribution of triglyceride levels for study subjects from the Women’s Health Initiative (WHI) and triglyceride levels for individuals with an *APOC3* variant. The quantitative trait distribution of triglycerides after natural log transformation is shown for African Americans (n = 1,820) and European Americans (n = 1,643) who are study subjects from the WHI. The DNA samples from the study subjects were genotyped on the exome chip. Of the seven variants that were observed in NHLBI-ESP, four were represented on the exome chip.
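The collapsing idea behind a burden analysis such as the one in panel (A) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ESP analysis: it treats the alternate allele as the minor allele, collapses sites with a frequency at or below 0.01 into a single carrier indicator per individual, and regresses a log-transformed trait on that indicator with no covariates, using a normal approximation for the p value. The full CMC additionally handles more common variants in a multivariate test, which is omitted here. The names `collapse_rare` and `collapsing_test` are hypothetical.

```python
import math


def collapse_rare(genotype_matrix, maf_threshold=0.01):
    """Return a 0/1 indicator per individual: carries >= 1 rare alt allele.

    genotype_matrix: one row per individual, one column per variant site,
    entries are alt-allele counts (0, 1, 2). Assumes alt is the minor allele.
    """
    n = len(genotype_matrix)
    n_sites = len(genotype_matrix[0])
    rare_sites = []
    for j in range(n_sites):
        alt_freq = sum(row[j] for row in genotype_matrix) / (2 * n)
        if 0 < alt_freq <= maf_threshold:
            rare_sites.append(j)
    return [int(any(row[j] > 0 for j in rare_sites)) for row in genotype_matrix]


def collapsing_test(trait, carrier):
    """Simple linear regression of trait on carrier status.

    Returns (beta, standard error, two-sided p value from a normal
    approximation). No covariate adjustment is performed in this sketch.
    """
    n = len(trait)
    mean_x = sum(carrier) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in carrier)
    if sxx == 0:
        raise ValueError("no variation in carrier status")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(carrier, trait))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(carrier, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p_value
```

In a stratified design like the one described in the caption, a function of this kind would be run separately within each ancestry group and the per-group results then combined by meta-analysis; that combination step is not shown.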

**Figure 4**
An Analysis of Statistical Power to Detect Associations across the Exome. (A) Sample sizes necessary to detect associations for a binary trait across the exome. (B) Sample sizes for a quantitative trait. Results from the SKAT, CMC, and BRV rare-variant association tests are shown in blue, green, and red, respectively.
