PLoS One. 2011;6(8):e23473.
doi: 10.1371/journal.pone.0023473. Epub 2011 Aug 17.

Efficient replication of over 180 genetic associations with self-reported medical data

Joyce Y Tung et al. PLoS One. 2011.

Abstract

While the cost and speed of generating genomic data have come down dramatically in recent years, the slow pace of collecting medical data for large cohorts continues to hamper genetic research. Here we evaluate a novel online framework for obtaining large amounts of medical information from a recontactable cohort by assessing our ability to replicate genetic associations using these data. Using web-based questionnaires, we gathered self-reported data on 50 medical phenotypes from a generally unselected cohort of over 20,000 genotyped individuals. Of a list of genetic associations curated by NHGRI, we successfully replicated about 75% of the associations that we expected to (based on the number of cases in our cohort and reported odds ratios, and excluding a set of associations with contradictory published evidence). Altogether we replicated over 180 previously reported associations, including many for type 2 diabetes, prostate cancer, cholesterol levels, and multiple sclerosis. We found significant variation across categories of conditions in the percentage of expected associations that we were able to replicate, which may reflect systematic inflation of the effects in some initial reports, or differences across diseases in the likelihood of misdiagnosis or misreport. We also demonstrated that we could improve replication success by taking advantage of our recontactable cohort, offering more in-depth questions to refine self-reported diagnoses. Our data suggest that online collection of self-reported data from a recontactable cohort may be a viable method for both broad and deep phenotyping in large populations.

Conflict of interest statement

Competing Interests: JYT, CBD, DAH, AKK, JMM, ABC, UF, BTN, JLM, AW, and NE are employed by 23andMe and own stock options in the company. There are no patents, products in development or marketed products to declare. This does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials, as detailed online in the guide for authors. However, the authors' obligations to protect their customers' privacy (as outlined in the company's Terms of Service and Privacy Statement) prevent them from making their customers' individual-level data publicly available. Aggregate-level data (for example, in the form of the 2×3 tables that were used for their statistics) can be made available upon request.

Figures

Figure 1. Replicated SNPs for binary traits.
Our log ORs and 95% confidence intervals are shown as black circles and lines. Published ORs are shown as blue Xs.
Figure 2. Success rate (versus total power) by disease class.
Replications = number of associations we successfully replicated. Expected = number of associations we expected to replicate. Attempts = number of associations we attempted to replicate. The blue dot represents our success ratio (number of successful replications divided by number of expected replications). The black line represents the 95% prediction interval for the success ratio. The nine associations that we had high power to detect but had known conflicting data were not included in this figure (see text and Table 1). Conditions assigned to each class (also see Methods S1): Asthma: childhood asthma; Autoimmune: Crohn's disease, inflammatory bowel disease, lupus, multiple sclerosis, psoriasis, type 1 diabetes, ulcerative colitis; Cancer: basal cell carcinoma, bladder cancer, breast cancer, colorectal cancer, prostate cancer, lung cancer, melanoma, pancreatic cancer, scleroderma, testicular cancer, thyroid cancer; Celiac: celiac disease; Diabetes: type 2 diabetes; Heart: blood clots, coronary artery disease, heart attack; Pigment/Hair: eye color, freckling, hair color, red hair color, male pattern baldness; Neuro: Alzheimer's disease, autism, Parkinson's disease; Other: chronic obstructive pulmonary disease, kidney stones, stroke, osteoarthritis; Psychiatric: alcohol abuse, bipolar disorder, schizophrenia.
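The success ratio defined in the Figure 2 caption can be illustrated with a short Python sketch. It assumes, as the "total power" in the figure title suggests but the caption does not state, that the expected number of replications for a disease class is the sum of the per-association statistical power. The function name and the example values are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def success_ratio(n_replicated: int, powers: Sequence[float]) -> Tuple[float, float]:
    """Return (expected replications, success ratio) for one disease class.

    Assumption: "expected" is the sum of per-association power values,
    each in [0, 1]. The caption defines only the ratio itself:
    successful replications divided by expected replications.
    """
    if any(p < 0.0 or p > 1.0 for p in powers):
        raise ValueError("each power must lie in [0, 1]")
    expected = float(sum(powers))
    if expected == 0.0:
        raise ValueError("expected number of replications is zero")
    return expected, n_replicated / expected


# Hypothetical class with three attempted associations, one replicated.
expected, ratio = success_ratio(1, [0.5, 0.5, 1.0])
print(f"expected={expected}, success ratio={ratio}")
```

Under this reading, a ratio near 1 means a class replicated about as often as its power predicted, and a ratio well below 1 marks the classes where, as the abstract suggests, initial effect sizes may have been inflated or self-reported diagnoses may be less reliable.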
