Genotyping Array Design and Data Quality Control in the Million Veteran Program
- PMID: 32243820
- PMCID: PMC7118558
- DOI: 10.1016/j.ajhg.2020.03.004
Abstract
The Million Veteran Program (MVP), initiated by the Department of Veterans Affairs (VA), aims to collect biosamples with consent from at least one million veterans. Presently, blood samples have been collected from over 800,000 enrolled participants. The size and diversity of the MVP cohort, as well as the availability of extensive VA electronic health records, make it a promising resource for precision medicine. MVP is conducting array-based genotyping to provide a genome-wide scan of the entire cohort, in parallel with whole-genome sequencing, methylation, and other 'omics assays. Here, we present the design and performance of the MVP 1.0 custom Axiom array, which was designed and developed as a single assay to be used across the multi-ethnic MVP cohort. A unified genetic quality-control analysis was developed and conducted on an initial tranche of 485,856 individuals, leading to a high-quality dataset of 459,777 unique individuals. A total of 668,418 genetic markers passed quality control and showed high-quality genotypes for both common and rare variants. We confirmed that, with non-European individuals making up nearly 30%, MVP's substantial ancestral diversity surpasses that of other large biobanks. We also demonstrated the quality of the MVP dataset by replicating established genetic associations with height in individuals of European American and African American ancestry. The current dataset has been made available to approved MVP researchers for genome-wide association studies and other downstream analyses. Further data releases will be available for analysis as recruitment at the VA continues and the cohort expands in both size and diversity.
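The sample counts quoted in the abstract imply a retention rate that is easy to check by hand. The short Python sketch below does only that arithmetic; the variable names are illustrative, and the difference between the two counts is an aggregate that may include duplicate samples as well as samples failing quality control, since the abstract does not break it down.

```python
# Sample counts quoted in the abstract.
genotyped = 485_856  # individuals in the initial tranche
retained = 459_777   # unique individuals in the final dataset

# Aggregate difference; the abstract does not say how much of it is due to
# duplicates versus quality-control failures, so it is not split further here.
removed = genotyped - retained
retention_rate = retained / genotyped

print(f"not retained: {removed:,}")
print(f"retention rate: {retention_rate:.2%}")
```

Under these figures, roughly 95% of the initial tranche is carried into the released dataset.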
Keywords: GWAS; Million Veteran Program; SNP array design; VA; biobank; clinical variants; genetic ancestry; genetic relatedness; genotype data; quality control.
Published by Elsevier Inc.
Conflict of interest statement
The authors declare no competing interests.