Second-generation PLINK: rising to the challenge of larger and richer datasets
- PMID: 25722852
- PMCID: PMC4342193
- DOI: 10.1186/s13742-015-0047-8
Abstract
Background: PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for faster and scalable implementations of key functions, such as logistic regression, linkage disequilibrium estimation, and genomic distance evaluation. In addition, GWAS and population-genetic data now frequently contain genotype likelihoods, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1's primary data format.
Findings: To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(√n)-time/constant-space Hardy-Weinberg equilibrium and Fisher's exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. We have also developed an extension to the data format which adds low-overhead support for genotype likelihoods, phase, multiallelic variants, and reference vs. alternate alleles, which is the basis of our planned second release (PLINK 2.0).
Conclusions: The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.
Keywords: Computational statistics; GWAS; High-density SNP genotyping; Population genetics; Whole-genome sequencing.
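
The bit-level parallelism mentioned in the Findings can be illustrated with a small sketch. It is not PLINK's code: it assumes genotypes are packed two bits each into a 64-bit word, with codes 0-3 whose meaning is left abstract, and the helper names (pack, popcount, count_codes) are hypothetical. The point is that per-code counts for up to 32 samples come from a handful of mask and popcount operations rather than a loop over samples.

```python
# Illustrative sketch only; encoding and function names are assumptions.
MASK_LOW = 0x5555555555555555  # low bit of every 2-bit field in a 64-bit word


def pack(codes):
    """Pack up to 32 two-bit codes (0..3) into one word, first code in the lowest bits."""
    if len(codes) > 32:
        raise ValueError("at most 32 codes fit in a 64-bit word")
    word = 0
    for i, code in enumerate(codes):
        word |= (code & 3) << (2 * i)
    return word


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def count_codes(word, n):
    """Return [n0, n1, n2, n3], the count of each code among the first n fields.

    Unused high fields are assumed to be zero, so n0 is obtained by subtraction.
    """
    lo = word & MASK_LOW          # low bit of each field
    hi = (word >> 1) & MASK_LOW   # high bit of each field, shifted into the low slot
    n3 = popcount(lo & hi)        # both bits set
    n1 = popcount(lo & ~hi)       # only the low bit set
    n2 = popcount(hi & ~lo)       # only the high bit set
    n0 = n - n1 - n2 - n3
    return [n0, n1, n2, n3]


if __name__ == "__main__":
    codes = [0, 3, 2, 2, 1, 0, 3, 3, 0, 2]
    print(count_codes(pack(codes), len(codes)))
```

In a compiled language the same masks would be applied to machine words with a hardware popcount instruction, which is where the speedup over per-sample loops would come from.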