Genet Sel Evol. 2025 Aug 28;57(1):46.

doi: 10.1186/s12711-025-00994-y.

randPedPCA: rapid approximation of principal components from large pedigrees

Hanbin Lee¹, Rosalind Françoise Craddock², Gregor Gorjanc², Hannes Becher³

Affiliations

¹ Department of Statistics, University of Michigan, Ann Arbor, MI, 48109, USA.
² The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK.
³ The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK. h.becher@ed.ac.uk.

PMID: 40877802
PMCID: PMC12392600
DOI: 10.1186/s12711-025-00994-y

Hanbin Lee et al. Genet Sel Evol. 2025.

Abstract

Background: Pedigrees continue to be extremely important in agriculture and conservation genetics, with the pedigrees of modern breeding programmes easily comprising millions of records. This size can make visualising the structure of such pedigrees challenging. Being graphs, pedigrees can be represented as matrices, including, most commonly, the additive (numerator) relationship matrix, $A$, and its inverse. With these matrices, the structure of pedigrees can then, in principle, be visualised via principal component analysis (PCA). However, the naive PCA of matrices for large pedigrees is challenging due to computational and memory constraints. Furthermore, computing a few leading principal components is usually sufficient for visualising the structure of a pedigree.

Results: We present the open-access R package randPedPCA for rapid pedigree PCA using sparse matrices. Our rapid pedigree PCA builds on the fact that matrix-vector multiplications with the additive relationship matrix can be carried out implicitly using the extremely sparse inverse relationship factor, $L^{-1}$, which can be directly obtained from a given pedigree. We implemented two methods. Randomised singular value decomposition tends to be faster when very few principal components are requested, and eigendecomposition via the RSpectra library tends to be faster when more principal components are of interest. On simulated data, our package delivers a speed-up greater than 10,000 times compared to naive PCA. It further enables analyses that are impossible with naive PCA. When only two principal components are desired, the randomised PCA method can halve the running time required compared to RSpectra, which we demonstrate by analysing the pedigree of the UK Kennel Club registered Labrador Retriever population of almost 1.5 million individuals.

Conclusions: The leading principal components of pedigree matrices can be efficiently obtained using randomised singular value decomposition and other methods. Scatter plots of these scores allow for intuitive visualisation of large pedigrees. For large pedigrees, this is considerably faster than rendering plots of a pedigree graph.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Does not apply. Consent for publication: Does not apply. Competing interests: The authors declare that they have no competing interests.

Figures

**Algorithm 1**
Efficiently multiplying the additive relationship matrix $A$ with a vector $x$ via the Cholesky factor, $L^{-1}$, of the precision matrix
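
The idea behind this caption can be sketched in a few lines of plain Python. This is an illustration only, not the package's code: the function names `build_linv_rows` and `multiply_a` are hypothetical, individuals are assumed to be listed with parents before offspring, and the parents themselves are assumed to be non-inbred so that the Mendelian sampling variance is simply 1 minus 0.25 per known parent. Since $A^{-1} = (L^{-1})^{\top} L^{-1}$, the product $Ax$ follows from two sparse triangular solves, without ever forming $A$.

```python
from math import sqrt


def build_linv_rows(pedigree):
    """Build the rows of L^{-1} from a pedigree.

    pedigree: list of (sire, dam) tuples holding 0-based indices or None,
    ordered so that parents come before their offspring.
    ASSUMPTION: parents are non-inbred, so the Mendelian sampling
    variance is d_i = 1 - 0.25 * (number of known parents).
    Each row is a list of (column, value) pairs with the diagonal last.
    """
    rows = []
    for i, (sire, dam) in enumerate(pedigree):
        known = [p for p in (sire, dam) if p is not None]
        d = 1.0 - 0.25 * len(known)
        scale = 1.0 / sqrt(d)
        row = [(p, -0.5 * scale) for p in known]  # at most two off-diagonals
        row.append((i, scale))                    # diagonal entry
        rows.append(row)
    return rows


def multiply_a(rows, x):
    """Return A x using only the sparse rows of L^{-1}."""
    n = len(rows)
    # Step 1: solve (L^{-1})' y = x by back substitution, giving y = L' x.
    residual = list(x)
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        *off, (_, diag) = rows[i]
        y[i] = residual[i] / diag
        for j, value in off:
            residual[j] -= value * y[i]
    # Step 2: solve L^{-1} z = y by forward substitution, giving z = L y = A x.
    z = [0.0] * n
    for i in range(n):
        *off, (_, diag) = rows[i]
        z[i] = (y[i] - sum(value * z[j] for j, value in off)) / diag
    return z


# Toy pedigree: two founders, two full sibs, and one offspring of the sibs.
toy_pedigree = [(None, None), (None, None), (0, 1), (0, 1), (2, 3)]
toy_rows = build_linv_rows(toy_pedigree)
# Multiplying by a unit vector extracts the matching column of A.
print(multiply_a(toy_rows, [0.0, 0.0, 0.0, 0.0, 1.0]))
```

For the toy pedigree the printed vector is, up to floating-point rounding, 0.5, 0.5, 0.75, 0.75, 1.25, which is the relationship of the last individual to everyone including itself (its diagonal entry reflects an inbreeding coefficient of 0.25). Each row of $L^{-1}$ holds at most three non-zero entries, so both solves cost time linear in the number of individuals.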

**Algorithm 2**
Efficiently multiplying the ‘centred’ additive relationship matrix $\tilde{A}$ with a vector $x$ via the Cholesky factor, $L^{-1}$, of the precision matrix
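
The caption does not spell out how $\tilde{A}$ is defined. As a sketch, assuming centring means subtracting column means from $L$ (so that $\tilde{A} = CAC$ with the centring matrix $C$, an assumption made here for illustration), the product $\tilde{A}x$ needs only one implicit multiplication with $A$ plus two mean subtractions:

```latex
\begin{aligned}
C &= I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top},
  \qquad \tilde{A} = C A C = (C L)(C L)^{\top}, \\
u &= C x = x - \bar{x}\,\mathbf{1},
  \qquad \bar{x} = \tfrac{1}{n}\mathbf{1}^{\top} x, \\
v &= A u
  \quad \text{(two sparse triangular solves with } L^{-1}\text{)}, \\
\tilde{A} x &= C v = v - \bar{v}\,\mathbf{1},
  \qquad \bar{v} = \tfrac{1}{n}\mathbf{1}^{\top} v .
\end{aligned}
```

Under this assumption the dense matrix $C$ is never formed, so sparsity is preserved throughout.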

**Algorithm 3**
Approximate PCA of the additive relationship matrix $A$ (possibly ‘centred’) via the randomised SVD of the precision matrix’s Cholesky factor $L^{-1}$
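
A generic randomised SVD adapted to this setting might look as follows. This is a sketch of the standard technique rather than a transcription of the published algorithm: the target rank $k$, the oversampling parameter $p$, and the test matrix $\Omega$ are symbols introduced here for illustration. Every product with $L$ or $L^{\top}$ is replaced by a sparse triangular solve with $L^{-1}$:

```latex
\begin{aligned}
&\Omega \in \mathbb{R}^{n \times (k+p)}, \quad \Omega_{ij} \sim \mathcal{N}(0,1)
  && \text{random test matrix} \\
&\text{solve } L^{-1} Y = \Omega
  && \Rightarrow\; Y = L\,\Omega \\
&Y = Q R
  && \text{thin QR, } Q^{\top} Q = I \\
&\text{solve } (L^{-1})^{\top} B^{\top} = Q
  && \Rightarrow\; B = Q^{\top} L \\
&B = \hat{U}\,\Sigma\,V^{\top}
  && \text{SVD of a small } (k+p) \times n \text{ matrix} \\
&U = Q\,\hat{U}, \qquad L \approx U\,\Sigma\,V^{\top}
  && \Rightarrow\; A = L L^{\top} \approx U\,\Sigma^{2}\,U^{\top}
\end{aligned}
```

Under these assumptions, the first $k$ columns of $U\Sigma$ serve as approximate principal component scores, and the squared singular values approximate the leading eigenvalues of $A$.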

**Fig. 1**
Scatter plots of the first two principal components computed from the pedigree and SNP genotypes of the 2pop scenario with centring (top row) or without centring (bottom row). The plots on the left were generated with randPedPCA and thus show approximate scores and percentage of captured variance. We ran the standard PCA on all SNP markers. The legend applies to all panels.

**Fig. 2**
Scatter plots of the first two principal components computed from the pedigree and SNP genotypes of the 4pop scenario with centring (top row) or without centring (bottom row). The plots on the left were generated with randPedPCA and thus show approximate scores and percentage of captured variance. We ran the standard PCA on all genotype markers. The legend applies to all panels.

**Fig. 3**
3D scatter plots of the first three principal components for the UK Kennel Club’s Labrador Pedigree from 1955 to 2024. The left-hand plot highlights coat colour, while the right-hand plot highlights generation. Projections onto coordinate planes are provided with the same colour used in the main plots.
