PLoS One. 2017 May 16;12(5):e0177638.
doi: 10.1371/journal.pone.0177638. eCollection 2017.

Identification of key contributors in complex population structures


Markus Neuditschko et al. PLoS One.

Abstract

Evaluating the genetic contribution of individuals to population structure is essential for selecting informative individuals for genome sequencing and genotype imputation, and for ascertaining complex population structures. Existing methods for selecting informative individuals for genomic imputation focus solely on identifying key ancestors, which can reduce the phasing accuracy of the reference population. Currently, many methods are applied independently to investigate complex population structures. Based on the eigenvalue decomposition (EVD) of a genomic relationship matrix, we describe a novel approach to evaluate the genetic contribution of individuals to population structure. We combined the identification of key contributors with model-based clustering and population network visualization into an integrated three-step approach, which allows the identification of high-resolution population structures and substructures around such key contributors. The approach was applied and validated in four disparate datasets: a simulated population (5,100 individuals and 10,000 SNPs), a highly structured experimental sheep population (1,421 individuals and 44,693 SNPs), and two large, complex pedigree populations, namely horse (1,077 individuals and 38,124 SNPs) and cattle (2,457 individuals and 45,765 SNPs). In the simulated and experimental sheep datasets, our unsupervised method successfully identified all known key contributors. Applying our three-step approach to the horse and cattle populations, we observed high-resolution population substructures, as well as the absence of obvious important key contributors.
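The EVD step can be illustrated with a minimal Python sketch. It builds a genomic relationship matrix with VanRaden's first method (an assumption; the abstract does not say which relationship matrix was used) and then scores each individual from the leading eigenpairs. The scoring rule here, an eigenvalue-weighted sum of squared loadings normalised to sum to one, is a hypothetical stand-in for the genetic contribution score, not the formula from the paper; `vanraden_grm` and `contribution_scores` are illustrative names.

```python
import numpy as np


def vanraden_grm(genotypes: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix, VanRaden (2008) method 1.

    genotypes: (n_individuals, n_snps) array coded 0/1/2 (allele counts).
    """
    p = genotypes.mean(axis=0) / 2.0          # allele frequency of each SNP
    z = genotypes - 2.0 * p                   # centre each SNP column on 2p
    scale = 2.0 * np.sum(p * (1.0 - p))       # VanRaden scaling constant
    return (z @ z.T) / scale


def contribution_scores(grm: np.ndarray, k: int) -> np.ndarray:
    """HYPOTHETICAL per-individual score from the k leading eigenpairs.

    Assumed rule: sum_i lambda_i * v_ij**2 over the k largest eigenvalues,
    normalised so that the scores sum to 1. Not the published formula.
    """
    eigvals, eigvecs = np.linalg.eigh(grm)    # symmetric EVD, eigenvalues ascending
    top = np.argsort(eigvals)[::-1][:k]       # indices of the k largest eigenvalues
    raw = (eigvecs[:, top] ** 2) @ eigvals[top]
    return raw / raw.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    geno = rng.integers(0, 3, size=(30, 400)).astype(float)  # toy 0/1/2 genotypes
    grm = vanraden_grm(geno)
    scores = contribution_scores(grm, k=5)
    print("highest-scoring individuals:", np.argsort(scores)[::-1][:3])
```

When all components are kept, this score reduces to each individual's diagonal relationship divided by the trace of the matrix, so truncating to k components is what lets individuals that dominate the main axes of variation stand out.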
Furthermore, we show that, compared with commonly applied strategies for selecting informative individuals for genotype imputation, including the computation of marginal gene contributions (Pedig) and the optimization of genetic relatedness (Rel), the selection of key contributors provided the highest phasing accuracies within the selected reference populations. The presented approach opens new perspectives for the characterization and informed management of populations in general, and in particular in areas such as conservation genetics and selective animal breeding, where assessing the genetic contribution of influential and admixed individuals is crucial for research and management applications. As such, this method provides a valuable complement to commonly applied tools for visualizing complex population structures and selecting individuals for re-sequencing.
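For comparison, a relatedness-based selection can be sketched as a greedy search that repeatedly adds the candidate least related to the individuals already chosen. This is only one simple heuristic assumed for illustration: the abstract does not describe how the Rel strategy optimises genetic relatedness, and `greedy_low_relatedness` is a hypothetical name.

```python
import numpy as np


def greedy_low_relatedness(grm: np.ndarray, m: int) -> list[int]:
    """Pick m individuals with low mutual relatedness (simple greedy heuristic).

    Hypothetical illustration of a relatedness-based selection, not the Rel method.
    """
    n = grm.shape[0]
    if not 0 < m <= n:
        raise ValueError("m must be between 1 and the number of individuals")
    off_diag = grm - np.diag(np.diag(grm))             # ignore self-relationships
    selected = [int(np.argmin(off_diag.sum(axis=1)))]  # start with the least related individual
    remaining = sorted(set(range(n)) - set(selected))
    while len(selected) < m:
        # candidate whose summed relationship to the current selection is smallest
        best = min(remaining, key=lambda j: grm[j, selected].sum())
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    grm = np.array([[1.0, 0.5, 0.0, 0.2],
                    [0.5, 1.0, 0.4, 0.3],
                    [0.0, 0.4, 1.0, 0.0],
                    [0.2, 0.3, 0.0, 1.0]])
    print(greedy_low_relatedness(grm, 3))  # prints [2, 0, 3]
```

A greedy search is not guaranteed to find the set with the lowest possible average relatedness, but it is cheap and makes the contrast with an EVD-based choice easy to see.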


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Workflow of the high-resolution population structure analysis.
Schematic representation of the different analyses involved in the integrated three-step procedure.
Fig 2. Identification of key contributors within the four populations.
Proportion of variation explained by the significant components, and genetic contribution scores (gc_j), for each selected dataset (A-F). Top key contributors according to the number of significant components are indicated by red (male) and green (female) stars.
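How many components count as significant is not specified in this caption. A common rule, assumed here purely for illustration, keeps the smallest number of leading eigenvalues whose cumulative share of the total reaches a chosen threshold. The function name and the default threshold of 0.9 are hypothetical.

```python
import numpy as np


def n_significant_components(eigvals, threshold: float = 0.9) -> int:
    """Smallest k whose k largest eigenvalues explain >= `threshold` of the total.

    The cumulative-variance rule and the default threshold are assumptions.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must lie in (0, 1]")
    lam = np.clip(np.asarray(eigvals, dtype=float), 0.0, None)  # drop tiny negatives from round-off
    lam = np.sort(lam)[::-1]                                      # largest first
    cumulative = np.cumsum(lam) / lam.sum()                       # proportion of variation explained
    k = int(np.searchsorted(cumulative, threshold) + 1)           # first k reaching the threshold
    return min(k, len(lam))                                       # guard against round-off at 1.0


if __name__ == "__main__":
    print(n_significant_components([5.0, 3.0, 1.0, 1.0], threshold=0.75))  # prints 2
```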
Fig 3. Admixture of experimental sheep.
Cluster assignment assessed with Admixture at K = 2. Each individual is represented by a single vertical column, whilst the length of each colored segment represents the estimated level of admixture (Awassi = brown; Merino = yellow).
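ADMIXTURE writes its ancestry estimates to a `.Q` file with one line per individual and K whitespace-separated proportions per line. The sketch below turns such lines into a per-individual admixture value and a node colour like those in the network figures. Which column corresponds to Merino, and the brown and yellow RGB values, are assumptions chosen for illustration; the function names are hypothetical.

```python
def parse_q(text: str) -> list[list[float]]:
    """Parse ADMIXTURE .Q content: one individual per line, K proportions per line."""
    return [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]


def admixture_colour(a: float,
                     low=(139, 69, 19),    # assumed brown for a = 0 (Awassi)
                     high=(255, 215, 0)    # assumed yellow for a = 1 (Merino)
                     ) -> str:
    """Linearly blend two RGB colours by the admixture proportion a in [0, 1]."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("admixture proportion must lie in [0, 1]")
    r, g, b = (round(lo + a * (hi - lo)) for lo, hi in zip(low, high))
    return f"#{r:02x}{g:02x}{b:02x}"


if __name__ == "__main__":
    q_text = "0.9 0.1\n0.5 0.5\n0.02 0.98\n"  # toy K = 2 output
    MERINO_COL = 1                             # assumption: column holding the Merino proportion
    for row in parse_q(q_text):
        a_j = row[MERINO_COL]
        print(a_j, admixture_colour(a_j))
```

Because cluster labels in ADMIXTURE output are arbitrary, the Merino column would in practice be identified from animals of known breed.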
Fig 4. High-resolution population structure of experimental sheep.
Network visualization of 1,421 sheep. Each sheep is represented by a node, with node size associated with gc_j, whilst node colors represent a_j between Awassi (brown) and Merino (yellow). The top seven key contributors are represented by an increased node size. Edge thickness varies in proportion to the genetic distance to visualize individual relationships within the population. The progeny of the four different F1 sires are separated by dashed lines, whilst the different progeny clusters, backcross (BC), double backcross (DBC) and intercross (INT), are denoted by arrows.
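The captions do not state how edges are chosen. One widely used construction for population networks, assumed here, links two individuals when each is among the other's k nearest neighbours in the genetic distance matrix and keeps the distance as the edge weight, which a plotting tool could then map to edge thickness. The function name and the choice of k are hypothetical.

```python
import numpy as np


def mutual_knn_edges(dist: np.ndarray, k: int) -> list[tuple[int, int, float]]:
    """Return (i, j, distance) for each pair where i and j are mutual k-nearest neighbours."""
    n = dist.shape[0]
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n - 1")
    d = np.array(dist, dtype=float)            # copy so the input is not modified
    np.fill_diagonal(d, np.inf)                # a node is never its own neighbour
    nearest = np.argsort(d, axis=1)[:, :k]     # k closest individuals per row
    neighbours = [set(row.tolist()) for row in nearest]
    edges = []
    for i in range(n):
        for j in neighbours[i]:
            if i < j and i in neighbours[j]:   # mutual pairs only, each listed once
                edges.append((i, j, float(dist[i, j])))
    return sorted(edges)


if __name__ == "__main__":
    pos = np.array([0.0, 1.0, 3.0, 10.0])       # toy 1-D positions
    dist = np.abs(pos[:, None] - pos[None, :])  # pairwise distance matrix
    for edge in mutual_knn_edges(dist, k=2):
        print(edge)
```

Small k yields sparse networks that emphasise close family clusters, while larger k merges them; the value would need tuning for each population.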
Fig 5. High-resolution population structure of horse.
Network visualization of 1,077 horses. Each horse is represented by a node, with node size associated with gc_j, whilst the two node colors represent a_j between Swiss Franches-Montagnes (FM) (green) and Warmblood (red). The top 41 key contributors are represented by an increased node size. Edge thickness varies in proportion to the genetic distance to visualize individual relationships within the population. The topology of the network reflects the population structure of the FM horse breed and reveals substructures formed by the progeny of the most influential stallions. The progeny clusters of the three most influential stallions and of un-genotyped sires (PUG) are indicated by dashed circles.
Fig 6. High-resolution population structure of cattle.
Network visualization of 2,457 cattle. Each animal is represented by a node, with node size associated with gc_j, whilst the dark blue node color indicates the top 55 key contributors. Edge thickness varies in proportion to the genetic distance to visualize individual relationships within the population. The network structure indicates that key contributors are well distributed within the population and highlights a substructure associated with less influential bulls (dashed circle).
Fig 7. Overlap between informative individuals using three different selection strategies.
Venn diagrams representing the overlap between the three selection strategies (Con, Rel and Ped) when selecting the top key contributors in each population.
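The region sizes in such a three-set Venn diagram can be computed directly with set operations. In this sketch the three inputs stand for the IDs of the individuals selected by Con, Rel and Ped; the function name and toy IDs are hypothetical.

```python
def venn3_regions(con: set, rel: set, ped: set) -> dict[str, int]:
    """Sizes of the seven regions of a three-set Venn diagram."""
    return {
        "Con only": len(con - rel - ped),
        "Rel only": len(rel - con - ped),
        "Ped only": len(ped - con - rel),
        "Con & Rel only": len((con & rel) - ped),
        "Con & Ped only": len((con & ped) - rel),
        "Rel & Ped only": len((rel & ped) - con),
        "Con & Rel & Ped": len(con & rel & ped),
    }


if __name__ == "__main__":
    # toy IDs standing in for the individuals picked by each strategy
    print(venn3_regions({1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6}))
```

The seven counts always sum to the size of the union of the three selections, which is a quick consistency check on a plotted diagram.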

