Mol Ecol Resour. 2015 Sep;15(5):1031-45.
doi: 10.1111/1755-0998.12369. Epub 2015 Jan 21.

Linkage disequilibrium network analysis (LDna) gives a global view of chromosomal inversions, local adaptation and geographic structure

Petri Kemppainen et al. Mol Ecol Resour. 2015 Sep.

Abstract

Recent advances in sequencing allow population-genomic data to be generated for virtually any species. However, approaches to analyse such data lag behind the ability to generate it, particularly in nonmodel species. Linkage disequilibrium (LD, the nonrandom association of alleles from different loci) is a highly sensitive indicator of many evolutionary phenomena including chromosomal inversions, local adaptation and geographical structure. Here, we present linkage disequilibrium network analysis (LDna), which accesses information on LD shared between multiple loci genomewide. In LD networks, vertices represent loci, and connections between vertices represent the LD between them. We analysed such networks in two test cases: a new restriction-site-associated DNA sequence (RAD-seq) data set for Anopheles baimaii, a Southeast Asian malaria vector; and a well-characterized single nucleotide polymorphism (SNP) data set from 21 three-spined stickleback individuals. In each case, we readily identified five distinct LD network clusters (single-outlier clusters, SOCs), each comprising many loci connected by high LD. In A. baimaii, further population-genetic analyses supported the inference that each SOC corresponds to a large inversion, consistent with previous cytological studies. For sticklebacks, we inferred that each SOC was associated with a distinct evolutionary phenomenon: two chromosomal inversions, local adaptation, population-demographic history and geographic structure. LDna is thus a useful exploratory tool, able to give a global overview of LD associated with diverse evolutionary phenomena and identify loci potentially involved. LDna does not require a linkage map or reference genome, so it is applicable to any population-genomic data set, making it especially valuable for nonmodel species.
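The network construction described in the abstract (loci as vertices, LD as connections) can be illustrated with a minimal Python sketch. This is not the LDna R package. The function names, the toy genotypes, the 0.8 threshold, and the use of squared genotype correlation as the LD measure are all assumptions made for illustration.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 allele counts).

    Assumption: genotype correlation is used as a stand-in LD measure.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # monomorphic locus: LD undefined, treated as 0 here
    return cov * cov / (vx * vy)


def ld_clusters(genotypes, threshold):
    """Connected components of the graph whose edges join loci with r^2 >= threshold."""
    loci = list(genotypes)
    parent = {locus: locus for locus in loci}

    def find(locus):
        # Union-find root lookup with path halving.
        while parent[locus] != locus:
            parent[locus] = parent[parent[locus]]
            locus = parent[locus]
        return locus

    for a, b in combinations(loci, 2):
        if r_squared(genotypes[a], genotypes[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for locus in loci:
        groups.setdefault(find(locus), []).append(locus)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy data: six individuals, four loci.
toy = {
    "locus1": [0, 0, 1, 2, 2, 1],
    "locus2": [0, 0, 1, 2, 2, 1],  # identical to locus1
    "locus3": [2, 2, 1, 0, 0, 1],  # mirror image of locus1
    "locus4": [0, 2, 1, 1, 0, 2],  # weakly correlated with the others
}
clusters = ld_clusters(toy, 0.8)
print(clusters)
```

In this toy case, the three loci in high mutual LD form one cluster and the fourth locus stays on its own.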

Keywords: Anopheles dirus; Anopheles gambiae; chromosomal rearrangement; graph theory; landscape genomics; r package.

Figures

Fig. 1
Outline of linkage disequilibrium network analysis (LDna). (A) Starting from a pairwise matrix of LD values between loci, LDna partitions all loci into clusters comprising vertices (loci) connected by edges that represent LD values above given thresholds. (B) The order in which clusters merge with decreasing threshold can be visualized as a tree where only one connection between clusters is required for clusters to be considered as merged. For each cluster in the tree, the change in median LD of all pairwise connections between loci in a cluster at merger is measured by λ (see Materials and methods). (C) All λ values plotted in order of increasing value (Index). Clusters with exceptionally high values of λ relative to the median across all the values in a tree (above the user-controlled dashed line) are considered as outliers. In (B) and (C), red colour highlights clusters that do not have any other outlier clusters nested within them (single-outlier clusters, SOCs), and blue highlights the outlier cluster that contains multiple SOCs (compound outlier cluster, COC).
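The merging order in panel (B), where a single connection is enough to merge two clusters as the threshold decreases, corresponds to single-linkage clustering. The sketch below records those merge events in Python. The function name, data layout, and toy LD values are hypothetical. λ is not computed here because its definition is given in the paper's Materials and methods, which are not reproduced on this page.

```python
def single_linkage_merges(ld):
    """Record cluster mergers as the LD threshold decreases.

    ld: dict mapping frozenset({locus_a, locus_b}) -> LD value (hypothetical layout).
    Returns a list of (threshold, sorted members of the merged cluster).
    """
    loci = sorted({locus for pair in ld for locus in pair})
    parent = {locus: locus for locus in loci}
    members = {locus: [locus] for locus in loci}

    def find(locus):
        # Union-find root lookup with path halving.
        while parent[locus] != locus:
            parent[locus] = parent[parent[locus]]
            locus = parent[locus]
        return locus

    merges = []
    # Visit edges from highest to lowest LD; one edge is enough to merge two clusters.
    for pair, value in sorted(ld.items(), key=lambda item: -item[1]):
        a, b = (find(locus) for locus in pair)
        if a != b:
            parent[a] = b
            members[b] = sorted(members[b] + members.pop(a))
            merges.append((value, members[b]))
    return merges


# Hypothetical pairwise LD values for four loci.
toy_ld = {
    frozenset({"A", "B"}): 0.9,
    frozenset({"C", "D"}): 0.8,
    frozenset({"A", "C"}): 0.3,
    frozenset({"B", "D"}): 0.2,
    frozenset({"A", "D"}): 0.1,
    frozenset({"B", "C"}): 0.1,
}
merges = single_linkage_merges(toy_ld)
for threshold, cluster in merges:
    print(threshold, cluster)
```

Each recorded merger corresponds to one internal node of a tree like the one in panel (B), with the threshold giving its height.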
Fig. 2
LDna on data simulated for a subdivided population. (A) Outline of modelled scenario where an ancestral population splits into three populations followed by 1000 generations of independent evolution. (B) Resulting LDna network showing clusters formed above an LD threshold of 0.8. (C) Tree showing LD clusters across LD thresholds (comparable to Fig. 1B). LDna identified three SOCs, highlighted in red, at the parameter values shown. (D) PCAs for each of the three SOCs identified in (C). The amount of variation explained is indicated on each axis.
Fig. 3
LDna of Anopheles baimaii RAD sequence data set. (A) A clustering tree (cf. Figs 1B and 2C) of all pairwise r² values from 3828 SNPs derived from a landscape genomics RAD sequence data set from A. baimaii. Branches corresponding to SOCs and COCs are indicated in red and blue, respectively, throughout the figure. (B) All λ values in increasing order with values above λlim corresponding to outlier clusters. Parameter values for φ and |E|min are shown above plots (A) and (B). See Fig. 6 and Appendix S1 and S2, Supporting Information, for details of parameter value selection. (C) A snapshot of a full network at an LD threshold value just above that at which any of the five SOCs merge. (D) Each SOC is shown at an LD threshold where it is joined by a single link to other loci, in decreasing order of threshold from left to right, top to bottom. For each of these mergers, we have indicated, in brackets after the COC name, which SOCs are nested within each COC. COCs are shown here but were not analysed further.
Fig. 4
Population-genetic analyses of LD clusters from Anopheles baimaii. (A) For the non-SOC loci and each set of SOC loci, individuals were separated into genetically distinct groups (see main text and Fig. S2 for details) and coloured according to these groups. In (A), the separation of these groups (numbered 1–4) is visualized along the first two PCA axes, with per cent variation explained indicated on the axes. (B) The distribution of FIS values for loci from each group indicated in (A).
Fig. 5
Mapping of SOC loci. For each linkage group, the linkage map (from Anopheles baimaii) is shown to the left and genomic scaffolds (from A. dirus) to which SOC loci map are shown to the right. Accession numbers are given above each scaffold. Horizontal bars indicate the positions of loci, coloured according to the figure key. Length of the bars indicates the median of all intracluster r² values. The asterisk indicates one locus from SOC 1128_0.27 that mapped far from all other loci from this SOC. Two scaffolds for SOC 739_0.48 (top left corner) could not be anchored to the linkage maps.
Fig. 6
The effects of parameter choice on LDna. The two user-defined input parameters for LDna are φ, which controls when clusters are defined as outliers, and |E|min, the minimum number of edges required for a cluster to be considered as an outlier. (A) We used the results from the original LDna analyses (that identified five SOCs associated with inversions) as a reference point ①. With respect to this reference, we assessed how many of the SOCs were not identified (losses), and how many additional SOCs were identified (gains) by LDna. White indicates parameter space where results exactly matched the reference. In addition to the reference (Tree ①), (B) shows five examples of LDna results (Trees ②–⑥) at different combinations of φ and |E|min as indicated above the trees and in (A).
Fig. 7
LDna on population-genomic data from the three-spined stickleback. (A) A clustering tree of pairwise LD values among 5962 SNPs from combined freshwater and marine ecotypes from the Atlantic and Pacific oceans. The data set includes only SNPs from the three chromosomes (I, XI and XXI) that contain known inversions. Clusters identified as SOCs by LDna (at the parameter values indicated in the figure) are also shown with likely evolutionary cause indicated (see main text and Fig. 8 for details). (B) A full network for LD threshold = 0.95. Each locus is coloured according to the chromosome to which it belongs: green, red and blue for I, XI and XXI, respectively. All large clusters (|E| > 10 at a threshold of 0.95) with loci from more than one chromosome are nested within SOC 495_0.82.
Fig. 8
Population-genetic analyses of stickleback SOCs. (A) Population structuring of SOC loci (as identified in Fig. 7A) based on the first two components from PCA. Each circle represents an individual, coloured blue for Freshwater or red for Marine environment. Open and filled circles represent Pacific or Atlantic origin, respectively. Per cent variation explained by each component is indicated along the axes. (B) Bars show the positions of SOC loci on each of the chromosomes: I, XI and XXI. Each column has loci from one SOC as labelled in part (A) above. Bar height shows the median of all intra-SOC LD values for a given locus. Green regions indicate the position of inversions on each chromosome.
