Brief Bioinform. 2022 Mar 10;23(2):bbac043.
doi: 10.1093/bib/bbac043.

Fully exploiting SNP arrays: a systematic review on the tools to extract underlying genomic structure

Laura Balagué-Dobón et al. Brief Bioinform.

Abstract

Single nucleotide polymorphisms (SNPs) are the most abundant type of genomic variation and the most accessible to genotype in large cohorts. However, they individually explain a small proportion of phenotypic differences between individuals. Ancestry, collective SNP effects, structural variants, somatic mutations or even differences in historic recombination can potentially explain a high percentage of genomic divergence. These genetic differences can be infrequent or laborious to characterize; however, many of them leave distinctive marks on the SNPs across the genome allowing their study in large population samples. Consequently, several methods have been developed over the last decade to detect and analyze different genomic structures using SNP arrays, to complement genome-wide association studies and determine the contribution of these structures to explain the phenotypic differences between individuals. We present an up-to-date collection of available bioinformatics tools that can be used to extract relevant genomic information from SNP array data including population structure and ancestry; polygenic risk scores; identity-by-descent fragments; linkage disequilibrium; heritability and structural variants such as inversions, copy number variants, genetic mosaicisms and recombination histories. From a systematic review of recently published applications of the methods, we describe the main characteristics of R packages, command-line tools and desktop applications, both free and commercial, to help make the most of a large amount of publicly available SNP data.

Keywords: GWAS; SNP arrays; bioinformatic methods; genomic structures; software; structural variants.

Figures

Figure 1
Chromosomal inversions appear when two breaks occur in the same chromosome and the cleaved fragment rotates before rejoining. They can be found in the heterozygous (center) or homozygous (right) state. One method for inversion detection is the clustering performed by invClust, which classifies the inversion genotypes into clusters of similar haplotype origin.
Figure 2
Representation of a copy number variant (CNV) region in a normal state, with a gain of genetic material, with a loss of genetic material and with copy-number-neutral loss of heterozygosity (CNN LOH).
Figure 3
Changes in the B-allele frequency (BAF) and log R ratio (LRR) within CNVs of different types. (Orange) Normal state, where the BAF (a measure of heterozygosity) is on average 0 or 1 for homozygous probes and 0.5 for heterozygous probes, and the LRR (a normalized measure of DNA content) is on average 0. (Blue) A copy number (CN) gain is represented by a split of the BAF signal at 1/3 and 2/3 and a gain in LRR. (Red) A CN loss is represented by the loss of the BAF band for heterozygous probes (0.5) and a loss in LRR signal. (Green) Loss of heterozygosity is represented by the loss of the BAF band for heterozygous probes with no change in the LRR signal.
Figure 4
Changes in the BAF and LRR depending on the type of mosaicism. (Top) A mosaic CN gain is represented by a split of the BAF signal into two bands with values between 1/3 and 2/3 and a gain in LRR. (Middle) A mosaic CN loss is represented by a BAF split into two bands between 0 and 1 and a loss in LRR. (Bottom) A mosaic loss of heterozygosity is represented by a BAF split into two bands between 0 and 1 and a normal LRR.
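
The BAF and LRR patterns described for Figures 3 and 4 can be reproduced by simple allele counting. The following is a minimal sketch, not taken from any of the reviewed tools. It assumes an idealized, noise-free array, a diploid baseline, a heterozygous (AB) probe, and that a fraction of the cells carries the event. The function name `expected_signal` and the parameter `cell_fraction` are hypothetical names chosen for illustration.

```python
import math


def expected_signal(event: str, cell_fraction: float = 1.0):
    """Idealized BAF bands and LRR at a heterozygous (AB) probe.

    event: "normal", "gain", "loss" or "cnn_loh" (hypothetical labels).
    cell_fraction: share of cells carrying the event; 1.0 means every
        cell carries it (the non-mosaic case).
    Returns ((low_baf, high_baf), lrr).
    """
    if not 0.0 <= cell_fraction <= 1.0:
        raise ValueError("cell_fraction must lie in [0, 1]")
    f = cell_fraction

    # Average number of B-allele copies per cell for the lower and upper
    # BAF band, and the average total copy number per cell.
    if event == "normal":
        b_low, b_high, total = 1.0, 1.0, 2.0
    elif event == "gain":
        # Affected cells are AAB (lower band) or ABB (upper band).
        b_low, b_high, total = 1.0, 1.0 + f, 2.0 + f
    elif event == "loss":
        # Affected cells keep only A (lower band) or only B (upper band).
        b_low, b_high, total = 1.0 - f, 1.0, 2.0 - f
    elif event == "cnn_loh":
        # Affected cells are AA (lower band) or BB (upper band).
        b_low, b_high, total = 1.0 - f, 1.0 + f, 2.0
    else:
        raise ValueError(f"unknown event: {event!r}")

    bands = (b_low / total, b_high / total)
    # LRR is modeled here as log2 of copy number relative to two copies.
    lrr = math.log2(total / 2.0)
    return bands, lrr


if __name__ == "__main__":
    for event, fraction in [("normal", 1.0), ("gain", 1.0), ("loss", 1.0),
                            ("cnn_loh", 1.0), ("gain", 0.5), ("loss", 0.5),
                            ("cnn_loh", 0.5)]:
        (low, high), lrr = expected_signal(event, fraction)
        print(f"{event:8s} f={fraction:.1f}  "
              f"BAF=({low:.3f}, {high:.3f})  LRR={lrr:+.3f}")
```

With `cell_fraction=1.0` the sketch gives the non-mosaic patterns of Figure 3; smaller fractions move the BAF bands back toward 0.5 and the LRR toward 0, as in the mosaic cases of Figure 4. Real array data are noisy, and measured LRR values are typically less extreme than these idealized ones.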
