Review
Life (Basel). 2022 Dec 5;12(12):2030.
doi: 10.3390/life12122030.

A Pipeline for Phasing and Genotype Imputation on Mixed Human Data (Parents-Offspring Trios and Unrelated Subjects) by Reviewing Current Methods and Software

Giulia Nicole Baldrighi et al. Life (Basel). 2022.

Abstract

Genotype imputation has become an essential prerequisite when performing association analysis. It is a computational technique that allows us to infer genetic markers that have not been directly genotyped, thereby increasing statistical power in subsequent association studies, which consequently has a crucial impact on the identification of causal variants. Many features need to be considered when choosing the proper algorithm for imputation, including the target sample on which it is performed, i.e., related individuals, unrelated individuals, or both. Problems could arise when dealing with a target sample made up of mixed data, composed of both related and unrelated individuals, especially since the scientific literature on this topic is not sufficiently clear. To shed light on this issue, we examined existing algorithms and software for performing phasing and imputation on mixed human data from SNP arrays, specifically when related subjects belong to trios. By discussing the advantages and limitations of the current algorithms, we identified LD-based methods as being the most suitable for reconstruction of haplotypes in this specific context, and we proposed a feasible pipeline that can be used for imputing genotypes in both phased and unphased human data.

Keywords: LD-based method; SNPs; imputation pipeline; mixed data; trios; unrelated subjects.
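
To make the phasing problem on trios concrete, the following minimal Python sketch applies plain Mendelian transmission rules to a single biallelic SNP. It is an illustration only, not the LD-based approach or the pipeline proposed in the review; the function names `can_transmit` and `phase_child` and the 0/1/2 alternate-allele-count coding are assumptions made for this example.

```python
from typing import Optional, Tuple


def can_transmit(parent_gt: int, allele: int) -> bool:
    """Return True if a parent with genotype 0/1/2 (alt-allele count)
    carries at least one copy of `allele` (0 = ref, 1 = alt)."""
    return parent_gt >= 1 if allele == 1 else parent_gt <= 1


def phase_child(child: int, father: int, mother: int) -> Optional[Tuple[int, int]]:
    """Phase one biallelic SNP in a child using Mendelian rules only.

    Returns (paternal_allele, maternal_allele) when the trio genotypes
    determine the phase uniquely, None when the phase is ambiguous, and
    raises ValueError when the trio is Mendelian-inconsistent.
    """
    # Enumerate every (paternal, maternal) allele pair that matches the
    # child's genotype and that each parent is able to transmit.
    candidates = [
        (pat, mat)
        for pat in (0, 1)
        for mat in (0, 1)
        if pat + mat == child
        and can_transmit(father, pat)
        and can_transmit(mother, mat)
    ]
    if not candidates:
        raise ValueError("Mendelian inconsistency in trio genotypes")
    if len(candidates) == 1:
        return candidates[0]
    return None  # e.g. child and both parents heterozygous


if __name__ == "__main__":
    print(phase_child(1, 1, 0))  # alt allele must come from the father
    print(phase_child(1, 1, 1))  # all heterozygous: phase is ambiguous
```

Sites where the child and both parents are heterozygous stay unresolved under these rules, and unrelated subjects have no parental genotypes at all, which is one reason statistical methods such as the LD-based ones discussed in the abstract are needed for mixed data.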

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Workflow followed for the critical review. The steps of the critical review were grouped into four areas: identification of the problem, screening of related literature, eligibility of scientific works, and summary of the qualitative data according to the criteria stated in the Materials and Methods section.
Figure 2
Qualitative synthesis. The figure shows the 7 scientific works [15,25,52,55,58,61,65], selected out of 15, on which we based the qualitative synthesis for our pipeline, along with highlights of their utility in choosing the pipeline steps.
Figure 3
Pipelines for genotype imputation on mixed-type data. The scheme summarizes the key steps of the pipeline: pre-processing steps (top left region enclosed in the purple square), then either phasing and imputation (blue arrow) for phased data or direct imputation without phasing (red arrow) for unphased data. Only essential commands and options are reported. A detailed list of all the functions and options is available on the respective software websites.

References

    1. Marchini J., Howie B. Genotype Imputation for Genome-Wide Association Studies. Nat. Rev. Genet. 2010;11:499–511. doi: 10.1038/nrg2796.
    2. Daya M., der Merwe L., Galal U., Möller M., Salie M., Chimusa E.R., Galanter J.M., van Helden P.D., Henn B.M., Gignoux C.R., et al. A Panel of Ancestry Informative Markers for the Complex Five-Way Admixed South African Coloured Population. PLoS ONE. 2013;8:e82224. doi: 10.1371/journal.pone.0082224.
    3. Ha N.T., Freytag S., Bickeboeller H. Coverage and Efficiency in Current SNP Chips. Eur. J. Hum. Genet. 2014;22:1124–1130. doi: 10.1038/ejhg.2013.304.
    4. Howie B., Marchini J., Stephens M. Genotype Imputation with Thousands of Genomes. G3 Genes Genomes Genet. 2011;1:457–470. doi: 10.1534/g3.111.001198.
    5. Yu K., Das S., LeFaive J., Kwong A., Pleiness J., Forer L., Schönherr S., Fuchsberger C., Smith A.V., Abecasis G.R. Meta-Imputation: An Efficient Method to Combine Genotype Data after Imputation with Multiple Reference Panels. Am. J. Hum. Genet. 2022;109:1007–1015. doi: 10.1016/j.ajhg.2022.04.002.