BMC Bioinformatics. 2020 Apr 3;21(1):131.
doi: 10.1186/s12859-020-3462-5.

MI-MAAP: marker informativeness for multi-ancestry admixed populations

Siqi Chen et al. BMC Bioinformatics.

Abstract

Background: Admixed populations arise when two or more previously isolated populations interbreed. A powerful approach to addressing the genetic complexity of admixed populations is to infer ancestry. Ancestry inference, which covers both the proportion of an individual's genome derived from each ancestral population and the ancestral origin of segments along the chromosomes of admixed individuals, requires ancestry informative markers (AIMs) from reference ancestral populations. AIMs exhibit substantial differences in allele frequency between ancestral populations. Given the huge amount of human genetic variation data available from diverse populations, a computationally feasible and cost-effective approach is increasingly needed to extract or filter the AIMs with the maximum information content for ancestry inference, admixture mapping, forensic applications, and the detection of genomic regions under recent selection.

Results: To address this gap, we present MI-MAAP, an easy-to-use web-based bioinformatics tool designed to prioritize informative markers for multi-ancestry admixed populations by utilizing feature selection methods and multiple genomics resources, including the 1000 Genomes Project and the Human Genome Diversity Project. Specifically, the tool implements a novel allele frequency-based feature selection algorithm, Lancaster Estimator of Independence (LEI), as well as genotype-based methods such as Principal Component Analysis (PCA), Support Vector Machine (SVM), and Random Forest (RF). We demonstrated that MI-MAAP is a useful tool for prioritizing informative markers and accurately classifying ancestral populations. LEI is an efficient feature selection strategy for retrieving ancestry informative variants that differ in allele frequency or selection pressure among (or between) ancestries, without requiring computationally expensive individual-level genotype data.
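
To make the idea of allele frequency-based marker ranking concrete, here is a minimal Python sketch. It does not implement LEI, whose definition is not given in this abstract. As stand-ins it uses two generic measures: the largest pairwise allele frequency difference and Rosenberg's informativeness for assignment (I_n). The function names `max_delta` and `rosenberg_in` are hypothetical, and the example frequencies are those quoted for rs2294368 in the Fig. 4 caption.

```python
import math
from itertools import combinations


def max_delta(freqs):
    """Largest pairwise absolute allele frequency difference across populations."""
    return max(abs(a - b) for a, b in combinations(freqs, 2))


def rosenberg_in(freqs):
    """Informativeness for assignment (I_n) of a biallelic marker.

    freqs holds the frequency of one allele in each of K populations;
    the other allele's frequency is taken as 1 - p.
    NOTE: this is a swapped-in, generic measure, not the LEI statistic.
    """
    k = len(freqs)
    total = 0.0
    for allele_freq in (lambda p: p, lambda p: 1.0 - p):
        per_pop = [allele_freq(p) for p in freqs]
        mean = sum(per_pop) / k
        if mean > 0:
            total -= mean * math.log(mean)  # entropy of the pooled frequency
        # subtract the average within-population entropy (0 * log 0 treated as 0)
        total += sum(p * math.log(p) for p in per_pop if p > 0) / k
    return total


# Allele frequencies quoted for rs2294368 in the Fig. 4 caption
rs2294368 = {"CEU": 0.79, "CHB": 0.17, "YRI": 0.99}
freqs = list(rs2294368.values())
print("max delta:", round(max_delta(freqs), 2))
print("I_n:", round(rosenberg_in(freqs), 3))
```

Both measures need only population-level allele frequencies, which is the property the abstract highlights for LEI. A marker with identical frequencies in every population scores zero on both, and a marker fixed for different alleles in two populations reaches the two-population maximum of I_n, ln 2.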

Conclusions: MI-MAAP has a user-friendly interface which provides researchers an easy and fast way to filter and identify AIMs. MI-MAAP can be accessed at https://research.cchmc.org/mershalab/MI-MAAP/login/.

Keywords: AIMs; Ancestry informative markers; Admixed population; Admixture mapping; LEI; Lancaster estimator of Independence; MI-MAAP.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Schematic representation of the MI-MAAP workflow. Once the user enters the information and clicks the submit button, information is mined from specific local databases, and the MI-MAAP algorithm computes all comparisons for LEI to output Ancestry Informative Markers (AIMs).
Fig. 2
Geographical distribution of the 26 populations in the 1000 Genomes Project. The major continental regions are represented. Source: The International Genome Sample Resource (IGSR) and 1000 Genomes Project (https://www.internationalgenome.org).
Fig. 3
MI-MAAP web interface. The architectural design consists of six layers in which a user selects populations, markers and filtering criteria to generate AIMs. The web interface allows the user either to display the information in the browser or to download it to a local hard drive.
Fig. 4
SNP output display from chromosome 22 for populations CEU, CHB and YRI. We identified a set of SNPs on chromosome 22 and investigated their ancestry informativeness among the three populations CEU, CHB and YRI. Using MI-MAAP, we computed the LEI measure for this set of SNPs. The SNP rs2294368 has the highest LEI value (0.78), as it exhibits relatively large allele frequency differences among the three populations (CEU: 0.79, CHB: 0.17, and YRI: 0.99), which suggests that it is the most ancestry informative of the SNPs under consideration.
Fig. 5
SNP attributes from chromosome 22 for the CEU, CHB and YRI populations: (a) SNP/Gene Information, (b) Gene Expression Information and (c) Ortholog Information.
Fig. 6
Scatterplots of the first two principal components, PC1 and PC2. (a) CEU, CHB and YRI populations: the two-dimensional PCA plot reveals distinct separation of the CEU, CHB and YRI racial ancestry populations. The first PC accounts for 73.5% of the total variation and clearly distinguishes the African samples (Yoruba) from the non-African samples (Han Chinese). The second PC, accounting for 6.8% of the total variation, distinguishes between Europeans and Han Chinese. (b) ASW, CEU and YRI populations: the CEU and YRI samples form relatively dense clusters, whereas the ASW samples are less dense and have a large sample variance. Most of the ASW samples are much closer to YRI than to CEU, and CEU is separated from the other two populations along the PC1 axis. The first PC explains 64.5% of the total variance and the second PC explains 6.4%.
