Commun Biol. 2025 Apr 9;8(1):584.
doi: 10.1038/s42003-025-08020-z.

Moroccan genome project: genomic insight into a North African population

Elmostafa El Fahime et al. Commun Biol. 2025.

Abstract

Africa's 1.5 billion people are underrepresented in genomic databases. The African Genome Variation Project focuses exclusively on Sub-Saharan populations, making Morocco, located in North Africa, a valuable site for studying genetic diversity. Understanding genetic variation and developing customized therapy require population-specific reference genomes. This study presents Phase 1 results from the Moroccan Genome Project (MGP), which sequenced 109 Moroccan genomes. We report over 27 million variants, including 1.4 million novel ones, of which 15,378 are highly prevalent in the Moroccan population. Furthermore, we propose a Moroccan Major Allele Reference Genome (MMARG), generated using high-coverage consensus sequences from the 109 whole genomes. The MMARG represents Moroccan genetic variation more accurately than GRCh38. This baseline study also generates an informative genetic variation database that supports regional population-specific initiatives and precision medicine in Morocco and North Africa. The results stress the necessity of population-relevant data in human genetic research.
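
To illustrate the general idea of a major-allele reference, the sketch below substitutes the alternate allele into a reference sequence wherever its frequency exceeds 50%. It is a minimal illustration under stated assumptions, not the MGP pipeline: the function name `major_allele_reference`, the 0-based positions, the restriction to single-nucleotide variants, and the 0.5 threshold are all hypothetical choices made for this example.

```python
def major_allele_reference(reference, snvs, threshold=0.5):
    """Return a copy of `reference` with major alternate alleles substituted.

    reference : str, the reference sequence (e.g. a short stretch of GRCh38)
    snvs      : iterable of (pos, ref, alt, alt_af) tuples, pos is 0-based
    threshold : alternate-allele frequency above which alt replaces ref
                (0.5 is an assumption for this sketch)
    """
    seq = list(reference)
    for pos, ref, alt, alt_af in snvs:
        # Guard against variants that do not match the reference base.
        if reference[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        # Only alleles carried by the majority of chromosomes are swapped in.
        if alt_af > threshold:
            seq[pos] = alt
    return "".join(seq)


# Toy input: one alternate allele is the major allele, the other is not.
toy_snvs = [(1, "C", "T", 0.8), (3, "T", "G", 0.2)]
print(major_allele_reference("ACGT", toy_snvs))
```

Only the first toy variant is applied, because its alternate allele is carried by more than half of the chromosomes. Indels and multi-allelic sites would need coordinate handling that this sketch leaves out.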

Conflict of interest statement

Competing interests: The authors declare no competing interests. Inclusion and ethics statement: All collaborators of this study who met the authorship criteria set by Nature Portfolio journals have been included as authors, as their contributions were vital to the design and implementation of the research.

Figures

Fig. 1. Histogram depicting allele frequency (AF) distribution for all filtered variants across 109 Moroccan samples.
The histogram shows the number of variants in each 5% AF interval. The most prominent peak corresponds to rare variants, with alternate AFs below 5%. Variants with 100% alternate AFs are sites where the reference allele is rare or unobserved in the cohort, and they are less common.
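
The binning described in the Fig. 1 caption can be sketched as follows. This is an illustrative sketch, not the authors' code. It assumes diploid autosomal calls (so 109 samples give 218 alleles), half-open 5% bins with 100% folded into the last bin, and a hypothetical function name `af_histogram`. Working from integer allele counts avoids floating-point rounding at bin edges.

```python
def af_histogram(alt_counts, n_samples, bin_width_pct=5):
    """Count variants per alternate-allele-frequency bin.

    alt_counts    : iterable of alternate allele counts, one per variant
    n_samples     : number of diploid samples (assumption: autosomes only)
    bin_width_pct : bin width in percent; must divide 100 evenly
    """
    if 100 % bin_width_pct != 0:
        raise ValueError("bin_width_pct must divide 100")
    total_alleles = 2 * n_samples
    n_bins = 100 // bin_width_pct
    counts = [0] * n_bins
    for ac in alt_counts:
        if not 0 <= ac <= total_alleles:
            raise ValueError(f"allele count {ac} out of range")
        # Integer form of floor(AF * 100 / bin_width_pct).
        idx = (ac * 100) // (total_alleles * bin_width_pct)
        # AF == 100% would land one past the end; fold it into the last bin.
        counts[min(idx, n_bins - 1)] += 1
    return counts


# Toy allele counts for a 109-sample cohort (not data from the study).
toy_counts = af_histogram([1, 10, 11, 109, 218], n_samples=109)
print(toy_counts[0], toy_counts[1], toy_counts[10], toy_counts[19])
```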
Fig. 2. Circular plot showing the spatial distribution of variant counts in 2 Mbp windows and pathogenic variants across the exome.
From outer to inner rings: blue represents SNV distribution, green shows deletions, orange indicates insertions, and red depicts complex variants (scales adjusted for visibility). The innermost ring displays pathogenic variants and their frequency in the Moroccan population.
Fig. 3. Genetic structure of the Moroccan population.
a Principal component analysis (PCA) of 3,586 individuals representing various populations worldwide. Points are color-coded by superpopulation. b ADMIXTURE results at K = 19 with a zoom on the Moroccan population, showing four major ancestral components. c Heat map of pairwise Fst values between Moroccan genomes and various populations. The values shown are Fst multiplied by 1000. d Box plot of the total length of runs of homozygosity (ROHs) for Moroccans compared with other populations. Colors indicate superpopulations. The number of individuals per population is shown in brackets. Box plots indicate the median and lower/upper quartiles; whiskers extend to the most extreme data points not exceeding 1.5 times the interquartile range; outliers are data points that fall outside the whiskers. P-values comparing the mean total lengths of ROHs were estimated using ggpubr. Populations for the Fst and ROH calculations were chosen for their proximity to the Moroccan population in the PCA and ADMIXTURE results. The PCA, Fst, and ROH results were visualized using R; the ADMIXTURE results were visualized with Pong v1.5.
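
The box plot convention in panel d (quartiles, whiskers limited to 1.5 times the interquartile range, outliers beyond them) can be made concrete with a short sketch. This is an assumption-laden illustration rather than the authors' R code: the function name `box_stats` is hypothetical, quartiles use Python's "inclusive" interpolation, which may differ slightly from the plotting library used in the paper, and the input numbers are toy values, not ROH lengths from the study.

```python
import statistics


def box_stats(values):
    """Summarize values with the 1.5 x IQR box plot rule."""
    data = sorted(values)
    # Quartiles by linear interpolation over the observed range.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Toy values only: eight typical observations and one extreme one.
print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

In this toy input the value 100 lies above the upper fence (7 + 1.5 × 4 = 13), so it is reported as an outlier and the upper whisker stops at 8.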
Fig. 4. Mitochondrial haplogroup distribution and frequency.
a Total haplogroup frequencies for the Moroccan population. b DNA D-loop haplotype network: median-joining network comparing 109 Moroccans with African, European, and American populations. The green circle indicates Moroccan haplogroups.
Fig. 5. Y-chromosome haplogroup distribution in 109 Moroccan males.
The bar chart depicts the frequencies of Y-chromosome haplogroups found in a sample of 109 Moroccans. E1b1b1 is the most common (36.6%), followed by F (19.5%) and G2 (17.1%). Less frequent haplogroups include E1b1, R1, E1, R1b1, and K. Colored bars represent each haplogroup, with corresponding percentages indicated.
