Nat Ecol Evol. 2025 Apr;9(4):652-671.

doi: 10.1038/s41559-025-02643-5. Epub 2025 Mar 28.

Adaptive genomic signatures of globally invasive populations of the yellow fever mosquito Aedes aegypti

Alejandro N Lozada-Chávez^#¹, Irma Lozada-Chávez^#², Niccolò Alfano³,⁴, Umberto Palatini³,⁵, Davide Sogliani³, Samia Elfekih⁶, Teshome Degefa⁷, Maria V Sharakhova⁸, Athanase Badolo⁹, Patchara Sriwichai¹⁰, Mauricio Casas-Martínez¹¹, Bianca C Carlos¹²,¹³, Rebeca Carballar-Lejarazú³,¹⁴, Louis Lambrechts¹⁵, Jayme A Souza-Neto¹²,¹⁶, Mariangela Bonizzoni¹⁷

Affiliations

¹ Department of Biology and Biotechnology, University of Pavia, Pavia, Italy. nabor.lozada@gmail.com.
² Evo-devo, Bioinformatics and Neuromorphic Information Processing groups, Institute of Computer Science and Faculty of Mathematics and Computer Science, Leipzig University, Leipzig, Germany.
³ Department of Biology and Biotechnology, University of Pavia, Pavia, Italy.
⁴ Human Technopole, Milan, Italy.
⁵ Laboratory of Neurogenetics and Behavior, The Rockefeller University, New York, NY, USA.
⁶ Australian Centre for Disease Preparedness, CSIRO Australia Bio21 Institute, School of Biosciences, University of Melbourne, Melbourne, Victoria, Australia.
⁷ School of Medical Laboratory Sciences, Institute of Health, Jimma University, Jimma, Ethiopia.
⁸ Department of Entomology and the Fralin Life Science Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA.
⁹ Laboratoire d'Entomologie Fondamentale et Appliquée, Université Joseph Ki-Zerbo, Ouagadougou, Burkina Faso.
¹⁰ Department of Medical Entomology, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand.
¹¹ Centro Regional de Investigación en Salud Pública, Instituto Nacional de Salud Pública, Tapachula, México.
¹² School of Agricultural Sciences, São Paulo State University, Botucatu, Brazil.
¹³ Research Group on Integrated Pest Management, School of Agronomy, Crop Protection Department, São Paulo State University, Botucatu, Brazil.
¹⁴ Department of Microbiology and Molecular Genetics, University of California, Irvine, Irvine, CA, USA.
¹⁵ Insect-Virus Interactions Unit, Institut Pasteur, Université Paris Cité, CNRS UMR2000, Paris, France.
¹⁶ College of Veterinary Medicine, Kansas State University, Manhattan, KS, USA.
¹⁷ Department of Biology and Biotechnology, University of Pavia, Pavia, Italy. mariangela.bonizzoni@unipv.it.

^# Contributed equally.

PMID: 40155778
PMCID: PMC11976285
DOI: 10.1038/s41559-025-02643-5

Alejandro N Lozada-Chávez et al. Nat Ecol Evol. 2025 Apr.

Abstract

In the arboviral vector Aedes aegypti, adaptation to anthropogenic environments has led to a major evolutionary shift separating the domestic Aedes aegypti aegypti (Aaa) ecotype from the wild Aedes aegypti formosus (Aaf) ecotype. Aaa mosquitoes are distributed globally and have higher vectorial capacity than Aaf, which remained in Africa. Despite the evolutionary and epidemiological relevance of this separation, inconsistent morphological data and a complex population structure have hindered the identification of genomic signals distinguishing the two ecotypes. Here we assessed the correspondence between the geographic distribution, population structure and genome-wide selection of 511 Aaf and 123 Aaa specimens and report adaptive signals in 186 genes that we call Aaa molecular signatures. Our results indicate that Aaa molecular signatures arose from standing variation associated with extensive ancestral polymorphisms in Aaf populations and have been co-opted for self-domestication through genomic and functional redundancy and local adaptation. Overall, we show that the behavioural shift of Ae. aegypti mosquitoes to live in association with humans relied on the fine regulation of chemosensory, neuronal and metabolic functions, as seen in the domestication processes of rabbits and silkworms. Our results also provide a foundation for the investigation of new genic targets for the control of Ae. aegypti populations.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

**Fig. 1. Worldwide population structure and genetic diversity of African and out-of-Africa samples of *Aedes aegypti*.**
a, Map of the worldwide collection sites of *Ae. aegypti* populations used in this study (Supplementary Table 1). The site numbers correspond to the populations shown in b. b, Top: an ADMIXTURE analysis of population structure generated with k = 13 and 1.5 million biallelic NR-SNPs for all of the sampled populations. On the y axis, each vertical bar represents the probability (q values from 0 to 1) of the assignment of a single individual to each genetic cluster. On the x axis, population names and numbers are reported according to the map in a. Based on their primary ancestry assignments, the out-of-Africa populations are grouped into three genetic clusters: America (AME), Asia and the Pacific Islands (PI); the African populations are grouped into four genetic clusters: western (THI and NGY; cluster k₂), western–central (cluster k₅), central (cluster k₃) and eastern Africa (cluster k₄). Bottom: SNP count distribution of the ~314.4 million high-confidence SNPs detected in this study across repetitive and non-repetitive regions of the *Ae. aegypti* genome and for each population analysed (Supplementary Information and Supplementary Data 1). Populations with the lowest numbers of samples are highlighted according to the legend. c, PCA generated with 1.5 million biallelic NR-SNPs for 554 samples (Supplementary Information). The populations are colour coded by country. Samples from the human-feeding mosquitoes from Africa RABd, NGY and THI are highlighted according to the legend. The five clusters depicting western, central, western–central and eastern African populations are highlighted in yellow (see text). AMS, American Samoa; BFA, Burkina Faso; BRZ, Brazil; CAM, Cameroon; GAB, Gabon; GHA, Ghana; NIG, Nigeria; KEN, Kenya; MEX, Mexico; NC, New Caledonia; SAA, Saudi Arabia; SEN, Senegal; THA, Thailand; UGA, Uganda. Map adapted from ref. , GNU General Public Licence.

**Fig. 2. Evolutionary relationships and genetic divergence among 554 *Ae. aegypti* genomes.**
a, Maximum likelihood tree for 554 individuals reconstructed with the core-exome SNP dataset (Methods and Supplementary Information). b, Maximum likelihood tree for 40 populations reconstructed with SNP allele frequencies estimated from the dataset of the maximum likelihood tree in a. In both maximum likelihood trees, *Ae. albopictus* was used as an outgroup and the branch lengths are proportional to the amount of genetic divergence that has occurred, as shown in the corresponding scales (Supplementary Data 3). Bootstrap support for each relationship is colour coded according to the legend. Green stars on both maximum likelihood trees depict the close relationship between human-feeding mosquitoes in Africa (THI, NGY and RABd; indicated by blue circles) and out-of-Africa populations (indicated by red circles). c, Heatmap showing the clustering of pairwise genetic divergences for 40 populations based on weighted F_ST-based distances calculated from the subset of 1.5 million biallelic NR-SNPs present in >90% of all individuals per population, according to the Weir–Cockerham approach and after 1,000 replicates (Supplementary Table 8). The diagonal in the matrix represents the comparison with the same population (zero difference; in black) and the degree of divergence for each comparison is colour coded according to the corresponding legend. Mosquitoes from Africa, out-of-Africa and human-feeding mosquitoes in Africa are depicted in a–c by black, red and blue circles or squares, respectively. Individuals in a and populations in b and c are also colour coded according to the admixture clustering they belong to, as depicted in Fig. 1b.

**Fig. 3. Genomic signals of adaptation across *Ae. aegypti* populations by three methods and prediction of *Aaa* molecular signatures.**
a, Circular Manhattan plot displaying the distribution of candidate adaptive variants detected in out-of-Africa (OoA) populations by three selection-based methods across *Ae. aegypti* chromosomes. The inner circle (A) shows the μ values predicted with RAiSD for 8,120 hard selective sweeps harbouring globally associated variants in OoA populations exclusively; the high-scoring top 1% of signals are shown in black and non-significant signals are shown in grey. The middle circle (B) shows the 10,030 SNP outliers predicted with PCAdapt. Significant OoA-associated outliers (as described in b) are plotted in black and were obtained with an FDR < 1% of the adjusted P values (−log₁₀) from the Mahalanobis test; non-significant outliers are plotted in grey. The outer circle (C) shows the genomic coordinates of 356 protein-coding genes harbouring positively selected signals (in red) in OoA populations exclusively, according to MKT–DoS tests. Genomic coordinates are shown for 186 *Aaa* molecular signature genes identified by intersecting the three methods (Venn diagram; Supplementary Table 26). b, Boxplots depicting the variation of clustering scores from 10,030 outliers detected with PCAdapt across the genome and three selected principal components (Extended Data Fig. 8). Central lines depict mean values, the box edges are the 25th and 75th percentiles and the whiskers represent mean values ± 1.5× the interquartile range. The asterisks represent significant associations of the mean value of clustering scores for that population with both the corresponding principal component (one-sample two-sided t-test; µ ≠ 0; P < 0.001) and Africa or OoA (two-sided pairwise Welch’s t-test; µ_i ≠ µ_j; P < 0.001), underscoring outliers more strongly associated with OoA (PC1 and PC4), Africa (PC2) or both (PC2). All t-test P values were adjusted using the Benjamini–Hochberg method (Supplementary Tables 19–21). 
c, DoS values for 929 protein-coding genes (x axis) plotted across all 40 populations (y axis) for six functional categories considered relevant for *Ae. aegypti*’s domestication and immunity (Supplementary Table 13 and Supplementary Data 6). Note that most genes are weakly selected (DoS score < 0) or evolving (nearly) neutrally (DoS score = 0) across populations (Extended Data Fig. 9c, Supplementary Table 24 and Supplementary Data 8). The frequency (in bars) of positively selected genes across the 40 populations is shown in the outer circle.

**Fig. 4. A look into *Aaa* molecular signature genes.**
a, Annotated Gene Ontology terms for 186 *Aaa* molecular signature genes are significantly enriched (Fisher’s exact test; P < 0.05) in four functional categories: chemosensory, neuronal, metabolic and regulatory (Extended Data Fig. 6d). The bar plot (left) shows the number of genes annotated for each Gene Ontology term. The heatmap shows the enriched Gene Ontology functions that are shared (black squares) across the predictions from the three selection methods (Extended Data Fig. 6e–g). Key examples (right) are highlighted for each category (Supplementary Table 26). b, Manhattan plots for the region between 80 and 120 megabases (Mb) on chromosome 3 displaying the genomic context of signals overlapping five *Aaa* molecular signature genes (red boxes). Metrics for OoA populations are shown in sliding windows of 250 kb; from top to bottom (Supplementary Data 12): RAiSD’s μ values show the high-scoring top 1% of outliers (green dots) within hard selective sweeps; PCAdapt’s adjusted P values (−log₁₀) with FDR < 1% indicate significant OoA-associated outliers (green dots; as described in Fig. 3b); larger F_ST values indicate greater genetic differentiation between OoA and African populations than that detected from the genomic background (lower values); nucleotide diversity (π) and Tajima’s D values show an expected decrease in genetic variation around adaptive outliers. Regions encoding *Aaa* molecular signature genes (pink shadows) show consistent signals of selection and significant association with OoA populations, whereas candidate signals were discarded when they were not consistent with at least two selection methods (grey shadows) or when they were not located within annotated protein-coding genes or ncRNAs (blue shadows). 
c, Boxplots showing significant allele frequency changes (y axis) of non-synonymous SNPs resulting in amino acid changes (x axis) for seven *Aaa* gene markers across OoA, RABd/THI/NGY and all of the other African populations (one-way ANOVA and Tukey’s tests; P < 0.05). All P values from Tukey’s test were adjusted using the Benjamini–Hochberg method (Supplementary Tables 27 and 28). Central lines depict mean values, the box edges are the 25th and 75th percentiles and the whiskers represent the minima and maxima of the datapoints. Significant allele frequency changes for these *Aaa* markers in available samples from Florida (FL) and Colombia (CO) are also depicted (Supplementary Table 28). snoRNA, small nucleolar RNA.

**Extended Data Fig. 1. The workflow of this research.**
We used Whole-Genome Sequencing (WGS) data for 686 *Aedes spp*. mosquitoes to assess: 1) population structure, 2) genetic divergence, and 3) signals of genomic selection between the domestic *Aedes aegypti aegypti* (Aaa) and the generalist *Aedes aegypti formosus* (Aaf) mosquitoes. Left panel: data collection includes 581 WGS sequences publicly available and the sampling/sequencing of 105 mosquitoes from 7 localities, which were also analyzed for sex determination and species identity. Following mapping of WGS to reference genomes, identification of SNPs was performed with two Variant Callings over a custom “golden SNPs dataset”. Middle panel: after filtering of SNP datasets, SNP statistics and genetic diversity were estimated to analyze population structure, phylogenetic relationships and genetic differentiation across populations. Right panel: Candidate adaptive variants were predicted at two scales: (1) ‘globally’ grouping populations from Africa, out-of-Africa and African mosquitoes behaving like ‘Aaa’ (from RABd, NGY and THI populations), which most likely explain the historical switch from ‘Aaf’ to ‘Aaa’ behaviors in *Ae. aegypti*; and (2) ‘locally’ on each population, which most likely reflect a mix between the historical switch from ‘Aaf’ to ‘Aaa’ and “local adaptations” due to recent environmental and anthropogenic pressures. Three different and complementary methods were used for prediction of adaptive outliers: (1) *RAiSD* predicts hard selective sweeps; (2) *PCAdapt* identifies SNP-outliers concerning population structure; and (3) McDonald-Kreitman test (MKT) and its derived Direction of Selection statistic (DoS) estimate gene selection by contrasting polymorphism and divergence data from the closest outgroup *Ae. albopictus*. 
By intersecting the strongest predictions of the global approach in out-of-Africa populations from the three methods, a consensus set of robust adaptive outliers mapping 186 genes is called *“Aaa molecular signatures”*, 68 of which harbor 483 nonsynonymous variants predicted as significant *“Aaa markers”*. Functional assignments and GO-enrichments were performed over robust predicted and curated annotations, followed by estimation of ancestral standing variation across the adaptive variants predicted by each selection method.

**Extended Data Fig. 2. *Nix* gene identification and SNP counts for females and males across *Ae. aegypti* populations.**
(a) PCR results using *Nix*-specific primers in males (lanes 1 and 2), mated (lanes 3 and 4) and virgin (lanes 5 and 6) females. Each lane is the amplification product of the DNA of one individual mosquito; each DNA was amplified once with a nested PCR. The expected product was 320 base pairs (bp) for the first PCR reaction and 212 bp for the second (N) PCR reaction. Results of the first and second amplifications are shown in adjacent lanes for each tested sample. The amplification results from the DNA of the two tested males and the two tested mated females were the same. We did not observe any amplification from the DNA of the two tested virgin females nor from the negative control (-N). (b) SNP count distribution (Y-axis) for females and males for each population (X-axis), grouped by their corresponding country (top headers, see abbreviations of population names in Fig. 1 and Supplementary Table 1). The middle line, bottom and top of the box show the mean, 25th and 75th percentiles, respectively; whiskers present the minima and maxima of data points.

**Extended Data Fig. 3. Distribution of SNPs density and Tajima’s D scores across the *Ae. aegypti* genome and populations.**
(a) The distribution of 89.6 million biallelic NR-SNPs across the genome (bottom axis) was calculated and plotted over a non-overlapping sliding window of 50 kilobases (kb), showing from low (dark blue) to high (red) SNP density for each population (left axis) and chromosome (Supplementary Data 1, Supplementary Information). We found that SNPs are not randomly distributed across non-repetitive regions (one-sided chi-squared test, p<0.05 in all cases, Supplementary Table 3), and that SNP density is higher in telomeres. Significant differences were also found in the number of SNPs located across chromosomes (arms and centromeres) in both African (n=31) and out-of-Africa (n=8) populations (paired-samples two-sided t-test, p<0.05 in all cases; Supplementary Table 2). P-values were adjusted using a Bonferroni correction with a False Positive Rate (FPR) of 5% (alpha = 0.05). (b) Tajima’s D scores for each population were calculated and plotted across the genome using the same SNPs dataset and sliding window as in (a). Tajima’s D scores that are different from zero (D=0, grey) were classified as ‘negative values’ when D<0 (dark cyan) and as ‘positive values’ when D>0 (purple). Sliding windows with no Tajima’s D scores (black) were defined as “Not calculated” (NC). Populations were grouped according to their geographical region in Africa (Western, Central, and Eastern) or out-of-Africa (Supplementary Table 1). Most African populations were found to have more genome intervals with negative Tajima’s D values on each chromosome and more concentrated towards telomeres (63% of all sliding windows). Conversely, out-of-Africa populations were found to have more genome intervals with positive Tajima’s D values. In both panels (a, b), previously identified human-feeding mosquitoes from three African populations are highlighted in red font: THI, NGY, and RABd. 
Descriptive statistics based on different sliding windows (500 kb, 250 kb, 100 kb, 50 kb, 10 kb) for each population are shown in Supplementary Tables 3 and 5.
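
As a rough illustration of the statistic plotted in panel (b), the sketch below computes Tajima’s D for a single window from the standard formula, given the number of sampled sequences, the number of segregating sites and the mean number of pairwise differences. The function names and the idea of passing pre-computed window summaries are assumptions made for this example and are not taken from the study’s pipeline; the labels mirror the classes used in the caption.

```python
import math
from typing import Optional


def tajimas_d(n: int, seg_sites: int, pi: float) -> Optional[float]:
    """Tajima's D for one genomic window (illustrative helper, hypothetical name).

    n         -- number of sampled sequences (chromosomes)
    seg_sites -- number of segregating sites (S) in the window
    pi        -- mean number of pairwise differences in the window
    Returns None when D is undefined (fewer than 4 sequences or S == 0).
    """
    if n < 4 or seg_sites == 0:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if variance <= 0:
        return None
    # Numerator: pairwise-difference estimate minus Watterson's estimate of theta.
    return (pi - seg_sites / a1) / math.sqrt(variance)


def classify_window(d: Optional[float]) -> str:
    """Map a D value onto the classes named in the caption."""
    if d is None:
        return "NC"  # "Not calculated"
    if d < 0:
        return "negative"
    if d > 0:
        return "positive"
    return "zero"
```

A window with fewer pairwise differences than expected from its number of segregating sites yields D<0, and a window with more yields D>0.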

**Extended Data Fig. 4. Population structure of *Ae. aegypti* samples based on Principal Component Analyses (PCA) and admixture analyses.**
(a) Admixture analyses performed with four SNPs datasets are shown depicting different regions of the *Ae. aegypti* genome (see Methods and Supplementary Information): (i) whole genome, (ii) exome, (iii) repetitive sequences, and (iv) non-repetitive sequences. (b) The cross-validation error plot for the Admixture analyses in (a) is shown using a range of cluster numbers (from k=2 to k=39) on each dataset associated to specific regions of the genome. (c) PCA analyses generated with three SNPs datasets representing different regions of the genome (as in (a)) recapitulate the same clustering patterns across populations. Symbology: individuals are color-coded by country (filled circles) and continent (different symbols). (d) The analysis of genetic relatedness among *Ae. aegypti* samples was performed with PCAs using the subset of the 89.6 million biallelic NR-SNPs that is present in >90% of all individuals *per* population (see Methods, Supplementary Information). Same symbology as in (c). On the left, a PCA analysis of all 634 samples shows the four clusters formed according to the genetic relatedness of the samples for each population. The PCA at the center shows the clustering of 539 samples, after the removal of 95 highly related individuals. Note that all the samples from Rabai previously classified as domesticated (RABd, black solid outlined circles close to out-of-Africa in the PCA at the left) are no longer present in this plot. The PCA on the right shows the clustering of the final 554 individuals considered for all the analyses of this study, including 15 individuals from RABd (see Supplementary Table 12, Supplementary Information).

**Extended Data Fig. 5. Identification of Nonretroviral Endogenous Viral Elements (nrEVEs) in *Ae. aegypti* genomes.**
(a) PCA analysis based on frequency distribution of reference nrEVEs across *Ae. aegypti* populations, which are color-coded according to the symbology. (b) Comparison of the percentage of amino acid identity for all reference (in blue) and new (in red) nrEVEs with respect to the closest related viral species (see Supplementary Information). Black lines represent the mean value. Groups were compared with Welch’s unequal-variances t-test; four stars indicate a p-value<0.0001. (c) Results of PCR amplification for a subset of the 7 novel nrEVEs identified by bioinformatics analyses. The template DNA for nrEVE amplification was an aliquot of the same genomic DNA that had been used for WGS and in which the tested nrEVE had been identified. A positive amplification in the presence of a clear negative control validated the bioinformatics prediction for the tested sample. PCR amplification was done once. The name of the nrEVEs is coded with an upper letter at the base of each lane (see symbology below), alongside the sample in which it was tested. PCR primers were designed based on predictions by ViR (Supplementary Table 32, Supplementary Information). The first bar at the left is the control, nucleotide length (in bp) is highlighted in yellow, and “the negative” is abbreviated as “neg”. “Negative” is the amplification with the absence of the DNA template. Symbology, A: *Aedes aegypti* toti-Like nrEVEs; B: *Aedes aegypti* toti-Like nrEVEs; C: *Aedes aegypti* toti-Like nrEVEs; D: CFAV_5 with cfav5_F2/R2 primers; E: *Culex pseudovishnui* rhabdolike_2; F: Liao Ning_1 with primers LN_F1 and LN_R1.
(d) Each dot in the plot represents an nrEVE, which is located on the X-axis based on its length and on the Y-axis based on the viral family that it matches with the highest nucleotide identity. nrEVEs that are uniquely detected in *Ae. aegypti* genomes are depicted in red if they are newly identified across the 554 genomes (Supplementary Table 30) or in blue if they are reference nrEVEs (Supplementary Table 31). nrEVEs are depicted in gray dots if they are also found in WGS data of *Ae. mascarensis*.

**Extended Data Fig. 6. Diagnostic plots of *RAiSD* predictions, and GO-clustering of protein-coding genes harboring adaptive out-of-Africa-associated signals.**
(**a-b**) Comparison of high-scoring top signals predicted with *RAiSD* in out-of-Africa populations, at the global population scale, using two different score threshold methods. (a) The bar plot in Y2-axis shows the total number of high-scoring top outliers within hard selective sweeps obtained with five equivalent cutoffs, as calculated with a “percentile threshold” (for example, only the high-scoring top 1% signals are retained) and with an “FDR-adjusted p-value threshold” (for example, only the high-scoring top signals with FDR < 5% resulting in false positives are retained). The Y1-axis shows the proportion (%, dots) of intersected protein-coding genes harboring high-scoring top signals from each threshold method and across equivalent cutoffs. (b) The bar plots show the distribution of the number of peak positions (outliers) within hard selective sweeps that are mapping protein-coding genes for equivalent cutoffs, as obtained with a top 1% percentile score threshold (left) and with an FDR-adjusted p-value <5% score threshold (right). Note that most genes harbor several high-scoring top outliers (>2) with either method. (c) The number of “*Aaa molecular signature*” genes obtained from the intersection of *RAiSD*, *PCAdapt* and MKT-DoS methods is shown by different percentile cutoffs applied for the high-scoring top signals detected with *RAiSD*. (d) A GO enrichment analysis is shown for 185 “*Aaa molecular signature*” protein-coding genes with an annotated GO-term; categories with a p-value <0.05 threshold from the weighted-Fisher test were considered significantly enriched. P-values were not adjusted for multiple testing, as recommended in Alexa et al. (2006). For each GO-term, the significance level (black line, top Y-axis) and the observed-expected ratio of genes annotated to the respective GO-term (black bars, bottom Y-axis) are plotted. 
(**e-g**) Clustering of the enriched GO-terms for the predicted protein-coding genes harboring adaptive out-of-Africa-associated signals is shown separately for (e) *RAiSD*, (f) *PCAdapt* and (g) MKT-DoS, and shows the convergence into five major functional categories: chemosensory (blue), neuronal (red), metabolic (green), regulatory (black) and others (purple). Note that several of the analyzed genes lack an annotated or predicted GO-term function. The results of GO enrichment analyses from the selection methods are available in Supplementary Tables 15, 17, 19, 22, 23 and 26; and the full list of GO-terms and merged GO information, which was also used to plot (**e-g**), is available at the GitHub repository: https://github.com/naborlozada/Aaegypti_domestication.

**Extended Data Fig. 7. Diagnostic plots of *PCAdapt* predictions.**
(a) Discarding the influence of Linkage Disequilibrium (LD) in outlier detection after “SNP thinning” with *PCAdapt*. Manhattan plots show the “loadings distribution” (contributions of each SNP to the Principal Component [PC]) for each chromosome and PC, after a “LD pruning” was carried out for the entire dataset. We observe that loadings are not clustered in a single or several genomic regions (depicting most likely regions of strong LD), but rather are evenly distributed across the chromosomes. Only at the center of the chromosome does the number of loadings decrease, owing to low genetic diversity. These plots confirm that the outliers detected with *PCAdapt* correspond to regions involved most likely in adaptation, rather than to regions of low recombination (high LD). (b) The scree plot for each chromosome displays the percentage of variance explained (Y-axis) by each PC in a descending order (X-axis); and it is used to identify the best K’s number that should be used in *PCAdapt* as a measurement of population structure. This analysis was also reinforced with a Tracy-Widom test (p<0.05) and a pairwise comparison of each PC (see Methods). (c) The Quantile-Quantile plot for each chromosome confirms that most of the estimated p-values (Y-axis) follow the expected uniform distribution (X-axis, a 45-degree line is plotted). Yet, the smallest p-values are smaller than expected, confirming the presence of outliers. (d) The histogram for each chromosome shows the (uniform) distribution of the p-values (X-axis, values between 0 and 1) and their frequency (Y-axis). The excess of small p-values indicates the presence of outliers. The p-values were obtained from the Mahalanobis distance, and then were transformed into q-values to detect high-scoring top outliers using an FDR-adjusted p-value-score threshold of 1% (α=0.01).
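
The last step in panel (d), turning p-values into FDR-adjusted values and keeping outliers below 1%, can be sketched as follows. The Benjamini-Hochberg adjustment is used here as a stand-in, which is an assumption: the caption does not state which q-value estimator was applied to the *PCAdapt* p-values. The function names and inputs are illustrative only.

```python
from typing import List


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def outlier_indices(pvalues: List[float], fdr: float = 0.01) -> List[int]:
    """Indices of SNPs whose adjusted p-value falls below the FDR threshold."""
    return [i for i, q in enumerate(bh_adjust(pvalues)) if q < fdr]
```

Because the adjustment depends on the total number of tests, the same raw p-value can pass or fail the 1% threshold depending on how many SNPs are tested together.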

**Extended Data Fig. 8. Association of outliers across *Ae. aegypti* populations using Principal Component (PC) scores from *PCAdapt*.**
Boxplots depict the variation of the “clustering scores” from 10,030 outliers detected with *PCAdapt* across each chromosome and six Principal Components (PCs). The middle line, bottom and top of the box show the mean, 25th and 75th percentiles, respectively; whiskers present mean values +/-1.5×IQR. “Clustering scores” equal to zero are denoted with a horizontal dotted red line. The asterisks (*) over boxplots represent significant associations of the mean value of “clustering scores” for that population with both the corresponding PC (one-sample two-sided t-test, µ≠0, p<0.001) and Africa or out-of-Africa (AMER: Americas; Asia; PI: Pacific Islands) (two-sided pairwise Welch’s t-test, µ_i≠µ_j, p<0.001), underscoring outliers that are more strongly associated with out-of-Africa (PC1, PC3-PC6) or Africa (PC2) or both (for example, PC2) populations than expected by genetic drift only. All t-test p-values were adjusted with the Benjamini-Hochberg method. See the full results from both tests for each PC and population in Supplementary Tables 19-21. Notably, 95% of the total variation is explained by the first three PCs (PC1-PC3), whereas the remaining 5% of the variation is explained by PC4-PC6 and falls exclusively in out-of-Africa populations.

**Extended Data Fig. 9. Estimation of protein-coding gene selection with MKT-DoS tests across 11,651 orthologs between *Ae. aegypti* and *Ae. albopictus*.**
(a) Heatmaps show the clustering of 11,402 out of 11,651 orthologous protein-coding genes estimated to be under positive selection (Y-axis), according to DoS > 0 scores (left) and to MKT test: Dn/Ds > Pn/Ps (right), across *Ae. aegypti* populations (X-axis). Genes and populations were clustered using a binary matrix depicting the presence (red) or absence (grey) of positive selection in a gene; an analysis of distance and a clustering procedure were carried out with the method ‘ward.D’. Only 356 positively selected genes, as estimated with the MKT and DoS tests, were detected in out-of-Africa populations exclusively. The genomic location of 354 of these adaptive protein-coding genes is widely distributed across the three chromosomes, and only two protein-coding genes were located in contigs (Supplementary Tables 22-23). (b) Top: the histogram shows the frequency distribution of MKT values (X-axis) for all orthologous protein-coding genes (Y-axis) included in the selection analyses, according to significant MKT values for positive selection in out-of-Africa populations (Dn/Ds > Pn/Ps; Fisher’s exact test, p-values adjusted for multiple testing with the Benjamini-Hochberg method and an FDR of 5%; Supplementary Table 22, Supplementary Data 7). Bottom: the histogram shows the frequency distribution of DoS values (X-axis) for all orthologous protein-coding genes (Y-axis) included in the selection analyses, according to DoS scores for positive (DoS > 0) and weak negative (DoS < 0) selection and also for neutral evolution (DoS = 0) (see Eq. (2) under Methods, Supplementary Table 23, Supplementary Data 6 and 8). (c) Overview of the DoS score estimates for 11,402 out of 11,651 orthologous protein-coding genes across the 40 populations analyzed (see Methods, Supplementary Data 6 and 8). Note the proteome-wide presence of weak selection and (nearly) neutral evolution across protein-coding genes and *Ae. aegypti* populations (Supplementary Table 24).
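
The two gene-level criteria named in panel (a) can be sketched as below, assuming the commonly used definition of the Direction of Selection statistic, DoS = Dn/(Dn + Ds) − Pn/(Pn + Ps), together with the McDonald-Kreitman comparison Dn/Ds > Pn/Ps. The paper’s own Eq. (2) is not reproduced in this section, so the formula, the function names and the argument layout are assumptions for illustration, and no significance test is included.

```python
from typing import Optional


def direction_of_selection(dn: int, ds: int, pn: int, ps: int) -> Optional[float]:
    """DoS from divergence (dn, ds) and polymorphism (pn, ps) counts.

    dn/ds -- nonsynonymous/synonymous fixed differences versus the outgroup
    pn/ps -- nonsynonymous/synonymous polymorphisms within the population
    Returns None when either class has no sites, since DoS is then undefined.
    """
    if dn + ds == 0 or pn + ps == 0:
        return None
    return dn / (dn + ds) - pn / (pn + ps)


def mkt_excess_divergence(dn: int, ds: int, pn: int, ps: int) -> bool:
    """True when Dn/Ds > Pn/Ps.

    Written in cross-multiplied form to avoid division by zero; this is
    equivalent to the ratio comparison when ds and ps are both positive.
    """
    return dn * ps > pn * ds


def label(dos: Optional[float]) -> str:
    """Classes used in the caption: positive, weak negative, or neutral."""
    if dos is None:
        return "undefined"
    if dos > 0:
        return "positive"
    if dos < 0:
        return "weak negative"
    return "neutral"
```

For a gene with 10 nonsynonymous and 10 synonymous fixed differences but 5 nonsynonymous and 15 synonymous polymorphisms, DoS is 0.5 − 0.25 = 0.25, which falls in the positive class.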

**Extended Data Fig. 10. Estimation of standing variation located within protein-coding genes and ncRNAs harboring adaptive variants in out-of-Africa populations.**
The boxplots show the proportions of polymorphic SNPs located within 2,130 protein-coding genes and 217 ncRNAs harboring adaptive variants across eight out-of-Africa populations (as detected by the three selection methods). These SNPs are either shared with individuals from at least one African population (green) or specific to out-of-Africa populations (*that is*, “private variants”; yellow) (see Methods, Supplementary Data 9). The middle line, bottom and top of the box show the mean, 25th and 75th percentiles, respectively; whiskers show the minima and maxima of the data points. On average, 65.8% (95% CI [64.59, 66.96]) and 44.7% (95% CI [43.72, 45.75]) of all SNPs located within adaptive protein-coding genes and ncRNAs in out-of-Africa populations, respectively, were also found to be polymorphic in African populations, suggesting an origin from ancestral “standing genetic variation”. Notably, the proportion of out-of-Africa-associated SNPs shared with African populations is significantly higher for adaptive protein-coding genes than for the entire genome (avg. 47.5%, 95% CI [46.57, 48.52]), according to Fisher’s exact test (one-sided, ‘greater’), P = 2.2 × 10⁻¹⁶ (P < 0.05) (Supplementary Table 25).
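The shared-versus-private classification described in this legend can be sketched as a set operation. This is an illustrative Python example only: it assumes each SNP is identified by a (chromosome, position) pair, and the function name and toy coordinates are hypothetical rather than taken from the study.

```python
def shared_fraction(ooa_snps, african_populations):
    """Fraction of out-of-Africa polymorphic SNPs that are also polymorphic
    in at least one African population; the remainder are 'private'."""
    ooa = set(ooa_snps)
    if not ooa:
        return None
    # Union of polymorphic SNPs over all African populations.
    african_union = set().union(*african_populations)
    return len(ooa & african_union) / len(ooa)


# Hypothetical SNPs within one adaptive gene, as (chromosome, position).
ooa_population = {("1", 100), ("1", 200), ("2", 50), ("3", 10)}
african = [
    {("1", 100)},
    {("2", 50), ("3", 99)},
]

fraction = shared_fraction(ooa_population, african)
print(f"shared: {fraction:.2f}, private: {1 - fraction:.2f}")
```

Computing this fraction per out-of-Africa population gives the kind of values summarized in the boxplots, with a high shared fraction pointing to standing variation rather than new mutations.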

Grants and funding

682394/EC | EU Framework Programme for Research and Innovation H2020 | H2020 Priority Excellent Science | H2020 European Research Council (H2020 Excellent Science - European Research Council)
