Front Genet. 2026 Jan 5:16:1725866.
doi: 10.3389/fgene.2025.1725866. eCollection 2025.

Monitoring diversity in genome-wide association studies requires measuring and reporting on immigration-related factors

Yao Tu et al. Front Genet.

Abstract

Genome-wide association studies (GWAS) have made remarkable progress to date in deciphering the genetic foundations of complex traits, yet persistent gaps remain in how sample heterogeneity is measured and reported. Current practices typically emphasize diversity by broad ancestry categories or stratification by country of recruitment, but these dimensions alone fail to capture the immigration-related factors that contribute to the genetic or environmental origins of heterogeneity. We argue that incorporating variables, such as country of origin, in descriptions and analyses provides essential context for interpreting genetic associations, particularly in increasingly multi-population and trans-national GWAS samples. We highlight how neglected these variables are in the literature using the GWAS Catalog. We provide suggestions for reporting on these data in future studies. By advocating for a more comprehensive view of diversity in GWAS, we aim to address the under-representation of immigrants in GWAS and thereby strengthen the validity and interpretability of future genomic studies.

Keywords: Country of birth; country of recruitment; diversity and inclusion; environment; gene-environment (G-E) interaction; genome-wide association studies; immigration.

Conflict of interest statement

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The author LF-R declared that they were an editorial board member of Frontiers at the time of submission. This had no impact on the peer review process and the final decision.

Figures

FIGURE 1
Summary of the GWAS Catalog (A) Curation Process and (B) Genome-Wide Association Studies with Population Descriptors. Panel (A) shows a flowchart of the curation process, summarized from the method descriptions in the GWAS Catalog. Panel (B) presents estimates of study numbers obtained from the “All ancestry data v1.0” file in the GWAS Catalog (7/10/2025) by summing the number of studies with each population descriptor (“Broad Ancestral Category”, labeled as “ANCESTRY”; “Country of Recruitment”, labeled as “RECRUITMENT”; and “Country of Origin”, labeled as “ORIGIN”). We filtered out records with identical information on the study’s PubMed ID, stage, ancestry, and country information (country of recruitment and country of origin). The UpSet plot was drawn with the R package UpSetR (Conway et al., 2017).
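
The filtering and counting steps described for Panel (B) can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout, the field names (`pubmed_id`, `stage`, `ancestry`, `recruitment`, `origin`) and the example values are hypothetical and do not correspond to actual GWAS Catalog column names. Grouping records by PubMed ID to obtain study-level descriptor sets is also an assumption about how "number of studies" was tallied.

```python
from collections import Counter

# Hypothetical records; real GWAS Catalog columns are named differently.
records = [
    {"pubmed_id": "1", "stage": "initial", "ancestry": "European",
     "recruitment": "U.K.", "origin": None},
    {"pubmed_id": "1", "stage": "initial", "ancestry": "European",
     "recruitment": "U.K.", "origin": None},  # exact duplicate of the row above
    {"pubmed_id": "2", "stage": "initial", "ancestry": "East Asian",
     "recruitment": "Japan", "origin": "Japan"},
    {"pubmed_id": "3", "stage": "replication", "ancestry": "African",
     "recruitment": None, "origin": None},
]

KEY_FIELDS = ("pubmed_id", "stage", "ancestry", "recruitment", "origin")
LABELS = {"ancestry": "ANCESTRY", "recruitment": "RECRUITMENT", "origin": "ORIGIN"}


def deduplicate(rows):
    """Drop rows that repeat the same PubMed ID, stage, ancestry and countries."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in KEY_FIELDS)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def descriptor_sets(rows):
    """Map each study to the set of descriptors reported in any of its rows."""
    per_study = {}
    for row in rows:
        present = per_study.setdefault(row["pubmed_id"], set())
        for field, label in LABELS.items():
            if row[field]:  # treat None or empty string as "not reported"
                present.add(label)
    return per_study


def intersection_counts(per_study):
    """Count studies per exact combination of descriptors (the UpSet bars)."""
    return Counter(frozenset(labels) for labels in per_study.values())


unique_records = deduplicate(records)
counts = intersection_counts(descriptor_sets(unique_records))
for combo, n_studies in counts.items():
    print(sorted(combo), n_studies)
```

Counting exact combinations rather than per-descriptor totals matches what an UpSet plot displays: each bar is the number of studies reporting precisely that set of descriptors.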
FIGURE 2
Total Participant Sample in Genome-Wide Association Studies of the GWAS Catalog Mapped by (A) Country of Recruitment and (B) Country of Origin. Estimates of the total participants per country were obtained by summing the number of individuals across all studies, stages and ancestral categories in the same filtered dataset used in Figure 1B. When a study listed multiple countries, we assumed that its participants were evenly distributed across them and calculated the country-specific sample as the number of individuals divided by the number of countries listed. Interactive HTML versions of these figures are available online: (A) https://yao876.github.io/GWAS-Catalog-maps/Participants%20by%20Country%20of%20Recruitment_max100M.html; (B) https://yao876.github.io/GWAS-Catalog-maps/Participants%20by%20Country%20of%20Origin.html.
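
The equal-split rule in this caption amounts to crediting each listed country with n/k participants, where n is the record's sample size and k is the number of countries it lists. A minimal sketch under that reading is given below. The function name, the tuple layout and the example numbers are hypothetical, and skipping records that list no country is an assumption, since the caption does not say how such records were handled.

```python
from collections import defaultdict


def participants_by_country(rows):
    """Sum participants per country, splitting multi-country records evenly.

    Each row is (n_individuals, list_of_countries). A record listing k
    countries contributes n_individuals / k to each of them.
    """
    totals = defaultdict(float)
    for n_individuals, countries in rows:
        if not countries:
            # Assumption: records with no country listed are left out.
            continue
        share = n_individuals / len(countries)
        for country in countries:
            totals[country] += share
    return dict(totals)


# Made-up example records, not GWAS Catalog data.
example_rows = [
    (900, ["U.S.", "U.K.", "Finland"]),  # 300 credited to each country
    (500, ["U.K."]),
    (100, []),                           # no country listed, skipped
]

totals = participants_by_country(example_rows)
for country, n in sorted(totals.items()):
    print(country, n)
```

The even split keeps the total number of participants unchanged when summed over countries, but it cannot recover the true per-country composition of a multi-country sample.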

References

    1. Allot A., Lee K., Chen Q., Luo L., Lu Z. (2021). LitSuggest: a web-based system for literature recommendation and curation using machine learning. Nucleic Acids Res. 49, W352–W358. doi: 10.1093/nar/gkab326
    2. Burkart K. M., Sofer T., London S. J., Manichaikul A., Hartwig F. P., Yan Q., et al. (2018). A genome-wide association study in Hispanics/Latinos identifies novel signals for lung function. The Hispanic Community Health Study/Study of Latinos. Am. J. Respir. Crit. Care Med. 198, 208–219. doi: 10.1164/rccm.201707-1493oc
    3. Commodore-Mensah Y., Ukonu N., Obisesan O., Aboagye J. K., Agyemang C., Reilly C. M., et al. (2016). Length of residence in the United States is associated with a higher prevalence of cardiometabolic risk factors in immigrants: a contemporary analysis of the National Health Interview Survey. J. Am. Heart Assoc. 5, e004059. doi: 10.1161/jaha.116.004059
    4. Conway J. R., Lex A., Gehlenborg N. (2017). UpSetR: an R package for the visualization of intersecting sets and their properties. Bioinformatics 33, 2938–2940. doi: 10.1093/bioinformatics/btx364
    5. Fernández-Rhodes L. (2023). Beyond borders: a commentary on the benefit of promoting immigrant populations in genome-wide association studies. Hum. Genet. Genom. Adv. 4, 100205. doi: 10.1016/j.xhgg.2023.100205
