[Preprint]. 2025 Sep 4:2025.09.02.673402.
doi: 10.1101/2025.09.02.673402.

Tractor Workflow Pipeline: A Scalable Nextflow Framework for Local Ancestry-Aware Genome-Wide Association Studies


Nirav N Shah et al. bioRxiv.

Abstract

The routine exclusion of admixed individuals from traditional Genome-Wide Association Studies (GWAS), due to concerns about spurious associations, has hindered genetic analyses involving multiple ancestries. Tractor GWAS addresses this issue by incorporating local ancestry into its analysis, enabling the identification of ancestry-enriched hits and generating ancestry-specific summary statistics. However, Tractor requires accurate genomic phasing and local ancestry inference as prerequisite steps, which demand additional bioinformatics expertise and involve decision points regarding reference panel setup. To streamline, harmonize, and automate this process, we present a scalable Nextflow workflow that integrates all necessary steps, minimizing the need for manual intervention while remaining modular and customizable. The workflow supports multiple commonly used tools and offers flexibility in how Tractor is implemented. To demonstrate its utility, we applied this pipeline to analyze 32 blood biomarkers in 6,245 two-way AFR-EUR admixed individuals from the UK Biobank. The pipeline ran efficiently at scale, replicated known associations, and identified novel ancestry-specific loci. These novel associations were largely driven by variants present on African ancestral tracts but absent from European tracts, underscoring the value of local ancestry-aware methods in uncovering previously missed genetic signals. By enabling the efficient analysis of admixed individuals, our workflow facilitates Tractor use, paving the way for broader genetic discovery.


Figures

Fig. 1. Overview of the Tractor Nextflow workflow.
The workflow comprises three modules: (1) Phasing, where QC’d unphased cohort data is phased using SHAPEIT5 (optionally with reference panels); (2) Local Ancestry Inference, using RFMix2, GNomix or FLARE to generate ancestry segment calls from the phased VCF and reference panels; and (3) Tractor GWAS, which combines phased genotypes, local ancestry tracts, and covariates to perform ancestry-aware GWAS and produce ancestry-specific summary statistics. The lower panel illustrates how unphased genotypes are resolved into haplotypes, followed by local ancestry labeling using reference populations (e.g., AFR, EUR), which are then used in Tractor GWAS.
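As a rough illustration of what module (3) consumes, the sketch below combines a phased genotype with per-haplotype local ancestry labels at a single variant and returns, for each ancestry, the number of haplotypes and the alternate-allele dosage carried on those haplotypes. The function name `ancestry_dosages`, the tuple-based inputs, and the AFR/EUR labels are assumptions made for illustration; this is not the pipeline's code or Tractor's actual interface.

```python
def ancestry_dosages(hap_alleles, hap_ancestry, ancestries=("AFR", "EUR")):
    """Split one individual's genotype at one variant by local ancestry.

    hap_alleles:  (a0, a1), alternate-allele indicators (0 or 1) for the
                  two phased haplotypes.
    hap_ancestry: (anc0, anc1), local ancestry label of each haplotype
                  at this variant (hypothetical labels, e.g. "AFR", "EUR").
    Returns (hap_counts, alt_dosage), two dicts keyed by ancestry.
    """
    hap_counts = {anc: 0 for anc in ancestries}
    alt_dosage = {anc: 0 for anc in ancestries}
    for allele, anc in zip(hap_alleles, hap_ancestry):
        hap_counts[anc] += 1       # haplotypes of this ancestry (0, 1, or 2)
        alt_dosage[anc] += allele  # alternate alleles carried on them
    return hap_counts, alt_dosage


# One alternate allele, sitting on the AFR-labelled haplotype.
counts, dosage = ancestry_dosages((1, 0), ("AFR", "EUR"))
print(counts)  # {'AFR': 1, 'EUR': 1}
print(dosage)  # {'AFR': 1, 'EUR': 0}
```

Per-ancestry dosages of this kind are what allow a separate effect size to be estimated for each ancestral tract, which is why phasing and local ancestry inference must both precede the association step.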
Fig. 2. Manhattan and Q-Q plots from Tractor GWAS of Apolipoprotein B (ApoB) levels for AFR and EUR ancestral tracts in an admixed AFR-EUR cohort from the UK Biobank (N = 5,795).
(a, c) Manhattan plots showing p-values for AFR and EUR tracts, respectively. A shared genome-wide significant locus was identified at APOE, with five additional significant loci detected on AFR tracts at PCSK9, CELSR2, MYO1H, PMFBP1, and LDLR. (b, d) Q-Q plots for AFR and EUR tracts, indicating well-calibrated Type I error control in both ancestry-specific GWAS.
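Calibration of the kind shown in the Q-Q panels is often summarized with Q-Q coordinates and a genomic inflation factor. The sketch below shows one common way to compute both from a list of p-values using only the Python standard library. It is a generic diagnostic, not necessarily the procedure used for this figure, and the function names `qq_points` and `genomic_inflation` are hypothetical.

```python
import math
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
MEDIAN_CHI2_1DF = 0.4549364


def qq_points(pvalues):
    """Return (expected, observed) -log10(p) pairs for a Q-Q plot."""
    observed = sorted(pvalues)
    n = len(observed)
    # Expected quantiles of a uniform distribution on (0, 1).
    expected = [(i + 0.5) / n for i in range(n)]
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]


def genomic_inflation(pvalues):
    """Median observed 1-df chi-square statistic over its null median."""
    nd = NormalDist()
    # Two-sided p-value -> z-score -> chi-square; p must lie in (0, 1].
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in pvalues]
    return median(chi2) / MEDIAN_CHI2_1DF


# Perfectly uniform p-values give an inflation factor of about 1.
uniform = [(i + 0.5) / 1001 for i in range(1001)]
print(round(genomic_inflation(uniform), 3))  # 1.0
```

Values close to 1 are consistent with well-controlled Type I error, while values well above 1 suggest inflation. In an ancestry-aware analysis the check would be run separately on the AFR-tract and EUR-tract p-values.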

