2025 Sep 2;34(9):1593-1599.
doi: 10.1158/1055-9965.EPI-25-0371.

An Analytic Pipeline to Obtain Reliable Genetic Ancestry Estimates from Tumor-Derived RNA Sequencing Data

Courtney E Johnson et al. Cancer Epidemiol Biomarkers Prev.

Abstract

Background: Germline genetics may influence tumor molecular characteristics and ultimately cancer survival. Studies of tumor characteristics, including our epithelial ovarian cancer (EOC) studies of Black women in the United States, may have RNA sequencing (RNA-seq) data from archival tumor tissue but lack germline DNA for at least some individuals. Incomplete germline DNA measurements impede analyses of important measures such as global genetic ancestry, often used in downstream analyses, by reducing sample sizes.

Methods: The study population consists of 184 women who participated in two population-based studies of EOC with both germline and formalin-fixed, paraffin-embedded (FFPE) tumor samples and an additional 58 women diagnosed with EOC from the same two studies with only FFPE tumor tissue. We used tumor RNA-seq data to calculate proportions of African, European, and Asian genetic ancestry using a pipeline built on the packages SeqKit, HISAT2, SAMtools, BCFtools, PLINK, and ADMIXTURE. Women from the 1000 Genomes Project were used as the reference populations, and germline genetic ancestry estimates from blood or saliva were used as the baseline comparison. We evaluated multiple quality control strategies to improve genetic ancestry estimation.
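
As a rough illustration of the variant-filtering and ancestry-estimation end of such a pipeline, the sketch below only assembles (and does not run) PLINK and ADMIXTURE command lines. The MAF and HWE thresholds are the values reported as optimal in the Figure 3 caption, and K=3 reflects the three ancestry components estimated (African, European, Asian). The file prefixes, the function names, and the use of ADMIXTURE's default unsupervised mode are assumptions made for illustration; the abstract does not state the exact commands the authors ran.

```python
import shlex


def plink_qc_command(in_prefix, out_prefix, maf=0.05, hwe=0.001):
    """Build a PLINK command that filters SNPs by minor allele frequency
    and Hardy-Weinberg equilibrium p-value, writing a new binary fileset.

    in_prefix / out_prefix are hypothetical PLINK fileset prefixes.
    """
    return [
        "plink",
        "--bfile", in_prefix,   # input .bed/.bim/.fam prefix
        "--maf", str(maf),      # drop SNPs with MAF below this value
        "--hwe", str(hwe),      # drop SNPs with HWE p-value below this value
        "--make-bed",           # write a filtered binary fileset
        "--out", out_prefix,
    ]


def admixture_command(bed_prefix, k=3):
    """Build an ADMIXTURE command for k ancestral populations.

    Assumption: default (unsupervised) mode on a fileset that already
    merges study samples with the reference panel.
    """
    return ["admixture", f"{bed_prefix}.bed", str(k)]


if __name__ == "__main__":
    # Hypothetical prefixes, shown only to print the resulting commands.
    print(shlex.join(plink_qc_command("merged", "merged_qc")))
    print(shlex.join(admixture_command("merged_qc")))
```

Building the commands as argument lists keeps the thresholds in one place, which is convenient when comparing several quality control settings.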

Results: Correlations between tumor RNA-seq-derived estimates of genetic ancestry from our pipeline and germline-derived African and European genetic ancestry ranged between 0.76 and 0.94.
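
The comparison described here can be sketched as a Pearson correlation between paired germline-derived and tumor RNA-seq-derived ancestry proportions, computed after dropping samples with too few called SNPs. The SNP threshold of 32,995 is taken from the Figure 3 caption. The choice of Pearson's r, the record field names, and the function names are assumptions, since the abstract does not name the correlation statistic or the data layout.

```python
from math import sqrt

# SNP-count threshold reported as optimal in the Figure 3 caption.
MIN_SNPS = 32_995


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def correlation_after_snp_filter(samples, min_snps=MIN_SNPS):
    """Keep samples with at least min_snps called SNPs, then correlate
    germline-derived and tumor-derived African ancestry proportions.

    Each sample is a dict with hypothetical keys:
    'n_snps', 'germline_afr', 'tumor_afr'.
    Returns (number of samples kept, Pearson r).
    """
    kept = [s for s in samples if s["n_snps"] >= min_snps]
    germline = [s["germline_afr"] for s in kept]
    tumor = [s["tumor_afr"] for s in kept]
    return len(kept), pearson_r(germline, tumor)
```

The same function could be called once per ancestry component, or over a range of `min_snps` values, to see how the agreement changes with the amount of usable variant data.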

Conclusions: RNA-seq data from archival FFPE tumor tissue can be confidently and efficiently used to approximate global genetic ancestry in an admixed population when germline DNA is unavailable.

Impact: This approach supports analyses of genetic ancestry and cancer when germline samples are not available.

Conflict of interest statement

The authors declare no potential conflicts of interest.

Figures

Figure 1.
A: Tumor RNASeq-derived global genetic ancestry pipeline. B: Distribution of germline DNA-derived global genetic ancestry among Black women diagnosed with epithelial ovarian cancer in the United States, N=184.
Figure 2.
Density distribution of number of SNPs called (A), total reads (B), and percentage of low-quality reads (C) by age of the tumor sample.
Figure 3.
A: Scatter plots of correlation between germline DNA- and tumor RNASeq-derived genetic ancestry at optimal QC metrics (MAF=0.05, HWE=0.001). B: Correlation of genetic ancestries at different tumor RNASeq-derived SNP thresholds. The optimal threshold was >=32,995 SNPs (n=84). C: Correlation between germline DNA-derived and tumor RNASeq-derived estimates of genetic ancestry by tissue age at extraction. AFR=African genetic ancestry, ASI=Asian genetic ancestry, EUR=European genetic ancestry.
Figure 4.
Violin plots of the distribution of tumor-derived genetic ancestry among Black women diagnosed with epithelial ovarian cancer in the informative subset with germline DNA available (n=184) and the exploratory subset without germline DNA available (n=58).
