Genome Biol. 2019 Nov 11;20(1):231.
doi: 10.1186/s13059-019-1849-2.

Impact of mouse contamination in genomic profiling of patient-derived models and best practice for robust analysis

Se-Young Jo et al. Genome Biol.

Abstract

Background: Patient-derived xenograft and cell line models are popular models for clinical cancer research. However, the inevitable inclusion of a mouse genome in a patient-derived model is a remaining concern in the analysis. Although multiple tools and filtering strategies have been developed to account for this, research has yet to demonstrate the exact impact of the mouse genome and the optimal use of these tools and filtering strategies in an analysis pipeline.

Results: We construct a benchmark dataset of 5 liver tissues from 3 mouse strains using a human whole-exome sequencing kit. Next-generation sequencing reads from mouse tissues are mappable to 49% of the human genome and 409 cancer genes. In total, 1,207,556 mouse-specific alleles are aligned to the human genome reference, including 467,232 (38.7%) alleles with high sensitivity to contamination, which are pervasive causes of false cancer mutations in public databases and are signatures for predicting global contamination. Next, we assess the performance of 8 filtering methods in terms of mouse read filtration and reduction of mouse-specific alleles. All filtering tools generally perform well, although differences in algorithm strictness and efficiency of mouse allele removal are observed. Therefore, we develop a best practice pipeline that contains the estimation of contamination level, mouse read filtration, and variant filtration.

Conclusions: The inclusion of mouse cells in patient-derived models hinders genomic analysis and should be addressed carefully. Our suggested guidelines improve the robustness and maximize the utility of genomic analysis of these models.

Keywords: Benchmark; Best practice; Genomic analysis; Mouse contamination; Patient-derived model; Read filtering.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Impact assessment of the mouse genome on human genome analysis. a Schematic overview of the data production to simulate mouse-contaminated samples. b Coverage of five mouse samples on the human genome reference (hg19). c Top-ranked human functional gene sets enriched by mouse reads. Functional terms are annotated by Gene Ontology (GO). d Distributions of mouse read RPKM in all genes targeted by the WES kit, Cancer Gene Census genes, and genes containing cancer hotspot mutations defined in cancer hotspots.
Fig. 2
Schematic overview and characteristics of human genome-aligned mouse alleles (HAMAs). a Definition of HAMA and their allele frequency. Hf is defined as x/d, where d is the total depth at a given position and x is the depth of all alleles from mouse reads. b Common and strain-specific HAMAs. c Types of HAMA alleles. HAMA alleles consist of 87.37% homozygous SNVs, 7.56% heterozygous SNVs, and 5.07% indels. If a site was reported as a heterozygous SNV in any of the five mouse samples, we counted it as a heterozygous SNV. d Example of genomic regions that contain high-risk HAMAs (50% contamination ratio, TP53, exons 1–5). The coverage of human reads is colored in yellow and that of mouse reads in blue. Red arrows indicate the genomic regions where the coverage of mouse reads dominates that of human reads. e Distributions of Hf for all HAMA sites at four different global contamination levels (5%, 10%, 20%, and 50%). Median Hf is denoted by dotted lines. f Estimation results for all in silico contaminated datasets based on the linear regression of median Hf. The red dotted line indicates the perfect estimation line.
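The Hf definition in the Fig. 2 caption (x/d per HAMA site) and the idea of regressing contamination level on median Hf can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the (x, d) tuple layout, and the choice of an ordinary least-squares line are assumptions, and no regression coefficients from the paper are reproduced here.

```python
from statistics import median


def hf(mouse_allele_depth, total_depth):
    """Hf = x / d for one HAMA site; 0.0 when the site has no coverage."""
    if total_depth == 0:
        return 0.0
    return mouse_allele_depth / total_depth


def median_hf(sites):
    """Median Hf over an iterable of (x, d) pairs, one pair per HAMA site."""
    return median(hf(x, d) for x, d in sites)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_contamination(sample_median_hf, slope, intercept):
    """Predict a contamination level from a sample's median Hf (hypothetical helper)."""
    return slope * sample_median_hf + intercept
```

Under these assumptions, one would fit `fit_line` on samples with known contamination levels (x = median Hf, y = contamination level) and then pass a new sample's median Hf to `estimate_contamination`.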
Fig. 3
Impact of mouse alleles on SNV calling. a A schematic overview of somatic mutation calling on the benchmark dataset. b Number of HAMAs and their ratios in somatic mutation calls. Numbers are averaged over the whole benchmark set. c Number of studies that have reported COSMIC-confirmed variants with specified sample origins. Sample origin notation follows the classification of the COSMIC database.
Fig. 4
Performance of eight filtering methods measured in the benchmark dataset. a Sensitivity, specificity, and F-scores of eight filtering methods in terms of mouse read filtration. b Total sums of Hf reduction after filtration. c Numbers of callable HAMA (Hf > 5%, alternative allele count > 5) after filtration. d Numbers of mutation calls in high-risk HAMA and non-HAMA sites after filtration
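The "callable HAMA" criterion in panel c of the Fig. 4 caption (Hf > 5%, alternative allele count > 5) can be written as a simple predicate. This is a sketch under assumptions: the function name and argument layout are hypothetical, and Hf is approximated here as the alternative allele count divided by total depth at the site.

```python
def is_callable_hama(alt_count, total_depth, min_hf=0.05, min_alt_count=5):
    """Return True when a HAMA site passes both thresholds from the caption.

    Both comparisons are strict (>), matching "Hf > 5%" and
    "alternative allele count > 5". Sites with zero depth never pass.
    """
    if total_depth == 0:
        return False
    site_hf = alt_count / total_depth
    return site_hf > min_hf and alt_count > min_alt_count


def count_callable(sites):
    """Count callable HAMA sites in an iterable of (alt_count, total_depth) pairs."""
    return sum(1 for alt, depth in sites if is_callable_hama(alt, depth))
```

Counting such sites before and after read filtration gives the kind of comparison the panel describes.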
Fig. 5
Best practice for analysis of PDM sequencing. A robust workflow to analyze human genome data contaminated by the mouse genome. ConcatRef, Disambiguate, and XenofilteR are the suggested best filtering methods for general purposes. Alternatively, Xenome, XenofilteR, and ConcatRef are also recommended for SNV analysis. After applying a filtering method, further filtering can optionally be achieved by blacklisting using the HAMA list. The estimated contamination ratio can be used as an indicator of whether strict or lenient blacklisting should be applied.
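The optional blacklisting step in the Fig. 5 caption can be sketched as a set-membership filter over variant calls. This is an illustration only: the dictionary keys, the (chrom, pos, alt) key format, and the toy coordinates are hypothetical, and the sketch does not encode how the paper distinguishes strict from lenient blacklisting.

```python
def blacklist_variants(calls, hama_sites):
    """Split variant calls into (kept, removed) by membership in a HAMA blacklist.

    calls      -- list of dicts with "chrom", "pos", and "alt" keys (assumed layout)
    hama_sites -- set of (chrom, pos, alt) tuples taken from a HAMA list
    """
    kept, removed = [], []
    for call in calls:
        key = (call["chrom"], call["pos"], call["alt"])
        if key in hama_sites:
            removed.append(call)
        else:
            kept.append(call)
    return kept, removed


# Toy data with made-up coordinates, for illustration only.
toy_hama = {("chrT", 100, "A")}
toy_calls = [
    {"chrom": "chrT", "pos": 100, "alt": "A"},  # matches the blacklist
    {"chrom": "chrT", "pos": 100, "alt": "G"},  # same position, different allele
    {"chrom": "chrT", "pos": 250, "alt": "C"},  # not in the blacklist
]
kept_calls, removed_calls = blacklist_variants(toy_calls, toy_hama)
```

Matching on the allele as well as the position is a design choice of this sketch; it keeps a call at a HAMA position when the called allele differs from the mouse allele.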
