Genome Biol. 2016 Apr 15;17:69.
doi: 10.1186/s13059-016-0929-9

OncoNEM: inferring tumor evolution from single-cell sequencing data

Edith M Ross et al. Genome Biol.

Abstract

Single-cell sequencing promises a high-resolution view of genetic heterogeneity and clonal evolution in cancer. However, methods to infer tumor evolution from single-cell sequencing data lag behind methods developed for bulk-sequencing data. Here, we present OncoNEM, a probabilistic method for inferring intra-tumor evolutionary lineage trees from somatic single nucleotide variants of single cells. OncoNEM identifies homogeneous cellular subpopulations and infers their genotypes as well as a tree describing their evolutionary relationships. In simulation studies, we assess OncoNEM's robustness and benchmark its performance against competing methods. Finally, we show its applicability in case studies of muscle-invasive bladder cancer and essential thrombocythemia.

Keywords: Cancer evolution; Phylogenetic tree; Single-cell sequencing; Tumor evolution; Tumor heterogeneity.

Figures

Fig. 1
Toy example of the OncoNEM scoring model. (a) Hypothesis of a clonal lineage tree that describes the subpopulations of a tumor (grey circles) and their relationships (black arrows). (b) This tree can be represented as a prediction matrix that predicts the mutation pattern we expect to see across all k cells for a mutation that occurred in a certain clone θ. (c) Assuming that we know the originating clone of every mutation (blue lines in clonal lineage tree), we can extend the prediction matrix to a full matrix of expected genotypes. (d) To score the tree, expected genotypes are compared to observed genotypes. The more mismatches there are, the lower the likelihood of the tree given the data. Since the origin of a mutation is unknown a priori, the full likelihood of the lineage tree is calculated by marginalizing over all possible origins for every mutation. FN, false negative; FP, false positive.
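The scoring idea in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the OncoNEM implementation. It assumes one cell per tree node, a uniform prior over mutation origins, α as the false positive rate and β as the false negative rate (the reading suggested by the values in Figs. 3 and 5). Missing observations are encoded as `None`. All function names are hypothetical.

```python
import math

def prediction_matrix(parent):
    """pred[theta][cell] is 1 if a mutation arising in node theta is expected
    in `cell`, i.e. if theta is the cell's own node or one of its ancestors.
    parent[i] is the parent of node i; the root has parent None."""
    n = len(parent)
    pred = [[0] * n for _ in range(n)]
    for cell in range(n):
        node = cell
        while node is not None:
            pred[node][cell] = 1
            node = parent[node]
    return pred

def observation_prob(observed, expected, alpha, beta):
    """Probability of one observed genotype given the expected genotype."""
    if observed is None:          # missing value: carries no information
        return 1.0
    if expected == 1:
        return 1 - beta if observed == 1 else beta    # beta: false negative
    return alpha if observed == 1 else 1 - alpha      # alpha: false positive

def tree_log_likelihood(parent, observed, alpha, beta):
    """Log-likelihood of a tree, marginalizing over the unknown origin of
    every mutation (uniform prior over nodes, an assumption of this sketch).
    `observed` holds one row per mutation and one column per cell."""
    pred = prediction_matrix(parent)
    n = len(parent)
    total = 0.0
    for row in observed:
        marginal = 0.0
        for theta in range(n):
            p = 1.0
            for cell in range(n):
                p *= observation_prob(row[cell], pred[theta][cell], alpha, beta)
            marginal += p / n
        total += math.log(marginal)
    return total

# Two mutations observed across three cells (cell 0 sits at the root).
observed = [[0, 1, 1],
            [0, 0, 1]]
linear = [None, 0, 1]   # 0 -> 1 -> 2
star = [None, 0, 0]     # 0 -> 1 and 0 -> 2
print(tree_log_likelihood(linear, observed, alpha=0.2, beta=0.1))
print(tree_log_likelihood(star, observed, alpha=0.2, beta=0.1))
```

In this toy data the nested mutation pattern has fewer mismatches with the linear tree, so the linear tree scores higher than the star tree.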
Fig. 2
Toy example of OncoNEM inference steps. Given the observed genotypes and the input parameters α and β, the log-likelihood of the start tree, which is by default a star-shaped tree, is −47.61. In the first step of the initial search, all neighbors of the star tree are scored. The highest scoring tree obtained in this step has a log-likelihood of −34.26. In this toy example, the highest scoring tree of the first step is also the best cell lineage tree overall. Therefore, the initial search terminates with this tree as a solution. In the first refinement step, we find that inserting an unobserved node into the branch point of our current tree increases the log-likelihood by 3.82. Since this improvement is larger than the Bayes factor threshold of 2.3, the solution with the unobserved clone is accepted. In the final refinement step, cells are clustered along edges. In the toy example, only one clustering step does not decrease the log-likelihood by more than log(ε).
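The two acceptance rules in this caption reduce to comparisons against log(ε). The sketch below assumes that the quoted threshold of 2.3 is the natural logarithm of ε = 10, the value reported for the simulation and case studies in Fig. 4. The function names are hypothetical and are not taken from the OncoNEM package.

```python
import math

def accepts_expansion(log_lik_gain, epsilon=10.0):
    """Accept a more complex tree (for example one with an added unobserved
    clone) only if the log-likelihood gain exceeds log(epsilon)."""
    return log_lik_gain > math.log(epsilon)

def accepts_clustering(log_lik_loss, epsilon=10.0):
    """Accept merging cells along an edge only if the log-likelihood drops
    by no more than log(epsilon)."""
    return log_lik_loss <= math.log(epsilon)

print(round(math.log(10.0), 2))   # threshold on the log scale
print(accepts_expansion(3.82))    # the gain reported in the toy example
```

With ε = 10 the threshold is about 2.30, so the reported gain of 3.82 passes the expansion test.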
Fig. 3
Parameter estimation. (a) Dependence of OncoNEM results on inference parameters. Log Bayes factor of highest scoring model inferred with given parameter combination relative to highest scoring model overall. The inferred parameters (α̂ = 0.22, β̂ = 0.08) are close to the ground truth (α = 0.2, β = 0.1). A large range of parameter combinations around the ground truth parameters yield solutions close to the ground truth tree in terms of pairwise cell shortest-path distance and V-measure. The distance was normalized to the largest distance observed between any inferred tree and the ground truth. (b) Parameter estimation accuracy. FPRs and FNRs estimated by OncoNEM for various simulation settings with five replicates each. The blue lines mark the ground truth parameters. The grey lines mark the grid values over which FPR and FNR were optimized.
Fig. 4
Dependence of OncoNEM’s clustering solution on Bayes factor threshold ε. This figure shows the V-measure and the number of clones of the OncoNEM solution as a function of ε for various simulation scenarios. Every line corresponds to one data set of the method comparison study. Lines are color coded by parameter setting for the varied simulation parameter. In all simulation scenarios, the number of clones is largely independent of ε, unless it is set to be unreasonably small (ε < 5). The threshold ε used throughout the simulation and case studies is 10 (dashed line), and thus well within the stable range.
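For readers unfamiliar with the V-measure used here to score clustering solutions, the following sketch computes it with the standard definition: the harmonic mean of homogeneity and completeness, both derived from conditional entropies. It is a generic illustration of the metric, not code from the paper, and the label lists are invented for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a labelling (natural log)."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def conditional_entropy(labels, given):
    """H(labels | given) for two labellings of the same cells."""
    n = len(labels)
    joint = Counter(zip(given, labels))
    given_counts = Counter(given)
    return -sum(c / n * math.log(c / given_counts[g])
                for (g, _), c in joint.items())

def v_measure(true_clones, inferred_clones):
    """Harmonic mean of homogeneity and completeness."""
    h_true = entropy(true_clones)
    h_inferred = entropy(inferred_clones)
    homogeneity = 1.0 if h_true == 0 else (
        1 - conditional_entropy(true_clones, inferred_clones) / h_true)
    completeness = 1.0 if h_inferred == 0 else (
        1 - conditional_entropy(inferred_clones, true_clones) / h_inferred)
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)

# Hypothetical clone assignments for six cells.
truth = ["A", "A", "B", "B", "C", "C"]
print(v_measure(truth, ["x", "x", "y", "y", "z", "z"]))  # same partition, renamed
print(v_measure(truth, ["x", "x", "x", "x", "x", "x"]))  # everything merged
```

The score depends only on how cells are grouped, not on the cluster names, so a correct partition with different labels still scores 1.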
Fig. 5
OncoNEM performance assessment. (a) Performance comparison of OncoNEM and five baseline methods. Shown are the distance and V-measure of inferred trees to ground truth. Results of single simulations are marked by dots and colored by method, while black horizontal bars indicate the mean over five simulations for each method. The distances shown were normalized for the number of cells n in the trees and were obtained by dividing the pairwise cell shortest-path distances by n(n−1)/2. Distances could only be calculated for three of the baseline methods. Values of the varied parameters are shown in the panels at the top. As default parameters, we used an FNR of 0.1, an FPR of 0.2, 200 sites, ten clones, no unobserved clones, 20 cells and 20% missing values. (b) Performance comparison of OncoNEM and Kim and Simon’s oncogenetic tree method. Shown is the mutation order accuracy of the inferred trees for each of the simulated data sets. This measure is undefined for data sets without mutually exclusive mutations. Therefore, no values are shown for the single-clone case and the first replicate of the five-clone scenario, for which the simulated tree is linear.
Fig. 6
Case study results. (a, b) Results inferred by OncoNEM on the bladder cancer data set. The estimated error rates are α = 0.185 and β = 0.08. The inferred tree suggests a branching evolution with three major subpopulations. (c, d) Results inferred by OncoNEM on the essential thrombocythemia data set. The estimated error rates are α = 0.255 and β = 0.185. The inferred tree suggests a largely linear evolution with some small subpopulations branching off late during tumor evolution.
Fig. 7
Comparing clonal trees with the pairwise cell shortest-path distance. The yellow entries in the pairwise distance matrices indicate differences from the reference tree.
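A minimal sketch of such a comparison follows. It assumes the distance is the sum, over all unordered cell pairs, of the absolute difference between the pair's shortest-path length in the two trees, with cells in the same clone at distance 0. That reading is an assumption based on the caption. The optional normalization divides by n(n−1)/2 as described for Fig. 5. Tree encoding and function names are hypothetical.

```python
from itertools import combinations

def path_to_root(parent, node):
    """Nodes from `node` up to the root; parent[root] is None."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path

def clone_distance(parent, a, b):
    """Number of edges on the undirected path between clones a and b."""
    depth_from_a = {node: d for d, node in enumerate(path_to_root(parent, a))}
    for depth_from_b, node in enumerate(path_to_root(parent, b)):
        if node in depth_from_a:      # first shared ancestor
            return depth_from_a[node] + depth_from_b
    raise ValueError("clones are not in the same tree")

def pairwise_cell_distance(parent1, clone_of1, parent2, clone_of2,
                           normalize=False):
    """Sum of |d1(i, j) - d2(i, j)| over all unordered cell pairs (i, j).
    clone_of1 / clone_of2 map each cell to its clone in the respective tree."""
    cells = sorted(clone_of1)
    total = 0.0
    for i, j in combinations(cells, 2):
        d1 = clone_distance(parent1, clone_of1[i], clone_of1[j])
        d2 = clone_distance(parent2, clone_of2[i], clone_of2[j])
        total += abs(d1 - d2)
    n = len(cells)
    if normalize and n > 1:
        total /= n * (n - 1) / 2
    return total

# Reference tree: R -> A -> B. Alternative tree: R -> A and R -> B.
reference = {"R": None, "A": "R", "B": "A"}
alternative = {"R": None, "A": "R", "B": "R"}
cells = {"c1": "A", "c2": "B", "c3": "R"}   # same clone assignment in both
print(pairwise_cell_distance(reference, cells, alternative, cells))
print(pairwise_cell_distance(reference, cells, alternative, cells, normalize=True))
```

Here two of the three cell pairs change their path length by one edge, giving a raw distance of 2 and a normalized distance of 2/3.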
