Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2018 Sep 1;34(17):i671-i679.
doi: 10.1093/bioinformatics/bty589.

SPhyR: tumor phylogeny estimation from single-cell sequencing data under loss and error

Affiliations

SPhyR: tumor phylogeny estimation from single-cell sequencing data under loss and error

Mohammed El-Kebir. Bioinformatics. .

Abstract

Motivation: Cancer is characterized by intra-tumor heterogeneity, the presence of distinct cell populations with distinct complements of somatic mutations, which include single-nucleotide variants (SNVs) and copy-number aberrations (CNAs). Single-cell sequencing technology enables one to study these cell populations at single-cell resolution. Phylogeny estimation algorithms that employ appropriate evolutionary models are key to understanding the evolutionary mechanisms behind intra-tumor heterogeneity.

Results: We introduce Single-cell Phylogeny Reconstruction (SPhyR), a method for tumor phylogeny estimation from single-cell sequencing data. In light of frequent loss of SNVs due to CNAs in cancer, SPhyR employs the k-Dollo evolutionary model, where a mutation can only be gained once but lost k times. Underlying SPhyR is a novel combinatorial characterization of solutions as constrained integer matrix completions, based on a connection to the cladistic multi-state perfect phylogeny problem. SPhyR outperforms existing methods on simulated data and on a metastatic colorectal cancer.

Availability and implementation: SPhyR is available on https://github.com/elkebir-group/SPhyR.

Supplementary information: Supplementary data are available at Bioinformatics online.

PubMed Disclaimer

Figures

Fig. 1.
Fig. 1.
Tumor phylogeny estimation from single-cell sequencing (SCS) data. Heterogeneous tumors are composed of distinct cellular populations with distinct complements of somatic mutations, including single-nucleotide variants (SNVs) and copy-number aberrations (CNAs). During cancer progression, SNVs are frequently lost due to copy-number aberrations, but rarely introduced more than once. Here, single-cell sequencing of a tumor yields an input matrix D, whose m rows are cells and n columns are SNVs. Matrix D has incorrect and/or missing entries. We aim to simultaneously correct errors in matrix D and infer the evolutionary history of the m cells, yielding output matrix B and the corresponding phylogenetic tree T. The evolutionary model employed by our method SPhyR is the k-Dollo parsimony model, where each SNV can only be gained once but lost at most k times. SPhyR is based on a combinatorial characterization of k-Dollo phylogenetic trees T as k-Dollo completions A of a binary matrix B
Fig. 2.
Fig. 2.
Cutting plane and column generation enables SPhyR to efficiently solve practical k-DP instances. We show results for m × n binary matrices B=[bp,c] where m = n. (a) The number of solved instances for varying dimensions and maximum number k of character losses. For each k and m × n, there are 60 simulated instances. SPhyR solved all k = 1 instances (blue) to optimality, but exceeded the run time limit for k = 3 instances (red) with dimensions 100 × 100. (b) The run time in seconds (logarithmic scale) increased with increasing k and m × n. (c) The fraction of entries bp,c=0. (d) The percentage of model variables ap,c,i instantiated during column generation. (e) The percentage of model constraints (logarithmic scale) added during separation. (f) The number of column generation iterations. Only a single iteration is required if B is a perfect phylogeny matrix
Fig. 3.
Fig. 3.
Simulation setup and comparison measures. (a) Given the number m of taxa and n of characters, we use the ms package (Hudson, 2002) to simulate a perfect phylogeny tree. Subsequently, we introduce at most k losses per character using a rate λ, yielding the simulated phylogenetic tree T* and matrix B*. (b) We then perturb the entries of B*=[bp,c*] given a false positive rate FPR(D)=α* and false negative rate FPNR(D)=β*, yielding the input matrix D=[dp,c]. Entry dp,c=0 is a true negative (TN) if bp,c*=0 and a false negative (FN) if bp,c*=1. Conversely, dp,c=1 is a false positive (FP) if bp,c*=0 and a true positive (TP) if bp,c*=1. (c) Given D, α* and β*, a phylogeny estimation method yields output matrix B=[bp,c]. (d) In addition, such a method outputs a phylogenetic tree T whose leaves form the rows of output matrix B. (e) To compare T and T*, we compute the recall in terms of pairs of character states that are ancestral (A), on distinct branches (incomparable, I), or on the same edge (clustered, C). A recall of 1 for all three measures implies that (the internal nodes of) T and T* are identical. To compare B and B*, we compute FPR(B) and FNR(B)—if both are 0 then B=B*
Fig. 4.
Fig. 4.
SPhyR more accurately recovers the simulated matrices B* and trees T* than SCITE and SiFit. Given the same input matrix D, each method inferred an output matrix B and phylogenetic tree T. (a–c) The tradeoff between the false negative rate (FNR) and the false positive rate (FPR) for each matrix B output by each method. (d–f) Three different measures that assess similarity between T* and T in terms of character states occurring on the same branch (d), on distinct branches (e) and on the same edge (f) in both T* and T

References

    1. Agarwala R., Fernández-Baca D. (1994) A polynomial-time algorithm for the perfect phylogeny problem when the number of character states is fixed. SIAM J. Comput., 23, 1216–1224.
    1. Bodlaender H.L., et al. (1992) Two strikes against perfect phylogeny. In: Kuich W. (ed.) Automata, Languages and Programming. ICALP 1992. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg: Vol 623.
    1. Bonizzoni P., et al. (2012) The binary perfect phylogeny with persistent characters. Theor. Comput. Sci., 454, 51–63.
    1. Bonizzoni P., et al. (2017a) A colored graph approach to perfect phylogeny with persistent characters. Theor. Comput. Sci., 658, 60–73.
    1. Bonizzoni P., et al. (2017b) Beyond perfect phylogeny: multisample phylogeny reconstruction via ilp. In: Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics ,ACM-BCB ‘17, ACM, New York, NY, USA. pp. 1–10.

Publication types