Genome Biol. 2024 May 21;25(1):130.
doi: 10.1186/s13059-024-03267-x.

HATCHet2: clone- and haplotype-specific copy number inference from bulk tumor sequencing data

Matthew A Myers et al. Genome Biol. 2024.

Abstract

Bulk DNA sequencing of multiple samples from the same tumor is becoming common, yet most methods to infer copy-number aberrations (CNAs) from this data analyze individual samples independently. We introduce HATCHet2, an algorithm to identify haplotype- and clone-specific CNAs simultaneously from multiple bulk samples. HATCHet2 extends the earlier HATCHet method by improving identification of focal CNAs and introducing a novel statistic, the minor haplotype B-allele frequency (mhBAF), that enables identification of mirrored-subclonal CNAs. We demonstrate HATCHet2's improved accuracy using simulations and a single-cell sequencing dataset. HATCHet2 analysis of 10 prostate cancer patients reveals previously unreported mirrored-subclonal CNAs affecting cancer genes.

Keywords: Allele-specific; Cancer; Clone; Copy-number aberrations; DNA sequencing; Genomics; Haplotype; Tumor heterogeneity.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Overview of HATCHet2. Starting from aligned DNA sequence reads from one or more bulk tumor samples from the same patient, HATCHet2 identifies integer-valued copy-number profiles for multiple tumor clones along with the proportion of each clone in every sample. Steps in orange boxes indicate key methodological improvements in HATCHet2 compared to HATCHet [30], with orange text highlighting the new features in each step
Fig. 2
Copy-number segments identified by HATCHet2, HATCHet [30], and Battenberg [23] in prostate cancer. A (Top) Lengths of segments identified by the three methods from 10 prostate cancers from Gundem et al., 2015 [42]. Dotted gray lines indicate 1 kilobase and 1 megabase. (Bottom) Numbers of segments inferred by each method on each prostate cancer patient. B Number of segments identified by each method for each patient (row) within 1 megabase of 41 genes (columns) from The Cancer Genome Atlas prostate cancer publication [72]. C Copy-number segments identified by HATCHet2 near the TP53 locus on chromosome 17 in one sample from patient A12. (Two other samples from this patient are shown in Additional file 1: Fig. S3.) Each point is a small genomic region that contains exactly one SNP with indicated read-depth ratio (RDR) and B-allele frequency (BAF) and is colored by the assigned copy-number state for the corresponding method. Black bars indicate the expected RDR and BAF of each segment according to the copy-number states and clone proportions assigned by the corresponding method. Gene location is indicated by a vertical purple bar. Full copy-number state legends for panels C–F are reported in Additional file 1: Fig. S5. D Copy-number segments identified by HATCHet near TP53 for the same sample as panel C (Battenberg results for this patient are shown in Additional file 1: Fig. S3). E Copy-number segments identified by HATCHet2 and F Battenberg near the CANT1 locus on chromosome 17 in two samples from patient A10 (two other samples from this patient are shown in Additional file 1: Fig. S4)
Fig. 3
Mirrored-subclonal copy-number aberrations in prostate cancer identified by HATCHet2. A Copy-number landscape for 41 prostate cancer genes from The Cancer Genome Atlas [72] across 10 prostate cancer patients from Gundem et al. [42]. White entries indicate that no copy-number aberration was identified at the locus. Black entries indicate that mirrored-subclonal CNAs were observed among the tumor clones at the locus. Gray entries indicate that a non-mirrored copy-number aberration was inferred for at least one clone at the locus. B Inferred copy-number states for the single tumor clone present in prostate cancer sample A10-A. Each point is a genomic bin whose position corresponds to its inferred minor haplotype BAF (mhBAF, x-axis) and fractional copy number (FCN, a rescaling of the read-depth ratio, y-axis). Each point is colored by the copy-number state assigned to the bin. Points labeled (a, b) are the expected position of the corresponding haplotype-specific copy-number state with a copies of the major haplotype and b copies of the minor haplotype. Dotted blue box indicates mirrored-subclonal CNAs, and red box indicates the mirrored-subclonal CNA examined in panels D–F. C Fractional copy number (top) and mhBAF (bottom) values across the genome. Black lines indicate the expected FCN and mhBAF of the assigned copy-number state (analogous to labeled points in panel B). Dotted blue boxes indicate mirrored-subclonal CNAs, and red box indicates the mirrored-subclonal CNA examined in panels D–F. Points are colored by the assigned haplotype-specific copy-number state as in B. D Inferred haplotype-specific copy numbers (a, b) (first row) and clone proportions (entries in table) for the normal clone (N) and 4 tumor clones (1–4) for the segment containing the genes ELK4 and SLC45A3. E BAF values (i.e., the fraction of reads with the non-reference allele) across samples for SNPs in the bin containing genes ELK4 (green bar) and SLC45A3 (purple bar). Blue points indicate SNPs that have BAF <0.5 in sample A10-A, while red points indicate SNPs with BAF >0.5 in sample A10-A. Note that in samples A10-C and A10-D, the blue and red points are reflected across the dotted line at BAF = 0.5 relative to sample A10-A, indicating a mirrored-subclonal CNA. F Haplotype-phased BAF values (i.e., either the fraction of alternate reads or the fraction of reference reads as indicated by the phasing inferred by HATCHet2) across samples for SNPs in the bin containing genes ELK4 and SLC45A3. SNPs are colored as in panel E. Note that SNPs of different colors (i.e., different BAF values in A10-A) have been grouped together via HATCHet2’s mhBAF inference algorithm to show that the haplotype containing these SNPs is more abundant in sample A10-A (phased BAFs >0.5) but less abundant in samples A10-C and A10-D (phased BAFs <0.5)
Fig. 4
A Haplotype-specific copy-number profiles for two tumor clones identified by HATCHet2 on 4 pseudobulk samples from single-cell whole-genome sequencing data from 7914 cells of 4 sections of a breast tumor. Each copy-number segment is colored by the assigned haplotype-specific copy-number state (labeled in the legend on the right). B Single-cell haplotype-specific copy-number profiles identified by CHISEL [15]. Rows correspond to cells and are grouped into copy-number clones, which are annotated on the left. Each entry is colored by its assigned haplotype-specific copy-number state, as in panel A. C Clone proportions inferred by HATCHet2 (left bar in each pair) and CHISEL (right bar in each pair) in each sample (x-axis)
Fig. 5
Inference of minor haplotype BAF (mhBAF). A The BAF $\beta_{j,p}$ for SNP $j$ and sample $p$ is the fraction of reads covering the SNP that contain the alternate allele (red) and is a signal for allelic imbalance. B Positions of five heterozygous SNPs, with alternate alleles given in red, for a bin with a mirrored-subclonal copy-number aberration, with distinct haplotype-specific copy-number states in two tumor clones. C Reference-based phasing groups together SNPs that are likely to be on the same haplotype based on a reference database. We combine read counts for each of these haplotype blocks to obtain a single BAF estimate. D The minor haplotype BAF (mhBAF) is estimated by inferring a phasing $\hat{h}$ that assigns alternate alleles to a haplotype, and computing the frequency $f_p(\hat{h})$ of this haplotype in each sample $p$
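
The quantities in the Fig. 5 caption can be illustrated with a minimal Python sketch. It is not HATCHet2's implementation: the function names, the 0/1 encoding of the phasing, and the read counts are all assumptions made for illustration, and the phasing is supplied as input rather than inferred. The sketch computes the per-SNP BAF and the frequency of a phased haplotype in one sample, then shows how the same haplotype can be above 0.5 in one sample and below 0.5 in another, the pattern the Fig. 3 caption describes for a mirrored-subclonal CNA.

```python
from typing import Sequence


def baf(alt_reads: int, total_reads: int) -> float:
    """BAF of one SNP in one sample: fraction of reads with the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def haplotype_frequency(
    alt: Sequence[int], ref: Sequence[int], phasing: Sequence[int]
) -> float:
    """Frequency of one haplotype in one sample, given a phasing.

    Assumed encoding (hypothetical): phasing[j] == 1 means the alternate
    allele of SNP j lies on the haplotype of interest, 0 means the
    reference allele does. Read counts are pooled over all SNPs in the bin.
    """
    if not (len(alt) == len(ref) == len(phasing)):
        raise ValueError("alt, ref and phasing must have the same length")
    total = sum(alt) + sum(ref)
    if total <= 0:
        raise ValueError("no reads cover the bin")
    hap_reads = sum(a if h == 1 else r for a, r, h in zip(alt, ref, phasing))
    return hap_reads / total


# Made-up read counts for four SNPs in one bin, in two samples.
phasing = [1, 0, 1, 0]
sample_a = {"alt": [70, 30, 68, 32], "ref": [30, 70, 32, 68]}
sample_c = {"alt": [30, 70, 32, 68], "ref": [70, 30, 68, 32]}

freq_a = haplotype_frequency(sample_a["alt"], sample_a["ref"], phasing)
freq_c = haplotype_frequency(sample_c["alt"], sample_c["ref"], phasing)
print(freq_a, freq_c)  # about 0.69 and 0.31: the same haplotype on opposite sides of 0.5
```

Unphased BAFs in sample A scatter on both sides of 0.5 (0.70, 0.30, 0.68, 0.32), whereas pooling reads by haplotype gives a single value per sample. Inferring the phasing itself and deciding which haplotype is the minor one are the parts the caption attributes to HATCHet2, and the sketch does not attempt them.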

References

    1. Gerstung M, Jolly C, Leshchiner I, Dentro SC, Gonzalez S, Rosebrock D, Mitchell TJ, Rubanova Y, Anur P, Yu K, et al. The evolutionary history of 2,658 cancers. Nature. 2020;578(7793):122–128. doi: 10.1038/s41586-019-1907-7.
    2. Li Y, Roberts ND, Wala JA, Shapira O, Schumacher SE, Kumar K, Khurana E, Waszak S, Korbel JO, Haber JE, et al. Patterns of somatic structural variation in human cancer genomes. Nature. 2020;578(7793):112–121. doi: 10.1038/s41586-019-1913-9.
    3. Drews RM, Hernando B, Tarabichi M, Haase K, Lesluyes T, Smith PS, Morrill Gavarró L, Couturier D-L, Liu L, Schneider M, et al. A pan-cancer compendium of chromosomal instability. Nature. 2022;606(7916):976–983. doi: 10.1038/s41586-022-04789-9.
    4. Steele CD, Abbasi A, Islam SA, Bowes AL, Khandekar A, Haase K, Hames-Fathi S, Ajayi D, Verfaillie A, Dhami P, et al. Signatures of copy number alterations in human cancer. Nature. 2022;606(7916):984–991. doi: 10.1038/s41586-022-04738-6.
    5. Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, Ellrott K, Shmulevich I, Sander C, Stuart JM. The cancer genome atlas pan-cancer analysis project. Nat Genet. 2013;45(10):1113–1120. doi: 10.1038/ng.2764.
