[Preprint]. 2023 Jul 15:2023.07.13.548855.
doi: 10.1101/2023.07.13.548855.

HATCHet2: clone- and haplotype-specific copy number inference from bulk tumor sequencing data


Matthew A Myers et al. bioRxiv.


Abstract

Multi-region DNA sequencing of primary tumors and metastases from individual patients helps identify somatic aberrations driving cancer development. However, most methods to infer copy-number aberrations (CNAs) analyze individual samples. We introduce HATCHet2 to identify haplotype- and clone-specific CNAs simultaneously from multiple bulk samples. HATCHet2 introduces a novel statistic, the mirrored haplotype B-allele frequency (mhBAF), to identify mirrored-subclonal CNAs having different numbers of copies of parental haplotypes in different tumor clones. HATCHet2 also has high accuracy in identifying focal CNAs and extends the earlier HATCHet method in several directions. We demonstrate HATCHet2's improved accuracy using simulations and a single-cell sequencing dataset. HATCHet2 analysis of 50 prostate cancer samples from 10 patients reveals previously-unreported mirrored-subclonal CNAs affecting cancer genes.

Keywords: DNA sequencing; allele-specific; cancer; clone; copy-number aberrations; genomics; haplotype; tumor heterogeneity.


Conflict of interest statement

Competing interests: None to report.

Figures

Figure 1. Overview of HATCHet2.
Starting from aligned DNA sequence reads from one or more bulk tumor samples, HATCHet2 identifies integer-valued copy-number profiles for multiple tumor clones along with the proportion of each clone in every sample. Steps in orange boxes indicate key methodological improvements in HATCHet2 compared to HATCHet [32], with orange text highlighting the new features in each step.
Figure 2. Copy-number segments identified by HATCHet2, HATCHet [32], and Battenberg [25] in prostate cancer.
A Lengths of segments identified by the three methods from 10 prostate cancers from Gundem et al., 2015 [44]. Dotted gray lines indicate 1 kilobase and 1 megabase. B Number of segments identified by each method for each patient (row) within 1 megabase of 35 genes (columns) from the Cancer Gene Census (CGC) [75]. C Copy-number segments identified by HATCHet2 near the TP53 locus on chromosome 17 in one sample from patient A12. (Two other samples from this patient are shown in Fig. S2.) Each point is a genomic location with indicated read-depth ratio (RDR) and B-allele frequency (BAF), colored by the assigned copy-number state. Black bars indicate the expected RDR and BAF of each segment according to the copy-number states and clone proportions assigned by the corresponding method. The gene location is indicated by a vertical purple bar. D Copy-number segments identified by HATCHet near TP53 for the same sample as panel C. E, F Copy-number segments identified by HATCHet2 (E) and Battenberg (F) near the CANT1 locus on chromosome 17 in two samples from patient A10. (Two other samples from this patient are shown in Fig. S3.)
Figure 3. Mirrored-subclonal copy-number aberrations in prostate cancer identified by HATCHet2.
A Copy-number landscape for 35 prostate cancer genes from COSMIC Cancer Gene Census [75] across 10 prostate cancer patients from [44]. White entries indicate that no copy-number aberration was identified at the locus. Black entries indicate that mirrored-subclonal CNAs were observed among the tumor clones at the locus. Gray entries indicate that a non-mirrored copy-number aberration was inferred for at least one clone at the locus. B Inferred copy-number states for the single tumor clone present in prostate cancer sample A10-A. Each point is a genomic bin whose position corresponds to its inferred mirrored haplotype BAF (mhBAF, x-axis) and fractional copy number (FCN, y-axis). Each bin is colored by its inferred copy-number state. Points labeled (a,b) are the expected position of the corresponding haplotype-specific copy-number state with a copies of the major haplotype and b copies of the minor haplotype. Dotted blue box indicates mirrored-subclonal CNAs, and red box indicates the mirrored-subclonal CNA examined in panels D-F. C Fractional copy number (top) and mhBAF (bottom) values across the genome. Black lines indicate the expected FCN and mhBAF of the assigned copy-number state (analogous to labeled points in panel B). Dotted blue boxes indicate mirrored-subclonal CNAs, and red box indicates the mirrored-subclonal CNA examined in panels D-F. D Inferred haplotype-specific copy numbers (a,b) (first row) and clone proportions (entries in table) for the normal clone (N) and 4 tumor clones (1–4) for the segment containing the genes ELK4 and SLC45A3. E BAF values (i.e., the fraction of reads with the non-reference allele) across samples for SNPs in the bin containing genes ELK4 (green bar) and SLC45A3 (purple bar). Blue points indicate SNPs that have BAF ≤ 0.5 in sample A10-A, while red points indicate SNPs with BAF > 0.5 in sample A10-A. Note that in samples A10-C and A10-D, the blue and red points are reflected across the dotted line at BAF = 0.5 relative to sample A10-A, indicating a mirrored-subclonal CNA. F Haplotype-phased BAF values (i.e., either the fraction of alternate reads or the fraction of reference reads as indicated by the phasing inferred by HATCHet2) across samples for SNPs in the bin containing genes ELK4 and SLC45A3. SNPs are colored as in panel E. Note that SNPs of different colors (i.e., different BAF values in A10-A) have been grouped together via HATCHet2’s mhBAF inference algorithm to show that the haplotype containing these SNPs is more abundant in sample A10-A (phased BAFs > 0.5) but less abundant in samples A10-C and A10-D (phased BAFs < 0.5).
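The reflection across BAF = 0.5 described in panels E and F can be reproduced with a small mixture calculation. The following is a minimal sketch, not HATCHet2's implementation: it assumes the standard mixture model in which the expected fractional copy number is the proportion-weighted total copy number, and the expected frequency of one haplotype is its proportion-weighted copy number divided by that total. The function name, the clone copy numbers, and the clone proportions are hypothetical values chosen for illustration. Here (a, b) denotes copies of two fixed parental haplotypes A and B rather than major and minor.

```python
from typing import Sequence, Tuple


def expected_fcn_and_baf(
    copy_numbers: Sequence[Tuple[int, int]],
    proportions: Sequence[float],
) -> Tuple[float, float]:
    """Expected fractional copy number and haplotype-B frequency for one bin.

    copy_numbers[i] = (a, b): copies of haplotypes A and B in clone i
    (the normal clone is included as (1, 1)).
    proportions[i]: fraction of cells in the sample belonging to clone i.
    """
    total = sum(u * (a + b) for (a, b), u in zip(copy_numbers, proportions))
    hap_b = sum(u * b for (_, b), u in zip(copy_numbers, proportions))
    return total, hap_b / total


# Hypothetical bin: normal (1,1), clone 1 gains haplotype A (2,1),
# clone 2 gains haplotype B (1,2), i.e. a mirrored-subclonal CNA.
clones = [(1, 1), (2, 1), (1, 2)]

# Hypothetical clone proportions in two samples of the same patient.
fcn_x, baf_x = expected_fcn_and_baf(clones, [0.2, 0.7, 0.1])
fcn_y, baf_y = expected_fcn_and_baf(clones, [0.2, 0.1, 0.7])

print(f"sample X: FCN={fcn_x:.2f}, haplotype-B frequency={baf_x:.3f}")
print(f"sample Y: FCN={fcn_y:.2f}, haplotype-B frequency={baf_y:.3f}")
```

In this toy setting both samples have the same fractional copy number, but the haplotype-B frequency falls below 0.5 in one sample and above 0.5 in the other, which is the pattern the caption attributes to a mirrored-subclonal CNA.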
Figure 4
A Haplotype-specific copy-number profiles for two tumor clones identified by HATCHet2 on 4 pseudobulk samples from single-cell whole-genome sequencing data from 7914 cells of 4 sections of a breast tumor. Each copy-number segment is colored by the assigned haplotype-specific copy-number state (labeled in the legend on the right). B Single-cell haplotype-specific copy-number profiles identified by CHISEL [17]. Rows correspond to cells and are grouped into copy-number clones, which are annotated on the left. Each entry is colored by its assigned haplotype-specific copy-number state, as in panel A. C Clone proportions inferred by HATCHet2 (left bar in each pair) and CHISEL (right bar in each pair) in each sample (x-axis).
Figure 5. Inference of mirrored haplotype BAF (mhBAF).
A The BAF $\beta_{j,p}$ for SNP $j$ and sample $p$ is the fraction of reads covering the SNP that contain the alternate allele (red), and is a signal for allelic imbalance. B Positions of five heterozygous SNPs, with alternate alleles given in red, for a bin with a mirrored-subclonal copy-number aberration, with distinct haplotype-specific copy-number states in two tumor clones. C Reference-based phasing groups together SNPs that are likely to be on the same haplotype based on a reference database. We combine read counts for each of these haplotype blocks to obtain a single BAF estimate. D The mirrored haplotype BAF (mhBAF) is estimated by inferring a phasing $\hat{h}$ that assigns alternate alleles to a haplotype, and computing the frequency $f_p(\hat{h})$ of this haplotype in each sample $p$.
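The two quantities in this caption, the per-SNP BAF and the per-sample frequency of a phased haplotype, can be illustrated as follows. This is a minimal sketch under stated assumptions: the phasing is supplied as input rather than inferred (the caption says HATCHet2 infers it, and that inference step is not shown here), and the function names and read counts are hypothetical.

```python
from typing import Sequence


def snp_baf(alt_reads: int, total_reads: int) -> float:
    """BAF of one SNP in one sample: fraction of reads with the alternate allele."""
    return alt_reads / total_reads


def haplotype_frequency(
    alt_reads: Sequence[int],
    total_reads: Sequence[int],
    phasing: Sequence[int],
) -> float:
    """Frequency of one haplotype in a sample, pooling reads over SNPs in a bin.

    phasing[j] = 1 if the alternate allele of SNP j lies on the tracked
    haplotype, 0 if the reference allele does (assumed given, not inferred).
    """
    hap_reads = sum(
        alt if h == 1 else tot - alt
        for alt, tot, h in zip(alt_reads, total_reads, phasing)
    )
    return hap_reads / sum(total_reads)


# Hypothetical counts for four heterozygous SNPs in one bin, two samples.
totals = [100, 100, 100, 100]
alt_a = [70, 30, 68, 32]   # sample A
alt_c = [30, 70, 32, 68]   # sample C
phasing = [1, 0, 1, 0]     # assumed phasing of alternate alleles

bafs_a = [snp_baf(a, t) for a, t in zip(alt_a, totals)]
f_a = haplotype_frequency(alt_a, totals, phasing)
f_c = haplotype_frequency(alt_c, totals, phasing)
print(bafs_a, f_a, f_c)
```

With these made-up counts, the unphased BAFs in sample A scatter on both sides of 0.5, while the phased haplotype frequency is a single value above 0.5 in sample A and below 0.5 in sample C.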

References

    1. Gerstung M., Jolly C., Leshchiner I., Dentro S.C., Gonzalez S., Rosebrock D., Mitchell T.J., Rubanova Y., Anur P., Yu K., et al. : The evolutionary history of 2,658 cancers. Nature 578(7793), 122–128 (2020) - PMC - PubMed
    1. Li Y., Roberts N.D., Wala J.A., Shapira O., Schumacher S.E., Kumar K., Khurana E., Waszak S., Korbel J.O., Haber J.E., et al. : Patterns of somatic structural variation in human cancer genomes. Nature 578(7793), 112–121 (2020) - PMC - PubMed
    1. Drews R.M., Hernando B., Tarabichi M., Haase K., Lesluyes T., Smith P.S., Morrill Gavarró L., Couturier D.-L., Liu L., Schneider M., et al. : A pan-cancer compendium of chromosomal instability. Nature 606(7916), 976–983 (2022) - PMC - PubMed
    1. Steele C.D., Abbasi A., Islam S.A., Bowes A.L., Khandekar A., Haase K., Hames-Fathi S., Ajayi D., Verfaillie A., Dhami P., et al. : Signatures of copy number alterations in human cancer. Nature 606(7916), 984–991 (2022) - PMC - PubMed
