Nat Commun. 2023 Mar 13;14(1):1358.
doi: 10.1038/s41467-023-36689-5.

Towards routine chromosome-scale haplotype-resolved reconstruction in cancer genomics


Shilpa Garg. Nat Commun.

Abstract

Cancer genomes are highly complex and heterogeneous. Standard short-read sequencing and analysis methods cannot provide a complete, base-level picture of the structural variant (SV) landscape of cancer genomes. In this work, we apply high-resolution, long, accurate HiFi sequencing and long-range Hi-C sequencing to the melanoma cancer cell line COLO829. We also develop an efficient graph-based approach that processes these data types for chromosome-scale, haplotype-resolved reconstruction, in order to characterise the precise SV landscape of the cancer genome. Our method produces high-quality, chromosome-level phased scaffolds for three healthy samples and the COLO829 cancer cell line in less than half a day, even without trio information, outperforming existing state-of-the-art methods. In COLO829, we show that our method identifies and characterises precise somatic SV calls in important repeat elements that were missed in short-read-based call sets. Our method also resolves the chromosome-level germline and somatic SV landscape at base resolution, comprising 19,956 insertions, 14,846 deletions, 421 duplications, 52 inversions and 498 translocations. Our simple approach, pstools, should facilitate better personalised diagnosis and disease management, including the prediction of therapeutic responses.


Conflict of interest statement

The author declares no competing interests.

Figures

Fig. 1. COLO829 HiFi/Hi-C sequencing and SV discovery.
a Read length and coverage characteristics of HiFi and Hi-C sequencing of COLO829 (Top). b Benchmarking of somatic SVs in repeat elements (Middle). SV call sets: short-read-based NYGC call set (https://www.nygenome.org/bioinformatics/3-cancer-cell-lines-on-2-sequencers/), single-cell-based call set from Enrique Velazquez-Villarreal et al. (2020), multi-technology-based UMCU call set (https://github.com/UMCUGenetics/COLO829_somaticSV) and pstools. Each bar shows the number of variants on which the call sets agree for a given repeat element. c Germline SV calls in repeat elements (Bottom). X axis: repeat elements; Y axis: number of variants. Source data are provided as a Source Data file.
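The agreement counts in panel b depend on how calls from two call sets are matched, and the exact matching criteria are not stated in this caption. The sketch below assumes a hypothetical rule: two calls agree if they share chromosome and SV type and their breakpoints lie within a fixed tolerance (500 bp here, an arbitrary illustrative value). Agreements are then counted per repeat element. The `SV` record, its `repeat` annotation and the function name are illustrative, not part of pstools.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    pos: int      # breakpoint position on the reference
    svtype: str   # e.g. "DEL", "INS", "DUP", "INV", "BND"
    repeat: str   # repeat element overlapping the call (hypothetical annotation)


def agreements_by_repeat(calls_a, calls_b, tolerance=500):
    """Count calls in calls_a that have a matching call in calls_b, per repeat element.

    Hypothetical matching rule: same chromosome, same SV type and breakpoints
    within `tolerance` bp. Each call in calls_b is matched at most once.
    """
    used = set()
    counts = Counter()
    for a in calls_a:
        for j, b in enumerate(calls_b):
            if j in used:
                continue
            if (a.chrom == b.chrom and a.svtype == b.svtype
                    and abs(a.pos - b.pos) <= tolerance):
                used.add(j)
                counts[a.repeat] += 1
                break  # move on to the next call in calls_a
    return counts


# Toy example with made-up calls.
short_read = [SV("chr1", 10_000, "DEL", "LINE"), SV("chr2", 50_000, "INS", "SINE")]
long_read = [SV("chr1", 10_120, "DEL", "LINE"), SV("chr2", 90_000, "INS", "SINE")]
print(agreements_by_repeat(short_read, long_read))  # Counter({'LINE': 1})
```

The greedy one-to-one matching keeps a single call from being counted against several calls in the other set; a real benchmark might instead use reciprocal overlap or size similarity.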
Fig. 2. Overview of the pstools algorithm.
a Produce a HiFi sequence graph that retains bubbles and any complex events. b Map the Hi-C reads to node sequences in the sequence graph. c Phase the bubble chains in the graph to produce haplotype paths (phased contigs). d Connect haplotype paths across components to produce phased scaffolds.
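Step c can be illustrated with a toy model. The sketch below is not the pstools implementation. It assumes that every bubble in a chain has exactly two allele nodes and that Hi-C read-pair counts between allele nodes are already available from step b. It then phases the chain greedily, orienting each bubble to maximise Hi-C links in cis with the bubbles already placed. Node names and contact counts are hypothetical.

```python
def phase_bubble_chain(bubbles, contacts):
    """Greedily phase a chain of bi-allelic bubbles using Hi-C contact counts.

    bubbles:  list of (allele_0, allele_1) node-name pairs, in chain order.
    contacts: dict mapping frozenset({node_u, node_v}) to a Hi-C read-pair count.
    Returns two haplotype paths (lists of node names).
    """
    def links(u, v):
        return contacts.get(frozenset((u, v)), 0)

    # The first bubble fixes the (arbitrary) haplotype labels.
    hap0, hap1 = [bubbles[0][0]], [bubbles[0][1]]
    for a, b in bubbles[1:]:
        # Support for keeping the orientation (a with hap0, b with hap1)
        # versus flipping it (b with hap0, a with hap1).
        keep = sum(links(a, u) for u in hap0) + sum(links(b, v) for v in hap1)
        flip = sum(links(b, u) for u in hap0) + sum(links(a, v) for v in hap1)
        if keep >= flip:
            hap0.append(a)
            hap1.append(b)
        else:
            hap0.append(b)
            hap1.append(a)
    return hap0, hap1


bubbles = [("b1_x", "b1_y"), ("b2_x", "b2_y"), ("b3_x", "b3_y")]
contacts = {
    frozenset(("b1_x", "b2_y")): 12,
    frozenset(("b1_y", "b2_x")): 9,
    frozenset(("b1_x", "b2_x")): 1,
    frozenset(("b2_y", "b3_x")): 7,
    frozenset(("b2_x", "b3_y")): 5,
}
print(phase_bubble_chain(bubbles, contacts))
# (['b1_x', 'b2_y', 'b3_x'], ['b1_y', 'b2_x', 'b3_y'])
```

A greedy pass is the simplest choice; methods that optimise a global objective over the whole chain are less sensitive to a single noisy early decision.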
Fig. 3. Left: In HG002, regions from two chromosomes (chr13 and chr14) are fused into a single graph component because of a shared repetitive sequence.
Hi-C information in the graph (specifically along each of the green and red paths) helps disentangle the two chromosomes. Right: The starting regions of the chromosome arms (chr6 and chr1) lie in different components. Hi-C phasing information (which connects alleles in bubbles) helps connect these starting regions of the chromosome arms accurately.
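Joining haplotype paths that sit in different components (the right panel, and step d of Fig. 2) can be pictured as choosing links between path ends. As a rough illustration, and not the pstools scoring scheme, the sketch below assumes Hi-C read-pair counts between path ends are already known. It greedily accepts the strongest link whose two ends are still free, skips links that would close a cycle (tracked with union-find) and requires a hypothetical minimum support `min_links`.

```python
def greedy_scaffold_joins(links, min_links=5):
    """Greedily choose Hi-C-supported joins between haplotype-path ends.

    links: dict mapping frozenset({(path_id, side), (path_id, side)}) to a
           Hi-C read-pair count, where side is "start" or "end".
    Returns accepted joins as (end_u, end_v, count), strongest first.
    """
    parent = {}  # union-find over path ids, used to avoid closing cycles

    def find(p):
        parent.setdefault(p, p)
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    used_ends = set()
    joins = []
    for pair, count in sorted(links.items(), key=lambda kv: kv[1], reverse=True):
        if count < min_links:
            break  # all remaining candidates are weaker still
        if len(pair) != 2:
            continue  # ignore self-links of a single end
        u, v = sorted(pair)
        if u in used_ends or v in used_ends:
            continue  # each path end can be joined only once
        root_u, root_v = find(u[0]), find(v[0])
        if root_u == root_v:
            continue  # both paths already belong to the same scaffold
        parent[root_u] = root_v
        used_ends.update((u, v))
        joins.append((u, v, count))
    return joins


links = {
    frozenset({("p1", "end"), ("p2", "start")}): 40,
    frozenset({("p2", "end"), ("p1", "start")}): 30,  # would close a p1-p2 cycle
    frozenset({("p2", "end"), ("p3", "start")}): 25,
    frozenset({("p1", "end"), ("p3", "start")}): 3,   # below min_links
}
for join in greedy_scaffold_joins(links):
    print(join)
```

With these made-up counts, the sketch joins p1 to p2 and then p2 to p3, producing one scaffold p1-p2-p3.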
Fig. 4. Ideogram of phased sequences of HG002 (left) and COLO829 (right).
Each chromosome is shown in a single colour, indicating chromosome-level phased sequence. Uncoloured regions are gaps in complex regions, for example centromeric and acrocentric regions.
Fig. 5.
a Whole-genome precise SV characterisation of COLO829 (Top): SV types and size distributions, with a circos plot showing the SV distribution on chromosome 1. b Identification of a homozygous 12 kbp deletion affecting PTEN on chromosome 10 (Bottom). Source data are provided as a Source Data file.
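Panel a summarises SVs by type and size. A minimal sketch of such a tally follows. It assumes each call is given as a (type, length in bp) pair, gives translocations (BND) a length of `None` because they have no single size, and uses hypothetical size bins chosen only for illustration (the figure's actual bins are not stated here).

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical size bins (bp), chosen for illustration only.
BIN_EDGES = [1_000, 10_000, 100_000]
BIN_LABELS = ["<1 kbp", "1-10 kbp", "10-100 kbp", ">=100 kbp"]


def size_bin(length):
    """Map an SV length in bp to a bin label; None (e.g. translocations) gives 'n/a'."""
    if length is None:
        return "n/a"
    # bisect_right counts how many edges are <= length, i.e. the bin index.
    return BIN_LABELS[bisect_right(BIN_EDGES, length)]


def tally_by_type_and_size(calls):
    """Count (SV type, size bin) pairs from an iterable of (svtype, length) tuples."""
    return Counter((svtype, size_bin(length)) for svtype, length in calls)


# Made-up calls; a 12 kbp deletion lands in the 10-100 kbp bin.
calls = [("INS", 320), ("DEL", 12_000), ("DEL", 450), ("BND", None)]
for (svtype, bin_label), n in sorted(tally_by_type_and_size(calls).items()):
    print(svtype, bin_label, n)
```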
Fig. 6. Copy number profile correlation on all chromosomes.
Coverage distributions against the reference genome for HiFi reads (top; Y axis: 0–100), Hi-C reads (middle; Y axis: 0–100) and pstools phased sequences (bottom; Y axis: 0–25), visualising the correlation of copy number profiles across all chromosomes (X axis: chromosome sizes). Source data are provided as a Source Data file.
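Copy number profiles of this kind are commonly derived by binning read depth along the reference and scaling each bin to a baseline. The sketch below shows that generic normalisation. It is not necessarily how this figure was produced. The bin size, the assumed baseline ploidy of 2, the use of the median bin as the baseline and the per-base depth input are all illustrative assumptions.

```python
from statistics import median


def copy_number_profile(depths, bin_size=1_000_000, ploidy=2):
    """Estimate per-bin copy number from per-base read depth.

    depths: list of per-base depths (index = 0-based reference position).
    Each bin's mean depth is divided by the median of all bin means and
    scaled so that the median bin corresponds to `ploidy` copies.
    """
    bins = [depths[i:i + bin_size] for i in range(0, len(depths), bin_size)]
    means = [sum(b) / len(b) for b in bins]  # the last bin may be shorter
    baseline = median(means)
    if baseline == 0:
        raise ValueError("median bin depth is zero; cannot normalise")
    return [ploidy * m / baseline for m in means]


# Toy depths: two diploid bins, one gained bin and one lost bin.
toy_depths = [30] * 4 + [45] * 2 + [15] * 2
print(copy_number_profile(toy_depths, bin_size=2))  # [2.0, 2.0, 3.0, 1.0]
```

Using the median rather than the mean as the baseline keeps a few highly amplified bins from shifting every other estimate, which matters for aneuploid cancer genomes.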

