Review

Genomics Proteomics Bioinformatics. 2024 Oct 15;22(4):qzae048.

doi: 10.1093/gpbjnl/qzae048.

The Bioinformatic Applications of Hi-C and Linked Reads

Libo Jiang¹, Michael A Quail², Jack Fraser-Govil², Haipeng Wang¹, Xuequn Shi³, Karen Oliver², Esther Mellado Gomez², Fengtang Yang¹, Zemin Ning²

Affiliations

¹ School of Life Sciences and Medicine, Shandong University of Technology, Zibo 255049, China.
² The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
³ College of Food Science and Technology, Hainan University, Haikou 570228, China.

PMID: 38905513
PMCID: PMC11580686
DOI: 10.1093/gpbjnl/qzae048


Libo Jiang et al. Genomics Proteomics Bioinformatics. 2024.


Abstract

Long-range sequencing grants insight into additional genetic information beyond what can be accessed by either short reads or modern long-read technology. Several new sequencing technologies, such as "Hi-C" and "Linked Reads", produce long-range datasets for high-throughput and high-resolution genome analyses, which are rapidly advancing the fields of genome assembly, genome scaffolding, and more comprehensive variant identification. In this review, we focused on five major long-range sequencing technologies: high-throughput chromosome conformation capture (Hi-C), 10X Genomics Linked Reads, haplotagging, transposase enzyme linked long-read sequencing (TELL-seq), and single-tube long fragment read (stLFR). We detailed the mechanisms and data products of the five platforms and their important applications, evaluated the quality of sequencing data from different platforms, and discussed the currently available bioinformatics tools. This work will benefit the selection of appropriate long-range technology for specific biological studies.

Keywords: Genome assembly; Hi-C; Linked Reads; Long-range NGS reads; Quality assessment.

© The Author(s) 2024. Published by Oxford University Press and Science Press on behalf of the Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation and Genetics Society of China.


Conflict of interest statement

The authors have declared no competing interests.

Figures

**Figure 1**
Workflow of library preparation for five long-range platforms. A. Hi-C. B. 10X Genomics Linked Reads. C. Haplotagging, TELL-seq, and stLFR. Hi-C, high-throughput chromosome conformation capture; TELL-seq, transposase enzyme linked long-read sequencing; stLFR, single-tube long fragment read; GEM, gel bead in emulsion; HMW-DNA, high molecular weight DNA.

**Figure 2**
Hi-C maps for three human datasets. A. Arima V2 NA12878-CEU. B. Arima V2 NA24385-AJ. C. Arima V1 NA12878-CEU. Each square block of the map represents an individual human chromosome, and darker regions indicate higher contact density. These three datasets are all from female samples as there are hardly any contact interactions in chromosome Y.

**Figure 3**
Characteristics of Hi-C reads. A. Link-separation distance distribution: the distance in the linear genome between two reads that are coupled together by the Hi-C protocol, grouped into bins of 100 bp and expressed as a percentage frequency. The Arima V2 Oak dataset is included; it demonstrates the breakdown of the power-law relationship of Equation 1, highlighting the desirable features present in the human datasets. The peaks that appear in all three human datasets at $\mathrm{LSD} \approx 1.085 \times 10^{7}$ are of unknown origin, though they are suspected to be artefacts of the alignment methods used. B. ICI rate: the percentage of paired reads mapped to different chromosomes. The existence of inter-chromosomal pairs is not a desired feature for our purposes. In our quality control experiments, we set a threshold of 30%, above which the dataset is marked as a failure. C. Base coverage distribution. Although all datasets are covered to approximately the same depth (30×), they show very different distributions around this value; a non-Hi-C dataset (Illumina) is included as a comparison. LSD, link-separation distance; ICI, inter-chromosomal interaction.
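The two read-pair metrics described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `hic_pair_stats`, the `(chrom1, pos1, chrom2, pos2)` input tuples, and the choice to express the LSD histogram as a percentage of intra-chromosomal pairs are all hypothetical. Only the 100 bp bin width and the 30% ICI threshold come from the caption.

```python
from collections import Counter


def hic_pair_stats(pairs, bin_size=100, ici_threshold=30.0):
    """Summarise mapped Hi-C read pairs.

    pairs: iterable of (chrom1, pos1, chrom2, pos2) tuples (assumed format).
    Returns (lsd_percent, ici_rate, passes_qc).
    """
    total = 0
    inter = 0
    bins = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        total += 1
        if chrom1 != chrom2:
            # Inter-chromosomal pair: counts towards the ICI rate only.
            inter += 1
        else:
            # Link-separation distance, grouped into bins of bin_size bp.
            bins[abs(pos2 - pos1) // bin_size] += 1
    if total == 0:
        raise ValueError("no read pairs supplied")

    intra = total - inter
    # Assumption: percentages are taken over intra-chromosomal pairs.
    lsd_percent = {
        b * bin_size: 100.0 * n / intra for b, n in sorted(bins.items())
    }
    ici_rate = 100.0 * inter / total
    # A dataset above the threshold is marked as a failure.
    passes_qc = ici_rate <= ici_threshold
    return lsd_percent, ici_rate, passes_qc


if __name__ == "__main__":
    toy_pairs = [
        ("chr1", 100, "chr1", 350),
        ("chr1", 1000, "chr1", 1040),
        ("chr1", 10, "chr2", 500),
        ("chr2", 5, "chr2", 260),
    ]
    print(hic_pair_stats(toy_pairs))
```

The keys of `lsd_percent` are the lower edges of the distance bins in bp. In practice the pairs would be read from an alignment file rather than built by hand.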

**Figure 4**
Length distributions for various 10X and haplotagging datasets. Reads are grouped into fragments by barcodes, with shared barcodes identified and removed. The length of a fragment is the region covered by mapping coordinates from the Linked Reads which share the same barcode.
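A minimal sketch of the barcode-based fragment length calculation follows. It is not the authors' implementation: the function name `fragment_lengths` and the `(barcode, chrom, start, end)` input tuples are hypothetical, and coordinates are assumed to be half-open. The caption does not say how shared barcodes are identified, so the sketch uses a stand-in rule and treats a barcode as shared when its reads map to more than one chromosome.

```python
def fragment_lengths(reads):
    """Estimate one fragment length per barcode.

    reads: iterable of (barcode, chrom, start, end) tuples (assumed format,
    half-open coordinates). Returns {barcode: length_in_bp}.
    """
    spans = {}      # barcode -> (chrom, min_start, max_end)
    shared = set()  # barcodes discarded by the stand-in "shared" rule
    for barcode, chrom, start, end in reads:
        if barcode in shared:
            continue
        if barcode not in spans:
            spans[barcode] = (chrom, start, end)
            continue
        first_chrom, lo, hi = spans[barcode]
        if chrom != first_chrom:
            # Assumption: reads on several chromosomes mean a shared barcode.
            shared.add(barcode)
            del spans[barcode]
        else:
            # Extend the region covered by this barcode's mapping coordinates.
            spans[barcode] = (chrom, min(lo, start), max(hi, end))
    return {bc: hi - lo for bc, (_, lo, hi) in spans.items()}


if __name__ == "__main__":
    toy_reads = [
        ("A", "chr1", 100, 250),
        ("A", "chr1", 40000, 40150),
        ("B", "chr1", 10, 160),
        ("B", "chr2", 500, 650),
        ("C", "chr3", 1000, 1150),
    ]
    print(fragment_lengths(toy_reads))
```

A real analysis would likely also split a barcode's reads into separate fragments when neighbouring reads on the same chromosome are far apart. That step is omitted here because the caption does not describe it.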

**Figure 5**
Base coverage profiles for various 10X and haplotagging datasets. The 10X and haplotagging datasets, downsampled to ∼30×, are used to remove the effects of differing coverage depths. In terms of coverage evenness, the datasets of rat and oak are not as smooth as other samples.

**Figure 6**
Hi-C maps on contigs and scaffolded assemblies. A. Contig fragmentation is clearly observed in the Hi-C map. B. Assembly with Arima V1 data. A chromosome-level assembly takes shape, while some small contigs can still be observed in the lower-right corner. C. Assembly with Arima V2 data. A much improved chromosome-level assembly is observed.




Grants and funding

WT_/Wellcome Trust/United Kingdom
