Genome Res. 2021 Jul;31(7):1290-1295.
doi: 10.1101/gr.275193.120. Epub 2021 Jun 8.

Rapid and accurate alignment of nucleotide conversion sequencing reads with HISAT-3N

Yun Zhang et al. Genome Res. 2021 Jul.

Abstract

Sequencing technologies using nucleotide conversion techniques such as cytosine to thymine in bisulfite-seq and thymine to cytosine in SLAM-seq are powerful tools to explore the chemical intricacies of cellular processes. To date, no one has developed a unified methodology for aligning converted sequences and consolidating alignment of these technologies in one package. In this paper, we describe hierarchical indexing for spliced alignment of transcripts-3 nucleotides (HISAT-3N), which can rapidly and accurately align sequences consisting of any nucleotide conversion by leveraging the powerful hierarchical index and repeat index algorithms originally developed for the HISAT software. Tests on real and simulated data sets show that HISAT-3N is faster than other modern systems, with greater alignment accuracy, higher scalability, and smaller memory requirements. HISAT-3N therefore becomes an ideal aligner when used with converted sequence technologies.

Figures

Figure 1.
Repeat index enables faster 3-nt read alignment. (A) HISAT-3N aligns reads using two different strategies: (1) HISAT-3N can directly align reads to the whole genome using the genome index and output their mapped locations (A, left), and (2) HISAT-3N can use a repeat index to uniquely align reads to the repeat sequences, regardless of how many genomic locations they align to (A, right). (B) Runtime comparison between the direct mapping and repeat mapping strategies. The test data are 10 million simulated single-end BS-seq reads (0.2% per-base sequencing error rate).
Figure 2.
HISAT-3N alignment steps for BS-seq reads. (A) HISAT-3N converts each input read (READ) to two 3N reads: READ-3N and READ-RC-3N. READ-3N is READ with all cytosine replaced by thymine. READ-RC-3N is the reverse complement of READ, plus the replacement of cytosine with thymine. (B) HISAT-3N maps the two 3N reads to both REF-3N and REF-RC-3N references using prebuilt indexes. (C) After the 3-nt alignment, HISAT-3N compares the original read sequence (READ) to the original 4-nt references (REF and REF-RC) to identify unmethylated cytosine positions and recalculate an alignment score accordingly.
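The read conversion in (A) and the comparison in (C) can be illustrated with a minimal Python sketch. This is not HISAT-3N's code or API: the function names (`to_3n`, `reverse_complement`, `make_3n_reads`, `converted_positions`) are hypothetical, the comparison assumes an ungapped alignment of equal-length strings, and the index-based mapping in (B) and the score recalculation are omitted.

```python
from typing import List, Tuple

# Illustrative sketch only; all names here are hypothetical, not HISAT-3N's API.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def to_3n(seq: str, source: str = "C", target: str = "T") -> str:
    """Collapse a sequence to a 3-letter alphabet by replacing source with target."""
    return seq.upper().replace(source, target)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def make_3n_reads(read: str) -> Tuple[str, str]:
    """Return (READ-3N, READ-RC-3N) for a BS-seq read, using C-to-T conversion."""
    return to_3n(read), to_3n(reverse_complement(read))


def converted_positions(read: str, ref: str) -> List[int]:
    """List positions where the 4-nt reference has C and the original read has T.

    Assumes read and ref are already aligned without gaps and have equal length.
    """
    if len(read) != len(ref):
        raise ValueError("read and ref must have the same length")
    pairs = zip(read.upper(), ref.upper())
    return [i for i, (r, g) in enumerate(pairs) if g == "C" and r == "T"]


if __name__ == "__main__":
    read_3n, read_rc_3n = make_3n_reads("ACGTC")
    print(read_3n, read_rc_3n)
    print(converted_positions("ATGTC", "ACGTC"))
```

Other conversion types, such as thymine to cytosine in SLAM-seq, would correspond to different `source` and `target` arguments in `to_3n` under the same assumptions.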
