Bioinformatics. 2022 May 13;38(10):2943-2945.
doi: 10.1093/bioinformatics/btac166.

Analysing high-throughput sequencing data in Python with HTSeq 2.0

Givanna H Putri et al. Bioinformatics.

Abstract

Summary: HTSeq 2.0 provides a more extensive application programming interface including a new representation for sparse genomic data, enhancements for htseq-count to suit single-cell omics, a new script for data using cell and molecular barcodes, improved documentation, testing and deployment, bug fixes and Python 3 support.

Availability and implementation: HTSeq 2.0 is released as open-source software under the GNU General Public License and is available from the Python Package Index at https://pypi.python.org/pypi/HTSeq. The source code is available on GitHub at https://github.com/htseq/htseq.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Major HTSeq 2.0 improvements. (A–C) Improvements to htseq-count. (A) Parallel processing on multicore architectures enables faster processing of single-cell data, where each cell is represented by a BAM file [typical for Smart-seq2 (Picelli et al., 2013) and viscRNA-Seq (Zanini et al., 2018)]. Note the new output formats available in HTSeq 2.0. (B) Conventional gene–cell matrix, which collapses reads that align to distinct exons of the same gene into a single gene count. (C) Additional attributes enable quantification at the exon level while retaining information on which gene each exon belongs to. (D, E) Sparse data representations in HTSeq 2.0. (D) StepVector represents piecewise-constant sparse genomic data. (E) StretchVector represents sparse islands of genomic data.
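
The two sparse representations named in panels D and E can be illustrated with a minimal pure-Python sketch. It does not use or reproduce the HTSeq API. The class names `StepVectorSketch` and `StretchVectorSketch`, their methods and the storage layout are all hypothetical, chosen only to show the idea of storing breakpoints (piecewise-constant data) versus storing dense islands (data defined only on a few stretches).

```python
from bisect import bisect_left, bisect_right


class StepVectorSketch:
    """Piecewise-constant values over positions >= 0, stored as steps.

    Hypothetical illustration only; not HTSeq's implementation.
    """

    def __init__(self, default=0):
        # Step i holds values[i] from starts[i] up to the next start.
        self.starts = [0]
        self.values = [default]

    def __getitem__(self, pos):
        # The step covering pos is the last one starting at or before pos.
        return self.values[bisect_right(self.starts, pos) - 1]

    def set_interval(self, start, end, value):
        """Assign value to the half-open interval [start, end)."""
        resume = self[end]  # value that must continue from `end` onwards
        i = bisect_left(self.starts, start)   # first breakpoint >= start
        j = bisect_right(self.starts, end)    # first breakpoint > end
        # Drop breakpoints inside [start, end] and insert the two new ones.
        self.starts[i:j] = [start, end]
        self.values[i:j] = [value, resume]

    def steps(self):
        """Return (start, end, value) tuples; the last end is None."""
        ends = self.starts[1:] + [None]
        return list(zip(self.starts, ends, self.values))


class StretchVectorSketch:
    """Dense islands of data; positions outside any island are undefined.

    Hypothetical illustration only; assumes islands do not overlap.
    """

    def __init__(self):
        self.stretches = []  # list of (start, values), sorted by start

    def add_stretch(self, start, values):
        self.stretches.append((start, list(values)))
        self.stretches.sort(key=lambda stretch: stretch[0])

    def __getitem__(self, pos):
        for start, values in self.stretches:
            if start <= pos < start + len(values):
                return values[pos - start]
        return None  # no data stored at this position


# Usage: two overlapping assignments yield four steps, not 250 stored values.
coverage = StepVectorSketch()
coverage.set_interval(100, 200, 5)
coverage.set_interval(150, 250, 7)

# Usage: only the positions inside the two islands occupy memory.
signal = StretchVectorSketch()
signal.add_stretch(1000, [1, 2, 3])
signal.add_stretch(10, [9])
```

In this sketch the step representation stores only breakpoints, so its memory use grows with the number of value changes rather than with the length of the chromosome. The stretch representation stores every value inside an island but nothing in between, which suits signals that are dense locally and absent elsewhere.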
