Gigascience. 2021 Feb 16;10(2):giab003.
doi: 10.1093/gigascience/giab003.

long-read-tools.org: an interactive catalogue of analysis methods for long-read sequencing data

Shanika L Amarasinghe et al. Gigascience.

Abstract

Background: The data produced by long-read third-generation sequencers have unique characteristics compared to short-read sequencing data, often requiring tailored analysis tools for tasks ranging from quality control to downstream processing. The rapid growth in software that addresses these challenges for different genomics applications is difficult to keep track of, which makes it hard for users to choose the most appropriate tool for their analysis goal and for developers to identify areas of need and existing solutions to benchmark against.

Findings: We describe the implementation of long-read-tools.org, an open-source database that organizes the rapidly expanding collection of long-read data analysis tools and allows its exploration through interactive browsing and filtering. The current database release contains 478 tools across 32 categories. Most tools are developed in Python, and the most frequent analysis tasks include base calling, de novo assembly, error correction, quality checking/filtering, and isoform detection, while long-read single-cell data analysis and transcriptomics are areas with the fewest tools available.

Conclusion: Continued growth in the application of long-read sequencing in genomics research positions the long-read-tools.org database as an essential resource that allows researchers to keep abreast of both established and emerging software to help guide the selection of the most relevant tool for their analysis needs.

Keywords: PacBio; data analysis; database; long-read sequencing; nanopore.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1: Example use of the Tools tab from long-read-tools.org. A. The custom toolbar for the page. B. Drop-down “Sort By” menu. C. Drop-down “Filter by categories” menu, which allows users to select multiple options by clicking on an item or typing the word in the text box. D. Drop-down “Filter by technologies” menu, which allows users to select multiple options by clicking on an item or typing the word in the text box. When multiple categories or technologies are selected, the website returns the intersection, not the union; i.e., a tool has to satisfy all the requirements to be reported.
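
The intersection behaviour described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the website's actual implementation; the `Tool` record, the `filter_tools` function, and the tool names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    # Hypothetical record; the real database schema is not described in the text.
    name: str
    categories: set = field(default_factory=set)
    technologies: set = field(default_factory=set)


def filter_tools(tools, categories=(), technologies=()):
    """Return tools that match ALL selected categories and ALL selected technologies."""
    wanted_categories = set(categories)
    wanted_technologies = set(technologies)
    return [
        tool
        for tool in tools
        # Subset test: every selected item must be present on the tool (intersection).
        if wanted_categories <= tool.categories
        and wanted_technologies <= tool.technologies
    ]


# Made-up entries, not taken from the database.
catalogue = [
    Tool("tool_a", {"Error correction", "De novo assembly"}, {"Nanopore"}),
    Tool("tool_b", {"Error correction"}, {"Nanopore", "PacBio"}),
    Tool("tool_c", {"De novo assembly"}, {"PacBio"}),
]

selected = filter_tools(
    catalogue,
    categories=["Error correction"],
    technologies=["Nanopore", "PacBio"],
)
print([tool.name for tool in selected])
```

Under union semantics all three hypothetical tools would match this selection; under the intersection semantics stated in the caption, only the tool carrying every selected category and technology is returned.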
Figure 2: Summary statistics from long-read-tools.org. A. The number of tools released over time stratified by the long-read technologies they serve. B. The data analysis categories covered by the catalogued tools (ordered from most to least frequent). C. Publication status of the catalogued tools. D. The programming platforms used by the catalogued tools (ordered from most to least frequent). All languages making up ≥10% of a tool’s code are reported. These summary plots are available from the Statistics tab of the database website and can be easily exported for reuse. SNP: single-nucleotide polymorphism.
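
The ≥10% reporting rule for panel D of Figure 2 could be sketched as follows. This is only an illustration: the caption does not say how a language's share of the code is measured, so the use of byte counts per language is an assumption, and `reported_languages` is a hypothetical helper.

```python
def reported_languages(bytes_by_language, threshold=0.10):
    """Return languages whose share of the code is at least `threshold`.

    Assumption: shares are computed from byte counts per language;
    the source does not state the unit of measurement.
    """
    total = sum(bytes_by_language.values())
    if total == 0:
        return []
    kept = [
        language
        for language, size in bytes_by_language.items()
        if size / total >= threshold
    ]
    # Order from largest to smallest share.
    return sorted(kept, key=lambda language: bytes_by_language[language], reverse=True)


# Made-up byte counts for a hypothetical tool.
example = {"Python": 700, "C": 250, "Shell": 50}
print(reported_languages(example))
```

In this made-up example Shell accounts for 5% of the code and falls below the threshold, so only Python and C would be reported for the tool.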
Figure 3: Popularity of the tools from long-read-tools.org based on publication citations. A. Across the entire database. B. For base modification detection. C. Across the entire database for citations in the past year. Each panel shows the year of publication on the x-axis and the square root of the number of citations on the y-axis. If the input set contains >50 tools, the 10 most cited tools are labeled; otherwise, the 3 most cited tools are labeled.
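
The labeling rule and y-axis transform described in the Figure 3 caption can be written as a short Python sketch. The function names and the citation counts are hypothetical, and how the website breaks ties between equally cited tools is not stated, so that detail is left to the sort order here.

```python
import math


def tools_to_label(citations_by_tool):
    """Pick the tools to label: top 10 if more than 50 tools are plotted, else top 3."""
    n_labels = 10 if len(citations_by_tool) > 50 else 3
    ranked = sorted(citations_by_tool, key=citations_by_tool.get, reverse=True)
    return ranked[:n_labels]


def y_value(citations):
    """Square-root transform applied to citation counts on the y-axis."""
    return math.sqrt(citations)


# Made-up citation counts: tool_0 has 0 citations, tool_1 has 1, and so on.
small_set = {f"tool_{i}": i for i in range(50)}
large_set = {f"tool_{i}": i for i in range(51)}

print(len(tools_to_label(small_set)), len(tools_to_label(large_set)))
print(y_value(16))
```

A set of exactly 50 tools still gets 3 labels, because the caption specifies a strict ">50" condition.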
