Int J Environ Res Public Health. 2022 Apr 29;19(9):5442.
doi: 10.3390/ijerph19095442.

Fast, Ungapped Reads Mapping Using Squid

Christopher Riccardi et al. Int J Environ Res Public Health. 2022.

Abstract

Advances in Next Generation Sequencing (NGS) technologies allow us to inspect the genome at a level of detail that was unimaginable only a few decades ago. Omics-based studies are shedding light on the patterns and determinants of disease conditions in populations, as well as on the influence of microbial communities on human health, to name just two examples. With increasing volumes of sequencing information it is possible, for example, to compare genomic features and to analyze the modulation of the transcriptome under different environmental stimuli. Although protocols for NGS preparation are designed to leave little to no room for contamination of any kind, a noticeable fraction of sequencing reads may still not uniquely represent what was intended to be sequenced. The natural next step after sequencing a sample is to assess the presence of features of interest by mapping the obtained reads to a reference genome; it is sometimes also useful to determine the fraction of reads that do not map, or that map discordantly, and to store them in a new file for subsequent analyses. Here we propose a new mapper, called Squid, that, among other accessory functionalities, finds and returns sequencing reads that match or do not match a reference sequence database in any orientation. We encourage the use of Squid prior to any quantification pipeline to assess, for instance, the presence of contaminants, especially in RNA-Seq experiments.
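The idea of splitting reads into those that match a reference in either orientation and those that do not can be illustrated with a minimal sketch. This is not Squid's algorithm or interface: it is a naive ungapped scan, and every name in it (`find_ungapped`, `partition_reads`, `max_mismatches`) is hypothetical and chosen only for illustration.

```python
# Naive illustration of ungapped, orientation-agnostic read matching.
# NOT Squid's implementation; all function and parameter names are hypothetical.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ungapped(reference, query, max_mismatches=0):
    """Return the first offset where query aligns without gaps, or -1."""
    m = len(query)
    for pos in range(len(reference) - m + 1):
        mismatches = 0
        for ref_base, query_base in zip(reference[pos:pos + m], query):
            if ref_base != query_base:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            # Inner loop finished without exceeding the mismatch budget.
            return pos
    return -1


def partition_reads(reference, reads, max_mismatches=0):
    """Split (name, sequence) reads into mapped hits and unmapped names.

    Each read is tried as given ("+") and as its reverse complement ("-").
    """
    mapped, unmapped = [], []
    for name, seq in reads:
        hit = None
        for strand, query in (("+", seq), ("-", reverse_complement(seq))):
            pos = find_ungapped(reference, query, max_mismatches)
            if pos >= 0:
                hit = (name, strand, pos)
                break
        if hit is not None:
            mapped.append(hit)
        else:
            unmapped.append(name)
    return mapped, unmapped


# Toy data, made up for this example.
reference = "ACGTTGCAAGGCTTAA"
reads = [("r1", "TTGCAAG"), ("r2", "TTAAGCC"), ("r3", "GGGGGGG")]
mapped, unmapped = partition_reads(reference, reads)
```

In this toy run, `unmapped` holds the names of reads that could be written to a separate file for later inspection, for instance to look for contaminants. A real mapper would use an index rather than a linear scan.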

Keywords: dynamic programming; mapping; quality check; RNA-Seq.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Read orientation modes handled by Squid. Library types ISF, ISR (A) and IU (C) model fragments in which the sequencing reads are oriented towards each other. Reads A and B represent any R1–R2 pair, as long as mutual exclusivity and orientation are conserved. Library types OSF, OSR (B) and OU (C) tell Squid to expect the opposite case, in which the sequencing reads do not face one another. The current implementation never models a fragment with a matching library protocol (both reads mapping to the same strand).
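The six library types in this caption can be read as a small lookup table. The sketch below is an illustration, not Squid's code. It assumes the Salmon-style reading of the codes (I = inward, O = outward, S = stranded, U = unstranded, F or R = read 1 on the forward or reverse strand); the caption itself does not spell out the F/R convention, so treat that part as an assumption. `Hit`, `pair_orientation` and `is_concordant` are hypothetical names.

```python
from dataclasses import dataclass

# Illustrative only; names and the F/R convention are assumptions (see lead-in).


@dataclass(frozen=True)
class Hit:
    start: int   # leftmost 0-based reference coordinate of the read
    strand: str  # "+" or "-"


# library type -> (relative orientation, required strand of read 1 or None)
LIBRARY_TYPES = {
    "ISF": ("inward", "+"),
    "ISR": ("inward", "-"),
    "IU": ("inward", None),
    "OSF": ("outward", "+"),
    "OSR": ("outward", "-"),
    "OU": ("outward", None),
}


def pair_orientation(r1, r2):
    """Classify a read pair as 'inward', 'outward' or 'matching'."""
    if r1.strand == r2.strand:
        return "matching"  # same strand: not modelled, per the caption
    fwd, rev = (r1, r2) if r1.strand == "+" else (r2, r1)
    # Reads face each other when the forward-strand read lies to the left.
    return "inward" if fwd.start <= rev.start else "outward"


def is_concordant(lib_type, r1, r2):
    """Return True if the pair is consistent with the given library type."""
    expected_orientation, r1_strand = LIBRARY_TYPES[lib_type]
    if pair_orientation(r1, r2) != expected_orientation:
        return False
    return r1_strand is None or r1.strand == r1_strand


# Example: read 1 on the forward strand, upstream of read 2 on the reverse strand.
facing = (Hit(100, "+"), Hit(300, "-"))
```

Under these assumptions, `is_concordant("ISF", *facing)` and `is_concordant("IU", *facing)` hold, while the outward types and ISR do not. Pairs that fail the check are the ones a mapper would report as discordant.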
Figure 2
Scatterplot of transcript quantification. Each circle represents the raw counts of a gene in Salmon (y axis) and Squid (x axis). Squid was run using exhaustiveness 0 (A) and 15 (B), respectively. The per-gene ratio was calculated by dividing Salmon's raw counts by Squid's raw counts (extracted from the BEDPE output file). Note that the ratio scale differs by an order of magnitude between (A) and (B), indicating that mapping accuracy is affected when no additional cycles are performed. Coefficients of determination were 0.97 and 0.99 in (A) and (B), respectively. The regression line was calculated using a generalized additive model (GAM) through the R package ggplot2.
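The two quantities named in this caption, the per-gene ratio and the coefficient of determination, can be sketched as follows. The caption does not say how the coefficient was computed, so this sketch assumes the squared Pearson correlation of the two count vectors, which equals R² for a simple linear fit. The gene names and counts are invented for illustration and are not the study's data.

```python
# Illustrative only: made-up counts, hypothetical function names.
# R^2 is taken here as the squared Pearson correlation (an assumption).


def per_gene_ratio(salmon_counts, squid_counts):
    """Divide Salmon's raw count by Squid's raw count for each shared gene.

    Genes with a zero or missing Squid count are skipped to avoid
    division by zero.
    """
    ratios = {}
    for gene, salmon in salmon_counts.items():
        squid = squid_counts.get(gene, 0)
        if squid > 0:
            ratios[gene] = salmon / squid
    return ratios


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Toy counts, invented for this example.
salmon_counts = {"geneA": 20, "geneB": 40, "geneC": 60, "geneD": 5}
squid_counts = {"geneA": 10, "geneB": 20, "geneC": 30, "geneD": 0}

ratios = per_gene_ratio(salmon_counts, squid_counts)
shared = sorted(ratios)
r2 = r_squared([squid_counts[g] for g in shared],
               [salmon_counts[g] for g in shared])
```

A ratio close to 1 for most genes, together with a coefficient of determination near 1, would indicate that the two tools assign similar raw counts.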
