Biomolecules. 2020 Jun 8;10(6):878.

doi: 10.3390/biom10060878.

High-Throughput Identification of Adapters in Single-Read Sequencing Data

Asan M S H Mohideen¹, Steinar D Johansen¹, Igor Babiak¹

Affiliation

¹ Genomics Group, Faculty of Biosciences and Aquaculture, Nord University, P.O. Box 1490, 8049 Bodø, Norway.

PMID: 32521604
PMCID: PMC7356586
DOI: 10.3390/biom10060878

Abstract

Sequencing datasets available in public repositories are already numerous, and their growth is exponential. Raw sequencing data files constitute a substantial portion of these data, and they need to be pre-processed before any downstream analysis. The removal of adapter sequences is the first essential step. Tools available for the automated detection of adapters in single-read sequencing datasets have certain limitations. To explore these datasets, one needs to retrieve the adapter sequences from the methods sections of the corresponding research articles. This can be time-consuming in metadata analyses. Moreover, not all research articles report the adapter sequences. We have developed adapt_find, a tool that automates the identification of adapter sequences in raw single-read sequencing datasets. We have verified adapt_find by testing it on a number of publicly available datasets. adapt_find provides a robust, reliable and high-throughput process across different sequencing technologies and various adapter designs. It does not need prior knowledge of the adapter sequences. We also produced associated tools: random_mer, for the detection of random N bases on one or both termini of the reads, and fastqc_parser, for consolidating the results from FastQC outputs. Together, these form a valuable tool set for metadata analyses of multiple sequencing datasets.
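The abstract does not describe how adapt_find locates adapters, so the following is only a minimal sketch of one naive way to guess at an adapter without prior knowledge: a 3′ end adapter is shared by most reads, so a k-mer that occurs in more reads than any other is a plausible adapter fragment. The function name `most_common_kmer`, the choice of `k=12`, and the example reads are assumptions made for illustration; this is not the adapt_find algorithm.

```python
from collections import Counter


def most_common_kmer(reads, k=12):
    """Return (k-mer, read_count) for the k-mer found in the most reads.

    Hypothetical helper for illustration only; it is not part of adapt_find.
    Returns (None, 0) when no read is at least k bases long.
    """
    counts = Counter()
    for read in reads:
        # Count each k-mer once per read, so that repeats inside a single
        # read do not inflate its score.
        kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
        counts.update(kmers)
    if not counts:
        return None, 0
    return counts.most_common(1)[0]


# Illustrative reads: three different inserts followed by the same 12-mer.
example_reads = [
    "ACGTACGTAGCTAGCTAGGA" + "TGGAATTCTCGG",
    "TTGACCGATAGGCATCA" + "TGGAATTCTCGG",
    "GGCATTAGCCATAGGATCCAGT" + "TGGAATTCTCGG",
]
candidate, support = most_common_kmer(example_reads)
```

When the adapter portion of the reads is longer than `k`, several adapter k-mers will be tied for the top count, so a practical tool would need to extend or merge such candidates rather than report a single k-mer.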

Keywords: 454 pyrosequencing; Illumina; Ion-Torrent; SOLiD; adapter oligonucleotides; adapter trimming; randomized adapters; single-read sequencing; small RNA sequencing.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

**Figure 1**
Essential information on adapter sequences for an effective trimming process. The 5′ end adapter is usually of constant length, while the lengths of 3′ end adapters may vary within a dataset. Exact matches to the 3′ end (tail part, adjacent to a biological sequence) of the 5′ end adapter, and to the 5′ end (head part, adjacent to a biological sequence) of the 3′ end adapter, are required to identify the adapters. In the latter case, the shortest variant of the 3′ end adapter suffices.
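The trimming logic implied by the Figure 1 caption can be sketched as follows, assuming plain exact string matching. The function name `trim_read`, its parameters, and the example sequences are hypothetical; the sketch simply treats the tail of the 5′ end adapter and the head of the 3′ end adapter as the two anchors that bound the biological sequence.

```python
def trim_read(read, five_prime_tail=None, three_prime_head=None):
    """Return the biological insert of a read, or None if an anchor is missing.

    five_prime_tail: the 3' end (tail part) of the 5' end adapter, if any.
    three_prime_head: the 5' end (head part) of the 3' end adapter, if any.
    Hypothetical helper for illustration; exact matching only.
    """
    start = 0
    if five_prime_tail:
        pos = read.find(five_prime_tail)
        if pos == -1:
            return None
        # The insert begins right after the tail of the 5' end adapter.
        start = pos + len(five_prime_tail)

    end = len(read)
    if three_prime_head:
        # Search only downstream of the 5' end adapter.
        pos = read.find(three_prime_head, start)
        if pos == -1:
            return None
        # The insert ends where the head of the 3' end adapter begins.
        end = pos

    return read[start:end]


# Illustrative read: 5' adapter tail "GACT", insert "ACGTACGT", 3' adapter head "TGGAATTC".
both_ends = trim_read("GACTACGTACGTTGGAATTC", "GACT", "TGGAATTC")
three_only = trim_read("ACGTACGTTGGAATTC", three_prime_head="TGGAATTC")
```

Because only the head of the 3′ end adapter is matched, the same anchor works for every length variant of that adapter, which is why the shortest variant is sufficient.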

**Figure 2**
Schematic representation of the output read format in different sequencing technologies. In all four sequencing technologies, 3′ end adapters are ligated to biological sequences in the sequencing outputs; in addition, 5′ end adapters are present in Ion Torrent and 454 pyrosequencing outputs. Illumina output reads may have a four-letter barcode at the 5′ end, and/or four random “N” nucleotides at both ends. Similarly, depending on the library preparation kit used, output reads from Ion Torrent might have a random 5-mer and a three-letter barcode in addition to the 5′ end and 3′ end adapters.
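As a small illustration of the Illumina layout with random “N” nucleotides at both ends, the sketch below splits an adapter-trimmed sequence into its 5′ random bases, the biological insert, and its 3′ random bases. The names `SplitRead` and `split_random_termini` are hypothetical, and the default lengths of four bases per end are taken from the caption; other kits would need different values.

```python
from typing import NamedTuple, Optional


class SplitRead(NamedTuple):
    random_5p: str  # random bases at the 5' end
    insert: str     # biological sequence
    random_3p: str  # random bases at the 3' end


def split_random_termini(seq: str, n5: int = 4, n3: int = 4) -> Optional[SplitRead]:
    """Split an adapter-trimmed sequence into random termini and insert.

    Hypothetical helper for illustration. Returns None when the sequence is
    too short to contain any insert between the random termini.
    """
    if n5 < 0 or n3 < 0:
        raise ValueError("n5 and n3 must be non-negative")
    if len(seq) <= n5 + n3:
        return None
    cut = len(seq) - n3  # index where the 3' random bases begin
    return SplitRead(seq[:n5], seq[n5:cut], seq[cut:])


parts = split_random_termini("ACGTTTTTGGGGCATG")
five_prime_only = split_random_termini("ACGTTTTT", n5=4, n3=0)
```

Keeping the random bases alongside the insert, rather than discarding them, leaves the option of using them later, for example to tell apart reads that share the same insert.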

**Figure 3**
The adapt_find workflow. Black boxes: general procedure, green boxes: exit step, blue boxes: alternative strategy, yellow diamonds: decision.

**Figure 4**
The random_mer workflow. Black boxes: general procedure, green boxes: exit step, orange boxes: further recommended process, yellow diamonds: decision.

Grants and funding

275786/Norges Forskningsråd/International
