Heliyon. 2017 Jul 27;3(7):e00370.
doi: 10.1016/j.heliyon.2017.e00370. eCollection 2017 Jul.

An efficient strategy using k-mers to analyse 16S rRNA sequences

Marcel Martínez-Porchas et al. Heliyon.

Abstract

The use of k-mers has been a successful strategy for improving metagenomics studies, including taxonomic classification and de novo assembly, and k-mers can also be used to retrieve sequences of interest from available databases. The aim of this manuscript was to propose a simple but efficient strategy to generate k-mers and to use them to obtain and analyse 16S rRNA sequence fragments in silico. A total of 513,309 bacterial sequences contained in the SILVA database were considered for the study, and custom PHP scripts were used to search for specific nucleotide chains, recover fragments of bacterial sequences, perform calculations and organize information. Consensus sequences matching conserved regions (primer contigs) were constructed by aligning most of the primers used in the literature. Sequences of k nucleotides (9- to 15-mers) were then extracted from these primer contigs. Frequency analysis revealed that k-mer size was inversely proportional to the occurrence of k-mers in the different conserved regions, suggesting a stringency relationship: short k-mers produced high numbers of duplicate reactions, whereas long k-mers recovered a lower proportion of sequences, and the best results were obtained with 12-mers. Using 12-mers with the proposed method was found to be a reliable approach for obtaining and analysing 16S rRNA sequences, and the strategy may be extensible to other biomarkers. Additional applications, such as evaluating the degree of conservation, designing primers and performing other calculations, are proposed as examples.
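The k-mer generation and frequency analysis described in the abstract can be sketched in a few lines. The authors used their own PHP scripts, which are not shown here, so the Python below is only an illustrative approximation: the function names (`generate_kmers`, `count_occurrences`, `summarize`) and the toy sequences are hypothetical, and a "duplicate reaction" is assumed to mean a k-mer that matches more than once within a single sequence.

```python
from collections import Counter


def generate_kmers(contig, k):
    """Return the distinct substrings of length k, in order of first appearance."""
    kmers = []
    for i in range(len(contig) - k + 1):
        kmer = contig[i:i + k]
        if kmer not in kmers:
            kmers.append(kmer)
    return kmers


def count_occurrences(sequence, kmer):
    """Count occurrences of kmer in sequence, including overlapping ones."""
    count = 0
    start = sequence.find(kmer)
    while start != -1:
        count += 1
        start = sequence.find(kmer, start + 1)
    return count


def summarize(sequences, kmers):
    """Per k-mer: number of sequences with >= 1 hit, and with > 1 hit.

    The second counter reflects the assumed meaning of "duplicate reaction".
    """
    hits = Counter()
    duplicates = Counter()
    for seq in sequences:
        for kmer in kmers:
            n = count_occurrences(seq, kmer)
            if n >= 1:
                hits[kmer] += 1
            if n > 1:
                duplicates[kmer] += 1
    return hits, duplicates


if __name__ == "__main__":
    # Toy data only; these are not real primer contigs or SILVA sequences.
    toy_contig = "ACGTAC"
    toy_sequences = ["ACGTACG", "TTTT"]
    kmers = generate_kmers(toy_contig, 3)
    hits, duplicates = summarize(toy_sequences, kmers)
    for kmer in kmers:
        print(kmer, hits[kmer], duplicates[kmer])
```

Running the same summary for each k from 9 to 15 would give the kind of comparison the abstract reports: shorter k-mers tend to match more often, including repeated matches within one sequence, while longer k-mers match fewer sequences.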

Keywords: Bioinformatics; Biological sciences; Microbiology.

Figures

Fig. 1
Workflow established for obtaining primer contigs and the subsequent generation of k-mers.
Fig. 2
Frequency of k-mers of 9 to 15 nucleotides detected in different conserved regions of 16S rRNA sequences contained in the SILVA database.
Fig. 3
Duplicate reactions detected within sequences obtained from the SILVA database when using 9- to 15-mers constructed from the primer contigs matching all conserved regions of the 16S rRNA.
Fig. 4
Proportion of C1 sequences obtained when using 12-mers matching C3 located at different nucleotide positions. For example, C1 was not detected in sequences in which C3 was located at position 295 or lower, whereas 80% or more of the sequences contained C1 when C3 was located at position 340 or higher. The cumulative percentage of C3-positive sequences is indicated by the step line.
Fig. 5
Alignment of primers for DGGE and the primer contig. The most frequent 12-mer is underlined, while the difference G8 of the primer, which corresponds to R4 of the 12-mer, is shaded.
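
The Fig. 5 caption notes that a G in the primer corresponds to an R in the 12-mer, which indicates that the k-mers can carry IUPAC ambiguity codes (R stands for A or G). A minimal sketch of how such a degenerate k-mer could be located in a sequence is given below. It assumes matching by regular expression, which may differ from the authors' PHP implementation; `find_positions` and the short test strings are hypothetical, and positions are reported 0-based.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression character classes.
# T and U are treated as equivalent so that DNA and RNA spellings both match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "[TU]", "U": "[TU]",
    "R": "[AG]", "Y": "[CTU]", "S": "[CG]", "W": "[ATU]",
    "K": "[GTU]", "M": "[AC]", "B": "[CGTU]", "D": "[AGTU]",
    "H": "[ACTU]", "V": "[ACG]", "N": "[ACGTU]",
}


def find_positions(sequence, kmer):
    """Return 0-based start positions where the degenerate k-mer matches.

    A lookahead is used so that overlapping matches are also reported.
    """
    body = "".join(IUPAC[base] for base in kmer.upper())
    pattern = re.compile("(?=" + body + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Toy strings only; "ACRG" is not a k-mer taken from the paper.
    for seq in ("TTACGGTT", "TTACAGTT", "TTACCGTT"):
        print(seq, find_positions(seq, "ACRG"))
```

With this kind of matching, a single degenerate 12-mer covers every primer variant that differs only at the ambiguous position, which is one way a primer contig can summarize several published primers.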
