mBio. 2017 Sep 19;8(5):e01397-17.
doi: 10.1128/mBio.01397-17.

The CRISPR Spacer Space Is Dominated by Sequences from Species-Specific Mobilomes

Sergey A Shmakov et al. mBio.

Abstract

Clustered regularly interspaced short palindromic repeats and CRISPR-associated protein (CRISPR-Cas) systems store the memory of past encounters with foreign DNA in unique spacers that are inserted between direct repeats in CRISPR arrays. Homologous sequences, called protospacers, are detectable in viral, plasmid, and microbial genomes for only a small fraction of the spacers; the rest of the spacers remain the CRISPR "dark matter." We performed a comprehensive analysis of the spacers from all CRISPR-cas loci identified in bacterial and archaeal genomes. Depending on the CRISPR-Cas subtype and the prokaryotic phylum, protospacers were detectable for 1% to about 19% of the spacers (~7% global average). Among the detected protospacers, the majority (typically 80 to 90%) originated from viral genomes, including proviruses; among the rest, the most common source was genes that are integrated into microbial chromosomes but are involved in plasmid conjugation or replication. Thus, almost all spacers with identifiable protospacers target mobile genetic elements (MGEs). The GC content and the dinucleotide and tetranucleotide compositions of microbial genomes, of their spacer complements, and of the cognate viral genomes were almost identical and showed a nearly perfect correlation. Given the near absence of self-targeting spacers, these findings are most compatible with the possibility that the spacers, including the dark matter, are derived almost completely from the species-specific microbial mobilomes.

IMPORTANCE The principal function of CRISPR-Cas systems is thought to be protection of bacteria and archaea against viruses and other parasitic genetic elements. This defense function is mediated by sequences from parasitic elements, known as spacers, that are inserted into CRISPR arrays, then transcribed and employed as guides to identify and inactivate the cognate parasitic genomes. However, only a small fraction of CRISPR spacers match any sequences in current databases, and of these, only a minority correspond to known parasitic elements. We show that nearly all spacers with matches originate from viral or plasmid genomes that are either free or have been integrated into the host genome. We further demonstrate that spacers with no matches have the same properties as those of identifiable origin, strongly suggesting that all spacers originate from mobile elements.

Keywords: CRISPR-Cas; bacteriophages; mobilome; oligonucleotide composition; spacer acquisition.
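The composition comparison summarized in the abstract can be illustrated with a minimal Python sketch, assuming spacers and genomes are available as plain A/C/G/T strings. The function names (`gc_content`, `kmer_frequencies`, `pearson`) and the toy sequences are hypothetical and this is not the authors' code; it only shows one way to compute GC content, pooled dinucleotide (k = 2) or tetranucleotide (k = 4) frequency vectors, and the Pearson correlation between two such vectors.

```python
from itertools import product
from math import sqrt


def gc_content(seq):
    """Fraction of G and C among the unambiguous A/C/G/T bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def kmer_frequencies(seqs, k):
    """Relative frequencies of all 4**k k-mers, pooled over several sequences.

    Each sequence is scanned separately, so no k-mer spans two spacers;
    k-mers containing non-ACGT characters are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            if word in counts:
                counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def pearson(x, y):
    """Pearson correlation coefficient (undefined if either vector is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    spacers = ["ATGCGCGTTAGCCATGACGATCGGTACGCA", "GGCATCGATCTTGACGCGCAATGCCGATAC"]
    genome = "ATGCGCGTTAGGCATCGATCCGTAGCGATTGCACGGCTAGCATGCCGTA"
    print(gc_content("".join(spacers)), gc_content(genome))
    print(pearson(kmer_frequencies(spacers, 2), kmer_frequencies([genome], 2)))
```

Passing k = 4 instead of k = 2 gives 256-element tetranucleotide vectors that can be compared in the same way.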

Figures

FIG 1
Computational pipeline for identification and analysis of CRISPR spacers.
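The protospacer search step of a pipeline like this can be sketched, under clearly stated assumptions, as an ungapped scan of each spacer and its reverse complement against target sequences with a small mismatch allowance. The caption does not specify the search tool or thresholds, and alignment tools such as BLASTN are the usual choice for this step; the brute-force scan, the `max_mismatches=2` default, and all function names below are hypothetical simplifications.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming_hits(query, target, max_mismatches):
    """Start positions where query aligns to target, ungapped, with <= max_mismatches."""
    n = len(query)
    hits = []
    for i in range(len(target) - n + 1):
        mismatches = 0
        for a, b in zip(query, target[i:i + n]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # early exit: this window cannot qualify
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


def find_protospacers(spacer, target, max_mismatches=2):
    """Candidate protospacers on both strands (hypothetical helper).

    Returns (position, strand) pairs; for '-' hits the position is where the
    reverse complement of the spacer starts on the target's given strand.
    """
    spacer, target = spacer.upper(), target.upper()
    plus = [(i, "+") for i in hamming_hits(spacer, target, max_mismatches)]
    minus = [(i, "-") for i in hamming_hits(revcomp(spacer), target, max_mismatches)]
    return plus + minus


def fraction_with_match(spacers, targets, max_mismatches=2):
    """Share of spacers that hit at least one target sequence."""
    matched = sum(
        1 for s in spacers
        if any(find_protospacers(s, t, max_mismatches) for t in targets)
    )
    return matched / len(spacers)
```

A summary figure such as the ~7% global average in the abstract corresponds to `fraction_with_match` computed over all spacers against a reference collection, although the real thresholds and databases are not given here.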
FIG 2
Distribution of spacers with matches along the CRISPR arrays. (A) Probability density functions (pdfs) for the spacers with matches (real) and for the same spacers placed randomly onto the array 100 times (random). (B) Pdf of the difference between the number of spacers with matches and the number of randomly placed spacers along the array. Because CRISPR arrays are difficult to polarize automatically, and under the assumption that new spacers are incorporated at the leader end but not at the distal end of an array, the results are shown from either end (0) to the middle of the array (0.5).
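The real-versus-random comparison in this caption can be sketched as follows. Each array is represented by its number of spacers and the 0-based indices of the spacers with matches (a hypothetical input format). Each position is normalized to index / (n - 1) and folded to its distance from the nearer end, reflecting the caption's point that arrays cannot be reliably polarized; the random reference redraws the same number of positions uniformly without replacement, 100 times per array. The normalization, binning, and function names are assumptions, not necessarily what the authors used.

```python
import random


def folded_position(index, n_spacers):
    """Relative position in [0, 0.5]: distance to the nearer array end.

    Assumes a 0-based index normalized as index / (n_spacers - 1).
    """
    if n_spacers == 1:
        return 0.0
    r = index / (n_spacers - 1)
    return min(r, 1.0 - r)


def real_and_random_positions(arrays, n_replicates=100, seed=0):
    """arrays: list of (n_spacers, matched_indices) pairs (hypothetical format)."""
    rng = random.Random(seed)
    real, rand = [], []
    for n, matched in arrays:
        real.extend(folded_position(i, n) for i in matched)
        # Null model: place the same number of spacers at random positions.
        for _ in range(n_replicates):
            for i in rng.sample(range(n), len(matched)):
                rand.append(folded_position(i, n))
    return real, rand


def histogram(values, bins=5, upper=0.5):
    """Density-normalized histogram on [0, upper], so sum(h) * width == 1."""
    if not values:
        return [0.0] * bins
    width = upper / bins
    counts = [0] * bins
    for v in values:
        counts[min(int(v / width), bins - 1)] += 1
    return [c / (len(values) * width) for c in counts]
```

Subtracting `histogram(rand)` from `histogram(real)` bin by bin gives a curve analogous to panel B.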
FIG 3
Virus-host bipartite network derived from spacer sharing. Red nodes, bacteria or archaea; green nodes, viruses; edges, shared spacer-protospacer pairs.
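A network of this kind can be assembled from a list of host-virus pairs, one per spacer-protospacer match. The sketch below assumes that hypothetical input format; it tags nodes by partition so that host and virus identifiers cannot collide, weights each edge by the number of shared pairs, and finds connected components by breadth-first search.

```python
from collections import defaultdict, deque


def build_bipartite(matches):
    """matches: iterable of (host_id, virus_id) pairs, one per spacer-protospacer match.

    Returns (edge_weights, adjacency); nodes are ("host", id) or ("virus", id).
    """
    weights = defaultdict(int)
    adj = defaultdict(set)
    for host, virus in matches:
        h, v = ("host", host), ("virus", virus)
        weights[(h, v)] += 1  # number of shared spacer-protospacer pairs
        adj[h].add(v)
        adj[v].add(h)
    return dict(weights), dict(adj)


def connected_components(adj):
    """Connected components of an undirected graph, via breadth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components
```

Keeping the partition label in each node key means the graph stays bipartite by construction: every edge joins a host node to a virus node.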
FIG 4
Breakdown of the protospacers from nonviral genes, by gene family. Colors indicate genes implicated in conjugal transfer of plasmids and plasmid replication, a putative phage gene (not annotated as such), and a cas3 gene. The protein family names are from the CDD database.
FIG 5
Correlations between the nucleotide compositions of spacers, the genomes of the respective microbes, and their viruses. (A) GC content of spacers versus GC content of microbial genomes and viruses. (B) GC content of spacers with matches versus GC content of microbial genomes and viruses. Linear trend lines are shown for the GC content of spacers (green) and viral genomes (red), and the x = y line is included to guide the eye.
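The linear trend lines and the x = y reference in this figure correspond to an ordinary least-squares fit and a distance from the identity line. In this sketch each genome contributes one (genome GC, spacer GC) pair; the function names are hypothetical, and the mean absolute deviation from x = y is an illustrative summary statistic, not one reported in the figure.

```python
def least_squares_line(x, y):
    """Slope and intercept of the ordinary least-squares fit y ~ slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx  # raises ZeroDivisionError if all x values are equal
    return slope, my - slope * mx


def mean_abs_deviation_from_identity(x, y):
    """Average |y - x|: how far the points lie from the x = y line."""
    return sum(abs(b - a) for a, b in zip(x, y)) / len(x)
```

A slope near 1, an intercept near 0, and a small deviation from x = y together describe the "almost identical" compositions stated in the abstract.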
FIG 6
Correlations between the nucleotide compositions of spacers, genomes of bacteria with numerous characterized viruses, and the corresponding viral genomes.
FIG 7
Results of principal-component analysis of the oligonucleotide compositions of spacers and of the genomes of the respective microbes and their viruses. (A) Dinucleotide compositions; (B) tetranucleotide compositions. Black circles, spacers; green circles, microbes; red circles, viruses. The analysis was performed using standard multidimensional scaling.
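Classical (Torgerson) multidimensional scaling on Euclidean distances yields the same coordinates as principal-component scores, up to sign, which is why one analysis can be described both ways. The dependency-free sketch below assumes each point is a k-mer frequency vector (for example, 16 dinucleotide or 256 tetranucleotide frequencies); it double-centers the squared-distance matrix and extracts the leading eigenvectors by power iteration with deflation. The function name and parameters are hypothetical, and a numerical library's eigensolver would be the usual choice for real data.

```python
import random
from math import sqrt


def classical_mds(points, n_components=2, n_iter=500, seed=0):
    """Embed points in n_components dimensions by classical MDS."""
    n = len(points)
    # Squared Euclidean distances between all pairs of composition vectors.
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    # Double centering: B = -1/2 * J D2 J with J = I - (1/n) 11^T.
    # d2 is symmetric, so row means also serve as column means.
    row_mean = [sum(r) / n for r in d2]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    def matvec(m, v):
        return [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * n_components for _ in range(n)]
    for c in range(n_components):
        # Power iteration from a random unit vector converges to the
        # eigenvector of the largest remaining eigenvalue (B is PSD).
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = matvec(b, v)
            norm = sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining spectrum is numerically zero
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(b, v)))  # Rayleigh quotient
        scale = sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][c] = v[i] * scale
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords
```

Passing a list of 256-element tetranucleotide frequency vectors for spacer sets, host genomes, and viral genomes returns two coordinates per vector, analogous to one point in panel B.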
FIG 8
Spacer sequence conservation compared to the genomic average. (A) Distributions of matches for the spacers and the mock spacers across the microbial taxonomic ranks. (B) Distributions of the number of matches to the same species per spacer for the spacers and the mock spacers.
