Mob DNA. 2013 Dec 20;4(1):28.
doi: 10.1186/1759-8753-4-28.

A pipeline of programs for collecting and analyzing group II intron retroelement sequences from GenBank

Michael Abebe et al. Mob DNA.

Abstract

Background: Accurate and complete identification of mobile elements is challenging in the current era of sequencing, given their large numbers and frequent truncations. Group II intron retroelements, which consist of a ribozyme and an intron-encoded protein (IEP), are usually identified in bacterial genomes through their IEP; however, the RNA component that defines the intron boundaries is often difficult to identify because the sequences that form the RNA structure are not strongly conserved. Boundary definition is further complicated by the fact that most group II intron copies in bacteria are truncated.

Results: Here we present a pipeline of 11 programs that collect and analyze group II intron sequences from GenBank. The pipeline begins with a BLAST search of GenBank using a set of representative group II IEPs as queries. Subsequent steps download the corresponding genomic sequences and flanks, filter out non-group II introns, assign introns to phylogenetic subclasses, filter out incomplete and/or non-functional introns, and assign IEP sequences and RNA boundaries to the full-length introns. In the final step, redundancy in the data set is reduced by grouping introns into sets of ≥95% identity and choosing one sequence as the representative of each set.
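The redundancy-reduction step can be illustrated with a greedy clustering sketch in Python. This is not the authors' program: it assumes the intron sequences have already been aligned to a common length, measures identity over every column except those gapped in both sequences, and processes the longest sequences first so that each cluster's representative is its first, longest member (a CD-HIT-style greedy scheme swapped in for illustration). The names `identity`, `greedy_cluster` and the example sequences are hypothetical.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical columns between two pre-aligned sequences.

    Assumption: both sequences are aligned to equal length, with '-' for gaps.
    Columns gapped in both sequences are skipped; all other columns are compared.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.95) -> Dict[str, List[str]]:
    """Greedy clustering: returns {representative: [members]}.

    A sequence joins the first existing cluster whose representative it matches
    at >= threshold identity; otherwise it founds a new cluster.
    """
    clusters: Dict[str, List[str]] = {}
    # Longest ungapped sequences first, so representatives tend to be the most complete copies.
    order = sorted(seqs, key=lambda name: len(seqs[name].replace("-", "")), reverse=True)
    for name in order:
        for rep in clusters:
            if identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    aligned = {
        "intronA": "ACGTACGTACGTACGTACGT",
        "intronB": "ACGTACGTACGTACGTACGA",  # 19/20 identical to intronA
        "intronC": "TTTTACGTACGTACGTAAAA",  # 14/20 identical to intronA
    }
    print(greedy_cluster(aligned))
    # → {'intronA': ['intronA', 'intronB'], 'intronC': ['intronC']}
```

Because assignment is greedy, the result can depend on input order when lengths tie; a real implementation would also need a proper pairwise or multiple alignment step rather than assuming pre-aligned input.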

Conclusions: These programs should be useful for comprehensive identification of group II introns in sequence databases as data continue to rapidly accumulate.

Figures

Figure 1
Example group II intron structure. (A) DNA structure of a group II intron. The intron RNA portion is denoted by red boxes, while conserved ORF domains are in blue. The IEP contains a RT (reverse transcriptase) domain, including conserved sub-domains (0, 1, 2, 2a, 3, 4, 5, 6, 7), an X domain, a D (DNA-binding) domain and an optional En (endonuclease) domain. Intron RNA domains are shown underneath in Roman numerals, and exon 1 and 2 sequences are in black. (B) An example group II intron RNA secondary structure (IIC). The intron sequence is depicted in red lettering, with exon sequences in blue and black. The ORF sequence is represented by the dotted loop in domain IV. IBS1/EBS1 and IBS3/EBS3 (blue and orange shading) represent base pairings between the intron and exons that help to define the intron boundaries during splicing. The sequence shown is for B.h.I1 of Bacillus halodurans.
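The IBS/EBS pairing shown in panel B can be checked with a short Python sketch. It assumes both motifs are written 5' to 3' (DNA or RNA letters), tests for an uninterrupted antiparallel duplex, and optionally accepts G·U wobble pairs. The function `can_pair` and the example motifs are hypothetical; they are not taken from B.h.I1 or from the pipeline's boundary-finding programs.

```python
# Canonical Watson-Crick pairs for RNA, plus optional G-U wobble pairs.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def can_pair(ibs: str, ebs: str, allow_wobble: bool = False) -> bool:
    """Return True if ibs (5'->3') forms a perfect antiparallel duplex with ebs (5'->3')."""
    if len(ibs) != len(ebs):
        return False
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    # Accept DNA input by converting T to U.
    ibs = ibs.upper().replace("T", "U")
    ebs = ebs.upper().replace("T", "U")
    # Antiparallel pairing: the 5' base of the IBS pairs with the 3' base of the EBS.
    return all((x, y) in allowed for x, y in zip(ibs, reversed(ebs)))


if __name__ == "__main__":
    print(can_pair("ACAUCA", "UGAUGU"))                     # True
    print(can_pair("GCAUCA", "UGAUGU"))                     # False (G·U not allowed)
    print(can_pair("GCAUCA", "UGAUGU", allow_wobble=True))  # True
```

A search for intron boundaries would scan candidate exon and intron windows for such pairings; requiring a perfect duplex, as this sketch does, is a simplifying assumption.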
Figure 2
Pipeline flowchart. The pipeline proceeds through a series of steps in which data are collected and put into eight storage folders. Each storage folder feeds data into a subsequent program, which produces the next storage folder. The number of candidate introns decreases at each step, while more information accumulates for the smaller set of introns. In brief, a BLAST search identifies candidate IEPs in GenBank, and the corresponding DNA sequences are downloaded. RTs that are not IEPs are filtered out, and retained candidates are assigned to an intron class. ORF domains (0, 1, 2a, 2b, 3, 4, 5, 6, 7, X, En) are identified and ORF boundaries are annotated. The intron boundaries are then identified and an RNA structure is generated. Candidates with >95% similarity are grouped and a prototype from each group is identified.
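The folder-to-folder design in the flowchart can be sketched in Python as a chain of stages, each of which reads the previous stage's stored candidates and writes its own folder. The stage functions, the field names (`has_x_domain`, `best_query_class`, `orf_complete`, `ends_found`) and the folder layout are hypothetical stand-ins for the pipeline's 11 programs; the sketch only shows how the candidate count shrinks while annotations accumulate.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Candidate = Dict[str, object]
Stage = Callable[[List[Candidate]], List[Candidate]]


def filter_non_ieps(cands: List[Candidate]) -> List[Candidate]:
    # Hypothetical: treat RTs lacking an X domain as non-IEPs.
    return [c for c in cands if c["has_x_domain"]]


def assign_class(cands: List[Candidate]) -> List[Candidate]:
    # Hypothetical: inherit the intron class of the best-scoring BLAST query.
    return [{**c, "intron_class": c["best_query_class"]} for c in cands]


def keep_full_length(cands: List[Candidate]) -> List[Candidate]:
    # Hypothetical: keep candidates with a complete ORF and both intron ends found.
    return [c for c in cands if c["orf_complete"] and c["ends_found"]]


STAGES: List[Tuple[str, Stage]] = [
    ("iep_filter", filter_non_ieps),
    ("intron_class", assign_class),
    ("full_length", keep_full_length),
]

SEED: List[Candidate] = [
    {"id": "hit1", "has_x_domain": True, "best_query_class": "IIC", "orf_complete": True, "ends_found": True},
    {"id": "hit2", "has_x_domain": False, "best_query_class": "IIA", "orf_complete": True, "ends_found": False},
    {"id": "hit3", "has_x_domain": True, "best_query_class": "IIB", "orf_complete": False, "ends_found": True},
]


def run_pipeline(seed: List[Candidate], stages: List[Tuple[str, Stage]], workdir: Path) -> List[int]:
    """Run stages in order, storing each stage's output in its own numbered folder.

    Returns the number of candidates remaining after each stage.
    """
    counts: List[int] = []
    current = seed
    for i, (name, stage) in enumerate(stages, start=1):
        folder = workdir / f"{i:02d}_{name}"
        folder.mkdir(parents=True, exist_ok=True)
        out_file = folder / "candidates.json"
        out_file.write_text(json.dumps(stage(current), indent=2))
        # The next stage sees only what this stage stored on disk.
        current = json.loads(out_file.read_text())
        counts.append(len(current))
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(SEED, STAGES, Path(tmp)))  # [2, 2, 1]
```

Because each stage in this sketch reads only what the previous stage stored, a later stage could be rerun from its input folder without repeating the earlier steps.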
