Bioinformatics. 2021 Nov 5;37(21):3929-3931.
doi: 10.1093/bioinformatics/btab613.

BleTIES: annotation of natural genome editing in ciliates using long read sequencing

Brandon K B Seah et al. Bioinformatics. 2021.

Abstract

Summary: Ciliates are single-celled eukaryotes that eliminate specific, interspersed DNA sequences (internally eliminated sequences, IESs) from their genomes during development. These are challenging to annotate and assemble because IES-containing sequences are typically much less abundant in the cell than those without, and IES sequences themselves often contain repetitive and low-complexity sequences. Long-read sequencing technologies from Pacific Biosciences and Oxford Nanopore have the potential to reconstruct longer IESs than has been possible with short reads but require a different assembly strategy. Here we present BleTIES, a software toolkit for detecting, assembling, and analyzing IESs using mapped long reads.

Availability and implementation: BleTIES is implemented in Python 3. Source code is available at https://github.com/Swart-lab/bleties (MIT license) and also distributed via Bioconda.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Overview of the BleTIES MILRAA method for reconstructing IES junctions from error-corrected CCS reads versus from subreads. (A) CCS reads have accuracy >99%, so the consensus insert coordinate from read mapping is adequate if coverage is sufficient. (B) (1) Subreads have higher error rates, so mapping alone is insufficient to define the insert position. (2) Therefore, clusters of adjacent inserts are identified, and for each cluster the insert sequence plus ±100 bp of flanking sequence is extracted from the subreads. (3) Extracted sequences are assembled with SPOA (Vaser et al., 2017). (4 and 5) This consensus is realigned to the reference to obtain a more accurate estimate of the insert position.
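
The coordinate handling described in the caption can be illustrated with a minimal Python sketch. This is not the BleTIES source code: the function names, the `min_coverage` and `max_gap` thresholds, and the input format (plain lists of reference coordinates and read strings) are assumptions made for illustration only. The sketch covers the consensus coordinate of panel A and the clustering and extraction of panel B, step 2; the SPOA assembly and realignment steps are not shown.

```python
from collections import Counter
from typing import List, Optional


def consensus_position(positions: List[int], min_coverage: int = 3) -> Optional[int]:
    """Panel A idea: with accurate reads, take the most frequently
    reported insert coordinate, provided enough reads support it.
    min_coverage is a hypothetical threshold."""
    if len(positions) < min_coverage:
        return None
    return Counter(positions).most_common(1)[0][0]


def cluster_positions(positions: List[int], max_gap: int = 5) -> List[List[int]]:
    """Panel B, step 2 idea: group insert coordinates that lie within
    max_gap bp of their nearest sorted neighbour.
    max_gap is a hypothetical threshold."""
    clusters: List[List[int]] = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def extract_window(read_seq: str, ins_start: int, ins_len: int, flank: int = 100) -> str:
    """Return the insert plus up to `flank` bases on each side,
    in read coordinates, clipped at the ends of the read."""
    left = max(0, ins_start - flank)
    right = min(len(read_seq), ins_start + ins_len + flank)
    return read_seq[left:right]
```

In a fuller pipeline, the windows returned by `extract_window` for one cluster would be passed to a consensus assembler, and the resulting consensus would be realigned to the reference, as the caption describes.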
