Bioinformatics. 2023 Jan 1;39(1):btac771.
doi: 10.1093/bioinformatics/btac771.

EDIR: exome database of interspersed repeats


Laura D T Vo Ngoc et al. Bioinformatics.

Abstract

Motivation: Intragenic exonic deletions are known to contribute to genetic diseases and are often flanked by regions of homology.

Results: To get a clearer view of these interspersed repeats encompassing a coding sequence, we have developed EDIR (Exome Database of Interspersed Repeats), which contains the positions of these structures within the human exome. EDIR was calculated by an inductive strategy rather than a brute-force approach, and can be queried through an R/Bioconductor package or a web interface, allowing rapid per-gene extraction of homology-flanked sequences throughout the exome.

Availability and implementation: The code used to compile EDIR can be found at https://github.com/lauravongoc/EDIR. The full dataset of EDIR can be queried via an Rshiny application at http://193.70.34.71:3857/edir/. The R package for querying EDIR is called 'EDIRquery' and is available on Bioconductor. The full EDIR dataset can be downloaded from https://osf.io/m3gvx/ or http://193.70.34.71/EDIR.tar.gz.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
(a) Graphical depiction of an interspersed repeat structure (IRS): a spacer sequence of up to 1000 bp flanked by two repeats. (b) Inductive strategy to identify IRSs with repeats >7 bp. The set of IRSs with all possible 7 bp repeats also contains the set of IRSs with 8 bp repeats, the 8 bp set in turn contains the 9 bp set, and so on. Because the number of possible sequence combinations increases fourfold each time the repeat is extended by 1 bp, this subsetting methodology makes it possible to identify IRSs with long repeats, which would otherwise be impractical or virtually impossible to compute due to the large number of combinations. (c) General overview of the methodology used to compile the EDIR database for 7 bp repeat IRSs.
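
The subsetting idea in panel (b) can be illustrated with a short Python sketch. This is not the code used to compile EDIR (that is in the GitHub repository listed above); it is a minimal illustration under stated assumptions. The function name `find_irs`, its parameters, and the convention that the two repeats must not overlap (spacer of 0 bp or more) are all hypothetical choices made for this example. Pairs of identical 7 bp seeds are found once, and each longer repeat length is obtained by checking a single extra base on the surviving pairs, rather than by enumerating all 4^k possible k-bp sequences. Since the spacer shrinks by 1 bp each time the repeats grow by 1 bp, the seed step collects pairs with a slightly looser distance bound and the spacer limit is applied separately at every length.

```python
from collections import defaultdict


def find_irs(seq, k_seed=7, k_max=10, max_spacer=1000):
    """Return {k: [(i, j), ...]} where seq[i:i+k] == seq[j:j+k], i < j,
    and the spacer between the two repeats is 0..max_spacer bp.

    Hypothetical helper for illustration only; not the EDIR pipeline.
    """
    # Index every start position of each seed-length k-mer.
    positions = defaultdict(list)
    for i in range(len(seq) - k_seed + 1):
        positions[seq[i:i + k_seed]].append(i)

    # Seed candidates: identical k_seed-mers whose start-to-start distance
    # could still satisfy the spacer limit at some length up to k_max.
    max_dist = k_max + max_spacer
    candidates = []
    for starts in positions.values():
        for a, i in enumerate(starts):
            for j in starts[a + 1:]:
                if j - i > max_dist:
                    break  # starts are in increasing order
                if j - i >= k_seed:  # assumption: repeats do not overlap
                    candidates.append((i, j))

    results = {}
    for k in range(k_seed, k_max + 1):
        # Report pairs whose spacer (j - i - k) lies in 0..max_spacer.
        results[k] = sorted(
            (i, j) for i, j in candidates if k <= j - i <= k + max_spacer
        )
        # Inductive step: a (k+1)-bp repeat pair must already be a k-bp
        # pair, so only the next base of each survivor is compared.
        candidates = [
            (i, j) for i, j in candidates
            if j + k < len(seq) and seq[i + k] == seq[j + k]
        ]
    return results


if __name__ == "__main__":
    # Toy sequence: "GATTACAG", a 2 bp spacer "CC", then "GATTACAG" again.
    print(find_irs("GATTACAGCCGATTACAG", k_seed=7, k_max=9))
    # prints {7: [(0, 10), (1, 11)], 8: [(0, 10)], 9: []}
```

In the toy sequence, two 7 bp pairs are found, only one of them extends to 8 bp, and none extends to 9 bp, so each longer set is drawn entirely from the candidates of the shorter one.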

