PeerJ. 2023 Sep 15;11:e16026.
doi: 10.7717/peerj.16026. eCollection 2023.

DEVOUR: Deleterious Variants on Uncovered Regions in Whole-Exome Sequencing

Erdem Türk et al. PeerJ.

Abstract

Discovering low-coverage (i.e., uncovered) regions that contain clinically significant variants, especially variants related to the patient's clinical phenotype, is critical for clinical diagnosis based on whole-exome sequencing (WES). Tools that identify clinically important variants in low-coverage regions are therefore essential. Here, we introduce DEVOUR (DEleterious Variants On Uncovered Regions), a desktop application that analyzes read alignments from WES experiments, identifies genomic regions with no or low coverage (read depth < 5), and then annotates known variants in those regions using clinical variant annotation databases. As a proof of concept, DEVOUR was used to analyze 28 samples from a publicly available Hirschsprung disease-related WES project (NCBI BioProject: https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB19327), revealing the potential existence of 98 disease-associated variants in low-coverage regions. DEVOUR is available from https://github.com/projectDevour/DEVOUR under the MIT license.

Keywords: Clinical NGS informatics; Genetic diseases; Genetic disposition to disease; Genetic variants; Medical genetics; Next-generation sequence (NGS) analysis; Whole-exome sequencing (WES) analysis.

Conflict of interest statement

The authors declare there are no competing interests.

Figures

Figure 1. Distribution of pathogenic and likely pathogenic variants related to Hirschsprung disease in ClinVar database.
The figure illustrates the distribution of 32 pathogenic and likely pathogenic variants associated with Hirschsprung disease in the ClinVar database. Each chromosome is represented by a different color. The majority of the pathogenic variants are clustered on chromosomes 4 and 10.
Figure 2. The representation of DEVOUR’s workflow.
Figure 3. The illustration of the user interface for providing the parameters: input files (an alignment in SAM or BAM format and an exome capture file in BED format), depth threshold and human genome reference version.
The initial stage in DEVOUR’s analysis pipeline is to identify potentially uncovered or low-coverage genomic regions. A list of low-coverage genomic regions in BED format is the result of this stage.
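The low-coverage detection stage can be illustrated with a minimal Python sketch. This is not DEVOUR's own code: it assumes per-base read depths are already available as a list of integers for consecutive positions, and the function names `low_coverage_regions` and `to_bed` are hypothetical. The only values taken from the text are the default threshold (read depth < 5) and the BED output format.

```python
from typing import Iterable, List, Tuple

# A BED-style interval: chromosome, 0-based start, exclusive end.
Region = Tuple[str, int, int]


def low_coverage_regions(
    chrom: str, start: int, depths: Iterable[int], threshold: int = 5
) -> List[Region]:
    """Merge consecutive positions with depth < threshold into intervals.

    `depths` holds one depth value per base, beginning at 0-based
    position `start` (an assumption made for this sketch).
    """
    regions: List[Region] = []
    run_start = None  # start of the current low-coverage run, if any
    pos = start
    for depth in depths:
        if depth < threshold:
            if run_start is None:
                run_start = pos
        elif run_start is not None:
            # The run ended at the previous base; BED ends are exclusive.
            regions.append((chrom, run_start, pos))
            run_start = None
        pos += 1
    if run_start is not None:
        regions.append((chrom, run_start, pos))
    return regions


def to_bed(regions: List[Region]) -> str:
    """Render intervals as tab-delimited BED lines."""
    return "\n".join(f"{c}\t{s}\t{e}" for c, s, e in regions)


# Made-up depths for seven bases starting at position 100.
example = low_coverage_regions("chr10", 100, [12, 4, 0, 3, 9, 7, 2])
print(to_bed(example))
```

With the made-up depths above, positions 101 to 103 and position 106 fall below the threshold, so the sketch reports two intervals.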
Figure 4. The illustration of user interfaces developed for users to create an annotation library.
Variants can be annotated using either custom annotation sources or ANNOVAR's disease-associated variant databases, such as ClinVar. A custom annotation source is a BED-like file with four tab-delimited columns holding, in order, the chromosome name, start coordinate, end coordinate, and the variant annotation as free text. DEVOUR offers settings panels for adding and managing multiple custom annotation sources. DEVOUR also provides methods to retrieve disease-associated variant databases from ANNOVAR repositories and convert them into the same BED-like format.
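The four-column BED-like layout described above lends itself to a short sketch of how annotations could be matched against low-coverage regions. This is an illustration under stated assumptions, not DEVOUR's implementation: the names `Annotation`, `parse_annotations`, and `annotations_in_regions` are hypothetical, coordinates are assumed to be half-open BED intervals, and the sample records are placeholders rather than real variants.

```python
from typing import List, NamedTuple, Tuple


class Annotation(NamedTuple):
    chrom: str
    start: int
    end: int
    note: str  # free-text variant annotation (fourth column)


def parse_annotations(text: str) -> List[Annotation]:
    """Parse tab-delimited lines: chrom, start, end, free-text annotation."""
    records: List[Annotation] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # Split at most three times so tabs inside the free text survive.
        chrom, start, end, note = line.split("\t", 3)
        records.append(Annotation(chrom, int(start), int(end), note))
    return records


def annotations_in_regions(
    annotations: List[Annotation], regions: List[Tuple[str, int, int]]
) -> List[Annotation]:
    """Return annotations that overlap any region (half-open intervals)."""
    hits: List[Annotation] = []
    for ann in annotations:
        for chrom, start, end in regions:
            if ann.chrom == chrom and ann.start < end and start < ann.end:
                hits.append(ann)
                break  # report each annotation once
    return hits


# Placeholder records, not real variants.
sample_text = (
    "chr10\t101\t102\tplaceholder annotation A\n"
    "chr10\t500\t501\tplaceholder annotation B\n"
)
low_regions = [("chr10", 101, 104)]
for hit in annotations_in_regions(parse_annotations(sample_text), low_regions):
    print(hit.chrom, hit.start, hit.end, hit.note)
```

The nested loop is quadratic and is used here only for clarity; for genome-scale inputs, a sorted sweep or an interval tree would be the more usual choice.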
Figure 5. The illustration of the user interface for selecting the desired annotation source(s).
At this stage, the annotation resources in the DEVOUR library, which were prepared according to the human reference genome version selected in the previous stage, are listed.
Figure 6. The illustration of the user interface for reviewing the results.
This stage generates result files that aid clinical diagnosis by flagging clinically significant variants located in low-coverage genomic regions. DEVOUR lets users review each annotation-based table from the previous stage by previewing it and exporting it in TSV or Excel format.
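As a rough illustration of the TSV export, the following sketch writes result rows with Python's standard `csv` module. The function name `to_tsv` and the column headers are assumptions made for this example; the layout of DEVOUR's actual result files is not described here.

```python
import csv
import io
from typing import Iterable, Sequence


def to_tsv(
    rows: Iterable[Sequence[object]],
    header: Sequence[str] = ("chrom", "start", "end", "annotation"),
) -> str:
    """Serialize result rows as tab-separated text with a header line.

    The default column names are hypothetical placeholders.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder row, not a real variant.
print(to_tsv([("chr10", 101, 102, "placeholder annotation A")]), end="")
```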
Figure 7. The distribution of samples containing Hirschsprung-related pathogenic variants on low- and high-coverage regions.
DEVOUR analysis (with the default read depth threshold) revealed at least one Hirschsprung-related pathogenic variant in low-coverage regions (read depth < 5) and high-coverage regions (read depth ≥ 5) for 18 and 27 out of 28 samples, respectively. Sample ERR1840777 is distinctive as the only sample containing Hirschsprung-related pathogenic variants exclusively in low-coverage regions.
Figure 8. Sequence coverage on chromosome 10 (Coordinates: 43,000,000 - 44,000,000) for sample ERR1840777.
The figure displays the sequence coverage on chromosome 10 in the genomic region spanning coordinates 43,000,000 to 44,000,000 for sample ERR1840777. Regions without bars on the graph indicate low or no read coverage, highlighting the presence of low-coverage regions in the dataset. The visual representation provides valuable insights into the distribution of sequence coverage and confirms the identification of low-coverage regions in the sample, supporting the findings of this study. NCBI Sequence Viewer Link: https://www.ncbi.nlm.nih.gov/projects/sviewer/?id=CM000672.1&tracks=[key:sequence_track,name:T378820,display_name:Sequence,id:T378820,dbname:GenBank,annots:NA,ShowLabel:false,ColorGaps:false,shown:true,order:1][key:alignment_track,name:ERR1840777,display_name:ERR1840777,id:STD2123359385,dbname:SRA,setting_group:cSRA,annots:ERR1840777,Layout:Adaptive,StatDisplay:15,Color:ShowDifferences,UnalignedTailsMode:glyph,HideSraAlignments:none,sort_by:,LinkMatePairAligns:false,ShowAlnStat:true,AlignedSeqFeats:false,Label:false,IdenticalBases:false,shown:true,order:7]&srz=ERR1840777&assm_context=GCA_000001405.3&mk=42833500|42833500|blue|9&v=43251931:43680986&c=FFFFFF&select=null&slim=0..
