NAR Genom Bioinform. 2025 Jun 23;7(2):lqaf087.
doi: 10.1093/nargab/lqaf087. eCollection 2025 Jun.

MOLGENIS VIP: an end-to-end DNA variant interpretation pipeline for research and diagnostics configurable to support rapid implementation of new methods

Willem T K Maassen et al. NAR Genom Bioinform.

Abstract

Achieving high yield in genetics research and genome diagnostics is a significant challenge because it requires a combination of multiple strategies and large-scale genomic analysis using the latest methods. Existing diagnostic software infrastructures are often unable to cope with high demands for versatility and scalability. We developed MOLGENIS VIP, a flexible, scalable, high-throughput, open-source, and "end-to-end" pipeline to process different types of sequencing data into portable, prioritized variant lists for immediate clinical interpretation in a wide variety of scenarios. VIP supports interpretation of short- and long-read sequencing data, using best-practice annotations and classification trees without complex IT infrastructures. VIP is developed within the long-living MOLGENIS open-source project to provide sustainability and has integrated feedback from a growing international community of users. VIP has undergone genome diagnostic laboratory testing and harnesses experiences from multiple Dutch, European, Canadian, and African diagnostic and infrastructural initiatives (VKGL, EU-Solve-RD, EJP-RD, CINECA, GA4GH). We provide a step-by-step protocol for installing and using VIP. We demonstrate VIP using 25 664 previously classified variants from the VKGL, and 18 and 41 diagnosed patients from a routine diagnostics and a Solve-RD research cohort, respectively. We believe that VIP accelerates causal variant detection and innovation in genome diagnostics and research.

Conflict of interest statement

None declared.

Figures

Figure 1.
Summary of modules in VIP. As input, VIP requires a sample sheet in which the patient information is specified. In module 1, the input is validated and preprocessed, resulting in a VCF file. The workflow can be started at any point in the preprocessing step. Module 2 annotates the variants in the VCF file using bioinformatic tools and resources (Supplementary Table S2). Module 3 filters the variants using a customizable decision tree and inheritance information from the previous modules. Finally, VIP generates an interactive report in which the logic for the classifications is explained. Diagnosticians and researchers can use this report for further interpretation of the variants and sharing of the results.
Figure 2.
Default decision tree. The figure shows a schematic version of the default decision tree. The green blocks and bold arrows represent the general sequence of filtering steps and the values that are evaluated for each variant (see legend). Each value is calculated in the annotation module (Supplementary Table S2). Small arrows represent the decisions for the consequence classifications by VIP. VIP classifies the different consequences as benign (B), likely benign (LB), variant of uncertain significance (VUS), likely pathogenic (LP), or pathogenic (P). After a variant is classified by VIP, it exits the filter tree. Variants with incorrect contigs or genes and low quality are removed (RM). Using the JSON format, each component in the decision tree can be customized to fit the workflow of the user.
Figure 3.
Interactive report. This figure shows an example of the base view of the interactive report. The interactive report opens in the sample screen (A). Here, all the individuals within a family are shown. To see the list of variants with their predicted class, users open the variant view by clicking one of the individuals (C). The default variant view shows the variants and the consequence of the transcript with the highest CAPICE score. The variants in the list can be filtered using different filters, such as HPO filters, inheritance filters, and predicted class filters (B). The report also contains more detailed views to show all annotations for each transcript and which criteria were used to classify a specific variant. When the BAM workflow is used, the built-in genome viewer can be used to study the context of variants within reads that were mapped to the reference genome.
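Selecting, for each variant, the transcript consequence with the highest CAPICE score can be sketched as follows. The record layout (a list of dictionaries with "transcript" and "capice" keys) and the example values are assumptions made for this illustration, not the report's internal data model.

```python
def best_consequence(consequences):
    """Return the consequence with the highest CAPICE score, or None.

    Consequences without a score are ignored (an assumption of this sketch).
    """
    scored = [c for c in consequences if c.get("capice") is not None]
    return max(scored, key=lambda c: c["capice"]) if scored else None


# Hypothetical per-variant transcript consequences.
consequences_by_variant = {
    "variant_1": [
        {"transcript": "T1", "capice": 0.12},
        {"transcript": "T2", "capice": 0.87},
    ],
    "variant_2": [
        {"transcript": "T3", "capice": None},
    ],
}

default_view = {
    variant_id: best_consequence(consequences)
    for variant_id, consequences in consequences_by_variant.items()
}
print(default_view)
```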
Figure 4.
Number of recalled variants that were previously classified. The y-axis shows the percentage of variants that VIP was able to recall as LP or P. The leftmost bar represents the variants of VKGL release 2024-2 that were used by VIP in the VKGL and CAPICE annotation steps. The second bar represents the newly added variants between VKGL release 2023-11 and release 2024-2 that were not used by VIP. The absolute number of variants that were recalled is shown at the top of the bars.
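The recall percentage plotted here is, in essence, the share of previously classified variants that the pipeline labels LP or P. A minimal sketch of that arithmetic, using made-up labels rather than the VKGL data:

```python
def recall_summary(predicted_classes):
    """Return (number recalled, percentage recalled) for a list of class labels.

    A variant counts as recalled when its predicted class is LP or P.
    """
    recalled = sum(1 for cls in predicted_classes if cls in {"LP", "P"})
    return recalled, 100.0 * recalled / len(predicted_classes)


# Hypothetical predicted classes for five previously classified variants.
example = ["P", "LP", "VUS", "LP", "B"]
count, percentage = recall_summary(example)
print(count, percentage)
```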
Figure 5.
Number of candidate variants and recall rate. (A) The average number of candidate variants per patient in the routine diagnostic cohort. (B) The average number of candidate variants per patient in the Solve-RD research cohort. On the x-axis, the different filters used in the interactive report are specified. The recall rate of causal variants per filter is displayed above the individual bars.
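The two quantities shown per filter, the average number of candidate variants per patient and the recall rate of causal variants, can be computed as in the sketch below. The patient records, the filter names, and the filter predicates are hypothetical; they stand in for the report's filters only to show the calculation.

```python
from statistics import mean

# Hypothetical cohort: each patient has one known causal variant and a
# list of candidate variants with a predicted class and an HPO match flag.
patients = [
    {"causal": "v1", "candidates": [
        {"id": "v1", "cls": "LP", "hpo": True},
        {"id": "v2", "cls": "VUS", "hpo": False},
        {"id": "v3", "cls": "VUS", "hpo": True},
    ]},
    {"causal": "v7", "candidates": [
        {"id": "v7", "cls": "VUS", "hpo": True},
        {"id": "v8", "cls": "LP", "hpo": False},
    ]},
]

# Illustrative filters, from least to most strict.
filters = {
    "none": lambda v: True,
    "hpo": lambda v: v["hpo"],
    "hpo + LP/P": lambda v: v["hpo"] and v["cls"] in {"LP", "P"},
}


def summarize(patients, keep):
    """Return (mean candidates per patient, recall rate) for one filter."""
    counts, hits = [], 0
    for patient in patients:
        kept = [v for v in patient["candidates"] if keep(v)]
        counts.append(len(kept))
        # The causal variant is recalled if it survives the filter.
        hits += any(v["id"] == patient["causal"] for v in kept)
    return mean(counts), hits / len(patients)


results = {name: summarize(patients, keep) for name, keep in filters.items()}
for name, (avg_candidates, recall) in results.items():
    print(f"{name}: {avg_candidates} candidates/patient, recall {recall:.0%}")
```

In this toy cohort, the strictest filter shortens the candidate list the most but loses one causal variant, which is the trade-off the figure makes visible.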
Figure 6.
Number of candidate variants per patient and rank of causal variants. Panels show the average number of candidate variants per patient plotted versus the average rank of the causal variants for the patients in the (A) routine diagnostic and (B) Solve-RD research cohorts. Patients for whom the causal variants were not found after applying the different filters are not included.
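The average rank of the causal variant, excluding patients whose causal variant did not survive filtering, can be sketched as follows. Ordering candidates by a descending "score" field is an assumption of this example; the caption does not state which ordering defines the rank.

```python
from statistics import mean


def causal_rank(candidates, causal_id):
    """Return the 1-based rank of the causal variant, or None if absent.

    Candidates are ordered by descending 'score' (an assumed ordering).
    """
    ordered = sorted(candidates, key=lambda v: v["score"], reverse=True)
    for position, variant in enumerate(ordered, start=1):
        if variant["id"] == causal_id:
            return position
    return None


# Hypothetical filtered candidate lists for three patients.
cohort = [
    {"causal": "a", "candidates": [{"id": "a", "score": 0.9}, {"id": "b", "score": 0.4}]},
    {"causal": "c", "candidates": [{"id": "d", "score": 0.8}, {"id": "e", "score": 0.7},
                                   {"id": "c", "score": 0.6}]},
    {"causal": "f", "candidates": [{"id": "g", "score": 0.5}]},  # causal variant filtered out
]

ranks = [causal_rank(p["candidates"], p["causal"]) for p in cohort]
found = [r for r in ranks if r is not None]  # patients without a hit are excluded
average_rank = mean(found)
print(ranks, average_rank)
```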
