Review
BMC Med Genomics. 2024 Jan 29;17(1):39.
doi: 10.1186/s12920-024-01795-w.

Whole genome sequencing in clinical practice

Frederik Otzen Bagger et al. BMC Med Genomics.

Abstract

Whole genome sequencing (WGS) is becoming the preferred method for molecular genetic diagnosis of rare and unknown diseases and for identification of actionable cancer drivers. Compared to other molecular genetic methods, WGS captures most genomic variation and eliminates the need for sequential genetic testing. Whereas the laboratory requirements are similar to those of conventional molecular genetics, the amount of data is large, and WGS requires a comprehensive computational and storage infrastructure to process data within a clinically relevant timeframe. The output of a single WGS analysis is roughly 5 million variants, and data interpretation involves specialized staff collaborating with clinical specialists to provide standard-of-care reports. Although the field is continuously refining the standards for variant classification, there are still unresolved issues associated with the clinical application. The review provides an overview of WGS in clinical practice, describing the technology and current applications as well as the challenges connected with data processing, interpretation and clinical reporting.

Keywords: Clinical bioinformatics infrastructure; Functional variant testing; Variant filtering and interpretation; Whole genome sequencing.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Schematic representation of the WGS laboratory and bioinformatics flow. Short-read WGS protocols can in general be divided into four separate steps: 1. Sample preparation, 2. Library preparation, 3. Cluster generation, and 4. Sequencing. Panel 1, WGS is routinely performed with DNA from EDTA- or citrate-stabilized whole blood, or from surgically removed or biopsy tissue. DNA is isolated by conventional methods, but high-molecular-weight DNA is preferred to facilitate CNV detection. Historically, WGS required a DNA amplification step, but with newer protocols this step is no longer needed. Omitting the amplification step eliminates PCR bias and provides more uniform coverage and quality [12]. The library is generated by fragmenting the high-molecular-weight DNA, followed by ligation of adapters that bind to the linker DNA on the chip surface. Moreover, barcodes that allow samples from different patients to be pooled on the same chip may be attached. Panel 2, The libraries are subsequently loaded onto a flow cell and placed on the sequencer, after which the individual DNA fragments are clonally amplified by a polymerase, generating small single-stranded clusters of the particular fragments. The sequencing is in principle conventional Sanger sequencing [5], where elongation is initiated by the addition of a sequence primer and polymerase, and the nucleotide sequence is determined by the incorporation of complementary fluorescent-tagged nucleotide terminators. The fluorescent signal from the incorporated terminators is detected by scanning the chip and the individual clusters with a high-resolution confocal fluorescence laser detector after every round of nucleotide incorporation. Panel 3, Data are compiled in a FASTQ file that is transferred to the high-performance computer (HPC). In the HPC, the reads are mapped and compiled in a .BAM file before variants are called and listed in a .VCF file. Panel 4, The VCF is finally uploaded to the interpreters in the genomic laboratory for filtration, annotation and prioritization.
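To make the end product of Panel 3 concrete, the following is a minimal sketch of how one data line of a VCF file could be read into a structured record. It relies only on the eight fixed tab-separated VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO); the names `VcfRecord` and `parse_vcf_line` and the example record are hypothetical and are not taken from the review. A production pipeline would normally use a dedicated VCF library instead.

```python
from typing import Dict, List, NamedTuple, Optional, Union


class VcfRecord(NamedTuple):
    """The eight fixed columns of a VCF data line (hypothetical container)."""
    chrom: str
    pos: int
    id: str
    ref: str
    alts: List[str]
    qual: Optional[float]
    filter: str
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse one tab-separated VCF data line (not a '#' header line)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]

    info_dict: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            # INFO entries without '=' are flags and are stored as True.
            info_dict[key] = value if sep else True

    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        id=vid,
        ref=ref,
        alts=alt.split(","),                      # several ALT alleles are comma-separated
        qual=None if qual == "." else float(qual),  # '.' marks a missing value
        filter=flt,
        info=info_dict,
    )


# Illustrative record only; it does not come from the review or from patient data.
example = parse_vcf_line("chr1\t12345\trs1\tA\tG,T\t50.0\tPASS\tDP=30;DB")
```

Sample-level genotype columns (FORMAT and the per-sample fields) are deliberately ignored here to keep the sketch short.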
Fig. 2
Clinical applications of WGS. Whole Genome Sequencing (WGS) finds its primary clinical applications in diagnosing rare diseases and pinpointing actionable somatic variants within tumors. Beyond these crucial roles, WGS serves to unveil polygenic risk scores (PRS) and pharmacogenetic profiles. The spectrum of rare diseases and somatic variants encompasses both small and structural variations, all discernible through WGS data analysis. WGS also enables the identification of trinucleotide repeat expansions prevalent in neuro-muscular and degenerative diseases. Additionally, it sheds light on polygenic and pharmacogenomic profiles, elucidated by the presence of widespread small common variants. In a comprehensive approach, WGS not only captures the intricate details of genetic makeup but also unveils tumor signatures by deciphering distinctive patterns within somatic variants. Human insert was created with BioRender.com
Fig. 3
Variant analysis of patients with rare diseases. Panel A, Overview of the filtering steps and the number of variants in rare disease patients referred for WGS analysis (means of 6 patients). The total number of variants in each patient is just above 5 million. The analysis begins with the elimination of ~200,000 low-quality variants. Subsequently, common variants with an allele frequency above 2% are excluded, since these are considered unlikely to explain the occurrence of a rare disease. Known pathogenic variants are retained. Since gnomAD may not represent all common variants, variants are moreover filtered against a local (Danish) reference genome, which further reduces the number of variants to about 200,000. Thereafter, the analysis is focused on coding and splice site variants, which on average reduces the number of variants to ~2400. Application of additional filters, e.g., omitting ACMG/AMP benign variants or those with low REVEL scores, further brings the number of variants down to ~1500. Panel B, On average the patients exhibit 83 loss of function (LOF) variants and 748 missense variants. The remaining variants belong to other categories, such as variants in the UTRs and deep in the introns. Finally, on average 67 variants were previously registered in ClinVar or HGMD, and information on these can be readily retrieved and used in the interpretation. The pie chart below shows the ACMG/AMP classification of the variants; only a minority are classified as pathogenic or likely pathogenic (< 2.5%). On average only a single pathogenic variant is identified. In many cases this is a heterozygous recessive variant with no obvious relevance for the patient’s disease. Almost one third of the variants are variants of unknown significance (VUS). Panels C and D show the total cumulative distribution of gnomAD allele frequencies and REVEL scores, respectively, of ACMG/AMP scored variants (from VarSeq) among 63 unrelated patients. Intergenic variants were filtered away, and any variant with conflicting classifications was removed. Moreover, variants with an allele frequency of more than 0.5, or for which an allele frequency could not be found, were removed. The results illustrate that allele frequency is relatively effective in excluding benign variants, whereas likely benign variants and VUS are not effectively separated from the likely pathogenic and pathogenic variants by frequency filtering. The REVEL score, which combines pathogenicity predictions from 18 individual scores, is in contrast clearly discriminative, and high scores are enriched among pathogenic variants. About 25% of the VUS exhibit a REVEL score above 0.5, which may warrant further analysis of these variants. The number and details of the variants in the plots are summarized in the attached Supplemental data.
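The filtering cascade in Panel A can be sketched as a sequence of list filters that records how many variants survive each step. This is an illustration under stated assumptions, not the authors' pipeline: the `Variant` fields, the consequence labels and the `min_revel` cutoff are hypothetical (the caption does not say what counts as a "low" REVEL score), while the 2% allele frequency limit and the retention of known pathogenic variants follow the caption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Labels treated as "coding or splice site"; this label set is an assumption.
CODING_OR_SPLICE = {
    "missense", "nonsense", "frameshift", "inframe_indel",
    "synonymous", "splice_site",
}


@dataclass
class Variant:
    passes_qc: bool
    gnomad_af: Optional[float]      # None if the variant is absent from gnomAD
    local_af: Optional[float]       # None if absent from the local reference
    consequence: str
    known_pathogenic: bool = False
    acmg_class: Optional[str] = None
    revel: Optional[float] = None   # None if no REVEL score is available


def is_rare(af: Optional[float], known_pathogenic: bool, max_af: float) -> bool:
    """Keep known pathogenic variants; otherwise require AF <= max_af."""
    return known_pathogenic or af is None or af <= max_af


def filter_cascade(
    variants: List[Variant],
    max_af: float = 0.02,      # 2% limit stated in the caption
    min_revel: float = 0.25,   # hypothetical cutoff, not given in the caption
) -> Tuple[List[Variant], Dict[str, int]]:
    counts = {"total": len(variants)}

    kept = [v for v in variants if v.passes_qc]
    counts["after_quality"] = len(kept)

    kept = [v for v in kept if is_rare(v.gnomad_af, v.known_pathogenic, max_af)]
    counts["after_gnomad"] = len(kept)

    kept = [v for v in kept if is_rare(v.local_af, v.known_pathogenic, max_af)]
    counts["after_local"] = len(kept)

    kept = [v for v in kept if v.consequence in CODING_OR_SPLICE]
    counts["after_coding_splice"] = len(kept)

    # Drop ACMG/AMP benign variants and variants with a low REVEL score;
    # variants without a REVEL score are kept.
    kept = [
        v for v in kept
        if v.acmg_class != "benign" and (v.revel is None or v.revel >= min_revel)
    ]
    counts["after_acmg_revel"] = len(kept)
    return kept, counts
```

Returning the per-step counts alongside the surviving variants mirrors the funnel shown in Panel A and makes it easy to see which filter removes the most candidates for a given patient.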
Fig. 4
WGS from patient to clinical report. The journey of Whole Genome Sequencing (WGS) commences and concludes at the patient’s bedside. The attending physician assesses whether a WGS analysis could offer crucial clinical insights, either through diagnosis or by presenting alternative treatment options. After the patient has been comprehensively briefed and has given consent, a sample of whole blood or tumor is dispatched to the specialized laboratory equipped for WGS. Within the genomic laboratory, the sequence data are analyzed by skilled staff. Putative disease-associated variants are subsequently discussed with the attending physician and, if necessary, with a multidisciplinary team (MDT) of medical professionals from pertinent specialties, including pathology, clinical genetics, immunology, and more. This collaboration aims to establish a conclusive diagnosis and assess the clinical relevance of identified variants. The conclusive clinical report is then transmitted to the clinical department, where the attending physician shares the results with the patient. This communication includes a comprehensive discussion of the implications for the patient and their condition, along with recommended actions. In instances where the initial analysis fails to pinpoint disease-causing variants, the stored WGS data undergo periodic re-analysis (inner grey arrow). This ongoing process ensures the continuous integration of new knowledge, potentially leading to a diagnosis without the need for additional hospitalization and sampling. Furthermore, throughout the treatment course, various clinically relevant information, such as pharmacogenetics, may be extracted to enhance overall patient care. Inserts were created with BioRender.com
Fig. 5
Genomic localization of variants and their functional consequence. 1. Germ-line variants located in gene regulatory domains such as promoters or locus control regions will affect the level of gene transcription. In most instances, variants in the promoters disrupt the binding of trans-acting factors, thereby reducing expression of the gene. The composition of regulatory motifs is in many instances incompletely understood, and it is in general difficult to predict the consequence of these variants. A few diseases exhibit unstable trinucleotide repeat sequences in the promoter that, when expanded, are known to impair transcription. The functional significance of promoter variants is normally demonstrated by loss of expression (LOE) via RNA sequencing or measurement of the encoded protein. Repeat expansions may also be directly discerned from the WGS data. 2. Variants located at the canonical splice donor (GT) or acceptor (AG) sites or at a known A branch site are in general pathogenic, since these strongly conserved sequences are essential for splicing. Variants located deeper in the intron or in the connecting exons can also disrupt splicing by disrupting enhancer or silencer motifs, but the significance of these variants is more difficult to predict. The evaluation of these variants in general requires minigene analysis and/or RNA sequencing. 3. Coding nonsense or frameshift variants lead to premature translation termination and shortening of the encoded protein. In most cases this leads to loss of function (LOF). Missense variants and small indels may disrupt protein function in a number of different ways, such as reducing enzymatic activity, stability, localization or structure and macromolecular assembly. Consequently, the evaluation of these variants requires deep insight into the protein's function, and in many instances various kinds of functional analysis are necessary in order to classify the variants as pathogenic. Since the functional significance of a particular variant may be difficult to predict, even for canonical splice mutations and LOF variants, it is recommended that all classes of variants undergo evaluation according to ACMG/AMP criteria in order to determine pathogenicity.
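As a rough illustration of points 2 and 3, the sketch below labels a single-nucleotide substitution within one codon as synonymous, missense, nonsense or stop-lost using the standard genetic code, and checks whether an intronic position falls on the canonical donor (first two bases) or acceptor (last two bases) dinucleotide. The function names and the 0-based coordinate convention are assumptions made for this example. As the caption stresses, such labels only describe the predicted consequence and do not establish pathogenicity.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snv(ref_codon: str, pos: int, alt_base: str) -> str:
    """Classify a substitution at 0-based position `pos` of `ref_codon`."""
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:pos] + alt_base.upper() + ref_codon[pos + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"      # premature stop codon
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


def hits_canonical_splice_site(intron_length: int, offset: int) -> bool:
    """True if the 0-based intronic `offset` lies in the donor GT (first two
    bases) or the acceptor AG (last two bases) of the intron."""
    return offset in (0, 1) or offset in (intron_length - 2, intron_length - 1)
```

For instance, `classify_coding_snv("CGA", 0, "T")` changes an arginine codon to the stop codon TGA and is therefore labeled as nonsense.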

References

    1. Bodmer WF, McKie R. The book of man: the human genome project and the quest to discover our genetic heritage. New York: Scribner; 1995.
    2. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860–921.
    3. Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, et al. The complete sequence of a human genome. Science. 2022;376(6588):44–53.
    4. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, et al. The sequence of the human genome. Science. 2001;291(5507):1304–1351.
    5. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A. 1977;74(12):5463–5467.
