J Mol Diagn. 2022 Oct;24(10):1089-1099.
doi: 10.1016/j.jmoldx.2022.06.006. Epub 2022 Jul 19.

NGS4THAL, a One-Stop Molecular Diagnosis and Carrier Screening Tool for Thalassemia and Other Hemoglobinopathies by Next-Generation Sequencing

Yujie Cao et al. J Mol Diagn. 2022 Oct.

Abstract

Thalassemia is one of the most common genetic diseases and a major health threat worldwide. Accurate, efficient, and scalable analysis of next-generation sequencing (NGS) data is much needed for its molecular diagnosis and carrier screening. We developed NGS4THAL, a bioinformatics analysis pipeline analyzing NGS data to detect pathogenic variants for thalassemia and other hemoglobinopathies. NGS4THAL realigns ambiguously mapped NGS reads derived from the homologous Hb gene clusters for accurate detection of point mutations and small insertions/deletions. It uses a combination of complementary structural variant (SV) detection tools and an in-house database of control data containing specific SVs to achieve accurate detection of the complex SV types. Detected variants are matched with those in HbVar (A Database of Human Hemoglobin Variants and Thalassemia Mutations), allowing recognition of known pathogenic variants, including disease modifiers. Tested on simulation data, NGS4THAL achieved high sensitivity and specificity. For targeted NGS sequencing data from samples with laboratory-confirmed pathogenic Hb variants, it achieved 100% detection accuracy. Application of NGS4THAL on whole genome sequencing data from unrelated studies revealed thalassemia mutation carrier rates for Hong Kong Chinese and Northern Vietnamese that were consistent with previous reports. NGS4THAL is a highly accurate and efficient molecular diagnosis tool for thalassemia and other hemoglobinopathies based on tailored analysis of NGS data and may be scaled for population carrier screening.
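The matching step described above can be illustrated with a minimal sketch. It assumes exact matching on chromosome, position, reference allele, and alternate allele; the abstract does not say how NGS4THAL or HbVar represent variants, so `Variant`, `annotate`, and the records below are hypothetical placeholders rather than real HbVar entries.

```python
from typing import Dict, List, NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant key: chromosome, position, ref and alt alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def annotate(
    detected: List[Variant], known: Dict[Variant, str]
) -> Tuple[Dict[Variant, str], List[Variant]]:
    """Split detected variants into known ones (with their label) and unmatched ones."""
    matched = {v: known[v] for v in detected if v in known}
    novel = [v for v in detected if v not in known]
    return matched, novel


# Placeholder records for illustration only; these are NOT real HbVar entries.
known = {
    Variant("chrN", 100, "A", "T"): "placeholder_entry_1",
    Variant("chrN", 200, "G", "GA"): "placeholder_entry_2",
}
detected = [Variant("chrN", 100, "A", "T"), Variant("chrN", 300, "C", "G")]

matched, novel = annotate(detected, known)
print(matched)  # the detected variant that is present in the lookup table
print(novel)    # the detected variant with no match
```

A real implementation would also need to normalize InDel representation before comparison, which this sketch leaves out.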

Figures

Figure 1
Workflow of NGS4THAL. CNV, copy number variation; InDel, small insertion and deletion; RMA, next-generation sequencing reads with multiple alignments; SNV, single-nucleotide variant; SV, structural variant.
Figure 2
Detection of point mutations and small insertions and deletions (InDels) by NGS4THAL on simulated data. A–D: Simulated variants and detected variants by NGS4THAL are shown for HBA2, HBA1, HBB, and HBD, respectively. Point mutations are shown as thin lines, and small InDels are shown as color blocks proportional to their sizes. SNV, single-nucleotide variant.
Figure 3
Detection of Hb structural variants (SVs) by NGS4THAL using simulated data. A–D: The next-generation sequencing signatures and detection of subtype-A to subtype-D Hb SVs, the classification of which is based on relative positions between the homologous regions and the SV breakpoints. For each SV subtype: Top panels: the alignment landscape in the format of Integrative Genomics Viewer screenshot, followed by the simulated SV and the detection results from BreakDancer, Pindel, and CoNIFER (with cutoffs of 1.0, 0.8, 0.5, and 0.3, respectively). CoNIFER detections with cutoff of 0.5 are used in NGS4THAL and are shown in green. Bottom panels: Read-depth changes detected by CoNIFER. Black lines are read-depth changes for all the simulated individuals. Red curves are read-depth changes for the SV carriers in question. The y axis is the relative copy number with reference to the baseline defined by all input samples, and the vertical dashed green lines illustrate the start and end of the CoNIFER variant detection using a cutoff of 0.5. The simulated SVs are 2.7-kb deletion (NG_000006.1:g.36664_39364del2701), representing SV subtype A; 5.3-kb deletion (NG_000006.1:g.28684_33930del5246), representing subtype B; 3.5-kb deletion (NG_000006.1:g.37464_40964del3501), representing subtype C; and 3.7-kb (type I) deletion (NG_000006.1:g.34164_37967del3804), representing subtype D. Variant details can be found at https://www.ncbi.nlm.nih.gov/nuccore (last accessed July 8, 2022). Chr, chromosome.
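The deletion descriptions in this caption follow the pattern `accession:g.start_enddelN`. As a small illustration, the sketch below parses such a string and returns the inclusive coordinate span together with the size stated after `del`. The function name `deletion_span` and the regular expression are assumptions made for this example; they cover only the simple form used in the caption, not the full HGVS grammar.

```python
import re
from typing import Optional, Tuple

# Matches only the simple form seen in the caption, e.g.
# "NG_000006.1:g.36664_39364del2701" (the trailing size is optional).
_DEL = re.compile(
    r"^(?P<acc>[A-Z]{2}_\d+\.\d+):g\.(?P<start>\d+)_(?P<end>\d+)del(?P<size>\d+)?$"
)


def deletion_span(hgvs: str) -> Tuple[int, Optional[int]]:
    """Return (inclusive coordinate span, size stated after 'del' or None)."""
    m = _DEL.match(hgvs)
    if m is None:
        raise ValueError(f"unsupported deletion description: {hgvs!r}")
    start, end = int(m["start"]), int(m["end"])
    span = end - start + 1  # both breakpoint coordinates are counted
    stated = int(m["size"]) if m["size"] else None
    return span, stated


for desc in (
    "NG_000006.1:g.36664_39364del2701",
    "NG_000006.1:g.28684_33930del5246",
    "NG_000006.1:g.37464_40964del3501",
    "NG_000006.1:g.34164_37967del3804",
):
    print(desc, deletion_span(desc))
```

For the subtype A, C, and D strings the computed span equals the stated size (2701, 3501, and 3804). For the subtype B string the inclusive span is 5247 while the stated size is 5246, so the sketch reports both values rather than choosing one.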
Figure 4
Detection of a compound structural variant (SV) (--SEA/–α3.7) by NGS4THAL through a stage-wise process. (1) A: Detection of --SEA/–α3.7. B: Detection of –α3.7/–α3.7 for comparison. (2) Integrative Genomics Viewer (IGV) screenshot panel shows the alignment landscape and the detections from BreakDancer, Pindel, and CoNIFER; screening stage panel shows the read-depth change in the initial screening stage when all the samples are pooled together for analysis by CoNIFER; fine-profiling stage panel shows the read-depth change in the fine-profiling stage when pooling all the --SEA deletion carriers together to serve as baseline controls (from in-house control database if necessary). Black lines are read-depth changes for all the simulated individuals. Red curves are read-depth changes for the SV carrier in question, and the vertical dashed green lines illustrate the start and end of the CoNIFER variant calls using a cutoff of 0.5. (3) Detection of homozygous –α3.7, shown in B, does not need the fine-profiling stage.
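The effect of changing the baseline between the two stages can be shown with a toy sketch. It is not CoNIFER's method: it swaps in a plain ratio of per-bin depth to the baseline median, and the depths, bin layout, and the 0.25 cutoff are all invented for illustration (the caption's cutoff of 0.5 is on CoNIFER's own scale and is not comparable). The names `relative_depth` and `bins_below` are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def relative_depth(sample: Sequence[float], baseline: Sequence[Sequence[float]]) -> List[float]:
    """Per-bin depth of `sample` divided by the per-bin median of the baseline samples."""
    rel = []
    for i, depth in enumerate(sample):
        base = median(s[i] for s in baseline)
        rel.append(depth / base if base else float("nan"))
    return rel


def bins_below(rel: Sequence[float], cutoff: float) -> List[int]:
    """Indices of bins whose relative depth falls more than `cutoff` below 1.0."""
    return [i for i, r in enumerate(rel) if 1.0 - r > cutoff]


# Invented toy depths over six bins (illustration only, not data from the paper).
normal = [100, 100, 100, 100, 100, 100]
large_del_carrier = [100, 50, 50, 50, 50, 100]  # heterozygous large deletion, bins 1-4
compound = [100, 50, 0, 0, 50, 100]             # same large deletion plus a smaller
                                                # deletion on the other allele, bins 2-3

# Screening stage: baseline pooled from all control-like samples.
screen = relative_depth(compound, [normal, normal, normal, large_del_carrier, large_del_carrier])
# Fine-profiling stage: baseline restricted to carriers of the large deletion.
fine = relative_depth(compound, [large_del_carrier, large_del_carrier])

print(bins_below(screen, 0.25))  # bins covered by the large deletion
print(bins_below(fine, 0.25))    # bins where depth drops further than in carriers
```

Against the pooled baseline the whole large-deletion region is flagged as one block; against the carrier baseline only the bins with the additional loss stand out, which is the intuition behind using carriers of the first deletion as controls.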
Supplemental Figure S1
Illustration of characteristics of Hb structural variants (SVs) from next-generation sequencing (NGS). Hb SV (eg, deletion) classification is based on relative positions between the homologous regions and SVs, and their characteristics from paired-end short-read NGS sequencing, including read pair, split read, and read depth, are shown. Blue ovals show regions where the expected read-depth change should be observed.
