Int J Mol Sci. 2020 Dec 1;21(23):9177.
doi: 10.3390/ijms21239177.

A Long-Read Sequencing Approach for Direct Haplotype Phasing in Clinical Settings

Simone Maestri et al. Int J Mol Sci.

Abstract

The reconstruction of individual haplotypes can facilitate the interpretation of disease risks; however, high costs and technical challenges still hinder their assessment in clinical settings. Second-generation sequencing is the gold standard for variant discovery but, because it produces short reads covering small genomic regions, it allows only indirect haplotyping based on statistical methods. In contrast, third-generation methods such as the nanopore sequencing platform developed by Oxford Nanopore Technologies (ONT) generate long reads that can be used for direct haplotyping, with fewer drawbacks. However, robust standards for variant phasing in ONT-based target resequencing efforts are not yet available. In this study, we present a streamlined proof-of-concept workflow for variant calling and phasing based on ONT data in a clinically relevant 12-kb region of the APOE locus, a hotspot for variants and haplotypes associated with aging-related diseases and longevity. Starting with sequencing data from simple amplicons of the target locus, we demonstrate that ONT data allow for reliable single-nucleotide variant (SNV) calling and phasing from as few as 60 reads, although the recognition of indels is less efficient. Even so, we identified the best combination of ONT read-set size (600 reads) and software (BWA/Minimap2 and HapCUT2) that enables full haplotype reconstruction when both SNVs and indels have been identified previously using a highly accurate sequencing platform. In conclusion, we established a rapid and inexpensive workflow for variant phasing based on ONT long reads. The workflow allows the analysis of multiple samples in parallel and can easily be implemented in routine clinical practice, including diagnostic testing.

Keywords: diagnostic testing; haplotype phasing; nanopore sequencing; variant calling.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Target region selected for this study. The genomic region chr19:44,902,730–44,915,006 (hg38) visualized in the UCSC Genome Browser, identifying the single-nucleotide variants (SNVs) included in dbSNP. SNVs indicated with gray ticks represent synonymous variants in coding regions. SNVs indicated with red ticks represent non-synonymous variants in coding regions. SNVs indicated with blue ticks represent variants in untranslated regions. SNVs indicated with black ticks represent either variants in intronic regions or upstream/downstream gene variants. The positions of the PCR primers are shown in green.
Figure 2
Amplification of the APOE locus. (A) Agarose gel electrophoresis of amplicons generated from sample V1 genomic DNA using LongAmp Taq Polymerase (NEB) and commercial primers (Qiagen) or custom primers. The expected amplicon size is ~12 kb. (B) Agarose gel electrophoresis of amplicons generated from sample V1 genomic DNA or from PCR products (V1 amplicons) generated in (A), using nested primers designed to amplify a 412-bp product from APOE exon 3. (C) Agarose gel electrophoresis of long-range PCR amplicons generated using custom primers in the presence of 2.5%, 4%, or 5% DMSO. (D) Agarose gel electrophoresis of amplicons generated using custom primers and 4% DMSO and further selected by gel excision and on-column purification (manual) or the BluePippin device. (E) Capillary electrophoresis of the same products from (D). Starting and recovered μg refer to the amount of amplicon before and after size selection, respectively, and the relative yield. NTC = no-template control.
Figure 3
Integrative Genomics Viewer (IGV) visualization of ONT mapped reads from sample V1 at position 11,658. IGV visualization showing a region where a homozygous SNV at position 11,658 was not identified based on ONT data. The blue box shows that the SNV occurs within a long homopolymer run.
Figure 4
Accuracy of SNV calling from ONT reads. The graph shows the harmonic mean of precision and recall for SNV calling as an F1 score averaged across samples, when different numbers of ONT reads (10, 30, 60, 100, 300, 600, 1000, or 10,000) are used. Error bars represent the standard error.
Figure 5
Experimental pipeline to assess the performance of alignment and phasing software. (A) Sets ranging from 10 to 10,000 ONT reads were considered. (B) Variants in the region of interest were identified using the highly accurate Illumina platform and stored in a VCF file (ground-truth unphased variants). ONT reads were mapped to the reference sequence using either BWA or Minimap2, producing a BAM file. (C) VCF and BAM files constitute the input files for the phasing software (WhatsHap or HapCUT2), which generated a phased VCF file allowing haplotype reconstruction. The ONT-phased VCF file was then compared to a reference-phased VCF file (ground-truth phased variants). Analysis was carried out 100 times to obtain more robust estimates of accuracy.
Figure 6
Accuracy of full haplotype reconstruction. Haplotypes were called 100 times using all possible combinations of the two alignment tools and the two phasing tools, with different read numbers (10, 30, 60, 100, 300, 600, 1000, and 10,000). The graph shows the frequency at which the fully correct haplotype was called (averaged across samples), with error bars representing the standard error.
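
The accuracy measures named in the captions of Figures 4 and 6 can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the representation of a haplotype pair as a list of (allele on haplotype 1, allele on haplotype 2) tuples, and the use of the sample standard deviation divided by the square root of n as the standard error are all assumptions made for illustration. Treating a call as "fully correct" when it matches the ground truth either directly or after swapping the two haplotype labels is likewise an assumption about how such a comparison would typically be made.

```python
from math import sqrt
from statistics import mean, stdev


def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall for one sample's SNV calls."""
    if true_positives == 0:
        # No correct calls: precision and recall are zero (or undefined).
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


def mean_and_standard_error(values):
    """Mean across samples and standard error of the mean (assumed: SD / sqrt(n))."""
    n = len(values)
    if n < 2:
        return mean(values), 0.0
    return mean(values), stdev(values) / sqrt(n)


def is_full_haplotype_correct(called, truth):
    """Compare two phased variant lists of (hap1_allele, hap2_allele) tuples.

    Hypothetical criterion: every variant must be phased as in the truth set,
    either as given or with the two haplotype labels swapped throughout.
    """
    if len(called) != len(truth):
        return False
    swapped = [(b, a) for a, b in called]
    return called == truth or swapped == truth


# Usage with made-up numbers (not data from the study).
per_sample_f1 = [f1_score(8, 2, 2), f1_score(9, 1, 0)]
average_f1, f1_standard_error = mean_and_standard_error(per_sample_f1)
```

Under these assumptions, the frequency plotted in Figure 6 would correspond to the fraction of the 100 repetitions for which is_full_haplotype_correct returns True, averaged across samples.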
