Nat Biotechnol. 2020 Mar;38(3):314-319.
doi: 10.1038/s41587-019-0368-8. Epub 2020 Jan 6.

Accurate detection of mosaic variants in sequencing data without matched controls

Yanmei Dou et al. Nat Biotechnol. 2020 Mar.

Abstract

Detection of mosaic mutations that arise in normal development is challenging, as such mutations are typically present in only a minute fraction of cells and there is no clear matched control for removing germline variants and systematic artifacts. We present MosaicForecast, a machine-learning method that leverages read-based phasing and read-level features to accurately detect mosaic single-nucleotide variants and indels, achieving a multifold increase in specificity compared with existing algorithms. Using single-cell sequencing and targeted sequencing, we validated 80-90% of the mosaic single-nucleotide variants and 60-80% of indels detected in human brain whole-genome sequencing data. Our method should help elucidate the contribution of mosaic somatic mutations to the origin and development of disease.

Conflict of interest statement

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

Figures

Fig. 1: Framework of MosaicForecast to detect mosaic SNVs from bulk sequencing data.
(a) Candidate mosaics were classified as ‘hap=2’, ‘hap=3’ or ‘hap>3’ by read-based phasing, and a Random Forest (RF) model was trained to predict the phasing by using 25 read-level features as covariates. The model was then applied to non-phasable sites to predict their genotypes. Given a list of experimentally evaluated sites, the model could be further improved by an additional genotype-refinement step. (b) The relative importance of the features from the RF model for the brain WGS data, with four examples of read-level features. (c) 483 phasable sites were orthogonally evaluated by single-cell, trio, and targeted sequencing data. After genotype refinement, the phasable sites classified as ‘hap=2’, ‘hap=3’ and ‘hap>3’ were converted to ‘het’, ‘mosaic’, ‘repeat/CNV’ and ‘refhom’ for training. (d) We applied MosaicForecast to non-phasable MuTect2 candidate mosaics and evaluated them in single-cell, trio, and targeted sequencing data. In non-repeat regions, the precision increased from 8.9% (MuTect2) to 76% for the Phasing prediction model and 85% for the Refined genotypes prediction model; in RepeatMasker regions, it increased from 1% (MuTect2) to 50% for the Phasing prediction model and 77% for the Refined genotypes prediction model.
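The haplotype labels in panel (a) can be illustrated with a minimal sketch. It is not MosaicForecast's implementation. It assumes each read spanning both a nearby germline heterozygous SNP and the candidate site has been reduced to a pair of alleles, and that the label is the number of distinct allele pairs observed. The function name `classify_by_phasing`, the `min_reads` noise filter, and the `"unphased"` fallback are hypothetical.

```python
from collections import Counter

def classify_by_phasing(read_pairs, min_reads=2):
    """Label a candidate site by the number of haplotypes seen in spanning reads.

    read_pairs: iterable of (het_snp_allele, candidate_allele) tuples,
                one per read covering both sites (assumed input format).
    min_reads:  hypothetical filter; a haplotype must be supported by at
                least this many reads to be counted.
    """
    counts = Counter(read_pairs)
    haplotypes = {hap for hap, n in counts.items() if n >= min_reads}
    n_hap = len(haplotypes)
    if n_hap < 2:
        return "unphased"   # hypothetical label: too little information
    if n_hap == 2:
        return "hap=2"      # variant follows one germline haplotype entirely
    if n_hap == 3:
        return "hap=3"      # variant on only a subset of one haplotype
    return "hap>3"          # more haplotypes than a diploid mosaic explains

# Toy reads: the alt allele appears on only part of the "G" haplotype.
mosaic_like = [("A", "ref")] * 10 + [("G", "ref")] * 6 + [("G", "alt")] * 4
het_like = [("A", "ref")] * 10 + [("G", "alt")] * 10
print(classify_by_phasing(mosaic_like))
print(classify_by_phasing(het_like))
```

In the framework described above, labels of this kind on phasable sites serve as training targets for the Random Forest model, which is then applied to sites that have no nearby heterozygous SNP.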
Fig. 2: Comparison among algorithms.
(a) Candidate mosaics (both phasable and non-phasable) in the three individuals with single cell data were evaluated (see Methods). (b) Precision and recall are plotted separately for the non-repeat and repeat regions (as defined by RepeatMasker) and for each individual.
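As a reminder of how the two metrics in panel (b) are defined, the sketch below computes precision and recall from a set of called sites and a set of validated true mosaics. The site identifiers and the function name `precision_recall` are made up for illustration. The paper's actual evaluation procedure is described in its Methods and is not reproduced here.

```python
def precision_recall(called, validated_true):
    """Return (precision, recall) for a call set against validated true sites."""
    called, truth = set(called), set(validated_true)
    true_positives = len(called & truth)
    # Precision: fraction of calls that are real.
    precision = true_positives / len(called) if called else 0.0
    # Recall: fraction of real sites that were called.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical site identifiers, not data from the paper.
calls = {"chr1:100", "chr1:200", "chr2:50", "chr3:7"}
truth = {"chr1:100", "chr2:50", "chr5:9"}
precision, recall = precision_recall(calls, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Computing the metrics separately per region class (non-repeat versus RepeatMasker) and per individual, as the figure does, amounts to calling the function on each subset of sites.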
Fig. 3: Impact of read depth on sensitivity and detection of mosaic indels.
(a) At each coverage, a different RF model was trained on the phasable sites and predictions were made on non-phasable sites. Amplicon-sequencing data were used for validation. Although fewer true mosaics were identified at lower coverages, the sensitivity did not drop significantly (e.g., at 50X, MosaicForecast was able to detect ~80% of real variants identified at 250X). (b) Similar to (a) but using simulated data. The sensitivity was ~70% at 50X. (c) >70% of mosaic deletions called by MosaicForecast were validated by IonTorrent; the ‘hap=3’ sites and non-phasable sites had similar validation rates. (d) Similar to (c) but for mosaic insertions.

