2021 Dec 14;118(50):e2114937118.
doi: 10.1073/pnas.2114937118.

Single-molecule sequencing reveals a large population of long cell-free DNA molecules in maternal plasma

Stephanie C Y Yu et al. Proc Natl Acad Sci U S A.

Abstract

In the field of circulating cell-free DNA, most of the studies have focused on short DNA molecules (e.g., <500 bp). The existence of long cell-free DNA molecules has been poorly explored. In this study, we demonstrated that single-molecule real-time sequencing allowed us to detect and analyze a substantial proportion of long DNA molecules from both fetal and maternal sources in maternal plasma. Such molecules were beyond the size detection limits of short-read sequencing technologies. The proportions of long cell-free DNA molecules in maternal plasma over 500 bp were 15.5%, 19.8%, and 32.3% for the first, second, and third trimesters, respectively. The longest fetal-derived plasma DNA molecule observed was 23,635 bp. Long plasma DNA molecules demonstrated predominance of A or G 5' fragment ends. Pregnancies with preeclampsia demonstrated a reduction in long maternal plasma DNA molecules, reduced frequencies for selected 5' 4-mer end motifs ending with G or A, and increased frequencies for selected motifs ending with T or C. Finally, we have developed an approach that employs the analysis of methylation patterns of the series of CpG sites on a long DNA molecule for determining its tissue origin. This approach achieved an area under the curve of 0.88 in differentiating between fetal and maternal plasma DNA molecules, enabling the determination of maternal inheritance and recombination events in the fetal genome. This work opens up potential clinical utilities of long cell-free DNA analysis in maternal plasma including noninvasive prenatal testing of monogenic diseases and detection/monitoring of pregnancy-associated disorders such as preeclampsia.

Keywords: cell-free DNA; epigenetics; monogenic diseases; noninvasive prenatal testing; third-generation sequencing.

Conflict of interest statement

Competing interest statement: A patent application on the described technology has been filed by S.C.Y.Y., P.J., W.P., S.H.C., Y.T.T.C., K.C.A.C., R.W.K.C., and Y.M.D.L. and licensed to Take2 Holdings Limited founded by K.C.A.C., R.W.K.C., and Y.M.D.L.

Figures

Fig. 1.
Overview of the study design. Briefly, cell-free DNA molecules from maternal plasma samples were sequenced with PacBio SMRT sequencing. We first determined the abundance of long cell-free DNA molecules in plasma samples from different trimesters of pregnancies and then characterized short and long DNA molecules by analyzing their end motif profiles. We further performed size and end motif analyses on plasma DNA samples from pregnancies with preeclampsia and aimed to identify potential biomarkers for this pregnancy-associated disorder. Finally, we performed methylation analysis to determine the methylation pattern (i.e., the order of methylated [M] and unmethylated [U] cytosines in all CpG sites) of individual plasma DNA molecules and explored an approach to deduce the tissue of origin of individual plasma DNA molecules according to the methylation pattern. This single-molecule tissue-of-origin analysis of maternal plasma DNA was then applied to deduce the maternal inheritance of the fetus.
Fig. 2.
Size distributions of maternal plasma DNA molecules from different sequencing platforms. (A) Percentages of fragments above a given size indicated on the x axis for a maternal plasma DNA sample from a third-trimester pregnancy sequenced with both Illumina NovaSeq (red bars) and PacBio Sequel II systems (cyan bars). (B) Percentage of DNA molecules greater than 1 kb in third-trimester maternal plasma samples sequenced with either the Illumina HiSeq (n = 10) or the PacBio Sequel II system (n = 11).
Fig. 3.
Size distributions of cell-free DNA molecules from first- (n = 7), second- (n = 10), and third-trimester maternal plasma samples (n = 11). (A) Size distributions of cell-free DNA molecules from different trimesters of pregnancy, plotted on a logarithmic scale for the y axis. Blue, yellow, and red curves represent the plasma DNA size profiles for the first, second, and third trimesters, respectively. (B) Boxplots showing percentages of plasma DNA molecules greater than 500 bp in maternal plasma samples from different trimesters of pregnancy. (C) Size distributions of fetal- (red curve) and maternal-derived DNA molecules (blue curve) in the maternal plasma plotted on a logarithmic scale for the y axis. (D) Boxplots showing percentages of fetal- (red) and maternal-derived (blue) plasma DNA molecules greater than 500 bp from different trimesters of pregnancy.
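The percentages in panels B and D reduce to a simple count over fragment sizes. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `percent_longer_than` and the toy sizes are hypothetical, and only the 500-bp cutoff comes from the caption.

```python
from typing import Sequence


def percent_longer_than(sizes_bp: Sequence[int], threshold_bp: int = 500) -> float:
    """Percentage of molecules strictly longer than threshold_bp."""
    if not sizes_bp:
        raise ValueError("no fragment sizes given")
    n_long = sum(1 for size in sizes_bp if size > threshold_bp)
    return 100.0 * n_long / len(sizes_bp)


# Hypothetical toy sample of fragment sizes in bp (not data from the study).
sizes = [120, 166, 170, 340, 510, 980, 2400, 23635]
print(percent_longer_than(sizes, 500))   # 4 of 8 molecules -> 50.0
print(percent_longer_than(sizes, 1000))  # 2 of 8 molecules -> 25.0
```

Applying the same function to fetal-derived and maternal-derived molecules separately would give the two series compared in panel D.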
Fig. 4.
Size and fragment end analyses of maternal plasma DNA molecules. (A) Percentages of fragments ending with A (red), C (yellow), G (blue), and T (green) at the 5′ end of cell-free DNA molecules from first-trimester maternal plasma across the range of fragment sizes from 0 to 3 kb (with the x axis plotted on a logarithmic scale). (B) Hierarchical clustering analysis of short and long plasma cell-free DNA molecules using frequencies of the 256 4-mer end motifs. Plasma DNA molecules from each sample are divided into two groups according to fragment size, namely short fragments (≤500 bp) and long fragments (>500 bp). Each column indicates a subset from a sample used for analyzing the end motif frequency based on short (denoted by cyan in the first row) or long fragments (denoted by yellow in the first row). Starting from the second row, each row indicates a type of end motif. The end motif frequencies are represented with a series of color gradients according to the row-normalized frequencies (z-score) (i.e., the number of SDs below or above the mean frequency across samples). The red end of the color spectrum indicates a higher frequency of an end motif, and the blue end of the color spectrum indicates a lower frequency of an end motif.
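The two quantities behind panel B, 4-mer end motif frequencies and their row-normalized z-scores, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: it reads only the first four bases of each sequence as given, skips motifs containing bases other than A, C, G, or T, and uses the population SD for the z-score (the caption does not say which SD was used). All function names are hypothetical.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

# All 256 possible 4-mers over A, C, G, T.
MOTIFS = ["".join(p) for p in product("ACGT", repeat=4)]
MOTIF_SET = set(MOTIFS)


def end_motif_frequencies(sequences):
    """Frequency of each 4-mer found at the 5' end of the given sequences."""
    counts = Counter()
    for seq in sequences:
        motif = seq[:4].upper()
        if motif in MOTIF_SET:  # skips short reads and motifs containing N
            counts[motif] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no usable 5' end motifs")
    return {m: counts[m] / total for m in MOTIFS}


def row_zscores(frequencies):
    """Z-scores of one motif's frequencies across samples (one heatmap row)."""
    mu = mean(frequencies)
    sd = pstdev(frequencies)  # assumption: population SD
    if sd == 0:
        return [0.0 for _ in frequencies]
    return [(f - mu) / sd for f in frequencies]


# Hypothetical toy reads; the last one is skipped because its end contains N.
reads = ["CCCAGT", "CCCATT", "AAAAGG", "NNNNAC"]
freqs = end_motif_frequencies(reads)
print(round(freqs["CCCA"], 3))  # 2 of 3 usable ends -> 0.667
```

To mirror the short versus long comparison in the caption, one would call `end_motif_frequencies` separately on the ≤500 bp and >500 bp molecules of each sample, then apply `row_zscores` to each motif across those subsets.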
Fig. 5.
Size and end motif analyses of plasma DNA from pregnancies with preeclampsia. (A) Size distributions of cell-free DNA molecules from pregnancies with early- (n = 5) and late-onset preeclampsia (n = 5) and their respective gestational-age-matched controls (n = 5 for each respective control group) are plotted on a logarithmic scale for the y axis. Orange, green, red, and blue curves represent the plasma DNA size profiles for early-onset preeclampsia, control for early-onset preeclampsia, late-onset preeclampsia, and control for late-onset preeclampsia, respectively. (B) Boxplots showing percentages of plasma DNA molecules greater than 170 bp in maternal plasma samples from control and preeclamptic groups. (C) ROC curve on the use of 256 plasma DNA end motifs for differentiating pregnancies with and without preeclampsia. (D) Boxplots of three representative motifs showing a significant increase in frequency in preeclamptic subjects. In B and D, green, blue, orange, and red dots represent plasma DNA samples from control for early-onset preeclampsia, control for late-onset preeclampsia, early-onset preeclampsia, and late-onset preeclampsia, respectively. (E) Boxplots comparing the expression levels of DNASE2 between normal and preeclamptic placentas.
Fig. 6.
Tissue-of-origin analysis using long plasma DNA molecules. (A) Schematic illustration of the tissue-of-origin analysis of plasma DNA. Cell-free DNA molecules from maternal plasma are sequenced with PacBio SMRT sequencing. The methylation status of each CpG site on a plasma DNA molecule is determined using the HK model (9). The methylation pattern of individual plasma DNA molecules is compared to the reference methylomes of buffy coat and placenta obtained from high-depth bisulfite sequencing data. A process of methylation status matching is performed (see details in Materials and Methods) to classify individual plasma DNA molecules as being derived from the buffy coat or the placenta. (B) ROC curve showing the performance of the tissue-of-origin analysis using plasma DNA. (C) A computer simulation analysis showing how the number of CpG sites in a plasma DNA molecule affects the performance (AUC) of the tissue-of-origin analysis. (D) Correlation between the percentage of placenta-derived plasma DNA molecules determined by the single-molecule tissue-of-origin analysis and the fetal DNA fraction determined by the SNP-based approach.
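The idea of comparing a molecule's methylation pattern with two reference methylomes can be illustrated with a simple log-likelihood ratio. This is a hypothetical stand-in, not the methylation status matching procedure of the paper (which is described in its Materials and Methods and not reproduced here). It assumes per-CpG reference methylation levels between 0 and 1 for placenta and buffy coat, treats CpG sites as independent, and clamps levels away from 0 and 1 with an arbitrary `eps`.

```python
import math


def origin_score(pattern, placenta_ref, buffy_ref, eps=0.01):
    """Log-likelihood ratio (placenta vs. buffy coat) for one molecule.

    pattern: string of 'M' (methylated) and 'U' (unmethylated), one per CpG.
    placenta_ref, buffy_ref: reference methylation levels (0..1) per CpG.
    """
    if not (len(pattern) == len(placenta_ref) == len(buffy_ref)):
        raise ValueError("pattern and references must cover the same CpG sites")
    score = 0.0
    for status, p_plac, p_buffy in zip(pattern, placenta_ref, buffy_ref):
        # Clamp so that a level of exactly 0 or 1 never yields log(0).
        p_plac = min(max(p_plac, eps), 1 - eps)
        p_buffy = min(max(p_buffy, eps), 1 - eps)
        if status == "M":
            score += math.log(p_plac / p_buffy)
        elif status == "U":
            score += math.log((1 - p_plac) / (1 - p_buffy))
        else:
            raise ValueError(f"unexpected methylation status: {status!r}")
    return score


def classify_origin(pattern, placenta_ref, buffy_ref):
    """Label a molecule by the sign of its score."""
    score = origin_score(pattern, placenta_ref, buffy_ref)
    if score > 0:
        return "placenta"
    if score < 0:
        return "buffy coat"
    return "undetermined"


# Hypothetical reference levels for three CpG sites (not data from the study).
placenta = [0.1, 0.2, 0.9]
buffy = [0.9, 0.8, 0.3]
print(classify_origin("UUM", placenta, buffy))  # placenta
print(classify_origin("MMU", placenta, buffy))  # buffy coat
```

Because each CpG site adds one term to the score, molecules spanning more informative CpG sites can separate more clearly, which is consistent with the dependence on CpG count simulated in panel C.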
Fig. 7.
Principle of noninvasive prenatal testing of monogenic diseases by the analysis of long cell-free DNA fragments in maternal plasma. The top panel shows the parental haplotypes (i.e., the paternal Hap I and Hap II, and the maternal Hap I and Hap II) linked to a gene responsible for a monogenic disease (denoted by “Gene”). Long DNA fragments in maternal plasma are analyzed with single-molecule sequencing. To determine the paternal inheritance of the fetus, one can use an SNP locus where the father is heterozygous (e.g., G/A) and the mother is homozygous (e.g., A/A). The detection of a DNA fragment carrying the paternal-specific allele (i.e., the allele carrying G) in the maternal plasma suggests that the fetus has inherited the paternal Hap I. To determine the maternal inheritance of the fetus, one can use an SNP locus where the father is homozygous (e.g., T/T) and the mother is heterozygous (e.g., C/T). DNA fragments containing the maternal-specific allele (i.e., the allele carrying C) are first identified. One can then determine whether any placenta-derived plasma DNA molecules containing the maternal-specific allele can be detected. Placenta-derived plasma DNA molecules are identified by comparing the CpG methylation pattern of individual plasma DNA molecules with the reference tissue methylome through the process of methylation status matching described in this study. The detection of a placenta-derived plasma DNA molecule containing the maternal-specific allele (i.e., the allele carrying C) that is not identical to the paternal homozygous allele suggests that the fetus has inherited the maternal Hap I. The interpretation (i.e., whether the fetus is affected or unaffected by the disease) depends on the clinical scenario and the mode of inheritance of the disease concerned.

References

    1. Lo Y. M. D., et al., Maternal plasma DNA sequencing reveals the genome-wide genetic and mutational profile of the fetus. Sci. Transl. Med. 2, 61ra91 (2010).
    2. Amicucci P., Gennarelli M., Novelli G., Dallapiccola B., Prenatal diagnosis of myotonic dystrophy using fetal DNA obtained from maternal plasma. Clin. Chem. 46, 301–302 (2000).
    3. Fan H. C., Blumenfeld Y. J., Chitkara U., Hudgins L., Quake S. R., Analysis of the size distributions of fetal and maternal cell-free DNA by paired-end sequencing. Clin. Chem. 56, 1279–1286 (2010).
    4. De Maio N., et al.; On Behalf of the Rehab Consortium, Comparison of long-read sequencing technologies in the hybrid assembly of complex bacterial genomes. Microb. Genom. 5, e000294 (2019).
    5. Tan G., Opitz L., Schlapbach R., Rehrauer H., Long fragments achieve lower base quality in Illumina paired-end sequencing. Sci. Rep. 9, 2856 (2019).
