Sci Transl Med. 2024 Jan 24;16(731):eadi3883.
doi: 10.1126/scitranslmed.adi3883. Epub 2024 Jan 24.

Machine learning to detect the SINEs of cancer

Christopher Douville et al. Sci Transl Med. 2024.

Abstract

We previously described an approach called RealSeqS to evaluate aneuploidy in plasma cell-free DNA through the amplification of ~350,000 repeated elements with a single primer. We hypothesized that an unbiased evaluation of the large amount of sequencing data obtained with RealSeqS might reveal other differences between plasma samples from patients with and without cancer. This hypothesis was tested through the development of a machine learning approach called Alu Profile Learning Using Sequencing (A-PLUS) and its application to 7615 samples from 5178 individuals, 2073 with solid cancer and the remainder without cancer. Samples from patients with cancer and controls were prespecified into four cohorts used for model training; analyte integration and threshold determination; validation; and reproducibility. A-PLUS alone provided a sensitivity of 40.5% across 11 different cancer types in the validation cohort, at a specificity of 98.5%. Combining A-PLUS with aneuploidy and eight common protein biomarkers detected 51% of the cancers at 98.9% specificity. We found that part of the power of A-PLUS could be ascribed to a single feature: the global reduction of AluS subfamily elements in the circulating DNA of patients with solid cancer. We confirmed this reduction through the analysis of another independent dataset obtained with a different approach (whole-genome sequencing). The evaluation of Alu elements may therefore have the potential to enhance the performance of several methods designed for the earlier detection of cancer.
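The abstract reports sensitivity at a fixed specificity, with the decision threshold set in a separate cohort. The sketch below illustrates that general idea only: it picks a score cutoff from control (noncancer) scores and then measures sensitivity on case scores. The function names, the "score above cutoff means positive" rule, and the synthetic scores are assumptions made for illustration; this is not the A-PLUS model or the authors' thresholding procedure.

```python
import math


def threshold_at_specificity(control_scores, target_specificity):
    """Return a cutoff such that at least `target_specificity` of the
    control scores fall at or below it (hypothetical helper)."""
    ranked = sorted(control_scores)
    # Number of controls that must be called negative (score <= cutoff).
    needed = max(math.ceil(target_specificity * len(ranked)), 1)
    return ranked[needed - 1]


def specificity(control_scores, cutoff):
    """Fraction of controls called negative (score <= cutoff)."""
    negatives = sum(1 for s in control_scores if s <= cutoff)
    return negatives / len(control_scores)


def sensitivity(case_scores, cutoff):
    """Fraction of cases called positive (score > cutoff)."""
    positives = sum(1 for s in case_scores if s > cutoff)
    return positives / len(case_scores)


if __name__ == "__main__":
    # Synthetic scores, not data from the study.
    controls = [i / 100 for i in range(100)]
    cases = [0.5, 0.95, 0.99, 1.2]
    cutoff = threshold_at_specificity(controls, 0.90)
    print("specificity:", specificity(controls, cutoff))
    print("sensitivity:", sensitivity(cases, cutoff))
```

Fixing the cutoff on one set of samples and reporting sensitivity on another mirrors the separation between the threshold-determination and validation cohorts described above, which is why the specificity observed in validation (98.5%) can differ slightly from the target.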

Figures

Fig. 1. Study overview.
This study evaluated the potential of Alu elements to enhance methods designed for the earlier detection of cancer. We introduce an ML model called Alu-Profile Learning Using Sequencing (A-PLUS). The model uses differences in the representation of Alu elements in cell-free DNA (cfDNA) to predict cancer status. Samples were prespecified into four distinct cohorts used for model training, analyte integration and threshold determination, validation, and reproducibility.
Fig. 2. Detection of cancer in cohort 2 plasma samples.
(A) Evaluation of the A-PLUS, GAS, and PROT individual performances as well as that of the multi-analyte classifier (A-PLUS + GAS + PROT). Sensitivities were calculated at 99% specificity in each case. Error bars represent 95% CI. (B) ROC curve for the samples in cohort 2. (C) Euler diagram of the overlap in biomarker predictions in cohort 2 cancer samples. (D) Euler diagram of the overlap in biomarker predictions in cohort 2 noncancer samples.
Fig. 3. Detection of cancer in cohort 3 plasma samples.
(A) Comparison of cohort 2 and cohort 3 performance for A-PLUS predictions. Only cancer types common to both cohorts are depicted. (B) Comparison of cohort 2 and cohort 3 performance for the multi-analyte classifier (A-PLUS + GAS + PROT). Only cancer types common to both cohorts are depicted. (C) Full performance metrics for the individual assays and multi-analyte classifier in cohort 3. (D) Euler diagram of the overlap of biomarker predictions for cohort 3 cancer samples. (E) Euler diagram of the overlap of biomarker predictions in cohort 3 noncancer samples.
Fig. 4. Reproducibility of biomarker predictions in technical replicates.
The concordance of positive and negative predictions is depicted with Euler diagrams. (A) A-PLUS concordance. (B) GAS concordance.
Fig. 5. Cancer samples exhibit a global reduction in sequencing depth of AluS when compared with the total depth of the Alu elements (AluS fraction).
(A) AluS fraction of RealSeqS data from samples used in the current study (cohorts 1, 2, 3, and 4). (B) AluS fraction for WGS samples. (C) AluS fraction for a publicly available WGS dataset. There were 312,138 AluJ loci, 686,962 AluS loci, and 143,178 AluY loci distributed throughout the genome. RealSeqS, however, amplified only a subset of the total Alu loci (AluJ: 125,797; AluS: 485,614; AluY: 86,252), hence the substantial difference in AluS fraction between RealSeqS and WGS samples.
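To see why the two assays start from different baselines, the locus counts in the Fig. 5 caption can be turned into a rough expected AluS fraction. The sketch below assumes every locus contributes equal sequencing depth, which is a simplification introduced here; the study's AluS fraction is computed from observed read depth, not from locus counts. The names `WGS_LOCI`, `REALSEQS_LOCI`, and `alus_fraction` are hypothetical.

```python
# Locus counts taken from the Fig. 5 caption.
WGS_LOCI = {"AluJ": 312_138, "AluS": 686_962, "AluY": 143_178}
REALSEQS_LOCI = {"AluJ": 125_797, "AluS": 485_614, "AluY": 86_252}


def alus_fraction(counts):
    """AluS share of the total across the AluJ, AluS, and AluY subfamilies.

    With per-subfamily read depths as input this is the AluS fraction;
    with locus counts it is only a uniform-coverage approximation.
    """
    return counts["AluS"] / sum(counts.values())


if __name__ == "__main__":
    print(f"Genome-wide loci (WGS): {alus_fraction(WGS_LOCI):.4f}")
    print(f"RealSeqS-amplified loci: {alus_fraction(REALSEQS_LOCI):.4f}")
```

Under this approximation the AluS share is roughly 0.60 for all genomic loci and roughly 0.70 for the RealSeqS-amplified subset, consistent with the caption's remark that the two assays yield substantially different AluS fractions.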
