2024 Jan 2;13:giae008.
doi: 10.1093/gigascience/giae008.

Multi-omic dataset of patient-derived tumor organoids of neuroendocrine neoplasms

Nicolas Alcala et al. Gigascience.

Abstract

Background: Organoids are 3-dimensional experimental models that summarize the anatomical and functional structure of an organ. Although a promising experimental model for precision medicine, patient-derived tumor organoids (PDTOs) have currently been developed only for a fraction of tumor types.

Results: We have generated the first multi-omic dataset (whole-genome sequencing [WGS] and RNA-sequencing [RNA-seq]) of PDTOs from the rare and understudied pulmonary neuroendocrine tumors (n = 12; 6 grade 1, 6 grade 2) and provide data from other rare neuroendocrine neoplasms: small intestine (ileal) neuroendocrine tumors (n = 6; 2 grade 1 and 4 grade 2) and large-cell neuroendocrine carcinoma (n = 5; 1 pancreatic and 4 pulmonary). This dataset includes a matched sample from the parental sample (primary tumor or metastasis) for a majority of samples (21/23) and longitudinal sampling of the PDTOs (1 to 2 time points), for a total of n = 47 RNA-seq and n = 33 WGS. We here provide quality control for each technique and the raw and processed data as well as all scripts for genomic analyses to ensure an optimal reuse of the data. In addition, we report gene expression data and somatic small variant calls and describe how they were generated, in particular how we used WGS somatic calls to train a random forest classifier to detect variants in tumor-only RNA-seq. We also report all histopathological images used for medical diagnosis: hematoxylin and eosin-stained slides, brightfield images, and immunohistochemistry images of protein markers of clinical relevance.

Conclusions: This dataset will be critical to future studies relying on this PDTO biobank, such as drug screens for novel therapies and experiments investigating the mechanisms of carcinogenesis in these understudied diseases.

Keywords: cancer; genomics; neuroendocrine neoplasm; organoid; quality control; transcriptomics.

Conflict of interest statement

Where authors are identified as personnel of the International Agency for Research on Cancer/World Health Organization, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy, or views of the International Agency for Research on Cancer/World Health Organization.

H.C.’s full disclosure is given at [70]. H.C. is inventor of several patents related to organoid technology, cofounder of Xilis, and currently an employee of Roche, Basel.

Figures

Figure 1:
Quality control of the raw WGS data. (A) Distribution of the mean sequence quality of the reads in Phred score. (B) Mean sequence quality score as a function of the position in the read in base pairs (bp). (C) Distribution of the GC content in percentages. (D) Percentage of reads containing a sequence corresponding to the Illumina adapter sequence as a function of the position in the read in bp. (E) Percentage of the library with a given level of duplication. (F) Number of unique and duplicated reads per file. In panels (A–E), each line corresponds to a fastq file, with each of the 34 samples from Table 1 subdivided into 4 sequencing lanes (except SINET9Mp1, subdivided into 8 lanes) and additionally subdivided into 2 read pair files, for a total of 4 × 2 × 33 + 8 × 2 = 280 files; in panel (F), each horizontal bar corresponds to a file. In (A–E), green lines correspond to files that passed the most stringent quality control filters of software FastQC; orange lines correspond to files that passed a less stringent filter.
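Panels (A) and (C) summarize per-read Phred quality and GC content. A minimal sketch of how these two quantities can be computed for a single read is given here; it assumes the Phred+33 quality encoding, and the helper names and the example read are hypothetical illustrations, not part of the FastQC software or of the dataset.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into per-base Phred scores.

    The offset of 33 (Phred+33 encoding) is an assumption of this sketch.
    """
    return [ord(ch) - offset for ch in quality_string]


def mean_quality(quality_string, offset=33):
    """Mean Phred score of one read (the quantity summarized in panel A)."""
    scores = phred_scores(quality_string, offset)
    return sum(scores) / len(scores)


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), hence P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def gc_percent(sequence):
    """Percentage of G and C bases in one read (the quantity in panel C)."""
    seq = sequence.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


# Hypothetical read, not taken from the dataset.
read_seq = "GATTACAGGC"
read_qual = "IIIIIIIIII"  # 'I' is ASCII 73, i.e. Phred 40 with offset 33

print(mean_quality(read_qual), gc_percent(read_seq), error_probability(30))
```

A Phred score of 30 corresponds to an error probability of 1 in 1,000 per base, which is why mean scores in the 30s are usually read as high quality.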
Figure 2:
Quality control of the raw RNA-seq data. Panels (A), (C), (E), (G), (I), and (K) correspond to controls before read trimming for quality and adapter content by wrapper Trim Galore for software cutadapt; panels (B), (D), (F), (H), (J), and (L) correspond to controls after read trimming. Figure legends for panels (A–E) and (G–L) follow that of Fig. 1. (F) Distribution of the length of the reads trimmed by software cutadapt, for each file (colored lines). In panels (A–J), each line corresponds to a fastq file, with each of the 10 nonnormal samples from Table 1 divided into 2 or 4 sequencing lanes and further subdivided into 2 read pair files, for a total of 2 × 2 × 21 + 4 × 2 × 7 = 140 files; in panels (K) and (L), each horizontal bar corresponds to a file.
Figure 3:
Quality control of the RNA-seq alignments. (A) Number of known junctions identified by software STAR in a subsample as a function of the percentage of reads in the subsample. (B) Number of novel junctions identified by STAR in a subsample as a function of the percentage of reads in the subsample. (C) Number of sequence tags with each alignment score. (D) Distribution of reads among annotated regions.
Figure 4:
Network of matches between WGS and RNA-seq samples, computed with software NGSCheckmate. Numbers on the edges and edge thickness correspond to the Pearson correlation coefficient r between allelic fractions for the germline SNP panel; colors: experiments (see Table 1); squares: WGS, circles: RNA-seq, red contour: mismatches.
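The edge weights in this network are Pearson correlation coefficients between allelic fractions at a panel of germline SNPs. The sketch below shows how such a coefficient can be computed for two samples; it is an illustration of the statistic only, not the NGSCheckmate implementation, and the allelic fractions are invented for the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical allelic fractions at the same six germline SNPs.
wgs_vaf = [0.0, 0.5, 1.0, 0.5, 0.0, 1.0]      # WGS sample
rna_same = [0.02, 0.47, 0.98, 0.55, 0.0, 1.0]  # RNA-seq, same individual
rna_other = [1.0, 0.0, 0.5, 0.0, 0.5, 0.5]     # RNA-seq, another individual

print(pearson_r(wgs_vaf, rna_same), pearson_r(wgs_vaf, rna_other))
```

Samples from the same individual share germline genotypes, so their allelic fractions correlate strongly; samples from different individuals do not, which is what makes mismatches stand out in the network.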
Figure 5:
Validation of reported sex. (A) Percentage of reads aligned to chromosomes X and Y in the whole-genome sequencing data. (B) Total gene expression in X and Y chromosomes, in units of variance-stabilized read counts, computed from RNA-seq data. In all panels, samples from each sex are encircled (red: male, blue: female), excluding LCNEC3Np12, which we report as not matching the other samples from the LCNEC3 experiment.
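The quantity plotted in panel (A) is the percentage of aligned reads that fall on chromosomes X and Y. A minimal sketch of that calculation from per-chromosome read counts follows; the function name, the chromosome labels ("chrX", "chrY"), and the counts are assumptions made for illustration and do not come from the dataset.

```python
def xy_read_percentages(reads_per_chrom):
    """Return (percent of reads on X, percent of reads on Y).

    `reads_per_chrom` maps chromosome names to aligned read counts.
    The "chrX"/"chrY" naming is an assumption of this sketch.
    """
    total = sum(reads_per_chrom.values())
    if total == 0:
        raise ValueError("no aligned reads")
    pct_x = 100.0 * reads_per_chrom.get("chrX", 0) / total
    pct_y = 100.0 * reads_per_chrom.get("chrY", 0) / total
    return pct_x, pct_y


# Hypothetical read counts, collapsed to three chromosomes for brevity.
sample_a = {"chr1": 900, "chrX": 90, "chrY": 10}
sample_b = {"chr1": 900, "chrX": 100, "chrY": 0}

print(xy_read_percentages(sample_a))
print(xy_read_percentages(sample_b))
```

Samples with a Y chromosome are expected to show a nonzero Y percentage and a lower X percentage, so the two sexes separate into distinct groups when the two values are plotted against each other.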
Figure 6:
RF classification of variants as somatic or germline from RNA-seq data. (A) Schematic of the RF training, test, and prediction. (B) ROC curve. (C) Feature importance for classification accuracy. Mean accuracy decrease: mean difference in accuracy between trees with the feature and trees without the feature; high values indicate important features. Mean minimum depth: tree depth (1: root, value >>1: leaves) of the first time the feature is used for classification, averaged across all trees; low values indicate features often used at the root and thus particularly important. (D) Representative tree of the RF. At each split, the split condition is written above; the left branch corresponds to a Yes and the right branch to a No. Final decision (SOMATIC or NON-SOMATIC) is represented by the leaves. (E–G) Confusion matrix for different levels of sensitivity and specificity. Reference: somatic status assessed from whole-genome sequencing data. Prediction: somatic status predicted from RNA-seq data using the RF algorithm.
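Panels (B) and (E–G) relate a classifier score threshold to sensitivity and specificity. The sketch below illustrates that relationship for any classifier that outputs a per-variant score; it does not reproduce the authors' random forest, and the labels, scores, and function names are hypothetical.

```python
def confusion_matrix(labels, scores, threshold):
    """Count (TP, FP, TN, FN), calling a variant somatic if score >= threshold.

    `labels` holds the reference status (True = somatic), for example as
    assessed from WGS; `scores` holds the classifier output per variant.
    """
    tp = fp = tn = fn = 0
    for is_somatic, score in zip(labels, scores):
        predicted_somatic = score >= threshold
        if predicted_somatic and is_somatic:
            tp += 1
        elif predicted_somatic:
            fp += 1
        elif is_somatic:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(tp, fp, tn, fn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(labels, scores):
    """(false-positive rate, true-positive rate) pairs over all thresholds."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        counts = confusion_matrix(labels, scores, threshold)
        sens, spec = sensitivity_specificity(*counts)
        points.append((1.0 - spec, sens))
    return points


# Hypothetical reference labels and classifier scores for eight variants.
labels = [True, True, True, False, False, False, False, False]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]

print(confusion_matrix(labels, scores, 0.5))
print(roc_points(labels, scores))
```

Lowering the threshold moves along the ROC curve toward higher sensitivity at the cost of specificity, which is the trade-off that the separate confusion matrices in panels (E–G) display.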

References

    1. Clevers H. Modeling development and disease with organoids. Cell. 2016;165(7):1586–97. doi:10.1016/j.cell.2016.05.082.
    2. Kim J, Koo BK, Knoblich JA. Human organoids: model systems for human biology and medicine. Nat Rev Mol Cell Biol. 2020;21(10):571–84. doi:10.1038/s41580-020-0259-3.
    3. Drost J, Clevers H. Organoids in cancer research. Nat Rev Cancer. 2018;18(7):407. doi:10.1038/s41568-018-0007-6.
    4. Tuveson D, Clevers H. Cancer modeling meets human organoid technology. Science. 2019;364(6444):952–5. doi:10.1126/science.aaw6985.
    5. LeSavage BL, Suhar RA, Broguiere N, et al. Next-generation cancer organoids. Nat Mater. 2022;21(2):143–59. doi:10.1038/s41563-021-01057-5.
