Evaluation of noninvasive biospecimens for transcriptome studies

BMC Genomics. 2023 Dec 19;24(1):790.

doi: 10.1186/s12864-023-09875-4

Molly Martorella^{1,2}, Silva Kasela^{3,4}, Renee Garcia-Flores^{3,5,6}, Alper Gokden³, Stephane E Castel^#^{7,8}, Tuuli Lappalainen^#^{9,10,11}

Affiliations

¹ New York Genome Center, New York, NY, USA. mem2331@cumc.columbia.edu.
² Department of Systems Biology, Columbia University, New York, NY, USA. mem2331@cumc.columbia.edu.
³ New York Genome Center, New York, NY, USA.
⁴ Department of Systems Biology, Columbia University, New York, NY, USA.
⁵ Department of Computer Science, Columbia University, New York, NY, USA.
⁶ Undergraduate Program On Genomic Sciences, National Autonomous University of Mexico, Cuernavaca, Morelos, Mexico.
⁷ New York Genome Center, New York, NY, USA. stephanecastel@gmail.com.
⁸ Department of Systems Biology, Columbia University, New York, NY, USA. stephanecastel@gmail.com.
⁹ New York Genome Center, New York, NY, USA. tlappalainen@nygenome.org.
¹⁰ Department of Systems Biology, Columbia University, New York, NY, USA. tlappalainen@nygenome.org.
¹¹ Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden. tlappalainen@nygenome.org.

^# Contributed equally.

PMID: 38114913
PMCID: PMC10729488
DOI: 10.1186/s12864-023-09875-4

Molly Martorella et al. BMC Genomics. 2023.

Abstract

Transcriptome studies disentangle functional mechanisms of gene expression regulation and may elucidate the underlying biology of disease processes. However, the types of tissues currently collected typically assay a single post-mortem timepoint or are limited to investigating cell types found in blood. Noninvasive tissues may improve disease-relevant discovery by enabling more complex longitudinal study designs, by capturing different and potentially more applicable cell types, and by increasing sample sizes due to reduced collection costs and possible higher enrollment from vulnerable populations. Here, we develop methods for sampling noninvasive biospecimens, investigate their performance across commercial and in-house library preparations, characterize their biology, and assess the feasibility of using noninvasive tissues in a multitude of transcriptomic applications. We collected buccal swabs, hair follicles, saliva, and urine cell pellets from 19 individuals over three to four timepoints, for a total of 300 unique biological samples, which we then prepared with replicates across three library preparations, for a final tally of 472 transcriptomes. Of the four tissues we studied, we found hair follicles and urine cell pellets to be most promising due to the consistency of sample quality, the cell types and expression profiles we observed, and their performance in disease-relevant applications. This is the first study to thoroughly delineate biological and technical features of noninvasive samples and demonstrate their use in a wide array of transcriptomic and clinical analyses. We anticipate future use of these biospecimens will facilitate discovery and development of clinical applications.

Keywords: eQTLs; Hair follicles; Methods; Noninvasive RNA-sequencing; Transcriptomics; Urine cell pellets.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1**
Noninvasive sample study design and processing outcomes. a Four collections (C1-C4) of four noninvasive tissues were collected from 19 donors over the course of 2–4 weeks per donor. All samples were processed using our in-house method, Loseq, while a subset was prepared using commercially available kits. Two biological replicates of HEK293 cell controls were included in triplicate for all library preparations. b Proportion of samples passing per tissue type and preparation. Failed Prep QC = exceeded 600 bp average size or less than 2 nM yield. Failed Seq QC = protein-coding and lncRNA depth less than 1 million. c RNA-seq quality metrics for all sequenced samples for each tissue and library preparation
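
The QC thresholds in panel b can be written as a small classifier. This is a minimal sketch, assuming per-sample values for average library size, yield, and protein-coding plus lncRNA depth are already available; the function and parameter names are hypothetical and are not taken from the study's code.

```python
def classify_sample(avg_size_bp, yield_nm, pc_lncrna_depth):
    """Label a sample using the thresholds stated in the Fig. 1b caption.

    Hypothetical helper: argument names and return labels are illustrative.
    """
    # Failed Prep QC: average size exceeds 600 bp or yield is below 2 nM.
    if avg_size_bp > 600 or yield_nm < 2:
        return "Failed Prep QC"
    # Failed Seq QC: protein-coding and lncRNA depth below 1 million.
    if pc_lncrna_depth < 1_000_000:
        return "Failed Seq QC"
    return "Passed"
```

Prep QC is checked first here on the assumption that a library failing preparation would not go on to be judged on sequencing depth.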

**Fig. 2**
Classification of unmapped reads in noninvasive samples. a Proportion of reads per sample. Mapped = aligned to hg38. Remapped = aligned to microbial species using Decontaminer. Repeated = highly abundant reads identified by FastQC. Unknown = reads not mapped, remapped, or highly abundant. b Normalized proportion of reads remapping per species for each tissue. The top 0.5% most abundant microbes are shown. Highlighted species have a median abundance > 0.05 for that tissue

**Fig. 3**
Technical and biological sources of variance in noninvasive samples. a Factors contributing to variance in gene expression across tissues as determined by mixed linear modeling. b Principal component analysis using DESeq2 normalized counts and the top 1000 most variable genes. c GEDIT cell type proportion estimates across collections per donor. Only donors with samples passing QC for all collections are displayed here. Niche cell types were collapsed into larger categories (Supp. Fig. 7b), and the top 25% most abundant cell type categories across tissues are shown

**Fig. 4**
Comparison of noninvasive samples to the GTEx dataset. a Noninvasive sample types projected onto the GTEx expression PCA space. Counts were normalized using DESeq2, centered and scaled, and the top 1000 most variable genes were used. Ellipses represent 95% confidence intervals. b Noninvasive sample types projected onto the top 1000 most variable rMATS splicing events in GTEx. c xCell cell type enrichment estimates per tissue. Tissues are clustered using k-means clustering. d, e, f GTEx eQTL replication estimates for hair, urine, and buccal samples. Dots show π₁ calculated by selecting significant GTEx gene-variant pairs from the noninvasive data with sizing indicating permutation p-value significance. Violin plots show null π₁ distributions generated from allele-frequency matched, randomly selected gene-variant pairs. 1000 permutations were performed
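
The π₁ replication estimates in panels d, e, and f can be illustrated as follows. The caption does not say which estimator was used, so this sketch assumes Storey's fixed-λ estimate of the null proportion π₀ (with π₁ = 1 − π₀) and a standard empirical permutation p-value; the function names and the default λ = 0.5 are assumptions.

```python
def pi1_estimate(pvalues, lam=0.5):
    """Estimate pi1 = 1 - pi0 with Storey's fixed-lambda estimator (assumed)."""
    m = len(pvalues)
    if m == 0:
        raise ValueError("pvalues must not be empty")
    # Under the null, p-values are uniform, so the fraction above lam
    # scaled by 1 / (1 - lam) estimates the proportion of true nulls.
    n_above = sum(1 for p in pvalues if p > lam)
    pi0 = min(1.0, n_above / (m * (1.0 - lam)))
    return 1.0 - pi0


def permutation_pvalue(observed_pi1, null_pi1_values):
    """Empirical p-value: share of null pi1 values at least as large as observed."""
    n_ge = sum(1 for v in null_pi1_values if v >= observed_pi1)
    # Add-one correction so the p-value is never exactly zero.
    return (n_ge + 1) / (len(null_pi1_values) + 1)
```

In this framing, `pvalues` would hold the noninvasive-tissue p-values for gene-variant pairs that are significant in GTEx, and `null_pi1_values` would hold π₁ computed on each of the 1000 allele-frequency matched random sets.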

**Fig. 5**
Sex-based expression differences in noninvasive samples. a Volcano plot of sex-based differentially expressed genes in hair. Genes highlighted in red are replicated sun-exposed skin findings in GTEx. Dotted line indicates 0.05 significance threshold. b Hair FGSEA of all genes ranked by z-score and using the Hallmark Gene set from MSigDB. c Sex-based differentially expressed genes in urine cell pellets. Genes highlighted in red are replicated kidney cortex findings in GTEx. d Urine FGSEA of all genes ranked by z-score and using the Hallmark Gene set from MSigDB

**Fig. 6**
Use of noninvasive samples in disease-relevant analyses. a ASE for annotated stop-gain variants vs synonymous. Only sites with > 16 total counts were included. b Genes with median expression > 0.1 TPM in a tissue were intersected with the OMIM gene set. Shown is the intersection of OMIM genes captured across tissues. c Capture of common disease signals in the OpenTargets database. SES = Σ(evidence scores of disease genes expressed in a tissue)/ Σ(evidence scores of disease genes expressed in any included tissue)
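
The SES ratio defined in panel c can be computed directly from its definition. This is a minimal sketch, assuming a mapping from disease genes to evidence scores and a set of expressed genes per tissue; the function name, input structures, and the zero-denominator behavior are hypothetical.

```python
def ses_per_tissue(evidence, expressed_by_tissue):
    """Compute SES per tissue from the formula in the Fig. 6c caption.

    evidence: dict of disease gene -> evidence score (assumed input shape).
    expressed_by_tissue: dict of tissue -> set of expressed genes.
    """
    # Denominator: disease genes expressed in any included tissue.
    expressed_anywhere = set().union(*expressed_by_tissue.values())
    denominator = sum(
        score for gene, score in evidence.items() if gene in expressed_anywhere
    )
    if denominator == 0:
        # Assumed convention: no captured disease signal gives a score of 0.
        return {tissue: 0.0 for tissue in expressed_by_tissue}
    # Numerator: disease genes expressed in the tissue itself.
    return {
        tissue: sum(evidence[g] for g in genes if g in evidence) / denominator
        for tissue, genes in expressed_by_tissue.items()
    }
```

Because the denominator only counts genes expressed in at least one included tissue, the scores are relative to the tissue panel being compared and change if tissues are added or removed.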

Cited by

A map of blood regulatory variation in South Africans enables GWAS interpretation.
Castel SE, Tluway FD, Emde AK, Smyth N, Karim M, Sengupta D, Gray OA, Hendershott M, LeBaron von Baeyer S, Burke EE, Kaewert S, Nguyen KH, Choma SSR, Mashaba RG, Micklesfield LK, Kabudula C, Kahn K, Xavier Gomez-Olive F, Tollman S, Choudhury A, Mpangase PT, Hazelhurst S, Wasik KA, Yerges-Armstrong L, Ramsay M. Nat Genet. 2025 Jul;57(7):1628-1637. doi: 10.1038/s41588-025-02223-0. Epub 2025 Jun 11. PMID: 40500424. Free PMC article.

Increasing diversity of functional genetics studies to advance biological discovery and human health.
George SHL, Medina-Rivera A, Idaghdour Y, Lappalainen T, Gallego Romero I. Am J Hum Genet. 2023 Dec 7;110(12):1996-2002. doi: 10.1016/j.ajhg.2023.10.012. Epub 2023 Nov 22. PMID: 37995684. Free PMC article.

Genome-wide DNA methylation and transcriptome sequencing analyses of lens tissue in an age-related mouse cataract model.
Hu Y, Su D, Zhang Y, Fu Y, Li S, Chen X, Zhang X, Zheng S, Ma X, Hu S. PLoS One. 2025 Jan 30;20(1):e0316766. doi: 10.1371/journal.pone.0316766. eCollection 2025. PMID: 39883715. Free PMC article.

Promoter Deletion Leading to Allele Specific Expression in a Genetically Unsolved Case of Primary Ciliary Dyskinesia.
Beaman MM, Yin W, Smith AJ, Sears PR, Leigh MW, Ferkol TW, Kearney B, Olivier KN, Kimple AJ, Clarke S, Huggins E, Nading E, Jung SH, Iyengar AK, Zou X, Dang H, Barrera A, Majoros WH, Rehder CW, Reddy TE, Ostrowski LE, Allen AS, Knowles MR, Zariwala MA, Crawford GE. Am J Med Genet A. 2025 Feb;197(2):e63880. doi: 10.1002/ajmg.a.63880. Epub 2024 Oct 4. PMID: 39364610.

Saliva as a potential diagnostic medium: DNA methylation biomarkers for disorders beyond the oral cavity.
Hernangomez-Laderas A, Cilleros-Portet A, Marí S, González-García BP, Arregi A, Jimeno-Romero A, Irizar A, García-Santisteban I, Lesseur C, Fernandez-Jimenez N, Bilbao JR. NPJ Genom Med. 2025 Jun 20;10(1):49. doi: 10.1038/s41525-025-00509-0. PMID: 40541940. Free PMC article.
