PLoS Comput Biol. 2024 Dec 19;20(12):e1012632.
doi: 10.1371/journal.pcbi.1012632. eCollection 2024 Dec.

Predicting lung aging using scRNA-Seq data

Qi Song et al. PLoS Comput Biol. 2024.

Abstract

Age prediction based on single-cell RNA sequencing (scRNA-Seq) data can provide information about patients' susceptibility to various diseases and conditions. In addition, such analysis can be used to identify aging-related genes and pathways. To enable age prediction based on scRNA-Seq data, we developed PolyEN, a new regression model that learns continuous representations of expression over time. PolyEN then uses these representations to integrate genes and predict an age. On existing lung aging data and new data that we profiled, PolyEN outperformed existing methods for age prediction. Our results identified lung epithelial cells as the most significant predictors for non-smokers, while lung endothelial cells led to the best chronological age prediction results for smokers.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. The flowchart of the analysis.
A. Polynomial features were extracted for each cell type and aggregated at the donor level. Regression models were trained on the extracted polynomial features, either in the original gene space or in the PCA-transformed space. B. Training and testing strategy. We adopted the LOO test and the CD test to evaluate model performance (see Materials and Methods). C. Cell type mapping and dataset integration. D. SHAP-based ranking and empirical p-values for the genes selected in each cell type.
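The donor-level aggregation and LOO evaluation named in panels A and B can be illustrated with a minimal sketch. It reads LOO as leave-one-out over donors, which is the usual meaning but is an assumption here. The regressor is a plain one-feature least-squares line standing in for the paper's model, the helper names are hypothetical, and the toy data are invented; none of this reproduces PolyEN itself.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_donor(cells):
    """Average a per-cell feature over all cells of each donor."""
    grouped = defaultdict(list)
    for donor, value in cells:
        grouped[donor].append(value)
    return {donor: mean(values) for donor, values in grouped.items()}


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b (stand-in for the real regressor)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def leave_one_out_r2(features, ages):
    """Hold out each donor once, fit on the rest, score the held-out predictions."""
    donors = sorted(features)
    preds = []
    for held_out in donors:
        train = [d for d in donors if d != held_out]
        a, b = fit_line([features[d] for d in train], [ages[d] for d in train])
        preds.append(a * features[held_out] + b)
    return r2_score([ages[d] for d in donors], preds)


# Toy data (invented for illustration): (donor id, per-cell feature value).
cells = [("d1", 1.0), ("d1", 1.2), ("d2", 2.1), ("d2", 1.9),
         ("d3", 3.0), ("d3", 3.2), ("d4", 3.9), ("d4", 4.1), ("d5", 5.0)]
ages = {"d1": 22, "d2": 31, "d3": 43, "d4": 50, "d5": 62}

donor_features = aggregate_by_donor(cells)
loo_r2 = leave_one_out_r2(donor_features, ages)
print(round(loo_r2, 3))
```

Each donor is predicted by a model that never saw that donor, so the resulting R2 reflects generalization to unseen donors rather than fit to the training set.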
Fig 2. Cell type mapping, dataset integration, comparison of gene types and cell types.
A. Joint cell embeddings of the query datasets (IPF, Carraro, Nuclear-Seq) and the reference dataset (HLCA) after cell type transfer performed by scArches. Plots were generated from the 30-dimensional latent representations transformed from the original gene space. B. Joint cell embeddings of the query datasets (IPF, Carraro) and the reference dataset (HLCA) after dataset integration performed by mnnpy. Plots were generated based on the intersection of HVGs between the reference and the query datasets. C. The mean R2 scores for the comparison of all tested methods. R2 scores shown in the plot are from the top 10 cell types with the highest R2 scores for each method. P-value annotation legend: ns (not significant): p-value > 0.05; *: 0.01 < p-value ≤ 0.05; **: 0.001 < p-value ≤ 0.01; ***: 0.0001 < p-value ≤ 0.001; ****: p-value ≤ 0.0001. D. The rankings of different types of gene markers for the top cell types. The rankings were computed for each cell type separately. For each cell type, we selected each gene type's best PCA setting, as determined by the highest R2 score, and used this R2 score as the representative R2 score of that gene type for the given cell type. We then ranked the gene types by these representative R2 scores. The resulting rankings (from 1 to 6) for the top 10 cell types are presented in the plots. These cell types are the top 10 cell types shown in Figs 3 and 4. E, F. Predicted donor ages vs. true donor ages for the top cell types. For each cell type, we selected its best gene type and PCA setting, as determined by the highest R2 score. The corresponding best gene type and PCA setting are labeled in each plot.
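The ranking procedure described for panel D can be sketched as follows for a single cell type. The function name, the gene type labels, the PCA setting labels, and the scores are all hypothetical and chosen only for illustration; the caption ranks six gene types, while this toy example uses three.

```python
def rank_gene_types(r2_by_setting):
    """Rank gene types for one cell type.

    r2_by_setting maps (gene_type, pca_setting) -> R2 score.
    Returns {gene_type: rank}, where rank 1 has the highest representative R2.
    """
    # Representative score: the best R2 across that gene type's PCA settings.
    best = {}
    for (gene_type, _pca), r2 in r2_by_setting.items():
        if gene_type not in best or r2 > best[gene_type]:
            best[gene_type] = r2
    ordered = sorted(best, key=lambda g: best[g], reverse=True)
    return {gene_type: rank for rank, gene_type in enumerate(ordered, start=1)}


# Hypothetical scores for a single cell type (names and values invented).
scores = {
    ("all_genes", "no_pca"): 0.41,
    ("all_genes", "pca"): 0.55,
    ("senescence_markers", "no_pca"): 0.62,
    ("senescence_markers", "pca"): 0.30,
    ("hvg", "no_pca"): 0.48,
    ("hvg", "pca"): 0.20,
}
ranks = rank_gene_types(scores)
print(ranks)
```

Repeating this per cell type gives one rank per gene type and cell type, which is the quantity the panel plots.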
Fig 3. R2 scores from LOO test and CD test for non-smoker donors and comparison between transcriptome predicted ages and methylation predicted ages.
A. R2 scores for non-smoker donors tested with the LOO test and the CD test. Each row represents the best type of gene marker for a cell type. The bar in each row represents the mean and standard deviation of the R2 score from five runs. Left: R2 scores from the LOO test in the HLCA dataset; middle: R2 scores from the CD test, which used a subset of HLCA for training and a subset of HLCA and other datasets for testing (see Materials and Methods); right: R2 scores from the LOO test in the Nuclear-Seq dataset. Rows with no bars indicate R2 scores equal to or smaller than zero. See S2 File for more information on the corresponding senescence markers and the number of donors used in each row. B. Comparison between transcriptomic ages and methylation ages. Transcriptomic ages were predicted by PolyEN and methylation ages were predicted as described in Materials and Methods. “true” represents the true chronological donor age and “pred” represents the transcriptomic age predicted by PolyEN. “hor1” is the methylation age predicted by the Horvath1 method, “hor2” is the methylation age predicted by the Horvath2 method, and “han” is the methylation age predicted by the Hannum method. PolyEN was applied to the cell types and marker gene lists shown in A for Nuclear-Seq.
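For reference, the captions do not define the R2 score. Assuming it is the standard coefficient of determination, with y_i the true age of donor i, \hat{y}_i the predicted age, and \bar{y} the mean true age over the n evaluated donors, it reads:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
```

Under this definition, R2 is at most 1, and it is zero or negative whenever the model's squared error is no smaller than that of always predicting the mean age, which is why such rows carry no bar.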
Fig 4. R2 scores from LOO test and CD test for smoker donors.
Each row represents the corresponding best type of gene marker from a cell type. The bar in each row represents mean and standard deviation of R2 score from five runs. Left: R2 scores from the LOO test in HLCA dataset; middle: R2 scores from the CD test which used a subset of HLCA for training and a subset of HLCA and other datasets for testing (Materials and Methods). The rows with no bars shown indicate R2 scores equal to or smaller than zero. See S2 File for more information on the corresponding senescence markers and number of donors used in each row.
Fig 5. Top significant GO terms from GSEA and polynomial features for basal-related cell types.
A. Visualization of the polynomial features for RHOB and PMAIP1/Noxa in basal, basal resting and suprabasal cells of the non-smoker group. Values visualized in the plots are polynomial features computed from the log-normalized gene expression. B. The top five significant GO terms from GSEA for basal and basal resting cells of the non-smoker group. C. The common genes with significant SHAP scores among the three basal-related cell types; top table: genes identified from all expressed genes; bottom table: genes identified from the union of senescence marker lists. D. The distribution of predicted ages vs. real ages for IPF disease donors. Models were trained using the non-smoker donors with IPF disease and tested using the smoker donors with IPF disease. The x axis denotes age and the y axis denotes density.
Fig 6. Polynomial features for genes with significant SHAP scores identified in basal, basal resting and suprabasal cells of non-smokers.
A. Visualization of the polynomial features for all expressed genes. We selected only the genes with significant empirical p-values for each cell type (see Materials and Methods). B. Visualization of the polynomial features for the union of senescence markers. We selected only the genes with significant empirical p-values (see Materials and Methods). Each row represents one gene, and genes were sorted by row-wise sum.
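As a rough illustration of what an empirical p-value for a gene's importance score can look like, the sketch below uses the generic one-sided formula with a +1 correction. How the paper builds its null distribution is described in its Materials and Methods and is not reproduced here; the function name, the toy null scores, and the observed values are all invented for this example.

```python
import random


def empirical_p_value(observed, null_scores):
    """One-sided empirical p-value with the +1 correction, so p is never 0."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


# Toy null distribution (invented): stand-in importance scores under a null.
rng = random.Random(0)
null = [abs(rng.gauss(0.0, 1.0)) for _ in range(999)]

p_strong = empirical_p_value(5.0, null)  # far in the tail of the null
p_weak = empirical_p_value(0.1, null)    # typical under the null
print(p_strong, p_weak)
```

A gene would then be kept when its p-value falls below a chosen threshold. With 999 null draws the smallest attainable p-value is 0.001, so the size of the null sample limits the resolution of the test.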
