Genome Med. 2020 Jan 10;12(1):7.
doi: 10.1186/s13073-019-0705-z.

An unsupervised learning approach to identify novel signatures of health and disease from multimodal data

Ilan Shomorony et al. Genome Med. 2020.

Abstract

Background: Modern medicine is rapidly moving towards a data-driven paradigm based on comprehensive multimodal health assessments. Integrated analysis of data from different modalities has the potential of uncovering novel biomarkers and disease signatures.

Methods: We collected 1385 data features from diverse modalities, including metabolome, microbiome, genetics, and advanced imaging, from 1253 individuals and from a longitudinal validation cohort of 1083 individuals. We utilized a combination of unsupervised machine learning methods to identify multimodal biomarker signatures of health and disease risk.

Results: Our method identified a set of cardiometabolic biomarkers that goes beyond standard clinical biomarkers. Stratification of individuals based on the signatures of these biomarkers identified distinct subsets of individuals with similar health statuses. Subset membership was a better predictor for diabetes than established clinical biomarkers such as glucose, insulin resistance, and body mass index. The novel biomarkers in the diabetes signature included 1-stearoyl-2-dihomo-linolenoyl-GPC and 1-(1-enyl-palmitoyl)-2-oleoyl-GPC. Another metabolite, cinnamoylglycine, was identified as a potential biomarker for both gut microbiome health and lean mass percentage. We identified potential early signatures for hypertension and a poor metabolic health outcome. Additionally, we found novel associations between a uremic toxin, p-cresol sulfate, and the abundance of the microbiome genera Intestinimonas and an unclassified genus in the Erysipelotrichaceae family.

Conclusions: Our methodology and results demonstrate the potential of multimodal data integration, from the identification of novel biomarker signatures to a data-driven stratification of individuals into disease subtypes and stages, an essential step towards personalized, preventative health risk assessment.

Keywords: Cardiometabolic syndrome; Metabolomics; Multimodal; Network analysis; Preventative medicine; Unsupervised machine learning.

Conflict of interest statement

IS, ETC, LH, LAN, RRH, MH, IVC, HCY, CLS, NMS, WL, KEN, PB, AMK, CTC, JCV, DSK, EFK, and NS are past or current employees or contractors of Human Longevity, Inc. The remaining authors declare that they have no competing interests.

Figures

Fig. 1
a In the study, we collected multimodal data (n = 1385 features) from 1253 individuals. b We analyzed the data by performing cross-modality associations between features after correcting for age, sex, and ancestry. c Using the associations, we performed community detection analysis and found modules of densely connected features. d To reduce the number of indirect associations and identify key biomarker features, we performed conditional independence network analysis (the resulting network is also referred to as a Markov network). e Using the identified key biomarkers, we clustered individuals into distinct groups with similar signatures that are consistent with different health statuses. We then characterized the clusters and performed disease risk enrichment analysis.
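The distinction between direct and indirect associations in panel d can be illustrated with a first-order partial correlation. The sketch below is a toy example, not the authors' implementation: the function names and data are hypothetical, and it assumes a single conditioning variable, whereas a full conditional independence network conditions on all other features at once. Two features driven by a shared third feature correlate strongly, yet their partial correlation given that feature is near zero, so a Markov network would place no edge between them.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, conditioning on z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical toy data: z drives both x and y; e1 and e2 are
# feature-specific variation, uncorrelated with z and with each other.
z = [1, 1, 1, 1, -1, -1, -1, -1]
e1 = [1, 1, -1, -1, 1, 1, -1, -1]
e2 = [1, -1, 1, -1, 1, -1, 1, -1]
x = [2 * zi + ei for zi, ei in zip(z, e1)]
y = [2 * zi + ei for zi, ei in zip(z, e2)]

marginal = pearson(x, y)             # strong: x and y share the driver z
conditional = partial_corr(x, y, z)  # near zero: association is indirect
print(f"marginal r = {marginal:.3f}, partial r given z = {conditional:.3f}")
```

In this toy case the x-y edge would appear in a plain correlation network but drop out once z is conditioned on, which is the sense in which the network analysis removes indirect associations.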
Fig. 2
a The number of significant cross-modality correlations for each pair of modalities. The percentages shown are the proportion of correlations that were significant out of all possible pairwise associations between the modality pair. b Associations between the metabolite p-cresol sulfate and (top) the abundance of the genus Intestinimonas and (bottom) the abundance of an unclassified genus in the Erysipelotrichaceae family.
Fig. 3
The cardiometabolic module. a We built a Markov network to identify the key biomarker features that represent the cardiometabolic module. This network highlights the most important associations after removing edges corresponding to indirect associations. We observed that the microbiome genera Butyrivibrio and Pseudoflavonifractor are the most relevant microbiome genera in the context of this module that interface with features from other modalities. b We clustered individuals using the key biomarkers. The heatmap shows z-statistics from logistic regression for an association between each cluster and each feature. The plot on the left shows the 22 key cardiometabolic biomarkers. The plot on the right shows associations that emerged from an analysis against the full set of 1385 features with p < 1 × 10^−10 as well as 3-hydroxybutyrate (BHBA) and Apolipoprotein B because of their particular enrichment in clusters 3 and 6, respectively. Some correlated features have been collapsed, with the mean z-statistics displayed; the full set of features can be found in Additional file 1: Figure S1. All of these significant associations showed consistent directions of effect in the TwinsUK cohort (Additional file 2: Table S3); however, the microbiome features and 5 of the glycerophosphocholines were not measured in the TwinsUK cohort and thus could not be assessed for replication. Met, metabolome
Fig. 4
Disease enrichment and longitudinal outcomes of cardiometabolic clusters. a Bar plots showing the prevalence of disease at baseline (combined discovery and TwinsUK baseline cohorts; Additional file 1: Figure S2 shows them individually) and the incidence of disease (i.e., only the new cases of disease) after a median of 5.6 years of follow-up (TwinsUK cohort). For Fisher’s exact test comparison of the rate in each cluster vs. the other clusters, *p < 0.05, **p < 0.005. b The rates at which individuals from each cluster transition into other clusters after a median of 5.6 years of follow-up. The plot shows individuals per cluster (1 to 7) at baseline visit that transition to other clusters during the follow-up. TIA, transient ischemic attack
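The per-cluster comparison in panel a uses Fisher's exact test on a 2 × 2 table (cluster vs. all other clusters, disease vs. no disease). A minimal standard-library sketch of a two-sided test follows. The function name and the counts are hypothetical and are not taken from the study, and the two-sided convention used here (summing all tables no more probable than the observed one) is an assumption, since the caption does not state which variant was applied.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: in cluster / in other clusters. Columns: disease / no disease.
    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k with margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))

# Hypothetical counts: 8 of 20 individuals in one cluster have the disease,
# compared with 5 of 80 individuals in all other clusters combined.
p_value = fisher_exact_two_sided(8, 12, 5, 75)
print(f"p = {p_value:.2e}")
```

Running one such test per cluster and disease gives p-values of the kind marked with asterisks in the bar plots.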
Fig. 5
The microbiome richness module. a We built a Markov network to identify the key biomarker features that represent the microbiome richness module. Most of the associations between the microbiome and the metabolome were mediated by species richness. b We clustered individuals using the key biomarkers. The heatmap shows z-statistics from logistic regression for an association between each cluster and each feature. The plot on the left shows the 24 key biomarkers representing the module. Met, metabolome
