BMC Psychol. 2024 Dec 18;12(1):733.
doi: 10.1186/s40359-024-02268-6.

Evaluating the performance of personality-based profiling in predicting physical activity

Kentaro Katahira et al. BMC Psychol.

Abstract

Background: Profiling or clustering individuals based on personality and other characteristics is a common statistical approach used in marketing, medicine, and social sciences. This approach improves data simplicity, supports the implementation of a data-driven decision-making process, and guides intervention strategies, such as personalized care. However, the clustering process involves loss of information owing to the discretization of continuous variables. Although any loss of information may be practically or pragmatically acceptable, the amount of information lost and its influence on predicting external outcomes have not yet been systematically investigated.

Methods: We assessed the accuracy of predicting physical activity using the clustering approach and compared it with the dimensional approach, where variables are used as continuous regressors. This analysis is based on survey data from a sample of 20,573 individuals regarding physical activity and psychological traits, including the Big-Five personality traits.

Results: A four-cluster solution, supported by the standard criterion for determining the number of clusters, achieved no more than 60-70% of the prediction accuracy of the dimensional approach, which uses the raw dimensional scales as explanatory variables.

Conclusion: The cluster solution suggested by conventional statistical criteria may not be optimal when clusters are used to predict external outcomes.

Keywords: Clustering; Personality; Physical activity; Prediction; Profiling.
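
The comparison described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline or data: the traits and outcome are synthetic, the weights in `w` are hypothetical, and the small NumPy K-means stands in for whatever clustering implementation the study used. The sketch fits one regression on the continuous traits (dimensional approach) and one on cluster membership only (clustering approach), then compares R² on held-out data.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns the k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(k):
            if np.any(labels == j):  # leave a center unchanged if its cluster is empty
                centers[j] = X[labels == j].mean(axis=0)
    return centers


def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def r2(y, y_hat):
    """Coefficient of determination."""
    return 1.0 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()


def ols_predict(A_train, y_train, A_test):
    """Least-squares fit on the training design matrix, prediction on the test one."""
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


def compare(n=4000, k=4, seed=0):
    """Return (R2 of dimensional model, R2 of cluster model) on held-out data."""
    rng = np.random.default_rng(seed)
    # Hypothetical trait weights; five columns stand in for five trait scores.
    w = np.array([0.5, -0.3, 0.8, 0.2, -0.6])
    X = rng.standard_normal((n, w.size))
    y = X @ w + rng.standard_normal(n)  # synthetic outcome with unit-variance noise

    half = n // 2
    X_tr, X_te, y_tr, y_te = X[:half], X[half:], y[:half], y[half:]

    # Dimensional approach: continuous traits plus an intercept as regressors.
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    A_te = np.column_stack([np.ones(len(X_te)), X_te])
    r2_dim = r2(y_te, ols_predict(A_tr, y_tr, A_te))

    # Clustering approach: one-hot cluster membership as the only regressors,
    # so each cluster's prediction is its training-set mean outcome.
    centers = kmeans(X_tr, k, seed=seed)
    C_tr = np.eye(k)[assign(X_tr, centers)]
    C_te = np.eye(k)[assign(X_te, centers)]
    r2_clu = r2(y_te, ols_predict(C_tr, y_tr, C_te))
    return r2_dim, r2_clu


if __name__ == "__main__":
    r2_dim, r2_clu = compare()
    print(f"dimensional R2 = {r2_dim:.3f}, cluster R2 = {r2_clu:.3f}")
    print(f"relative accuracy = {r2_clu / r2_dim:.2f}")
```

The ratio printed at the end corresponds to the kind of relative accuracy the abstract reports. Its value here depends entirely on the synthetic setup and should not be read as a reproduction of the 60-70% figure.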

Conflict of interest statement

Declarations. Ethics approval and consent to participate: The study protocol was approved by the Ethics Committee of the National Institute of Advanced Industrial Science and Technology (Approval ID: 2022-1279). All the participants provided informed consent. The study was conducted using an online text-based survey that required participants to have basic literacy skills. Therefore, illiterate participants were not included in the study. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Evaluation of the number of clusters based on statistical metrics. (A) Evaluation of K-means by the Gap statistic. A higher value indicates better clustering. Error bars indicate standard errors. (B) Evaluation of the number of clusters by BIC. A lower value indicates better clustering.
Fig. 2
Cluster centers for the four-cluster solutions. (A) Cluster solution with K-means using only Big-Five (Variable Set 1). (B) Cluster solution with K-means using the full variable set (Variable Set 4). Error bars indicate standard error of the mean. (C) Cluster solution with GMM using only Big-Five (Variable Set 1). (D) Cluster solution with GMM using the full variable set (Variable Set 4). The solid lines represent the results of clustering using the entire data set, while symbols indicate the results of clustering using training data and test data separately.
Fig. 3
Distribution of clusters on the principal component scores obtained by PCA for Variable Set 1 (Big-Five only). (A) K-means, (B) GMM, (C) Principal component loadings, and (D) Distribution of PC scores. PC = Principal Component.
Fig. 4
Evaluation of predictive accuracy for different numbers of clusters. R² for total physical activity (PA) of test data when using Variable Set 1 (A) and Variable Set 4 (B) is plotted. Error bars indicate 95% confidence intervals (CI) obtained from bootstrap samples (of both test data and training data). The solid horizontal line indicates the R² value of the dimensional approach model presented in Table 1 (dashed lines represent the 95% CI). The colored lines represent the R² values of the clustering-approach model (blue for K-means and orange for GMM).
