Machine learning for morbid glomerular hypertrophy

Yusuke Ushio¹, Hiroshi Kataoka^{2,3}, Kazuhiro Iwadoh^{1,4}, Mamiko Ohara⁵, Tomo Suzuki⁵, Maiko Hirata⁶, Shun Manabe¹, Keiko Kawachi¹, Taro Akihisa¹, Shiho Makabe¹, Masayo Sato¹, Naomi Iwasa^{1,7}, Rie Yoshida^{1,7}, Junichi Hoshino¹, Toshio Mochizuki^{1,7}, Ken Tsuchiya⁴, Kosaku Nitta¹

Affiliations

¹ Department of Nephrology, Tokyo Women's Medical University, 8-1 Kawada-Cho, Shinjuku-Ku, Tokyo, 162-8666, Japan.
² Department of Nephrology, Tokyo Women's Medical University, 8-1 Kawada-Cho, Shinjuku-Ku, Tokyo, 162-8666, Japan. kataoka@twmu.ac.jp.
³ Clinical Research Division for Polycystic Kidney Disease, Department of Nephrology, Tokyo Women's Medical University, Tokyo, 162-8666, Japan. kataoka@twmu.ac.jp.
⁴ Department of Blood Purification, Tokyo Women's Medical University, Tokyo, 162-8666, Japan.
⁵ Department of Nephrology, Kameda Medical Center, Chiba, 296-8602, Japan.
⁶ Japanese Red Cross Saitama Hospital, Saitama, 330-8553, Japan.
⁷ Clinical Research Division for Polycystic Kidney Disease, Department of Nephrology, Tokyo Women's Medical University, Tokyo, 162-8666, Japan.

PMID: 36351996
PMCID: PMC9646707
DOI: 10.1038/s41598-022-23882-7

Yusuke Ushio et al. Sci Rep. 2022 Nov 9;12(1):19155. doi: 10.1038/s41598-022-23882-7.

Abstract

A practical research method integrating data-driven machine learning with conventional model-driven statistics is sought after in medicine. Although glomerular hypertrophy (or a large renal corpuscle) on renal biopsy has pathophysiological implications, it is often misdiagnosed as adaptive/compensatory hypertrophy. Using a generative machine learning method, we aimed to explore the factors associated with a maximal glomerular diameter of ≥ 242.3 μm. Using the frequency-of-usage variable ranking in generative models, we defined the machine learning scores with symbolic regression via genetic programming (SR via GP). We compared important variables selected by SR with those selected by a point-biserial correlation coefficient, using multivariable logistic and linear regressions to validate discriminatory ability, goodness-of-fit, and collinearity. Body mass index, complement component C3, serum total protein, arteriolosclerosis, C-reactive protein, and the Oxford E1 score were ranked among the top 10 variables with high machine learning scores using SR via GP, while the estimated glomerular filtration rate was ranked 46th among the 60 variables. In multivariable analyses, the R² value was higher (0.61 vs. 0.45) and the corrected Akaike Information Criterion value was lower (402.7 vs. 417.2) with variables selected with SR than with those selected with point-biserial r. Two of the variables selected with point-biserial r had variance inflation factors higher than 5, whereas none of those selected with SR did. Data-driven machine learning models may be useful in identifying significant and insignificant correlated factors. Our method may be generalized to other medical research due to the procedural simplicity of using top-ranked variables selected by machine learning.
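As a rough illustration of two quantities named in the abstract, the sketch below computes a point-biserial correlation between a binary outcome and a continuous variable, and a corrected Akaike Information Criterion (AICc) from a log-likelihood. The function names, the toy data, and the use of the standard small-sample AICc correction are assumptions made for illustration; the abstract does not state how the authors computed these values.

```python
import math


def point_biserial(binary, values):
    """Point-biserial r between a 0/1 outcome and a continuous variable.

    Equivalent to Pearson's r when one variable is dichotomous.
    """
    if len(binary) != len(values):
        raise ValueError("inputs must have the same length")
    group1 = [v for b, v in zip(binary, values) if b == 1]
    group0 = [v for b, v in zip(binary, values) if b == 0]
    n1, n0, n = len(group1), len(group0), len(values)
    if n1 == 0 or n0 == 0:
        raise ValueError("both outcome classes must be present")
    mean_all = sum(values) / n
    # Population standard deviation of the continuous variable.
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("continuous variable has zero variance")
    mean1 = sum(group1) / n1
    mean0 = sum(group0) / n0
    return (mean1 - mean0) / sd * math.sqrt(n1 * n0 / (n * n))


def aicc(log_likelihood, n_params, n_obs):
    """AIC with the usual small-sample correction term."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("n_obs must exceed n_params + 1")
    aic = -2.0 * log_likelihood + 2.0 * n_params
    return aic + (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = MaxGD >= 242.3 um, values = body mass index.
    outcome = [0, 0, 0, 1, 1, 1]
    bmi = [20.1, 21.4, 22.0, 24.8, 26.3, 27.9]
    print(round(point_biserial(outcome, bmi), 3))
    print(round(aicc(log_likelihood=-100.0, n_params=5, n_obs=50), 2))
```

A lower AICc indicates a better trade-off between fit and the number of parameters, which is the direction of the comparison reported in the abstract (402.7 vs. 417.2).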

Conflict of interest statement

Toshio Mochizuki received honoraria for lectures from Otsuka Pharmaceutical Co. Toshio Mochizuki and Hiroshi Kataoka belong to an endowed department sponsored by Otsuka Pharmaceutical Co., Chugai Pharmaceutical Co., Kyowa Hakko Kirin Co., and JMS Co. All other authors have no conflicts of interest to declare.

Figures

**Figure 1**
Histogram of MaxGD. The distribution of MaxGD is illustrated as light blue histograms. Abbreviation: MaxGD, maximal glomerular diameter.

**Figure 2**
Permutation test results with the original dataset (permutation test scores for the classifier of MaxGD ≥ 242.3 μm). The distribution of accuracy score for the permuted data is illustrated as blue histograms. It represents the result of 5000 permutation tests for assessing classifier performance when selecting the 60 most discriminative variables. The red dotted line indicates the accuracy score value (0.84) obtained by the classifier in the original dataset (permutation P-value, 0.001). Abbreviation: GD, glomerular diameter.
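The permutation test described in this legend can be sketched as follows: the classifier is refit on label-shuffled copies of the data, and the P-value is the share of shuffled scores at least as high as the observed score. This is a minimal sketch under stated assumptions. The one-feature threshold classifier, the toy data, and the function names are hypothetical stand-ins, not the classifier or the 60 variables used in the study, and the (hits + 1) / (n + 1) estimator is one common convention.

```python
import random


def threshold_accuracy(x, y):
    """Fit a midpoint-threshold classifier on one feature; return its accuracy on the same data."""
    pos = [v for v, t in zip(x, y) if t == 1]
    neg = [v for v, t in zip(x, y) if t == 0]
    if not pos or not neg:
        return max(len(pos), len(neg)) / len(y)
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    cut = (mean_pos + mean_neg) / 2.0
    # Predict class 1 on the side of the cut where the class-1 mean lies.
    preds = [1 if (v > cut) == (mean_pos > mean_neg) else 0 for v in x]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def permutation_p_value(x, y, score_fn, n_permutations=1000, seed=0):
    """Return (observed score, permutation P-value) for score_fn(x, y)."""
    rng = random.Random(seed)
    observed = score_fn(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between x and the labels
        if score_fn(x, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical, perfectly separable toy data.
    feature = [float(i) for i in range(20)]
    labels = [0] * 10 + [1] * 10
    score, p_value = permutation_p_value(feature, labels, threshold_accuracy)
    print(score, p_value)
```

With this estimator the smallest attainable P-value is 1 / (n_permutations + 1), so the number of permutations bounds how small a reported P-value can be.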

**Figure 3**
Distribution of functions generated with symbolic regression via genetic programming. Generated functions are plotted on the function space, where the horizontal axis represents the complexity of a function and the vertical axis represents 1 − R² or error. In total, 19,437 predictive functions are generated with symbolic regression via genetic programming. Each dot represents one function, and the red dots represent functions on the Pareto front that are candidates for optimized functions with ensemble learning.
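The Pareto front in this legend is the set of functions for which no other function is both simpler and more accurate. A minimal sketch of that selection, assuming each function is summarized as a (complexity, error) pair with error standing for 1 − R², is shown below. The function name and the toy values are illustrative only and are not taken from the study.

```python
def pareto_front(models):
    """Return the non-dominated (complexity, error) pairs, minimizing both.

    A pair is kept only if its error is strictly lower than that of every
    pair with equal or lower complexity.
    """
    front = []
    best_error = float("inf")
    # Sort by complexity, then error, so the best model at each
    # complexity level is seen first.
    for complexity, error in sorted(models):
        if error < best_error:
            front.append((complexity, error))
            best_error = error
    return front


if __name__ == "__main__":
    # Hypothetical (complexity, 1 - R^2) values for six generated functions.
    candidates = [(3, 0.65), (1, 0.9), (2, 0.7), (5, 0.4), (2, 0.6), (4, 0.4)]
    print(pareto_front(candidates))  # [(1, 0.9), (2, 0.6), (4, 0.4)]
```

Sorting first means a single pass suffices, which matters little for six points but keeps the selection cheap for tens of thousands of generated functions.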

**Figure 4**
Frequencies (GP): Frequently utilized predictive variables in selected models using SR via GP. The 15 most frequently utilized predictive variables in 1,819 predictive functions, which are selected among 19,437 models generated in the leave-one-out cross-validation using symbolic regression via genetic programming, are listed in descending order. The horizontal axis represents the appearance frequencies [Frequencies (GP)]: the percentage of the 1,819 predictive functions in which each predictive variable is utilized. Abbreviations: GP, genetic programming; SR, symbolic regression; C3, complement component 3; U-Prot, urinary protein excretion; Oxford E1, the presence of endocapillary hypercellularity; MaxGD, maximal glomerular diameter.
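The appearance frequency in this legend is the percentage of selected functions that use a given variable. A minimal sketch of that ranking follows, assuming each selected function has already been reduced to the set of variable names it uses. The function name and the toy models are hypothetical, and how variables are extracted from a generated function is not described here.

```python
from collections import Counter


def usage_frequencies(models):
    """Rank variables by the percentage of models in which they appear.

    `models` is a list of iterables of variable names, one per model.
    Returns (variable, percent) pairs sorted by descending percent.
    """
    if not models:
        raise ValueError("at least one model is required")
    counts = Counter()
    for variables in models:
        # Count each variable once per model, even if it is used repeatedly.
        counts.update(set(variables))
    total = len(models)
    ranked = ((name, 100.0 * count / total) for name, count in counts.items())
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical variable sets for four selected functions.
    selected = [{"BMI", "C3"}, {"BMI"}, {"BMI", "CRP"}, {"C3"}]
    for name, percent in usage_frequencies(selected):
        print(f"{name}: {percent:.1f}%")
```

Ties are broken alphabetically here only to make the ordering deterministic.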

**Figure 5**
ML scores using eight machine learning models. ML scores of 60 variables. Abbreviations: ML, machine learning; MIC, maximal information coefficient; RF ImpurityReduction, impurity reduction with random forest; XGB, eXtreme Gradient Boosting; SR via GP, symbolic regression via genetic programming; MaxGD, maximal glomerular diameter; eGFR, estimated glomerular filtration rate; WBC, white blood cell; U-Prot, urinary protein excretion; SBP, systolic blood pressure; MBP, mean blood pressure; DBP, diastolic blood pressure; Complement C4, complement component 4; Complement C3, complement component 3.
