ILIVER. 2022 Aug 24;1(3):154-158.
doi: 10.1016/j.iliver.2022.07.003. eCollection 2022 Sep.

Multi-feature weight factor extraction and survival risk assessment of hepatocellular carcinoma based on a clinical missing dataset-independent support vector machine


Fumin Wang et al. ILIVER.

Abstract

Background: In clinical datasets, patient characteristics vary so widely that missing data are routine, which poses an unavoidable problem for clinical data analysis. Therefore, constructing a machine learning model designed for missing clinical datasets (MCD) is of great clinical importance.

Methods: All included patients were divided into two groups according to outcome within a period of up to 36 months. The following characteristics (variables) were collected: age, sex, Child-Pugh status, hepatitis status, cirrhosis status, treatment, tumor size, portal vein tumor thrombus, and alpha-fetoprotein (μg/mL). A missing dataset-independent support vector machine (MDI-SVM) was then built for the analysis.

Results: An MCD-independent SVM was developed based on clinical data from 1334 patients with hepatocellular carcinoma (HCC) at a single center; it achieved an accuracy of 84.43% in the survival analysis in the presence of 5% missing data. Using different combinations of features, our model identified the five features with the greatest impact on survival in patients with HCC (tumor size, age, treatment, hepatitis status, and alpha-fetoprotein) and extracted their weighting factors.

Conclusions: An MCD-independent SVM was developed to predict prognosis for patients with HCC in the absence of first-visit data.

Keywords: HCC; MDI-SVM; Machine learning.


Figures

Fig. 1
Development and validation of the MDI-SVM, which is composed of two parts: the distribution tracking system and the pre-trained SVM. MDI-SVM, missing dataset-independent support vector machine.
Fig. 2
Operation of the distribution tracking system. The missing data are replaced by selected values derived from the corresponding distribution function based on patients' basic information. The selection rules are: Rule 1, the distribution function shows the greatest possible value for each feature; and Rule 2, when comparing all patients, patients who have similar basic information will have similar advanced treatment details.
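The replacement step described in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that missing entries are encoded as `None`, that "greatest possible value" means the most frequent observed value, and that "similar basic information" means an exact match on a few grouping keys. The function name `impute_by_group` and the field names `age_band`, `sex`, and `treatment` are hypothetical.

```python
from collections import Counter, defaultdict

MISSING = None  # assumption: a missing entry is stored as None


def impute_by_group(records, basic_keys, target):
    """Fill missing `target` values with the most frequent value observed
    among patients who share the same basic information (`basic_keys`).
    Falls back to the overall most frequent value when a group has no
    observed value. Returns new dicts; the input is not modified."""
    group_counts = defaultdict(Counter)
    overall = Counter()
    for rec in records:
        value = rec.get(target)
        if value is not MISSING:
            group_counts[tuple(rec[k] for k in basic_keys)][value] += 1
            overall[value] += 1
    if not overall:
        raise ValueError(f"no observed values for {target!r}")

    filled = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's data stays untouched
        if rec.get(target) is MISSING:
            key = tuple(rec[k] for k in basic_keys)
            counts = group_counts.get(key) or overall
            rec[target] = counts.most_common(1)[0][0]
        filled.append(rec)
    return filled


# Toy data, invented purely for illustration.
patients = [
    {"age_band": "60+", "sex": "M", "treatment": "resection"},
    {"age_band": "60+", "sex": "M", "treatment": "resection"},
    {"age_band": "60+", "sex": "M", "treatment": "resection"},
    {"age_band": "60+", "sex": "M", "treatment": "ablation"},
    {"age_band": "60+", "sex": "M", "treatment": None},
    {"age_band": "<60", "sex": "F", "treatment": "ablation"},
    {"age_band": "<60", "sex": "F", "treatment": None},
    {"age_band": "<60", "sex": "M", "treatment": None},
]
filled = impute_by_group(patients, ("age_band", "sex"), "treatment")
```

In this toy data the two patients with a matching group receive that group's most frequent treatment, while the last patient, whose group has no observed value, receives the overall most frequent one. The filled records could then be passed to any pre-trained classifier.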
Fig. 3
Prediction accuracy of the MDI-SVM, which obtained 84.43% accuracy (16.61% higher than the traditional method). MDI-SVM, missing dataset-independent support vector machine.
Fig. 4
Prediction accuracy of the SVM without the distribution tracking system; the traditional SVM obtained only 72.23% accuracy. SVM, support vector machine.
Fig. 5
Twelve features are arranged into 25 combinations in three categories: (1) all clinical features; (2) each single feature (shown in subfigure); and (3) 11 features, with each feature excluded in turn. These 25 feature combinations were used to construct 25 machine learning models with the MDI-SVM method. The ROC curves and corresponding AUCs of these combinations are shown in Fig. 5B. For each single feature, a model without this feature and a model using only this feature were constructed. By comparing these with the model constructed from all features, the degree of influence of the feature on OS status can be calculated from the change in the relevant parameters (R², AUC, and RMSE) across the three models. AUC, area under the curve; MDI-SVM, missing dataset-independent support vector machine; OS, overall survival.
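The count of 25 follows from 1 (all features) + 12 (single features) + 12 (one feature left out). The sketch below enumerates these combinations and shows one possible way to summarize a feature's influence from the three models. The placeholder feature names, the function names `build_combinations` and `influence`, and the specific summary (score drop when the feature is removed, plus its standalone score) are assumptions for illustration; the caption does not give the authors' formula.

```python
def build_combinations(features):
    """Return (label, feature_tuple) pairs: the full set, each single
    feature, and each leave-one-out set."""
    features = tuple(features)
    combos = [("all", features)]
    combos += [(f"only:{f}", (f,)) for f in features]
    combos += [
        (f"without:{f}", tuple(g for g in features if g != f))
        for f in features
    ]
    return combos


def influence(score_all, score_without, score_only):
    """Hypothetical summary of one feature's influence, given the same
    metric (for example AUC) from the three models."""
    return {
        "drop_when_removed": score_all - score_without,
        "standalone": score_only,
    }


# Placeholder names: the caption does not list the twelve features.
FEATURES = [f"feature_{i}" for i in range(1, 13)]
combos = build_combinations(FEATURES)
print(len(combos))  # 1 + 12 + 12 combinations
```

Each combination would be used to train and evaluate one model; the scores for a feature's `only:` and `without:` models, together with the `all` model, are the three inputs to `influence`.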


