Chin Med. 2021 Oct 9;16(1):100.
doi: 10.1186/s13020-021-00511-5.

Machine learning methods to predict the cultivation age of Panacis Quinquefolii Radix

Xiaowen Hu et al. Chin Med.

Abstract

Background: American ginseng (AG) is a valuable medicine widely consumed as a herbal remedy throughout the world. The large price differences among AG products of different growth years lead to intentional adulteration for higher profits. Developing reliable approaches to authenticate the cultivation age of AG products is therefore of great use in preventing age falsification.

Methods: A total of 106 batches of AG samples were collected, and 9 physicochemical features were measured experimentally for each. The data were then split into a training set and two test sets (test set 1 and test set 2) according to cultivation region. Principal component analysis (PCA) was carried out to examine the distribution of the three data sets. Four machine learning (ML) algorithms, namely elastic net, k-nearest neighbors, support vector machine and multi-layer perceptron (MLP), were employed to construct predictive models using the features as inputs and the growth years as outputs. In addition, a similarity-based applicability domain (AD) was defined for these models to ensure the reliability of the predictions for AG samples produced in different regions.
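The abstract does not spell out how the similarity-based AD is computed, so the following is only a minimal sketch of one common formulation, assumed here for illustration: a sample counts as inside the domain when its mean Euclidean distance to its k nearest training samples does not exceed the mean plus Z standard deviations of the same quantity computed within the training set. The function names and the toy 2D data are hypothetical; the parameters `k` and `z` correspond to the k and Z named in the Fig. 4 caption (where k = 6 and Z = 1.6 are reported), and the toy example uses a smaller k because it has only nine training points.

```python
import math
import statistics


def mean_knn_distance(point, references, k):
    """Mean Euclidean distance from `point` to its k nearest rows in `references`."""
    distances = sorted(math.dist(point, ref) for ref in references)
    return sum(distances[:k]) / k


def ad_threshold(train, k, z):
    """Assumed cutoff: mean + z * std of each training sample's mean k-NN distance."""
    per_sample = []
    for i, row in enumerate(train):
        others = train[:i] + train[i + 1:]  # leave the sample itself out
        per_sample.append(mean_knn_distance(row, others, k))
    return statistics.mean(per_sample) + z * statistics.pstdev(per_sample)


def inside_domain(sample, train, k, z):
    """True if `sample` is at least as close to the training set as the cutoff allows."""
    return mean_knn_distance(sample, train, k) <= ad_threshold(train, k, z)


# Toy training set (not the paper's data): a 3 x 3 grid of 2D feature vectors.
train = [(float(x), float(y)) for x in range(3) for y in range(3)]

near_sample = (0.5, 0.5)  # sits between training points
far_sample = (5.0, 5.0)   # far from every training point

near_is_inside = inside_domain(near_sample, train, k=2, z=1.6)
far_is_inside = inside_domain(far_sample, train, k=2, z=1.6)
print(near_is_inside, far_is_inside)
```

In practice the features would normally be scaled before distances are computed, since raw physicochemical measurements may sit on very different scales; that step is omitted here to keep the sketch short.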

Results: A positive correlation was observed between several of the features and the growth years. PCA revealed that samples from different cultivation regions had distinct distributions. The most accurate model, derived from MLP, showed good predictive power for the fivefold cross validation and test set 1, with mean squared errors (MSE) of 0.017 and 0.016, respectively, but a higher MSE of 1.260 for test set 2. After applying the AD, all models showed much lower prediction errors for the test samples within the AD (IDs) than for those outside the AD (ODs). MLP remained the best predictive model, with an MSE of 0.030 for the IDs.
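To make the ID versus OD comparison concrete, the sketch below computes the MSE separately for samples flagged as inside or outside the AD. It is an illustration under stated assumptions, not the authors' evaluation code: the helper names are hypothetical, and the ages, predictions and flags are made-up toy values rather than results from the study.

```python
def mean_squared_error(true_ages, predicted_ages):
    """Average of squared differences between true and predicted ages."""
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(true_ages, predicted_ages)]
    return sum(squared) / len(squared)


def mse_by_domain(true_ages, predicted_ages, in_domain_flags):
    """Return MSE for in-domain (ID) and out-of-domain (OD) samples separately."""
    groups = {"ID": ([], []), "OD": ([], [])}
    for t, p, flag in zip(true_ages, predicted_ages, in_domain_flags):
        key = "ID" if flag else "OD"
        groups[key][0].append(t)
        groups[key][1].append(p)
    # A group with no samples has no defined MSE, so report None for it.
    return {
        key: mean_squared_error(ts, ps) if ts else None
        for key, (ts, ps) in groups.items()
    }


# Toy values for illustration only (cultivation ages in years).
true_ages = [2, 3, 4, 4]
predicted_ages = [2.0, 3.5, 4.5, 6.0]
in_domain_flags = [True, True, False, False]

result = mse_by_domain(true_ages, predicted_ages, in_domain_flags)
print(result)  # {'ID': 0.125, 'OD': 2.125}
```

Reporting the two groups separately is what lets a large error on out-of-domain samples be identified as a coverage problem rather than a failure of the model on samples similar to its training data.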

Conclusion: Cultivation years are closely related to the bioactive components of AG. The constructed models and AD can predict the cultivation years and identify samples whose predictions are likely to be inaccurate. The AD-equipped models developed in this study provide useful tools for determining the age of AG on the market and are freely available at https://github.com/dreadlesss/Panax_age_predictor.

Keywords: Applicability domain; Cultivation age; Machine learning; Panax quinquefolium L.

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

Fig. 1
The correlation between cultivation age and the 9 physicochemical features for the samples in the training set. Data are expressed as mean ± SD. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001, by two-tailed Student’s t test.
Fig. 2
Spatial distributions of the training set (red dots), test set 1 (green dots) and test set 2 (blue dots) after applying PCA. (A) Scatter plot in 3D space; (B) projection on three 2D planes. The feature space of test set 1 partially overlaps with that of the training set, whereas most samples in test set 2 fall outside the feature space of the training set.
Fig. 3
Scatter plot of the true ages and predicted ages for the four ML algorithms. The regression line is colored in gray. The red dots represent the predicted results of fivefold cross validation on the training set. The green and blue dots represent the predicted values for test set 1 and test set 2, respectively.
Fig. 4
3D surface plots of (A) the number of samples in the test sets that fall within the AD (IDs) as a function of k and Z; and (B–E) the negative value of the MSE predicted by the four ML models for the test samples within the AD as a function of k and Z. k = 6 and Z = 1.6 (black dot in each panel) were chosen to define the AD for all models.
