PLoS One. 2020 Feb 11;15(2):e0228645.
doi: 10.1371/journal.pone.0228645. eCollection 2020.

Machine learning models for net photosynthetic rate prediction using poplar leaf phenotype data

Xiao-Yu Zhang et al. PLoS One.

Abstract

Background: Tree planting is an essential component of reducing anthropogenic CO2 emissions to the atmosphere and is key to keeping carbon dioxide emissions under control. At the 1992 Earth Summit, the United Nations agreed to take action to stabilize global anthropogenic CO2 emissions and reduce them toward net zero. Tree planting was identified as an effective method to offset CO2 emissions. Fast-growing trees with a high net photosynthetic rate (Pn) could efficiently fulfill the goal of CO2 emission reduction. A net photosynthetic rate model can provide a reference for the stability of a plant's photosynthetic productivity.

Methods and results: Using leaf phenotype data to predict the Pn can help guide tree planting policies intended to offset CO2 release into the atmosphere. Tree planting has been proposed as one climate change solution, and poplars are among the most popular trees to plant. This study used a Populus simonii (P. simonii) dataset collected from 23 artificial forests in northern China. The samples represent almost the entire geographic distribution of P. simonii and cover most of the major provinces of northern China. The sampling area extends to (36°30'N, 98°09'E) in the northwest, (40°91'N, 115°83'E) in the northeast, (32°31'N, 108°90'E) in the southwest, and (34°39'N, 113°74'E) in the southeast. The collected data on leaf phenotypic traits are sparse, noisy, and highly correlated, and the photosynthetic rate data are nonnormal and skewed. Many machine learning algorithms can produce reasonably accurate predictions despite these data issues. Influential outliers are removed to allow an accurate and precise prediction, and cluster analysis is performed as part of the exploratory data analysis to investigate the dataset in further detail. We select four regression methods suited to this dataset: extreme gradient boosting (XGBoost), support vector machine (SVM), random forest (RF) and generalized additive model (GAM). Cross-validation and regularization mechanisms are implemented in the XGBoost, SVM, RF, and GAM algorithms to ensure the validity of the outputs.
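The cross-validation step mentioned above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which the figure captions suggest was built with R packages: the helper names (`k_fold_indices`, `fit_line`, `cross_validated_rmse`), the one-predictor least-squares line standing in for XGBoost, SVM, RF or GAM, the choice of k = 5, and the synthetic leaf-area and Pn values are all assumptions made for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]     # k disjoint, near-equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (hypothetical stand-in model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def cross_validated_rmse(xs, ys, k=5, seed=0):
    """Average held-out RMSE over the k folds."""
    fold_rmse = []
    for train_idx, test_idx in k_fold_indices(len(xs), k, seed):
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        sq_err = [(ys[i] - (a + b * xs[i])) ** 2 for i in test_idx]
        fold_rmse.append(mean(sq_err) ** 0.5)
    return mean(fold_rmse)


# Synthetic data for illustration only; these are NOT the study's measurements.
rng = random.Random(42)
leaf_area = [rng.uniform(10.0, 60.0) for _ in range(100)]
pn = [5.0 + 0.2 * a + rng.gauss(0.0, 2.0) for a in leaf_area]
cv_rmse = cross_validated_rmse(leaf_area, pn, k=5)
print(f"5-fold cross-validated RMSE: {cv_rmse:.2f}")
```

Each sample is held out exactly once, so the averaged error reflects performance on data the model did not see during fitting, which is the property that makes cross-validation useful for the noisy, correlated traits described above.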

Conclusions: The best-performing approach is XGBoost, which generates net photosynthetic rate predictions that have a 0.77 correlation with the actual rates. Moreover, the root mean square error (RMSE) is 2.57, which is approximately 35 percent smaller than the standard deviation of 3.97. The other metrics, i.e., the mean absolute error (MAE), R2, and the min-max accuracy, are 1.12, 0.60, and 0.93, respectively. This study demonstrates that machine learning models can use noisy leaf phenotype data to predict the net photosynthetic rate with useful accuracy. Most net photosynthetic rate prediction studies are conducted on herbaceous plants. The net photosynthetic rate prediction for P. simonii, a woody plant, offers guidance for plant science and environmental science regarding the predictive relationship between leaf phenotypic characteristics and the Pn of woody plants in northern China.
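The evaluation metrics named above can be computed as in the following Python sketch. The function name `regression_metrics` and the toy values are hypothetical. The min-max accuracy is taken here as the mean of min(actual, predicted) / max(actual, predicted), and R2 as one minus the ratio of residual to total sum of squares; both are common definitions and assumptions on our part, since the abstract does not state the formulas used.

```python
from math import sqrt
from statistics import mean


def regression_metrics(actual, predicted):
    """Return RMSE, MAE, R2, Pearson correlation and min-max accuracy."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mean_a, mean_p = mean(actual), mean(predicted)
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((a - mean_a) ** 2 for a in actual)     # total sum of squares
    ss_pred = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    return {
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
        "correlation": cov / sqrt(ss_tot * ss_pred),
        # Assumed definition; only meaningful when all values are positive.
        "min_max_accuracy": mean(
            min(a, p) / max(a, p) for a, p in zip(actual, predicted)
        ),
    }


# Toy values for illustration only; these are NOT the study's data.
actual = [10.0, 12.0, 8.0, 15.0]
predicted = [9.0, 13.0, 8.0, 14.0]
metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Relative gap between the reported test RMSE (2.57) and the SD (3.97).
rmse_reduction = 1.0 - 2.57 / 3.97
print(f"RMSE is {rmse_reduction:.1%} smaller than the SD")
```

The last two lines reproduce the arithmetic behind the "approximately 35 percent" statement: 1 - 2.57 / 3.97 is about 0.353. Comparing RMSE with the standard deviation of the response is a quick check that the model does better than always predicting the mean.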

Conflict of interest statement

Author Andrew Siu was employed by Amgen. This disclosure does not alter our adherence to the PLOS ONE policies on sharing data and materials. All authors declare no competing interests.

Figures

Fig 1. Dataset correlation.
Correlation patterns of Pn and the six predictors: area, length, width, perimeter, ratio and factor (R package PerformanceAnalytics).
Fig 2. Influence diagnostics.
(A) Cook’s distance bar plot with threshold 0.008, (B) DFBETAs panels for intercept, area, length and width with threshold 0.09, (C) DFBETAs panels for perimeter, ratio and factor with threshold 0.09, (D) DFFITS plot for Pn with threshold 0.23 (R package olsrr).
Fig 3. Studentized residuals plots.
(A) Before outlier removal with threshold abs(3), (B) After outlier removal with threshold abs(3) (R package olsrr).
Fig 4. PAM results.
(A) Silhouette width diagram, (B) Frequency among all indices with optimal number of clusters k = 3 (R packages graphics and NbClust).
Fig 5. Clustering of the leaf phenotypic traits.
(A) Cluster plot where colors red, green and blue correspond to cluster 1, 2 and 3, respectively, (B) Box plot for Pn, area, length, width, perimeter, ratio and factor (R packages factoextra and ggplot2).
Fig 6. Results for random forest model.
The left plot shows variable importance measured by %IncMSE on the horizontal axis, and the right plot shows variable importance measured by IncNodePurity (R package graphics).
Fig 7. Prediction results of XGBoost model.
(A) XGBoost testing plot with Min-Max Accuracy 0.93, RMSE 2.57 and SD 3.97, (B) XGBoost training plot with Min-Max Accuracy 0.99, RMSE 0.02 and SD 3.97, (C) XGBoost test residual plot with Min-Max Accuracy 0.93, RMSE 2.57 and SD 3.97 (R package ggplot2).
Fig 8. Prediction results of SVM model.
(A) SVM testing plot with Min-Max Accuracy 0.92, RMSE 2.77 and SD 3.97, (B) SVM training plot with Min-Max Accuracy 0.99, RMSE 0.2 and SD 3.97, (C) SVM test residual plot with Min-Max Accuracy 0.92, RMSE 2.77 and SD 3.97 (R package ggplot2).
Fig 9. Prediction results of RF model.
(A) RF testing plot with Min-Max Accuracy 0.87, RMSE 3.02 and SD 3.97, (B) RF training plot with Min-Max Accuracy 0.93, RMSE 1.32 and SD 3.97, (C) RF test residual plot with Min-Max Accuracy 0.87, RMSE 3.02 and SD 3.97 (R package ggplot2).
Fig 10. Prediction results of GAM model.
(A) GAM testing plot with Min-Max Accuracy 0.82, RMSE 3.96 and SD 3.97, (B) GAM training plot with Min-Max Accuracy 0.87, RMSE 2.59 and SD 3.97, (C) GAM test residual plot with Min-Max Accuracy 0.82, RMSE 3.96 and SD 3.97 (R package ggplot2).
