Front Plant Sci. 2026 Feb 18;16:1703429.
doi: 10.3389/fpls.2025.1703429. eCollection 2025.

Metabolomic analysis of Yunnan cigar tobacco leaves: impact of geography and climate on flavor characteristics and machine learning-based origin traceability

Yuping Wu et al. Front Plant Sci.

Abstract

To investigate how Yunnan's distinctive geographical and climatic conditions shape the metabolic profile of its cigar tobacco leaves (CTLs), and to establish a reliable machine learning method for origin traceability, a non-targeted metabolomics analysis was conducted on 71 CTL samples from the Dominican Republic, Indonesia, and Yunnan (Lincang, Pu'er, and Yuxi). A total of 778 highly reliable metabolites were identified. Influenced by Yunnan's high altitude, large diurnal temperature variation, intense ultraviolet radiation, and relative dryness, its CTLs exhibited characteristic metabolic profiles, with significant enrichment in pathways such as flavone and flavonol biosynthesis and betalain biosynthesis. Elevated levels of polyphenols, indoles, jasmonates, carotenoids, and other compounds were linked to the distinct woody, roasted, and astringent flavor profile of Yunnan CTLs. Twelve key biomarkers were selected using multivariate methods with unbiased variable selection in R (MUVR). Machine learning algorithms, including LDA, LR, GMM, KNN, and SVM, were applied to these biomarkers and achieved highly accurate origin discrimination at both national (Yunnan vs. Dominican Republic/Indonesia) and regional (Lincang, Pu'er, Yuxi) scales. Validation showed a median false classification rate of 0.1 over 100 iterations and an AUC close to 1, confirming the model's high accuracy and robustness for CTL origin traceability.
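
The validation scheme described above (repeated evaluation over 100 iterations, summarised by the median false classification rate) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the data are synthetic, the 70/30 split, the neighbour count k = 3, and the helper names `knn_predict`, `repeated_holdout`, and `make_synthetic` are all assumptions, and only KNN of the five listed classifiers is shown.

```python
import math
import random
import statistics
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def repeated_holdout(x, y, n_iter=100, test_frac=0.3, k=3, seed=0):
    """Return the misclassification rate of each random train/test split."""
    rng = random.Random(seed)
    n_test = max(1, round(len(x) * test_frac))
    rates = []
    for _ in range(n_iter):
        idx = list(range(len(x)))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        train_x = [x[i] for i in train]
        train_y = [y[i] for i in train]
        wrong = sum(knn_predict(train_x, train_y, x[i], k) != y[i] for i in test)
        rates.append(wrong / n_test)
    return rates


def make_synthetic(n_per_class=24, n_features=12, seed=1):
    """Hypothetical stand-in for 12 biomarker intensities from three origins."""
    rng = random.Random(seed)
    x, y = [], []
    for label, centre in (("Yunnan", 0.0), ("Dominican Republic", 3.0), ("Indonesia", 6.0)):
        for _ in range(n_per_class):
            x.append([rng.gauss(centre, 1.0) for _ in range(n_features)])
            y.append(label)
    return x, y


# Synthetic data only; real input would be the selected biomarker table.
x, y = make_synthetic()
rates = repeated_holdout(x, y)
median_error = statistics.median(rates)
print(f"median misclassification rate over {len(rates)} splits: {median_error:.3f}")
```

Reporting the median over many random splits, instead of a single split, makes the error estimate less sensitive to which samples happen to land in the test set, which matters for a data set of only 71 samples.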

Keywords: biomarkers; flavor profile; geographical origin; machine learning; metabolomics.

Conflict of interest statement

GZ, WW, LY, TZ, and JW were employed by China Tobacco Yunnan Industrial Co., Ltd. The remaining author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
The PCA (A) and PLS-DA (B) plots of CTLs from the Dominican Republic, Indonesia and China (Yunnan).
Figure 2
Metabolic profiles of Dominican (DMNJ) and Yunnan (CN) CTLs. (A) Plot of PCA. (B) Volcano plot of differential metabolites. (C) Heatmap of differential metabolites. Red indicates metabolites that were up-regulated and green indicates metabolites that were down-regulated. (D) Pathway enrichment plot. Colors represent the relative degree of the impact of each pathway (X-axis) and statistical significance (Y-axis).
Figure 3
Box plots labeled A–E compare the normalized area of various compounds between DMNJ and CN groups. A shows melatonin, B shows 7-Isomethyljasmonate, C shows 3-Indolepropionic acid, D shows trans-3-Indoleacrylic acid, and E shows Indole-3-lactic acid. CN group exhibits higher values than DMNJ across all compounds. Each plot includes outliers.
Figure 4
Metabolic profiles of Indonesian (INA) and Yunnan (CN) CTLs. (A) Plot of PCA. (B) Volcano plot of differential metabolites. (C) Heatmap of differential metabolites. Red indicates metabolites that were up-regulated and green indicates metabolites that were down-regulated. (D) Pathway enrichment plot. Colors represent the relative degree of the impact of each pathway (X-axis) and statistical significance (Y-axis).
Figure 5
Box plots comparing the normalized area of three compounds between INA and CN groups. (A) Trans-3-Indoleacrylic acid shows higher values in CN. (B) Melatonin displays greater variability and higher values in CN. (C) 7-Isomethyljasmonate also shows higher values in CN than INA. Outliers are marked with dots.
Figure 6
Redundancy analysis (RDA) plots of metabolites and environmental factors for CTLs from the Dominican Republic, Indonesia and China (Yunnan).
Figure 7
Variable selection based on the multivariate methods with unbiased variable selection in R (MUVR) algorithm. BER, balanced error rate.
Figure 8
Prediction errors over 100 iterations of the SVM, LDA, LR, GMM, and KNN models for CTLs of different origins.
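
Two metrics named in the captions and abstract, the balanced error rate (BER) and the AUC, can be computed directly from model predictions. The sketch below uses the standard textbook definitions (BER as the mean of per-class error rates; binary AUC as the probability that a positive sample outscores a negative one, with ties counted as one half). The labels and scores are made-up toy values, and the function names are illustrative and not taken from the MUVR package.

```python
from collections import defaultdict


def balanced_error_rate(y_true, y_pred):
    """Mean of per-class error rates, so small classes weigh as much as large ones."""
    total = defaultdict(int)
    wrong = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t != p:
            wrong[t] += 1
    return sum(wrong[c] / total[c] for c in total) / len(total)


def binary_auc(labels, scores):
    """AUC via pairwise comparison of positive (1) and negative (0) scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not data from the study).
y_true = ["Lincang", "Lincang", "Lincang", "Lincang", "Pu'er", "Pu'er", "Yuxi", "Yuxi"]
y_pred = ["Lincang", "Lincang", "Lincang", "Pu'er", "Pu'er", "Pu'er", "Yuxi", "Lincang"]
ber = balanced_error_rate(y_true, y_pred)

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = binary_auc(labels, scores)
print(f"BER = {ber:.3f}, AUC = {auc:.3f}")
```

BER is preferable to plain error rate when origins contribute unequal numbers of samples, since a classifier cannot score well simply by favouring the largest class.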
