PLoS One. 2022 May 19;17(5):e0267498.
doi: 10.1371/journal.pone.0267498. eCollection 2022.

The phytochemical diversity of commercial Cannabis in the United States



Christiana J Smith et al. PLoS One. 2022.

Abstract

The legal status of Cannabis is changing, fueling an increasing diversity of Cannabis-derived products. Because Cannabis contains dozens of chemical compounds with potential psychoactive or medicinal effects, understanding this phytochemical diversity is crucial. The legal Cannabis industry heavily markets products to consumers based on widely used labeling systems purported to predict the effects of different "strains." We analyzed the cannabinoid and terpene content of commercial Cannabis samples across six US states and found distinct chemical phenotypes (chemotypes) that are reliably present. By comparing the observed phytochemical diversity to the commercial labels commonly attached to Cannabis-derived product samples, we show that commercial labels do not consistently align with the observed chemical diversity. However, certain labels do show a biased association with specific chemotypes. These results have implications for the classification of commercial Cannabis, the design of animal and human research, and the regulation of consumer marketing, areas that today are often divorced from the chemical reality of the Cannabis-derived material they are meant to represent.


Conflict of interest statement

I have read the journal’s policy and the authors of this manuscript have the following competing interests: D.V. is the founder and president of the non-profit organization Agricultural Genomics Foundation, and the sole owner of CGRI, LLC. N.J. is employed by Leafly Holdings, Inc. Leafly allowed N.J. to use some professional time to oversee this research project and work on the manuscript.

Figures

Fig 1. Cannabinoid variation among commercial Cannabis-derived product samples in the US.
(A) Violin plots showing the distributions of the common cannabinoids measured across all regions. (B) Total THC vs. total CBD levels, color-coded by THC:CBD chemotype. (C) Histogram of the THC:CBD ratio on a log10 scale. “Inf” stands for “infinite” (samples with zero total THC or zero total CBD, whose log ratio is infinite). (D) Principal Component Analysis of all cannabinoids shown in panel A, color-coded by THC:CBD chemotype.
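Panel C bins samples by the log10 THC:CBD ratio and places zero-valued cases in an “Inf” bin. The Python sketch below shows one way to compute that ratio and derive a THC:CBD chemotype from it. The 5:1 cutoffs and the function names are hypothetical illustrations, not the thresholds used in the study.

```python
import math

# HYPOTHETICAL cutoffs for illustration: THC:CBD above 5:1 is THC-dominant,
# below 1:5 is CBD-dominant, anything in between is Balanced.
THC_DOMINANT_LOG10 = math.log10(5)
CBD_DOMINANT_LOG10 = -math.log10(5)


def log10_thc_cbd_ratio(total_thc: float, total_cbd: float) -> float:
    """Return log10(THC/CBD); zero THC or zero CBD maps to -inf or +inf."""
    if total_thc < 0 or total_cbd < 0:
        raise ValueError("cannabinoid levels must be non-negative")
    if total_thc == 0 and total_cbd == 0:
        return math.nan  # ratio undefined when both are absent
    if total_cbd == 0:
        return math.inf  # no CBD: ratio is infinitely THC-leaning
    if total_thc == 0:
        return -math.inf  # no THC: ratio is infinitely CBD-leaning
    return math.log10(total_thc / total_cbd)


def thc_cbd_chemotype(total_thc: float, total_cbd: float) -> str:
    """Assign a THC:CBD chemotype label from the log10 ratio."""
    r = log10_thc_cbd_ratio(total_thc, total_cbd)
    if math.isnan(r):
        return "Undefined"
    if r > THC_DOMINANT_LOG10:
        return "THC-dominant"
    if r < CBD_DOMINANT_LOG10:
        return "CBD-dominant"
    return "Balanced"


print(thc_cbd_chemotype(20.0, 0.1))  # THC-dominant
print(thc_cbd_chemotype(8.0, 9.0))   # Balanced
print(thc_cbd_chemotype(0.5, 12.0))  # CBD-dominant
print(thc_cbd_chemotype(15.0, 0.0))  # THC-dominant (ratio is +inf)
```

Working in log space makes the ratio symmetric: a 5:1 sample and a 1:5 sample sit at equal distances on either side of zero.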
Fig 2. Correlations among total THC, CBD, and CBG levels in each THC:CBD chemotype.
Scatterplots showing the linear correlation between total THC, CBD, and CBG levels in each of the main THC:CBD chemotypes. Sample sizes are as follows: THC-dominant N = 82,563; Balanced N = 1,876; CBD-dominant N = 1,188. Top row: total THC vs. total CBD; middle row: total THC vs. total CBG; bottom row: total CBD vs. total CBG. ***P < 0.0001.
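The figure computes each correlation separately within a THC:CBD chemotype. A minimal stdlib sketch of that grouping with a Pearson coefficient follows; the record field names ("chemotype", "thc", "cbd", "cbg") and the toy values are hypothetical.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan  # correlation undefined for a constant variable
    return sxy / math.sqrt(sxx * syy)


def correlation_by_chemotype(samples, x_key, y_key):
    """Compute Pearson r between two fields separately within each chemotype."""
    groups = defaultdict(lambda: ([], []))
    for s in samples:
        xs, ys = groups[s["chemotype"]]
        xs.append(s[x_key])
        ys.append(s[y_key])
    return {
        chemotype: pearson_r(xs, ys)
        for chemotype, (xs, ys) in groups.items()
        if len(xs) >= 2
    }


# Toy records with HYPOTHETICAL field names and values.
samples = [
    {"chemotype": "THC-dominant", "thc": 20.0, "cbd": 0.1, "cbg": 1.00},
    {"chemotype": "THC-dominant", "thc": 25.0, "cbd": 0.2, "cbg": 1.25},
    {"chemotype": "THC-dominant", "thc": 30.0, "cbd": 0.1, "cbg": 1.50},
    {"chemotype": "Balanced", "thc": 5.0, "cbd": 7.0, "cbg": 0.4},
    {"chemotype": "Balanced", "thc": 6.0, "cbd": 6.0, "cbg": 0.5},
    {"chemotype": "Balanced", "thc": 7.0, "cbd": 5.0, "cbg": 0.3},
]
print(correlation_by_chemotype(samples, "thc", "cbd"))
```

Computing the correlation within each chemotype, rather than on the pooled data, keeps the much larger THC-dominant group from dominating the estimate for the Balanced and CBD-dominant groups.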
Fig 3. Terpene abundance across commercial Cannabis-derived product samples in the US.
(A) Violin plots showing the distributions of the common terpenes measured across all regions. (B) Scatterplot showing the correlation between α- and β-pinene, two common pinene isomers. Rs = 0.78, ***P < 0.0001. (C) Scatterplot showing the correlation between β-caryophyllene and humulene, two Cannabis terpenes co-produced by shared enzymes. Rs = 0.88, ***P < 0.0001.
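The Rs values in panels B and C are rank correlations. Assuming Rs denotes Spearman's coefficient (the caption gives only the symbol), a minimal stdlib sketch computes it as the Pearson correlation of average ranks, which also handles tied measurements such as repeated zero readings.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(xs, ys):
    """Spearman's rank correlation: Pearson r computed on average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined when one variable is constant
    return sxy / math.sqrt(sxx * syy)


print(average_ranks([0.0, 0.3, 0.3, 1.2]))  # [1.0, 2.5, 2.5, 4.0]
print(spearman_rs([1, 2, 3, 4], [10, 100, 1000, 10000]))  # 1.0
```

Because it depends only on ranks, the coefficient is unchanged by any monotone rescaling of the concentrations, such as a log transform.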
Fig 4. Patterns of terpene co-occurrence among commercial Cannabis-derived product samples in the US.
(A) Hierarchically clustered correlation matrix showing pairwise correlations between all terpenes consistently measured across regions. (B) Network diagram in which nodes are terpenes, edges are limited to the strongest observed correlations, and edge widths correspond to correlation strength.
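Panel B keeps only the strongest correlations as edges. The caption does not state the cutoff, so the sketch below uses a hypothetical absolute threshold of 0.5 and scales edge width linearly with |r|; both choices, and the placeholder terpene names T1 to T3, are assumptions.

```python
from itertools import combinations


def correlation_network(names, corr, threshold=0.5, max_width=5.0):
    """Turn a symmetric correlation matrix into a thresholded edge list.

    Returns (node_a, node_b, r, width) tuples for pairs with |r| >= threshold,
    strongest first, with width scaled linearly by |r|.
    """
    if len(corr) != len(names) or any(len(row) != len(names) for row in corr):
        raise ValueError("corr must be a square matrix matching names")
    edges = []
    for i, j in combinations(range(len(names)), 2):  # upper triangle only
        r = corr[i][j]
        if abs(r) >= threshold:
            edges.append((names[i], names[j], r, max_width * abs(r)))
    edges.sort(key=lambda e: abs(e[2]), reverse=True)
    return edges


# Toy 3 x 3 matrix with HYPOTHETICAL placeholder terpenes T1..T3.
names = ["T1", "T2", "T3"]
corr = [
    [1.0, 0.88, 0.10],
    [0.88, 1.0, -0.60],
    [0.10, -0.60, 1.0],
]
for edge in correlation_network(names, corr):
    print(edge)
```

Thresholding on |r| keeps strong negative correlations as edges too; comparing r itself against the threshold would keep only positive co-occurrence.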
Fig 5. Patterns of terpene profile diversity across THC:CBD chemotypes.
(A) Histogram showing the proportion of variation explained by each principal component after performing Principal Component Analysis on the terpene dataset. (B) PCA scores plotted along PC1 and PC2, color-coded by major THC:CBD chemotype. Vectors depict the loadings of the five individual terpenes onto these principal axes. (C) PCA scores plotted along PC1 and PC3. (D) PCA scores plotted along PC2 and PC3. (E) Violin plot showing the distribution of ‘product diversity’ values (cosine distances) for each THC:CBD chemotype. Product values are calculated by averaging samples with the same strain name linked to a given producer ID. ***P < 0.0001, Welch’s t-test and Cohen’s d’. (F) Stacked bar chart showing the percentage of products with a given dominant terpene in each THC:CBD chemotype.
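Panel E works at the product level: samples that share a strain name and producer ID are averaged, and cosine distances are reported. The sketch below assumes, for illustration, that “product diversity” means the pairwise cosine distance between averaged terpene profiles within a group; the record field names and toy values are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def average_by_product(samples):
    """Average terpene vectors of samples sharing (strain_name, producer_id)."""
    sums = {}
    counts = defaultdict(int)
    for s in samples:
        key = (s["strain_name"], s["producer_id"])
        vec = s["terpenes"]
        if key not in sums:
            sums[key] = [0.0] * len(vec)
        sums[key] = [acc + v for acc, v in zip(sums[key], vec)]
        counts[key] += 1
    return {key: [total / counts[key] for total in vec] for key, vec in sums.items()}


def cosine_distance(u, v):
    """1 - cosine similarity; 0 for identical directions, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for an all-zero profile")
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_product_diversity(products):
    """Cosine distances between every pair of averaged product profiles."""
    keys = sorted(products)
    return [cosine_distance(products[a], products[b]) for a, b in combinations(keys, 2)]


# Toy samples; field names and values are HYPOTHETICAL.
samples = [
    {"strain_name": "Strain X", "producer_id": "p1", "terpenes": [1.0, 0.0]},
    {"strain_name": "Strain X", "producer_id": "p1", "terpenes": [3.0, 0.0]},
    {"strain_name": "Strain Y", "producer_id": "p1", "terpenes": [0.0, 2.0]},
]
products = average_by_product(samples)
print(products)  # {('Strain X', 'p1'): [2.0, 0.0], ('Strain Y', 'p1'): [0.0, 2.0]}
print(pairwise_product_diversity(products))  # [1.0]
```

Cosine distance compares the shape of a terpene profile rather than its magnitude, so two products with the same relative terpene proportions but different total terpene content come out identical.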
Fig 6. Commercial “strain category” labels poorly align to patterns of phytochemistry.
(A) PCA scores for all THC-dominant samples plotted along PC1 and PC2, color-coded by the Indica/Hybrid/Sativa label attached to each sample. (B) Silhouette coefficients for each sample with a given Indica/Hybrid/Sativa label. (C) PCA scores for all THC-dominant samples plotted along PC1 and PC2, color-coded by the dominant terpene of each sample. (D) Silhouette coefficients for each sample with a given dominant terpene. (E) PCA scores for all THC-dominant samples plotted along PC1 and PC2, color-coded by the k-means cluster label assigned to each sample. (F) Silhouette coefficients for each sample with a given k-means cluster label. Each silhouette plot depicts a random subset of 10,000 samples from the full dataset (n = 41,201).
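Panels B, D, and F score how well each labelling scheme separates samples. A silhouette coefficient compares a sample's mean distance to its own group (a) with its mean distance to the nearest other group (b) as s = (b - a) / max(a, b). The stdlib sketch below assumes Euclidean distance on feature vectors such as PCA scores; the caption does not state the metric used for the figure.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def silhouette_samples(points, labels):
    """Per-sample silhouette coefficients s = (b - a) / max(a, b)."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two distinct labels")
    scores = []
    for i, p in enumerate(points):
        # Collect distances from sample i to every other sample, grouped by label.
        by_label = {}
        for j, q in enumerate(points):
            if i != j:
                by_label.setdefault(labels[j], []).append(euclidean(p, q))
        own = by_label.get(labels[i], [])
        if not own:
            scores.append(0.0)  # convention for a sample alone in its group
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for lab, d in by_label.items() if lab != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_samples(points, ["A", "A", "B", "B"])
bad = silhouette_samples(points, ["A", "B", "A", "B"])
print([round(s, 2) for s in good])  # [0.9, 0.9, 0.9, 0.9]
print(all(s < 0 for s in bad))      # True
```

Scores near 1 mean a sample sits well inside its group; scores near or below 0 mean its label places it no closer to its own group than to another. Every sample is compared with every other, so the cost grows quadratically with sample count, and scoring a random subset, as in the figure, keeps the computation tractable on large datasets.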
Fig 7. Cluster analysis reveals distinct chemotypes of THC-dominant commercial Cannabis commonly present in US states.
(A) Violin plot showing the distribution of silhouette coefficients for each labelling method. Differences were tested with Welch’s t-test (***P < 0.0001, **P < 0.001, *P < 0.01), and absolute effect sizes are given as Cohen’s d’ values. (B) Stacked bar chart showing the percentage of samples falling within each group for each labelling system. (C) UMAP embedding in two dimensions showing samples classified into each k-means cluster. (D) Polar plot showing the mean normalized levels of eight of the most abundant terpenes for Cluster I (high caryophyllene-limonene) products. (E) Similar polar plot for Cluster II (high myrcene-pinene) products. (F) Similar polar plot for Cluster III (high terpinolene-myrcene) products. Gray lines represent the top 25 products from each cluster with the most samples per product.
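Panel A compares silhouette distributions with Welch's t-test and reports Cohen's d’ effect sizes. The sketch below computes the Welch t statistic with its Welch-Satterthwaite degrees of freedom, plus Cohen's d using a pooled standard deviation. The caption's d’ notation may denote a variant, so treat the pooled-SD formula as an assumption. The P value then comes from the t distribution with those degrees of freedom; SciPy's scipy.stats.ttest_ind with equal_var=False runs the whole test directly.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two observations")
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


group_1 = [1, 2, 3, 4, 5]
group_2 = [3, 4, 5, 6, 7]
print(welch_t(group_1, group_2))  # (-2.0, 8.0)
print(round(cohens_d(group_1, group_2), 3))  # -1.265
```

With tens of thousands of samples, even small differences reach P < 0.0001, which is why the effect size is reported alongside the test.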
Fig 8. Commercial “strain names” are associated with variable levels of chemical consistency across Cannabis products.
(A) Scatterplot of the number of products tested vs. normalized popularity for all product-level data attached to cultivator-given strain names (log10 scale). rs = 0.59, ***P < 0.0001. (B) Similarity matrix depicting pairwise cosine similarities between all product-level data attached to the ten most common strain names by abundance. (C) Violin plot depicting the distribution of cosine similarity scores between products attached to the same strain name. Dashed line represents the average similarity level after randomly shuffling strain names. **P < 0.001, ***P < 0.0001, Welch’s t-test. (D) Violin plots of total cannabinoid distributions and polar plots of terpene profiles for all products attached to the strain names “Purple Punch” (left) and “Tangie” (right). (E) UMAP embedding showing where each of the Purple Punch and Tangie product samples from panel D falls in this representation.
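Panel C compares cosine similarity among products sharing a strain name with a baseline obtained by randomly shuffling strain names. A minimal permutation sketch follows; the shuffle count, seed, strain names, and toy profiles are hypothetical.

```python
import math
import random
from itertools import combinations


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for an all-zero profile")
    return dot / norm


def mean_same_name_similarity(profiles, names):
    """Mean cosine similarity over all pairs of products that share a name."""
    sims = [
        cosine_similarity(profiles[i], profiles[j])
        for i, j in combinations(range(len(profiles)), 2)
        if names[i] == names[j]
    ]
    return sum(sims) / len(sims) if sims else math.nan


def shuffled_baseline(profiles, names, n_shuffles=1000, seed=0):
    """Average of the same-name statistic after randomly permuting the names."""
    rng = random.Random(seed)
    shuffled = list(names)
    values = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks the link between name and profile
        values.append(mean_same_name_similarity(profiles, shuffled))
    return sum(values) / len(values)


# Toy terpene profiles for two HYPOTHETICAL strain names.
profiles = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
names = ["Alpha", "Alpha", "Beta", "Beta"]
observed = mean_same_name_similarity(profiles, names)
baseline = shuffled_baseline(profiles, names)
print(round(observed, 3), round(baseline, 3))
```

Shuffling preserves how many products carry each name, so the baseline reflects chance similarity under the same group sizes.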
Fig 9. Some commercial Cannabis labels are overrepresented for specific chemotypes.
(A) UMAP embedding of product-level data as in Fig 8E, color-coded by Indica/Hybrid/Sativa label. (B) Stacked bar chart showing the proportion of products labelled as Indica, Hybrid, or Sativa within each k-means cluster, compared to the overall distribution. ***P < 0.0001, Chi-squared test. (C) UMAP embedding of product-level data as in Fig 8E, color-coded by k-means cluster label, showing where all products attached to either “Blue Dream” or “Dutch Treat” are found. (D) Bar charts showing the percentage of products attached to each strain name that fall in each k-means cluster, with each strain color-coded by its most prominent cluster. Dashed line represents the expected percentage after randomly shuffling strain names. ***P < 0.0001, Welch’s t-test.
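Panel B tests whether Indica/Hybrid/Sativa labels are spread across k-means clusters differently than the overall label mix would predict. A minimal sketch of the Pearson chi-squared statistic for a clusters-by-labels table of counts follows; the counts are toy values, and converting the statistic to a P value requires the chi-squared distribution (for example scipy.stats.chi2.sf).

```python
def chi_squared_contingency(table):
    """Pearson chi-squared statistic, degrees of freedom, and expected counts.

    Rows might be k-means clusters and columns Indica/Hybrid/Sativa labels.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if grand_total == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one count")
    stat = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, observed in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # count under independence
            expected_row.append(e)
            stat += (observed - e) ** 2 / e
        expected.append(expected_row)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, expected


# Toy counts (rows: clusters I-III; columns: Indica, Hybrid, Sativa), HYPOTHETICAL.
table = [
    [30, 50, 20],
    [20, 50, 30],
    [10, 50, 40],
]
stat, dof, expected = chi_squared_contingency(table)
print(round(stat, 2), dof)  # 16.67 4
```

The expected counts encode the “overall distribution” in the caption: each cluster is assumed to carry labels in the same proportions as the full dataset.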
Fig 10. Potential scheme for classifying commercial Cannabis based on cannabinoid and terpene profiles.
Flow chart showing a potential classification framework for commercial Cannabis. Level 1 represents cannabinoid ratios and displays the three common THC:CBD chemotypes as well as chemotypes based on novel cannabinoids that could be bred. Level 2 represents terpene profiles and displays the three clusters we identified as well as other terpene combinations that could arise. Terpene clusters overlap slightly to illustrate that the terpenes in each cluster are not mutually exclusive. Grey lines indicate chemotypes that may be possible (e.g., CBD-dominant and terpinolene-dominant) but have not yet been observed.
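The two-level scheme can be sketched as a pair of classifiers: level 1 assigns a THC:CBD chemotype from cannabinoid totals, and level 2 assigns a terpene cluster by nearest centroid under cosine similarity. Everything numeric here is hypothetical: the 5:1 cutoff and the centroid profiles are placeholders loosely named after the three clusters described for Fig 7, not values estimated in the study.

```python
import math

# Terpene order used in every profile below.
TERPENES = ("beta-caryophyllene", "limonene", "myrcene", "alpha-pinene", "terpinolene")

# HYPOTHETICAL centroid profiles (relative levels) for illustration only.
CLUSTER_CENTROIDS = {
    "Cluster I": (1.0, 1.0, 0.3, 0.1, 0.0),    # caryophyllene-limonene
    "Cluster II": (0.3, 0.2, 1.0, 0.8, 0.0),   # myrcene-pinene
    "Cluster III": (0.1, 0.2, 0.6, 0.3, 1.0),  # terpinolene-myrcene
}


def level1_cannabinoids(total_thc, total_cbd, cutoff=5.0):
    """Level 1: THC:CBD chemotype using a HYPOTHETICAL cutoff ratio."""
    if total_thc == 0 and total_cbd == 0:
        return "Unclassified"  # e.g. a hypothetical product dominated by another cannabinoid
    if total_thc > cutoff * total_cbd:
        return "THC-dominant"
    if total_cbd > cutoff * total_thc:
        return "CBD-dominant"
    return "Balanced"


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def level2_terpenes(profile):
    """Level 2: the terpene cluster whose centroid is most similar to the profile."""
    if not any(profile):
        return "No terpene data"  # cosine similarity is undefined for a zero vector
    return max(
        CLUSTER_CENTROIDS,
        key=lambda name: cosine_similarity(profile, CLUSTER_CENTROIDS[name]),
    )


def classify(total_thc, total_cbd, profile):
    return level1_cannabinoids(total_thc, total_cbd), level2_terpenes(profile)


print(classify(22.0, 0.1, (0.9, 0.8, 0.2, 0.1, 0.0)))  # ('THC-dominant', 'Cluster I')
print(classify(8.0, 9.0, (0.1, 0.1, 0.5, 0.2, 0.9)))   # ('Balanced', 'Cluster III')
```

Keeping the two levels separate mirrors the flow chart: a new cannabinoid chemotype or terpene cluster can be added at one level without changing the other.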
