Review
New Phytol. 2021 Apr;230(1):73-89.
doi: 10.1111/nph.17140. Epub 2021 Jan 8.

Recent advances in Cannabis sativa genomics research

Bhavna Hurgobin et al. New Phytol. 2021 Apr.

Abstract

Cannabis (Cannabis sativa L.) is one of the oldest cultivated plants purported to have unique medicinal properties. However, scientific research on cannabis has been restricted by the Single Convention on Narcotic Drugs of 1961, an international treaty that prohibits the production and supply of narcotic drugs except under license. Legislation governing cannabis cultivation for research, medicinal and even recreational purposes has recently been relaxed in certain jurisdictions. As a result, there is now potential to accelerate cultivar development of this multi‐use and potentially medically useful plant species by applying modern genomics technologies. Whilst genomics has been pivotal to our understanding of the basic biology and molecular mechanisms controlling key traits in several crop species, much work remains to be done in cannabis. In this review we provide a comprehensive summary of key cannabis genomics resources and their applications. We also discuss prospective applications of existing and emerging genomics technologies for accelerating the genetic improvement of cannabis.

Keywords: breeding; cannabinoids; cannabis; crop improvement; genome assembly; genomics.


Figures

Fig. 1
Schematic diagram of cannabinoid biosynthesis, including the polyketide and isoprenoid precursor pathways. The precursor pathways are merged by a plastid‐localized aromatic prenyltransferase, which combines alkylresorcinolic acid and geranyl diphosphate intermediates to form cannabigeroids with a linear isoprenyl residue (Gülck & Møller, 2020). Cannabinoid synthesis concludes in the apoplastic storage cavity of glandular trichomes. There, cannabigeroids are converted to tri‐ and di‐cyclic cannabinoids such as ∆9‐tetrahydrocannabinolic acid (THCA) and cannabidiolic acid (CBDA) by stereoselective oxidative cyclisation of the isoprenyl moiety, catalysed by the cannabinoid synthases THCAS, CBDAS and CBCAS. The green arrow indicates the location of the extracellular storage cavity of a Cannabis stalked glandular trichome; bar, 100 µm. Subcellular locations of cannabinoid and precursor pathway enzymes were predicted with TargetP‐2.0 (http://www.cbs.dtu.dk/services/TargetP/). AAE, acyl‐activating enzyme; CBCA, cannabichromenic acid; CBCAS, cannabichromenic acid synthase; CBCOA, cannabiorcichromenic acid; CBCVA, cannabichromevarinic acid; CBDA, cannabidiolic acid; CBDAS, cannabidiolic acid synthase; CBDPA, cannabidiphorolic acid; CBDVA, cannabidivarinic acid; CMK, 4‐(cytidine 5′‐diphospho)‐2‐C‐methyl‐d‐erythritol kinase; DXR, 1‐deoxy‐d‐xylulose 5‐phosphate reductoisomerase; DXS, 1‐deoxy‐d‐xylulose 5‐phosphate synthase; HDR, 1‐hydroxy‐2‐methyl‐2‐butenyl 4‐diphosphate reductase; HDS, 1‐hydroxy‐2‐methyl‐2‐butenyl 4‐diphosphate synthase; MCT, 2‐C‐methyl‐d‐erythritol 4‐phosphate cytidylyltransferase; MDS, 2‐C‐methyl‐d‐erythritol 2,4‐cyclodiphosphate synthase; MEP, 2‐C‐methyl‐d‐erythritol 4‐phosphate; OAC, olivetolic acid cyclase; PT, prenyltransferase (e.g. geranylpyrophosphate:olivetolate geranyltransferase (GOT)); THCA, tetrahydrocannabinolic acid; THCAS, tetrahydrocannabinolic acid synthase; THCOA, tetrahydrocannabiorcolic acid; THCPA, tetrahydrocannabiphorolic acid; THCVA, tetrahydrocannabivarinic acid; TKS, tetraketide synthase.
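The merging and cyclisation steps described in this caption can be encoded as a small directed graph. The sketch below is a hypothetical, simplified illustration: the node labels, the `EDGES` list and the `enzyme_path` helper are ours, not part of the review, and the upstream polyketide and MEP steps are collapsed. It only traces which enzymes separate a precursor from a product.

```python
from __future__ import annotations

from collections import deque

# Hypothetical, simplified edges (substrate, product, enzyme) covering only the
# precursor-merging and cyclisation steps named in the Fig. 1 caption.
EDGES = [
    ("alkylresorcinolic acid", "cannabigeroid", "PT (e.g. GOT)"),
    ("geranyl diphosphate", "cannabigeroid", "PT (e.g. GOT)"),
    ("cannabigeroid", "THCA", "THCAS"),
    ("cannabigeroid", "CBDA", "CBDAS"),
    ("cannabigeroid", "CBCA", "CBCAS"),
]


def enzyme_path(start: str, target: str) -> list[str] | None:
    """Return the enzymes on a shortest start -> target path, or None if unreachable."""
    graph: dict[str, list[tuple[str, str]]] = {}
    for substrate, product, enzyme in EDGES:
        graph.setdefault(substrate, []).append((product, enzyme))

    # Breadth-first search finds the path with the fewest enzymatic steps.
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, enzymes = queue.popleft()
        if node == target:
            return enzymes
        for product, enzyme in graph.get(node, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, enzymes + [enzyme]))
    return None


print(enzyme_path("geranyl diphosphate", "THCA"))  # ['PT (e.g. GOT)', 'THCAS']
```

Because all three synthases branch from the same cannabigeroid node, the graph makes explicit that THCA, CBDA and CBCA compete for a shared substrate.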
Fig. 2
Benchmarking Universal Single‐Copy Ortholog (BUSCO) assessment of the cannabis genome assemblies shown in Table 1(a). The percentages of complete (single‐copy and duplicated), fragmented and missing universal single‐copy orthologues were determined using busco v.4.02 (Simão et al., 2015). The Jamaican Lion assemblies (female parent, male parent, F1) have more complete BUSCOs on average, but they also harbour a larger number of duplicated BUSCOs, which reflects the fragmented nature of these assemblies.
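BUSCO expresses completeness as percentages of the searched orthologue set, with "complete" being the sum of single‐copy and duplicated hits. The sketch below uses a hypothetical `busco_summary` helper and made‐up counts to reproduce that arithmetic in a string modelled on BUSCO's one‐line summary notation; it is not BUSCO's own code.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """Format BUSCO-style percentages from raw counts of orthologue categories."""
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("no BUSCO genes counted")

    def pct(count: int) -> float:
        return 100.0 * count / total

    complete = single + duplicated  # 'complete' = single-copy + duplicated
    return (
        f"C:{pct(complete):.1f}%[S:{pct(single):.1f}%,D:{pct(duplicated):.1f}%],"
        f"F:{pct(fragmented):.1f}%,M:{pct(missing):.1f}%,n:{total}"
    )


# Hypothetical counts for an imaginary assembly (not data from Fig. 2).
print(busco_summary(single=900, duplicated=50, fragmented=30, missing=20))
# C:95.0%[S:90.0%,D:5.0%],F:3.0%,M:2.0%,n:1000
```

Since duplicated BUSCOs count towards "complete", a high D value raises C without adding distinct genes, which is why the duplicated fraction is worth inspecting separately when comparing assemblies.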
Fig. 3
Sequence similarity among the Cannabis sativa cannabinoid synthase genes tetrahydrocannabinolic acid synthase (THCAS; GenBank acc. no. AB057805.1), cannabidiolic acid synthase (CBDAS; GenBank acc. no. AB292682.1) and cannabichromenic acid synthase (CBCAS; GenBank acc. no. LY658671.1). (a) Protein sequences of THCAS, CBDAS and CBCAS were aligned using Clustal Omega, and protein domains were annotated using interproscan v.5.41‐78.0 (Sievers et al., 2011; Jones et al., 2014). The p‐cresol methylhydroxylase (PCMH)‐type flavin adenine dinucleotide (FAD)‐binding domain (residues 77–251, PrositeProfiles: PS51387, InterPro: IPR016166) and the berberine and berberine‐like domain (residues 480–538 for THCAS and CBCAS, residues 479–537 for CBDAS; Pfam: PF08031, InterPro: IPR012951) are highlighted in red and black, respectively. The FAD‐binding domain (residues 81–214, Pfam: PF01565, InterPro: IPR006094) is not shown. (b) THCAS is more similar to CBCAS than to CBDAS at both the nucleotide and amino acid levels. The presence of CBCAS may lead to the production of THCA as a by‐product (McKernan et al., 2020).
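Panel (b) compares the synthases by sequence identity, and the value obtained depends on how alignment gaps are treated. The sketch below uses a hypothetical `percent_identity` helper on toy peptides (not the THCAS, CBDAS or CBCAS sequences) and counts identical residues over columns where neither aligned sequence has a gap, which is one common convention among several.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two gapped, equal-length aligned sequences.

    Only columns where neither sequence has a gap ('-') are compared; dividing
    by the full alignment length instead would give a lower value.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aln_a.upper(), aln_b.upper()) if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy aligned peptides: 6 ungapped columns, 5 identical (L vs I differs).
print(round(percent_identity("MKT-LLV", "MKTALIV"), 2))  # 83.33
```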
Fig. 4
Maximum‐likelihood phylogenetic tree depicting the relationships among the Cannabis sativa cannabinoid synthase genes tetrahydrocannabinolic acid synthase (THCAS), cannabidiolic acid synthase (CBDAS) and cannabichromenic acid synthase (CBCAS). The published nucleotide sequences of the active/functional forms of THCAS (GenBank acc. no. AB057805.1), CBDAS (GenBank acc. no. AB292682.1) and CBCAS (GenBank acc. no. LY658671.1), and the paralogues of these genes as annotated in the cs10 v.2.0 and Jamaican Lion (female parent and male parent) assemblies (Supporting Information Tables S1, S2), were aligned against the latest C. sativa reference genome assemblies (Table 1) using blast+ v.2.2.29 (Altschul et al., 1990). Best hits with a percentage identity > 98.5%, query coverage > 75% and an alignment length within ± 100 bp of the query length were retained (Tables S1, S2). The nucleotide sequences of these best hits were extracted from each assembly (where applicable) using bedtools v.2.26.0 (Quinlan & Hall, 2010), and transdecoder v.3.0 (https://transdecoder.github.io/) was used to predict the longest open reading frame in each extracted region. The predicted proteins, together with the amino acid sequences (complete CDS) of AB057805.1 (gene ID in blue), AB292682.1 (gene ID in red) and LY658671.1 (gene ID in green) and the other cannabinoid synthase gene copies annotated in the cs10 v.2.0 and Jamaican Lion (female parent and male parent) genome assemblies, were used for multiple sequence alignment with Clustal Omega (Sievers et al., 2011). The phylogenetic tree was reconstructed from these alignments using raxml v.8.2.12 with 500 bootstrap replicates under the JTT model of amino acid substitution and visualized using Interactive Tree Of Life (iTOL) (Letunic & Bork, 2007; Stamatakis et al., 2008). The tree was rooted with the Humulus lupulus THCAS homologue (GenBank acc. no. LA634839.1). Only bootstrap values > 70% are shown.
Notably, all CBCAS genes cluster with some of the THCAS genes, reflecting the high sequence similarity between these two cannabinoid synthases (Fig. 3).
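The hit‐filtering thresholds in this caption translate directly into code. The sketch below assumes BLAST+ tabular output requested with `-outfmt "6 qseqid sseqid pident length qlen qcovs"`, a column choice of ours that the caption does not state; the `Hit` class, the `parse_hits` and `keep_hit` helpers and the example rows are hypothetical. It applies only the three thresholds and does not pick a single best hit per query.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass


@dataclass
class Hit:
    qseqid: str
    sseqid: str
    pident: float  # percentage identity
    length: int    # alignment length (bp)
    qlen: int      # query length (bp)
    qcovs: float   # query coverage per subject (%)


def parse_hits(text: str) -> list[Hit]:
    """Parse tab-separated rows in the assumed six-column order."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    return [
        Hit(r[0], r[1], float(r[2]), int(r[3]), int(r[4]), float(r[5]))
        for r in rows
        if r  # skip blank lines
    ]


def keep_hit(hit: Hit, min_pident: float = 98.5, min_qcov: float = 75.0,
             len_tolerance: int = 100) -> bool:
    """Identity > 98.5%, coverage > 75% and alignment length = query length +/- 100 bp."""
    return (
        hit.pident > min_pident
        and hit.qcovs > min_qcov
        and abs(hit.length - hit.qlen) <= len_tolerance
    )


example = (
    "THCAS\tchr_a\t99.2\t1640\t1638\t100\n"  # passes all three thresholds
    "THCAS\tchr_b\t97.0\t1640\t1638\t100\n"  # fails the identity threshold
    "CBDAS\tchr_a\t99.0\t1300\t1635\t80\n"   # fails the length tolerance
)
kept = [h for h in parse_hits(example) if keep_hit(h)]
print([(h.qseqid, h.sseqid) for h in kept])  # [('THCAS', 'chr_a')]
```

The strict `>` comparisons mirror the caption's "> 98.5%" and "> 75%", while the length window is treated as inclusive; if the original analysis used different boundary handling, only those operators need to change.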
Fig. 5
Dotplots showing the syntenic relationships among the genomes of three Cannabis sativa cultivars. Pairwise genome alignments of (a) PK v.5.0 (GenBank acc. no. GCA_000230575.5) and FN v.2.0 (GenBank acc. no. GCA_003417725.2), (b) cs10 v.2.0 (GenBank acc. no. GCA_900626175.2) and PK v.5.0 and (c) cs10 v.2.0 and FN v.2.0 were performed using Minimap2, and the alignments were visualized using d‐genies (Cabanettes & Klopp, 2018; Li, 2018) (Supporting Information Table S3). Breaks in the alignments may reflect structural variants or the lower contiguity of the PK and FN assemblies. Differences in chromosome orientation between the assemblies are also visible. Only chromosome‐level alignments are shown. PK, Purple Kush; FN, Finola.
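Minimap2 writes alignments in PAF, a tab‐separated format whose first 12 columns are query name, query length, query start, query end, relative strand, target name, target length, target start, target end, number of matching bases, alignment block length and mapping quality. A dotplot draws each alignment as a line segment. The sketch below (a hypothetical `paf_to_segments` helper with made‐up records, not d‐genies' implementation) converts PAF lines into segments and draws reverse‐strand blocks as descending lines, which is how orientation differences between assemblies appear in such plots.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    qname: str
    tname: str
    x0: int  # query coordinate at segment start
    y0: int  # target coordinate at segment start
    x1: int  # query coordinate at segment end
    y1: int  # target coordinate at segment end
    identity: float  # matching bases / alignment block length


def paf_to_segments(lines: Iterable[str], min_block: int = 10_000) -> list[Segment]:
    """Convert PAF records into dotplot segments, skipping short alignment blocks."""
    segments = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # skip empty or malformed lines
        qname, qstart, qend, strand = f[0], int(f[2]), int(f[3]), f[4]
        tname, tstart, tend = f[5], int(f[7]), int(f[8])
        nmatch, block = int(f[9]), int(f[10])
        if block < min_block:
            continue
        if strand == "+":
            seg = Segment(qname, tname, qstart, tstart, qend, tend, nmatch / block)
        else:
            # Reverse-strand blocks run "downhill": query start pairs with target end.
            seg = Segment(qname, tname, qstart, tend, qend, tstart, nmatch / block)
        segments.append(seg)
    return segments


records = [
    "chr1\t1000000\t0\t50000\t+\tctg1\t900000\t100\t50100\t49000\t50000\t60",
    "chr1\t1000000\t60000\t80000\t-\tctg1\t900000\t200000\t220000\t19000\t20000\t60",
    "chr2\t500000\t0\t5000\t+\tctg2\t400000\t0\t5000\t4900\t5000\t60",  # too short
]
for seg in paf_to_segments(records):
    print(seg)
```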
Fig. 6
Cannabinoid synthase gene expression in relation to cannabinoid content and composition in nine high‐cannabinoid‐yielding cannabis cultivars (data from Zager et al., 2019). (a) Tetrahydrocannabinolic acid : cannabidiolic acid (THCA : CBDA) ratio and (b) cannabinoid contents of the cultivars. The lower panel in (b) shows a zoomed‐in view of cannabinoid content (% DW) in the range 0–0.5%. (c) Trichome‐specific expression patterns of 13 cannabinoid synthase genes from the cs10 v.1.0 genome assembly (GenBank acc. no. GCA_900626175.1) in these cultivars. Each chromosome position represents one or more cannabinoid synthase loci. The reference CBDAS (LOC115697762) and inactive THCAS (LOC115697880) loci are underlined. LOC115697762 shares 100% nucleotide identity with the functional CBDAS identified by Taura et al. (2007) (GenBank acc. no. AB292682.1), whereas LOC115697880 is 99% identical to CBCAS (GenBank acc. no. LY658671.1) at the nucleotide level (Taura et al., 2007). Of the 13 loci, two (LOC115698060 and LOC115697886) are pseudogenic, inactive THCAS copies containing in‐frame stop codons, whereas the remaining 11 genes encode full‐length CDSs. Trichome‐enriched RNA‐seq reads previously reported by Zager et al. (2019) were obtained from the NCBI Sequence Read Archive (SRA project no. PRJNA498707). The reads were mapped to the Cannabis sativa cs10 v.1.0 genome assembly using Hisat2 v.2.1.0 and sorted by genomic location using samtools v.1.9 (Li et al., 2009; Kim et al., 2019). stringtie v.1.3.5 was used to assemble the RNA‐seq alignments into potential transcripts and to calculate gene abundances as TPM (Supporting Information Table S6; Pertea et al., 2015). Chromosome numbers have been changed to the community‐standard nomenclature of cs10 v.2.0 (GenBank acc. no. GCA_900626175.2). Cannabis sativa var. cs10 has a high‐CBD chemotype.
BB, Black Berry Kush; BL, Black Lime; CC, Cherry Chem; CT, Canna Tsu; MT, Mama Thai; SD, Sour Diesel; TP, Terple; TPM, transcripts per million; VF, Valley Fire; WC, White Cookies. Error bars represent ± 1 SD of the mean metabolite content of each cultivar (n = 3).
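Expression in panel (c) is reported in transcripts per million (TPM). In its usual read‐count form, each feature's reads are divided by its length in kilobases, and these rates are rescaled to sum to one million within a sample. The sketch below (a hypothetical `tpm` helper with made‐up gene names and counts) illustrates only that definition; StringTie derives its TPM values from its own per‐transcript coverage estimates, so this is not a reproduction of the pipeline used for the figure.

```python
from __future__ import annotations


def tpm(counts: dict[str, float], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million from read counts and feature lengths (bp)."""
    # Length-normalised rate: reads per kilobase of feature.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1_000) for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # Rescale so that values across all features in the sample sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Hypothetical genes: equal read counts, but gene_b is twice as long.
values = tpm({"gene_a": 100, "gene_b": 100}, {"gene_a": 1000, "gene_b": 2000})
print({g: round(v, 1) for g, v in values.items()})
# {'gene_a': 666666.7, 'gene_b': 333333.3}
```

Here gene_a receives twice the TPM of gene_b despite identical read counts, because TPM corrects for feature length before normalising.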

References

Abe A, Kosugi S, Yoshida K, Natsume S, Takagi H, Kanzaki H, Matsumura H, Yoshida K, Mitsuoka C, Tamiru M et al. 2012. Genome sequencing reveals agronomically important loci in rice using MutMap. Nature Biotechnology 30: 174–178.
Adams R, Hunt M, Clark J. 1940. Structure of cannabidiol, a product isolated from the marihuana extract of Minnesota wild hemp. I. Journal of the American Chemical Society 62: 196–200.
Aizpurua‐Olaizola O, Soydaner U, Öztürk E, Schibano D, Simsir Y, Navarro P, Etxebarria N, Usobiaga A. 2016. Evolution of the cannabinoid and terpene content during the growth of Cannabis sativa plants from different chemotypes. Journal of Natural Products 79: 324–331.
Allen KD, McKernan K, Pauli C, Roe J, Torres A, Gaudino R. 2019. Genomic characterization of the complete terpene synthase gene family from Cannabis sativa. PLoS ONE 14: e0222363.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. Journal of Molecular Biology 215: 403–410.
