G3 (Bethesda). 2023 Feb 9;13(2):jkac304.
doi: 10.1093/g3journal/jkac304.

A long-read and short-read transcriptomics approach provides the first high-quality reference transcriptome and genome annotation for Pseudotsuga menziesii (Douglas-fir)

Vera Marjorie Elauria Velasco et al. G3 (Bethesda).

Abstract

Douglas-fir (Pseudotsuga menziesii) is native to western North America. It grows in a wide range of environmental conditions and is an important timber tree. Although there are several studies on the gene expression responses of Douglas-fir to abiotic cues, the absence of high-quality transcriptome and genome data is a barrier to further investigation. As with most conifers, the available transcriptome and genome reference dataset for Douglas-fir remains fragmented and requires refinement. We aimed to generate a highly accurate and complete reference transcriptome and genome annotation. We deep-sequenced the transcriptome of Douglas-fir needles from seedlings grown under nonstress control conditions or under combined heat and drought stress, using long-read (LR) and short-read (SR) sequencing platforms. We used 2 computational approaches for LR transcriptome assembly: de novo and genome-guided. Compared to the genome-guided assembly, the LR de novo assembly yielded 1.3X more high-quality transcripts, 1.85X more "complete" genes, and 2.7X more functionally annotated genes. We predicted 666 long noncoding RNAs and 12,778 unique protein-coding transcripts, including 2,016 putative transcription factors. We combined the LR de novo assembled transcriptome with paired-end SR data and a published single-end SR transcriptome to generate an improved genome annotation. The annotation was produced with BRAKER2 and refined based on functional annotation, repetitive content, and transcriptome alignment. This high-quality genome annotation has 51,419 unique gene models derived from 322,631 initial predictions. Overall, our informatics approach provides a new reference Douglas-fir transcriptome assembly and genome annotation with considerably improved completeness and functional annotation.
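
To put the counts reported in the abstract in proportion, the short sketch below computes two ratios from them: the share of initial gene predictions retained in the final annotation, and the share of protein-coding transcripts predicted to be transcription factors. The four counts are quoted from the abstract; the variable names and the `share` helper are illustrative assumptions, not part of the authors' pipeline.

```python
# Counts quoted from the abstract; all names here are illustrative.
INITIAL_PREDICTIONS = 322_631         # initial gene predictions
FINAL_GENE_MODELS = 51_419            # unique gene models after refinement
PROTEIN_CODING_TRANSCRIPTS = 12_778   # unique protein-coding transcripts
PUTATIVE_TFS = 2_016                  # putative transcription factors


def share(part: int, whole: int) -> float:
    """Return part / whole as a fraction, rejecting a non-positive whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


retained = share(FINAL_GENE_MODELS, INITIAL_PREDICTIONS)
tf_share = share(PUTATIVE_TFS, PROTEIN_CODING_TRANSCRIPTS)

print(f"Gene models retained after refinement: {retained:.1%}")
print(f"Protein-coding transcripts predicted as TFs: {tf_share:.1%}")
```

Both ratios come out at roughly 16%, so about one in six initial predictions survives refinement.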

Keywords: Pseudotsuga menziesii var. glauca; Pseudotsuga menziesii var. menziesii; de novo assembly; NovaSeq; PacBio Iso-Seq; coastal Douglas-fir; full-length isoform; functional annotation; genome annotation; interior Douglas-fir; long noncoding RNA; reference transcriptome; transcription factors.

Conflict of interest statement

Conflicts of interest: None declared.

Figures

Fig. 1. Comparison of quality of de novo and reference genome-guided assembly of Douglas-fir LR-generated transcriptome. a) Length versus number of unique transcripts and b) transcriptome completeness score. c) Number and percentage of unique transcripts with functional annotation.
Fig. 2. Summary of gene family and gene ontology assignments in de novo assembled Douglas-fir LR transcriptome. Top ten a,b) taxonomic groups versus unique transcripts, c) GO biological processes, and d) GO molecular function terms versus number of GO terms assigned. White and gray bars represent data from de novo and genome-guided transcriptome assembly, respectively.
Fig. 3. Trait values of transcripts predicted as lncRNAs in Douglas-fir. a) Mean transcript length, b) ORF length, c) GC content, d) Fickett testscore, e) Hexamer score, and f) lncRNA score of lncRNAs and all other assembled transcripts including protein-coding transcripts.
Fig. 4. Genome annotation processes and BUSCO completeness. (a–c) Flow chart summarizes the steps taken to obtain genome annotations (b) Annotation v2 (pre-filter) and Annotation v2 and (c) Annotation v1. (d) BUSCO scores for genome assembly* used in genome annotation steps and de novo assembled transcriptome§ are also shown.
Fig. 5. Genome annotation evaluation. a) Gene and intron length distribution across genome annotation approaches, and the transcriptome alignment (de novo assembled prior to alignment). The log-scaled values for gene length and intron length reflect improvements in contiguity with the addition of Iso-Seq data. b) Reciprocal BLAST-style analysis was conducted at 2 coverage values for the total set of genes produced from each method. The 50% coverage of the target/query and 80% coverage of target and query are shown. The numerical value at the end of each bar represents the percentage of total sequences that were functionally annotated at that coverage value.
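
The two coverage settings in the Fig. 5b caption can be read as a lenient filter (at least 50% coverage of either the target or the query) and a strict filter (at least 80% coverage of both). The sketch below illustrates that reading only. The either/both interpretation, the function names, and the example alignment lengths are all assumptions, and the code does not reproduce the authors' actual tooling.

```python
def coverage(aligned_len: int, seq_len: int) -> float:
    """Fraction of a sequence covered by an alignment, capped at 1.0."""
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    return min(aligned_len, seq_len) / seq_len


def passes_coverage(
    aligned_query: int,
    query_len: int,
    aligned_target: int,
    target_len: int,
    threshold: float,
    require_both: bool,
) -> bool:
    """Apply a coverage threshold to one alignment hit.

    require_both=False: hit passes if query OR target coverage meets the threshold.
    require_both=True:  hit passes only if query AND target coverage both meet it.
    """
    q_cov = coverage(aligned_query, query_len)
    t_cov = coverage(aligned_target, target_len)
    if require_both:
        return q_cov >= threshold and t_cov >= threshold
    return q_cov >= threshold or t_cov >= threshold


# Hypothetical hit: 450 aligned positions on a 500-residue query (90% covered)
# against a 1000-residue target (45% covered).
lenient = passes_coverage(450, 500, 450, 1000, threshold=0.5, require_both=False)
strict = passes_coverage(450, 500, 450, 1000, threshold=0.8, require_both=True)
print(lenient, strict)
```

Under this reading, a hit that covers most of a short query but less than half of a longer target passes the lenient filter and fails the strict one.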
