G3 (Bethesda). 2024 May 7;14(5):jkae061.
doi: 10.1093/g3journal/jkae061.

A genome sequence for the threatened whitebark pine

David B Neale et al. G3 (Bethesda). 2024.

Abstract

Whitebark pine (WBP, Pinus albicaulis) is a white pine of subalpine regions in the Western contiguous United States and Canada. WBP has become critically threatened throughout a significant part of its natural range due to mortality from the introduced fungal pathogen white pine blister rust (WPBR, Cronartium ribicola) and additional threats from mountain pine beetle (Dendroctonus ponderosae), wildfire, and maladaptation due to changing climate. Vast acreages of WBP have suffered nearly complete mortality. Genomic technologies can contribute to a faster, more cost-effective approach to the traditional practices of identifying disease-resistant, climate-adapted seed sources for restoration. With deep-coverage Illumina short reads of haploid megagametophyte tissue and Oxford Nanopore long reads of diploid needle tissue, followed by a hybrid, multistep assembly approach, we produced a final assembly containing 27.6 Gb of sequence in 92,740 contigs (N50 537,007 bp) and 34,716 scaffolds (N50 2.0 Gb). Approximately 87.2% (24.0 Gb) of total sequence was placed on the 12 WBP chromosomes. Annotation yielded 25,362 protein-coding genes, and over 77% of the genome was characterized as repeats. WBP has demonstrated the greatest variation in resistance to WPBR among the North American white pines. Candidate genes for quantitative resistance include disease resistance genes known as nucleotide-binding leucine-rich repeat receptors (NLRs). A combination of protein domain alignments and direct genome scanning was employed to fully describe the 3 subclasses of NLRs. Our high-quality reference sequence and annotation provide a marked improvement in NLR identification compared to previous assessments that leveraged de novo-assembled transcriptomes.
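
The contig and scaffold N50 values quoted in the abstract are length-weighted medians: the N50 is the length of the sequence at which the running total, taken over sequences sorted from longest to shortest, first reaches half of the total assembly length. The following is a minimal Python sketch of that calculation, assuming the input is simply a list of sequence lengths in bp. The function name `n50` and the toy lengths are illustrative and are not taken from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Illustrative helper, not part of the published workflow.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that brings the running sum to at least
        # half of the assembly is the N50 sequence.
        if running >= half_total:
            return length


# Toy example: total length 300, so half is 150; 80 + 70 reaches 150.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 70
```

Because scaffolding joins contigs into longer sequences, the same statistic computed on scaffolds is typically much larger than on contigs, which is consistent with the contig N50 of 537,007 bp and the scaffold N50 of 2.0 Gb reported above.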

Keywords: Pinus albicaulis; annotation; conifer; genome assembly; gymnosperm; whitebark pine.

Conflict of interest statement

Any use of product names is for informational purposes only and does not imply endorsement by the US Government. The findings and conclusions in this publication are those of the authors and should not be construed to represent any official USDA or US Government determination or policy.

Figures

Fig. 1.
Flow chart for sequencing and assembly steps for the whitebark pine genome. The center row presents the sequence of activities (in boldface italic type) and software tools (underlined). The top and bottom rows describe the starting tissues, sequencing platforms, and sequence read and linkage map inputs, and the thin arrows indicate where in the assembly process these inputs entered. The intermediate whitebark pine assembly (v0.9) emerges at the second step in the middle row, while the final assembly (v1.0) emerges at the last step of the middle row. WBP, whitebark pine. Photo credits: Sugar pine inset photograph by Mitch Barre via Wikimedia under Creative Commons Attribution-Share Alike 2.0 Generic license; whitebark pine needles inset photograph by co-author Patrick McGuire.
Fig. 2.
Flow chart for annotation steps. Rounded rectangles present the activities (in boldface italic type) and the software tools (underlined). Protein-coding annotations v0.9 and v1.0 used the same input RNA libraries and alignments via HISAT2. The first version of the annotation (v0.9) relied primarily on StringTie2 to resolve transcripts and incorporated additional models from high-quality NLRs curated from an independent BRAKER2 run. The second version of the annotation (v1.0) was produced with EASEL, which integrates direct evidence-based evaluations and high-quality ab initio predictions. Both annotations were functionally annotated with EnTAP and evaluated with benchmarks generated by BUSCO and AGAT.
Fig. 3.
Alignment of the sugar pine linkage map markers to the whitebark pine super-scaffolds. The individual chromosome plots are produced by the ALLMAPS software. The vertical bars in the middle of each of the 12 panels represent the chromosomes. The individual scaffolds of a chromosome are indicated in white or gray shading within those vertical bars. The 2 linkage maps are shown alongside each chromosome representation with marker alignments indicated with fine lines from the central chromosome representation to the linkage maps.
Fig. 4.
Results of NLR annotation methods. a) Within the genome annotation, complete NLRs identified by each method and annotations with support from multiple methods. In each cluster, the upper-left circle (yellow) represents NLRs identified only using InterProScan; the upper-right circle (coral/red) represents NLRs identified using only RGAugury; and the lower circle (pink) represents NLRs identified using only NLR-Annotator and supported by the genome annotation. b) NLRs identified by input type: a de novo-assembled transcriptome, the genome sequence, and the genome annotation. In each bar, the top rectangle represents the number of complete NLRs; the second-from-the-top rectangle represents the number of NLRs missing an LRR domain; the second-from-the-bottom rectangle represents the number of NLRs missing an N-terminal domain; and the bottom rectangle represents the number of NLRs identified only by the NB-ARC domain. c) Breakdown of total classified NLRs in the genome annotation with the addition of genes recovered from BRAKER and their contribution to the NLR classes. From left to right, the bars represent the TNL, CNL, and RNL classes of NLRs. For each class (bar), the top rectangle (gray) represents the number of complete NLRs; the next rectangle down (orange) represents the partial NLRs; and the bottom rectangle (blue, missing from the RNL bar) represents the NLRs recovered from BRAKER.
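
The completeness categories in Fig. 4b and the classes in Fig. 4c can be read as a simple rule on which domains a candidate gene carries. The sketch below is a hedged illustration in Python, not the authors' method. It assumes each candidate has already been reduced to a set of plain domain labels (`"TIR"`, `"CC"`, `"RPW8"`, `"NB-ARC"`, `"LRR"`), which are hypothetical labels chosen for this example. Real output from InterProScan, RGAugury, or NLR-Annotator would first need to be mapped onto them. The class names follow the standard convention that TNL, CNL, and RNL carry a TIR, coiled-coil, or RPW8 N-terminal domain, respectively.

```python
# Hypothetical mapping from N-terminal domain label to NLR class.
N_TERMINAL_TO_CLASS = {"TIR": "TNL", "CC": "CNL", "RPW8": "RNL"}


def classify_nlr(domains):
    """Return (nlr_class, completeness) for a set of domain labels.

    Returns None when no NB-ARC domain is present, since the NB-ARC
    domain is treated here as the minimum evidence for an NLR.
    """
    found = set(domains)
    if "NB-ARC" not in found:
        return None

    # First matching N-terminal domain decides the class, if any.
    n_terminal = [d for d in N_TERMINAL_TO_CLASS if d in found]
    nlr_class = N_TERMINAL_TO_CLASS[n_terminal[0]] if n_terminal else None
    has_lrr = "LRR" in found

    if n_terminal and has_lrr:
        completeness = "complete"
    elif n_terminal:
        completeness = "missing LRR"
    elif has_lrr:
        completeness = "missing N-terminal"
    else:
        completeness = "NB-ARC only"
    return nlr_class, completeness


print(classify_nlr({"TIR", "NB-ARC", "LRR"}))  # ('TNL', 'complete')
print(classify_nlr({"NB-ARC", "LRR"}))         # (None, 'missing N-terminal')
```

A candidate without a recognizable N-terminal domain cannot be assigned to TNL, CNL, or RNL under this rule, which is why the sketch returns `None` for the class in that case.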
