G3 (Bethesda). 2024 May 7;14(5):jkae061.

doi: 10.1093/g3journal/jkae061.

A genome sequence for the threatened whitebark pine

David B Neale¹,², Aleksey V Zimin³, Amy Meltzer³, Akriti Bhattarai⁴, Maurice Amee⁴, Laura Figueroa Corona⁵, Brian J Allen¹,⁶, Daniela Puiu³, Jessica Wright⁷, Amanda R De La Torre⁵, Patrick E McGuire¹, Winston Timp³, Steven L Salzberg³,⁸, Jill L Wegrzyn⁴,⁹

Affiliations

¹ Department of Plant Sciences, University of California, Davis, CA 95616, USA.
² Whitebark Pine Ecosystem Foundation, Missoula, MT 59808, USA.
³ Department of Biomedical Engineering and Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA.
⁴ Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA.
⁵ School of Forestry, Northern Arizona University, Flagstaff, AZ 86011, USA.
⁶ University of California Cooperative Extension, Central Sierra, Jackson, CA 95642, USA.
⁷ USDA Forest Service, Pacific Southwest Research Station, Davis, CA 95618, USA.
⁸ Departments of Computer Science and Biostatistics, Johns Hopkins University, Baltimore, MD 21218, USA.
⁹ Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA.

PMID: 38526344
PMCID: PMC11075562
DOI: 10.1093/g3journal/jkae061

David B Neale et al. G3 (Bethesda). 2024.
Erratum in

Correction to: A genome sequence for the threatened whitebark pine.
[No authors listed]. G3 (Bethesda). 2024 Jun 5;14(6):jkae085. doi: 10.1093/g3journal/jkae085. PMID: 38683110. Free PMC article. No abstract available.

Abstract

Whitebark pine (WBP, Pinus albicaulis) is a white pine of subalpine regions in the Western contiguous United States and Canada. WBP has become critically threatened throughout a significant part of its natural range due to mortality from the introduced fungal pathogen white pine blister rust (WPBR, Cronartium ribicola) and additional threats from mountain pine beetle (Dendroctonus ponderosae), wildfire, and maladaptation due to changing climate. Vast acreages of WBP have suffered nearly complete mortality. Genomic technologies can contribute to a faster, more cost-effective approach to the traditional practices of identifying disease-resistant, climate-adapted seed sources for restoration. With deep-coverage Illumina short reads of haploid megagametophyte tissue and Oxford Nanopore long reads of diploid needle tissue, followed by a hybrid, multistep assembly approach, we produced a final assembly containing 27.6 Gb of sequence in 92,740 contigs (N50 537,007 bp) and 34,716 scaffolds (N50 2.0 Gb). Approximately 87.2% (24.0 Gb) of total sequence was placed on the 12 WBP chromosomes. Annotation yielded 25,362 protein-coding genes, and over 77% of the genome was characterized as repeats. WBP has demonstrated the greatest variation in resistance to WPBR among the North American white pines. Candidate genes for quantitative resistance include disease resistance genes known as nucleotide-binding leucine-rich repeat receptors (NLRs). A combination of protein domain alignments and direct genome scanning was employed to fully describe the 3 subclasses of NLRs. Our high-quality reference sequence and annotation provide a marked improvement in NLR identification compared to previous assessments that leveraged de novo-assembled transcriptomes.
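The contig and scaffold N50 values quoted in the abstract are standard contiguity statistics: the N50 is the length L such that sequences of length L or longer together hold at least half of the assembled bases. The following is a minimal sketch of that calculation, not the authors' pipeline. The function name `n50` and the toy lengths are assumptions made for illustration and are not whitebark pine data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths (illustrative only, not WBP data).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # → 70
```

In the toy example the total is 300 bases, and the two longest contigs (80 + 70 = 150) already reach half of it, so the N50 is 70. The same logic applies whether the input is contig lengths or scaffold lengths.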

Keywords: Pinus albicaulis; annotation; conifer; genome assembly; gymnosperm; whitebark pine.


Conflict of interest statement

Conflicts of interest: Any use of product names is for informational purposes only and does not imply endorsement by the US Government. The findings and conclusions in this publication are those of the authors and should not be construed to represent any official USDA or US Government determination or policy.

Figures

**Fig. 1.**
Flow chart for sequencing and assembly steps for the whitebark pine genome. The center row presents the sequence of activities (in boldface italic type) and software tools (underlined). The top and bottom rows describe the starting tissues, sequencing platforms, and sequence read and linkage map inputs, and the thin arrows indicate where in the assembly process these inputs entered. The intermediate whitebark pine assembly (v0.9) emerges at the second step in the middle row, while the final assembly (v1.0) emerges at the end step of the middle row. WBP, whitebark pine. Photo credits: Sugar pine inset photograph by Mitch Barre via Wikimedia under Creative Commons Attribution-Share Alike 2.0 Generic license; Whitebark pine needles inset photograph by co-author Patrick McGuire.

**Fig. 2.**
Flow chart for annotation steps. Oval rectangles present the activities (in boldface italic type) and the software tools (underlined). Protein coding annotations v0.9 and v1.0 utilized the same input RNA libraries and alignments via HiSAT2. The first version of the annotation (v0.9) relied primarily on StringTie2 to resolve transcripts and incorporated additional models from high-quality NLRs curated from an independent BRAKER2 run. The second version of the annotation (v1.0) was conducted with EASEL, which integrates direct evidence-based evaluations and high-quality ab initio predictions. Both annotations were functionally annotated with EnTAP and evaluated with benchmarks generated by BUSCO and AGAT.

**Fig. 3.**
Alignment of the sugar pine linkage map markers to the whitebark pine super-scaffolds. The individual chromosome plots are produced by the ALLMAPS software. The vertical bars in the middle of each of the 12 panels represent the chromosomes. The individual scaffolds of a chromosome are indicated in white or gray shading within those vertical bars. The 2 linkage maps are shown alongside each chromosome representation with marker alignments indicated with fine lines from the central chromosome representation to the linkage maps.

**Fig. 4.**
Results of NLR annotation methods. a) Within the genome annotation, complete NLRs identified by each method and annotations with support from multiple methods. In each cluster, the upper-left circle (yellow) represents NLRs identified only using InterProScan; the upper-right circle (coral/red) represents NLRs identified using only RGAugury; and the lower circle (pink) represents NLRs identified using only NLR-Annotator and supported by the genome annotation. b) NLRs identified by input type: a de novo-assembled transcriptome, the genome sequence, and the genome annotation. In each bar, the top rectangle represents the number of complete NLRs; the second-from-the-top rectangle represents the number of NLRs missing an LRR domain; the second-from-the-bottom rectangle represents the number of NLRs missing an N-terminal domain; and the bottom rectangle represents the number of NLRs identified only by the NB-ARC domain. c) Breakdown of total classified NLRs in the genome annotation with the addition of genes recovered from BRAKER and their contribution to the NLR classes. From left to right, the bars represent the TNL, CNL, and RNL classes of NLRs. For each class (bar), the top rectangle (gray) represents the number of complete NLRs; the next rectangle down (orange) represents the partial NLRs; and the bottom rectangle (blue, missing from the RNL bar) represents the NLRs recovered from BRAKER.
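The categories in Fig. 4 (TNL, CNL, and RNL classes; complete NLRs versus those missing an LRR or N-terminal domain, or identified only by NB-ARC) can be read as a classification over domain composition. The sketch below illustrates that reading under stated assumptions and does not reproduce how InterProScan, RGAugury, or NLR-Annotator work. It assumes the conventional mapping of N-terminal domains to classes (TIR to TNL, CC to CNL, RPW8 to RNL). The domain labels, the function `classify_nlr`, and the gene names are hypothetical simplifications.

```python
from collections import Counter

# Assumed mapping from N-terminal domain label to NLR class.
N_TERMINAL_TO_CLASS = {"TIR": "TNL", "CC": "CNL", "RPW8": "RNL"}


def classify_nlr(domains):
    """Return (nlr_class, status) for a gene's domain labels.

    Returns None when no NB-ARC domain is present, since the gene is
    then not treated as an NLR in this simplified scheme.
    """
    domains = set(domains)
    if "NB-ARC" not in domains:
        return None
    # Simplification: if several N-terminal domains occur, take the first.
    n_term = [d for d in N_TERMINAL_TO_CLASS if d in domains]
    nlr_class = N_TERMINAL_TO_CLASS[n_term[0]] if n_term else "unclassified"
    has_lrr = "LRR" in domains
    if n_term and has_lrr:
        status = "complete"
    elif n_term:
        status = "missing LRR"
    elif has_lrr:
        status = "missing N-terminal"
    else:
        status = "NB-ARC only"
    return nlr_class, status


# Hypothetical genes and domain calls (illustrative only).
toy_genes = {
    "geneA": ["TIR", "NB-ARC", "LRR"],
    "geneB": ["CC", "NB-ARC"],
    "geneC": ["NB-ARC", "LRR"],
    "geneD": ["NB-ARC"],
    "geneE": ["RPW8", "NB-ARC", "LRR"],
    "geneF": ["LRR"],
}
results = {name: classify_nlr(doms) for name, doms in toy_genes.items()}
# Tally only genes that qualified as NLRs.
tally = Counter(r for r in results.values() if r is not None)
for (nlr_class, status), count in sorted(tally.items()):
    print(nlr_class, status, count)
```

Tallying by class and status in this way yields the kind of stacked counts shown in panels b and c, although the real figure additionally separates genes recovered from BRAKER.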


Update of

A Genome Sequence for the Threatened Whitebark Pine.
Neale DB, Zimin AV, Meltzer A, Bhattarai A, Amee M, Corona LF, Allen BJ, Puiu D, Wright J, Torre AR, McGuire PE, Timp W, Salzberg SL, Wegrzyn JL. bioRxiv [Preprint]. 2023 Nov 17:2023.11.16.567420. doi: 10.1101/2023.11.16.567420. Update in: G3 (Bethesda). 2024 May 7;14(5):jkae061. doi: 10.1093/g3journal/jkae061. PMID: 38014212. Free PMC article. Preprint.

