PeerJ. 2022 Jul 5;10:e13607.
doi: 10.7717/peerj.13607. eCollection 2022.

Telomere-to-telomere genome assembly of Phaeodactylum tricornutum


Daniel J Giguere et al. PeerJ. 2022.

Abstract

Phaeodactylum tricornutum is a marine diatom with a growing genetic toolbox, and it is used in many synthetic biology applications. Although most of the genome has been assembled, the currently available genome assembly is not a complete telomere-to-telomere assembly. Here, we used Oxford Nanopore long reads to build a telomere-to-telomere genome assembly for Phaeodactylum tricornutum. We developed a graph-based approach to extract all unique telomeres and used this information to manually correct assembly errors. In total, we found 25 nuclear chromosomes that together contain all previously assembled fragments, in addition to the chloroplast and mitochondrial genomes. We found that chromosome 19 has a filtered long-read coverage and a quality estimate that suggest significantly less haplotype sequence variation than in the other chromosomes. This work improves upon the previous genome assembly and provides new opportunities for genetic engineering of this species, including the creation of designer synthetic chromosomes.

Keywords: Genome assembly; High-molecular weight DNA; Methylation; Nanopore sequencing; Phaeodactylum tricornutum; Telomere-to-telomere; Transposons.


Conflict of interest statement

Martin Flatley is an employee of Suncor Energy.

Figures

Figure 1. Workflow for telomere-to-telomere genome assembly.
Telomere-containing nanopore reads longer than 50 kb are extracted and mapped in all-vs-all mode using minimap2. The resulting alignments are filtered to retain those with at least 95% query coverage, and a network graph is built with igraph, using read names as vertices and alignments between reads as edges. Each resulting cluster represents one end of a chromosome. Ultra-long read coverage is then plotted chromosome by chromosome. If an assembled chromosome is missing a telomere or has an assembly error revealed by a lack of overlapping read coverage, the longest read from each telomere cluster is mapped against the chromosome, and the resulting telomere sequence is used to manually correct the assembly and extend it to the telomere using an overlap-layout-consensus approach.
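The clustering step of this workflow can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: it reads minimap2's PAF output as plain text, assumes the input reads have already been screened for telomere repeats, treats each cluster as a connected component of the alignment graph, and swaps igraph for a small hand-written union-find so the example has no dependencies. The names `telomere_clusters`, `min_read_len`, and `min_query_cov` are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set structure, used here in place of igraph."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def telomere_clusters(paf_lines, read_lengths, min_read_len=50_000, min_query_cov=0.95):
    """Group telomere-containing reads into clusters (assumed: one per chromosome end).

    paf_lines: all-vs-all minimap2 alignments in PAF format.
    read_lengths: {read_name: length} for reads already screened for telomere repeats.
    Returns a sorted list of (longest_read, sorted_member_names) tuples.
    """
    kept = {n for n, length in read_lengths.items() if length > min_read_len}
    uf = UnionFind()
    for name in kept:
        uf.find(name)  # every kept read is a vertex, even if it has no edges
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        query, qlen, qstart, qend, target = f[0], int(f[1]), int(f[2]), int(f[3]), f[5]
        if query == target or query not in kept or target not in kept:
            continue  # skip self-hits and reads that failed the length filter
        if (qend - qstart) / qlen >= min_query_cov:
            uf.union(query, target)  # the alignment becomes a graph edge
    groups = defaultdict(list)
    for name in kept:
        groups[uf.find(name)].append(name)
    # Each connected component is one cluster; its longest read is the representative.
    return sorted(
        (max(members, key=read_lengths.__getitem__), sorted(members))
        for members in groups.values()
    )
```

Keeping the longest read of each component mirrors the caption's use of "the longest read from each telomere cluster" when a chromosome needs manual correction. With igraph installed, the union-find could be replaced by building a `Graph` from the filtered edges and taking its connected components.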
Figure 2. (A) Filtered long-read coverage and comparison to the previous assembly. Reads longer than 20 kb were mapped against the assembly and filtered (minimum 20,000-base alignment and 50% query coverage), and genome coverage was calculated in 50 kb windows using mosdepth. The colours and ranges (bottom right) give the coverage depth calculated for each 50 kb window. Newly proposed chromosome names (assigned by length) are shown on the left. Scaffolds from the previous genome assembly (ASM15095v2) are overlaid as grey bars; they were aligned using minimap2 in asm5 mode and filtered to retain alignments of at least 10 kb. Numbers above the grey bars give the previous scaffold number, with S denoting small “bottom drawer” scaffolds. Horizontal “T” bars on each end indicate the presence of telomere repeats. (B) Visualization of proposed chromosome 3 with alignments to previous chromosomes. Dark grey regions indicate overlap.
Coloured arrows on the right indicate the minimum overlapping read path (orange = negative strand, blue = positive strand); black arrows on the left show ultra-long reads that completely span regions the previous assembly could not assemble through.
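The coverage filter in panel (A) can be approximated as follows. This is a hypothetical stand-in for the mosdepth step, not a reimplementation of mosdepth: it reads PAF records instead of BAM, interprets the 20,000-base alignment threshold as PAF column 11 (alignment block length), and reports mean depth per window as aligned bases divided by window length. The function name and parameters are illustrative assumptions.

```python
def filtered_window_depth(paf_lines, chrom, chrom_len, window=50_000,
                          min_read_len=20_000, min_aln_len=20_000, min_query_cov=0.5):
    """Mean depth of filtered alignments in fixed windows along one chromosome."""
    n_windows = (chrom_len + window - 1) // window
    covered = [0] * n_windows  # aligned bases falling in each window
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        qlen, qstart, qend = int(f[1]), int(f[2]), int(f[3])
        target, tstart, tend, aln_len = f[5], int(f[7]), int(f[8]), int(f[10])
        if target != chrom or qlen <= min_read_len:
            continue  # other chromosome, or read not longer than 20 kb
        if aln_len < min_aln_len or (qend - qstart) / qlen < min_query_cov:
            continue  # alignment too short or covers too little of the read
        # Distribute the aligned target interval [tstart, tend) over the windows it touches.
        w = tstart // window
        while w < n_windows and w * window < tend:
            start = max(tstart, w * window)
            end = min(tend, (w + 1) * window)
            covered[w] += end - start
            w += 1
    # The last window may be shorter than `window`.
    return [covered[w] / min(window, chrom_len - w * window) for w in range(n_windows)]
```

Requiring both an absolute alignment length and a query-coverage fraction discards short, partial hits (for example within repeats) that would otherwise inflate depth in some windows.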
Figure 3. Summary of genomic features for chromosome 3.
(A) The density of LTR retrotransposons as predicted by the EDTA pipeline. (B) The proportion of reads called as methylated at each position along the chromosome. (C) Scaffolds from the previous assembly are overlaid as grey bars, with dark grey representing overlapping regions. (D) Filtered long-read coverage (minimum 20 kb length and 70% query coverage). (E) GC content calculated and plotted in 100-base windows. An overlapping read tiling path with a minimum overlap of 30 kb is shown, with orange indicating reads mapping to the negative strand and blue indicating reads mapping to the positive strand. The region highlighted in red is the window with the lowest GC content.
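Two of these tracks lend themselves to short sketches: GC content in fixed 100-base windows with the lowest-GC window picked out, and a tiling path through reads that overlap by at least 30 kb. Both functions are hypothetical illustrations. In particular, the caption does not say how the tiling path was chosen, so the greedy rule used here (extend with the read that reaches farthest while still overlapping the current path end by `min_overlap`) is an assumption, as is requiring the first read to start at position 0. Alignments are given as `(read_name, start, end, strand)` tuples in chromosome coordinates.

```python
def gc_windows(seq, window=100):
    """Fraction of G/C bases in consecutive windows (the last window may be shorter)."""
    out = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        out.append(sum(1 for base in chunk if base in "GC") / len(chunk))
    return out


def lowest_gc_window(seq, window=100):
    """Return (start, end, gc_fraction) of the first window with the lowest GC content."""
    gcs = gc_windows(seq, window)
    i = min(range(len(gcs)), key=gcs.__getitem__)
    return i * window, min((i + 1) * window, len(seq)), gcs[i]


def greedy_tiling_path(alignments, chrom_len, min_overlap=30_000):
    """Greedy overlapping read path across one chromosome.

    Returns (path, complete): the chosen alignments in order, and whether the
    path reaches chrom_len without a gap.
    """
    path = []
    cur_end = None
    while cur_end is None or cur_end < chrom_len:
        if cur_end is None:
            # Simplifying assumption: the first read must start at position 0.
            cands = [a for a in alignments if a[1] == 0]
        else:
            # Candidates overlap the current end by >= min_overlap and extend past it.
            cands = [a for a in alignments
                     if a[1] <= cur_end - min_overlap and a[2] > cur_end]
        if not cands:
            return path, False  # gap: no read extends the path with enough overlap
        best = max(cands, key=lambda a: a[2])  # farthest-reaching read
        path.append(best)
        cur_end = best[2]
    return path, True
```

Choosing the farthest-reaching read at each step is the standard greedy interval-cover heuristic and keeps the number of reads in the path small; the strand field is carried through so the path can be coloured by strand as in the figure.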

