Comparative Study
Plant Physiol. 2005 Feb;137(2):500-13.
doi: 10.1104/pp.104.052829. Epub 2005 Jan 21.

Comparative genomics of the pennate diatom Phaeodactylum tricornutum

Anton Montsant et al. Plant Physiol. 2005 Feb.

Abstract

Diatoms are among the most important constituents of phytoplankton communities in aquatic environments, yet large-scale diatom-sequencing projects have been undertaken only recently. With the genome of the centric species Thalassiosira pseudonana available since mid-2004, accumulating sequence information for a pennate model species appears a natural subsequent aim. We have generated over 12,000 expressed sequence tags (ESTs) from the pennate diatom Phaeodactylum tricornutum, and upon assembly into a nonredundant set, 5,108 sequences were obtained. Significant similarity (E < 1E-04) to entries in the GenBank nonredundant protein database, the COG profile database, and the Pfam protein domains database was detected in 45.0%, 21.5%, and 37.1% of the nonredundant collection of sequences, respectively. This information was employed to functionally annotate the P. tricornutum nonredundant set and to create an internet-accessible, queryable diatom EST database. The nonredundant collection was then compared to the putative complete proteomes of the green alga Chlamydomonas reinhardtii, the red alga Cyanidioschyzon merolae, and the centric diatom T. pseudonana. A number of intriguing differences were identified between the pennate and the centric diatoms concerning activities of relevance for general cell metabolism, e.g. genes involved in carbon-concentrating mechanisms, cytosolic acetyl-Coenzyme A production, and fructose-1,6-bisphosphate metabolism. Finally, codon usage and utilization of C and G relative to gene expression (as measured by EST redundancy) were studied, and preferences for utilization of C and CpG doublets were noted among the P. tricornutum EST coding sequences.

Figures

Figure 1.
Sorting of the P. tricornutum (Pt) nonredundant EST set into categories according to the presence of putative orthologs in any of three microalgal genomes or combinations of them. Two E-value thresholds were applied, one ensuring a low error rate when classifying sequences as absent (E < 1E-04; Fig. 1A) and one minimizing the error when defining sequences as present (E < 1E-30; Fig. 1B). See Tables II to IV for recognizable proteins within the subsets of sequences encoding peptides similar to C. merolae (Cm) and/or C. reinhardtii (Cr) predicted proteins but with no similarity to T. pseudonana (Tp) predicted proteins, and Supplemental Table II for sequences with similarities in T. pseudonana but not in C. merolae or C. reinhardtii. The predicted proteomes of Cm, Cr, and Tp were downloaded from their respective genome browsers. Total numbers of predicted proteins (NRS sequences for Pt) are indicated in parentheses.
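The sorting described in this caption can be illustrated with a minimal sketch. It assumes that a best-hit E-value per genome has already been obtained for each sequence (for example from a similarity search); the function name `classify_by_hits`, the input layout, and the toy values are hypothetical and are not taken from the paper.

```python
from collections import Counter


def classify_by_hits(best_evalues, threshold):
    """Count sequences by the set of genomes in which they have a hit.

    best_evalues: dict mapping a sequence ID to a dict of
        genome label -> best E-value (None or a missing key means no hit).
    threshold: a hit counts as present only if its E-value is below this.
    Returns a Counter keyed by frozenset of genome labels.
    """
    counts = Counter()
    for hits in best_evalues.values():
        present = frozenset(
            genome for genome, e in hits.items()
            if e is not None and e < threshold
        )
        counts[present] += 1
    return counts


# Hypothetical toy data, for illustration only (not values from the study).
toy = {
    "est1": {"Tp": 1e-50, "Cm": 1e-35, "Cr": 1e-10},
    "est2": {"Tp": 1e-6, "Cm": None, "Cr": None},
    "est3": {"Tp": None, "Cm": 1e-40, "Cr": 1e-33},
    "est4": {},
}

lenient = classify_by_hits(toy, 1e-4)   # threshold as in Fig. 1A
strict = classify_by_hits(toy, 1e-30)   # threshold as in Fig. 1B

for label, result in (("E < 1E-04", lenient), ("E < 1E-30", strict)):
    for genomes, n in result.items():
        print(label, sorted(genomes) or ["no hit"], n)
```

With the lenient threshold a sequence is rarely called absent by mistake, while the strict threshold makes a call of "present" more trustworthy; running both on the same input shows how category sizes shift between the two panels.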
Figure 2.
Similarity of P. tricornutum (Pt) NRS sequences to related sequences in C. merolae (Cm) and C. reinhardtii (Cr). Average score of 108 pairwise alignments between P. tricornutum translated ESTs and green or red algal proteins that are most highly conserved (>40% ID over at least 100 amino acids) in the 4 species represented in Figure 1. Error bars indicate SE.
Figure 3.
Phylogenetic analysis of putative β-type carbonic anhydrases (CAs) from P. tricornutum. A neighbor-joining tree is shown of putative or described β-type CAs from several lineages within the eukarya and eubacteria, including the two P. tricornutum β-type CA-like NRS sequences (in bold). The sequences were selected to represent 5 out of the 6 major clades of β-type CAs that contain photosynthetic or eukaryotic orthologs (A, B, C, monocots, and dicots; Smith et al., 1999) and trimmed to a conserved core of 191 amino acids. The Mycobacterium tuberculosis sequence, a distant ortholog of the sixth clade D, was chosen to root the tree. Bootstrap values above 70% (of 1,000 replicates) are shown. The GenBank GI sequence identifiers of the proteins used are shown following the species name. Members of the Rhodophyta (Rp), Cyanobacteria (Cb), and Viridiplantae (Vp) are indicated. Scale bar = 0.1 substitutions/site.
Figure 4.
Phylogenetic analysis of putative ATP:citrate lyases (ACLs) from P. tricornutum. The figure shows a neighbor-joining tree of putative or described ACLs representing the major lineages in which ACL-like genes are known, including the most highly conserved ACL-like sequence in P. tricornutum (in bold). The sequences were trimmed to a conserved core of 185 amino acids. The GenBank GI sequence identifiers of the proteins used are shown following the species name. The C. reinhardtii and C. merolae sequences were obtained from their respective genome browsers rather than from GenBank and their predicted gene identification numbers are given. Bootstrap values above 70% (of 1,000 replicates) are shown. Members of the Animalia (An), Fungi (Fu), Glaucocystophyta (Gp), Rhodophyta (Rp), and Viridiplantae (Vp) are indicated. The Chlorobium limicola ACL is the sole prokaryotic ortholog known to date and was used to root the tree. The arrow indicates the time of fusion of α- and β-ACLs into a single gene as proposed by Fatland et al. (2002). Scale bar = 0.1 substitutions/site.
Figure 5.
Correlation of expression and %C3 content of P. tricornutum transcripts. P. tricornutum NRS sequences with soundly assigned frames were classified by increasing %C3, divided into 3 groups with equal numbers of sequences, and their average redundancy within the redundant EST collection was calculated. Bars indicate the SE.
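As a rough illustration of the procedure in this caption, the sketch below computes %C3, taken here to mean the percentage of codons with C at the third position, then ranks sequences by it, splits them into three groups, and averages an EST count per group. The function names, the handling of a remainder when the number of sequences is not divisible by three, and the toy sequences are assumptions made for illustration; the paper's actual pipeline is not described in this excerpt.

```python
def percent_c3(cds):
    """Percentage of codons with C in the third position.

    cds: in-frame coding sequence; trailing bases that do not
    complete a codon are ignored.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return 0.0
    c_third = sum(1 for codon in codons if codon[2].upper() == "C")
    return 100.0 * c_third / len(codons)


def mean_redundancy_by_c3_tertile(records):
    """Mean EST count for low, middle, and high %C3 groups.

    records: list of (cds, est_count) pairs.
    Sequences are ranked by increasing %C3 and split into three groups;
    any remainder goes to the last group (an assumption of this sketch).
    """
    if len(records) < 3:
        raise ValueError("need at least three sequences")
    ranked = sorted(records, key=lambda r: percent_c3(r[0]))
    size = len(ranked) // 3
    groups = [ranked[:size], ranked[size:2 * size], ranked[2 * size:]]
    return [sum(count for _, count in g) / len(g) for g in groups]


# Hypothetical toy records: (coding sequence, number of ESTs in the cluster).
toy_records = [
    ("ATGGCTGCA", 1),
    ("ATGGCCGCA", 1),
    ("ATGGCCGCT", 2),
    ("ATGGCCGCC", 3),
    ("ATCGCCGCC", 5),
    ("ATCGCCGAC", 7),
]

tertile_means = mean_redundancy_by_c3_tertile(toy_records)
print(tertile_means)
```

A rising mean from the first to the third group would correspond to the positive relationship between %C3 and expression that the figure reports; the toy values were chosen only to make that pattern visible.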

