2005 Mar 22;6:43. doi: 10.1186/1471-2164-6-43.

Protein encoding genes in an ancient plant: analysis of codon usage, retained genes and splice sites in a moss, Physcomitrella patens



Stefan A Rensing et al. BMC Genomics.

Abstract

Background: The moss Physcomitrella patens is an emerging plant model system owing to its high rate of homologous recombination, haploidy, simple body plan, physiological properties and phylogenetic position. Available EST data were clustered and assembled and provided the basis for a genome-wide analysis of protein-encoding genes.

Results: We have clustered and assembled Physcomitrella patens EST and CDS data in order to represent the transcriptome of this non-seed plant. Clustering of the publicly available data and subsequent prediction resulted in a total of 19,081 non-redundant ORFs. Of these putative transcripts, approximately 30% have a homolog in both the rice and the Arabidopsis transcriptomes. More than 130 transcripts are not present in seed plants but can be found in other kingdoms. These potential "retained genes" might have been lost during seed plant evolution. Functional annotation of these genes reveals an unequal distribution among taxonomic groups and intriguing putative functions such as cytotoxicity and nucleic acid repair. Although introns in the moss are larger on average than in the seed plant Arabidopsis thaliana, their position and number are approximately the same. In Arabidopsis, CDSs contain on average 44% G/C, whereas in Physcomitrella the average G/C content is 50%. Interestingly, moss orthologs of Arabidopsis genes show a significant drift of codon fraction usage towards the seed plant. While the averaged codon bias is the same in Physcomitrella and Arabidopsis, the distribution pattern differs, with 15% of moss genes being unbiased. A species-specific, sensitive and selective splice site predictor for Physcomitrella has been developed using a support vector machine and a dataset of 368 donor and acceptor sites. Its prediction accuracy is better than that achieved with tools trained on Arabidopsis data.
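The G/C comparison above (44% in Arabidopsis CDSs versus 50% in Physcomitrella) comes down to counting bases per coding sequence. A minimal sketch, assuming that the reported value is an unweighted mean of per-CDS percentages and that ambiguous bases such as N are excluded from the denominator (the abstract states neither); both function names are illustrative:

```python
def gc_percent(seq):
    """G+C percentage of one sequence; non-ACGT symbols (e.g. N) are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def mean_gc_percent(cds_list):
    """Unweighted mean of per-CDS G/C percentages (assumption: each CDS counts once)."""
    values = [gc_percent(s) for s in cds_list]
    return sum(values) / len(values)
```

Weighting each CDS equally keeps long genes from dominating the average; pooling all bases into one count would be an equally defensible alternative.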

Conclusion: Analysis of the moss transcriptome reveals differences in gene structure, codon usage and splice site usage in comparison with the seed plant Arabidopsis. Putative retained genes exhibit possible functions that might explain the peculiar physiological properties of mosses. Both the transcriptome representation (including a BLAST and retrieval service) and the splice site prediction have been made available at http://www.cosmoss.org, laying the groundwork for assembly and annotation of the Physcomitrella genome, draft shotgun sequences of which will become available in 2005.
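The abstract says only that the Physcomitrella splice site predictor is a support vector machine trained on 368 donor and acceptor sites; it does not describe the input features. As an illustration of one common approach (not necessarily the one used here), every GT dinucleotide can be treated as a candidate donor site and the surrounding window encoded as a one-hot vector for an SVM. The window sizes `left` and `right` and all function names below are hypothetical:

```python
BASES = "ACGT"


def donor_candidates(seq, left=3, right=6):
    """Indices i where seq[i:i+2] == 'GT' and a full window fits around the site."""
    seq = seq.upper()
    return [i for i in range(left, len(seq) - right - 1)
            if seq[i:i + 2] == "GT"]


def one_hot(window):
    """Four indicators per base in A, C, G, T order; any other symbol gives zeros."""
    vec = []
    for base in window.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def donor_features(seq, left=3, right=6):
    """Map each candidate donor position to its one-hot window (left + 2 + right bases)."""
    seq = seq.upper()
    return {i: one_hot(seq[i - left:i + 2 + right])
            for i in donor_candidates(seq, left, right)}
```

The resulting vectors could then be passed to any SVM implementation. Acceptor sites would be handled the same way with AG in place of GT.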


Figures

Figure 1
Comparative BLAST searches between Arabidopsis, rice and moss. Comparative BLAST searches of the Arabidopsis (At, yellow), rice (Os, cyan) and Physcomitrella (Pp, green) transcriptomes. Each search was performed with each set once as query and once as search space (subject). The area of the circles represents the percentage of the query/subject sequence space that yielded filtered hits.
Figure 2
BLAST hits of Physcomitrella protein genes against the taxprot dataset. a) Absolute number of hits against different taxonomic groups. b) Number of non-redundant hits as a percentage of the respective sequence space.
Figure 3
Mapping of Physcomitrella transcripts to the Arabidopsis chromosomes. Mapping of filtered BLAST hits (grey), paralogs (red) and orthologs (green) against the five Arabidopsis chromosomes (left to right / top to bottom). a) Hits per Mbp; error bars: average absolute deviation (AAD); column 6: mean values. b) Graphical representation using a finer granularity (100 kbp), each vertical step represents one hit.
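Figure 3 reports hit density per Mbp with the average absolute deviation (AAD) as error bars, and Figure 6 uses a margin of 2× AAD. A minimal sketch of these quantities, assuming AAD is the mean absolute deviation around the arithmetic mean; `outside_margin` is a hypothetical helper showing one plausible reading of the 2× AAD margin, not the authors' stated significance test:

```python
def hits_per_mbp(n_hits, length_bp):
    """Density of mapped hits per megabase pair of chromosome sequence."""
    return n_hits / (length_bp / 1_000_000)


def average_absolute_deviation(values):
    """Mean absolute deviation of the values around their arithmetic mean."""
    if not values:
        raise ValueError("need at least one value")
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


def outside_margin(value, centre, aad, k=2.0):
    """True if value lies more than k * AAD away from centre (hypothetical criterion)."""
    return abs(value - centre) > k * aad
```

Compared with the standard deviation, the AAD is less sensitive to a few outlying windows, which suits uneven per-chromosome hit counts.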
Figure 4
Retained genes in moss: taxonomic distribution and functional categories. a) Physcomitrella transcripts which have their best BLAST hit not among plants, divided by taxonomic category, further subdivided into specific hits (unique to a single taxonomic group – yellow) and those that could be assigned a putative function by means of homology searches (green). b) Distribution of functional categories among those taxonomic groups that yielded unique hits.
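Figure 4 defines candidate retained genes as transcripts whose best BLAST hit is not among plants. A minimal sketch of that selection step, assuming hits have already been filtered and labelled with a taxonomic group. The `Hit` record, the group labels and the choice of best hit by bitscore are illustrative assumptions, not the authors' pipeline:

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    query: str      # moss transcript identifier
    subject: str    # database sequence identifier
    group: str      # hypothetical taxonomic label, e.g. "plants", "fungi"
    bitscore: float


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep the highest-scoring hit for each query."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return best


def retained_gene_candidates(hits, plant_groups=frozenset({"plants"})):
    """Map each query whose best hit lies outside the plant groups to that hit's group."""
    return {q: h.group for q, h in best_hits(hits).items()
            if h.group not in plant_groups}


hits = [
    Hit("pp1", "s1", "plants", 200.0),
    Hit("pp1", "s2", "fungi", 150.0),
    Hit("pp2", "s3", "metazoa", 300.0),
    Hit("pp2", "s4", "plants", 100.0),
]
candidates = retained_gene_candidates(hits)
per_group = Counter(candidates.values())  # tally per taxonomic group, as in panel a
```

With this toy input only `pp2` qualifies, because its best hit (bitscore 300) comes from metazoa.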
Figure 5
Splice site sequence logos and efficiency of splice site prediction. a) Sequence logos of Physcomitrella donor and acceptor sites. b) Prediction performance of Netplantgene and svmsplice for Physcomitrella splice sites. TP = true positive, FN = false negative, FP = false positive, measured on the left-hand (%) axis. Recall (sensitivity) = TP/(TP+FN) and precision = TP/(TP+FP), measured on the right-hand axis.
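The recall and precision definitions in Figure 5 can be computed directly from predicted and annotated splice site positions. A minimal sketch, assuming sites are compared by exact position; the function names are illustrative:

```python
def confusion_counts(predicted, annotated):
    """Return (TP, FN, FP) for sets of predicted and annotated site positions."""
    predicted, annotated = set(predicted), set(annotated)
    tp = len(predicted & annotated)
    fn = len(annotated - predicted)  # true sites the predictor missed
    fp = len(predicted - annotated)  # predicted sites that are not real
    return tp, fn, fp


def recall(tp, fn):
    """Sensitivity: fraction of true sites that were predicted."""
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp, fp):
    """Fraction of predicted sites that are true sites."""
    return tp / (tp + fp) if tp + fp else 0.0
```

Reporting both measures matters because a predictor can reach high recall simply by calling many GT/AG dinucleotides, at the cost of precision.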
Figure 6
Trinucleotide frequencies and codon usage. a) The averaged Physcomitrella codon fraction usage, measured as a percentage of the total number of counted codons, is shown as grey diamonds with a margin of 2× average absolute deviation (AAD, error bars), in comparison with Arabidopsis (yellow circles). Significantly deviating codons of the sequence subsets are shown as colored circles: retained genes (blue), paralogs (red) and orthologs (green). b) The effective number of codons (enc) for Physcomitrella (green) and Arabidopsis (yellow) as a range distribution scatter plot (y axis: % of analysed genes) and as averaged values (horizontal bar chart; error bars: standard deviation).
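Figure 6 relies on two codon usage measures. A minimal Python sketch of both follows. It assumes the standard genetic code and Wright's (1990) definition of the effective number of codons, Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where each F is the mean codon homozygosity of the amino acids in a degeneracy class. The caption does not say which ENC implementation or missing-data rules were used, so the function names and the handling of absent classes here are illustrative assumptions:

```python
from collections import Counter
from itertools import product

# Standard genetic code; codons enumerated in TCAG order, first base slowest.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}


def codon_counts(cds):
    """Count in-frame sense codons; stop codons and non-ACGT codons are skipped."""
    cds = cds.upper().replace("U", "T")
    counts = Counter()
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if CODE.get(codon, "*") != "*":
            counts[codon] += 1
    return counts


def codon_fraction_usage(counts):
    """Percentage of each codon among all counted codons."""
    total = sum(counts.values())
    return {c: 100.0 * n / total for c, n in counts.items()}


def effective_number_of_codons(counts):
    """Wright's ENC (20 = maximal bias, 61 = no bias); None if a class lacks data."""
    synonyms = {}
    for codon, aa in CODE.items():
        if aa != "*":
            synonyms.setdefault(aa, []).append(codon)
    f_by_class = {}
    for codons in synonyms.values():
        k = len(codons)
        if k == 1:
            continue  # Met and Trp offer no synonymous choice
        n = sum(counts.get(c, 0) for c in codons)
        if n < 2:
            continue  # homozygosity is undefined with fewer than two codons
        sum_sq = sum((counts.get(c, 0) / n) ** 2 for c in codons)
        f_by_class.setdefault(k, []).append((n * sum_sq - 1) / (n - 1))
    avg = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    if 3 not in avg and 2 in avg and 4 in avg:
        avg[3] = (avg[2] + avg[4]) / 2  # Ile absent: Wright's interpolation
    if any(avg.get(k, 0.0) <= 0.0 for k in (2, 3, 4, 6)):
        return None
    nc = 2 + 9 / avg[2] + 1 / avg[3] + 5 / avg[4] + 3 / avg[6]
    return min(nc, 61.0)
```

In this sketch a gene that always uses the same codon for every amino acid scores 20. Values approaching 61 indicate an unbiased gene, the category into which the abstract places 15% of moss genes.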
