Plant Physiol. 2007 Dec;145(4):1311-22.
doi: 10.1104/pp.107.104513. Epub 2007 Oct 19.

Identification and characterization of lineage-specific genes within the Poaceae

Matthew A Campbell et al. Plant Physiol. 2007 Dec.

Abstract

Using the rice (Oryza sativa) sp. japonica genome annotation, along with genomic sequence and clustered transcript assemblies from 184 species in the plant kingdom, we have identified a set of 861 rice genes that are evolutionarily conserved among six diverse species within the Poaceae yet lack significant sequence similarity with plant species outside the Poaceae. This set of evolutionarily conserved and lineage-specific rice genes is termed conserved Poaceae-specific genes (CPSGs) to reflect the presence of significant sequence similarity across three separate Poaceae subfamilies. The vast majority of rice CPSGs (86.6%) encode proteins with no putative function or functionally characterized protein domain. For the remaining CPSGs, 8.8% encode an F-box domain-containing protein and 4.5% encode a protein with a putative function. On average, the CPSGs have fewer exons, shorter total gene length, and elevated GC content when compared with genes annotated as either transposable elements (TEs) or those genes having significant sequence similarity in a species outside the Poaceae. Multiple sequence alignments of the CPSGs with sequences from other Poaceae species show conservation across a putative domain, a novel domain, or the entire coding length of the protein. At the genome level, syntenic alignments between sorghum (Sorghum bicolor) and 103 of the 861 rice CPSGs (12.0%) could be made, demonstrating an additional level of conservation for this set of genes within the Poaceae. The extensive sequence similarity in evolutionarily distinct species within the Poaceae family and an additional screen for TE-related structural characteristics and sequence discount these CPSGs as being misannotated TEs. Collectively, these data confirm that we have identified a specific set of genes that are highly conserved within, as well as specific to, the Poaceae.

Figures

Figure 1.
Diagram illustrating the strategy to identify the CPSG, NH, and SH gene sets. Shown is the filtering strategy employed to identify the NH and SH sets of genes from the total set of non-TE rice genes using genomic sequence, genome annotations, and TAs. The dotted boxes reflect the identification of the SH set. The TAs are clustered into six phylogenetic groupings: (1) eudicots; (2) basal Magnoliophyta; (3) conifers; (4) non-Poaceae monocots; (5) Poaceae; and (6) other plants. The rice TAs were excluded from this analysis, so the total number of TAs represents 184 plant species.
Figure 2.
Histograms showing the GC content for the CPSG, SH, and TE sets broken into bins of 10%. A, The histograms show the whole gene (including untranslated regions, exons, and introns) GC content. B, The histograms show the coding sequence GC content.
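The binning described for Figure 2 can be illustrated with a minimal sketch. This is not the authors' code; the function names (gc_percent, bin_label, gc_histogram), the choice to ignore ambiguous bases, and the convention of half-open bins with 100% folded into the top bin are all assumptions made for illustration.

```python
from collections import Counter


def gc_percent(seq: str) -> float:
    """Return the percentage of G and C among unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols such as N are ignored entirely.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * gc / total


def bin_label(pct: float, width: int = 10) -> str:
    """Map a percentage to a bin label such as '40-50%'.

    Assumption: bins are half-open [low, low + width), except that a value
    of exactly 100 is placed in the top bin. width should divide 100.
    """
    low = min(int(pct // width) * width, 100 - width)
    return f"{low}-{low + width}%"


def gc_histogram(seqs, width: int = 10) -> dict:
    """Count how many sequences fall into each GC-content bin."""
    return dict(Counter(bin_label(gc_percent(s), width) for s in seqs))
```

Passing whole-gene sequences would correspond to panel A and coding sequences to panel B; the input sequences themselves would have to come from the rice annotation, which is not reproduced here.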
Figure 3.
MSAs of the CPSGs with translated ORFs from TAs or cDNAs from other Poaceae species. A, LOC_Os01g01970. B, LOC_Os01g37670.
Figure 4.
Synteny between regions encoding rice CPSGs and sorghum. A, LOC_Os03g01740. B, LOC_Os02g37610. C, LOC_Os06g02410.

