BMC Evol Biol. 2011 Feb 18;11:47.
doi: 10.1186/1471-2148-11-47.

Evolutionary origins of Brassicaceae specific genes in Arabidopsis thaliana

Mark Ta Donoghue et al. BMC Evol Biol.

Abstract

Background: All sequenced genomes contain a proportion of lineage-specific genes, which exhibit no sequence similarity to any genes outside the lineage. Despite their prevalence, the origins and functions of most lineage-specific genes remain largely unknown. As more genomes are sequenced, opportunities for understanding the evolutionary origins and functions of lineage-specific genes are increasing.

Results: This study provides a comprehensive analysis of the origins of lineage-specific genes (LSGs) in Arabidopsis thaliana that are restricted to the Brassicaceae family. Lineage-specific genes within the nuclear (1761 genes) and mitochondrial (28 genes) genomes are identified. The evolutionary origins of two-thirds of the lineage-specific genes within the Arabidopsis thaliana genome are also identified. Almost a quarter of lineage-specific genes originate from non-lineage-specific paralogs, while ~10% of lineage-specific genes are partly derived from DNA exapted from transposable elements (twice the proportion observed for non-lineage-specific genes). Lineage-specific genes are also enriched in genes that have overlapping CDS, which is consistent with such novel genes arising from overprinting. Over half of the subset of 958 lineage-specific genes found only in Arabidopsis thaliana have alignments to intergenic regions in Arabidopsis lyrata, consistent with either de novo origination or differential gene loss and retention, either of which would explain the lineage-specific status of these genes. A smaller number of lineage-specific genes with an incomplete open reading frame across different Arabidopsis thaliana accessions are further identified as accession-specific genes, most likely of recent origin in Arabidopsis thaliana. Putative de novo origination for two of the Arabidopsis thaliana-only genes is identified via additional sequencing across accessions of Arabidopsis thaliana and closely related sister species lineages. We demonstrate that lineage-specific genes have high tissue specificity and low expression levels across multiple tissues and developmental stages. Finally, stress responsiveness is identified as a distinct feature of Brassicaceae-specific genes: these LSGs are enriched for genes responsive to a wide range of abiotic stresses.

Conclusion: Improving our understanding of the origins of lineage-specific genes is key to gaining insights regarding how novel genes can arise and acquire functionality in different lineages. This study comprehensively identifies all of the Brassicaceae-specific genes in Arabidopsis thaliana and identifies how the majority of such lineage-specific genes have arisen. The analysis allows the relative importance (and prevalence) of different evolutionary routes to the genesis of novel ORFs within lineages to be assessed. Insights regarding the functional roles of lineage-specific genes are further advanced through identification of enrichment for stress responsiveness in lineage-specific genes, highlighting their likely importance for environmental adaptation strategies.
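
The Background's working definition (a lineage-specific gene shows no sequence similarity to any gene outside the lineage) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Hit` record, the `lineage_specific` function, the gene names, and the E-value cutoff of 1e-5 are all hypothetical choices made for illustration, and the abstract does not state which search tool or threshold was used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One similarity-search hit (hypothetical record layout)."""
    gene: str      # query gene identifier
    family: str    # taxonomic family of the matched sequence
    evalue: float  # E-value reported for the match


def lineage_specific(genes, hits, lineage="Brassicaceae", max_evalue=1e-5):
    """Return genes with no significant hit outside `lineage`.

    `max_evalue` is an assumed significance cutoff, not a value
    taken from the study.
    """
    has_outside_hit = {
        h.gene
        for h in hits
        if h.family != lineage and h.evalue <= max_evalue
    }
    # Preserve the input order of `genes`.
    return [g for g in genes if g not in has_outside_hit]


genes = ["geneA", "geneB", "geneC"]
hits = [
    Hit("geneA", "Brassicaceae", 1e-40),  # hit inside the lineage only
    Hit("geneB", "Poaceae", 1e-20),       # significant hit outside the lineage
    Hit("geneC", "Poaceae", 0.5),         # outside hit, but not significant
]
candidates = lineage_specific(genes, hits)
```

In this toy input, `geneB` is excluded because of its significant hit outside the lineage, while `geneA` and `geneC` remain candidates. A real analysis would also have to decide how to treat genes with no hits at all and which reference genomes to search.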

Figures

Figure 1
Summary of evidence for evolutionary origins of Arabidopsis thaliana lineage-specific genes. The number of LSGs that fit each evolutionary scenario tested is shown, plus the number of LSGs without elucidated origins. Support for gene model expression is provided by an EST or cDNA consistent with the gene model (as listed by TAIR). Support for expression at the locus is provided by an EST, cDNA or microarray probeset (TAIR and ATH1 Affymetrix microarray).
Figure 2
Summary of transposon exaptation frequency in LSG and non-LSG CDS. a) The frequency of exaptation of each TE super-family for all genes, split into LSGs and non-LSGs. b) Close-up view of a) where there is evidence of TE exaptation in LSGs and non-LSGs. c) The frequency of each TE super-family exapted in those genes containing exapted TE DNA, split into LSGs and non-LSGs. Note that some genes have exapted DNA from several super-families; each case is reported, so the total percentage is marginally over 100%.
Figure 3
Distribution of percentage coverage for LSG CDS vs. Arabidopsis lyrata intergenic region alignments. A percentage coverage greater than 100% indicates a gap in the LSG within the alignment, i.e. an indel in the LSG in Arabidopsis thaliana.
Figure 4
Distribution of SNPs causing interruptions to the ORFs of LSGs in various Arabidopsis thaliana accessions. SNPs predicted from Perlegen re-sequencing data sets that cause the interruption of an LSG ORF. Left-hand axis: lists the gene model and the SNP name. Left-hand column: lists the reference nucleotide found in the Columbia accession (Col-0). Main body of table: SNPs causing a missing start codon (MSC) are coloured orange. SNPs causing an internal stop codon (ISC) are coloured green. Missing data (i.e. when the nucleotide is undetermined at that position) are coloured gray. Only SNPs causing ISC or MSC are annotated. Bottom axis: lists the accessions tested, divided by broad geographical distinctions, i.e. red = Northern Europe, blue = Central Europe, purple = Mediterranean, orange = British Isles, yellow = Central Asia, brown = Japan, pink = North America and gray = Cape Verde Islands. Right-hand columns: the first column represents the SNP data at the position for the intergenic alignment between the LSGs and intergenic regions in Arabidopsis lyrata. SNP types are marked the same as in the main table, with the addition of an "X" representing those instances where no alignment was identified in Arabidopsis lyrata. The second column represents the total number of ISC and indels found in the aligned Arabidopsis lyrata sequence. The final column represents the proportion of the LSG that is covered by the Arabidopsis lyrata alignment.
Figure 5
Tissue expression patterns in LSGs and non-LSGs. a) Distribution of the number of tissues/developmental stages in which each gene is called as present (expressed) in the AtGenExpress developmental series microarray experiment (see methods). White bars represent all representative gene models tested. Light gray bars represent non-LSGs. Dark gray bars represent LSGs. b) Distribution of log2 median expression of genes called as present for each tissue/developmental stage in the AtGenExpress developmental series microarray experiment.
Figure 6
Summary of the origins of stress responsive LSGs.
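
The two ORF-interrupting SNP classes named in the Figure 4 caption, missing start codon (MSC) and internal stop codon (ISC), can be sketched as a small classifier. This is an illustrative sketch only, not the authors' method: the `classify_snp` function and the toy sequence are hypothetical, and the code assumes a plus-strand reference CDS that begins with ATG, ends with a stop codon, and uses the standard stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_snp(cds, pos, alt):
    """Classify a single-nucleotide change in a coding sequence.

    cds: reference CDS (assumed to start with ATG and end with a stop codon)
    pos: 0-based position of the SNP within the CDS
    alt: alternative nucleotide observed in an accession
    Returns "MSC", "ISC", or None if the ORF is not interrupted.
    """
    cds = cds.upper()
    codon_start = pos - pos % 3  # first base of the affected codon
    mutated = cds[:pos] + alt.upper() + cds[pos + 1:]
    codon = mutated[codon_start:codon_start + 3]

    if codon_start == 0:
        # Any change that destroys ATG removes the start codon.
        return "MSC" if codon != "ATG" else None

    is_last_codon = codon_start >= len(cds) - 3
    if codon in STOP_CODONS and not is_last_codon:
        # A stop codon before the annotated end truncates the ORF.
        return "ISC"
    return None


toy_cds = "ATGAAATGGTAA"  # ATG AAA TGG TAA (hypothetical sequence)
start_loss = classify_snp(toy_cds, 0, "C")     # ATG -> CTG
premature_stop = classify_snp(toy_cds, 8, "A")  # TGG -> TGA
silent_case = classify_snp(toy_cds, 3, "G")     # AAA -> GAA
```

Only the codon containing the SNP is inspected, which is sufficient for single-base substitutions. Indels, which the caption counts separately for the Arabidopsis lyrata alignments, would shift the reading frame and need different handling.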
