PLoS One. 2015 Sep 30;10(9):e0139075.
doi: 10.1371/journal.pone.0139075. eCollection 2015.

The Prediction and Validation of Small CDSs Expand the Gene Repertoire of the Smallest Known Eukaryotic Genomes


Abdel Belkorchia et al. PLoS One. 2015.

Abstract

Accurately predicting an organism's gene catalogue is essential for obtaining a representative snapshot of its overall lifestyle, especially when the organism cannot be cultured. Microsporidia are obligate intracellular eukaryotic parasites, some of them hard to culture, known to infect members of every animal phylum. To date, sequencing and annotation of microsporidian genomes have revealed a small gene complement with highly reduced gene sizes. In the present paper, we investigated whether such small gene sizes may have biased the methodologies used for genome annotation, with an emphasis on predicting small coding sequence (CDS) genes. Using better-delineated intergenic regions from four Encephalitozoon genomes, we predicted de novo new small CDSs with sizes ranging from 78 to 255 bp (median 168 bp) and corroborated these predictions by RACE-PCR experiments in Encephalitozoon cuniculi. Most of the newly found genes are present in other, distantly related microsporidian species, suggesting that they are biologically relevant. The present study provides a better framework for annotating microsporidian genomes and for training and evaluating new computational methods dedicated to detecting ultra-small genes in various organisms.
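The abstract does not spell out the prediction procedure. As a rough illustration only, the sketch below scans both strands of an intergenic region for ATG-initiated open reading frames whose length (stop codon included) falls within the 78 to 255 bp window reported above. It is not the authors' pipeline: the function names (`find_small_orfs`, `_scan`), the use of the standard genetic code, and the choice to keep only the longest ORF per stop codon are all assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple

# Assumption: standard genetic code stop codons and ATG-only starts.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    strand: str  # "+" or "-"
    start: int   # 0-based start on the forward strand (half-open interval)
    end: int     # exclusive end on the forward strand
    length: int  # ORF length in bp, stop codon included


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def _scan(seq: str, min_len: int, max_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) of ATG-to-stop ORFs within the size window on one strand."""
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Extend codon by codon until the first in-frame stop codon.
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop before the end of the region
            if min_len <= j + 3 - i <= max_len:
                hits.append((i, j + 3))
            # Resume after the stop so nested ATGs sharing this stop are skipped
            # (i.e. only the longest ORF per stop codon is kept).
            i = j + 3
    return hits


def find_small_orfs(region: str, min_len: int = 78, max_len: int = 255) -> List[Orf]:
    """Hypothetical helper: scan both strands of an intergenic region for small ORFs."""
    seq = region.upper()
    n = len(seq)
    orfs = [Orf("+", s, e, e - s) for s, e in _scan(seq, min_len, max_len)]
    for s, e in _scan(reverse_complement(seq), min_len, max_len):
        # Map minus-strand coordinates back onto the forward strand.
        orfs.append(Orf("-", n - e, n - s, e - s))
    return sorted(orfs, key=lambda o: (o.start, o.strand))
```

A raw ORF scan like this over-predicts heavily in practice, which is why the abstract stresses corroboration by RACE-PCR and by conservation across distantly related microsporidian species.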


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Example of the genomic context of previously annotated genes and newly identified sCDSs in Encephalitozoon genomes.
The transcriptional signals of the newly predicted genes are highlighted in red (promoter signal) and green (polyadenylation signal). The putative polyadenylation signals of the genes flanking the new sCDSs are highlighted in light blue.
Fig 2. Validation example of the newly predicted orthologs using both protein and nucleotide sequence alignments.
Protein and nucleotide alignments were performed using MUSCLE and Clustal Omega, respectively.
Fig 3. Identification of the 5' and 3' maturation sites of the newly predicted small CDSs.
Translation initiation codons and stop codons are highlighted in light grey for all genes. Putative polyadenylation signals are underlined and shown in bold. Distances between putative polyadenylation signals and polyadenylation sites are indicated in parentheses. Putative microsporidian-specific promoter signals, located upstream of the transcription start sites, are highlighted in dark grey. For brevity, the complete CDS sequences are not shown and are instead represented by the corresponding gene names. ND, not defined.
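The Fig 3 caption reports distances between putative polyadenylation signals and polyadenylation sites but does not state the measurement convention. A minimal sketch, assuming the distance is counted in nucleotides from the end of the signal to the 0-based poly(A) site. The motif AATAAA is used purely as a placeholder (it is the canonical hexamer in many eukaryotes; the microsporidian signals themselves are not listed in this record), and the function name `polya_signal_distances` and the 50-nt window are hypothetical.

```python
from typing import List, Tuple


def polya_signal_distances(
    seq: str, polya_site: int, motif: str = "AATAAA", window: int = 50
) -> List[Tuple[int, int]]:
    """Return (motif_start, distance) for each motif occurrence upstream of polya_site.

    Assumed (hypothetical) convention: distance is the number of nucleotides
    between the last base of the motif and the 0-based poly(A) site; only
    occurrences with 0 <= distance <= window are reported.
    """
    seq, motif = seq.upper(), motif.upper()
    hits = []
    for pos in range(len(seq) - len(motif) + 1):
        if seq.startswith(motif, pos):
            distance = polya_site - (pos + len(motif))
            if 0 <= distance <= window:
                hits.append((pos, distance))
    return hits
```

Under this convention, a signal that ends immediately before the poly(A) site has distance 0; a different convention (for example, counting from the first base of the signal) would shift every value by the motif length.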
Fig 4. Phylogenetic distribution of the newly predicted small protein-coding genes across 17 sequenced microsporidian species.
Left: The HKY85 maximum likelihood phylogenetic tree shown here is derived from the small-subunit ribosomal RNA gene. Bootstrap support for each cluster is indicated on the corresponding nodes; only values greater than 50% are shown. Right: The presence or absence of each newly identified sCDS in each species is denoted by filled and empty circles, respectively. The two grey circles indicate genes that fall within unsequenced regions of the E. intestinalis and E. hellem genomes and whose presence could therefore not be confirmed. Locus names of the new sCDSs (top) are derived from the E. cuniculi accessions.
