Genome Biol. 2008 Oct 14;9(10):R152.
doi: 10.1186/gb-2008-9-10-r152.

Improved genome assembly and evidence-based global gene model set for the chordate Ciona intestinalis: new insight into intron and operon populations


Yutaka Satou et al. Genome Biol.

Abstract

Background: The draft genome sequence of the ascidian Ciona intestinalis, along with associated gene models, has been a valuable research resource. However, recently accumulated expressed sequence tag (EST)/cDNA data have revealed numerous inconsistencies with the gene models due in part to intrinsic limitations in gene prediction programs and in part to the fragmented nature of the assembly.

Results: We have prepared a less-fragmented assembly on the basis of scaffold-joining guided by paired-end EST and bacterial artificial chromosome (BAC) sequences, and BAC chromosomal in situ hybridization data. The new assembly (115.2 Mb) is similar in length to the initial assembly (116.7 Mb) but contains 1,272 (approximately 50%) fewer scaffolds. The largest scaffold in the new assembly incorporates 95 initial-assembly scaffolds. In conjunction with the new assembly, we have prepared a greatly improved global gene model set strictly correlated with the extensive currently available EST data. The total gene number (15,254) is similar to that of the initial set (15,582), but the new set includes 3,330 models at genomic sites where none were present in the initial set, and 1,779 models that represent fusions of multiple previously incomplete models. In approximately half of the models, 5'-ends were precisely mapped using 5'-full-length ESTs, an important refinement even in otherwise unchanged models.
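
To put the counts in the Results paragraph in proportion, the short calculation below derives a few ratios from the quoted figures. It is only an arithmetic sketch: the variable names are illustrative, and expressing the new-site and fusion models as fractions of the new gene set is a choice made here, not a statistic reported in the abstract.

```python
# Figures quoted in the abstract (Results paragraph).
new_assembly_mb = 115.2
initial_assembly_mb = 116.7
new_gene_total = 15254
initial_gene_total = 15582
novel_site_models = 3330   # models at sites with no model in the initial set
fusion_models = 1779       # models fusing multiple incomplete initial models

# Derived ratios (an illustration, not values reported by the authors).
length_ratio = new_assembly_mb / initial_assembly_mb
gene_count_change = new_gene_total - initial_gene_total
novel_fraction = novel_site_models / new_gene_total
fusion_fraction = fusion_models / new_gene_total

print(f"new/initial assembly length: {length_ratio:.3f}")
print(f"change in total gene number: {gene_count_change}")
print(f"new-site models as share of new set: {novel_fraction:.1%}")
print(f"fusion models as share of new set: {fusion_fraction:.1%}")
```

On these numbers, roughly one new model in five sits at a previously unannotated site and about one in nine is a fusion, even though the total gene count barely changes.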

Conclusion: Using these new resources, we identify a population of non-canonical (non-GT-AG) introns and also find that approximately 20% of Ciona genes reside in operons and that operons contain a high proportion of single-exon genes. Thus, the present dataset provides an opportunity to analyze the Ciona genome much more precisely than ever.
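
The distinction between canonical (GT-AG) and non-canonical introns can be illustrated with a minimal sketch that inspects the first and last two nucleotides of an intron sequence. The function name `intron_class` and the example sequences are hypothetical, and this is not presented as the procedure the authors used to identify non-canonical introns.

```python
def intron_class(intron_seq: str) -> str:
    """Label an intron by its donor/acceptor dinucleotides.

    Assumes intron_seq is the sense-strand intron sequence, running from
    the first intronic base (donor site) to the last (acceptor site).
    """
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron sequence too short to hold both splice sites")
    donor, acceptor = seq[:2], seq[-2:]
    if (donor, acceptor) == ("GT", "AG"):
        return "GT-AG"
    return f"non-canonical ({donor}-{acceptor})"


# Hypothetical toy sequences, not taken from the Ciona genome.
print(intron_class("gtaagtttcag"))
print(intron_class("GCAAGTTTTCAG"))
```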


Figures

Figure 1
Concordant identification of linkage between version 1 scaffolds from EST mate pairs and BAC paired-end sequences. (a) Multiple 5'- and 3'-EST mate pairs identified a linkage between version 1 scaffolds 21 and 103. (b) Paired-end sequence data of two independent BAC clones also identified this joined-scaffold linkage. (c) Identification of such linkages and FISH data constitute a larger scaffold representing chromosome 9. This new scaffold includes 61 version 1 scaffolds. Black and red arrows indicate version 1 scaffolds in leftward and rightward directions. (d) FISH data are used to orient and place tentative joined scaffolds, which are built by EST mate pairs and paired BAC ends, on chromosomes. Left panel: two-color FISH of GECi23_g02 (green) and GECi42_e12 (red) BAC clones, which are mapped onto the same tentative joined scaffold, determines the orientation of this tentative joined scaffold on chromosome 9. Right panel: similarly, two-color FISH of GECi45_n13 (green) and GECi42_e12 (red) BAC clones, which are mapped onto different tentative joined scaffolds, indicates that these two tentative joined scaffolds are in this order on chromosome 9. White arrowheads indicate the centromere.
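
The idea of grouping scaffolds that are linked by several mate pairs, as in panel (a), can be sketched with a small union-find routine. Everything here is an assumption for illustration: the function `join_scaffolds`, the `min_support` threshold of two pairs, and the input format are hypothetical, and the sketch ignores scaffold order and orientation, which the caption says were resolved with FISH data.

```python
from collections import Counter


def join_scaffolds(mate_pairs, min_support=2):
    """Group scaffolds linked by at least `min_support` mate pairs.

    mate_pairs: iterable of (scaffold_of_5prime_read, scaffold_of_3prime_read).
    Returns a sorted list of sorted scaffold-name groups. The threshold is an
    illustrative assumption, not a value from the paper.
    """
    # Count pairs whose two ends map to different scaffolds.
    support = Counter(
        tuple(sorted(pair)) for pair in mate_pairs if pair[0] != pair[1]
    )

    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge scaffolds connected by a sufficiently supported link.
    for (a, b), n in support.items():
        if n >= min_support:
            parent[find(a)] = find(b)

    groups = {}
    for scaffold in list(parent):
        groups.setdefault(find(scaffold), set()).add(scaffold)
    return sorted(sorted(group) for group in groups.values())


# Toy input: three pairs link scaffolds 21 and 103; one pair is internal to
# scaffold 21; a single pair between scaffolds 5 and 7 is below threshold.
pairs = (
    [("scaffold_21", "scaffold_103")] * 3
    + [("scaffold_21", "scaffold_21")]
    + [("scaffold_5", "scaffold_7")]
)
print(join_scaffolds(pairs))
```

Requiring more than one supporting pair is a common guard against chimeric clones, but the appropriate threshold depends on the data.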
Figure 2
Improvement of gene models. (a) Improvement of a gene model for Gli, including the joining of two JGI version 1 scaffolds. 5'-ESTs and 3'-ESTs are shown as yellow and purple boxes, and EST pairs are connected by dashed lines. Multiple EST pairs indicate that this locus is artifactually split into two version 1 scaffolds. This Gli gene locus was not precisely predicted in previous studies (exons are indicated by pink boxes and joined by lines). The new gene model (green boxes) precisely coincides with the structure of a cDNA sequence (yellow boxes) and ESTs. (b) The alignment of ESTs and gene models with the genome sequence around the 5'-end of the Gli locus. The 5'-full-length EST shown here has the spliced leader sequence (red letters), which is not aligned with the genome sequence because it is appended to Gli mRNA by trans-splicing. The acceptor dinucleotide for this trans-splicing is shown in red in the genome sequence. Note that only the new model precisely represents the 5'-end of this locus. (c) A gene locus that had not been modeled in previous annotations. Although 5'-ESTs (yellow boxes) and 3'-ESTs (purple boxes) indicate the existence of genes in this region, no previous model sets have included models in this region. Two gene models for this locus were built on the basis of EST evidence.
Figure 3
Operons in the Ciona genome. In the genomic region indicated, 5'-ESTs (yellow boxes) and 3'-ESTs (purple boxes) clearly indicate that there are (a) two and (b) three genes encoded. (Note that the genomic region indicated in (a) is not included in the version 2 genome and there are no version 2 gene models.) Previous models (pink boxes) failed to model these loci precisely and the present study yielded gene models that faithfully reflect cDNA evidence. The lower panel in (a) is a magnification of the region around the intergenic region of this operon and the inset shows corresponding DNA sequences.
Figure 4
Prevalence of single-exon 5'-most genes in Ciona operons. Ratio of genes containing a given number of exons within non-operonic (blue) and operonic (green) gene populations. Red and black lines indicate the ratio within the 5'-most upstream genes encoded in operons and the downstream operonic genes, respectively. Genes with 11 or more exons are not shown in this graph for simplicity. Note that single-exon genes are more prevalent in operons than in the non-operon (monocistronic) gene population, and are especially prevalent among the 5'-most genes of operons.
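
The quantity plotted in Figure 4, the ratio of genes with a given exon count within a gene population, can be computed as in the sketch below. The function name `exon_count_ratios` and the toy exon counts are hypothetical and do not reproduce the paper's data; the ratio is taken over the whole population, with genes above `max_exons` counted in the denominator but left out of the output, which is one reading of the note that genes with 11 or more exons are not shown.

```python
from collections import Counter


def exon_count_ratios(exon_counts, max_exons=10):
    """Return {n_exons: share of genes with that many exons} for 1..max_exons.

    exon_counts: one integer per gene. Genes with more than max_exons exons
    contribute to the denominator but are not reported.
    """
    total = len(exon_counts)
    if total == 0:
        return {}
    tally = Counter(exon_counts)
    return {n: tally.get(n, 0) / total for n in range(1, max_exons + 1)}


# Hypothetical toy populations, for illustration only.
operonic = [1, 1, 1, 2, 3, 12]
non_operonic = [1, 2, 4, 5, 6, 7, 8, 12]

operonic_ratios = exon_count_ratios(operonic)
non_operonic_ratios = exon_count_ratios(non_operonic)
print("single-exon share, operonic:", operonic_ratios[1])
print("single-exon share, non-operonic:", non_operonic_ratios[1])
```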


