Genome Res. 2006 Mar;16(3):441-50.
doi: 10.1101/gr.4602906. Epub 2006 Feb 14.

A global assembly of cotton ESTs


Joshua A Udall et al. Genome Res. 2006 Mar.

Abstract

Approximately 185,000 Gossypium EST sequences comprising >94,800,000 nucleotides were amassed from 30 cDNA libraries constructed from a variety of tissues and organs under a range of conditions, including drought stress and pathogen challenges. These libraries were derived from allopolyploid cotton (Gossypium hirsutum; A(T) and D(T) genomes) as well as its two diploid progenitors, Gossypium arboreum (A genome) and Gossypium raimondii (D genome). ESTs were assembled using the Program for Assembling and Viewing ESTs (PAVE), resulting in 22,030 contigs and 29,077 singletons (51,107 unigenes). Further comparisons among the singletons and contigs led to the recognition of 33,665 exemplar sequences that represent a nonredundant set of putative Gossypium genes containing partial or full-length coding regions and usually one or two UTRs. The assembly, along with its UniProt BLASTX hits, GO annotation, and Pfam analysis results, is freely accessible as a public resource for cotton genomics. Because ESTs from diploid and allotetraploid Gossypium were combined in a single assembly, we were in many cases able to bioinformatically distinguish duplicated genes in allotetraploid cotton and assign them to either the A or D genome. The assembly and associated information provide a framework for future investigation of cotton functional and evolutionary genomics.
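The abstract does not describe how duplicated genes were assigned to the A or D genome. The idea can be illustrated with a simple majority vote over "diagnostic" sites, which are alignment positions where the A-genome (G. arboreum) and D-genome (G. raimondii) sequences differ. This is a minimal sketch under stated assumptions, not the authors' PAVE-based procedure: it assumes the three sequences are already aligned, ungapped, and of equal length, and the function names and the `min_sites` threshold are hypothetical.

```python
# Hypothetical sketch (not the published method): assign an allotetraploid
# (AD) EST to the A or D subgenome by majority vote over positions where the
# diploid A-genome and D-genome sequences differ. Assumes pre-aligned,
# ungapped sequences of equal length.


def diagnostic_sites(a_seq: str, d_seq: str) -> list[int]:
    """Return positions where the A and D progenitor sequences differ,
    ignoring ambiguous 'N' calls."""
    if len(a_seq) != len(d_seq):
        raise ValueError("sequences must be aligned to equal length")
    return [
        i
        for i, (a, d) in enumerate(zip(a_seq, d_seq))
        if a != d and a != "N" and d != "N"
    ]


def assign_subgenome(ad_seq: str, a_seq: str, d_seq: str, min_sites: int = 2) -> str:
    """Return 'A', 'D', or 'unassigned' for an allotetraploid sequence."""
    if len(ad_seq) != len(a_seq):
        raise ValueError("sequences must be aligned to equal length")
    sites = diagnostic_sites(a_seq, d_seq)
    # Count diagnostic sites where the AD sequence matches each progenitor.
    a_votes = sum(1 for i in sites if ad_seq[i] == a_seq[i])
    d_votes = sum(1 for i in sites if ad_seq[i] == d_seq[i])
    # Too little evidence, or a tie, leaves the sequence unassigned.
    if len(sites) < min_sites or a_votes == d_votes:
        return "unassigned"
    return "A" if a_votes > d_votes else "D"


if __name__ == "__main__":
    a_genome = "ACGTACGTAC"
    d_genome = "ACGAACGTTC"  # differs from a_genome at positions 3 and 8
    print(diagnostic_sites(a_genome, d_genome))              # [3, 8]
    print(assign_subgenome("ACGTACGTAC", a_genome, d_genome))  # A
    print(assign_subgenome("ACGAACGTTC", a_genome, d_genome))  # D
```

A vote of this kind only works when the progenitor genomes differ at enough sites within a transcript; with roughly 4% divergence between the A and D genomes, a typical EST spans several such sites, although sequencing errors and shared polymorphisms would limit how many genes can be assigned.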


Figures

Figure 1.
Histogram of the number of EST members per contig. Different bar patterns and shading distinguish contigs composed of ESTs from a single species from those containing ESTs from more than one species. Contigs with more than 100 EST members are not shown.
Figure 2.
A framework to investigate the genomes of domesticated cotton species. The progenitor genomes of allopolyploid cotton (including G. hirsutum, AD genome) are represented by diploid A-genome (G. arboreum) and D-genome (G. raimondii) lineages, which united ∼1–2 million years ago. Nucleotide sequence divergence between the diploid A and D genomes (or their corresponding descendants in the allopolyploid) is ∼4% (Senchina et al. 2003; OG = Outgroup). Also shown is the number of ESTs derived from each of the three species used in the assembly.
Figure 3.
The top 25 categories of protein domains as identified by Pfam analysis of the exemplar sequences. The total bar height indicates the number of exemplar sequences containing each domain. The height of the solid area indicates the number of exemplar sequences that had a Pfam annotation but no significant BLASTX hit or gene ontology information. Categories with <33 members are not shown.

References

Adams, K.L., Cronn, R., Percifield, R., and Wendel, J.F. 2003. Genes duplicated by polyploidy show unequal contributions to the transcriptome and organ-specific reciprocal silencing. Proc. Natl. Acad. Sci. 100: 4649–4654.
Adams, K.L., Percifield, R., and Wendel, J.F. 2004. Organ-specific silencing of duplicated genes in a newly synthesized cotton allotetraploid. Genetics 168: 2217–2226.
Alba, R., Fei, Z., Payton, P., Liu, Y., Moore, S.L., Debbie, P., Cohn, J., D'Ascenzo, M., Gordon, J.S., Rose, J.K.C., et al. 2004. ESTs, cDNA microarrays, and gene expression profiling: Tools for dissecting plant physiology and development. Plant J. 39: 697–714.
Apweiler, R., Bairoch, A., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., et al. 2004. UniProt: The universal protein knowledgebase. Nucleic Acids Res. 32: D115–D119.
The Arabidopsis Genome Initiative. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815.
