BMC Bioinformatics. 2009 Oct 8;10(Suppl 11):S8.
doi: 10.1186/1471-2105-10-S11-S8.

Structural and functional-annotation of an equine whole genome oligoarray

Lauren A Bright et al. BMC Bioinformatics.

Abstract

Background: The horse genome has been sequenced, allowing equine researchers to use high-throughput functional genomics platforms such as microarrays, next-generation sequencing for gene expression, and proteomics. However, to derive value from these functional genomics datasets, researchers must be able to model the data in biologically relevant ways, and doing so requires that the equine genome be more fully annotated. There are two interrelated types of genomic annotation: structural and functional. Structural annotation delineates and demarcates genomic elements (such as genes, promoters, and regulatory elements). Functional annotation assigns function to these structural elements. The Gene Ontology (GO) is the de facto standard for functional annotation and is routinely used as a basis for modelling and hypothesis testing with large functional genomics datasets.

Results: An Equine Whole Genome Oligonucleotide (EWGO) array with 21,351 elements was developed at Texas A&M University. This 70-mer oligoarray was designed from the approximately 7x assembled and annotated sequence of the equine genome to be one of the most comprehensive arrays available for expressed equine sequences. To help researchers determine the biological meaning of data derived from this array, we structurally annotated it by mapping the elements to multiple database accessions, including UniProtKB, Entrez Gene, NRPD (Non-Redundant Protein Database), and UniGene. We then provided GO functional annotations for the gene transcripts represented on the array. Overall, we GO annotated 14,531 gene products (68.1% of the gene products represented on the EWGO array) with 57,912 annotations. GAQ (GO Annotation Quality) scores were calculated for the array both before and after we added GO annotation; the additional annotations improved the meanGAQ score 16-fold. These data are publicly available at AgBase http://www.agbase.msstate.edu/.
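The coverage and improvement figures in the Results can be checked with simple arithmetic. The sketch below assumes that each of the 21,351 array elements corresponds to one gene product, so that 14,531 / 21,351 gives the reported coverage; the function names are illustrative only.

```python
# Figures quoted in the Results paragraph above.
ARRAY_ELEMENTS = 21_351   # elements on the EWGO array (assumed: one gene product each)
GO_ANNOTATED = 14_531     # gene products that received GO annotation
MEAN_GAQ_BEFORE = 1.6     # meanGAQ of the pre-existing GO annotation
MEAN_GAQ_AFTER = 26.7     # meanGAQ after the additional annotation


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


def fold_change(before: float, after: float) -> float:
    """Return the ratio after / before, rounded to one decimal place."""
    return round(after / before, 1)


if __name__ == "__main__":
    print(percent(GO_ANNOTATED, ARRAY_ELEMENTS))         # 68.1
    print(fold_change(MEAN_GAQ_BEFORE, MEAN_GAQ_AFTER))  # 16.7
```

The exact ratio is about 16.7, which the abstract reports as a 16-fold improvement.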

Conclusion: Providing additional information about the public databases that link to the gene products represented on the array gives users more flexibility when using computational tools for gene expression modelling and hypothesis testing. Moreover, because different databases provide different types of information, users gain access to multiple data sources. In addition, our GO annotation underpins functional modelling in most gene expression analysis tools and enables equine researchers to model large lists of differentially expressed transcripts in biologically relevant ways.

Figures

Figure 1
Gene products represented on the equine whole genome array. Array gene products were linked to public databases to facilitate functional modelling. Of the array elements, 1.6% represent experimentally validated products found in the UniProtKB or RefSeq databases, while 58.2% are predicted from computational structural annotation of the horse genome. A further 20.6% are predicted genes not available from NCBI, and 10.1% are ESTs that are not linked to known or predicted horse genes. The remaining 9.9% have been removed from the NCBI databases because of structural reannotation.
Figure 2
Functional grouping of equine array gene products using GOSlimViewer. Using the "GOA and whole proteome" GOSlim and the GOSlimViewer tool, the GO annotation is divided into three broad functional groups: A. Biological Process, B. Molecular Function, and C. Cellular Component. Subcategories within each functional group are listed on the y-axis, and the frequency of each function within the array is shown on the x-axis. The functional group "biological process" had the most GO IDs represented, followed by "molecular function" and then "cellular component." In A, the three largest subcategories were cellular process, regulation of biological process, and metabolic process. In B, binding was the most frequently annotated function. In C, the top three cellular component subcategories were cell, cell membrane, and cellular component. The wide range of GO IDs represented suggests that the equine whole genome array is fairly comprehensive.
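Slim-based grouping of this kind rolls each fine-grained GO term up to whichever broad "slim" categories it descends to, then tallies the results. The sketch below illustrates that idea on a small, hypothetical hierarchy; the edges and slim set are simplified placeholders rather than the real Gene Ontology, and counting each gene product once per slim category is an assumption (GOSlimViewer's exact counting rule may differ).

```python
from collections import Counter

# Illustrative, simplified GO-like hierarchy (child term -> parent terms).
# These edges are hypothetical and do not reproduce the real Gene Ontology.
PARENTS = {
    "protein kinase activity": {"kinase activity"},
    "kinase activity": {"catalytic activity"},
    "ATP binding": {"binding"},
    "DNA binding": {"binding"},
    "catalytic activity": {"molecular_function"},
    "binding": {"molecular_function"},
}

# Hypothetical slim: the broad categories annotations are rolled up to.
SLIM = {"binding", "catalytic activity"}


def ancestors(term):
    """Return the term plus all of its ancestors via an iterative DAG walk."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


def slim_counts(annotations):
    """Count gene products per slim category.

    annotations maps a gene product ID to the set of GO terms annotated to it.
    A product is counted once per slim category it reaches, however many of
    its terms map there.
    """
    counts = Counter()
    for terms in annotations.values():
        reached = set()
        for term in terms:
            reached |= ancestors(term) & SLIM
        counts.update(reached)
    return counts


if __name__ == "__main__":
    example = {
        "gp1": {"protein kinase activity", "ATP binding"},
        "gp2": {"DNA binding"},
        "gp3": {"ATP binding", "DNA binding"},
    }
    print(slim_counts(example))  # binding: 3, catalytic activity: 1
```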
Figure 3
GO Annotation Quality (GAQ) score. GAQ scores were calculated for the GO annotation already present on the array and for the GO annotation available after we added the additional annotations described in this paper. GAQ scores are calculated exactly as described previously [22]. Briefly, the GAQ score is a quantitative measure of GO annotation quality that accounts for the breadth of annotation, the level of detail of each annotation (depth), and the type of evidence used to make it. The additional GO annotation improved the meanGAQ score 16-fold, from 1.6 for the pre-existing GO to 26.7 for the completed GO. The meanGAQ score for each ontology is also shown: cellular component increased 11-fold, from 0.4 to 4.5; biological process increased 16-fold, from 0.5 to 8.1; and molecular function increased 18-fold, from 0.7 to 13.2.
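The caption names three ingredients of the GAQ score: breadth, depth, and evidence type. The exact definition is in [22]; the sketch below only assumes one plausible way to combine them, scoring each gene product as the sum over its annotations of (term depth × evidence weight) and averaging over products to obtain a meanGAQ. The evidence weights are hypothetical placeholders, not the published ranks, and including unannotated products (score 0) in the mean is also an assumption.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical evidence-code weights: experimental codes rank above
# sequence-similarity and electronic ones. Placeholder values only.
EVIDENCE_WEIGHT = {"EXP": 5, "IDA": 5, "ISS": 3, "ISO": 3, "IEA": 1}


@dataclass(frozen=True)
class Annotation:
    term: str      # GO term identifier (placeholder strings in examples)
    depth: int     # level of the term below its ontology root (detail)
    evidence: str  # GO evidence code, e.g. "IEA"


def gaq(annotations):
    """Sketch of a per-gene-product GAQ score.

    Breadth enters through the sum over annotations; depth and evidence
    enter through each annotation's depth * weight product. Unknown
    evidence codes contribute 0.
    """
    return sum(a.depth * EVIDENCE_WEIGHT.get(a.evidence, 0) for a in annotations)


def mean_gaq(products):
    """Mean GAQ over gene products; products maps an ID to its annotations.

    Products with no annotations score 0 and are included in the mean.
    """
    return mean(gaq(anns) for anns in products.values())


if __name__ == "__main__":
    products = {
        "gp1": [Annotation("term_a", 4, "IEA"), Annotation("term_b", 6, "ISO")],
        "gp2": [],
    }
    print(gaq(products["gp1"]))  # 4*1 + 6*3 = 22
    print(mean_gaq(products))    # (22 + 0) / 2 = 11
```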
Figure 4
Flow chart of the functional annotation process. Functional annotation begins with accession mapping through ArrayIDer, which divides the input file into broad categories: predicted gene products, ESTs, non-NCBI predicted gene assemblies, UniProtKB or GenBank RefSeq entries, and predicted proteins that were removed from the database. Predicted gene products go through the ISO pipeline and the rest go through IEA pipelines, with the exception of UniProtKB or RefSeq entries, which are sent to GORetriever. GORetriever retrieves the gene products that already have GO annotations, and the rest are manually curated by mapping orthologs to human, mouse, and rat genes.
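The routing in the flow chart can be summarized as a small dispatch function. This is a sketch of the decision logic only: the category names and returned step labels are hypothetical, and "the rest are manually curated" is read here as the UniProtKB/RefSeq products sent to GORetriever that turn out to have no existing GO annotation.

```python
from enum import Enum, auto


class Category(Enum):
    """Broad ArrayIDer categories from the flow chart (names are hypothetical)."""
    PREDICTED_GENE_PRODUCT = auto()
    EST = auto()
    NON_NCBI_PREDICTED = auto()
    REMOVED_PREDICTED = auto()
    UNIPROT_OR_REFSEQ = auto()


def route(category: Category, has_existing_go: bool = False) -> str:
    """Return the annotation step a gene product is sent to."""
    if category is Category.PREDICTED_GENE_PRODUCT:
        return "ISO pipeline"
    if category is Category.UNIPROT_OR_REFSEQ:
        # GORetriever collects existing GO; products without it are
        # annotated by manual ortholog curation (human, mouse, rat).
        return "GORetriever" if has_existing_go else "manual ortholog curation"
    # ESTs, non-NCBI predictions, and removed predictions.
    return "IEA pipeline"


if __name__ == "__main__":
    for cat in Category:
        print(cat.name, "->", route(cat, has_existing_go=True))
```

Keeping the routing in one function makes the flow chart's exception (UniProtKB/RefSeq bypassing the IEA pipelines) explicit in a single branch.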

References

    1. Horse Genome Assembled. http://www.genome.gov/20519480
    2. Lewis SE. Gene Ontology: looking backwards and forwards. Genome Biol. 2005;6(1):103. doi: 10.1186/gb-2004-6-1-103.
    3. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25–29. doi: 10.1038/75556.
    4. Hill DP, Smith B, McAndrews-Hill MS, Blake JA. Gene Ontology annotations: what they mean and where they come from. BMC Bioinformatics. 2008;9(Suppl 5):S2. doi: 10.1186/1471-2105-9-S5-S2.
    5. Barrell D, Dimmer E, Huntley RP, Binns D, O'Donovan C, Apweiler R. The GOA database in 2009 – an integrated Gene Ontology Annotation resource. Nucleic Acids Res. 2009:D396–403.