Review
Bioinformatics. 2009 Jun 15;25(12):1476-83.
doi: 10.1093/bioinformatics/btp247. Epub 2009 Apr 8.

Cross species analysis of microarray expression data

Yong Lu et al. Bioinformatics.

Abstract

Motivation: Many biological systems operate in a similar manner across a large number of species or conditions. Cross-species analysis of sequence and interaction data is often applied to determine the function of new genes. In contrast to these static measurements, microarrays measure the dynamic, condition-specific response of complex biological systems. The recent exponential growth in microarray expression datasets allows researchers to combine expression experiments from multiple species to identify genes that are not only conserved in sequence but also operate in a similar way in the different species studied.

Results: In this review we discuss the computational and technical challenges associated with these studies, the approaches that have been developed to address these challenges and the advantages of cross-species analysis of microarray data. We show how successful application of these methods leads to insights that cannot be obtained when analyzing data from a single species. We also highlight current open problems and discuss possible ways to address them.


Figures

Fig. 1.
Exponential growth of microarray datasets. Left: growth in the number of sequences deposited in GenBank in the 1990s. Right: growth in the number of microarray datasets deposited in GEO over the last decade. Both databases grew exponentially during these periods. One of the primary uses of sequence databases is to search for similar genes in other species. Similarly, with the growth in microarray datasets, methods that combine expression data across species may lead to new insights into biological systems that are activated in multiple species.
Fig. 2.
Strategies for combining and comparing microarray datasets from multiple species. Left: expression meta-analysis. Samples from each species are hybridized to different arrays, and each array is analyzed independently to identify differentially expressed genes. The resulting gene lists are then compared to identify their overlap. Middle: using the same array for all species. Samples from all species are hybridized to the same array, and all arrays are analyzed using the same method. The lists of differentially expressed probes can then be compared. Note that this method can only be used to compare closely related species. Right: concurrent analysis of expression data. Samples from each species are hybridized to separate arrays but are analyzed together, so that homologs can be used to improve the assignment of genes.
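The comparison step of the meta-analysis strategy (left panel) can be sketched as follows. This is a minimal illustration, not code from the review: it assumes a hypothetical one-to-one ortholog map from species-2 genes to species-1 genes, restricts both gene lists to genes that have an ortholog, and uses a hypergeometric test (one common choice) to ask whether the overlap is larger than expected by chance. All function names and gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M genes, n marked, N drawn)."""
    top = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, top + 1)) / comb(M, N)


def cross_species_overlap(de_sp1, de_sp2, orthologs):
    """Compare per-species differentially expressed (DE) gene lists via orthologs.

    orthologs: hypothetical one-to-one dict {species-2 gene: species-1 gene}.
    The background (M) is the set of ortholog pairs, not the whole genome.
    """
    universe = set(orthologs.values())
    sp1 = set(de_sp1) & universe                             # DE in species 1 with an ortholog
    sp2 = {orthologs[g] for g in de_sp2 if g in orthologs}  # DE in species 2, mapped to species-1 IDs
    shared = sp1 & sp2
    p_value = hypergeom_sf(len(shared), len(universe), len(sp1), len(sp2))
    return shared, p_value


# Hypothetical toy data: six ortholog pairs (y1->h1, ..., y6->h6).
orthologs = {f"y{i}": f"h{i}" for i in range(1, 7)}
shared, p = cross_species_overlap(["h1", "h2", "h3", "h9"], ["y1", "y2", "y5"], orthologs)
print(sorted(shared), p)  # ['h1', 'h2'] 0.5
```

Here h9 has no ortholog and is dropped before testing; with three DE genes per species out of six ortholog pairs, sharing two gives p = 0.5, so this toy overlap is not evidence of conserved regulation.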
Fig. 3.
Visual depiction of coexpression meta-analysis methods for multiple species. Top: the metagene analysis of Stuart et al. (2003) first groups strictly orthologous genes across the species into metagenes (a) and measures the coexpression between genes within each species. Pairs of metagenes are then defined as coexpressed (b) if their constituent gene pairs are sufficiently coexpressed. Bottom: biclustering methods (Bergmann et al., 2004; Lu et al., 2007b). Given a set of coexpressed genes in species 1 (a), their orthologs that are also coexpressed in species 2 are identified (b). The set of genes in species 2 can then be extended (c) to include additional coexpressed genes.
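Both coexpression strategies can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the published algorithms: Stuart et al. (2003) and the biclustering methods use their own statistics, whereas here "coexpressed" simply means a Pearson correlation at or above a hypothetical threshold `r_min`. The data layout (dicts of expression profiles keyed by gene), the function names and the toy data are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def mean_profile(profiles):
    """Element-wise mean of equal-length expression profiles."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def coexpressed_metagene_pairs(expr, metagenes, r_min=0.5, min_species=2):
    """Metagene-style sketch (top panel).

    expr: {species: {gene: profile}}; metagenes: {metagene: {species: gene}}.
    A pair is kept if member genes correlate at r >= r_min in >= min_species species.
    """
    hits = []
    for a, b in combinations(sorted(metagenes), 2):
        support = 0
        for species, profiles in expr.items():
            ga, gb = metagenes[a].get(species), metagenes[b].get(species)
            if ga in profiles and gb in profiles and pearson(profiles[ga], profiles[gb]) >= r_min:
                support += 1
        if support >= min_species:
            hits.append((a, b, support))
    return hits


def seed_map_extend(seed, orthologs, expr2, r_min=0.7):
    """Biclustering-style sketch (bottom panel); seed is assumed coexpressed in species 1."""
    # (b) map the seed to species-2 orthologs and keep those tracking the group's mean profile
    mapped = [orthologs[g] for g in seed if orthologs.get(g) in expr2]
    if not mapped:
        return []
    centroid = mean_profile([expr2[g] for g in mapped])
    core = [g for g in mapped if pearson(expr2[g], centroid) >= r_min]
    if not core:
        return []
    # (c) extend with other species-2 genes correlated with the core's mean profile
    centroid = mean_profile([expr2[g] for g in core])
    extra = [g for g in expr2 if g not in core and pearson(expr2[g], centroid) >= r_min]
    return core + extra


expr = {
    "sp1": {"a1": [1, 2, 3, 4], "b1": [2, 4, 6, 8], "c1": [4, 3, 2, 1]},
    "sp2": {"a2": [1, 3, 2, 5], "b2": [2, 6, 4, 10], "c2": [1, 2, 3, 4]},
}
metagenes = {"A": {"sp1": "a1", "sp2": "a2"}, "B": {"sp1": "b1", "sp2": "b2"},
             "C": {"sp1": "c1", "sp2": "c2"}}
print(coexpressed_metagene_pairs(expr, metagenes))  # [('A', 'B', 2)]

expr2 = {"x2": [1, 2, 3, 4], "y2": [2, 4, 6, 8], "z2": [4, 3, 2, 1], "w2": [1, 2, 3, 5]}
orthologs = {"x1": "x2", "y1": "y2", "z1": "z2"}
print(seed_map_extend(["x1", "y1", "z1"], orthologs, expr2))  # ['x2', 'y2', 'w2']
```

In the second example the ortholog z2 is dropped at step (b) because it is anti-correlated with the mapped group in species 2, and w2, which has no species-1 seed, is added at step (c).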
Fig. 4.
Concurrent analysis of microarray data from multiple species (Lu et al., 2007a). Sequence data (left) and expression data (bottom right) are used to construct a graph (top right) in which nodes represent genes and their expression, and edges represent homology. Information is propagated along the edges of this graph, allowing genes to influence the assignment of their homologs. For example, a borderline gene can be elevated if many of its homologs are strongly expressed, which helps overcome problems related to noise and treatment differences. The edges can be derived from a curated database or from sequence analysis methods such as BLAST.
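The idea of letting homologs influence each other's assignment can be illustrated with a simple score-smoothing scheme. This is not the model of Lu et al. (2007a); it is a label-propagation-style stand-in, assuming each gene starts with a per-gene expression score (for example a differential-expression statistic) and that homology edges are given as gene pairs. The function name `propagate`, the mixing weight `alpha` and the toy genes are hypothetical.

```python
def propagate(scores, edges, alpha=0.5, tol=1e-10, max_iter=1000):
    """Smooth per-gene scores along homology edges (label-propagation style).

    scores: {gene: initial score}; every gene named in edges must appear here.
    edges:  iterable of (gene, gene) homology pairs, within or across species.
    Each update mixes a gene's own score with the mean current score of its
    homologs; genes without homologs keep their original score.
    """
    neighbors = {g: [] for g in scores}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    f = dict(scores)
    for _ in range(max_iter):
        new = {}
        for g, s in scores.items():
            nb = neighbors[g]
            if nb:
                new[g] = (1 - alpha) * s + alpha * sum(f[h] for h in nb) / len(nb)
            else:
                new[g] = s
        delta = max(abs(new[g] - f[g]) for g in scores)
        f = new
        if delta < tol:  # converges: each update is a contraction for alpha < 1
            break
    return f


# Hypothetical example: a borderline gene (m1) with two strongly responding homologs.
scores = {"m1": 0.4, "h1": 0.9, "f1": 0.9, "m2": 0.4}
result = propagate(scores, [("m1", "h1"), ("m1", "f1")])
print({g: round(v, 3) for g, v in result.items()})
# {'m1': 0.567, 'h1': 0.733, 'f1': 0.733, 'm2': 0.4}
```

With a hypothetical cut-off of 0.5, m1 is called only after propagation, while the isolated m2 keeps its score of 0.4; larger `alpha` gives homologs more weight relative to a gene's own measurement.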

References

Alexander PA, et al. The design and characterization of two proteins with 88% sequence identity but different structure and function. Proc. Natl Acad. Sci. USA. 2007;104:11963–11968.
Alter O, et al. Generalized singular value decomposition for comparative analysis of genome-scale expression datasets of two different organisms. Proc. Natl Acad. Sci. USA. 2003;100:3351–3356.
Altschul SF, et al. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410.
Arbeitman MN, et al. Gene expression during the life cycle of Drosophila melanogaster. Science. 2002;298:2270–2275.
Bergmann S, et al. Similarities and differences in genome-wide expression data of six organisms. PLoS Biol. 2004;2:86–93.
