BMC Bioinformatics. 2013;14 Suppl 14(Suppl 14):S4.
doi: 10.1186/1471-2105-14-S14-S4. Epub 2013 Oct 9.

Towards the integration, annotation and association of historical microarray experiments with RNA-seq

Shweta S Chavan et al. BMC Bioinformatics. 2013.

Abstract

Background: Transcriptome analysis by microarrays has produced important advances in biomedicine. For instance, in multiple myeloma (MM), microarray approaches led to the development of effective disease subtyping via cluster assignment and a 70-gene risk score. Both improved the molecular understanding of MM and provided prognostic information for clinical management. Many researchers are now transitioning to Next Generation Sequencing (NGS) approaches, and RNA-seq in particular, because of its discovery-based nature, improved sensitivity, and dynamic range. Additionally, RNA-seq allows for the analysis of gene isoforms, splice variants, and novel gene fusions. Given the large volume of historical microarray data, there is now a need to associate and integrate microarray and RNA-seq data via advanced bioinformatic approaches.

Methods: Custom software was developed following a model-view-controller (MVC) approach to integrate Affymetrix probe set IDs and gene annotation information from a variety of sources. The tool employs an assortment of strategies to integrate, cross-reference, and associate microarray and RNA-seq datasets.

Results: Output from a variety of transcriptome reconstruction and quantitation tools (e.g., Cufflinks) can be directly integrated and/or associated with Affymetrix probe set data, as well as with the necessary gene identifiers and/or symbols from a range of sources. Strategies are employed to maximize annotation and cross-referencing. Custom gene sets (e.g., the MM 70-gene risk score (GEP-70)) can be specified, and the tool can be incorporated directly into an RNA-seq pipeline.

Conclusion: A novel bioinformatic approach that aids both the annotation of historical microarray data and its association with richer RNA-seq data is now assisting the study of MM cancer biology.

Figures

Figure 1
Affymetrix to NGS association tool algorithm. Associations between Affymetrix probe set IDs and Ensembl gene identifiers, as well as gene symbols, are established by either a direct or an indirect method.
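The caption does not spell out what the direct and indirect methods are, so the following is only a minimal sketch of one plausible reading. It assumes that "direct" means the probe set annotation already carries an Ensembl gene identifier, and that "indirect" means the identifier is reached through a gene symbol. The function name, the lookup tables, and all identifiers are hypothetical placeholders, not the tool's actual data or API.

```python
from typing import Dict, Optional, Tuple


def associate(
    probe_id: str,
    probe_to_ensembl: Dict[str, str],
    probe_to_symbol: Dict[str, str],
    symbol_to_ensembl: Dict[str, str],
) -> Tuple[Optional[str], str]:
    """Return (ensembl_id, method) for one probe set ID.

    Assumed logic, not taken from the paper:
      direct   - the probe set maps straight to an Ensembl gene ID
      indirect - the probe set maps to a gene symbol, which maps to an Ensembl gene ID
    """
    if probe_id in probe_to_ensembl:
        return probe_to_ensembl[probe_id], "direct"
    symbol = probe_to_symbol.get(probe_id)
    if symbol is not None and symbol in symbol_to_ensembl:
        return symbol_to_ensembl[symbol], "indirect"
    return None, "unmatched"


# Placeholder tables; a real run would load these from annotation files.
PROBE_TO_ENSEMBL = {"probe_1": "ENSG_A"}
PROBE_TO_SYMBOL = {"probe_1": "SYM_A", "probe_2": "SYM_B"}
SYMBOL_TO_ENSEMBL = {"SYM_B": "ENSG_B"}

for pid in ("probe_1", "probe_2", "probe_3"):
    print(pid, associate(pid, PROBE_TO_ENSEMBL, PROBE_TO_SYMBOL, SYMBOL_TO_ENSEMBL))
```

Trying the direct lookup first keeps the most specific annotation when both routes exist; whether the tool orders the two methods this way is not stated in the source.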
Figure 2
Design of the NGS association system. A model-view-controller (MVC) software architecture was used to promote code reusability, modularity, and extensibility.
Figure 3
Interface for the NGS association system. Both web browser and command line interfaces are supported. The web interface is dynamic and takes as input the output of Cufflinks or a similar tool. The NGS association system is independent of any particular transcriptome reconstruction or quantitation tool or file format. The input format requires only an Ensembl Gene ID in the first column and a gene symbol or gene accession number in the second column. Additional columns/variables can be imported and carried through to the output. Using association algorithms, the aim is to link Gene IDs or GenBank accessions, and gene symbols, with the corresponding Affymetrix probe set ID(s).
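The input layout described in this caption can be illustrated with a short sketch. It assumes a tab-delimited file without a header row, which the caption does not state, and the function names, lookup tables, and identifiers are hypothetical placeholders rather than the system's real interface.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_expression_table(handle: Iterable[str]) -> List[dict]:
    """Parse rows with an Ensembl Gene ID in column 1 and a symbol or accession in column 2.

    Any further columns are kept untouched in 'extra'. Tab delimiting is an assumption.
    """
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rows.append({"ensembl_id": fields[0], "symbol": fields[1], "extra": fields[2:]})
    return rows


def link_probe_sets(
    rows: List[dict],
    probes_by_ensembl: Dict[str, List[str]],
    probes_by_symbol: Dict[str, List[str]],
) -> List[dict]:
    """Attach probe set IDs to each row, trying the Ensembl ID first and the symbol second."""
    for row in rows:
        probes = probes_by_ensembl.get(row["ensembl_id"]) or probes_by_symbol.get(row["symbol"], [])
        row["probe_sets"] = list(probes)
    return rows


# Placeholder input standing in for the output of a quantitation tool.
sample = "ENSG_A\tSYM_A\t12.5\nENSG_B\tSYM_B\t0.0\n"
rows = read_expression_table(io.StringIO(sample))
linked = link_probe_sets(rows, {"ENSG_A": ["probe_1"]}, {"SYM_B": ["probe_2", "probe_3"]})
for row in linked:
    print(row)
```

Because the reader keeps every column after the second as-is, extra variables such as expression values pass through without the parser needing to know what they mean.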
Figure 4
Interface options for the NGS association system. Biologically directed filtering of large RNA-seq datasets allows the researcher to focus quickly on important MM gene and gene isoform subsets, many of which were developed through microarray approaches.
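Filtering by a custom gene set could look roughly like the sketch below. The row layout, the function name, and the case-insensitive symbol match are assumptions made for illustration, and the symbols are placeholders rather than members of any published gene set.

```python
from typing import Iterable, List


def filter_by_gene_set(rows: List[dict], gene_set: Iterable[str]) -> List[dict]:
    """Keep only rows whose 'symbol' is in the gene set (case-insensitive, by assumption)."""
    wanted = {symbol.upper() for symbol in gene_set}
    return [row for row in rows if row["symbol"].upper() in wanted]


# Placeholder rows and gene set; real symbols would come from the chosen gene list.
rows = [
    {"ensembl_id": "ENSG_A", "symbol": "SYM_A"},
    {"ensembl_id": "ENSG_B", "symbol": "SYM_B"},
    {"ensembl_id": "ENSG_C", "symbol": "sym_c"},
]
subset = filter_by_gene_set(rows, ["SYM_A", "SYM_C"])
print([row["ensembl_id"] for row in subset])
```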
Figure 5
Visualizing gene and gene isoform expression data in the cell line H929. This is a system-generated graphic illustration of Table 2. Section A shows the gene expression values for both the microarray and the corresponding RNA-seq experiments. Here, for RNA-seq data, Log2(Expression) corresponds to Log2(FPKM). For microarray data, Log2(Expression) corresponds to the log2 function applied to the normalized absolute intensities from the microarray. Bar plots are used as a visualization aid for the tabular data and are not meant as a direct comparison between the two platform technologies. Section B shows an isoform-based comparison, on a gene-by-gene basis, of the RNA-seq gene expression values, which makes dominant isoforms easier to identify.
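The Log2(Expression) transform named in this caption is the same for both platforms; only the input differs (FPKM for RNA-seq, normalized absolute intensity for microarray). A minimal sketch follows. The caption does not say how zero values are treated, so returning None for non-positive input is an assumption, as is the function name.

```python
import math
from typing import Optional


def log2_expression(value: float) -> Optional[float]:
    """Return log2 of an FPKM or normalized intensity value.

    Non-positive values have no logarithm; returning None for them is an
    assumption, since the source does not describe zero handling.
    """
    if value <= 0:
        return None
    return math.log2(value)


print(log2_expression(8.0))   # 3.0
print(log2_expression(0.0))   # None
```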
Figure 6
Visualizing gene and gene isoform expression data in the cell line RPMI-8226. This is a system-generated graphic illustration of Table 3. All computations and comparisons were performed in the same manner as in Figure 5.
