Database (Oxford). 2013 Mar 13;2013:bat010.
doi: 10.1093/database/bat010. Print 2013.

Uncovering hidden duplicated content in public transcriptomics data


Marta Rosikiewicz et al. Database (Oxford). 2013.

Abstract

As part of the development of the database Bgee (a dataBase for Gene Expression Evolution), we annotate and analyse expression data of different types and from different sources, notably Affymetrix data from GEO and ArrayExpress, and RNA-Seq data from SRA. During our quality control procedure, we identified duplicated content in GEO and ArrayExpress affecting ∼14% of our data. The duplicates fall into three groups: experiments fully or partially duplicated across independent data submissions, Affymetrix chips reused in several experiments, and chips reused within a single experiment. We present here the procedure that we have established to filter such duplicates from Affymetrix data, and our procedure to identify potential future duplicates in RNA-Seq data.
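The abstract does not spell out the filtering procedure, so the following is only a minimal sketch of one way such duplicates could be flagged, not the authors' method. It assumes each chip is available as a list of numeric values and that duplicated content is byte-for-byte identical after rounding. The names `content_fingerprint` and `find_duplicates`, the tuple layout, and the four-decimal rounding are all hypothetical choices made for illustration.

```python
import hashlib
from collections import defaultdict


def content_fingerprint(values):
    """Hash a chip's values so that identical content maps to the same key.

    Assumption: values are rounded to four decimals before hashing so that
    trivial formatting differences do not hide a duplicate.
    """
    payload = ",".join(f"{v:.4f}" for v in values)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def find_duplicates(chips):
    """Group chips with identical content and label each group.

    chips: iterable of (experiment_id, chip_id, values) tuples (hypothetical layout).
    Returns a sorted list of (kind, members), where kind is
    'within_experiment' or 'across_experiments' and members is a sorted
    list of (experiment_id, chip_id) pairs sharing the same fingerprint.
    """
    groups = defaultdict(list)
    for experiment_id, chip_id, values in chips:
        groups[content_fingerprint(values)].append((experiment_id, chip_id))

    report = []
    for members in groups.values():
        if len(members) < 2:
            continue  # unique content, nothing to report
        experiments = {exp for exp, _ in members}
        kind = "within_experiment" if len(experiments) == 1 else "across_experiments"
        report.append((kind, sorted(members)))
    return sorted(report)
```

Calling `find_duplicates` on a list of such tuples returns only the groups that contain two or more chips, which separates chips reused inside one experiment from chips shared between experiments. Exact hashing would miss duplicates that were renormalised or reprocessed before resubmission, so a real pipeline would likely need a more tolerant comparison.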


