Sci Data. 2017 Aug 29:4:170107.
doi: 10.1038/sdata.2017.107.

FANTOM5 CAGE profiles of human and mouse reprocessed for GRCh38 and GRCm38 genome assemblies

Imad Abugessaisa et al. Sci Data.

Abstract

The FANTOM5 consortium described the promoter-level expression atlas of human and mouse by using CAGE (Cap Analysis of Gene Expression) with single molecule sequencing. In the original publications, the GRCh37/hg19 and NCBI37/mm9 assemblies were used as the reference genomes of human and mouse, respectively; later, the Genome Reference Consortium released the newer genome assemblies GRCh38/hg38 and GRCm38/mm10. To increase the utility of the atlas in future research, we reprocessed the data to make them available on the recent genome assemblies. The data include observed frequencies of transcription start sites (TSSs) based on the realignment of CAGE reads, and TSS peaks that are converted from those based on the previous reference. Annotations of the peak names were also updated based on the latest public databases. The reprocessed results enable us to examine frequencies of transcription initiation on the recent genome assemblies and to refer to promoters with updated information consistently across the genome assemblies.

Figures

Figure 1. Workflow of FANTOM5 data reprocessing.
The figure describes the reprocessing of the FANTOM5 data. The workflow encompasses three processes: CAGE read realignment (1), CAGE peak liftOver (2) and CAGE peak calling (3). The source assemblies are GRCh37/hg19 and NCBI37/mm9; the target assemblies are GRCh38/hg38 and GRCm38/mm10. CAGE read realignment results in mapped CAGE peaks, CAGE peak liftOver results in two sets of CAGE peaks (mapped and unmapped), and CAGE peak calling results in new CAGE peaks in the latest genomes. Processes (1) and (2) are followed by quality checking (QC), which separates the mapped CAGE peaks into fair and problematic CAGE peaks. The problematic and dropped CAGE peak regions are investigated and manually curated. The new CAGE peaks from (3) are intersected with the fair CAGE peaks using bedtools (intersectbed) to define non-overlapping CAGE peaks (new CAGE peaks). The fair and new CAGE peaks are annotated with the latest gene and transcript models, and their expression tables are calculated.
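The intersection step in the workflow can be illustrated with a minimal Python sketch. It keeps only those newly called peaks that do not overlap any fair peak, which is comparable in spirit to running `bedtools intersect -v -s`. Whether the authors used these options is not stated in the caption, so the strand-aware comparison, the `Peak` type, and the function names below are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Peak(NamedTuple):
    """Hypothetical peak record using BED-style coordinates."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks on the same chromosome and strand share at least one base."""
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and a.start < b.end
        and b.start < a.end
    )


def non_overlapping(new_peaks: list[Peak], fair_peaks: list[Peak]) -> list[Peak]:
    """Return the new peaks that overlap none of the fair peaks.

    Assumption: overlap is judged per strand. Fair peaks are grouped by
    (chrom, strand) so each new peak is compared only against candidates
    on the same chromosome and strand.
    """
    by_key: dict[tuple[str, str], list[Peak]] = defaultdict(list)
    for peak in fair_peaks:
        by_key[(peak.chrom, peak.strand)].append(peak)

    kept = []
    for peak in new_peaks:
        candidates = by_key.get((peak.chrom, peak.strand), [])
        if not any(overlaps(peak, fair) for fair in candidates):
            kept.append(peak)
    return kept
```

A linear scan per chromosome and strand is enough for a sketch; for genome-scale peak sets, a sorted sweep or an interval tree would be the more usual choice.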
Figure 2. Correlation between the CAGE tag counts of the aligned CAGE reads and the liftOver CAGE peaks.
The scatterplot shows the correlation between the tag counts within the regions of the aligned CAGE reads and those of the liftOver CAGE peaks: (a) human, (b) mouse.
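A comparison of this kind can be sketched as a Pearson correlation between two per-peak tag count vectors, one from the realigned reads and one carried over with the liftOver peaks. The caption does not say which correlation measure or scale was used, so the choice of Pearson's r, the optional log10 transform with a pseudo-count of 1, and the function names here are assumptions for illustration only.

```python
import math


def log_counts(counts: list[float]) -> list[float]:
    """Log10-transform tag counts with a pseudo-count of 1 (an assumed convention)."""
    return [math.log10(c + 1) for c in counts]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two paired values are required")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

With hypothetical count vectors `realigned` and `lifted` ordered by the same peaks, `pearson(log_counts(realigned), log_counts(lifted))` gives the correlation on the log scale.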

Dataset use reported in

  • doi: 10.1038/nature13182

References

Data Citations

    1. Kawaji H. 2017. Zenodo. http://doi.org/10.5281/zenodo.545682
    2. 2014. DDBJ Sequence Read Archive. DRA000991
    3. 2014. DDBJ Sequence Read Archive. DRA001026
    4. 2014. DDBJ Sequence Read Archive. DRA001027
    5. 2014. DDBJ Sequence Read Archive. DRA001028

