Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2021 Jan 18;22(1):515-525.
doi: 10.1093/bib/bbz134.

CpG-island-based annotation and analysis of human housekeeping genes

Affiliations

CpG-island-based annotation and analysis of human housekeeping genes

Le Zhang et al. Brief Bioinform. .

Abstract

By reviewing previous CpG-related studies, we consider that the transcription regulation of about half of the human genes, mostly housekeeping (HK) genes, involves CpG islands (CGIs), their methylation states, CpG spacing and other chromosomal parameters. However, the precise CGI definition and positioning of CGIs within gene structures, as well as specific CGI-associated regulatory mechanisms, all remain to be explained at individual gene and gene-family levels, together with consideration of species and lineage specificity. Although previous studies have already classified CGIs into high-CpG (HCGI), intermediate-CpG (ICGI) and low-CpG (LCGI) densities based on CpG density variation, the correlation between CGI density and gene expression regulation, such as co-regulation of CGIs and TATA box on HK genes, remains to be elucidated. First, this study introduces such a problem-solving protocol for human-genome annotation, which is based on a combination of GTEx, JBLA and Gene Ontology (GO) analysis. Next, we discuss why CGI-associated genes are most likely regulated by HCGI and tend to be HK genes; the HCGI/TATA± and LCGI/TATA± combinations show different GO enrichment, whereas the ICGI/TATA± combination is less characteristic based on GO enrichment analysis. Finally, we demonstrate that Hadoop MapReduce-based MR-JBLA algorithm is more efficient than the original JBLA in k-mer counting and CGI-associated gene analysis.

Keywords: CpG island density; genome analysis; genome annotation; housekeeping genes; statistical genetics.

PubMed Disclaimer

Publication types

LinkOut - more resources