J Genet Genomics. 2020 Sep 20;47(9):513-521.
doi: 10.1016/j.jgg.2020.08.005. Epub 2020 Oct 10.

The human malaria parasite genome is configured into thousands of coexpressed linear regulatory units

Chengqi Wang et al. J Genet Genomics.

Abstract

The human malaria parasite Plasmodium falciparum thrives in radically different host environments in mosquitoes and humans with only a limited set of transcription factors. The nature of the regulatory elements and their target genes in the P. falciparum genome remains elusive. Here, we found that this eukaryotic parasite efficiently combines genetic and epigenetic regulation to form regulatory units (RUs) during blood infections. Genes located in the same RU tend to have the same pattern of expression over time and are associated with open chromatin along regulatory elements. To precisely define and quantify these RUs, a novel hidden Markov model was developed to capture the regulatory structure in a genome-wide fashion by integrating expression and epigenetic evidence. We successfully identified thousands of RUs and cross-validated them against previous findings. We found more genes involved in red blood cell (RBC) invasion located in the same RU as the PfAP2-I (AP2-I) transcription factor, demonstrating that AP2-I is responsible for regulating RBC invasion. Our study describes a regulatory mechanism for a compact eukaryotic genome and offers new insights into the in vivo transcriptional regulation of the P. falciparum intraerythrocytic stage.

Keywords: ATAC-Seq; Gene regulation; Hidden Markov Model; Malaria; Plasmodium falciparum; RNA-Seq; Regulatory units.

Figures

Fig. 1.
Multiple regulatory elements act together to regulate neighboring genes. A: Pearson correlation coefficients between the gene expression level and the chromatin accessibility of assay for transposase-accessible chromatin using sequencing (ATAC-Seq) peaks in different genome segments. Each ATAC peak is assigned to different genes based on the number of genes separating them. The random group was generated by 100 shuffled ATAC-Seq peak-to-transcript matches. B: IGV screenshot (chr1: 383,284–393,490) shows the RNA-Seq signal on the sense strand (bottom) and the chromatin accessibility from ATAC-Seq (top). C: The upper plot shows the relative positions of genome elements in the region (chr1: 383,284–393,490). The dashed rectangle encloses the elements inside the gene’s RU. The heat map displays the correlations between the chromatin accessibility of ATAC peaks and mRNA abundance profiles in this region. RPM, reads per million.
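The comparison in panel A can be illustrated with a small sketch. It is not the authors' pipeline: the function names `pearson` and `matched_and_shuffled`, the row-wise pairing of peaks to genes, and the fixed random seed are all assumptions made for illustration. It correlates each ATAC peak's accessibility profile with its assigned gene's expression profile, then builds a random background by shuffling the peak-to-gene matches 100 times.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation of two equal-length 1-D profiles (0.0 if either is constant)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else 0.0

def matched_and_shuffled(peaks, genes, n_shuffles=100, seed=0):
    """Correlate matched peak/gene rows, then repeat with shuffled matches.

    peaks, genes: arrays of shape (n_pairs, n_timepoints); row i of `peaks`
    is assumed to be matched to row i of `genes` (hypothetical layout).
    """
    peaks = np.asarray(peaks, dtype=float)
    genes = np.asarray(genes, dtype=float)
    matched = np.array([pearson(p, g) for p, g in zip(peaks, genes)])

    rng = np.random.default_rng(seed)
    shuffled = []
    for _ in range(n_shuffles):
        perm = rng.permutation(len(genes))  # random peak-to-gene reassignment
        shuffled.extend(pearson(p, g) for p, g in zip(peaks, genes[perm]))
    return matched, np.array(shuffled)
```

Comparing the distribution of `matched` with that of `shuffled` gives the kind of contrast the panel describes between assigned and random peak-to-transcript pairs.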
Fig. 2.
Identification of regulatory units in the P. falciparum genome. A: The flowchart shows the HMM building process. After each genome element (gene or assay for transposase-accessible chromatin using sequencing [ATAC-Seq] peak) is encoded by its sequencing signal, a directionality vector (DV) is calculated for each query element from its Pearson correlations with the upstream and downstream elements. The HMM is then built on the DVs, and the final RUs are determined from the HMM states. Each final RU is required to contain at least two genome elements (gene or ATAC peak). B: Scatter plot shows the sequencing signal correlation between each genome element and its upstream or downstream element. The colors of the points represent the four states called by the HMM. States 1, 2, 3, and 4 represent ‘downstream,’ ‘no bias,’ ‘upstream,’ and ‘both sides,’ respectively. C: The histogram shows the distribution of Pearson correlations between each pair of genome elements inside an RU or at a boundary. HMM, hidden Markov model; RU, regulatory unit.
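The directionality vector described in panel A can be sketched as follows. This is an illustration under stated assumptions, not the published model: `directionality_vectors` and `label_by_threshold` are hypothetical names, the 0.5 cutoff is arbitrary, and the threshold rule is only a simple stand-in for the HMM, which the caption does not specify in detail. The state numbers follow the caption (1 downstream, 2 no bias, 3 upstream, 4 both sides).

```python
import numpy as np

def directionality_vectors(signals):
    """Correlate each element with its upstream and downstream neighbor.

    signals: array of shape (n_elements, n_timepoints), rows in genome order.
    Returns an (n_elements, 2) array of [upstream r, downstream r], with NaN
    where a neighbor does not exist (first and last elements).
    """
    signals = np.asarray(signals, dtype=float)
    n = len(signals)
    dv = np.full((n, 2), np.nan)
    for i in range(n):
        if i > 0:
            dv[i, 0] = np.corrcoef(signals[i], signals[i - 1])[0, 1]
        if i < n - 1:
            dv[i, 1] = np.corrcoef(signals[i], signals[i + 1])[0, 1]
    return dv

def label_by_threshold(dv, cutoff=0.5):
    """Stand-in for the HMM: assign states from a fixed correlation cutoff.

    The cutoff is an assumption for illustration only.
    """
    up = np.nan_to_num(dv[:, 0], nan=-1.0) >= cutoff
    down = np.nan_to_num(dv[:, 1], nan=-1.0) >= cutoff
    states = np.full(len(dv), 2)   # state 2: no bias
    states[down & ~up] = 1         # state 1: downstream
    states[up & ~down] = 3         # state 3: upstream
    states[up & down] = 4          # state 4: both sides
    return states
```

An HMM differs from this rule in that it also models transitions between states along the chromosome, so neighboring elements influence each other's labels; the sketch only shows what the DV input looks like.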
Fig. 3.
Genome organization of RUs in P. falciparum. A: The chromosomal map displays 2061 gene RUs throughout the genome. B: Three hundred ninety-two previously identified gene RUs overlap with the coexpression units identified here. C: Pearson correlation coefficients between all pairs of genome elements inside RUs. The elements include ATAC peaks and genes. The chromatin accessibility of ATAC peaks and the mRNA abundance of genes are used to calculate the correlation coefficients. D: The upper plot shows the relative positions of genome elements in the region (chr10: 1,244,215–1,251,807). The dashed rectangle encloses the elements inside the RU. The heat map displays the correlation between the chromatin accessibility of ATAC peaks and mRNA abundance profiles in this region. E: The upper plot shows the relative positions of genome elements in the region (chr6: 217,845–249,898). The dashed rectangle encloses the elements inside the RU. The heat map displays the correlation between the chromatin accessibility of ATAC peaks and mRNA abundance profiles in this region. RU, regulatory unit.
Fig. 4.
AP2-I is involved in regulating red blood cell invasion genes located in the same RU. A: The Venn diagram shows the overlap between assay for transposase-accessible chromatin using sequencing (ATAC-Seq) peaks inside RUs and AP2-I peaks. B: Pearson correlation coefficients between the chromatin accessibility of ATAC peaks that overlap AP2-I peaks and gene mRNA abundance in different genome segments. (Each ATAC peak that overlaps an AP2-I peak is assigned to its nearest downstream genes, as previously reported.) C: Heat maps displaying the relative chromatin accessibility of AP2-I binding sites and the mRNA abundance profiles of genes in the same RU through eight stages of intraerythrocytic development. All pairs of genes and ATAC-Seq peaks that overlap AP2-I peaks in the same RU are shown. Only ATAC-Seq peaks that overlap AP2-I peaks in intergenic regions are considered here. The accessibility profile at each ATAC-Seq peak is calculated as the fraction of the maximum RPKM value over all time points. The K-means algorithm is used here with a 1 − Pearson correlation distance. The RNA-Seq data were normalized in the same way as the ATAC-Seq data and are presented in the same order. D: Same as C, but showing the genes located nearest downstream of the AP2-I peak but not in the same RU. E: Pearson correlation coefficients between the chromatin accessibility of ATAC peaks that overlap AP2-I peaks and gene mRNA abundance in different genome segments. F: The upper plot shows the relative positions of genome elements in the region (chr8: 800,920–822,452). The dashed rectangle encloses the elements inside the RU. The heat map displays the correlation between the chromatin accessibility of ATAC peaks and mRNA abundance profiles in this region. G: The upper plot shows the relative positions of genome elements in the region (chr13: 1,412,221–1,422,196). The dashed rectangle encloses the elements inside the RU. The heat map displays the correlation between the chromatin accessibility of ATAC peaks and mRNA abundance profiles in this region. RU, regulatory unit.
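The normalization and distance named in panel C can be written out in a few lines. This is a minimal sketch with hypothetical function names (`fraction_of_max`, `one_minus_pearson`) and made-up input values; it shows only the two transformations the caption mentions, not the clustering step itself.

```python
import numpy as np

def fraction_of_max(profiles):
    """Scale each row (one peak or gene over time) by its own maximum.

    profiles: array of shape (n_features, n_timepoints) of RPKM-like values.
    Rows whose maximum is not positive are returned unchanged to avoid
    dividing by zero.
    """
    profiles = np.asarray(profiles, dtype=float)
    row_max = profiles.max(axis=1, keepdims=True)
    safe_max = np.where(row_max > 0, row_max, 1.0)
    return profiles / safe_max

def one_minus_pearson(profiles):
    """Pairwise distance matrix d(i, j) = 1 - Pearson r between rows i and j."""
    return 1.0 - np.corrcoef(np.asarray(profiles, dtype=float))
```

Because Pearson correlation is invariant to positive rescaling of a profile, the fraction-of-maximum step changes how the heat map looks but not the 1 − Pearson distances between rows.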
Fig. 5.
The association between gene coexpression elements and the physical partitioning of the P. falciparum genome. A: The heat map shows the gene expression correlation between 10-kb bins along chromosome 7 in P. falciparum. Five genome domains are indicated in the heat map. B: Normalized Hi-C interaction frequencies along chromosome 7 displayed as a two-dimensional heat map. The five domains of highly interacting regions are consistent with the five gene expression domains shown in (A). The blue bar at the bottom indicates the predicted RUs, while the brown bar marks the topological domains (TADs), which indicate enriched intrachromosomal interactions, called by the method of Dixon et al., 2012. The boundaries of the RUs and TADs overlap extensively, suggesting that the RUs are formed through intrachromosomal interactions. RU, regulatory unit.

References

    1. Aurrecoechea C, Brestelli J, Brunk BP, Dommer J, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, et al., 2009. PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 37, D539–D543.
    2. Ay F, Bunnik EM, Varoquaux N, Bol SM, Prudhomme J, Vert JP, Noble WS, Le Roch KG, 2014. Three-dimensional modeling of the P. falciparum genome during the erythrocytic cycle reveals a strong connection between genome architecture and gene expression. Genome Res. 24, 974–988.
    3. Ay F, Bunnik EM, Varoquaux N, Vert J-P, Noble WS, Le Roch KG, 2015. Multiple dimensions of epigenetic gene regulation in the malaria parasite Plasmodium falciparum: gene regulation via histone modifications, nucleosome positioning and nuclear architecture in P. falciparum. BioEssays 37, 182–194.
    4. Bártfai R, Hoeijmakers WAM, Salcedo-Amaya AM, Smits AH, Janssen-Megens E, Kaan A, Treeck M, Gilberger T-W, Françoijs K-J, Stunnenberg HG, 2010. H2A.Z demarcates intergenic regions of the Plasmodium falciparum epigenome that are dynamically marked by H3K9ac and H3K4me3. PLoS Pathog. 6, e1001223.
    5. Bozdech Z, Llinás M, Pulliam BL, Wong ED, Zhu J, DeRisi JL, 2003. The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biol. 1, e5.