BMC Genomics. 2025 Jul 17;26(1):671.
doi: 10.1186/s12864-025-11859-5.

Direct long-read visualization reveals hidden variation in GCH1 gene copy number and precise expansion steps


Shiwei Liu et al. BMC Genomics.

Abstract

Background: Increases in the copy number of large genomic regions, termed amplifications, are an important adaptive strategy for many organisms. Numerous amplifications across the AT-rich Plasmodium falciparum genome contribute directly to drug resistance or impact the fitness of this protozoan parasite. During the characterization of malaria parasites selected with a dihydroorotate dehydrogenase (DHODH) inhibitor that targets pyrimidine biosynthesis, we detected increased copies of a genomic region that encompassed 3 genes (~ 5 kb) including GTP cyclohydrolase I (GCH1 amplicon). While amplification of this gene is reported in antifolate-resistant parasites, GCH1 amplicons had not previously been implicated in DHODH inhibitor resistance.

Results: Here, we explored the expansion of the GCH1 locus in this family of parasite lines using long-read sequencing and single-read visualization. We directly quantified higher numbers of tandem GCH1 amplicons in selected parasite lines (up to 9 GCH1 amplicons) compared to parental P. falciparum parasites (strictly 3 GCH1 amplicons). Because each read represents DNA from an individual genome, we were able to appreciate hidden variation within a single parasite line (3, to 5, to 7 amplicons) that was not reflected in other DNA-based analysis methods. While all GCH1 amplicons shared a consistent structure, expansions arose in precise 2-unit steps within selected lines. We found conserved AT-rich sequences at amplicon boundaries, which is consistent with the Plasmodium model of CNV formation. Parasite lines with expanded GCH1 also had DHODH amplicons on a separate chromosome. When we evaluated prior DHODH inhibitor selections, we observed that GCH1 amplification was not required for resistance; however, selection outcomes suggest that pre-existing GCH1 amplicons may support amplification at the DHODH locus.
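The per-read tallies described above lend themselves to a simple step-size check. The following is a minimal sketch, not the authors' pipeline: `reads_by_line` holds hypothetical placeholder counts shaped after the values mentioned in the abstract (3 amplicons in the parental line; 3, 5, and 7 within a selected line), and `expansion_steps` is an illustrative helper name.

```python
from collections import Counter


def expansion_steps(copy_numbers):
    """Return (tally, steps): the number of reads per amplicon count and
    the differences between consecutive distinct amplicon counts."""
    tally = Counter(copy_numbers)
    distinct = sorted(tally)
    steps = [b - a for a, b in zip(distinct, distinct[1:])]
    return tally, steps


# Hypothetical per-read amplicon counts (one entry per spanning read).
reads_by_line = {
    "parental": [3, 3, 3, 3],
    "selected": [3, 5, 5, 7, 5, 7],
}

for line, counts in reads_by_line.items():
    tally, steps = expansion_steps(counts)
    print(line, dict(sorted(tally.items())), "steps:", steps)
```

A line whose reads all carry the same count yields an empty step list, while a mixed population with counts of 3, 5, and 7 yields steps of 2 and 2, which is the pattern the abstract describes as 2-unit expansion.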

Conclusions: We identified previously undetected heterogeneity in gene copy number by viewing long pieces of DNA from individual genomes. This approach was possible due to the amplicon's tandem orientation and relatively small size that can be spanned by a single long ONT read. The positive association between DHODH and GCH1 copy number, combined with the metabolic connection between P. falciparum pyrimidine and folate biosynthesis, justifies further investigation into the adaptive evolution of these two genomic loci.

Keywords: Copy number variation; Dihydroorotate dehydrogenase; Folate biosynthesis; GTP cyclohydrolase I; Long-read sequencing; Malaria; Oxford nanopore technology; Plasmodium falciparum; Pyrimidine biosynthesis; Single read visualization.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Short-read analysis shows potential relationship between GCH1 and DHODH copy number in one family of DSM1-selected parasites. A DSM1 selection schematic, as presented previously [21]. Blue text: Illumina short read sequenced lines. Underline: modern lines confirmed by ddPCR analysis (Table 1). Asterisk (*): lines subjected to long-read sequencing in this study. Wild-type (WT1, Dd2) P. falciparum was selected with DSM1 in two steps; the first step selected for low‐level (L) resistance and the second step selected for moderate‐ (M) or high‐level (H) resistance. DSM1 EC50 values are as follows: WT1 (0.2 µM), L1 (1 µM), L2 (0.9 µM), M1 (7.2 µM), H2 (85 µM), H6 (56 µM), H4 (49 µM). All values were previously reported and clone names were adapted as previously [21, 36]. B Relationship between GCH1 and DHODH copy number in DSM1 selected parasites as quantified using short read data from Guler et al. 2013. A trend line indicates the relationship between GCH1 and DHODH copy numbers but a Pearson r correlation analysis is not significant (R2 = 0.5737, p value = 0.1381)
Fig. 2
Long-read visualization shows conserved structure and boundaries of GCH1 amplicon from parental and DSM1-selected parasites. A-D Representative images of single reads from the Shiny application comparing a non-amplified locus (A) and the GCH1 amplicon in WT1 (B), M1 (C), and H2 (D) spanning reads from chromosome 12. Red dashed square: copies of GCH1 amplicon covering 3 gene units. Each gene sequence from the 3D7 reference genome (v3.0, no GCH1 amplicon represented) was split into ≤ 500 bp fragments and BLASTed against individual ONT reads (dark gray: genic regions, light gray: intergenic regions). For the GCH1 amplicon, individual reads were trimmed to span the same gene regions (Pf3D7_1224500–Pf3D7_1223200) and aligned in the same direction for ease of viewing (but all reads varied in boundaries and direction). The non-amplified locus was selected from a region downstream of the GCH1 amplicon on chromosome 12. Other small regions represent regions with similar sequence identity to the sequences on the amplicon. E Orientation of GCH1 amplicons from each parasite line. The three genes within the GCH1 amplicon unit include Pf3D7_1224000 (GTP cyclohydrolase I, GCH1), Pf3D7_1223900 (50S ribosomal protein L24, putative, RPL), and Pf3D7_1223800 (citrate/oxoglutarate carrier protein, putative, YHM2). Note: The gene order (GCH1-RPL-YHM2) is reversed compared to the 3D7 reference genome v3.0 (YHM2-RPL-GCH1) in order to facilitate comparison with the read images in panels B-D. Red circles: amplicon boundary sequences that act as expansion sites
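The fragmenting step mentioned in this caption (gene sequences split into pieces of at most 500 bp before being BLASTed against reads) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the caption does not say whether fragments overlap, so consecutive non-overlapping pieces are assumed here, `split_into_fragments` is a hypothetical helper name, `gene` is a placeholder sequence, and the BLAST step itself is not shown.

```python
def split_into_fragments(seq, max_len=500):
    """Split a sequence into consecutive, non-overlapping pieces of at
    most max_len bases (the last piece may be shorter)."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [seq[i:i + max_len] for i in range(0, len(seq), max_len)]


# Hypothetical 1,200 bp placeholder sequence, not a real gene.
gene = "AT" * 600
fragments = split_into_fragments(gene)
print([len(f) for f in fragments])  # [500, 500, 200]
```

Joining the fragments in order reproduces the input sequence, so no bases are lost or duplicated under this assumption.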
Fig. 3
Quantification of long-reads displays an increase in GCH1 amplicon number in DSM1-selected parasites that is not dependent on overall read length. A GCH1 copy number from WT1 and selected (M1 and H2) parasite lines from all reads (spanning and non-spanning). Spanning reads were grouped with their corresponding non-spanning count (e.g. a read spanning 2 copies of the GCH1 amplicon, including up- and down-stream genes, is counted as a 2+ read). Blue bars represent median, error bars represent interquartile range (dotted lines, values that represent detected spanning reads from panel B). Solid grey vertical line represents parental amplicon tally (3 copies). B GCH1 copy number from only spanning reads. Overall (all runs combined), 40 of 139 total reads were characterized as spanning (WT1, 10/19 reads; M1, 20/56 reads; H2, 10/64 reads). Blue bars represent median, error bars represent interquartile range (dotted lines, values that represent detected spanning reads). Dunn multiple comparisons test: WT1 vs. M1: p = 0.0005 (***), WT1 vs. H2: p < 0.0001 (****). Solid grey vertical line represents parental amplicon tally (3 copies). C Read length distribution from all long reads (≥ 10 kb) covering the GCH1 amplicon. Thick line represents median (WT1: 51144, M1: 46684, H2: 27081), thin lines represent quartiles. *, p = 0.01
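The spanning-read counts quoted in this caption can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the caption; the variable and function names are illustrative.

```python
# Spanning / total read counts as stated in the Fig. 3 caption.
read_counts = {"WT1": (10, 19), "M1": (20, 56), "H2": (10, 64)}


def spanning_fraction(spanning, total):
    """Fraction of reads that span the whole amplicon array."""
    return spanning / total


total_spanning = sum(s for s, _ in read_counts.values())
total_reads = sum(t for _, t in read_counts.values())

for line, (s, t) in read_counts.items():
    print(f"{line}: {s}/{t} = {spanning_fraction(s, t):.1%}")
print(f"All: {total_spanning}/{total_reads} = "
      f"{spanning_fraction(total_spanning, total_reads):.1%}")
```

The per-line counts sum to the 40 of 139 reads reported for all runs combined.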
Fig. 4
GCH1 amplicon boundaries from long-reads indicate multiple precise amplification events. Representation of amplification steps from junction visualizations presented in Figures S4 (WT1), S5 (M1) and S6 (H2). This data is consistent with the “trigger site” model reported previously [40], where AT dinucleotides form stable DNA hairpins that lead to double-strand breaks and homopolymeric (pure) A or T tracks act as microhomology for error-prone break repair pathways (gray dashed boxes). A Before amplification, the primary 5′ inner homopolymeric A/T track (green) and 3′ AT dinucleotide sequence (red and green) participate in events leading to initial amplification of the 3 gene unit. The gene order (YHM2-50S RPL24-GCH1) reflects that in the 3D7 reference genome (but is opposite to Fig. 2E). GCH1 (G): Pf3D7_1224000 (GTP cyclohydrolase I); RPL (R): Pf3D7_1223900 (50S ribosomal protein L24, putative); YHM2 (Y): Pf3D7_1223800 (citrate/oxoglutarate carrier protein, putative). B A secondary outer 5′ homopolymeric T track (purple circles) and 3′ AT dinucleotide sequence that flanks the predicted inverted head-to-head duplication intermediate initiates the re-amplification of the 3 gene unit to generate the WT1 (Dd2) amplicon. Note: this intermediate state has not been observed and is speculation based on the WT1 amplicon structure. C The same sequences from panel B that sit at the end of the tail-to-tail inverted amplicon copies (red circles) trigger 2 unit amplification to generate the expanded locus in M1 and H2 lines under selection (see also expansion sites depicted in Fig. 2E). D The GCH1 amplicon continues to expand by 2 units using the precise boundary sequences as in the previous round (red circles)
Fig. 5
Preexisting GCH1 amplicons may support the acquisition of DHODH amplicons. A GeneToCN assessment of GCH1 and DHODH copy number using current long-read data. Mean values are represented for each amplicon calculated in comparison to three different reference genes using data from all sequencing runs (Table S7). Error bars show SD. CN, copy number. Pearson r correlation analysis is significant (R2 = 0.9985, p value = 0.0246). B Relationship between GCH1 and DHODH copy number in parental (black outline) and DHODH inhibitor-selected parasite lines as quantified using short-read data (see Table S6 for source data). Each data point represents a copy number from an individual parasite clone or line from a distinct selection (see Tables S1 and S2 for copy number assessment). 3D7, yellow squares (2 selections); Dd2, blue circles (2 selections, including lines from the current study); K1, red triangle (1 selection, unknown DHODH copy number post-selection); HB3, grey diamonds (2 unsuccessful selections, potentially resistance incapable). C GCH1 amplicon size and gene number vary between parasite lines used for DHODH inhibitor selections (see Table S6 for selection details). 3D7, 4 copy, single gene (Pf3D7_1224000, 2 kb, Figure S2); Dd2, 3 copy, three genes (Pf3D7_1223800–Pf3D7_1224000, 5 kb, Fig. 2); K1, unknown copy number (grey dash, anticipated 2 copy), 7 genes (Pf3D7_1223500–Pf3D7_1224100, 19 kb); HB3, 1 copy of region
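Panel A reports a mean copy number per amplicon relative to three reference genes, with SD error bars. The sketch below illustrates that kind of summary only; it is not GeneToCN's implementation, and it assumes a simple depth ratio against single-copy references and a sample (n - 1) standard deviation. The depth values and the name `copy_number_summary` are hypothetical.

```python
from statistics import mean, stdev


def copy_number_summary(target_depth, reference_depths):
    """Estimate copy number as target depth divided by each single-copy
    reference depth, then summarize with the mean and sample SD."""
    estimates = [target_depth / ref for ref in reference_depths]
    return mean(estimates), stdev(estimates)


# Hypothetical depth values, for illustration only (not from the study).
target_depth = 90.0
reference_depths = [30.0, 28.0, 32.0]

cn_mean, cn_sd = copy_number_summary(target_depth, reference_depths)
print(f"CN = {cn_mean:.2f} (SD {cn_sd:.2f})")
```

Using several reference genes gives a spread for the estimate, so a single unusually covered reference does not set the reported copy number on its own.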
Fig. 6
A metabolic connection between pyrimidine and folate biosynthesis pathways. Enzymes with gene copy number variations are indicated in blue text (DHODH: dihydroorotate dehydrogenase; GCH1: GTP cyclohydrolase 1) and connections between the two pathways in blue arrows. Gln: Glutamine; DHO: Dihydroorotate; UMP: Uridine monophosphate; dUMP: Deoxyuridine monophosphate; dTMP: deoxythymidine monophosphate; GTP: Guanosine-5’-triphosphate; DHPS: Dihydropteroate synthase; DHFR: Dihydrofolate reductase, bifunctional enzyme with TS: Thymidylate synthase; DHF: Dihydrofolate; THF: Tetrahydrofolate; HMDP-P2: 6-hydroxymethyl-7,8-dihydropterin diphosphate; pABA: para-amino-benzoic acid; 5,10-CH2-THF: 5,10-Methylenetetrahydrofolate. *Dd2 carries 3 mutations in both DHPS and DHFR (6 total) and HB3 and 3D7 have a single mutation each in DHFR and DHPS, respectively [26, 77]

