[Preprint]. 2025 Mar 17:2025.03.14.643395.
doi: 10.1101/2025.03.14.643395.

Genetic diversity and regulatory features of human-specific NOTCH2NL duplications

Taylor D Real et al. bioRxiv.

Abstract

NOTCH2NL (NOTCH2-N-terminus-like) genes arose from incomplete, recent chromosome 1 segmental duplications implicated in human brain cortical expansion. Genetic characterization of these loci and their regulation is complicated because they are embedded in large, nearly identical duplications that predispose to recurrent microdeletion syndromes. Using nearly complete long-read assemblies generated from 67 human and 12 ape haploid genomes, we show independent recurrent duplication among apes, with functional copies emerging in humans ~2.1 million years ago. We distinguish NOTCH2NL paralogs present in every human haplotype (NOTCH2NLA) from copy number variable ones. We also characterize large-scale structural variation, including gene conversion, for 28% of haplotypes, leading to a previously undescribed paralog, NOTCH2tv. Finally, we apply Fiber-seq and long-read transcript sequencing to human cortical neurospheres to characterize the regulatory landscape and find that the most fixed paralogs, NOTCH2 and NOTCH2NLA, harbor the greatest number of paralog-specific elements potentially driving their regulation.

Keywords: NOTCH2; NOTCH2NL; Segmental duplication; gene duplications; human evolution.

Conflict of interest statement

Declarations of interest: E.E.E. is a scientific advisory board (SAB) member of Variant Bio, Inc. A.B.S. is a co-inventor on a patent relating to the Fiber-seq method (US17/995,058). All other authors declare no competing interests.

Figures

Figure 1. Genome structure and organization of the NOTCH2NL gene family.
a) Long-range organization of NOTCH2/NOTCH2NL loci in the T2T-CHM13 reference genome, including centromere satellite annotations of active alpha satellite (αSat) higher order repeats (HORs) (black) and classical human satellite 2 (hsat2, secondary constriction [qh] region [Patil and Lubs 1977]) (blue), regions unique to the T2T-CHM13 assembly (green), intervals of SDs (purple), and Mendelian and genomic disorders associated with specific regions/paralogs (red). A subset of genes is depicted, including NOTCH2NL (purple), NBPF genes that are directly downstream of NOTCH2NL (pink), first unique genes located outside of SD blocks (gray), and all others (black). b) Duplicon organization as defined by DupMasker (Methods) flanking the NOTCH2NL region and intron/exon structure of genes in T2T-CHM13 v2.0 (Perez et al. 2025, http://genome.ucsc.edu). Red asterisks mark the nontraditional CTG start that the browser annotations do not take into consideration. c) Stacked SVbyEye plot of 1 Mbp regions flanking human NOTCH2NL genes (gray squares), contrasting syntenic regions in direct orientation (blue/lavender) versus inverted alignments (yellow). Annotations include different NBPF genes in the region (teal). Note: the two large inversions between NOTCH2/NOTCH2NLR and NOTCH2NLA/NOTCH2NLB, respectively, are the result of proximity due to overlapping sequence. Duplicons as defined by DupMasker are shown as colored triangles.
Figure 2. Ape evolutionary rearrangement and expansion of human chromosome 1p21.2-q23.2.
The genomic structure of the chromosome 1p21.2-q23.2 region is compared among macaque (MFA), Sumatran orangutan (PAB), Bornean orangutan (PPY), gorilla (GGO), chimpanzee (PTR), bonobo (PPA), and human (HSA) with annotations that include ancestral NOTCH2 (black stars), NOTCH2NL duplications (purple stars), NBPF duplications (pink circles), and the centromere (orange bars). The circled numbers represent previous ancestral states of chromosome 1. Three distinct evolutionary inversions are predicted (I, II, III). Two probes (RP11-314N2, green, and RP11-458I7, red) used in FISH analyses from Szamalek et al. (2006) are shown (green and red triangles). Both probes map to the q-arm in humans, with the green probe located inside the inverted region and the red probe outside. FISH data from Szamalek et al. (2006) revealed that in chimpanzee the green probe maps to the region homologous to the human p-arm, while the red probe maps to the q-arm. Sequence analysis supports the FISH mapping and shows that in great apes the sequences of the two probes (represented as red and green lines in the SVbyEye plot) map to opposite sides of the centromere.
Figure 3. NOTCH2/NOTCH2NL phylogeny.
a) A maximum likelihood phylogeny based on a multiple sequence alignment of 21 kbp of intronic NOTCH2/NL sequence from a subset of paralogs of five ape species, using Sumatran orangutan as an outgroup. Bootstrap support (>95%) is indicated (asterisk). Estimated divergence times of human paralogs and their confidence intervals are indicated (multicolored dots). Timings were based on a human–orangutan divergence time of 15.2 MYA (Methods). b) Examples and abundance of five transcript types, which are representative of 20/26 NOTCH2NL-like loci in nonhuman apes (NHAs), from testis, fibroblast/lymphoblastoid cell lines, iPSCs, neuroepithelium, and neural progenitor cells. The histogram to the right of each model represents the number of Iso-Seq transcripts in support of the different predicted models at the locus. c) Multiple sequence alignment (MSA) of predicted protein sequences from 13/26 NHA NOTCH2NL-like loci, NOTCH2 from the NHAs, and all five NOTCH2/NL paralogs from human. Pop-out of the exon 5 alignment shows that NHAs possess the same unmodified carboxy terminus as NOTCH2NLR, which lacks a 4 bp deletion necessary for expression (Fiddes et al. 2018).
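The calibration mentioned in panel a can be illustrated with a minimal Python sketch. It assumes a strict molecular clock, under which a divergence time scales linearly with sequence divergence relative to the 15.2 MYA human–orangutan calibration point. The record does not describe the authors' actual dating procedure (the caption defers to Methods), so the function name `scale_to_calibration` and the toy distances below are hypothetical and are not values from the paper.

```python
def scale_to_calibration(d_pair, d_calibration, t_calibration_mya=15.2):
    """Strict-clock estimate of a divergence time in millions of years.

    d_pair:            sequence divergence between two paralogs (hypothetical input)
    d_calibration:     divergence between the calibration species pair
    t_calibration_mya: calibration time; 15.2 MYA is the human-orangutan
                       divergence time stated in the Figure 3 caption
    """
    if d_calibration <= 0:
        raise ValueError("calibration divergence must be positive")
    # Under a strict clock, time is proportional to divergence.
    return t_calibration_mya * d_pair / d_calibration


# Toy distances chosen only to show the arithmetic; they are NOT from the paper.
toy_estimate = scale_to_calibration(d_pair=0.003, d_calibration=0.03)
print(f"{toy_estimate:.2f} MYA")
```

A paralog pair with one tenth of the calibration divergence is dated to one tenth of 15.2 MYA under this assumption. Published estimates with confidence intervals, like those in the figure, typically come from model-based dating rather than this simple proportion.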
Figure 4. Patterns of human NOTCH2NL structural variation and gene conversion.
a) Workflow to characterize NOTCH2NL paralog identity based on i) best transcript match (defined as the fewest mismatches with respect to T2T-CHM13 reference CDS annotation), ii) phylogenetic clade (assignment to nearest monophyletic grouping based on NOTCH2 intronic ML tree), and iii) map location (defined here as the long-range genomic context based on DupMasker barcodes). b) For 70 human haplotypes, the clade assignment based on the phylogenetic tree is shown first, then the best transcript match, and finally the long-range duplicon organization based on the assembled HPRC genomes. Disagreements among these paralog identities suggest potential gene conversion; examples are marked with red asterisks.
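The three-way comparison described in this caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the class `ParalogCall`, the helper names, the haplotype labels, and the mismatch counts are all hypothetical. The only ideas taken from the caption are that the best transcript match is the reference CDS with the fewest mismatches and that disagreement among the three identities flags a candidate gene conversion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParalogCall:
    haplotype: str        # hypothetical haplotype label
    clade: str            # identity from the intronic ML tree
    best_transcript: str  # identity from the closest reference CDS
    map_location: str     # identity from the flanking duplicon context


def best_transcript_match(mismatches):
    """Return the paralog whose reference CDS has the fewest mismatches."""
    return min(mismatches, key=mismatches.get)


def is_discordant(call):
    """True when the three lines of evidence do not all name the same paralog."""
    return len({call.clade, call.best_transcript, call.map_location}) > 1


# Invented example data, for illustration only.
calls = [
    ParalogCall(
        "hapA",
        clade="NOTCH2NLA",
        best_transcript=best_transcript_match(
            {"NOTCH2NLA": 0, "NOTCH2NLB": 3, "NOTCH2NLR": 7}
        ),
        map_location="NOTCH2NLA",
    ),
    ParalogCall(
        "hapB",
        clade="NOTCH2NLB",
        best_transcript=best_transcript_match(
            {"NOTCH2NLA": 1, "NOTCH2NLB": 4, "NOTCH2NLR": 6}
        ),
        map_location="NOTCH2NLB",
    ),
]

# Haplotypes whose evidence disagrees are candidates for gene conversion.
flagged = [c.haplotype for c in calls if is_discordant(c)]
print(flagged)
```

In this toy input, the second haplotype sits in a NOTCH2NLB clade and map context but carries a coding sequence closest to NOTCH2NLA, which is the kind of disagreement the caption marks with red asterisks.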
Figure 5. NOTCH2NL structural diversity and NOTCH2tv.
a) A simplified schematic summary of the NOTCH2NL haplotype organization and frequency based on 66 sequence-resolved HPRC genomes and the T2T-CHM13 reference. b) Alignment of predicted amino acid sequences for the three paralogs suggests that NOTCH2tv arose as a result of an interlocus gene conversion (IGC) of NOTCH2NLR from NOTCH2. c) Nucleotide alignment of NOTCH2tv (middle) to ancestral NOTCH2 (top) and NOTCH2NLR (bottom) confirms larger stretches of near-perfect sequence identity (red, ≥99.9%) between NOTCH2tv and NOTCH2, consistent with IGC.
Figure 6. Regulatory architecture and transcription of NOTCH2NL in neurospheres.
a) HG02360 was reprogrammed into iPSCs, differentiated into neurospheres, and then subjected to Fiber-seq and Iso-Seq to define putative regulatory elements and generate full-length transcripts. b) Fiber-seq peaks and chromatin actuation sites for each NOTCH2/NL paralog in the context of homology (gray), gene model, and transcription start site (TSS). Dotted black boxes mark elements specific to a region shared only by NOTCH2, NOTCH2NLA, and NOTCH2NLB; although the underlying sequence is nearly identical, the actuation signals are paralog-specific. c) The absolute abundance of full-length transcripts compared with premature termination, fusion, and intron retention products in neurospheres. d) Boxplot showing categorization of accessible elements surrounding each NOTCH2NL paralog based on the presence of duplicate sequence and accessibility at that sequence on the different paralogs. Note that NOTCH2 and NOTCH2NLA have the greatest proportion of paralog-specific sites (dark green). e) Percent actuation of each accessible regulatory element surrounding the NOTCH2NLA paralog (dark green) as well as the percent actuation of duplicate sequences for each element that are present surrounding other NOTCH2NL paralogs (light green) or outside of the NOTCH2NL paralogs (pink).

References

    1. Fiddes Ian T., Lodewijk Gerrald A., et al. “Human-Specific NOTCH2NL Genes Affect Notch Signaling and Cortical Neurogenesis”. In: Cell 173.6 (May 2018), 1356–1369.e22. issn: 0092-8674. doi: 10.1016/j.cell.2018.03.051.
    2. Suzuki Ikuo K. et al. “Human-Specific NOTCH2NL Genes Expand Cortical Neurogenesis through Delta/Notch Regulation”. In: Cell 173.6 (May 31, 2018), 1370–1384.e16. issn: 0092-8674. doi: 10.1016/j.cell.2018.03.067.
    3. Yoo DongAhn et al. “Complete sequencing of ape genomes”. In: bioRxiv (July 31, 2024), 2024.07.31.605654. doi: 10.1101/2024.07.31.605654.
    4. Florio Marta, Heide Michael, et al. “Evolution and cell-type specificity of human-specific genes preferentially expressed in progenitors of fetal neocortex”. In: eLife 7 (Mar. 21, 2018). Ed. by Gleeson Joseph G. Publisher: eLife Sciences Publications, Ltd, e32332. issn: 2050-084X. doi: 10.7554/eLife.32332.
    5. Rajagopalan Ramakrishnan et al. “Genome sequencing increases diagnostic yield in clinically diagnosed Alagille syndrome patients with previously negative test results”. In: Genetics in Medicine 23.2 (Feb. 1, 2021), pp. 323–330. issn: 1098-3600. doi: 10.1038/s41436-020-00989-8.
