Review
Essays Biochem. 2024 Dec 4;68(4):555-578.
doi: 10.1042/EBC20240106.

Genetic variability in proteoglycan biosynthetic genes reveals new facets of heparan sulfate diversity

Mohand Ouidir Ouidja et al. Essays Biochem. 2024.

Abstract

Heparan sulfate (HS) and chondroitin sulfate (CS) proteoglycans (PG) consist of a core protein to which the glycosaminoglycan (GAG) chains, HS or CS, are attached through a common linker tetrasaccharide. In the extracellular space, they regulate cell communication, ensuring development and homeostasis. The HSPG biosynthetic pathway comprises 51 documented genes, and many diseases are associated with defects in some of them. The phenotypic consequences of this genetic variation in humans and of genetic ablation in mice, together with the expression patterns of these genes, led to a phenotypically centered model of the HSPG biosynthetic pathway. In this model, HS sequences produced by the ubiquitous NDST1, HS2ST and HS6ST enzymes are essential for normal development and homeostasis, whereas tissue-restricted HS sequences produced by the non-ubiquitous NDST2-4, HS6ST2-3, and HS3ST1-6 enzymes are involved in adaptive behaviors, cognition, tissue responsiveness to stimuli, and vulnerability to disease. The model indicates that the flux through the HSPG/CSPG pathways and their diverse branches is regulated by substrate preferences and protein-protein interactions. This results in a privileged biosynthesis of HSPGs over that of CSPGs, explaining the phenotypes of linkeropathies, diseases caused by defects in genes involved in the biosynthesis of the common tetrasaccharide linker. Documented feedback loops whereby cells regulate HS sulfation, and hence the interactions of HS with protein partners, may be similarly implemented, e.g., through protein tyrosine sulfation and other post-translational modifications of enzymes of the HSPG pathway. Together, ubiquitous HS, specialized HS, and their biosynthesis model can facilitate research toward a better understanding of HSPG roles in physiology and pathology.

Keywords: Chondroitin sulfate; Heparan sulfate; biosynthesis; linkeropathies; proteoglycan; sulfation.

Conflict of interest statement

The authors declare that there are no competing interests associated with this manuscript.

Figures

Figure 1. Schematic representation of HSPG biosynthesis and the genes coding for the biosynthetic machinery
HSPG are constituted of a core protein (CP) carrying a glycosaminoglycan-CP linker (GAG-CP linker) tetrasaccharide specifically bound to a serine residue included in a serine-glycine motif on which either heparan sulfate (HS) or chondroitin sulfate (CS) chains are polymerized. The GAG-CP linker biosynthetic genes are common to HSPG and CSPG biosynthetic pathways. The HS chain is formed of constitutive disaccharides composed of a uronic acid (initially GlcA) and a glucosamine (initially GlcNAc) carrying sulfation (in red) and the epimer of GlcA, IdoA. NS domains contain only GlcNS and have high IdoA levels; NA/NS domains have intermediary N-sulfation, N-acetylation, and IdoA levels; NA domains contain GlcNAc and have very low levels of sulfation and IdoA. CS chain structure and associated biosynthetic genes are not represented.
Figure 2. Essential and non-essential HSPG biosynthetic genes
Essential genes are those for which pathogenic variants have been reported in humans and confirmed by lethality or overt phenotypes in the corresponding null mouse (♦) (Tables 1-3). Non-essential genes are those for which SNPs (○) have been associated with altered behaviors or altered vulnerability to pathologies in humans (Tables 1, 3 and 4) or with altered response to external stimuli in null mice.
Figure 3. The human HSPG biosynthetic machinery organized by gene essentiality and expression levels
HSPG biosynthetic genes were first clustered as ‘Essential’ or ‘Non-essential’ for normal development and homeostasis (Tables 1-4 and SI2). Then, the RNAseq databases of the RIKEN FANTOM5 project (http://fantom.gsc.riken.jp/data/), the ENCODE project (https://www.encodeproject.org/), and the Uhlen project (https://pmc.ncbi.nlm.nih.gov/articles/PMC4848759/; http://www.proteinatlas.org/humanproteome/tissue+specific) were used to recover expression levels from healthy adult human tissues. Depending on whether transcripts were detected or not (the cutoff was 0.05 TPM), genes were re-clustered into four groups. Group I contains essential genes that code for widely expressed (ubiquitous) core proteins (uCP), ubiquitous GAG-CP linker tetrasaccharide (uTetr) biosynthetic enzymes, and ubiquitous HS chain biosynthetic enzymes (Tables 1-3 and SI2), which together are able to produce ubiquitous HS sequences (uHS), as well as ubiquitous remodeling enzymes, here indicated as ubiquitous quality control (QC). Group II contains essential genes expressed in a tissue-restricted manner. To date, this group contains only one gene, HPSE2, which is involved in the control of HS post-synthetic remodeling, here considered as tissue-restricted (specialized) QC (Table 3 and SI2). Group III contains non-essential, widely expressed genes coding for uCP and HS remodeling enzymes (QC), all involved in responsiveness to stimuli (Tables 1 and 4; SI2). Group IV contains non-essential genes with restricted expression that code for enzymes involved in the production of specialized HS sequences (sHS) and for specialized HS remodeling enzymes (QC) (Table 4 and SI2). Transcript levels of two reference genes were included for each tissue. TPM, transcripts per million; nd, not detected.
Figure 4. Simplified organization of the HSPG biosynthetic genes
In summary, genes were first clustered as ‘Essential’ (pathogenic) and ‘Non-essential’. Then, genes were clustered depending on whether they show ‘restricted expression’ or ‘wide expression’ in adult human tissues. This dual organization allowed formation of four groups of genes: details on Groups I-IV are defined in Figure 3.
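The two-step grouping summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the function name classify_gene, the rule that ‘wide expression’ means detection in every tissue surveyed, and the reading of the 0.05 TPM cutoff as "greater than or equal to" are all assumptions, and the gene labels and TPM values in the example are hypothetical placeholders rather than data from the paper.

```python
from typing import Mapping

# Detection cutoff quoted in the Figure 3 legend (TPM = transcripts per million).
TPM_CUTOFF = 0.05


def classify_gene(
    essential: bool,
    tpm_by_tissue: Mapping[str, float],
    cutoff: float = TPM_CUTOFF,
) -> str:
    """Return the group label 'I', 'II', 'III' or 'IV' for one gene.

    Assumptions (not stated in the source):
    - a transcript counts as detected when TPM >= cutoff;
    - 'wide expression' means detected in every tissue surveyed,
      anything else is treated as 'restricted expression'.
    """
    if not tpm_by_tissue:
        raise ValueError("at least one tissue is required")

    widely_expressed = all(tpm >= cutoff for tpm in tpm_by_tissue.values())

    if essential:
        return "I" if widely_expressed else "II"
    return "III" if widely_expressed else "IV"


# Hypothetical placeholder genes and TPM values, for illustration only.
examples = {
    "gene_a": (True, {"brain": 12.0, "liver": 8.5, "kidney": 3.1}),
    "gene_b": (True, {"brain": 4.2, "liver": 0.0, "kidney": 0.01}),
    "gene_c": (False, {"brain": 6.0, "liver": 2.2, "kidney": 1.4}),
    "gene_d": (False, {"brain": 9.7, "liver": 0.0, "kidney": 0.0}),
}

groups = {
    name: classify_gene(essential, tpm)
    for name, (essential, tpm) in examples.items()
}
print(groups)
```

A stricter or looser definition of ‘wide expression’ (for example, detection in a minimum fraction of tissues) would only change the widely_expressed line; the essential/non-essential split and the four group labels follow the legends of Figures 3 and 4.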
Figure 5. Schematic representation of HSPG biosynthesis
(A) Detail of the differences in the activities of the xylosyltransferases XYLT1 and XYLT2 during development and postnatally. XYLT1 is highly expressed during development (↑↑dev) and its expression decreases postnatally. XYLT1 preferentially binds to low molecular weight (LMW) CSPG core proteins, e.g., decorin, ensuring high flux through the ‘decorin path’. XYLT2 is expressed after birth and shows similar binding to HSPG and CSPG core proteins, and thus does not favour any particular pathway. (B) The biosynthesis of the GAG-CP linker tetrasaccharide, which is common to HSPG/CSPG, incorporating (A) for completeness. Xyl(±P) is routed to different paths depending on substrate selectivities and Xyl phosphorylation. With the exception of XYLT1 and XYLT2, all enzymes and transporters are ubiquitously expressed during development (dev) and postnatally. The CSPG-decorin path lacks phosphate on Xyl, whereas the HSPG/CSPG-aggrecan path carries Xyl(±P) (as in aggrecan and neurocan). (C) After synthesis of the GAG-CP linker, the activity of EXTL3, EXT1/EXT2, NDST1, GLCE, HS2ST, and HS6ST1 leads to the synthesis of ubiquitous heparan sulfate (uHS) chains. (D) Specialized heparan sulfate (sHS) chains are formed by the additional action of enzymes whose expression is restricted to specific tissues and occurs in response to stimuli. Note that these are depicted separately from the enzymes in (C), but are likely to function with them. (E) Representative HS disaccharide indicating GlcA, IdoA and GlcN units and sulfation positions (in red). Arrow width indicates preferred flux based on substrate preferences and protein-protein interactions, with wide arrows indicating preferred paths. Substrate pools guarantee overlapping paths. Green empty arrows indicate UDP-sugar or ion transport. Some nucleotide sugar transporters (SLC35A2 and SLC35A3) and ion transporters (SLC10A7) are represented. Ions such as Ca2+, Mn2+ and/or Mg2+ are indicated when known to be required for enzymatic activity.
