Am J Hum Genet. 2024 Aug 8;111(8):1700-1716.
doi: 10.1016/j.ajhg.2024.06.007. Epub 2024 Jul 10.

Structural and genetic diversity in the secreted mucins MUC5AC and MUC5B

Elizabeth G Plender et al. Am J Hum Genet. 2024.

Abstract

The secreted mucins MUC5AC and MUC5B are large glycoproteins that play critical defensive roles in pathogen entrapment and mucociliary clearance. Their respective genes contain polymorphic and degenerate protein-coding variable number tandem repeats (VNTRs) that make the loci difficult to investigate with short reads. We characterize the structural diversity of MUC5AC and MUC5B by long-read sequencing and assembly of 206 human and 20 nonhuman primate (NHP) haplotypes. We find that human MUC5B is largely invariant (5,761-5,762 amino acids [aa]); however, seven haplotypes have expanded VNTRs (6,291-7,019 aa). In contrast, 30 allelic variants of MUC5AC encode 16 distinct proteins (5,249-6,325 aa) with cysteine-rich domain and VNTR copy-number variation. We group MUC5AC alleles into three phylogenetic clades: H1 (46%, ∼5,654 aa), H2 (33%, ∼5,742 aa), and H3 (7%, ∼6,325 aa). The two most common human MUC5AC variants are smaller than NHP gene models, suggesting a reduction in protein length during recent human evolution. Linkage disequilibrium and Tajima's D analyses reveal that East Asians carry exceptionally large blocks with an excess of rare variation (p < 0.05) at MUC5AC. To validate this result, we use Locityper for genotyping MUC5AC haplogroups in 2,600 unrelated samples from the 1000 Genomes Project. We observe a signature of positive selection in H1 among East Asians and a depletion of the likely ancestral haplogroup (H3). In Europeans, H3 alleles show an excess of common variation and deviate from Hardy-Weinberg equilibrium (p < 0.05), consistent with heterozygote advantage and balancing selection. This study provides a generalizable strategy to characterize complex protein-coding VNTRs for improved disease associations.
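The Hardy-Weinberg deviation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single haplogroup is treated as one allele against all others (for example H3 versus non-H3) and uses a plain chi-square goodness-of-fit test with one degree of freedom. The function name, the genotype counts, and the choice of test are illustrative assumptions, not the authors' analysis.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg equilibrium at a biallelic locus.

    n_aa, n_ab, n_bb are observed genotype counts (hypothetical inputs).
    Returns (chi2, p_value, heterozygote_excess).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Allele frequencies estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))

    # More heterozygotes than expected is the direction consistent with
    # heterozygote advantage.
    heterozygote_excess = n_ab > expected[1]
    return chi2, p_value, heterozygote_excess


if __name__ == "__main__":
    # Made-up counts, not data from the study.
    print(hwe_chi_square(10, 80, 10))
```

A significant result together with a heterozygote excess points in the direction the abstract describes. With small counts an exact test is usually preferred over the chi-square approximation.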

Conflict of interest statement

Declaration of interests: E.E.E. is a scientific advisory board (SAB) member of Variant Bio, Inc.

Figures

Graphical abstract
Figure 1
The genetic architecture of MUC5AC in 206 human haplotypes
(A) Recombination-aware phylogenetic analysis of ∼27 kbp neutral sequence (5.592 kbp from introns 31–48 and 21 kbp from 3′ flanking sequence) from 206 human haplotypes of MUC5AC with two chimpanzee haplotypes as outgroup. () = central node with 100% bootstrap support. H1–H3 correspond to three major haplogroups; P1–P6 correspond to protein groups (consistent with C).
(B) Frequency of population-specific haplotypes found in the three common phylogenetic haplogroups of MUC5AC. H1–H3 correspond to the three major haplogroups.
(C) Protein predictions for haplotypes of MUC5AC. Diagrams represent protein domains with the large central exon of MUC5AC, modeled after Guo et al. Colors correspond to protein groups visualized in (A). CysD corresponds to cys domains and PTS corresponds to proline-, serine-, and threonine-rich domains.
(D) Distributions of absolute serine and threonine (S/T) count across VNTR domains within the four most common protein groups of MUC5AC.
(E) Distributions of percent S/T content within VNTR domains for the four most common protein groups of MUC5AC.
(F) Logo plot of the 130 8-mer amino acid motif variants used in MUC5AC VNTR domains. Colors correspond to biochemical groupings of amino acids.
(G) Heatmap of 8-mer motif utilization across 206 protein variants of human MUC5AC, colored vertically by protein group identities. Heatmap constructed with normalization within motifs (columns) and hierarchical clustering of haplotypes (rows) and motifs (columns). See Figure S2 for an extended version that includes the matched motifs (columns).
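The quantities behind panels (D) to (G) can be sketched in a few lines: the absolute and percent S/T content of a VNTR domain, and a tally of 8-mer motif variants. The sketch below assumes the domain is already extracted as an amino acid string and that motifs tile it in consecutive, non-overlapping windows. Both are simplifying assumptions, since degenerate repeats generally need alignment-based motif calling. The function names and the toy sequence are hypothetical.

```python
from collections import Counter


def st_content(domain_seq):
    """Return (absolute S/T count, percent S/T) for an amino acid string."""
    seq = domain_seq.upper()
    count = seq.count("S") + seq.count("T")
    percent = 100.0 * count / len(seq) if seq else 0.0
    return count, percent


def motif_counts(domain_seq, k=8):
    """Tally consecutive, non-overlapping k-mers across a VNTR domain.

    A trailing fragment shorter than k is ignored.
    """
    seq = domain_seq.upper()
    full_length = len(seq) - len(seq) % k
    return Counter(seq[i:i + k] for i in range(0, full_length, k))


if __name__ == "__main__":
    # Toy sequence made up for illustration, not taken from the study.
    toy_domain = "TTSTTSAPTTSTTSAPTTVTTSAP"
    print(st_content(toy_domain))
    print(motif_counts(toy_domain))
```

Dividing each motif count by its column total across haplotypes would correspond to the within-motif normalization described for panel (G).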
Figure 2
The genetic architecture of MUC5B in 206 human haplotypes
(A) Recombination-aware phylogenetic analysis of ∼26.5 kbp neutral sequence (introns 16–48) from 206 human haplotypes of MUC5B with two chimpanzee haplotypes as outgroup. () = central node with 100% bootstrap support. H1 and H2 correspond to two major haplogroups; P1–P6 correspond to protein groups (consistent with C); trunc. corresponds to haplotypes with truncated protein predictions.
(B) Frequency of population-specific haplotypes found in the two common phylogenetic haplogroups of MUC5B.
(C) Protein predictions for 206 human haplotypes of MUC5B. Diagrams represent protein domains with the large central exon of MUC5B, modeled after those in Ridley et al. Colors correspond to protein groups visualized in (A). CysD corresponds to cys domains and PTS corresponds to proline-, serine-, and threonine-rich domains.
(D) Distributions of absolute serine and threonine (S/T) count across VNTR domains for the three most common protein groups of MUC5B.
(E) Distributions of percent S/T content within VNTR domains for the three most common protein groups of MUC5B.
(F) Logo plot of the complete 29-mer amino acid motif variants used in MUC5B VNTR domains across 206 human haplotypes. Colors correspond to biochemical groupings of amino acids.
(G) Heatmap of 190 29-mer motif utilization across protein variants of human MUC5B, colored vertically by protein group identities. Heatmap constructed through normalization for total VNTR sequence length, normalization within each motif (columns), and hierarchical clustering of haplotypes (rows) and motifs (columns). See Figure S4 for an extended version that includes the matched motifs (columns).
Figure 3
The genetic architecture of MUC5AC and MUC5B in the nonhuman ape lineages
(A) Phylogenetic analysis of ∼25 kbp from at minimum two haplotypes per ape lineage for MUC5AC and subsequent protein predictions based on human exon boundary alignments. () = central node distinguishing species branches with bootstrap support. Diagrams represent protein domains within the large central exon. HSA denotes human haplotypes.
(B) Scatterplot of total MUC5AC exon 31 length (in base pairs) and total VNTR motif count across all VNTR domains in human and NHPs.
(C) Tiled alignments between representative haplotypes of each ape species (most common or most structurally unique haplotype per species) for MUC5AC. MUC5AC intron/exon boundaries are distinguished by the gene model at the top of the visualization.
(D) Phylogenetic analysis of ∼15 kbp from at minimum two haplotypes per NHP lineage and subsequent protein predictions for MUC5B haplotypes based on human exon boundary liftover. () = central node distinguishing species branches with 100% bootstrap support. Diagrams represent protein domains with the large central exon.
(E) Scatterplot of total MUC5B exon 31 length (in base pairs) and total VNTR motif count across all VNTR domains in human and NHPs.
(F) Tiled alignments between representative haplotypes of each NHP species (most common or most structurally unique haplotype per species) for MUC5B. MUC5B intron/exon boundaries are distinguished by the gene model at the top of the visualization.
Figure 4
Linkage disequilibrium (LD) analysis of the MUC5AC/MUC5B locus for African, American, European, East Asian, and South Asian genomes from the phased, short-read 1000 Genomes Project (1KG) cohort
(A) LD plots for the MUC5AC/MUC5B locus based on D′, with increasing red intensity indicative of higher LD between SNPs. Gene models corresponding to MUC5AC and MUC5B are indicated by black annotations at top.
(B) Autosome-wide LD block size distributions for each major population. Blocks above 100 kbp are visually excluded as outliers (included in distribution analyses within populations).
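For readers unfamiliar with D′, the statistic plotted in panel (A), the following is a minimal sketch of its standard definition for two biallelic SNPs on phased haplotypes. The function name and the 0/1 allele coding are assumptions made for illustration; the software and settings the authors used to compute LD are not stated in this record.

```python
def d_prime(haplotypes):
    """Absolute D' between two biallelic SNPs.

    haplotypes is a list of (a, b) pairs, one per phased haplotype, with
    alleles coded 0 or 1 at the first and second SNP (assumed coding).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")

    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    # Raw disequilibrium coefficient.
    d = p_ab - p_a * p_b

    # Largest magnitude D can reach given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    if d_max == 0:
        return 0.0  # at least one SNP is monomorphic in this sample
    return abs(d) / d_max


if __name__ == "__main__":
    # Made-up haplotypes in complete LD.
    print(d_prime([(1, 1), (1, 1), (0, 0), (0, 0)]))
```

Values near 1 indicate that little or no historical recombination is detectable between the two sites, which is what the darker red cells in an LD plot of this kind represent.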
Figure 5
Genotyping of MUC5AC haplogroups with Locityper for population distributions and signatures of positive selection
(A) Locityper leave-one-out results comparing edit distances between actual and retrieved genotype (predicted from Locityper) versus edit distances between actual and closest possible genotype (best possible reference genotype from a multiple sequence alignment with true genotype) for MUC5AC. Dot color is based on the number of haplotypes in diploid sample sets that were correctly genotyped.
(B) MUC5AC haplogroup frequencies across super populations and populations in the 1KG dataset from Locityper predictions.
(C) Distribution ranks of negative Tajima’s D values across 10 kbp bins in the MUC5AC locus for genotyped haplogroups in each of the 1KG super populations. The dashed black line corresponds to the 10% distribution rank and the dashed red line corresponds to the 5% distribution rank. The three values above the dashed red line pass permutation testing and multiple testing correction.
(D) Six GWAS risk and protective alleles mapped to the MUC5AC phylogeny. SNPs are grouped based on disease association and squared correlations are color coded based on haplogroup partitioning.
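Panel (C) ranks Tajima's D values computed in 10 kbp bins. As a reminder of what the statistic measures, here is a minimal sketch of the standard Tajima (1989) formula for a single bin, taking the number of sampled haplotypes and the derived-allele count at each segregating site. The function name and input layout are hypothetical, and the binning, ranking, and permutation testing described in the caption are not reproduced.

```python
import math


def tajimas_d(n, allele_counts):
    """Tajima's D for one window.

    n is the number of sampled haplotypes; allele_counts holds, for each
    site, how many haplotypes carry the derived (or minor) allele.
    Returns None when the window has no segregating sites.
    """
    if n < 4:
        raise ValueError("this sketch requires at least 4 haplotypes")

    # Keep only sites that are actually polymorphic in the sample.
    counts = [k for k in allele_counts if 0 < k < n]
    s = len(counts)
    if s == 0:
        return None

    # Mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in counts)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    variance = e1 * s + e2 * s * (s - 1)
    return (pi - s / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # Made-up window with only singletons: an excess of rare variants.
    print(tajimas_d(4, [1, 1, 1]))
```

Negative values reflect an excess of rare variants relative to neutral expectation, the pattern the caption associates with positive selection. Positive values reflect an excess of intermediate-frequency variants.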

References

    1. Chatterjee M., van Putten J.P.M., Strijbis K. Defensive properties of mucin glycoproteins during respiratory infections—relevance for Sars-CoV-2. mBio. 2020;11
    2. Wallace L.E., Liu M., van Kuppeveld F.J., de Vries E., de Haan C.A. Respiratory mucus as a virus-host range determinant. Trends Microbiol. 2021;29:983–992.
    3. Morrison C.B., Markovetz M.R., Ehre C. Mucus, mucins, and cystic fibrosis. Pediatr. Pulmonol. 2019;54:S84–S96.
    4. Bergstrom K.S.B., Xia L. Mucin-type O-glycans and their roles in intestinal homeostasis. Glycobiology. 2013;23:1026–1037.
    5. Chaisson M.J.P., Sanders A.D., Zhao X., Malhotra A., Porubsky D., Rausch T., Gardner E.J., Rodriguez O.L., Guo L., Collins R.L., et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun. 2019;10:1784.
