HLA. 2024 Jan;103(1):e15273.
doi: 10.1111/tan.15273. Epub 2023 Oct 29.

High-throughput complement component 4 genomic sequence analysis with C4Investigator

Wesley M Marin et al. HLA. 2024 Jan.

Abstract

The complement component 4 gene locus, composed of the C4A and C4B genes and located on chromosome 6, encodes complement component 4 (C4) proteins, key intermediates in the classical and lectin pathways of the complement system. The complement system is an important modulator of immune system activity and is also involved in the clearance of immune complexes and cellular debris. The C4A and C4B genes exhibit copy number variation, with each composite gene varying between 0 and 5 copies per haplotype. C4A and C4B genes also vary in size depending on the presence of a human endogenous retrovirus (HERV) in intron 9, denoted by C4(L) for long-form and C4(S) for short-form; this insertion affects expression and is found in both C4A and C4B. Additionally, the human blood group antigens Rodgers and Chido are located on the C4 protein, with the Rodgers epitope generally found on C4A protein and the Chido epitope generally found on C4B protein. C4A and C4B copy number variation has been implicated in numerous autoimmune and pathogenic diseases. Despite the central role of C4 in immune function and regulation, high-throughput genomic sequence analysis of C4A and C4B variants has been impeded by the high degree of sequence similarity and complex genetic variation exhibited by these genes. To investigate C4 variation using genomic sequencing data, we have developed a novel bioinformatic pipeline, named C4Investigator, for comprehensive, high-throughput characterization of human C4A and C4B sequences from short-read sequencing data. Using paired-end targeted or whole genome sequence data as input, C4Investigator determines the overall C4 gene copy number, as well as the copy numbers of C4A, C4B, C4(Rodgers), C4(Ch), C4(L), and C4(S). Additionally, C4Investigator reports the full aligned C4A and C4B sequence, enabling nucleotide-level analysis. To demonstrate the utility of this workflow, we have analyzed C4A and C4B variation in the 1000 Genomes Project dataset, showing that these genes are highly poly-allelic, with many variants that have the potential to impact C4 protein function.

Keywords: C4; bioinformatics pipeline; complement component; copy number; genotyping; immunogenetics.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1. Sequence features of C4A and C4B genes and C4 proteins.
(A) Positions of C4A and C4B genomic sequence features, shown for a long-form gene. Exon positions are marked in black, the HERV-K(C4) sequence is marked in red, and select sequence variants are shown above the exons. Positions are based on the C4A and C4B combined alignment reference, which includes 5′ UTR and 3′ UTR sequence. The C-del variant and the TC-ins variant are frame-shift mutations that result in premature termination. (B) Positions of C4A and C4B protein sequence features. The major chains, α, β, and γ, are shown in the bottom row; the cleavage products, C4a and C4d, are shown in the middle row; and important binding locations and sequence variants are shown in the top row. The amino acid positions include the leading 19-amino-acid signal peptide.
Figure 2. Superpopulation distributions of C4A and C4B copy number results for the 1KGP dataset.
Overall copy number is the total copy number of C4A and C4B, C4S is the total copy number of the short forms of C4A and C4B, and C4L is the total copy number of the long forms of C4A and C4B. AFR = African, AMR = Admixed American, EAS = East Asian, EUR = European, SAS = South Asian.
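The copy number categories in this caption overlap: each C4 gene copy is counted once as C4A or C4B and once as long or short, so the two breakdowns should add up to the same overall total. A minimal Python sketch of that bookkeeping follows. The class and field names are hypothetical and are not taken from C4Investigator's output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class C4CopyNumbers:
    """Per-sample C4 copy numbers (hypothetical container, for illustration)."""

    c4a: int  # copies of C4A
    c4b: int  # copies of C4B
    c4l: int  # long-form copies (C4A or C4B)
    c4s: int  # short-form copies (C4A or C4B)

    @property
    def overall(self) -> int:
        # Overall copy number is the total of C4A and C4B copies.
        return self.c4a + self.c4b

    def is_consistent(self) -> bool:
        # Every copy is either long or short, so C4L + C4S should
        # equal the overall copy number; counts cannot be negative.
        counts = (self.c4a, self.c4b, self.c4l, self.c4s)
        return all(n >= 0 for n in counts) and self.c4l + self.c4s == self.overall


# Made-up example values, not results from the 1KGP dataset.
example = C4CopyNumbers(c4a=2, c4b=2, c4l=3, c4s=1)
print(example.overall, example.is_consistent())
```

A check like this is only a sanity test on reported counts; it says nothing about how the counts themselves are estimated from sequencing reads.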
Figure 3. SNV variation across the 1KGP dataset.
(A) Total copy number of combined C4A and C4B non-reference variants (variants not represented in the main assembly of GRCh38) by C4A and C4B position for the 1KGP dataset. The copy numbers of all non-reference variants at a position are summed across the 1KGP dataset to give the non-reference variant copy number, and only variant positions with a total copy number of at least 10 are shown. Positions of exon and HERV-K(C4) regions are marked. (B) Global carrier frequencies for non-reference variants in the 1KGP dataset at increasing global allele frequency thresholds from 0.00 to 0.05, for introns, exons, and the HERV-K(C4) region. The y-axis represents the proportion of individuals who carry a non-reference allele whose global allele frequency is at or below the threshold on the x-axis. For example, nearly 25% of the 1KGP dataset carried exonic variants with a global allele frequency of 1% or lower.
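The aggregation described for panel A (sum the non-reference variant copy number at each position across samples, then keep positions with a total of at least 10) can be sketched in Python as below. This is an illustrative sketch only: the input layout, function names, and toy values are assumptions, not the authors' implementation or C4Investigator's output format.

```python
from collections import defaultdict

MIN_TOTAL_COPY = 10  # threshold stated in the panel A description


def sum_nonref_copy(samples):
    """Sum non-reference variant copy number per position across samples.

    `samples` maps a sample ID to {position: non-reference copy number}.
    This input layout is hypothetical.
    """
    totals = defaultdict(int)
    for per_position in samples.values():
        for position, copy in per_position.items():
            totals[position] += copy
    return dict(totals)


def filter_positions(totals, min_copy=MIN_TOTAL_COPY):
    """Keep positions whose summed copy number is at least `min_copy`."""
    return {pos: n for pos, n in sorted(totals.items()) if n >= min_copy}


# Made-up toy data for illustration only.
toy_samples = {
    "sample_1": {1200: 4, 5300: 1},
    "sample_2": {1200: 3, 5300: 2, 9100: 1},
    "sample_3": {1200: 4, 9100: 2},
}

totals = sum_nonref_copy(toy_samples)
kept = filter_positions(totals)
print(kept)
```

With the toy data, only position 1200 reaches the threshold, so it is the only position that would be plotted.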
