PLoS One. 2013 Apr 16;8(4):e61701.
doi: 10.1371/journal.pone.0061701. Print 2013.

Widespread divergence of the CEACAM/PSG genes in vertebrates and humans suggests sensitivity to selection

Chia Lin Chang et al. PLoS One.

Abstract

In mammals, carcinoembryonic antigen cell adhesion molecules (CEACAMs) and pregnancy-specific glycoproteins (PSGs) play important roles in the regulation of pathogen transmission, tumorigenesis, insulin signaling turnover, and fetal-maternal interactions. However, how these genes evolved and to what extent they diverged in humans have not been specifically investigated. Based on syntenic mapping of chordate genomes, we reveal that diverging homologs with a prototypic CEACAM architecture (an extracellular domain with immunoglobulin variable and constant domain-like regions, and an intracellular domain containing an ITAM motif) are present from cartilaginous fish to humans, but are absent in sea lamprey, cephalochordates, and urochordates. Interestingly, the CEACAM/PSG gene inventory underwent radical divergence in various vertebrate lineages: from zero in avian species to dozens in therian mammals. In addition, analyses of genetic variations in human populations showed the presence of various types of copy number variations (CNVs) at the CEACAM/PSG locus. These copy number polymorphisms have 3–80% frequency in select populations, and encompass from one to more than six PSG genes. Furthermore, we found that CEACAM/PSG genes contain a significantly higher density of nonsynonymous single nucleotide polymorphisms (SNPs) compared to the chromosome average, and many CEACAM/PSG SNPs exhibit high population differentiation. Taken together, our study suggests that CEACAM/PSG genes have had a more dynamic evolutionary history in vertebrates than previously thought. Given that CEACAM/PSGs play important roles in maternal-fetal interaction and pathogen recognition, these data lay the groundwork for future analysis of adaptive CEACAM/PSG genotype-phenotype relationships in normal and complicated pregnancies as well as other etiologies.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Syntenic mapping of CEACAM/PSG family genes in vertebrates.
a) Syntenic mapping of the CEACAM/PSG locus in human, chimpanzee (P. troglodytes), and Rhesus monkey (M. mulatta). CEACAM/PSG genes of these primates can be subdivided into two clusters: cluster I genes are flanked by orthologs of TGFB1 and XRCC1 whereas cluster II genes are close to orthologs of TOMM40, APOE, and SIGLEC8. All CEACAM subfamily genes are indicated by blue oval symbols. CEACAM/PSG locus marker genes, including TGFB1, ATP1A3, ZNF574, PAFAH1B3, TMEM145, CNFN, LIPE, ETHE1, XRCC1, TOMM40, APOE, and SIGLEC8, are indicated by triangle symbols. The PSG subfamily genes are indicated by shaded square boxes. The relative position of genes is shown under gene symbols in Kbp. b) Syntenic mapping of CEACAM family genes in dog, rat, the gray short-tailed opossum (M. domestica), platypus (O. anatinus), and the clawed frog (X. tropicalis). The opossum contains more than three dozen paralogs on multiple chromosomes (Table S2 in File S1). For the opossum, only paralogs mapped on chromosomes 2 and 4 are shown. Those found on unknown chromosomes are described in Table S2 in File S1. Among these homologs, eleven (MdoCEACAMI-XI) were found to cluster in a 2-Mbp span on chromosome 4, which also contained the marker genes TOMM40 and APOE. On the other hand, the platypus genome encoded four CEACAM homologs (OanCEACAM16, 16LI, 20LI, and 20LII) (Table S2 in File S1). In X. tropicalis, three homologs (XtrCEACAMI-III) were located near marker genes, including LIPE, CNFN, TMEM145, PAFAH1B3, and ZNF574. The chromosomal number and the genomic contig number are indicated at the top of the schematic representation of each genomic fragment. CEACAM family genes are indicated by red diamond-shaped symbols. Marker genes are identified by colored diamond-shaped symbols. The relative position of genes on chromosomes and contigs is shown next to the gene symbols. c) Syntenic mapping of CEACAM loci in teleosts. The genomes of the medaka fish (O. latipes), stickleback (G. aculeatus), zebrafish (D. rerio), and two pufferfishes (T. rubripes and T. nigroviridis) encode 1–12 CEACAM family genes. Syntenic mapping indicated that zebrafish and T. nigroviridis CEACAM genes are located on whole genome duplication (WGD)-derived chromosome fragments, and that zebrafish CEACAMs on chromosome 16 are located on three separate loci (I, II, and III). The WGD-derived syntenic chromosomal regions in teleosts are indicated by a yellow background. The chromosomal number and the genomic contig number are indicated at the top of the schematic representation of each genomic fragment. CEACAM family genes are indicated by red diamond-shaped symbols. Marker genes are identified by colored diamond-shaped symbols. The relative position of genes on chromosomes and contigs is shown next to the gene symbols.
Figure 2. Analysis of CEACAM/PSG family gene evolution based on the Neighbor-Joining method.
a) Phylogenetic tree of PSG subfamily genes from human, chimpanzee, and Rhesus monkey. A Rhesus monkey-specific cluster and an ape-specific cluster are indicated by vertical bars on the right. Potential pseudogenes, including human PSG10, chimpanzee LOC468901, and Rhesus monkey LOC709992, were excluded from the analysis. Human, Hsa; chimpanzee, Ptr; Rhesus monkey, Mmu. b) Phylogenetic tree of 48 CEACAM family proteins from human, dog (C. familiaris), opossum (M. domestica), and platypus (O. anatinus). There were a total of 2119 positions in the final dataset. The human CEACAM1-like cluster is indicated by a vertical bar on the right. The CEACAM16 and 20 homologs appear to diverge from other family members before the separation of eutherian, metatherian, and prototherian mammals. Human, Hsa; dog, Cfa; opossum, Mdo; platypus, Oan. Note that the bootstrap values for basal lineages in this tree are extremely low, so this Neighbor-Joining tree should be interpreted with caution. c) Phylogenetic tree of teleost CEACAM homologs. Twenty-three CEACAM proteins from D. rerio, G. aculeatus, T. rubripes, and T. nigroviridis were analyzed. A D. rerio-specific cluster is indicated by a vertical bar on the right. The robustness of the tree was assessed by 1,000 bootstrap replicates, and the percentage of replicates is shown next to the branches.
Figure 3. Analysis of CEACAM/PSG family gene evolution based on the Maximum Likelihood method.
a) Phylogenetic tree of PSG subfamily genes from human, chimpanzee, and Rhesus monkey. A Rhesus monkey-specific cluster and an ape-specific cluster are indicated by vertical bars on the right. Potential pseudogenes, including human PSG10, chimpanzee LOC468901, and Rhesus monkey LOC709992, were excluded from the analysis. Human, Hsa; chimpanzee, Ptr; Rhesus monkey, Mmu. b) Phylogenetic tree of 48 CEACAM family proteins from human, dog (C. familiaris), opossum (M. domestica), and platypus (O. anatinus). The human CEACAM1-like cluster is indicated by a vertical bar on the right. Human, Hsa; dog, Cfa; opossum, Mdo; platypus, Oan. c) Phylogenetic tree of teleost CEACAM homologs. Twenty-three CEACAM proteins from D. rerio, G. aculeatus, T. rubripes, and T. nigroviridis were analyzed. A D. rerio-specific cluster is indicated by a vertical bar on the right.
Figure 4. Analysis of CEACAM transcript expression in tissues of platypus, pufferfish T. nigroviridis, and zebrafish D. rerio.
a) RT-PCR analysis of OanCEACAM16, 16LI, 20LI, and 20LII in the intestine of a platypus. Size markers are shown on the left. Specific PCR products for OanCEACAM16 and 20LI are indicated by arrows. b) RT-PCR detection of transcripts of DreCEACAMI, VII, and X in kidney, testis, ovary, gill, gut, head, heart, liver, and fin of D. rerio (right panel) as well as TniCEACAMI-III in brain, muscle, gut, kidney, heart, liver, gill, and skin of T. nigroviridis (left panel) using gene-specific primers (Table S4 in File S1). Expected size of PCR products for each gene is indicated by an arrow.
Figure 5. The human PSG locus exhibits frequent copy number variation (CNV).
a) Schematic representation of CNVs found at the human PSG locus on chromosome 19 (47,600–48,500 kb) based on studies using high-density probes and DNA sequencing. CNVs that were identified in CEU (U.S. residents with northern and western European ancestry, N = 20) and YRI (Yoruba from Ibadan in Africa, N = 20) populations are indicated by blue brackets under the chromosome. CNVs that were identified in Asian populations (Chinese, Japanese, and Koreans; N = 30) are indicated by black brackets. CNVs that have a frequency higher than 50% are indicated by bold brackets. b) The size distribution of 387 unique chromosome-19 CNVs that have a length greater than 500 bp. The inset shows the distribution of long-fragment CNVs (46 in total have a length >20 kb) found on chromosome 19 and those at the PSG locus.
Figure 6. The CEACAM/PSG family and three other progressive gene families (sialic acid binding Ig-like lectin, leukocyte immunoglobulin-like receptor, and olfactory receptor) have a higher percentage of genes containing a nonsynonymous SNP with an FST score in the top 15% bracket than the remaining genes (conserved genes) on chromosome 19.
Progressive gene families are those that encode secreted ligands or cell surface receptors and that expanded multiple times during primate evolution.
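
The comparison summarized in Figure 6 can be illustrated with a minimal sketch. The estimator used in the study is not stated in this record, so the code below assumes Wright's heterozygosity-based FST for a biallelic SNP with equally weighted populations. All function names and the input layout are hypothetical and chosen only for illustration.

```python
from statistics import mean


def wright_fst(freqs):
    """F_ST for one biallelic SNP from per-population allele frequencies.

    Assumption: populations are weighted equally (not taken from the paper).
    """
    p_bar = mean(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # monomorphic site: no differentiation to measure
    h_s = mean(2 * p * (1 - p) for p in freqs)  # mean within-population value
    return (h_t - h_s) / h_t


def top_fraction_threshold(scores, fraction=0.15):
    """Smallest score that still falls in the top `fraction` of all scores."""
    ranked = sorted(scores, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[k - 1]


def share_of_genes_with_high_fst(snps, threshold):
    """Fraction of genes with at least one SNP whose F_ST >= threshold.

    `snps` is a list of (gene_name, fst) pairs for nonsynonymous SNPs.
    """
    genes = {gene for gene, _ in snps}
    hits = {gene for gene, fst in snps if fst >= threshold}
    return len(hits) / len(genes) if genes else 0.0
```

In this sketch, the threshold would be computed once from all nonsynonymous SNPs on the chromosome, and `share_of_genes_with_high_fst` would then be called separately for each gene family and for the remaining genes so that the resulting percentages can be compared.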
