PLoS Comput Biol. 2007 Dec;3(12):e247.
doi: 10.1371/journal.pcbi.0030247.

Comparative genomics search for losses of long-established genes on the human lineage


Jingchun Zhu et al. PLoS Comput Biol. 2007 Dec.

Abstract

Taking advantage of the complete genome sequences of several mammals, we developed a novel method to detect losses of well-established genes in the human genome through syntenic mapping of gene structures between the human, mouse, and dog genomes. Unlike most previous genomic methods for pseudogene identification, this analysis is able to differentiate losses of well-established genes from pseudogenes formed shortly after segmental duplication or generated via retrotransposition. Therefore, it enables us to find genes that were inactivated long after their birth, which were likely to have evolved nonredundant biological functions before being inactivated. The method was used to look for gene losses along the human lineage during the approximately 75 million years (My) since the common ancestor of primates and rodents (the euarchontoglire crown group). We identified 26 losses of well-established genes in the human genome that were all lost at least 50 My after their birth. Many of them were previously characterized pseudogenes in the human genome, such as GULO and UOX. Our methodology is highly effective at identifying losses of single-copy genes of ancient origin, allowing us to find a few well-known pseudogenes in the human genome missed by previous high-throughput genome-wide studies. In addition to confirming previously known gene losses, we identified 16 previously uncharacterized human pseudogenes that are definitive losses of long-established genes. Among them is ACYL3, an ancient enzyme present in archaea, bacteria, and eukaryotes, but lost approximately 6 to 8 Mya in the ancestor of humans and chimps. Although losses of well-established genes do not equate to adaptive gene losses, they are a useful proxy to use when searching for such genetic changes. This is especially true for adaptive losses that occurred more than 250,000 years ago, since any genetic evidence of the selective sweep indicative of such an event has been erased.
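The criterion that each gene was lost at least 50 My after its birth can be written as a comparison of two time windows, because birth and loss are each placed on a branch of the species tree rather than dated exactly. The sketch below is illustrative only. The `Interval` type and the function names are hypothetical, and the code assumes the conservative reading that the youngest possible birth must precede the oldest possible loss by at least 50 My. The 75 Mya lower bound for ACYL3's birth is an assumption chosen for illustration (the gene is ancient, being present in archaea, bacteria, and eukaryotes); the 6 to 8 Mya loss window comes from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """Hypothetical time window in millions of years ago (Mya); larger = older."""
    youngest: float
    oldest: float


def min_retention_my(birth: Interval, loss: Interval) -> float:
    """Shortest time (My) the gene could have existed before inactivation:
    the youngest possible birth minus the oldest possible loss."""
    return birth.youngest - loss.oldest


def is_well_established(birth: Interval, loss: Interval, threshold_my: float = 50.0) -> bool:
    """Conservative test: the gene was lost at least `threshold_my` My after birth."""
    return min_retention_my(birth, loss) >= threshold_my


# ACYL3: birth assumed to be at least 75 Mya (illustrative bound); loss 6-8 Mya.
acyl3_birth = Interval(youngest=75.0, oldest=float("inf"))
acyl3_loss = Interval(youngest=6.0, oldest=8.0)
print(min_retention_my(acyl3_birth, acyl3_loss))     # 67.0
print(is_well_established(acyl3_birth, acyl3_loss))  # True

# A hypothetical young duplicate born 20-30 Mya and lost 5-10 Mya fails the test.
print(is_well_established(Interval(20.0, 30.0), Interval(5.0, 10.0)))  # False
```

Using the youngest birth and the oldest loss keeps the test conservative: a gene passes only if every placement consistent with its branch intervals leaves at least 50 My between birth and inactivation.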


Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Algorithm for Identifying Losses of Well-Established Genes in the Human Lineage Since the Common Ancestor of Euarchontoglires
TransMap predicts (pseudo)genes in the human and dog genomes by syntenically mapping mouse mRNA gene structures to the target genomes through genome alignments and then transferring the corresponding genomic coordinates. The predicted coding regions are conceptually translated and scanned for ORF-disrupting mutations. In this example, a stop codon (labeled with an “*”) is detected in the first coding exon mapped to the human genome, which has also experienced an insertion. Genomic insertions or deletions are shown as red rectangles. Of 19,541 mouse RefSeq genes, 1,008 are identified as initial candidate gene losses in the human lineage based on their differential mutation status in the TransMap results. The list is narrowed down to 72 by eliminating candidates that overlap human transcription evidence and by manual inspection. Twenty-six are identified as losses of well-established genes in the human lineage after analyzing their duplication histories.
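The scan for ORF-disrupting mutations and the differential test can be sketched as follows. This is not TransMap or the authors' pipeline; it is a simplified illustration under stated assumptions. Each mapped coding region is assumed to be already spliced into one nucleotide string, a frameshift is flagged only when that length is not a multiple of three (so compensating indels are missed), and a candidate loss is taken to be a gene that is disrupted in human but intact in dog, with the mouse mRNA serving as the intact source. All function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_disruptions(cds: str) -> list[str]:
    """List ORF-disrupting events in a spliced coding sequence (simplified)."""
    cds = cds.upper()
    events = []
    # A length that is not a multiple of 3 implies a net frameshifting indel.
    if len(cds) % 3 != 0:
        events.append("frameshift")
    complete = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, complete, 3)]
    # Any in-frame stop before the final codon is a premature (nonsense) stop.
    for idx, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            events.append(f"premature_stop@codon{idx + 1}")
    return events


def is_candidate_loss(human_cds: str, dog_cds: str) -> bool:
    """Differential status: disrupted in human but intact in dog."""
    return bool(orf_disruptions(human_cds)) and not orf_disruptions(dog_cds)


intact = "ATGTGGAAATAA"      # Met-Trp-Lys-stop
nonsense = "ATGTGAAAATAA"    # TGG -> TGA at codon 2
print(orf_disruptions(intact))              # []
print(orf_disruptions(nonsense))            # ['premature_stop@codon2']
print(is_candidate_loss(nonsense, intact))  # True
```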
Figure 2. A Nonsense Mutation Inactivated ACYL3 During Great Ape Evolution
The mutation occurred after the divergence of gorillas from the human lineage and before the human–chimp split. The nonsense mutation is located in exon 13 of ACYL3. A multispecies syntenic alignment shows that the nonsense mutation (“*”) lies in a highly conserved protein-coding region. The stop codon (TGA) is present in the human and chimp genomes, whereas a TGG tryptophan (W) codon is present in the rhesus, mouse, rat, dog, and other mammalian genomes. The region maps to human Chromosome 18 at location 54,881,070–54,881,124. We sequenced this genomic region in a gorilla DNA sample, which showed that the TGG (W) codon is present in the gorilla genome.
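The codon comparison described in this caption can be reproduced in a few lines. The sketch uses only the codons reported above and a deliberately minimal translation table (TGG and TGA only); the dictionary and function names are hypothetical. It shows that the stop codon is carried only by human and chimp, and that a single G-to-A change at the third codon position converts the tryptophan codon TGG into the stop codon TGA.

```python
# Aligned codon at the ACYL3 nonsense site, as reported in the Figure 2 caption.
CODONS = {
    "human": "TGA", "chimp": "TGA", "gorilla": "TGG",
    "rhesus": "TGG", "mouse": "TGG", "rat": "TGG", "dog": "TGG",
}
# Minimal translation table covering only the codons used here ("*" = stop).
AMINO = {"TGG": "W", "TGA": "*"}


def carriers_of_stop(codons: dict[str, str]) -> list[str]:
    """Species whose codon at this site translates to a stop."""
    return sorted(sp for sp, c in codons.items() if AMINO[c] == "*")


def point_differences(a: str, b: str) -> list[tuple[int, str, str]]:
    """1-based positions where two equal-length codons differ."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(carriers_of_stop(CODONS))         # ['chimp', 'human']
print(point_differences("TGG", "TGA"))  # [(3, 'G', 'A')]
```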
Figure 3. Timing of the Gene Losses in the Human Lineage Since the Common Ancestor of Euarchontoglires, Estimated Based on Shared Mutation Analysis
Branch intervals enclosing the earliest ORF-disrupting mutations shared between human and other mammals are illustrated on the human lineage of a mammalian species tree. Genes are represented by numbers, which correspond to their row numbers in Tables 1 and 2. Marks on the rhesus lineage represent independent ORF-disrupting mutations that are not shared with those on the human lineage. The approximate time (Mya) at which each species diverged from the human lineage is shown. Species with complete genome sequences are enclosed by rectangles, while others only have trace sequences available for analysis. Orang.: orangutan; Marm.: marmoset; T.shrew: tree shrew.
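The shared-mutation reasoning behind these branch intervals can be sketched as a bracketing rule: a mutation shared with a species must predate that species' split from the human lineage, and a mutation absent from a species must postdate its split. The sketch assumes each mutation arose once and was never reverted, and it ignores missing or low-coverage data. The chimp (6 Mya) and gorilla (8 Mya) values are chosen to match the abstract's 6 to 8 Mya window for ACYL3, and the mouse value matches the roughly 75 My euarchontoglire ancestor; the rhesus value is an illustrative placeholder, not a figure from the paper. The function name is hypothetical.

```python
# Approximate divergence from the human lineage (Mya). Chimp and gorilla values
# match the 6-8 Mya ACYL3 window; mouse matches the ~75 My euarchontoglire
# ancestor; the rhesus value is an illustrative placeholder.
DIVERGENCE_MYA = {"chimp": 6.0, "gorilla": 8.0, "rhesus": 25.0, "mouse": 75.0}


def loss_interval(divergence_mya: dict[str, float], sharing: set[str]) -> tuple[float, float]:
    """Return (youngest, oldest) bounds in Mya for a mutation on the human lineage.

    `sharing` lists the non-human species that carry the same mutation.
    Assumes a single origin with no reversion in any lineage."""
    shared = [t for sp, t in divergence_mya.items() if sp in sharing]
    absent = [t for sp, t in divergence_mya.items() if sp not in sharing]
    # At least as old as the split from the most distant sharing species.
    youngest = max(shared, default=0.0)
    # No older than the split from the closest species lacking it.
    oldest = min(absent, default=float("inf"))
    if youngest > oldest:
        raise ValueError("sharing pattern is inconsistent with a single origin")
    return youngest, oldest


print(loss_interval(DIVERGENCE_MYA, {"chimp"}))                      # (6.0, 8.0)
print(loss_interval(DIVERGENCE_MYA, {"chimp", "gorilla", "rhesus"}))  # (25.0, 75.0)
```

A pattern such as a mutation shared with gorilla but not chimp raises an error here; in real data that situation would point to independent mutations like those marked on the rhesus lineage.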
Figure 4. Timing of Gene Birth Is Estimated by Determining Duplication Histories of Genomic Regions Surrounding the Gene Losses
For the subset of gene losses with detectable human self-alignments, the duplication branch is determined by tracing each duplicate of the best human self-alignment through a seven-species syntenic genomic alignment. For those without detectable self-alignments, the duplication branch is determined by the seven-species syntenic alignments plus alignments with the chicken genome, and the elephant, tenrec, and armadillo scaffolds when available. Filled rectangles represent syntenic alignment to the human genome, and open rectangles represent genomes or scaffolds aligned to the human genome without the syntenic constraint. The approximate time (Mya) at which each species diverged from the human lineage is shown.
