Genome Biol Evol. 2012;4(5):641-57.
doi: 10.1093/gbe/evs035. Epub 2012 Apr 13.

Identification and characterization of lineage-specific highly conserved noncoding sequences in Mammalian genomes

Mahoko Takahashi et al. Genome Biol Evol. 2012.

Abstract

Vertebrate genome comparisons have revealed highly conserved noncoding sequences (HCNSs) shared among a wide range of species, many of which contain regulatory elements. However, recently emerged sequences conserved only in specific lineages have not been well studied. To address this, we identified 8,198 primate-specific and 21,128 rodent-specific HCNSs, as representative lineages among mammals, from human-marmoset and mouse-rat comparisons, respectively. Derived allele frequency analysis showed that the primate-specific HCNSs are under purifying selection, indicating that they may harbor important functions. We selected the 1,000 largest HCNSs in each lineage and compared the lineage-specific HCNS-flanking genes (LHF genes) with ultraconserved element (UCE)-flanking genes. Interestingly, the majority of LHF genes were different from UCE-flanking genes, and this lineage-specific set of LHF genes was more enriched in protein-binding function. Conversely, the number of LHF genes that also flank UCEs was small but significantly larger than random expectation, and many of these genes are involved in anatomical development as transcriptional regulators, suggesting that certain groups of genes preferentially recruit new HCNSs in addition to old HCNSs that are conserved among vertebrates. This group of LHF genes might be involved in lineage-specific evolution at various levels: among vertebrates, mammals, primates, and rodents. If so, the emergence of HCNSs in and around these two groups of LHF genes may have contributed to the development of lineage-specific characteristics. Our results provide new insight into lineage-specific evolution through interactions between HCNSs and their LHF genes.

Figures

Fig. 1.—
Phylogenetic relationship of species mainly used in this study. The blue, yellow, and purple circles represent primate-specific, rodent-specific, and vertebrate-shared HCNSs, respectively. The approximate divergence times for ancestral species of each lineage are shown on the tree (Mouse Genome Sequencing Consortium 2002; Hedges and Kumar 2003; Gibbs et al. 2004; She et al. 2006).
Fig. 2.—
Procedure for extracting primate- and rodent-specific HCNSs. Pairwise alignments are human–marmoset and mouse–rat alignments, used to extract primate- and rodent-specific HCNSs, respectively. After the step “remove vertebrate homologous regions,” an additional filter was applied to the primate comparison: sequences that were not conserved in other primate species (rhesus macaque, orangutan, and chimpanzee) were removed.
Fig. 3.—
Substitution rates in lineage-specific intergenic HCNSs and their flanking regions. The average number of substitutions per site in 100-bp windows within ±10,000 bp of the 1,000 largest primate-specific HCNSs (A) and rodent-specific HCNSs (B). The insets show enlarged distributions within ±1,500 bp. The red lines represent the average number of substitutions per site in nongapped noncoding regions of the human and mouse genomes, respectively. The error bars are 95% confidence intervals of the substitution rate in each window.
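The windowed profile described in this caption can be illustrated with a minimal Python sketch. It assumes two pre-aligned sequences of equal length and reports the raw proportion of mismatched sites per window. The function name `windowed_mismatch_rate` is hypothetical, and the raw mismatch proportion is a stand-in: the caption does not say which substitution estimator the authors used.

```python
def windowed_mismatch_rate(seq_a, seq_b, window=100):
    """Proportion of mismatched sites per non-overlapping window.

    Assumption: seq_a and seq_b are already aligned (equal length, '-' for gaps).
    Gapped and 'N' sites are skipped; a window with no usable sites yields None.
    This is a raw mismatch proportion, not a model-corrected substitution rate.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0:
        raise ValueError("window must be positive")

    rates = []
    for start in range(0, len(seq_a), window):
        sites = 0
        diffs = 0
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        for a, b in pairs:
            a, b = a.upper(), b.upper()
            if a in "-N" or b in "-N":
                continue  # skip gapped or ambiguous alignment columns
            sites += 1
            if a != b:
                diffs += 1
        rates.append(diffs / sites if sites else None)
    return rates


# Toy alignment (made-up data), 4-bp windows for readability.
profile = windowed_mismatch_rate("ACGTACGT", "ACGAAC-T", window=4)
```

Averaging such per-window values across many HCNS-centered alignments would give a curve of the kind shown in the figure. A dip around position 0 would indicate conservation relative to the flanks.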
Fig. 4.—
Derived allele frequency (DAF) distribution in primate-specific HCNSs. DAF distribution of Yoruba from Nigeria (YRI) (A), Han Chinese from Beijing combined with Japanese from Tokyo (ASN) (B), and Americans of European ancestry (CEU) (C). Light gray and blue bars represent data for SNPs in the nonrepetitive human genome and SNPs within primate-specific HCNSs, respectively. Error bars were estimated from the binomial distribution as σ² = pq/n, where p is the fraction of SNPs in a particular bin, q = 1 − p, and n is the total number of SNPs. All primate-specific HCNSs (8,198) were used for this analysis.
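The error-bar formula in this caption can be made concrete with a short Python sketch. It bins a list of derived allele frequencies and returns, for each bin, the fraction p of SNPs and the standard error sqrt(pq/n). The function name `daf_bin_fractions`, the choice of ten equal-width bins, and the toy input are assumptions for illustration only; the caption does not give the authors' binning.

```python
import math


def daf_bin_fractions(dafs, n_bins=10):
    """Return a (fraction, standard_error) pair for each equal-width DAF bin.

    The standard error follows the caption: sigma^2 = p * q / n, where p is the
    fraction of SNPs in the bin, q = 1 - p, and n is the total number of SNPs.
    """
    if not dafs:
        raise ValueError("need at least one SNP")
    counts = [0] * n_bins
    for freq in dafs:
        if not 0.0 <= freq <= 1.0:
            raise ValueError("DAF must lie in [0, 1]")
        # A frequency of exactly 1.0 is placed in the last bin.
        index = min(int(freq * n_bins), n_bins - 1)
        counts[index] += 1

    n = len(dafs)
    result = []
    for count in counts:
        p = count / n
        q = 1.0 - p
        result.append((p, math.sqrt(p * q / n)))
    return result


# Toy input (made-up frequencies): two SNPs in the lowest bin out of four.
bins = daf_bin_fractions([0.05, 0.05, 0.15, 0.95])
```

Under purifying selection, SNPs within HCNSs would be expected to show an excess in the lowest-frequency bins relative to the genomic background, and the error bars indicate whether that excess exceeds sampling noise.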
Fig. 5.—
Fractions of genic categories in whole genomes and lineage-specific HCNSs. The pie charts show percentages of genic categories in the human genome (left) and primate-specific HCNSs (right) (A), and in the mouse genome (left) and rodent-specific HCNSs (right) (B). The percentage of UTRs is markedly elevated in the lineage-specific HCNSs. The distributions of genic categories differed significantly between genomes and lineage-specific HCNSs (P < 10^−15, chi-squared test).
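As a rough illustration of the kind of comparison reported in this caption, the following Python sketch computes a Pearson chi-squared statistic for a contingency table of category counts (rows: genome background and HCNSs; columns: genic categories). The function name `chi_squared_statistic` and the counts are hypothetical, and the caption does not state whether the authors ran a homogeneity or a goodness-of-fit test, so the homogeneity form here is an assumption.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for a count table.

    Expected counts come from the row and column totals (test of homogeneity).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if total == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a nonzero total")

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Made-up counts, not data from the paper: two groups, two categories.
stat, dof = chi_squared_statistic([[10, 20], [20, 10]])
```

A P value could then be read from the chi-squared distribution with the returned degrees of freedom, for example with SciPy's `scipy.stats.chi2.sf(stat, dof)` if SciPy is available.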
Fig. 6.—
Definition of LHF orthologs. The primate and rodent LHF orthologs are defined as the rodent ortholog of a primate LHF gene (Gene A in rodents) and the primate ortholog of a rodent LHF gene (Gene B in primates). Although primate and rodent LHF genes recruited lineage-specific HCNSs after the divergence of each lineage from the common ancestor, the majority of primate and rodent LHF orthologs did not.
Fig. 7.—
Comparison of genes among lineage-specific HCNSs and UCEs. The upper seven panels show scatter plots of the number of overrepresented gene functions and their P values obtained by GO analysis. The letters A through G in the scatter plots correspond to the letters in the Venn diagram, which shows the number of overlapping LHF genes among primate- and rodent-specific HCNSs and UCEs (numbers in parentheses).
Fig. 8.—
Examples of lineage-specific HCNS and UCE distributions. Purple, light blue, and yellow circles represent the positions of UCEs, primate-specific HCNSs, and rodent-specific HCNSs, respectively. Examples of PBX1 (A), Pbx3 (B), SOX13 (C), Sox6 (D), MEF2C (E), TLE4 (F), NPAS3 (G), and FOXP1 (H) are shown. For LHF genes of primate-specific HCNSs, the distributions of HCNSs and UCEs are always shown on the human genes. All genes except NPAS3 are highly conserved in vertebrates. For NPAS3, both the human and mouse genes are shown because the human gene has no intronic region corresponding to the rodent-specific HCNSs. As additional information, the human accelerated region (HAR) is shown.
