Mol Biol Evol. 2021 May 19;38(6):2306-2318.
doi: 10.1093/molbev/msab026.

Genome-Scale Profiling Reveals Noncoding Loci Carry Higher Proportions of Concordant Data


Robert Literman et al. Mol Biol Evol. 2021.

Abstract

Many evolutionary relationships remain controversial despite whole-genome sequencing data. These controversies arise, in part, due to challenges associated with accurately modeling the complex phylogenetic signal coming from genomic regions experiencing distinct evolutionary forces. Here, we examine how different regions of the genome support or contradict well-established relationships among three mammal groups using millions of orthologous parsimony-informative biallelic sites (PIBS) distributed across primate, rodent, and Pecora genomes. We compared PIBS concordance percentages among locus types (e.g., coding sequences [CDS], introns, intergenic regions), and contrasted PIBS utility over evolutionary timescales. Sites derived from noncoding sequences provided more data and proportionally more concordant sites compared with those from CDS in all clades. CDS PIBS were also predominant drivers of tree incongruence in two cases of topological conflict. PIBS derived from most locus types provided surprisingly consistent support for splitting events spread across the timescales we examined, although we find evidence that CDS and intronic PIBS may, respectively and to a limited degree, inform disproportionately about older and younger splits. In this era of accessible whole-genome sequence data, these results: 1) suggest benefits to more intentionally focusing on noncoding loci as robust data for tree inference and 2) reinforce the importance of accurate modeling, especially when using CDS data.

Keywords: bioinformatics; genomics; phylogenetics.


Figures

Fig. 1.
Evolutionary relationships among study taxa. These relationships, supported by three independent phylogenomic studies, were also fully resolved in 36/39 trees inferred in this study. For each split in the tree, the size of filled node icons is proportional to the number of parsimony-informative biallelic sites (PIBS) that support that split under parsimony (i.e., clustering taxa by alleles and assessing monophyly). Split support ranges for each focal group were as follows: Pecora (green squares): 173 K–1.04 M; Primates (orange triangles): 148 K–1.86 M; Rodents (blue circles): 27.4–600 K. Open circles denote nodes included in the combined analysis that were excluded from focal analyses, and are not scaled to support size (combined support range: 487–33.9 K sites). Tip labels for reference annotation species are red and bolded. Relative to splits seen in the reference topologies, nodes outlined in red are swapped in the TimeTree database.
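The parsimony check described in this caption (cluster taxa by allele, then ask whether either cluster is monophyletic in the reference tree) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the taxon labels A to E, and the representation of the reference tree as a set of clades are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

Groups = Tuple[FrozenSet[str], FrozenSet[str]]


def pibs_groups(site: Dict[str, str]) -> Optional[Groups]:
    """Return the two allele groups if the site is a PIBS, else None.

    `site` maps a taxon name to its base call (hypothetical input format).
    A site qualifies when it has exactly two alleles and each allele is
    carried by at least two taxa (parsimony-informative).
    """
    by_allele: Dict[str, Set[str]] = {}
    for taxon, allele in site.items():
        by_allele.setdefault(allele, set()).add(taxon)
    if len(by_allele) != 2:
        return None  # not biallelic
    first, second = by_allele.values()
    if len(first) < 2 or len(second) < 2:
        return None  # a singleton allele is not parsimony-informative
    return frozenset(first), frozenset(second)


def is_concordant(site: Dict[str, str], reference_clades: Set[FrozenSet[str]]) -> bool:
    """True if either allele group forms a clade in the reference tree."""
    groups = pibs_groups(site)
    if groups is None:
        return False
    return any(group in reference_clades for group in groups)


# Hypothetical reference tree (((A,B),C),(D,E)) written as its clades.
REFERENCE = {
    frozenset({"A", "B"}),
    frozenset({"A", "B", "C"}),
    frozenset({"D", "E"}),
}

concordant_site = {"A": "G", "B": "G", "C": "T", "D": "T", "E": "T"}
discordant_site = {"A": "G", "C": "G", "B": "T", "D": "T", "E": "T"}
singleton_site = {"A": "G", "B": "T", "C": "T", "D": "T", "E": "T"}

print(is_concordant(concordant_site, REFERENCE))  # groups {A,B} is a clade
print(is_concordant(discordant_site, REFERENCE))  # {A,C} is not a clade
```

Counting concordant sites per reference clade, rather than returning a single boolean, would give the per-split support totals that the node icon sizes represent.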
Fig. 2.
Concordance rates of parsimony-informative biallelic sites (PIBS) derived from different locus types. Modified Z-score analysis of genome-wide PIBS concordance (i.e., the proportion of sites where biallelic variation reflects a true split event) reveals that PIBS derived from different locus types varied significantly in the proportion of sites supporting (a) the entire reference tree, and (b) two conflicting nodes from the TimeTree database for rodents and Pecora. Filled shapes indicate locus types with concordance percentages that are either significantly higher or lower than the median concordance among locus types. (a) Across datasets, PIBS derived from CDS displayed the lowest concordance relative to all locus types (all P ≤ 2.13E−7). (b) When comparing support for the correct relationships and the incompatible phylogenies from TimeTree, CDS PIBS were most likely to support the incorrect topology in both cases (both P ≤ 1.08E−6). Conversely, 5′-UTR PIBS provided proportionally more support for the reference relationships (both P ≤ 2.12E−11).
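For readers unfamiliar with modified Z-scores, the sketch below uses the common median/MAD formulation, M = 0.6745 × (x − median) / MAD. Whether the study used exactly this constant and form is an assumption, and the locus-type values in the example are made up for illustration, not taken from the paper.

```python
from statistics import median
from typing import Dict


def modified_z_scores(values: Dict[str, float]) -> Dict[str, float]:
    """Score each value by its distance from the median, scaled by the MAD.

    Uses the conventional 0.6745 scaling constant (an assumption here),
    which makes the score comparable to a standard Z-score for normal data.
    """
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        raise ValueError("MAD is zero; modified Z-scores are undefined")
    return {name: 0.6745 * (v - med) / mad for name, v in values.items()}


# Hypothetical concordance percentages per locus type (illustrative only).
example = {
    "locus_a": 1.0,
    "locus_b": 2.0,
    "locus_c": 3.0,
    "locus_d": 4.0,
    "locus_e": 100.0,
}

scores = modified_z_scores(example)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Because the score is anchored on the median rather than the mean, a single extreme locus type does not inflate the scale against which the others are judged.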
Fig. 3.
Changes in phylogenetic utility over time among locus types. Based on divergence times estimated from SISRS orthologs (displayed here) as well as dates from the TimeTree database, we ran linear regression analyses to determine whether the proportion of parsimony-informative biallelic sites (PIBS) from different locus types changed in their phylogenetic utility over time. Filled shapes indicate locus types where PIBS inform disproportionately on older or more recent splits. Among rodents and in the combined analysis, CDS-derived PIBS (upper left) provided proportionally more support for older splits (both P ≤ 1.08E−6), while conversely and for the same groups, intron-derived PIBS (upper right) informed disproportionately about younger splits (both P ≤ 2.34E−3). Sites from genes that were not annotated as CDS, UTR, or intron (“Genic [Other]”; lower left) show a weaker trend toward increased utility at younger nodes in rodents (P = 3.77E−3), but the relationship is not significant when using dates from TimeTree (P = 0.113). No other locus type, including intergenic/unannotated sites (lower right), displayed any time-dependent shifts in phylogenetic support.
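The regression idea in this caption can be sketched as an ordinary least-squares fit of a locus type's share of supporting PIBS against split age. This is only an illustration under stated assumptions: `ols_fit` is a hypothetical helper, the ages and proportions are invented, and the sketch reports the slope only, without the significance test the study relied on.

```python
from typing import Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical split ages (millions of years) and the share of supporting
# PIBS contributed by one locus type at each split (illustrative only).
split_ages = [1.0, 2.0, 3.0, 4.0]
locus_share = [2.0, 4.0, 6.0, 8.0]

slope, intercept = ols_fit(split_ages, locus_share)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

A positive slope would correspond to a locus type contributing proportionally more support at older splits, and a negative slope to more support at younger splits.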
