Nucleic Acids Res. 2020 Oct 9;48(18):e103.
doi: 10.1093/nar/gkaa680.

DGINN, an automated and highly-flexible pipeline for the detection of genetic innovations on protein-coding genes

Lea Picard et al. Nucleic Acids Res. 2020.

Abstract

Adaptive evolution has shaped major biological processes. Finding the protein-coding genes and the sites that have been subjected to adaptation during evolutionary time is a major endeavor. However, very few methods fully automate the identification of positively selected genes, and widespread sources of genetic innovations such as gene duplication and recombination are absent from most pipelines. Here, we developed DGINN, a highly flexible and public pipeline to Detect Genetic INNovations and adaptive evolution in protein-coding genes. DGINN automates, from a gene's sequence, all steps of the evolutionary analyses necessary to detect the aforementioned innovations, including the search for homologs in databases, assignment of orthology groups, identification of duplication and recombination events, as well as detection of positive selection using five methods, which increases precision and allows genes to be ranked when a large panel is analyzed. DGINN was validated on nineteen genes with previously characterized evolutionary histories in primates, including some engaged in host-pathogen arms races. Our results confirm and also expand results from the literature, including novel findings on the Guanylate-binding protein family, GBPs. This establishes DGINN as an efficient tool to automatically detect genetic innovations and adaptive evolution in diverse datasets, from the user's gene of interest to a large gene list in any species range.

Figures

Figure 1.
Workflow diagram of DGINN. Phylogenetic steps (yellow) happen sequentially from the entry point of the pipeline (Steps 1–4). Each genetic innovation step (purple, Steps 5, 6 and 7) is optional. All red arrowheads denote possible entry points into the pipeline following file formats from Table 1.
Figure 2.
Example of workflow on the HERC5 primate gene. The workflow follows the diagram from Figure 1. Using human HERC5 CDS as the starting point in DGINN gave results for both HERC5 and HERC6. The number of sequences (seq) retrieved or left after each step is indicated. In the bottom panel, each colored circle represents the results from one of the five methods to detect positive selection at the gene level, with red representing significant evidence of positive selection and blue no significant evidence. P-values are indicated below the colored circles. Gp, orthologous group.
Figure 3.
DGINN results on the validation dataset. The nineteen primate genes studied are color-coded according to their selection profile category (Table 2). Left panel, number of methods detecting significant positive selection for each alignment; each method is color-coded (embedded legend). Right panel, percentage of positively selected sites (by at least one method) over the length of the query coding sequence. Genes are ordered by descending number of methods detecting positive selection then descending percentage of positively selected sites.
Figure 4.
Positive selection patterns on nineteen primate genes. The genes are color-coded according to their selection profile category (Table 2) and follow the same order as in Figure 3. Genes without positively selected sites were excluded from this representation. Positively selected sites are represented as a spike at their position on the alignment. Height of the peak is proportional to the number of methods that have identified the site as being under positive selection (posterior probabilities > 0.95 for Bio++ and PAML codeml M2 and M8 models, and P-value < 0.10 for MEME), with each method being represented by a different color (embedded legend). HYPHY MEME sites were only mapped if the gene was detected as under positive selection by BUSTED (P < 0.05). For each gene, alignment coverage is represented under the line, which itself represents the length of the alignment in light gray.
Figure 5.
Evolutionary history of the primate GBP family. (A) Maximum-likelihood phylogeny established through DGINN based on a run on the GBP5 query (step 4). The four main primate lineages are identified by color-coding: Old World monkeys, blue; Hominoids, green; New World monkeys, orange; prosimians, purple/pink. Asterisks (*) denote nodes that are statistically supported by aLRT > 0.90. The GBP5 group, which lacks Old World monkey sequences, is boxed in yellow. The scale bar represents the number of nucleotide substitutions per site and the tree was midpoint rooted. (B) Maximum-likelihood phylogeny of the GBP5 group of primate orthologs established through DGINN screen (step 7). (C) Maximum-likelihood phylogeny of the whole GBP family performed in DGINN after manual addition of primate GBP4 and GBP6 sequences. (D) Diagram of the genomic locus of the GBP gene family in seven simian primate species. The reference genomes from the NCBI used were: papAnu (Papio anubis): Panu_3.0, macMul (Macaca mulatta): Mmul10, chlSab (Chlorocebus sabaeus): Chlorocebus_sabeus 1.1, homSap (Homo sapiens): GRCh38.p13, gorGor (Gorilla gorilla): gorGor4, saiBol (Saimiri boliviensis): saiBol1.0. All alignments and phylogenies for panels A, B and C (referred to as 5A_aln, 5A_tree etc.) can be found on GitHub (see Availability).
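
The site-calling and gene-ordering rules stated in the captions of Figures 3 and 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not DGINN's own code: the function names (`methods_supporting_site`, `rank_genes`), the method labels, the dictionary layout and the placeholder gene names are all hypothetical. Only the thresholds (posterior probability > 0.95, MEME P-value < 0.10, BUSTED P < 0.05) and the two ordering criteria come from the captions.

```python
# Hypothetical sketch of the rules stated in the Figure 3 and 4 captions.
# Method labels, data layout and function names are assumptions for
# illustration; they are not DGINN's actual interface.

# Methods whose site-level score is a posterior probability (labels assumed).
POSTERIOR_METHODS = ("bpp_M2", "bpp_M8", "codeml_M2", "codeml_M8")


def methods_supporting_site(site, busted_pvalue):
    """Return the methods that call one alignment site as positively selected.

    `site` maps a method label to its score: a posterior probability for the
    Bio++ / codeml models, a P-value for MEME. Absent methods are skipped.
    The length of the returned list corresponds to the peak height in Figure 4.
    """
    supporting = []
    for method in POSTERIOR_METHODS:
        if site.get(method, 0.0) > 0.95:
            supporting.append(method)
    # MEME sites are mapped only if BUSTED detects selection at the gene level.
    if busted_pvalue < 0.05 and site.get("MEME", 1.0) < 0.10:
        supporting.append("MEME")
    return supporting


def rank_genes(genes):
    """Order genes as in Figure 3: descending number of methods detecting
    positive selection, then descending percentage of selected sites."""
    return sorted(
        genes,
        key=lambda g: (g["n_methods"], g["pct_sites"]),
        reverse=True,
    )


if __name__ == "__main__":
    # Placeholder values, not results from the paper.
    example_site = {"bpp_M8": 0.97, "codeml_M8": 0.99, "codeml_M2": 0.80, "MEME": 0.04}
    print(methods_supporting_site(example_site, busted_pvalue=0.01))

    example_genes = [
        {"name": "geneA", "n_methods": 2, "pct_sites": 1.5},
        {"name": "geneB", "n_methods": 5, "pct_sites": 0.8},
        {"name": "geneC", "n_methods": 5, "pct_sites": 3.0},
    ]
    print([g["name"] for g in rank_genes(example_genes)])
```

Sorting on a `(n_methods, pct_sites)` tuple with `reverse=True` applies both descending criteria in one pass, with the percentage of selected sites used only to break ties in the method count.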
