Nucleic Acids Res. 2013 Jul;41(Web Server issue):W286-91.
doi: 10.1093/nar/gkt497. Epub 2013 Jun 12.

SigniSite: Identification of residue-level genotype-phenotype correlations in protein multiple sequence alignments


Leon Eyrich Jessen et al. Nucleic Acids Res. 2013 Jul.

Abstract

Identifying which mutation(s) within a given genotype are responsible for an observable phenotype is important in many areas of molecular biology. Here, we present SigniSite, an online application for subgroup-free, residue-level genotype-phenotype correlation. In contrast to similar methods, SigniSite does not require any pre-definition of subgroups or binary classification. The input is a set of protein sequences in which each sequence has an associated real number quantifying a given phenotype. SigniSite then identifies which amino acid residues are significantly associated with the data set phenotype. As output, SigniSite displays a sequence logo depicting the strength of each residue's phenotype association, and a heat map identifying 'hot' or 'cold' regions. SigniSite was benchmarked against SPEER, a state-of-the-art method for predicting specificity-determining positions (SDPs), on two data sets: a set of human immunodeficiency virus (HIV) protease-inhibitor genotype-phenotype data with corresponding resistance mutation scores from the Stanford University HIV Drug Resistance Database, and a set of protein families with experimentally annotated SDPs. On both data sets, SigniSite was found to outperform SPEER. SigniSite is available at: http://www.cbs.dtu.dk/services/SigniSite/.
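The abstract does not spell out the statistic SigniSite uses. As a minimal sketch of how a subgroup-free, residue-level association can be computed directly from real-valued phenotypes, the Python below assumes a Wilcoxon rank-sum (Mann-Whitney) style test: for every residue a at every position p, the phenotype ranks of the sequences carrying a are compared with those of all other sequences and summarised as a Z-score and a two-sided P-value. The function names `average_ranks` and `residue_z_scores` are hypothetical, and this is an illustrative stand-in, not necessarily SigniSite's own method.

```python
from collections import defaultdict
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied phenotype values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def residue_z_scores(alignment, phenotypes):
    """Map (1-based position, residue) -> (Z, two-sided P).

    Illustrative rank-sum statistic (normal approximation, no tie correction):
    sequences carrying residue a at position p are compared with all other
    sequences by the ranks of their phenotype values. No subgroups or binary
    labels are needed; the phenotype stays a real number.
    """
    if len(alignment) != len(phenotypes):
        raise ValueError("one phenotype value per sequence is required")
    n_seq = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")

    ranks = average_ranks(phenotypes)
    normal = NormalDist()
    results = {}
    for p in range(length):
        # Collect the phenotype ranks of the sequences carrying each residue at p.
        groups = defaultdict(list)
        for seq, rank in zip(alignment, ranks):
            groups[seq[p]].append(rank)
        for residue, group_ranks in groups.items():
            n = len(group_ranks)
            expected = n * (n_seq + 1) / 2  # mean rank sum under the null
            variance = n * (n_seq - n) * (n_seq + 1) / 12
            if variance == 0:
                z = 0.0  # residue in every sequence: fully conserved, nothing to contrast
            else:
                z = (sum(group_ranks) - expected) / sqrt(variance)
            p_value = 2 * (1 - normal.cdf(abs(z)))
            results[(p + 1, residue)] = (z, p_value)
    return results
```

For example, `residue_z_scores(["AK", "AK", "AR", "AR"], [1.0, 2.0, 3.0, 4.0])` gives K at position 2 a negative Z-score, R a positive one, and the conserved A at position 1 a Z-score of 0. Gap characters, if present, are scored like any other symbol in this sketch.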

Figures

Figure 1.
Sequence logo. Example of sequence logo (13) output from SigniSite for the analysis of the ATV ∼Antivirogram multiple sequence alignment (MSA), truncated to positions p1 to p35 for the purpose of illustration (see ‘Materials and Methods’ section). The analysis was performed with default settings. The x-axis shows the MSA positions $p$ and the y-axis the Z-score $Z_{a,p}$ of each amino acid residue $a$. The height of each letter is proportional to $|Z_{a,p}|$, i.e. the strength of the statistical association between the residue and the data set phenotype. Residues above the $Z = 0$ line have $Z_{a,p} > 0$, i.e. they enhance the phenotype, whereas residues below the $Z = 0$ line have $Z_{a,p} < 0$, i.e. they inhibit the phenotype. For example, a residue with favourable chemical properties may enhance binding ($Z_{a,p} > 0$), whereas a residue with unfavourable properties may inhibit binding ($Z_{a,p} < 0$). Colour coding: acidic [DE]: red, basic [HKR]: blue, hydrophobic [ACFILMPVW]: black and neutral [GNQSTY]: green (14).
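The layout rules in this caption (letter height proportional to $|Z_{a,p}|$, the sign deciding whether a letter sits above or below the $Z = 0$ line, and the four colour groups) can be sketched as a small helper. This is a hypothetical illustration, not SigniSite's plotting code; the proportionality constant `scale`, the ordering of letters within a stack, and the grey fallback for symbols outside the four groups are assumptions.

```python
# Colour groups as listed in the Figure 1 caption.
RESIDUE_COLOURS = {
    aa: colour
    for group, colour in (
        ("DE", "red"),           # acidic
        ("HKR", "blue"),         # basic
        ("ACFILMPVW", "black"),  # hydrophobic
        ("GNQSTY", "green"),     # neutral
    )
    for aa in group
}


def logo_column(z_scores, scale=1.0):
    """Lay out one logo column as (residue, colour, y_bottom, y_top) glyphs.

    Letter height is |Z| * scale. Residues with Z > 0 are stacked upward from
    the Z = 0 line and residues with Z < 0 downward; within each stack the
    largest |Z| ends up farthest from the line. Z = 0 residues are not drawn.
    """
    glyphs = []
    top = 0.0
    positives = sorted((z, aa) for aa, z in z_scores.items() if z > 0)
    for z, aa in positives:
        height = abs(z) * scale
        glyphs.append((aa, RESIDUE_COLOURS.get(aa, "grey"), top, top + height))
        top += height
    bottom = 0.0
    negatives = sorted((abs(z), aa) for aa, z in z_scores.items() if z < 0)
    for magnitude, aa in negatives:
        height = magnitude * scale
        glyphs.append((aa, RESIDUE_COLOURS.get(aa, "grey"), bottom - height, bottom))
        bottom -= height
    return glyphs
```

For instance, `logo_column({"K": 2.0, "R": 1.0, "D": -3.0})` stacks R and then K (both blue) above the line and places D (red) below it, with D three units tall.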
Figure 2.
SigniSite heat map from the analysis of the ATV ∼Antivirogram multiple sequence alignment (MSA), truncated to positions p1 to p35 for the purpose of illustration (see ‘Materials and Methods’ section). The analysis was performed with default settings. The x-axis shows the 20 proteinogenic amino acids $a$ and the y-axis the positions $p$ in the analysed MSA. The fields are colour-coded so that strongly negative $Z_{a,p}$ values are blue and strongly positive $Z_{a,p}$ values are red, with graded nuances in between for intermediate values. If a residue has $Z_{a,p} = 0$, the cell is coloured grey. Absent residues are coloured black. If only one grey cell is present at a given position, the position is fully conserved, harbouring only this residue. If more grey cells are present, their associated P-values, after correction for multiple testing, correspond to $Z_{a,p} = 0$.
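The caption names neither the multiple-testing correction nor the exact colour thresholds. Assuming a Bonferroni correction over `n_tests` residue/position tests, and a linear colour ramp that saturates at a hypothetical `z_max`, the heat-map logic could be sketched as follows. Both choices, and the helper names `corrected_z` and `cell_colour`, are illustrative assumptions rather than SigniSite's documented behaviour.

```python
from math import copysign
from statistics import NormalDist

_NORMAL = NormalDist()
GREY = (128, 128, 128)  # Z = 0 (conserved, or not significant after correction)
BLACK = (0, 0, 0)       # residue absent at this position


def corrected_z(z, n_tests):
    """Bonferroni-correct the two-sided P-value of `z` and map it back to a signed Z.

    A corrected P-value of 1 corresponds to Z = 0, i.e. a grey heat-map cell.
    """
    p = 2 * (1 - _NORMAL.cdf(abs(z)))
    p_corrected = min(1.0, p * n_tests)
    if p_corrected >= 1.0:
        return 0.0
    if p_corrected <= 0.0:
        return z  # P underflowed to 0; no finite correction is representable
    return copysign(-_NORMAL.inv_cdf(p_corrected / 2), z)


def cell_colour(z, present=True, z_max=3.0):
    """RGB colour of one heat-map cell; `z_max` (hypothetical) sets full saturation."""
    if not present:
        return BLACK
    if z == 0:
        return GREY
    t = min(abs(z) / z_max, 1.0)  # 0 = faint, 1 = fully saturated
    fade = round(255 * (1 - t))
    # Red for Z > 0, blue for Z < 0.
    return (255, fade, fade) if z > 0 else (fade, fade, 255)
```

In this sketch, a residue with an uncorrected $Z_{a,p}$ of about 1.5 tested alongside 40 others ends up with a corrected P-value of 1, hence $Z_{a,p} = 0$ and a grey cell, matching the behaviour the caption describes.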
Figure 3.
Values are mean AUC ± SE. Columns: HIV [SPEER/SIGNI], SPEER's and SigniSite's predictions on the HIVdb data set; SDP [SPEER/SIGNI], SPEER's and SigniSite's predictions on the SDP data set. P-values quantifying the significance of the difference in performance were obtained using a two-tailed paired t-test.
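Assuming AUC here is the area under the ROC curve for separating annotated positions from the rest, the statistics behind this comparison can be sketched with the Python standard library: an AUC computed as the probability that a positive outscores a negative, and a two-tailed paired t-test on matched AUC values (one per method for each evaluated case). The helper names are hypothetical, and the p-value comes from numerically integrating the Student t density with Simpson's rule rather than from a library routine.

```python
import math
from statistics import mean, stdev


def roc_auc(scores, labels):
    """ROC AUC as P(positive score > negative score), counting ties as 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density over [0, |t|] with Simpson's rule."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central = total * h / 3  # integral of the density from 0 to |t|
    return max(0.0, 1.0 - 2.0 * central)


def paired_t_test(a, b):
    """Two-tailed paired t-test on matched samples; returns (t, df, p)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(n))
    df = n - 1
    return t, df, two_tailed_p(t, df)
```

Where SciPy is available, `scipy.stats.ttest_rel(a, b)` performs the same two-sided paired test.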

References

    1. Shcherbo D, Shemiakina II, Ryabova AV, Luker KE, Schmidt BT, Souslova EA, Gorodnicheva TV, Strukova L, Shidlovskiy KM, Britanova OV, et al. Near-infrared fluorescent proteins. Nat. Methods. 2010;7:827–829.
    2. Gnidehou S, Jessen L, Gangnard S, Ermont C, Triqui C, Quiviger M, Guitard J, Lund O, Deloron P, Ndam NT. Insight into antigenic diversity of VAR2CSA-DBL5ϵ Domain from multiple Plasmodium falciparum placental isolates. PLoS One. 2010;5:e13105.
    3. Brandt BW, Feenstra KA, Heringa J. Multi-Harmony: detecting functional specificity from sequence alignment. Nucleic Acids Res. 2010;38:35–40.
    4. Capra JA, Singh M. Characterization and prediction of residues determining protein functional specificity. Bioinformatics. 2008;24:1473–1480.
    5. Chakrabarti S, Bryant SH, Panchenko AR. Functional specificity lies within the properties and evolutionary changes of amino acids. J. Mol. Biol. 2007;373:801–810.
