BMC Bioinformatics. 2007 Jun 28;8:227.
doi: 10.1186/1471-2105-8-227.

Relationship between insertion/deletion (indel) frequency of proteins and essentiality

Simon K Chan et al. BMC Bioinformatics. 2007.

Abstract

Background: In a previous study, we demonstrated that some essential proteins from pathogenic organisms contained sizable insertions/deletions (indels) when aligned to human proteins of high sequence similarity. Such indels may provide sufficient spatial differences between the pathogenic protein and human proteins to allow for selective targeting. In one example, an indel difference was targeted via large-scale in silico screening. This resulted in selective antibodies and small compounds that were capable of binding to the deletion-bearing essential pathogen protein without any cross-reactivity to the highly similar human protein. The objective of the current study was to investigate whether indels were found more frequently in essential than in non-essential proteins.

Results: We investigated three species, Bacillus subtilis, Escherichia coli, and Saccharomyces cerevisiae, for which high-quality protein essentiality data are available. Using these data, we demonstrated with t-test calculations that the mean indel frequencies in essential proteins were greater than those in non-essential proteins in the three proteomes. The abundance of indels in both types of proteins was also shown to be accurately modeled by the Weibull distribution. However, Receiver Operating Characteristic (ROC) curves showed that indel frequencies alone could not be used as a marker to accurately discriminate between essential and non-essential proteins in the three proteomes. Finally, we analyzed the protein interaction data available for S. cerevisiae and observed that indel-bearing proteins were involved in more interactions and had greater betweenness values within Protein Interaction Networks (PINs).
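The contrast between a higher mean and poor discrimination can be illustrated with the area under the ROC curve (AUC). The sketch below is not the authors' code: the function name `roc_auc` and the numbers are hypothetical, chosen only to show that two groups can differ in mean while their scores still overlap heavily. An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation.

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive scores higher than a randomly chosen negative.
    Ties contribute 0.5."""
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up indel frequencies for illustration only (not data from the study).
essential = [0.9, 1.4, 0.3, 2.1, 0.7]
non_essential = [0.8, 0.2, 1.5, 0.4, 0.6]
print(roc_auc(essential, non_essential))
```

The pairwise form is quadratic in the number of proteins; it is used here because it makes the definition explicit, not because it is efficient.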

Conclusion: Overall, our findings demonstrated that indels were not randomly distributed across the studied proteomes and were likely to occur more often in essential proteins and those that were highly connected, indicating a possible role of sequence insertions and deletions in the regulation and modification of protein-protein interactions. Such observations will provide new insights into indel-based drug design using bioinformatics and cheminformatics tools.

Figures

Figure 1
Sample alignment and pipeline. A) Sample Alignment: Gaps were reported as insertions/deletions with respect to the query sequence. There are seven insertions (red) and two deletions (blue) in this sample alignment. B) Pipeline: A summary of the steps taken to calculate the mean insertion and deletion frequencies for essential and non-essential proteins in B. subtilis, E. coli, and S. cerevisiae.
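As a minimal sketch of counting gaps with respect to the query, the snippet below scans the two rows of a pairwise alignment. The convention is an assumption, since the caption does not spell it out: a gap run in the subject row is counted as an insertion in the query, and a gap run in the query row as a deletion. The names `gap_runs` and `count_indels` and the toy sequences are hypothetical.

```python
from itertools import groupby


def gap_runs(aligned_row):
    """Return the lengths of consecutive '-' runs in one aligned row."""
    return [
        sum(1 for _ in group)
        for is_gap, group in groupby(aligned_row, key=lambda ch: ch == "-")
        if is_gap
    ]


def count_indels(query_row, subject_row, min_len=1):
    """Count insertions and deletions relative to the query.

    Assumed convention: gaps in the subject row mean the query carries extra
    residues (insertion); gaps in the query row mean the query lacks residues
    (deletion). Only runs of at least `min_len` positions are counted.
    """
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have equal length")
    insertions = sum(1 for n in gap_runs(subject_row) if n >= min_len)
    deletions = sum(1 for n in gap_runs(query_row) if n >= min_len)
    return insertions, deletions


# Toy alignment for illustration only.
query = "MKT--AYIAKQR"
subject = "MKTLLAY---QR"
print(count_indels(query, subject))             # (1, 1)
print(count_indels(query, subject, min_len=3))  # (1, 0)
```

The `min_len` parameter mirrors the minimum indel length used as the x-axis in the figures that follow.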
Figure 2
Mean insertion and deletion frequencies in essential and non-essential proteins plotted against minimum indel length. Mean insertion and deletion frequencies were calculated for essential and non-essential query proteins aligned to proteins from the 22 bacterial or 15 eukaryotic species. The t-test statistic is shown for the minimum indel lengths that were found significantly more often in essential (blue bars) than non-essential (purple bars) proteins. Significance was set at P < 0.05. Note that no such difference was observed in insertions within B. subtilis proteins.
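A two-sample t statistic of the kind reported in this figure can be sketched as follows. The text does not say which variant of the t-test was used; Welch's unequal-variance form is assumed here, and `welch_t` is a hypothetical helper. Converting the statistic to a P value requires the t distribution, which is available in SciPy via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and its approximate degrees of freedom
    (Welch-Satterthwaite) for two independent samples."""
    na, nb = len(sample_a), len(sample_b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two values")
    # Squared standard errors of the two sample means.
    se2_a = variance(sample_a) / na
    se2_b = variance(sample_b) / nb
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Made-up per-protein indel frequencies, for illustration only.
essential = [2.0, 3.0, 4.0, 5.0, 6.0]
non_essential = [1.0, 2.0, 3.0, 4.0, 5.0]
t_stat, dof = welch_t(essential, non_essential)
print(t_stat, dof)
```

In practice the same comparison would be repeated once per minimum indel length, giving one statistic per bar pair.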
Figure 3
Proportion of essential and non-essential proteins with indels plotted against minimum indel length. Insertions are represented by blue bars while deletions are represented by purple bars.
Figure 4
Approximation of abundance of indels with the Weibull distribution. r² values close to 1.0 indicated that the abundance of insertions (blue points and blue line) and deletions (purple points and purple line) in essential and non-essential proteins of the three query species could be accurately modeled by the Weibull distribution.
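One common way to fit a Weibull model and obtain an r² value is to linearise the cumulative distribution function F(x) = 1 - exp(-(x/scale)^shape), which gives ln(-ln(1 - F)) = shape * ln(x) - shape * ln(scale), and then apply ordinary least squares. The fitting procedure actually used for this figure is not described here, so the sketch below is an assumption; `fit_weibull` and its inputs are hypothetical, and the r² it returns refers to the linearised scale.

```python
from math import exp, log


def fit_weibull(xs, cdf_values):
    """Fit F(x) = 1 - exp(-(x / scale) ** shape) by least squares on the
    linearised form; return (shape, scale, r_squared)."""
    # Keep only points where both logarithms are defined.
    pts = [
        (log(x), log(-log(1.0 - f)))
        for x, f in zip(xs, cdf_values)
        if x > 0 and 0.0 < f < 1.0
    ]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two usable points")
    mean_u = sum(u for u, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxx = sum((u - mean_u) ** 2 for u, _ in pts)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in pts)
    syy = sum((v - mean_v) ** 2 for _, v in pts)
    if sxx == 0 or sxy == 0:
        raise ValueError("points do not determine a Weibull fit")
    shape = sxy / sxx                      # slope of the fitted line
    intercept = mean_v - shape * mean_u    # equals -shape * ln(scale)
    scale = exp(-intercept / shape)
    r_squared = (sxy * sxy) / (sxx * syy)
    return shape, scale, r_squared


# Synthetic points generated from a known Weibull curve, for illustration only.
lengths = [1, 2, 3, 5, 8]
cumulative = [1.0 - exp(-((x / 4.0) ** 1.5)) for x in lengths]
print(fit_weibull(lengths, cumulative))
```

Because the synthetic points lie exactly on a Weibull curve, the recovered shape and scale should be close to 1.5 and 4.0 with r² close to 1.0; real data would scatter around the line.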
