BMC Bioinformatics. 2009 Oct 15;10 Suppl 12(Suppl 12):S15.
doi: 10.1186/1471-2105-10-S12-S15.

Extraction, integration and analysis of alternative splicing and protein structure distributed information

Matteo D'Antonio et al. BMC Bioinformatics. 2009.

Abstract

Background: Alternative splicing has been shown to affect most human genes; different isoforms of the same gene encode proteins that differ in a limited number of residues and therefore have similar structures. This suggests possible correlations between alternative splicing and protein structure. To support the investigation of such relationships, we have developed the Alternative Splicing and Protein Structure Scrutinizer (PASS), a Web application that automatically extracts, integrates and analyzes human alternative splicing and protein structure data scattered across the Alternative Splicing Database, the Ensembl databank and the Protein Data Bank. Primary data from these databases have been integrated and analyzed using the Protein Identifier Cross-Reference, BLAST, CLUSTALW and FeatureMap3D software tools.

Results: A database has been developed to store the primary data considered and the results of their analysis; a system of Perl scripts has been implemented to automatically create and update the database and to analyze the integrated data; a Web interface has been implemented to make the analyses easily accessible; and a second database has been created to manage user access to the PASS Web application and to store users' data and searches.

Conclusion: PASS automatically integrates data from the Alternative Splicing Database with protein structure data from the Protein Data Bank. It also analyzes the integrated data with well-known, publicly available bioinformatics tools in order to generate structural information on isoform pairs. Further analysis of this information might reveal relationships between alternative splicing and protein structure differences, which may be significantly associated with different functions.

Figures

Figure 1
Events of alternative splicing. In a typical alternatively spliced gene, where most of the exons are constitutive (i.e. they are always included in the final mRNA), four different types of alternative splicing events may occur to give rise to different final transcripts: a) cassette exons (one exon is skipped in some transcripts), b) isoforms of introns or exons (their boundaries may be different in different transcripts, with consequent truncation/extension of the flanking introns/exons), c) intron retentions (one intron is not spliced out and may be inserted into the final transcript), d) mutually exclusive exons (different exons may be included in different final transcripts). On the right of the figure the alternative mRNA transcripts are displayed; conserved parts are colored in green, while alternative elements are colored in red or blue.
Figure 2
Entity-relationship diagram of the PASS database. The PASS database is composed of several parts: ReferenceProteinSequences, AltSplicedProteinSequences, AlternativeSplicingEvents and PDB_ProteinSequences contain the primary data from ASD, Ensembl and PDB; PDB_ENSP, BLAST_Homology and ClustalwAlignedSequences contain the analysis data from PICR, BLAST and CLUSTALW, respectively; Annotation contains the possible positions defined for the alternative elements of a pair of protein sequences (and used in the residue annotation); and FeatureMap3D_Reports, AverageStructures and Residues contain the FeatureMap3D analysis results and their processing.
Figure 3
Protein structures colored according to their residue annotation. Example of the isoforms for the gene ENSG00000104870 (IgG receptor FcRn large subunit p51 precursor): the same isoform (a, c) is aligned using CLUSTALW to two other isoforms (b, d) of the same gene. The parts conserved between the two aligned isoforms are colored in green, the residues present in only one isoform are colored in blue (in the other isoform, the first and last residues flanking the insertion are colored in red), and mismatches between the two sequences are depicted in other colors, depending on whether the substitution has a positive or negative value in the BLOSUM62 matrix. The images were obtained with PyMOL.
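The mismatch coloring in this caption depends on the sign of the BLOSUM62 score of each substituted residue pair. A minimal Python sketch of that decision follows. It assumes a small hand-copied excerpt of the matrix rather than the full 20x20 table, and the function name and class labels are illustrative, not part of PASS. The caption does not say how a score of zero is colored, so the sketch reports it as its own class.

```python
# Excerpt of BLOSUM62 scores for a few residue pairs (assumption: a real
# pipeline would load the complete matrix instead of this short table).
BLOSUM62_EXCERPT = {
    ("I", "V"): 3,
    ("D", "E"): 2,
    ("K", "R"): 2,
    ("A", "G"): 0,
    ("G", "W"): -2,
    ("D", "L"): -4,
}


def substitution_class(res_a, res_b, matrix=BLOSUM62_EXCERPT):
    """Classify an aligned residue pair by the sign of its substitution score.

    Returns "conserved" for identical residues, otherwise "positive",
    "negative" or "zero" (hypothetical labels chosen for this sketch).
    """
    if res_a == res_b:
        return "conserved"
    # The matrix is symmetric, so look the pair up in both orders.
    score = matrix.get((res_a, res_b))
    if score is None:
        score = matrix.get((res_b, res_a))
    if score is None:
        raise KeyError(f"pair {res_a}/{res_b} is not in the matrix excerpt")
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "zero"


if __name__ == "__main__":
    for pair in [("I", "V"), ("L", "D"), ("A", "G"), ("K", "K")]:
        print(pair, substitution_class(*pair))
```

Each class would then be mapped to a display color; the caption fixes only green, blue and red for the non-mismatch cases, so any mismatch palette is a free choice.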
Figure 4
Entity-relationship diagram of the PassUsers database. The PassUsers database is composed of three main parts that store: the registration data for every user (Users); the data uploaded by the user (DataSets, with the definition of the uploaded datasets; DataTypes, with the definition of the type of data uploaded; and UploadedData, with all the identifiers uploaded for every dataset); and the user's search data (Searches, with the searches defined by every user; Queries, with all the queries the user may make; and Fields, with the information about which fields in the SELECT statement of the query the user has chosen to display).
Figure 5
Search section of the PASS Web application. Screenshot of the PASS search main page that enables users to define the initial parameters of a search to be performed in the PASS database.
Figure 6
Example of structural analysis. Isoforms (identified by the ENSP code of the reference protein and their splicing pattern) of the gene ENSG00000170899 (Glutathione S-transferase A4), with bar plots of their secondary structure composition: alpha helices (green), extended strands participating in beta ladder (blue), 3-helices (yellow), hydrogen bonded turns (purple), bends (white), and other structures (cyan); the last two columns on the right contain the bar plots of the accessibility (static solvent exposure) and flexibility (B factor) values of the isoforms.
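The composition bars in this figure amount to the fraction of residues falling in each secondary structure class. A minimal Python sketch of that computation follows. It assumes the per-residue assignment is available as a string of DSSP-style one-letter codes (H, E, G, T, S), which the caption's class names suggest but the source does not state; the function and dictionary names are illustrative.

```python
from collections import Counter

# Assumed mapping from DSSP-style one-letter codes to the classes named in
# the caption; every other code is pooled into "other".
CLASSES = {
    "H": "alpha helix",
    "E": "extended strand",
    "G": "3-helix",
    "T": "hydrogen bonded turn",
    "S": "bend",
}


def structure_composition(assignment):
    """Return the fraction of residues in each secondary structure class."""
    if not assignment:
        raise ValueError("empty secondary structure assignment")
    counts = Counter(CLASSES.get(code, "other") for code in assignment)
    total = len(assignment)
    # Report every class, including those with no residues, in a fixed order.
    names = list(CLASSES.values()) + ["other"]
    return {name: counts.get(name, 0) / total for name in names}


if __name__ == "__main__":
    for name, fraction in structure_composition("HHHHEEGT-S").items():
        print(f"{name:22s} {fraction:.2f}")
```

Computing the same dictionary for each isoform of a gene gives directly comparable bars, since the fractions are normalized by sequence length.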
Figure 7
Examples of the primary data considered. a) Example data of the alternative splicing events of gene ENSG00000104870 extracted from the ASD database. b) Output alignment from CLUSTALW of two alternatively spliced proteins from the gene in a); * indicates a conserved residue. c) Residue annotation defined for the pair of CLUSTALW aligned sequences in b); the residues labeled as "1" are present only in one sequence, the residues labeled as "2" represent the positions where residues are inserted in the other sequence, and the residues labeled as "." are conserved in the two sequences. d) Output report of the FeatureMap3D analysis of the annotated sequences displayed in c); the data colored in red are those extracted and stored into the PASS database.
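The residue annotation described in panel c) can be derived from a gapped pairwise alignment. A minimal Python sketch follows, assuming the two aligned sequences are given as equal-length strings with "-" as the gap character. The labels "1", "2" and "." follow the caption; the handling of mismatched columns is not described there, so the "x" label used here is a hypothetical placeholder, as is the function name.

```python
GAP = "-"


def annotate(aligned_self, aligned_other):
    """Label every residue of aligned_self relative to aligned_other.

    "1": residue present only in this sequence (gap in the other one)
    "2": residue flanking a position where the other sequence has an insertion
    ".": residue conserved in both sequences
    "x": mismatched residue (hypothetical label, not taken from the source)
    """
    if len(aligned_self) != len(aligned_other):
        raise ValueError("aligned sequences must have the same length")
    labels = []
    after_insertion = False  # True right after a gap in this sequence
    for own, other in zip(aligned_self, aligned_other):
        if own == GAP:
            if other != GAP:
                # The other sequence has extra residues here: mark the
                # residue just before the insertion point, unless it is
                # itself unique to this sequence.
                if labels and labels[-1] != "1":
                    labels[-1] = "2"
                after_insertion = True
            continue
        if other == GAP:
            labels.append("1")
        elif after_insertion:
            labels.append("2")  # residue just after the insertion point
        elif own == other:
            labels.append(".")
        else:
            labels.append("x")
        after_insertion = False
    return "".join(labels)


if __name__ == "__main__":
    short, long_ = "MKT--LLA", "MKTGGLLA"
    print(annotate(short, long_))
    print(annotate(long_, short))
```

Calling the function once per sequence yields one label string per isoform, each with exactly one label per ungapped residue, which is the shape needed to map the labels onto a structure.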
