BMC Bioinformatics. 2018 Nov 20;19(Suppl 14):414.
doi: 10.1186/s12859-018-2381-1.

Automated shape-based clustering of 3D immunoglobulin protein structures in chronic lymphocytic leukemia

Eleftheria Polychronidou et al. BMC Bioinformatics.

Abstract

Background: Although the etiology of chronic lymphocytic leukemia (CLL), the most common type of adult leukemia, is still unclear, strong evidence implicates antigen involvement in disease ontogeny and evolution. Both primary-sequence and 3D structure analyses have been used to look for indications of antigenic pressure. The 3D analyses have mostly relied on models built from the amino acid sequences of the clonotypic B cell receptor immunoglobulins (BcR IG), so their reliability depends directly on the quality of the model-construction algorithms and on the specific methods used to compare the resulting models. Thus far, reliable and robust methods for grouping IG 3D models by their structural characteristics have been lacking.

Results: Here we propose a novel method for clustering a set of proteins by their 3D structure, applied to the 3D structures of BcR IG from a large series of patients with CLL. The method combines techniques from bioinformatics, 3D object recognition and machine learning. Clustering is based on 3D descriptors that encode various properties of the proteins' local and global geometric structure. The descriptors are extracted from aligned pairs of proteins. As an additional variant, several individual 3D descriptors are combined. When the automatically generated clusters are compared with manual annotation by experts, accuracy is higher with the 3D descriptors than with a plain bioinformatics-based comparison, and higher still with the combination of 3D descriptors.
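The abstract does not say how per-descriptor comparisons are fused, which clustering algorithm is used, or how accuracy against expert annotation is scored. The sketch below is only an illustration under explicit assumptions. It assumes that each 3D descriptor (for example FPFH) has already been reduced to a pairwise distance matrix over the aligned protein pairs. It then min-max normalizes and averages those matrices, which is a hypothetical fusion rule. Next it clusters the fused matrix with average-linkage agglomerative clustering, a swapped-in choice. Finally it measures agreement with expert labels as cluster purity, which is also an assumption and not necessarily the paper's accuracy measure. All function names are hypothetical.

```python
# Sketch only: the normalization, fusion, clustering and scoring choices below
# are illustrative assumptions, not the method described in the paper.
from collections import Counter
from itertools import combinations
from typing import List, Optional

Matrix = List[List[float]]  # square, symmetric, zero-diagonal distance matrix


def min_max_normalize(d: Matrix) -> Matrix:
    """Rescale off-diagonal distances to [0, 1] so descriptors share a scale.

    Assumes at least two proteins (so there is at least one off-diagonal entry).
    """
    n = len(d)
    vals = [d[i][j] for i in range(n) for j in range(n) if i != j]
    lo, hi = min(vals), max(vals)
    span = (hi - lo) or 1.0  # all distances equal: avoid division by zero
    return [[0.0 if i == j else (d[i][j] - lo) / span for j in range(n)]
            for i in range(n)]


def fuse(matrices: List[Matrix], weights: Optional[List[float]] = None) -> Matrix:
    """Hypothetical fusion rule: weighted mean of normalized distance matrices."""
    weights = weights or [1.0] * len(matrices)
    total = sum(weights)
    normed = [min_max_normalize(m) for m in matrices]
    n = len(matrices[0])
    return [[sum(w * m[i][j] for w, m in zip(weights, normed)) / total
             for j in range(n)] for i in range(n)]


def average_linkage(d: Matrix, k: int) -> List[int]:
    """Naive agglomerative clustering (average linkage) down to k clusters."""
    clusters = [[i] for i in range(len(d))]

    def link(a: List[int], b: List[int]) -> float:
        # Mean pairwise distance between the members of two clusters.
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # combinations yields (a, b) with a < b, so popping b keeps index a valid.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters.pop(b)

    labels = [0] * len(d)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def purity(pred: List[int], truth: List[str]) -> float:
    """Fraction of proteins whose cluster's majority expert label matches their own."""
    groups = {}
    for p, t in zip(pred, truth):
        groups.setdefault(p, []).append(t)
    hits = sum(Counter(ts).most_common(1)[0][1] for ts in groups.values())
    return hits / len(truth)


# Toy example: two descriptors, four proteins forming two obvious groups.
d_shape = [[0, 1, 8, 9], [1, 0, 9, 8], [8, 9, 0, 2], [9, 8, 2, 0]]
d_local = [[0, 2, 6, 7], [2, 0, 7, 6], [6, 7, 0, 1], [7, 6, 1, 0]]
labels = average_linkage(fuse([d_shape, d_local]), k=2)
print(labels, purity(labels, ["A", "A", "B", "B"]))  # [0, 0, 1, 1] 1.0
```

Normalizing before averaging stops a descriptor with a large numeric range from dominating the fused distance. Non-uniform `weights` would let a better-performing descriptor count for more.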

Conclusions: The experimental results show that 3D descriptors commonly used for 3D object recognition can be applied effectively to distinguish structural differences between proteins. The proposed approach can provide hints about the existence of structural groups in a large set of unannotated BcR IG protein files, both in CLL and, by logical extension, in other contexts where characterizing BcR IG structural similarity is relevant. The method is not restricted to BcR IG and can be extended to other types of proteins.

Keywords: 3D protein descriptors; CLL protein clustering; descriptor fusion.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The author(s) declare(s) that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1. Block diagram illustrating the proposed methodology
Fig. 2. Determination of the optimal number of clusters for the FPFH descriptor
Fig. 3. Clustering of the annotated protein dataset using the combined-descriptors method
Fig. 4. Clustering of the combined dataset of annotated and unannotated proteins using the combined-descriptors method
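Fig. 2 deals with choosing the number of clusters for the FPFH descriptor, but the criterion is not given here. One common option, used below purely as an assumption, is to pick the k that maximizes the mean silhouette width computed from a precomputed distance matrix. `silhouette` and `choose_k` are hypothetical names, and `cluster_fn` stands in for whatever clustering routine is actually used.

```python
# Sketch only: silhouette-based selection of k is an assumed criterion,
# not necessarily the one used to produce Fig. 2.
from typing import Callable, Dict, List, Sequence

Matrix = List[List[float]]  # square, symmetric, zero-diagonal distance matrix


def silhouette(d: Matrix, labels: Sequence[int]) -> float:
    """Mean silhouette width of a labelling, computed from precomputed distances.

    Assumes at least two distinct clusters; singleton clusters score 0 by convention.
    """
    clusters: Dict[int, List[int]] = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)

    scores = []
    for i, c in enumerate(labels):
        own = [j for j in clusters[c] if j != i]
        if not own:
            scores.append(0.0)
            continue
        # a: mean distance to the rest of i's own cluster.
        a = sum(d[i][j] for j in own) / len(own)
        # b: smallest mean distance from i to any other cluster.
        b = min(sum(d[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != c)
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(labels)


def choose_k(d: Matrix, ks: Sequence[int],
             cluster_fn: Callable[[Matrix, int], List[int]]) -> int:
    """Return the candidate k whose clustering has the highest mean silhouette."""
    return max(ks, key=lambda k: silhouette(d, cluster_fn(d, k)))


# Toy check with fixed labellings standing in for a real clustering routine.
dist = [[0, 1, 8, 9], [1, 0, 9, 8], [8, 9, 0, 2], [9, 8, 2, 0]]
candidates = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
print(choose_k(dist, [2, 3], lambda d, k: candidates[k]))  # 2
```

Any routine that maps a distance matrix and k to a list of labels can be passed as `cluster_fn`. The silhouette width is only one of several possible criteria for choosing k.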
