PLoS Comput Biol. 2023 Aug 31;19(8):e1011404.
doi: 10.1371/journal.pcbi.1011404. eCollection 2023 Aug.

Phylogenetic inference of the emergence of sequence modules and protein-protein interactions in the ADAMTS-TSL family

Olivier Dennler et al. PLoS Comput Biol.

Abstract

Numerous computational methods based on sequences or structures have been developed for the characterization of protein function, but they remain unsatisfactory for dealing with the multiple functions of multi-domain protein families. Here we propose an original approach based on 1) the detection of conserved sequence modules using partial local multiple alignment, 2) the phylogenetic inference of the evolutionary histories of species, genes, modules and functions, and 3) the identification of co-appearances of modules and functions. Applying our framework to the multi-domain ADAMTS-TSL family, which includes ADAMTS (A Disintegrin-like and Metalloproteinase with ThromboSpondin motif) and ADAMTS-like proteins, over nine species including human, we identify 45 sequence module signatures that are associated with the occurrence of 278 Protein-Protein Interactions in ancestral genes. Some of these signatures are supported by published experimental data and the others provide new insights (e.g. ADAMTS-5). The module signatures of ADAMTS ancestors notably highlight the dual variability of the propeptide and ancillary regions, suggesting the importance of these two regions in the specialization of ADAMTS during evolution. Our analyses further indicate convergent interactions of ADAMTS with COMP and CCN2 proteins. Overall, our study provides 186 sequence module signatures that discriminate distinct subgroups of ADAMTS and ADAMTSL and that may result from selective pressures on novel functions and phenotypes.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Domain/motif organization of the 26 human ADAMTS-TSL paralogs, adapted from [5].
Fig 2. Phylogenetic inference of module and phenotype appearances.
The different steps of the method, illustrated here for a dummy set of sequences containing two paralogs p1 and p2 (from one species) and their ortholog p3 (from another species), are: 1) Inference of the reference gene tree from protein sequences by a standard pipeline (PASTA, RAxML, TreeFix); 2) Identification of conserved sequence modules (i.e. sets of strongly similar segments from at least 2 protein sequences aligned in PLMA blocks by Paloma-D); 3) Inference of the module composition of ancestral genes in the reference tree (through Module-Gene-Species reconciliation by SEADOG-MD using the phylogenetic tree of each module inferred with PhyML and TreeFix); 4) Annotation of proteins with known phenotypic traits of interest (here Protein-Protein Interactions); 5) Reconstruction of the ancestral scenario of phenotype evolution across the reference gene tree (PastML); 6) Merging module and phenotype evolutionary information: each ancestral gene of the reference gene tree is then characterized by a module composition and a set of phenotypic traits (protein interactants here). The final result is the prediction of functional signatures by identification of module(s) and phenotypic trait(s) co-appearance.
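Step 6 of the caption above (merging module and phenotype information to find co-appearances) can be illustrated by a minimal Python sketch. It is not the authors' implementation: the function name `co_appearances`, the node labels (`N1`, ...), the module labels and the interactant names (`protA`, ...) are hypothetical toy values, and the sketch assumes that the gains at each ancestral node have already been inferred upstream (by SEADOG-MD for modules and by PastML for interactions).

```python
def co_appearances(module_gains, ppi_gains):
    """Return {node: (modules, ppis)} for nodes where both were gained.

    module_gains and ppi_gains map an ancestral node label to the set of
    modules / interactants inferred as gained at that node (toy inputs).
    """
    signatures = {}
    # Only nodes present in both mappings can be co-appearance events.
    for node in sorted(module_gains.keys() & ppi_gains.keys()):
        modules, ppis = module_gains[node], ppi_gains[node]
        # A co-appearance needs at least one gained module AND one gained PPI.
        if modules and ppis:
            signatures[node] = (sorted(modules), sorted(ppis))
    return signatures


# Hypothetical toy data, not taken from the paper.
module_gains = {"N1": {"M2", "M3"}, "N2": {"M5"}, "N3": set()}
ppi_gains = {"N1": {"protA"}, "N3": {"protB"}, "N4": {"protC"}}

result = co_appearances(module_gains, ppi_gains)
print(result)
```

In this toy input only node `N1` gains both modules and an interaction, so it is the only candidate functional signature reported.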
Fig 3. Identification of modules by partial local multiple alignment.
We show here a schematic PLMA of sequences S1, …, S5 composed of the alignment blocks B1, …, B6. This example illustrates the locality of the alignments: each alignment of positions is supported by a local alignment of sequences, possibly leaving other sequence positions unaligned. For instance, the alignment block B2 is local since it aligns only the two segments M2.3 and M2.4 of the sequences S3 and S4, and no other positions of these sequences. Local alignments allow aligning only subsets of adjacent positions from each sequence. Orthogonally, partial alignments allow aligning positions of only a subset of the sequences. This is also illustrated by B2, which is partial since it aligns only positions from S3 and S4 (and does not even align them to positions of the block B1 above B2). Partial local alignments are not limited to pairwise alignment: the block B3 is an example of a partial local alignment block, aligning the segments M3.2, M3.3 and M3.4 from sequences S2, S3 and S4, that can be built from the pairwise local alignments (highlighted here in blue) of the segments M3.2 with M3.3, M3.3 with M3.4 and M3.2 with M3.4. Each PLMA block aligns a set of segments conserved specifically in a sequence subset. This set of segments, provided that they are long enough, is said to be a conserved sequence module. Let us assume here that all the blocks, except B6, align segments of 5 or more residues. The set of segments {M2.3, M2.4} aligned in B2 then defines, for instance, the module M2, while the block B3 identifies the module M3 = {M3.2, M3.3, M3.4}.
Blocks B4, B5 and B6 illustrate how the definition of the blocks (requiring that each segment is aligned to, and only to, all the other segments of the block) makes it possible to split possibly longer local alignments and so identify segments specifically conserved in sequence subsets: even if the concatenation of M4.1 and M5.1 is locally aligned to the concatenation of M4.3 and M5.3 (this could be, for instance, a pairwise alignment used to build the PLMA), neither of these two concatenations is locally aligned to M5.2. In this case, the maximal set of segments aligned with M5.2 is {M5.1, M5.3, M5.2}, and the modules are here M4 = {M4.1, M4.3}, specifically conserved in {S1, S3}, and M5 = {M5.1, M5.3, M5.2}, specifically conserved in {S1, S3, S2}. Similarly, the block B6 aligns the segments S6.3 and S6.2 but, in contrast with the segments aligned by B4, these segments are shorter than 5 residues and do not define a module.
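The rule described in this caption (a block defines a module when it aligns sufficiently long segments from at least 2 sequences) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Paloma-D itself: the function `modules_from_blocks`, the segment coordinates and the reading that every segment of a block must reach the 5-residue threshold are assumptions made here, and block B1 is omitted because the caption does not say which sequences it aligns.

```python
MIN_LEN = 5  # length threshold assumed in the caption's example


def modules_from_blocks(blocks, min_len=MIN_LEN):
    """Keep the blocks that define a conserved sequence module.

    blocks maps a block name to {sequence_id: (start, end)}, with `end`
    exclusive. The coordinates used below are hypothetical.
    """
    modules = {}
    for name, segments in blocks.items():
        long_enough = all(end - start >= min_len
                          for start, end in segments.values())
        # A module needs segments from at least 2 sequences, all long enough.
        if len(segments) >= 2 and long_enough:
            modules[name] = sorted(segments)
    return modules


# Toy blocks mirroring the sequence subsets named in the caption.
blocks = {
    "B2": {"S3": (0, 8), "S4": (0, 8)},
    "B3": {"S2": (10, 17), "S3": (12, 19), "S4": (12, 19)},
    "B4": {"S1": (20, 26), "S3": (25, 31)},
    "B5": {"S1": (26, 33), "S2": (30, 37), "S3": (31, 38)},
    "B6": {"S2": (40, 43), "S3": (41, 44)},  # 3 residues: too short
}

modules = modules_from_blocks(blocks)
print(modules)
```

With these toy coordinates, B2 to B5 are kept as modules while B6 is discarded because its segments are shorter than the threshold.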
Fig 4. Reference gene tree of the 125 ADAMTS, 48 ADAMTSL and 41 ADAM outgroup members (figure produced with iTOL).
Fig 5. Module composition of the 26 H. sapiens ADAMTS-TSL sequences.
The phylogenetic gene tree of the 26 H. sapiens ADAMTS and ADAMTSL paralogs was extracted from the reference gene tree (Fig 4). The modules identified by the Paloma-D program are represented on the sequences with iTOL, using a unique combination of shape and color to designate each module. The complete list of modules is provided in the S8 Table.
Fig 6. Protein-Protein Interaction networks of H. sapiens ADAMTS-TSL.
The 119 PPIs shared by the 26 human ADAMTS-TSL are visualized with Cytoscape [41]. Yellow nodes are hyalectanases, green nodes are pro-collagenases, grey nodes are ADAMTS with unspecific substrates, blue nodes are ADAMTSL and white nodes are proteins interacting with ADAMTS-TSL.
Fig 7. Location in H. sapiens paralogs of the modules involved in the 45 events of module(s)-PPI(s) co-appearance.
(A) ADAMTS-TSL phylogeny indicating the 45 ancestral nodes (labels in white boxes) corresponding to the 45 module(s)-PPI(s) co-appearance events. TS, ADAMTS; TSL, ADAMTSL; Hs, Homo sapiens; Dr, Danio rerio; Xt, Xenopus tropicalis; Gg, Gallus gallus; Mm, Mus musculus; Bt, Bos taurus. (B) Each line corresponds to a H. sapiens protein chosen as representative of the ancestors, and onto which the ancestral module signature is reported. The Pfam domains are represented as grey boxes and the gained modules as green marks. Each sequence is divided into 3 regions: 1) the N-terminal region that contains the propeptide in ADAMTS (blue), 2) the central region including the catalytic domain and the disintegrin domain (orange) and the central TSP1, the cys-rich domain and the spacer (yellow), and 3) the variable ancillary region from the end of the spacer to the C-terminal end (purple).
Fig 8. Convergent evolution of COMP and CCN2 interactions with ADAMTS-TSL.
COMP and CCN2 interactions with ADAMTS-TSL are associated with independent module signatures acquired during evolution. (A) Phylogenetic tree of ADAMTSs with magnification of phylogenetic ADAMTS subtrees involved in COMP and CCN2 PPIs. (B) Heatmap: the published interactions of H. sapiens ADAMTS-3, ADAMTS-4, ADAMTS-7 and ADAMTS-12 with COMP and CCN2 are represented as dark blue boxes. The gains of the PPIs (inferred by PastML) are represented by the internal nodes: G315 (the ADAMTS-3 Amniota ancestor), G161 (the ADAMTS-7 and ADAMTS-12 paralogs ancestor) and G15 (the ADAMTS-4 mammalian ancestor) for CCN2, CCN2/COMP and COMP respectively. The PPIs inferred by PastML are represented as light blue boxes. Conversely, the absence of interaction (missing information) between ADAMTS and CCN2 and COMP is represented as dark orange boxes (for human proteins) and the absence of interaction inferred by PastML is represented as light orange boxes (for non-human proteins).
Fig 9. Three module signatures are associated with COMP and/or CCN2 PPIs.
The ancestral module signatures are reported on the descendant proteins in H. sapiens: (A) ADAMTS-7, (B) ADAMTS-4 and (C) ADAMTS-3. The location of the modules is represented as green boxes along with the location of the Pfam domains represented as grey boxes (top panel), while the content of the modules is represented by raw sequence logos [49] using the Protomata visualization [14], which also displays the chaining of the modules with arrows labeled by the minimal and maximal distances between the modules in the sequences of descendants (bottom panels). Because of the size of the G161 module signature, only the modules in the region of interaction with COMP are shown in (A).
Fig 10. Evolutionary histories of hyalectanases PPIs.
(A) Hyalectanase tree. The last common ancestor of the human hyalectanases is the G96 gene node. The gain of the ACAN and VCAN PPIs was inferred at the G96 gene node. (B) Heatmap: the published interactions of H. sapiens hyalectanases with ACAN, VCAN, LRP1, TIMP3 and COMP are represented as dark blue boxes. The PPIs inferred by the PastML tool are represented as light blue boxes. Conversely, the absence of interaction (missing information) between hyalectanases and ACAN, VCAN, LRP1, TIMP3 and COMP is represented as dark orange boxes (for human proteins) and the absence of interaction inferred by PastML is represented as light orange boxes (for non-human proteins).
Fig 11. Module signatures of ADAMTS-5 and hyalectanase proteins.
(A) Location of G65 and G96 signature modules on H. sapiens ADAMTS-5 (NP_008969.2). The protein domains are shown in grey, the modules gained at the ADAMTS-5 ancestral gene G65 are shown in green and the modules gained at the G96 hyalectanase ancestral gene are shown in purple. (B) Excerpt of the PLMA restricted to the sequences of H. sapiens hyalectanases and three non-hyalectanases (ADAMTS-6, ADAMTS-10 and ADAMTSL-4) in the spacer domain. PLMA blocks defining modules are shown as boxes containing the module segments. The succession of the module segments of each sequence is indicated by arrows labeled by a numbering of the sequences displayed (from 1 to 10 here), completed with the interval of sequence positions skipped if the segments are not contiguous. (C) Sequence logos, as in Fig 9, of the modules gained at G96 (left) and G65 (right) in the spacer. (D) All the segments of H. sapiens ADAMTS-5 (NP_008969.2) in G65 and G96 signature modules. (E) Predicted structures of the H. sapiens ADAMTS-5 protein with and without propeptide, colored with G65 and G96 modules. The three hypervariable loops, β1-β2, β3-β4 and β9-β10, previously described in Santamaria et al., 2019 [59], are marked by *.

References

    1. Kelwick R, Desanlis I, Wheeler GN, Edwards DR. The ADAMTS (A Disintegrin and Metalloproteinase with Thrombospondin motifs) family. Genome Biology. 2015;16:113. doi: 10.1186/s13059-015-0676-3
    2. Mead TJ, Apte SS. ADAMTS proteins in human disorders. Matrix biology: journal of the International Society for Matrix Biology. 2018;71-72:225–239. doi: 10.1016/j.matbio.2018.06.002
    3. Rose KWJ, Taye N, Karoulias SZ, Hubmacher D. Regulation of ADAMTS Proteases. Frontiers in Molecular Biosciences. 2021;8:701959. doi: 10.3389/fmolb.2021.701959
    4. Hubmacher D, Apte SS. ADAMTS proteins as modulators of microfibril formation and function. Matrix Biol. 2015;47:34–43. doi: 10.1016/j.matbio.2015.05.004
    5. Théret N, Bouezzedine F, Azar F, Diab-Assaf M, Legagneux V. ADAM and ADAMTS Proteins, New Players in the Regulation of Hepatocellular Carcinoma Microenvironment. Cancers. 2021;13(7). doi: 10.3390/cancers13071563