mSystems. 2024 Jun 18;9(6):e0094823.
doi: 10.1128/msystems.00948-23. Epub 2024 May 3.

Fusion/fission protein family identification in Archaea

Anastasiia Padalko et al. mSystems. 2024.

Abstract

The majority of newly discovered archaeal lineages remain without a cultivated representative, but scarce experimental data from the cultivated organisms show that they harbor distinct functional repertoires. To unveil the ecological as well as evolutionary impact of Archaea from metagenomics, new computational methods need to be developed, followed by in-depth analysis. Among them is the genome-wide protein fusion screening performed here. Natural fusions and fissions of genes not only contribute to microbial evolution but also complicate the correct identification and functional annotation of sequences. The products of these processes can be defined as fusion (or composite) proteins, which consist of two or more domains originally encoded by different genes, and split proteins, which originate from the separation of a gene in two (fission). Fusion identifications are required for proper phylogenetic reconstructions and metabolic pathway completeness assessments, while mappings between fused and unfused proteins can fill some of the existing gaps in metabolic models. In the archaeal genome-wide screening, more than 1,900 fusion/fission protein clusters were identified, belonging to both newly sequenced and well-studied lineages. These protein families are mainly associated with different types of metabolism and with genetic and cellular processes. Moreover, 162 of the identified fusion/fission protein families are archaeal specific, having no identified fused homolog within the bacterial domain. Our approach was validated by the identification of experimentally characterized fusion/fission cases. However, around 25% of the identified fusion/fission families lack functional annotations for both composite and split states, showing the need for experimental characterization in Archaea.

IMPORTANCE: Genome-wide fusion screening has never been performed in Archaea on a broad taxonomic scale. The overlay of multiple computational techniques allows the detection of a fine-grained set of predicted fusion/fission families, instead of rough estimations based on conserved domain annotations only. The exhaustive mapping of fused proteins to bacterial organisms allows us to capture fusion/fission families that are specific to archaeal biology, as well as to identify links between bacterial and archaeal lineages based on co-occurrence of taxonomically restricted proteins and their sequence features. Furthermore, the identification of poorly characterized lineage-specific fusion proteins opens up possibilities for future experimental and computational investigations. This approach enhances our understanding of Archaea in general and provides potential candidates for in-depth studies in the future.
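
The composite/split definition given in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which overlays several techniques not detailed here. It only shows the general idea of flagging a candidate composite when two unrelated shorter proteins align to distinct regions of it. The names `Hit`, `overlap`, `split_partners`, the `max_overlap` tolerance, and the example gene IDs are all hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str  # ID of the shorter protein aligned to the candidate
    start: int    # 1-based start of the alignment on the candidate
    end: int      # inclusive end of the alignment on the candidate


def overlap(a: Hit, b: Hit) -> int:
    """Number of candidate residues covered by both hits."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def split_partners(hits, related, max_overlap=10):
    """Return pairs of subjects that match distinct regions of one candidate.

    hits: alignments of shorter proteins against a single candidate composite.
    related: set of frozensets naming subject pairs that align to each other;
             such pairs are treated as one family, not as fusion partners.
    max_overlap: assumed tolerance (in residues) for overlapping alignments.
    """
    pairs = []
    for a, b in combinations(hits, 2):
        if a.subject == b.subject:
            continue
        if frozenset((a.subject, b.subject)) in related:
            continue
        if overlap(a, b) <= max_overlap:
            pairs.append((a.subject, b.subject))
    return pairs


# Hypothetical example: geneA and geneA2 are homologs of each other,
# geneB matches a different region of the candidate.
hits = [Hit("geneA", 1, 210), Hit("geneB", 215, 480), Hit("geneA2", 5, 200)]
related = {frozenset(("geneA", "geneA2"))}
print(split_partners(hits, related))
```

In this toy input, the candidate would be paired with `geneA`/`geneB` and `geneB`/`geneA2`, while the `geneA`/`geneA2` pair is skipped because the two are homologous to each other. A real screen would also need synteny checks and bacterial mapping, as the abstract describes.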

Keywords: archaeal biology; archaeal evolution; bacterial fusions; comparative genomics; large-scale screening.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig 1
Fusion/fission protein screening pipeline.
Fig 2
Composite (fused) and split (unfused) protein quantification. (a) Frequency distribution (smoothed density) between composite and split proteins, split syntenic/nonsyntenic sets, and composite proteins. (b) Proportion of composite and split proteins identified. (c) Proportion of validated function (BRENDA or SwissProt), predicted function (significant KO), and unclassified. (d) Correlation between the number of composite proteins and the number of proteins per assembly. (e) Correlation between the total number of fusion proteins and the number of proteins per assembly.
Fig 3
Taxonomic distribution of fusions across 1,678 archaeal assemblies. The taxonomic level, represented on the vertical axis, is grouped by order, phyla, or superphyla (indicated in bold). Protein clusters/families are represented on the horizontal axis, with functional category indication underneath. High-confidence fusions are represented on the left side, with probable fusions on the right side. Black indicates the presence of split syntenic sets, green of split sets (where syntenic representatives are absent), red of composite proteins, and white absence of any within the taxonomic rank. The top bar chart shows the percentage of split proteins over the total number of proteins per family. Singletons were excluded from the figure. On the right, the row annotation bar charts show (a) the average number of fusion events per genome (composite protein count); (b) the average number of probable fusion events per genome (composite protein count); and (c) the average number of probable fission events per genome (split protein count).
Fig 4
Relative abundance of fusion/fission proteins versus the total number of proteins identified per functional category (KEGG categories and KO annotations per assembly were utilized to calculate the ratio).
Fig 5
Fusion events in archaeal metabolism. The listed abbreviations do not occur in the text. Protein abbreviations. HHC: Acc, acetyl-CoA carboxylase; Hhps, hydroxypropionyl-coenzyme A synthetase; Acr, acryloyl-CoA reductase; Ssr, succinate semialdehyde reductase; Hbl, 4-hydroxybutyrate-CoA ligase; AtoB, acetyl-CoA acetyltransferase; Hpd, hydroxypropionyl-CoA dehydratase. PPP/RHP: Pgk, phosphoglycerate kinase; Gap, glyceraldehyde-3-phosphate dehydrogenase; Tal, transaldolase. TCA: Cs, citrate synthase; Frd, fumarate reductase. Respiratory chain: Nuo, NADH-quinone oxidoreductase; Qcr, quinol-cytochrome c reductase; Cyo, cytochrome-c oxidase. WLP: Fwd, formylmethanofuran dehydrogenase; Ftr, formylmethanofuran–tetrahydromethanopterin N-formyltransferase; Mch, methenyltetrahydromethanopterin cyclohydrolase; Mtd, methylenetetrahydromethanopterin dehydrogenase; Mer, 5,10-methylenetetrahydromethanopterin reductase. Methanogenesis: Mcr, methyl-coenzyme M reductase; Mvh, F420-non-reducing hydrogenase; Fpo, F420H2:phenazine/quinone oxidoreductase. Sulfur metabolism: Sox, sulfur-oxidation system; Qmo, quinone oxidoreductase; Apr, adenosine 5′-phosphosulfate reductase; Dsr, dissimilatory sulfite reductase; Cys, sulfate assimilation enzymes. Nitrogen metabolism: Nar, nitrate reductase; Nir, nitrite reductase; Nor, nitric oxide reductase; Amo, ammonia monooxygenase; Nif, nitrogenase. AAs biosynthesis proteins: HisFAIE, histidine; Aro, chorismate; Cys, cysteine; Phe, phenylalanine; Trp, tryptophan; MetH, methyltetrahydrofolate-homocysteine methyltransferase; LeuCD, leucine; Ser, serine. Others: Por, pyruvate-ferredoxin/flavodoxin oxidoreductase; Cdh, carbon-monoxide dehydrogenase; Acs, acetyl-CoA synthetase; Pyc, pyruvate carboxylase. Compound abbreviations. HHC: 3HP, 3-hydroxypropionate; 4HB, 4-hydroxybutyrate. PPP/RHP: Ru5P, ribulose 5-phosphate; 3-PGA, 3-phospho-D-glycerate; GAP, glyceraldehyde 3-phosphate; F6P, fructose-6-phosphate; R5P, ribose-5-phosphate; X5P, D-xylulose 5-phosphate; S7P, sedoheptulose 7-phosphate; E4P, erythrose 4-phosphate; G6P, glucose-6-phosphate. TCA: AGK, alpha-ketoglutarate. WLP: MF, methanofuran; H4MPT, tetrahydromethanopterin; Fd, ferredoxin. Sulfur metabolism: AAs biosynthesis: His, histidine; Ser, serine; Cys, cysteine; Phe, phenylalanine; Trp, tryptophan; Met, methionine.
