BMC Genomics. 2014 Dec 5;15(1):946.
doi: 10.1186/1471-2164-15-946.

Conservation analysis of the CydX protein yields insights into small protein identification and evolution

Rondine J Allen et al. BMC Genomics.

Abstract

Background: The reliable identification of proteins containing 50 or fewer amino acids is difficult due to the limited information content in short sequences. The 37 amino acid CydX protein in Escherichia coli is a member of the cytochrome bd oxidase complex, an enzyme found throughout Eubacteria. To investigate the extent of CydX conservation and prevalence and evaluate different methods of small protein homologue identification, we surveyed 1095 Eubacteria species for the presence of the small protein.

Results: Over 300 homologues were identified, including 80 unannotated genes. The ability of both closely related and divergent homologues to complement the E. coli ΔcydX mutant supports our identification techniques and suggests that CydX homologues retain similar function among divergent species. However, sequence analysis of these proteins shows a high degree of variability, with only a few highly conserved residues. An analysis of the co-variation between CydX homologues and their corresponding cydA and cydB genes shows close synteny of the small protein with the long Q-loop variant of CydA. Phylogenetic analysis suggests that the cydABX operon has undergone horizontal gene transfer, although the cydX gene likely evolved in a progenitor of the Alpha-, Beta- and Gammaproteobacteria. Further investigation of cydAB operons identified two additional conserved hypothetical small proteins: CydY, encoded in operons that lack cydX and encode CydA with a long Q-loop, and CydZ, encoded in more than 150 operons that encode CydA with a short Q-loop.

Conclusions: This study provides a systematic analysis of bioinformatics techniques required for the unique challenges present in small protein identification and phylogenetic analyses. These results elucidate the prevalence of CydX throughout the Proteobacteria, provide insight into the selection pressure and sequence requirements for CydX function, and suggest a potential functional interaction between the small protein and the CydA Q-loop, an enigmatic domain of the cytochrome bd oxidase complex. Finally, these results identify other conserved small proteins encoded in cytochrome bd oxidase operons, suggesting that small protein subunits may be a more common component of these enzymes than previously thought.

Figures

Figure 1
cydABX organization and function in Escherichia coli. (A) Operon organization of the cydABX cytochrome bd oxidase operon. (B) Function of the CydABX complex in the electron transport chain.
Figure 2
Evaluating methods for accurately identifying CydX homologues in 1121 species of bacteria. (A) Venn diagram of the number of CydX homologues identified by an HMM-based method (“HMM”), a tblastn screen of the NCBI microbial database using the CydX protein sequence as the query and an expect value of 1000 (“tblastn”), or by manual curation (“Missed”). (B) Receiver operating characteristic (ROC) plot of a tblastn screen of the microbial database using the CydX protein sequence as the query with different E-value cutoffs. (C) Graph of the number of CydX homologues identified in a tblastn screen of the microbial database using the CydX protein sequence as the query with different expect values. All tblastn searches were conducted using the NCBI BLAST Microbial Genomes site [45].
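The ROC plot in panel B summarizes how the tblastn E-value cutoff trades true homologue recovery against false positives. The sketch below shows that calculation under stated assumptions: each hit is assumed to already carry an E-value and a true/false label from manual curation, and the hit list is hypothetical, not data from this study.

```python
from typing import List, Tuple

def roc_points(hits: List[Tuple[float, bool]], cutoffs: List[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) for each E-value cutoff.

    A hit counts as a predicted homologue when its E-value is at or below the cutoff.
    The curated labels (True = real homologue) are an assumed input.
    """
    n_pos = sum(1 for _, is_true in hits if is_true)
    n_neg = len(hits) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both curated true and false hits")
    points = []
    for cutoff in cutoffs:
        tp = sum(1 for e, is_true in hits if is_true and e <= cutoff)
        fp = sum(1 for e, is_true in hits if not is_true and e <= cutoff)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical hits: (E-value, curated as a true CydX homologue?)
example_hits = [(1e-10, True), (0.5, True), (20.0, True), (5.0, False), (300.0, False), (900.0, True)]
print(roc_points(example_hits, [1e-5, 1.0, 10.0, 1000.0]))
# → [(0.0, 0.25), (0.0, 0.5), (0.5, 0.5), (1.0, 1.0)]
```

Note how a true homologue with a large E-value (900 here) is only recovered at a very permissive cutoff, which is the situation that short queries such as CydX tend to produce.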
Figure 3
Confirmation of functionality of CydX homologues. (A) Alignment of protein sequences of CydX homologues from Escherichia coli and other bacterial species. The small protein from Burkholderia sp. 383 (“Burkholderia383”) is not thought to be a homologue and was included as a negative control for the assay; because of its significant sequence divergence, it is shown in a separate alignment. (B) Alignment of the E. coli CydX protein with the CydZ protein from Klebsiella pneumoniae. (C) Assay of complementation of the ΔcydX β-mercaptoethanol sensitivity phenotype by expression of potential CydX homologues, a false positive from the tblastn search (Burkholderia sp. 383), and an unrelated small protein (CydZ) from a different bacterial species. Sensitivity was measured using zones of inhibition; the diameter of the zone after addition of 10 μL of 12 M β-mercaptoethanol to a plate of bacteria is shown. Species are as follows: Escherichia coli (“Escherichia”), Pectobacterium atrosepticum (“Pectobacterium”), Burkholderia xenovorans (“Burkholderia”), Actinobacillus pleuropneumoniae (“Actinobacillus”), Burkholderia sp. 383 (“Burkholderia sp. 383”), Klebsiella pneumoniae (“Klebsiella”), Cellvibrio japonicus Ueda107 (“Cellvibrio”), Methylibium petroleiphilum PM1 (“Methylibium”), Haemophilus influenzae 10810 (“Haemophilus”), and Francisella philomiragia subsp. philomiragia ATCC 25017 (“Francisella”). Alignments were generated using the program MUSCLE [57]. Amino acids are colored according to their properties at physiological conditions: red residues are hydrophobic, green residues are hydrophilic, purple residues are positively charged and blue residues are negatively charged. ‘*’ indicates that the residues are identical in all sequences, and ‘:’ and ‘.’ indicate conserved and semi-conserved substitutions, respectively, as defined by MUSCLE.
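The ‘*’, ‘:’ and ‘.’ marks under these alignments follow the Clustal-style conservation line. The sketch below shows how such a line can be derived. It assumes the residue groups commonly listed for Clustal (the `STRONG_GROUPS` and `WEAK_GROUPS` constants below); it illustrates the convention and is not the code MUSCLE itself runs.

```python
from typing import List

# Assumed Clustal-style residue groups: a column whose residues all fall in one
# strong group gets ':', one weak group gets '.'.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]

def conservation_line(aligned: List[str]) -> str:
    """Build the '*', ':', '.' annotation line for equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    symbols = []
    for i in range(length):
        column = {seq[i].upper() for seq in aligned}
        if "-" in column:
            symbols.append(" ")  # gapped columns are left unannotated
        elif len(column) == 1:
            symbols.append("*")  # identical in all sequences
        elif any(column <= set(group) for group in STRONG_GROUPS):
            symbols.append(":")  # conserved substitution
        elif any(column <= set(group) for group in WEAK_GROUPS):
            symbols.append(".")  # semi-conserved substitution
        else:
            symbols.append(" ")
    return "".join(symbols)

# Hypothetical toy alignment, not CydX sequences
print(conservation_line(["MKISW", "MRLGW", "MKVAY"]))  # → *::.:
```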
Figure 4
Sequence analysis of the CydX protein family. (A) Consensus sequence of CydX homologues compared with the presence of predicted transmembrane domains (red bars) and the number of homologues that contain amino acids at each position (grey bars). The sequence logo was created from a MUSCLE alignment [57] analyzed by the WebLogo program [57]. Amino acids are colored according to their properties at physiological conditions: black residues are hydrophobic, green residues are hydrophilic, blue residues are positively charged and red residues are negatively charged. Transmembrane domains were predicted using the program TMHMM [56]. (B) Predicted evolutionary importance of each residue in CydX. Analysis performed using the Lichtarge Computational Biology Lab’s Universal Evolutionary Trace web server [57]. (C) Predicted selection pressure on each amino acid in the CydX protein. Analysis performed using the Selecton program. (D) Residues within the CydX protein that share mutual information. Analysis performed using the MISTIC program. Residues are colored by conservation: red positions in the alignment are conserved and blue positions are less conserved. (E) Alpha-helical wheel projection of the predicted transmembrane domain of the E. coli CydX protein [28]. The conserved residues Y3, W6 and G9 are outlined in black. The shapes of the amino acids reflect their properties at physiological conditions: hydrophobic residues are diamonds and hydrophilic residues are circles. The degree of hydrophobicity of diamond residues is also reflected in their color, ranging from green (most hydrophobic) to yellow (least hydrophobic). Likewise, the degree of hydrophilicity of circle residues is reflected in their color, ranging from red (most hydrophilic) to light orange (least hydrophilic).
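Letter heights in a sequence logo such as panel A reflect per-position information content: the stack height is the maximum entropy of the alphabet minus the Shannon entropy of the column, and each letter gets a share proportional to its frequency. A minimal sketch, assuming a gapped protein alignment as input; WebLogo additionally applies a small-sample correction, which is deliberately omitted here.

```python
import math
from collections import Counter
from typing import Dict, List

def position_information(aligned: List[str], alphabet_size: int = 20) -> List[Dict[str, float]]:
    """Per-column letter heights (bits) for a protein sequence logo.

    Stack height R = log2(alphabet_size) - H, where H is the Shannon entropy of
    the residue frequencies in that column. Gaps are ignored and no
    small-sample correction is applied (a simplifying assumption).
    """
    heights = []
    for i in range(len(aligned[0])):
        residues = [seq[i] for seq in aligned if seq[i] != "-"]
        total = len(residues)
        if total == 0:
            heights.append({})  # all-gap column contributes nothing
            continue
        freqs = {aa: n / total for aa, n in Counter(residues).items()}
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        info = math.log2(alphabet_size) - entropy
        heights.append({aa: p * info for aa, p in freqs.items()})
    return heights

# Hypothetical two-column alignment: column 0 fully conserved, column 1 split W/Y
heights = position_information(["MW", "MY", "MW", "MY"])
print({aa: round(h, 2) for aa, h in heights[1].items()})  # → {'W': 1.66, 'Y': 1.66}
```

A fully conserved column reaches the maximum of log2(20) ≈ 4.32 bits, which is why invariant residues stand out in a logo.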
Figure 5
Testing the functional importance of the CydX C-terminal amino acids. (A) Alignment of the E. coli CydX protein sequence with six mutant sequences containing mutated C-terminal amino acids. (B) CydX function was assayed using a zone assay testing sensitivity to β-mercaptoethanol. Sensitivity was measured using zones of inhibition; the diameter of the zone after addition of 10 μL of 12 M β-mercaptoethanol to a plate of bacteria is shown. The average and standard deviation of zone sizes were calculated from at least three replicate plates. Alignments were generated using the program MUSCLE [57].
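Panel B reports the average and standard deviation of zone diameters over replicate plates. A minimal sketch of that summary, assuming the sample (n − 1) standard deviation, since the caption does not say which form was used; the strain names and diameters below are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple

def summarize_zones(zones_mm: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Mean and sample standard deviation of zone-of-inhibition diameters per strain."""
    summary = {}
    for strain, diameters in zones_mm.items():
        if len(diameters) < 2:
            raise ValueError(f"{strain}: need at least two replicates for a standard deviation")
        summary[strain] = (statistics.mean(diameters), statistics.stdev(diameters))
    return summary

# Hypothetical zone diameters (mm) from three replicate plates per strain
summary = summarize_zones({"control": [20.0, 22.0, 21.0], "mutant": [30.0, 30.0, 33.0]})
print(summary)
```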
Figure 6
Distribution of cydA, cydB, cydX and other cyd-related small proteins throughout bacteria. (A) Phylogenetic tree of 1095 species from major Eubacterial clades overlaid with the presence of the different cyd genes in each species. Genes identified in each bacterial genome are labeled as follows: species adjacent to a red bar contain at least one cydA gene, to a blue bar at least one cydB gene, to a green bar at least one cydX gene, to a yellow bar at least one cydZ gene, and to a black bar at least one cydY gene. Major bacterial clades are labeled. The Alpha, Beta, Epsilon, Delta and Gamma labels identify the different classes in the Proteobacteria phylum. (B) Alignment of representative homologues identified from major bacterial clades. Gene names and sequences are shaded with the color used for that clade in the preceding phylogeny; pISP1 and pRLG204 are not colored because they are not represented in the tree. Species are as follows: Shigella flexneri 2a str. 2457T (“Enterobacteriaceae”), Legionella pneumophila 2300/99 Alcoy (“Legionellaceae”), Hyphomonas neptunium ATCC 15444 (“Hyphomonadaceae”), Asticcacaulis excentricus CB 48 (“Caulobacteraceae”), Laribacter hongkongensis HLHK9 (“Neisseriaceae”), Achromobacter xylosoxidans A8 (“Alcaligenaceae”), Mariprofundus ferrooxydans PV-1 1099921033905 (“Mariprofundaceae”), Sphingomonas sp. MM-1 plasmid pISP1 (“pISP1”), and Rhizobium leguminosarum bv. trifolii WSM2304 plasmid pRLG204 (“pRLG204”). Alignments were generated using the program MUSCLE [57]. ‘*’ indicates that the residues are identical in all sequences, and ‘:’ and ‘.’ indicate conserved and semi-conserved substitutions, respectively, as defined by MUSCLE.
Figure 7
Phylogenetic analysis of CydX. (A) Phylogenetic analysis was conducted using concatenated CydABX protein sequences, and clades of CydABX sequences with strong statistical support are labeled by color. (B) Species containing specific CydABX sequences are labeled on the phylogenetic tree using bars of the same color as their clade in the phylogenetic analysis of the CydABX sequences. Species containing CydX homologues that are not encoded in a cydABX operon are labeled with a black bar. The Alpha, Beta, Epsilon, Delta and Gamma labels identify the different classes in the Proteobacteria phylum. (C) Alignment of protein sequences of CydX homologues grouped into the “yellow clade” in the phylogenetic analysis. (D) Alignment of select protein sequences of CydX homologues grouped into the “grey clade” in the phylogenetic analysis. Gene names and sequences are shaded with the color used for that clade in the preceding phylogeny. Species are as follows: Pseudoalteromonas haloplanktis TAC125 (“Pseudoalteromonas(1)”), Pseudoalteromonas sp. SM9913 (“Pseudoalteromonas(2)”), Glaciecola sp. 4H-3-7 + YE-5 (“Glaciecola”), Pseudoalteromonas atlantica T6c (“Pseudoalteromonas(3)”), Allochromatium vinosum DSM 180 (“Allochromatium”), Colwellia psychrerythraea 34H (“Colwellia”), Rhodospirillum photometricum DSM 122 (“Rhodospirillum”), Thiomonas intermedia K12 (“Thiomonas”), Bordetella avium 197N (“Bordetella”), Frateuria aurantia DSM 6220 (“Frateuria”), Acidiphilium cryptum JF-5 (“Acidiphilium(1)”), Acidiphilium multivorum AIU301 (“Acidiphilium(2)”), Acidithiobacillus ferrooxidans ATCC 53993 (“Acidithiobacillus(1)”), Acidithiobacillus caldus SM-1 (“Acidithiobacillus(2)”), Acetobacter pasteurianus IFO 3283-01 (“Acetobacter(1)”), and Acetobacter pasteurianus IFO 3283-01-42C (“Acetobacter(2)”). Alignments were generated using the program MUSCLE [57]. ‘*’ indicates that the residues are identical in all sequences, and ‘:’ and ‘.’ indicate conserved and semi-conserved substitutions, respectively, as defined by MUSCLE.
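Panel A is built from concatenated CydABX protein sequences. A minimal sketch of the concatenation step, assuming each protein was aligned separately first and that a species missing one protein is padded with gaps (a common supermatrix convention; the caption does not say how missing genes were handled). The species names and sequences are hypothetical.

```python
from typing import Dict, List

def concatenate_alignments(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-protein alignments (species -> aligned sequence) into one supermatrix.

    Species absent from an alignment are filled with gaps of that alignment's width
    (an assumed convention for missing genes).
    """
    species = sorted({name for aln in alignments for name in aln})
    parts: Dict[str, List[str]] = {name: [] for name in species}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("every sequence within one alignment must have the same length")
        width = widths.pop()
        for name in species:
            parts[name].append(aln.get(name, "-" * width))
    return {name: "".join(chunks) for name, chunks in parts.items()}

# Hypothetical per-protein alignments (CydA, CydB, CydX order)
cyd_a = {"sp1": "MKV", "sp2": "MRV"}
cyd_b = {"sp1": "LL", "sp2": "LI"}
cyd_x = {"sp1": "MWY"}
print(concatenate_alignments([cyd_a, cyd_b, cyd_x]))
# → {'sp1': 'MKVLLMWY', 'sp2': 'MRVLI---'}
```

Keeping the protein order fixed across species matters: columns in the supermatrix must correspond to the same aligned positions in every row.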
Figure 8
Phylogenetic relationship between CydA protein sequence and the presence of small Cyd proteins in the operon. CydA DNA sequences were translated and aligned using MUSCLE, and the alignment was used to build a PHYLIP Neighbor-Joining phylogenetic tree. Shading overlaying the phylogeny marks CydA proteins encoded in an operon that also contains a cydX, cydY, or cydZ gene.
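The tree in this figure was built with the Neighbor-Joining method. A minimal sketch of Neighbor-Joining (Saitou and Nei), assuming a precomputed symmetric distance matrix with a zero diagonal; this is not PHYLIP's implementation: ties are broken by scan order and negative branch lengths are not corrected.

```python
from typing import Dict, List

def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted Neighbor-Joining tree and return it as a Newick string."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # Dict-of-dicts copy so that joined nodes can be added and removed.
    d: Dict[str, Dict[str, float]] = {
        a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)
    }
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        totals = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Pick the pair that minimises the Q criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - totals[a] - totals[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node to the joined pair.
        la = 0.5 * d[a][b] + (totals[a] - totals[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        new = f"({a}:{la:g},{b}:{lb:g})"
        d[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                dc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
                d[new][c] = dc
                d[c][new] = dc
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    # Join the last three nodes to a single central node.
    a, b, c = nodes
    la = (d[a][b] + d[a][c] - d[b][c]) / 2
    lb = (d[a][b] + d[b][c] - d[a][c]) / 2
    lc = (d[a][c] + d[b][c] - d[a][b]) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"

# Hypothetical additive distances for five taxa
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(list("abcde"), matrix))  # → (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

For additive distances like these, the path lengths in the returned tree reproduce the input matrix exactly (for example a to d is 2 + 3 + 2 + 2 = 9).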
Figure 9
Synteny between the cydX gene and the long Q-loop allele of cydA. (A) Alignment of the Q-loop region from select CydA homologues. Sequences are shaded in a gradient from the longest Q-loop (darkest) to the shortest Q-loop (lightest). (B) Histogram showing the number of CydA homologues containing Q-loops of increasing size (black bars) and the number of CydA proteins encoded in an operon that also contains cydX (grey bars). (C) Diagram of the CydA protein containing the Q-loop. Residues shown in black are present only in long Q-loop CydA variants. (D) Diagram showing mutual information shared between residues in the CydX protein, shown in its predicted orientation in the inner membrane, and the Q-loop, shown as the residues spanning transmembrane regions 6 (TM6) and 7 (TM7) of CydA. Lines between residues indicate high mutual information. The conserved and variable regions of the Q-loop are labeled. Gaps between residues in the Q-loop region represent omitted residues that either show no mutual information or share mutual information only with other Q-loop residues and not with CydX. A mutual information filter cutoff of 10 was used for this figure. Species are as follows: Francisella philomiragia subsp. philomiragia ATCC 25017 (“Francisella”), Janthinobacterium sp. Marseille (“Janthinobacterium”), Burkholderia xenovorans LB400 (“Burkholderia”), Escherichia coli 536 (“Escherichia”), Brachybacterium faecium DSM 4810 (“Brachybacterium”), Mycobacterium marinum M (“Mycobacterium”), and Bacillus subtilis subsp. spizizenii str. W23 (“Bacillus”). Mutual information was determined using the program MISTIC. Alignments were generated using the program MUSCLE [57]. ‘*’ indicates that the residues are identical in all sequences, and ‘:’ and ‘.’ indicate conserved and semi-conserved substitutions, respectively, as defined by MUSCLE.
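Panel D links CydX residues to Q-loop residues that share high mutual information across homologues. The sketch below computes plain mutual information between two alignment columns. It assumes each row is the concatenation of one species' aligned CydX and CydA sequences, so that columns from both proteins can be compared directly. MISTIC applies further corrections (such as APC and Z-score normalization) that are omitted here, so these raw values are not on the same scale as the cutoff of 10 used in the figure.

```python
import math
from collections import Counter
from typing import List

def column_mutual_information(aligned: List[str], i: int, j: int) -> float:
    """Mutual information (bits) between alignment columns i and j.

    Rows with a gap in either column are skipped. No pseudocounts,
    APC correction or Z-score normalisation are applied (simplifying assumptions).
    """
    pairs = [(seq[i], seq[j]) for seq in aligned if seq[i] != "-" and seq[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi

# Hypothetical rows: column 0 (a "CydX" position) co-varies perfectly with column 1 (a "CydA" position)
print(column_mutual_information(["AK", "AK", "GE", "GE"], 0, 1))  # → 1.0
```

Two columns that vary independently give a value near zero, while perfectly co-varying columns reach the entropy of either column.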
Figure 10
New cyd-related small proteins identified in this study. (A) The CydY small protein, found in Epsilon- and Deltaproteobacteria species downstream of cydAB operons encoding CydA with a long Q-loop. (B) The CydZ small protein, found in over 150 cydAB operons encoding CydA with a short Q-loop. Operon organization is shown at the top of each panel, with an example alignment below it and a consensus sequence logo at the bottom. Species are as follows: Desulfurispirillum indicum S5 (“Desulfurispirillum”), Campylobacter concisus 13826 (“Campylobacter”), Sulfuricurvum kujiense DSM 16994 (“Sulfuricurvum”), Arcobacter butzleri RM4018 (“Arcobacter”), Campylobacter jejuni subsp. doylei 269.97 (“Campylobacter”), Serratia sp. AS12 (“Serratia”), Vibrio parahaemolyticus RIMD 2210663 (“Vibrio”), Enterobacter aerogenes KCTC 2190 (“Enterobacter”), Pseudomonas aeruginosa LESB58 (“Pseudomonas”), Achromobacter xylosoxidans A8 (“Achromobacter”), Bordetella parapertussis 12822 (“Bordetella”), and Zymomonas mobilis subsp. mobilis ZM4 (“Zymomonas”). Sequence logos were generated using the program WebLogo [57]. Alignments were generated using the program MUSCLE [54]. ‘*’ indicates that the residues are identical in all sequences, and ‘:’ and ‘.’ indicate conserved and semi-conserved substitutions, respectively, as defined by MUSCLE.

References

1. Galindo MI, Pueyo JI, Fouix S, Bishop SA, Couso JP. Peptides encoded by short ORFs control development and define a new eukaryotic gene family. PLoS Biol. 2007;5:1052–1062. doi: 10.1371/journal.pbio.0050106.
2. Waters LS, Sandoval M, Storz G. The Escherichia coli MntR miniregulon includes genes encoding a small protein and an efflux pump required for manganese homeostasis. J Bacteriol. 2011;193:5887–5897. doi: 10.1128/JB.05872-11.
3. Gassel M, Möllenkamp T, Puppe W, Altendorf K. The KdpF subunit is part of the K(+)-translocating Kdp complex of Escherichia coli and is responsible for stabilization of the complex in vitro. J Biol Chem. 1999;274:37901–37907. doi: 10.1074/jbc.274.53.37901.
4. Hobbs EC, Yin X, Paul BJ, Astarita JL, Storz G. Conserved small protein associates with the multidrug efflux pump AcrB and differentially affects antibiotic resistance. Proc Natl Acad Sci U S A. 2012;109:16696–16701. doi: 10.1073/pnas.1210093109.
5. Ramamurthi KS, Lecuyer S, Stone HA, Losick R. Geometric cue for protein localization in a bacterium. Science. 2009;323:1354–1357. doi: 10.1126/science.1169218.
