Infect Immun. 2000 Oct;68(10):5889-900.
doi: 10.1128/IAI.68.10.5889-5900.2000.

Diversity of PspA: mosaic genes and evidence for past recombination in Streptococcus pneumoniae

S K Hollingshead et al. Infect Immun. 2000 Oct.

Abstract

Pneumococcal surface protein A (PspA) is a serologically variable protein of Streptococcus pneumoniae. Twenty-four diverse alleles of the pspA gene were sequenced to investigate the genetic basis for serologic diversity and to evaluate the potential of diversity to have an impact on PspA's use in human vaccination. The 24 pspA gene sequences from unrelated strains revealed two major allelic types, termed "families," subdivided into clades. A highly mosaic gene structure was observed in which individual mosaic sequence blocks in PspAs diverged from each other by over 20% in many cases. This level of divergence exceeds that observed for blocks in the penicillin-binding proteins of S. pneumoniae or in many cross-species comparisons of gene loci. Conversely, because the mosaic pattern is so complex, each pair of pspA genes also has numerous shared blocks, but the position of conserved blocks differs from gene pair to gene pair. A central region of pspA, important for eliciting protective antibodies, was found in six clades, which each diverge from the other clades by >20%. Sequence relationships among the 24 alleles analyzed over three windows were discordant, indicating that intragenic recombination has occurred within this locus. The extensive recombination which generated the mosaic pattern seen in the pspA locus suggests that natural selection has operated in the history of this gene locus and underscores the likelihood that PspA may be important in the interaction between the pneumococcus and its human host.

Figures

FIG. 1
Modular pspA gene showing windows of the sequence analyzed. Major domains of PspA are indicated above the line drawing. Within the α-helical or charged domain, a clade-defining region of the PspA molecule is indicated by the stippled box. Choline-binding repeats are numbered 1 to 10. The small numbers directly below the gene are equivalent to amino acid residue numbers based on PspA/Rx1, the prototype PspA molecule previously sequenced (62). Arrows LSM13 and SKH2 indicate two primers used for PCR. Boxes A′, A, A*, B, and C indicate windows of sequence for analysis.
FIG. 2
Pairwise comparisons among pspA genes and PspA proteins. Below the diagonal, values represent percent DNA identity. DNA comparisons are highlighted white for values less than 60%, light gray for values between 60 and 75%, and dark gray for values greater than 75%. Above the diagonal, values represent percent protein identity. Protein comparisons are highlighted white for values less than 55%, light gray for values between 55 and 73%, and dark gray for values greater than 73%. Black boxes along the diagonal indicate the 100% identity for each gene when compared with itself. Indications of the groups forming clades 1 to 6 and families 1 to 3 as described in the text are given along the left axis and across the top.
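The percent identity values and shading classes described for FIG. 2 can be illustrated with a minimal sketch. It assumes the two sequences are already aligned to equal length and that gapped columns are skipped; the caption does not say how gaps were handled, so that choice is an assumption. The function names `percent_identity` and `shade` are hypothetical, and only the thresholds (60 and 75% for DNA, 55 and 73% for protein) come from the caption.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: gapped columns ("-") are ignored; the paper may have
    treated gaps differently.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def shade(identity: float, low: float, high: float) -> str:
    """Map an identity value to the shading class used in the figure."""
    if identity < low:
        return "white"
    if identity <= high:
        return "light gray"
    return "dark gray"


# Thresholds taken from the FIG. 2 caption.
DNA_THRESHOLDS = (60.0, 75.0)
PROTEIN_THRESHOLDS = (55.0, 73.0)

# Toy aligned fragments, not real pspA sequence.
dna_identity = percent_identity("ACGT-A", "ACGA-A")
dna_shade = shade(dna_identity, *DNA_THRESHOLDS)
```

Here four of the five ungapped columns match, so the toy pair falls into the dark gray class for DNA.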
FIG. 3
Dot matrix analysis of representative PspA pairs indicating the regions where variance is found. In each case, the x axis is family 1 (Fam1) and the y axis is either family 1, family 2, or family 3. The protein comparison begins with the signal peptide and continues to around amino acid 400. Comparisons were made by the method of Pustell and Kafatos (50) with the default values of a window size of 8 and a minimum percent score of 60 by using the pam250 matrix and a hash value of 2. Specific proteins used in this analysis were PspA/Rx1 (family 1, x axis), PspA/BG6692 (family 1, y axis), PspA/EF5668 (family 2, first), PspA/EF3296 (family 2, second), and PspA/BG6380 (family 3). Three gray boxes in each graph represent windows A, B, and C. The percent amino acid (AA) identity values are given for each case.
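The windowed comparison behind a dot matrix such as FIG. 3 can be sketched as follows. This is a simplified illustration, not the Pustell and Kafatos method: it scores each window by plain residue identity instead of the pam250 matrix, and it does not model the hash value. Only the window size of 8 and the 60% minimum score come from the caption; the function name `dot_matrix` is hypothetical.

```python
def dot_matrix(seq_x: str, seq_y: str, window: int = 8,
               min_percent: float = 60.0) -> list[tuple[int, int]]:
    """Return (x, y) start positions whose windows meet the identity cutoff.

    Simplification: windows are scored by exact residue identity,
    not by a substitution matrix such as pam250.
    """
    dots = []
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_x[i + k] == seq_y[j + k]
            )
            if 100.0 * matches / window >= min_percent:
                dots.append((i, j))  # one dot per qualifying window pair
    return dots


# Toy peptide compared with itself: only the main diagonal qualifies.
toy = "MNKKKMILTS"
self_dots = dot_matrix(toy, toy)
```

Similar regions of two proteins appear as diagonal runs of dots, and a gap in the diagonal marks a region where the sequences have diverged below the cutoff.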
FIG. 4
Alignment of all 24 PspAs by the Clustal W algorithm and the Blosum30 amino acid scoring matrix in MacVector. The printed output shows amino acids common to over 51% of the group as darkened boxes. Regions used for window A, B, and C analyses are marked on the figure by brackets at the beginning and end. The alignment is ordered based on the clade groupings identified in the B window. Line breaks under strain names indicate clades 1 to 6. Indications above the sequence mark the start of the mature protein (amino acid [AA] 32), regions of break in coiled-coil (approximately from AA 143 to 148), and, in window C, the location of a non-proline sequence block (approximately from AA 368 to 395) that is optionally present or absent in this region of PspA molecules (Fig. 5).
FIG. 5
Distribution of pairwise comparisons of sequence distance among PspA proteins in windows A, B, and C. The y axis in each graph represents the number of pairwise comparisons out of 276 total comparisons which fell within a range of the percent amino acid identity given on the x axis. To the right of each graph is a dendrogram representing the relationships between the PspAs for the sequence obtained in window A, B, or C as described in the text. The numbers at the indicated nodes indicate the cutoff values for DNA (above) or amino acid (below) identity that define the branches extending from that node. Clade groups defined in the text based on region B are indicated by arrows in the center of the figure.
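The total of 276 comparisons follows from taking every unordered pair of the 24 sequences: 24 × 23 / 2 = 276. A minimal sketch of that count and of binning pairwise identities into a histogram is given below. The 10% bin width, the helper names, and the toy sequences are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def pairwise_count(n: int) -> int:
    """Number of unordered pairs among n sequences."""
    return n * (n - 1) // 2


def identity(a: str, b: str) -> float:
    """Percent identity of two equal-length, already aligned sequences."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def identity_histogram(sequences: dict[str, str],
                       bin_width: int = 10) -> Counter:
    """Count pairwise comparisons per identity bin, keyed by lower bin edge."""
    counts = Counter()
    for (_, a), (_, b) in combinations(sequences.items(), 2):
        lower = int(identity(a, b) // bin_width) * bin_width
        # Fold exactly 100% into the top bin instead of opening a new one.
        counts[min(lower, 100 - bin_width)] += 1
    return counts


total_pairs = pairwise_count(24)

# Toy aligned fragments, not real PspA sequence.
toy = {"a": "AAAA", "b": "AAAT", "c": "TTTT"}
hist = identity_histogram(toy)
```

The bin counts always sum to the number of pairs, which is the property that lets each graph in the figure be read against the same total.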
FIG. 6
Relationship between PspA sequences over window B and over the entire sequence. Shown are four unrooted phylograms generated from mean distances by the neighbor-joining method in the program PAUP 4.0B, as follows: A, proteins, all; B, DNA, all; C, proteins, B region; D, DNA, B region. The numbers on the tree indicate the distances along the branch lengths as calculated by PAUP with the terminal branch distances suppressed. Clades and families as defined in the text are listed near the clusters or branches that they encompass.

References

    1. Abeyta M. Ph.D. dissertation. Birmingham: University of Alabama at Birmingham; 1999.
    2. Atherton J C, Cao P, Peek R M, Jr, Tummuru M K, Blaser M J, Cover T L. Mosaicism in vacuolating cytotoxin alleles of Helicobacter pylori. Association of specific vacA types with cytotoxin production and peptic ulceration. J Biol Chem. 1995;270:17771–17777.
    3. Atherton J C, Sharp P M, Cover T L, Gonzalez-Valencia G, Peek R M, Jr, Thompson S A, Hawkey C J, Blaser M J. Vacuolating cytotoxin (vacA) alleles of Helicobacter pylori comprise two geographically widespread types, m1 and m2, and have evolved through limited recombination. Curr Microbiol. 1999;39:211–218.
    4. Breiman R, Butler J C, Tenover F C, Elliot J, Facklam R R. Emergence of drug-resistant pneumococcal infections in the United States. JAMA. 1994;271:1831–1835.
    5. Briles D E, Hollingshead S, Brooks-Walter A, Nabors G S, Ferguson L, Schilling M, Gravenstein S, Braun P, King J, Swift A. The potential to use PspA and other pneumococcal proteins to elicit protection against pneumococcal infection. Vaccine. 2000;18:1707–1711.