Copyright © 1999 The Biophysical Society. All rights reserved.
Biophysical Journal, Volume 77, Issue 2, 775-788, 1 August 1999
doi:10.1016/S0006-3495(99)76931-6
Channels, Receptors, and Transporters
Stewart R. Durell*, Yili Hao*, Tatsunosuke Nakamura#, Evert P. Bakker§ and H. Robert Guy*
* Laboratory of Experimental and Computational Biology, Division of Basic Sciences, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892-5677 USA
# Laboratory of Membrane Biochemistry, Faculty of Pharmaceutical Sciences, Chiba University, Inage-ku, Chiba 263, Japan
§ Abteilung Mikrobiologie, Universität Osnabrück, D-49069 Osnabrück, Germany
Address reprint requests to Dr. H. Robert Guy, Laboratory of Experimental and Computational Biology, National Cancer Institute, National Institutes of Health, Bldg. 12B, Rm. B116, 12 South Drive, MSC 5677, Bethesda, MD 20892-5677. Tel.: 301-496-2068; Fax: 301-402-4724.
Regulation of ion gradients across the plasma membrane is a requirement of all living cells. Much of this is accomplished by membrane channel proteins that allow ions to diffuse passively down their electrochemical gradients, and by membrane transport proteins that use energy to move ions actively against their electrochemical gradients. There have been numerous suggestions that in some active ion transporters, ions may diffuse most of the way across the membrane through a “pore” (Jardetzky, 1966; Läuger, 1979; Su et al). This hypothesis has gained support from findings that several proteins homologous to transporters act as channels: the Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) is a Cl− channel, even though its primary sequence is homologous to the ABC superfamily of transporters (Anderson et al); the glutamate transporters (Larsson et al) and norepinephrine transporters (Galli et al) apparently act as channels under some conditions; the Kef family of bacterial K+ channels (Booth et al) is homologous to the NapA Na+/H+ family of antiporters (Reizer et al); and the Kir inward rectifying K+ channel associates with a Sur protein that is homologous to the ABC transporters (Ashcroft and Gribble, 1998). Symporters are transporters in which the transport of one ion or molecule against its electrochemical gradient is “powered” by the movement of another ion or molecule down its electrochemical gradient in the same direction across the membrane. Plausible mechanisms for symport, in which even the actively transported ion diffuses most of the way through the transmembrane protein, are discussed in the accompanying manuscript (Durell and Guy, 1999).
This report provides indirect evidence, from analysis of the sequences, for homology and common structural features between the superfamily of K+ channel proteins and four K+ symporter protein families. The four symporter families are 1) the K+-translocating TrkH subunit from the Trk systems of both bacteria and archaea (Schlösser et al; Schlösser et al; Stumpe et al), 2) the KtrB subunit from a recently described KtrAB system in eubacteria (Nakamura et al) (previously identified as NtpJ by Takase et al and Clayton et al), 3) the Trk1,2 proteins from yeasts and Neurospora (Gaber et al; Ko and Gaber, 1991; Lichtenberg-Fraté et al; Haro et al), and 4) the HKT1 protein from wheat (Schachtman and Schroeder, 1994; Wang et al) and a homologue from Arabidopsis (Washington University Genome Sequencing Center, 1998 [The A. thaliana Genome Sequencing Project, http://genome.wustl.edu/gsc/arab/arabidopsis.html]; Bevan et al., 1999 [EU Arabidopsis sequence project, unpublished; accession no. CAB39784]). For the purpose of this analysis, the fungal and plant symporters are grouped into a single eukaryotic family called Trk-euk. The current supposition is that functional Trk-euk proteins are formed from a single type of subunit, although the structural similarities outlined below may force some reconsideration. HKT1 symport in wheat is dependent on Na+ (Rubio et al; Diatloff et al), and TKHp symport in the fission yeast Schizosaccharomyces pombe (which is closely related to the budding yeast Trk1,2 system) is dependent on H+ (Lichtenberg-Fraté et al), although a possible role for Na+ has not been excluded. In comparison, the functional forms of the bacterial Trk and KtrAB systems are clearly more structurally complex; both comprise multiple subunit types (Stumpe et al; Nakamura et al; Nakamura et al). Trk cotransports H+ with K+ (Stumpe et al), whereas KtrAB is Na+ linked (Tholema et al).
Additional evidence of the homology between these symporter and channel proteins comes from the development of 3D atomic-scale models of the transmembrane regions, which is presented in the accompanying paper. Specifically, it is found that the pattern of amino acid residue conservation within each symporter family is consistent with the structural fold and ion-selective mechanism employed by the superfamily of K+ channel proteins.
The ability to compare the symporter and channel proteins is now greatly enhanced by the recently determined crystal structure of the transmembrane component of the KcsA K+ channel from Streptomyces lividans (Doyle et al), which establishes the basic structural and functional roles of the different channel segments. Perhaps most importantly, it has confirmed the role of the P segment in forming the outer portion of the pore and the ion selectivity filter, which was previously predicted by indirect theoretical and experimental methods (see accompanying paper for details). Specifically, the four P segments (one from each of the four channel subunits) are arranged with fourfold symmetry around the axis of the pore, with each in the same hairpin conformation and dipping into the outer portion of the transmembrane region from the extracellular side. The first arm of the hairpin (P1) is an α-helix that slants toward the center of the channel, and the second arm (P2) is an extended α-structure (the backbone alternates between right- and left-handed α-helix conformations; Guy and Durell, 1995) that rises out of the channel along the axis. Collectively, the four P2 segments form the narrowest portion of the pore, which consequently acts as the selectivity filter. The K+ binding sites are formed by the backbone carbonyl oxygen atoms of conserved “signature sequence” residues of the four P2 segments. The full P-segment hairpin (P1+P2) is located between two hydrophobic transmembrane helices (M1 and M2) that together form the MPM (or 2TM) motif. This contrasts with the 6TM motif in many other types of K+ channels (e.g., the voltage-gated Shaker channel protein), in which the MPM structure is preceded by four additional hydrophobic transmembrane segments (Uozumi et al; Shih and Goldin, 1997).
The first hint of homology between symporter and channel proteins came from the sequence analysis work of Jan and Jan, 1994, who postulated that TrkH has two P-like segments similar to those of K+ channels. This led Stumpe et al to propose a transmembrane topology for TrkH that contained an MPM motif at both the N- and C-terminal ends of the transmembrane region of the sequence. While searching the databases for possible bacterial K+ channels, the group of Guy found that some specific MPM channel sequences were actually more similar to portions of some K+ symporters than to other K+ channel proteins (see Fig. 1). Surprisingly, the matching portion in these symporter sequences was not at the P regions identified by the Jan and Jan group, but rather at an intermediate location. As described below, further sequence analyses led the Guy and Nakamura-Bakker groups independently to the notion that these symporters actually comprise four sequential MPM motifs (designated MPMA, MPMB, MPMC, and MPMD).
This arrangement of primary structure suggests gene duplication, similar to the evolutionary schemes deduced for related Na+, Ca2+, and some K+ channel proteins. For example, TWIK (or 2×2TM) K+ channels have two MPM motifs within each of two identical subunits (Lesage et al), the yeast TOK (or DUK1) channel subunit has a 6TM motif followed by an MPM motif (Ketchum et al; Reid et al), and both Na+ and Ca2+ channels have four consecutive 6TM motifs within their primary pore-forming subunits (Noda et al). Finally, the hypothesis of homology between the channel and symporter proteins is also supported by sequence similarity between the cytoplasmic domain of many of the bacterial K+ channels and the 120-residue NAD-binding domains in cytoplasmic subunits of the Trk and KtrAB symporter complexes, e.g., TrkA and KtrA (Schlösser et al; Nakamura et al; Nakamura et al).
The four families of homologous bacterial K+ channel and symporter sequences were obtained by a combination of motif and keyword searches of NCBI's GenBank and microbial databases (see NCBI BLAST: Unfinished Microbial Genomes (http://www.ncbi.nlm.nih.gov/BLAST/unfinishedgenome.html) and NCBI PSI-BLAST (http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST/nph-psi blast)). The motif searches were carried out using the ion-selective P segment of K+ channels as the seed for gapped BLAST and PSI-BLAST procedures (Altschul et al). The resultant multiple sequence alignments of MPM motifs were then manually adjusted to emphasize the features common to all proteins. This involved matching features within each protein family locally and between the four families globally. Because of variability within the loop regions, our primary concern was alignment of the three main segments (i.e., M1, P, and M2) with as few gaps as possible. In sum, the data consisted of 13 multiple sequence alignments of MPM motifs (one from the K+ channels and four from each of the three symporter families), each containing three subalignments corresponding to the M1, P, and M2 segments.
To quantify the similarity among motifs, each multiple sequence alignment was converted into a numerical profile matrix, following the methods described by Henikoff and Henikoff, 1996 for creating a log-odds position-specific scoring matrix (PSSM). These procedures estimate the residue frequencies at each position for the entire population of related proteins in nature from the limited and nonrandomly sampled set of known sequences used in the alignments. Briefly, the steps were 1) weighting the observed counts of each residue by the calculated redundancy of the parent sequence in the multiple sequence alignment (Henikoff and Henikoff, 1994), 2) adding “imaginary” pseudo-counts to the sequence-weighted counts according to the residue diversity at each location and empirically determined residue substitution probabilities (BLOSUM; Henikoff and Henikoff, 1992), 3) normalizing these composite counts by the expected frequency of occurrence of the specific residue (estimated from the amino acid composition of the Swiss-Prot sequence database; Bairoch and Apweiler, 1998), and 4) taking the logarithm of these normalized counts to obtain the PSSM score for each of the 20 residues at each location. Throughout this analysis, we examined how sensitive the results were to the multiplication factor that sets the total number of pseudo-counts, and hence their proportion relative to the weighted sequence counts. Because the effect on the final results was minimal, the recommended value of 5 was used (Henikoff and Henikoff, 1996).
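The four PSSM steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the background is uniform rather than the Swiss-Prot composition, and the pseudo-counts are spread in proportion to the background instead of through BLOSUM substitution probabilities. The total number of pseudo-counts per column is taken as m times the number of distinct residues observed, with m = 5, which is our reading of the multiplication factor mentioned in the text.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_based_weights(alignment):
    """Position-based sequence weights in the style of Henikoff and Henikoff (1994).

    In each column a sequence earns 1 / (r * n): r is the number of distinct
    residue types in the column and n is the number of sequences sharing this
    sequence's residue there. Gaps and non-standard letters are skipped.
    Weights are rescaled to sum to the number of sequences.
    """
    weights = [0.0] * len(alignment)
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        counts = Counter(c for c in column if c in AMINO_ACIDS)
        r = len(counts)
        for i, c in enumerate(column):
            if c in counts:
                weights[i] += 1.0 / (r * counts[c])
    total = sum(weights)
    if total == 0:
        return [1.0] * len(alignment)  # all-gap alignment: fall back to equal weights
    return [w * len(alignment) / total for w in weights]


def build_pssm(alignment, background=None, m=5.0):
    """Log-odds PSSM (log base 2) from weighted counts plus pseudo-counts.

    ASSUMPTION: total pseudo-counts per column = m * (distinct residues seen),
    distributed in proportion to the background frequencies (a simplification
    of the BLOSUM-based distribution described in the text).
    """
    if background is None:
        # ASSUMPTION: uniform background instead of Swiss-Prot composition.
        background = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    weights = position_based_weights(alignment)
    pssm = []
    for col in range(len(alignment[0])):
        weighted = dict.fromkeys(AMINO_ACIDS, 0.0)
        for w, seq in zip(weights, alignment):
            if seq[col] in weighted:
                weighted[seq[col]] += w  # step 1: sequence-weighted counts
        n_obs = sum(weighted.values())
        pseudo = m * sum(1 for v in weighted.values() if v > 0)  # step 2
        if n_obs + pseudo == 0:
            pssm.append(dict.fromkeys(AMINO_ACIDS, 0.0))  # all-gap column
            continue
        pssm.append({
            # steps 3 and 4: normalize by background, then take the logarithm
            a: math.log2((weighted[a] + pseudo * background[a]) / (n_obs + pseudo) / background[a])
            for a in AMINO_ACIDS
        })
    return pssm


# Hypothetical toy alignment of a P2-like stretch.
alignment = ["TVGYG", "TVGYG", "TIGFG", "SVGYG"]
weights = position_based_weights(alignment)
pssm = build_pssm(alignment)
print([round(w, 3) for w in weights])
print(round(pssm[2]["G"], 2), round(pssm[2]["A"], 2))
```

The two identical sequences receive the smallest weights, which is the intended effect of down-weighting redundancy before counting.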
Quantification of the similarity of each pair of PSSMs of the same segment type was performed according to the methods of Pietrokovski, 1996. This entailed calculating the Pearson correlation coefficient for each pair of aligned profile columns and then summing the coefficients to obtain the total raw score. For comparison, the raw scores were converted into Z scores, i.e., the number of standard deviations by which a raw score lies from the mean of a distribution of best raw scores obtained by chance for unrelated protein families. Because the raw score depends on segment length, a separate distribution of best chance scores is needed for each possible segment length. These distributions were calculated by full enumeration of every possible pair of the 3670 PSSMs of multiply aligned sequences in the Blocks 10.1 database (Henikoff and Henikoff, 1991), which yielded over 6.7 million chance scores. The database was first modified by removing compositionally redundant blocks, and the sequence columns of one PSSM in each pair were randomly shuffled to eliminate bias in the results (Pietrokovski, 1996). Separate sets of chance distributions were calculated for each set of trial PSSM-creation parameters used to examine the sensitivity of the results (described above). Finally, assuming the distributions to be normal, the probability that a particular score would occur by chance was determined from the definite integral of the Gaussian probability distribution (Bevington, 1969). For example, the probability of obtaining Z scores of 2, 3, 4, 5, and 6 for unrelated protein families would be approximately 5, 3×10^−1, 6×10^−3, 6×10^−5, and 4×10^−9%, respectively.
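The scoring chain just described (column correlations, raw score, Z score, tail probability) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the chance distribution is passed in as a plain list of scores rather than built from the Blocks database, and the tail probability assumes a standard normal read as two-sided (|X| ≥ z), which is our assumption about how the quoted percentages were obtained. Read that way, it reproduces the rounded values quoted for Z = 2 through 5; for Z = 6 it gives about 2×10^−7%.

```python
import math
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def raw_profile_score(profile_a, profile_b):
    """Sum of column-by-column Pearson correlations for two aligned profiles.

    Each profile is a list of columns; each column lists the 20 PSSM scores
    in the same fixed residue order.
    """
    return sum(pearson(col_a, col_b) for col_a, col_b in zip(profile_a, profile_b))


def z_score(raw, chance_scores):
    """Standard deviations between a raw score and the mean of chance scores
    computed for segments of the same length."""
    return (raw - mean(chance_scores)) / stdev(chance_scores)


def two_sided_tail_percent(z):
    """Percent probability that a standard normal variable has |X| >= z."""
    return 100.0 * math.erfc(z / math.sqrt(2.0))


for z in (2, 3, 4, 5, 6):
    print(z, f"{two_sided_tail_percent(z):.1e}")
```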
To provide a reference in the context of membrane proteins, the bacterial 2TM channels and symporters were also compared with the transmembrane segment blocks of 19 bacteriorhodopsin homologues and of other ion channel proteins. Whereas the bacteriorhodopsin sequences were taken to be evolutionarily unrelated, the channel proteins, which included the TWIK family from C. elegans, the IRK family from eukaryotes, and the Na+ channel family, were expected to have various degrees of homology. In these calculations, full enumeration was used to find the contiguous segment of at least four residues with the highest Z score. The only exception was the comparison of the bacterial 2TM channel and symporter families with each other, for which the relative overall alignments of the blocks were kept the same as shown in Fig. 2, A and B. Within this restriction, enumeration was again used to find the highest-scoring subsegment of at least four residues.
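The restricted enumeration, in which the block alignment is fixed and only the contiguous window of at least four columns varies, might look like the sketch below. It assumes the per-column Pearson correlations have already been computed and that the mean and standard deviation of best chance scores are available for every window length; the function name and the chance statistics in the usage example are hypothetical, invented only to exercise the code.

```python
import math


def best_subsegment(column_scores, chance_params, min_len=4):
    """Enumerate every contiguous run of >= min_len columns in a fixed alignment
    and return (best_z, start, end), with end exclusive (None if too short).

    column_scores: per-column Pearson correlations of two aligned profiles.
    chance_params: maps a run length to (mean, sd) of best chance raw scores.
    """
    prefix = [0.0]
    for score in column_scores:
        prefix.append(prefix[-1] + score)  # prefix sums give O(1) run totals
    best = None
    n = len(column_scores)
    for start in range(n):
        for end in range(start + min_len, n + 1):
            mu, sd = chance_params[end - start]  # length-specific chance statistics
            z = (prefix[end] - prefix[start] - mu) / sd
            if best is None or z > best[0]:
                best = (z, start, end)
    return best


# Hypothetical chance statistics, invented for illustration only.
chance = {length: (0.05 * length, 0.3 * math.sqrt(length)) for length in range(1, 8)}
scores = [0.1, 0.9, 0.8, 0.95, 0.7, -0.2, 0.05]
print(best_subsegment(scores, chance))
```

Because the chance mean and spread grow with length, the best window here is the four well-correlated central columns rather than the whole segment; prefix sums keep the full enumeration quadratic in segment length.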
Fig. 2, A and B, displays the global alignment of all of the MPM motifs used to study the evolutionary relationships within and between the four families of K+ channels and symporters. Although only consensus sequences are used in the figure for clarity, all analyses were conducted on the full set of multiply aligned sequences (see the Appendix for the list). The 13 consensus sequences correspond to the single MPM motif in the channels and the four MPM motifs from each of the three symporter families. In all parts of the figure, the spectrum from red to blue/black represents the range of residues from conserved to variable. Figure 2A presents the global pattern of conservation among the 13 consensus sequences, and Figure 2B presents the local pattern of conservation among the sequences used to generate each of the consensus sequences. As seen in Figure 2A, the alignments of the P and M2 segments were keyed to the highly conserved residues shown in red and orange. In contrast, alignment of the M1 segments was more difficult because of the lack of global conservation. Interestingly, the variable and highly hydrophobic nature of this segment in both the channel and symporter sequences is consistent with its position in the KcsA crystal structure, i.e., the four M1 helices are on the periphery of the protein, they are largely lipid exposed, and they do not directly form the pore structure. Consequently, these segments were initially aligned by simply matching the hydrophobic regions with as few insertions or deletions as possible. Then, as seen in Figure 2B, finer adjustments were made to align the conserved residues within each family.
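Matching hydrophobic regions without gaps can be illustrated with a simple hydropathy heuristic. This sketch is not the authors' manual procedure: it assumes the Kyte-Doolittle hydropathy scale and an arbitrary window width, locates the most hydrophobic window in each sequence, and returns the ungapped shift that superimposes the two windows. The function names and toy sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values (ASSUMPTION: chosen scale for illustration).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophobic_window(seq, width):
    """Return (start, mean hydropathy) of the most hydrophobic window."""
    best_start, best_mean = 0, float("-inf")
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        avg = sum(KYTE_DOOLITTLE[a] for a in window) / width
        if avg > best_mean:
            best_start, best_mean = start, avg
    return best_start, best_mean


def hydrophobic_offset(seq_a, seq_b, width):
    """Ungapped shift placing seq_b's most hydrophobic window under seq_a's.

    Position i in seq_b aligns with position i + offset in seq_a.
    """
    start_a, _ = most_hydrophobic_window(seq_a, width)
    start_b, _ = most_hydrophobic_window(seq_b, width)
    return start_a - start_b


a = "KKDE" + "LLLLLL" + "KKKK"
b = "EK" + "LLLLLL" + "KKDDEE"
print(hydrophobic_offset(a, b, width=6))
```

Here the shift of 2 places the six-leucine stretch of `b` directly under that of `a`.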
Subsequently, special emphasis was given to the latter portion of the M1 segment, because this region in the KcsA structure packs closely to the crucial P segments in the crystal. For the 2TM bacterial K+ channels, the two most highly conserved residues in the C-terminal half of M1 are a glycine located nine residues before the end and a glutamate located right at the end (see Figure 2B). Thus most M1 segments of the symporters were aligned so that a small residue, usually glycine, coincides with the channel glycine. In MPMC and MPMD of KtrB and Trk-euk, a glutamate or aspartate at the end of M1 aligned with the highly conserved channel glutamate. In the few cases where these criteria were insufficient, the more highly conserved and/or more hydrophilic symporter residues in M1 were aligned with the more highly conserved channel residues that are oriented toward the protein and away from the lipid in the KcsA structure.
As indicated by the single red column in Fig. 2, A and B, the most highly conserved residue among the channel and symporter sequences is a glycine in the P region (the only exception being a serine substitution in the MPMA of the Arabidopsis protein). This provides an important evolutionary link between the protein families, in that this residue is known to play a major role in determining the ion selectivity for many classes of K+ channel proteins. For example, mutagenesis studies have found that this is the only residue in the Shaker K+ channel P segment that cannot be mutated to Cys in even one of the four subunits without loss of function (Lü and Miller, 1995). This can be explained by the findings in the KcsA structure that the backbone conformation of this glycine is energetically unfavorable for other types of residues and that the four backbone carbonyl oxygen atoms of the glycine residue—one from each of the four subunits—form an ion-binding site at the narrowest portion of the pore. Indeed, the functional significance of this glycine is further emphasized by the fact that this is the only residue that is identical among the set of 27 putative 2TM bacterial K+ channels (Figure 2B).
Next, the reddish orange columns in Figure 2A denote single residues of the symporter P1 and M2 segments that are identical among the consensus sequences in all but two locations. In P1 the phenylalanine aligns with the tyrosine (similarly aromatic) of the K+ channel consensus sequence. In the KcsA structure, that residue is a tryptophan, which combines with an adjacent P1 tryptophan and a P2 tyrosine in each of the four subunits to form an aromatic cuff around the selectivity filter (Doyle et al). For the M2 segment, the highly conserved residue is a glycine that is also well conserved among the channels. Its structural importance is indicated by the fact that in the KcsA structure it packs next to the innermost part of the P segment. This site may be important for channel gating, as well as selectivity, because the inner portion of M2 in the KcsA structure moves closer to the pore as the channel closes (Perozo et al; Perozo et al).
Other residues that are conserved moderately well are indicated by letters in black type. These include 1) the threonine-rich region of P1 preceding the fully conserved glycine, which is strongly conserved among the K+ channels and, to a lesser degree, among the symporters; 2) a DAL sequence conserved in many of the symporters, lying just before the highly conserved aromatic P1 residue; 3) the ILLML consensus sequence preceding the conserved M2-glycine, which is strongly conserved among the channels and partly conserved among the symporters; and finally, 4) numerous leucines that appear to be well conserved in both the M1 and M2 segments. However, such leucine matches cannot be securely interpreted as indicative of homology, because leucine is the most frequently occurring residue in the hydrophobic regions of transmembrane helices (Hofmann and Stoffel, 1993). Note in Figure 2B, for example, that many of the M1 leucines are not well conserved within each family.
Related to the conservation of specific residue types, homologous relationships are also indicated by the similar patterns of residue conservation for each of the protein families. This is demonstrated in Figure 2B, in which the consensus sequences are color coded according to the degree of conservation within each family (i.e., among the sequences used to develop the consensus sequences), or, in the case of the red and orange colors used for the symporters, by the similarity of the consensus sequences among the three families of symporters (see legend for code).
Red to orange denotes residues that are conserved in two or more families.
Yellow to green indicates residues that are well conserved within the family, but not conserved between the families.
Blue to black represents residues that are poorly conserved within the family.
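One possible way to turn the three legend classes into a rule is sketched below. The cutoff of 0.8 for "well conserved within the family", the use of identical consensus residues as the criterion for red to orange, and the function names are all hypothetical choices made for illustration; the text here gives no numerical thresholds.

```python
from collections import Counter


def column_consensus(column):
    """Most common residue at one aligned position and its fraction (gaps ignored)."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return None, 0.0
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)


def color_class(position, family, within_threshold=0.8):
    """Assign one family's consensus residue at an aligned position to a legend class.

    position: dict mapping family name -> list of residues at that position.
    within_threshold: HYPOTHETICAL cutoff for 'well conserved within the family'.
    """
    consensus = {name: column_consensus(col) for name, col in position.items()}
    residue, fraction = consensus[family]
    if residue is None:
        return "gap"
    shared = sum(1 for r, _ in consensus.values() if r == residue)
    if shared >= 2:
        return "red-orange"  # same consensus residue in two or more families
    if fraction >= within_threshold:
        return "yellow-green"  # conserved within this family only
    return "blue-black"  # poorly conserved within the family


position = {
    "KtrB": list("GGGG"),
    "TrkH": list("GGGA"),
    "Trk-euk": list("SSTA"),
}
for family in position:
    print(family, color_class(position, family))
```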
As seen, the same general pattern of sequence conservation is repeated within each MPM core of the bacterial K+ channels and the three symporter families. More specifically, the poorly conserved M1 segments are followed by highly conserved P segments, which are followed by the center-region-conserved M2 segments. Furthermore, all linkers between segments are poorly conserved, with numerous insertions and deletions. Close inspection reveals that most of the globally conserved residues identified in Figure 2A are located within the regions of local conservation.
A separate comparison is also made in Figure 2B for the two plant symporters (wheat and Arabidopsis), which are colored according to the conservation between them and in relation to the fungal sequences (see legend for code).
Another line of evidence supporting homology among these proteins involves the separate subunits of the KtrAB and Trk symporters that contain dinucleotide-binding domains, i.e., KtrA and TrkA. These are peripheral membrane proteins (Bossemeyer et al; Nakamura et al; Nakamura et al), which are probably located at the cytoplasmic side of the membrane. The KtrA subunit is homologous to the dinucleotide-binding site sequences of many other proteins and combines with the transmembrane KtrB protein to form a functional symporter. The TrkA subunit, however, is more complicated. Except for three archaeal TrkA species that also contain only one dinucleotide-binding domain, all other TrkAs have two dinucleotide-binding sites, one in each of two similar subdomains (Stumpe et al; Nakamura et al; Kawarabayasi et al). In addition, TrkA interacts with multiple protein subunits besides TrkH to form the functional symporter (Dosch et al; Parra-Lopez et al; Stumpe et al; Nakamura et al). It must be noted, however, that only the E. coli TrkA protein has actually been demonstrated to bind NAD+ and/or NADH in vitro (Schlösser et al). Moreover, it is not yet known whether dinucleotides influence the transport activities of the proteins in vivo.
Database searches indicated that the closest sequences to the KtrA and TrkA subunits are C-terminal portions of some 2TM bacterial K+ channels and of some members of the Kef family of K+ channels (Munro et al; Stumpe et al). The close homology between these sequences is evident in the alignment of representative samples shown in Figure 2C, in which there is 44% identity over a stretch of ∼120 residues. This is the longest segment for which the sequences can be aligned unambiguously. It contains only one complete dinucleotide-binding domain, which appears to be a chimera between the N-terminal segment of the NAD+-binding domain of malate/lactate dehydrogenase-like proteins and the C-terminal segment of the NAD+-binding domain from the glyceraldehyde-3-phosphate dehydrogenase-like proteins (Schlösser et al; Stumpe et al). Such findings suggest that the small KtrA and TrkA subunits may have derived from the cleavage of a covalently attached C-terminal region of an ancestral 2TM K+ channel.
Although most eukaryotic and some bacterial K+ channels (including the KcsA protein) lack an intrinsic dinucleotide-binding domain, various other K+ channels have more distantly homologous sequences at their C-termini. These include some putative bacterial channels of the 6TM type (e.g., Kch from E. coli; Parra-Lopez et al), the high-conductance Slo-type channel from animal cells (Parra-Lopez et al; Stumpe et al), and the newly identified channel-like sequence from Aquifex aeolicus (Deckert et al). Moreover, the auxiliary β-subunits of a variety of plant and animal K+ channels have redox function, and some, such as the Shaker K+ channel β-subunit, align well with the eight-stranded β-barrel structure of NAD(P)H-dependent oxidoreductases (McCormack and McCormack, 1994; Jan and Jan, 1997).
The statistical analysis was intended to determine the following: the degree of homology among 1) the three symporter families, 2) the bacterial 2TM channels and the symporters, and 3) the four MPM motifs of each symporter family. As described in the Methods, the results are given as Z scores, which are the number of standard deviations the raw score is from the mean of best chance alignments for segments of the same length. The greater the Z score, the more similar the sequence profiles are, and the less likely the alignment is to occur by chance. As also described, the alignment of the bacterial 2TM channel and symporter motif blocks was the same as represented by the consensus sequences in Fig. 2, and the reported score is the highest of all possible subsegments of at least four contiguous residues. Furthermore, the linkers in each motif have been excluded because of their extreme variability, leaving the three primary segments (i.e., M1, P, and M2) to be treated individually.
For interpretation of these results it is important to consider that membrane proteins share some basic properties independent of their evolutionary relationships. For example, our experience with this methodology suggests that comparison of any two transmembrane segments in which nonpolar residues predominate will result in a positive similarity score. Thus, to determine a baseline control of this effect, each segment block of the bacterial 2TM channels and symporters was compared to each of the seven transmembrane segments of an alignment of 19 bacteriorhodopsin homologs (Horn et al). This latter family was judged an ideal membrane protein control for the following reasons: 1) they are bacterial proteins, 2) the structure of one member of the family is known (i.e., bacteriorhodopsin), 3) they lack P segments and are unrelated to K+ channels, 4) they have multiple transmembrane segments, 5) there are numerous homologs, and 6) the transmembrane segments of the homologs can be aligned with little ambiguity.
The comparisons of the three symporter families, i.e., KtrB, TrkH, and Trk-euk, are shown in Table 1. Only motifs at the same positions in the sequences were compared, rather than all possible cross-terms of motifs from different gene duplications. A general measure of the similarity for each of the three family comparisons is obtained by simply averaging the 12 scores of each group. This yields similar average Z scores of 7.6 and 7.9 for the KtrB versus TrkH and KtrB versus Trk-euk comparisons, respectively, and a relatively low value of 5.0 for the TrkH versus Trk-euk comparison. Considering that the average for all of the comparisons with bacteriorhodopsin is 3.1, these results indicate statistically significant sequence similarities among almost all of the corresponding segments of the symporters. Furthermore, the fact that the TrkH versus Trk-euk comparison gives the lowest Z score in 11 of the 12 segment comparisons supports the hypothesis that the KtrB family is more like the presumed common ancestor.
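The averages quoted in this paragraph can be recomputed directly from Table 1. The short script below (variable names are ours) averages the 12 Z scores for each family comparison and also counts, for each of the 12 motif and segment cells, which comparison scores lowest.

```python
from collections import Counter
from statistics import mean

# Z scores from Table 1 as (M1, P, M2) for motifs A-D.
TABLE_1 = {
    "KtrB vs. TrkH": {"A": (6.8, 7.9, 7.1), "B": (4.3, 7.3, 8.4),
                      "C": (5.6, 6.4, 12.1), "D": (7.9, 7.7, 9.8)},
    "KtrB vs. Trk-euk": {"A": (6.1, 10.3, 6.2), "B": (5.9, 10.0, 8.0),
                         "C": (6.8, 10.9, 6.5), "D": (5.1, 10.7, 8.0)},
    "TrkH vs. Trk-euk": {"A": (4.9, 7.5, 0.8), "B": (5.0, 3.5, 6.1),
                         "C": (3.5, 6.1, 3.5), "D": (4.6, 7.6, 6.5)},
}
CONTROL = (3.8, 2.2, 3.2)  # M1, P, M2 means vs. bacteriorhodopsin homologs

# Average of the 12 scores for each family comparison.
averages = {
    pair: mean([z for scores in motifs.values() for z in scores])
    for pair, motifs in TABLE_1.items()
}

# For each motif/segment cell, record which comparison has the lowest score.
lowest = Counter()
for motif in "ABCD":
    for seg in range(3):
        cell = {pair: motifs[motif][seg] for pair, motifs in TABLE_1.items()}
        lowest[min(cell, key=cell.get)] += 1

for pair, avg in averages.items():
    print(f"{pair}: mean Z = {avg:.1f}, lowest in {lowest[pair]} of 12 cells")
print(f"bacteriorhodopsin control: mean Z = {mean(CONTROL):.1f}")
```

Run as is, it prints mean Z values of 7.6, 7.9, and 5.0 for the three comparisons and 3.1 for the control, and reports the TrkH versus Trk-euk comparison as lowest in 11 of the 12 cells.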
| Table 1 Statistical analysis of the similarity of the symporter families |
| Symporters | M1 | P | M2 |
|---|---|---|---|
| KtrB vs. TrkH | | | |
| Motif A | 6.8 | 7.9 | 7.1 |
| Motif B | 4.3 | 7.3 | 8.4 |
| Motif C | 5.6 | 6.4 | 12.1 |
| Motif D | 7.9 | 7.7 | 9.8 |
| KtrB vs. Trk-euk | | | |
| Motif A | 6.1 | 10.3 | 6.2 |
| Motif B | 5.9 | 10.0 | 8.0 |
| Motif C | 6.8 | 10.9 | 6.5 |
| Motif D | 5.1 | 10.7 | 8.0 |
| TrkH vs. Trk-euk | | | |
| Motif A | 4.9 | 7.5 | 0.8 |
| Motif B | 5.0 | 3.5 | 6.1 |
| Motif C | 3.5 | 6.1 | 3.5 |
| Motif D | 4.6 | 7.6 | 6.5 |
| Control: mean of all symporter segments vs. all 7 TMs of bacteriorhodopsin homologs | 3.8 | 2.2 | 3.2 |
| In Tables 1, 2, and 3, the values are given as Z scores, which are the number of standard deviations by which the raw score lies from the mean of a distribution of raw scores for segments of the same length, calculated from a database of unrelated protein families (see Methods). The more positive the Z score, the more likely it is that the two compared families are homologous. Control values for transmembrane helices of unrelated proteins are given at the bottom of each table. In Table 1, the highest score among the three families for each segment is indicated in bold, and the lowest is underlined and italicized. |
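As a minimal sketch of the Z-score definition above, the function below assumes that the background is a plain list of raw scores for unrelated segments of the same length and that the sample standard deviation is used; the Methods may differ on both points. The function name `z_score` and the example numbers are hypothetical.

```python
import statistics


def z_score(raw_score, background_scores):
    """Number of standard deviations by which raw_score lies above the background mean.

    background_scores: raw scores for same-length segments drawn from unrelated
    protein families (hypothetical input in this sketch).
    """
    mu = statistics.mean(background_scores)
    # Assumption: sample standard deviation; statistics.pstdev would give the population SD.
    sigma = statistics.stdev(background_scores)
    if sigma == 0:
        raise ValueError("background distribution has zero spread")
    return (raw_score - mu) / sigma


if __name__ == "__main__":
    background = [8.0, 10.0, 12.0]  # hypothetical raw scores of unrelated segments
    print(z_score(20.0, background))  # 5.0
```

A raw score equal to the background mean gives a Z score of 0, so the positive control values in the tables (about 2 to 4) reflect the generic similarity of hydrophobic transmembrane segments rather than homology.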
In greater detail, the degree of similarity among the three families sometimes depends on which motif segments are compared. For example, when the KtrB family is compared to the Trk-euk family, all four P segments are conserved substantially better than the M1 and M2 segments. (This pattern is similar to that found when different families of K+ channels are compared, as shown in Table 2.) In contrast, when the KtrB family is compared to the TrkH family, three of the four M2 segments are conserved to a greater extent than the P segments. In terms of the three-dimensional structure of KcsA, this suggests that the Trk-euk proteins are structurally more similar to the KtrB proteins in the outer half of the transmembrane region (where the pore is formed by the P segments), whereas the TrkH proteins are more similar to the KtrB proteins in the inner half (where the pore is formed by the M2 segments). Overall, the M1 segments are the least conserved; the average of the four M1 scores is 6.2 and 6.0 for the KtrB versus TrkH and Trk-euk comparisons, respectively, and 4.5 for the TrkH versus Trk-euk comparison. Again, this is consistent with the lesser role the M1 segment plays in forming the pore in the KcsA crystal structure.
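The segment-by-segment comparison above can be checked by averaging each column of Table 1 over the four motifs. The sketch below is a hypothetical helper, not the authors' code; it re-transcribes Table 1 and computes per-comparison and pooled means for the M1, P, and M2 segments.

```python
# Z scores from Table 1: one (M1, P, M2) tuple per motif, A to D.
TABLE1 = {
    "KtrB vs. TrkH": [(6.8, 7.9, 7.1), (4.3, 7.3, 8.4), (5.6, 6.4, 12.1), (7.9, 7.7, 9.8)],
    "KtrB vs. Trk-euk": [(6.1, 10.3, 6.2), (5.9, 10.0, 8.0), (6.8, 10.9, 6.5), (5.1, 10.7, 8.0)],
    "TrkH vs. Trk-euk": [(4.9, 7.5, 0.8), (5.0, 3.5, 6.1), (3.5, 6.1, 3.5), (4.6, 7.6, 6.5)],
}
SEGMENTS = ("M1", "P", "M2")


def segment_means(rows):
    """Mean over the given motif rows for each segment type."""
    return {seg: sum(row[i] for row in rows) / len(rows) for i, seg in enumerate(SEGMENTS)}


def overall_segment_means(table):
    """Pooled mean over all comparisons and motifs (12 scores) for each segment type."""
    all_rows = [row for rows in table.values() for row in rows]
    return segment_means(all_rows)


if __name__ == "__main__":
    for name, rows in TABLE1.items():
        print(name, {seg: round(m, 2) for seg, m in segment_means(rows).items()})
    pooled = overall_segment_means(TABLE1)
    print("all comparisons", {seg: round(m, 2) for seg, m in pooled.items()})
```

The M1 means come out at 6.15, 5.975, and 4.5 before rounding. The pooled means (about 5.5 for M1, 6.9 for M2, and 8.0 for P) rank M1 lowest; for the TrkH versus Trk-euk comparison alone, the M2 mean (about 4.2) falls slightly below the M1 mean.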
| Table 2 Statistical analysis of the similarity of 2TM bacterial K+ channel segments to the three symporter families, eukaryote 2×2TM and 2TM inwardly rectifying K+ channels, Na+ channels, and bacteriorhodopsin homologs |
| Family (vs. bacterial 2TM K+ channel segments) | M1 | P | M2 |
|---|---|---|---|
| Symporters | | | |
| KtrB A | 6.8 | 5.9 | 5.8 |
| KtrB B | 5.5 | 3.5 | 5.0 |
| KtrB C | 8.1 | 5.4 | 6.3 |
| KtrB D | 7.4 | 5.1 | 7.7 |
| TrkH A | 6.5 | 5.3 | 2.3 |
| TrkH B | 6.2 | 5.4 | 5.8 |
| TrkH C | 5.6 | 5.3 | 4.9 |
| TrkH D | 7.2 | 5.3 | 5.0 |
| Trk-euk A | 4.9 | 4.5 | 4.5 |
| Trk-euk B | 5.2 | 3.6 | 5.5 |
| Trk-euk C | 6.4 | 4.5 | 2.4 |
| Trk-euk D | 3.6 | 3.7 | 4.8 |
| Controls with other K+ channels | | | |
| 2×2TM MPMA | 6.5 | 10.6 | 6.3 |
| 2×2TM MPMB | 7.2 | 11.7 | 5.1 |
| Euk 2TM IRK | 4.5 | 5.6 | 4.7 |
| Controls with Na+ channel S5, P, and S6 segments | | | |
| Repeat I | 3.5 | 3.1 | 4.3 |
| Repeat II | 5.1 | 2.5 | 4.7 |
| Repeat III | 4.4 | 2.5 | 3.1 |
| Repeat IV | 4.5 | 2.0 | 4.3 |
| Controls with bacteriorhodopsin homologs | | | |
| Mean of 7 TMs | 4.1 | 2.8 | 3.8 |
| Bold indicates that the Z value is at least 2.0 greater than the control with bacteriorhodopsin homologs; underlined italics indicate that the Z score is less than 1.0 greater than the control. |
Table 2 shows the results when the M1, P, and M2 segments of the bacterial 2TM K+ channels are compared with the proposed analogous segments of the symporters. These results support our hypotheses that the four putative MPM motifs of the symporters are related to the MPM motif of the bacterial K+ channels and that the KtrB family is closest to the presumed ancestor. Specifically, three of the four MPM motifs of the KtrB symporters (MPMA, MPMC, and MPMD) are closest to the single MPM motif of the channels, with each of their segments scoring substantially higher (at least 2.0 points) than the control comparisons with the bacteriorhodopsin segments. The exception is the MPMB motif, which instead scores highest for the TrkH family. As expected for the only family drawn from eukaryotes, the Trk-euk family is clearly the most distant from the bacterial K+ channels, with only one segment scoring more than two points higher than the control. In general, the evolutionary distance between the channels and the symporters is also larger than that among the three symporter families themselves (Table 1). Using a simple measure, the 12-score averages in Table 2 for the similarities between the channels and symporters are 6.0, 5.4, and 4.5 for the KtrB, TrkH, and Trk-euk families, respectively, all lower than the averages of 7.6 and 7.9 for the KtrB versus TrkH and KtrB versus Trk-euk comparisons. The only exception is the score for the TrkH versus Trk-euk symporter families (i.e., 5.0), which indicates a greater distance than that between the channels and the KtrB and TrkH families.
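The footnote rule for Table 2 (bold when a score is at least 2.0 above the bacteriorhodopsin control, italics when it is less than 1.0 above) can be applied mechanically. The sketch below is illustrative only: the labels "strong", "weak", and "intermediate" stand in for bold, italics, and plain text, the helper names are hypothetical, and differences are rounded to 0.1 on the assumption that the tabulated values are themselves rounded.

```python
# Symporter rows of Table 2: Z scores (M1, P, M2) vs. bacterial 2TM K+ channel segments.
TABLE2 = {
    "KtrB": {"A": (6.8, 5.9, 5.8), "B": (5.5, 3.5, 5.0), "C": (8.1, 5.4, 6.3), "D": (7.4, 5.1, 7.7)},
    "TrkH": {"A": (6.5, 5.3, 2.3), "B": (6.2, 5.4, 5.8), "C": (5.6, 5.3, 4.9), "D": (7.2, 5.3, 5.0)},
    "Trk-euk": {"A": (4.9, 4.5, 4.5), "B": (5.2, 3.6, 5.5), "C": (6.4, 4.5, 2.4), "D": (3.6, 3.7, 4.8)},
}
# Control row: mean of the 7 bacteriorhodopsin TMs (M1, P, M2).
CONTROL = (4.1, 2.8, 3.8)


def classify(z, control):
    """'strong' if >= 2.0 above control (bold), 'weak' if < 1.0 above (italics)."""
    diff = round(z - control, 1)  # rounding avoids float noise such as 1.9999999999999996
    if diff >= 2.0:
        return "strong"
    if diff < 1.0:
        return "weak"
    return "intermediate"


def strong_motifs(family):
    """Motifs whose M1, P, and M2 segments all score at least 2.0 above control."""
    return [motif for motif, scores in TABLE2[family].items()
            if all(classify(z, c) == "strong" for z, c in zip(scores, CONTROL))]


def twelve_score_average(family):
    scores = [z for triple in TABLE2[family].values() for z in triple]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for family in TABLE2:
        n_strong = sum(classify(z, c) == "strong"
                       for triple in TABLE2[family].values() for z, c in zip(triple, CONTROL))
        print(f"{family}: mean {twelve_score_average(family):.1f}, "
              f"strong segments {n_strong}, fully strong motifs {strong_motifs(family)}")
```

With the tabulated values, `strong_motifs("KtrB")` returns `['A', 'C', 'D']`, `strong_motifs("TrkH")` returns `['B']`, and only one Trk-euk segment (M1 of motif C) reaches the 2.0-point margin.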
To provide further insight into the calculated evolutionary distances, the bacterial 2TM K+ channel sequences were compared to three other ion channel families: 1) the relatively similar TWIK or 2×2TM family of K+ channels from C. elegans (which has two consecutive MPM motifs per subunit), 2) the more distantly related IRK K+ channel family from eukaryotes, and 3) the homologous S5-P-S6 regions of the Na+ channel family (whose P segments are selective for Na+ instead of K+). As expected, the P segments of the 2×2TM K+ channels scored substantially higher against the bacterial 2TM channels (10.6 and 11.7) than did those of the symporters; however, the scores for the M1 and M2 segments were about the same as those of the KtrB family. Surprisingly, for the IRK family the M1 and M2 scores were about 1.5 to 2.5 points lower than the averages for the KtrB symporters, and the score for the ion-selective P segment was only slightly higher (i.e., 0.6 and 0.3 greater than the averages for the KtrB and TrkH families). Moreover, the scores for the analogous regions of the four repeats of the Na+ channel family were on average essentially no greater than those for the unrelated bacteriorhodopsin family (mean Z scores of 3.7 and 3.6, respectively). Even allowing for the difference in P-segment ion selectivity, this is somewhat surprising, because the voltage-gated Na+ channels are thought to have evolved from voltage-gated Ca2+ channels, which in turn are thought to have evolved from voltage-gated K+ channels (Strong et al). Thus the finding that the KtrB and TrkH families score substantially higher than do the distantly related IRK and Na+ channel families supports the hypothesis that the symporter and bacterial 2TM channel families are homologous.
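The control comparisons above reduce to a few means over rows of Table 2. The short sketch below (hypothetical variable names, values transcribed from the table) reproduces the IRK P-segment excess over the KtrB and TrkH P averages and the mean Na+ channel score relative to the bacteriorhodopsin control.

```python
# P-segment scores of the four symporter motifs (Table 2).
KTRB_P = (5.9, 3.5, 5.4, 5.1)
TRKH_P = (5.3, 5.4, 5.3, 5.3)
IRK_P = 5.6  # eukaryotic 2TM IRK control row, P column

# Na+ channel S5-P-S6 controls (repeats I-IV) and the bacteriorhodopsin control row.
NA_REPEATS = [(3.5, 3.1, 4.3), (5.1, 2.5, 4.7), (4.4, 2.5, 3.1), (4.5, 2.0, 4.3)]
BR_CONTROL = (4.1, 2.8, 3.8)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


irk_excess_ktrb = IRK_P - mean(KTRB_P)  # 5.6 - 4.975, reported as 0.6
irk_excess_trkh = IRK_P - mean(TRKH_P)  # 5.6 - 5.325, reported as 0.3
na_mean = mean(z for repeat in NA_REPEATS for z in repeat)  # all 12 Na+ channel scores
br_mean = mean(BR_CONTROL)

if __name__ == "__main__":
    print(f"IRK P minus KtrB P mean: {irk_excess_ktrb:.3f}")
    print(f"IRK P minus TrkH P mean: {irk_excess_trkh:.3f}")
    print(f"Na+ channel mean {na_mean:.2f} vs. bacteriorhodopsin mean {br_mean:.2f}")
```

The Na+ channel mean (about 3.67) exceeds the bacteriorhodopsin mean (about 3.57) by only about 0.1 Z units.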
Table 3 displays the calculated similarities of the four MPM motifs within each of the three symporter families individually. The 18-score averages from Table 3 are 6.8, 5.2, and 4.4 for the KtrB, TrkH, and Trk-euk families, respectively. Comparison with Table 2 thus indicates that the four symporter MPM motifs are almost as similar to the bacterial 2TM K+ channel MPM motif as they are to each other: the average score from Table 3 is 0.8 greater than that from Table 2 for the KtrB family, but 0.2 and 0.1 smaller for the TrkH and Trk-euk families. Every KtrB segment comparison in Table 3 scores at least 2.0 points above the corresponding bacteriorhodopsin control. This strongly supports the premise that the four MPM motifs are indeed homologous and are likely due to gene duplications. Although Table 3 indicates less similarity for the M1 and M2 segments of the other two symporter families, the strong case for mutual homology with KtrB seen in Table 1 supports extending this conclusion to the TrkH and Trk-euk proteins. In addition, the fact that most of the scores in Table 3 are highest for the KtrB family (14 of 18) is consistent with the hypothesis that this family is the closest to the common ancestor, because it indicates the least divergence of the four gene repeats. Likewise, the finding that 12 of the 18 scores are lowest for the eukaryotic Trk-euk family is consistent with it being the most divergent from the prokaryotic progenitor. Unfortunately, the pattern of conservation is not clear enough to predict the order of the motif duplications. That is, the pattern of high and low scores is not uniform among the three segments of the MPM motifs, nor is it uniform for the three families. For example, MPMA and MPMD are the most similar pair in the KtrB family, but the least similar pair in the TrkH and Trk-euk families.
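The Table 3 comparisons can be tallied in the same way. This sketch, again a reader's check with hypothetical helper names, computes the 18-score averages, identifies the most and least similar motif pair in each family by summing the three segment scores, and counts how often each family holds the highest or lowest score across the 18 cells.

```python
from collections import Counter

# Table 3: Z scores (M1, P, M2) between motif pairs within each symporter family.
PAIRS = ("A-B", "A-C", "A-D", "B-C", "B-D", "C-D")
TABLE3 = {
    "KtrB": [(6.4, 5.8, 8.0), (7.3, 5.7, 5.9), (7.4, 7.4, 9.9),
             (6.9, 6.9, 6.1), (6.8, 6.4, 6.0), (8.7, 4.2, 7.4)],
    "TrkH": [(5.6, 5.7, 3.3), (4.3, 6.3, 4.5), (4.8, 5.6, 3.6),
             (5.0, 5.5, 5.9), (6.4, 5.4, 6.3), (5.4, 4.9, 5.7)],
    "Trk-euk": [(6.9, 4.8, 4.9), (4.1, 6.4, 3.1), (3.7, 3.2, 4.0),
                (6.0, 5.1, 2.8), (4.0, 4.8, 3.6), (4.6, 5.3, 2.3)],
}


def eighteen_score_average(family):
    scores = [z for triple in TABLE3[family] for z in triple]
    return sum(scores) / len(scores)


def pair_totals(family):
    """Sum of the M1, P, and M2 scores for each motif pair."""
    return {pair: sum(triple) for pair, triple in zip(PAIRS, TABLE3[family])}


def extreme_counts():
    """Over the 18 cells, count how often each family has the highest and lowest score."""
    highest, lowest = Counter(), Counter()
    for p in range(len(PAIRS)):
        for s in range(3):
            cell = {fam: TABLE3[fam][p][s] for fam in TABLE3}
            highest[max(cell, key=cell.get)] += 1
            lowest[min(cell, key=cell.get)] += 1
    return highest, lowest


if __name__ == "__main__":
    for fam in TABLE3:
        totals = pair_totals(fam)
        print(f"{fam}: mean {eighteen_score_average(fam):.1f}, "
              f"most similar {max(totals, key=totals.get)}, "
              f"least similar {min(totals, key=totals.get)}")
    highest, lowest = extreme_counts()
    print("highest:", dict(highest), "lowest:", dict(lowest))
```

With the tabulated values, the averages are 6.8, 5.2, and 4.4; A-D is the top-scoring pair for KtrB and the bottom-scoring pair for TrkH and Trk-euk; and KtrB holds the highest score in 14 of the 18 cells, while Trk-euk holds the lowest in 12.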
| Table 3 Statistical analysis of the similarity between the four motifs of the KtrB, TrkH, and Trk-euk symporters individually |
| Symporters | M1 | P | M2 |
|---|---|---|---|
| KtrB motifs | | | |
| A vs. B | 6.4 | 5.8 | 8.0 |
| A vs. C | 7.3 | 5.7 | 5.9 |
| A vs. D | 7.4 | 7.4 | 9.9 |
| B vs. C | 6.9 | 6.9 | 6.1 |
| B vs. D | 6.8 | 6.4 | 6.0 |
| C vs. D | 8.7 | 4.2 | 7.4 |
| TrkH motifs | | | |
| A vs. B | 5.6 | 5.7 | 3.3 |
| A vs. C | 4.3 | 6.3 | 4.5 |
| A vs. D | 4.8 | 5.6 | 3.6 |
| B vs. C | 5.0 | 5.5 | 5.9 |
| B vs. D | 6.4 | 5.4 | 6.3 |
| C vs. D | 5.4 | 4.9 | 5.7 |
| Trk-euk motifs | | | |
| A vs. B | 6.9 | 4.8 | 4.9 |
| A vs. C | 4.1 | 6.4 | 3.1 |
| A vs. D | 3.7 | 3.2 | 4.0 |
| B vs. C | 6.0 | 5.1 | 2.8 |
| B vs. D | 4.0 | 4.8 | 3.6 |
| C vs. D | 4.6 | 5.3 | 2.3 |
| Control: mean of all symporter segments vs. all 7 TMs of bacteriorhodopsin homologs | 3.8 | 2.2 | 3.2 |
| Bold indicates that the Z value is at least 2.0 greater than the control with bacteriorhodopsin homologs; underlined italics indicate that the Z score is less than 1.0 greater than the control. |
Based on this analysis of the sequences, we infer the evolutionary relationship between the different channel and symporter families shown in Fig. 3. Specifically, a single prototype MPM transmembrane motif (left) underwent a fourfold gene duplication and gene fusion to form a K+ symporter protein ancestor (center). Furthermore, the cytoplasmic dinucleotide-binding domain of the K+ channel ancestor may have split off to form a separate dinucleotide-binding subunit that associates with the symporters. Most members of the KtrB family (right) of eubacteria have remained similar to this ancestral protein. However, the KtrBs from two Mycoplasma species contain additional extracellular domains between the M1 and P1 segments of the first three MPM motifs, and KtrB (NtpJ) from Treponema pallidum contains two additional transmembrane domains preceding the intracellular N-terminus (not shown). The TrkH family (top) in bacteria and archaebacteria, which also has two additional transmembrane helices at the N-terminus (unique and different from those in T. pallidum KtrB), has diverged more than most members of the KtrB family. The TrkA subunit probably underwent an internal gene duplication to produce two dinucleotide-binding domains. The Trk1,2 family in fungi (bottom) has diverged even more. Its members have an extra-long cytoplasmic loop between MPMA and MPMB, and a smaller, linker-like insert between MPMC and MPMD. The two plant sequences (bottom right) are only slightly closer to the Trk1,2 sequences than to KtrB and should probably be considered a separate family. At present, no dinucleotide-binding subunit is known for the eukaryotic symporters.
Paleontologists often search for evidence of links between distantly related groups of organisms. For example, the discovery of a subgroup of dinosaurs with feathers helps establish the evolutionary link with modern-day birds (Ji et al). Although there is no fossil record for molecular evolution, a similar approach can establish links between distantly related proteins: i.e., identifying subgroups with intermediate sequences, structures, and/or functions. In this and the accompanying paper, we argue that the bacterial KtrAB and 2TM K+ channel protein families serve this role, linking the K+ channels with the distantly related K+ symporter proteins.
Although we believe the sequence comparison and model-building methods presented in the accompanying and present papers can be usefully generalized to other protein systems, care must be taken to avoid certain pitfalls. For example, studies and intuition agree on the benefit of using profiles of families rather than individual sequences to identify homology between distantly related proteins (Tatusov et al; Henikoff and Henikoff, 1996). Unfortunately, however, this is not an automatic procedure. Beyond selection of the specific profiling and comparison scoring methods, judgment is required in selecting the range of related sequences that make up each family group. Although a profile of nearly identical sequences clearly contains little added information, a profile of too diverse a grouping (as might occur in a larger superfamily) can also be detrimental. For example, comparison of the KtrB symporter and bacterial 2TM K+ channel profiles convincingly indicates an evolutionary relationship between these two protein families. However, the results are considerably more tentative for the Trk-euk symporter family, whose M1 and M2 segments score only weakly against those of the K+ channels, and even against one another across the different MPM motif repeats. Likewise, comparisons of the symporters to distantly related families of K+ channels, such as the IRK family, indicate little similarity (data not reported). Thus a profile that combined all of the symporter families and/or all of the K+ channel families would yield a weaker similarity score than that of KtrB versus the bacterial 2TM K+ channels. This could lead to the erroneous conclusion that the symporter and channel proteins are not homologous. Rather, the case for the Trk-euk symporters being related to the channels comes indirectly from the high scores of the Trk-euk P segment profiles against the KtrB family (Table 1).
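One simple way to see how an overly diverse grouping dilutes a profile is to measure the per-column Shannon entropy of residue frequencies: pooling two internally conserved but mutually divergent subfamilies flattens every column. This is an illustrative stand-in chosen for this sketch, not the profile construction or scoring method used in this paper, and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residue frequencies in one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def mean_profile_entropy(sequences):
    """Average column entropy of an ungapped alignment; lower means a sharper profile."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return sum(column_entropy([s[i] for s in sequences]) for i in range(length)) / length


if __name__ == "__main__":
    # Hypothetical toy subfamilies, each internally well conserved.
    family_x = ["GYGDL", "GYGDL", "GFGDL"]
    family_y = ["AVSNI", "AVSNI", "AISNI"]
    for label, seqs in [("X", family_x), ("Y", family_y), ("X+Y", family_x + family_y)]:
        print(label, round(mean_profile_entropy(seqs), 3))
```

For these toy alignments, each subfamily averages about 0.18 bits per column, whereas the pooled alignment averages about 1.18 bits, so a query matching either subfamily well would match the pooled profile less sharply.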
The observation that the M1-P-M2 segments of bacterial 2TM K+ channels score no better against Na+ channel S5-P-S6 segments than against transmembrane segments of bacteriorhodopsin homologs suggests that this procedure is unable to detect distant homology between protein families in which the primary functional property (in this case, the ion selectivity of the P segment) has changed.
It is important to note that the bacterial 2TM K+ channel and symporter families share other sequence properties indicative of an evolutionary relationship that are not quantified by the calculations presented here. For example, although the statistical analysis strongly suggests that the four MPM motifs of the KtrB symporters are homologous to each other as well as to that of the bacterial 2TM K+ channels, it does not take into account that the three constituent segments (i.e., M1, P, and M2) are always in the same order. Furthermore, no score is provided for the probability of finding the same number of MPM motifs in the symporter sequences as there are single-motif subunits in the channels. Similarly, no quantification is made for finding similar patterns of residue conservation and polarity among the MPM motifs of the channels and symporters: e.g., the P segments are the best conserved, whereas the M1 segments are the least conserved. Finally, the statistical analysis does not take into account that several highly conserved residues known to be functionally important in the channel proteins (most notably the glycines of the P2 segment responsible for ion selection) are also highly conserved in each of the four MPM motifs of the symporters. The accompanying paper shows how these properties justify building 3D atomic-scale models for the three symporter families in which the four MPM motifs each have the same general fold as the single KcsA K+ channel subunit seen in the crystal structure.
An essential note of caution is that the data used for analysis in this paper are mostly from recently determined nucleic acid sequences. In only a few cases have experiments already been conducted to establish that the encoded proteins are actually expressed and that they have channel or symporter functions as predicted. This is particularly true for the putative 2TM bacterial K+ channels that contain C-terminal dinucleotide-binding domains. At present, there are no published data demonstrating that these specific genes encode functional channels rather than other types of transport proteins.
An important question is whether the transmembrane topology proposed here carries over to other families of transporters. Unfortunately, the simple method of constructing a hydropathy plot to predict the transmembrane topologies of these proteins is not very reliable and is not designed to identify P segments. To date, the transporter proteins that have been studied most extensively do not appear to have P segments. For example, the lactose permease protein has been experimentally determined to have 12 fully transmembrane segments (Lee and Manoil, 1996). Likewise, cryoelectron microscopy studies have indicated that the H+ (Auer et al) and Ca2+ (Zhang et al) P-type pumps each have 10 fully transmembrane segments. In addition, the Kef proteins appear to form a different class of bacterial K+ channels that lack the classic K+ channel P-segment “signature sequence” but do appear to have a dinucleotide-binding C-terminus (Booth et al). In fact, their transmembrane sequences appear to be more similar to those of the NapA Na+/H+ antiporters than to those of the channels (Reizer et al).
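For reference, the classic hydropathy plot averages the Kyte-Doolittle scale over a sliding window; a 19-residue window with a mean above about 1.6 is the commonly cited indicator of a membrane-spanning helix. The sketch below implements that window with the published Kyte-Doolittle values; the function names and test sequence are hypothetical. Because the window is tuned to full transmembrane helices, a short, partly polar P segment is not the kind of feature it is built to detect.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Map each full window's centre position (0-based) to its mean hydropathy."""
    values = [KD[aa] for aa in sequence.upper()]
    half = window // 2
    return {
        start + half: sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    }


def candidate_tm_centres(sequence, window=19, threshold=1.6):
    """Window centres whose mean hydropathy exceeds the threshold."""
    profile = hydropathy_profile(sequence, window)
    return [pos for pos, h in profile.items() if h > threshold]


if __name__ == "__main__":
    toy = "D" * 10 + "L" * 19 + "D" * 10  # hypothetical: one hydrophobic stretch
    print(candidate_tm_centres(toy))
```

For this toy sequence, the function reports window centres 14 through 24, i.e., the windows that contain at least 14 of the 19 leucines.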
We thank Clifford Slayman for many helpful comments and assistance. Some preliminary sequence data were obtained from the Institute for Genomic Research website at http://www.tigr.org and the NCBI website at http://www.ncbi.nlm.nih.gov/BLAST/unfinishedgenome.html.
The work in Osnabrück was supported by the Deutsche Forschungsgemeinschaft (SFB171) and the Fonds der Chemischen Industrie.
| Sources of sequences for Fig. 2, A and B |
| Organism | Accession no. | DBSOURCE | Locus, Accession |
|---|---|---|---|
| Prokaryote 2TM K+ channels | | | |
| Archaeoglobus fulgidus | 2648884 | GENBANK | AE000988, AE000988 |
| Archaeoglobus fulgidus | 2649899 | GENBANK | AE001055, AE001055 |
| Aquifex aeolicus | 2983007 | GENBANK | AE000683, AE000683 |
| Bacillus caldotenax | 39429 | EMBL | BCLCTB, X05066 |
| Bacillus stearothermophilus | 39974 | EMBL | BSLCTB, X05067 |
| Bacillus subtilis | 1934805 | EMBL | BSZ93936, Z93936 |
| Bacillus subtilis | 2522014 | DDBJ | AB007638, AB007638 |
| Helicobacter pylori | 2313603 | GENBANK | HPAE000564, AE000564 |
| Streptomyces coelicolor | 2808791 | EMBL | SC7H1, AL021411 |
| Streptomyces coelicolor | 3413418 | EMBL | SC10H5, AL031232 |
| Streptomyces lividans | 2127577 | PIR | S60172 |
| Methanobacterium thermoautotrophicum | 2621577 | GENBANK | AE000834, AE000834 |
| Methanobacterium thermoautotrophicum | 2622639 | GENBANK | AE000912, AE000912 |
| Methanococcus jannaschii | 2493595 | SW-PROT | Y13B_METJA, Q57604 |
| Methanococcus jannaschii | 2493596 | SW-PROT | YD57_METJA, Q58752 |
| Mycobacterium tuberculosis | 2827610 | EMBL | MTV014, AL021646 |
| Pyrococcus horikoshii | 3131937 | DDBJ | AB009522, AB009522 |
| Synechocystis sp. | 1652235 | DDBJ | D90904, D90904 |
| Synechocystis sp. | 1652933 | DDBJ | D90909, D90909 |
| Unfinished genomes | Source contig | | |
| Chlorobium tepidum | gnl\|TIGR\|C.tepidum_292 | | |
| Deinococcus radiodurans | gnl\|TIGR\|gdr_171 | | |
| Deinococcus radiodurans | gnl\|TIGR\|gdr_52 | | |
| Pseudomonas aeruginosa | gnl\|PAGP\|Contig502 | | |
| Pseudomonas aeruginosa | gnl\|PAGP\|Contig536 | | |
| Pyrococcus furiosus | gnl\|UCHGR\|MM64-00907 00907 | | |
| Thermotoga maritima | gnl\|TIGR\|BTMDS92R | | |
| Vibrio cholerae | gnl\|TIGR\|GVCCZ38F | | |
| Prokaryote TrkH | | | |
| Archaeoglobus fulgidus | 2649764 | GENBANK | AE001046, AE001046 |
| Escherichia coli | 136239 | SW-PROT | TRKG_ECOLI, P23849 |
| Escherichia coli | 1174773 | SW-PROT | TRKH_ECOLI, P21166 |
| Haemophilus influenzae | 3212204 | GENBANK | U32755, U32755 |
| Methanobacterium thermoautotrophicum | 2622378 | GENBANK | AE000893, AE000893 |
| Methanococcus jannaschii | 2129329 | PIR | D64485 gi\|2129329 |
| Pyrococcus horikoshii | 3132095 | DDBJ | AB009526, AB009526 |
| Thermoanaerobacter ethanolicus | 2581795 | GENBANK | AF001974, AF001974 |
| Vibrio alginolyticus | 3288671 | DDBJ | AE000743, AE000743 |
| Unfinished genomes | Source contig | | |
| Actinobacillus actinomycetemcomitans | gnl\|OUACGT\|A.actin_Contig574 | | |
| Neisseria gonorrhoeae | gnl\|OUACGT\|Contig202 | | |
| Neisseria meningitidis | gnl\|Sanger\|Contig395 | | |
| Porphyromonas gingivalis W83 | gnl\|TIGR\|P.gingivalis_107 | | |
| Pseudomonas aeruginosa | gnl\|PAGP\|Contig284 | | |
| Pyrococcus furiosus | gnl\|UCHGR\|MM1-MM1 02861 | | |