Copyright © 2006 The Biophysical Society. All rights reserved.
Biophysical Journal, Volume 90, Issue 10, 3712-3721, 15 May 2006
doi:10.1529/biophysj.105.077685
Supramolecular Assemblies
Davide Marenduzzo*, Cristian Micheletti† and Peter R. Cook‡
* Mathematics Institute, University of Warwick, Coventry, United Kingdom
† International School for Advanced Studies (SISSA) and Istituto Nazionale Fisica della Materia (INFM), Trieste, Italy
‡ Sir William Dunn School of Pathology, University of Oxford, Oxford, United Kingdom
Address reprint requests to P. R. Cook, Sir William Dunn School of Pathology, University of Oxford, South Parks Rd., Oxford, OX1 3RE, UK. Tel.: 44-0-1865-275528; Fax: 44-0-1865-275515.

Specific interactions between monomers (e.g., H-bonds) are known to mediate biomolecular assembly. Paradoxically, nonspecific entropic forces can also drive self-assembly. Thus, the environment within a living cell is crowded, with 20–30% of the volume occupied by macromolecules 1,2; then, aggregation of the largest particles can lower the free energy of the system through an increase in entropy of the many smaller particles 3. In Figure 1A, the centers of mass of the small spheres can access the yellow volume, but not the gray volumes surrounding each large sphere or abutting the perimeter wall. When one large sphere approaches another, these excluded volumes overlap (Figure 1A, overlap volume 1) and this allows the small spheres to access a greater volume. The resulting increase in entropy of the many small spheres generates what has been called a “depletion attraction” between the large ones. The attractive energy at contact is ∼3/2(D/d)n kBT, where D and d are the diameters of the large and small spheres, n is the volume fraction of the small spheres, kB is the Boltzmann constant, and T is the absolute temperature 3. The attraction falls to zero at a distance d between the two large spheres. A related attraction drives a large sphere to the surrounding wall (Figure 1A, overlap volume 2). Enough is known about these attractions that they are being used to model the formation of helices in proteins 4 and position particles within man-made nanostructures 5. The free-energy gains can be several kBT, which can be compared with the energy associated with a single van der Waals interaction (∼0.1kBT), a single H-bond (∼1.5kBT, or ∼1kcal/mol), and a covalent bond (10–100kBT).
Here we describe how this depletion attraction might drive genome organization. (For various models of genome structure, see Manuelidis 6, Cook 7, Sachs et al. 8, Marshall et al. 9, Munkel and Langowski 10, Belmont 11, Ostashevsky 12, and Kleckner et al. 13.) When DNA is replicated or transcribed, the resulting polymerizing complexes are large enough relative to the crowding agents that they will tend to aggregate. We consider a range of different complexes in bacteria and man, and in almost every case, the depletion attraction is sufficient to explain the observed organization—the clustering of active DNA and RNA polymerases into “factories” to form loops that may be several millions of basepairs in length 14,15,16,17.
We first review the original formulation of the entropic depletion attraction. Consider two hard (chemically noninteracting) spheres of diameter D dispersed in a solution of hard spheres of diameter d (usually the case d<D is considered). The center of mass of the small spheres is excluded from a shell surrounding the large spheres (Figure 1A). As one large sphere approaches the other, these excluded volumes overlap (Figure 1A, overlap volume 1) and the small spheres can access a greater volume; there is a net free-energy gain due to the increase in the entropy of the small spheres. The minimum of this potential is attained when the two large spheres are in contact, and is given by Asakura and Oosawa’s formula 3
$$\Delta F_{\mathrm{gain}} = n\,k_B T\left(1 + \frac{3D}{2d}\right) \qquad (1)$$
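As a hedged cross-check (a sketch, not taken from the original derivation), the contact value can be recovered from the geometry of Figure 1A. The sketch assumes that the small spheres behave as an ideal gas of number density n/(πd³/6), so that the free-energy gain is kBT times that density times the overlap volume of the two exclusion shells. The symbols R, r, and V_overlap are introduced here for illustration only.

```latex
\begin{align}
R &= \tfrac{1}{2}(D+d), \qquad r = D \quad \text{(center-to-center distance at contact)} \\
V_{\mathrm{overlap}} &= \frac{\pi}{12}\,(4R + r)\,(2R - r)^2 = \frac{\pi}{12}\,(3D + 2d)\,d^2 \\
\frac{\Delta F_{\mathrm{gain}}}{k_B T} &= \frac{n}{\pi d^3/6}\; V_{\mathrm{overlap}} = n\left(1 + \frac{3D}{2d}\right)
\end{align}
```

Under these assumptions the gain reduces to ∼(3/2)(D/d)n kBT for D ≫ d, and it gives ∼0.5kBT for D=d=5nm and n=0.2, in line with the value quoted for actin monomers (Fig. 2, row 1).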
We now generalize to different and arbitrary shapes. The scale of the free energy gain depends significantly on the shape of the large objects. For example, Eq. (1) can be generalized to two different large spheres 3 with diameters D1 and D2, where D1>D2:
![]() | (2) |
A special case is that of a wall, in which D1=∞; the overlap volume is larger than that with another sphere (Figure 1A, compare overlap volumes 1 and 2), so the resulting attraction is larger and given by
![]() | (3) |
Most biological interactions involve nonspherical objects like ligands that fit snugly into irregularly shaped receptors. In the most general case, theory 3 predicts that the free-energy gain for irregular objects is
![]() | (4) |
Most situations we discuss involve interactions between polymerases bound to DNA, and individual enzymes are modeled as hard spheres. However, we also discuss interactions between two clusters of polymerases where each cluster contains many enzymes (e.g., DNA polymerases in replication factories). In such cases, the biology suggests that individual enzymes intermingle when the two clusters come into contact; we call these clusters “soft,” and allow individual hard spheres in one cluster to intermingle on contact with their counterparts in the other. The result is one large cluster with the combined volume of the two original ones. This problem is complicated by the large number of possible arrangements of individual spheres within a cluster, and of one cluster relative to the other. Therefore, we restrict analysis to simple limiting cases. At the coarsest level, each cluster can be treated as one macrosphere with volume (or surface) corresponding to the total of all individual spheres. This approach is used for the “hard” gains in Fig. 2, rows 9–13. However, for Fig. 2, rows 3, 4, and 14 (hard gains), all polymerases are attached to DNA and a better model is obtained by considering the cluster of N polymerases as a linear (straight) succession of N closely packed beads; then, the free energy gained by putting two such clusters in longitudinal contact is N times the gain for two individual beads. This holds if the polymer is very stiff (i.e., its persistence length is larger than N times the diameter of a polymerase). If, on the other hand, the polymer is flexible, so that the cluster diameter is much larger than the persistence length (or if there are many individual spheres in one cluster), we allow individual spheres in one cluster to intermingle freely with their counterparts in the other (with the gain as in Eq. (5), below). This approach is used for the soft gains in Fig. 2, rows 3 and 9–14.
If the two clusters of large spheres (total diameter of each cluster=D) are soft and can fuse to give one larger sphere of size 2^(1/3)D (i.e., with conservation of volume), the entropic gain is proportional to the gain in volume excluded to the small macromolecules. This gain is given by
![]() | (5) |
If spheres in the two clusters are allowed to intermingle, the overlap volume is considerable and the entropic gain now depends on D²/d² (Eq. (5)); this compares with D/d for hard spheres (Eq. (1)).
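The different scaling of hard and soft gains can be illustrated numerically. The following is a minimal sketch, not the expression printed as Eq. (5): it assumes that the hard-sphere contact gain is n(1 + 3D/2d) in units of kBT, and that the soft gain equals the small-sphere number density times the reduction in excluded volume when two clusters of diameter D fuse into one sphere of diameter 2^(1/3)D. The function names `hard_gain` and `soft_gain` are hypothetical.

```python
def hard_gain(D, d, n):
    """Contact gain (in kBT) for two hard spheres, assumed to be n*(1 + 3D/(2d))."""
    return n * (1.0 + 1.5 * D / d)


def soft_gain(D, d, n):
    """Gain (in kBT) when two clusters of diameter D fuse at constant volume.

    Assumption: a sphere of diameter X excludes small-sphere centers from a
    volume (pi/6)*(X + d)**3, and the small spheres have number density
    n / ((pi/6)*d**3); the factors of pi/6 cancel.
    """
    excluded_before = 2.0 * (D + d) ** 3        # two separate clusters
    fused_diameter = 2.0 ** (1.0 / 3.0) * D     # volume-conserving fusion
    excluded_after = (fused_diameter + d) ** 3  # one fused cluster
    return n * (excluded_before - excluded_after) / d ** 3


if __name__ == "__main__":
    d, n = 5.0, 0.2  # crowder diameter (nm) and volume fraction used in the text
    for D in (10.0, 37.0, 75.0):
        print(f"D = {D:5.1f} nm: hard = {hard_gain(D, d, n):6.2f} kBT, "
              f"soft = {soft_gain(D, d, n):6.2f} kBT")
```

For D=37nm this sketch gives a hard gain of about 2.4kBT and a soft gain of about 17kBT, so the soft value grows roughly as D²/d² while the hard value grows as D/d.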
We now come to the central case of interest here (Figure 1B), which has not yet been analyzed: two large spheres threaded on a connecting (genomic) string. We assume the tethering string can be modeled as a polymer in a good solvent 19. Whether there is a net attraction between spheres depends on the balance between ΔFgain and ΔFloss, where ΔFgain is the entropic attraction between spheres (given by Eq. (1) or Eq. (5) for hard or soft spheres, respectively) and ΔFloss is the entropic penalty that must be paid to loop the string. This loss arises due to the tethering constraint, and is well approximated by 20,21:
$$\Delta F_{\mathrm{loss}} = c\,k_B T\,\log\left(\frac{l}{L_K}\right) + \Delta F_{r_0} \qquad (6)$$
The constant c has been the subject of debate between theoretical physicists (see Hanke and Metzler 20 and references therein) and depends on loop conformation; it typically increases with string density from 1.5 for an ideal random walk or freely jointed chain, through 2.2 for the “four-legged” loop as in Figure 1B 22, to higher values if the density is very high (below). Here, l is the loop length and LK is the (statistical) Kuhn length of the string. ΔFr0 is a constant that is independent of loop length; it is physically related to the dimensions of the overlap volume (and so to the diameter of the small spheres), and to the range r0 of (short) distances between the two beads that we consider sufficient to form a loop. ΔFr0 for self-avoiding walks is generally estimated by simulation and can be significant in the cases we consider. Note that we consider the looping costs of both a freely jointed chain (in bacteria) and a self-avoiding loop (in eukaryotes); costs for the latter have not been determined previously.
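To show how slowly the looping penalty grows with loop length, the loss can be tabulated directly. This is a minimal sketch assuming the form c kBT log(l/LK) + ΔFr0 with a natural logarithm (the base is an assumption); the function name `looping_loss` and the chosen loop lengths are illustrative, while LK=0.3kbp and the exponents 1.5 and 2.2 are the values quoted in the text.

```python
import math


def looping_loss(loop_kbp, kuhn_kbp, c, dF_r0=0.0):
    """Entropic looping cost in kBT: c*ln(l/L_K) + dF_r0 (natural log assumed)."""
    if loop_kbp <= 0 or kuhn_kbp <= 0:
        raise ValueError("lengths must be positive")
    return c * math.log(loop_kbp / kuhn_kbp) + dF_r0


if __name__ == "__main__":
    kuhn = 0.3  # kbp, Kuhn length used for the freely jointed E. coli chain
    for loop in (10.0, 100.0, 650.0, 1000.0):  # illustrative loop lengths (kbp)
        row = ", ".join(
            f"c={c}: {looping_loss(loop, kuhn, c):5.2f} kBT" for c in (1.5, 2.2)
        )
        print(f"l = {loop:6.0f} kbp -> {row}")
```

Because the dependence is logarithmic, a 100-fold increase in loop length adds only c·ln(100), or about 7kBT for c=1.5, to the length-dependent term; ΔFr0 is passed in separately because it must be obtained analytically or by simulation.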
The entropic attractions between two free or tethered spheres differ qualitatively in an important respect. The most probable state for two untethered spheres is to lie apart as they diffuse in three-dimensional space, and the fraction of spheres that do pair—fpairing—can be found using the van’t Hoff relation (neglecting three and higher body interactions):
![]() | (7) |
); this compares with the essentially complete pairing of two sets of 70 threaded polymerases (Results and Discussion).

The E. coli genome is modeled as a freely jointed chain, a succession of infinitely thin penetrable segments, each of length LK of 0.3kbp (calculated assuming a persistence length for B DNA of 50nm 25). The eukaryotic chromatin fiber is modeled 26 as a self-avoiding tube (persistence length 40nm or ∼3.6kbp, assuming a packing of 1kbp/11nm). Note that the volume fraction, n, is known in bacteria but not in eukaryotes, whereas local DNA structure is known in eukaryotes but not in prokaryotes. As zig-zagging models have supplanted those involving 30-nm solenoids 27, tube diameter is set at 20nm in eukaryotes to reflect a wider zig-zagging fiber that can interpenetrate to some extent.
We model pro- and eukaryotic genomes differently mainly because the thickness/persistence length ratios are so different. In bacteria, there is no evidence of proteins bound stably to DNA, and DNA diameter (∼2.5nm) is smaller than persistence length (∼50nm); therefore, it seems appropriate to neglect thickness and use the analytically tractable freely jointed chain. In eukaryotes, we know that DNA is folded first into nucleosomes and then into higher-order structures; as a result, diameter (20nm) is a significant fraction of persistence length (40nm) and it seems more appropriate to use the tube model (which includes self-avoidance, but is less tractable analytically). Self-avoidance is included by ensuring that all circles going through any triplet of points taken along the tube center-line have radii larger than half the tube thickness 26. Calculation of looping costs requires Monte Carlo simulations, as existing theory does not enable us to compute ΔFr0 analytically. To calculate the looping probability, we adapt the method used previously to determine the probability that a point on a loop attached to one sphere might bind to a specified binding zone on the surface of that sphere 26. Here, we have two beads attached to each end of a flexible tube. We fix the position of the center of one bead, divide the surrounding volume into concentric shells of increasing radii, and compute for each pair of contiguous shells the conditional probability that the other end of the tube is found in the inner of these two shells, given that it is constrained to lie within the outer of the two shells.
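The three-point self-avoidance criterion described above can be sketched as follows. This is an illustration only, not the simulation code used for the results: it takes the criterion literally (every circle through a triplet of center-line points must have a radius larger than half the tube thickness), checks all triplets by brute force, and uses the hypothetical function names `circumradius` and `is_self_avoiding`.

```python
import math
from itertools import combinations


def circumradius(p, q, r):
    """Radius of the circle through three 3D points (math.inf if collinear)."""
    a, b, c = math.dist(q, r), math.dist(p, r), math.dist(p, q)
    # Twice the triangle area is the norm of the cross product (q - p) x (r - p).
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    cross = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    twice_area = math.sqrt(sum(x * x for x in cross))
    if twice_area == 0.0:
        return math.inf
    return a * b * c / (2.0 * twice_area)  # R = abc / (4 * area)


def is_self_avoiding(points, thickness):
    """True if every triplet of center-line points has circumradius > thickness/2."""
    min_radius = 0.5 * thickness
    return all(
        circumradius(p, q, r) > min_radius for p, q, r in combinations(points, 3)
    )


if __name__ == "__main__":
    straight = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    hairpin = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 5.0, 0.0), (0.0, 5.0, 0.0)]
    print(is_self_avoiding(straight, thickness=20.0))  # straight tube is allowed
    print(is_self_avoiding(hairpin, thickness=20.0))   # tight hairpin is rejected
```

Checking all triplets costs O(N³) per configuration, so a production simulation would presumably restrict or cache these checks; the brute-force form is kept here because it makes the criterion explicit, since local triplets bound the bending radius and distant triplets prevent two parts of the tube from overlapping.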
In Fig. 2, values for ΔFloss in E. coli for equivalent structures tend to be higher than those for man. This arises for two reasons. First, bacterial DNA is less compact (above), so loops are longer (giving a higher entropic cost); if it proves to be more condensed, values for ΔFloss will be smaller. Second, the beads tend to have smaller diameters in bacteria, so values for ΔFr0—which depend on the range of distances between the two beads considered sufficient to form a loop—tend to be larger; they were 5.7kBT in Fig. 2, rows 3–5 and 7 (calculated assuming a depletion attraction in the range 10–15nm between sphere centers), 3.2kBT, 3.5kBT, and 3.5kBT (assuming a range of 43–48nm, 37–42nm, and 37–42nm) in Fig. 2, rows 6, 8, and 9, respectively. In Fig. 2, rows 11–17, we assumed interaction in the range between sphere centers of 30–35, 30–35, 75–80, 25–30, 25–30, 40–45, and 25–30nm, respectively, and calculated the entropic loss via Monte Carlo simulations 26.
For Fig. 2, rows 3 and 4, the distance between rrn operons is genome length (i.e., 4.6Mbp) divided by operon number (i.e., 7). In LB, there are ∼70 polymerases per operon 23, and ΔFgain is calculated assuming either that 70 closely packed impenetrable spheres lie in straight lines at each end of a 650-kbp thread (for hard), or that each one of the 70 hard spheres at one end can intermingle with any other sphere (for soft). These two extremes correspond to very stiff and very flexible threads, respectively, and the real situation is likely to lie in between. In contrast to other cases, here the gain given by the soft cluster (which is proportional to the number of polymerases exposed to the solvent on the surface) is smaller than that given by hard polymerases.

For Figure 4C, we consider the topology in Figure 4B, and calculate the probabilities that different operons cluster together into f foci (where f is between 1 and 22). To make the problem tractable, we assume the following:

1) An observable focus corresponds to one operon (or more), with each associated with 70 polymerases tagged with green fluorescent protein (GFP) (note that 70% of polymerases are engaged on rrn operons 23).
2) Neighboring operons cluster first, the next nearest neighbor is then added to the cluster, and so on.
3) We compute the separate probabilities of having fi foci for the four arms in the network (i.e., two arms containing rrnC,A,B,E,F,G,D and two with rrnC,A,B,E). Via the convolution of these quantities, we can find the probabilities of the whole system having f foci.
4) Operons are connected by a freely jointed chain (as ΔFr0 can be calculated exactly).

We also assume a not-further-specified interaction between active operons, calculate the probability of observing f foci (with f=0–6 28), and adjust the interaction to fit the data.
We have repeated the calculation assuming that two operons must be in the same site to be detected as a focus and found a slightly smaller value for the interaction (i.e., 13kBT instead of 16.5kBT). For Fig. 2, rows 5–8, average spacings between active polymerases are from M. Bon, S. McGowan, and P. R. Cook (unpublished). For Fig. 2, rows 5 and 7, a gain of 0.8kBT is nevertheless sufficient to increase the time spent together by 30%; the gain also doubles if transcripts are included as 10-nm hard spheres. If we model each polymerase, transcript, plus associated ribosomes as one 10-nm hard sphere (the polymerase) plus coplanar contacting hard spheres (diameter 21nm) representing ribosomes, the gain increases by 1.46kBT for each ribosome (estimated by considering the configuration where the two planar clusters are stacked in register so that equal-sized spheres are in contact).
For Fig. 2, row 9, we consider slowly growing cells with only two forks (as in Figure 4A); the entropic cost of looping is given by
![]() | (8) |
and
respectively, and then by taking the limit
of this quantity.

For row 10, each fork is associated with a cluster of 25 hard spheres and is attracted to the membrane. ΔFgain (hard) is calculated assuming that each fork is associated with one larger hard sphere that can accommodate the 25 tightly packed spheres (when the entropic gain is given by Eq. (3)). We compute ΔFgain (soft) by comparing the volume excluded to the crowding macromolecules by a spherical cap abutting the wall, where the cap has the same volume as the 25 spheres. The gain is given by the maximum over h in the range [0,D] of the function:
![]() | (9) |
On the other hand, confining one of the forks in the topology of Figure 4A to a distance x0 from the wall costs some entropy, which if the chain is a freely jointed chain reads (l ≪ L, and both L and l are much larger than 1, with erf denoting the error function):
![]() | (10) |
![]() | (11) |
nm (the center of the cell) to within the range of the entropic attraction to the surface. The calculations leading to Eq. (10) are cumbersome but straightforward and are omitted here.

For Fig. 2, rows 14–17, ΔFgain is found as for rows 3–8. For row 17, we model each polymerase, transcript, and spliceosome as three coplanar contacting hard spheres (a 15-nm polymerase, 20-nm transcript plus bound proteins, and 24-nm spliceosome). The free-energy gain is estimated by considering the configuration where the two planar clusters are stacked in register (so that equal-sized spheres are in contact). For rows 15–17, the entropy gain is less than the loss due to looping, and so is insufficient to ensure that the two transcription units are always together. However, the interaction is sufficient to drive a temporary association, which keeps the two together for a time, τpairing, which can be estimated using Kramers’ theory 25 applied to the potential resulting from the radial integration of the entropy depletion interaction 3, complemented with a Morse potential that prevents the two large spheres from interpenetrating by more than 0.1nm. The resulting expression is
![]() | (12) |
For large spheres with a diameter of 10–20nm, τ0 is typically ∼5μs. This estimate is based on the assumptions that the friction experienced obeys Stokes’ law and the viscosity of the cell interior (η) is ∼10 centipoise 29,30. Applied to the case in Fig. 2, row 17, Eq. (12) provides an estimate for τpairing of 0.3ms.

We now consider cooperative effects as three large spheres cluster (Figure 5C). It appears natural to assume that the activation free energy leading to the breaking of the cluster involves the loss of two contacts at a cost of ∼8kBT. The estimated lifetime for the cluster is therefore ∼0.1s. Since the viscosity of the cell interior grows rapidly with particle size >∼25nm 29,30, this estimate (based on a nominal value for η) provides a lower bound for pairing time. We conclude that pairing lasts for a nonnegligible fraction of the ∼5min it takes to transcribe a typical human gene 31.
For Figure 5, A and B, we model each mRNA-producing complex as three coplanar, contacting, hard spheres (a 15-nm polymerase, 20-nm transcript plus bound proteins, and 24-nm spliceosome), although each triplet is represented as one bead in the figure. The simulation began with a linear string; the (final) attraction between any two triplets is modeled as a two-body square well with a width of 5nm and a minimum equal to 4kBT (Fig. 2, row 17). In Figure 5B, the string was first compacted using an initial interaction of 8kBT.
The small crowding molecules in the cell have diameters (d) of ∼5nm, and a volume fraction (n) of ∼0.2; these commonly accepted values 1,32 will be used throughout, except for one extracellular case—the 100-mers (below). In Fig. 2, various cases are listed according to their complexity; each example is accompanied by an estimate (the “gain”) of how much the free energy is lowered upon contact of the large spheres.
To put our analysis in context, we first consider a simple example: the polymerization of two actin monomers. The major energy source driving actin polymerization comes from ATP hydrolysis; however, calculation shows that the depletion attraction makes a contribution even though it cannot provide directional assembly (which must be determined by other factors). Modeling monomers as noninteracting hard spheres (D=5nm) in the presence of many small spheres (d=5nm, n=0.2) gives an entropic gain (i.e., ΔFgain) of ∼0.5kBT (Fig. 2, row 1), compared to a measured free-energy change of 1–2kBT 33,34. We conclude that the depletion attraction adds to other specific ones between molecules, and we will argue that the same is true of the cases discussed below. We can then calculate (using Kramers’ theory) that monomers remain paired for three times longer in the presence of crowding molecules (see Methods).
We now consider two of the simplest genomes. Current theories for the evolution of life involve RNA molecules able to catalyze their own synthesis 35,36. But in this “RNA world” lacking cell membranes, how are the critical components prevented from diffusing apart to maintain the high local concentrations necessary for continued evolution? Possible solutions include binding to charged surfaces, and capture within a confined space (e.g., a hydrothermal vent, a puddle on a charged surface). However, the depletion attraction could contribute. Thus, modeling two 100-mers of RNA as 4-nm spheres in a crowded solution of smaller molecules (d=1nm, n=0.2) gives an attraction (gain) of ∼1.4kBT (Fig. 2, row 2). Here, too, pairing lasts roughly three times longer than in the absence of the depletion attraction.
We now turn to the central case of interest here, where the two large spheres are threaded on a string; the spheres represent active polymerases and the string hydrated DNA (in prokaryotes) or a chromatin fiber (in eukaryotes). It is well known that specific interactions between spheres can drive genome looping. Thus, if two DNA-binding proteins present at ∼1nM interact together with a Kd of 10⁻⁷M (values typical for nuclear proteins), <1% will be complexed together in the absence of DNA 37. But if they bind to the same DNA molecule at sites 10kbp apart, the resulting local concentration ensures that two-thirds will be in the complex to loop the connecting DNA 37. Our central thesis here is that the nonspecific depletion attraction can also make a significant contribution in the crowded cell (Figure 1B). Whether aggregation occurs depends on the balance between the depletion attraction (i.e., ΔFgain; Eq. (1) in Methods) and ΔFloss (the entropic penalty that must be paid to loop the connecting string). This loss is well approximated by ckBT log(l/LK)+ΔFr0 (Methods). The constant c depends on loop conformation; it typically increases with string density from 1.5 for an ideal random walk or freely jointed chain, through 2.2 for the four-legged loop as in Figure 1B 22, to higher values if the density is high. Here, l is loop length, and LK the Kuhn length (a measure of string stiffness). Notice that we include self-avoidance in the case of the thick eukaryotic string (i.e., no two segments of the fiber are allowed to occupy the same volume). ΔFr0 is a constant that is independent of loop length; it is physically related to the dimensions of the overlap volume and the range r0 of distances between the two beads considered sufficient to form a loop (in our case ≤5nm). In the cases modeled here, the spheres are polymerases that remain irreversibly bound to their templates while active.
Two free (untethered) spheres in a crowded cell will diffuse in three-dimensional space and spend little time together, and the extent of the small paired fraction can be determined using van’t Hoff’s relation (Eq. (7) in Methods). If the two spheres are tethered to each other, the inevitable high local concentration plus depletion attraction ensure that the paired fraction is greater. The (looping) probability of finding the two spheres close enough together for their excluded volumes to overlap is illustrated in Figure 3A, which gives results for a freely jointed chain. Similar results (not shown) are found for self-avoiding and worm-like chains, which differ by the presence of a nonzero stiffness parameter 25. Sharp transitions are seen between the unbound (unlooped) and bound (looped) states with chains of different lengths. The diameter of the large spheres (D) and length of connecting string are important determinants of whether or not a loop forms (Figure 3B); above the upper (orange) line, two spheres will eventually come together to form a loop. As before, the time the two spend together can be estimated using Kramers’ theory (Methods).
Individual polymerases bound to DNA are modeled as hard (impenetrable) spheres. However, we also discuss interactions between clusters of bound polymerases where each cluster contains many active enzymes (e.g., DNA polymerases in replication factories). Although modeled as two clusters of (polymerase-sized) spheres or as two larger spheres, individual enzymes probably intermingle when the two clusters come into contact. Therefore, we also model such clusters as “soft,” and allow individual hard spheres in one cluster to intermingle on contact with their counterparts in the other. The result is one large cluster with the combined volume of the two original ones. Intermingling ensures that the overlap volume is considerable, and the entropic gain now depends on D²/d² (Eq. (5) in Methods), compared to D/d for hard spheres (Eq. (1) in Methods). As a result, soft clusters are more likely to come together to form a loop, and smaller diameters are needed to ensure looping (Figure 3B, lower red line). These two cases (hard and soft) represent extremes; true values are likely to lie between the two, and (conservatively) we generally consider here the former.
The transition to the looped form occurs over a narrow free-energy range of ∼10kBT (Figure 3A), roughly equivalent to ∼7 H-bonds. It then might be advantageous for the cell to ensure that DNA-binding complexes are of a size that can exploit this transition (e.g., by creating or destroying only a few H-bonds). The depletion attraction puts an upper limit on the size of complexes that permit such tuning; if too large (i.e., with diameters of ∼100 and 40nm for hard and soft clusters, respectively), Figure 3B shows that there is a good chance they will always aggregate to give loops. As we shall see, Nature seems to set diameters so that the resulting depletion attraction lies in this tunable range.
This prompts the question: why do not all complexes in the cell—whether tethered or not—end up in one aggregate? (The fraction in the aggregate can be found using Eq. (7) and Fig. 3 for untethered and tethered components, respectively.) We suggest that they will do so if the concentration of components is high enough—for example, with hemoglobin S in the red cells of patients with sickle cell anemia 38, and with over-expressed proteins in bacteria (which sometimes form inclusion bodies). Where both the concentration and scale of the depletion effect are large enough to form aggregates, but where experimental observations yield no evidence of aggregation, it also seems likely that energy from other sources must be spent to prevent aggregation.
The genome of E. coli encodes 7 rrn operons separated on average by ∼650kbp (Figure 4A). In Luria broth (LB)—a rich medium supporting division every 30–45min—demand for rRNA is high; ∼70% of the RNA polymerase in the cell transcribes one or other of these operons, and each rrn operon is associated with ∼70 active enzymes 23. As an origin (ori) often fires and refires before genome segregation, a cell typically has a genome structure like that in Figure 4B, with ∼22 active rrn operons 23. Treating each polymerase as a hard sphere (D=10nm), and each operon as a linear string of 70 closely packed spheres, we find that the entropic attraction (i.e., ΔFgain) between two operons significantly exceeds the penalty that must be paid to loop the intervening DNA (i.e., ΔFloss; Fig. 2, row 3). (Including nascent transcripts (average length ∼2500 nucleotides, or half the length of the completed transcript) as spheres (D=10nm) attached to polymerases ensures that the attraction is even higher (not shown).) This suggests that entropy depletion inevitably drives two active operons together.
In a nutrient-poor medium like M9+glucose, cells divide every 90–170min and biosynthetic capacity switches away from ribosome biogenesis; the genome structure is like that in Figure 4A, and each rrn operon now associates with only about four polymerases 23. As a result, the loss due to looping outweighs the gain (Fig. 2, row 4), and rrn operons are unlikely to be together.
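The balance between gain and loss for two rrn operons can be reproduced approximately from the quantities given in Methods. This is a rough sketch, not a recalculation of Fig. 2: it assumes the per-bead hard-sphere gain n(1 + 3D/2d), the stiff-thread limit in which the gain for N aligned beads is N times the single-bead gain, a looping loss of c·ln(l/LK) + ΔFr0 with a natural logarithm, and the stated values D=10nm, d=5nm, n=0.2, l=650kbp, LK=0.3kbp, c=1.5, and ΔFr0=5.7kBT. All function names are hypothetical.

```python
import math


def bead_gain(D=10.0, d=5.0, n=0.2):
    """Assumed hard-sphere contact gain per pair of beads, in kBT."""
    return n * (1.0 + 1.5 * D / d)


def hard_cluster_gain(n_polymerases, D=10.0, d=5.0, n=0.2):
    """Stiff-thread limit: N aligned beads gain N times the single-bead value."""
    return n_polymerases * bead_gain(D, d, n)


def looping_loss(loop_kbp=650.0, kuhn_kbp=0.3, c=1.5, dF_r0=5.7):
    """Freely jointed chain looping cost in kBT (natural log assumed)."""
    return c * math.log(loop_kbp / kuhn_kbp) + dF_r0


if __name__ == "__main__":
    loss = looping_loss()
    for medium, n_pol in (("LB", 70), ("M9+glucose", 4)):
        gain = hard_cluster_gain(n_pol)
        verdict = "looping favored" if gain > loss else "looping disfavored"
        print(f"{medium:>10}: gain = {gain:5.1f} kBT, loss = {loss:5.1f} kBT -> {verdict}")
```

Under these assumptions the gain is about 56kBT with 70 polymerases per operon and about 3kBT with four, against a looping cost of roughly 17kBT, which is consistent with operons clustering in LB but not in M9+glucose.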
These results are consistent with experimental data 28. Tagging with GFP reveals that in LB the polymerases (and so the ∼22 operons to which ∼70% are bound) are clustered in one to six foci that disappear on transfer to M9+glucose. The distribution of foci in LB 28 can be fitted assuming that there is an attractive interaction of ∼16.5kBT between each pair of operons (Figure 4C); this compares with the value we calculate for the (maximum) attraction of 31–56kBT (Fig. 2, row 3).
Engaged RNA polymerases are scattered every ∼24 and ∼8.6kbp along the bacterial genome in LB and M9, respectively (M. Bon, S. McGowan, and P. R. Cook, unpublished). If we include only the polymerase, the gain is insufficient to overcome the cost and so unlikely to bring two lone and adjacent polymerases together (Fig. 2, rows 5 and 7). However, translation occurs cotranscriptionally, so ∼10 (in LB) or 6 ribosomes (in M9 23)—each with a diameter of ∼21nm—are typically attached to the nascent transcript (length ∼500 nucleotides, equivalent to half that of a typical mRNA); this increases the gain so it now roughly equals the cost (Fig. 2, rows 6 and 8), and adjacent polymerases are likely to be together much of the time. (Treating ribosomes as soft spheres and including cooperative effects (below) increases clustering even further.) Unfortunately, we currently lack experimental data to confirm this prediction.
GFP-tagging shows that active DNA polymerases in living bacteria are concentrated in discrete factories containing at least 25 polymerases often associated with the cell membrane 39,40. We model a cluster of 25 polymerases at a fork as one 37-nm hard sphere. Soon after initiation in a poor medium (when little intervening DNA has been replicated), the gain (2.4kBT) is greater than the loss due to looping (not shown), and we would expect the two forks to be together. But as replication generates more DNA between forks, the loss increases to a maximum of 15.5kBT (Fig. 2, row 9), when we would expect the two forks to have separated. It has been shown experimentally that the two forks do indeed separate when ∼30% of the genome has been replicated 40, and we calculate that a looping cost of 11kBT balances the gain at this stage. This lies between values predicted for hard and soft spheres (i.e., 2.4 and 16.9kBT), so the depletion attraction can alone account for the observed dynamics with reasonable accuracy. It can also force spheres to associate with the membrane for some time (Fig. 2, row 10). Therefore, it provides a good explanation of why the two forks separate when they do, and their location. However, we would also expect that later the forks would aggregate again as they converge toward the terminus (when looping costs decrease); this is not observed experimentally 40, presumably because the segregation machinery prevents it.
Replication begins at origins scattered every 50–100kbp along a human chromosome, and several pairs of the resulting replication forks are clustered in small replication factories (diameter ∼75nm); on passage through S phase, these factories grow into enormous structures (diameter ∼1000nm) containing thousands of forks 14. As in bacteria, the entropic gain is greater than the loss immediately after initiation, when little replicated DNA lies between forks (not shown), so forks will be together (Fig. 2, row 11). Again as before, the loss due to looping increases to a maximum as more DNA is replicated (Fig. 2, row 11); therefore, forks are likely to separate. Even so, the gain is still sufficient to allow dynamic interactions lasting seconds (Methods). Moreover, if the clusters at forks are soft, they should remain together as the gain exceeds the loss (Fig. 2, row 11). The same applies to two origins that have just fired (Fig. 2, row 12), and to two distant factories (Fig. 2, row 13). We conclude that the depletion attraction is sufficient to bring together forks, active origins, and even factories separated by 1Mbp—as is seen. Moreover, as more origins fire, we would expect them to aggregate with existing clusters—as they do.
Each of the 10 loci encoding rRNA in the diploid human genome contains ∼80 tandem repeats, each with an ∼13-kbp transcription unit and an ∼30-kbp “spacer”; ∼100 RNA polymerase I complexes transcribe each active unit in the array. Active rDNA genes—but not inactive ones—aggregate to form nucleoli 41. As the cluster of active polymerases is so large and the spacer so short, the entropic gain due to the depletion attraction far outweighs the loss due to looping, and adjacent transcription units will inevitably aggregate (Fig. 2, row 14). Once again, the attraction can account for the organization seen.
RNA polymerase II transcribes most human genes. In a HeLa cell, the active enzyme is concentrated in nucleoplasmic factories, each containing about eight active enzymes, each engaged on a different transcription unit 14,15. As RNA processing occurs cotranscriptionally 42, each mRNA-producing complex typically contains a polymerase (diameter ∼15nm), a nascent transcript (average length of ∼8400 nucleotides 43) with compacted diameter ∼14nm plus its bound proteins, and attached capping, splicing (one subcomplex has dimensions of 27×22×24nm 44) and polyadenylation machineries. Modeling such complexes as 25- or 40-nm hard spheres gives a ΔFgain slightly less than ΔFloss (Fig. 2, rows 15 and 16), so they will be paired between 1% and 5% of the time (Methods). Modeling the polymerase, transcript, and spliceosome as three hard spheres (a 15-nm polymerase, 20-nm transcript plus bound proteins, and 24-nm spliceosome) ensures that they are paired 12% of the time. Thus, this simple model (in which the size of the polymerizing complex is almost certainly underestimated) also explains why active genes tend to cluster.
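One simple way to connect a net free-energy difference to a pairing fraction is a two-state (paired or unpaired) Boltzmann estimate. The sketch below assumes such a model purely for illustration; the calculation in Methods may differ, and the function names are hypothetical.

```python
import math


def pairing_fraction(net_cost_kT):
    """Fraction of time spent paired in a two-state model.

    net_cost_kT is (dF_loss - dF_gain) / kBT: positive when looping
    costs more than the depletion attraction gains.
    """
    return 1.0 / (1.0 + math.exp(net_cost_kT))


def net_cost_from_fraction(p):
    """Inverse relation: net cost (in kBT) implied by pairing fraction p."""
    return math.log((1.0 - p) / p)


if __name__ == "__main__":
    # Pairing fractions quoted in the text, converted to the net cost
    # that this (assumed) two-state model would imply.
    for p in (0.01, 0.05, 0.12):
        print(p, round(net_cost_from_fraction(p), 2))
```

Under this assumption, pairing fractions of a few percent correspond to a net cost of a few kBT, and equal gain and loss would give pairing half of the time.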
We now consider 21 beads (each representing one mRNA-producing complex) threaded every 20kbp along 0.4Mbp of an active region of the human genome. Using Monte Carlo methods (Methods), we model an attraction of 4kBT between beads (Fig. 2, row 17); simulations yield two populations with energy minima depending on the approach used. Starting with a linear string, segments diffuse to give structures with ∼30% beads in clusters (Figure 5A). If the string is first compacted (a more likely representation of what happens in vivo), ∼80% are in clusters (Figure 5B). This compares with the ∼12% found above. We attribute most of the extra clustering to cooperative effects arising from the nonlinear increase in number of overlap volumes as more and more beads join a cluster (Figure 5C). Two factors may further increase clustering: the mRNA-producing complex is probably larger than we model, and—once such large structures come together—the high nucleoplasmic viscosity will slow diffusion apart (Methods). These results reinforce the idea that the depletion attraction contributes to the observed clustering and looping; moreover, similar cooperativity should be seen with all other strings discussed.
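The kind of simulation described above can be sketched with a minimal Metropolis Monte Carlo scheme. This is a toy under stated assumptions, not the algorithm in Methods: beads are tethered directly to their neighbours instead of being joined by an explicit chromatin fiber, the attraction is a square well, and the geometric parameters (CORE, WELL, BOND_MAX, the step size) are arbitrary illustrative values. Only the bead number (21) and the well depth (4kBT) come from the text, so the output should not be expected to reproduce the percentages in Figure 5.

```python
import math
import random

EPS_KT = 4.0    # attraction per contact, in kBT (value from the text)
N_BEADS = 21    # beads on the string (value from the text)
CORE = 1.0      # hard-core diameter, arbitrary units (assumption)
WELL = 1.5      # beads closer than this count as in contact (assumption)
BOND_MAX = 3.0  # tether length between neighbours on the string (assumption)


def valid(pos, i):
    """True if bead i overlaps no other bead and stays tethered to neighbours."""
    for j, q in enumerate(pos):
        if j == i:
            continue
        r = math.dist(pos[i], q)
        if r < CORE or (abs(i - j) == 1 and r > BOND_MAX):
            return False
    return True


def contacts(pos, i):
    """Number of beads within the attractive well of bead i."""
    return sum(1 for j, q in enumerate(pos)
               if j != i and math.dist(pos[i], q) < WELL)


def sweep(pos, rng, step=0.5):
    """One Metropolis sweep: N single-bead trial moves."""
    for _ in range(len(pos)):
        i = rng.randrange(len(pos))
        old = pos[i]
        before = contacts(pos, i)
        pos[i] = tuple(c + rng.uniform(-step, step) for c in old)
        if not valid(pos, i):
            pos[i] = old  # reject: hard-core overlap or broken tether
            continue
        delta_e = -EPS_KT * (contacts(pos, i) - before)
        if delta_e > 0 and rng.random() >= math.exp(-delta_e):
            pos[i] = old  # reject uphill move with probability 1 - exp(-dE)


def clustered_fraction(pos):
    """Fraction of beads with at least one contact."""
    return sum(1 for i in range(len(pos)) if contacts(pos, i) > 0) / len(pos)


def run(n_sweeps=200, seed=0):
    """Start from a straight string with no contacts and relax it."""
    rng = random.Random(seed)
    pos = [(2.0 * i, 0.0, 0.0) for i in range(N_BEADS)]
    for _ in range(n_sweeps):
        sweep(pos, rng)
    return pos


if __name__ == "__main__":
    print(clustered_fraction(run()))
```

Starting instead from a compacted configuration (for example, beads placed on a tight coil that respects CORE and BOND_MAX) would be the analogue of the second approach described above; comparing the two starting points is the natural use of such a sketch.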
We treat active polymerizing complexes as spheres threaded on a DNA/chromatin string, and find that entropic forces drive aggregation of the complexes to loop the intervening DNA. This counterintuitive result is obtained despite the looping costs, which are outweighed by the entropy gained by the many small molecules that are packed into the cell. We suggest that Nature exploits such nonspecific aggregation to organize genomes. We do not wish to suggest such attractive entropic forces are the sole ones driving self-assembly; rather, they will augment other specific interactions (e.g., involving H-bonds, electrostatic interactions) that also position monomers precisely.
Our results help explain several aspects of genome organization. First, we predict that active (but not inactive) genomes will inevitably be looped, and they are 14,15,17. For example, old evidence shows that loops are present in active cells (from bacteria to man) but not in inactive ones (e.g., chicken erythrocytes, human sperm); moreover, loops are lost progressively as active chicken erythroblasts mature into inactive erythrocytes 45. Recent evidence also shows that three mouse genes spaced ∼10kbp and ∼15Mbp apart on the genetic map are attached to one factory when transcribed (with consequential looping), but not when inactive 46. Moreover, inhibiting transcription in living pro- and eukaryotes disperses their DNA 47,48,49, presumably by releasing loops. Second, we can explain why bacterial replication forks initially lie together before separating 40, and why bacterial and eukaryotic replication complexes tend to be found at the cell membrane or in factories 14,39. Third, we can predict the fraction of bacterial rrn operons found together in transcription factories 28 with reasonable accuracy, and why—in eukaryotes—active RNA polymerases I and II cluster in nucleoli and nucleoplasmic factories (Cook, 1999). (It is likely that energy must be spent to prevent polymerase I factories from aggregating with polymerase II factories.) These results are consistent with a model for genome organization in which active RNA polymerases cluster to loop the intervening DNA 15.
Our approach can readily be extended to other aspects of genome and cellular organization. For example, the interactions discussed here occur independently of scale, so we can model local effects (e.g., the aggregation of hard nucleosomes into a soft cluster to form a chromatin fiber, with the depletion attraction augmenting electrostatic interactions 50) as well as global ones (e.g., the aggregation of heterochromatic clumps as chromosomes condense during mitosis). Moreover, we deliberately consider only one string here to simplify analysis; nevertheless, it is easy to imagine that the depletion attraction drives the formation of nucleoli and chromocenters (as active rDNA genes or centromeric heterochromatin on different chromosomes aggregate), as well as the pairing of meiotic chromosomes (as homologous transcription complexes aggregate 51). Finally, the depletion attraction probably contributes to the formation of many other large structures in cells (e.g., inclusion bodies, interchromatin granule clusters), and—where large structures like the cytoskeleton do exist—energy must be spent to prevent the attraction from driving them into one large aggregate.
We thank Michaël Bon and Ngo Toan for help.
We thank the Engineering and Physical Sciences Research Council for financial support.
1. (2001). Macromolecular crowding: obvious but underappreciated. Trends Biochem. Sci. 26, 597–604.
2. (2001). The influence of macromolecular crowding and macromolecular confinement on biochemical reactions in physiological media. J. Biol. Chem. 276, 10577–10580.
3. (1958). Interactions between particles suspended in solutions of macromolecules. J. Polym. Sci. [B] 33, 183–192.
4. (2005). Entropically driven helix formation. Science 307, 1067.
5. (2001). Entropically driven self-assembly and interaction in suspension. Philos. Trans. R. Soc. Lond. A 359, 921–937.
6. (1990). A view of interphase chromosomes. Science 250, 1533–1540.
7. (1995). A chromomeric model for nuclear and chromosome structure. J. Cell Sci. 108, 2927–2935.
8. (1995). A random-walk/giant-loop model for interphase chromosomes. Proc. Natl. Acad. Sci. USA 92, 2710–2714.
9. (1997). Deconstructing the nucleus: global architecture from local interactions. Curr. Opin. Genet. Dev. 7, 259–263.
10. (1998). Chromosome structure predicted by a polymer model. Phys. Rev. E 57, 5888–5896.
11. (2002). Mitotic chromosome scaffold structure: new approaches to an old controversy. Proc. Natl. Acad. Sci. USA 99, 15855–15857.
12. (2002). A polymer model for large-scale chromatin organization in lower eukaryotes. Mol. Biol. Cell 13, 2157–2169.
13. (2004). A mechanical basis for chromosome function. Proc. Natl. Acad. Sci. USA 101, 12592–12597.
14. (1999). The organization of replication and transcription. Science 284, 1790–1795.
15. (2002). Predicting three-dimensional genome structure from transcriptional activity. Nat. Genet. 32, 347–352.
16. (2004). Does looping and clustering in the nucleus regulate gene expression? Curr. Opin. Cell Biol. 16, 256–262.
17. (2005). Remote control of gene transcription. Hum. Mol. Genet. 14, R101–R111.
18. (1999). Depletion forces near curved surfaces. Phys. Rev. Lett. 83, 448–451.
19. (1997). Polymer models of meiotic and mitotic chromosomes. Mol. Biol. Cell 8, 2217–2231.
20. (2003). Entropy loss in long-distance DNA looping. Biophys. J. 85, 167–173.
21. (2002). Roles of stiffness and excluded volume in DNA denaturation. Phys. Rev. Lett. 88, 198101.
22. (2002). Equilibrium shapes of flat knots. Phys. Rev. Lett. 88, 188101.
23. (1996). Modulation of chemical composition and other parameters of the cell by growth rate. In Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology. Neidhardt, F.C., Curtiss, R., Lin, E.C.C., Brooks Low, K., Magasanik, B., Reznikoff, W.S., Riley, M., Schaechter, M., Umbarger, H.E., eds. 2nd Ed, (Washington, DC: ASM Press).
24. (1999). Physical Chemistry. (Oxford, UK: Oxford University Press).
25. (1999). Molecular Biophysics: Structures in Motion. (Oxford, UK: Oxford University Press).
26. (2006). Modeling a self-avoiding chromatin loop: relation to the packing problem, action-at-a-distance, and nuclear context. Structure 14, 197–204.
27. (2001). Higher-order structure of chromatin and chromosomes. Curr. Opin. Genet. Dev. 11, 130–135.
28. (2003). The distribution of RNA polymerase in Escherichia coli is dynamic and sensitive to environmental cues. Mol. Microbiol. 50, 1493–1505.
29. (1997). Translational diffusion of macromolecule-sized solutes in cytoplasm and nucleus. J. Cell Biol. 138, 131–142.
30. (2000). Cytoarchitecture and physical properties of cytoplasm: volume, viscosity, diffusion, intracellular surface area. Int. Rev. Cytol. 192, 189–221.
31. (2002). The transcription cycle of RNA polymerase II in living cells. J. Cell Biol. 159, 777–782.
32. (2004). Effects of inert volume-excluding macromolecules on protein fiber formation. II. Kinetic models for nucleated fiber growth. Biophys. Chem. 107, 299–316.
33. (2001). Thermodynamics and kinetics of actin filament nucleation. Biophys. J. 81, 667–674.
34. (2004). Force generation by cytoskeletal end-tracking proteins. Biophys. J. 87, 2838–2854.
35. (1998). The origin of life–a review of facts and speculations. Trends Biochem. Sci. 23, 491–495.
36. (2004). Is there a common chemical model for life in the universe? Curr. Opin. Chem. Biol. 8, 672–689.
37. (2001). Making contacts on a nucleic acid polymer. Trends Biochem. Sci. 26, 733–740.
38. (2005). Measuring forces between protein fibers by microscopy. Biophys. J. 88, 2433–2441.
39. (1998). Localization of bacterial DNA polymerase: evidence for a factory model of replication. Science 282, 1516–1519.
40. (2005). Chromosome and replisome dynamics in E. coli: loss of sister cohesion triggers global chromosome movement and mediates chromosome segregation. Cell 121, 899–911.
41. (2003). Life on a planet of its own: regulation of RNA polymerase I transcription in the nucleolus. Genes Dev. 17, 1691–1702.
42. (2002). An extensive network of coupling among gene expression machines. Nature 416, 499–506.
43. (1998). Numbers and organization of RNA polymerases, nascent transcripts and transcription units in HeLa nuclei. Mol. Biol. Cell 9, 1523–1536.
44. (2004). Three-dimensional structure of C complex spliceosomes by electron microscopy. Nat. Struct. Mol. Biol. 11, 265–269.
45. (1984). Replication and transcription depend on attachment of DNA to the nuclear cage. J. Cell Sci. Suppl. 1, 59–79.