Copyright © 2006 The Biophysical Society. All rights reserved.
Biophysical Journal, Volume 90, Issue 10, 3712-3721, 15 May 2006

doi:10.1529/biophysj.105.077685

Supramolecular Assemblies

Entropy-Driven Genome Organization

Davide Marenduzzo*, Cristian Micheletti, and Peter R. Cook

* Mathematics Institute, University of Warwick, Coventry, United Kingdom
International School for Advanced Studies (SISSA) and Istituto Nazionale Fisica della Materia (INFM), Trieste, Italy
Sir William Dunn School of Pathology, University of Oxford, Oxford, United Kingdom

Address reprint requests to P. R. Cook, Sir William Dunn School of Pathology, University of Oxford, South Parks Rd., Oxford, OX1 3RE, UK. Tel.: 44-0-1865-275528; Fax: 44-0-1865-275515.

Abstract

DNA and RNA polymerases active on bacterial and human genomes in the crowded environment of a cell are modeled as beads spaced along a string. Aggregation of the large polymerizing complexes increases the entropy of the system through an increase in entropy of the many small crowding molecules; this occurs despite the entropic costs of looping the intervening DNA. Results of a quantitative cost/benefit analysis are consistent with observations that active polymerases cluster into replication and transcription “factories” in both pro- and eukaryotes. We conclude that the second law of thermodynamics acts through nonspecific entropic forces between engaged polymerases to drive the self-organization of genomes into loops containing several thousands (and sometimes millions) of basepairs.

Introduction

Specific interactions between monomers (e.g., H-bonds) are known to mediate biomolecular assembly. Paradoxically, nonspecific entropic forces can also drive self-assembly. The environment within a living cell is crowded, with 20–30% of the volume occupied by macromolecules 1,2, and aggregation of the largest particles can lower the free energy of the system through an increase in entropy of the many smaller particles 3. In Figure 1A, the centers of mass of the small spheres can access the yellow volume, but not the gray volumes surrounding each large sphere or abutting the perimeter wall. When one large sphere approaches another, these excluded volumes overlap (Figure 1A, overlap volume 1), allowing the small spheres to access a greater volume. The resulting increase in entropy of the many small spheres generates what has been called a “depletion attraction” between the large ones. The attractive energy at contact is ∼(3/2)(D/d)n kBT, where D and d are the diameters of the large and small spheres, n is the volume fraction of the small spheres, kB is the Boltzmann constant, and T is the absolute temperature 3. The attraction falls to zero when the surfaces of the two large spheres are a distance d apart. A related attraction drives a large sphere to the surrounding wall (Figure 1A, overlap volume 2). These attractions are understood well enough that they are being used to model the formation of helices in proteins 4 and to position particles within man-made nanostructures 5. The free-energy gains can be several kBT, compared with ∼0.1kBT for a single van der Waals interaction, ∼1.5kBT (∼1kcal/mol) for a single H-bond, and 10–100kBT for a covalent bond.

Figure 1
The depletion attraction. (A) The schematic shows a suspension of large and small spheres in a box. The shaded regions around the four large spheres are excluded to the centers of mass of the small spheres. When one large sphere contacts another, their excluded volumes overlap (overlap volume 1) to increase the volume available to the small spheres (increasing their entropy); then, aggregation of the large spheres paradoxically increases the entropy of the system. An analogous effect is found when a large sphere contacts the wall (2). The attraction can also be viewed as an osmotic phenomenon; small spheres cannot enter excluded volumes, and a force equivalent to their osmotic pressure acts on each side of the two touching large spheres to force them together (or on one side of the large sphere at the wall to force it to the wall). (B) Spheres bound to each end of a string will also tend to aggregate or associate with the wall, to loop the connecting string (which has an associated entropic cost). The type of overlap involved is indicated.

Here we describe how this depletion attraction might drive genome organization. (For various models of genome structure, see Manuelidis 6, Cook 7, Sachs et al. 8, Marshall et al. 9, Munkel and Langowski 10, Belmont 11, Ostashevsky 12, and Kleckner et al. 13.) When DNA is replicated or transcribed, the resulting polymerizing complexes are large enough relative to the crowding agents that they will tend to aggregate. We consider a range of different complexes in bacteria and man, and in almost every case, the depletion attraction is sufficient to explain the observed organization—the clustering of active DNA and RNA polymerases into “factories” to form loops that may be several millions of basepairs in length 14,15,16,17.


Methods

The depletion attraction

We first review the original formulation of the entropic depletion attraction. Consider two hard (chemically noninteracting) spheres of diameter D dispersed in a solution of hard spheres of diameter d (usually the case d<D is considered). The center of mass of the small spheres is excluded from a shell surrounding the large spheres (Figure 1A). As one large sphere approaches the other, these excluded volumes overlap (Figure 1A, overlap volume 1) and the small spheres can access a greater volume; there is a net free-energy gain due to the increase in the entropy of the small spheres. The minimum of this potential is attained when the two large spheres are in contact, and is given by Asakura and Oosawa’s formula 3

$$\Delta F_{\mathrm{gain}} \approx \left(1 + \frac{3D}{2d}\right) n\, k_B T \qquad (1)$$
where n is the volume fraction of small spheres, kB the Boltzmann constant, and T the absolute temperature. Eq. (1) is an approximation and applies to values of n up to ∼30%; it then becomes less reliable until ΔF changes sign. If the two spheres are moved apart, the attraction declines progressively as the overlap volume falls. For values of D and d used here, the average attraction in the range of full to zero overlap is approximately half that given by Eq. (1).
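The sketch below is a quick numerical check. It evaluates the contact gain in the Asakura–Oosawa form (1 + 3D/2d) n kBT. The leading term of that form is the (3/2)(D/d) n kBT quoted in the Introduction, and the extra n term is the exact overlap-lens result for ideal small spheres; both the form and the function name are assumptions of this sketch. With n = 0.2 it reproduces the gains quoted later for actin monomers (0.5 kBT), 100-mer RNAs (1.4 kBT), and 10-nm polymerases (0.8 kBT).

```python
def contact_gain_kbt(D, d, n):
    """Depletion free-energy gain at contact, in units of kBT.

    Asakura-Oosawa form for two equal hard spheres of diameter D in an
    ideal suspension of small spheres (diameter d, volume fraction n).
    """
    return (1.0 + 1.5 * D / d) * n


# (label, D in nm, d in nm) for cases discussed in the text; n = 0.2 throughout
cases = [("actin monomers", 5.0, 5.0),
         ("RNA 100-mers", 4.0, 1.0),
         ("RNA polymerases", 10.0, 5.0)]
gains = {label: contact_gain_kbt(D, d, 0.2) for label, D, d in cases}
for label, gain in gains.items():
    print(f"{label}: {gain:.2f} kBT")
```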

We now generalize to different and arbitrary shapes. The scale of the free energy gain depends significantly on the shape of the large objects. For example, Eq. (1) can be generalized to two different large spheres 3 with diameters D1 and D2, where D1>D2:

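$$\Delta F_{\mathrm{gain}} \approx \left(1 + \frac{3 D_1 D_2}{d\,(D_1 + D_2)}\right) n\, k_B T \qquad (2)$$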

A special case is that of a wall, in which D1 = ∞; the overlap volume is larger than that with another sphere (Figure 1A, compare overlap volumes 1 and 2), so the resulting attraction is larger and given by

$$\Delta F_{\mathrm{gain}} \approx \left(1 + \frac{3 D_2}{d}\right) n\, k_B T \qquad (3)$$
ΔFgain is even larger with a convex wall like a bacterial cell membrane 18.
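The dependence on shape can be made concrete. The sketch below assumes the expressions that follow from the overlap-lens geometry for ideal small spheres: (1 + 3D1D2/[d(D1 + D2)]) n kBT for two unequal spheres and (1 + 3D/d) n kBT for a sphere at a flat wall. These forms are assumptions of the sketch, and the function names are hypothetical. Setting D1 = D2 recovers the equal-sphere case. Letting D1 grow recovers the wall, whose leading term is twice that for two equal spheres. The curved-membrane case is not treated.

```python
def gain_unequal_spheres_kbt(D1, D2, d, n):
    """Contact gain (kBT) for hard spheres of diameters D1 and D2."""
    return (1.0 + 3.0 * D1 * D2 / (d * (D1 + D2))) * n


def gain_sphere_wall_kbt(D, d, n):
    """Contact gain (kBT) for a hard sphere of diameter D against a flat wall."""
    return (1.0 + 3.0 * D / d) * n


n, d = 0.2, 5.0
equal = gain_unequal_spheres_kbt(10.0, 10.0, d, n)      # reduces to the equal-sphere case
near_wall = gain_unequal_spheres_kbt(1e9, 10.0, d, n)   # D1 -> infinity approaches the wall
wall = gain_sphere_wall_kbt(10.0, d, n)
print(equal, near_wall, wall)
```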

Most biological interactions involve nonspherical objects like ligands that fit snugly into irregularly shaped receptors. In the most general case, theory 3 predicts that the free-energy gain for irregular objects is

$$\Delta F_{\mathrm{gain}} \approx \frac{n\, V_{\mathrm{overlap}}}{\pi d^3/6}\, k_B T \qquad (4)$$
where Voverlap is the increase in volume available to the small objects. We approximate proteins and RNA here as spheres as they usually fold into globular structures.
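The general volume-based expression can be checked against the spherical case by computing the overlap volume explicitly. The sketch makes three assumptions:

- V_overlap is the lens where the two depletion shells (radius D/2 + d/2) intersect.
- The small spheres act as an ideal osmotic gas with number density 6n/(πd³).
- The function names are illustrative only.

For two 10-nm spheres in contact, with d = 5 nm and n = 0.2, it returns the 0.8 kBT gain quoted for polymerases in the Examples. It also shows that the overlap, and hence the attraction, vanishes once the surfaces are d apart.

```python
import math


def lens_volume(a1, a2, s):
    """Volume common to two spheres of radii a1, a2 whose centres are s apart."""
    if s >= a1 + a2:
        return 0.0                                         # shells do not touch
    if s <= abs(a1 - a2):
        return 4.0 / 3.0 * math.pi * min(a1, a2) ** 3      # one shell inside the other
    return (math.pi * (a1 + a2 - s) ** 2
            * (s * s + 2.0 * s * (a1 + a2) - 3.0 * (a1 - a2) ** 2) / (12.0 * s))


def gain_from_overlap_kbt(v_overlap, d, n):
    """Osmotic estimate: number density of small spheres times freed volume (kBT)."""
    number_density = n / (math.pi * d ** 3 / 6.0)
    return number_density * v_overlap


D, d, n = 10.0, 5.0, 0.2
shell = D / 2.0 + d / 2.0                  # radius excluded to small-sphere centres
v = lens_volume(shell, shell, D)           # large spheres in contact: centres D apart
gain = gain_from_overlap_kbt(v, d, n)
print(f"{gain:.3f} kBT")
```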


“Soft” beads

Most situations we discuss involve interactions between polymerases bound to DNA, and individual enzymes are modeled as hard spheres. However, we also discuss interactions between two clusters of polymerases where each cluster contains many enzymes (e.g., DNA polymerases in replication factories). In such cases, the biology suggests that individual enzymes intermingle when the two clusters come into contact; we call these clusters “soft,” and allow individual hard spheres in one cluster to intermingle on contact with their counterparts in the other. The result is one large cluster with the combined volume of the two original ones. This problem is complicated by the large number of possible arrangements of individual spheres within a cluster, and of one cluster relative to the other. Therefore, we restrict analysis to simple limiting cases. At the coarsest level, each cluster can be treated as one macrosphere with volume (or surface) corresponding to the total of all individual spheres. This approach is used for the “hard” gains in Fig. 2, rows 9–13. However, for Fig. 2, rows 3, 4, and 14 (hard gains), all polymerases are attached to DNA and a better model is obtained by considering the cluster of N polymerases as a linear (straight) succession of N closely packed beads; then, the free energy gained by putting two such clusters in longitudinal contact is N times the gain for two individual beads. This holds if the polymer is very stiff (i.e., its persistence length is larger than N times the diameter of a polymerase). If, on the other hand, the polymer is flexible, so that the cluster diameter is much larger than the persistence length (or if there are many individual spheres in one cluster), we allow individual spheres in one cluster to intermingle freely with their counterparts in the other (with the gain as in Eq. (5), below). This approach is used for the soft gains in Fig. 2, rows 3 and 9–14.

Figure 2
Energy gains (ΔFgain from the depletion attraction) and losses (ΔFloss due to looping). ΔFgain is the maximum obtained for hard spheres or soft clusters at closest contact; n=0.2, and d=5nm (except in row 2, where d=1nm). Cartoons illustrate the structures analyzed: blue, DNA; red, RNA; green, DNA polymerases; pink, RNA polymerases (pols). See Methods plus Results and Discussion for details.

If the two clusters of large spheres (total diameter of each cluster = D) are soft and can fuse to give one larger sphere of diameter 2^{1/3}D (i.e., with conservation of volume), the entropic gain is proportional to the reduction in the volume excluded to the small macromolecules. This gain is given by

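$$\Delta F_{\mathrm{gain}} \approx \left[2\left(1 + \frac{D}{d}\right)^{3} - \left(1 + \frac{2^{1/3} D}{d}\right)^{3}\right] n\, k_B T \qquad (5)$$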

If spheres in the two clusters are allowed to intermingle, the overlap volume is considerable and the entropic gain now depends on D²/d² (Eq. (5)); this compares with D/d for hard spheres (Eq. (1)).
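To show how much intermingling changes the scale of the gain, the sketch below compares the hard-sphere contact gain with a soft-fusion gain. The soft gain is taken as the osmotic pressure of the small spheres times the drop in excluded volume when two clusters of diameter D merge into one of diameter 2^{1/3}D, i.e. n[2(1 + D/d)³ − (1 + 2^{1/3}D/d)³] kBT. This closed form is an assumption of the sketch, based on the verbal definition above, and the function names are hypothetical.

```python
def hard_gain_kbt(D, d, n):
    """Two hard spheres of diameter D at contact (Asakura-Oosawa form)."""
    return (1.0 + 1.5 * D / d) * n


def soft_gain_kbt(D, d, n):
    """Two soft clusters of diameter D fusing into one of diameter 2**(1/3) * D.

    Gain = number density of small spheres x reduction in excluded volume,
    where each excluded volume is a sphere of diameter (cluster + d).
    """
    x = D / d
    return (2.0 * (1.0 + x) ** 3 - (1.0 + 2.0 ** (1.0 / 3.0) * x) ** 3) * n


d, n = 5.0, 0.2
for D in (10.0, 20.0, 40.0):
    print(f"D = {D:4.0f} nm: hard {hard_gain_kbt(D, d, n):5.2f} kBT, "
          f"soft {soft_gain_kbt(D, d, n):5.2f} kBT")
```

The soft gain grows roughly as (D/d)², whereas the hard gain grows only linearly in D/d.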


Two beads on a string

We now come to the central case of interest here (Figure 1B), which has not yet been analyzed: two large spheres threaded on a connecting (genomic) string. We assume the tethering string can be modeled as a polymer in a good solvent 19. Whether there is a net attraction between spheres depends on the balance between ΔFgain and ΔFloss, where ΔFgain is the entropic attraction between spheres (given by Eq. (1) or (5) for hard or soft spheres, respectively) and ΔFloss is the entropic penalty that must be paid to loop the string. This loss arises due to the tethering constraint, and is well approximated by 20,21:

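$$\Delta F_{\mathrm{loss}} \approx c\, k_B T \ln\left(\frac{l}{L_K}\right) + \Delta F_{r_0} \qquad (6)$$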

The constant c has been the subject of debate between theoretical physicists (see Hanke and Metzler 20 and references therein) and depends on loop conformation; it typically increases with string density from 1.5 for an ideal random walk or freely jointed chain, through 2.2 for the “four-legged” loop as in Figure 1B 22, to higher values if the density is very high (below). l is loop length, and LK is the (statistical) Kuhn length of the string. ΔFr0 is a constant that is independent of loop length; it is physically related to the dimensions of the overlap volume (and so to the diameter of the small spheres), and to the range r0 of (short) distances between the two beads that we consider sufficient to form a loop. ΔFr0 for self-avoiding walks is generally estimated by simulation and can be significant in the cases we consider. Note that we consider the looping costs of both a freely jointed chain (in bacteria) and a self-avoiding loop (in eukaryotes); costs for the latter have not been determined previously.
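The balance between gain and loss can be illustrated for neighbouring rrn operons in E. coli. The sketch evaluates the looping cost c kBT ln(l/LK) + ΔFr0 with inputs taken from the Examples:

- loop length l = 4.6 Mbp / 7 operons;
- Kuhn length LK = 0.3 kbp;
- ΔFr0 = 5.7 kBT, the value quoted for Fig. 2, row 3.

It compares that cost with a "hard" gain of 70 × 0.8 kBT, i.e. 70 stiffly packed 10-nm polymerases whose gain is N times the pair gain. The resulting losses are illustrative, not the values tabulated in Fig. 2, and the helper name is hypothetical.

```python
import math


def looping_cost_kbt(loop_length, kuhn_length, c, delta_f_r0):
    """Entropic cost of closing a loop: c ln(l / L_K) + dF_r0 (units of kBT)."""
    return c * math.log(loop_length / kuhn_length) + delta_f_r0


loop_kbp = 4600.0 / 7.0        # genome length / number of operons, ~657 kbp
kuhn_kbp = 0.3                 # Kuhn length of bacterial DNA
gain_hard = 70 * 0.8           # 70 stiffly packed polymerases, 0.8 kBT per pair

losses = {c: looping_cost_kbt(loop_kbp, kuhn_kbp, c, delta_f_r0=5.7) for c in (1.5, 2.2)}
for c, loss in losses.items():
    print(f"c = {c}: loss {loss:.1f} kBT vs hard gain {gain_hard:.0f} kBT")
```

The 56 kBT hard gain matches the upper end of the 31–56 kBT range quoted for Fig. 2, row 3. The loss stays well below it for either choice of c.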

The entropic attractions between two free or tethered spheres differ qualitatively in an important respect. The most probable state for two untethered spheres is to lie apart as they diffuse in three-dimensional space, and the fraction of spheres that do pair—fpairing—can be found using the van’t Hoff relation (neglecting three and higher body interactions):

(7)
where Keq is the equilibrium constant of the reaction and Cb is the concentration of unbound large spheres. In contrast, two spheres threaded on a string are often together if the interaction is large enough. Treating the thread as a freely jointed chain, we can calculate semianalytically the (“looping”) probability of finding the two within the overlap volume (Figure 3A). This probability is found by weighting the probability of the two spheres being at a distance r by the Boltzmann factor of the depletion attraction 3. Even if the attractive interaction cannot bring the two spheres permanently together, it can still ensure that the two pair for at least some finite time (τpairing, see below). This qualitative distinction can lead to large quantitative differences. For example, of the ∼8000 molecules of RNA polymerase (diameter ∼10nm) in an Escherichia coli cell (volume ∼0.8μm³ 23), we calculate 24 that only 2% are paired; this compares with the essentially complete pairing of two sets of 70 threaded polymerases (Results and Discussion).
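The looping probability can be sketched in a two-state approximation. The sketch assumes a Gaussian end-to-end distribution for the freely jointed chain, which is accurate only when the chain spans many Kuhn lengths; the article's semianalytic calculation is not reproduced here. Centre separations between D and D + r0 form the contact shell, which is weighted by exp(ε), where ε is the attraction in kBT. All larger separations are treated as unlooped. The parameters follow Figure 3A (D = 10 nm, surfaces within r0 = 5 nm), with a Kuhn length of 100 nm. The function names are hypothetical.

```python
import math


def radial_cdf(r, n_kuhn, b):
    """P(|R| <= r) for the end-to-end vector of a Gaussian chain of n_kuhn segments of length b."""
    sigma = b * math.sqrt(n_kuhn / 3.0)      # per-axis standard deviation
    x = r / sigma
    return math.erf(x / math.sqrt(2.0)) - math.sqrt(2.0 / math.pi) * x * math.exp(-0.5 * x * x)


def looping_probability(eps_kbt, n_kuhn, b=100.0, D=10.0, r0=5.0):
    """Two-state estimate of the fraction of time the tethered spheres are in contact.

    Centre separations below D are forbidden (hard cores); separations in
    [D, D + r0] gain an attraction eps_kbt; larger separations feel nothing.
    """
    w_in = radial_cdf(D + r0, n_kuhn, b) - radial_cdf(D, n_kuhn, b)
    w_out = 1.0 - radial_cdf(D + r0, n_kuhn, b)
    boltzmann = math.exp(eps_kbt)
    return w_in * boltzmann / (w_in * boltzmann + w_out)


for n_kuhn in (10, 100, 1000):
    probs = [looping_probability(eps, n_kuhn) for eps in range(0, 31, 5)]
    print(n_kuhn, " ".join(f"{p:.2f}" for p in probs))
```

Longer chains shift the switch from unlooped to looped to larger ε, because the contact shell occupies a smaller fraction of the chain's configurations.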

Figure 3
Dependence of looping on attractive energy, chain length, and sphere diameter. Cartoons illustrate forms existing under different conditions. See Methods for details. (A) Probability of forming loops at different attractive energies (in kBT). Structures modeled are two large beads (D=10nm) connected by freely-jointed chains of different lengths; a loop is considered to exist if sphere surfaces lie within 5nm. Sharp transitions between unbound (unlooped) and bound (looped) states occur within ∼5kBT. (B) Effects of minimum diameter (D) of large spheres and length of freely-jointed chain on looping; lines mark transitions between unlooped and looped forms for hard and soft spheres.

Examples

The E. coli genome is modeled as a freely jointed chain—a succession of infinitely thin penetrable segments, each of length LK of 0.3kbp (calculated assuming a persistence length for B DNA of 50nm 25). The eukaryotic chromatin fiber is modeled 26 as a self-avoiding tube (persistence length 40nm or ∼3.6kbp, assuming a packing of 1kbp/11nm). Note that the volume fraction, n, is known in bacteria but not in eukaryotes, whereas local DNA structure is known in eukaryotes but not in prokaryotes. As zig-zagging models have supplanted those involving 30-nm solenoids 27, tube diameter is set at 20nm in eukaryotes to reflect a wider zig-zagging fiber that can interpenetrate to some extent.
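The conversions used above between contour length in nanometres and genomic length in kilobase pairs are simple arithmetic. The sketch assumes the standard rise of 0.34 nm per base pair for B DNA. It also assumes the worm-like-chain relation in which the Kuhn length is twice the persistence length.

```python
BP_RISE_NM = 0.34   # rise per base pair of B-DNA (nm)

# Bacterial DNA: Kuhn length is twice the persistence length (50 nm)
kuhn_nm = 2 * 50.0
kuhn_kbp = kuhn_nm / BP_RISE_NM / 1000.0          # ~0.29 kbp, quoted as 0.3 kbp

# Eukaryotic chromatin fiber: 40 nm persistence length at 1 kbp per 11 nm
persistence_chromatin_kbp = 40.0 / 11.0            # ~3.6 kbp

print(f"E. coli Kuhn length: {kuhn_kbp:.2f} kbp")
print(f"chromatin persistence length: {persistence_chromatin_kbp:.1f} kbp")
```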

We model pro- and eukaryotic genomes differently mainly because the thickness/persistence length ratios are so different. In bacteria, there is no evidence of proteins bound stably to DNA, and DNA diameter (∼2.5nm) is smaller than persistence length (∼50nm); therefore, it seems appropriate to neglect thickness and use the analytically tractable freely jointed chain. In eukaryotes, we know that DNA is folded first into nucleosomes and then into higher-order structures; as a result, diameter (20nm) is a significant fraction of persistence length (40nm) and it seems more appropriate to use the tube model (which includes self-avoidance, but is less tractable analytically). Self-avoidance is included by ensuring that all circles going through any triplet of points taken along the tube center-line have radii larger than half the tube thickness 26. Calculation of looping costs requires Monte Carlo simulations, as existing theory does not enable us to compute ΔFr0 analytically. To calculate the looping probability, we adapt the method used previously to determine the probability that a point on a loop attached to one sphere might bind to a specified binding zone on the surface of that sphere 26. Here, we have two beads attached to each end of a flexible tube. We fix the position of the center of one bead, divide the surrounding volume into concentric shells of increasing radii, and compute for each pair of contiguous shells the conditional probability that the other end of the tube is found in the inner of these two shells, given that it is constrained to lie within the outer of the two shells.

In Fig. 2, values for ΔFloss in E. coli for equivalent structures tend to be higher than those for man. This arises for two reasons. First, bacterial DNA is less compact (above), so loops are longer (giving a higher entropic cost); if it proves to be more condensed, values for ΔFloss will be smaller. Second, the beads tend to have smaller diameters in bacteria, so values for ΔFr0—which depend on the range of distances between the two beads considered sufficient to form a loop—tend to be larger; they were 5.7kBT in Fig. 2, rows 3–5 and 7 (calculated assuming a depletion attraction in the range 10–15nm between sphere centers), 3.2kBT, 3.5kBT, and 3.5kBT (assuming a range of 43–48nm, 37–42nm, and 37–42nm) in Fig. 2, rows 6, 8, and 9, respectively. In Fig. 2, rows 11–17, we assumed interaction in the range between sphere centers of 30–35, 30–35, 75–80, 25–30, 25–30, 40–45, and 25–30nm, respectively, and calculated the entropic loss via Monte Carlo simulations 26.

For Fig. 2, rows 3 and 4, the distance between rrn operons is genome length (i.e., 4.6Mbp) divided by operon number (i.e., 7). In LB, there are ∼70 polymerases per operon 23, and ΔFgain is calculated assuming either that 70 closely packed impenetrable spheres lie in straight lines at each end of a 650-kbp thread (for hard), or that each one of the 70 hard spheres at one end can intermingle with any other sphere (for soft). These two extremes correspond to very stiff and very flexible threads, respectively, and the real situation is likely to lie in between. In contrast to other cases, here the gain given by the soft cluster (which is proportional to the number of polymerases exposed to the solvent on the surface) is smaller than that given by hard polymerases. For Figure 4C, we consider the topology in Figure 4B, and calculate the probabilities that different operons cluster together into f foci (where f is between 1 and 22). To make the problem tractable, we assume the following. 1) An observable focus corresponds to one operon (or more), with each associated with 70 polymerases tagged with green fluorescent protein (GFP) (note that 70% of polymerases are engaged on rrn operons 23). 2) Neighboring operons cluster first, the next nearest neighbor is then added to the cluster, and so on. 3) We compute the separate probabilities of having fi foci for the four arms in the network (i.e., two arms containing rrnC,A,B,E,F,G,D and two with rrnC,A,B,E). Via the convolution of these quantities, we can find the probabilities of the whole system having f foci. 4) Operons are connected by a freely jointed chain (as ΔFr0 can be calculated exactly). We also assume a not-further-specified interaction between active operons, calculate the probability of observing f foci (with f=0–6 28), and adjust the interaction to fit the data.
We have repeated the calculation assuming that two operons must be in the same site to be detected as a focus and found a slightly smaller value for the interaction (i.e., 13kBT instead of 16.5kBT). For Fig. 2, rows 5–8, average spacings between active polymerases are from M. Bon, S. McGowan, and P. R. Cook (unpublished). For Fig. 2, rows 5 and 7, a gain of 0.8kBT is nevertheless sufficient to increase the time spent together by 30%; the gain also doubles if transcripts are included as 10-nm hard spheres. If we model each polymerase, transcript, plus associated ribosomes as one 10-nm hard sphere (the polymerase) plus coplanar contacting hard spheres (diameter 21nm) representing ribosomes, the gain increases by 1.46kBT for each ribosome (estimated by considering the configuration where the two planar clusters are stacked in register so that equal-sized spheres are in contact).
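Step 3 of the focus calculation combines per-arm distributions by convolution, because the foci on the four arms are counted independently. The sketch below shows only that step. The per-arm probabilities are invented placeholders (entry k is the probability of k foci on that arm), not values from the article, and the helper name is hypothetical.

```python
from functools import reduce


def convolve(p, q):
    """Distribution of the sum of two independent counts given as probability lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out


# Placeholder per-arm distributions: entry k = probability of k foci on that arm
long_arm = [0.0, 0.2, 0.5, 0.3]    # arms carrying rrnC,A,B,E,F,G,D
short_arm = [0.0, 0.6, 0.4]        # arms carrying rrnC,A,B,E

whole_cell = reduce(convolve, [long_arm, long_arm, short_arm, short_arm])
for f, prob in enumerate(whole_cell):
    if prob > 0:
        print(f"{f} foci: {prob:.4f}")
```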

Figure 4
The rrn operons of E. coli. See Methods for details. (A and B) Typical genome topologies. Positions of the origin (ori), terminus (ter), and seven operons encoding ribosomal RNA (rrnA–G (shaded letters)) are shown 23. In M9+glucose, replication began at the origin, and the two forks progressed only a little away around the genome. In LB, an origin fired, the two replication forks progressed most of the way to the terminus, and duplicated origins refired. (C) Probabilities that cells contain different numbers of foci marking rrn operons. Experimental data (gray line40) can be fitted (red line33) assuming that all cells contain structures like that illustrated in B with 22 rrn operons, and an attraction of 16.5kBT between rrn operons that brings one or more together so they appear as one focus; this compares with a maximum attraction of 31–56kBT calculated in Fig. 2, row 3.

For Fig. 2, row 9, we consider slowly growing cells with only two forks (as in Figure 4A); the entropic cost of looping is given by

(8)
where l and L (expressed in Kuhn lengths) denote distance between forks and total genome length, respectively. The cost in Eq. (8) is that to make one loop, as joining the two forks only creates one extra loop. Eq. (8) is valid for 1≪l≪L, and has been derived by first writing down the probability that three distinct freely jointed chains (of length l, l, and L−2l) have the same initial point and the same final point, and then by taking the limit of this quantity.

For row 10, each fork is associated with a cluster of 25 hard spheres and is attracted to the membrane. ΔFgain (hard) is calculated assuming that each fork is associated with one larger hard sphere that can accommodate the 25 tightly packed spheres (when the entropic gain is given by Eq. (3)). We compute ΔFgain soft by comparing the volume excluded to the crowding macromolecules by a sphere cap abutting the wall, where the cap has the same volume as the 25 spheres. The gain is given by the maximum over h in the range [0,D] of the function:

(9)

On the other hand, confining one of the forks in the topology of Figure 4A to a distance x0 from the wall costs some entropy, which for a freely jointed chain reads (l≪L, and both L and l are much larger than 1, with erf denoting the error function):

(10)
(11)
where x0, l, and L are all measured in Kuhn lengths. To arrive at Eq. (10), we calculated the probability of having a network of freely jointed chains with the topology in Figure 4A, integrating over the intermediate points and requiring that this freely jointed network is rooted at a point. (The axes are such that the bacterial surface lies at z=0.) The exact entropic cost in Fig. 2, row 10, is computed assuming that the network is displaced from nm (the center of the cell) to within the range of the entropic attraction to the surface. The calculations leading to Eq. (10) are cumbersome but straightforward and are omitted here.

For Fig. 2, rows 14–17, ΔFgain is found as for rows 3–8. For row 17, we model each polymerase, transcript, and spliceosome as three coplanar contacting hard spheres (a 15-nm polymerase, 20-nm transcript plus bound proteins, and 24-nm spliceosome). The free-energy gain is estimated by considering the configuration where the two planar clusters are stacked in register (so that equal-sized spheres are in contact). For rows 15–17, the entropy gain is less than the loss due to looping, and so is insufficient to ensure that the two transcription units are always together. However, the interaction is sufficient to drive a temporary association, which keeps the two together for a time, τpairing, that can be estimated using Kramers' theory 25 applied to the potential resulting from the radial integration of the entropy depletion interaction 3, complemented with a Morse potential that prevents the two large spheres from interpenetrating by more than 0.1nm. The resulting expression is

(12)

For large spheres with a diameter of 10–20nm, τ0 is typically ∼5μs. This estimate assumes that the friction experienced obeys Stokes' law and that the viscosity of the cell interior (η) is ∼10 centipoise 29,30. Applied to the case in Fig. 2, row 17, Eq. (12) provides an estimate for τpairing of 0.3ms. We now consider cooperative effects as three large spheres cluster (Figure 5C). It seems natural to assume that the activation free energy for breaking up the cluster involves the loss of two contacts at a cost of ∼8kBT. The estimated lifetime for the cluster is therefore ∼0.1s. Since the viscosity of the cell interior grows rapidly with particle size >∼25nm 29,30, this estimate (based on a nominal value for η) provides a lower bound for pairing time. We conclude that pairing lasts for a nonnegligible fraction of the ∼5min it takes to transcribe a typical human gene 31.
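The order of magnitude of τ0 can be checked from Stokes' law. The sketch assumes that τ0 is roughly the time for a 15-nm sphere to diffuse across the 5-nm range of the attraction, in a medium of 10 cP at 37 °C; this interpretation is an assumption of the sketch. It then applies a bare Arrhenius factor exp(ΔF/kBT) to a 4 kBT well. That factor omits the prefactor and Morse core behind Eq. (12), so it is a rough check, not a substitute. It gives about 4 μs and 0.27 ms, in line with the ∼5 μs and 0.3 ms quoted above.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant (J/K)
T = 310.0                   # assumed temperature, 37 degrees C (K)
ETA = 10e-3                 # viscosity of 10 centipoise (Pa s)
RADIUS = 7.5e-9             # 15-nm sphere (m)
RANGE = 5e-9                # range of the depletion attraction (m)

friction = 6.0 * math.pi * ETA * RADIUS           # Stokes drag coefficient
diffusion = K_B * T / friction                    # Stokes-Einstein diffusion constant
tau0 = RANGE ** 2 / (2.0 * diffusion)             # 1D diffusion time across the range
print(f"tau0 ~ {tau0 * 1e6:.1f} us")

# Bare Arrhenius factor with the text's tau0 = 5 us and a 4 kBT well
tau_pairing = 5e-6 * math.exp(4.0)
print(f"tau_pairing ~ {tau_pairing * 1e3:.2f} ms")
```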

Figure 5
Cooperative effects. See Methods for details. (A and B) Monte Carlo simulations of 21 beads (green, terminal beads; red, internal ones) threaded every 20kbp along a (self-avoiding) 0.4-Mbp chromatin fiber (blue). Each bead represents three spheres (15-nm RNA polymerase II, 20-nm transcript, 24-nm spliceosome). Starting with a linear string, fiber segments are allowed to diffuse while being subjected to an attraction between any two beads of 4kBT (Fig. 2, row 17). In panel B, the string was first compacted using an initial interaction of 8kBT. After reaching equilibrium, typical structures are visualized using RasMol software. The numbers of beads in each cluster are indicated, which in both panels A and B is more than the approximately five expected in the absence of cooperative effects (from Fig. 2, row 17, calculated as for Figure 3A). (C) Trade-off between entropic gains and losses. When two complexes pair, the entropic gain involves one overlap volume (Figure 1A) relative to the cost of forming one loop. Adding a third involves two more overlap volumes but only one more looping cost; adding a fourth involves three more overlap volumes but only one more looping cost. Adding more beads is progressively less favored as crowding increases the looping cost; moreover, entanglement becomes significant with more than around eight beads 52, and this limits the maximum number of beads in a cluster.

For Figure 5AB, we model each mRNA-producing complex as three coplanar, contacting, hard spheres (a 15-nm polymerase, 20-nm transcript plus bound proteins, and 24-nm spliceosome), although each triplet is represented as one bead in the figure. The simulation began with a linear string; the (final) attraction between any two triplets is modeled as a two-body square well with a width of 5nm and minimum equal to 4kBT (Fig. 2, row 17). In Figure 5B, the string was first compacted using an initial interaction of 8kBT.
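The two-body square well used in these simulations can be written down directly. The sketch makes these assumptions:

- Each triplet is collapsed to a single hard bead of an illustrative effective diameter (30 nm here), whereas the article keeps the three-sphere geometry.
- Every pair whose surfaces lie within 5 nm scores −4 kBT.
- Overlapping cores return infinity.

It is an energy function only, not the Monte Carlo move set, and the function name is hypothetical. For three mutually contacting beads, removing one bead breaks two contacts, which costs the ∼8 kBT used for the cooperative estimate.

```python
import itertools
import math


def square_well_energy(centres, diameter, width=5.0, depth=4.0):
    """Total pair energy (kBT) of hard beads with a square-well attraction.

    centres  -- list of (x, y, z) bead centres in nm
    diameter -- effective hard-core diameter of a bead in nm (assumed)
    width    -- surface-to-surface range of the well in nm
    depth    -- well depth in kBT
    """
    energy = 0.0
    for p, q in itertools.combinations(centres, 2):
        r = math.dist(p, q)
        if r < diameter:
            return math.inf                 # cores may not overlap
        if r <= diameter + width:
            energy -= depth                 # surfaces within the well
    return energy


# Three beads at the corners of an equilateral triangle, surfaces 2 nm apart
side = 30.0 + 2.0
triangle = [(0.0, 0.0, 0.0), (side, 0.0, 0.0), (side / 2.0, side * math.sqrt(3.0) / 2.0, 0.0)]
print(square_well_energy(triangle, diameter=30.0))
```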



Results and discussion

The small crowding molecules in the cell have diameters (d) of ∼5nm, and a volume fraction (n) of ∼0.2; these commonly accepted values 1,32 will be used throughout, except for one extracellular case—the 100-mers (below). In Fig. 2, various cases are listed according to their complexity; each example is accompanied by an estimate (the “gain”) of how much the free energy is lowered upon contact of the large spheres.

Sphere/sphere interactions

Actin

To put our analysis in context, we first consider a simple example—the polymerization of two actin monomers. The major energy source driving actin polymerization comes from ATP hydrolysis; however, calculation shows that the depletion attraction makes a contribution even though it cannot provide directional assembly (which must be determined by other factors). Modeling monomers as noninteracting hard spheres (D=5nm) in the presence of many small spheres (d=5nm, n=0.2) gives an entropic gain (i.e., ΔFgain) of ∼0.5kBT (Fig. 2, row 1), compared to a measured free-energy change of 1–2kBT 33,34. We conclude that the depletion attraction adds to other specific ones between molecules, and we will argue that the same is true of the cases discussed below. We can then calculate (using Kramers' theory) that monomers remain paired for three times longer in the presence of crowding molecules (see Methods).


Prebiotic RNA genomes

We now consider two of the simplest genomes. Current theories for the evolution of life involve RNA molecules able to catalyze their own synthesis 35,36. But in this “RNA world” lacking cell membranes, how are the critical components prevented from diffusing apart to maintain the high local concentrations necessary for continued evolution? Possible solutions include binding to charged surfaces, and capture within a confined space (e.g., a hydrothermal vent, a puddle on a charged surface). However, the depletion attraction could contribute. Thus, modeling two 100-mers of RNA as 4-nm spheres in a crowded solution of smaller molecules (d=1nm, n=0.2) gives an attraction (gain) of ∼1.4kBT (Fig. 2, row 2). Here, too, pairing lasts roughly three times longer than in the absence of the depletion attraction.



Two beads threaded on a string

We now turn to the central case of interest here, where the two large spheres are threaded on a string; the spheres represent active polymerases and the string hydrated DNA (in prokaryotes) or a chromatin fiber (in eukaryotes). It is well known that specific interactions between spheres can drive genome looping. Thus, if two DNA-binding proteins present at ∼1nM interact together with a Kd of 10−7M (values typical for nuclear proteins), <1% will be complexed together in the absence of DNA 37. But if they bind to the same DNA molecule at sites 10kbp apart, the resulting local concentration ensures that two-thirds will be in the complex to loop the connecting DNA 37. Our central thesis here is that the nonspecific depletion attraction can also make a significant contribution in the crowded cell (Figure 1B). Whether aggregation occurs depends on the balance between the depletion attraction (i.e., ΔFgain; Eq. (1) in Methods) and ΔFloss (the entropic penalty that must be paid to loop the connecting string). This loss is well approximated by ckBT log(l/LK)+ΔFr0 (Methods). The constant c depends on loop conformation; it typically increases with string density from 1.5 for an ideal random walk or freely jointed chain, through 2.2 for the four-legged loop as in Figure 1B 22, to higher values if the density is high. l is loop length, and LK the Kuhn length (a measure of string stiffness). Notice that we include self-avoidance in the case of the thick eukaryotic string (i.e., no two segments of the fiber are allowed to occupy the same volume). ΔFr0 is a constant that is independent of loop length; it is physically related to the dimensions of the overlap volume and the range r0 of distances between the two beads considered sufficient to form a loop (in our case ≤5nm). In the cases modeled here, the spheres are polymerases that remain irreversibly bound to their templates while active.
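The numbers in the DNA-looping example above follow from simple binding equilibria. The sketch first solves A + B ⇌ AB exactly for 1 nM of each protein with Kd = 10⁻⁷ M. It then asks what effective local concentration j would give two-thirds looping when the same pair is tethered, assuming the looped fraction is j/(j + Kd). The resulting j ≈ 200 nM is an inference from the quoted numbers, not a value stated in the text, and the helper name is hypothetical.

```python
import math


def fraction_complexed(total_each, kd):
    """Fraction of A bound in A + B <-> AB when A and B have equal total concentrations (M)."""
    s = 2.0 * total_each + kd
    # Smaller root of x**2 - s*x + total_each**2 = 0, in a cancellation-free form
    bound = 2.0 * total_each ** 2 / (s + math.sqrt(s * s - 4.0 * total_each ** 2))
    return bound / total_each


KD = 1e-7
free_fraction = fraction_complexed(1e-9, KD)      # untethered proteins at 1 nM each
print(f"untethered: {100 * free_fraction:.2f}% complexed")

# Tethered: looped fraction = j / (j + Kd); two-thirds looping needs j = 2 Kd
j_local = 2.0 * KD
print(f"implied local concentration: {j_local * 1e9:.0f} nM")
```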

Two free (untethered) spheres in a crowded cell will diffuse in three-dimensional space and spend little time together, and the extent of the small paired fraction can be determined using van’t Hoff’s relation (Eq. (7) in Methods). If the two spheres are tethered to each other, the inevitable high local concentration plus depletion attraction ensure that the paired fraction is greater. The (looping) probability of finding the two spheres close enough together for their excluded volumes to overlap is illustrated in Figure 3A, which gives results for a freely jointed chain. (Similar results (not shown) are found for self-avoiding and worm-like chains (which differ by the presence of a nonzero stiffness parameter 25).) Sharp transitions are seen between the unbound (unlooped) and bound (looped) states with chains of different lengths. The diameter of the large spheres (D) and length of connecting string are important determinants of whether or not a loop forms (Figure 3B); above the upper (orange) line, two spheres will eventually come together to form a loop. As before, the time the two spend together can be estimated using Kramers' theory (Methods).

“Soft” beads

Individual polymerases bound to DNA are modeled as hard (impenetrable) spheres. However, we also discuss interactions between clusters of bound polymerases, where each cluster contains many active enzymes (e.g., DNA polymerases in replication factories). Such clusters can be modeled as two groups of (polymerase-sized) spheres or as two larger spheres, but individual enzymes probably intermingle when the two clusters come into contact. Therefore, we also model such clusters as “soft,” and allow individual hard spheres in one cluster to intermingle on contact with their counterparts in the other. The result is one large cluster with the combined volume of the two original ones. Intermingling ensures that the overlap volume is considerable, and the entropic gain now scales as D²/d² (Eq. (5) in Methods), compared with D/d for hard spheres (Eq. (1) in Methods). As a result, soft clusters are more likely to come together to form a loop, and smaller diameters suffice to ensure looping (Figure 3B, lower red line). These two cases (hard and soft) represent extremes; true values are likely to lie between the two, and we conservatively consider the former in most of what follows.
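The different D/d and D²/d² scalings can be illustrated with a low-density (ideal-gas) estimate in which the free-energy gain equals the number density of small spheres times the excluded volume that is released. For hard spheres at contact this is the classical Asakura–Oosawa lens-overlap result, φ(3D/2d + 1) kBT, which reduces to the ∼3/2(D/d)n kBT quoted in the Introduction when D ≫ d (φ here is the volume fraction called n there). For soft clusters we assume, as in the text, that the two clusters merge into one with their combined volume; this is our own reconstruction and may differ in prefactor from Eq. (5) in Methods. The crowder diameter of 5nm in the usage example is hypothetical; φ = 0.3 is taken from the 20–30% crowding quoted in the Introduction.

```python
def hard_gain_kT(D, d, phi):
    """Depletion gain (kBT) at contact for two hard spheres, ideal small crowders.

    The two exclusion shells (radius (D + d)/2, centers D apart) overlap in a lens of
    volume (pi/12)(3D + 2d)d^2; times the number density 6*phi/(pi*d^3) this gives:
    """
    return phi * (1.5 * D / d + 1.0)


def soft_gain_kT(D, d, phi):
    """Gain (kBT) when two soft clusters of diameter D merge into one of equal total volume.

    Each cluster excludes (pi/6)(D + d)^3 from crowder centers; the merged cluster has
    diameter 2**(1/3) * D. Number density times the released excluded volume:
    """
    merged = 2.0 ** (1.0 / 3.0) * D
    return phi * (2.0 * (D + d) ** 3 - (merged + d) ** 3) / d ** 3


if __name__ == "__main__":
    d, phi = 5.0, 0.3  # hypothetical crowder diameter (nm); volume fraction from the text
    for D in (10.0, 20.0, 40.0):
        print(f"D = {D:4.0f} nm: hard {hard_gain_kT(D, d, phi):5.1f} kBT, "
              f"soft {soft_gain_kT(D, d, phi):6.1f} kBT")
```

Doubling D roughly doubles the hard-sphere gain but roughly quadruples the soft-cluster gain once D ≫ d, which is why soft clusters need smaller diameters to loop.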


Tunable interactions

The transition to the looped form occurs over a narrow free-energy range of ∼10kBT (Figure 3A), roughly equivalent to ∼7 H-bonds. It might then be advantageous for the cell to ensure that DNA-binding complexes are of a size that can exploit this transition (e.g., by creating or destroying only a few H-bonds). The depletion attraction puts an upper limit on the size of complexes that permit such tuning; if complexes are too large (i.e., with diameters of ∼100nm for hard clusters and ∼40nm for soft ones), Figure 3B shows that there is a good chance they will always aggregate to give loops. As we shall see, Nature seems to set diameters so that the resulting depletion attraction lies in this tunable range.
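Why a ∼10kBT window can span the whole transition is easiest to see in a deliberately crude two-state picture, in which the looped fraction is 1/(1 + exp(ΔFnet/kBT)) with ΔFnet = ΔFloss − ΔFgain. This is our simplification, not the chain statistics behind Figure 3A, and the function names are hypothetical; it shows that going from 1% to 99% looped requires changing ΔFnet by only 2 ln 99 ≈ 9.2kBT, i.e., about six to seven H-bonds at ∼1.5kBT each.

```python
import math


def looped_fraction(dF_net_kT):
    """Two-state looped fraction; dF_net_kT = (ΔFloss - ΔFgain) / kBT."""
    return 1.0 / (1.0 + math.exp(dF_net_kT))


def dF_for_fraction(p):
    """Invert looped_fraction: the net free energy (kBT) giving looped fraction p."""
    return math.log((1.0 - p) / p)


if __name__ == "__main__":
    width = dF_for_fraction(0.01) - dF_for_fraction(0.99)
    h_bond_kT = 1.5  # approximate H-bond energy quoted in the Introduction
    print(f"1% -> 99% looped spans {width:.1f} kBT (~{width / h_bond_kT:.1f} H-bonds)")
```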

This prompts the question: why do the complexes in the cell, whether tethered or not, not all end up in one aggregate? (The fraction in the aggregate can be found using Eq. (7) for untethered components and Figure 3 for tethered ones.) We suggest that they will do so if the concentration of components is high enough, as with hemoglobin S in the red cells of patients with sickle cell anemia 38, and with over-expressed proteins in bacteria (which sometimes form inclusion bodies). Where both the concentration and the scale of the depletion effect are large enough to form aggregates, but experiments yield no evidence of aggregation, it seems likely that energy from other sources must be spent to prevent it.



Examples

Bacterial rrn operons

The genome of E. coli encodes 7 rrn operons separated on average by ∼650kbp (Figure 4A). In Luria broth (LB), a rich medium supporting division every 30–45min, demand for rRNA is high; ∼70% of the RNA polymerase in the cell transcribes one or other of these operons, and each rrn operon is associated with ∼70 active enzymes 23. As an origin (ori) often fires and refires before genome segregation, a cell typically has a genome structure like that in Figure 4B, with ∼22 active rrn operons 23. Treating each polymerase as a hard sphere (D=10nm), and each operon as a linear string of 70 closely packed spheres, we find that the entropic attraction (i.e., ΔFgain) between two operons significantly exceeds the penalty that must be paid to loop the intervening DNA (i.e., ΔFloss; Fig. 2, row 3). Including nascent transcripts (average length ∼2500 nucleotides, or half the length of the completed transcript) as spheres (D=10nm) attached to the polymerases makes the attraction even higher (not shown). This suggests that the depletion attraction inevitably drives two active operons together.

In a nutrient-poor medium like M9+glucose, cells divide every 90–170min and biosynthetic capacity switches away from ribosome biogenesis; the genome structure is like that in Figure 4A, and each rrn operon now associates with only about four polymerases 23. As a result, the loss due to looping outweighs the gain (Fig. 2, row 4), and rrn operons are unlikely to be together.

These results are consistent with experimental data 28. Tagging with green fluorescent protein (GFP) reveals that in LB the polymerases (and so the ∼22 operons to which ∼70% of them are bound) are clustered in one to six foci, which disappear on transfer to M9+glucose. The distribution of foci in LB 28 can be fitted assuming an attractive interaction of ∼16.5kBT between each pair of operons (Figure 4C); this compares with our calculated (maximum) attraction of 31–56kBT (Fig. 2, row 3).


Bacterial open reading frames

Engaged RNA polymerases are scattered every ∼24 and ∼8.6kbp along the bacterial genome in LB and M9, respectively (M. Bon, S. McGowan, and P. R. Cook, unpublished). If we include only the polymerase, the gain is insufficient to overcome the cost, so two lone adjacent polymerases are unlikely to come together (Fig. 2, rows 5 and 7). However, translation occurs cotranscriptionally, so ∼10 (in LB) or ∼6 (in M9) ribosomes 23, each with a diameter of ∼21nm, are typically attached to the nascent transcript (length ∼500 nucleotides, equivalent to half that of a typical mRNA); this increases the gain so that it roughly equals the cost (Fig. 2, rows 6 and 8), and adjacent polymerases are likely to be together much of the time. (Treating ribosomes as soft spheres and including the cooperative effects discussed below increases clustering even further.) Unfortunately, we currently lack experimental data to test this prediction.


Bacterial replication factories

GFP tagging shows that active DNA polymerases in living bacteria are concentrated in discrete factories containing at least 25 polymerases, often associated with the cell membrane 39,40. We model a cluster of 25 polymerases at a fork as one 37-nm hard sphere. Soon after initiation in a poor medium (when little intervening DNA has been replicated), the gain (2.4kBT) is greater than the loss due to looping (not shown), and we would expect the two forks to be together. But as replication generates more DNA between forks, the loss increases to a maximum of 15.5kBT (Fig. 2, row 9), by which point we would expect the two forks to have separated. Experimentally, the two forks do indeed separate when ∼30% of the genome has been replicated 40, and we calculate that a looping cost of 11kBT balances the gain at this stage. This lies between the gains predicted for hard and soft spheres (2.4 and 16.9kBT, respectively), so the depletion attraction alone can account for the observed dynamics with reasonable accuracy. It can also force the spheres to associate with the membrane for some time (Fig. 2, row 10). Therefore, it provides a good explanation both of why the two forks separate when they do and of their location. However, we would also expect the forks to aggregate again later as they converge toward the terminus (when looping costs decrease); this is not observed experimentally 40, presumably because the segregation machinery prevents it.
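Setting ΔFloss equal to ΔFgain in the looping formula c kBT log(l/LK) + ΔFr0 gives the loop length beyond which two tethered clusters should drift apart: l* = LK exp[(ΔFgain − ΔFr0)/(c kBT)]. The sketch assumes a natural logarithm and, because ΔFr0 is not given here, treats it as a hypothetical parameter; its outputs are in units of LK and illustrate trends rather than reproduce the ∼30%-replicated threshold. One robust consequence is that the ratio of separation lengths for two different gains does not depend on ΔFr0 or LK, and grows exponentially with the difference in gain.

```python
import math


def separation_length(gain_kT, kuhn_length, c=1.5, dF_r0=0.0):
    """Loop length at which c*ln(l/LK) + dF_r0 equals the depletion gain (all in kBT).

    Beyond this length the looping cost exceeds the gain, so tethered clusters
    should separate. dF_r0 is a hypothetical placeholder for the constant term.
    """
    return kuhn_length * math.exp((gain_kT - dF_r0) / c)


if __name__ == "__main__":
    c = 2.2  # four-legged loop value; an assumption for this illustration
    hard = separation_length(2.4, 1.0, c)   # gain quoted for hard spheres
    soft = separation_length(16.9, 1.0, c)  # gain quoted for soft spheres
    # The ratio below is independent of the unknown dF_r0 and of LK.
    print(f"l*(soft) / l*(hard) = {soft / hard:.0f}")
```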


Human replication factories

Replication begins at origins scattered every 50–100kbp along a human chromosome, and several pairs of the resulting replication forks are clustered in small replication factories (diameter ∼75nm); on passage through S phase, these factories grow into enormous structures (diameter ∼1000nm) containing thousands of forks 14. As in bacteria, the entropic gain is greater than the loss immediately after initiation, when little replicated DNA lies between forks (not shown), so forks will be together (Fig. 2, row 11). As before, the loss due to looping increases to a maximum as more DNA is replicated (Fig. 2, row 11); therefore, forks are likely to separate. Even so, the gain is still sufficient to allow dynamic interactions lasting seconds (Methods). Moreover, if the clusters at forks are soft, they should remain together, as the gain exceeds the loss (Fig. 2, row 11). The same applies to two origins that have just fired (Fig. 2, row 12), and to two distant factories (Fig. 2, row 13). We conclude that the depletion attraction is sufficient to bring together forks, active origins, and even factories separated by 1Mbp, as is observed. Moreover, as more origins fire, we would expect them to aggregate with existing clusters, as they do.
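The statement that the gain still allows contacts lasting seconds rests on Kramers-type kinetics, in which the mean time before two associated clusters escape grows exponentially with the depth of the attractive well: τ ≈ τ0 exp(ΔF/kBT). The prefactor τ0 (a diffusion-limited attempt time that depends on cluster size and nucleoplasmic viscosity) is not given here, so the sketch treats it as a hypothetical input; with τ0 = 1ms, for example, a well of ∼7kBT already gives second-scale residence, and each extra kBT multiplies τ by e. Function names are ours.

```python
import math


def residence_time_s(tau0_s, well_depth_kT):
    """Kramers/Arrhenius-type estimate of the mean time two clusters stay together."""
    return tau0_s * math.exp(well_depth_kT)


def depth_for_time_kT(target_s, tau0_s):
    """Well depth (kBT) needed for a given mean residence time."""
    return math.log(target_s / tau0_s)


if __name__ == "__main__":
    tau0 = 1e-3  # hypothetical attempt time in seconds
    for depth in (2.0, 5.0, 8.0):
        print(f"{depth:.0f} kBT -> {residence_time_s(tau0, depth):.3g} s")
    print(f"depth for 1 s: {depth_for_time_kT(1.0, tau0):.1f} kBT")
```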


Human rDNA genes

Each of the 10 loci encoding rRNA in the diploid human genome contains ∼80 tandem repeats, each with an ∼13-kbp transcription unit and an ∼30-kbp “spacer”; ∼100 RNA polymerase I complexes transcribe each active unit in the array. Active rDNA genes—but not inactive ones—aggregate to form nucleoli 41. As the cluster of active polymerases is so large and the spacer so short, the entropic gain due to the depletion attraction far outweighs the loss due to looping, and adjacent transcription units will inevitably aggregate (Fig. 2, row 14). Once again, the attraction can account for the organization seen.


Human open reading frames

RNA polymerase II transcribes most human genes. In a HeLa cell, the active enzyme is concentrated in nucleoplasmic factories, each containing about eight active enzymes, each engaged on a different transcription unit 14,15. As RNA processing occurs cotranscriptionally 42, each mRNA-producing complex typically contains a polymerase (diameter ∼15nm), a nascent transcript (average length ∼8400 nucleotides 43) with a compacted diameter of ∼14nm plus its bound proteins, and attached capping, splicing (one subcomplex has dimensions of 27×22×24nm 44), and polyadenylation machineries. Modeling such complexes as 25- or 40-nm hard spheres gives a ΔFgain slightly less than ΔFloss (Fig. 2, rows 15 and 16), so they will be paired between 1% and 5% of the time (Methods). Modeling the polymerase, transcript, and spliceosome as three hard spheres (a 15-nm polymerase, a 20-nm transcript plus bound proteins, and a 24-nm spliceosome) raises the paired fraction to 12%. Thus, this simple model (in which the size of the polymerizing complex is almost certainly underestimated) also explains why active genes tend to cluster.
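One simple way to turn a measured subcomplex size into a hard-sphere diameter is to match volumes: an ellipsoid with full axes a × b × c has volume (π/6)abc, the same as a sphere of diameter (abc)^(1/3). For the 27×22×24nm splicing subcomplex this gives ∼24nm, in line with the 24-nm spliceosome sphere used above, although the text does not say that this is how that value was chosen. The second helper (also hypothetical) shows that the three-sphere model (15, 20, and 24nm) has the same total volume as a single sphere of ∼29nm, i.e., less than the 40-nm single-sphere model.

```python
def volume_equivalent_diameter(a, b, c):
    """Sphere diameter matching the volume of an ellipsoid with full axes a, b, c.

    Ellipsoid volume (pi/6)*a*b*c equals sphere volume (pi/6)*D**3, so D = (a*b*c)**(1/3).
    """
    return (a * b * c) ** (1.0 / 3.0)


def combined_diameter(*diameters):
    """Diameter of one sphere whose volume equals the summed volumes of the given spheres."""
    return sum(x ** 3 for x in diameters) ** (1.0 / 3.0)


if __name__ == "__main__":
    print(f"27 x 22 x 24 nm subcomplex -> {volume_equivalent_diameter(27, 22, 24):.1f} nm sphere")
    print(f"15 + 20 + 24 nm spheres -> {combined_diameter(15, 20, 24):.1f} nm total-volume sphere")
```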



Many beads on one string: cooperative effects

We now consider 21 beads (each representing one mRNA-producing complex) threaded every 20kbp along 0.4Mbp of an active region of the human genome. Using Monte Carlo methods (Methods), we model an attraction of 4kBT between beads (Fig. 2, row 17); simulations yield two populations of low-energy structures, depending on the starting configuration. Starting with a linear string, segments diffuse to give structures with ∼30% of beads in clusters (Figure 5A). If the string is first compacted (a more likely representation of what happens in vivo), ∼80% are in clusters (Figure 5B). This compares with the ∼12% found above for a single pair. We attribute most of the extra clustering to cooperative effects arising from the nonlinear increase in the number of overlap volumes as more and more beads join a cluster (Figure 5C). Two factors may further increase clustering: the mRNA-producing complex is probably larger than we model, and, once such large structures come together, the high nucleoplasmic viscosity will slow their diffusion apart (Methods). These results reinforce the idea that the depletion attraction contributes to the observed clustering and looping; moreover, similar cooperativity should be seen with all the other strings discussed.
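The cooperative clustering described above can be explored with a generic Metropolis Monte Carlo sketch. This is not the authors' simulation (their model and move set are described in Methods). Here, as assumptions of our own, the string is a freely jointed chain of unit bonds with a hard core of diameter `sigma` between non-bonded beads; every fifth bead is "sticky" and stands in for one mRNA-producing complex; and each pair of sticky beads closer than `r_c` gains `eps` = 4kBT, the attraction quoted above. The bead spacing, `sigma`, `r_c`, the zigzag starting configuration, and the crankshaft/end-bead move set are all illustrative choices, and bonds do not map onto a specific number of base pairs.

```python
import numpy as np


def rotate_about_neighbours(pos, i, angle):
    """Crankshaft move: rotate bead i about the axis through beads i-1 and i+1.

    Both bond lengths are preserved (Rodrigues' rotation formula).
    """
    a = pos[i - 1]
    axis = pos[i + 1] - a
    axis = axis / np.linalg.norm(axis)
    v = pos[i] - a
    v_rot = (v * np.cos(angle) + np.cross(axis, v) * np.sin(angle)
             + axis * np.dot(axis, v) * (1.0 - np.cos(angle)))
    return a + v_rot


def sticky_energy(pos, sticky, eps, r_c):
    """-eps (in kBT) for every pair of sticky beads closer than r_c."""
    p = pos[sticky]
    dist = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
    iu = np.triu_indices(len(sticky), k=1)
    return -eps * np.count_nonzero(dist[iu] < r_c)


def clustered_fraction(pos, sticky, r_c):
    """Fraction of sticky beads with at least one sticky partner within r_c."""
    p = pos[sticky]
    dist = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    return float(np.mean((dist < r_c).any(axis=1)))


def simulate(n_sticky=21, spacing=5, eps=4.0, r_c=1.5, sigma=0.8, steps=20000, seed=0):
    rng = np.random.default_rng(seed)
    n = (n_sticky - 1) * spacing + 1
    k = np.arange(n)
    # planar zigzag with unit bonds, so no three consecutive beads are collinear
    pos = np.stack([k * np.sqrt(0.75), 0.5 * (k % 2), np.zeros(n)], axis=1)
    sticky = np.arange(0, n, spacing)
    energy = sticky_energy(pos, sticky, eps, r_c)
    for _ in range(steps):
        i = int(rng.integers(n))
        trial = pos.copy()
        if i == 0 or i == n - 1:
            # end bead: random new bond direction relative to its single neighbour
            nb = 1 if i == 0 else n - 2
            u = rng.normal(size=3)
            trial[i] = pos[nb] + u / np.linalg.norm(u)
        else:
            trial[i] = rotate_about_neighbours(pos, i, rng.uniform(0.0, 2.0 * np.pi))
        # hard core: bead i may not come within sigma of any non-bonded bead
        others = np.r_[0:max(i - 1, 0), min(i + 2, n):n]
        if np.any(np.linalg.norm(trial[others] - trial[i], axis=1) < sigma):
            continue
        new_energy = sticky_energy(trial, sticky, eps, r_c)
        # Metropolis acceptance, energies in units of kBT
        if new_energy <= energy or rng.random() < np.exp(energy - new_energy):
            pos, energy = trial, new_energy
    return pos, sticky, energy
```

After `pos, sticky, energy = simulate()`, calling `clustered_fraction(pos, sticky, 1.5)` gives the fraction of sticky beads touching at least one other. Because a bead joining a cluster can contact several members at once, the energy per bead rises as clusters grow, which is the cooperative effect described in the text; results depend strongly on run length and starting configuration, as the text notes for linear versus compacted strings.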



Conclusions

We treat active polymerizing complexes as spheres threaded on a DNA/chromatin string, and find that entropic forces drive aggregation of the complexes to loop the intervening DNA. This counterintuitive result is obtained despite the looping costs, which are outweighed by the entropy gained by the many small molecules that are packed into the cell. We suggest that Nature exploits such nonspecific aggregation to organize genomes. We do not wish to suggest that such attractive entropic forces are the only ones driving self-assembly; rather, they augment other specific interactions (e.g., involving H-bonds or electrostatic interactions) that also position monomers precisely.

Our results help explain several aspects of genome organization. First, we predict that active (but not inactive) genomes will inevitably be looped, and they are 14,15,17. For example, old evidence shows that loops are present in active cells (from bacteria to man) but not in inactive ones (e.g., chicken erythrocytes, human sperm); moreover, loops are lost progressively as active chicken erythroblasts mature into inactive erythrocytes 45. Recent evidence also shows that three mouse genes spaced ∼10kbp and ∼15Mbp apart on the genetic map are attached to one factory when transcribed (with consequential looping), but not when inactive 46. Moreover, inhibiting transcription in living pro- and eukaryotes disperses their DNA 47,48,49, presumably by releasing loops. Second, we can explain why bacterial replication forks initially lie together before separating 40, and why bacterial and eukaryotic replication complexes tend to be found at the cell membrane or in factories 14,39. Third, we can predict with reasonable accuracy the fraction of bacterial rrn operons found together in transcription factories 28, and explain why, in eukaryotes, active RNA polymerases I and II cluster in nucleoli and nucleoplasmic factories 14. (It is likely that energy must be spent to prevent polymerase I factories from aggregating with polymerase II factories.) These results are consistent with a model for genome organization in which active RNA polymerases cluster to loop the intervening DNA 15.

Our approach can readily be extended to other aspects of genome and cellular organization. For example, the interactions discussed here occur independently of scale. Hence we can model local effects (e.g., the aggregation of hard nucleosomes into a soft cluster to form a chromatin fiber, with the depletion attraction augmenting electrostatic interactions 50) as well as global ones (e.g., the aggregation of heterochromatic clumps as chromosomes condense during mitosis). Moreover, we deliberately consider only one string here to simplify analysis; nevertheless, it is easy to imagine that the depletion attraction drives the formation of nucleoli and chromocenters (as active rDNA genes or centromeric heterochromatin on different chromosomes aggregate), as well as the pairing of meiotic chromosomes (as homologous transcription complexes aggregate 51). Finally, the depletion attraction probably contributes to the formation of many other large structures in cells (e.g., inclusion bodies, interchromatin granule clusters), and, where large structures like the cytoskeleton do exist, energy must be spent to prevent the attraction from driving them into one large aggregate.


Acknowledgments

We thank Michaël Bon and Ngo Toan for help.

We thank the Engineering and Physical Sciences Research Council for financial support.

References

1. Ellis, R.J. (2001). Macromolecular crowding: obvious but underappreciated. Trends Biochem. Sci. 26, 597–604.

2. Minton, A.P. (2001). The influence of macromolecular crowding and macromolecular confinement on biochemical reactions in physiological media. J. Biol. Chem. 276, 10577–10580.

3. Asakura, S., and Oosawa, F. (1958). Interactions between particles suspended in solutions of macromolecules. J. Polym. Sci. [B] 33, 183–192.

4. Snir, Y., and Kamien, R.D. (2005). Entropically driven helix formation. Science 307, 1067.

5. Yodh, A.G., Lin, K.H., Crocker, J.C., Dinsmore, A.D., Verma, R., and Kaplan, P.D. (2001). Entropically driven self-assembly and interaction in suspension. Philos. Trans. R. Soc. Lond. A 359, 921–937.

6. Manuelidis, L. (1990). A view of interphase chromosomes. Science 250, 1533–1540.

7. Cook, P.R. (1995). A chromomeric model for nuclear and chromosome structure. J. Cell Sci. 108, 2927–2935.

8. Sachs, R.K., van den Engh, G., Trask, B.J., Yokota, H., and Hearst, J. (1995). A random-walk/giant-loop model for interphase chromosomes. Proc. Natl. Acad. Sci. USA 92, 2710–2714.

9. Marshall, W.F., Fung, J.C., and Sedat, J.W. (1997). Deconstructing the nucleus: global architecture from local interactions. Curr. Opin. Genet. Dev. 7, 259–263.

10. Munkel, C., and Langowski, J. (1998). Chromosome structure predicted by a polymer model. Phys. Rev. E 57, 5888–5896.

11. Belmont, A.S. (2002). Mitotic chromosome scaffold structure: new approaches to an old controversy. Proc. Natl. Acad. Sci. USA 99, 15855–15857.

12. Ostashevsky, J. (2002). A polymer model for large-scale chromatin organization in lower eukaryotes. Mol. Biol. Cell 13, 2157–2169.

13. Kleckner, N., Zickler, D., Jones, G.H., Dekker, J., Padmore, R., Henle, J., and Hutchinson, J. (2004). A mechanical basis for chromosome function. Proc. Natl. Acad. Sci. USA 101, 12592–12597.

14. Cook, P.R. (1999). The organization of replication and transcription. Science 284, 1790–1795.

15. Cook, P.R. (2002). Predicting three-dimensional genome structure from transcriptional activity. Nat. Genet. 32, 347–352.

16. Chambeyron, S., and Bickmore, W.A. (2004). Does looping and clustering in the nucleus regulate gene expression?. Curr. Opin. Cell Biol. 16, 256–262.

17. West, A.G., and Fraser, P. (2005). Remote control of gene transcription. Hum. Mol. Genet. 14, R101–R111.

18. Roth, R., Götzelmann, B., and Dietrich, S. (1999). Depletion forces near curved surfaces. Phys. Rev. Lett. 83, 448–451.

19. Marko, J.F., and Siggia, E.D. (1997). Polymer models of meiotic and mitotic chromosomes. Mol. Biol. Cell 8, 2217–2231.

20. Hanke, A., and Metzler, R. (2003). Entropy loss in long-distance DNA looping. Biophys. J. 85, 167–173.

21. Carlon, E., Orlandini, E., and Stella, A.L. (2002). Roles of stiffness and excluded volume in DNA denaturation. Phys. Rev. Lett. 88, 198101.

22. Metzler, R., Hanke, A., Dommersnes, P.G., Kantor, Y., and Kardar, M. (2002). Equilibrium shapes of flat knots. Phys. Rev. Lett. 88, 188101.

23. Bremer, H., and Dennis, P.P. (1996). Modulation of chemical composition and other parameters of the cell by growth rate. In Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology. Neidhardt, F.C., Curtiss, R., Lin, E.C.C., Brooks Low, K., Magasanik, B., Reznikoff, W.S., Riley, M., Schaechter, M., Umbarger, H.E., eds. 2nd Ed, (Washington, DC: ASM Press).

24. Atkins, P.W. (1999). Physical Chemistry. (Oxford, UK: Oxford University Press).

25. Daune, M. (1999). Molecular Biophysics: Structures in Motion. (Oxford, UK: Oxford University Press).

26. Bon, M., Marenduzzo, D., and Cook, P.R. (2006). Modeling a self-avoiding chromatin loop: relation to the packing problem, action-at-a-distance, and nuclear context. Structure 14, 197–204.

27. Woodcock, C.L., and Dimitrov, S. (2001). Higher-order structure of chromatin and chromosomes. Curr. Opin. Genet. Dev. 11, 130–135.

28. Cabrera, J.E., and Jin, D.J. (2003). The distribution of RNA polymerase in Escherichia coli is dynamic and sensitive to environmental cues. Mol. Microbiol. 50, 1493–1505.

29. Seksek, O., Biwersi, J., and Verkman, A.S. (1997). Translational diffusion of macromolecule-sized solutes in cytoplasm and nucleus. J. Cell Biol. 138, 131–142.

30. Luby-Phelps, K. (2000). Cytoarchitecture and physical properties of cytoplasm: volume, viscosity, diffusion, intracellular surface area. Int. Rev. Cytol. 192, 189–221.

31. Kimura, H., Sugaya, K., and Cook, P.R. (2002). The transcription cycle of RNA polymerase II in living cells. J. Cell Biol. 159, 777–782.

32. Hall, D., and Minton, A.P. (2004). Effects of inert volume-excluding macromolecules on protein fiber formation. II. Kinetic models for nucleated fiber growth. Biophys. Chem. 107, 299–316.

33. Sept, D., and McCammon, J.A. (2001). Thermodynamics and kinetics of actin filament nucleation. Biophys. J. 81, 667–674.

34. Dickinson, R.B., Caro, L., and Purich, D.L. (2004). Force generation by cytoskeletal end-tracking proteins. Biophys. J. 87, 2838–2854.

35. Orgel, L.E. (1998). The origin of life–a review of facts and speculations. Trends Biochem. Sci. 23, 491–495.

36. Benner, S.A., Ricardo, A., and Carrigan, M.A. (2004). Is there a common chemical model for life in the universe?. Curr. Opin. Chem. Biol. 8, 672–689.

37. Rippe, K. (2001). Making contacts on a nucleic acid polymer. Trends Biochem. Sci. 26, 733–740.

38. Jones, C.W., Wang, J.C., Briehl, R.W., and Turner, M.S. (2005). Measuring forces between protein fibers by microscopy. Biophys. J. 88, 2433–2441.

39. Lemon, K.P., and Grossman, A.D. (1998). Localization of bacterial DNA polymerase: evidence for a factory model of replication. Science 282, 1516–1519.

40. Bates, D., and Kleckner, N. (2005). Chromosome and replisome dynamics in E. coli: loss of sister cohesion triggers global chromosome movement and mediates chromosome segregation. Cell 121, 899–911.

41. Grummt, I. (2003). Life on a planet of its own: regulation of RNA polymerase I transcription in the nucleolus. Genes Dev. 17, 1691–1702.

42. Maniatis, T., and Reed, R. (2002). An extensive network of coupling among gene expression machines. Nature 416, 499–506.

43. Jackson, D.A., Iborra, F.J., Manders, E.M.M., and Cook, P.R. (1998). Numbers and organization of RNA polymerases, nascent transcripts and transcription units in HeLa nuclei. Mol. Biol. Cell 9, 1523–1536.

44. Jurica, M.S., Sousa, D., Moore, M.J., and Grigorieff, N. (2004). Three-dimensional structure of C complex spliceosomes by electron microscopy. Nat. Struct. Mol. Biol. 11, 265–269.

45. Jackson, D.A., McCready, S.J., and Cook, P.R. (1984). Replication and transcription depend on attachment of DNA to the nuclear cage. J. Cell Sci. Suppl. 1, 59–79.

46. Osborne, C.S., Chakalova, C., Brown, K.E., Carter, D., Horton, A., D