On the Evolution of Islands

Peter G. Doyle - Colin Mallows - Alon Orlitsky - Larry Shepp

Introduction

Suppose we have n cells arranged in a ring, or alternatively, in a row. We pick a cell at random and mark it; we pick one of the remaining unmarked cells at random and mark it; and so on until after n steps each cell is marked. After the k'th cell has been marked, the configuration of the marked cells defines some number of islands separated by seas (See Figure 1). An island is a maximal set of adjacent marked cells; a sea is a maximal set of adjacent unmarked cells. Let tex2html_wrap_inline479 be the random number of islands after k cells have been marked. Clearly tex2html_wrap_inline483 , and for a ring of cells tex2html_wrap_inline485 as well.

  
Figure 1: n=12 cells, k=7 marked cells, tex2html_wrap_inline453 islands. Numbers denote the time a cell was marked.

We show that for n cells in a ring

where tex2html_wrap_inline495 is the k'th Catalan Number

For n cells in a row, the answer is the same as for n+1 cells in a ring. To see this, break the ring at the position of the last cell marked. Hence

This latter formula is used in a companion paper, Shepp [1], to show that certain random graphs are disconnected.

From now on we will consider only the ring case, and write E instead of tex2html_wrap_inline505 throughout.

The formula

is a special case of the following formula, valid for all tex2html_wrap_inline507 :

Another particular case of this general formula is

We will also show that for all tex2html_wrap_inline509

and for all tex2html_wrap_inline507

tex2html_wrap_inline515

 

We give two proofs that

The first proof is inductive, the second uses a more elegant counting argument. The more general equation can be proved using similar methods.

An inductive proof

A straightforward inductive attack on this problem would number the cells in order 1,2,...,n, and would define tex2html_wrap_inline537 to be the number of the k'th marked cell. The sequence tex2html_wrap_inline541 gives a complete description of the evolution of the process. This attack is unlikely to succeed, since the number of islands after k cells have been marked is a complicated function of these random variables. The trick in problems like this is to find a convenient partial description of the process under study, a description that captures what is of interest and that has simple probability properties. A similar trick is effective in problems in mechanics, where the judicious choice of a coordinate system can make all the difference.

Note that if we are interested only in the number of islands at each stage, then when there are exactly i islands, the sizes of these islands are irrelevant to the subsequent development. So we consider the situation where there are i islands and m cells still to be marked. Letting

we observe that, conditional on the event tex2html_wrap_inline551 , the random variables tex2html_wrap_inline553 have a distribution that does not depend on n. So we can define

and tex2html_wrap_inline557 (we can start the whole process after the first cell has been marked, since this must give just one island). We shall set up and solve a recurrence for f.

With f(m,i) as defined above, we consider what can happen when the next cell is marked. There are m empty cells, and the next cell marked is equally likely to be any one of them. The crucial step in this approach is the observation that conditional on tex2html_wrap_inline551 , all possible sizes of the i seas are equally likely: the probability that when there are m cells still to be marked, there are exactly i islands and the sizes of the intervening seas are tex2html_wrap_inline573 (where necessarily each tex2html_wrap_inline575 is at least 1) is independent of tex2html_wrap_inline577 . This can be shown formally by Bayes' theorem.

It is convenient to distinguish two kinds of empty cells. An empty cell that is adjacent clockwise to an marked cell is called a tied cell. There are i such tied cells, and m-i remaining free cells (See Figure 2).

  
Figure 2: Tied (shaded) and free cells

(We do not count an empty cell that is adjacent anticlockwise to an island as being tied to that island.)

With probability i/m the next cell to be marked is a tied cell; and then (using the ``crucial observation'' above) with probability (i-1)/(m-1) there is an marked cell clockwise from it; with probability (m-i)/(m-1) there is a free cell clockwise from it. On the other hand, with probability (m-i)/m the next cell to be marked is a free cell; and then with probability i/(m-1) the next clockwise cell is marked, and with probability (m-i-1)/(m-1) it is empty. This gives the recurrence

eqnarray98

valid for tex2html_wrap_inline595 , with the boundary conditions f(m,m) = 1/m! since when m=i we must have tex2html_wrap_inline601 for j = m-1, m-2,...,1.

To solve this recurrence, put

so that

valid for tex2html_wrap_inline605 , with the boundary conditions

We recognize this recurrence as being related to binomial coefficients. Working out a few values of tex2html_wrap_inline607 easily leads to the conjecture

which does indeed satisfy the recurrence above. Thus we have

so that finally we have

A counting-argument proof

Let tex2html_wrap_inline625 be the i'th marked cell. tex2html_wrap_inline629 is a permutation of tex2html_wrap_inline631 . Each such permutation gives rise to a sequence tex2html_wrap_inline633 where tex2html_wrap_inline635 is the number of islands after the i'th cell has been marked. Call a sequence tex2html_wrap_inline639 of positive integers admissible if tex2html_wrap_inline641 and any two successive entries differ by at most 1. Let tex2html_wrap_inline645 be the increment in the number of islands when the i+1'st cell is marked, and let tex2html_wrap_inline649 .

The number of permutations that gives rise to an admissible sequence tex2html_wrap_inline639 is:

To see this, think of a child assembling a necklace of beads, one bead at a time. The child can be working on more than one string at once; these strings are kept in a more or less circular ring, arranged in the same order as in the finished necklace. As each successive bead is added, it is joined to any bead that it will be adjacent to in the finished necklace. Figure 3 illustrates a possible arrangement after the child placed seven beads, forming three strings.

  
Figure 3: Assembling a necklace of beads.

Suppose there are tex2html_wrap_inline635 strings after the i'th bead has been added. If tex2html_wrap_inline657 then the i+1'st bead creates a new island and there are tex2html_wrap_inline635 possible new-island locations. If tex2html_wrap_inline663 , then the i+1'st bead connects two islands and there are tex2html_wrap_inline635 possible pairs of adjacent islands. If tex2html_wrap_inline669 , then the i+1'st bead is added to an existing island and there are tex2html_wrap_inline635 islands, each with two sides, hence there are tex2html_wrap_inline675 ways to add the bead. Once all the beads have been placed, there are n ways to spin them before obtaining a recipe for marking the cells.

Dividing the number of ways an admissible sequence tex2html_wrap_inline679 can arise by n! gives the probability of the sequence:

displaymath612

The expected value that we are interested in is thus

So we just need to evaluate this sum.

Consider all possible walks tex2html_wrap_inline683 on the non-negative integers that start from 0, go up or down 1 each time, and return to 0 for the first time after the 2n'th step. The number of such walks is well known to be

Given such a walk, the sequence

is an admissible sequence, and every admissible sequence arises from tex2html_wrap_inline691 different walks. Hence

and

Further results

 

For any possible sequence tex2html_wrap_inline695 of islands in the ring, the sequence tex2html_wrap_inline697 of sea sizes at time k is uniformly distributed: every positive sequence tex2html_wrap_inline701 satisfying

arises as the value of tex2html_wrap_inline697 with the same probability. Therefore, the sequence tex2html_wrap_inline707 is a Markov Chain.

Using the uniformity of tex2html_wrap_inline697 , and letting tex2html_wrap_inline711 , it is easy to see that for tex2html_wrap_inline509 :

displaymath715

Hence

and

  equation202

Solving the recurrence with tex2html_wrap_inline719 we obtain

Similarly,

This, when solved, yields

Equation (1) can also be used to show that for all tex2html_wrap_inline507

Therefore

eqnarray215

An alternative way of proving that

is via the differences tex2html_wrap_inline733 . They satisfy

displaymath735

To see that, consider the permutation tex2html_wrap_inline737 that maps i to the cell marked at time i. The number of islands increases, decreases, or remains the same at time i, corresponding to whether i is a local minimum, maximum, or a middle point, of the inverse permutation tex2html_wrap_inline747 . Since tex2html_wrap_inline737 is distributed uniformly over all permutations of tex2html_wrap_inline631 , so is tex2html_wrap_inline747 . The integer i is a local minimum, maximum, or a middle point of tex2html_wrap_inline747 with the above probabilities. Therefore

and the result follows.

Finally, we can write down the value of tex2html_wrap_inline761 directly if we note that

eqnarray238

References

1
L. A. Shepp. Connectedness of certain random graphs. Israel J. Math., 67, 1989.



Peter Doyle