Peter G. Doyle - Colin Mallows - Alon Orlitsky - Larry Shepp
Suppose we have n cells arranged in a ring, or alternatively, in a row.
We pick a cell at random and mark it;
we pick one of the remaining unmarked cells at random and mark it;
and so on until after n steps each cell is marked.
After the k'th cell has been marked,
the configuration of the marked cells defines
some number of islands separated by seas
(See Figure 1).
An island is a maximal set of adjacent marked cells;
a sea is a maximal set of adjacent unmarked cells.
Let
be the random number of islands after k cells have been marked.
Clearly
, and for a ring of cells
as well.

Figure 1: n=12 cells, k=7 marked cells,
islands. Numbers
denote the time a cell was marked.
We show that for n cells in a ring

where
is the k'th Catalan Number

For n cells in a row, the answer is the same as for n+1 cells in a ring. To see this, break the ring at the position of the last cell marked. Hence

This latter formula is used in a companion paper, Shepp [1], to show that certain random graphs are disconnected.
From now on we will consider only the ring case, and
write E instead of
throughout.
The formula

is a special case of the following formula,
valid for all
:

Another particular case of this general formula is

We will also show that for all

and for all

We give two proofs that

The first proof is inductive, the second uses a more elegant counting argument. The more general equation can be proved using similar methods.
A straightforward inductive attack on this problem would number the cells
in order 1,2,...,n, and would define
to be the number
of the k'th marked cell.
The sequence
gives a complete
description of the evolution of the process. This attack is
unlikely to succeed, since the number of islands after k
cells have been marked is a complicated function of these
random variables. The trick in problems like this is to
find a convenient partial description of the
process under study, a description that captures what is
of interest and that has simple probability properties.
A similar trick is effective in problems in mechanics,
where the judicious choice of a coordinate system can
make all the difference.
Note that if we are interested only in the number of islands at each stage, then when there are exactly i islands, the sizes of these islands are irrelevant to the subsequent development. So we consider the situation where there are i islands and m cells still to be marked. Letting

we observe that, conditional on the event
,
the random variables
have a distribution that does not depend on n. So we can
define

and
(we can start the whole process after the
first cell has been marked, since this must give just one island).
We shall set up and solve a recurrence for f.
With f(m,i) as defined above,
we consider what can happen when
the next cell is marked. There are m empty cells, and the next
cell marked is equally likely to be any one of them.
The crucial step in this approach is the observation that
conditional on
, all possible sizes
of the i seas are equally likely:
the probability that when there are m cells
still to be marked, there are exactly i islands and the sizes of the
intervening seas are
(where necessarily each
is at least 1)
is independent of
.
This can be shown formally by Bayes' theorem.
It is convenient to distinguish two kinds of empty cells. An empty cell that is adjacent clockwise to an marked cell is called a tied cell. There are i such tied cells, and m-i remaining free cells (See Figure 2).

Figure 2: Tied (shaded) and free cells
(We do not count an empty cell that is adjacent anticlockwise to an island as being tied to that island.)
With probability i/m the next cell to be marked is a tied cell; and then (using the ``crucial observation'' above) with probability (i-1)/(m-1) there is an marked cell clockwise from it; with probability (m-i)/(m-1) there is a free cell clockwise from it. On the other hand, with probability (m-i)/m the next cell to be marked is a free cell; and then with probability i/(m-1) the next clockwise cell is marked, and with probability (m-i-1)/(m-1) it is empty. This gives the recurrence
valid for
,
with the boundary conditions f(m,m) = 1/m! since when m=i
we must have
for j = m-1, m-2,...,1.
To solve this recurrence, put

so that

valid for
, with the boundary conditions

We recognize this recurrence as being related to binomial coefficients.
Working out a few values of
easily leads to the conjecture

which does indeed satisfy the recurrence above. Thus we have

so that finally we have

Let
be the i'th marked cell.
is a permutation of
.
Each such permutation gives rise to a sequence
where
is the number of islands after the i'th cell has been marked.
Call a sequence
of positive integers admissible
if
and any two successive entries differ by at most 1.
Let
be the increment in the number of islands when
the i+1'st cell is marked, and let
.
The number of permutations that gives rise to an admissible sequence
is:

To see this, think of a child assembling a necklace of beads, one bead at a time. The child can be working on more than one string at once; these strings are kept in a more or less circular ring, arranged in the same order as in the finished necklace. As each successive bead is added, it is joined to any bead that it will be adjacent to in the finished necklace. Figure 3 illustrates a possible arrangement after the child placed seven beads, forming three strings.

Figure 3: Assembling a necklace of beads.
Suppose there are
strings after the i'th bead has been added.
If
then the i+1'st bead creates a new island and
there are
possible new-island locations.
If
, then the i+1'st bead connects two islands and
there are
possible pairs of adjacent islands.
If
, then the i+1'st bead is added to an existing island and
there are
islands, each with two sides, hence there are
ways to
add the bead.
Once all the beads have been placed, there are n ways to spin them before
obtaining a recipe for marking the cells.
Dividing the number of ways an admissible sequence
can arise by
n! gives the probability of the sequence:
The expected value that we are interested in is thus

So we just need to evaluate this sum.
Consider all possible walks
on the non-negative integers that start from 0,
go up or down 1 each time,
and return to 0 for the first time after the 2n'th step.
The number of such walks is well known to be

Given such a walk, the sequence

is an admissible sequence,
and every admissible sequence arises from
different walks.
Hence

and

For any possible sequence
of islands in the ring,
the sequence
of sea sizes at time k is
uniformly distributed:
every positive sequence
satisfying

arises as the value of
with the same probability.
Therefore, the sequence
is a Markov Chain.
Using the uniformity of
,
and letting
,
it is easy to see that for
:
Hence

and
Solving the recurrence with
we obtain

Similarly,

This, when solved, yields

Equation (1) can also be used to show that for all

Therefore
An alternative way of proving that

is via the differences
.
They satisfy
To see that, consider the permutation
that maps i
to the cell marked at time i.
The number of islands increases, decreases, or remains the same at time i,
corresponding to whether i is a local minimum, maximum, or a middle point,
of the inverse permutation
.
Since
is distributed uniformly over all permutations of
, so is
.
The integer i is a local minimum, maximum, or a middle point of
with the above probabilities.
Therefore

and the result follows.
Finally, we can write down the value of
directly
if we note that