% Modified 31 Oct 2005:  Conditioning fallacy alluded to.
% This chapter has been modified on 6-4-05.
% There are two \choice
\pagestyle{headings}
\setcounter{chapter}{0}
\chapter{Discrete Probability Distributions} \label{chp 1}
\setcounter{page}{1}

\section{Simulation of Discrete Probabilities} \label{sec 1.1}

\subsection*{Probability}

In this chapter, we shall first consider chance experiments with a finite number of
possible outcomes $\omega_1$, $\omega_2$, \dots, $\omega_n$.  For example, we
roll a die and the possible outcomes are 1, 2, 3, 4, 5, 6 corresponding to
the side that turns up.  We toss a coin with possible outcomes H (heads) and T
(tails).  
\par
It is frequently useful to be able to refer to an outcome of an experiment.  For 
example, we might want to write the mathematical expression which gives the sum of four rolls
of a die.  To do this, we could let $X_i$, $i = 1, 2, 3, 4,$ represent the values of the
outcomes of the four rolls, and then we could write the expression
$$X_1 + X_2 + X_3 + X_4$$
for the sum of the four rolls.  The $X_i$'s are called \emx {random variables}\index{random 
variable}.
A random variable is simply an expression whose value is the outcome of a particular
experiment.  Just as in the case of other types of variables in mathematics, random variables
can take on different values.
\par
Let $X$ be the random variable which represents the roll of one die.  We shall assign
probabilities to the possible outcomes of this experiment.  We do this by assigning to each
outcome $\omega_j$ a nonnegative number $m(\omega_j)$ in such a way that
$$ 
m(\omega_1) + m(\omega_2) + \cdots + m(\omega_6) = 1\ .
$$
The function $m(\omega_j)$ is called the \emx {distribution function}\index{distribution
function} of the random variable
$X$.  For the case of the roll of the die we would assign equal probabilities or
probabilities 1/6 to each of the outcomes.  With this assignment of probabilities, one could write
$$
P(X \le 4) = {2\over 3}
$$
to mean that the probability is $2/3$ that a roll of a die will have a value which does
not exceed 4.  
\par
Let $Y$ be the random variable which represents the toss of a coin.  In this case, there are
two possible outcomes, which we can label as H and T.  Unless we have reason to suspect that
the coin comes up one way more often than the other way, it is natural to assign the
probability of 1/2 to each of the two outcomes.
\par
In both of the above experiments, each outcome is assigned an equal probability.  This would certainly
not be the case in general.  For example, if a drug is found to be effective 30 percent of the time
it is used, we might assign a probability .3 that the drug is effective the next time it is used and
.7 that it is not effective.  This last example illustrates the intuitive \emx {frequency concept of
probability.}\index{probability!frequency concept of}  That is, if we have a probability $p$ that 
an experiment will result in outcome $A$, then if we repeat this experiment a large number of times 
we should expect that the
fraction of times that $A$ will occur is about $p$.  To check intuitive ideas like this, we shall
find it helpful to look at some of these problems experimentally.  We could, for 
example, toss a coin a large number of times and see if the fraction of times heads turns up is 
about 1/2.  We could also simulate this experiment on a computer.

\subsection*{Simulation}

We want to be able to perform an experiment that corresponds to a given set
of probabilities; for example, $m(\omega_1) = 1/2$, $m(\omega_2) = 1/3$, and
$m(\omega_3) = 1/6$.  In this case, one could mark three faces of a six-sided die with an $\omega_1$,
two faces with an $\omega_2$, and one face with an $\omega_3$.  
\par
In the general case we assume that $m(\omega_1)$, $m(\omega_2)$, \dots, $m(\omega_n)$ are all rational
numbers, with least common denominator $n$.  If $n > 2$, we can imagine a long cylindrical die with a
cross-section that is a regular $n$-gon.  If $m(\omega_j) = n_j/n$, then we can label $n_j$ of the
long faces of the cylinder with an $\omega_j$, and if one of the end faces comes up, we can
just roll the die again.  If $n = 2$, a coin could be used to perform the experiment.
\par
We will be particularly interested in repeating a chance experiment a large
number of times.  Although the cylindrical die would be a convenient way to carry out
a few repetitions, it would be difficult to carry out a large number of
experiments.  Since the modern computer can do a large number of operations
in a very short time, it is natural to turn to the computer for this task.

\subsection*{Random Numbers}

We must first find a computer analog of rolling a die.  This is done on
the computer by means of a \emx {random number generator.}\index{random number generator}
Depending upon the particular software
package, the computer can be asked for a real number between 0 and 1, or an integer in a given
set of consecutive integers.  In the first case, the real numbers are chosen in such a way that
the probability that the number lies in any particular subinterval of this unit interval is
equal to the length of the subinterval.  In the second case, each integer has the same
probability of being chosen.
\par
Let $X$ be a random variable with distribution function $m(\omega)$, where $\omega$ is in the set $\{\omega_1,
\omega_2, \omega_3\}$, and $m(\omega_1) = 1/2$, $m(\omega_2) = 1/3$, and $m(\omega_3) = 1/6$.  If our
computer package can return a random integer in the set $\{1, 2, ..., 6\}$, then we simply ask it to
do so, and make 1, 2, and 3 correspond to $\omega_1$, 4 and 5 correspond to $\omega_2$, and 6
correspond to $\omega_3$.  If our computer package returns a random real number $r$ in the interval
$(0,~1)$, then the expression
$$\lfloor {6r}\rfloor + 1$$
will be a random integer between 1 and 6.  (The notation $\lfloor x \rfloor$ means the greatest integer
not exceeding $x$, and is read ``floor of $x$.")
\par
The method by which random real numbers are generated on a computer is described in the historical
discussion at the end of this section.  The following example gives sample output of the program {\bf
RandomNumbers}.\index{RandomNumbers (program)}
\begin{example}(Random Number Generation) \label{exam 1.05}
The program {\bf RandomNumbers} generates $n$ random real numbers in the interval $[0, 1]$, where
$n$ is chosen by the user.  When we ran the program with $n = 20$, we obtained the data shown in
Table~\ref{table 1.1}.
\begin{table}
\centering
\begin{tabular}{llll} 
.203309 &
.762057 &
.151121 &
.623868 \\
.932052 &
.415178 &
.716719 &
.967412 \\
.069664 & 
.670982 &
.352320 &
.049723 \\
.750216 &
.784810 &
.089734 &
.966730 \\
.946708 &
.380365 &
.027381 &
.900794 \\
\end{tabular}
\caption{Sample output of the program {\bf RandomNumbers}.}
\label{table 1.1}
\end{table}
\end{example}

\begin{example}(Coin Tossing) \label{exam 1.1}
 As we have noted, our intuition suggests that the probability of
obtaining a head on a single toss of a coin is 1/2.  To have the computer
toss a coin, we can ask it to pick a random real number in the interval $[0, 1]$ and
test to see if this number is less than 1/2.  If so, we shall call the outcome \emx {heads}; if
not
we call it \emx {tails.}  Another way to proceed would be to ask the computer to pick a random
integer from the set $\{0, 1\}$.  The program {\bf CoinTosses}\index{CoinTosses (program)} carries
out the experiment of tossing a coin $n$ times.  Running this program, with
$n = 20$, resulted in:
\vskip .1in
\begin{center}
THTTTHTTTTHTTTTTHHTT.
\end{center}
\vskip .1in

Note that in 20 tosses, we obtained 5 heads and 15 tails.  Let us toss a
coin $n$ times, where $n$ is much larger than 20, and see if we obtain a proportion of heads closer to our
intuitive guess of 1/2.  The program {\bf CoinTosses} keeps track of the number of heads.  When
we ran this program with $n = 1000$, we obtained 494 heads.  When we ran it with $n = 10000$, we obtained
5039 heads.

We notice that when we tossed the coin 10{,}000 times, the proportion of
heads
was close to the ``true value" .5 for obtaining a head when a coin is
tossed.  A mathematical model for this experiment is called Bernoulli Trials (see Chapter~
\ref{chp 3}).  The
\emx {Law of Large Numbers,} which we shall study later (see Chapter~\ref{chp
8}), 
will show that in the Bernoulli Trials model, the proportion of heads should be near .5,
consistent with our intuitive idea of the frequency interpretation of probability.
\par
Of course, our program could be easily modified to simulate coins
for which the probability of a head is $p$, where $p$ is a real number between 0 and 1.
\end{example}

\par
In the case of coin tossing, we already knew the probability of the event
occurring on each experiment.  The real power of simulation comes from the
ability to estimate probabilities when they are not known ahead of time. 
This
method has been used in the recent discoveries of strategies that make the
casino game of blackjack favorable to the player.  We illustrate this idea in
a
simple situation in which we can compute the true probability and see how
effective the simulation is.

\begin{example}(Dice Rolling)\label{exam 1.2}
We consider a dice game that played an important role in the
historical development of probability.  The famous letters
between Pascal\index{PASCAL, B.} and Fermat\index{FERMAT, P.}, 
which many believe started a serious study of
probability, were instigated by a request for help from a French nobleman and
gambler, Chevalier de~M\'er\'e\index{de M\'ER\'E, CHEVALIER}.  
It is said that de~M\'er\'e had been betting
that, in four rolls of a die, at least one six would turn up.  He was winning
consistently and, to get more people to play, he changed the game to bet
that,
in 24 rolls of two dice, a pair of sixes would turn up. It is
claimed that de~M\'er\'e lost with 24 and felt that 25 rolls were necessary
to
make the game favorable.  It was \emx {un grand scandale} that mathematics
was wrong.
\par
We shall try to see if de~M\'er\'e is correct by simulating his various bets. 
The program {\bf DeMere1}\index{DeMere1 (program)} simulates a large number of experiments, 
seeing, in each one, if a six turns up in four rolls of a die. When we ran this program for
1000 plays, a six came up in the first four rolls 48.6 percent of the time.  When we ran
it for 10,000 plays this happened 51.98 percent of the time.
\par
We note that the result of the second run suggests that de~M\'er\'e was correct
in believing that his bet with one die was favorable; however, if we had
based our conclusion on the first run, we would have decided that he was wrong. 
\emx {Accurate results by simulation require a large number of experiments.}
\end{example}
\par
The program {\bf DeMere2}\index{DeMere2 (program)} simulates de~M\'er\'e's second bet that a 
pair of sixes will occur in 
$n$ rolls of a pair of dice.  The previous simulation shows that it is important to know how
many trials we should simulate in order to expect a certain degree of accuracy in our
approximation.  We shall see later that in these types of
experiments, a rough rule of thumb is that, at least 95\% of the time, the error does not exceed the
reciprocal of the square root of the number of trials.  Fortunately, for this dice game, it will be 
easy to compute the exact probabilities.  We shall show in the next section that for the first bet
the probability that de~M\'er\'e wins is $1 - (5/6)^4 = .518$. 
\par 
One can understand this calculation as follows:  The probability that no 6
turns up on the first toss is $(5/6)$.  The probability that no 6 turns up on
either of the first two tosses is $(5/6)^2$.  Reasoning in the same way, the
probability that no 6 turns up on any of the first four tosses is $(5/6)^4$. 
Thus, the probability of at least one 6 in the first four tosses is $1 -
(5/6)^4$.  Similarly, for the second bet, with 24 rolls, the probability that
de~M\'er\'e
wins is $1 - (35/36)^{24} = .491$, and for 25 rolls it is $1 - (35/36)^{25}
= .506$.  
\par
Using the rule of thumb mentioned above, it
would require 27,000 rolls to have a reasonable chance to determine these probabilities with
sufficient accuracy to assert that they lie on opposite sides of .5.  It is interesting to ponder
whether a gambler can detect such probabilities with the required accuracy from gambling
experience.  Some writers on the history of probability suggest that de~M\'er\'e was, in
fact, just interested in these problems as intriguing probability problems.

\begin{example}(Heads or Tails)\label{exam 1.3}
For our next example, we consider a problem
where the exact answer is difficult to obtain but for which simulation easily
gives the qualitative results.  Peter and Paul play a game called \emx {heads
or
tails.}  In this game, a fair coin is tossed a sequence of times---we choose
40.  Each time a head comes up Peter wins 1 penny from Paul, and each time a
tail comes up Peter loses 1 penny to Paul.  For example, if the results of
the
40 tosses are
\begin{center}
THTHHHHTTHTHHTTHHTTTTHHHTHHTHHHTHHHTTTHH.
\end{center}
\noindent Peter's winnings may be graphed as in Figure~\ref{fig 1.3}.
\putfig{3.5truein}{PSfig1-3}{Peter's winnings in 40 plays of heads or tails.}{fig 1.3}
\par
Peter has won 6 pennies in this particular game.  It is natural to ask for
the probability that he will win $j$ pennies; here $j$ could be any even
number from $-40$ to $40$.  It is reasonable to guess that the value of $j$ with
the highest probability is $j = 0$, since this occurs when the number of heads
equals the number of tails.  Similarly, we would guess that the values of $j$ with
the lowest probabilities are $j = \pm 40$.
\par
A second interesting question about this game is the following: How many times
in the 40 tosses will Peter be in the lead?  Looking at the graph of his
winnings (Figure~\ref{fig 1.3}), we see that Peter is in the lead when his winnings 
are positive, but
we have to make some convention when his winnings are 0 if we want all tosses
to contribute to the number of times in the lead.  We adopt the convention
that, when Peter's winnings are 0, he is in the lead if he was ahead at the
previous toss and not if he was behind at the previous toss.  With this
convention, Peter is in the lead 34 times in our example.  Again, our
intuition might suggest that the most likely number of times to be in the
lead is 1/2 of 40, or 20, and the least likely numbers are the extreme cases of 40
or 0.
\par
It is easy to settle this by simulating the game a large number of times and
keeping track of the number of times that Peter's final winnings are $j$, and
the number of times that Peter ends up being in the lead by $k$.  The
proportions over all games then give estimates for the corresponding
probabilities.  The program {\bf HTSimulation}\index{HTSimulation (program)} carries out this
simulation.  Note that when there are an even number of tosses in the game, it
is possible to be in the lead only an even number of times.  We have simulated this 
game 10{,}000 times. 
The results are shown in Figures~\ref{fig 1.4.5} and \ref{fig 1.4.6}.  These graphs,
which we call spike graphs\index{spike graph}, were generated using the program {\bf Spikegraph}.
\index{Spikegraph (program)}  
The vertical line, or spike, at position $x$ on the horizontal axis, has a height equal to the
proportion of outcomes which equal
$x$.  Our intuition about Peter's final winnings was quite correct, but our intuition
about the number of times Peter was in the lead was completely wrong.  The simulation
suggests that the least likely number of times in the lead is 20 and the most likely is 0
or 40.  This is indeed correct, and the explanation for it is suggested by playing the
game of heads or tails with a large number of tosses and looking at a graph of Peter's
winnings.  In Figure~\ref{fig 1.4.1} we show the results of a simulation of the game, for
1000 tosses and in Figure~\ref{fig 1.4.2} for 10{,}000 tosses.
\par
In the second example Peter was ahead most of the time.  It is a remarkable fact,
however, that, if play is continued long enough, Peter's winnings will continue
to come back to 0, but there will be very long times between the times that
this happens.  These and related results will be discussed in
Chapter~\ref{chp 12}.
\end{example}

In all of our examples so far, we have simulated equiprobable outcomes.  We
illustrate next an example where the outcomes are not equiprobable.

\putfig{4truein }{PSfig1-4-5}{Distribution of winnings.}{fig 1.4.5}

\putfig{4truein }{PSfig1-4-6}{Distribution of number of times in the lead.}{fig 1.4.6}

\putfig{4truein}{PSfig1-4}{Peter's winnings in 1000 plays of heads or tails.}{fig 1.4.1}

\putfig{4truein} {PSfig1-5}{Peter's winnings in 10,000 plays of heads or tails.}{fig 1.4.2}


\begin{example}(Horse Races)\label{exam 1.4}
Four horses (Acorn, Balky, Chestnut, and Dolby) have raced many
times.  It is estimated that Acorn wins 30 percent of the time, Balky 40
percent of the time, Chestnut 20 percent of the time, and Dolby 10 percent of
the time. 
\par
We can have our computer carry out one race as follows:  
Choose a random number $x$.  If $x < .3$ then we say that Acorn won.  If $.3 \le x < .7$
then Balky wins.  If $.7 \le x < .9$ then Chestnut wins.  Finally, if $.9 \le x$ then
Dolby wins. 
\par
The program {\bf HorseRace}\index{HorseRace (program)} uses this method to simulate the outcomes of
$n$ races.   Running this program for $n = 10$ we found that  Acorn won 40 percent of the time,
Balky 20 percent of the time, Chestnut 10 percent of the time, and Dolby 30 percent of the time.  
A larger number of races would be necessary to have better agreement with the past experience.
Therefore we ran the program to simulate 1000 races with our four horses. 
Although very tired after all these races, they performed in a manner quite consistent
with our estimates of their abilities. Acorn won 29.8 percent of the time,
Balky 39.4 percent, Chestnut 19.5 percent, and Dolby 11.3 percent of the time. 
\par
The program {\bf GeneralSimulation}\index{GeneralSimulation (program)} uses this method to simulate
repetitions of an arbitrary experiment with a finite number of outcomes occurring with known
probabilities.
\end{example}

\subsection*{Historical Remarks}

Anyone who plays the same chance game over and over is really carrying out a
simulation, and in this sense the process of simulation has been going on for
centuries.  As we have remarked, many of the early problems of probability
might well have been suggested by gamblers' experiences.  
\par
It is natural for anyone trying to understand probability theory to try
simple experiments by tossing coins, rolling dice, and so forth.  The naturalist
Buffon\index{BUFFON, G. L.} tossed a coin 4040 times, resulting in 2048 heads and 1992 tails.  He
also estimated the number $\pi$ by throwing needles on a ruled surface and
recording how many times the needles crossed a line\choice{.\footnote{See Section 2.1
of the complete Grinstead-Snell book. This is also illustrated in the Probability
Demo.}}{\ (see Section~\ref{sec 2.1}).}   The English biologist W. F. R. Weldon\index{WELDON,
W. F. R.}\footnote{T.~C.~Fry,
\emx {Probability and Its Engineering Uses,} 2nd~ed. (Princeton: Van Nostrand, 1965).}
recorded 26{,}306 throws of 12 dice, and the
Swiss scientist Rudolf Wolf\index{WOLF, R.}\footnote{E.~Czuber, \emx
{Wahrscheinlichkeitsrechnung,} 3rd~ed. (Berlin: Teubner, 1914).} recorded
100{,}000 throws of a single die without a computer.  Such experiments are
very
time-consuming and may not accurately represent the chance phenomena being
studied.  For example, for the dice experiments of Weldon and Wolf, further
analysis of the recorded data showed a suspected bias in the dice.  The
statistician Karl Pearson\index{PEARSON, K.} analyzed a large number of outcomes at certain
roulette tables and suggested that the wheels were biased.  He wrote in 1894:

\begin{quote} 
Clearly, since the Casino does not serve the valuable end of huge
laboratory for the preparation of probability statistics, it has no
scientific
\emx {raison d'\^etre.}  Men of science cannot have their most refined
theories disregarded in this shameless manner!  The French Government must be
urged by the hierarchy of science to close the gaming-saloons; it would be,
of
course, a graceful act to hand over the remaining resources of the Casino to
the
Acad\'emie des Sciences for the endowment of a laboratory of orthodox
probability; in particular, of the new branch of that study, the application
of
the theory of chance to the biological problems of evolution, which is likely
to occupy so much of men's thoughts in the near future.\footnote{K. Pearson,
``Science and Monte Carlo," \emx {Fortnightly Review}, vol.~55 (1894),
p.~193;
cited in S. M. Stigler, \emx {The History of Statistics} (Cambridge: Harvard
University Press, 1986).}
\end{quote}

However, these early experiments were suggestive and led to important
discoveries in probability and statistics.  They led Pearson to the \emx
{chi-squared test,} which is of great importance in testing whether observed
data fit a given probability distribution.

By the early 1900s it was clear that a better way to generate random numbers
was needed.  In 1927, L. H. C. Tippett\index{TIPPETT, L. H. C.} published a list of 41{,}600 digits
obtained by selecting numbers haphazardly from census reports.  In 1955, RAND
Corporation\index{RAND Corporation} printed a table of 1{,}000{,}000 random numbers generated from
electronic noise.  The advent of the high-speed computer raised the
possibility
of generating random numbers directly on the computer, and in the late 1940s
John von~Neumann\index{von NEUMANN, J.} suggested that this be done as follows: Suppose that you
want
a random sequence of four-digit numbers.  Choose any four-digit number, say
6235, to start.  Square this number to obtain 38{,}875{,}225.  For the second
number choose the middle four digits of this square (i.e., 8752).  Do the
same process starting with 8752 to get the third number, and so forth.  
\par
More modern methods involve the
concept of modular arithmetic\index{modular arithmetic}.  
If $a$ is an integer and $m$ is a positive integer, then by
$a\ (\mbox{mod}\  m)$
we mean the remainder when $a$ is divided by $m$.  For example, $10\ (
\mbox{mod}\  4) = 2$,
$8\ (\mbox{mod}\  2) = 0$, and so forth.  To generate a random sequence $X_0,
X_1, X_2,
\dots$ of numbers choose a starting number $X_0$ and then obtain the numbers
$X_{n+1}$ from $X_n$ by the formula 
$$X_{n+1} = (aX_n + c)\ (\mbox{mod}\ m)\ ,$$
where $a$, $c$, and $m$ are carefully chosen constants.  The sequence 
$ X_0,  X_1,$ 
$X_2, \dots$ 
is then a sequence of integers between 0 and $m-1$.  To obtain a sequence of real numbers in $[0,1)$,
we divide each $X_j$ by $m$.  The resulting sequence consists of rational numbers of the form $j/m$,
where $0 \le j \le m-1$.  Since $m$ is usually a very large integer, we think of the numbers in the sequence
as being random real numbers in $[0, 1)$.
\par
For both von~Neumann's squaring method and the modular arithmetic technique the sequence
of numbers is actually completely determined by the first number.  Thus,
there is nothing really random about these sequences.  However, they produce
numbers that behave very much as theory would predict for random experiments.  To
obtain different sequences for different experiments the initial number $X_0$
is chosen by some other procedure that might involve, for example, the time
of day.\footnote{For a detailed discussion of random numbers, see D.~E.~Knuth, \emx {The Art of
Computer Programming,} vol.~II (Reading: Addison-Wesley, 1969).}
\par
During the Second World War, physicists at the Los Alamos Scientific
Laboratory
needed to know, for purposes of shielding, how far neutrons travel through
various materials.  This question was beyond the reach of theoretical
calculations.  Daniel McCracken\index{McCRACKEN, D.}, writing in the \emx {Scientific American},
states:

\begin{quote}
The physicists had most of the necessary data: they knew the average
distance a neutron of a given speed would travel in a given substance before
it
collided with an atomic nucleus, what the probabilities were that the neutron
would bounce off instead of being absorbed by the nucleus, how much energy
the
neutron was likely to lose after a given collision and so on.\footnote{D.~D.~McCracken, ``The
Monte Carlo Method," \emx {Scientific American,} vol.~192 (May 1955), p.~90.} 
\end{quote}

John von~Neumann\index{von NEUMANN, J.} and Stanislas Ulam\index{ULAM, S.} 
suggested that the problem be solved by
modeling the experiment by chance devices on a computer.  Their work being
secret, it was necessary to give it a code name.  Von~Neumann chose the name
``Monte Carlo."  Since that time, this method of simulation has been called
the \emx {Monte Carlo Method.}

William Feller\index{FELLER, W.} indicated the possibilities of using computer
simulations to illustrate basic concepts in probability in his book \emx
{An Introduction to Probability Theory and Its Applications.}  In discussing the
problem about the number of times in the lead in the game of ``heads or
tails" Feller writes:

\begin{quote} 
The results concerning fluctuations in coin tossing show that widely
held beliefs about the law of large numbers are fallacious.  These results
are
so amazing and so at variance with common intuition that even sophisticated
colleagues doubted that coins actually misbehave as theory predicts.  The
record of a simulated experiment is therefore included.\footnote{W.~Feller, 
\emx {Introduction to Probability Theory and its Applications,} vol.~1,
3rd~ed. (New York: John Wiley \& Sons, 1968), p.~xi.}
\end{quote}

\noindent Feller provides a plot showing the result of 10{,}000 plays of \emx
{heads or tails} similar to that in Figure~\ref{fig 1.4.2}.

The martingale betting system\index{martingale betting system} 
described in Exercise~\ref{exer 1.1.10} 
has a long and interesting history.  Russell Barnhart\index{BARNHART, R.} pointed out to the
authors that its use can be traced back at least to 1754, when Casanova\index{CASANOVA, G.}, 
writing in his memoirs, \emx {History of My Life,} writes

\begin{quote}
 She [Casanova's mistress] made me promise to go to the casino [the
Ridotto in Venice] for money to play in partnership with her.  I went there
and
took all the gold I found, and, determinedly doubling my stakes according to
the system known as the martingale, I won three or four times a day during
the
rest of the Carnival.  I never lost the sixth card.  If I had lost it, I
should
have been out of funds, which amounted to two thousand zecchini.\footnote{G.~Casanova, \emx
{History of My Life,} vol.~IV, Chap.~7, trans.\
W.~R. Trask (New York: Harcourt-Brace, 1968), p.~124.}
\end{quote}

Even if there were no zeros on the roulette wheel so the game was perfectly
fair, the martingale system, or any other system for that matter, cannot make
the game into a favorable game.  The idea that a fair game remains fair and
unfair games remain unfair under gambling systems has been exploited by mathematicians 
to obtain important results in the study of probability.  We will introduce the general 
concept of a martingale in Chapter~\ref{chp 6}.

The word \emx {martingale}\index{martingale!origin of word} itself also has an interesting 
history.  The origin of the word is obscure.  A recent version of the \emx {Oxford English Dictionary} gives examples
of its use in the early 1600s and says that its probable origin is the reference
in Rabelais's\index{RABELAIS, F.} Book One, Chapter 20:

\begin{quote} 
Everything was done as planned, the only thing being that Gargantua
doubt\-ed if they would be able to find, right away, breeches suitable to the
old fellow's legs; he was doubtful, also, as to what cut would be most
becoming to the orator---the martingale, which has a draw-bridge effect in the seat,
to permit doing one's business more easily; the sailor-style, which affords more
comfort for the kidneys; the Swiss, which is warmer on the belly; or the
codfish-tail, which is cooler on the loins.\footnote{Quoted in the \emx
{Portable Rabelais,} ed.~S.~Putnam (New York: Viking, 1946), p.~113.}
\end{quote}

Dominic Lusinchi\index{LUSINCHI, D.} noted an earlier occurrence of the word martingale.  According to the French dictionary \emx{Le Petit Robert},
the word comes from the Proven\c{c}al word ``martegalo," which means ``from Martigues."  Martigues is a town due west of Merseille.  The dictionary gives the example of ``chausses \`{a} la martinguale" (which means Martigues-style breeches) and the date 1491.  

In modern uses martingale has several different meanings, all related to \emx
{holding down,} in addition to the gambling use.  For example, it is a strap
on a horse's harness used to hold down the horse's head, and also part of a
sailing rig used to hold down the bowsprit.

The Labouchere\index{Labouchere betting system} system described in Exercise~\ref{exer 1.1.9} 
is named after Henry du~Pre Labouchere\index{LABOUCHERE, H. du P.} 
(1831--1912), an English journalist and member of Parliament.  La\-bou\-chere attributed the 
system to Condorcet\index{CONDORCET, Le Marquis de}.  Condorcet (1743--1794) was a political 
leader during the time of the French revolution who was interested in applying probability 
theory to economics and politics.  For example, he calculated the probability that a jury using
majority vote will give a correct decision if each juror has the same probability of
deciding correctly.  His writings provided a wealth of ideas on how
probability might be applied to human affairs.\footnote{ Le Marquise de~Condorcet, 
\emx {Essai sur l'Application de l'Analyse \`a la Probabilit\'e d\`es D\'ecisions
Rendues a la Pluralit\'e des Voix} (Paris: Imprimerie Royale, 1785).}

\exercises
\begin{LJSItem}

\i\label{exer 1.1.1} Modify the program {\bf CoinTosses} to toss a coin $n$
times and print out after every 100 tosses the proportion of heads minus 1/2.  Do 
these numbers appear to approach 0 as $n$ increases?  Modify the program again to print
out, every 100 times, both of the following quantities:  the proportion of heads 
minus 1/2,  and the number of heads minus half the number of tosses.  Do these numbers 
appear to approach 0 as $n$ increases?

\i\label{exer 1.1.2} Modify the program {\bf CoinTosses} so that it tosses a
coin $n$ times
and records whether or not the proportion of heads is within .1 of .5 (i.e.,
between .4 and~.6).  Have your program repeat this experiment 100 times. 
About
how large must $n$ be so that approximately 95 out of 100 times the
proportion
of heads is between .4 and .6?

\i\label{exer 1.1.3} In the early 1600s, Galileo\index{GALILEO, G.} was asked to explain the
fact that,
although the number of triples of integers from 1 to 6 with sum 9 is the same
as the
number of such triples with sum 10, when three dice are rolled, a 9 seemed to
come up less often than a 10---supposedly in the experience of gamblers.

\begin{enumerate}
\item Write a program to simulate the roll of three dice a large number of
times and keep track of the proportion of times that the sum is 9 and the
proportion of times it is 10.
\item Can you conclude from your simulations that the gamblers were correct?
\end{enumerate}

\i\label{exer 1.1.4} In raquetball\index{raquetball}, a player continues to serve as long as
she is winning; a
point is scored only when a player is serving and wins the volley.  The first
player to win 21 points wins the game.  Assume that you serve first and have
a
probability .6 of winning a volley when you serve and probability .5 when
your
opponent serves.  Estimate, by simulation, the probability that you will win
a
game.

\i\label{exer 1.1.5} Consider the bet that all three dice will turn up
sixes at least once in $n$ rolls of three dice.  Calculate $f(n)$, the probability 
of at least one triple-six when three dice are rolled $n$ times.  
Determine the smallest value of $n$ necessary for a favorable bet that a 
triple-six will occur when three dice are rolled $n$ times.  (DeMoivre would
say it
should be about $216\log 2 = 149.7$ and so would answer 150---see
Exercise~\ref{sec 1.2}.\ref{exer 1.2.16}.
Do you agree with him?)                           

\i\label{exer 1.1.6} In Las Vegas, a roulette\index{roulette} wheel has 38 slots
numbered 0, 00, 1, 2, \dots, 36.  The 0 and 00 slots are green and half of
the
remaining 36 slots are red and half are black.  A croupier spins the wheel
and
throws in an ivory ball.  If you bet 1 dollar on red, you win 1 dollar if the
ball stops in a red slot and otherwise you lose 1 dollar.  Write a program to
find the total winnings for a player who makes 1000 bets on red.

\i\label{exer 1.1.7} Another form of bet for roulette is to bet that a
specific number (say 17) will turn up.  If the ball stops on your number, you
get your dollar back plus 35 dollars.  If not, you lose your dollar.  Write a
program that will plot your winnings when you make 500 plays of roulette at
Las
Vegas, first when you bet each time on red (see Exercise~\ref{exer 1.1.6}), 
and then for a second visit to Las Vegas when you make 500 plays betting each
time
on the number 17.  What differences do you see in the graphs of your winnings
on
these two occasions?

\i\label{exer 1.1.8} An astute student noticed that, in our simulation of
the game of heads
or tails (see Example~\ref{exam 1.3}), the proportion of times the
player is always in the lead is very close to the proportion of times that
the
player's total winnings end up~0.  Work out these probabilities by
enumeration
of all cases for two tosses and for four tosses, and see if you think that
these probabilities are, in fact, the same.

\i\label{exer 1.1.9} The \emx {Labouchere system}\index{Labouchere betting system} 
for roulette is played
as follows.  Write down a list of numbers, usually 1, 2, 3, 4.  Bet the sum
of
the first and last, $1 + 4 = 5$, on red.  If you win, delete the first and
last
numbers from your list.  If you lose, add the amount that you last bet to the
end of your list.  Then use the new list and bet the sum of the first and
last
numbers (if there is only one number, bet that amount).  Continue until your
list becomes empty.  Show that, if this happens, you win the sum, $1 + 2 + 3
+ 4
= 10$, of your original list.  Simulate this system and see if you do always
stop and, hence, always win.  If so, why is this not a foolproof gambling
system?

\i\label{exer 1.1.10} Another well-known gambling system is the \emx
{martingale doubling system}\index{martingale betting system}.  
Suppose that you are betting on red to turn up
in roulette.  Every time you win, bet 1 dollar next time.  Every time you
lose,
double your previous bet.  Suppose that you use this system until you have won at least 5
dollars or you have lost more than 100 dollars.  Write a program to simulate
this and play it a number of times and see how you do.  In his book
\emx
{The Newcomes,} W.~M.~Thackeray\index{THACKERAY, W. M.} remarks 
``You have not played as yet?  Do not do
so; above all avoid a martingale if you do."\footnote{W.~M.~Thackerey, \emx
{The
Newcomes} (London: Bradbury and Evans, 1854--55).}  Was this good advice?

\i\label{exer 1.1.11} Modify the program {\bf HTSimulation} so that it
keeps track of the
maximum of Peter's winnings in each game of 40 tosses.  Have your program
print
out the proportion of times that your total winnings take on values $0,\ 2,\ 
4,\ \dots,\ 40$.  Calculate the corresponding exact probabilities for games of
two
tosses and four tosses.

\i\label{exer 1.1.12} 
In an upcoming national election for the President of the United States, a
pollster plans to predict the winner of the popular vote by taking a random
sample
of 1000 voters and declaring that the winner will be the one obtaining the
most
votes in his sample. Suppose that 48 percent of the voters plan to vote for
the
Republican candidate and 52 percent plan to vote for the Democratic
candidate.  
To get some idea of how reasonable the pollster's plan is, write a program to
make this
prediction by simulation.  Repeat the simulation 100 times and see how many times the
pollster's prediction would come true.  Repeat your experiment, assuming now that 49 percent of
the population plan to vote for the Republican candidate; first with a sample of 1000 and then
with a sample of 3000.  (The Gallup\index{Gallup Poll} Poll uses about 3000.)  (This idea
is discussed further in Chapter~\ref{chp 9}, Section~\ref{sec 9.1}.)

\i\label{exer 1.1.13} The psychologist Tversky\index{TVERSKY, A.} and his
colleagues\footnote{See
K.~McKean, ``Decisions, Decisions," \emx {Discover,} June 1985,
pp.~22--31. 
Kevin McKean, Discover Magazine, \copyright 1987 Family Media, Inc. 
Reprinted
with permission.  This popular article reports on the work of Tverksy et.\ al.\ 
in
\emx {Judgement Under Uncertainty: Heuristics and Biases} (Cambridge:
Cambridge
University Press, 1982).} say that about four out of five people will answer
(a) to the following question:

A certain town is served by two hospitals\index{hospital}.  In the larger hospital about
45 babies\index{babies} are born each day, and in the smaller hospital 15 babies are born
each
day.  Although the overall proportion of boys is about 50 percent, the actual
proportion at either hospital may be more or less than 50 percent on any day. 
At the end of a year, which hospital will have the greater number of days on
which more than 60 percent of the babies born were boys?
\begin{enumerate}
\item the large hospital
\item the small hospital
\item neither---the number of days will be about the same.
\end{enumerate}
Assume that the probability that a baby is a boy is .5 (actual estimates make
this more like .513).  Decide, by simulation, what the right answer is to the
question.  Can you suggest why so many people go wrong?

\i\label{exer 1.1.14} You are offered the following game.  A fair coin will be
tossed until the
first time it comes up heads.  If this occurs on the $j$th toss you are paid
$2^j$ dollars.  You are sure to win at least 2 dollars so you should be
willing
to pay to play this game---but how much?  Few people would pay as much as 10
dollars to play this game.  See if you can decide, by simulation, a
reasonable
amount that you would be willing to pay, per game, if you will be allowed
to make a large number of plays of the game.
Does the amount that you would be willing to pay per game depend upon the
number of plays that you will be allowed?

\i\label{exer 1.1.15} Tversky and his colleagues\footnote{ibid.} studied
the records of 48
of the Philadelphia 76ers\index{Philadelphia 76ers} basketball games in 
the 1980--81 season to see if a player
had times when he was hot and every shot went in, and other times when he was
cold and barely able to hit the backboard.  The players estimated that they
were about 25 percent more likely to make a shot after a hit than after a
miss.  In fact, the opposite was true---the 76ers were 6 percent more likely
to
score after a miss than after a hit.  Tversky reports that the number of hot
and cold streaks was about what one would expect by purely random effects. 
Assuming that a player has a fifty-fifty chance of making a shot and makes 20
shots a game, estimate by simulation the proportion of the games in which the
player will have a streak of 5 or more hits.

\i\label{1.1.16} Estimate, by simulation, the average number of children
there would 
be in a family if all people had children until they had a boy.  Do the same
if all
people had children until they had at least one boy and at least one girl. 
How
many more children would you expect to find under the second scheme than
under the first in 100{,}000 families?  (Assume that boys and girls are
equally
likely.)

\putfig{4truein}{PSfig1-6}{Random walk.}{fig 1.5}

\i\label{exer 1.1.17} Mathematicians have been known to get some of the
best ideas while
sitting in a cafe, riding on a bus, or strolling in the park.  In the early
1900s the famous mathematician George P\'olya\index{P\'OLYA, G.} lived in a hotel near the 
woods in Zurich.  He liked to walk in the woods and think about mathematics.  
P\'olya describes the following incident:

\begin{quote}
At the hotel there lived also some students with whom I usually took my meals and had friendly
relations.  On a certain day one of them expected the visit of his fianc\'ee, what (sic) I knew, but 
I did not foresee that he and his fianc\'ee would also set out for a stroll in the woods, and then
suddenly I met them there.  And then I met them the same morning repeatedly, I don't remember how many
times, but certainly much too often and I felt embarrassed:  It looked as if I was snooping around
which was, I assure you, not the case.\footnote{G.~P\'olya, ``Two Incidents," \emx
{Scientists at Work: Festschrift in Honour of Herman Wold,} ed. T. Dalenius, G. Karlsson, and S.
Malmquist (Uppsala: Almquist \& Wiksells Boktryckeri AB, 1970).}\index{P\'OLYA, G.}
\end{quote}
This set him to thinking about whether random walkers were destined to meet.
\par
P\'olya considered random walkers in one, two, and three dimensions\index{random walk!in $n$ 
dimensions}.  In one dimension, he envisioned the walker on a very long street.  At each
intersection the walker flips a fair coin to decide which direction to walk
next (see Figure~\ref {fig 1.5}a).  In two dimensions, the walker is walking
on a grid of streets, and at each intersection he chooses one of the four
possible directions with equal probability (see Figure~\ref {fig 1.5}b).  In
three dimensions (we might better speak of a random climber), the walker
moves on a three-dimensional grid, and at each intersection there are now six
different directions that the walker may choose, each with equal probability
(see Figure~\ref {fig 1.5}c).
\par
The reader is referred to Section~\ref{sec 12.1}, where this and related problems are 
discussed.

\begin{enumerate}

\item Write a program to simulate a random walk in one dimension starting at
0.  Have your program print out the lengths of the times between returns to
the
starting point (returns to 0).  See if you can guess from this simulation the
answer to the following question: Will the walker always return to his
starting
point eventually or might he drift away forever?

\item The paths of two walkers in two dimensions who meet after $n$ steps
can be considered to be a single path that starts at $(0,0)$ and returns to
$(0,0)$ after $2n$ steps.  This means that the probability that two random
walkers in two dimensions meet is the same as the probability that a single
walker in two dimensions ever returns to the starting point.  Thus the
question of whether two walkers are sure to meet is the same as the question
of
whether a single walker is sure to return to the starting point.

Write a program to simulate a random walk in two dimensions and see if you
think that the walker is sure to return to $(0,0)$.  If so, P\'olya would be
sure
to keep meeting his friends in the park.  Perhaps by now you have conjectured 
the answer to the question: Is a random walker in one or two dimensions sure
to
return to the starting point?  P\'olya answered this question for dimensions
one, two, and three.  He established the remarkable result that the answer is
\emx {yes} in one and two dimensions and \emx {no} in three dimensions.

\item Write a program to simulate a random walk in three dimensions and see
whether, from this simulation and the results of (a) and (b), 
you could have guessed P\'olya's result.  
\end{enumerate}

\end{LJSItem}

\section{Discrete Probability Distributions}\label{sec 1.2}  

In this book we shall study many different experiments from a probabilistic
point of view.  What is involved in this study will become evident as the
theory is developed and examples are analyzed.  However, the overall idea can
be described and illustrated as follows: to each experiment that we consider
there will be associated a random variable, which represents the outcome of
any particular experiment.  The set of possible outcomes is called the \emx
{sample space}\index{sample space}.  In the first part of  this section, we will consider the
case  where the experiment has only finitely many possible outcomes, i.e., the sample space is
finite.  We will then generalize to the case that the  sample space is either finite or
countably infinite.  This leads us to the following definition.

\subsection*{Random Variables and Sample Spaces}

\begin{definition}\label{def 1.1} 
Suppose we have an experiment whose outcome depends on chance.
We represent the outcome of the experiment by
a capital Roman letter, such as $X$, called a \emx {random variable}\index{random variable}.
The \emx {sample space}\index{sample space} of the experiment is the set of all possible 
outcomes.  If the sample space is either finite or countably infinite, the random variable
is said to be \emx {discrete}.\index{random variable!discrete}
\end{definition}
We generally denote a sample space by the capital Greek letter $\Omega$.  As
stated above, in the correspondence between an experiment and the
mathematical theory by which it is studied, the sample space $\Omega$ corresponds to the
set of possible outcomes of the experiment.    
\par
We now make two additional definitions.  These are subsidiary to the
definition of sample space and serve to make precise some of the common terminology used
in conjunction with sample spaces.  First of all, we define the elements of a sample space
to be \emx {outcomes}\index{outcome}.  Second, each subset of a sample space is defined to be 
an \emx {event}\index{event}. Normally, we shall denote outcomes by lower case letters and
events  by capital letters.
 
\begin{example}\label{exam 1.5}
A die is rolled once.  We let $X$ denote the outcome of this experiment.  Then the sample
space for this experiment is the 6-element set
$$
\Omega = \{1,2,3,4,5,6\}\ ,
$$
where each outcome $i$, for~$i = 1$, \dots, 6, corresponds to the number of
dots on the face which turns up.  The event
$$
E = \{2,4,6\}
$$
corresponds to the statement that the result of the roll is an even number.  The event $E$
can also be described by saying that $X$ is even.  Unless there is reason to believe the die is
loaded, the natural assumption is that every outcome is equally likely.  Adopting this
convention means that we assign a probability of 1/6 to each of the six outcomes, i.e., $m(i) =
1/6$, for $1 \le i \le 6$.
\end{example}

\subsection*{Distribution Functions}

We next describe the assignment of probabilities.  The definitions are
motivated by the example above, in which we assigned to each outcome of the
sample space a nonnegative number such that the sum of the numbers assigned
is equal to 1.

\begin{definition}\label{def 1.2} 
Let $X$ be a random variable which denotes the value of the outcome of a certain experiment,
and assume that this experiment has only finitely many possible outcomes.  Let $\Omega$ be the
sample space of the experiment (i.e., the set of all possible values of $X$, or equivalently,
the set of all possible outcomes of the experiment.)  A \emx {distribution
function}\index{distribution function} for $X$ is a real-valued
function $m$ whose domain is $\Omega$ and which satisfies:
\begin{enumerate}
\item $m(\omega) \geq 0\ , \qquad$for all $\,\,\omega\in\Omega\ $, and  
\medbreak
\item $\sum\limits_{\omega \in \Omega} m(\omega) = 1\ $.  
\end{enumerate}
For any subset $E$ of $\Omega$, we define the \emx {probability}\index{probability!of an event}
of
$E$ to be the number $P(E)$ given by
$$
P(E) = \sum_{\omega\in E} m(\omega)\ .
$$
\end{definition}
 
\begin{example}\label{exam 1.6}
Consider an experiment in which a coin is tossed twice.  Let $X$ be the random variable which
corresponds to this experiment.  We note that there are several ways to record the outcomes of
this experiment.  We could, for example, record the two tosses, in the order in which they
occurred.  In this case, we have $\Omega = $\{HH,HT,TH,TT\}.  We could also record the outcomes
by simply noting the number of heads that appeared.  In this case, we have $\Omega =
$\{0,1,2\}.  Finally, we could record the two outcomes, without regard to the order in
which they occurred.  In this case, we have $\Omega = $\{HH,HT,TT\}.
\par
We will use, for the moment, the first of the sample spaces given above.  We will assume that
all four outcomes are equally likely, and define the distribution function $m(\omega)$ by
$$
m(\mbox{HH}) = m(\mbox{HT}) = m(\mbox{TH}) = m(\mbox{TT}) = \frac14\ .
$$
\par
Let $E = $\{HH,HT,TH\} be the event that at least one head comes up.  Then, the probability
of $E$ can be calculated as follows: 
\begin{eqnarray*}
P(E) &=& m(\mbox{HH}) + m(\mbox{HT}) + m(\mbox{TH}) \\
     &=& \frac14 + \frac14 + \frac14 = \frac34\ .
\end{eqnarray*}

Similarly, if $F = $\{HH,HT\} is the event that heads comes up on the first toss, then we have
\begin{eqnarray*}
P(F) &=& m(\mbox{HH}) + m(\mbox{HT}) \\
     &=& \frac14 + \frac14 = \frac12\ .
\end{eqnarray*}
\end{example}
 
\begin{example}\label{1.8}(Example \ref{exam 1.5} continued)
The sample space for the experiment in which the die is rolled is the
6-element
set $\Omega = \{1,2,3,4,5,6\}$.  We assumed that the die was fair, and we chose the
distribution function defined by
$$
m(i) = \frac16, \qquad {\rm{for}}\,\, i = 1, \dots, 6\ .
$$
If $E$ is the event that the result of the roll is an even number, then $E =
\{2,4,6\}$ and
\begin{eqnarray*}
P(E) &=& m(2) + m(4) + m(6) \\
     &=& \frac16 + \frac16 + \frac16 = \frac12\ .
\end{eqnarray*}
\end{example}
\par
Notice that it is an immediate consequence of the above definitions that, for
every $\omega \in \Omega$,
$$
P(\{\omega\}) = m(\omega)\ .
$$
That is, the probability of the elementary event $\{\omega\}$, consisting of
a single outcome $\omega$, is equal to the value $m(\omega)$ assigned to the
outcome $\omega$ by the distribution function.
 
\begin{example}\label{exam 1.7}
Three people, A,~B, and~C, are running for the same office, and we assume
that
one and only one of them wins.  The sample space may be taken as the
3-element
set $\Omega = $\{A,B,C\} where each element corresponds to the outcome of
that candidate's winning.  Suppose that A and B have the same chance of
winning, but that C has only 1/2 the chance of A or B.  Then we assign
$$
m(\mbox{A}) = m(\mbox{B}) = 2m(\mbox{C})\ .
$$
Since
$$
m(\mbox{A}) + m(\mbox{B}) + m(\mbox{C}) = 1\ ,
$$
we see that
$$
2m(\mbox{C}) + 2m(\mbox{C}) + m(\mbox{C}) = 1\ ,
$$
which implies that $5m(\mbox{C}) = 1$.  Hence,
$$
m(\mbox{A}) = \frac25\ , \qquad m(\mbox{B}) = \frac25\ , \qquad m(\mbox{C}) =
\frac15\ .
$$
Let $E$ be the event that either A or C wins.  Then $E = $\{A,C\}, and
$$
P(E) = m(\mbox{A}) + m(\mbox{C}) = \frac25 + \frac15 = \frac35\ .
$$
\end{example}
\par
In many cases, events can be described in terms of other events through the use of
the standard constructions of set theory.  We will briefly review the definitions
of these constructions.  The reader is referred to Figure~\ref{fig 1.6} for Venn diagrams
which illustrate these constructions. 
\par
Let $A$ and $B$ be two sets.  Then the union of $A$ and $B$ is the set
$$A \cup B = \{x\,|\, x \in A\ \mbox{or}\ x \in B\}\ .$$
The intersection of $A$ and $B$ is the set
$$A \cap B = \{x\,|\, x \in A\ \mbox{and}\ x \in B\}\ .$$
The difference of $A$ and $B$ is the set
$$A - B = \{x\,|\, x \in A\ \mbox{and}\ x \not \in B\}\ .$$
The set $A$ is a subset of $B$, written 
$A \subset B$, if every element of $A$ is also an element of $B$.  Finally,
the complement of $A$ is the set
$$\tilde A = \{x\,|\, x \in \Omega\ \mbox{and}\ x \not \in A\}\ .$$
\par
The reason that these constructions are important is that it is typically the
case that
complicated events described in English can be broken down into simpler
events using
these constructions.  For example, if $A$ is the event that ``it will snow
tomorrow and it
will rain the next day," $B$ is the event that ``it will snow tomorrow," and
$C$ is
the event that ``it will rain two days from now," then $A$ is the intersection
of the events
$B$ and $C$.  Similarly, if $D$ is the event that ``it will snow tomorrow or
it will rain the next day," then $D = B \cup C$.  (Note that care must be
taken here,
because sometimes the word ``or" in English means that exactly one of the two
alternatives
will occur.  The meaning is usually clear from context.  In this book, we
will always use
the word ``or" in the inclusive sense, i.e., $A$ or $B$ means that at least
one of the two
events $A$, $B$ is true.)  The event $\tilde B$ is the event that ``it will
not snow
tomorrow."  Finally, if $E$ is the event that ``it will snow tomorrow but it
will not rain 
the next day," then $E = B - C$. 

\putfig{5truein}{PSfig1-7}{Basic set operations.}{fig 1.6} 


\subsection*{Properties}
\nopagebreak
\begin{theorem}\label{thm 1.1}
The probabilities assigned to events by a distribution function\index{distribution function!
properties of} on a sample space
$\Omega$ 
satisfy the following properties: 
\begin{enumerate}
\item  $P(E) \geq 0$ for every $E \subset \Omega\ $.
\item  $P(\Omega) = 1\ $.
\item  If $E \subset F \subset \Omega$, then $P(E) \leq P(F)\ $.
\item  If $A$ and $B$ are \emx {disjoint} subsets of $\Omega$, then $P(A
\cup B)
= P(A) + P(B)\ $.
\item  $P(\tilde A) = 1 - P(A)$ for every $A \subset \Omega\ $.
\end{enumerate}
\proof
For any event $E$ the probability $P(E)$ is determined from the distribution $m$
by
$$
P(E) = \sum_{\omega \in E} m(\omega)\ ,
$$
for every $E \subset \Omega$.
Since the function $m$ is nonnegative, it follows that $P(E)$ is also
nonnegative.  Thus, Property 1 is true.  
\par
Property 2 is proved by the equations
$$
P(\Omega) = \sum_{\omega \in \Omega} m(\omega) = 1\ .
$$
\par
Suppose that $E \subset F \subset \Omega$.  Then every element $\omega$ that
belongs to $E$ also belongs to $F$.  Therefore,
$$
\sum_{\omega \in E} m(\omega) \leq \sum_{\omega \in F} m(\omega)\ ,
$$
since each term in the left-hand sum is in the right-hand sum, and all the terms 
in both sums are non-negative.  This implies that
$$P(E) \le P(F)\ ,$$
and Property 3 is proved.  
\par
Suppose next that $A$ and $B$ are disjoint subsets of
$\Omega$.  Then every element $\omega$ of $A \cup B$ lies either in $A$ and
not
in $B$ or in $B$ and not in $A$.  It follows that
$$\begin{array}{ll}
P(A \cup B) &= \sum_{\omega \in A \cup B} m(\omega) = \sum_{\omega \in A}
m(\omega) + \sum_{\omega \in B} m(\omega) \\
            &                             \\
            &= P(A) + P(B)\ ,
\end{array}
$$
and Property 4 is proved.  
\par
Finally, to prove Property 5, consider the disjoint union
$$
\Omega = A \cup \tilde A\ .
$$
Since $P(\Omega) = 1$, the property of disjoint additivity (Property 4) implies
that
$$
1 = P(A) + P(\tilde A)\ ,
$$
whence $P(\tilde A) = 1 - P(A)$.
\end{theorem}
\par
It is important to realize that Property~4 in Theorem~\ref{thm 1.1} 
can be extended to more than two sets.  The general finite additivity
property\index{finite additivity property} is given by the following
theorem.\hfil\break
\begin{theorem}\label{thm 1.1.5}
If $A_1$, \dots, $A_n$ are pairwise disjoint subsets of $\Omega$
(i.e., no two of the $A_i$'s have an element in common), then
$$
P(A_1 \cup \cdots \cup A_n) = \sum_{i = 1}^n P(A_i)\ .
$$
\proof
Let $\omega$ be any element in the union
$$A_1 \cup \cdots \cup A_n\ .$$
Then $m(\omega)$ occurs exactly once on each side of the equality in the
statement of the theorem.
\end{theorem} 
\par
We shall often use the following consequence of the above theorem.
\begin{theorem}\label{thm 1.2}
 Let $A_1$, \dots, $A_n$ be pairwise disjoint events with $\Omega = A_1
\cup \cdots \cup A_n$, and let $E$ be any event.  Then
$$
P(E) = \sum_{i = 1}^n P(E \cap A_i)\ .
$$
\proof
The sets $E \cap A_1$, \dots, $E \cap A_n$ are pairwise disjoint, and their 
union is the set $E$.  The result now follows from Theorem~\ref{thm 1.1.5}.
\end{theorem}
\begin{corollary}\label{cor 1.1}
For any two events $A$ and $B$,
$$
P(A) = P(A \cap B) + P(A \cap \tilde B)\ .
$$
\end{corollary}
\par
Property~4 can be generalized in another way.  Suppose that $A$ and $B$ are
subsets of $\Omega$ which are not necessarily disjoint.  Then:\hfil\break
\begin{theorem}\label{thm 1.2.5}
If $A$ and $B$ are subsets of $\Omega$, then
\begin{equation}
P(A \cup B) = P(A) + P(B) - P(A \cap B)\ .
\label{eq 1.1}
\end{equation}                                                        
\proof
The left side of Equation~\ref{eq 1.1} is the sum of $m(\omega)$ for     
$\omega$ in either $A$ or $B$.  We must show that the right side of 
Equation~\ref{eq 1.1} also adds $m(\omega)$ for $\omega$ in $A$ or $B$. 
If $\omega$ is in exactly one of the two sets, then it is counted in only one
of the three terms on the right side of Equation~\ref{eq 1.1}.                
If it is in both $A$ and $B$, it is added twice from the calculations of
$P(A)$ and $P(B)$
and subtracted once for $P(A \cap B)$.  Thus it is counted exactly once by
the right side.  Of course, if $A \cap B = \emptyset$, then Equation~\ref{eq 1.1} 
reduces to Property~4.
(Equation~\ref{eq 1.1} can also be generalized; see Theorem~\ref{thm 3.10}.)               
\end{theorem}


\subsection*{Tree Diagrams}

\begin{example}\label{exam 1.8}
Let us illustrate the properties of probabilities of events in terms of three
tosses of a coin.  When we have an experiment which takes place in stages
such
as this, we often find it convenient to represent the outcomes by a \emx {tree
diagram}\index{tree diagram} as shown in Figure~\ref {fig 1.7}.

\putfig{4truein}{PSfig1-8}{Tree diagram for three tosses of a coin.}{fig 1.7} 

A \emx {path} through the tree corresponds to a possible outcome of the
experiment.  For the case of three tosses of a coin, we have eight paths
$\omega_1$,~$\omega_2$, \dots,~$\omega_8$ and, assuming each outcome to be
equally likely, we assign equal weight, 1/8, to each path.  Let $E$ be the
event
``at least one head turns up."  Then $\tilde E$ is the event ``no heads turn
up."  This event occurs for only one outcome, namely, $\omega_8 = \mbox{TTT}$. 
Thus, $\tilde E = \{\mbox{TTT}\}$ and we have
$$
P(\tilde E) = P(\{\mbox{TTT}\}) = m(\mbox{TTT}) = \frac18\ .
$$
By Property~5 of Theorem~\ref{thm 1.1},
$$
P(E) = 1 - P(\tilde E) = 1 - \frac18 = \frac78\ .
$$
Note that we shall often find it is easier to compute the probability that an
event does not happen rather than the probability that it does.  We then use
Property~5 to obtain the desired probability.

Let $A$ be the event ``the first outcome is a head," and $B$ the event ``the
second outcome is a tail."  By looking at the paths in Figure~\ref{fig 1.7},
we see that
$$
P(A) = P(B) = \frac12\ .
$$
Moreover, $A \cap B = \{\omega_3,\omega_4\}$, and so $P(A \cap B) = 1/4.$ 
Using Theorem~\ref{thm 1.2.5}, we obtain
\begin{eqnarray*}
P(A \cup B) & = & P(A) + P(B) - P(A \cap B) \\
& = & \frac 12 + \frac 12 - \frac 14 = \frac 34\ .
\end{eqnarray*}
Since $A \cup B$ is the 6-element set,
$$
A \cup B = \{\mbox{HHH,HHT,HTH,HTT,TTH,TTT}\}\ ,
$$
we see that we obtain the same result by direct enumeration.
\end{example}
\par
In our coin tossing examples and in the die rolling example, we have assigned
an equal probability to each possible outcome of the experiment. 
Corresponding
to this method of assigning probabilities, we have the following definitions.

\subsection*{Uniform Distribution}

\begin{definition}\label{def 1.3} 
The \emx {uniform distribution}\index{uniform distribution} on a sample space $\Omega$ 
containing $n$ elements is the function $m$ defined by
$$
m(\omega) = \frac1n\ ,
$$
for every $\omega \in \Omega$.
\end{definition}
It is important to realize that when an experiment is analyzed to describe
its
possible outcomes, there is no single correct choice of sample space.  For
the
experiment of tossing a coin twice in Example~\ref{exam 1.1},
we selected the 4-element set $\Omega = $\{HH,HT,TH,TT\} as a sample space
and
assigned the uniform distribution function.  These choices are certainly
intuitively natural.  On the other hand, for some purposes it may be more
useful
to consider the 3-element sample space $\bar\Omega = \{0,1,2\}$ in which
0
is the outcome ``no heads turn up," 1 is the outcome ``exactly one head
turns up," and 2 is the outcome ``two heads turn up."  The distribution function
$\bar m$ on $\bar\Omega$ defined by the equations
$$
\bar m(0) = \frac14\ ,\qquad \bar m(1) = \frac12\ , \qquad \bar
m(2) =
\frac14
$$
is the one corresponding to the uniform probability density on the original
sample space $\Omega$.  Notice that it is perfectly possible to choose a
different distribution function.  For example, we may consider the uniform
distribution function on $\bar\Omega$, which is the function $\bar q$
defined by
$$
\bar q(0) = \bar q(1) = \bar q(2) = \frac13\ .
$$
Although $\bar q$ is a perfectly good distribution function, it is not
consistent with observed data on coin tossing.
\begin{example}\label{exam 1.11}
Consider the experiment that consists of rolling a pair of dice.  We take as
the
sample space $\Omega$ the set of all ordered pairs $(i,j)$ of
integers with $1\leq i\leq 6$ and $1\leq j\leq 6$.  Thus,
$$
\Omega = \{\,(i,j):1\leq i,\space j \leq 6\,\}\ .
$$
(There is at least one other ``reasonable" choice for a sample space, namely the
set of all unordered pairs of integers, each between 1 and 6.  For a discussion
of why we do not use this set, see Example~\ref{exam 3.15}.)
To determine the size of $\Omega$, we note that there are six choices for $i$,
and for each choice of $i$ there are six choices for $j$, leading to 36 different
outcomes.  Let us assume that the dice are not loaded.  In mathematical terms, this
means
that we assume that each of the 36 outcomes is equally likely, or
equivalently,
that we adopt the uniform distribution function on $\Omega$ by setting
$$
m((i,j)) = \frac1{36},\qquad 1\leq i,\space j \leq 6\ .
$$
What is the probability of getting a sum of 7 on the roll of two dice---or
getting a sum of 11?  The first event, denoted by $E$, is the subset
$$
E = \{(1,6),(6,1),(2,5),(5,2),(3,4),(4,3)\}\ .
$$
A sum of 11 is the subset $F$ given by
$$
F = \{(5,6),(6,5)\}\ .
$$
Consequently,
$$\begin{array}{ll}
P(E) = &\sum_{\omega \in E} m(\omega) = 6\cdot\frac1{36} = \frac16\ , \\
       &                                                            \\
P(F) = &\sum_{\omega \in F} m(\omega) = 2\cdot\frac1{36} = \frac1{18}\ .
\end{array}
$$

What is the probability of getting neither \emx {snakeeyes}\index{snakeeyes} (double ones)
nor
\emx {boxcars}\index{boxcars} (double sixes)?  The event of getting either one of these two
outcomes is the set
$$
E = \{(1,1),(6,6)\}\ .
$$
Hence, the probability of obtaining neither is given by
$$
P(\tilde E) = 1 - P(E) = 1 - \frac2{36} = \frac{17}{18}\ .
$$
\end{example}
\par
In the above coin tossing and the dice rolling experiments, we have assigned
an equal probability to each outcome.  That is, in each example, we have chosen
the uniform distribution function.  These are the natural choices provided the
coin is a fair one and the dice are not loaded.  However, the decision as
to which distribution function to select to describe an experiment is \emx {not} a part
of the basic mathematical theory of probability.  The latter begins only when
the
sample space and the distribution function have already been defined.

\subsection*{Determination of Probabilities}

It is important to consider ways in which probability distributions are
determined in practice.  One way is by \emx {symmetry.}  For the case of the
toss of a coin, we do not see any physical difference between the two sides
of
a coin that should affect the chance of one side or the other turning up. 
Similarly, with an ordinary die there is no essential difference between any
two sides of the die, and so by symmetry we assign the same probability for
any
possible outcome.  In general, considerations of symmetry often suggest the
uniform distribution function.  Care must be used here.  We should not always assume
that,
just because we do not know any reason to suggest that one outcome is more
likely than another, it is appropriate to assign equal probabilities.  For
example, consider the experiment of guessing the sex of a newborn child.  It
has been observed that the proportion of newborn children who are boys is
about
.513.  Thus, it is more appropriate to assign a distribution function which
assigns probability .513 to the outcome \emx {boy} and probability .487 to
the
outcome \emx {girl} than to assign probability 1/2 to each outcome.  This is
an
example where we use statistical observations to determine probabilities. 
Note
that these probabilities may change with new studies and may vary from
country
to country.  Genetic engineering might even allow an individual to influence
this probability for a particular case.

\subsection*{Odds}\index{odds}

Statistical estimates for probabilities are fine if the experiment under
consideration can be repeated a number of times under similar circumstances. 
However, assume that, at the beginning of a football season, you want to
assign
a probability to the event that Dartmouth\index{Dartmouth} will beat Harvard\index{Harvard}.  
You really do not
have data that relates to this year's football team.  However, you can
determine your own personal probability by seeing what kind of a bet you
would
be willing to make.  For example, suppose that you are willing to make a 1
dollar bet giving 2 to 1 odds that Dartmouth will win.  Then you are willing
to
pay 2 dollars if Dartmouth loses in return for receiving 1 dollar if
Dartmouth
wins.  This means that you think the appropriate probability for Dartmouth
winning is 2/3.

Let us look more carefully at the relation between odds and probabilities.
Suppose that we make a bet at $r$ to $1$ odds that an event $E$ occurs. 
This means that we  think that it is $r$ times as likely that $E$ will occur
as that $E$
will not occur.  In general, $r$ to $s$ odds will be taken to mean the same
thing as
$r/s$ to 1, i.e., the ratio between the two numbers is the only quantity of
importance when stating odds.  
\par
Now if it is $r$ times as likely that $E$ will occur as that $E$ will
not occur, then the probability that $E$ occurs must be $r/(r+1)$, since we
have
$$P(E) = r\,P(\tilde E)$$
and
$$P(E) + P(\tilde E) = 1\ .$$
In general, the statement that the odds are $r$ to $s$ in favor of an event
$E$
occurring is equivalent to the statement that 
\begin{eqnarray*}
P(E) & = & \frac{r/s}{(r/s) + 1}\\
& = & \frac {r}{r+s}\ .
\end{eqnarray*}
If we let $P(E) = p$, then the above equation can easily be solved for $r/s$
in terms of
$p$; we obtain $r/s = p/(1-p)$.  We summarize the above discussion in the
following definition. 

\begin{definition}\label{def 1.4} 
If $P(E) = p$, the \emx {odds} in favor of the event $E$ occurring are $r :
s$ ($r$ to $s$) where $r/s = p/(1-p)$.  If $r$ and $s$ are given, then $p$ 
can be found by using the equation $p = r/(r+s)$.
\end{definition}
\begin{example}(Example~\ref{exam 1.7} continued)\label{exam 1.9}
In Example~\ref{exam 1.7} we assigned probability 1/5 to the event that
candidate C wins the race.  Thus the odds in favor of C winning are $1/5 :
4/5$.  These odds could equally well have been written as $1 : 4$, $2
: 8$, and so forth.  A bet that C wins is fair if we receive 4 dollars
if C wins and pay 1 dollar if C loses.
\end{example}

\subsection*{Infinite Sample Spaces}\index{sample space!infinite}

If a sample space has an infinite number of points, then the way that a
distribution function
is defined depends upon whether or not the sample space is countable.  A
sample
space is \emx {countably infinite}\index{sample space!countably infinite} 
if the elements can be counted, i.e., can be
put in one-to-one correspondence with the positive integers, and \emx {uncountably
infinite} otherwise.  Infinite sample spaces require new concepts in
general \choice{}{(see Chapter~\ref{chp 2})}, 
but countably infinite spaces do not.  If
$$
\Omega = \{\omega_1,\omega_2,\omega_3, \dots\}
$$
is a countably infinite sample space, then a distribution function is defined
exactly as in Definition \ref{def 1.2}, except that the sum must now
be a \emx {convergent} infinite sum. Theorem~\ref{thm 1.1}
is still true, as are its extensions Theorems~\ref{thm 1.1.5} and \ref{thm 1.2.5}.  
One thing we cannot do on a countably infinite sample space that we could do on 
a finite sample space is
to define a \emx {uniform} distribution function as in Definition~\ref{def
1.3}.
You are asked in Exercise~\ref{exer 1.2.19} to explain why this is not
possible.
\begin{example}\label{exam 1.10}
A coin is tossed until the first time that a head turns up.  Let the outcome
of
the experiment, $\omega$, be the first time that a head turns up.  Then the
possible outcomes of our experiment are
$$
\Omega = \{1,2,3, \dots\}\ .
$$
Note that even though the coin could come up tails every time we have not
allowed for this possibility.  We will explain why in a moment.  The
probability that heads comes up on the first toss is 1/2.  The probability
that
tails comes up on the first toss and heads on the second is 1/4.  The
probability that we have two tails followed by a head is 1/8, and so forth. 
This suggests assigning the distribution function $m(n) = 1/2^n$ for $n = 1$,~2, 3, \dots. 
To
see that this is a distribution function we must show that
$$
\sum_{\omega} m(\omega) = \frac12 + \frac14 + \frac18 + \cdots = 1\ .
$$
That this is true follows from the formula for the sum of a geometric series\index{geometric
series},
$$
1 + r + r^2 + r^3 + \cdots = \frac1{1-r}\ ,
$$
or
\begin{equation}
r + r^2 + r^3 + r^4 + \cdots = \frac r{1-r}\ ,
\label{eq 1.2}
\end{equation} 
for $-1 < r < 1$.
\par
Putting $r = 1/2$, we see that we have a probability of 1 that the coin
eventually turns up heads.  The possible outcome of tails every time has to
be
assigned probability 0, so we omit it from our sample space of possible
outcomes.

Let $E$ be the event that the first time a head turns up is after an even
number of tosses.  Then
$$
E = \{2,4,6,8, \dots\}\ ,
$$
and
$$
P(E) = \frac14 + \frac1{16} + \frac1{64} +\cdots\ .
$$
Putting $r = 1/4$ in Equation~\ref{eq 1.2} see that              
$$
P(E) = \frac{1/4}{1 - 1/4} = \frac13\ .
$$
Thus the probability that a head turns up for the first time after an even
number of tosses is 1/3 and after an odd number of tosses is 2/3.
\end{example}

\subsection*{Historical Remarks}

An interesting question in the history of science is: Why was probability not
developed until the sixteenth century?  We know that in the sixteenth century
problems in gambling and games of chance made people start to think about
probability.  But gambling and games of chance are almost as old as
civilization itself.  In ancient Egypt\index{Egypt} (at the time of the First Dynasty,
ca.~3500~{\footnotesize\rm {B.C.}}) a game now called ``Hounds and Jackals"
was played. 
In this game the movement of the hounds and jackals was based on the outcome
of
the roll of four-sided dice made out of animal bones called astragali. 
Six-sided
dice made of a variety of materials date back to the sixteenth
century~{\footnotesize\rm {B.C.}}  Gambling
was widespread in ancient Greece\index{Greece} and Rome\index{Rome}.  Indeed, 
in the Roman Empire it was
sometimes found necessary to invoke laws against gambling.  Why, then, were
probabilities not calculated until the sixteenth century?

Several explanations have been advanced for this late development.  One is
that
the relevant mathematics was not developed and was not easy to develop.  The
ancient mathematical notation made numerical calculation complicated, and our
familiar algebraic notation was not developed until the sixteenth century. 
However, as we shall see, many of the combinatorial ideas needed to calculate
probabilities were discussed long before the sixteenth century.  Since many
of
the chance events of those times had to do with lotteries relating to
religious
affairs, it has been suggested that there may have been religious barriers to
the study of chance and gambling.  Another suggestion is that a stronger
incentive, such as the development of commerce, was necessary.  However, none
of these explanations seems completely satisfactory, and people still wonder
why it took so long for probability to be studied seriously.  An interesting
discussion of this problem can be found in Hacking\index{HACKING, I.}.\footnote{I.~Hacking,
\emx
{The Emergence of Probability} (Cambridge: Cambridge University Press,
1975).}
\par
The first person to calculate probabilities systematically was Gerolamo
Cardano\index{CARDANO, G.|(}
(1501--1576) in his book \emx {Liber de Ludo Aleae.}  This was translated from
the Latin by Gould and appears in the book \emx {Cardano: The Gambling
Scholar}
by Ore\index{ORE, O.}.\footnote{O.~Ore, \emx {Cardano: The Gambling Scholar}
(Princeton:
Princeton University Press, 1953).}  Ore provides a fascinating discussion of
the life of this colorful scholar with accounts of his interests in many
different fields, including medicine, astrology, and mathematics.  You will
also find there a detailed account of Cardano's famous battle with Tartaglia
over the solution to the cubic equation.

In his book on probability Cardano dealt only with the special case that we
have called the uniform distribution function.  This restriction to equiprobable outcomes
was
to continue for a long time.  In this case Cardano realized that the
probability that an event occurs is the ratio of the number of favorable
outcomes to the total number of outcomes.

Many of Cardano's examples dealt with rolling dice.  Here he realized that
the
outcomes for two rolls should be taken to be the 36 ordered pairs $(i,j)$
rather than the 21 unordered pairs.  This is a subtle point that was still
causing problems much later for other writers on probability.  For example,
in
the eighteenth century the famous French mathematician d'Alembert, author of
several works on probability, claimed that when a coin is tossed twice
the number of heads that turn up would be 0, 1, or~2, and hence we should
assign equal probabilities for these three possible
outcomes.\footnote{J.~d'Alembert, ``Croix ou Pile," in \emx
{L'Encyclop\'edie,}
ed.~Diderot, vol.~4 (Paris, 1754).}  Cardano chose the correct sample space
for
his dice problems and calculated the correct probabilities for a variety of
events.
\par
Cardano's mathematical work is interspersed with a lot of advice to the
potential gambler in short paragraphs, entitled, for example: ``Who Should
Play
and When," ``Why Gambling Was Condemned by Aristotle," ``Do Those Who Teach
Also Play Well?" and so forth.  In a paragraph entitled ``The Fundamental
Principle of Gambling," Cardano writes:
\begin{quote}
The most fundamental principle of all in gambling is simply equal conditions,
e.g., of opponents, of bystanders, of money, of situation, of the dice box,
and
of the die itself.  To the extent to which you depart from that equality, if
it
is in your opponent's favor, you are a fool, and if in your own, you are
unjust.\footnote{O.~Ore, op.\ cit., p.~189.}
\end{quote}

Cardano did make mistakes, and if he realized it later he did not go back and
change his error.  For example, for an event that is favorable in three out
of
four cases, Cardano assigned the correct odds $3 : 1$ that the event will
occur.  But then he assigned odds by squaring these numbers (i.e., $9 :
1$) for the event to happen twice in a row.  Later, by considering the case
where the odds are $1 : 1$, he realized that this cannot be correct and
was led to the correct result that when $f$ out of $n$ outcomes are
favorable, the odds for a favorable outcome twice in a row are $f^2 : n^2 -
f^2$.  Ore points out that this is equivalent to the realization that if the
probability that an event happens in one experiment is $p$, the probability
that
it happens twice is $p^2$.  Cardano proceeded to establish that for three
successes the formula should be $p^3$ and for four successes $p^4$, making it
clear that he understood that the probability is $p^n$ for $n$ successes in
$n$
independent repetitions of such an experiment.  This will follow from the
concept of independence that we introduce in Section~\ref{sec 4.1}\index{CARDANO, G.|)}. 
\par
Cardano's work was a remarkable first attempt at writing down the laws of
probability, but it was not the spark that started a systematic study of the
subject.  This came from a famous series of letters between Pascal and Fermat. 
This correspondence was initiated by Pascal to consult Fermat about problems he
had been given by Chevalier de M\'er\'e,\index{de M\'ER\'E, CHEVALIER} a well-known writer, a
prominent figure at the court of Louis XIV, and an ardent gambler.
\par
The first problem de M\'er\'e posed was a dice problem.  The story goes that he
had been betting that at least one six would turn up in four rolls of a die and
winning too often, so he then bet that a pair of sixes would turn up in 24
rolls of a pair of dice.  The probability of a six with one die is 1/6 and, by
the product law for independent experiments, the probability of two sixes when
a pair of dice is thrown is $(1/6)(1/6) = 1/36$.  Ore\footnote{O.\ Ore,
``Pascal and the Invention of Probability Theory,'' \emx {American Mathematics
Monthly}, vol. 67 (1960), pp. 409--419.}\index{ORE, O.} claims that a gambling rule of the
time suggested that, since four repetitions was favorable for the occurrence of
an event with probability 1/6, for an event six times as unlikely, $6 \cdot 4 =
24$ repetitions would be sufficient for a favorable bet.  Pascal showed, by
exact calculation, that 25 rolls are required for a favorable bet for a pair of
sixes.
\par
The second problem was a much harder one: it was an old problem and concerned
the determination of a fair division of the stakes in a tournament when the
series, for some reason, is interrupted before it is completed.  This problem
is now referred to as the problem of points.\index{problem of points}  The problem
had been a standard problem in mathematical texts; it appeared in Fra Luca
Paccioli's book \emx {summa de Arithmetica, Geometria, Proportioni et
Proportionalit\`a}, printed in Venice in 1494,\footnote{ibid., p. 414.} in the
form:
\begin{quote}
A team plays ball such that a total of 60 points are required to win the
game, and each inning counts 10 points.  The stakes are 10 ducats.  By some
incident they cannot finish the game and one side has 50 points and the other
20.  One wants to know what share of the prize money belongs to each side.  In
this case I have found that opinions differ from one to another but all seem to
me insufficient in their arguments, but I shall state the truth and give the
correct way.
\end{quote}
\par
Reasonable solutions, such as dividing the stakes according to the ratio of
games won by each player, had been proposed, but no correct solution had been
found at the time of the Pascal-Fermat correspondence.\index{PASCAL, B.|(}\index{FERMAT, P.|(} 
The letters deal mainly with the attempts of Pascal and Fermat to solve this problem.  Blaise
Pascal (1623--1662) was a child prodigy, having published his
treatise on conic sections at age sixteen, and having invented a calculating
machine at age eighteen.  At the time of the letters, his demonstration of the weight of the
atmosphere had already established his position at the forefront of
contemporary physicists.  Pierre de~Fermat (1601--1665) was a learned jurist in
Toulouse, who studied mathematics in his spare time.  He has been called by some the prince of
amateurs and one of the greatest pure mathematicians of all times.
\par
The letters, translated by Maxine Merrington, appear in Florence David's
fascinating historical account of probability, \emx {Games, Gods and Gambling}.\footnote{F. N.
David, \emx {Games, Gods and Gambling\/} (London: G. Griffin, 1962), p. 230
ff.}\index{DAVID, F.\ N.}  In a letter dated Wednesday, 29th July, 1654, Pascal writes to
Fermat:
\begin{quote}
Sir,
\par
Like you, I am equally impatient, and although I am again ill in bed, I
cannot help telling you that yesterday evening I received from M. de Carcavi
your letter on the problem of points, which I admire more than I can possibly
say.  I have not the leisure to write at length, but, in a word, you have
solved the two problems of points, one with dice and the other with sets of
games with perfect justness; I am entirely satisfied with it for I do not doubt
that I was in the wrong, seeing the admirable agreement in which I find myself
with you now\dots
\par
Your method is very sound and is the one which first came to my mind in this
research; but because the labour of the combination is excessive, I have found
a short cut and indeed another method which is much quicker and neater, which I
would like to tell you here in a few words: for henceforth I would like to open
my heart to you, if I may, as I am so overjoyed with our agreement.  I see that
truth is the same in Toulouse as in Paris.
\par
Here, more or less, is what I do to show the fair value of each game, when two
opponents play, for example, in three games and each person has staked 32
pistoles.
\par
Let us say that the first man had won twice and the other once; now they play
another game, in which the conditions are that, if the first wins, he takes all
the stakes; that is 64 pistoles; if the other wins it, then they have each won
two games, and therefore, if they wish to stop playing, they must each take
back their own stake, that is, 32 pistoles each.
\par
Then consider, Sir, if the first man wins, he gets 64 pistoles; if he loses he
gets 32.  Thus if they do not wish to risk this last game but wish to separate
without playing it, the first man must say: `I am certain to get 32 pistoles,
even if I lost I still get them; but as for the other 32, perhaps I will get
them, perhaps you will get them, the chances are equal.  Let us then divide
these 32 pistoles in half and give one half to me as well as my 32 which are
mine for sure.'  He will then have 48 pistoles and the other 16\dots
\end{quote}

Pascal's argument produces the table illustrated in Figure~\ref{fig 1.9} for the amount
due player A at any quitting point.
\putfig{3truein}{PSfig1-9}{Pascal's table.}{fig 1.9} 
\par
Each entry in the table is the average of the numbers just above and to the
right of the number.  This fact, together with the known values when the
tournament is completed, determines all the values in this table.  If player A
wins the first game, then he needs two games to win and B needs three games to
win; and so, if the tounament is called off, A should receive 44 pistoles.
\par
The letter in which Fermat presented his solution has been lost; but
fortunately, Pascal describes Fermat's method in a letter dated Monday, 24th
August, 1654.  From Pascal's letter:\footnote{ibid., p. 239ff.}
\begin{quote}
This is your procedure when there are two players: If two
players, playing several games, find themselves in that position when the first
man needs \emx {two} games and second needs \emx {three}, then to find the fair
division of stakes, you say that one must know in how many games the play will
be absolutely decided.
\par
It is easy to calculate that this will be in \emx {four} games, from which you
can conclude that it is necessary to see in how many ways four games can be
arranged between two players, and one must see how many combinations would make
the first man win and how many the second and to share out the stakes in this
proportion.  I would have found it difficult to understand this if I had not
known it myself already; in fact you had explained it with this idea in mind.
\end{quote}
\par
Fermat realized that the number of ways that the game might be finished may not
be equally likely.  For example, if A needs two more games and B needs three to
win, two possible ways that the tournament might go for A to win are WLW and
LWLW.  These two sequences do not have the same chance of occurring.  To avoid
this difficulty, Fermat extended the play, adding fictitious plays, so that all
the ways that the games might go have the same length, namely four.  He was
shrewd enough to realize that this extension would not change the winner and
that he now could simply count the number of sequences favorable to each player
since he had made them all equally likely.  If we list all possible ways that
the extended game of four plays might go, we obtain the following 16 possible
outcomes of the play:
\vskip .1in
\begin{table}[h]
\centering
\begin{tabular}{llll}
{\underline{WWWW}} & {\underline{WLWW}} & {\underline{LWWW}} &
{\underline{LLWW}} \\
{\underline{WWWL}} & {\underline{WLWL}} &
{\underline{LWWL}} & {LLWL} \\
{\underline{WWLW}} &
{\underline{WLLW}} & {\underline{LWLW}} & {LLLW} \\
{\underline{WWLL}} & {WLLL} & {LWLL} & {LLLL}\ .
\end{tabular}
\end{table}
\vskip .1in
Player A wins in the cases where there are at least two wins (the 11
underlined cases), and B wins in the cases where there are at least three
losses (the other 5 cases).  Since A wins in 11 of the 16 possible cases Fermat
argued that the probability that A wins is 11/16.  If the stakes are 64
pistoles, A should receive 44 pistoles in agreement with Pascal's result. 
Pascal and Fermat developed more systematic methods for counting the number of
favorable outcomes for problems like this, and this will be one of our central
problems.  Such counting methods fall under the subject of \emx {combinatorics},
which is the topic of Chapter~\ref{chp 3}.
\par
We see that these two mathematicians arrived at two very different ways to
solve the problem of points.  Pascal's method was to develop an algorithm and
use it to calculate the fair division.  This method is easy to implement on a
computer and easy to generalize.  Fermat's method, on the other hand, was to
change the problem into an equivalent problem for which he could use counting
or combinatorial methods.  We will see in Chapter~\ref{chp 3} that, in fact, Fermat used 
what has become known as Pascal's triangle!  In our study of probability today we 
shall find that both the algorithmic approach and the combinatorial approach share equal
billing, just as they did 300 years ago when probability got its start.
\index{PASCAL, B.|)}\index{FERMAT, P.|)}

\exercises
\begin{LJSItem}

\i\label{exer1.2.1} Let $\Omega = \{a,b,c\}$ be a sample space.  Let $m(a)
= 1/2$, $m(b) =
1/3$, and $m(c) = 1/6$.  Find the probabilities for all eight subsets of
$\Omega$.

\i\label{exer 1.2.2} Give a possible sample space $\Omega$ for each of
the following experiments:
\begin{enumerate}
\item An election decides between two candidates A and B.
\item A two-sided coin is tossed.
\item A student is asked for the month of the year and the day of the week
on which her birthday falls.
\item A student is chosen at random from a class of ten students.
\item You receive a grade in this course.
\end{enumerate}

\i\label{exer 1.2.3} For which of the cases in Exercise~\ref{exer 1.2.2}  
would it be reasonable to assign the uniform distribution function?

\i\label{exer 1.2.4} Describe in words the events specified by
the following subsets of 
$$\Omega = \{HHH,\ HHT,\ HTH,\ HTT,\ THH,\ THT,\ TTH,\ TTT\}$$
(see Example~\ref{exam 1.5}).
\begin{enumerate}
\item $E = \{\mbox{HHH,HHT,HTH,HTT}\}$. 
\item $E = \{\mbox{HHH,TTT}\}$.
\item $E = \{\mbox{HHT,HTH,THH}\}$. 
\item $E =
\{\mbox{HHT,HTH,HTT,THH,THT,TTH,TTT}\}$.
\end{enumerate}

\i\label{exer 1.2.5} What are the probabilities of the events described in
Exercise~\ref{exer 1.2.4}?

\i\label{exer 1.2.6} A die is loaded in such a way that the probability of
each face turning
up is proportional to the number of dots on that face.  (For example, a six
is
three times as probable as a two.)  What is the probability of getting an
even
number in one throw?

\i\label{exer 1.2.7} Let $A$ and $B$ be events such that $P(A \cap B) =
1/4$, $P(\tilde A) =
1/3$, and $P(B) = 1/2$.  What is $P(A \cup B)$?

\i\label{exer 1.2.8} A student must choose one of the subjects, art,
geology, or psychology,
as an elective.  She is equally likely to choose art or psychology and twice
as
likely to choose geology.  What are the respective probabilities that she
chooses art, geology, and psychology?

\i\label{exer 1.2.9} A student must choose exactly two out of three
electives: art, French,
and mathematics.  He chooses art with probability 5/8, French with
probability
5/8, and art and French together with probability 1/4.  What is the
probability
that he chooses mathematics?  What is the probability that he chooses either
art or French?

\i\label{exer 1.2.10} For a bill to come before the president of the
United States, it must
be passed by both the House of Representatives and the Senate.  Assume that,
of
the bills presented to these two bodies, 60 percent pass the House, 80
percent
pass the Senate, and 90 percent pass at least one of the two.  Calculate the
probability that the next bill presented to the two groups will come before
the
president.

\i\label{exer 1.2.11} What odds should a person give in favor of the
following events?
\begin{enumerate}
\item A card chosen at random from a 52-card deck is an ace.
\item Two heads will turn up when a coin is tossed twice.
\item Boxcars (two sixes) will turn up when two dice are rolled.
\end{enumerate}

\i\label{exer 1.2.12} You offer $3 : 1$ odds that your friend Smith will
be elected mayor
of your city.  What probability are you assigning to the event that Smith
wins?

\i\label{exer 1.2.13} In a horse race, the odds that Romance will win are
listed as $2 :
3$ and that Downhill will win are $1 : 2$.  What odds should be given for
the event that either Romance or Downhill wins?

\i\label{exer 1.2.13.1} Let $X$ be a random variable with distribution function
$m_X(x)$ defined by
$$
m_X(-1) = 1/5,\ \  m_X(0) = 1/5,\ \ m_X(1) = 2/5,\ \  m_X(2) = 1/5\ .
$$
\begin{enumerate}
\item Let $Y$ be the random variable defined by the equation $Y = X + 3$.  Find the
distribution function $m_Y(y)$ of $Y$.
\item Let $Z$ be the random variable defined by the equation $Z = X^2$.  Find the 
distribution function $m_Z(z)$ of $Z$.
\end{enumerate}

\istar\label{exer 1.2.14} John and Mary are taking a mathematics course.  The
course has only
three grades: A, B, and C.  The probability that John gets a B is .3.  The
probability that Mary gets a B is .4.  The probability that neither gets an A
but at least one gets a B is .1.  What is the probability that at least one
gets a B but neither gets a C?

\i\label{exer 1.2.15} In a fierce battle, not less than 70 percent of the
soldiers lost one
eye, not less than 75 percent lost one ear, not less than 80 percent lost one
hand, and not less than 85 percent lost one leg.  What is the minimal
possible
percentage of those who simultaneously lost one ear, one eye, one hand, and
one
leg?\footnote{See Knot X, in Lewis Carroll, \emx {Mathematical Recreations,}
vol.~2 (Dover, 1958).}

\istar\label{exer 1.2.16}
Assume that the probability of a ``success" on
a single experiment with $n$ outcomes is $1/n$.  Let $m$ be the number of
experiments necessary to make it a favorable bet that at least one success
will
occur (see Exercise~\ref{sec 1.1}.\ref{exer 1.1.5}).

\begin{enumerate}
\item Show that the probability that, in $m$ trials, there are no successes
is $(1 - 1/n)^m$.

\item (de Moivre)\index{de MOIVRE, A.} Show that if $m = n \log 2$ then
$$
\lim_{n \to \infty} \left(1 - \frac1n \right)^m = \frac12\ .
$$
\emx {Hint}:
$$
\lim_{n \to \infty} \left(1 - \frac1n \right)^n = e^{-1}\ .
$$
Hence for large $n$ we should choose $m$ to be about $n \log 2$.

\item Would DeMoivre have been led to the correct answer for de~M\'er\'e's
\index{de M\'ER\'E, CHEVALIER}
two bets if he had used his approximation?
\end{enumerate}

\i\label{exer 1.2.17}
\begin{enumerate}
\item For events $A_1$, \dots, $A_n$, prove that 
$$P(A_1 \cup \cdots \cup A_n) \leq P(A_1) + \cdots + P(A_n)\ .$$
\item For events $A$ and $B$, prove that
$$
P(A \cap B) \geq P(A) + P(B) - 1.
$$
\end{enumerate}

\i\label{exer 1.2.18}
If $A$, $B$, and~$C$ are any three events, show that
$$\begin{array}{ll}
P(A \cup B \cup C) &= P(A) + P(B) + P(C) \\
                   &\ \  -\, P(A \cap B) - P(B \cap C) - P(C \cap A) \\
                   &\ \  +\, P(A \cap B \cap C)\ .
\end{array}
$$

\i\label{exer 1.2.19} Explain why it is not possible to define a
uniform distribution function (see Definition~\ref{def 1.3}) 
on a countably infinite sample space.  \emx {Hint}: Assume $m(\omega) = a$
for all
$\omega$, where $0 \leq a \leq 1$.  Does $m(\omega)$ have all the properties
of
a distribution function?

\i\label{exer 1.2.20} In Example~\ref{exam 1.10} find the probability that
the
coin turns up heads for the first time on the tenth, eleventh, or twelfth
toss.

\i\label{exer 1.2.21} A die is rolled until the first time that a six
turns up.  We shall
see that the probability that this occurs on the $n$th roll is
$(5/6)^{n-1}\cdot(1/6)$.  Using this fact, describe the appropriate infinite
sample space and distribution function for the experiment of rolling a die until a six
turns
up for the first time.  Verify that for your distribution function $\sum_{\omega} m(\omega)
=
1$.

\i\label{exer 1.2.22} 
Let $\Omega$ be the sample space
$$
\Omega = \{0,1,2,\dots\}\ ,
$$
and define a distribution function by
$$
m(j) = (1 - r)^j r\ ,
$$
for some fixed $r$, $0 < r < 1$, and for $j = 0, 1, 2, \ldots$.
Show that this is a distribution function for $\Omega$.

\i\label{exer 1.2.23} Our calendar\index{calendar} has a 400-year cycle.  
B.~H.~Brown\index{BROWN, B. H.}
noticed that the
number of times the thirteenth of the month falls on each of the days of the
week in
the 4800 months of a cycle is as follows:

\par Sunday 687
\par Monday 685
\par Tuesday 685
\par Wednesday 687
\par Thursday 684
\par Friday 688
\par Saturday 684

\noindent From this he deduced that the thirteenth was more likely to fall on
Friday than on any other day.  Explain what he meant by this.

\i\label{exer 1.2.24} Tversky and Kahneman\footnote{K.~McKean, ``Decisions,
Decisions," pp.~22--31.} asked a group of subjects to carry out the
following
task.  They are told that:\index{Linda problem}

\begin{quote}
Linda is 31, single, outspoken, and very bright.  She majored in philosophy
in
college.  As a student, she was deeply concerned with racial discrimination
and
other social issues, and participated in anti-nuclear demonstrations.
\end{quote}

The subjects are then asked to rank the likelihood of various alternatives,
such as:\hfill\break
\indent(1)~ Linda is active in the feminist movement.\hfill\break
\indent(2)~ Linda is a bank teller.\hfill\break
\indent(3)~ Linda is a bank teller and active in the feminist
movement.\hfill\break

\noindent Tversky\index{TVERSKY, A.} and Kahneman\index{KAHNEMAN, D.} found that 
between 85 and 90 percent of the subjects rated
alternative (1) most likely, but alternative (3) more likely than alternative
(2).  Is it?  They call this phenomenon the \emx {conjunction fallacy,}\index{conjunction
fallacy}\index{fallacy} and
note that it appears to be unaffected by prior training in probability or
statistics.  Is this phenomenon a fallacy?  If so, why? Can you give a possible
explanation for the subjects' choices?

\i\label{exer 1.2.25} Two cards are drawn successively from a deck of 52
cards.  Find the
probability that the second card is higher in rank than the first card.  \emx
{Hint}: Show that $1 = P(\mbox{higher}) + P(\mbox{lower}) + P(\mbox{same})$
and use the fact that $P(\mbox{higher}) = P(\mbox{lower})$.

\i\label{exer 1.2.26} A \emx {life table}\index{life table} is a table that lists for a
given number of births the estimated number of people who will live to a
given age.  In Appendix~C we give a life table based upon 100{,}000
births for ages from 0 to 85, both for women and for men.  Show how from this 
table you can estimate the probability $m(x)$ that a person born in 1981 
would live to age $x$.  Write a program to plot $m(x)$ both for men and for women, 
and comment on the differences that you see in the two cases.

\istar\label{exer 1.2.27} Here is an attempt to get around the fact that we
cannot 
choose a ``random integer."\index{random integer}
\begin{enumerate}
\item What, intuitively, is the probability that a ``randomly chosen"
positive integer is a multiple of 3?

\item Let $P_3(N)$ be the probability that an integer, chosen at random
between 1 and $N$, is a multiple of 3 (since the sample space is finite, this
is a legitimate probability).  Show that the limit
$$
P_3 = \lim_{N \to \infty} P_3(N)
$$
exists and equals 1/3.  This formalizes the intuition in
(a), and gives us a way to assign ``probabilities" to
certain events that are infinite subsets of the positive integers.

\item If $A$ is any set of positive integers, let $A(N)$ mean the number
of elements of $A$ which are less than or equal to $N$.  Then define the
``probability" of $A$ as 
$$
P(A) = \lim_{N \to \infty} A(N)/N\ ,
$$
provided this limit exists.  Show that this definition would assign
probability 0 to any finite set and probability 1 to the set of all positive integers. 
Thus, the probability of the set of all integers is not the sum of the
probabilities of the individual integers in this set.  This means that the
definition of probability given here is not a completely satisfactory
definition.

\item Let $A$ be the set of all positive integers with an odd number of digits.
Show that $P(A)$ does not exist.  This shows that under the above definition
of probability, not all sets have probabilities.

\end{enumerate}


\i\label{exer 1.2.28}
(from Sholander\footnote{M. Sholander, Problem \#1034, \emx {Mathematics Magazine,} vol.~52,
no.\ 3 (May 1979), p.~183.})\index{SHOLANDER, M.}\index{clover-leaf interchange} In a standard
clover-leaf interchange, there are four ramps for making right-hand turns, and inside these
four ramps, there are four more ramps for making left-hand turns.  Your car approaches the
interchange from the south.  A mechanism has been installed so that at each point where there
exists a choice of directions, the car turns to the right with fixed probability~$r$. 
\begin{enumerate}
\item
If $r = 1/2$, what is your chance of emerging from the interchange going west?
\item
Find the value of $r$ that maximizes your chance of a westward departure from the interchange.
\end{enumerate}

\i\label{exer 1.2.29}
(from Benkoski\footnote{S. Benkoski, Comment on Problem \#1034, \emx {Mathematics Magazine,}
vol.~52, no.\ 3 (May 1979), pp.~183-184.})\index{BENKOSKI, S.} Consider a ``pure" cloverleaf
interchange in which there are no ramps for right-hand turns, but only the two intersecting
straight highways with cloverleaves for left-hand turns.  (Thus, to turn right in such an
interchange, one must make three left-hand turns.)  As in the preceding problem, your car
approaches the interchange from the south.  What is the value of $r$ that maximizes your
chances of an eastward departure from the interchange?

\i\label{exer 1.2.30}
(from vos~Savant\footnote{M.~vos~Savant, \emx{Parade Magazine}, 3 March 1996, p. 14.}\index{vos
SAVANT, M.}) A reader of Marilyn vos~Savant's column wrote in with the following question: 
\begin{quote}
My dad heard this story on the radio. At Duke
University, two students had received A's in chemistry all semester. But on the night before
the final exam, they were partying in another state and didn't get back to Duke until it was
over. Their excuse to the professor was that they had a flat tire, and they asked if they
could take a make-up test. The professor agreed, wrote out a test and sent the two to separate
rooms to take it. The first question (on one side of the paper) was worth 5 points, and they
answered it easily. Then they flipped the paper over and found the second question, worth 95
points: `Which tire was it?' What was the probability that both students would say the
same thing? My dad and I think it's 1 in 16. Is that right?"
\end{quote}


\begin{enumerate}
\item
Is the answer 1/16?
\item
The following question was asked of a class of students.  ``I was driving to
school today, and one of my tires went flat.  Which tire do you think it was?"
The responses were as follows:  right front, 58\%, left front, 11\%, right rear, 
18\%, left rear, 13\%.  Suppose that this distribution holds in the general population,
and assume that the two test-takers are randomly chosen from the general population.  What
is the probability that they will give the same answer to the second question?
\end{enumerate}


\end{LJSItem}
