
assured by one of the fundamental theorems of probability called the

law of large numbers. This theorem asserts that, for any ε > 0,

    Pr[|x/n − p| < ε]

tends to 1 as n increases indefinitely.

It is important to understand what this theorem says and what it

does not say. Let us illustrate its meaning in the case of coin tossing.

We are going to toss a coin n times and we want the probability to be

very high, say greater than .99, that the fraction of heads which turn

up will be very close, say within .001 of the value .5. The law of large

numbers assures us that we can have this if we simply choose n large

enough. The theorem itself gives us no information about how large n

must be. Let us however consider this question.

To say that the fraction of the times success results is near p is the

same as saying that the actual number of successes x does not deviate

too much from the expected number np. To see the kind of deviations

which might be expected we can study the value of Pr[|x − np| ≥ d]. A

table of these values for p = .3 and various values of n and d is given

in Figure 4.19. Let us ask how large d must be before a deviation as

large as d could be considered surprising. For example, let us see for

each n the value of d which makes Pr[|x − np| ≥ d] about .04. From

the table, we see that d should be 7 for n = 50, 9 for n = 80, 10 for

n = 100, etc. To see deviations which might be considered more typical

we look for the values of d which make Pr[|x − np| ≥ d] approximately

1/3. Again from the table, we see that d should be 3 or 4 for n = 50, 4

or 5 for n = 80, 5 for n = 100, etc. The answers to these two questions

148 CHAPTER 4. PROBABILITY THEORY

are given in the last two columns of the table. An examination of these

numbers shows us that deviations which we would consider surprising

are approximately √n, while those which are more typical are about

one-half as large, or √n/2.

This suggests that √n, or a suitable multiple of it, might be taken

as a unit of measurement for deviations. Of course, we would also have

to study how Pr[|x − np| ≥ d] depends on p. When this is done, one

finds that √(npq) is a natural unit; it is called a standard deviation. It

can be shown that for large n the following approximations hold.

    Pr[|x − np| ≥ √(npq)] ≈ .3174

    Pr[|x − np| ≥ 2√(npq)] ≈ .0455

    Pr[|x − np| ≥ 3√(npq)] ≈ .0027

That is, a deviation from the expected value of one standard devi-

ation is rather typical, while a deviation of as much as two standard

deviations is quite surprising and three very surprising. For values of p

not too near 0 or 1, the value of √(pq) is approximately 1/2. Thus these

approximations are consistent with the results we observed from our

table.

For large n, Pr[x − np ≥ k√(npq)] and Pr[x − np ≤ −k√(npq)] can be

shown to be approximately the same. Hence these probabilities can be

estimated for k = 1, 2, 3 by taking 1/2 the values given above.

Example 4.19 In throwing an ordinary coin 10,000 times, the ex-

pected number of heads is 5000, and the standard deviation for the

number of heads is √(10,000 · (1/2) · (1/2)) = 50. Thus the probability that

the number of heads which turn up deviates from 5000 by as much as

one standard deviation, or 50, is approximately .317. The probability

of a deviation of as much as two standard deviations, or 100, is ap-

proximately .046. The probability of a deviation of as much as three

standard deviations, or 150, is approximately .003.

♦
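The three probabilities quoted for this coin can be checked exactly. The following sketch (ours, not the text's; all names are our own) sums the exact binomial weights for 10,000 fair-coin tosses and compares the tails with the large-n approximations above.

```python
from math import comb

# For n = 10,000 tosses of a fair coin, np = 5000 and sqrt(npq) = 50.
n, mean, sd = 10_000, 5_000, 50
denom = 2 ** n

def tail(k):
    """Exact Pr[|x - np| >= k * sd] for the fair coin."""
    inside = sum(comb(n, j) for j in range(mean - k * sd + 1, mean + k * sd))
    return 1 - inside / denom

for k, approx in [(1, .3174), (2, .0455), (3, .0027)]:
    print(k, round(tail(k), 4), approx)
```

The exact values differ slightly from .3174, .0455, .0027, since those are the limits as n grows without bound.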

Example 4.20 Assume that in a certain large city, 900 people are

chosen at random and asked if they favor a certain proposal. Of the

900 asked, 550 say they favor the proposal and 350 are opposed. If, in

fact, the people in the city are equally divided on the issue, would it be

unlikely that such a large majority would be obtained in a sample of 900

of the citizens? If the people were equally divided, we would assume

that the 900 people asked would form an independent trials process

149

4.10. THE LAW OF LARGE NUMBERS

with probability 1/2 for a "yes" answer and 1/2 for a "no" answer. Then

the standard deviation for the number of "yes" answers in 900 trials is

√(900 · (1/2) · (1/2)) = 15. Then it would be very unlikely that we would obtain

a deviation of more than 45 from the expected number of 450. The

fact that the deviation in the sample from the expected number was

100, then, is evidence that the hypothesis that the voters were equally

divided is incorrect. The assumption that the true proportion is any

value less than 1/2 would also lead to the fact that a number as large

as 550 favoring in a sample of 900 is very unlikely. Thus we are led to

suspect that the true proportion is greater than 1/2. On the other hand,

if the number who favored the proposal in the sample of 900 were 465,

we would have only a deviation of one standard deviation, under the

assumption of an equal division of opinion. Since such a deviation is

not unlikely, we could not rule out this possibility on the evidence of

the sample.

♦
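The arithmetic of this example fits in a few lines. This sketch (our own, with our own variable names) measures both sample outcomes in standard deviations under the hypothesis of an evenly divided city.

```python
from math import sqrt

# Under the hypothesis of an even split, "yes" answers in 900
# independent trials have mean 450 and standard deviation 15.
n, p = 900, 0.5
mean = n * p                   # 450.0
sd = sqrt(n * p * (1 - p))     # 15.0

# 550 "yes" answers: far beyond the surprising 3-sd threshold.
print((550 - mean) / sd)
# 465 "yes" answers: only one standard deviation above the mean.
print((465 - mean) / sd)
```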

Example 4.21 A certain Ivy League college would like to admit 800

students in their freshman class. Experience has shown that if they

admit 1250 students they will have acceptances from approximately

800. If they admit as many as 50 too many students they will have

to provide additional dormitory space. Let us find the probability that

this will happen assuming that the acceptances of the students can be

considered to be an independent trials process. We take as our estimate

for the probability of an acceptance p = 800/1250 = .64. Then the expected num-

ber of acceptances is 800 and the standard deviation for the number of

acceptances is √(1250 · .64 · .36) ≈ 17. The probability that the number

accepted is three standard deviations or 51 from the mean is approx-

imately .0027. This probability takes into account a deviation above

the mean or below the mean. Since in this case we are only interested

in a deviation above the mean, the probability we desire is half of this

or approximately .0013. Thus we see that it is highly unlikely that the

college will have to have new dormitory space under the assumptions

we have made.

♦
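The same estimate can be made with the normal approximation directly. In this sketch (ours; `phi` is our own name for the standard normal cdf, written with `math.erf`), we compute the one-sided probability of running 50 or more acceptances over the mean.

```python
from math import erf, sqrt

# Acceptances modeled as Binomial(1250, .64): mean 800, sd = sqrt(288).
n, p = 1250, 800 / 1250
mean = n * p
sd = sqrt(n * p * (1 - p))     # about 17

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# One-sided normal-approximation probability of a deviation of 50 or
# more above the mean; the text quotes roughly .0013 for 3 sd.
print(round(1 - phi(50 / sd), 4))
```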

We ļ¬nish this discussion of the law of large numbers with some ļ¬nal

remarks about the interpretation of this important theorem.

Of course no matter how large n is we cannot prevent the coin from

coming up heads every time. If this were the case we would observe a

fraction of heads equal to 1. However, this is not inconsistent with the

theorem, since the probability of this happening is (1/2)^n, which tends to


0 as n increases. Thus a fraction of 1 is always possible, but becomes

increasingly unlikely.

The law of large numbers is often misinterpreted in the following

manner. Suppose that we plan to toss the coin 1000 times and after

500 tosses we have already obtained 400 heads. Then we must obtain

less than one-half heads in the remaining 500 tosses to have the fraction

come out near 1/2. It is tempting to argue that the coin therefore owes

us some tails and it is more likely that tails will occur in the last 500

tosses. Of course this is nonsense, since the coin has no memory. The

point is that something very unlikely has already happened in the first

500 tosses. The final result can therefore also be expected to be a result

not predicted before the tossing began.

We could also argue that perhaps the coin is a biased coin but this

would make us predict more heads than tails in the future. Thus the

law of averages, or the law of large numbers, should not give you great

comfort if you have had a series of very bad hands dealt you in your

last 100 poker hands. If the dealing is fair, you have the same chance

as ever of getting a good hand.

Early attempts to define the probability p that success occurs on

a single experiment sounded like this. If the experiment is repeated

indefinitely, the fraction of successes obtained will tend to a number p,

and this number p is called the probability of success on a single exper-

iment. While this fails to be satisfactory as a definition of probability,

the law of large numbers captures the spirit of this frequency concept

of probability.

Exercises

1. If an ordinary die is thrown 20 times, what is the expected number

of times that a six will turn up? What is the standard deviation

for the number of sixes that turn up?

[Ans. 10/3; 5/3.]

2. Suppose that an ordinary die is thrown 450 times. What is the

expected number of throws that result in either a three or a four?

What is the standard deviation for the number of such throws?

3. In 16 tosses of an ordinary coin, what is the expected number

of heads that turn up? What is the standard deviation for the

number of heads that occur?


[Ans. 8; 2.]

4. In 16 tosses of a coin, find the exact probability that the number

of heads that turn up differs from the expected number by (a)

as much as one standard deviation, and (b) by more than one

standard deviation. Do the same for the case of two standard

deviations, and for the case of three standard deviations. Show

that the approximations given for large n lie between the values

obtained, but are not very accurate for so small an n.

[Ans. .454; .210; .077; .021; .004; .001.]

5. Consider n independent trials with probability p for success. Let r

and s be numbers such that p < r < s. What does the law of large

numbers say about Pr[r < x/n < s] as we increase n indefinitely?

Answer the same question in the case that r < p < s.

6. A drug is known to be effective in 20 per cent of the cases where

it is used. A new agent is introduced, and in the next 900 times

the drug is used it is effective 250 times. What can be said about

the effectiveness of the drug?

7. In a large number of independent trials with probability p for

success, what is the approximate probability that the number of

successes will deviate from the expected number by more than

one standard deviation but less than two standard deviations?

[Ans. .272.]

8. What is the approximate probability that, in 10,000 throws of an

ordinary coin, the number of heads which turn up lies between

4850 and 5150? What is the probability that the number of heads

lies in the same interval, given that in the ļ¬rst 1900 throws there

were 1600 heads?

9. Suppose that it is desired that the probability be approximately

.95 that the fraction of sixes that turn up when a die is thrown n

times does not deviate by more than .01 from the value 1/6. How

large should n be?

[Ans. Approximately 5555.]


10. Suppose that for each roll of a fair die you lose $1 when an odd

number comes up and win $1 when an even number comes up.

Then after 10,000 rolls you can, with approximately 84 per cent

confidence, expect to have lost not more than $(how much?).

11. Assume that 10 per cent of the people in a certain city have

cancer. If 900 people are selected at random from the city, what

is the expected number which will have cancer? What is the

standard deviation? What is the approximate probability that

more than 108 of the 900 chosen have cancer?

[Ans. 90; 9; .023.]

12. Suppose that in Exercise 11, the 900 people are chosen at random

from those people in the city who smoke. Under the hypothesis

that smoking has no effect on the incidence of cancer, what is the

expected number in the 900 chosen that have cancer? Suppose

that more than 120 of the 900 chosen have cancer, what might be

said concerning the hypothesis that smoking has no effect on the

incidence of cancer?

13. In Example 4.20, we made the assumption in our calculations

that, if the true proportion of voters in favor of the proposal

were p, then the 900 people chosen at random represented an

independent trials process with probability p for a "yes" answer,

and 1 − p for a "no" answer. Give a method for choosing the 900

people which would make this a reasonable assumption. Criticize

the following methods.

(a) Choose the first 900 people in the list of registered Republi-

cans.

(b) Choose 900 names at random from the telephone book.

(c) Choose 900 houses at random and ask one person from each

house, the houses being visited in the mid-morning.

14. For n throws of an ordinary coin, let tn be such that

    Pr[−tn < x/n − 1/2 < tn] = .997,

where x is the number of heads that turn up. Find tn for n = 10^4,

n = 10^6, and n = 10^20.


[Ans. .015; .0015; .000,000,000,15.]

15. Assume that a calculating machine carries out a million opera-

tions to solve a certain problem. In each operation the machine

gives the answer 10^−5 too small, with probability 1/2, and 10^−5 too

large, with probability 1/2. Assume that the errors are independent

of one another. What is a reasonable accuracy to attach to the

answer? What if the machine carries out 1010 operations?

[Ans. ±.01; ±1.]

16. A computer tosses a coin 1 million times, and obtains 499,588

heads. Is this number reasonable?

4.11 Independent trials with more than two outcomes

By extending the results of Section 4.8, we shall study the case of

independent trials in which we allow more than two outcomes. We

assume that we have an independent trials process where the possible

outcomes are a1 , a2 , . . . , ak , occurring with probabilities p1 , p2 , . . . , pk ,

respectively. We denote by

f (r1 , r2 , . . . , rk ; p1 , p2 , . . . , pk )

the probability that, in n = r1 + r2 + . . . + rk such trials, there will be r1

occurrences of a1 , r2 occurrences of a2 , etc. In the case of two outcomes

this notation would be f (r1 , r2 ; p1 , p2 ). In Section 4.8 we wrote this as

f (n, r1 ; p1 ) since r2 and p2 are determined from n, r1 , and p1 . We

shall indicate how this probability is found in general, but carry out

the details only for a special case. We choose k = 3, and n = 5 for

purposes of illustration. We shall find f (1, 2, 2; p1 , p2 , p3 ).

We show in Figure 4.20 enough of the tree for this process to indicate

the branch probabilities for a path (heavy lined) corresponding to the

outcomes a2 , a3 , a1 , a2 , a3 . The tree measure assigns weight p2 · p3 · p1 ·

p2 · p3 = p1 · p2^2 · p3^2 to this path.

There are, of course, other paths through the tree corresponding to

one occurrence of a1 , two of a2 , and two of a3 . However, they would all

be assigned the same weight p1 · p2^2 · p3^2 by the tree measure. Hence to


Figure 4.20

find f (1, 2, 2; p1 , p2 , p3 ) we must multiply this weight by the number of

paths having the specified number of occurrences of each outcome.

We note that the path a2 , a3 , a1 , a2 , a3 can be specified by the three-

cell partition [{3}, {1, 4}, {2, 5}] of the numbers from 1 to 5. Here the

first cell shows the experiment which resulted in a1 , the second cell

shows the two that resulted in a2 , and the third shows the two that

resulted in a3 . Conversely, any such partition of the numbers from 1

to 5 with one element in the first cell, two in the second, and two in

the third corresponds to a unique path of the desired kind. Hence the

number of paths is the number of such partitions. But this is

    (5 choose 1, 2, 2) = 5!/(1!2!2!)

(see 3.4), so that the probability of one occurrence of a1 , two of a2 , and

two of a3 is

    (5 choose 1, 2, 2) · p1 · p2^2 · p3^2.

The above argument carried out in general leads, for the case of

independent trials with outcomes a1 , a2 , . . . , ak occurring with proba-

bilities p1 , p2 , . . . , pk , to the following.

The probability for r1 occurrences of a1 , r2 occurrences of

a2 , etc., is given by

    f (r1 , r2 , . . . , rk ; p1 , p2 , . . . , pk ) = (n choose r1 , r2 , . . . , rk ) · p1^r1 · p2^r2 · . . . · pk^rk .


Example 4.22 A die is thrown 12 times. What is the probability that

each number will come up twice? Here there are six outcomes, 1, 2,

3, 4, 5, 6 corresponding to the six sides of the die. We assign each

outcome probability 1/6. We are then asked for

    f (2, 2, 2, 2, 2, 2; 1/6, 1/6, 1/6, 1/6, 1/6, 1/6)

which is

    (12 choose 2, 2, 2, 2, 2, 2) · (1/6)^2 (1/6)^2 (1/6)^2 (1/6)^2 (1/6)^2 (1/6)^2 = .0034.

♦
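The number .0034 can be confirmed by direct computation. A short sketch in Python (our own check, not part of the text):

```python
from math import factorial

# f(2,2,2,2,2,2; 1/6,...,1/6): multinomial coefficient 12!/(2!)^6
# times (1/6)^12, as in Example 4.22.
coef = factorial(12) // factorial(2) ** 6   # number of orderings
prob = coef * (1 / 6) ** 12
print(coef, round(prob, 4))
```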

Example 4.23 Suppose that we have an independent trials process

with four outcomes a1 , a2 , a3 , a4 occurring with probabilities p1 , p2 , p3 ,

p4 , respectively. It might be that we are interested only in the proba-

bility that r1 occurrences of a1 and r2 occurrences of a2 will take place

with no specification about the number of each of the other possible

outcomes. To answer this question we simply consider a new exper-

iment where the outcomes are a1 , a2 , ā3 . Here ā3 corresponds to an

occurrence of either a3 or a4 in our original experiment. The corre-

sponding probabilities would be p1 , p2 , and p̄3 with p̄3 = p3 + p4 . Let

r̄3 = n − (r1 + r2 ). Then our question is answered by finding the prob-

ability in our new experiment for r1 occurrences of a1 , r2 of a2 , and r̄3

of ā3 , which is

    (n choose r1 , r2 , r̄3 ) · p1^r1 · p2^r2 · p̄3^r̄3 .

♦

The same procedure can be carried out for experiments with any

number of outcomes where we specify the number of occurrences of

such particular outcomes. For example, if a die is thrown ten times

the probability that a one will occur exactly twice and a three exactly

three times is given by

    (10 choose 2, 3, 5) · (1/6)^2 (1/6)^3 (4/6)^5 = .043.
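As a check on this closing computation (ours, in Python), with the five remaining throws lumped into a single outcome of probability 4/6:

```python
from math import factorial

# Ten throws of a die: exactly two ones, exactly three threes, and
# five throws of "anything else" (probability 4/6 each).
coef = factorial(10) // (factorial(2) * factorial(3) * factorial(5))
prob = coef * (1 / 6) ** 2 * (1 / 6) ** 3 * (4 / 6) ** 5
print(coef, round(prob, 3))
```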


Exercises

1. Suppose that in a city 60 per cent of the population are Democrats,

30 per cent are Republicans, and 10 per cent are Independents.

What is the probability that if three people are chosen at random

there will be one Republican, one Democrat, and one Independent

voter?

[Ans. .108.]

2. Three horses, A, B, and C, compete in four races. Assuming

that each horse has an equal chance in each race, what is the

probability that A wins two races and B and C win one each?

What is the probability that the same horse wins all four races?

[Ans. 4/27; 1/27.]

3. Assume that in a certain large college 40 per cent of the students

are freshmen, 30 per cent are sophomores, 20 per cent are juniors,

and 10 per cent are seniors. A committee of eight is chosen at

random from the student body. What is the probability that

there are equal numbers from each class on the committee?

4. Let us assume that when a batter comes to bat, he or she has

probability .6 of being put out, .1 of getting a walk, .2 of getting

a single, .1 of getting an extra base hit. If he or she comes to bat

five times in a game, what is the probability that

(a) He gets two walks and three singles?

[Ans. .0008.]

(b) He gets a walk, a single, an extra base hit (and is out twice)?

[Ans. .043.]

(c) He has a perfect day (i.e., never out)?

[Ans. .010.]

5. Assume that a single torpedo has a probability 1/2 of sinking a

ship, probability 1/4 of damaging it, and probability 1/4 of missing.

Assume further that two damaging shots sink the ship. What

is the probability that four torpedos will succeed in sinking the

ship?


[Ans. 251/256.]

6. Jones, Smith, and Green live in the same house. The mailman has

observed that Jones and Smith receive the same amount of mail

on the average, but that Green receives twice as much as Jones

(and hence also twice as much as Smith). If he or she has four

letters for this house, what is the probability that each resident

receives at least one letter?

7. If three dice are thrown, find the probability that there is one six

and two fives, given that all the outcomes are greater than three.

[Ans. 1/9.]

8. An athlete plays a tournament consisting of three games. In each

game he or she has probability 1/2 for a win, 1/4 for a loss, and 1/4 for a

draw, independently of the outcomes of other games. To win the

tournament he or she must win more games than he or she loses.

What is the probability that he or she wins the tournament?

9. Assume that in a certain course the probability that a student

chosen at random will get an A is .1, that he or she will get a B

is .2, that he or she will get a C is .4, that he or she will get a D

is .2, and that he or she will get an F is .1. What distribution of

grades is most likely in the case of four students?

[Ans. One B, two C's, one D.]

10. Let us assume that in a World Series game a batter has probability

1/4 of getting no hits, 1/2 for getting one hit, 1/4 for getting two hits,

assuming that the probability of getting more than two hits is

negligible. In a four-game World Series, ļ¬nd the probability that

the batter gets

(a) Exactly two hits.

[Ans. 7/64.]

(b) Exactly three hits.

[Ans. 7/32.]

(c) Exactly four hits.


[Ans. 35/128.]

(d) Exactly five hits.

[Ans. 7/32.]

(e) Fewer than two hits or more than five.

[Ans. 23/128.]

11. Gypsies sometimes toss a thick coin for which heads and tails are

equally likely, but which also has probability 1/5 of standing on

edge (i.e., neither heads nor tails). What is the probability of

exactly one head and four tails in five tosses of a gypsy coin?

12. A family car is driven by the father, two sons, and the mother.

The fenders have been dented four times, three times while the

mother was driving. Is it fair to say that the mother is a worse

driver than the men?

4.12 Expected value

In this section we shall discuss the concept of expected value. Although

it originated in the study of gambling games, it enters into almost any

detailed probabilistic discussion.

Definition. If in an experiment the possible outcomes are numbers,

a1 , a2 , . . . , ak , occurring with probabilities p1 , p2 , . . . , pk , then the expected

value is defined to be

    E = a1 p1 + a2 p2 + . . . + ak pk .
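The definition translates directly into a one-line computation. In this sketch, `expected_value` is our own name, not notation from the text:

```python
# E = a1*p1 + a2*p2 + ... + ak*pk for numerical outcomes.
def expected_value(outcomes, probs):
    return sum(a * p for a, p in zip(outcomes, probs))

# The $1 coin bet discussed below: win $1 or lose $1, each with chance 1/2.
print(expected_value([1, -1], [0.5, 0.5]))
```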

The term "expected value" is not to be interpreted as the value that

will necessarily occur on a single experiment. For example, if a person

bets $1 that a head will turn up when a coin is thrown, he or she may

either win $1 or lose $1. His or her expected value is (1)(1/2) + (−1)(1/2) = 0,

which is not one of the possible outcomes. The term, expected value,

had its origin in the following consideration. If we repeat an experiment

with expected value E a large number of times, and if we expect a1 a

fraction p1 of the time, a2 a fraction p2 of the time, etc., then the average

that we expect per experiment is E. In particular, in a gambling game

E is interpreted as the average winning expected in a large number

of plays. Here the expected value is often taken as the value of the

game to the player. If the game has a positive expected value, the


game is said to be favorable; if the game has expected value zero it

is said to be fair; and if it has negative expected value it is described

as unfavorable. These terms are not to be taken too literally, since

many people are quite happy to play games that, in terms of expected

value, are unfavorable. For instance, the buying of life insurance may

be considered an unfavorable game which most people choose to play.

Example 4.24 For the ļ¬rst example of the application of expected

value we consider the game of roulette as played at Monte Carlo. There

are several types of bets which the gambler can make, and we consider

two of these.

The wheel has the number 0 and the numbers from 1 to 36 marked

on equally spaced slots. The wheel is spun and a ball comes to rest

in one of these slots. If the player puts a stake, say of $1, on a given

number, and the ball comes to rest in this slot, then he or she receives

from the croupier 36 times the stake, or $36. The player wins $35

with probability 1/37 and loses $1 with probability 36/37. Hence his or her

expected winnings are

    35 · (1/37) − 1 · (36/37) = −.027.

This can be interpreted to mean that in the long run the player can

expect to lose about 2.7 per cent of his or her stakes.

A second way to play is the following. A player may bet on "red"

or "black". The numbers from 1 to 36 are evenly divided between the

two colors. If a player bets on "red", and a red number turns up, the

player receives twice the stake. If a black number turns up, the player

loses the stake. If 0 turns up, then the wheel is spun until it stops on

a number different from 0. If this is black, the player loses; but if it is

red, the player receives only the original stake, not twice it. For this

type of play, the player wins $1 with probability 18/37, breaks even with

probability (1/2) · (1/37), and loses $1 with probability 18/37 + (1/2) · (1/37). Hence his

or her expected winning is

    1 · (18/37) + 0 · (1/74) − 1 · (37/74) = −.0135.

In this case the player can expect to lose about 1.35 per cent of his

or her stakes in the long run. Thus the expected loss in this case is

only half as great as in the previous case.

♦
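Both bets reduce to two short expected-value sums. A sketch of the arithmetic (ours), per $1 staked:

```python
# Bet on a single number: win $35 with chance 1/37, lose $1 otherwise.
single = 35 * (1 / 37) - 1 * (36 / 37)

# Bet on "red": win $1 (prob 18/37), break even (1/74), lose $1 (37/74).
red = 1 * (18 / 37) + 0 * (1 / 74) - 1 * (37 / 74)

print(round(single, 4), round(red, 4))
```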


Example 4.25 A player rolls a die and receives a number of dollars

corresponding to the number of dots on the face which turns up. What

should the player pay for playing, to make this a fair game? To answer

this question, we note that the player wins 1, 2, 3, 4, 5 or 6 dollars,

each with probability 1/6. Hence, the player's expected winning is

    1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5.

Thus if the player pays $3.50, the expected winnings will be zero.

♦

Example 4.26 What is the expected number of successes in the case

of four independent trials with probability 1/3 for success? We know that

the probability of x successes is (4 choose x) (1/3)^x (2/3)^(4−x). Thus

    E = 0 · (4 choose 0)(1/3)^0 (2/3)^4 + 1 · (4 choose 1)(1/3)^1 (2/3)^3

        + 2 · (4 choose 2)(1/3)^2 (2/3)^2 + 3 · (4 choose 3)(1/3)^3 (2/3)^1

        + 4 · (4 choose 4)(1/3)^4 (2/3)^0

      = 0 + 32/81 + 48/81 + 24/81 + 4/81 = 108/81 = 4/3.

In general, it can be shown that in n trials with probability p for success,

the expected number of successes is np.

♦

Example 4.27 In the game of craps a pair of dice is rolled by one of

the players. If the sum of the spots shown is 7 or 11, he or she wins.

If it is 2, 3, or 12, he or she loses. If it is another sum, he or she must

continue rolling the dice until he or she either repeats the same sum or

rolls a 7. In the former case he or she wins, in the latter he or she loses.

Let us suppose that he or she wins or loses $1. Then the two possible

outcomes are +1 and ā’1. We will compute the expected value of the

game. First we must find the probability that he or she will win.

We represent the possibilities by a two-stage tree shown in Figure

4.21. While it is theoretically possible for the game to go on indefi-

nitely, we do not consider this possibility. This means that our analysis

applies only to games which actually stop at some time.

The branch probabilities at the first stage are determined by think-

ing of the 36 possibilities for the throw of the two dice as being equally

likely and taking in each case the fraction of the possibilities which


Figure 4.21

correspond to the branch as the branch probability. The probabilities

for the branches at the second level are obtained as follows. If, for

example, the ļ¬rst outcome was a 4, then when the game ends, a 4 or 7

must have occurred. The possible outcomes for the dice were

{(3, 1), (1, 3), (2, 2), (4, 3), (3, 4), (2, 5), (5, 2), (1, 6), (6, 1)}.

Again we consider these possibilities to be equally likely and assign to

the branch considered the fraction of the outcomes which correspond to

this branch. Thus to the 4 branch we assign a probability 3/9 = 1/3. The

other branch probabilities are determined in a similar way. Having the

tree measure assigned, to find the probability of a win we must simply

add the weights of all paths leading to a win. If this is done, we obtain

244/495. Thus the player's expected value is

    1 · (244/495) + (−1) · (251/495) = −7/495 = −.0141.

Hence the player can expect to lose 1.41 per cent of his or her stakes

in the long run. It is interesting to note that this is just slightly less

favorable than the losses in betting on "red" in roulette.

♦


Exercises

1. Suppose that A tosses two coins and receives $2 if two heads

appear, $1 if one head appears, and nothing if no heads appear.

What is the expected value of the game to A?

[Ans. $1.]

2. Smith and Jones are matching coins. If the coins match, Smith

gets $1, and if they do not, Jones get $1.

(a) If the game consists of matching twice, what is the expected

value of the game for Smith?

(b) Suppose that Smith quits if he or she wins the first round,

and plays the second round if he or she loses the first. Jones

is not allowed to quit. What is the expected value of the

game for Smith?

3. If five coins are thrown, what is the expected number of heads

that will turn up?

[Ans. 5/2.]

4. A coin is thrown until the first time a head comes up or until

three tails in a row occur. Find the expected number of times the

coin is thrown.

5. A customer wishes to purchase a five cent newspaper. The cus-

tomer has in his or her pocket one dime and five pennies. The

news agent offers to let the customer have the paper in exchange

for one coin drawn at random from the customer's pocket.

(a) Is this a fair proposition and, if not, to whom is it favorable?

[Ans. Favorable to customer.]

(b) Answer the same question assuming that the news agent

demands two coins drawn at random from the customer's

pocket.

[Ans. Fair proposition.]

6. A bets 50 cents against B's x cents that, if two cards are dealt

from a shuffled pack of ordinary playing cards, both cards will be

of the same color. What value of x will make this bet fair?


7. Prove that if the expected value of a given experiment is E, and

if a constant c is added to each of the outcomes, the expected

value of the new experiment is E + c.

8. Prove that, if the expected value of a given experiment is E, and

if each of the possible outcomes is multiplied by a constant k, the

expected value of the new experiment is k Ā· E.

9. A gambler plays the following game: A card is drawn from a

bridge deck; if it is an ace, the gambler wins $5; if it is a jack, a

queen or a king, he or she wins $2; for any other card he or she

loses $1. What is the expected winning per play?

10. An urn contains two black and three white balls. Balls are suc-

cessively drawn from the urn without replacement until a black

ball is obtained. Find the expected number of draws required.

11. Using the result of Exercises 13 and 14 of Section 4.6, find the

expected number of games in the World Series (a) under the as-

sumption that each team has probability 1/2 of winning each game

and (b) under the assumption that the stronger team has proba-

bility .6 of winning each game.

[Ans. 5.81; 5.75.]

12. Suppose that we modify the game of craps as follows: On a 7 or

11 the player wins $2, on a 2, 3, or 12 he or she loses $3; otherwise

the game is as usual. Find the expected value of the new game,

and compare it with the old value.

13. Suppose that in roulette at Monte Carlo we place 50 cents on

"red" and 50 cents on "black". What is the expected value on

the game? Is this better or worse than placing $1 on "red"?

14. Betting on "red" in roulette can be described roughly as follows.

We win with probability .49, get our money back with probability

.01, and lose with probability .50. Draw the tree for three plays

of the game, and compute (to three decimals) the probability of

each path. What is the probability that we are ahead at the end

of three bets?

[Ans. .485.]

15. Assume that the odds are r : s that a certain statement will be

true. If a gambler receives s dollars if the statement turns out

to be true, and gives r dollars if not, what is his or her expected

winning?

16. Referring to Exercise 9 of Section 4.3, find the expected number

of languages that a student chosen at random reads.

17. Referring to Exercise 5 of Section 4.4, find the expected number

of men who get their own hats.

[Ans. 1.]

18. A pair of dice is rolled. Each die has the number 1 on two opposite

faces, the number 2 on two opposite faces, and the number 3 on

two opposite faces. The ārollerā wins a dollar if

(i) the sum of four occurs on the first roll; or

(ii) the sum of three or five occurs on the first roll and the same

sum occurs on a subsequent roll before the sum of four occurs.

Otherwise he or she loses a dollar.

(a) What is the probability that the person rolling the dice wins?

[Ans. 23/45.]

(b) What is the expected value of the game?

[Ans. 1/45.]

4.13 Markov chains

In this section we shall study a more general kind of process than the

ones considered in the last three sections.

We assume that we have a sequence of experiments with the fol-

lowing properties. The outcome of each experiment is one of a finite

number of possible outcomes a1 , a2 , . . . , ar . It is assumed that the prob-

ability of outcome aj on any given experiment is not necessarily inde-

pendent of the outcomes of previous experiments but depends at most

upon the outcome of the immediately preceding experiment. We as-

sume that there are given numbers pij which represent the probability

Figure 4.22: ♦

of outcome aj on any given experiment, given that outcome ai oc-

curred on the preceding experiment. The outcomes a1 , a2 , . . . , ar are

called states, and the numbers pij are called transition probabilities. If

we assume that the process begins in some particular state, then we

have enough information to determine the tree measure for the process

and can calculate probabilities of statements relating to the over-all se-

quence of experiments. A process of the above kind is called a Markov

chain process.

The transition probabilities can be exhibited in two different ways.

The first way is that of a square array. For a Markov chain with states

a1 , a2 , and a3 , this array is written as

      ( p11  p12  p13 )
P =   ( p21  p22  p23 ) .
      ( p31  p32  p33 )

Such an array is a special case of a matrix. Matrices are of fundamental

importance to the study of Markov chains as well as being important

in the study of other branches of mathematics. They will be studied in

detail in ??.

A second way to show the transition probabilities is by a transition

diagram. Such a diagram is illustrated for a special case in Figure

4.22. The arrows from each state indicate the possible states to which

a process can move from the given state.

The matrix of transition probabilities which corresponds to this

diagram is the matrix

       a1    a2    a3
a1  (   0     1     0  )
a2  (   0    1/2   1/2 )
a3  (  1/3    0    2/3 )

An entry of 0 indicates that the transition is impossible.

Notice that in the matrix P the sum of the elements of each row is

1. This must be true in any matrix of transition probabilities, since the

elements of the ith row represent the probabilities for all possibilities

when the process is in state ai .

The kind of problem in which we are most interested in the study

of Markov chains is the following. Suppose that the process starts in

state i. What is the probability that after n steps it will be in state

j? We denote this probability by pij(n). Notice that we do not mean by

this the nth power of the number pij . We are actually interested in this

probability for all possible starting positions i and all possible terminal

positions j. We can represent these numbers conveniently again by a

matrix. For example, for n steps in a three-state Markov chain we write

these probabilities as the matrix

         ( p11(n)  p12(n)  p13(n) )
P(n) =   ( p21(n)  p22(n)  p23(n) ) .
         ( p31(n)  p32(n)  p33(n) )

Example 4.28 Let us find for a Markov chain with transition proba-

bilities indicated in Figure 4.22 the probability of being at the various

possible states after three steps, assuming that the process starts at

state a1 . We find these probabilities by constructing a tree and a tree

measure as in Figure 4.23.

The probability p13(3), for example, is the sum of the weights assigned

by the tree measure to all paths through our tree which end at state

a3 . That is,

p13(3) = 1 · (1/2) · (1/2) + 1 · (1/2) · (2/3) = 7/12.

Similarly

p12(3) = 1 · (1/2) · (1/2) = 1/4

and

p11(3) = 1 · (1/2) · (1/3) = 1/6.

Figure 4.23: ♦

By constructing a similar tree measure, assuming that we start

at state a2 , we could find p21(3), p22(3), and p23(3). The same is true for

p31(3), p32(3), and p33(3). If this is carried out (see Exercise 7) we can write

the results in matrix form as follows:

P(3) =
        a1      a2      a3
a1  (  1/6     1/4     7/12  )
a2  (  7/36    7/24    37/72 )
a3  (  4/27    7/18    25/54 )

Again the rows add up to 1, corresponding to the fact that if we start

at a given state we must reach some state after three steps. Notice

now that all the elements of this matrix are positive, showing that it is

possible to reach any state from any state in three steps. In the next

chapter we will develop a simple method of computing P (n) . ā™¦
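Since P(3) is just the third matrix power of P, the entries above can be checked mechanically. A minimal sketch with exact fractions; the matrix is the one of Figure 4.22:

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Multiply two square matrices of Fractions."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Transition matrix of the chain in Figure 4.22.
P = [[F(0), F(1), F(0)],
     [F(0), F(1, 2), F(1, 2)],
     [F(1, 3), F(0), F(2, 3)]]

P3 = mat_mul(mat_mul(P, P), P)  # three-step transition probabilities
print(P3[0])  # first row: 1/6, 1/4, 7/12, as computed from the tree
```

Each row of P3 again sums to 1, and its first row agrees with the tree-measure computation above.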

Example 4.29 Suppose that we are interested in study-

ing the way in which a given state votes in a series of national elec-

tions. We wish to make long-term predictions and so will not consider

conditions peculiar to a particular election year. We shall base our

predictions only on the past history of the outcomes of the elections,

Republican or Democratic. It is clear that a knowledge of these past

results would influence our predictions for the future. As a first ap-

proximation, we assume that the knowledge of the past beyond the last

election would not cause us to change the probabilities for the outcomes

on the next election. With this assumption we obtain a Markov chain

with two states R and D and matrix of transition probabilities

       R      D
R  (  1−a     a   )
D  (   b     1−b  )

The numbers a and b could be estimated from past results as follows.

We could take for a the fraction of the previous years in which the

outcome has changed from Republican in one year to Democratic in

the next year, and for b the fraction of reverse changes.

We can obtain a better approximation by taking into account the

previous two elections. In this case our states are RR, RD, DR, and

DD, indicating the outcome of two successive elections. Being in state

RR means that the last two elections were Republican victories. If the

next election is a Democratic victory, we will be in state RD. If the

election outcomes for a series of years is DDDRDRR, then our process

has moved from state DD to DD to DR to RD to DR, and finally to RR.

Notice that the first letter of the state to which we move must agree

with the second letter of the state from which we came, since these

refer to the same election year. Our matrix of transition probabilities

will then have the form,

        RR    DR    RD    DD
RR  (  1−a    0     a     0   )
DR  (   b     0    1−b    0   )
RD  (   0    1−c    0     c   )
DD  (   0     d     0    1−d  )

Again the numbers a, b, c, and d would have to be estimated. The

study of this example is continued in ??. ♦
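The constraint that the first letter of the new state must repeat the last election can be encoded directly, which also makes the row sums easy to check. A sketch; the probabilities of a Democratic win used below are illustrative numbers only, not estimates from data:

```python
# States record the last two elections; from state XY the next state
# must begin with Y.
states = ["RR", "DR", "RD", "DD"]

def transition_matrix(p_dem):
    """p_dem[s] = probability the next election is Democratic, given state s."""
    P = []
    for s in states:
        row = []
        for t in states:
            if t[0] != s[1]:
                row.append(0.0)        # impossible move
            elif t[1] == "D":
                row.append(p_dem[s])   # next election Democratic
            else:
                row.append(1 - p_dem[s])
        P.append(row)
    return P

# Illustrative parameters (playing the roles of a, b, c, d in the text).
P = transition_matrix({"RR": 0.2, "DR": 0.3, "RD": 0.6, "DD": 0.7})
assert all(abs(sum(row) - 1) < 1e-12 for row in P)
```

Each row has exactly two nonzero entries, in the columns whose first letter matches the row's second letter, just as in the matrix above.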

Example 4.30 The following example of a Markov chain has been

used in physics as a simple model for diffusion of gases. We shall see

later that a similar model applies to an idealized problem in changing

populations.

We imagine n black balls and n white balls which are put into

two urns so that there are n balls in each urn. A single experiment

consists in choosing a ball from each urn at random and putting the ball

obtained from the first urn into the second urn, and the ball obtained

from the second urn into the first. We take as state the number of

black balls in the first urn. If at any time we know this number, then

we know the exact composition of each urn. That is, if there are j black

balls in urn 1, there must be n − j black balls in urn 2, n − j white

balls in urn 1, and j white balls in urn 2. If the process is in state j,

then after the next exchange it will be in state j − 1, if a black ball is

chosen from urn 1 and a white ball from urn 2. It will be in state j if a

ball of the same color is drawn from each urn. It will be in state j + 1

if a white ball is drawn from urn 1 and a black ball from urn 2. The

transition probabilities are then given by (see Exercise 12)

pj,j−1 = (j/n)² ,  for j > 0,

pjj = 2j(n − j)/n² ,

pj,j+1 = ((n − j)/n)² ,  for j < n,

pjk = 0  otherwise.

A physicist would be interested, for example, in predicting the compo-

sition of the urns after a certain number of exchanges have taken place.

Certainly any predictions about the early stages of the process would

depend upon the initial composition of the urns. For example, if we

started with all black balls in urn 1, we would expect that for some time

there would be more black balls in urn 1 than in urn 2. On the other

hand, it might be expected that the effect of this initial distribution

would wear off after a large number of exchanges. We shall see later,

in ??, that this is indeed the case. ♦
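The formulas above translate directly into a transition matrix; a sketch with exact fractions (for n = 2 it gives the matrix asked for in Exercise 4):

```python
from fractions import Fraction as F

def diffusion_matrix(n):
    """(n+1) x (n+1) transition matrix for the urn model; state j is the
    number of black balls in urn 1."""
    P = [[F(0)] * (n + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        if j > 0:
            P[j][j - 1] = F(j, n) ** 2          # black from urn 1, white from urn 2
        P[j][j] = F(2 * j * (n - j), n * n)     # same color from each urn
        if j < n:
            P[j][j + 1] = F(n - j, n) ** 2      # white from urn 1, black from urn 2
    return P

# Two black and two white balls (n = 2).
for row in diffusion_matrix(2):
    assert sum(row) == 1
print(diffusion_matrix(2)[1])  # row for j = 1: 1/4, 1/2, 1/4
```

For n = 2 the middle row is (1/4, 1/2, 1/4), which can be checked against the formulas by hand.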

Exercises

1. Draw a state diagram for the Markov chain with transition prob-

abilities given by the following matrices.

( 1/2  1/2   0  )
(  0    1    0  )
( 1/2   0   1/2 ) ,

( 1/3  1/3  1/3 )
( 1/3  1/3  1/3 )
( 1/3  1/3  1/3 ) ,

( 0  1 )
( 1  0 ) ,

(  0    1    0    0  )
(  1    0    0    0  )
(  0    0   1/2  1/2 )
(  0   1/2  1/2   0  ) .

Figure 4.24: ♦

2. Give the matrix of transition probabilities corresponding to the

transition diagrams in Figure 4.24.

3. Find the matrix P(2) for the Markov chain determined by the

matrix of transition probabilities

P = ( 1/2  1/2 )
    ( 1/3  2/3 ) .

[Ans. ( 5/12   7/12 )
      ( 7/18  11/18 ) .]

4. What is the matrix of transition probabilities for the Markov

chain in Example 4.30, for the case of two white balls and two

black balls?

5. Find the matrices P (2) , P (3) , P (4) for the Markov chain deter-

mined by the transition probabilities

( 1  0 )
( 0  1 ) .

Find the same for the Markov chain determined by the matrix

( 0  1 )
( 1  0 ) .

6. Suppose that a Markov chain has two states, a1 and a2 , and

transition probabilities given by the matrix

( 1/3  2/3 )
( 1/2  1/2 ) .

By means of a separate chance device we choose a state in which

to start the process. This device chooses a1 with probability 1/2

and a2 with probability 1/2. Find the probability that the process

is in state a1 after the first step. Answer the same question in the

case that the device chooses a1 with probability 1/3 and a2 with

probability 2/3.

[Ans. 5/12; 4/9.]

7. Referring to the Markov chain with transition probabilities indi-

cated in Figure 4.22, construct the tree measures and determine

the values of

p21(3), p22(3), p23(3)

and

p31(3), p32(3), p33(3).

8. A certain calculating machine uses only the digits 0 and 1. It is

supposed to transmit one of these digits through several stages.

However, at every stage there is a probability p that the digit

which enters this stage will be changed when it leaves. We form a

Markov chain to represent the process of transmission by taking

as states the digits 0 and 1. What is the matrix of transition

probabilities?

9. For the Markov chain in Exercise 8, draw a tree and assign a tree

measure, assuming that the process begins in state 0 and moves

through three stages of transmission. What is the probability

that the machine after three stages produces the digit 0, i.e., the

correct digit? What is the probability that the machine never

changed the digit from 0?

10. Assume that a man's profession can be classified as professional,

skilled laborer, or unskilled laborer. Assume that of the sons

of professional men 80 per cent are professional, 10 per cent are

skilled laborers, and 10 per cent are unskilled laborers. In the

Figure 4.25: ♦

case of sons of skilled laborers, 60 per cent are skilled laborers, 20

per cent are professional, and 20 per cent are unskilled laborers.

Finally, in the case of unskilled laborers, 50 per cent of the sons

are unskilled laborers, and 25 per cent each are in the other two

categories. Assume that every man has a son, and form a Markov

chain by following a given family through several generations. Set

up the matrix of transition probabilities. Find the probability

that the grandson of an unskilled laborer is a professional man.

[Ans. .375.]

11. In Exercise 10 we assumed that every man has a son. Assume

instead that the probability a man has a son is .8. Form a Markov

chain with four states. The ļ¬rst three states are as in Exercise

10, and the fourth state is such that the process enters it if a man

has no son, and that the state cannot be left. This state repre-

sents families whose male line has died out. Find the matrix of

transition probabilities and ļ¬nd the probability that an unskilled

laborer has a grandson who is a professional man.

[Ans. .24.]

12. Explain why the transition probabilities given in Example 4.30

are correct.

Supplementary exercises.

13. Five points are marked on a circle. A process moves clockwise

from a given point to its neighbor with probability 2/3, or counter-

clockwise to its neighbor with probability 1/3.

(a) Considering the process to be a Markov chain process, find

the matrix of transition probabilities.

(b) Given that the process starts in a state 3, what is the prob-

ability that it returns to the same state in two steps?

14. In northern New England, years for apples can be described as

good, average, or poor. Suppose that following a good year the

probabilities of good, average, or poor years are respectively .4, .4,

and .2. Following a poor year the probabilities of good, average,

or poor years are .2, .4, and .4 respectively. Following an average

year the probabilities that the next year will be good or poor are

each .2, and of an average year, .6.

(a) Set up the transition matrix of this Markov chain.

(b) 1965 was a good year. Compute the probabilities for 1966,

1967, and 1968.

[Ans. For 1967: .28, .48, .24.]

15. In Exercise 14 suppose that there is probability 1/4 for a good

year, 1/2 for an average year, and 1/4 for a poor year. What are the

probabilities for the following year?

16. A teacher in an oversized mathematics class finds, after grading

all homework papers for the first two assignments, that it is nec-

essary to reduce the amount of time spent in such grading. He

therefore designs the following system: Papers will be marked

satisfactory or unsatisfactory. All papers of students receiving a

mark of unsatisfactory on any assignment will be read on each of

the two succeeding days. Of the remaining papers, the teacher

will read one-fifth, chosen at random. Assuming that each paper

has a probability of one-fifth of being classified "unsatisfactory",

(a) Set up a three-state Markov chain to describe the process.

(b) Suppose that a student has just handed in a satisfactory

paper. What are the probabilities for the next two assign-

ments?

17. In another model for diffusion, it is assumed that there are two

urns which together contain N balls numbered from 1 to N . Each

second a number from 1 to N is chosen at random, and the ball

with the corresponding number is moved to the other urn. Set

up a Markov chain by taking as state the number of balls in urn

1. Find the transition matrix.

4.14 The central limit theorem

We continue our discussion of the independent trials process with two

outcomes. As usual, let p be the probability of success on a trial, and

f (n, p; x) be the probability of exactly x successes in n trials.

In Figure 4.26 we have plotted bar graphs which represent f (n, .3; x)

for n = 10, 50, 100, and 200. We note first of all that the graphs are

drifting off to the right. This is not surprising, since their peaks occur

at np, which is steadily increasing. We also note that while the total

area is always 1, this area becomes more and more spread out.

We want to redraw these graphs in a manner that prevents the

drifting and the spreading out. First of all, we replace x by x − np,

assuring that our peak always occurs at 0. Next we introduce a new

unit for measuring the deviation, which depends on n, and which gives

comparable scales. As we saw in Section 4.10, the standard deviation

√npq is such a unit.

We must still insure that probabilities are represented by areas in

the graph. In Figure 4.26 this is achieved by having a unit base for each

rectangle, and having the probability f (n, p; x) as height. Since we are

now representing a standard deviation as a single unit on the horizontal

axis, we must take f (n, p; x) · √npq as the heights of our rectangles. The

resulting curves for n = 50 and 200 are shown in Figures 4.27 and 4.28,

respectively.

We note that the two figures look very much alike. We have also

shown in Figure 4.28 that it can be approximated by a bell-shaped

curve. This curve represents the function

f (x) = (1/√(2π)) · e^(−x²/2)

and is known as the normal curve. It is a fundamental theorem of

probability theory that as n increases, the appropriately rescaled bar

graphs more and more closely approach the normal curve. The theorem

is known as the Central Limit Theorem, and we have illustrated it

graphically.

More precisely, the theorem states that for any two numbers a and

Figure 4.26: ♦

Figure 4.27: ♦

Figure 4.28: ♦

Figure 4.29: ♦

b, with a < b,

Pr[a < (x − np)/√npq < b]

approaches the area under the normal curve between a and b, as n

increases. This theorem is particularly interesting in that the normal

curve is symmetric about 0, while f (n, p; x) is symmetric about the

expected value np only for the case p = 1/2. It should also be noted that

we always arrive at the same normal curve, no matter what the value

of p is.

In Figure 4.29 we give a table for the area under the normal curve

between 0 and d. Since the total area is 1, and since it is symmetric

about the origin, we can compute arbitrary areas from this table. For

example, suppose that we wish the area between −1 and +2. The area

between 0 and 2 is given in the table as .477. The area between −1

and 0 is the same as between 0 and 1, and hence is given as .341. Thus

the total area is .818. The area outside the interval (−1, 2) is then

1 − .818 = .182.

Example 4.31 Let us find the probability that x differs from the ex-

pected value np by as much as d standard deviations.

Pr[|x − np| ≥ d · √npq] = Pr[|(x − np)/√npq| ≥ d],

and hence the approximate answer should be the area outside the

interval (−d, d) under the normal curve. For d = 1, 2, 3 we obtain

1 − (2 · .341) = .318, 1 − (2 · .477) = .046, 1 − (2 · .4987) = .0026, re-

spectively. These agree with the values given in Section 4.10, to within

rounding errors. In fact, the Central Limit Theorem is the basis of

those estimates. ♦
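The table values used above, and the tail areas in this example, can be reproduced with the error function; a short sketch, using the standard identity that the area under the normal curve from 0 to d is erf(d/√2)/2:

```python
from math import erf, sqrt

def area_0_to(d):
    """Area under the normal curve between 0 and d."""
    return 0.5 * erf(d / sqrt(2))

# Table entries from Figure 4.29.
print(round(area_0_to(1), 3))  # 0.341
print(round(area_0_to(2), 3))  # 0.477

# Areas outside the interval (-d, d), for d = 1, 2, 3.
for d in (1, 2, 3):
    print(d, round(1 - 2 * area_0_to(d), 4))
```

The small discrepancies from the values in the text (.318 versus .3173, .0026 versus .0027) come from using three-place table entries.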

Example 4.32 In Example 4.19 we considered the example of throw-

ing a coin 10,000 times. The expected number of heads that turn up

is 5000 and the standard deviation is √(10,000 · (1/2) · (1/2)) = 50. We ob-

served that the probability of a deviation of more than two standard

deviations (or 100) was very unlikely. On the other hand, consider the

probability of a deviation of less than .1 standard deviation. That is,

of a deviation of less than five. The area from 0 to .1 under the normal

curve is .040 and hence the probability of a deviation from 5000 of less

than five is approximately .08. Thus, while a deviation of 100 is very

unlikely, it is also very unlikely that a deviation of less than five will

occur. ♦

Example 4.33 The normal approximation can be used to estimate

the individual probabilities f (n, p; x) for large n. For example, let us

estimate f (200, .3; 65). The graph of the probabilities f (200, .3; x) was

given in Figure 4.28 together with the normal approximation. The

desired probability is the area of the bar corresponding to x = 65. An

inspection of the graph suggests that we should take the area under the

normal curve between 64.5 and 65.5 as an estimate for this probability.

In normalized units this is the area between

4.5/√(200(.3)(.7))  and  5.5/√(200(.3)(.7)),

or between .6944 and .8487. Our table is not fine enough to find this

area, but from more complete tables, or by machine computation, this

area may be found to be .046 to three decimal places. The exact value

to three decimal places is .045. This procedure gives us a good estimate.

If we check all of the values of f (200, .3; x) we find in each case

that we would make an error of at most .001 by using the normal

approximation. There is unfortunately no simple way to estimate the

error caused by the use of the Central Limit Theorem. The error will

clearly depend upon how large n is, but it also depends upon how near

p is to 0 or 1. The greatest accuracy occurs when p is near 1/2. ♦
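The comparison in this example is easy to redo numerically. A sketch: the exact binomial value against the normal approximation with the half-unit correction used above:

```python
from math import comb, erf, sqrt

n, p, x = 200, 0.3, 65
q = 1 - p
sd = sqrt(n * p * q)

def phi(t):
    """Cumulative area under the normal curve up to t."""
    return 0.5 * (1 + erf(t / sqrt(2)))

# Exact binomial probability f(200, .3; 65).
exact = comb(n, x) * p**x * q**(n - x)

# Normal approximation: area between the normalized points
# (64.5 - np)/sd and (65.5 - np)/sd, as in the text.
approx = phi((x + 0.5 - n * p) / sd) - phi((x - 0.5 - n * p) / sd)

print(round(exact, 3), round(approx, 3))  # exact ≈ .045, approximation ≈ .046
```

The two values differ by less than .001, in line with the error bound quoted above.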

Example 4.34 Suppose that a drug has been administered to a num-

ber of patients and found to be effective a fraction p̄ of the time. Assum-

ing an independent trials process, it is natural to take p̄ as an estimate

for the unknown probability p for success on any one trial. It is useful

to have a method of estimating the reliability of this estimate. One

method is the following. Let x be the number of successes for the drug

given to n patients. Then by the Central Limit Theorem

Pr[|(x − np)/√npq| ≤ 2] ≈ .95.

This is the same as saying

Pr[|(x/n − p)/√(pq/n)| ≤ 2] ≈ .95.

Putting p̄ = x/n, we have

Pr[|p̄ − p| ≤ 2 · √(pq/n)] ≈ .95.

Using the fact that pq ≤ 1/4 (see Exercise 12) we have

Pr[|p̄ − p| ≤ 1/√n] ≥ .95.

This says that no matter what p is, with probability ≥ .95, the true

value will not deviate from the estimate p̄ by more than 1/√n. It is cus-

tomary then to say that

p̄ − 1/√n ≤ p ≤ p̄ + 1/√n

with confidence .95. The interval

[p̄ − 1/√n, p̄ + 1/√n]

is called a 95 per cent confidence interval. Had we started with

Pr[|(x − np)/√npq| ≤ 3] ≈ .99,

we would have obtained the 99 per cent confidence interval

[p̄ − 3/(2√n), p̄ + 3/(2√n)].

For example, if in 400 trials the drug is found effective 124 times,

or .31 of the times, the 95 per cent confidence interval for p is

[.31 − 1/20, .31 + 1/20] = [.26, .36]

and the 99 per cent confidence interval is

[.31 − 3/40, .31 + 3/40] = [.235, .385].

♦
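The two interval formulas can be packaged as one small routine (half-width k/(2√n), where k is the number of standard deviations); a sketch reproducing the drug example:

```python
from math import sqrt

def confidence_interval(x, n, k):
    """Worst-case interval p_bar +/- k/(2*sqrt(n)), using pq <= 1/4;
    k = 2 gives roughly 95 per cent confidence, k = 3 roughly 99."""
    p_bar = x / n
    half = k / (2 * sqrt(n))
    return (p_bar - half, p_bar + half)

# Effective 124 times in 400 trials.
lo, hi = confidence_interval(124, 400, 2)
print(round(lo, 3), round(hi, 3))  # 0.26 0.36
lo, hi = confidence_interval(124, 400, 3)
print(round(lo, 3), round(hi, 3))  # 0.235 0.385
```

Because the bound pq ≤ 1/4 is used, these intervals are conservative: the actual confidence is at least the stated level, whatever the true p.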

Exercises

1. Let x be the number of successes in n trials of an independent

trials process with probability p for success. Let x∗ = (x − np)/√npq .

For large n estimate the following probabilities.

(a) Pr[x∗ < −2.5].

[Ans. .006.]

(b) Pr[x∗ < 2.5].

(c) Pr[x∗ ≥ −.5].

(d) Pr[−1.5 < x∗ < 1].

[Ans. .774.]

2. A coin is biased in such a way that a head comes up with proba-

bility .8 on a single toss. Use the normal approximation to esti-

mate the probability that in a million tosses there are more than
