Without data, all you are is just another person with an opinion.
Unknown

Solutions to Supplemental Exercises for Math 10: Introductory Statistics

Answers to the homework sets from the textbook (those labeled with letters) may be found in the book's appendix. In consideration of others who wish to use the textbook without the answers to the review exercises available, I will not post those. Here are answers to the exercises I wrote to supplement the book.

Part IV: Probability

Chapter 13: What are the chances?

1. Roll 4 six-sided dice, colored red, blue, green, and yellow, and observe the numbers that come up.
1. What is the probability the red and blue dice show 1? 1/36
2. What is the probability all dice show even numbers? 1/16
3. What is the probability at least one die shows a 6? 671/1296 or 51.8%
4. What is the probability the red die shows 5 and the green die shows 3? 1/36
2. Draw two cards, one at a time and without replacement, from a well-shuffled deck of 52.
1. What is the probability both cards are kings? 1/221
2. What is the probability neither card is a king? 188/221
3. What is the probability the second card is red, given that the first card was a diamond? 25/51
4. What is the probability the second card is a heart, given that the first card was a spade? 13/51
3. Roll one die and observe the number showing. Are the events "is greater than 4" and "is an even number" dependent or independent? independent
4. You have a bag containing 5 blue marbles, 6 black chessmen, 5 white chessmen, 12 black checkers, and 5 red checkers.
1. Draw one object at random from the bag and observe its color and type.
1. What is the probability the object is black? 6/11
2. Are the events "is black" and "is a chessman" independent? How about "is black" and "is a checker"? Finally, "is a chessman" and "is a checker"? independent, dependent, dependent
3. What is the probability that the object is a checker given that it is (a) red, or (b) blue? (a) 1, (b) 0
2. Draw two objects from the bag, one at a time, with replacement.
1. What is the probability both objects are blue? 25/1089 or 2.3%
2. What is the probability neither object is blue? 784/1089 or 72%
3. What is the probability both objects are blue, given that at least one was blue? 25/305 = 5/61 (that is, 2.3%/28%), or 8.2%
3. Draw two objects from the bag, one at a time, without replacement.
1. What is the probability both objects are red? 5/264 or 1.9%
2. What is the probability neither object is red? 63/88 or 71.6%
3. What do you think the probability should be that exactly one of the objects is red? 35/132 or 26.5%; see Chapter 14, Sections 2 and 4
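As a check on exercise 4 (the bag), here is a minimal Python sketch that encodes the bag as (color, type) counts and computes the probabilities exactly with `fractions.Fraction`. Python is simply my choice here, and the names `bag`, `prob` and `independent` are hypothetical helpers introduced for illustration.

```python
from fractions import Fraction

# Exercise 4's bag: (color, type) -> number of objects
bag = {
    ("blue", "marble"): 5,
    ("black", "chessman"): 6,
    ("white", "chessman"): 5,
    ("black", "checker"): 12,
    ("red", "checker"): 5,
}
total = sum(bag.values())  # 33 objects


def prob(event):
    """P(event) for one random draw; event is a predicate on (color, kind)."""
    hits = sum(n for (color, kind), n in bag.items() if event(color, kind))
    return Fraction(hits, total)


def independent(e1, e2):
    """Independent exactly when P(E1 and E2) equals P(E1) * P(E2)."""
    both = prob(lambda c, k: e1(c, k) and e2(c, k))
    return both == prob(e1) * prob(e2)


def is_black(color, kind):
    return color == "black"


def is_chessman(color, kind):
    return kind == "chessman"


def is_checker(color, kind):
    return kind == "checker"


p_black = prob(is_black)

# Two draws with replacement: the draws are independent.
p_blue = prob(lambda c, k: c == "blue")
both_blue = p_blue ** 2
neither_blue = (1 - p_blue) ** 2
both_given_at_least_one = both_blue / (1 - neither_blue)

# Two draws without replacement: the second draw sees only 32 objects.
red, rest = 5, total - 5
both_red = Fraction(red, total) * Fraction(red - 1, total - 1)
neither_red = Fraction(rest, total) * Fraction(rest - 1, total - 1)
exactly_one_red = 1 - both_red - neither_red

print(p_black, both_given_at_least_one, both_red, neither_red, exactly_one_red)
# prints: 6/11 5/61 5/264 63/88 35/132
```

Exact fractions make the independence checks a plain equality test, with no rounding to argue about.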

Chapter 14: More about chance

1. A coin is flipped and two dice are rolled. What is the probability of getting a head or a pair of 1s? 37/72 or 51.4%
2. The probability of event A is 30% and the probability of event B is 40%. If the probability of event "A and B" is 15%, what is the probability that neither event A nor event B occurs? 45%
3. The probability of event A is 45%, the probability of event B is 20%, and the probability of event "A or B" is 56%. Are event A and event B independent? Yes
4. The probability of event A is 20%, and the probability of event B is 50%. What are the minimum and maximum possible values for the probability of the events "A and B" and "A or B"? A and B: 0 to 20%; A or B: 50 to 70%
5. Repeat exercise 4 with probabilities 40% and 80% for events A and B, respectively. A and B: 20 to 40%; A or B: 80 to 100%.
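Exercises 2 through 5 all rest on the addition rule, P(A or B) = P(A) + P(B) - P(A and B). A small Python sketch, working in whole percentages to avoid floating-point noise; `and_or_bounds` is a hypothetical helper name. The bounds come from the two extreme cases: one event contained in the other, or the events overlapping as little as possible.

```python
def and_or_bounds(p_a, p_b):
    """Min/max of P(A and B) and of P(A or B), in percent, given P(A) and P(B)."""
    and_min = max(0, p_a + p_b - 100)  # overlap is forced once P(A) + P(B) > 100%
    and_max = min(p_a, p_b)            # the smaller event sits inside the larger
    # Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
    return (and_min, and_max), (p_a + p_b - and_max, p_a + p_b - and_min)


# Exercise 2: "neither" is the complement of "A or B"
neither = 100 - (30 + 40 - 15)
# Exercise 3: recover P(A and B), then compare with P(A) * P(B) in percent units
p_and = 45 + 20 - 56
independent = p_and * 100 == 45 * 20

print(neither, independent)       # 45 True
print(and_or_bounds(20, 50))      # ((0, 20), (50, 70))
print(and_or_bounds(40, 80))      # ((20, 40), (80, 100))
```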

Chapter 15: The binomial formula

1. 5 dice are rolled.
1. What is the probability of getting exactly two 3s? 625/3888 or 16.1%
2. What is the probability of getting exactly three 5s? 125/3888 or 3.2%
3. What is the probability of getting exactly two 3s and exactly three 5s? 5/3888 or 0.13%
4. What is the probability of getting exactly two 3s or exactly three 5s? 745/3888 or 19.2%
5. Are the events "exactly two 3s are rolled" and "exactly three 5s are rolled" independent? no
2. Again, roll 5 dice. What is the probability of getting exactly two 3s and exactly one 4? Hint: For the probability, three dice have specified values and the other two can each be any of the four remaining numbers. For position, take two steps: the positions of the 3s, and then the position of the 4 among the remaining places. 5/81 or 6.2%
3. Calculate 1037 choose 1035 by hand. Note: in fact you might have trouble doing this on a calculator, depending on the order in which it does the operations. 537166
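The Chapter 15 answers can be reproduced exactly with the binomial formula, using `math.comb` (Python 3.8+) and `fractions.Fraction`. A minimal sketch; the helper name `binomial` is illustrative.

```python
from fractions import Fraction
from math import comb


def binomial(n, k, p):
    """Binomial formula: (n choose k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


sixth = Fraction(1, 6)
two_threes = binomial(5, 2, sixth)
three_fives = binomial(5, 3, sixth)
# Two 3s and three 5s fill all five dice; only the positions of the 3s vary.
both = Fraction(comb(5, 2), 6 ** 5)
either = two_threes + three_fives - both
print(two_threes, three_fives, both, either)  # 625/3888 125/3888 5/3888 745/3888
print(both == two_threes * three_fives)       # False: not independent

# Exercise 2: place the two 3s, then the 4 among the three remaining spots;
# the last two dice each show one of the four other values.
two_threes_one_four = Fraction(comb(5, 2) * comb(3, 1) * 4 ** 2, 6 ** 5)
print(two_threes_one_four)                    # 5/81

# Exercise 3: n choose (n - 2) = n choose 2 = n(n - 1)/2, so no huge factorials are needed.
print(1037 * 1036 // 2, comb(1037, 1035))     # 537166 537166
```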

Part II: Descriptive Statistics

Chapter 3: The histogram

1. This exercise extends Set D #2 and is intended to help you keep your Part IV skills alive. We make the highly improbable assumption that there was an equal number of high-school-educated and college-educated women in the group from which this data was drawn.
1. Suppose you choose a woman from the study at random. What is the probability she has three children? 10.9%
2. Are the events "has one or two children" and "has a college degree" independent? No
2. Answers to Excel exercises will not be provided.

Chapter 4: The average and standard deviation

1. Answers to Excel exercises will not be provided.
2. Suppose I tell you the five-number summary for a set of plant-height measurements (in centimeters) is 2, 5, 6.5, 8, 13. What is the probability that a plant chosen at random from the sample has height at most 13 cm? At most 8 cm? Between 5 and 8 cm? At most 5 cm? 100%, 75%, 50%, 25%.
3. Suppose I now tell you that the mean of the plant heights is 6.5 cm and the standard deviation is 1.5 cm. What assumption do you need to make to draw conclusions about the probability of height when you choose a plant at random from the sample? Given that assumption, what is the probability that a plant chosen at random from the sample has height at most 9.5 cm? At most 8 cm? Between 5 and 8 cm? At most 5 cm? We must assume the data is (nearly) bell-shaped. Given that, the probabilities are approximately 97.5%, 84%, 68%, 16%.
4. In Exercise Set E (Section 6), #5-7 ask you to calculate the mean and standard deviation for pairs of related data sets. Interpret this graphically: for each of the problems, fill in the blank in "The manipulation that produced data set (ii) from data set (i) ______ the histogram" with a, b, c, or d. In general, what will the effect on the mean and standard deviation be when the histogram is ______ (a, b, c, or d)? (For b and c, assume the left end of the histogram is fixed and the right end moves toward or away from it; for d, assume the reflection is about the y-axis.)
a. translated (shifted)
b. dilated (stretched out)
c. compressed
d. horizontally reflected
5-a, 6-a and b, 7-d; a translates the mean and leaves the SD unchanged; b moves the mean up and increases the SD; c moves the mean down and decreases the SD; d negates the mean and leaves the SD unchanged.
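The four manipulations can be tried on any small data set with Python's `statistics` module. The data values and the stretch factors 2 and 0.5 below are made up for illustration; `pstdev` divides by n, which matches the textbook's SD.

```python
from statistics import mean, pstdev

data = [2, 5, 6.5, 8, 13]   # made-up values, for illustration only
left = min(data)            # b and c keep the left end of the histogram fixed

transformed = {
    "a translated": [x + 3 for x in data],
    "b dilated": [left + 2 * (x - left) for x in data],
    "c compressed": [left + 0.5 * (x - left) for x in data],
    "d reflected": [-x for x in data],   # reflection about the y-axis
}

print(f"original      mean {mean(data):6.2f}  SD {pstdev(data):5.2f}")
for name, values in transformed.items():
    print(f"{name:<13} mean {mean(values):6.2f}  SD {pstdev(values):5.2f}")
```

Translation moves the mean by the shift and leaves the SD alone; stretching or squeezing about the left end scales the SD and moves the mean away from or toward that end; reflection negates the mean and keeps the SD.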

Chapter 5: The normal approximation for data

1. Answers to Excel exercises will not be provided.
2. A certain measurement has been made that can range from 0 to 20 cm. A data set has been created from a set of measurements by converting each measurement into a fraction of the maximum, so 10 is converted to 0.5. The mean and standard deviation of the converted data set are 0.6 and 0.2.
1. What are the mean and standard deviation of the original measurements? 12 and 4
2. Translate outcomes 0.4 and 0.7 from the converted data set into both original measurements and standard units. original: 8, 14. standard: -1 and 0.5
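The conversion in exercise 2 divides every measurement by 20, so going back multiplies the mean and SD by 20, while standard units do not change at all. A tiny Python check; the variable names are illustrative.

```python
scale = 20                      # converted value = measurement / 20
conv_mean, conv_sd = 0.6, 0.2
orig_mean, orig_sd = scale * conv_mean, scale * conv_sd   # 12 and 4


def standard_units(x, avg, sd):
    return (x - avg) / sd


for c in (0.4, 0.7):
    original = scale * c
    # Rescaling changes the units of measurement but not the standard units.
    print(round(original, 6),
          round(standard_units(original, orig_mean, orig_sd), 6),
          round(standard_units(c, conv_mean, conv_sd), 6))
# prints: 8.0 -1.0 -1.0
#         14.0 0.5 0.5
```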

Midterm 1 Extra Review

2. If you are comfortable with this one, you can feel confident of your skills in calculating probabilities: Six dice are rolled and the numbers showing are observed. Let event A be "exactly two 1s and exactly one 6 appear", and let event B be "at least one 5 appears". Find the probability of A and the probability of B given A. P(A) = (5·2^7)/6^5 = 20/243; P(B given A) = 1 - (3/4)^3 = 37/64
3. How many subsets does a set of 2 elements have? 3 elements? 4? n? [Note: the empty set and the entire original set both count as subsets.] There are two approaches to this: one by the multiplication principle, and one using combinations to count the subsets of each size. Comparing them shows that the number of subsets of a set of size n is also the sum of the combinations n choose i for i from 0 to n. 4, 8, 16, 2^n
4. Suppose you know the value of 1000 choose 40; call it x. Find 1000 choose 960 and 1000 choose 41 in terms of x. x, (960/41)x
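The three Midterm 1 answers can be confirmed in Python with exact fractions and a brute-force subset count. A minimal sketch; `count_subsets` is a hypothetical helper written for this check.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# P(A): positions for the two 1s, then the one 6 among the other four dice;
# the remaining three dice each show one of the four values 2-5.
p_a = Fraction(comb(6, 2) * comb(4, 1) * 4 ** 3, 6 ** 6)
# Given A, those three dice are equally likely to be 2, 3, 4 or 5,
# so "no 5 at all" has chance (3/4)^3.
p_b_given_a = 1 - Fraction(3, 4) ** 3
print(p_a, p_b_given_a)   # 20/243 37/64


def count_subsets(n):
    """Brute-force count of all subsets of an n-element set, grouped by size."""
    return sum(1 for size in range(n + 1) for _ in combinations(range(n), size))


for n in range(2, 6):
    print(n, count_subsets(n), 2 ** n, sum(comb(n, i) for i in range(n + 1)))

# n choose (k + 1) = (n choose k) * (n - k) / (k + 1), and n choose k = n choose (n - k)
x = comb(1000, 40)
print(comb(1000, 960) == x, Fraction(comb(1000, 41), x))   # True 960/41
```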

Part III: Correlation and Regression

Chapter 8: Correlation

1. Suppose you have two normally-distributed data sets of equal size and are going to match their entries at random. What proportion of the matched pairs will have entries
1. each within one standard deviation of the mean? 0.462
2. each within two standard deviations of the mean? 0.9025
3. each more than two standard deviations from the mean? 0.0025
You may use the rounded values of 68% within one SD of the mean and 95% within two SDs of the mean.
2. Answers to Excel exercises will not be provided.
1. Example answer: You might divide the wolves according to age; perhaps the way mammals' teeth lengthen with age as their gums recede is masking a real connection to body length here.
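For exercise 1, random matching makes the two entries of a pair independent, so the chances multiply. A short Python sketch comparing the rounded 68%/95% figures with exact normal-curve areas from `statistics.NormalDist` (standard library, Python 3.8+):

```python
from statistics import NormalDist

z = NormalDist()
exact = {1: z.cdf(1) - z.cdf(-1), 2: z.cdf(2) - z.cdf(-2)}  # about 0.683 and 0.954
rounded = {1: 0.68, 2: 0.95}

for label, within in (("rounded", rounded), ("exact", exact)):
    # Independent entries: P(both) = P(first) * P(second)
    print(label,
          round(within[1] ** 2, 4),        # both within 1 SD
          round(within[2] ** 2, 4),        # both within 2 SDs
          round((1 - within[2]) ** 2, 4))  # both more than 2 SDs out
```

The rounded row reproduces the answers above; the exact row shows how much the 68%/95% rounding matters.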

Chapter 10: Regression

1. Answers to Excel exercises will not be provided.
2. Suppose I measured the boiling point of water as 202.5 degrees. What would you predict the atmospheric pressure was? 24.8 inches of mercury
3. Suppose I measured the atmospheric pressure as 21 inches of mercury. What would you expect the boiling point of water to be? 195.2 degrees

Midterm 2 Extra Review

After giving a large group of people two tests, time to win a video game and time to assemble a puzzle, the summary data is as follows.
video game mean: 4 minutes      video game SD: 1.5 minutes
puzzle mean: 3 minutes      puzzle SD: 1 minute
r = 0.7

1. What percentage of the people at the 80th percentile for the video game do you expect to also be at or above the 80th percentile for the puzzle? 36.3%
2. What is the predicted score on the puzzle test for someone at the 80th percentile on the video game test? 3.6
3. If the actual score pair for someone is (5.3, 4), where the video game time is first and the puzzle time second, what is the error for that prediction? 0.4
4. Of course, in timed tests we usually think of "better" scores as shorter times. For this problem, then, define the 80th percentile to be the score less than or equal to 80% of the times. Does that change affect your answer to part 1? What does it make the answer to part 2? In part 3, what puzzle time would give the same error as before, now that the video game score is the 80th percentile defined in this way? Part 1 is the same. Part 2 is now 2.4. For part 3 the pair would be (2.7, 2.8).
5. What assumption are you making to compute these estimates? How valid do you think it is? How well should your estimates correspond to reality, especially in part 4 as compared to parts 1-3? We are assuming a normal distribution and homoscedasticity for all the computations, and can exploit the symmetry of the normal distribution and of the regression estimate (in the latter case, symmetry about the point of averages, with respect to the SD line) to get the new part 1 without recomputing anything, and to know that for part 2 we can just go the same distance down from the mean that we previously went up. Of course, the normality assumption has the problem that no one can solve the puzzle or win the game in 0 or fewer minutes, so the left tail is cut off; this affects part 4 more than parts 1-3. However, for the puzzle 0 is 3 SDs below the mean, and for the video game it is 2 2/3 SDs below. For the video game, that means our approximation puts about 0.4% of the outcomes at impossible values, but since that is a very small percentage (and for the puzzle the percentage is even smaller: 0.135%), it shouldn't affect things too badly.
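The regression-method steps behind parts 1-4 can be sketched in Python under the same normal, homoscedastic assumptions. One hedge: `NormalDist.inv_cdf(0.80)` gives z of about 0.8416, while a printed table is usually read as 0.85, so part 1 comes out near 36.2% here rather than 36.3%.

```python
from math import sqrt
from statistics import NormalDist

std = NormalDist()
game_mean, game_sd = 4, 1.5
puzzle_mean, puzzle_sd = 3, 1
r = 0.7

z80 = std.inv_cdf(0.80)    # about 0.84; printed tables often use 0.85
pred_z = r * z80           # regression: predicted z-score is r times the known z-score
spread = sqrt(1 - r ** 2)  # SD inside a vertical strip, in standard units (homoscedastic)

# Part 1: share of the strip that is at or above the puzzle's 80th percentile
part1 = 1 - std.cdf((z80 - pred_z) / spread)
# Part 2: predicted puzzle time, converted back from standard units
part2 = puzzle_mean + pred_z * puzzle_sd
# Part 3: error = actual - predicted, using the rounded prediction 3.6
part3 = 4 - round(part2, 1)

# Part 4: "80th percentile" now means the usual 20th percentile (z = -z80)
game_score_flipped = game_mean - z80 * game_sd
part1_flipped = std.cdf((-z80 + pred_z) / spread)  # share at or below the puzzle's cutoff
part2_flipped = puzzle_mean - pred_z * puzzle_sd

print(f"part 1: {part1:.1%} (flipped: {part1_flipped:.1%})")
print(f"part 2: {part2:.2f} (flipped: {part2_flipped:.2f})")
print(f"part 3 error: {part3:.1f}; flipped pair: "
      f"({game_score_flipped:.1f}, {part2_flipped + part3:.1f})")
```

The flipped part 1 matches the original exactly, which is the symmetry argument of exercise 5 in numerical form.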

Part VI: Sampling

Chapter 21: The accuracy of percentages

1. Answers to Excel exercises will not be provided.
2. Unfortunately, since the sample is small, the intervals are wide, but Dunn says every time he has done this experiment with students, the weights have been over the advertised weight of 18 grams. If that is a real phenomenon, why do you think it occurs? No one ever complains when they get more product than advertised, but they might if they got less. If the cost of extra materials is not too high, it is in the company's interest to set the average weight of the candy bars above the advertised weight, so that bars which come out smaller than average by chance variability are still at least the advertised size.

Part VIII: Tests of Significance

Chapter 26: Tests of significance

1. Out of 500 people surveyed, 43 were regular consumers of Hi-Brow Bran Flakes. Using the z-test, test the null hypotheses that (a) 5% of people are regular consumers of Hi-Brow, and (b) 10% of people are regular consumers of Hi-Brow. Can you conclude anything about the relative probability of these hypotheses? (a) gives a z-value of 3.69 (3.66 if the SD of the box is rounded to 0.22), so P is under 1% and we reject the null hypothesis. (b) gives a z-value of -1.04, so P is greater than 5% and we retain the null hypothesis. Rejecting (a) while retaining (b) suggests that hypothesis (b) is the more likely of the two, but we cannot say anything more specific.
2. Is there any sample proportion such that the z-test would support both null hypotheses (a) and (b) of exercise 1 simultaneously? No.
3. Is there any sample proportion such that the z-test would simultaneously support both null hypotheses (c): 20% of people are regular consumers of Hi-Brow and (d): 25% of people are regular consumers of Hi-Brow? Yes: any count from 110 to 114 out of 500, inclusive (22% to 22.8%), using one-sided tests at the 5% level. This is why we say a P-value above 5% shows the data is consistent with the null hypothesis, or that the null hypothesis is plausible, rather than saying the null hypothesis has been proven. It has simply not been shown to be improbable.
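A minimal Python sketch of the z-test used in exercises 1-3, with the SE computed from the null box. The one-sided 5% cutoff of 1.645 is an assumption on my part: it is the convention that reproduces the range 110 to 114 (the lower null is tested against "more", the higher null against "less"). Function names are illustrative.

```python
from math import sqrt

n, observed = 500, 43


def z_value(count, n, p0):
    """z = (observed - expected) / SE, with the SE taken from the null box."""
    return (count - n * p0) / sqrt(n * p0 * (1 - p0))


print(round(z_value(observed, n, 0.05), 2), round(z_value(observed, n, 0.10), 2))
# prints: 3.69 -1.04


def consistent_with_both(p_low, p_high, cutoff=1.645):
    """Counts that neither one-sided 5% test would reject."""
    return [k for k in range(n + 1)
            if z_value(k, n, p_low) <= cutoff and z_value(k, n, p_high) >= -cutoff]


print(consistent_with_both(0.05, 0.10))   # []
print(consistent_with_both(0.20, 0.25))   # [110, 111, 112, 113, 114]
```

With two-sided tests the overlap for (c) and (d) would be wider, but it would still be nonempty, and (a) and (b) would still have none.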

Chapter 28: The chi-square test

1. From Johnson and Bhattacharyya, Statistics: Principles and Methods: Out of 100 people who volunteered to donate blood, the frequency of blood types was as follows: 40 O, 44 A, 10 B, and 6 AB. Use the χ² method to test the hypothesis that (a) all four blood types are equally distributed in the population of potential donors, and (b) O and A are each four times as common as B or AB. Use Excel if you like. Can you conclude anything about the relative probability of these hypotheses? The χ² value for (a) is 46.88 (3 degrees of freedom), giving an incredibly tiny P-value, so we reject the null hypothesis. The χ² value for (b) is 2, giving a P-value between 50% and 70%, so we retain the null hypothesis. As in exercise 1 for Chapter 26, we cannot say anything numerical about the relative probabilities of the null hypotheses, but we believe that of (b) to be higher than that of (a).
2. A market research firm sends surveys to 900 firms, 300 each of small, medium, and large size. The number of surveys returned was 200 for the small firms, 175 for the medium-sized firms, and 155 for the large firms. Are response rate and size of company independent? No, P is much less than 1% (χ² = 14, 2 degrees of freedom).
3. From Moore, McCabe, and Craig, Introduction to the Practice of Statistics: Some data from the early 1990s on on-time and delayed flights by two airlines at two airports is reported below. (a) In terms of delay, is there a real difference between Alaska Airlines and America West? (b) In terms of delay, is there a real difference between Los Angeles and Phoenix?
                 Los Angeles           Phoenix
                 On Time   Delayed     On Time   Delayed
America West     694       117         4840      415

Hint: You are testing independence of the on-time/delayed variable against either the airline variable or the airport variable. The given eight values must be combined into a 2x2 table; what is added up depends on whether you are working (a) or (b).
(a) No. P is between 50% and 70% (χ² = 0.284, 1 degree of freedom). (b) Yes. P is much less than 1% (χ² = 38.103, 1 degree of freedom).
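Python's standard library has no chi-square distribution, so this sketch uses the closed-form upper-tail formulas for integer degrees of freedom (`chi2_sf` is a hypothetical helper; with SciPy installed, `scipy.stats.chi2.sf(x, df)` does the same job, and the page itself suggests Excel). It checks exercises 1 and 2.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1 (closed forms)."""
    h = x / 2
    if df % 2 == 0:
        # even df: e^(-h) * sum_{i=0}^{df/2 - 1} h^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return exp(-h) * total
    # odd df: erfc(sqrt(h)) + e^(-h) * sum_{i=1}^{(df-1)/2} h^(i - 1/2) / Gamma(i + 1/2)
    tail = sum(h ** (i - 0.5) / gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return erfc(sqrt(h)) + exp(-h) * tail


def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Exercise 1: blood types O, A, B, AB
blood = [40, 44, 10, 6]
stat_a = chi_square(blood, [25, 25, 25, 25])   # equal shares
stat_b = chi_square(blood, [40, 40, 10, 10])   # shares in the ratio 4:4:1:1
print(stat_a, chi2_sf(stat_a, 3))
print(stat_b, chi2_sf(stat_b, 3))


# Exercise 2: returned / not returned by firm size, tested for independence
def independence_test(table):
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    expected = [[r * c / n for c in cols] for r in rows]
    stat = sum(chi_square(obs, exp_row) for obs, exp_row in zip(table, expected))
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df, chi2_sf(stat, df)


returned = [200, 175, 155]
table = [returned, [300 - k for k in returned]]
print(independence_test(table))
```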

Final Exam Extra Review

Special Review (temporary):

Chapter 29 Section 8, p. 565:
#5 Since these are different people at each age, we cannot conclude anything about changing lifestyles; the pattern is most likely the result of people with healthier lifestyles living longer and thus making up a larger percentage of the older population.
#6 The mean is so much larger than the median that you know the top 50% covers a much larger spread than the bottom 50%; it is possible that this is concentrated in the top 10%, but far more likely that the distance between the 90th and 50th percentiles is much larger than the distance between the 50th and 10th percentiles.
#7 Looks fine: strong positive correlation, a few outliers, and all dots on whole numbers because grownups don't usually say "I'm 47 and a half".
#8 r = 2/3.
#9b D is using an ecological correlation, which usually (though not always) increases the strength of the correlation.
#10 (iii).
#11 65.5.
#13 just about (provided the data is homoscedastic); this is the rms error.
#14 29935.
#15 (a) 1/5525 = 0.02%; (b) 4324/5525 = 78.3%; (c) 38/85 = 44.7%; (d) 47/85 = 55.3%.
#16 5/11664 = 0.043%.
#20 80.64% (the sample average and SD are red herrings).
#21 (i); the correction factors are both extremely close to 1.
#25 (a) (i); (b) (ii); (c) (iii), (ii).
#26 (a) T; (b) T; (c) F; (d) unknown; (e) unknown but likely; (f) F (square root of 2).
#34 (a) T (generously rounded); (b) F; (c) F.
#37 (a) (i) no, our techniques only apply to simple random samples; (ii) we simply can't answer this based on data alone; (b) (i) P < 1%; conclude there is a difference.

Supplemental Review Exercises:

1. A computer program is written to simulate two draws with replacement from a box containing four tickets numbered 1 through 4. 1600 runs of the program result in the following frequencies for the possible sequences of draws, which seem not quite right to one of the programmers:
draw  freq    draw  freq    draw  freq    draw  freq
1 1   93      1 2   100     1 3   91      1 4   105
2 1   108     2 2   102     2 3   104     2 4   82
3 1   89      3 2   86      3 3   114     3 4   120
4 1   100     4 2   97      4 3   94      4 4   115
1. Check the distribution of values on the first draw. P=88.8% (with 3 degrees of freedom; testing (389, 396, 409, 406) against (400, 400, 400, 400); I used Excel)
2. Check the distribution of values on the second draw. P=56.3% (with 3 degrees of freedom; testing (390, 385, 403, 422) against (400, 400, 400, 400))
3. Check the distribution of values on the second draw for each value of the first draw. 1X: expected frequency 97.25, P=73.3%; 2X: expected frequency 99, P=25.3%; 3X: expected frequency 102.25, P=3.3%; 4X: expected frequency 101.5, P=46.3%. All look okay except first value 3, which looks skewed toward the larger numbers.
4. State your results in terms of (conditional) probability: what seems to be wrong with the program? The frequencies obtained line up with the probabilities of getting outcome X on draw Y, and three of the probabilities of getting outcome X on draw 2 given you got outcome Y on draw 1, but the conditional probabilities for draw 2 given outcome 3 on draw 1 are somehow not being followed. (Of course the conditional probabilities are the same as the ordinary probabilities because the draws are independent, but the data we use to test them is a subset of the data we use to test the nonconditional probabilities.)
2. Deal five cards from a well-shuffled deck. Let event A be that the first card is a face card and the remaining four cards are two black and two red. Let event B be that the first card is a heart and the remaining cards are all non-face cards (A-10) in diamonds and spades. What is the probability of the event A or B? 40897/433160 = 9.4% (prob A = 75/833; prob B = 19/3920; prob (A and B) = 81/173264)
3. For a certain large group of exercisers, the duration of each workout and the number of workouts per week are slightly negatively correlated. The average duration is 1.5 hours and the average frequency is 2.75 times per week, with SDs of 0.5 and 1, respectively. The correlation is r = -0.3 and the data is homoscedastic.
1. Approximately what percentage of the exercisers who work out for 1.5 hours at a time work out at least three times per week? 40%
2. Approximately what percentage of the exercisers who work out twice a week work out for no more than 1 hour at a time? 10%
3. If you choose an exerciser from the group at random, what is the probability that
1. he or she works out at least three times a week? 40%
2. he or she works out at most one hour at a time? 16%
4. For a recent graduating class of over one million high school students, the ACT and SAT scores were essentially normally distributed with ACT mean 20.8 and standard deviation 4.8 points, and SAT mean 1026 with standard deviation 209 points (the ACT is on a scale from 1 to 36 and the SAT from 400 to 1600).
1. If ACT and SAT scores are perfectly positively correlated, what score on the SAT is equivalent to a score of 30 on the ACT? 1427 (if you round your ACT z-value to 1.9 you should get 1423)
2. If ACT and SAT scores are perfectly positively correlated, which score is better, 1240 on the SAT or 25 on the ACT? 1240 (its z-value is greater than 1; 25's is less than 1)
3. What is the 80th percentile for ACT scores? 24.88 (or 25, since you don't get fractional scores)
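Exercises 1 and 2 of the supplemental review can be checked in Python. For 3 degrees of freedom the chi-square upper tail has the closed form erfc(sqrt(x/2)) + sqrt(2x/pi)·e^(-x/2), used below in place of Excel; `chi2_sf_3` and `uniform_p_value` are hypothetical helper names. The card probabilities are computed exactly with `Fraction`.

```python
from fractions import Fraction
from math import comb, erfc, exp, pi, sqrt


def chi2_sf_3(x):
    """Upper-tail chi-square probability with 3 degrees of freedom (closed form)."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


def uniform_p_value(counts):
    """Goodness of fit of four counts against equal expected counts (df = 3)."""
    expected = sum(counts) / len(counts)
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return chi2_sf_3(stat)


# Exercise 1: freq[i][j] = runs whose first draw was i + 1 and second draw j + 1
freq = [
    [93, 100, 91, 105],
    [108, 102, 104, 82],
    [89, 86, 114, 120],
    [100, 97, 94, 115],
]
first = [sum(row) for row in freq]          # 389, 396, 409, 406
second = [sum(col) for col in zip(*freq)]   # 390, 385, 403, 422
print(f"first draw:  P = {uniform_p_value(first):.1%}")
print(f"second draw: P = {uniform_p_value(second):.1%}")
for value, row in enumerate(freq, start=1):
    print(f"second draw given first draw {value}: P = {uniform_p_value(row):.1%}")

# Exercise 2. A: a face card (12 of 52), then 2 red and 2 black of the other 51;
# whatever the face card's color, 25 cards of that color and 26 of the other remain.
p_a = Fraction(12, 52) * Fraction(comb(25, 2) * comb(26, 2), comb(51, 4))
# B: a heart, then four of the 20 non-face diamonds and spades.
p_b = Fraction(13, 52) * Fraction(comb(20, 4), comb(51, 4))
# A and B: a heart face card, then 2 of 10 non-face diamonds and 2 of 10 non-face spades.
p_ab = Fraction(3, 52) * Fraction(comb(10, 2) ** 2, comb(51, 4))
print(p_a, p_b, p_ab, p_a + p_b - p_ab)   # 75/833 19/3920 81/173264 40897/433160
```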

Examples from Class: (fully worked out)

A simple random sample of 400 people from a population of 100,000 is surveyed in the fall, and another sample the following spring. The fall sample showed 41% of people in favor of repaving downtown; the average time per week spent downtown was 4 hours with a standard deviation of 2 hours. In the spring, 45% were in favor of repaving and the average weekly time spent downtown was 3.5 hours with a standard deviation of 1.75 hours. For each of the two things measured, find (a) a 95% confidence interval for the population parameter for the fall survey, and (b) whether it appears there was a real change from the fall to the spring. Repaving: (a) 36.1-45.9%, (b) no; P = 12.5%. Time spent downtown: (a) 3.8-4.2 hours/week, (b) yes; P = 0.009%.
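A minimal Python sketch of the class example: 95% intervals as estimate ± 2 SE, and the change from fall to spring tested with the SE of a difference of two independent samples. Two assumptions here: the P-values are one-sided (that is what matches 12.5%), and the population correction factor (about 0.998 for 400 out of 100,000) is ignored. Exact normal areas may differ from table-based figures in the last digit.

```python
from math import sqrt
from statistics import NormalDist

n = 400   # each sample; correction factor for a population of 100,000 is about 0.998, ignored
std = NormalDist()


def one_sided_p(z):
    """Area beyond |z| in one tail of the normal curve."""
    return 1 - std.cdf(abs(z))


# Repaving: percentages, with the SE estimated from the sample proportion
fall, spring = 0.41, 0.45
se_fall = sqrt(fall * (1 - fall) / n)
ci_repave = (fall - 2 * se_fall, fall + 2 * se_fall)
se_change = sqrt(se_fall ** 2 + spring * (1 - spring) / n)
p_repave = one_sided_p((spring - fall) / se_change)

# Time downtown: averages, with SE = SD / sqrt(n)
fall_avg, fall_sd, spring_avg, spring_sd = 4, 2, 3.5, 1.75
se_avg = fall_sd / sqrt(n)
ci_time = (fall_avg - 2 * se_avg, fall_avg + 2 * se_avg)
se_avg_change = sqrt(se_avg ** 2 + (spring_sd / sqrt(n)) ** 2)
p_time = one_sided_p((spring_avg - fall_avg) / se_avg_change)

print(f"repaving: {ci_repave[0]:.1%} to {ci_repave[1]:.1%}, P = {p_repave:.1%}")
print(f"time: {ci_time[0]:.1f} to {ci_time[1]:.1f} hours, P = {p_time:.4%}")
```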

1. Roll a die five times. What is the probability of getting a 1 first, and then half even numbers and half odd numbers? 6.25%
2. Deal four cards from a well-shuffled deck. What is the probability of getting exactly one heart and exactly one face card? 19%
3. Roll six dice. What is the probability of getting at least one 3 or 4? 91.2%
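The three counting answers above, checked exactly in Python with `Fraction` and `math.comb` (a sketch; variable names are illustrative):

```python
from fractions import Fraction
from math import comb

# 1. A 1 on the first roll, then two evens and two odds in the next four rolls
p1 = Fraction(1, 6) * comb(4, 2) * Fraction(1, 2) ** 4

# 2. Exactly one heart and exactly one face card among four cards. Either a single
#    heart face card does both jobs (3 choices) and the other three cards are
#    "neither", or a non-face heart (10) and a non-heart face card (9) appear
#    together with two "neither" cards.
neither = 52 - 13 - 9   # 30 cards that are neither hearts nor face cards
ways = 3 * comb(neither, 3) + 10 * 9 * comb(neither, 2)
p2 = Fraction(ways, comb(52, 4))

# 3. At least one 3 or 4 in six dice: the complement of all six avoiding 3 and 4
p3 = 1 - Fraction(4, 6) ** 6

print(f"{float(p1):.2%} {float(p2):.1%} {float(p3):.1%}")   # 6.25% 19.0% 91.2%
```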

A person claiming to have invented a new source of energy releases data from successive trial runs of the machine. The data, converted into standard units via the average power produced and the standard deviation of the values, is presented in terms of unit intervals below. One would expect the amounts to be normally distributed; at this level of detail do they appear to be? Yes, a little too well: P is just under 99%.

z            freq
< -2         2
-2 to -1     9
-1 to 0      25
0 to 1       28
1 to 2       12
> 2          2
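A sketch of the goodness-of-fit check for the energy-machine data: expected counts come from normal-curve areas (`statistics.NormalDist`), and the P-value from the closed-form chi-square tail (`chi2_sf` is a hypothetical helper; `scipy.stats.chi2.sf` would serve if SciPy is available). The page does not state the degrees of freedom; 6 intervals minus 1 = 5 is my assumption, and it gives a P-value just under 99%.

```python
from math import erfc, exp, gamma, sqrt
from statistics import NormalDist


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1 (closed forms)."""
    h = x / 2
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return exp(-h) * total
    tail = sum(h ** (i - 0.5) / gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return erfc(sqrt(h)) + exp(-h) * tail


observed = [2, 9, 25, 28, 12, 2]   # runs per interval, in standard units
cuts = [-2, -1, 0, 1, 2]           # interval boundaries
std = NormalDist()
cdf = [0.0] + [std.cdf(c) for c in cuts] + [1.0]
n = sum(observed)                  # 78 runs in all
expected = [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1             # assumption: 6 intervals - 1 = 5
print(f"chi-square = {stat:.3f}, df = {df}, P = {chi2_sf(stat, df):.1%}")
```

A P-value this close to 100% means the observed counts sit almost suspiciously near the expected ones, which is the "a little too well" in the answer.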
