Quiz 3 Review

Firstly, a flowchart for when to use each distribution:

Practice Problems

  1. We are interested in testing whether a certain at-risk population for diabetes has a daily sugar intake that is equal to the general population, which is equal to 77 grams/day. A sample of size 37 was taken from this at-risk population, and we obtained a sample mean of 80 and sample standard deviation of 11 grams. Perform a hypothesis test to test whether this population has a significantly different mean sugar intake from 77 grams.


  2. The distribution of LDL cholesterol levels in a certain population is approximately normal with mean 90 mg/dl and standard deviation 8 mg/dl.
    a. What is the probability that an individual will have an LDL cholesterol level above 95 mg/dl?
    b. Suppose we have a sample of 10 people from this population. What is the probability that exactly 3 of them are above 95 mg/dl?
    c. Take the sample of size 10, as in part b. What is the probability that the sample mean will be above 95 mg/dl?
    d. Suppose we take 5 samples of size 10 from the population. What is the probability that at least one of the sample means will be greater than 95 mg/dl?


  3. In the city of Chicago, 235 baseball fans were sampled and asked if they cheer for the White Sox or the Cubs. 155 of the questioned people preferred the Cubs, while the remaining 80 fans preferred the White Sox. Use the normal approximation to test the hypothesis that the Cubs and White Sox have an equal number of fans in the city at the \(\alpha = 0.05\) level. Then construct a 95% confidence interval for the true proportion of Chicago fans who cheer for the Cubs, still using the normal approximation.


  4. An article in the New England Journal of Medicine reported that among adults living in the United States, the average level of albumin in cerebrospinal fluid is 29.5 mg/dl, with a standard deviation of 9.25 mg/dl. We are going to select a sample of size 20 from this population. Assume albumin levels in cerebrospinal fluid in U.S. adults are normally distributed.
    a. How does the variability of our sample mean compare with the variability of albumin levels in the population?
    b. What is the probability that our sample mean will lie between 29 and 31 mg/dl?
    c. What two values will contain the middle 50% of our sample means?
    d. Now assume we don’t know the standard deviation or mean of the population and our sample of 20 has a mean albumin level of 30.1 mg/dl and a standard deviation of 8.95 mg/dl. Construct a 95% confidence interval for the population mean and interpret it.
    e. Why is the confidence interval constructed in (d) wider than the confidence interval that would be constructed if we used the normal distribution? What variables affect the width of the confidence interval?


  5. In the following scenarios, identify what will happen to the power of a hypothesis test:
    a. We increase the sample size.
    b. The standard deviation of the sample is larger than what we expected.
    c. Our effect size moves from 5 units to 10 units.


Solutions

Problem 1

mu <- 77
xbar <- 80
s <- 11
n <- 37

# The population SD is unknown, so use a one-sample t-test (df = n - 1)

t_stat <- (xbar - mu) / (s / sqrt(n))

# Two-sided p-value
2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)
## [1] 0.1058177

We do not have sufficient evidence to conclude that the mean sugar intake of the at-risk population differs from the general population mean of 77 grams/day (t = 1.66, df = 36, p = 0.11).
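As a cross-check, the same decision can be reached by comparing the test statistic to a critical value, or by building a confidence interval for the mean. The sketch below assumes a significance level of \(\alpha = 0.05\), which the problem statement does not specify; the names `t_crit` and `ci` are illustrative.

```r
# Values from Problem 1
mu0  <- 77
xbar <- 80
s    <- 11
n    <- 37

se     <- s / sqrt(n)               # standard error of the sample mean
t_stat <- (xbar - mu0) / se         # one-sample t statistic

# Two-sided critical value for alpha = 0.05 (assumed level)
t_crit <- qt(0.975, df = n - 1)

# TRUE would mean "reject the null hypothesis"
abs(t_stat) > t_crit

# 95% confidence interval for the at-risk population mean
ci <- xbar + c(-1, 1) * t_crit * se
ci
```

The test statistic falls below the critical value and the interval contains 77, so both approaches agree with the p-value approach.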

Problem 2

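## Part 2a: P(X > 95) for one individual, X ~ Normal(mean 90, SD 8)
mu <- 90
sigma <- 8

(p_ind <- pnorm(95, mu, sigma, lower.tail = FALSE))
## [1] 0.2659855

## Part 2b: the number above 95 out of 10 people is Binomial(10, p_ind)
dbinom(3, 10, p_ind)
## [1] 0.2592314

## Part 2c: the sample mean has standard error sigma / sqrt(10)
z <- (95 - mu) / (sigma / sqrt(10))

(p_mean <- pnorm(z, lower.tail = FALSE))
## [1] 0.02405341

## Part 2d: P(at least one of 5 sample means > 95) = 1 - P(none)
1 - dbinom(0, 5, p_mean)
## [1] 0.1146189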
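The answers to Parts 2c and 2d can be sanity-checked by simulation. The following is a rough Monte Carlo sketch, not part of the original solution; the seed and the number of simulated samples (`n_sims`) are arbitrary choices, and the simulated proportions should land close to, but not exactly on, the values computed above.

```r
# Monte Carlo check of Parts 2c and 2d (seed and n_sims are arbitrary choices)
set.seed(1)
mu     <- 90
sigma  <- 8
n      <- 10
n_sims <- 100000   # number of simulated samples; must be a multiple of 5

# Each row is one simulated sample of 10 individuals
samples <- matrix(rnorm(n_sims * n, mean = mu, sd = sigma), ncol = n)
xbars   <- rowMeans(samples)

# Part 2c: proportion of sample means above 95
p_2c_sim <- mean(xbars > 95)

# Part 2d: arrange the sample means in groups of 5 and
# check whether at least one mean in each group exceeds 95
groups   <- matrix(xbars, ncol = 5)
p_2d_sim <- mean(rowSums(groups > 95) >= 1)

c(part_2c = p_2c_sim, part_2d = p_2d_sim)
```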

Problem 3

Hypothesis test:

This is binary (proportion) data, so we test \(H_0: p = 0.5\) against \(H_1: p \neq 0.5\) using a normal approximation:

# Writing down our known values
n <- 235
p <- 155 / n
p_0 <- 0.5
alpha <- 0.05

# Finding our z-statistic
z <- (p - p_0) / (sqrt(p_0*(1 - p_0) / n))

# Using our z-statistic to get our two-sided p-value
# (abs() and lower.tail = FALSE also handle a negative z correctly)
pvalue <- 2 * pnorm(abs(z), mean = 0, sd = 1, lower.tail = FALSE)
pvalue
## [1] 9.958308e-07

Conclusion: The p-value is far below \(\alpha = 0.05\), so we reject the null hypothesis. There is sufficient evidence that the proportion of Cubs fans in Chicago is not equal to the proportion of White Sox fans.

Confidence interval:

We use the confidence interval formula for a proportion. Note that the standard error here uses the sample proportion rather than the null proportion.

SE <- sqrt(p*(1-p) / n)

# Confidence interval lower and upper bounds:
p - qnorm(1 - alpha/2)*SE
## [1] 0.5989906
p + qnorm(1 - alpha/2)*SE
## [1] 0.7201584

Our null hypothesis value \(p_0 = 0.5\) is not contained in the interval. This is consistent with rejecting the null hypothesis, since 0.5 is not in the range of plausible values.

We can double check our work for both the hypothesis test and the CI with the binom.test function:

binom.test(155, 235)
## 
##  Exact binomial test
## 
## data:  155 and 235
## number of successes = 155, number of trials = 235, p-value = 1.134e-06
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.5951389 0.7199254
## sample estimates:
## probability of success 
##              0.6595745

Our answers don’t match exactly, but that’s to be expected since we used a normal approximation, and binom.test gives an exact answer.
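The z-test itself can also be reproduced with R's built-in `prop.test` function when the continuity correction is turned off. This is a sketch for checking the hand calculation: the X-squared statistic it reports is the square of our z-statistic, and its confidence interval is a Wilson score interval, so it will differ slightly from the interval we computed by hand.

```r
# z-statistic computed by hand, as in the solution above
n   <- 235
p   <- 155 / n
p_0 <- 0.5
z   <- (p - p_0) / sqrt(p_0 * (1 - p_0) / n)

# Same test via prop.test, without the continuity correction
res <- prop.test(x = 155, n = 235, p = 0.5, correct = FALSE)

# The chi-squared statistic is z^2, so its square root recovers |z|
c(z_by_hand = z, z_from_prop_test = sqrt(unname(res$statistic)))

# Two-sided p-value and (Wilson score) 95% confidence interval
res$p.value
res$conf.int
```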

Problem 4

Part a:

The standard deviation of the sample mean is the standard error, which is \(9.25 / \sqrt{20} = 2.07\). This is less than the standard deviation of the albumin levels (9.25).

Part b:

\(P(29 < \overline{X} < 31)\) = \(P(\frac{29 - 29.5}{9.25/\sqrt{20}} < Z < \frac{31 - 29.5}{9.25/\sqrt{20}})\)

= \(P(-0.242 < Z < 0.725)\) = \(P(Z < 0.725) - P(Z < -0.242)\) = \(0.766 - 0.404\) = \(0.362\)
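The table lookup can be checked directly in R with `pnorm`, passing the standard error as the standard deviation of the sample mean. This is a minimal sketch; the result may differ from the table-based answer in the third decimal place because the table values are rounded.

```r
mu    <- 29.5
sigma <- 9.25
n     <- 20

# Standard error of the sample mean
se <- sigma / sqrt(n)

# P(29 < Xbar < 31), where Xbar ~ Normal(mu, se)
prob <- pnorm(31, mean = mu, sd = se) - pnorm(29, mean = mu, sd = se)
prob
```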

Part c:

For the middle 50% of the sample means, we want to find \(z_{0.25}\) and \(z_{0.75}\) in the table. These 2 values are \(\pm0.675\). Since we are finding the middle 50% of the sample means rather than the individuals, we will use the standard error rather than the standard deviation. Therefore,

\(\mu \pm z \times SE\) = \(29.5 \pm 0.675 \times \frac{9.25}{\sqrt{20}}\) = \((28.104, 30.896)\)
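The same bounds can be obtained without a table by asking `qnorm` for the 25th and 75th percentiles of the sampling distribution of the mean. A short sketch, which should agree with the table-based bounds up to rounding:

```r
mu    <- 29.5
sigma <- 9.25
n     <- 20
se    <- sigma / sqrt(n)

# 25th and 75th percentiles of Xbar ~ Normal(mu, se)
middle_50 <- qnorm(c(0.25, 0.75), mean = mu, sd = se)
middle_50
```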

Part d:

Since we no longer know the standard deviation of the population, we will use the t-distribution with \(n - 1 = 19\) degrees of freedom:

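\(95\% \text{ CI} = \bar{x} \pm t_{0.025, 19} \frac{s}{\sqrt{n}}\) = \(30.1 \pm 2.09 \times \frac{8.95}{\sqrt{20}}\) = \((25.92, 34.28)\)

Interpretation: we are 95% confident that the mean albumin level in the cerebrospinal fluid of U.S. adults is between 25.92 and 34.28 mg/dl.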

(Note: if the course website table listed 19 degrees of freedom, the value 2.09 would appear in the \(\alpha = 0.05\) column.)
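In R, the critical value comes from `qt` rather than a table. The sketch below recomputes the interval; because `qt` returns more decimal places than the rounded 2.09, the endpoints may differ from the hand calculation in the second decimal place.

```r
xbar <- 30.1
s    <- 8.95
n    <- 20

# Upper 2.5% critical value of the t-distribution with 19 degrees of freedom
t_crit <- qt(0.975, df = n - 1)

# 95% confidence interval for the population mean
ci <- xbar + c(-1, 1) * t_crit * s / sqrt(n)
ci
```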

Part e:

Three quantities affect the width of the interval: the standard deviation, the sample size, and the critical value. In this case, the t-distribution has heavier tails than the normal distribution, so its critical value is larger (2.09 versus 1.96), which makes the interval wider.
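To make the comparison concrete, the sketch below computes the interval width with the t and normal critical values for the same sample, and then shows how the t critical value changes with the degrees of freedom. The larger degrees of freedom (99 and 999) are hypothetical values chosen only for illustration.

```r
s  <- 8.95
n  <- 20
se <- s / sqrt(n)

t_crit <- qt(0.975, df = n - 1)   # t critical value, 19 degrees of freedom
z_crit <- qnorm(0.975)            # normal critical value

# Full width of the 95% interval under each distribution
c(t_width = 2 * t_crit * se, z_width = 2 * z_crit * se)

# The t critical value shrinks toward the normal one as df grows
# (99 and 999 are hypothetical degrees of freedom)
t_by_df <- qt(0.975, df = c(19, 99, 999))
t_by_df
```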

Problem 5

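Part a: Power increases, because a larger sample size reduces the standard error.
Part b: Power decreases, because a larger standard deviation increases the standard error.
Part c: Power increases, because a larger effect size is easier to detect.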
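These directions can be illustrated numerically with R's `power.t.test` function. The baseline values below (a sample size of 20, an effect size of 5 units, a standard deviation of 10, and \(\alpha = 0.05\)) are hypothetical and chosen only to show the direction of each change; only the move from 5 to 10 units comes from the problem statement.

```r
# Power of a one-sample, two-sided t-test for a given scenario
# (all default values here are hypothetical, for illustration only)
power_for <- function(n = 20, delta = 5, sd = 10) {
  power.t.test(n = n, delta = delta, sd = sd,
               sig.level = 0.05, type = "one.sample")$power
}

baseline     <- power_for()             # reference scenario
larger_n     <- power_for(n = 40)       # part a: increase the sample size
larger_sd    <- power_for(sd = 15)      # part b: larger standard deviation
larger_delta <- power_for(delta = 10)   # part c: effect size from 5 to 10 units

round(c(baseline = baseline, larger_n = larger_n,
        larger_sd = larger_sd, larger_delta = larger_delta), 3)
```

Relative to the baseline, the power should be higher for the larger sample size and larger effect size, and lower for the larger standard deviation.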