[Flowchart: when to use each distribution]
mu <- 77
xbar <- 80
s <- 11
n <- 37
# Use a t-test for the sample
t <- (xbar-mu)/(s/sqrt(n))
2*pt(abs(t),n-1,lower.tail=FALSE)
## [1] 0.1058177
We do not have significant evidence to conclude that the at-risk mean sugar intake is different from the general population mean sugar intake (p = 0.11).
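As a cross-check on the test above, a 95% confidence interval built from the same summary statistics should contain the null value of 77 whenever the two-sided p-value exceeds 0.05. This is a minimal sketch; the names `se`, `t_crit`, and `ci` are illustrative and not part of the original solution.

```r
# Summary statistics from the problem
mu   <- 77   # general-population mean (null value)
xbar <- 80   # sample mean
s    <- 11   # sample standard deviation
n    <- 37   # sample size

se     <- s / sqrt(n)            # standard error of the mean
t_crit <- qt(0.975, df = n - 1)  # two-sided 95% critical value
ci     <- xbar + c(-1, 1) * t_crit * se
ci

# TRUE here agrees with failing to reject at alpha = 0.05
mu >= ci[1] && mu <= ci[2]
```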
## Part 2a
mu <- 90
sigma <- 8
(p <- pnorm(95,mu,sigma,lower.tail=FALSE))
## [1] 0.2659855
## Part 2b
dbinom(3,10,p)
## [1] 0.2592314
## Part 2c
z <- (95-90)/(sigma/sqrt(10))
(p <- pnorm(z,lower.tail=FALSE))
## [1] 0.02405341
## Part 2d
1-dbinom(0,5,p)
## [1] 0.1146189
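The question text for Part 2 is not reproduced here, so the following is only a sanity check under the assumption that Part 2d asks for the probability of at least one success in 5 independent trials. Under that reading, the complement rule should agree with the upper-tail binomial probability and with the sum of the individual probabilities. The names `via_complement`, `via_upper_tail`, and `via_sum` are illustrative.

```r
mu    <- 90
sigma <- 8

# P(sample mean of 10 observations exceeds 95), as in Part 2c
p <- pnorm(95, mean = mu, sd = sigma / sqrt(10), lower.tail = FALSE)

via_complement <- 1 - dbinom(0, 5, p)                  # 1 - P(X = 0)
via_upper_tail <- pbinom(0, 5, p, lower.tail = FALSE)  # P(X > 0)
via_sum        <- sum(dbinom(1:5, 5, p))               # P(X = 1) + ... + P(X = 5)

all.equal(via_complement, via_upper_tail)
all.equal(via_complement, via_sum)
```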
Hypothesis test:
This is binary / proportion data, and we want to use a normal approximation:
# Writing down our known values
n <- 235
p <- 155 / n
p_0 <- 0.5
alpha <- 0.05
# Finding our z-statistic
z <- (p - p_0) / (sqrt(p_0*(1 - p_0) / n))
# Using our z-statistic to get our p-value
pvalue <- 2*(1 - pnorm(z, mean = 0, sd = 1))
pvalue
## [1] 9.958308e-07
Conclusion: There is sufficient evidence to support the claim that the proportion of Cubs fans in Chicago is not equal to the proportion of White Sox fans.
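The same z-test can be reproduced with base R's `prop.test` by turning off the continuity correction; its chi-squared statistic is then the square of the z-statistic. This is a sketch, and the object name `pt_res` is illustrative.

```r
n   <- 235
x   <- 155
p_0 <- 0.5

z <- (x / n - p_0) / sqrt(p_0 * (1 - p_0) / n)

# correct = FALSE disables the continuity correction
pt_res <- prop.test(x, n, p = p_0, correct = FALSE)

all.equal(unname(pt_res$statistic), z^2)                          # X-squared equals z^2
all.equal(pt_res$p.value, 2 * pnorm(abs(z), lower.tail = FALSE))  # same two-sided p-value
```

Note that the interval reported by `prop.test` is a Wilson score interval, so it will not match a Wald interval based on the sample proportion exactly.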
Confidence interval:
We use the confidence interval formula for proportions. Note that for the standard error we use the sample proportion rather than the null proportion:
SE <- sqrt(p*(1-p) / n)
# Confidence interval lower and upper bounds:
p - qnorm(1 - alpha/2)*SE
## [1] 0.5989906
p + qnorm(1 - alpha/2)*SE
## [1] 0.7201584
Our null hypothesis value \(p_0 = 0.5\) is not contained in the interval. This is consistent with rejecting the null hypothesis at the 5% level, since 0.5 is not in the range of plausible values for the true proportion.
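The interval calculation can be wrapped in a small helper so that the lower bound always comes first. `wald_ci` is a hypothetical name introduced for illustration, not a base R function.

```r
# Wald confidence interval for a single proportion.
# x: number of successes, n: number of trials, conf: confidence level
wald_ci <- function(x, n, conf = 0.95) {
  p_hat  <- x / n
  se     <- sqrt(p_hat * (1 - p_hat) / n)
  z_crit <- qnorm(1 - (1 - conf) / 2)   # positive critical value
  c(lower = p_hat - z_crit * se, upper = p_hat + z_crit * se)
}

ci <- wald_ci(155, 235)
ci

# TRUE when the null value 0.5 lies outside the interval
unname(0.5 < ci["lower"] || 0.5 > ci["upper"])
```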
We can double check our work for both the hypothesis test and the CI with the binom.test function:
binom.test(155, 235)
##
## Exact binomial test
##
## data: 155 and 235
## number of successes = 155, number of trials = 235, p-value = 1.134e-06
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.5951389 0.7199254
## sample estimates:
## probability of success
## 0.6595745
Our answers don’t match exactly, but that’s to be expected since we used a normal approximation, and binom.test gives an exact answer.
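To see how close the approximation is, the two sets of results can be compared directly. This sketch uses the `p.value` and `conf.int` components of the object returned by `binom.test`; the names `p_approx`, `ci_approx`, and `exact` are illustrative.

```r
n     <- 235
x     <- 155
p_hat <- x / n

# Normal-approximation results
z         <- (p_hat - 0.5) / sqrt(0.5 * 0.5 / n)
p_approx  <- 2 * pnorm(abs(z), lower.tail = FALSE)
ci_approx <- p_hat + c(-1, 1) * qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)

# Exact binomial results
exact <- binom.test(x, n)

p_approx - exact$p.value                 # difference in p-values
ci_approx - as.numeric(exact$conf.int)   # differences in the CI bounds
```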
The standard deviation of the sample mean is the standard error, which is \(9.25 / \sqrt{20} = 2.07\). This is less than the standard deviation of the albumin levels (9.25).
\(P(29 < \overline{X} < 31)\) = \(P(\frac{29 - 29.5}{9.25/\sqrt{20}} < Z < \frac{31 - 29.5}{9.25/\sqrt{20}})\)
= \(P(-0.242 < Z < 0.725)\) = \(P(Z < 0.725) - P(Z < -0.242)\) = \(0.766 - 0.404\) = \(0.362\)
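The table lookup can be checked in R by passing the standard error to `pnorm` as the standard deviation of the sample mean. This is a sketch using the values from the calculation above; `se` and `prob` are illustrative names. Because R does not round the z-scores, the result should agree with the table-based 0.362 only up to rounding.

```r
mu    <- 29.5   # population mean albumin level
sigma <- 9.25   # population standard deviation
n     <- 20     # sample size

se <- sigma / sqrt(n)   # standard error of the sample mean

# P(29 < Xbar < 31)
prob <- pnorm(31, mean = mu, sd = se) - pnorm(29, mean = mu, sd = se)
prob
```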
For the middle 50% of the sample means, we want to find \(z_{0.25}\) and \(z_{0.75}\) in the table. These 2 values are \(\pm0.675\). Since we are finding the middle 50% of the sample means rather than the individuals, we will use the standard error rather than the standard deviation. Therefore,
\(\mu \pm z *SE\) = \(29.5 \pm 0.675 * \frac{9.25}{\sqrt{20}}\) = \((28.104, 30.896)\)
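The same bounds can be obtained without a table by asking `qnorm` for the 25th and 75th percentiles of the sampling distribution of the mean. This is a sketch; `se` and `middle_50` are illustrative names, and small differences from the table-based answer come from rounding the critical value to 0.675.

```r
mu    <- 29.5
sigma <- 9.25
n     <- 20

se <- sigma / sqrt(n)   # standard error of the sample mean

# 25th and 75th percentiles of the sample mean
middle_50 <- qnorm(c(0.25, 0.75), mean = mu, sd = se)
middle_50
```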
Since we no longer know the standard deviation of the population, we will use the t-distribution:
\(95\% \space CI = \bar{x} \pm t_{0.025, \space 19}{\frac{s}{\sqrt{n}}}\) = \(30.1 \pm 2.09*\frac{8.95}{\sqrt{20}}\) = \((25.92, 34.28)\)
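In R, the critical value comes from `qt` rather than a table. This is a sketch with the sample values used above; `t_crit` and `ci` are illustrative names. Since `qt` is not rounded to 2.09, the bounds may differ from the hand calculation in the second decimal place.

```r
xbar <- 30.1   # sample mean
s    <- 8.95   # sample standard deviation
n    <- 20     # sample size

t_crit <- qt(0.975, df = n - 1)   # two-sided 95% critical value, 19 df
ci     <- xbar + c(-1, 1) * t_crit * s / sqrt(n)
ci
```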
(Note: the course website table has no row for 19 degrees of freedom; if it did, 2.09 would appear in the \(\alpha = 0.05\) column.)
Three quantities affect the width of the interval: the standard deviation, the sample size, and the critical value. In this case the critical value comes from the t-distribution, whose tails are heavier than the normal distribution's, so it is larger (2.09 rather than 1.96) and the interval is wider.
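The effect of the heavier tails on the critical value can be seen by comparing `qt` with `qnorm` at a few degrees of freedom. The particular `df` values below are arbitrary choices for illustration.

```r
df     <- c(5, 10, 19, 30, 100)   # illustrative degrees of freedom
t_crit <- qt(0.975, df)           # t critical values for a 95% interval
z_crit <- qnorm(0.975)            # normal critical value

data.frame(df = df, t_crit = round(t_crit, 3), z_crit = round(z_crit, 3))
```

The t critical value is always above the normal one and shrinks toward it as the degrees of freedom grow.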
Part A: Power increases
Part B: Power decreases
Part C: Power increases
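The prompts for Parts A to C are not reproduced here, so the sketch below does not map to them directly. It only illustrates, with made-up numbers, how power responds to the usual inputs, using base R's `power.t.test`. The baseline values (`n = 20`, `delta = 5`, `sd = 9.25`) and the helper name `pow` are assumptions chosen for illustration.

```r
# Power of a two-sided, one-sample t-test for hypothetical inputs
pow <- function(n = 20, delta = 5, sd = 9.25, sig.level = 0.05) {
  power.t.test(n = n, delta = delta, sd = sd, sig.level = sig.level,
               type = "one.sample", alternative = "two.sided")$power
}

baseline     <- pow()
larger_n     <- pow(n = 40)            # more observations
larger_delta <- pow(delta = 8)         # larger true difference
larger_sd    <- pow(sd = 12)           # noisier data
smaller_a    <- pow(sig.level = 0.01)  # stricter significance level

round(c(baseline = baseline, larger_n = larger_n, larger_delta = larger_delta,
        larger_sd = larger_sd, smaller_alpha = smaller_a), 3)
```

Under these assumptions, power rises with a larger sample or a larger true difference, and falls with more variability or a smaller significance level.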