Below is a sample of 15 patients’ systolic blood pressures before surgery and 15 patients’ systolic blood pressures after surgery:
The medians appear to be about 130 and 80, respectively.
The “before surgery” blood pressures have 1 outlier of about 220. This does not affect the median, but it has a large effect on the mean.
About 50% (since the median is at about 80)
About 160 (the 3rd quartile is at about 160)
The mean will be higher (since it is skewed right)
Skewed Right (the tail goes towards the right)
Suppose the probability that a potato is a Yukon Gold is 1/3.
Suppose the probability that a potato is mashed, given that it was Yukon Gold, is 3/4.
Suppose the probability that a potato is mashed, given that it was NOT Yukon Gold, is 1/2.
Law of Total Probability & Multiplication Rule:
\(P(Mashed) = P(Mashed | Yukon)*P(Yukon) + P(Mashed | Not Yukon)*P(NotYukon)\)
= (3/4)(1/3) + (1/2)(1-1/3) = 7/12
Multiplication Rule:
\(P(Mashed \cap Yukon) = P(Mashed | Yukon)*P(Yukon)\)
= (3/4)(1/3) = 1/4
\(P(Yukon | Mashed) = \frac{P(Yukon \cap Mashed)}{P(Mashed)}\)
= (1/4) / (7/12) = 3/7
Addition Rule:
\(P(Yukon \cup Mashed) = P(Yukon) + P(Mashed) - P(Yukon \cap Mashed)\)
= 1/3 + 7/12 - 1/4 = 2/3
\(P(Yukon)*P(Yukon) = (1/3)^2 = 1/9\)
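The calculations above can be checked numerically. The following is a minimal R sketch; the variable names (`p_yukon`, `p_mashed`, and so on) are introduced here for illustration and are not part of the original problem.

```r
# Given probabilities from the problem statement
p_yukon <- 1/3
p_mashed_given_yukon <- 3/4
p_mashed_given_not_yukon <- 1/2

# Law of total probability: P(Mashed)
p_mashed <- p_mashed_given_yukon * p_yukon +
  p_mashed_given_not_yukon * (1 - p_yukon)

# Multiplication rule: P(Mashed and Yukon)
p_mashed_and_yukon <- p_mashed_given_yukon * p_yukon

# Conditional probability: P(Yukon | Mashed)
p_yukon_given_mashed <- p_mashed_and_yukon / p_mashed

# Addition rule: P(Yukon or Mashed)
p_yukon_or_mashed <- p_yukon + p_mashed - p_mashed_and_yukon

# Two independently picked potatoes are both Yukon Gold
p_two_yukon <- p_yukon^2

c(p_mashed, p_mashed_and_yukon, p_yukon_given_mashed,
  p_yukon_or_mashed, p_two_yukon)
```

The five values should agree with the fractions 7/12, 1/4, 3/7, 2/3, and 1/9 derived by hand.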
Below is information about a study concerned with the effect that listening to classical music has on students’ test scores. Time spent listening to classical music is given in minutes per week and test scores are the percent that students scored on their biostatistics exams.
## Mean Time Listening to Classical Music: 54.44
## Mean Test Scores: 80.36
## Std Dev of Time Listening to Classical Music: 7.3
## Std Dev of Test Scores: 9.9
## Correlation: 0.3919
## Regression Line Slope: 0.531768
\(Z_y = r*Z_x\)
\(Z_y = 0.3919*-1 = -0.3919\) standard deviations above average (i.e., 0.3919 standard deviations below average).
\(Z_y = r*Z_x\)
\(Z_y = 0.3919*3 = 1.1757\) standard deviations above average for music
\(y = \overline{y} + SD_Y * Z_Y\)
\(y = 54.44 + 7.3 * 1.1757 = 63.02\)
Note that the regression slope given above does not work here, because it is the slope for predicting exam scores from music time, and we are now using exam scores to predict music time.
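As a rough check of the reverse prediction, here is a minimal R sketch. It assumes the summary statistics printed above and an exam score 3 standard deviations above average (inferred from the worked solution, since the question text is not shown); all variable names are illustrative.

```r
# Summary statistics from the study output above
r <- 0.3919
mean_music <- 54.44
sd_music <- 7.3
sd_exam <- 9.9

# Assumed input: an exam score 3 SDs above the mean exam score
z_exam <- 3

# Regression to the mean: the predicted z-score shrinks by a factor of r
z_music <- r * z_exam

# Convert the z-score back to minutes per week
pred_music <- mean_music + sd_music * z_music
pred_music

# The slope for predicting music time FROM exam scores is r * SD_music / SD_exam,
# which is not the same as the slope reported above (exam scores from music time)
slope_music_on_exam <- r * sd_music / sd_exam
slope_music_on_exam
```

The second slope comes out near 0.29, noticeably different from the reported 0.531768, which is why the reported slope cannot be reused when the roles of the two variables are swapped.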
Method 1:
\(Z_x = \frac{x - \overline{x}}{SD_x}\)
\(Z_x = \frac{5}{7.3} = 0.685\)
\(Z_y = r*Z_x\)
\(Z_y = 0.3919*0.685 = 0.268\)
\(y = \overline{y} + SD_Y * Z_Y\)
\(y = 80.36 + 9.9*0.268 = 83.017\)
Method 2:
\(y = \overline{y} + \hat{\beta} *(x-\overline{x})\)
\(y = 80.36 + 0.531768*5 = 83.019\)
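Both methods can be run side by side in R. This is a sketch under the assumption that the student listens 5 minutes more than average (the value of \(x - \overline{x}\) used in the worked solution); the variable names are illustrative.

```r
# Summary statistics from the study output above
r <- 0.3919
sd_music <- 7.3
mean_exam <- 80.36
sd_exam <- 9.9
slope <- 0.531768

# Assumed input: 5 minutes per week above the mean listening time
x_diff <- 5

# Method 1: go through z-scores
z_x <- x_diff / sd_music
z_y <- r * z_x
method1 <- mean_exam + sd_exam * z_y

# Method 2: use the regression slope directly
method2 <- mean_exam + slope * x_diff

c(method1, method2)
```

The two predictions differ only in the third decimal place, which reflects rounding in the reported standard deviations and correlation rather than a difference between the methods.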
Below is data about a fictitious HIV rapid-diagnostic test. Please fill in the rest of the table and answer the following questions. Assume the prevalence in this population is 0.1% (\(P(D+) = 0.001\)).
| Disease | Test Positive | Test Negative | Total |
|---|---|---|---|
| Present | 2970 | 30 | 3000 |
| Absent | 11000 | 539000 | 550000 |
| Total | 13970 | 539030 | 553000 |
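The questions that accompanied this table are not reproduced here, so the following R sketch only computes the quantities such a table is typically used for: sensitivity, specificity, and the predictive values at the stated prevalence via Bayes' rule. Variable names are illustrative.

```r
# Cell counts from the table
true_pos <- 2970
false_neg <- 30
false_pos <- 11000
true_neg <- 539000

# Sensitivity: P(T+ | D+); specificity: P(T- | D-)
sensitivity <- true_pos / (true_pos + false_neg)
specificity <- true_neg / (true_neg + false_pos)

# Stated population prevalence, P(D+)
prevalence <- 0.001

# Positive predictive value, P(D+ | T+), by Bayes' rule
ppv <- sensitivity * prevalence /
  (sensitivity * prevalence + (1 - specificity) * (1 - prevalence))

# Negative predictive value, P(D- | T-), by Bayes' rule
npv <- specificity * (1 - prevalence) /
  (specificity * (1 - prevalence) + (1 - sensitivity) * prevalence)

c(sensitivity, specificity, ppv, npv)
```

Note that the predictive values use the stated prevalence of 0.1% rather than the proportion of diseased subjects in the table (3000 of 553000), which is why Bayes' rule is applied instead of reading them off the column totals.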
From lecture, we know that when there are two possible outcomes (success/failure) in n trials, the number of ways of one event occurring x times is \(\frac{n!}{x!(n-x)!}\).
We also know that, given independence, the probability of any one particular sequence of x successes and n-x failures is \(p^x(1-p)^{n-x}\).
Combining these, we get the formula for the binomial distribution:
\(P(X = x) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}\)
Using the information about Yukon Gold potatoes from the Probability Review section, let’s find the probability that if 3 potatoes are picked, 2 are Yukon Gold. We can calculate this probability using the formula in R:
# Setting our values
n <- 3
x <- 2
p <- 1/3
# Manually using the binomial formula:
factorial(n)/(factorial(x)*factorial(n-x)) * p^x * (1-p)^(n-x)
## [1] 0.2222222
# You can also do this using the 'choose' function:
choose(n, x)* p^x * (1-p)^(n-x)
## [1] 0.2222222
We can also use R’s built-in ‘dbinom’ function to answer this question. dbinom takes the following arguments:
x - the number of successes
size - the total number of trials
prob - the probability of success
dbinom(x = 2, size = 3, prob = 1/3)
## [1] 0.2222222
R’s built-in functions can also help us answer other questions of interest. For example, consider our quiz review problem 3 involving Yukon Gold Potatoes. Suppose we pick 10 potatoes and get 5 Yukon Golds. We can find the probability of this event just like we did earlier with the dbinom function:
# Manually using the binomial formula:
factorial(10)/(factorial(5)*factorial(10-5)) * (1/3)^5 * (1-(1/3))^(10-5)
## [1] 0.1365645
# Using the dbinom function:
dbinom(x = 5, size = 10, prob = 1/3)
## [1] 0.1365645
However, we may also be interested in finding the probability of seeing an event as extreme or more extreme than the one we observed (this also corresponds to the p-value).
Since the probability of picking a Yukon Gold is 1/3 and we picked a total of 10 potatoes, we would expect to see about (10)(1/3) = 3.33 Yukon Gold potatoes. What we observed (5) is 5 - 3.33 = 1.67 potatoes greater than what we’d expect. So in order to be as extreme or more extreme, we are interested in anything greater than or equal to 5, or less than or equal to 1.67 (3.33 - 1.67). Since the data is discrete, it cannot take on decimal values (only whole potatoes), so this is the same thing as \(P(x\leq1 \cup x\geq5)\).
(Note that we round down from 1.67 to 1 instead of rounding to 2. This is because 2 is not more extreme than 1.67, but 1 is, so we use 1.)
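The reasoning above can be sketched in R as follows. This is only an illustration and assumes the observed count lies above the expected count, as it does here; the variable names are not from the original.

```r
n <- 10
p <- 1/3
observed <- 5

# Expected number of Yukon Golds and how far the observation is from it
expected <- n * p
distance <- abs(observed - expected)

# Lower cutoff: round DOWN, since only whole potatoes are possible
lower_cut <- floor(expected - distance)

# Upper cutoff: the observed value itself (observed is above expected here)
upper_cut <- observed

# All counts that are as extreme or more extreme than what we observed
extreme_values <- c(0:lower_cut, upper_cut:n)
extreme_values
```

The resulting set is 0, 1 and 5 through 10, matching \(P(x\leq1 \cup x\geq5)\).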
We can calculate this using pbinom(). The ‘pbinom’ function finds the probability of being LESS THAN or equal to a value. If we want the probability of being GREATER THAN or equal to a number k, we tell R to calculate 1 - pbinom() evaluated at k - 1, i.e. one LESS than the value we’re interested in.
Alternatively, you can use the ‘lower.tail’ argument to tell R that you want the ‘greater than’ probability.
# The probability of picking 1 or fewer Yukon Golds (1 or 0)
pbinom(1, size = 10, prob = 1/3)
## [1] 0.1040492
# The probability of picking 5 or more potatoes (5, 6, 7, 8, 9, or 10 potatoes)
# Note that instead of x = 5, we use x = 4
1 - pbinom(4, size = 10, prob = 1/3)
## [1] 0.2131281
# Alternatively, using the lower.tail argument:
pbinom(4, size = 10, prob = 1/3, lower.tail = FALSE)
## [1] 0.2131281
# Total of the Extremes:
pbinom(1, size = 10, prob = 1/3) + (1 - pbinom(4, size = 10, prob = 1/3))
## [1] 0.3171773
Sometimes the pbinom function can be hard to keep track of, especially remembering whether it includes the number you input or not. To avoid this, you can simply add up your desired values using the ‘dbinom’ function, though it will require a few extra lines of code. For example, the probability of picking 5 or more potatoes:
# The probability of picking 5 or more potatoes (5, 6, 7, 8, 9, or 10 potatoes)
dbinom(5, size = 10, prob = 1/3) +
dbinom(6, size = 10, prob = 1/3) +
dbinom(7, size = 10, prob = 1/3) +
dbinom(8, size = 10, prob = 1/3) +
dbinom(9, size = 10, prob = 1/3) +
dbinom(10, size = 10, prob = 1/3)
## [1] 0.2131281
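Because dbinom accepts a vector of values for `x`, the same sum can be written more compactly. The sketch below is one possible shortcut, with illustrative variable names:

```r
# P(X >= 5): evaluate dbinom at 5, 6, ..., 10 in one call and add the results
p_five_or_more <- sum(dbinom(5:10, size = 10, prob = 1/3))
p_five_or_more

# Both tails at once: P(X <= 1) + P(X >= 5)
p_both_tails <- sum(dbinom(c(0:1, 5:10), size = 10, prob = 1/3))
p_both_tails
```

These should reproduce the values 0.2131281 and 0.3171773 obtained with pbinom.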
The binom.test function gives us a variety of information. Its core purpose is to test whether the true probability of success is equal to some value, but you can also use it as an easy way to get the p-values from earlier.
For example, if we were looking for the p-value of getting a result as extreme or more extreme than our 5 Yukon Gold potatoes out of 10, we can use the following syntax. Note that this function not only gives us a p-value, but also a confidence interval, hypothesis, and sample estimate.
binom.test(x=5, n=10, p=1/3)
##
## Exact binomial test
##
## data: 5 and 10
## number of successes = 5, number of trials = 10, p-value = 0.3172
## alternative hypothesis: true probability of success is not equal to 0.3333333
## 95 percent confidence interval:
## 0.187086 0.812914
## sample estimates:
## probability of success
## 0.5
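The individual pieces of this output can be pulled out of the returned object. The sketch below also reproduces the two-sided p-value by hand, under the assumption that binom.test sums the probabilities of all outcomes that are no more likely than the observed one (with a small numerical tolerance); variable names are illustrative.

```r
result <- binom.test(x = 5, n = 10, p = 1/3)

# Components of the returned "htest" object
result$p.value    # two-sided p-value
result$conf.int   # 95 percent confidence interval
result$estimate   # sample proportion

# Manual two-sided p-value: add up every outcome whose probability is
# no larger than that of the observed count (5)
probs <- dbinom(0:10, size = 10, prob = 1/3)
p_observed <- dbinom(5, size = 10, prob = 1/3)
manual_p <- sum(probs[probs <= p_observed * (1 + 1e-7)])
manual_p
```

In this example the outcomes selected by that rule are 0, 1 and 5 through 10, so the result agrees with the sum of the two tails computed earlier.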
For which of the following scenarios could we apply a binomial distribution?
Recall: The binomial distribution has the following characteristics:
Suppose we have a standard six-sided die, and we roll the die 10 times. Consider the following questions:
# Recall that the pbinom function calculates the probability of getting less than or equal to x. Hence, for our first argument we put 1 instead of 2.
1 - pbinom(1, size = 10, prob = 1/6)
## [1] 0.5154833
binom.test(x=5,n=10,p=1/6)$p.value
## [1] 0.01546197
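The question text for these two answers is not shown, so the sketch below assumes they concern (a) rolling a particular face at least 2 times in 10 rolls and (b) how extreme 5 such rolls out of 10 would be. Under those assumptions, both results can be cross-checked with dbinom:

```r
# (a) P(X >= 2): sum the probabilities of 2, 3, ..., 10 successes
p_two_or_more <- sum(dbinom(2:10, size = 10, prob = 1/6))
p_two_or_more

# (b) Upper-tail probability P(X >= 5)
p_five_or_more <- sum(dbinom(5:10, size = 10, prob = 1/6))
p_five_or_more

# Compare with the two-sided p-value reported by binom.test
p_test <- binom.test(x = 5, n = 10, p = 1/6)$p.value
p_test
```

Here the two-sided p-value coincides with the upper tail alone, because every count below 5 (including 0) is more probable than 5 when p = 1/6, so no lower-tail outcome counts as equally or more extreme.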