Lab 10

In today’s lab we are going to learn about a few R functions that carry out tests you can also do by hand. In class and in last week’s lab you learned about the difference between a z-test and a t-test and when it is appropriate to use each one. Today we are going to talk about the different types of samples that t-tests handle (one-sample and paired), power and sample size calculations, and how Fisher’s exact test and the chi-squared test can help with categorical data.

Part 1 “t-tests in R”

At the end of last week’s lab we talked briefly about the t.test() function. Here we are going to show you again what to give the function and what its output contains. First we will begin by loading the two datasets below:

lipids <- read.delim("http://myweb.uiowa.edu/pbreheny/data/lipids.txt")
cf <- read.delim("http://myweb.uiowa.edu/pbreheny/data/cystic-fibrosis.txt")

One sample continuous data

In the lipids dataset above, the variable TRG is a continuous variable for which we have one sample. Let’s say the national average for TRG is said to be 120. Is our data significantly different from this national average? We can test this in R using the t.test() function as shown below. The function takes the data and then a value mu to compare the sample mean to.

t.test(lipids$TRG, mu = 120)
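
To connect this with the by-hand calculation from class, here is a minimal sketch of the one-sample t-test computed step by step. The vector x holds made-up values used only for illustration (it is not the lipids data), and the names x, mu0, t_stat, and p_val are our own.

```r
# Hypothetical measurements, NOT taken from the lipids dataset
x <- c(95, 130, 142, 110, 168, 87, 125, 151, 99, 134)
mu0 <- 120                      # hypothesized mean

n <- length(x)
se <- sd(x) / sqrt(n)           # standard error of the sample mean
t_stat <- (mean(x) - mu0) / se  # t statistic with n - 1 degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value

# Compare with what t.test() reports
fit <- t.test(x, mu = mu0)
c(by_hand = t_stat, r_output = unname(fit$statistic))
c(by_hand = p_val, r_output = fit$p.value)
```

Both lines of output should show matching pairs of numbers, since t.test() does the same arithmetic.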

Paired data case

In this case we can look at the cystic fibrosis data, cf, and recall that this was a crossover study, so it is paired data. A “paired t-test” does the same thing as the one-sample t-test above, but it does it on the differences between the paired observations, testing whether their mean is 0. To run a paired t-test in R we use the code below:

t.test(cf$Drug, cf$Placebo, paired = TRUE)

When this is run you will see that the output matches what you get from the code below (apart from the test’s title and data labels). This is because it takes all the values for Drug and Placebo, finds the differences, and then does the t-test on those differences.

t.test(cf$Drug - cf$Placebo)
## 
##  One Sample t-test
## 
## data:  cf$Drug - cf$Placebo
## t = -2.2885, df = 13, p-value = 0.03949
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -265.357354   -7.642646
## sample estimates:
## mean of x 
##    -136.5

The t.test() function has other arguments as well. We have shown you paired = TRUE for when you want to run a paired t-test. If you leave out paired = TRUE, R will assume that the two sets of numbers are two separate samples and will run a two-sample t-test. There is also the alternative argument, which can be "two.sided" (the default), "less", or "greater"; it changes the alternative hypothesis and therefore the p-value. We can also change conf.level = .95 to other values to get confidence intervals other than 95 percent.
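
As a rough illustration of these arguments, the sketch below runs t.test() several ways on the same paired measurements. The drug and placebo vectors are made up for this example (they are not the cf data).

```r
# Hypothetical paired measurements, NOT the cystic fibrosis data
drug    <- c(210, 180, 250, 300, 190, 220, 260, 240)
placebo <- c(260, 230, 240, 380, 250, 210, 330, 290)

two_sided <- t.test(drug, placebo, paired = TRUE)
one_sided <- t.test(drug, placebo, paired = TRUE, alternative = "less")
ci_99     <- t.test(drug, placebo, paired = TRUE, conf.level = 0.99)
unpaired  <- t.test(drug, placebo)  # paired = TRUE left out

two_sided$p.value   # two-sided p-value
one_sided$p.value   # half of the two-sided value here, since the mean difference is negative
two_sided$conf.int  # 95 percent interval
ci_99$conf.int      # 99 percent interval, which is wider
unpaired$method     # shows that a two-sample test was run instead
```

Leaving out paired = TRUE does not produce an error, so it is worth checking the title of the output to confirm you ran the test you intended.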

Power and sample size calculations in R

In R there is a power.t.test() function that is very useful. It takes the parameters delta (the expected difference in means), sd, n, and power. To see how it works, let’s take the cystic fibrosis data again and say we want to use this data to plan a more powerful study. How many patients would we need in our NEW study to achieve a power of .9?

With the power.t.test() function, whichever one of these parameters you leave out will be computed for you, as long as the other three are given (the significance level, sig.level, defaults to 0.05). So in this example we would use the following code to find the sample size:

power.t.test(delta = mean(cf$Drug - cf$Placebo), power = .9, sd = sd(cf$Drug-cf$Placebo), type = "paired")

Now when we look at the results we can see that an n value is given. Since n must be a whole number, we round up: to get a power of at least .9 we need at least 31 pairs.
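
If you want R to do the rounding for you, one way is to save the result and pull out n. This is only a sketch: it uses the rounded mean difference (136.5) and standard deviation (223.17) from this lab rather than the cf data itself, and the names res, pairs_needed, and check are our own.

```r
# Solve for n, using the rounded summary values from this lab
res <- power.t.test(delta = 136.5, sd = 223.17, power = 0.9, type = "paired")

res$n                           # fractional number of pairs
pairs_needed <- ceiling(res$n)  # round up to a whole number of pairs
pairs_needed

# Sanity check: the power actually achieved with the rounded-up n
check <- power.t.test(n = pairs_needed, delta = 136.5, sd = 223.17,
                      type = "paired")$power
check
```

Rounding up rather than to the nearest whole number guarantees the achieved power is not below the target.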

Find power

This function can also be used to find power. Let’s say we have an n of 50 pairs in this study; what would the power be? We can use the function to find out:

power.t.test(delta = -136.5, n = 50, sd = 223.17, type = "paired")
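
To see how power changes with sample size, we can repeat this call over several values of n. The sketch below assumes the same mean difference and standard deviation as above; the particular n values are arbitrary choices for illustration.

```r
# Numbers of pairs to compare (arbitrary illustrative choices)
n_values <- c(14, 20, 31, 50)

# Compute the power of a paired t-test at each n
powers <- sapply(n_values, function(n) {
  power.t.test(n = n, delta = 136.5, sd = 223.17, type = "paired")$power
})

data.frame(n = n_values, power = round(powers, 3))
```

Power increases as n increases, which is why a larger new study is more likely to detect the same difference.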

Fisher’s Exact test and Chi-squared test

In class you have learned how to do both of these by hand using their equations. Make sure you know how to do these by hand because that is how you will have to do them on your quiz and final exam. However, in R we can do them very quickly by putting the data into a table and running the chisq.test() function or the fisher.test() function.

For this example let’s read in the lister data set from the class website and create a table:

lister <- read.delim("http://myweb.uiowa.edu/pbreheny/data/lister.txt")

tab <- table(lister)
tab
##          Outcome
## Group     Died Survived
##   Control   16       19
##   Sterile    6       34

The table above, saved under the name “tab”, can now be put directly into the functions to get the output for a chi-squared test and Fisher’s exact test.

chisq.test(tab, correct = FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  tab
## X-squared = 8.4952, df = 1, p-value = 0.003561
fisher.test(tab)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  tab
## p-value = 0.005018
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##   1.437621 17.166416
## sample estimates:
## odds ratio 
##   4.666849

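Looking at these two outputs we can see that the chi-squared test gives the test statistic, degrees of freedom, and p-value, whereas Fisher’s exact test gives us the p-value, an estimate of the odds ratio, and a confidence interval for the odds ratio. The odds ratio is something we will be talking about within the next two weeks, so if you want to compute it, fisher.test() will do so as long as you have the data in a table. Note that fisher.test() reports a conditional maximum likelihood estimate, so it can differ slightly from the odds ratio computed by hand from the table.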
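
Since you will need to do the chi-squared test by hand, here is a sketch of that calculation in R for the Lister counts shown above. The table is typed in directly so the code runs without downloading the data; the names expected, chi_sq, p_val, and sample_or are our own.

```r
# Lister counts entered by hand (same numbers as the table above)
tab <- matrix(c(16, 6, 19, 34), nrow = 2,
              dimnames = list(Group = c("Control", "Sterile"),
                              Outcome = c("Died", "Survived")))

# Expected counts under no association: (row total * column total) / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Chi-squared statistic: sum of (observed - expected)^2 / expected
chi_sq <- sum((tab - expected)^2 / expected)
p_val <- pchisq(chi_sq, df = 1, lower.tail = FALSE)
chi_sq
p_val

# Sample odds ratio from the cross-product of the table
sample_or <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
sample_or
```

The statistic and p-value agree with chisq.test(tab, correct = FALSE). The sample odds ratio comes out to about 4.77, close to but not the same as the 4.67 reported by fisher.test().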

Practice problems:

1

Suppose a study included 4540 people younger than 25, of whom 65 had cancer, and 1628 people older than 25, of whom 31 had cancer. First put these counts into a table, then run the fisher.test() function on it to see if there is an association between age and cancer.

2

Consider a study in which we observe whether students took notes by hand or by laptop and compare their test results. Of the 67 that used a laptop, 44 passed, and of the 76 that wrote by hand, 56 passed. Can we use this data to see if there is a relationship between note-taking method and passing the test?

Answers

# Problem 1: rows are age groups (younger than 25, older than 25);
# columns are no cancer, cancer
tab1 <- cbind(c(4475, 1597), c(65, 31))
fisher.test(tab1)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  tab1
## p-value = 0.1992
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.8385228 2.0890648
## sample estimates:
## odds ratio 
##   1.336333
# Problem 2: rows are note-taking methods (laptop, hand);
# columns are failed, passed
table2 <- cbind(c(23, 20), c(44, 56))
chisq.test(table2)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table2
## X-squared = 0.73952, df = 1, p-value = 0.3898
fisher.test(table2)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  table2
## p-value = 0.3616
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.6713414 3.1989070
## sample estimates:
## odds ratio 
##   1.459682
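
One thing to notice in the answer to problem 2 is that chisq.test() applies Yates’ continuity correction to 2x2 tables by default, so its statistic will not match an uncorrected by-hand calculation. The sketch below reruns the test both ways on the same counts; the row and column labels and the name notes are our own additions.

```r
# Same counts as table2, with labels added for readability
notes <- matrix(c(23, 20, 44, 56), nrow = 2,
                dimnames = list(Method = c("Laptop", "Hand"),
                                Result = c("Failed", "Passed")))

with_yates    <- chisq.test(notes)                   # default: continuity correction
without_yates <- chisq.test(notes, correct = FALSE)  # plain Pearson statistic

with_yates$statistic
without_yates$statistic
without_yates$p.value
```

The uncorrected statistic is larger (about 1.09 compared with 0.74), but neither version gives evidence of a relationship between note-taking method and passing.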