Objectives

Analyze studies that have two unmatched/unpaired groups with continuous outcomes
Practice calculating and interpreting a two sample t-test and confidence intervals
Discuss assumptions for two sample t-test and use Welch’s test

Infant Diarrhea Study

Diarrhea is a major public health problem in underdeveloped countries, especially for babies. Diarrhea leads to dehydration, which results in millions of deaths each year worldwide.

Researchers in Peru conducted a double-blind randomized controlled trial, published in The New England Journal of Medicine, to determine whether Bismuth salicylate would improve the outcomes of infants suffering from diarrhea.

In their study, all infants received the standard therapy for diarrhea: oral re-hydration. In addition to the rehydration, 85 babies received bismuth salicylate, while 84 babies received a placebo. The total stool volumes for all infants over the course of their illness was measured. To adjust for body size, the researchers divided by body weight to obtain their outcome of interest: stool output per kilogram of body weight. The results of their study are available on the course website.

DiaData <- read.delim("https://raw.githubusercontent.com/IowaBiostat/data-sets/main/diarrhea/diarrhea.txt")

Explore the data

Let’s examine the distribution of stool output by group using a boxplot of the data:

boxplot(DiaData$Stool ~ DiaData$Group, 
        col = c("burlywood4", "darkgoldenrod3"), 
        ylab = "Drug Group",
        xlab = "Stool Output",
        main = "Effect of bismuth salicylate", 
        horizontal = T)

Summarize the distribution and describe any differences you notice about the two groups

Answer

The distribution appears to be right skewed in both groups with some very high values. Stool output appears to be a little lower in the treatment group.

Two Sample t-tests

Student’s two sample t-test

To do a student’s two sample t-test, we must be able to find the following values:

Standard deviation using a pooled variance
Standard error
The t-statistic.

These three can be found using the equations below (which are also in your lecture slides):

Test statistic:

\(SD_{(pooled)} = \sqrt{ \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 -2} }\)

\(SE_d = SD_{(pooled)} * \sqrt{ \frac{1}{n_1} + \frac{1}{n_2}}\)

\(t = \frac{\bar{x_1} - \bar{x_2}}{SE_d}\) with:

\(df = n_1 + n_2 - 2\)

Confidence interval:

\(\bar{x_1} - \bar{x_2} \pm t_{\alpha/2, df}*SE_d\)

So as you can tell we need to compute several statistics from the data in order to do this t-test. We need to compute the mean of continuous outcomes for both groups as those will be our \(\bar{x_1}\) and \(\bar{x_2}\) as well as the standard error above.

What is our null hypothesis for this data?

Answer

The null hypothesis is that there is no difference in the means of the two groups, meaning \(\bar{x_1} = \bar{x_2}\) or equivalently \(\bar{x_1} - \bar{x_2} = 0\).

Calculate the test statistic and 95% confidence interval manually. Summarize your results.

As a reminder, here are some helpful functions for finding summary statistics:

Recall the ‘by’ function we learned in Lab 4: allows us to run a function over a data set by groups.

The first parameter is the data variable of interest
The second parameter is the data variable that defines the groups
The third parameter is the function to run over the data

#Find n in each group
table(DiaData$Group)

#Find mean of each group
by(DiaData$Stool, DiaData$Group, mean)

#Find standard deviation of each group 
by(DiaData$Stool, DiaData$Group, sd)

Hypothesis Test:

x1 <- 260.2976 # Mean of control group
x2 <- 181.8706 # Mean of treatment group

sd1 <- 253.6135 # SD of control group
sd2 <- 197.3636 # SD of treatment group

n1 <- 84 # n for control group
n2 <- 85 # n for treatment group

sd_pooled <- sqrt(((n1 - 1)*sd1^2 + (n2 - 1)*sd2^2) / (n1 + n2 - 2))
SE <- sd_pooled * sqrt((1 / n1) + (1 / n2))
df <- n1 + n2 - 2

t <- (x1 - x2) / (SE)
t

## [1] 2.244989

2*pt(t, df = df, lower.tail = F)

## [1] 0.02608141

The stool output is significantly lower in the treatment group than in the control group.

Confidence Interval:

(x1 - x2) + qt(c(0.025, 0.975), df = df)*SE

## [1]   9.45734 147.39666

In other words, the average amount by which the treatment lowers stool output is between 9 and 147 ml/kg.

Check with t.test function

Comments about t.test code:

Note that we can use the tilde (~) to specify that you want to look at stool output by group. We use the tilde because all stool outputs are in one column and group is given in another column. If you were given the stool outputs in two columns (Control & Treatment), then you would simply give R the two columns separated by a comma. It would look something like: ‘t.test(Treatment, Control, var.equal = TRUE)’. See lab 10’s paired t-test example using the ‘Anorexia’ data for an example of data where each group is in a separate column.

Note also that we no longer have to say ‘paired=TRUE’ in the function. The default is ‘paired=FALSE’ and that is exactly the test we want.

We have to specify that we are assuming the population variances are equal using the argument ‘var.equal=TRUE’, otherwise the results will not be the same as the results we computed by hand.

t.test(DiaData$Stool~DiaData$Group, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  DiaData$Stool by DiaData$Group
## t = 2.245, df = 167, p-value = 0.02608
## alternative hypothesis: true difference in means between group Control and group Treatment is not equal to 0
## 95 percent confidence interval:
##    9.457362 147.396700
## sample estimates:
##   mean in group Control mean in group Treatment 
##                260.2976                181.8706

Welch’s two sample t-test

When we calculated the two sample t-test by hand above, we had to assume the population variances (square of the standard deviation) were equal.

Review the standard deviations of the two groups and describe whether you think the assumption is met

by(DiaData$Stool, DiaData$Group, sd)

## DiaData$Group: Control
## [1] 253.6135
## ------------------------------------------------------------ 
## DiaData$Group: Treatment
## [1] 197.3636

If the population variances were not similar, we would consider using the Welch’s two sample t-test. The Welch’s test accounts for data that do not have equal population variances, providing us with a more accurate test result if the assumption of equal variances is violated. To do this test by hand is quite computational so we will learn how to compute it in R.

To conduct a Welch’s test, we simply switch the variance argument to var.equal= FALSE. This is also the default, so you can leave out the argument entirely and get the same answer.

t.test(DiaData$Stool~DiaData$Group, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  DiaData$Stool by DiaData$Group
## t = 2.2417, df = 156.64, p-value = 0.02638
## alternative hypothesis: true difference in means between group Control and group Treatment is not equal to 0
## 95 percent confidence interval:
##    9.323083 147.530979
## sample estimates:
##   mean in group Control mean in group Treatment 
##                260.2976                181.8706

Here our p-value is very close to the one we obtained using the Student’s test. This will generally be the case when the population variances are reasonably close or when the sample sizes are very large.

Practice Problems: Heart Study

Some infants are born with congenital heart defects and require surgery very early in life. One approach to performing this surgery is known as “circulatory arrest.” A downside of this procedure, however, is that it cuts off the flow of blood to the brain, possibly resulting in brain damage. An alternative procedure, “low-flow bypass” maintains circulation to the brain, but does so with an external pump that potentially causes other sorts of injuries to the brain.

To investigate the treatments, surgeons at Harvard Medical School conducted a randomized controlled trial. In the trial, 70 infants received low-flow bypass surgery and 73 received the circulatory arrest approach. The researchers looked at two outcomes: the Psychomotor Development Index (PDI), which measures physiological development, and the Mental Development Index (MDI), which measures mental development. For both indices, higher scores indicate greater levels of development. The results of their study are on the course website.

Use:

heart <- read.delim("https://raw.githubusercontent.com/IowaBiostat/data-sets/main/infant-heart/infant-heart.txt")

Calculate the standard deviations of PDI and MDI for each group. Based on these values, should we do a Student’s test or a Welch’s test?

Answer

by(heart$PDI, heart$Treatment, sd)

## heart$Treatment: Circulatory arrest
## [1] 16.488
## ------------------------------------------------------------ 
## heart$Treatment: Low-flow bypass
## [1] 14.68527

by(heart$MDI, heart$Treatment, sd)

## heart$Treatment: Circulatory arrest
## [1] 16.56448
## ------------------------------------------------------------ 
## heart$Treatment: Low-flow bypass
## [1] 14.57256

The standard deviations appear fairly similar in both cases, therefore we will use student’s two sample t-test.

Conduct a t-test (whichever one you decided was most appropriate) to determine whether the difference in physiological development between the infants in the circulatory arrest group and the low-flow bypass group is statistically significant.

Answer

t.test(heart$PDI ~ heart$Treatment, var.equal = T)

## 
##  Two Sample t-test
## 
## data:  heart$PDI by heart$Treatment
## t = -2.2385, df = 141, p-value = 0.02676
## alternative hypothesis: true difference in means between group Circulatory arrest and group Low-flow bypass is not equal to 0
## 95 percent confidence interval:
##  -11.0232390  -0.6840017
## sample estimates:
## mean in group Circulatory arrest    mean in group Low-flow bypass 
##                         91.91781                         97.77143

There is significant evidence that physiological development is better in the low-flow bypass group.

Conduct a t-test (whichever one you decided was most appropriate) to determine whether the difference in mental development between the infants in the circulatory arrest group and the low-flow bypass group is statistically significant.

Answer

t.test(heart$MDI ~ heart$Treatment, var.equal = T)

## 
##  Two Sample t-test
## 
## data:  heart$MDI by heart$Treatment
## t = -1.2696, df = 141, p-value = 0.2063
## alternative hypothesis: true difference in means between group Circulatory arrest and group Low-flow bypass is not equal to 0
## 95 percent confidence interval:
##  -8.484009  1.848393
## sample estimates:
## mean in group Circulatory arrest    mean in group Low-flow bypass 
##                         103.0822                         106.4000

There is insufficient evidence that mental development is difference between the two groups.

Calculate the 95% confidence intervals for these two tests. Interpret these in terms of the context of the study. What does the confidence interval mean? Which group did better?

Answer

Using the output from part b), we get a 95% confidence interval of (-11.02, -0.68). This means that we can say with 95% confidence that the physiological development score is between 0.68 and 11.02 higher for those in the low-flow bypass group. Therefore, the low-flow bypass group did better.

Using the output from part c), we get a 95% confidence interval of (-8.48, 1.85). This means we can say with 95% confidence that the true difference in mental development score between the circulatory arrest and low-flow bypass groups is between -8.45 and 1.85. Therefore, we cannot claim that either group did better.

If your child had to have open-heart surgery as an infant, which treatment option would you prefer? Why?

Conclusion from Article

Summary taken from the journal for comparison (some of their analyses were different, but our results should support the same conclusions):

Of the 171 patients enrolled in the study, 155 were evaluated. After adjustment for the presence or absence of a ventricular septal defect, the infants assigned to circulatory arrest, as compared with those assigned to low-flow bypass, had a lower mean score on the Psychomotor Development Index. The method of support was not associated with the prevalence of abnormalities on MRI scans of the brain, or scores on the Mental Development Index. Heart surgery performed with circulatory arrest as the predominant support strategy is associated with a higher risk of delayed motor development and neurologic abnormalities at the age of one year than is surgery with low-flow bypass as the predominant support strategy.

Lab 12 - Two-Sample Contiuous Data

April 9-10, 2024

Objectives

Infant Diarrhea Study

Explore the data

Two Sample t-tests

Student’s two sample t-test

Welch’s two sample t-test

Practice Problems: Heart Study