Objectives

  1. Learn to plot and interpret Kaplan-Meier survival curves
  2. Perform a log-rank test
  3. Review for final exam

Survival Analysis

In lecture this week, we briefly discussed the concepts of survival analysis. Today’s lab will focus on Kaplan-Meier survival curves and the log-rank test. We will examine the aplastic anemia dataset using R. The dataset contains five variables:

  • Trt: Whether the patient received Methotrexate (MTX) or Methotrexate and cyclosporine (MTX+CSP).
  • Time_gvhd: Time until graft-versus-host disease. Measured in days.
  • Status_gvhd: What happened at the end of Time_gvhd. The patient was either censored (0) or developed graft-versus-host disease (1).
  • Time: Time until death. Measured in days.
  • Status: What happened at the end of Time. The patient was either censored (0) or died (1).

Two common endpoints in survival analysis are Overall Survival and Progression-Free Survival. In this dataset, Overall Survival is the time from randomization until death, and Progression-Free Survival is the time from randomization until graft-versus-host disease (GVHD).

Kaplan-Meier Estimates

A survival function is a function of time, and is defined as the probability of the event in question not occurring by time t (i.e., the patient surviving until time t or later).

Ex: S(10) = .95 means there is a 95% chance of surviving until day 10 (or equivalently, only a 5% chance of dying by day 10).

The most popular way to estimate survival functions is using Kaplan-Meier estimates. To estimate the Overall Survival function, do the following:

library(survival)
anemia <- read.delim('https://raw.githubusercontent.com/IowaBiostat/data-sets/main/anemia/anemia.txt')
S <- with(anemia, Surv(Time,Status!=0))
fit <- survfit(S~1)

The survfit function calculates the survival curve that we learned how to compute in class. Recall that if we know the times at which deaths occurred and the number of subjects at risk at each of those times, we can calculate the survival probability. For example, here are the survival probabilities estimated at the first five times when a death occurred, along with the cumulative product of those probabilities, which gives the estimated survival curve:

##      time n(t) d(t) [n(t)-d(t)]/n(t) cumproduct
## [1,]    3   46    1           0.9783     0.9783
## [2,]   12   45    1           0.9778     0.9565
## [3,]   25   44    1           0.9773     0.9348
## [4,]   30   43    1           0.9767     0.9130
## [5,]   44   42    1           0.9762     0.8913
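
The cumulative product column can be reproduced by hand. The following is a minimal sketch in base R that uses only the n(t) and d(t) values shown in the table above; the variable names are illustrative and are not part of the lab code.

```r
# Number at risk and number of deaths at the first five death times
# (values copied from the table above)
n_risk <- c(46, 45, 44, 43, 42)
deaths <- c(1, 1, 1, 1, 1)

# Conditional probability of surviving past each death time
cond_surv <- (n_risk - deaths) / n_risk

# Kaplan-Meier estimate: running product of the conditional probabilities
km_est <- cumprod(cond_surv)
round(km_est, 4)
```

Each entry of km_est multiplies in one more conditional survival probability, which is why the curve can only step downward at the death times.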

To plot the entire estimated survival curve, use:

plot(fit, ylab = "Probability", xlab = "Time")

This is the Kaplan-Meier estimate of the survival function, ignoring the different treatment groups.

Why do the confidence intervals seem to get wider as time progresses?

What do the steps represent?

What is the median survival time?
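
One way to read a median off a Kaplan-Meier curve is to find the first time at which the estimated survival probability drops to 0.5 or below. The sketch below illustrates the idea on a small made-up data set (the numbers are hypothetical and are not the anemia data); software may treat the case where the estimate equals exactly 0.5 slightly differently.

```r
# Hypothetical death times, with the number at risk and deaths at each time
death_times <- c(2, 4, 7, 12)
n_risk      <- c(6, 5, 3, 1)
deaths      <- c(1, 1, 1, 1)

# Kaplan-Meier estimate at each death time
km_est <- cumprod((n_risk - deaths) / n_risk)

# Median survival: first death time where the estimate is at or below 0.5
median_time <- death_times[which(km_est <= 0.5)[1]]
median_time
```

If the curve never reaches 0.5, which() returns nothing and the median is reported as NA, meaning it cannot be estimated from the data.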

We can also stratify by treatment group and examine both survival estimates.

fit2 <- with(anemia, survfit(S~Trt))
plot(fit2, ylab = "Overall Survival", xlab = "Time", col =
c("red","blue"))
legend("bottomleft", c("MTX","MTX + CSP"), text.col = c("red","blue"),
bty = "n")

Log-rank test

If you are curious, here is the code you would use to conduct a log-rank test to determine whether survival differs significantly between the treatment types:

survdiff(Surv(anemia$Time, anemia$Status) ~ anemia$Trt)
## Call:
## survdiff(formula = Surv(anemia$Time, anemia$Status) ~ anemia$Trt)
## 
##                     N Observed Expected (O-E)^2/E (O-E)^2/V
## anemia$Trt=MTX     24        9     6.45     1.007      2.01
## anemia$Trt=MTX+CSP 22        4     6.55     0.992      2.01
## 
##  Chisq= 2  on 1 degrees of freedom, p= 0.2
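
The printed p-value is rounded. As a sketch, a less rounded value can be recovered from the chi-square statistic in the (O-E)^2/V column using base R's pchisq; the 2.01 below is itself a rounded value copied from the output, so the result is approximate.

```r
# Chi-square statistic and degrees of freedom copied from the output above
chisq_stat <- 2.01
df <- 1

# Upper-tail area of the chi-square distribution gives the p-value
p_value <- pchisq(chisq_stat, df = df, lower.tail = FALSE)
round(p_value, 3)
```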

Is there a significant difference between the two treatment groups’ survival?

Final Review

Note: While we tried to cover a variety of topics from the course, there is not enough time in lab for the review to cover all of them. Don’t rely solely on the review material in this lab when studying for the final exam.

Types of Bias

Selection bias: Instead of random sampling, certain subgroups of the population were more likely to be included than others.

Nonresponse bias: Nonresponders can differ from responders in many important ways.

Perception bias: The perception of benefit from a treatment (placebo effect).

Confounding: Confounding is a major source of bias. In order to avoid confounding, we conduct randomized controlled experiments so that the control and treatment groups are as similar as possible.

Errors

##                H0 True H0 False
## Reject               A        B
## Fail to Reject       C        D

Type I Error

A Type I error is committed when a true null hypothesis is rejected.
In terms of disease detection (where the null hypothesis is no disease), this is a false positive.
In the table above, this is A.

Type I Error Rate (\(\alpha\))

The Type I error rate is the proportion of true null hypotheses that were rejected.
In the table above, this is A/(A+C).

Type II Error

A Type II error is committed when a false null hypothesis is not rejected.
In the table above, this is D.

Type II Error Rate (\(\beta\))

The Type II error rate is the proportion of false null hypotheses that failed to be rejected.
In the table above, this is D/(B+D).
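
As a quick check of the two formulas, here is a minimal sketch with made-up counts for the cells A, B, C, and D of the table above. The counts are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Hypothetical counts for the four cells of the table above
A <- 5    # H0 true,  rejected      (Type I errors)
B <- 80   # H0 false, rejected      (correct rejections)
C <- 95   # H0 true,  not rejected  (correct non-rejections)
D <- 20   # H0 false, not rejected  (Type II errors)

type1_rate <- A / (A + C)   # proportion of true nulls that were rejected
type2_rate <- D / (B + D)   # proportion of false nulls that were not rejected
c(type1 = type1_rate, type2 = type2_rate)
```

Note that each rate is computed within a column of the table: the denominators are the number of true nulls (A + C) and the number of false nulls (B + D).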

Probability

Sensitivity

Sensitivity is the probability of a patient testing positive for a disease given that the patient has the disease. This is often denoted by \(P(+|D)\), where \(+\) indicates a positive test result and \(D\) indicates having the disease.

Specificity

Specificity is the probability of a patient testing negative for a disease given that the patient does not have the disease. This is often denoted by \(P(-|D^c)\), where \(-\) indicates a negative test result and \(D^c\) indicates not having the disease.
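
Both quantities are conditional proportions taken from a 2x2 table of test results by disease status. The sketch below uses hypothetical counts (not from any study in this lab) to show which cells go into each calculation.

```r
# Hypothetical 2x2 screening results
true_pos  <- 90    # diseased, test positive
false_neg <- 10    # diseased, test negative
false_pos <- 30    # not diseased, test positive
true_neg  <- 870   # not diseased, test negative

sensitivity <- true_pos / (true_pos + false_neg)   # P(+ | D)
specificity <- true_neg / (true_neg + false_pos)   # P(- | D^c)
c(sensitivity = sensitivity, specificity = specificity)
```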

Hypothesis Testing

Flowchart for Hypothesis Testing

Practice Problems

Problem 1

1256 individuals were tested in a saliva-based screening test for HIV. We know that 368 of the individuals tested have HIV, and 358 of them tested positive in the saliva-based screening test. Overall there were a total of 360 positive test results.

  1. Construct a contingency table for the data.
Answer
##        HIV No HIV
## Test + 358      2
## Test -  10    886
  2. What is the sensitivity of the saliva test?
Answer

\(\frac{358}{368} = 0.973\)

  3. What is the specificity?
Answer

\(\frac{886}{888} = 0.998\)

  4. Given that the probability of having HIV is 0.01, what is the positive predictive value?
Answer

Use Bayes’ Rule

\(\frac{0.973 * 0.01}{(0.973 * 0.01) + (1 - 0.998)*(1 - 0.01)} = 0.831\)
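
The same Bayes' Rule calculation can be written in R. This is a sketch that plugs in the rounded sensitivity and specificity from the answers above; the variable names are illustrative. Because the false positive rate is so small, carrying more decimal places for the specificity changes the result noticeably.

```r
# Rounded sensitivity and specificity from the answers above,
# and the prevalence stated in the question
sens <- 0.973
spec <- 0.998
prev <- 0.01

# Bayes' rule: P(D | +) = P(+ | D) P(D) / [P(+ | D) P(D) + P(+ | D^c) P(D^c)]
ppv <- (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
round(ppv, 3)
```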


Problem 2

The distribution of LDL cholesterol levels in a certain population is approximately normal with mean 90 mg/dl and standard deviation 8 mg/dl.

  1. What is the probability that an individual will have an LDL cholesterol level above 100 mg/dl?
Answer
(z <- (100-90)/8) 
## [1] 1.25
(p <- pnorm(z,lower.tail=FALSE))
## [1] 0.1056498

If you use the normal table instead, you get the area below z = 1.25. To get the probability of an LDL level above 100, subtract this area from 1.

  2. Suppose we have a sample of 5 people from this population. What is the probability that at least one of them has a level above 100 mg/dl?
Answer
1-dbinom(0,5,p)
## [1] 0.4278128

Use the binomial distribution to calculate this by hand: \(\frac{n!}{k!(n-k)!}p^k(1-p)^{n-k}\)
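
To see that the formula and dbinom agree, the by-hand calculation can be sketched in R with n = 5 and k = 0 (nobody above 100 mg/dl), then taking the complement. The variable names here are illustrative.

```r
# Probability that one individual has LDL above 100 mg/dl
p <- pnorm((100 - 90) / 8, lower.tail = FALSE)

# Binomial formula with n = 5 and k = 0 (nobody above 100)
n <- 5
k <- 0
p_none <- choose(n, k) * p^k * (1 - p)^(n - k)

# Complement: at least one of the five is above 100
1 - p_none
```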

  3. What is the probability that the sample mean of the 5 individuals is above 100 mg/dl?
Answer
(z <- (100-90)/(8/sqrt(5)))
## [1] 2.795085
(p <- pnorm(z,lower.tail=FALSE))
## [1] 0.002594304


Problem 3

A psychologist was interested in exploring whether or not male and female college students have different driving behaviors. She focused on the fastest speed ever driven by an individual to see if the mean fastest speed driven by male college students differs from the mean fastest speed driven by female college students. She surveyed 34 male college students and 29 female college students. The mean for males was 105.5 mph while the mean for females was 90.9 mph. The two samples had a pooled standard deviation of 16.9 mph.

  1. Conduct a t-test comparing the two groups.
Answer

\(SE = 16.9 * \sqrt{\frac{1}{34} + \frac{1}{29}} = 4.272\)

\(t = \frac{105.5 - 90.9}{4.272} = 3.42\)

From the table, the p-value with \(df = 61\) is between 0.001 and 0.005.
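
The table only brackets the p-value. As a sketch, the same test can be carried out in R from the summary statistics, assuming a two-sided alternative; pt gives the tail area of the t distribution directly.

```r
# Summary statistics from the problem statement
n_m <- 34; n_f <- 29
mean_m <- 105.5; mean_f <- 90.9
sd_pooled <- 16.9

se     <- sd_pooled * sqrt(1 / n_m + 1 / n_f)   # standard error of the difference
t_stat <- (mean_m - mean_f) / se
df     <- n_m + n_f - 2

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
c(se = se, t = t_stat, df = df, p = p_value)
```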

  2. Construct a 95% confidence interval for this difference.
Answer

\((105.5-90.9) \pm 2.00(4.272)\)

\(14.6 \pm 8.544\)

\((6.06,23.14)\)
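
The multiplier 2.00 is the table value for 61 degrees of freedom. A minimal sketch of the same interval in R, using qt for the critical value instead of the table:

```r
# Difference in means and its standard error (from the problem statement)
diff_means <- 105.5 - 90.9
se <- 16.9 * sqrt(1 / 34 + 1 / 29)

# Critical value for 95% confidence with 61 degrees of freedom
t_crit <- qt(0.975, df = 61)

ci <- diff_means + c(-1, 1) * t_crit * se
round(ci, 2)
```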


Problem 4

A team from Yale School of Medicine took a look at 1,433 people diagnosed with intracranial meningioma, the most commonly diagnosed brain tumor in the United States. Researchers compared these patients to a control group of 1,350 people without tumors. Participants offered self-reported lifetime dental X-ray histories. Researchers then analyzed whether they had a specific type of X-ray, called a “bitewing” X-ray, at least once a year.

  1. What type of study is this? What type of test would you perform?
Answer

Retrospective (case-control) study; Chi-square or Fisher’s Exact test

  2. If the confidence interval for the odds ratio in this study turned out to be (1.15, 2.5), what would you conclude?
Answer

The confidence interval does not contain 1, so the association is statistically significant.

Our conclusion would be:

There is sufficient evidence to suggest that having at least one “bitewing” X-ray per year is associated with being diagnosed with intracranial meningioma.
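
For reference, here is a sketch of how an odds ratio and an approximate confidence interval could be computed from a 2x2 table. The counts are hypothetical (they are NOT the study's data), and the interval uses the common large-sample approximation on the log scale, which is an assumption about the method rather than what the researchers necessarily did.

```r
# Hypothetical case-control counts (NOT the actual study data)
exposed_cases      <- 40
unexposed_cases    <- 60
exposed_controls   <- 20
unexposed_controls <- 80

# Odds ratio: odds of exposure among cases / odds of exposure among controls
odds_ratio <- (exposed_cases * unexposed_controls) /
  (unexposed_cases * exposed_controls)

# Approximate 95% interval computed on the log scale
se_log_or <- sqrt(1 / exposed_cases + 1 / unexposed_cases +
                  1 / exposed_controls + 1 / unexposed_controls)
ci <- exp(log(odds_ratio) + c(-1, 1) * qnorm(0.975) * se_log_or)

c(odds_ratio = odds_ratio, lower = ci[1], upper = ci[2])
```

With these made-up counts the whole interval lies above 1, which is the same pattern as the interval in the question.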


Problem 5

Read the following case studies and outline what statistical methods you would use to analyze each one:

  • In a study of 16 overweight young adults in India, participants were given, in turns, a dose of an extract made from unroasted coffee beans and a placebo, three times a day over 22 weeks. Their diet throughout the study was unchanged, and they were physically active. Between trials, the participants were given a two-week break for their bodies to reset. Though a few participants given the extract only lost 7 pounds, others lost as much as 26 pounds. On average, the subjects lost 17.5 pounds each, and reduced their body weight by 10.5 percent. Body fat also declined by 16 percent, even though the participants were eating an average of 2,400 calories and burning roughly 400.
Answer

Paired t-test

  • Researchers from Penn State found that increasing the amount of spices in your diet may lower the level of potentially harmful fat in your bloodstream. The experiment compared two groups of healthy, overweight men. One group ate meals seasoned with the special spice blend; the other ate the same meals prepared without the spices. Men who ate the spicy food saw a decrease of one-third in the level of triglycerides (a type of fat linked to heart disease) in their bloodstreams, and 20 percent lower insulin levels overall — even when the meals were high in fat and made with heavy oils.
Answer

2-sample t-test

  • Researchers at Colgate wished to test the effectiveness of a new toothpaste. They collected a sample of 143 individuals and assigned them to either use the current Colgate toothpaste or the new toothpaste for 2 weeks. Participants waited one week and then switched to using the other toothpaste for two weeks. Based on plaque build-up, they determined that 77 participants did better on the new toothpaste than the old. (Note: This study is fictional)
Answer

Binomial Exact test or 1-sample z-test
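
As a sketch, the exact binomial test for this case could be run with base R's binom.test. The null value of 0.5 is an assumption here: it corresponds to participants being equally likely to do better on either toothpaste.

```r
# 77 of 143 participants did better on the new toothpaste
# Assumed null hypothesis: no difference, so the true proportion is 0.5
result <- binom.test(x = 77, n = 143, p = 0.5)

result$estimate   # observed proportion who did better on the new toothpaste
result$p.value    # two-sided exact p-value
```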

  • Exposure to cosmic radiation during deep-space missions may damage an astronaut’s heart, a new NASA-funded study suggests. Researchers at Florida State University compared the deaths of 35 astronauts who never traveled into space with those of 42 astronauts who ventured beyond Earth’s protective magnetic field, including seven Apollo veterans who flew to the moon between 1968 and 1972. The study found that lunar astronauts were five times more vulnerable to heart disease—43 percent of them died from cardiovascular ailments compared with only 9 percent of the astronauts that didn’t journey to the moon. A follow-up study involving mice reveals that radiation can trigger long-term changes in the lining of blood vessels associated with atherosclerosis, or “hardening of the arteries.”
Answer

Chi-sq or Fisher’s exact test

  • An investigator collected the annual earnings of 1642 Iowans and 1563 Nebraskans to compare income level by state. The Iowa group had a mean of $65,000, a median of $59,000, and a standard deviation of $12,000. The Nebraska group had a mean of $64,000, a median of $61,000, and a standard deviation of $12,000. (Note: This study is fictional)
Answer

Log transform and 2-sample t-test or Mann-Whitney/Wilcoxon Rank Sum test

  • Researchers at University College London surveyed nearly 8,000 participants over the age of 52. Using a fake aspirin bottle complete with instructions as the testing instrument, researchers asked participants to answer four basic questions, including “What is the maximum number of days you may take this medicine?” and “List three situations for which you should consult a doctor.” All the answers could be found on the label. One third of the adults failed to correctly answer all four questions, and one in eight got two or more wrong. Researchers then monitored the volunteers’ health for five years. During that time, 621 of the participants died, and people who missed two or more questions were more than twice as likely to have died as those who got the answers correct.
Answer

Chi-sq or Fisher’s Exact

  • In a study published in Psychological Science, researchers had groups of participants ages 18 to 65 perform simple exercises, such as pressing a button when a letter appeared onscreen or tapping in time with their own breathing. The experts checked periodically to ask the volunteers whether their minds were on the task or they were thinking of something else. At the end, participants were tested on their ability to remember a series of letters while doing math problems; individuals who let their mind wander scored higher on the test.
Answer

2-sample t-test

  • According to the US Census Bureau, the national poverty rate is 11.5%. We wish to see if the poverty rate in Johnson County differs significantly from the national rate. We collect a random sample of 1000 individuals to test this.
Answer

Binomial Exact test or 1-sample z-test
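
The problem does not report how many of the 1000 sampled individuals were in poverty, so the sketch below uses a made-up count of 130 purely to illustrate the mechanics of the 1-sample z-test for a proportion.

```r
# Hypothetical sample result: 130 of 1000 sampled residents in poverty
n  <- 1000
x  <- 130      # made-up count for illustration only
p0 <- 0.115    # national poverty rate from the problem

p_hat <- x / n
se0   <- sqrt(p0 * (1 - p0) / n)   # standard error under the null hypothesis
z     <- (p_hat - p0) / se0

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)
c(z = z, p = p_value)
```

The standard error uses the null value p0 rather than the sample proportion, since the test statistic is computed assuming the null hypothesis is true.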