Objectives

In today’s lab we will:

  1. Learn to make tables and proportion tables in R
  2. Learn bracketing techniques for indexing tables
  3. Work on calculating weighted averages of data, adjusting for confounders

Constructing Tables

First, let’s read in the ‘titanic’ dataset and compute some summary statistics.

titanic <- read.delim("https://s3.amazonaws.com/pbreheny-data-sets/titanic.txt")
summary(titanic)
##     Class               Sex                Age              Survived        
##  Length:2201        Length:2201        Length:2201        Length:2201       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character

By default, when the summary() function encounters categorical data, it produces a table for that column, as evidenced above, when it created 4 separate tables. A better way to look at the variables in the dataset would be to use the table() function. Recall from lab 1 that we can extract vectors from the dataset using the ‘$’ symbol:

table(titanic$Class)
## 
##  1st  2nd  3rd Crew 
##  325  285  706  885

But the table function is more versatile than that. For example, we can create tables using 2 variables:

table(titanic$Class,titanic$Survived)
##       
##        Died Survived
##   1st   122      203
##   2nd   167      118
##   3rd   528      178
##   Crew  673      212

If we give the function more than two variables, it creates multiple tables (one for each level):

table(titanic$Class, titanic$Survived, titanic$Sex)
## , ,  = Female
## 
##       
##        Died Survived
##   1st     4      141
##   2nd    13       93
##   3rd   106       90
##   Crew    3       20
## 
## , ,  = Male
## 
##       
##        Died Survived
##   1st   118       62
##   2nd   154       25
##   3rd   422       88
##   Crew  670      192

It is recommended to keep the number of variables down to 2 or 3, as more than that begins to get a bit cluttered and confusing.

If we save a table, we can use square brackets to access individual numbers [row,column]. When the row or column entry is left blank using this type of indexing, R will take all the rows or columns (whichever one was left blank):

class_surv <- table(titanic$Class,titanic$Survived)

class_surv
##       
##        Died Survived
##   1st   122      203
##   2nd   167      118
##   3rd   528      178
##   Crew  673      212
class_surv[3,2] 
## [1] 178
# The 3 indicates the third row and the 2 indicates the second column, 
# so this is the number of 3rd class passengers who survived.

class_surv[4, 1]
## [1] 673
class_surv[,2] 
##  1st  2nd  3rd Crew 
##  203  118  178  212
# Note that the column position is left blank. This extracts all rows and 2nd column.
# In the context of the data, this shows us the number of people that survived in each class.

class_surv[3,]
##     Died Survived 
##      528      178
# Note that the row position is left blank. This extracts the 3rd row (3rd class) and all columns.
# In the context of the data, this shows us the number of people who died and survived from the 3rd class.

Additionally, we can use the “==” to access data with certain specifications. For instance, if we wanted to access only the data in which people survived, we could use the following code:

justSurv <- titanic[titanic$Survived == "Survived",]
# Note that the name of the column you want should be in quotations.
# ie, == "Survived",] is different from == Survived,]. The latter will produce an error code.

# Or if we only want 2nd class travelers.

justSecond <- titanic[titanic$Class == "2nd",]

We can also use prop.table() to get the proportions for each cell of a table:

prop.table(class_surv, 1) # Gives proportions for each class (by row)
##       
##             Died  Survived
##   1st  0.3753846 0.6246154
##   2nd  0.5859649 0.4140351
##   3rd  0.7478754 0.2521246
##   Crew 0.7604520 0.2395480
prop.table(class_surv, 2) # Gives proportions for the survival groups (by column)
##       
##              Died   Survived
##   1st  0.08187919 0.28551336
##   2nd  0.11208054 0.16596343
##   3rd  0.35436242 0.25035162
##   Crew 0.45167785 0.29817159

Weighted Averages (Manual Steps)

Let’s investigate Titanic survival rates based on class.
(For the sake of practice, we will do this manually and then with an R function.)

Recall the steps for computing a weighted average:

  1. Identify the outcome, the group you are interested in, and the confounder that you are adjusting for

  2. Calculate \(w_1, w_2, ..., w_n\), the overall proportion of observations that belong to each level of the confounder

  3. Calculate \(\bar{x}_1,\bar{x}_2, ..., \bar{x}_n\), the average (or percentage) for that group at each level of the confounder

  4. Calculate the weight average \(\bar{x} = \Sigma_iw_i\bar{x}_i\)

, , Sex = Female

      Survived
Class  Survived Total
  1st       141   145
  2nd        93   106
  3rd        90   196
  Crew       20    23

, , Sex = Male

      Survived
Class  Survived Total
  1st        62   180
  2nd        25   179
  3rd        88   510
  Crew      192   862
Part A:

From “class_surv” above, calculate the overall percentages of survival for each class.

Part B:

Now, create a table listing the percentage of passengers in each class who survived, broken down by sex.

Part C:

Create a table listing the total proportion of male and female passengers on the ship.

Part D:

Finally, construct a weighted average of the percentage of passengers in each class who survived, controlling for the effect of sex (i.e., report one number for each class).

Do any of these results surprise you? What changed in Part a when compared to Part d? What conclusions can we draw from this?

Answers

## Part a
##       1st       2nd       3rd      Crew 
## 0.6246154 0.4140351 0.2521246 0.2395480
## Part b
##       
##           Female      Male
##   1st  0.9724138 0.3444444
##   2nd  0.8773585 0.1396648
##   3rd  0.4591837 0.1725490
##   Crew 0.8695652 0.2227378
## Part c
## 
##    Female      Male 
## 0.2135393 0.7864607
## Part d
## 1st :  0.479
## 2nd :  0.297
## 3rd :  0.234
## Crew:  0.361

Weighted Averages (with R function)

We can also use R to solve these problems. There is no simple function that allows you to calculate the weighted average. Below are a few of ways to do this. Note that the first method could introduce mistakes since you are inputting values individually and also could be a lengthy process depending on the structure of the dataset. Typically, we would prefer to use the second method which is more efficient and has less opportunity for error.

overallSex <- prop.table(table(titanic$Sex))

#First Method
firstclass <- c(141, 62) / c(145, 180) # From table provided

(first_method1 <- weighted.mean(firstclass, overallSex))
## [1] 0.4785406
#Second Method
classtable <- table(titanic$Sex,titanic$Class,titanic$Survived)
classes <- prop.table(classtable, 1:2)[,,2]

(first_method2 <- weighted.mean(classes[,1], overallSex))
## [1] 0.4785406

Practice

Now let’s say that we want to investigate the difference in survival by sex for the Titanic data set. Construct a weighted average of the percentage of passengers for each sex who survived, controlling for the effect of class. For the sake of learning the process do not use the weighted.mean() function except to check your answer

Practice Answers

## , , Class = 1st
## 
##         Survived
## Sex      Survived Total
##   Female      141   145
##   Male         62   180
## 
## , , Class = 2nd
## 
##         Survived
## Sex      Survived Total
##   Female       93   106
##   Male         25   179
## 
## , , Class = 3rd
## 
##         Survived
## Sex      Survived Total
##   Female       90   196
##   Male         88   510
## 
## , , Class = Crew
## 
##         Survived
## Sex      Survived Total
##   Female       20    23
##   Male        192   862

First we find the proportion of people who survived by sex:

##    Female      Male 
## 0.7319149 0.2120162

Then we find the percentage of passengers of each sex who survived broken down by class:

##         
##                1st       2nd       3rd      Crew
##   Female 0.9724138 0.8773585 0.4591837 0.8695652
##   Male   0.3444444 0.1396648 0.1725490 0.2227378

Next we create a table listing the total proportion of passengers on the ship in each class:

## 
##       1st       2nd       3rd      Crew 
## 0.1476602 0.1294866 0.3207633 0.4020900

Finally we construct a weighted average of the percentage of passengers of each sex who survived, controlling for the effect of class:

## F :  0.754
## M :  0.214

Then we can check our answer using the weighted.mean() function:

overallclass <- prop.table(table(titanic$Class))

#Second Method
practice_table <- table(titanic$Class,titanic$Sex,titanic$Survived)
sexes <- prop.table(practice_table, 1:2)[,,2]

(weighted.mean(sexes[,1], overallclass))
## [1] 0.7541256
(weighted.mean(sexes[,2], overallclass))
## [1] 0.2138535