Read in the Tips dataset.
How do you access just the Tip column?
How do you access just the Tip column but only for the smokers?
How do you access just the Tip column but only when the total bill is less than 15?
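One possible way to approach these questions, sketched on a small made-up data frame rather than the real Tips data. The column names Tip and TotBill match those used later in this tutorial; the Smoker column name and its "Yes"/"No" coding are assumptions, so check names(tips) on your own copy. The file name in the commented-out line is hypothetical as well.

```r
# Toy stand-in for the Tips data (values are made up for illustration).
# With the real file you would first read it in, for example:
#   tips <- read.csv("tips.csv")   # hypothetical file name
toy <- data.frame(
  Tip     = c(1.00, 2.00, 3.00, 4.00, 5.00),
  TotBill = c(8.00, 12.00, 20.00, 25.00, 14.00),
  Smoker  = c("Yes", "No", "Yes", "No", "No")  # assumed column name and coding
)

# Just the Tip column
all_tips <- toy$Tip

# Tip column, smokers only: a logical vector inside [ ] keeps the TRUE positions
smoker_tips <- toy$Tip[toy$Smoker == "Yes"]

# Tip column, only where the total bill is less than 15
small_bill_tips <- toy$Tip[toy$TotBill < 15]

# Means of each selection
mean(all_tips)
mean(smoker_tips)
mean(small_bill_tips)
```

The same pattern, with tips in place of toy and your actual column names, should reproduce the means used for checking.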
(For checking purposes, the mean of each result is shown below.)
## [1] 2.998279
## [1] 3.00871
## [1] 2.05025
In R, the function we will use is called ‘lm’, which stands for “linear model”. Regression is useful because it can be generalized to all kinds of settings through the notion of a model, as alluded to by its name in R.
To create a model in R, the code looks like this:
mod<-lm(Tip~TotBill,data=tips)
(mod)
## 
## Call:
## lm(formula = Tip ~ TotBill, data = tips)
## 
## Coefficients:
## (Intercept)      TotBill  
##      0.9203       0.1050
Note: The name of the model (mod here) is arbitrary. Within the lm() function, the arguments should look like (y variable) ~ (x variable), data = (the dataset).
Printing out the model itself gives you the intercept and slope.
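If you need the intercept and slope as numbers rather than as printed output, one option is the coef() function. The sketch below uses a made-up data frame (toy) whose points lie exactly on the line y = 1 + 0.1x, so the fitted coefficients are easy to check by eye. The names toy, toy_mod, intercept and slope are illustrative only.

```r
# Made-up data lying exactly on the line Tip = 1 + 0.1 * TotBill
toy <- data.frame(TotBill = c(10, 20, 30, 40))
toy$Tip <- 1 + 0.1 * toy$TotBill

# Same formula pattern as above: (y variable) ~ (x variable), data = (the dataset)
toy_mod <- lm(Tip ~ TotBill, data = toy)

# coef() returns a named vector: "(Intercept)" first, then the slope,
# which is named after the x variable
b <- coef(toy_mod)
intercept <- b[["(Intercept)"]]
slope     <- b[["TotBill"]]

# Predicted tip for a 25-dollar bill, computed by hand from the coefficients
intercept + slope * 25
```

Because the toy points sit exactly on a line, the recovered intercept and slope should match 1 and 0.1 up to rounding error. With real data such as tips the points scatter around the line instead.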
If you want to add the line to a plot of the data, you can use the abline() function, and just put the name of your model inside the parentheses. In this case, it would be mod. (We will practice this below.)
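The output tells us that for every additional dollar that a meal costs, the waiter can expect to get, on average, about 10.5 cents more on his tip. Note that we could have calculated this slope from the output of cor() as well, since the slope equals the correlation multiplied by the standard deviation of Tip and divided by the standard deviation of TotBill: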
cor(tips$Tip,tips$TotBill)*sd(tips$Tip)/sd(tips$TotBill)
## [1] 0.1050245
Perhaps our prediction of the waiter’s tip should depend on whether or not the table is in the smoking section. One approach would be to fit regression lines separately and make separate predictions for the two groups (which I would encourage you to do for practice). However, for a lot of reasons, making predictions based on multiple variables gets complicated. This is the kind of thing you would explore in the next course, should you decide to take it.
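As a starting point for that practice, here is a minimal sketch of fitting one line per group, using made-up data. It relies on the subset argument of lm(), which restricts the fit to the rows where a condition is TRUE. The Smoker column name and its "Yes"/"No" coding are assumptions; substitute whatever your copy of the data actually uses.

```r
# Made-up data: smokers follow Tip = 0.5 + 0.15 * TotBill,
# non-smokers follow Tip = 1 + 0.10 * TotBill
toy <- data.frame(
  TotBill = c(10, 20, 30, 40, 10, 20, 30, 40),
  Tip     = c(2.0, 3.5, 5.0, 6.5, 2.0, 3.0, 4.0, 5.0),
  Smoker  = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No")  # assumed coding
)

# One regression per group; subset is evaluated using the columns of toy
mod_smoke   <- lm(Tip ~ TotBill, data = toy, subset = Smoker == "Yes")
mod_nosmoke <- lm(Tip ~ TotBill, data = toy, subset = Smoker == "No")

# Each group gets its own intercept and slope
coef(mod_smoke)
coef(mod_nosmoke)
```

To predict a tip for a given table, you would then use the coefficients from whichever model matches that table's section.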
You can perform operations on whole columns of data at a time. For instance:
tipPctgs<-tips$Tip/tips$TotBill
summary(tipPctgs)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.03564 0.12910 0.15480 0.16080 0.19150 0.71030
tipPctgs2<-tipPctgs*100
summary(tipPctgs2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.564  12.910  15.480  16.080  19.150  71.030
# Problem 1
plot(tips$TotBill,tips$Tip)
# Problem 2
abline(mod)
# Problem 3
cor(tips$Tip,tips$TotBill)
# Problem 4
2/sd(tips$Tip)*cor(tips$Tip,tips$TotBill)*sd(tips$TotBill)
# Problem 5
mean(tips$Tip)+10/sd(tips$TotBill)*cor(tips$Tip,tips$TotBill)*sd(tips$Tip)
# Alternatively,
mod$coefficients[1]+mod$coefficients[2]*(mean(tips$TotBill)+10)
## [1] 0.6757341
## [1] 8.695428
## [1] 4.048524
## (Intercept) 
##    4.048524
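The two expressions under Problem 5 give the same number because a least-squares line always passes through the point (mean of x, mean of y), and because its slope equals the correlation times sd(y)/sd(x). The sketch below checks both facts on a small made-up data frame; the names toy, toy_mod, b0 and b1 are illustrative only.

```r
# Made-up data with some scatter around a line
toy <- data.frame(
  TotBill = c(10, 15, 22, 30, 41),
  Tip     = c(1.5, 2.0, 3.5, 4.0, 6.5)
)

toy_mod <- lm(Tip ~ TotBill, data = toy)
b0 <- coef(toy_mod)[["(Intercept)"]]
b1 <- coef(toy_mod)[["TotBill"]]

# Slope computed from the correlation and the two standard deviations
slope_from_cor <- cor(toy$Tip, toy$TotBill) * sd(toy$Tip) / sd(toy$TotBill)
all.equal(b1, slope_from_cor)

# The fitted line passes through the point of means
all.equal(b0 + b1 * mean(toy$TotBill), mean(toy$Tip))

# Two routes to the predicted tip for a bill 10 dollars above average
via_means <- mean(toy$Tip) + 10 * slope_from_cor
via_model <- b0 + b1 * (mean(toy$TotBill) + 10)
all.equal(via_means, via_model)
```

all.equal() is used instead of == because the two routes can differ by tiny floating-point rounding errors.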