🗓️ Sessions 17 & 18: Linear regression
Simple linear regression is one of the most commonly used methods in inferential statistics and supervised machine learning. It can be used to study the relationship between two numerical variables and to make predictions about the value of one of them based on the analysis of a sample. In this session we will discuss when to use linear regression models and where the limitations of this method lie.
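For reference, the model behind this method can be written compactly; the notation below is generic and not taken from the lecture slides. With an outcome $y$ (here: consumption) and a single explanatory variable $x$ (here: income), simple linear regression assumes

$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n, $$

and the ordinary least squares (OLS) estimates are the intercept and slope that minimise the sum of squared residuals:

$$ (\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{b_0, b_1} \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right)^2. $$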
👨‍🏫 Lecture Slides
Either click on the slide area below or click here to download the slides.
Lecture code
library(tibble)
library(ggplot2)
library(moderndive)

# 1. Implement linear regression-----------------

# Make a shortcut to the data:
beer_data <- as_tibble(DataScienceExercises::beer)
head(beer_data)

# Conduct the linear regression:
beer_lm <- lm(
  formula = consumption ~ income,
  data = beer_data)
beer_lm

# To get more information about the regression:
summary(beer_lm)
moderndive::get_regression_table(beer_lm)

# Digression: the results might change drastically if you do a multiple linear
# regression
summary(lm(
  formula = consumption ~ income + price,
  data = beer_data))
# More info on this: "omitted variable bias"
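# --- Addition, not part of the lecture code: a minimal simulation sketch of
# --- omitted variable bias. All objects below (z, price_s, cons_s, sim_ovb)
# --- are illustrative and do not refer to the beer data. The confounder z
# --- raises both price and consumption; omitting it from the regression
# --- biases the estimated price coefficient away from the true value of -0.5.
set.seed(42)
n_sim   <- 500
z       <- rnorm(n_sim)                              # unobserved confounder
price_s <- 1 + 0.8 * z + rnorm(n_sim, sd = 0.5)
cons_s  <- 2 - 0.5 * price_s + 1.5 * z + rnorm(n_sim, sd = 0.5)
sim_ovb <- data.frame(consumption = cons_s, price = price_s, z = z)
coef(lm(consumption ~ price, data = sim_ovb))        # price coefficient is biased
coef(lm(consumption ~ price + z, data = sim_ovb))    # close to the true -0.5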
# 2. Compute R2-----------------

# Illustrating what we mean by total variation:
mean_consumption <- mean(beer_data$consumption)

ggplot(data = beer_data, aes(x = 1:30, y = consumption)) +
  geom_hline(yintercept = mean_consumption) +
  geom_point() +
  theme_linedraw()

# Compute TSS, RSS and ESS manually:
tss <- sum((beer_data$consumption - mean_consumption)**2)
rss <- sum(beer_lm$residuals**2)
ess <- sum((beer_lm$fitted.values - mean_consumption)**2)

# From this we can compute R2 manually:
ess/tss

# Compare to what you get from, e.g., summary():
summary(beer_lm)[["r.squared"]]
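As a small extension that is not part of the lecture script, the decomposition used above can also be checked directly: for an OLS regression with an intercept the three sums of squares satisfy TSS = ESS + RSS, so R² can equivalently be computed as 1 − RSS/TSS. The two lines below reuse the tss, rss and ess objects defined in the lecture code.

all.equal(tss, ess + rss)   # TRUE (up to floating-point error): TSS = ESS + RSS
1 - rss/tss                 # identical to ess/tss and to summary()'s R squared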
🎥 Lecture videos
So far, there are no learning videos available for this lecture.
📚 Mandatory Reading
📖 Further readings
✍️ Coursework
- Do the exercises LinearRegression1 from the DataScienceExercises package

Quick code for starting the exercises:

learnr::run_tutorial(
  name = "LinearRegression1",
  package = "DataScienceExercises",
  shiny_args = list("launch.browser" = TRUE))
References
Ismay, C. and Kim, A. Y.-S. (2020) Statistical inference via data science: A ModernDive into R and the tidyverse, Boca Raton: CRC Press, Taylor and Francis Group, available at https://moderndive.com/index.html.
James, G., Witten, D., Hastie, T. and Tibshirani, R. (2021) An introduction to statistical learning: With applications in R, second edition, New York, NY: Springer.