🗓️ Sessions 17 & 18: Linear regression
Simple linear regression is one of the most commonly used methods in inferential statistics and supervised machine learning. It can be used to study the relationship between two numerical variables and to make predictions about the value of one of them based on the analysis of a sample. In this session we will discuss when to use linear regression models and where the limitations of this method lie.
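For reference, the model behind this method can be written compactly; the notation below is generic and not taken from the lecture slides. With an outcome $y$ (here: consumption) and a single explanatory variable $x$ (here: income), simple linear regression assumes

$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n, $$

and the ordinary least squares (OLS) estimates are the intercept and slope that minimise the sum of squared residuals:

$$ (\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{b_0, b_1} \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right)^2. $$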
👨‍🏫 Lecture Slides
Either click on the slide area below or click here to download the slides.
Lecture code
library(tibble)
library(ggplot2)
library(moderndive)

# 1. Implement linear regression-----------------

# Make a shortcut to the data:
beer_data <- as_tibble(DataScienceExercises::beer)
head(beer_data)

# Conduct the linear regression:
beer_lm <- lm(
  formula = consumption ~ income,
  data = beer_data)
beer_lm

# To get more information about the regression:
summary(beer_lm)
moderndive::get_regression_table(beer_lm)

# Digression: the results might change drastically if you do a multiple linear
# regression
summary(lm(
  formula = consumption ~ income + price,
  data = beer_data))
# More info on this: "omitted variable bias"
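# --- Addition, not part of the lecture code: a minimal simulation sketch of
# --- omitted variable bias. All objects below (z, price_s, cons_s, sim_ovb)
# --- are illustrative and do not refer to the beer data. The confounder z
# --- raises both price and consumption; omitting it from the regression
# --- biases the estimated price coefficient away from the true value of -0.5.
set.seed(42)
n_sim   <- 500
z       <- rnorm(n_sim)                              # unobserved confounder
price_s <- 1 + 0.8 * z + rnorm(n_sim, sd = 0.5)
cons_s  <- 2 - 0.5 * price_s + 1.5 * z + rnorm(n_sim, sd = 0.5)
sim_ovb <- data.frame(consumption = cons_s, price = price_s, z = z)
coef(lm(consumption ~ price, data = sim_ovb))        # price coefficient is biased
coef(lm(consumption ~ price + z, data = sim_ovb))    # close to the true -0.5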
# 2. Compute R2-----------------

# Illustrating what we mean by total variation:
mean_consumption <- mean(beer_data$consumption)

ggplot(data = beer_data, aes(x = 1:30, y = consumption)) +
  geom_hline(yintercept = mean_consumption) +
  geom_point() +
  theme_linedraw()

# Compute TSS, RSS and ESS manually:
tss <- sum((beer_data$consumption - mean_consumption)**2)
rss <- sum(beer_lm$residuals**2)
ess <- sum((beer_lm$fitted.values - mean_consumption)**2)

# From this we can compute R2 manually:
ess/tss

# Compare to what you get from, e.g., summary():
summary(beer_lm)[["r.squared"]]
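As a small extension that is not part of the lecture script, the decomposition used above can also be checked directly: for an OLS regression with an intercept the three sums of squares satisfy TSS = ESS + RSS, so R² can equivalently be computed as 1 − RSS/TSS. The two lines below reuse the tss, rss and ess objects defined in the lecture code.

all.equal(tss, ess + rss)   # TRUE (up to floating-point error): TSS = ESS + RSS
1 - rss/tss                 # identical to ess/tss and to summary()'s R squared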
🎥 Lecture videos
So far, there are no learning videos available for this lecture.
📚 Mandatory Reading
📖 Further readings
✍️ Coursework
- Do the exercises LinearRegression1 from the DataScienceExercises package

Quick code for starting the exercises:

learnr::run_tutorial(
  name = "LinearRegression1",
  package = "DataScienceExercises",
  shiny_args = list("launch.browser" = TRUE))
References
Ismay, C. and Kim, A. Y.-S. (2020) Statistical inference via data science: A ModernDive into R and the tidyverse, Boca Raton: CRC Press, Taylor and Francis Group, available at https://moderndive.com/index.html.
James, G., Witten, D., Hastie, T. and Tibshirani, R. (2021) An introduction to statistical learning: With applications in R, second edition, New York, NY: Springer.