🗓️ Sessions 15 and 16: Sampling

Published: 07.06.2024
Modified: 14.06.2024

A central concept in data science - and in applied statistics more generally - is that of sampling: the strategy of using (small) samples to learn about a (large) population. For example, if you wanted to understand the effect of TV advertising on the consumer behaviour of young men in Germany, you could in principle study the whole population of young men in Germany. Since this is usually not feasible, you would instead take a sample of young men, study their behaviour, and then generalise to the whole population. In this session we discuss when and how such generalisation is possible. Along the way, we introduce the concept of Monte Carlo simulations and two results from probability theory that underlie much of modern sampling theory: the law of large numbers and the central limit theorem.
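
To get a first feel for what such a Monte Carlo simulation looks like, here is a minimal base-R sketch of the law of large numbers: the mean of a random sample tends to get closer to the true population mean as the sample size grows. The object names (population, sample_sizes, sample_means) are chosen for illustration and are not part of the lecture code.

set.seed(123) # For reproducibility

# An artificial population with a known mean of (approximately) 50:
population <- rnorm(100000, mean = 50, sd = 10)

# Draw one sample per sample size and record its mean:
sample_sizes <- c(10, 100, 1000, 10000)
sample_means <- sapply(
  sample_sizes,
  function(n) mean(sample(population, size = n))
)

# The absolute deviation from the population mean tends to shrink as n grows:
data.frame(
  n = sample_sizes,
  sample_mean = sample_means,
  abs_error = abs(sample_means - mean(population))
)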

👨‍🏫 Lecture Slides

Either click on the slide area below or click here to download the slides.

Code used during the lecture on sampling.
base_object <- c("a", "b", "c", "d")
result_container <- rep(NA, length(base_object)) # An output container (not needed for the printing examples below)

# Looping directly over the elements of a vector:
for (i in base_object){
  print(i)
}

# Usually better to loop over indices:
# (note: the function "paste" is used to add an explanatory label)
for (i in seq_along(base_object)) {
  print(paste("Iteration:", i))
  print(paste("Element of base_object:", base_object[i]))
}

# Note: you do not need to use "i" in the action body, e.g. if you are just
# interested in repeating a certain action several times:
for (i in seq(1, 10)){
  print(i)
  print(sample(base_object, size = 1)) # Draws a random element from base_object
}

# Exercise: write a for-loop that loops over the vector c(1, 2, 3, 4, 5, 99)
# and computes the square root of each element.
base_object <- c(1, 2, 3, 4, 5, 99)

# Note: yes, you could do this via vectorization...
sqrt(base_object)
# ...but we use this simple example to illustrate the idea:

# 1. Output container
output_container <- rep(NA, length(base_object))
# 2. Looping sequence
for (i in seq_along(base_object)){
  # 3. Action body
  output_container[i] <- sqrt(base_object[i])
}
output_container
library(tibble)
library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)
# More elaborate example: the ball pit-----------
# Note: for a slightly more elegant solution using a function in the loop,
# please read the tutorial on sampling.

# First step: create the artificial ball pit------
ball_pid_size <- 5000
ball_pid_share_white <- 0.65
white_balls <- as.integer(ball_pid_share_white*ball_pid_size)
grey_balls <- ball_pid_size - white_balls
ball_pid_colors <- c(rep("white", white_balls), rep("grey", grey_balls))
ball_pid <- tibble::tibble(
  id = seq(1, ball_pid_size),
  color = sample(ball_pid_colors) # sample() without "size" shuffles the whole vector
)
head(ball_pid)
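# Sanity check (added here for illustration, not part of the lecture code):
# the share of white balls in the population should equal ball_pid_share_white:
mean(ball_pid$color == "white") # Should be 0.65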
## Conduct the simulation for the case with sample size 20-----------
n_samples <- 1000 # The number of iterations we want to run
results_n20 <- rep(NA, n_samples) # The output container
# Since n_samples is a single number, we use seq_len() instead of seq_along()
for (i in seq_len(n_samples)){
  # Draw a sample of size 20:
  sample_drawn <- sample(x = ball_pid$color, size = 20)
  # Compute the share of white balls within this sample:
  share_white <- sum(sample_drawn=="white")/length(sample_drawn)
  # Write the result into the output container:
  results_n20[i] <- share_white
}
# Compute mean and standard deviation of the sampling distribution-----
mean(results_n20)
sd(results_n20)
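# Plausibility check (added for illustration, not part of the lecture code):
# by the central limit theorem, the standard deviation of the sampling
# distribution (the standard error) should be roughly sqrt(p*(1-p)/n),
# here sqrt(0.65*0.35/20), i.e. about 0.107:
sqrt(ball_pid_share_white*(1-ball_pid_share_white)/20)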
## Visualize the result-----------
hist_visualization <- ggplot(
  data = tibble(results_n20),
  mapping = aes(x=results_n20)
) +
  geom_histogram(binwidth = 0.02, fill="#00395B", alpha=0.75) +
  scale_y_continuous(expand = expansion(add = c(0, 10))) +
  scale_x_continuous(labels = percent_format()) +
  labs(
    x = "Share of white balls",
    y = "Number of samples",
    title = "True share: 65%") +
  geom_vline(xintercept = 0.65) +
  theme_linedraw() +
  theme(
    panel.grid.minor.x = element_blank(),
    panel.grid.minor.y = element_blank())
hist_visualization
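
The following script extends the simulation by comparing the sampling distributions for sample sizes of 20, 50, and 100:
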
library(tibble)
library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)
# Create the population--------
ball_pid_size <- 5000
ball_pid_share_white <- 0.65
white_balls <- as.integer(ball_pid_share_white*ball_pid_size)
grey_balls <- ball_pid_size - white_balls
ball_pid_colors <- c(rep("white", white_balls), rep("grey", grey_balls))
ball_pid <- tibble::tibble(
  id = seq(1, ball_pid_size),
  color = sample(ball_pid_colors)
)
# Conduct the simulation--------------
n_samples <- 1000 # The number of iterations we want to get
results_n20 <- rep(NA, n_samples) # The output container for n=20
results_n50 <- rep(NA, n_samples) # The output container for n=50
results_n100 <- rep(NA, n_samples) # The output container for n=100
# Since n_samples is a single number, we use seq_len() instead of seq_along()
for (i in seq_len(n_samples)){
  # Draw samples of the respective sizes:
  sample_drawn_20 <- sample(x = ball_pid$color, size = 20)
  sample_drawn_50 <- sample(x = ball_pid$color, size = 50)
  sample_drawn_100 <- sample(x = ball_pid$color, size = 100)
  # Compute the share of white balls within each sample:
  share_white_20 <- sum(sample_drawn_20=="white")/length(sample_drawn_20)
  share_white_50 <- sum(sample_drawn_50=="white")/length(sample_drawn_50)
  share_white_100 <- sum(sample_drawn_100=="white")/length(sample_drawn_100)
  # Write the results into the output containers:
  results_n20[i] <- share_white_20
  results_n50[i] <- share_white_50
  results_n100[i] <- share_white_100
}
# Combine all three cases in one tibble:
result_table <- tibble(
  sample_size20 = results_n20,
  sample_size50 = results_n50,
  sample_size100 = results_n100
)
# Compute mean and standard deviation of the sampling distributions-----
result_table %>%
  pivot_longer(
    cols = everything(),
    names_to = "Sample size",
    values_to = "Values") %>%
  summarise(
    `Mean share of whites` = mean(Values),
    `Standard deviation` = sd(Values),
    .by = "Sample size")
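# Note (added for clarity): the spread of the sampling distribution shrinks
# as the sample size grows; by the central limit theorem the standard error
# scales with 1/sqrt(n), so quadrupling the sample size roughly halves it.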
# Visualize the result-----------
hist_visualization <- result_table %>%
  pivot_longer(
    cols = everything(),
    names_to = "Sample size",
    values_to = "Values") %>%
  mutate(
    `Sample size` = as.integer(gsub(
      x = `Sample size`, pattern = "sample_size", replacement = "")) # Remove the string part
  ) %>%
  ggplot(
    data = .,
    mapping = aes(x=Values)
  ) +
  geom_histogram(binwidth = 0.02, fill="#00395B", alpha=0.75) +
  scale_y_continuous(expand = expansion(add = c(0, 10))) +
  scale_x_continuous(labels = percent_format()) +
  labs(
    x = "Share of white balls",
    y = "Number of samples",
    title = "True share: 65%") +
  geom_vline(xintercept = 0.65) +
  facet_wrap(~`Sample size`, scales = "fixed") +
  theme_linedraw() +
  theme(
    panel.grid.minor.x = element_blank(),
    panel.grid.minor.y = element_blank())
hist_visualization

🎥 Lecture videos

So far, there are no learning videos available for this lecture.

📚 Mandatory Reading

✍️ Coursework

  • Do the exercises "Sampling" from the DataScienceExercises package:
learnr::run_tutorial(
  name = "Sampling", 
  package = "DataScienceExercises", 
  shiny_args=list("launch.browser"=TRUE))

References

Ismay, C. and Kim, A. Y.-S. (2020) Statistical Inference via Data Science: A ModernDive into R and the Tidyverse, Boca Raton: CRC Press, Taylor and Francis Group. Available at https://moderndive.com/index.html.