🗓️ Session 10: Data preparation

Published: 03 05 2024
Modified: 11 05 2024

In this session you learn how to turn raw data into a form you can work with. Luckily, there is one particular form that serves as the common starting point for all further operations, such as visualization or modelling. This form is called tidy data: each variable is a column, each observation is a row, and each value is a cell (Wickham et al. 2023). The goal of this session is to equip you with the tools you need to turn often messy raw data into tidy data. These skills are important because they make you independent. You will be able to prepare any data you find or create yourself for further processing, rather than relying on others to provide it in a particular form.
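To make the idea concrete, here is a minimal sketch in R. It assumes that the tidyr and tibble packages are installed. It reshapes a small wide table, in which the years are spread across columns (similar to the layout of many World Bank downloads), into tidy form with `tidyr::pivot_longer()`. The country names and values are hypothetical placeholders, not real emission data.

```r
# Minimal sketch: reshape a wide table into tidy (long) form.
# Assumes the tidyr and tibble packages are installed.
library(tibble)
library(tidyr)

# Hypothetical placeholder data: one row per country, one column per year
co2_wide <- tibble(
  country = c("A", "B"),
  `2000`  = c(1.0, 2.0),
  `2001`  = c(1.5, 2.5)
)

# Move the year columns into rows: one row per country-year observation
co2_tidy <- pivot_longer(
  co2_wide,
  cols = -country,                           # all columns except 'country'
  names_to = "year",                         # old column names become 'year'
  values_to = "co2",                         # cell values become 'co2'
  names_transform = list(year = as.integer)  # store years as integers
)

print(co2_tidy)
```

Each row of `co2_tidy` now holds one country-year observation. This shape is convenient for further steps such as a line graph with the year on the x-axis.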

👨‍🏫 Lecture Slides

Either click on the slide area below or click here to download the slides.

🎥 Lecture videos

All the videos are available via this playlist.

📚 Mandatory Reading

Further Reading

✍️ Coursework

  • Do the Wrangling1 exercises from the DataScienceExercises package:
# Open the interactive Wrangling1 tutorial in your web browser
learnr::run_tutorial(
  name = "Wrangling1",
  package = "DataScienceExercises",
  shiny_args = list(launch.browser = TRUE))
  • Download CO2 emissions data for countries of your choice from the World Bank website, covering the years 2000 to 2020. Set up an R project, save the data, import it, and make a line graph.
  • If you want more practice with making data longer or wider, do the Wrangling2 exercises from the DataScienceExercises package:
# Open the interactive Wrangling2 tutorial in your web browser
learnr::run_tutorial(
  name = "Wrangling2",
  package = "DataScienceExercises",
  shiny_args = list(launch.browser = TRUE))

References

Wickham, H. (2014) “Tidy Data,” Journal of Statistical Software 59(10).
Wickham, H., Çetinkaya-Rundel, M. and Grolemund, G. (2023) R for data science: Import, tidy, transform, visualize, and model data, 2nd edition, Beijing et al.: O’Reilly, available at https://r4ds.hadley.nz/.

Footnotes

  1. You can ignore make_co2_data.R for now and look only at make_co2_plot.R.