🗓️ Sessions 5 and 6: Recap and practice

Author
Published

18 04 2024

Modified

30 04 2024

This session is about recap and practice. We will do exercises on topics that you suggest and recap concepts that you found particularly hard to grasp. To this end, please communicate your topic preferences via Moodle no later than one week before this session.

👨‍🏫 Lecture Slides

There were no slides used during this session.

The following script is a step-by-step solution to the first function exercise of the Basics exercise collection in the package DataScienceExercises.
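
The script refers to "the formula" for the sample variance and to its numerator and denominator without writing the formula down. For reference, a standard way to state it that matches the code (with n the number of elements and x̄ the mean of the vector) is sketched here, followed by the arithmetic for the example vector (1, 2, 3, 4):

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}

% Worked example for x = (1, 2, 3, 4), so n = 4 and \bar{x} = 2.5:
s^2 = \frac{(-1.5)^2 + (-0.5)^2 + 0.5^2 + 1.5^2}{4 - 1}
    = \frac{2.25 + 0.25 + 0.25 + 2.25}{3}
    = \frac{5}{3} \approx 1.67
```
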
# A step-by-step solution for the first function task of the "Basics" tutorial
# Goal: define a function that computes the sample variance of a vector
# Note: there are many strategies to develop functions; here we start by first
# writing the code that solves our problem for one particular case outside the
# function, and then generalize this code in a function.
# First step: think about the starting point for your function. In our case,
# we start with a vector containing some numbers, because this is what we
# compute the variance from in the first place:
example_vector <- c(1,2,3,4)
# Second step: break the formula for the sample variance into parts, and
# translate each part into code. We start with the numerator of the formula:
vector_mean <- mean(example_vector) # The mean of the vector
vector_mean
vector_deviations <- example_vector - vector_mean # Deviation of each element from the mean
vector_deviations
vector_deviations_squared <- vector_deviations^2 # The squared deviations
vector_deviations_squared
numerator <- sum(vector_deviations_squared) # The sum of the squared deviations
numerator
# Now that we have computed the numerator, let's move to the denominator of the
# fraction:
nb_elements_vector <- length(example_vector) # This is the 'n' in the equation
denominator <- nb_elements_vector - 1
# To get the overall result, just divide the numerator by the denominator:
result <- numerator / denominator
result
# Third step: generalize our solution for the particular case into a general
# function. To this end, think of a function name (here: 'var_manual') and
# think about the number of arguments this function needs (here: one, i.e.
# the vector for which we want to compute the variance):
var_manual <- function(x){
  print(x)
}
var_manual(example_vector)
# So far, the function only prints the argument we give to it. Now copy and
# paste our code from above, and replace the name 'example_vector' with the
# name of our argument (here: x). If we did not do so, the function would only
# work if 'example_vector' was defined outside the function, and it would
# always return the same result no matter what input we provide as an argument.
var_manual <- function(x){
  # Numerator:
  vector_mean <- mean(x) # The mean of the vector
  vector_deviations <- x - vector_mean # Deviation of each element from the mean
  vector_deviations_squared <- vector_deviations^2 # The squared deviations
  numerator <- sum(vector_deviations_squared) # The sum of the squared deviations
  # Denominator:
  nb_elements_vector <- length(x) # This is the 'n' in the equation
  denominator <- nb_elements_vector - 1
  # Result:
  result <- numerator / denominator
  result
}
# Now we can use the function with arbitrary vectors as input. In the following
# three examples, the same procedure from within the function was applied to
# three different vectors. The variable 'x' within the function can be thought
# of as a placeholder that is replaced by the function input:
var_manual(x = c(1, 2, 3, 4, 5, 99))
var_manual(x = c(-4, 2, 4, 8, 10))
var_manual(x = c(50, 100, 82, 33))
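
One way to gain confidence in `var_manual()` is to compare it with R's built-in `var()`, which also computes the sample variance with the n - 1 denominator. The following is a minimal sketch of such a check. It re-defines `var_manual()` in compact form so that the block runs on its own, and the helper name `check_against_var` is made up for this illustration.

```r
# Compact re-definition of var_manual() so this block is self-contained
var_manual <- function(x) {
  sum((x - mean(x))^2) / (length(x) - 1)
}

# Hypothetical helper: TRUE if var_manual() and the built-in var() agree
# (all.equal() compares numbers up to a small tolerance)
check_against_var <- function(x) {
  isTRUE(all.equal(var_manual(x), var(x)))
}

# The three vectors used in the examples above
test_vectors <- list(
  c(1, 2, 3, 4, 5, 99),
  c(-4, 2, 4, 8, 10),
  c(50, 100, 82, 33)
)

# Apply the check to each vector; returns one logical value per vector
check_results <- vapply(test_vectors, check_against_var, logical(1))
check_results
```

If every entry of `check_results` is `TRUE`, the manual implementation agrees with the built-in function on these inputs. Note that for a vector of length one the denominator is zero, so `var_manual()` returns `NaN` in that case.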

🎥 Lecture videos

There will be no videos for recap sessions.

📚 Suggested Reading

  • Read again the tutorial on functions
  • I added more mini exercises for defining functions to the package DataScienceExercises as part of the new exercise collection Functions, which you can start as follows:
learnr::run_tutorial(
  name = "Functions",
  package = "DataScienceExercises",
  shiny_args = list(launch.browser = TRUE))

✍️ Coursework

During the first 60 minutes of the session, you will work on this sheet in groups of three people. Please use only one computer, but develop the solutions together. In the remaining 30 minutes, we will go through your solutions and discuss problems. I will post my own solutions online after the session, but I urge you to first try to come up with solutions on your own.