<- c(2, 5, 2.4, 11) ex1_vec
Exercises for Recap Session 1
Possible solutions
Exercise 1: Basic object types I
- Create a vector containing the numbers
2
,5
,2.4
and11
.
- Replace the second element with
5.9
.
2] <- 5.9
ex1_vec[ ex1_vec
[1] 2.0 5.9 2.4 11.0
- Add the elements
3
and1
to the beginning, and the elements"8.0"
and"9.2"
to the end of the vector.
<- c(3, 1)
va_1 <- c("8.0", "9.2")
va_2 <- c(va_1, ex1_vec, va_2)
ex1_vec_extended ex1_vec_extended
[1] "3" "1" "2" "5.9" "2.4" "11" "8.0" "9.2"
- Create a vector with the numbers from -8 to 9 (step size: 0.5)
<- seq(-8, 9, by = 0.5)
ex1_vec_4 ex1_vec_4
[1] -8.0 -7.5 -7.0 -6.5 -6.0 -5.5 -5.0 -4.5 -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0
[16] -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5
[31] 7.0 7.5 8.0 8.5 9.0
- Compute the square root of each element of the first vector using vectorisation.
sqrt(ex1_vec_4)
Warning in sqrt(ex1_vec_4): NaNs produced
[1] NaN NaN NaN NaN NaN NaN NaN
[8] NaN NaN NaN NaN NaN NaN NaN
[15] NaN NaN 0.0000000 0.7071068 1.0000000 1.2247449 1.4142136
[22] 1.5811388 1.7320508 1.8708287 2.0000000 2.1213203 2.2360680 2.3452079
[29] 2.4494897 2.5495098 2.6457513 2.7386128 2.8284271 2.9154759 3.0000000
- Create a character vector containing then strings
"Number_1"
to"Number_5"
. Use suitable helper functions to create this vector quickly.
<- paste0("Number_", seq(1, 5))
ex1_char_vec ex1_char_vec
[1] "Number_1" "Number_2" "Number_3" "Number_4" "Number_5"
Exercise 2: Basic object types II
Consider the following vector:
<- c(1.9, "2", FALSE) ex_2_vec
- What is the type of this vector? Why?
typeof(ex_2_vec)
[1] "character"
Atomic vectors only contain objects of the same type, and there is a hierarchy. Elements that themselves are of a type lower in the hierarchy are coerced to the same type as the object highest in the hierarchy. The hierarchy is as as follows:
character
double
integer
logical
Therefore, the type of ex_2_vec
is character
. The underlying reason is that you can, for instance, always transform a double
value into a character
but not vice versa.
- What happens if you coerce this vector into type integer? Why?
as.integer(ex_2_vec)
Warning: NAs introduced by coercion
[1] 1 2 NA
Because integer
is lower in the hierarchy than character
, the transformation is not straightforward. By coincidence, the first two elements can actually be coerced into integers (albeit maybe not with the expected result), but there is no way you can transform the logical value FALSE
into an integer, which is why a missing value is produced.
- What does
sum(is.na(x))
tell you about a vectorx
? What is happening here?
<- c(1,2,3,NA,NA,8) x
First, is.na(x)
creates a vector with logical values indicating whether a value of the original vector is missing (i.e. NA
):
is.na(x)
[1] FALSE FALSE FALSE TRUE TRUE FALSE
Then, sum()
computes the sum over this vecor of boolean values:
sum(is.na(x))
[1] 2
Here, TRUE
counts as one and FALSE
as zero, so sum()
gives the number of cases in which is.na(x)
has evaluated to TRUE
:
- Is it a good idea to use
as.integer()
on double characters to round them to the next integer? Why (not)? What other ways are there to do the rounding?
No, because as.integer()
is not acutally rounding numbers (as, for example, as.integer(2.1)
would make you think), but only removing the decimal part of the number:
as.integer(2.9) # you might expect 2...
[1] 2
Better use round()
:
round(2.9)
[1] 3
Exercise 3: Define a function
Create functions that take a vector as input and returns:
- The last value.
<- function(x){
get_last_val <- x[length(x)]
last_val return(last_val)
}
- Every element except the last value and any missing values.
<- function(x){
get_beginning <- x[-length(x)] # Removes last value
beginning <- which(is.na(beginning)) # Get positions of NA values
na_positions <- beginning[-na_positions] # Removes these values
beginning_nonas return(beginning_nonas)
}
- Only even numbers.
Hint: Use the operation
x %% y
to get the remainder from divingx
byy
, the so called ‘modulo y’. For even numbers, the modulo 2 is zero.
<- function(x){
get_even <- x%%2 # Module 2 is zero for even numbers only
modulo_2s <- x[modulo_2s==0] # Keep only those for which modulo 2 is zero
even_nbs <- which(is.na(even_nbs)) # Get positions of NA values
na_positions <- even_nbs[-na_positions] # Removes these values
even_nbs_nonas return(even_nbs_nonas)
}
Apply your function to the following example vector:
<- c(1, -8, 99, 3, NA, 4, -0.5, 50) ex_3_vec
get_last_val(ex_3_vec)
[1] 50
get_beginning(ex_3_vec)
[1] 1.0 -8.0 99.0 3.0 4.0 -0.5
get_even(ex_3_vec)
[1] -8 4 50
Exercise 4: Lists
- Create a list that contains three elements called
'a'
,'b'
and'c'
. The first element should correspond to a double vector with the elements1.5
,-2.9
and99
. The second element should correspond to a character vector with the elments'Hello'
,'3'
, and'EUF'
. The third element should contain three times the entryFALSE
.
<- list(
ex_4_list 'a' = c(1.5, -2.9, 99),
'b' = c('Hello', "'3'", 'EUF'),
'c' = rep(FALSE, 3)
)
- Transform this list into a
data.frame
and atibble
. Then applystr()
to get information about the respective structure. How do the results differ?
<- as.data.frame(ex_4_list)
ex_4_df <- tibble::as_tibble(ex_4_list)
ex_4_tb str(ex_4_list)
List of 3
$ a: num [1:3] 1.5 -2.9 99
$ b: chr [1:3] "Hello" "'3'" "EUF"
$ c: logi [1:3] FALSE FALSE FALSE
str(ex_4_df)
'data.frame': 3 obs. of 3 variables:
$ a: num 1.5 -2.9 99
$ b: chr "Hello" "'3'" "EUF"
$ c: logi FALSE FALSE FALSE
str(ex_4_tb)
tibble [3 × 3] (S3: tbl_df/tbl/data.frame)
$ a: num [1:3] 1.5 -2.9 99
$ b: chr [1:3] "Hello" "'3'" "EUF"
$ c: logi [1:3] FALSE FALSE FALSE
str()
only differs with regard to the first line describing the type.
Exercise 5: Data frames and the study semester distribution at EUF
The package DataScienceExercises
contains a data set called EUFstudentsemesters
, which contains information about the distribution of study semesters of enrolled students at the EUF in 2021. You can shortcut the data set as follows:
<- DataScienceExercises::EUFstudentsemesters euf_semesters
- What happens if you extract the column with study semesters as a vector and transform it into a
double
?
unique(euf_semesters[["Semester"]])
[1] "6" "4" "2" "8" "9 or higher"
[6] "7" "5" "3" "1"
<- as.double(euf_semesters[["Semester"]]) semesters
Warning: NAs introduced by coercion
unique(semesters)
[1] 6 4 2 8 NA 7 5 3 1
We see that the previous entry "9 or higher"
has been transformed into NA
.
- What is the average study semester of those students being in their 8th or earlier semester?
mean(semesters, na.rm = TRUE)
[1] 4.177026
- How many students are in their 9th or higher study semester?
sum(euf_semesters$Semester=="9 or higher")
[1] 469
- What does
typeof(euf_semesters)
return and why?
typeof(euf_semesters)
[1] "list"
It returns list
, because while euf_semesters
is a tibble
, typeof()
always gives the underlying basic object type. For tibble
s, this is list
.