A real world example
This is practically the same code as in this blog post of mine: https://www.brodrigues.co/blog/2018-11-14-luxairport/, with minor updates to reflect the current state of the tidyverse packages and with logging added using {loud}.
library(loud)
#> Loading required package: rlang
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
library(stringr)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
data("avia")
# Define required functions
l_select <- loudly(select)
l_pivot_longer <- loudly(pivot_longer)
l_filter <- loudly(filter)
l_mutate <- loudly(mutate)
l_separate <- loudly(separate)
l_group_by <- loudly(group_by)
l_summarise <- loudly(summarise)
avia_clean <- avia %>%
  l_select(1, contains("20")) %>% # select the first column and every column whose name contains "20"
  bind_loudly(l_pivot_longer, -starts_with("unit"), names_to = "date", values_to = "passengers") %>%
  bind_loudly(l_separate,
              col = 1,
              into = c("unit", "tra_meas", "air_pr\\time"),
              sep = ",")
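To make the pattern above more concrete, here is a minimal base-R sketch of how a logging decorator and its bind function could work. It is an illustration only, not the {loud} implementation: the names `loudly_sketch()` and `bind_sketch()` are hypothetical, and the log format is simplified. The only thing it takes from this document is the idea that a decorated function returns a `result` together with a `log`.

```r
# Hypothetical sketch, NOT the {loud} source: wrap a function so that it
# returns its result together with a growing log.
loudly_sketch <- function(.f) {
  fname <- deparse(substitute(.f))  # capture the function's name for the log
  force(.f)
  function(..., .log = "Log start...") {
    start <- Sys.time()
    res <- .f(...)
    end <- Sys.time()
    entry <- paste0(fname, "() started at ", format(start),
                    " and ended at ", format(end))
    list(result = res, log = c(.log, entry))
  }
}

# Hypothetical counterpart of bind_loudly(): unpack the previous result,
# pass it to the next decorated function, and carry the log forward.
bind_sketch <- function(.l, .f, ...) {
  .f(.l$result, ..., .log = .l$log)
}

l_sqrt <- loudly_sketch(sqrt)
l_sum  <- loudly_sketch(sum)

out <- bind_sketch(l_sqrt(c(1, 4, 9)), l_sum)
out$result  # the value computed by the last step
out$log     # one entry per step, after the "Log start..." line
```

The first call in a pipeline needs no bind step because it receives plain data; every later step goes through the bind function so that the log accumulates instead of being overwritten.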
Let’s focus on monthly data:
avia_monthly <- avia_clean %>%
  bind_loudly(l_filter,
              tra_meas == "PAS_BRD_ARR",
              !is.na(passengers),
              str_detect(date, "M")) %>%
  bind_loudly(l_mutate,
              date = paste0(date, "01"),
              date = ymd(date)) %>%
  bind_loudly(l_select,
              destination = "air_pr\\time", date, passengers)
Now that the data is clean, we can take a look at the log and see what was done:
avia_monthly %>%
  pick("log")
#> [1] "Log start..."
#> [2] "✔ select(.,1,contains(\"20\")) started at 2022-03-16 21:11:59 and ended at 2022-03-16 21:11:59"
#> [3] "✔ pivot_longer(.l$result,-starts_with(\"unit\"),date,passengers) started at 2022-03-16 21:11:59 and ended at 2022-03-16 21:11:59"
#> [4] "✔ separate(.l$result,1,c(\"unit\", \"tra_meas\", \"air_pr\\\\time\"),,) started at 2022-03-16 21:11:59 and ended at 2022-03-16 21:12:00"
#> [5] "✔ filter(.l$result,tra_meas == \"PAS_BRD_ARR\",!is.na(passengers),str_detect(date, \"M\")) started at 2022-03-16 21:12:00 and ended at 2022-03-16 21:12:00"
#> [6] "✔ mutate(.l$result,paste0(date, \"01\"),ymd(date)) started at 2022-03-16 21:12:00 and ended at 2022-03-16 21:12:00"
#> [7] "✔ select(.l$result,air_pr\\time,date,passengers) started at 2022-03-16 21:12:00 and ended at 2022-03-16 21:12:00"
This is especially useful if the object avia_monthly gets saved using saveRDS(). People who later read this object can consult the log to see what happened and reproduce the steps if necessary.
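The round trip can be sketched with base R alone. The example below uses a toy stand-in rather than the real object: it assumes only that a loud value behaves like a list with `result` and `log` elements, as the `.l$result` in the log above suggests. With the real avia_monthly you would use pick("log") as shown earlier instead of `$log`.

```r
# Toy stand-in for a loud value (assumption: a list with `result` and `log`).
toy <- list(
  result = data.frame(x = 1:3),
  log = c("Log start...", "toy step 1", "toy step 2")
)

# Save the whole object, log included, to a temporary file.
path <- tempfile(fileext = ".rds")
saveRDS(toy, path)

# Later, possibly in another session or on another machine:
restored <- readRDS(path)
restored$log     # the log travels with the data
restored$result  # and so does the result
```

Because the log is stored inside the same object as the data, there is no separate log file to keep in sync or to lose.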
Let’s take a look at the final data set:
avia_monthly %>%
  pick("result")
#> # A tibble: 7,632 × 3
#> destination date passengers
#> <chr> <date> <chr>
#> 1 LU_ELLX_AT_LOWW 2018-03-01 3967
#> 2 LU_ELLX_AT_LOWW 2018-02-01 3232
#> 3 LU_ELLX_AT_LOWW 2018-01-01 3701
#> 4 LU_ELLX_AT_LOWW 2017-12-01 4249
#> 5 LU_ELLX_AT_LOWW 2017-11-01 4311
#> 6 LU_ELLX_AT_LOWW 2017-10-01 4591
#> 7 LU_ELLX_AT_LOWW 2017-09-01 4816
#> 8 LU_ELLX_AT_LOWW 2017-08-01 4399
#> 9 LU_ELLX_AT_LOWW 2017-07-01 4277
#> 10 LU_ELLX_AT_LOWW 2017-06-01 4674
#> # … with 7,622 more rows