
This is essentially the same code as in this blog post of mine (https://www.brodrigues.co/blog/2018-11-14-luxairport/), with some minor updates to reflect the current state of the tidyverse packages and with logging added using {loud}.

library(loud)
#> Loading required package: rlang
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(stringr)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

data("avia")

# Create loud versions of the functions used below, so that each call is logged

l_select <- loudly(select)
l_pivot_longer <- loudly(pivot_longer)
l_filter <- loudly(filter)
l_mutate <- loudly(mutate)
l_separate <- loudly(separate)
l_group_by <- loudly(group_by)
l_summarise <- loudly(summarise)
avia_clean <- avia %>%
  l_select(1, contains("20")) %>% # keep the first column and every column whose name contains "20" (the years)
  bind_loudly(l_pivot_longer, -starts_with("unit"), names_to = "date", values_to = "passengers") %>%
  bind_loudly(l_separate,
              col = 1,
              into = c("unit", "tra_meas", "air_pr\\time"),
              sep = ",") 
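To see what the two reshaping steps do, here is a minimal sketch on a small hypothetical tibble (`toy_avia`, with made-up values) laid out like the Eurostat data: the first column packs the unit, the measure and the airport pair into one comma-separated string, and every period is its own column. It uses plain {tidyr} instead of the loud wrappers, so nothing is logged.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (made-up values) shaped like `avia`: one packed key
# column followed by one column per period
toy_avia <- tibble(
  key       = c("PAS,PAS_BRD_ARR,LU_ELLX_AT_LOWW", "PAS,PAS_CRD,LU_ELLX_AT_LOWW"),
  `2018M02` = c("3232", "3300"),
  `2018M03` = c("3967", "4000"),
  `2018`    = c("45000", "46000")
)
# Give the key column the same kind of name as in the real data
names(toy_avia)[1] <- "unit,tra_meas,air_pr\\time"

toy_clean <- toy_avia %>%
  # one row per (key, period) pair; the period names go to `date`
  pivot_longer(-starts_with("unit"), names_to = "date", values_to = "passengers") %>%
  # split the packed key on commas into three separate columns
  separate(col = 1, into = c("unit", "tra_meas", "air_pr\\time"), sep = ",")

toy_clean
```

`pivot_longer()` turns the period columns into `date`/`passengers` pairs, and `separate()` then splits the packed first column into three. In recent versions of {tidyr}, `separate()` is superseded by `separate_wider_delim()`, but it still works.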

Let’s focus on monthly data:

avia_monthly <- avia_clean %>%
  bind_loudly(l_filter,
              tra_meas == "PAS_BRD_ARR",
              !is.na(passengers),
              str_detect(date, "M")) %>%
  bind_loudly(l_mutate,
              date = paste0(date, "01"),
              date = ymd(date)) %>%
  bind_loudly(l_select,
              destination = "air_pr\\time", date, passengers)
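The monthly filter relies on monthly periods being coded like `2018M03`. The sketch below runs the same filter and date conversion on a few hypothetical rows, assuming (for illustration only) that quarterly and annual periods look like `2018Q1` and `2018`, so that `str_detect(date, "M")` drops them. Appending `"01"` gives a string such as `"2018M0301"`, which `ymd()` reads as the first day of that month.

```r
library(dplyr)
library(stringr)
library(lubridate)

# Hypothetical long-format rows shaped like `avia_clean`; the quarterly
# ("2018Q1") and annual ("2018") codes are assumptions for illustration
toy_long <- tibble(
  tra_meas   = c("PAS_BRD_ARR", "PAS_BRD_ARR", "PAS_BRD_ARR", "PAS_BRD_ARR", "PAS_CRD"),
  date       = c("2018M03", "2018M02", "2018Q1", "2018", "2018M03"),
  passengers = c("3967", NA, "11000", "45000", "4000")
)

toy_monthly <- toy_long %>%
  filter(tra_meas == "PAS_BRD_ARR",        # drops the PAS_CRD row
         !is.na(passengers),               # drops the missing 2018M02 value
         str_detect(date, "M")) %>%        # drops "2018Q1" and "2018"
  mutate(date = ymd(paste0(date, "01")))   # "2018M0301" -> first day of March 2018

toy_monthly
```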

Now that the data is clean, we can look at the log to see what was done:

avia_monthly %>%
  pick("log")
#> [1] "Log start..."                                                                                                                                              
#> [2] "✔ select(.,1,contains(\"20\")) started at 2022-03-16 21:11:59 and ended at 2022-03-16 21:11:59"                                                            
#> [3] "✔ pivot_longer(.l$result,-starts_with(\"unit\"),date,passengers) started at 2022-03-16 21:11:59 and ended at 2022-03-16 21:11:59"                          
#> [4] "✔ separate(.l$result,1,c(\"unit\", \"tra_meas\", \"air_pr\\\\time\"),,) started at 2022-03-16 21:11:59 and ended at 2022-03-16 21:12:00"                   
#> [5] "✔ filter(.l$result,tra_meas == \"PAS_BRD_ARR\",!is.na(passengers),str_detect(date, \"M\")) started at 2022-03-16 21:12:00 and ended at 2022-03-16 21:12:00"
#> [6] "✔ mutate(.l$result,paste0(date, \"01\"),ymd(date)) started at 2022-03-16 21:12:00 and ended at 2022-03-16 21:12:00"                                        
#> [7] "✔ select(.l$result,air_pr\\time,date,passengers) started at 2022-03-16 21:12:00 and ended at 2022-03-16 21:12:00"
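Each entry records the call that was made together with its start and end times. The idea can be sketched in base R, but note that this is a toy illustration and not how {loud} is actually implemented: the hypothetical `toy_loudly()` wraps a function so that it returns both its result and a log, and the hypothetical `toy_bind()` passes the previous result on while extending the log, roughly the role `bind_loudly()` plays above. Unlike the real log, these entries only record the function name, not its arguments.

```r
# Toy illustration only, NOT how {loud} is implemented: wrap a function so
# that every call returns list(result = ..., log = ...)
toy_loudly <- function(.f) {
  name <- deparse(substitute(.f))  # e.g. "sort"
  function(..., .log = "Log start...") {
    start  <- Sys.time()
    result <- .f(...)
    end    <- Sys.time()
    entry  <- paste0("\u2714 ", name, " started at ", format(start),
                     " and ended at ", format(end))
    list(result = result, log = c(.log, entry))
  }
}

# Toy counterpart of bind_loudly(): feed the previous result to the next
# wrapped function and carry the log forward
toy_bind <- function(.l, .f, ...) {
  .f(.l$result, ..., .log = .l$log)
}

t_sort <- toy_loudly(sort)
t_head <- toy_loudly(head)

out <- toy_bind(t_sort(c(5, 3, 9, 1)), t_head, 2)
out$result  # the two smallest values
out$log     # "Log start..." followed by one entry per step
```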

This is especially useful if avia_monthly is saved with saveRDS(): whoever reads the object back can inspect the log to see what was done to the data and reproduce the steps if necessary.
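A minimal sketch of that round trip, using a hypothetical stand-in list (`toy_value`, with a made-up log) instead of a real loud value so that it runs without {loud}: because the log is serialised together with the data, whoever reads the file back gets both.

```r
# Hypothetical stand-in for a logged value: a list holding a result and a log
toy_value <- list(
  result = data.frame(destination = "LU_ELLX_AT_LOWW",
                      date        = as.Date("2018-03-01"),
                      passengers  = "3967"),
  log    = c("Log start...", "filter applied", "mutate applied")
)

path <- tempfile(fileext = ".rds")
saveRDS(toy_value, path)    # serialise the result and its log together
restored <- readRDS(path)   # a collaborator reads the file back

restored$log                # the processing history travels with the data
```

With the real object, the same idea should amount to `saveRDS(avia_monthly, path)` followed by `readRDS(path) %>% pick("log")`.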

Let’s take a look at the final data set:

avia_monthly %>%
  pick("result")
#> # A tibble: 7,632 × 3
#>    destination     date       passengers
#>    <chr>           <date>     <chr>     
#>  1 LU_ELLX_AT_LOWW 2018-03-01 3967      
#>  2 LU_ELLX_AT_LOWW 2018-02-01 3232      
#>  3 LU_ELLX_AT_LOWW 2018-01-01 3701      
#>  4 LU_ELLX_AT_LOWW 2017-12-01 4249      
#>  5 LU_ELLX_AT_LOWW 2017-11-01 4311      
#>  6 LU_ELLX_AT_LOWW 2017-10-01 4591      
#>  7 LU_ELLX_AT_LOWW 2017-09-01 4816      
#>  8 LU_ELLX_AT_LOWW 2017-08-01 4399      
#>  9 LU_ELLX_AT_LOWW 2017-07-01 4277      
#> 10 LU_ELLX_AT_LOWW 2017-06-01 4674      
#> # … with 7,622 more rows