Introduction
When building a pipeline, all the build artifacts are store in the
Nix store of your computer. If you update the pipeline, new artifacts
will be built, but the old ones are still in the store, as long as you
don’t empty it (by calling nix-store --gc
in your
terminal). We can take advantage of this feature to compare old build
artifacts to the new build artifacts after updating the pipeline, or for
debugging purposes.
rixpress keeps track of all pipeline builds by saving detailed build logs. These logs contain information about each derivation, whether it was built successfully, its path in the Nix store, and its outputs. This vignette explains how to work with these logs.
Listing Available Build Logs
When you build a pipeline with rxp_make()
,
rixpress automatically saves a build log in the
_rixpress
directory of your project. Each log is saved with
a timestamp and the hash of the pipeline, making it easy to track
different builds over time.
To list all available build logs, use the
rxp_list_logs()
function:
This will return a data frame with information about all available logs:
filename modification_time size_kb
1 build_log_20250511_194737_242khmf40s7bd7rli9bj4xggg38zhn7w.rds 2025-05-11 19:47:37 0.49
2 build_log_20250511_194238_242khmf40s7bd7rli9bj4xggg38zhn7w.rds 2025-05-11 19:42:38 0.49
3 build_log_20250511_173611_hj4rv79dwcq92vcdqwjjvng1my4zdpaw.rds 2025-05-11 17:36:11 0.49
4 build_log_20250511_173111_5gzvnxswk2lvcb45b0hilw35w4f4p42w.rds 2025-05-11 17:31:11 0.49
The logs are sorted by modification time, with the most recent log first. Each log filename contains:
- A timestamp in the format
YYYYMMDD_HHMMSS
- A hash that uniquely identifies the pipeline configuration
Inspecting Build Logs
To inspect the contents of a build log, use the
rxp_inspect()
function:
# Inspect the most recent build log
rxp_inspect()
# Inspect a specific build log by matching part of the filename
rxp_inspect(which_log = "hj4rv")
The which_log
parameter accepts a regular expression
that is matched against the log filenames. If multiple logs match, the
most recent one is used.
Reading and Loading Data from Specific Builds
The rxp_read()
and rxp_load()
functions
also accept a which_log
parameter, allowing you to read or
load data from specific builds:
# Read output from the most recent build
rxp_read("mtcars_head")
mpg cyl disp hp drat wt qsec vs am gear carb
1 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
2 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
3 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
4 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
5 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
6 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
7 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
8 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
9 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
10 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
rxp_read("mtcars_head", which_log = "hj4rv")
Using log file: build_log_20250511_173611_hj4rv79dwcq92vcdqwjjvng1my4zdpaw.rds
mpg cyl disp hp drat wt qsec vs am gear carb
1 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
2 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
3 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
4 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
5 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
6 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
This is particularly useful when you want to compare results from different builds or revert to a previous version of your data or model.
Use Cases for Working with Multiple Logs
There are several scenarios where working with multiple build logs can be beneficial:
- Comparing results: Compare outputs from different pipeline configurations or code versions.
- Debugging: Identify when a specific issue was introduced by examining logs from different points in time.
- Reproducibility: Access specific versions of your data or models for reproducible analysis.
- Rollback: Revert to a previous version of your pipeline if needed.
Conclusion
Build logs are an important part of rixpress’s reproducibility features. By understanding how to work with these logs, you can better track, debug, and reproduce your data science workflows.