Introduction

When you build a pipeline, all the build artifacts are stored in the Nix store on your computer. If you update the pipeline, new artifacts are built, but the old ones remain in the store until you garbage-collect it (by running nix-store --gc in your terminal). We can take advantage of this to compare old build artifacts with new ones after updating the pipeline, or to debug a build.

rixpress keeps track of all pipeline builds by saving detailed build logs. These logs contain information about each derivation, whether it was built successfully, its path in the Nix store, and its outputs. This vignette explains how to work with these logs.

Listing Available Build Logs

When you build a pipeline with rxp_make(), rixpress automatically saves a build log in the _rixpress directory of your project. Each log is saved with a timestamp and the hash of the pipeline, making it easy to track different builds over time.

To list all available build logs, use the rxp_list_logs() function:

rxp_list_logs()

This returns a data frame with one row per available log:

                                                        filename   modification_time size_kb
1 build_log_20250511_194737_242khmf40s7bd7rli9bj4xggg38zhn7w.rds 2025-05-11 19:47:37    0.49
2 build_log_20250511_194238_242khmf40s7bd7rli9bj4xggg38zhn7w.rds 2025-05-11 19:42:38    0.49
3 build_log_20250511_173611_hj4rv79dwcq92vcdqwjjvng1my4zdpaw.rds 2025-05-11 17:36:11    0.49
4 build_log_20250511_173111_5gzvnxswk2lvcb45b0hilw35w4f4p42w.rds 2025-05-11 17:31:11    0.49

The logs are sorted by modification time, with the most recent log first. Each log filename contains:

  1. A timestamp in the format YYYYMMDD_HHMMSS
  2. A hash that uniquely identifies the pipeline configuration
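The two filename components can be pulled apart with base R, which is handy when you want to group logs by pipeline hash. The following is a minimal sketch, not part of rixpress: parse_log_name() is a hypothetical helper, the pattern is inferred from the filenames shown above, and treating the timestamp as UTC is an assumption made only to keep the example deterministic.

```r
# Filenames copied from the rxp_list_logs() output shown above.
log_files <- c(
  "build_log_20250511_194737_242khmf40s7bd7rli9bj4xggg38zhn7w.rds",
  "build_log_20250511_173611_hj4rv79dwcq92vcdqwjjvng1my4zdpaw.rds"
)

# Hypothetical helper (not part of rixpress): split a log filename into
# its timestamp and its pipeline hash.
parse_log_name <- function(filename) {
  # Pattern inferred from the filenames above: build_log_<date>_<time>_<hash>.rds
  pattern <- "^build_log_([0-9]{8}_[0-9]{6})_([a-z0-9]+)\\.rds$"
  data.frame(
    filename = filename,
    built_at = as.POSIXct(
      sub(pattern, "\\1", filename),
      format = "%Y%m%d_%H%M%S",
      tz = "UTC" # assumption: the real timestamps are probably local time
    ),
    hash = sub(pattern, "\\2", filename),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_log_name(log_files)
parsed$hash
```

Two logs that share a hash, such as the first two rows of the listing, come from builds of the same pipeline configuration.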

Inspecting Build Logs

To inspect the contents of a build log, use the rxp_inspect() function:

# Inspect the most recent build log
rxp_inspect()

# Inspect a specific build log by matching part of the filename
rxp_inspect(which_log = "hj4rv")

The which_log parameter accepts a regular expression that is matched against the log filenames. If multiple logs match, the most recent one is used.
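To make the matching rule concrete, here is a rough base R sketch of the behavior described above. It is an illustration only, not the rixpress implementation: pick_log() is a hypothetical helper, and the data frame simply reproduces the filenames and modification times from the listing earlier in this vignette.

```r
# Filenames and modification times copied from the listing above.
logs <- data.frame(
  filename = c(
    "build_log_20250511_194737_242khmf40s7bd7rli9bj4xggg38zhn7w.rds",
    "build_log_20250511_194238_242khmf40s7bd7rli9bj4xggg38zhn7w.rds",
    "build_log_20250511_173611_hj4rv79dwcq92vcdqwjjvng1my4zdpaw.rds",
    "build_log_20250511_173111_5gzvnxswk2lvcb45b0hilw35w4f4p42w.rds"
  ),
  modification_time = as.POSIXct(
    c(
      "2025-05-11 19:47:37", "2025-05-11 19:42:38",
      "2025-05-11 17:36:11", "2025-05-11 17:31:11"
    ),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not the rixpress implementation): return the most
# recent log whose filename matches the regular expression `which_log`.
pick_log <- function(logs, which_log) {
  matches <- logs[grepl(which_log, logs$filename), , drop = FALSE]
  if (nrow(matches) == 0) {
    stop("No build log matches '", which_log, "'")
  }
  matches$filename[which.max(as.numeric(matches$modification_time))]
}

pick_log(logs, "hj4rv")   # exactly one log matches
pick_log(logs, "242khmf") # two logs match; the more recent one is returned
```

Because the pattern is a regular expression, a fragment of the timestamp (for example "20250511_1942") selects a specific build even when several builds share the same hash.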

Reading and Loading Data from Specific Builds

The rxp_read() and rxp_load() functions also accept a which_log parameter, allowing you to read or load data from specific builds:

# Read output from the most recent build
rxp_read("mtcars_head")
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
2  30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
3  33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
4  27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
5  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
6  30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
7  15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
8  19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
9  15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
10 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

# Read output from the build whose log filename matches "hj4rv"
rxp_read("mtcars_head", which_log = "hj4rv")
Using log file: build_log_20250511_173611_hj4rv79dwcq92vcdqwjjvng1my4zdpaw.rds
   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
2 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
3 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
4 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
5 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
6 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2

This is particularly useful when you want to compare results from different builds or revert to a previous version of your data or model.
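Once both versions are in memory, a comparison can be done with ordinary R tools. The sketch below is one possible approach, not a rixpress feature: compare_builds() is a hypothetical helper, and the two data frames are stand-ins that copy the first two columns of the results printed above so that the example runs outside a rixpress project.

```r
# Stand-ins for the two results printed above (mpg and cyl columns only).
# In a real project these would come from:
#   latest  <- rxp_read("mtcars_head")
#   earlier <- rxp_read("mtcars_head", which_log = "hj4rv")
latest <- data.frame(
  mpg = c(32.4, 30.4, 33.9, 27.3, 26.0, 30.4, 15.8, 19.7, 15.0, 21.4),
  cyl = c(4, 4, 4, 4, 4, 4, 8, 6, 8, 4)
)
earlier <- latest[1:6, ]

# Hypothetical helper (not part of rixpress): summarise how two versions
# of the same output differ.
compare_builds <- function(new, old) {
  shared <- min(nrow(new), nrow(old))
  a <- new[seq_len(shared), , drop = FALSE]
  b <- old[seq_len(shared), , drop = FALSE]
  # Drop row names so that only the values are compared.
  rownames(a) <- NULL
  rownames(b) <- NULL
  list(
    rows_new = nrow(new),
    rows_old = nrow(old),
    same_columns = identical(names(new), names(old)),
    shared_rows_equal = isTRUE(all.equal(a, b))
  )
}

compare_builds(latest, earlier)
```

For the outputs shown above, this kind of summary would report that the newer build has 10 rows, the older one has 6, and the rows they share are the same.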

Use Cases for Working with Multiple Logs

There are several scenarios where working with multiple build logs can be beneficial:

  1. Comparing results: Compare outputs from different pipeline configurations or code versions.
  2. Debugging: Identify when a specific issue was introduced by examining logs from different points in time.
  3. Reproducibility: Access specific versions of your data or models for reproducible analysis.
  4. Rollback: Revert to a previous version of your pipeline if needed.

Conclusion

Build logs are an important part of rixpress’s reproducibility features. By understanding how to work with these logs, you can better track, debug, and reproduce your data science workflows.