Introduction
This vignette will guide you through the primary debugging workflow in rixpress, covering how to:
- Inspect the error messages from a failed build.
- Trace the dependency graph to find structural problems.
- Isolate specific parts of the pipeline for focused debugging.
- Access logs from previous builds to investigate regressions.
The First Response to a Failed Build: rxp_inspect()
Imagine you have just run rxp_make() and are greeted with an error message in your console.
Build process started...
+ > mtcars building
+ > mtcars_am building
+ > mtcars_head building
x mtcars_head errored
✓ mtcars built
✓ mtcars_am built
! pipeline completed [2 completed, 1 errored]
Build failed! Run `rxp_inspect()` for a summary.
The build has failed. Your immediate next step should always be to run rxp_inspect(). By default, this function reads the most recent build log, which in this case is the one from our failed run.
This will return a data frame summarizing the status of every derivation in the pipeline. Let’s look at a hypothetical output:
derivation build_success path output
1 all-derivations FALSE /nix/store/j5...-all-derivations mtcars_head
2 mtcars_am TRUE /nix/store/a4...-mtcars_am mtcars_am
3 mtcars_head FALSE <NA> <NA>
4 mtcars TRUE /nix/store/b9...-mtcars mtcars
error_message
1 <NA>
2 <NA>
3 Error: function 'headd' not found\nExecution halted\n
4 <NA>
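Because rxp_inspect() returns an ordinary data frame, you can filter it with base R to list only the failures. A minimal sketch, assuming the summary above is stored in res (in a real session you would get it from rxp_inspect()):

```r
# Hypothetical copy of the summary above; in practice: res <- rxp_inspect()
res <- data.frame(
  derivation = c("all-derivations", "mtcars_am", "mtcars_head", "mtcars"),
  build_success = c(FALSE, TRUE, FALSE, TRUE),
  error_message = c(NA, NA,
    "Error: function 'headd' not found\nExecution halted\n", NA)
)

# Keep only failed derivations, ignoring the aggregate 'all-derivations' row
failed <- res[!res$build_success & res$derivation != "all-derivations",
              c("derivation", "error_message")]
failed$derivation
#> [1] "mtcars_head"
```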
The two most important columns for debugging are build_success and error_message.
- build_success: This TRUE/FALSE column immediately tells you which derivation failed. In our example, mtcars_head is the culprit.
- error_message: This column contains the standard error output captured from the Nix build process. It provides the exact reason for the failure. Here, the message "Error: function 'headd' not found" points to a simple typo in our R code.
By pinpointing the specific derivation and providing the raw error message, rxp_inspect() eliminates guesswork and directs you straight to the source of the problem.
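The fix is then a one-line change in the pipeline definition. A hedged sketch, assuming mtcars_head was defined on top of mtcars_am roughly like this:

```r
# Before: 'headd' is a typo, so the derivation errors at build time
# rxp_r(name = mtcars_head, expr = headd(mtcars_am))

# After: use the correct base R function
rxp_r(
  name = mtcars_head,
  expr = head(mtcars_am)
)
```

After correcting the expression, rerun rxp_make(); only the fixed derivation and its dependents rebuild.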
Investigating Structural Issues with rxp_trace()
Sometimes, a pipeline fails not because of a typo in a single derivation, but because of a logical error in how the derivations are connected. rxp_trace() is the tool for diagnosing these structural issues. It reads the pipeline’s dependency graph (dag.json) and helps you answer questions like:
- “What steps must run before this one?” (Dependencies)
- “If I change this step, what other steps will be affected?” (Reverse Dependencies)
For instance, if mtcars_mpg is producing an unexpected result, you can trace its lineage:
rxp_trace("mtcars_mpg")
This might return:
==== Lineage for: mtcars_mpg ====
Dependencies (ancestors):
- filtered_mtcars
- mtcars*
Reverse dependencies (children):
- final_report
Note: '*' marks transitive dependencies (depth >= 2).
This output clearly shows that mtcars_mpg depends directly on filtered_mtcars and indirectly (transitively) on mtcars. It also shows that final_report depends on it. If you expected mtcars_mpg to depend on a different intermediate object, this trace would immediately reveal the mistake in your pipeline definition.
Calling rxp_trace() without any arguments will print the entire dependency tree, which is useful for getting a high-level overview of your project’s structure.
You could instead plot the DAG using rxp_ggdag(), but for a large project the resulting graph can be difficult to read; rxp_trace() is often more useful in those cases.
A Proactive Strategy: Isolating Derivations with noop_build
When debugging or prototyping, you often need to make frequent changes to an early step in your pipeline. If a slow, computationally expensive derivation depends on this changing step, your development cycle can become painfully slow. Because Nix’s caching is based on inputs, any change to an upstream step will invalidate the cache for all downstream steps.
Imagine a pipeline where you are tuning a data preprocessing step, which is then followed by a lengthy model training process:
list(
# We are actively changing the filter condition in this step
rxp_r(
name = preprocessed_data,
expr = filter(raw_data, year > 2020)
),
# This step takes hours to run
rxp_r(
name = expensive_model,
expr = run_long_simulation(preprocessed_data)
),
rxp_rmd(
name = final_report,
rmd_file = "report.Rmd" # Depends on expensive_model
)
)
In this scenario, every time you adjust the filter() condition in preprocessed_data, Nix correctly invalidates the cache for expensive_model. This means the hours-long simulation will be re-triggered with every small change, making it impossible to iterate quickly on the preprocessing logic.
This is the perfect use case for noop_build = TRUE. By applying it to the expensive downstream step, you temporarily break the dependency chain:
list(
# We can now change this step as much as we want
rxp_r(
name = preprocessed_data,
expr = filter(raw_data, year > 2020)
),
# This and all downstream steps will be skipped
rxp_r(
name = expensive_model,
expr = run_long_simulation(preprocessed_data),
noop_build = TRUE
),
rxp_rmd(
name = final_report,
rmd_file = "report.Rmd" # Also becomes a no-op
)
)
Now, when you run rxp_make(), preprocessed_data will build as normal. However, expensive_model will resolve to a no-op build, and because final_report depends on it, it will also become a no-op. This allows you to rapidly iterate on and validate the preprocessed_data logic in isolation, without waiting for the simulation to run. Once you are satisfied with the preprocessing, simply remove noop_build = TRUE to re-enable the full pipeline and run the expensive model training with your finalized data.
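While expensive_model is a no-op, the iteration loop looks like this, sketched with the object names from the listing above:

```r
# Rebuilds only preprocessed_data; the no-op steps finish instantly
rxp_make()

# Check the effect of the new filter condition before re-enabling the
# expensive steps
head(rxp_read("preprocessed_data"))
```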
Historical Debugging: Going Back in Time
When iterating quickly, it can be useful to compare current results with those obtained from previous runs. The build logs make this possible.
First, use rxp_list_logs() to see the build history:
filename modification_time size_kb
1 build_log_20250815_113000_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6.rds 2025-08-15 11:30:00 0.51
2 build_log_20250814_170000_z9y8x7w6v5u4t3s2r1q0p9o8n7m6l5k4.rds 2025-08-14 17:00:00 0.50
You can see a successful build from yesterday (20250814). To find out how it differs from today’s results, you can inspect that specific log by providing a unique part of its filename to which_log:
# Inspect yesterday's successful build log
rxp_inspect(which_log = "20250814")
This allows you to compare yesterday’s build summary with today’s. Furthermore, you can use rxp_read() with which_log to load the actual artifact from the previous run, which is invaluable for comparing data or model outputs across different versions of your pipeline.
# Load the output of `mtcars_head` from yesterday's build
old_head <- rxp_read("mtcars_head", which_log = "20250814")
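With both artifacts loaded, a quick comparison shows whether today’s changes affected this output. A small sketch, assuming the current build also produced mtcars_head:

```r
# Load today's version of the same artifact
new_head <- rxp_read("mtcars_head")

# TRUE if the two objects match; otherwise a description of the differences
all.equal(old_head, new_head)
```

identical() can be used instead of all.equal() when you need an exact, attribute-for-attribute match.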
Conclusion
Debugging in rixpress is a systematic process supported by a powerful set of tools. By following this workflow, you can efficiently resolve issues in your pipelines:
- For runtime errors, start with rxp_inspect() to find the failed derivation and its error message.
- For logical or structural errors, use rxp_trace() to understand the dependencies.
- To speed up iteration, use noop_build = TRUE to isolate the part of the pipeline you are working on.
- For regressions, use rxp_list_logs() and the which_log argument to travel back in time and compare results.