ryxpress — Reproducible Analytical Pipelines with Nix (Python)
If you’re looking for {rixpress}, the R package version, look here.
ryxpress is a Python reimplementation/port of the R package {rixpress}. It
provides helpers and a small framework to build and work with reproducible,
polyglot analytical pipelines that are built with Nix.
The goal is to define a pipeline using the following R code:
library(rixpress)
library(igraph)
list(
  rxp_py_file(
    name = mtcars_pl,
    path = 'data/mtcars.csv',
    read_function = "lambda x: polars.read_csv(x, separator='|')"
  ),
  rxp_py(
    # reticulate doesn't support polars DFs yet, so need to convert
    # first to pandas DF
    name = mtcars_pl_am,
    expr = "mtcars_pl.filter(polars.col('am') == 1).to_pandas()"
  ),
  rxp_py2r(
    name = mtcars_am,
    expr = mtcars_pl_am
  ),
  rxp_r(
    name = mtcars_head,
    expr = my_head(mtcars_am),
    user_functions = "functions.R"
  ),
  rxp_r2py(
    name = mtcars_head_py,
    expr = mtcars_head
  ),
  rxp_py(
    name = mtcars_tail_py,
    expr = 'mtcars_head_py.tail()'
  ),
  rxp_py2r(
    name = mtcars_tail,
    expr = mtcars_tail_py
  ),
  rxp_r(
    name = mtcars_mpg,
    expr = dplyr::select(mtcars_tail, mpg)
  ),
  rxp_qmd(
    name = page,
    qmd_file = "my_doc/page.qmd",
    additional_files = c("my_doc/content.qmd", "my_doc/images")
  )
) |>
  rxp_populate(project_path = ".", build = TRUE)
ryxpress will execute an R session and call {rixpress} transparently to
build this pipeline in a completely reproducible way. Exploring the build
artifacts can then be done from an interactive Python session.
If you previously used the R version ({rixpress}), ryxpress aims to provide a
similar user experience for Python projects while integrating with the same
Nix-first workflow.
Video introduction (original R demo)
Quick overview
- Use Nix to describe reproducible runtime/build environments.
- Define pipeline derivations (build steps) in your project using R syntax, but inspect and load artifacts using Python.
- Build pipelines with Nix and use ryxpress helpers to read, load or copy outputs from the Nix store.
Installation
ryxpress is on PyPI and can be installed using any of the usual package managers.
That being said, since ryxpress requires both Nix and R to be available to
function, we provide instructions for Nix only.
Prerequisites
- Nix installed on your machine. See the Nix project docs or Determinate Systems' installer.
Because ryxpress is a wrapper around the R version, both R and {rixpress} need to be available,
and since there’s not much point in using ryxpress if you don’t have Nix installed, the easiest
way to install it is to build the environment as defined by this default.nix:
let
  pkgs = import (fetchTarball "https://github.com/rstats-on-nix/nixpkgs/archive/2025-09-11.tar.gz") {};
  rixpress = (pkgs.rPackages.buildRPackage {
    name = "rixpress";
    src = pkgs.fetchgit {
      url = "https://github.com/b-rodrigues/rixpress";
      rev = "9a5dd6c31be9e6d413529924dd0816a510335881";
      sha256 = "sha256-iQRo42RSnJ1C/ySCRyuaDt2MTP9G6g52wm+kkSHCir0=";
    };
    propagatedBuildInputs = builtins.attrValues {
      inherit (pkgs.rPackages)
        igraph
        jsonlite
        processx;
    };
  });
  pyconf = builtins.attrValues {
    inherit (pkgs.python313Packages)
      pip
      ipykernel
      biocframe
      pandas
      rds2py
      ryxpress;
  };
  system_packages = builtins.attrValues {
    inherit (pkgs)
      glibcLocales
      nix
      python313
      R;
  };
  shell = pkgs.mkShell {
    LOCALE_ARCHIVE = if pkgs.system == "x86_64-linux" then "${pkgs.glibcLocales}/lib/locale/locale-archive" else "";
    LANG = "en_US.UTF-8";
    LC_ALL = "en_US.UTF-8";
    LC_TIME = "en_US.UTF-8";
    LC_MONETARY = "en_US.UTF-8";
    LC_PAPER = "en_US.UTF-8";
    LC_MEASUREMENT = "en_US.UTF-8";
    RETICULATE_PYTHON = "${pkgs.python313}/bin/python";
    buildInputs = [ rixpress pyconf system_packages ];
  };
in
{
  inherit pkgs shell;
}
You can change the date at the top to a more recent date to benefit from fresher packages.
If you plan to use uv to manage Python packages, remove the pyconf block completely, and
replace python313 with uv in the system_packages block.
Basic usage examples
Create a pipeline as an R script:
library(rixpress)
list(
  rxp_py_file(
    name = dataset_np, # Keep name indicating NumPy array
    path = "data/pima-indians-diabetes.csv",
    read_function = "lambda x: loadtxt(x, delimiter=',')"
  ),
  rxp_py(
    name = X,
    expr = "dataset_np[:,0:8]"
  ),
  rxp_py(
    name = Y,
    expr = "dataset_np[:,8]"
  ),
  rxp_py(
    name = splits,
    expr = "train_test_split(X, Y, test_size=0.33, random_state=7)"
  ),
  # Extract X_train (index 0)
  rxp_py(
    name = X_train,
    expr = "splits[0]"
  ),
  # Extract X_test (index 1)
  rxp_py(
    name = X_test,
    expr = "splits[1]"
  ),
  # Extract y_train (index 2)
  rxp_py(
    name = y_train,
    expr = "splits[2]"
  ),
  # Extract y_test (index 3)
  rxp_py(
    name = y_test,
    expr = "splits[3]"
  ),
  rxp_py(
    name = model,
    expr = "XGBClassifier(use_label_encoder=False, eval_metric='logloss').fit(X_train, y_train)"
  ),
  rxp_py(
    name = y_pred,
    expr = "model.predict(X_test)"
  ),
  # Combine the y_test and y_pred vectors to export to csv
  # This will be used in an R environment by yardstick::conf_mat
  rxp_py(
    name = combined_df,
    expr = "DataFrame({'truth': y_test, 'estimate': y_pred})"
  ),
  rxp_py(
    name = combined_csv,
    expr = "combined_df",
    user_functions = "functions.py",
    encoder = "write_to_csv"
  ),
  # yardstick::conf_mat needs factor variables
  rxp_r(
    name = combined_factor,
    expr = mutate(
      combined_csv,
      across(.cols = everything(), .fns = factor)
    ),
    decoder = "read.csv"
  ),
  rxp_r(
    name = confusion_matrix,
    expr = conf_mat(
      combined_factor,
      truth,
      estimate
    )
  ),
  rxp_py(
    name = accuracy,
    expr = "accuracy_score(y_test, y_pred)"
  )
) |>
  rxp_populate(build = FALSE) # Need to set to FALSE because we
                              # adjust imports first
adjust_import(
  "import numpy",
  "from numpy import array, loadtxt"
)
adjust_import("import xgboost", "from xgboost import XGBClassifier")
adjust_import(
  "import sklearn",
  "from sklearn.model_selection import train_test_split"
)
add_import("from sklearn.metrics import accuracy_score", "default.nix")
add_import("from pandas import DataFrame", "default.nix")
Start a Python session and run:
from ryxpress import rxp_make
rxp_make()
This will build the pipeline.
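Once the build has finished, artifacts can be read back from the same session. The following is only a minimal sketch, not the package's documented workflow: build_and_read is a hypothetical helper, and it assumes that rxp_read accepts a derivation name (here accuracy, taken from the pipeline above) as its first argument. It returns None instead of failing when Nix or ryxpress is not available.

```python
import shutil


def build_and_read(derivation="accuracy"):
    """Hypothetical helper: build the pipeline, then read one artifact.

    Returns None when Nix or ryxpress cannot be found, so the sketch
    can be run safely outside the Nix shell.
    """
    # The pipeline can only be built when the nix executable is on PATH.
    if shutil.which("nix") is None:
        return None
    try:
        from ryxpress import rxp_make, rxp_read
    except ImportError:
        return None
    rxp_make()
    # Assumption: rxp_read takes the derivation name as its first argument.
    return rxp_read(derivation)
```

Calling build_and_read("y_pred") inside the Nix shell would, under the same assumption, return the predictions produced by the y_pred derivation.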
Note on formats:
- rxp_read/rxp_load will try pickle.load first.
- If pickle fails, rxp_read/rxp_load will attempt to use the optional rds2py package (if present) to parse the file. This will load serialized R objects.
- If neither loader succeeds, the function returns the path(s).
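The fallback order described above can be sketched as follows. This illustrates the logic only and is not ryxpress's actual implementation: load_artifact is a hypothetical helper, and the RDS reader is passed in as a callable (for example a reader function from rds2py, if it is installed) so that the sketch runs without that package.

```python
import pickle
from pathlib import Path


def load_artifact(path, rds_reader=None):
    """Hypothetical helper: try pickle, then an RDS reader, then give up.

    rds_reader is any callable that takes a file path and returns a
    Python object; it stands in for the optional rds2py dependency.
    """
    path = Path(path)

    # 1. Try to unpickle the file (artifacts written by Python steps).
    try:
        with path.open("rb") as fh:
            return pickle.load(fh)
    except Exception:
        pass

    # 2. Fall back to the RDS reader, if one was supplied
    #    (artifacts written by R steps).
    if rds_reader is not None:
        try:
            return rds_reader(str(path))
        except Exception:
            pass

    # 3. Neither loader worked: hand back the path so the caller can
    #    open the file with a suitable tool.
    return str(path)
```

Returning the path as a last resort means outputs that are neither pickles nor RDS files, such as a rendered Quarto document, can still be located and opened by other means.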
Inspect builds and outputs
- rxp_inspect inspects the project build logs and helps resolve derivation outputs.
- rxp_copy copies artifacts from /nix/store into your working directory for inspection.
- rxp_gc helps manage cache/cleanup of local artifacts.
Docs and API reference (developer docs)
This repository uses MkDocs + mkdocstrings to generate documentation and an autogenerated API reference from the package docstrings.
Contributing
Contributions are welcome. When contributing, please:
- Provide small, focused, and runnable examples.
- Prefer small datasets and short-running examples for tests/docs.
- Document any system-level dependencies for examples in a default.nix so the pipeline can be reproduced.
If you are unsure about a change, open an issue to discuss before submitting a
PR. See CONTRIBUTING.md for guidelines (if present).
Scope
The Python port focuses on the same “micropipeline” use case: single-machine pipelines for small-to-medium projects where Nix provides reproducible builds. It aims to mirror the user experience of the R package where practical, but it is not a drop-in replacement for all R-specific workflows. See the docs for current feature coverage and examples.
Examples & demos
See the examples and demos in the companion repository: https://github.com/b-rodrigues/rixpress_demos
License
This project is licensed under the GNU General Public License v3.0 (GPL-3.0). See LICENSE for details.