Introduction

Data pipelines in rixpress often require controlling how objects are stored and restored, especially when dealing with:

  1. Non-standard R objects (e.g., machine learning models, large tables).
  2. Multiple file formats (CSV, qs compressed files, etc.).
  3. Cross-language workflows mixing R and Python.

This vignette focuses on encoding and decoding in R, and on transferring data between R and Python using rxp_py2r() and rxp_r2py().

Custom Encoding and Decoding in R

By default, rixpress uses saveRDS() and readRDS(). You can override this to handle different formats or complex objects:

library(rixpress)

# Encode output as CSV instead of RDS
d2 <- rxp_r(
  mtcars_head,
  my_head(mtcars_am, 100),
  user_functions = "my_head.R",
  nix_env = "default.nix",
  encoder = write.csv
)

# Encode as qs, decode input from CSV
d3 <- rxp_r(
  mtcars_tail,
  my_tail(mtcars_head),
  user_functions = "my_tail.R",
  nix_env = "default2.nix",
  encoder = qs::qsave,
  decoder = read.csv
)

# Decode multiple upstream objects with different decoders
d4 <- rxp_r(
  mtcars_mpg,
  full_join(mtcars_tail, mtcars_head),
  nix_env = "default2.nix",
  decoder = c(
    mtcars_tail = "qs::qread",
    mtcars_head = "read.csv"
  )
)

Key points:

  • encoder controls how this step’s output is stored.
  • decoder specifies how to read inputs from upstream derivations.
  • You can assign different decoders per upstream object using a named vector.
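One way to picture the per-object mapping in the last point is as a lookup table from upstream object names to decoder names. The sketch below uses only base R and temporary files; it does not call rixpress, and the file paths and the lookup loop are hypothetical illustrations rather than a description of how rixpress resolves decoders internally.

```r
# Hypothetical upstream artifacts, written to temporary files for the demo
path_head <- tempfile(fileext = ".csv")
path_tail <- tempfile(fileext = ".rds")
write.csv(head(mtcars, 3), path_head, row.names = FALSE)
saveRDS(tail(mtcars, 3), path_tail)

# One decoder name per upstream object, mirroring the named vector above
decoders <- c(
  mtcars_head = "utils::read.csv",
  mtcars_tail = "readRDS"
)
paths <- c(mtcars_head = path_head, mtcars_tail = path_tail)

# Turn each string into a function, then apply it to the matching file.
# eval(parse()) is used so that "pkg::fun" style names also resolve.
inputs <- lapply(names(decoders), function(nm) {
  decode <- eval(parse(text = decoders[[nm]]))
  decode(paths[[nm]])
})
names(inputs) <- names(decoders)

str(inputs, max.level = 1)
```

Each upstream object is read back with the decoder that matches the format it was written in, which is the mismatch the named vector is meant to prevent.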

As shown in the examples above, you can pass a function or a string representation of the function to encoder and decoder.
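When a built-in writer or reader needs extra arguments, a small wrapper pair can serve as encoder and decoder. The following is a minimal sketch in base R, assuming that an encoder is called with the object and a file path (as write.csv and qs::qsave are) and that a decoder is called with a file path. The names encode_csv and decode_csv are hypothetical, and the round trip is run directly here rather than through a pipeline.

```r
# Hypothetical custom encoder: write CSV without row names
encode_csv <- function(object, path) {
  write.csv(object, file = path, row.names = FALSE)
}

# Hypothetical matching decoder: keep strings as character columns
decode_csv <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)
}

scores <- data.frame(id = c("a", "b", "c"), value = c(1.5, 2.5, 3.5))

# Round trip through a temporary file
path <- tempfile(fileext = ".csv")
encode_csv(scores, path)
restored <- decode_csv(path)

# Compare the original and the restored data frame
all.equal(scores, restored)

# The same function can be retrieved from its name given as a string
decoder_by_name <- get("decode_csv", mode = "function")
identical(decoder_by_name, decode_csv)
```

Such wrappers would typically live in the script passed to user_functions, so that they are available when the step runs.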

Encoding an object in a cross-language format makes it possible to pass it to another language. For example, you can read a CSV file with Julia, encode it as Arrow, and read it back in R:

library(rixpress)

list(
  rxp_jl_file(
    mtcars,
    # Assume here that mtcars.csv is separated by "|" instead of ","
    path = "data/mtcars.csv",
    read_function = "read_csv",
    user_functions = "functions.jl",
    encoder = "write_arrow"
    # read_csv and write_arrow are both
    # defined in the functions.jl script
    # and look like this:

    # function write_arrow(df::DataFrame, filename::String)
    #     Arrow.write(filename, df)
    # end

    # function read_csv(path::String)
    #     df = CSV.read(path, DataFrame; delim="|")
    #     return df
    # end

  ),

  rxp_r(
    mtcars2,
    select(mtcars, am, cyl, mpg),
    decoder = "read_feather"
  )
) |>
  rxp_populate()

You can find this example here. The same approach works for transferring data to Python, and more generally from and to any of the three supported languages.

Cross-Language Data Transfer: R ↔︎ Python

In the specific case of transferring objects (data, lists, vectors, arrays, etc.) between R and Python, it is also possible to use reticulate’s built-in conversion through rxp_py2r() and rxp_r2py(). These functions enable seamless movement of objects between R and Python:

library(rixpress)

# Python step producing pandas DataFrame
d1 <- rxp_py(
  name = mtcars_pl_am,
  expr = "mtcars_pl.filter(polars.col('am') == 1).to_pandas()"
)

# Transfer Python -> R
d2 <- rxp_py2r(
  name = mtcars_am,
  expr = mtcars_pl_am
)

# R step processing the data
d3 <- rxp_r(
  name = mtcars_head,
  expr = my_head(mtcars_am),
  user_functions = "functions.R"
)

# Transfer R -> Python
d3_1 <- rxp_r2py(
  name = mtcars_head_py,
  expr = mtcars_head
)

For this to work, you need to add reticulate to the pipeline’s execution environment.
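To get a feel for what reticulate’s built-in conversion does with an ordinary R object, the sketch below calls reticulate directly, outside of any pipeline. It assumes reticulate and a working Python installation are available in the current session; the object r_obj is a made-up example, and this is an illustration of the conversion itself, not of how rxp_py2r() and rxp_r2py() are implemented.

```r
library(reticulate)

# A plain R list with mixed element types (hypothetical example data)
r_obj <- list(model = "lm", coefs = c(1.5, -0.2), n = 32L)

# R -> Python: the named list is handed to Python as a Python object
py_obj <- r_to_py(r_obj)
class(py_obj)

# Python -> R: convert the Python object back to an R object
back <- py_to_r(py_obj)
str(back)
```

No file format has to be chosen here, which is why this route is convenient for objects that do not map cleanly onto CSV or Arrow.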

Summary

  • Use encoder/decoder for non-RDS objects (CSV, qs, Keras models) and to pass data to and from different languages.
  • Explicitly set decoders per upstream object to avoid mismatches.
  • Use rxp_py2r() and rxp_r2py() if you want to reuse reticulate’s built-in conversion (useful for more complex objects).