Creates a Nix expression that reads in a file (or folder of data) using Python.

Usage

rxp_py_file(
  name,
  path,
  read_function,
  nix_env = "default.nix",
  copy_data_folder = FALSE,
  env_var = NULL
)

Arguments

name

Symbol, the name of the derivation.

path

Character, the file path to include (e.g., "data/mtcars.shp") or a folder path (e.g., "data"). See details.

read_function

Character, a Python function to read the data, taking one argument (the path).

nix_env

Character, path to the Nix environment file, default is "default.nix".

copy_data_folder

Logical, if TRUE then the entire folder is copied recursively into the build sandbox.

env_var

List, defaults to NULL. A named list of environment variables to set before running the Python script, e.g., c(PYTHONPATH = "/path/to/modules"). Each entry will be added as an export statement in the build phase.

Value

An object of class derivation which inherits from lists.

Details

There are three ways to read in data in a rixpress pipeline. The first is to point directly to a file, for example rxp_py_file(mtcars, path = "data/mtcars.csv", read_function = "pandas.read_csv"). The second is to point to a file but also include all of the files in the "data/" folder (the folder can be named something else). This is needed when data is split across several files, such as a shapefile, which typically also needs companion files such as .shx and .dbf. For this, copy_data_folder must be set to TRUE. The third is to point only to a folder and use a function that reads the data inside it. For example, rxp_py_file(many_csvs, path = "data", read_function = 'lambda x: pandas.read_csv(os.path.join(x, os.listdir(x)[0]), delimiter="|")') passes the folder to the anonymous function. As written, this function reads only the first file that os.listdir() returns; to read every .csv file in the data/ folder, the function needs to loop over the full listing.
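
As an illustration of the folder-based approach, the following is a minimal sketch of a user-defined Python function that reads every .csv file in a folder. The name read_many_csvs mirrors the one used in the Examples, but its body here is an assumption, not code shipped with rixpress. It uses only the standard library csv module and returns a list of dictionaries; a pandas-based version could instead call pandas.read_csv on each file and combine the results. The "|" delimiter and the extra source_file key are also illustrative choices.

```python
import csv
import os


def read_many_csvs(folder, delimiter="|"):
    """Read every .csv file in `folder` and return all rows as a list of dicts.

    Hypothetical helper: takes a single required argument (the folder path),
    which matches what read_function is documented to receive.
    """
    rows = []
    # Sort the listing so the result does not depend on filesystem order.
    for filename in sorted(os.listdir(folder)):
        if not filename.endswith(".csv"):
            continue  # skip anything that is not a CSV file
        with open(os.path.join(folder, filename), newline="") as handle:
            for row in csv.DictReader(handle, delimiter=delimiter):
                # Record which file each row came from (illustrative extra key).
                row["source_file"] = filename
                rows.append(row)
    return rows
```

Because delimiter has a default value, the function can still be called with the path alone, for example read_many_csvs("data").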

See also

Other derivations: rxp_jl(), rxp_py(), rxp_qmd(), rxp_r(), rxp_r_file(), rxp_rmd()

Examples

if (FALSE) { # \dontrun{
  # Read a CSV file with pandas
  rxp_py_file(
    name = pandas_data,
    path = "data/dataset.csv",
    read_function = "pandas.read_csv"
  )

  # Read all CSV files in a directory using a
  # user-defined function
  rxp_py_file(
    name = mtcars_py,
    path = "data",
    read_function = "read_many_csvs",
    copy_data_folder = TRUE
  )
} # }