Creates a Nix expression that reads in a file (or folder of data) using Python.
Usage
rxp_py_file(
name,
path,
read_function,
nix_env = "default.nix",
copy_data_folder = FALSE,
env_var = NULL
)
Arguments
- name
Symbol, the name of the derivation.
- path
Character, the file path to include (e.g., "data/mtcars.shp") or a folder path (e.g., "data"). See details.
- read_function
Character, a Python function to read the data, taking one argument (the path).
- nix_env
Character, path to the Nix environment file, default is "default.nix".
- copy_data_folder
Logical, if TRUE then the entire folder is copied recursively into the build sandbox.
- env_var
List, defaults to NULL. A named list of environment variables to set before running the Python script, e.g., c(PYTHONPATH = "/path/to/modules"). Each entry will be added as an export statement in the build phase.
Details
There are three ways to read in data in a rixpress pipeline. The first is
to point directly to a file, for example,
rxp_py_file(mtcars, path = "data/mtcars.csv", read_function = "pandas.read_csv").
The second is to point to a file but also include all of the files in the
"data/" folder (the folder can be named something else). This is needed when
data is split between several files, such as a shapefile, which typically
also needs other files such as .shx and .dbf files. For this,
copy_data_folder must be set to TRUE. The third is to point only to a folder
and use a function that reads the data it contains. For example, in
rxp_py_file(many_csvs, path = "data", read_function = 'lambda x: pandas.read_csv(os.path.join(x, os.listdir(x)[0]), delimiter="|")')
the provided anonymous function reads the first file that os.listdir()
returns for the data/ folder; to read every .csv file, the function would
need to loop over the directory listing.
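As a minimal sketch of a function that reads every CSV file in a folder, the block below defines a hypothetical helper called read_many_csvs (the name is borrowed from the Examples; its body is an assumption, not part of rixpress). To stay self-contained it uses only the Python standard library and returns a list of row dictionaries; a pandas-based version would instead read each file with pandas.read_csv and combine the results.

```python
import csv
import os


def read_many_csvs(path, delimiter=","):
    """Read every .csv file in `path` and return all rows as a list of dicts.

    Hypothetical helper for illustration; takes the folder path as its
    first argument, as a read_function is expected to.
    """
    rows = []
    # Sort the listing so the result does not depend on filesystem order.
    for file_name in sorted(os.listdir(path)):
        if not file_name.lower().endswith(".csv"):
            continue  # skip anything that is not a CSV file
        with open(os.path.join(path, file_name), newline="") as handle:
            # DictReader uses the header row of each file as the keys.
            rows.extend(csv.DictReader(handle, delimiter=delimiter))
    return rows
```

A function like this would be passed by name, as in read_function = "read_many_csvs"; how the function is made available to the pipeline is outside the scope of this sketch.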
Examples
if (FALSE) { # \dontrun{
# Read a CSV file with pandas
rxp_py_file(
name = pandas_data,
path = "data/dataset.csv",
read_function = "pandas.read_csv"
)
# Read all CSV files in a directory using a
# user defined function
rxp_py_file(
name = mtcars_py,
path = "data",
read_function = "read_many_csvs",
copy_data_folder = TRUE
)
} # }