1  Reproducibility with Nix

1.1 Learning Outcomes

By the end of this chapter, you will:

  • Understand the need for environment reproducibility in modern workflows
  • Install Nix
  • Use {rix} to generate default.nix files
  • Build cross-language environments for data work or software development

1.2 Why Reproducibility? Why Nix? (2h)

1.2.1 Motivation: Reproducibility in Scientific and Data Workflows

To ensure that a project is reproducible you need to deal with at least four things:

  • Make sure that the required/correct version of R (or any other language) is installed;
  • Make sure that the required versions of packages are installed;
  • Make sure that system dependencies are installed (for example, you’d need a working Java installation to install the rJava R package on Linux);
  • Make sure that you can install all of this for the hardware you have on hand.

But in practice, one or most of these bullet points are missing from projects. The goal of this course is to learn how to fullfill all the requirements to build reproducible projects.

1.2.2 Problems with Ad-Hoc Tools

Tools like Python’s venv or R’s renv only deal with some pieces of the reproducibility puzzle. Often, they assume an underlying OS, do not capture native system dependencies (like libxml2, pandoc, or curl), and require users to “rebuild” their environments from partial metadata. Docker helps but introduces overhead, security challenges, and complexity.

Traditional approaches fail to capture the entire dependency graph of a project in a deterministic way. This leads to “it works on my machine” syndromes, onboarding delays, and subtle bugs.

1.2.3 Nix, a declarative package manager

Nix is a tool for reproducible builds and development environments, often introduced as a package manager. It captures complete dependency trees, from your programming language interpreter to every system-level library you rely on. With Nix, environments are not recreated from documentation, but rebuilt precisely from code.

Nix can be installed on Linux distributions, macOS and it even works on Windows if you enable WSL2. In this course, we will use Nix mostly as a package manager (but towards the end also as a build automation tool).

What’s a package manager? If you’re not a Linux user, you may not know. Let me explain it this way: in R, if you want to install a package to provide some functionality not included with a vanilla installation of R, you’d run this:

install.packages("dplyr")

It turns out that Linux distributions, like Ubuntu for example, work in a similar way, but for software that you’d usually install using an installer (at least on Windows). For example you could install Firefox on Ubuntu using:

sudo apt-get install firefox

(there’s also graphical interfaces that make this process “more user-friendly”). In Linux jargon, packages are simply what we call software (or I guess it’s all “apps” these days). These packages get downloaded from so-called repositories (think of CRAN, the repository of R packages) but for any type of software that you might need to make your computer work: web browsers, office suites, multimedia software and so on.

So Nix is just another package manager that you can use to install software.

But what interests us is not using Nix to install Firefox, but instead to install R and the R packages that we require for our analysis (or any other programming language that we need). But why use Nix instead of the usual ways to install software on our operating systems?

The first thing that you should know is that Nix’s repository, nixpkgs, is huge. Humongously huge. As I’m writing these lines, there’s more than 120’000 pieces of software available, and the entirety of CRAN and Bioconductor is also available through nixpkgs. So instead of installing R as you usually do and then use install.packages() to install packages, you could use Nix to handle everything. But still, why use Nix at all?

Nix has an interesting feature: using Nix, it is possible to install software in (relatively) isolated environments. So using Nix, you can install as many versions of R and R packages that you need. Suppose that you start working on a new project. As you start the project, with Nix, you would install a project-specific version of R and R packages that you would only use for that particular project. If you switch projects, you’d switch versions of R and R packages.

However Nix has quite a steep learning curve, so this is why for the purposes of this course we are going to use an R package called {rix} to set up reproducible environments.

1.2.4 The rix package

The idea of {rix} is for you to declare the environment you need using the provided rix() function. rix() is the package’s main function and generates a file called default.nix which is then used by the Nix package manager to build that environment. Ideally, you would set up such an environment for each of your projects. You can then use this environment to either work interactively, or run R or Python scripts. It is possible to have as many environments as projects, and software that is common to environments will simply be re-used and not get re-installed to save space. Environments are isolated for each other, but can still interact with your system’s files, unlike with Docker where a volume must be mounted. Environments can also interact with the software installed on your computer through the usual means, which can sometimes lead to issues. For example, if you already have R installed, and a user library of R packages, more caution is required to properly use environments managed by Nix.

You don’t need to have R installed or be an R user to use {rix}. If you have Nix installed on your system, it is possible to “drop” into a temporary environment with R and {rix} available and generate the required Nix expression from there.

But first, let’s install Nix and try to use temporary shells.

1.2.5 Installing Nix

If you are on Windows, you need the Windows Subsystem for Linux 2 (WSL2) to run Nix. If you are on a recent version of Windows 10 or 11, you can simply run this as an administrator in PowerShell:

wsl --install

You can find further installation notes at this official MS documentation.

I recommend to activate systemd in Ubuntu WSL2, mainly because this supports other users than root running Nix. To set this up, please do as outlined this official Ubuntu blog entry:


# in WSL2 Ubuntu shell

sudo -i
nano /etc/wsl.conf

This will open the /etc/wsl.conf in a nano, a command line text editor. Add the following line:

[boot]
systemd=true

Save the file with CTRL-O and then quit nano with CTRL-X. Then, type the following line in powershell:

wsl --shutdown

and then relaunch WSL (Ubuntu) from the start menu. For those of you running Windows, we will be working exclusively from WSL2 now. If that is not an option, then I highly recommend you set up a virtual machine with Ubuntu using VirtualBox for example, or dual-boot Ubuntu.

Installing (and uninstalling) Nix is quite simple, thanks to the installer from Determinate Systems, a company that provides services and tools built on Nix, and works the same way on Linux (native or WSL2) and macOS.

Do not use your operating system’s package manager to install Nix. Instead, simply open a terminal and run the following line (on Windows, if you cannot or have decided not to activate systemd, then you have to append --init none to the command. You can find more details about this on The Determinate Nix Installer page):

curl --proto '=https' --tlsv1.2 -sSf \
    -L https://install.determinate.systems/nix | \
     sh -s -- install

Then, install the cachix client and configure the rstats-on-nix cache: this will install binary versions of many R packages which will speed up the building process of environments:

nix-env -iA cachix -f https://cachix.org/api/v1/install

then use the cache:

cachix use rstats-on-nix

You only need to do this once per machine you want to use {rix} on. Many thanks to Cachix for sponsoring the rstats-on-nix cache!

1.2.6 Temporary shells

You now have Nix installed; before continuing, it let’s see if everything works (close all your terminals and reopen them) by droping into a temporary shell with a tool you likely have not installed on your machine.

Open a terminal and run:

which sl

you will likely see something like this:

which: no sl in ....

now run this:

nix-shell -p sl

and then again:

which sl

this time you should see something like:

/nix/store/cndqpx74312xkrrgp842ifinkd4cg89g-sl-5.05/bin/sl

This is the path to the sl binary installed through Nix. The path starts with /nix/store: the Nix store is where all the software installed through Nix is stored. Now type sl and see what happens!

You can find the list of available packages here.

1.3 Session 1.2 – Dev Environments with Nix (2h)

1.3.1 Some Nix concepts

While temporary shells are useful for quick testing, this is not how Nix is typically used in practice. Nix is a declarative package manager: users specify what they want to build, and Nix takes care of the rest.

To do so, users write files called default.nix that contain the a so-called Nix expression. This expression will contain the definition of a (or several) derivations.

In Nix terminology, a derivation is a specification for running an executable on precisely defined input files to repeatably produce output files at uniquely determined file system paths. (source)

In simpler terms, a derivation is a recipe with precisely defined inputs, steps, and a fixed output. This means that given identical inputs and build steps, the exact same output will always be produced. To achieve this level of reproducibility, several important measures must be taken:

  • All inputs to a derivation must be explicitly declared.
  • Inputs include not just data files, but also software dependencies, configuration flags, and environment variables, essentially anything necessary for the build process.
  • The build process takes place in a hermetic sandbox to ensure the exact same output is always produced.

The next sections of this document explain these three points in more detail.

1.3.2 Derivations

Here is an example of a simple Nix expression:

let
 pkgs = import (fetchTarball "https://github.com/rstats-on-nix/nixpkgs/archive/2025-04-11.tar.gz") {};

in

pkgs.stdenv.mkDerivation {
  name = "filtered_mtcars";
  buildInputs = [ pkgs.gawk ];
  dontUnpack = true;
  src = ./mtcars.csv;
  installPhase = ''
    mkdir -p $out
    awk -F',' 'NR==1 || $9=="1" { print }' $src > $out/filtered.csv
  '';
}

I won’t go into details here, but what’s important is that this code uses awk, a common Unix data processing tool, to filter the mtcars.csv file to keep only rows where the 9th column (the am column) equals 1. As you can see, a significant amount of boilerplate code is required to perform this simple operation. However, this approach is completely reproducible: the dependencies are declared and pinned to a specific dated branch of our rstats-on-nix/nixpkgs fork (more on this later), and the only thing that could make this pipeline fail (though it’s a bit of a stretch to call this a pipeline) is if the mtcars.csv file is not provided to it. This expression can be instantiated into a derivation, and the derivation is then built into the actual output that interests us, namely the filtered mtcars data.

The derivation above uses the Nix builtin function mkDerivation: as its name implies, this function makes a derivation. But there is also mkShell, which is the function that builds a shell instead. Nix expressions that built a shell is the kind of expressions {rix} generates for you.

1.3.3 Using {rix} to generate development environments

If you have successfully installed Nix, but don’t have yet R installed on your system, you could install R as you would usually do on your operating system, and then install the {rix} package, and from there, generate project-specific expressions and build them. But you could also install R using Nix. Running the following line in a terminal will drop you in an interactive R session that you can use to start generating expressions:

nix-shell -p R rPackages.rix

This will drop you in a temporary shell with R and {rix} available. Navigate to an empty directory to help a project, call it rix-session-1:

mkdir rix-session-1

and start R and load {rix}:

R
library(rix)

you can now generate an expression by running the following code:

rix(
  date = "2025-06-02",
  r_pkgs = c("dplyr", "ggplot2"),
  py_conf = list(
    py_version = "3.13",
    py_pkgs = c("polars", "great-tables")
  ),
  ide = "positron",
  project_path = ".",
  overwrite = TRUE
)

This will write a file called default.nix in your project’s directory. This default.nix contains a Nix expression which will build a shell that comes with R, {dplyr} and {ggplot2} as they were on the the 2nd of June 2025 on CRAN. This will also add Python 3.13 and the ploars and great-tables Python packages as they were at the time in nixpkgs (more on this later). Finally, this also add the Positron IDE, which is a fork of VS Code for data science. This is just an example, and you can use another IDE if you wish. See this vignette for learning how to setup your IDE with Nix.

1.3.4 Using nix-shell to Launch Environments

Once your file is in place, simply run:

nix-shell

This gives you an isolated shell session with all declared packages available. You can test code, explore APIs, or install further tools within this session.

To remove the packages that were installed, call nix-store --gc. This will call the garbage collector. If you want to avoid that an environment gets garbage-collected, use nix-build instead of nix-shell. This will create a symlink called result in your project’s root directory and nix-store --gc won’t garbage-collect this environment until you manually remove result.

1.3.5 Pinning with nixpkgs

To ensure long-term reproducibility, pin the version of Nixpkgs used. Replace <nixpkgs> with a fixed import:

let
  pkgs = import (fetchTarball "https://github.com/rstats-on-nix/nixpkgs/archive/2025-06-02.tar.gz") {};
in
pkgs.mkShell {
  buildInputs = [ pkgs.r pkgs.rPackages.dplyr ];
}

This avoids unexpected updates and lets others reproduce your environment exactly.

1.4 Configuring your IDE

We now need to configure an IDE to use both our Nix shells as development environments, and GitHub Copilot. You are free to use whatever IDE you want but the instructions below are going to focus on RStudio, VS Code and Positron.

The following are the setups we recommend you use to work using an IDE and Nix environments. To be recommended, a setup should:

  • be easy to setup;
  • work the same on any operating system;
  • not require any type of special maintenance.

Regardless of your operating system, a general-purpose editor such as VS Code (or Codium), Emacs, or Neovim meets the above requirements. Recent releases of Positron also work quite well. (Note: Neovim is not covered here due to lack of experience—PRs welcome!) However, some editors perform better on certain platforms.

Also, we recommend you uninstall R if it’s installed system-wide and also remove your local library of packages and instead only use dedicated Nix shells to manage your projects. While we made our possible for Nix shells to not interfere with a system-installed R, we recommend users go into the habit of taking some minutes at the start of a project to properly set up their development environment.

1.4.4 RStudio

RStudio must be installed by Nix in order to see and use Nix shells. So you cannot use the RStudio already installed on your computer to work with Nix shells. This means you need to set ide = "rstudio" if you wish to use RStudio.

You should also be aware that there is currently a bug in the RStudio Nix package that makes RStudio ignore project-specific .Rprofile files, which can be an issue if you also have a system-level library of packages. Instead, you can sure the .Rprofile generated by rix() yourself or you can uninstall the system-level R and library of packages.

1.4.4.1 RStudio on macOS

To use RStudio on macOS simply use ide = "rstudio", but be aware that this will only work for R version 4.4.3 at least, or for a date on or after the 2025-02-28. If you don’t need to work with older versions of R or older date, RStudio is an appropriate choice. Then, build the environment using nix-build and drop into the shell using nix-shell. Then, type rstudio to start RStudio. If you wish, you can even put the rstudio command in the shell hook to start it immediately as you run nix-shell.

1.4.4.2 RStudio on Linux or Windows

To use RStudio on Linux or Windows simply use ide = "rstudio". Then, build the environment using nix-build and drop into the shell using nix-shell. Then, type rstudio to start RStudio.

If you plan to use RStudio on Ubuntu, then you need further configuration to make it work, because of newly introduced sandboxing features in Ubuntu 24.04. You will need to create an RStudio-specific AppArmor profile. To do so create this apparmor profile:

sudo nano /etc/apparmor.d/nix.rstudio

Populate it with:

profile nix.rstudio /nix/store/*-RStudio-*-wrapper/bin/rstudio flags=(unconfined) {
    userns,
}

Save it, load the profile and start RStudio:

sudo apparmor_parser -r /etc/apparmor.d/nix.rstudio
sudo systemctl reload apparmor

You can now start RStudio from the activated Nix shell.

On Windows, you need to have WSLg enabled, which should be the case on the latest versions of Windows. If you wish, you can even put the rstudio command in the shell hook to start it immediately as you run nix-shell.

On Linux and WSL, depending on your desktop environment, and for older versions of RStudio, you might see the following error message when trying to launch RStudio:

qt.glx: qglx_findConfig: Failed to finding matching FBConfig for QSurfaceFormat(version 2.0, options QFlags<QSurfaceFormat::FormatOption>(), depthBufferSize -1, redBufferSize 1, greenBufferSize 1, blueBufferSize 1, alphaBufferSize -1, stencilBufferSize -1, samples -1, swapBehavior QSurfaceFormat::SingleBuffer, swapInterval 1, colorSpace QSurfaceFormat::DefaultColorSpace, profile  QSurfaceFormat::NoProfile)
Could not initialize GLX
Aborted (core dumped)

in this case, run the following before running RStudio:

export QT_XCB_GL_INTEGRATION=none

To use GitHub Copilot with RStudio, follow these instructions.

1.4.5 VS Code or Positron

Positron is a fork of VS Code made by Posit and tailored for data science. Henceforth, I will refer to both editors simply as Code.

The same instructions apply whether your host operating system is Linux, macOS or Windows. The first step is of course to install Code on your operating system using the usual means of installing software.

If you’re on Windows, install Code on Windows, not in WSL. Code on Windows is able to interact with WSL seamlessly and before continuing here, please follow these instructions (it’s mostly about installing the right extensions after having installed Positron).

On macOS, start by installing Code using the official .dmg installer. Start Code, and then the command palette using COMMAND-SHIFT-P. In the search bar, type "Install 'positron' command in PATH" and click on it: this will make it possible to start Positron from a terminal.

Once Code is installed, you need to install a piece of software called direnv: direnv will automatically load Nix shells when you open a project that contains a default.nix file in an editor. It works on any operating system and many editors support it, including Code. Follow the instructions for your operating system here but if you’re using Windows, install direnv in WSL (even though you’ve just installed Code for Windows), so follow the instructions for whatever Linux distribution you’re using there (likely Ubuntu), or use Nix to install direnv if you prefer (this is the way I recommend to install it on macOS, unless you already use brew):

nix-env -f '<nixpkgs>' -iA direnv

This will install direnv and make it available even outside of Nix shells!

Then, we highly recommend to install the nix-direnv extension:

nix-env -f '<nixpkgs>' -iA nix-direnv

It is not mandatory to use nix-direnv if you already have direnv, but it’ll make loading environments much faster and seamless. Finally, if you haven’t used direnv before, don’t forget this last step.

Then, in Code, install the direnv extension (and also the WSL extension if you’re on Windows, as explained in the official documentation linked above!). Finally, add a file called .envrc and simply write the following two lines in it:

use nix
mkdir $TMP

in it. On Windows, remotely connect to WSL first, but on other operating systems, simply open the project’s folder using File > Open Folder... and you will see a pop-up stating direnv: /PATH/TO/PROJECT/.envrc is blocked and a button to allow it. Click Allow and then open an R script. You might get another pop-up asking you to restart the extension, so click Restart. Be aware that at this point, direnv will run nix-shell and so will start building the environment. If that particular environment hasn’t been built and cached yet, it might take some time before Code will be able to interact with it. You might get yet another popup, this time from the R Code extension complaining that R can’t be found. In this case, simply restart Code and open the project folder again: now it should work every time. For a new project, simply repeat this process:

  • Generate the project’s default.nix file;
  • Build it using nix-build;
  • Create an .envrc and write the two lines from above in it;
  • Open the project’s folder in Code and click allow when prompted;
  • Restart the extension and Code if necessary.

Another option is to create the .envrc file and write use nix in it, then open a terminal, navigate to the project’s folder, and run direnv allow. Doing this before opening Code should not prompt you anymore.

If you’re on Windows, using Code like this is particularly interesting, because it allows you to install Code on Windows as usual, and then you can configure it to interact with a Nix shell, even if it’s running from WSL. This is a very seamless experience.

Now configure VS Code to use GitHub Copilot, click here or for Positron click here.

1.5 Hands-On Exercises

  1. Start a temporary shell with R and {rix} again using nix-shell -p R rPackages.rix. Start an R session (by typing R) and then load the {rix} package (using library(rix)). Run the available_dates() function: using the latest available date, generate a new default.nix.
  2. Inside of an activated shell, type which R and echo $PATH. Explore what is being added to your environment. What is the significance of paths like /nix/store/...?
  3. Break it on purpose: generate a new environment with a wrong R package name, for example dplyrnaught. Try to build the environment. What happens?
  4. Go to https://search.nixos.org/packages and look for packages that you usually use for your projects to see if they are available.