
I’m Hong Ooi, data scientist with Microsoft Azure Global, and maintainer of the checkpoint package. The checkpoint package makes it easy for you to freeze R packages in time, drawing on the snapshots of the CRAN repository that have been archived daily at MRAN since 2014.

Checkpoint has been around for nearly 6 years now, helping R users solve the reproducible research puzzle. In that time, it’s seen many changes, new features, and, inevitably, bug reports. Some of these bugs have been fixed, while others remain outstanding in the too-hard basket.

Many of these issues stem from the fact that checkpoint uses only base R functions, in particular install.packages, to do its work. The problem is that install.packages is designed for interactive use and is very limited as an API. For starters, it doesn’t return a result to the caller; instead, checkpoint has to capture and parse the printed output to determine whether the installation succeeded. This causes a host of problems, since the printout varies depending on how R is configured. Similarly, install.packages refuses to install a package that is in use, which means checkpoint must unload it first: an imperfect and error-prone process at best.

In addition to these, checkpoint’s age means that it has accumulated a significant amount of technical debt over the years. For example, there is still code to handle ancient versions of R that couldn’t use HTTPS, even though the MRAN site (in line with security best practice) now accepts HTTPS connections only.

I’m happy to announce that checkpoint 1.0 is now in beta. This is a major refactoring/rewrite, aimed at solving these problems. The biggest change is to switch to pkgdepends for the backend, replacing the custom-written code using install.packages. This brings the following benefits:

  • Caching of downloaded packages. Subsequent checkpoints using the same MRAN snapshot will check the package cache first, avoiding unnecessary re-downloads.
  • Installation of packages that are in use, without having to unload them first.
  • Comprehensive reporting of all aspects of the install process: dependency resolution, creating an install plan, downloading packages, and actual installation.
  • Reliable detection of installation outcomes (no more having to screen-scrape the R window).

In addition, checkpoint 1.0 features experimental support for a checkpoint.yml manifest file, which specifies packages to include in or exclude from the checkpoint. You can include packages from sources other than MRAN, such as Bioconductor or GitHub, or from the local machine; similarly, you can exclude packages that are not publicly distributed (although you’ll still have to ensure that such packages are visible to your checkpointed session).

The overall interface is still much the same. To create a checkpoint, or use an existing one, call the checkpoint() function:

library(checkpoint)
checkpoint("2020-01-01")

This calls out to two other functions, create_checkpoint and use_checkpoint, reflecting the two main objectives of the package. You can also call these functions directly. To revert your session to the way it was before, call uncheckpoint().
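To make the two steps concrete, here is a minimal sketch of calling the underlying functions directly rather than going through checkpoint(). It assumes the checkpoint 1.0 beta is installed, that the current working directory is the project you want scanned, and that project_dir is the argument naming that directory; check ?create_checkpoint if in doubt.

```r
library(checkpoint)

snapshot <- "2020-01-01"

# Step 1: scan the project for the packages it uses and install them from
# the MRAN snapshot into a checkpoint library (slow, but done once).
# project_dir = "." assumes the working directory is the project root.
create_checkpoint(snapshot, project_dir = ".")

# Step 2: switch the current session's library path to that checkpoint
use_checkpoint(snapshot)

# ... run your analysis against the frozen package versions ...

# Step 3: restore the library paths that were in effect beforehand
uncheckpoint()
```

Splitting creation from use means the expensive install step can be run once, while use_checkpoint() can be called cheaply at the top of every later session.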

One difference to be aware of is that function names and arguments now consistently use snake_case, reflecting the general style seen in the tidyverse and related frameworks. The names of ancillary functions have also been changed, to better reflect their purpose, and the package size has been significantly reduced. See the help files for more information.
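As an illustration of the renaming, here is the same call written for the old and new interfaces. The pre-1.0 argument names (snapshotDate, checkpointLocation) are recalled from earlier releases rather than stated in this post, so treat the exact mapping as an assumption and confirm it against the help files.

```r
library(checkpoint)

# Pre-1.0 style (camelCase arguments), shown for comparison only:
# checkpoint(snapshotDate = "2020-01-01", checkpointLocation = "~/")

# checkpoint 1.0 style: the same call with snake_case argument names
checkpoint(snapshot_date = "2020-01-01", checkpoint_location = "~")
```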

There are two main downsides to the change, both due to known issues in the current pkgdepends/pkgcache chain:

  • On Windows and macOS, creating a checkpoint fails if no binary packages are available at the specified MRAN snapshot. This generally happens if you specify a snapshot that either predates your R version or is too far ahead of it. As a workaround, you can use the r_version argument to create_checkpoint to install binaries intended for a different R version.
  • There is no support for a local MRAN mirror (accessed via a file:// URL). You must either use the standard MRAN site, or have an actual web server hosting a mirror of MRAN.
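A minimal sketch of the r_version workaround follows. The version string "3.6.3" is purely illustrative; pick whichever R version the snapshot actually has binaries for. Passing the same r_version to use_checkpoint is an assumption on our part, on the reasoning that the checkpoint library is stored per R version. The commented-out mirror line uses a hypothetical host name and assumes an mran_url argument; check the help files before relying on it.

```r
library(checkpoint)

snapshot <- "2020-01-01"

# Request binaries built for a different R version when the snapshot has
# none for the running version ("3.6.3" is an illustrative choice)
create_checkpoint(snapshot, r_version = "3.6.3")

# Assumed: use the same r_version so the matching checkpoint library is found
use_checkpoint(snapshot, r_version = "3.6.3")

# Hypothetical: point at a web server hosting an MRAN mirror
# (file:// URLs are not supported in this beta)
# create_checkpoint(snapshot, mran_url = "https://mran-mirror.example.com")
```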

It’s anticipated that these will both be fixed before pkgdepends is released to CRAN.

You can get the checkpoint 1.0 beta from GitHub:

remotes::install_github("RevolutionAnalytics/checkpoint")

Any comments or feedback will be much appreciated. You can email me directly, or open an issue at the repo.
