
Managing Packages in RStudio Team#

Managing open source packages for data science work spans several different environments and, often, several teams. That makes package management hard to understand in depth.

This page is designed to help teams figure out how to manage the packages in their environment with a minimum of conceptual learning.

For those who want a deeper conceptual treatment, please check out our environments website, and our two webinars on package management.

Package Management Overview#

Packages are installed from repositories to libraries.

In RStudio Team, there are three components to managing packages:

  1. An IT/admin configures RStudio Package Manager as the centralized package repository for RStudio Team (or chooses not to)
  2. An IT/admin configures the default package settings on RStudio Server Pro
  3. Individual data scientists manage the package libraries for their particular projects

Most teams find that adopting this model simplifies package management for admins and data scientists alike, including in environments with strong security and validation requirements.

1. Configuring Repositories#

RStudio Package Manager, one of the components of RStudio Team, is RStudio's repository for R and Python packages.

A private RStudio Package Manager instance is required to run RStudio Team when:

  • The environment is offline or air-gapped, so RStudio Server Pro and RStudio Connect have no direct internet access to the public RStudio Package Manager
  • Packages must be validated into the environment
  • Data scientists are developing private packages for internal use

In most organizations, RStudio Package Manager is configured and administered by an IT/admin who has SSH access to the server. In some teams, an IT/admin sets up the RStudio Package Manager server and data scientists are responsible for managing the actual package sets present.

RStudio Package Manager can host one or more repositories that include public CRAN packages and private packages, as well as Bioconductor and PyPI repositories. Many organizations are unsure which repository configuration is right for them. The flow chart below is designed to help teams decide.

[Flow chart: Configure Repositories (Admin). Decision nodes: "Environment offline? Need to validate packages? Private packages?", "Have private packages", "Need to validate packages?", "Different package sets per R version?". Outcomes: public RStudio Package Manager; private RStudio Package Manager; configure local package source; single full CRAN repo; single curated CRAN repo; multiple curated CRAN repos (combine as needed).]

2. Set RStudio Server Pro Defaults#

Setting a Default Repository on RStudio Server Pro#

Once RStudio Package Manager is configured, server admins should set it as the default repository (or repositories) on RStudio Server Pro.

For more information on how to set the default repository in RStudio Server Pro, please see this article.
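As a rough illustration only, one way to express a default repository is the `r-cran-repos` option in `/etc/rstudio/rsession.conf`. The sketch below writes that line to a scratch directory so it can be tried safely. The repository URL is a placeholder and the `CONF_DIR` variable is an assumption of this example; the article linked above remains the authoritative procedure.

```shell
#!/usr/bin/env bash
set -euo pipefail

# ASSUMPTION: placeholder URL. Replace it with the URL copied from the
# Setup tab of your own RStudio Package Manager repository.
REPO_URL="https://rspm.example.com/cran/latest"

# Write to a scratch directory by default. On a real server the file is
# /etc/rstudio/rsession.conf and editing it requires root.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# Append the default-repository option and show the resulting file.
printf 'r-cran-repos=%s\n' "$REPO_URL" >> "$CONF_DIR/rsession.conf"
cat "$CONF_DIR/rsession.conf"
```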

Installing Base Package Sets#

Admins frequently ask whether they can install base package sets for all users.

This is possible, but is usually unnecessary.

Once an appropriate default repository is configured, the standard R install.packages and Python pip install commands will install from the correct repositories.

The main reason to install a base package set is to reduce duplicate package installs across users. Package sizes tend to be modest, so this is rarely an issue in practice.1

Should your organization decide to do server-wide package installs, they can be accomplished with standard installs in both R and Python, run with sudo. Packages must be installed separately for each version of R and Python.

For example, after SSH-ing into the server, an admin could run

$ sudo /opt/R/3.6.2/bin/R
followed by
> install.packages("my-pkg")

In Python, this would be done directly with the pip utility

$ sudo /opt/python/3.7.3/bin/pip install my-pkg
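Because the install has to be repeated for every interpreter version, a small helper can list the commands to run. This is a sketch, not a supported tool: the function name `print_install_commands` is hypothetical, the `/opt/R` and `/opt/python` roots simply follow the paths in the example above, and the package name and repository URL are placeholders. It only prints commands so that an admin can review them first. The repository is passed explicitly to `install.packages` because a non-interactive `Rscript` call may not have a default repository configured.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print (do not run) one install command per interpreter found under the
# given roots, so the commands can be reviewed before running them.
print_install_commands() {
  local r_root="$1" py_root="$2" pkg="$3" repo="$4"
  local bin
  for bin in "$r_root"/*/bin/Rscript; do
    [ -x "$bin" ] || continue   # skip when the glob matches nothing
    echo "sudo $bin -e 'install.packages(\"$pkg\", repos = \"$repo\")'"
  done
  for bin in "$py_root"/*/bin/pip; do
    [ -x "$bin" ] || continue
    echo "sudo $bin install $pkg"
  done
}

# ASSUMPTION: "mypkg" and the repository URL are placeholders.
print_install_commands /opt/R /opt/python mypkg "https://rspm.example.com/cran/latest"
```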

3. Manage Libraries#

Once admins have properly configured default repositories on RStudio Server Pro, normal package installs should just work.

Increasingly, data scientists snapshot and restore libraries on a per-project basis, which allows for project-level dependency isolation.

Project-level isolation can be achieved with the renv package in R and with virtualenv in Python.2

This virtual environment workflow makes it easy to create isolated environments on a per-project basis without doing repetitive package installs, and also allows for easy sharing of project dependencies across data scientists. For more information on how it works, see this page.
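A minimal sketch of that workflow from the command line might look like the following. It assumes renv is already installed for the R version in use, that the project has a `requirements.txt`, and that `.venv` is an acceptable directory name; the `run` wrapper and `DRY_RUN` variable are conveniences of this example. By default the commands are only printed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# With DRY_RUN=1 (the default here) commands are only printed, so the
# sketch can be tried without touching a real project.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# R: record the project's packages in a lockfile, then rebuild from it.
run Rscript -e 'renv::init()'       # once, when the project is set up
run Rscript -e 'renv::snapshot()'   # after adding or updating packages
run Rscript -e 'renv::restore()'    # on another machine, or by a colleague

# Python: keep packages in a project-local virtual environment.
# ASSUMPTION: requirements.txt already exists in the project directory.
run python3 -m venv .venv
run .venv/bin/pip install -r requirements.txt
```

Run it from the project directory with `DRY_RUN=0` to execute the commands instead of printing them.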

Frequently Asked Questions#

These are common questions from IT/admins about configuring package management for RStudio Team.

What about system requirements?

Most R and Python packages have no system dependencies other than the language itself.

However, some packages depend on separate external libraries.

One of the benefits of using RStudio Server Pro instances is that these system libraries only have to be installed a few times rather than on each user's laptop.

RStudio Package Manager provides a list of required system libraries and install commands at both the package and the repository level.

To see the requirements for an individual package, search for the package in the search bar.

To see the requirements for a whole repository, click on the setup tab for that repository and scroll down. Choose your OS to get the relevant install commands.

gif of getting system requirements for a repo

What if I need to validate packages into my environment?

RStudio Package Manager allows for the creation of curated package sets, which can be validated before they are made available to users. Details on how this works are in the RStudio Package Manager admin guide.

Admins then often wonder how to lock users into those package sets.

The best way to accomplish this is to:

  1. Set the right default repository in RStudio Server Pro
  2. Disallow access to other repositories as needed

Fully disallowing access to public repositories can be accomplished via networking rules. There is no way to completely prevent RStudio Server users from changing their repositories, but networking rules can prevent those other repositories from being reached.

It is also possible to prevent users from changing the installation repository through the RStudio GUI by setting allow-r-cran-repos-edit=0 in /etc/rstudio/rsession.conf.
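The sketch below shows one way to apply that setting without leaving duplicate lines behind. The helper name `set_conf_option` is hypothetical, and the script edits a scratch file by default; on a real server the target is `/etc/rstudio/rsession.conf` and the edit requires root.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Set key=value in a simple key=value config file, replacing any existing
# line for the same key (with or without spaces around "=").
set_conf_option() {
  local file="$1" key="$2" value="$3"
  touch "$file"
  grep -Ev "^${key}[[:space:]]*=" "$file" > "$file.tmp" || true
  echo "${key}=${value}" >> "$file.tmp"
  cat "$file.tmp" > "$file"   # overwrite in place to keep file permissions
  rm -f "$file.tmp"
}

# ASSUMPTION: a scratch file stands in for /etc/rstudio/rsession.conf.
CONF_FILE="${CONF_FILE:-$(mktemp)}"
set_conf_option "$CONF_FILE" allow-r-cran-repos-edit 0
cat "$CONF_FILE"
```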

What if my organization requires offline/air-gapped operations?

This is one of the reasons RStudio Package Manager exists.

RStudio Server Pro and RStudio Connect need access to a package repository to install packages. There are no other internet connectivity requirements for the products themselves.

Diagram of RStudio Team networking

If possible, we recommend allowing outbound access from RStudio Package Manager to the RStudio sync service, which makes syncing new packages easier. Where that is not possible, RStudio Package Manager includes utilities for fully offline operation.

Are there special considerations for RStudio Connect deployments?

When a deployment to RStudio Connect occurs, a package manifest is generated, whether implicitly (push-backed publishing) or explicitly (Git- or API-backed publishing). This manifest captures the current state of the packages in the deployment environment, including the original installation repository.

If deployments are happening exclusively from RStudio Server Pro, usually no further configuration is needed.

If deployments are happening from desktop environments, it is worthwhile to configure the RPackageRepository setting to point to a binary package repository for the server's OS on RStudio Package Manager.
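For illustration, RStudio Connect settings of this kind live in `/etc/rstudio-connect/rstudio-connect.gcfg`. The sketch below only prints a candidate stanza for review. The helper name `connect_repo_stanza` is hypothetical, and the URL (including the `bionic` distribution segment) is a placeholder to be replaced with the binary URL copied from your own RStudio Package Manager.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print an RPackageRepository stanza in RStudio Connect's gcfg format.
connect_repo_stanza() {
  local name="$1" url="$2"
  printf '[RPackageRepository "%s"]\nURL = %s\n' "$name" "$url"
}

# ASSUMPTION: placeholder host, repository name, and OS segment.
connect_repo_stanza CRAN "https://rspm.example.com/cran/__linux__/bionic/latest"
```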

Any system libraries that need to be installed on the RStudio Server Pro server will need to be installed on the RStudio Connect server as well.

Eagle-eyed users may notice that RStudio Connect uses packrat, renv's predecessor, to deploy R packages to RStudio Connect. This build of packrat is heavily customized, and using renv to manage project environments is entirely consistent with RStudio Connect's deployment process.

How do I get the repository URL from RStudio Package Manager?

Getting the right repository URL from RStudio Package Manager is a four-step process.

  1. Select the correct repo and navigate to the Setup tab.
  2. Switch the repo type from Source to Binary and select the correct OS.
  3. If needed, choose the date for the snapshot.
  4. Copy the URL from the box.
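Once the URL has been copied, a quick sanity check is to confirm that it points at a CRAN-style repository, which serves its package index at `src/contrib/PACKAGES` under the repository URL. The sketch below builds that index URL; the helper name `packages_index_url` is hypothetical and the repository URL is a placeholder. The network check is left commented out because it needs access from the server to the repository.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build the URL of the package index for a CRAN-style repository.
packages_index_url() {
  local repo_url="${1%/}"   # drop one trailing slash, if present
  echo "$repo_url/src/contrib/PACKAGES"
}

# ASSUMPTION: placeholder URL. Paste the one copied from the Setup tab.
REPO_URL="https://rspm.example.com/cran/latest/"
INDEX_URL="$(packages_index_url "$REPO_URL")"
echo "$INDEX_URL"

# Uncomment to test reachability from the server (needs network access):
# curl --fail --silent --head "$INDEX_URL" > /dev/null && echo "repository reachable"
```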

gif of choosing repository


Footnotes


  1. It used to be the case that binary R packages were unavailable on Linux, so package installs took a very long time on RStudio Server Pro and RStudio Connect and there were many compile-time package dependencies. Now that both the public and private RStudio Package Manager make these binaries available, these issues are much reduced. 

  2. There are many virtual environment managers in Python, and you should use the one that is standard for your organization. If your organization doesn't have a standard, we have seen virtualenv/venv work for many.