Managing Packages in RStudio Team
Managing open source packages for data science work spans several different environments and, often, several teams, which makes package management hard to understand deeply.
This page is designed to help teams figure out how to manage the packages in their environment with a minimum of conceptual learning.
Package Management Overview
Packages are installed from repositories into libraries.
In RStudio Team, there are three components to managing packages:
- An IT/admin configures RStudio Package Manager as the centralized package repository for RStudio Team (or chooses not to)
- An IT/admin configures the default package settings on RStudio Workbench
- Individual data scientists manage the package libraries for their particular projects
Most teams find that adopting this model simplifies package management for admins and data scientists alike, including in environments with strong security and validation requirements.
1. Configuring Repositories
RStudio Package Manager, one of the components of RStudio Team, is RStudio's repository for R and Python packages.
A private RStudio Package Manager instance is a requirement to successfully run RStudio Team when:
- The environment is offline or air-gapped so RStudio Workbench and RStudio Connect will not have direct internet access to public RStudio Package Manager
- Packages must be validated into the environment
- Data scientists are developing private packages for internal use
In most organizations, RStudio Package Manager is configured and administered by an IT/admin who has SSH access to the server. In some teams, an IT/admin sets up the RStudio Package Manager server and data scientists are responsible for managing the actual package sets present.
RStudio Package Manager can host one or more repositories that include public CRAN packages and private packages, as well as Bioconductor and PyPI repositories. Many organizations are unsure which repository configuration is right for them; the flow chart below is designed to help them decide.
[Flow chart: choosing a repository configuration in RStudio Package Manager]
2. Set RStudio Workbench Defaults
Setting a Default Repository on RStudio Workbench
Once RStudio Package Manager is configured, server admins should set it as the default repository (or repositories) on RStudio Workbench.
For details on how to set the default repository in RStudio Workbench, see this article.
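A minimal sketch of setting both defaults from a shell script follows. It assumes RStudio Workbench reads default R repositories from a repos.conf file in its configuration directory (one NAME=URL line per repository, CRAN first) and that pip honors a system-wide /etc/pip.conf. The rspm.example.com URLs and the write_repo_defaults function name are hypothetical placeholders. Passing a scratch directory as ROOT lets you inspect the output before writing to the real /etc.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical Package Manager URLs; replace them with the URLs copied from
# your own RStudio Package Manager instance.
RSPM_CRAN_URL="${RSPM_CRAN_URL:-https://rspm.example.com/cran/__linux__/focal/latest}"
RSPM_PYPI_URL="${RSPM_PYPI_URL:-https://rspm.example.com/pypi/latest/simple}"

# write_repo_defaults ROOT
# Writes the R and Python repository defaults under ROOT ("/" on a real
# server, or a scratch directory for testing).
write_repo_defaults() {
  local root="${1:-/}"
  root="${root%/}"   # "/" becomes "", so paths resolve to /etc/...
  mkdir -p "$root/etc/rstudio"
  # Assumed repos.conf format: one NAME=URL pair per line, CRAN first.
  printf 'CRAN=%s\n' "$RSPM_CRAN_URL" > "$root/etc/rstudio/repos.conf"
  # pip's system-wide configuration file.
  printf '[global]\nindex-url = %s\n' "$RSPM_PYPI_URL" > "$root/etc/pip.conf"
}
```

On a real server you would run write_repo_defaults / as root, then open a new session and check getOption("repos") in R and pip config list in a terminal.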
Installing Base Package Sets
Admins frequently ask whether they can install base package sets for all users.
This is possible, but is usually unnecessary.
Once an appropriate default repository is configured, the standard R install.packages() and Python pip install commands will install from the correct repositories.
The main reason to install a base package set is to reduce duplicate package installs across users. Package sizes tend to be modest, so this duplication is rarely an issue in practice.[1]
Should your organization decide to do server-wide package installs, they can be accomplished by running the standard R and Python install commands with sudo. Packages must be installed separately for each version of R and Python.
For example, after SSH-ing into the server, an admin could start R as root and install from the R console:
$ sudo /opt/R/3.6.2/bin/R
> install.packages("mypkg")
In Python, this is done directly with pip:
$ sudo /opt/python/3.7.3/bin/pip install my-pkg
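Because packages must be installed once per version of R and Python, it can help to script the loop when a server has several versions. The sketch below assumes the /opt/R/<version> and /opt/python/<version> layout from the examples above; install_everywhere and the DRY_RUN switch are hypothetical names, not part of any RStudio tool.

```shell
#!/usr/bin/env bash
set -euo pipefail

# install_everywhere PKG_R PKG_PY [R_ROOT] [PY_ROOT]
# Installs PKG_R into every R version under R_ROOT and PKG_PY into every
# Python version under PY_ROOT. Set DRY_RUN=1 to print the commands instead.
install_everywhere() {
  local pkg_r="$1" pkg_py="$2"
  local r_root="${3:-/opt/R}" py_root="${4:-/opt/python}"
  local run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"   # prefix each command with echo instead of running it
  fi

  local rscript pip
  for rscript in "$r_root"/*/bin/Rscript; do
    [ -x "$rscript" ] || continue   # skip if the glob matched nothing
    $run "$rscript" -e "install.packages('$pkg_r')"
  done
  for pip in "$py_root"/*/bin/pip; do
    [ -x "$pip" ] || continue
    $run "$pip" install "$pkg_py"
  done
}
```

Running it first with DRY_RUN=1 prints the commands it would execute, which is a cheap way to confirm which versions it found before running it for real under sudo.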
3. Manage Libraries
Once admins have properly configured default repositories on RStudio Workbench, normal package installs should just work.
Increasingly, data scientists snapshot and restore libraries on a per-project basis, which isolates each project's dependencies.
This virtual environment workflow makes it easy to create an isolated environment for each project without repetitive package installs, and lets data scientists share a project's dependencies with one another. For more information on how it works, see this page.[2]
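In R, this workflow typically uses renv::snapshot() and renv::restore(). For Python, a minimal shell sketch of the same snapshot-and-restore idea uses the standard library's venv module and a pip requirements file. The function names, the .venv directory, and the requirements.txt filename are illustrative conventions assumed here, not requirements of RStudio Team.

```shell
#!/usr/bin/env bash
set -euo pipefail

# snapshot_env PROJECT_DIR
# Records the packages installed in PROJECT_DIR/.venv into requirements.txt.
snapshot_env() {
  local dir="$1"
  "$dir/.venv/bin/python" -m pip freeze > "$dir/requirements.txt"
}

# restore_env PROJECT_DIR [PYTHON]
# Creates PROJECT_DIR/.venv if it is missing, then installs the recorded
# packages. pip resolves them against the server's default index.
restore_env() {
  local dir="$1" python="${2:-python3}"
  [ -d "$dir/.venv" ] || "$python" -m venv "$dir/.venv"
  # Only call pip when there is something recorded to install.
  if [ -s "$dir/requirements.txt" ]; then
    "$dir/.venv/bin/python" -m pip install -r "$dir/requirements.txt"
  fi
}
```

A data scientist would run restore_env on a fresh checkout, install packages as usual with .venv/bin/pip, and run snapshot_env before committing. Only requirements.txt needs to be shared, since the environment itself is rebuilt from it.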
Frequently Asked Questions
These are common questions from IT/Admins about configuring package management for RStudio Team.
What about system requirements?
Most R and Python packages have no system dependencies other than the language itself.
However, some packages depend on separate external libraries.
One of the benefits of using RStudio Workbench is that these system libraries only need to be installed on the server, rather than on each user's laptop.
RStudio Package Manager provides a list of required system libraries and install commands at both the package and the repository level.
To see the requirements for an individual package, search for the package in the search bar.
To see the requirements for a whole repository, click on the setup tab for that repository and scroll down. Choose your OS to get the relevant install commands.
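If you want to script this lookup, the sketch below builds a query URL for Package Manager's system requirements API. The /__api__/repos/<id>/sysreqs endpoint and its query parameters are an assumption recalled from the Package Manager API documentation, not something this page specifies; the server URL, repository ID, distribution, and release are placeholders, and sysreqs_url is a hypothetical helper.

```shell
#!/usr/bin/env bash
set -euo pipefail

# sysreqs_url BASE_URL REPO_ID DISTRIBUTION RELEASE [PACKAGE]
# Builds a system-requirements query URL for one package, or for the whole
# repository when PACKAGE is omitted.
# ASSUMPTION: endpoint path and parameter names; verify them against the API
# documentation for your Package Manager version.
sysreqs_url() {
  local base="${1%/}" repo_id="$2" distro="$3" release="$4" pkg="${5:-}"
  local url="$base/__api__/repos/$repo_id/sysreqs?distribution=$distro&release=$release"
  if [ -n "$pkg" ]; then
    printf '%s&pkgname=%s\n' "$url" "$pkg"
  else
    printf '%s&all=true\n' "$url"
  fi
}
```

For example, curl -s "$(sysreqs_url https://rspm.example.com 1 ubuntu 20.04 xml2)" would fetch the requirements for a single package, assuming the endpoint behaves as described.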
What if I need to validate packages into my environment?
RStudio Package Manager allows for the creation of curated package sets, which can be validated before they are made available to users. Details on how this works are in the RStudio Package Manager admin guide.
Admins then often wonder how to lock users into those package sets.
The best way to accomplish this is to:
- Set the right default repository in RStudio Workbench
- Disallow access to other repositories as needed
Fully disallowing access to public repositories can be accomplished with networking rules. There is no way to stop RStudio Workbench users from changing their configured repositories (for example, from the R console), but networking rules can prevent other repositories from being reached.
It is also possible to prevent users from changing the installation repository in the RStudio GUI by setting the allow-r-cran-repos-edit=0 option.
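A sketch of applying that setting idempotently from a script is below. It assumes the option belongs in the RStudio Workbench session configuration file, taken here to be /etc/rstudio/rsession.conf; confirm the file against the admin guide for your version. The disable_repo_edit name is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# disable_repo_edit [CONF_FILE]
# Sets allow-r-cran-repos-edit=0 in CONF_FILE, replacing any existing value so
# the script can be re-run safely.
disable_repo_edit() {
  local conf="${1:-/etc/rstudio/rsession.conf}"
  touch "$conf"
  # Drop any previous setting (with or without spaces around '=').
  # grep exits 1 when it selects no lines, which is not an error here.
  grep -v '^[[:space:]]*allow-r-cran-repos-edit[[:space:]]*=' "$conf" > "$conf.tmp" || true
  printf 'allow-r-cran-repos-edit=0\n' >> "$conf.tmp"
  # Copy back over the original so its ownership and permissions are kept.
  cat "$conf.tmp" > "$conf"
  rm -f "$conf.tmp"
}
```

Sessions that are already running may need to be restarted before the change takes effect, for example with sudo rstudio-server restart.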
What if my organization requires offline or air-gapped operation?
This is one of the reasons RStudio Package Manager exists.
RStudio Workbench and RStudio Connect need access to a package repository to install packages. There are no other internet connectivity requirements for the products themselves.
If possible, we recommend allowing outbound access from RStudio Package Manager to the RStudio sync service, which makes syncing new packages easier. RStudio Package Manager also includes utilities for fully offline operation.
Are there special considerations for RStudio Connect deployments?
When content is deployed to RStudio Connect, a package manifest is generated, either implicitly (push-button publishing) or explicitly (git-backed or API-backed publishing). This manifest captures the current state of the packages in the deployment environment, including the repository each package was originally installed from.
If deployments are happening exclusively from RStudio Workbench, usually no further configuration is needed.
If deployments are happening from desktop environments, it is worthwhile to configure RStudio Connect's package repository setting to point to a binary package repository for the server's OS on RStudio Package Manager.
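As a hedged illustration, the sketch below appends a CRAN repository override to an RStudio Connect configuration file. The [RPackageRepository "CRAN"] section, its URL key, and the default path /etc/rstudio-connect/rstudio-connect.gcfg are assumptions recalled from the RStudio Connect admin guide rather than details given on this page, and the repository URL is a placeholder; check the guide for your Connect version before relying on it. add_connect_cran_override is a hypothetical helper.

```shell
#!/usr/bin/env bash
set -euo pipefail

# add_connect_cran_override GCFG_FILE URL
# Appends an R repository override section to an RStudio Connect config file.
# ASSUMPTION: section name, key name, and file format as described above.
add_connect_cran_override() {
  local gcfg="$1" url="$2"
  # Refuse to add a second copy of the section; edit an existing one by hand.
  if grep -qF '[RPackageRepository "CRAN"]' "$gcfg" 2>/dev/null; then
    echo "CRAN override already present in $gcfg; edit it by hand" >&2
    return 1
  fi
  printf '\n[RPackageRepository "CRAN"]\nURL = %s\n' "$url" >> "$gcfg"
}
```

Appending rather than rewriting the file keeps the rest of the Connect configuration untouched, and the duplicate check avoids silently creating conflicting sections.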
Any system libraries that need to be installed on RStudio Workbench will need to be installed on the RStudio Connect server as well.
Eagle-eyed users may notice that RStudio Connect uses packrat, renv's predecessor, to deploy R packages. This build of packrat is heavily customized, and using renv to manage project environments is entirely consistent with RStudio Connect's deployment process.
How do I get the repository URL from RStudio Package Manager?
Getting the right repository URL from RStudio Package Manager is a four-step process:
- Select the correct repo and navigate to the Setup tab.
- Switch the repo type from Source to Binary and select the correct OS.
- If needed, choose the date for the snapshot.
- Copy the URL from the box.
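For scripted setups, the URL from the last step can also be assembled by hand. The sketch below assumes the <server>/<repo>/__linux__/<os-codename>/<snapshot> layout that binary URLs on the Setup page commonly follow, where the snapshot is either latest or a date; rspm_binary_url is a hypothetical helper, and copying the URL from the page remains the reliable option.

```shell
#!/usr/bin/env bash
set -euo pipefail

# rspm_binary_url BASE_URL REPO OS_CODENAME [SNAPSHOT]
# Builds a Linux binary repository URL. SNAPSHOT defaults to "latest"; pass a
# date such as 2021-06-01 to pin a snapshot.
# ASSUMPTION: the URL layout described above; compare with the Setup page.
rspm_binary_url() {
  local base="${1%/}" repo="$2" codename="$3" snapshot="${4:-latest}"
  printf '%s/%s/__linux__/%s/%s\n' "$base" "$repo" "$codename" "$snapshot"
}
```

Pinning a dated snapshot rather than latest trades automatic updates for reproducibility, which matters most for validated environments.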
[1] It used to be the case that binary R packages were unavailable on Linux, so package installs took a very long time on RStudio Workbench and RStudio Connect, and many packages had compile-time dependencies. Now that both public and private RStudio Package Manager make these binaries available, these issues are much reduced.
[2] There are many virtual environment managers in Python, and you should use the one that is standard for your organization. If your organization doesn't have a standard, we have seen venv work for many teams.