3. User Environments


In this section, you will learn:


Environment Needs from Data Scientists

Administrators will need to partner with the data science team to establish and maintain the environments needed for development in Workbench. Initial requirements will include:

  • Having one or more versions of the R or Python programming language installed on the server
  • Establishing source(s) for downloading packages
  • Establishing location(s) for installed packages
  • Ensuring any underlying system dependencies required for package installation or runtime are installed on the server

As new versions of R, Python, and packages are released, the data science team may need them added to the server, while previous environments are kept in place to support historical work.

Workbench allows you to serve multiple versions of R and Python. When you need a new version of R or Python, add it alongside the existing versions rather than replacing them. This lets developers choose which version to run for each session and ensures that scripts that depend on a specific version will still run.
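As a rough illustration of keeping versions side by side, the sketch below lists the version directories found under an install root. It assumes each version lives in its own directory named after the version number (for example /opt/python/3.10.4); the /opt/R root and the helper name installed_versions are assumptions made for this example, not part of Workbench.

```python
from pathlib import Path


def installed_versions(root):
    """Return version-named directories directly under root, sorted numerically."""
    root = Path(root)
    if not root.is_dir():
        return []

    def version_key(name):
        # Split "3.10.4" into (3, 10, 4) so that 3.10 sorts after 3.9.
        return tuple(int(part) for part in name.split(".") if part.isdigit())

    names = [p.name for p in root.iterdir() if p.is_dir() and p.name[0].isdigit()]
    return sorted(names, key=version_key)


if __name__ == "__main__":
    # Assumed install roots; adjust them to match your server.
    for root in ("/opt/R", "/opt/python"):
        print(root, installed_versions(root))
```

Running it before and after an install is a quick way to confirm that a new version was added without removing an existing one.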

Adding new versions of R and Python

We recommend you install desired versions of R and Python on Workbench by following the instructions here:

Managing R and Python packages

The strength of the R and Python languages lies in the packages that extend their core functionality. Packages are specialized modules of code built for a specific language. Packages frequently depend on other packages and sometimes have system-level dependencies. Packages are updated independently of R and Python releases.

To provide packages to your users, you will need to partner with the data science team to address two key questions:

  • Where are packages installed from? (Repository)
  • Where do installed packages go on the server? (Library)

Repositories: Where packages are installed from

Repositories are file servers with a defined structure for R or Python packages. A repository may be public or hosted privately within your organization. Common sources for repositories include:

In general, a repository should be comprehensive, offering many packages and many package versions, to service the needs of developers.

R Package Installation

Users typically install R packages inside an R session using a function like install.packages().

Users can specify the repository to install from in the function call. If none is specified, the package will be installed from the first repository configured on the server in which the package is found. You can list the repositories configured on the server by running the following in R:

options('repos')

As the administrator, you can configure the default repositories for R. You may choose to do this because:

  • You have your own internal or validated repository (e.g., Posit Package Manager)
  • You want users to install pre-compiled package binaries from Posit Public Package Manager or an on-premise Posit Package Manager (binaries install significantly faster and avoid compilation errors caused by missing system dependencies)

There are three options for where default R repositories can be configured on Workbench:

File | Applies to… | Use
/etc/rstudio/rsession.conf | All RStudio Pro sessions on the server | Configuring a single default repo server-wide
/etc/rstudio/repos.conf | All RStudio Pro sessions on the server | Configuring multiple default repos server-wide
Rprofile.site or Renviron.site | All R sessions for a specific version of R | Configuring different repos per R version or setting a default repo per version of R across all R sessions

Warning

Users can override where their package installs come from in three places:

  1. Using their own .Rprofile.
  2. Changing settings in the RStudio Pro IDE.
  3. Changing the value in the R console.

The only way to definitively control where users can download packages from is by restricting the Workbench server’s connectivity and only allowing access to an on-premise repository such as Posit Package Manager.
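To make the two server-wide files from the table above more concrete, here is a minimal sketch that renders their contents from a list of repositories. It assumes that rsession.conf takes a single r-cran-repos=URL line and that repos.conf takes one name=URL line per repository; check both against the Workbench administration guide for your version. The URLs and function names are placeholders invented for this example.

```python
def render_rsession_conf(default_repo_url):
    """Render the single-repository line for /etc/rstudio/rsession.conf."""
    return f"r-cran-repos={default_repo_url}\n"


def render_repos_conf(repos):
    """Render name=url lines for /etc/rstudio/repos.conf, preserving order."""
    return "".join(f"{name}={url}\n" for name, url in repos.items())


# Placeholder URLs: substitute the repositories your organization actually uses.
repos = {
    "CRAN": "https://packagemanager.example.com/cran/latest",
    "Internal": "https://packagemanager.example.com/internal/latest",
}

print(render_rsession_conf(repos["CRAN"]), end="")
print(render_repos_conf(repos), end="")
```

Generating these files from one list keeps the single-repo and multi-repo settings consistent if you maintain both.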

Python packages

pip is a package manager for Python that handles the installation of packages. In contrast to R, pip is called from the terminal rather than from within the Python REPL. The default repository for installations is set in the pip.conf file, which can exist in several locations:

  • /etc/pip.conf for global settings
  • /opt/python/3.10.4/pip.conf for site settings
  • /home/$USER/.pip/pip.conf for user-specific settings
  • /home/$USER/.config/pip/pip.conf for user-specific settings

You can see these settings and confirm their location by running:

pip config -v list

You may want to customize the install source for Python for similar reasons to those listed for R above.
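Because pip.conf can exist at several levels, it helps to see how an override plays out. The sketch below parses pip.conf-style text with the standard library and reports which index-url remains once the files are applied in order. It assumes that a user-level file overrides the global one, which matches pip's documented behaviour as far as we know; confirm with pip config -v list on your server. The repository URLs and the function name are placeholders.

```python
import configparser


def effective_index_url(config_texts):
    """Return the index-url left after applying pip.conf contents in order.

    config_texts is ordered from lowest to highest precedence, so later
    entries override earlier ones.
    """
    index_url = None
    for text in config_texts:
        parser = configparser.RawConfigParser()
        parser.read_string(text)
        if parser.has_option("global", "index-url"):
            index_url = parser.get("global", "index-url")
    return index_url


# Placeholder contents for /etc/pip.conf and a user-level pip.conf.
global_conf = "[global]\nindex-url = https://packagemanager.example.com/pypi/latest/simple\n"
user_conf = "[global]\nindex-url = https://pypi.org/simple\n"

print(effective_index_url([global_conf, user_conf]))
```

Here the user-level value wins, which is why a server-wide pip.conf on its own does not prevent users from pointing pip elsewhere.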

Libraries: Where packages are installed to

The package installation process downloads a package from the repository and places it in a library for use. Where a repository is ideally comprehensive, with many packages and versions, a library is deliberately narrower. For any one version of R or Python, a library can hold only one version of a given package.

On the Workbench server, a library can exist at the system level, at the user level, and even at the project level. Determining the appropriate strategy for how and where packages are installed requires partnership with the data science team. Several successful patterns are described on the Solutions Site under the Environments Management Strategy Map.

A system library makes packages available server-wide after installation by a user with sudo privileges. The library is specific to a version of R or Python, and it can hold only one version of a given package. A few pros and cons of having packages in a system library:

Pros:

  • Reduces duplication of popular packages

  • Provides a base environment of working packages

  • Good for locked-down environments and more static environments

Cons:

  • Can be time-intensive for admin to manage

  • Challenging to upgrade safely

  • Unlikely to meet all package needs with one library as requirements vary across users, projects, and time

Unless access to external package repositories is locked down, users will be able to install packages in addition to those in the system library. By default, these packages are installed into the user library in the user's home directory. For example, in R the user library path might look like:

/home/user_name/R/x86_64-pc-linux-gnu-library/4.2

For Python it might look like:

/home/user_name/.local/lib/python3.10/site-packages

In a running R session you can see where R packages will be installed by running .libPaths(). By default, R will install a package into the first directory given by .libPaths() that it can write to.

In Python, you can find the path where a package is installed by running pip show package_name.
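For a Python counterpart to .libPaths(), the following sketch prints the environment-wide and per-user install locations of the interpreter that runs it, and looks up where one installed package was found. It uses only the standard library; the helper name install_location is made up for this example, and pip is just an example package that may not be present in every environment.

```python
import site
import sysconfig
from importlib import metadata

# Environment-wide location for pure-Python packages in this interpreter.
print("site-packages:", sysconfig.get_paths()["purelib"])

# Per-user location, used for example by `pip install --user`.
print("user site-packages:", site.getusersitepackages())


def install_location(package_name):
    """Return the directory an installed package was found in, or None."""
    try:
        dist = metadata.distribution(package_name)
    except metadata.PackageNotFoundError:
        return None
    return str(dist.locate_file(""))


print("pip found in:", install_location("pip"))
```

Run it with each Python version on the server, since every version has its own set of library paths.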

Project libraries enable data scientists to better isolate projects. The developer can use a package such as renv for R or venv or virtualenv for Python to manage project-specific libraries.
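As a small illustration of a project library on the Python side, this sketch creates a virtual environment with the standard library's venv module. A temporary directory stands in for a real project folder, and the .venv directory name is a common convention rather than a requirement.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir):
    """Create a virtual environment in <project_dir>/.venv and return its path."""
    env_dir = Path(project_dir) / ".venv"
    # with_pip=False keeps this sketch fast and offline; use True for real projects.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    return env_dir


# A temporary directory stands in for a real project folder.
with tempfile.TemporaryDirectory() as project_dir:
    env_dir = create_project_env(project_dir)
    print("created:", (env_dir / "pyvenv.cfg").exists())
```

Packages installed with the environment's own pip go into a site-packages directory inside .venv, separate from the system and user libraries.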

System Dependencies

Some packages require specific operating system or third-party software. If that software is missing, package installation will fail until you install the dependencies. There are a few ways to discover which dependencies a package needs:

  • Review requirements listed in package documentation

  • Check system dependencies listed for the package in Posit Package Manager

  • Install the package and review the error for any insight on missing dependencies

If you or a user are having trouble installing a package, confirming that all dependencies are installed is a good place to start troubleshooting. Using pre-compiled package binaries from Posit Public Package Manager or an on-premise Posit Package Manager repository avoids compilation errors caused by missing system dependencies, although dependencies needed at runtime must still be present on the server.
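One way to check for system libraries before installing a package is sketched below. It asks the platform's library lookup whether each named shared library can be found, using ctypes.util.find_library from the standard library. The library names are examples only, and the check covers shared libraries rather than the development headers that compiling from source may also need.

```python
import ctypes.util


def missing_system_libraries(library_names):
    """Return the names from library_names that cannot be located on this system."""
    return [name for name in library_names if ctypes.util.find_library(name) is None]


# Example names only; take the real list from the package's documentation.
candidates = ["xml2", "curl", "ssl"]
print("missing:", missing_system_libraries(candidates))
```

An empty result does not guarantee that installation will succeed, but a non-empty one points at what to install first.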

Lab 2

🚀 Launch the exercise environment!

In the exercise environment you will get experience:

  • adding new versions of R and Python

  • configuring default sources for package installs


