Using AWS Managed File Systems with RStudio Workbench
In multi-node configurations of RStudio Workbench, shared storage is a requirement for users’ Linux home directories and a convenience for users’ shared data. Commonly, this shared storage is provided by a mounted NFS volume, or AWS EFS (see Using Amazon EFS with RStudio Team). AWS FSx is another managed file storage service available on AWS as an alternative to EFS. It provides options to use Linux-compatible file systems such as Lustre and ZFS.
This article summarizes considerations and provides benchmarking results for AWS FSx for Lustre and AWS FSx OpenZFS against AWS EFS to assist teams in evaluating these options. See Using Amazon EFS (Elastic File System) with RStudio Team for EFS performance testing and configuration recommendations.
Overall, sharing user data and home directories worked well with both AWS FSx for Lustre and AWS FSx OpenZFS, and performance was comparable to EFS. Like EFS, AWS FSx OpenZFS is not compatible with RStudio Workbench’s Project Sharing functionality because it lacks support for access control lists (ACLs); AWS FSx for Lustre is compatible. Note that RStudio Workbench, including Project Sharing, is fully compatible with any file system that supports extended POSIX ACLs. Details on specific testing follow.
|File system||Performance||Provide shared user data||Provide shared home directories||Compatible with RSW Project Sharing|
|AWS FSx Lustre||✅||✅||✅||✅|
|AWS FSx OpenZFS||✅||✅||✅||❌|
Our Test Environment
For testing, we used an open source file system performance benchmarking tool called fsbench to evaluate FSx as a shared file system provider for RStudio implementations.
Our RStudio Workbench architecture for this benchmarking used two EC2 instances of type t2.large. FSx for Lustre and FSx OpenZFS were configured as follows:
|Type||AWS FSx Lustre||AWS FSx OpenZFS|
|Throughput||50 MB/s||64 MB/s|
|Other||Lustre Version: 2.10||Provisioned IOPS: Automatic, Deployment Type: Single AZ|
Using FSx to provide user data in RStudio Workbench
One reason you may want to use FSx is to serve user data in RStudio Workbench. User data in RStudio Workbench is any data that individuals need to complete their specific tasks. This can include files that would otherwise be shared through a file-sharing service (Google Drive, SharePoint, etc.), or any data that is not accessed through a database. You may use FSx mounts to provide external storage space for user data. For this use case, RStudio Workbench performance depends on the FSx settings. If you encounter any issues while trying to access data on FSx with this setup, please reach out to AWS Support.
Using FSx to provide shared home directories in RStudio Workbench
Another reason you may want to use FSx is to provide user home directories, especially in a High Availability configuration of Workbench, where shared user home directories across the cluster are a requirement. RStudio Workbench uses the user home directory as the location for all configuration and project files. The home directory is essential for managing user configuration files and maintaining a consistent experience across the cluster. In this scenario, RStudio's performance depends on the kind of shared storage that backs the home directories. In our testing, FSx for Lustre and FSx OpenZFS performed similarly when installing R packages and when reading and writing files.
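When checking that every node in the cluster really serves home directories from the shared volume, it can help to look up which mount backs a given path. The following is a minimal sketch, not part of our testing: the function name `fs_type_for_path` is hypothetical, the input is assumed to be in the Linux `/proc/mounts` format, and mount points containing spaces (escaped as `\040` in that file) are not handled.

```python
import os


def fs_type_for_path(path, mounts_text):
    """Return (mount_point, fs_type) for the mount that contains `path`.

    `mounts_text` is text in /proc/mounts format:
    "<device> <mount_point> <fs_type> <options> ...", one mount per line.
    The longest matching mount point wins. Returns None if nothing matches.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # Compare with trailing slashes so "/home" does not match "/homework".
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type)
    return best
```

On a Workbench node you might call `fs_type_for_path(os.path.expanduser("~"), open("/proc/mounts").read())` and compare the result across nodes. An FSx for Lustre mount is expected to report the type `lustre`, while EFS and FSx OpenZFS are mounted over NFS and typically report `nfs` or `nfs4`.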
Project Sharing in RStudio Workbench
Project Sharing is a feature of RStudio Workbench that enables users to work together on RStudio Projects. To use Project Sharing, the directories hosting the projects to be shared must be on a volume that supports Access Control Lists (ACLs). RStudio uses ACLs to grant collaborators access to shared projects; ordinary file permissions are not modified.
Based on our findings, EFS and FSx OpenZFS do not work with Project Sharing because they lack support for ACLs. FSx for Lustre does support ACLs and is compatible with Project Sharing in RStudio Workbench.
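Before enabling Project Sharing on a volume, you may want to confirm that an ACL entry can actually be set there. The sketch below is one way to probe this, under stated assumptions: it relies on the standard Linux `setfacl` command (from the `acl` package) being installed, the function name `supports_posix_acls` is hypothetical, and it only tests POSIX ACLs on a throwaway file. It is not the check RStudio Workbench itself performs.

```python
import os
import shutil
import subprocess
import tempfile


def supports_posix_acls(directory):
    """Try to set a POSIX ACL entry on a temporary file in `directory`.

    Returns True if `setfacl` succeeds, False if it fails, and None when
    the `setfacl` command is not installed (the check cannot be run).
    """
    if shutil.which("setfacl") is None:
        return None
    # Create a throwaway file on the volume under test; it is removed
    # automatically when the `with` block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as probe:
        result = subprocess.run(
            ["setfacl", "-m", f"u:{os.getuid()}:rwx", probe.name],
            capture_output=True,
            text=True,
        )
        return result.returncode == 0
```

For example, `supports_posix_acls("/mnt/fsx/projects")` would probe a hypothetical FSx mount point; substitute the directory that will host shared projects. A result of `False` suggests the volume, or the way it is mounted, does not accept POSIX ACL entries.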
It is important to note that the setup we used for this testing was minimal and might not reflect the architecture (EC2 instance types, FSx throughput, etc.) that you are considering. We strongly encourage you to perform your own benchmarking and acceptance testing. You can use fsbench, the same tool we used for this testing.
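For a quick sanity check before running a full benchmark, a simple write-then-read timing can be sketched in a few lines. This is not fsbench and does not reproduce our results: the function name `time_write_read` and the default size are illustrative assumptions, and the read timing may be served from the operating system's page cache rather than from the file system itself.

```python
import os
import tempfile
import time


def time_write_read(directory, size_mb=16):
    """Write and then read back a temporary file of `size_mb` MiB in `directory`.

    Returns a dict with the elapsed write and read times in seconds and the
    number of bytes read back.
    """
    chunk = os.urandom(1024 * 1024)  # 1 MiB of random data
    with tempfile.NamedTemporaryFile(dir=directory, delete=False) as f:
        path = f.name
        start = time.perf_counter()
        for _ in range(size_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # push data to the file system before stopping the clock
        write_seconds = time.perf_counter() - start
    try:
        start = time.perf_counter()
        bytes_read = 0
        with open(path, "rb") as f:
            while True:
                block = f.read(1024 * 1024)
                if not block:
                    break
                bytes_read += len(block)
        read_seconds = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the temporary file
    return {
        "write_seconds": write_seconds,
        "read_seconds": read_seconds,
        "bytes_read": bytes_read,
    }
```

Running the function against a local directory and against the FSx mount gives a rough first comparison; treat the numbers as indicative only and rely on a dedicated tool for acceptance testing.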