
Using FSx for Lustre with RStudio professional products

Amazon FSx is a popular AWS file storage service that can be used as an alternative to Amazon EFS. One of its offerings, FSx for Lustre, provides a Linux-compatible file system based on the open source Lustre project and can be used in place of EFS or standard NFS.

More and more of our clients are considering, or already using, FSx for Lustre to provide shared storage in AWS-based deployments. As with our recommendations on using EFS, these clients are looking for guidance on whether FSx can be used with RStudio professional products. In this article we describe our testing of FSx for Lustre with fsbench, an open source file system performance benchmarking tool, and whether FSx can be used as the shared file system for RStudio deployments.

The RStudio Workbench architecture we used for this benchmarking consists of two t2.large EC2 instances and a general-purpose FSx for Lustre file system with the following configuration:

  • Deployment type: Persistent
  • Storage type: SSD
  • Throughput: 50 MB/s/TiB
  • Storage capacity: 1.2 TiB
  • Lustre version: 2.10
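If you want to reproduce this configuration programmatically, it maps roughly onto the create_file_system call in the AWS SDK for Python (boto3). The sketch below is an illustration, not the exact setup we used. The subnet and security group IDs are hypothetical placeholders. Treating "Persistent" as the PERSISTENT_1 deployment type is an assumption, based on 50 MB/s/TiB being a PERSISTENT_1 throughput tier. Check the current AWS documentation for supported Lustre versions. The sketch also computes the aggregate baseline throughput, which is storage capacity (TiB) multiplied by per-unit throughput. For this setup that comes to about 60 MB/s, shared by both EC2 instances.

```python
"""Sketch of the benchmark's FSx for Lustre settings as boto3 arguments.

Assumptions: PERSISTENT_1 deployment type; subnet and security group IDs are
placeholders. Nothing here calls AWS. To create the file system you would pass
the dict to boto3.client("fsx").create_file_system(**params).
"""

STORAGE_CAPACITY_GIB = 1200  # 1.2 TiB; PERSISTENT_1 SSD starts at 1200 GiB
PER_UNIT_THROUGHPUT = 50     # MB/s per TiB of storage


def lustre_params(subnet_id: str, security_group_id: str) -> dict:
    """Build create_file_system keyword arguments matching the tested setup."""
    return {
        "FileSystemType": "LUSTRE",
        "FileSystemTypeVersion": "2.10",  # Lustre version used in the benchmark
        "StorageType": "SSD",
        "StorageCapacity": STORAGE_CAPACITY_GIB,
        "SubnetIds": [subnet_id],
        "SecurityGroupIds": [security_group_id],
        "LustreConfiguration": {
            "DeploymentType": "PERSISTENT_1",  # assumed from "Persistent"
            "PerUnitStorageThroughput": PER_UNIT_THROUGHPUT,
        },
    }


def baseline_throughput_mb_s(capacity_tib: float, per_unit_mb_s_tib: float) -> float:
    """Aggregate baseline throughput = storage capacity (TiB) x per-unit throughput."""
    return capacity_tib * per_unit_mb_s_tib


if __name__ == "__main__":
    print(lustre_params("subnet-PLACEHOLDER", "sg-PLACEHOLDER"))
    print(f"{baseline_throughput_mb_s(1.2, PER_UNIT_THROUGHPUT):.0f} MB/s")  # 60 MB/s
```

The low baseline throughput is one reason the results below should be read as a functional check rather than a sizing guide.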

Note: For this benchmarking we considered only FSx for Lustre, not the other FSx variants. At the time of this testing, FSx for Lustre was the only Linux-compatible FSx option, so we cannot provide guidance on the other FSx file system options.

Using FSx to store user data in RStudio Workbench

If you only need FSx to provide storage space for your users' data, it is fine to use FSx mounts in your RStudio deployment. In this setup RStudio does not use FSx to share its internal metadata, so RStudio's performance does not depend on your FSx settings. Any issues accessing data on FSx in this setup fall under the purview of AWS and of how your FSx file system is configured.
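Mounting FSx for Lustre on a Workbench node uses the standard Lustre client syntax shown in the AWS documentation, mount -t lustre -o relatime,flock <dns_name>@tcp:/<mount_name> <mount_point>. The sketch below only assembles that command. It assumes the Lustre client is already installed on the instance. The DNS name, mount name, and /mnt/fsx-data mount point are hypothetical placeholders.

```python
import shlex


def lustre_mount_command(dns_name: str, mount_name: str, mount_point: str) -> list[str]:
    """Return argv for mounting an FSx for Lustre file system (run it with sudo)."""
    return [
        "mount", "-t", "lustre",
        "-o", "relatime,flock",           # options used in AWS's FSx for Lustre examples
        f"{dns_name}@tcp:/{mount_name}",  # file system DNS name + FSx mount name
        mount_point,
    ]


if __name__ == "__main__":
    # Hypothetical values; take the real ones from the FSx console or API.
    cmd = lustre_mount_command(
        "fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com", "abcdef12", "/mnt/fsx-data"
    )
    print("sudo " + shlex.join(cmd))
```

Returning an argument list rather than a single string lets you pass it straight to subprocess.run without shell quoting concerns.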

Using FSx to provide shared home directories in RStudio Workbench

Another reason to use FSx is to provide user home directories. This is especially relevant in a high availability (HA) configuration of Workbench, which requires user home directories to be shared across the cluster. In this scenario RStudio's performance depends on the shared storage that hosts the home directories, so it is important to understand how FSx compares with the standard EFS option for R-related operations.
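For shared home directories, every node in the Workbench cluster has to mount the same file system at the same path when it boots. A minimal sketch that renders an /etc/fstab entry follows. Its mount options mirror AWS's example for mounting FSx for Lustre automatically. Mounting directly at /home is an assumption about your layout, and the DNS name and mount name are hypothetical.

```python
FSTAB_OPTIONS = [
    "defaults", "relatime",
    "flock",                               # file locks coherent across all client nodes
    "_netdev",                             # treat as a network mount (wait for networking)
    "x-systemd.automount",                 # let systemd mount it on first access
    "x-systemd.requires=network.service",
]


def lustre_fstab_entry(dns_name: str, mount_name: str, mount_point: str = "/home") -> str:
    """Render one /etc/fstab line for an FSx for Lustre mount."""
    fields = [
        f"{dns_name}@tcp:/{mount_name}",  # device
        mount_point,                      # must be identical on every node
        "lustre",                         # file system type
        ",".join(FSTAB_OPTIONS),
        "0", "0",                         # no dump, no boot-time fsck
    ]
    return " ".join(fields)


if __name__ == "__main__":
    print(lustre_fstab_entry("fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com", "abcdef12"))
```

The same line would be appended to /etc/fstab on each node, so every Workbench server sees identical home directory paths.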

  • Compared with EFS, installing R packages such as BH and MASS was much faster on FSx.
  • Reading and writing files ranging from small (100 MB) to large (1 GB) was faster on FSx than on EFS.
  • Reading and writing large data (1 GB) with the dd command performed almost the same as on EFS.
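The read/write comparisons above can be approximated with a small timing harness run against a directory on each file system. This is not fsbench. It is a minimal stand-in that, like dd if=/dev/zero of=<file> bs=1M count=1024, writes a file in 1 MiB blocks and forces it to storage with fsync, then reads it back. The /mnt/efs and /mnt/fsx mount points are hypothetical. Reads may be served from the client's page cache, so treat read timings as optimistic unless caches are dropped between runs.

```python
import os
import time

BLOCK = b"\0" * (1024 * 1024)  # 1 MiB of zeros, like dd reading /dev/zero


def time_write_read(directory: str, size_mib: int) -> tuple[float, float]:
    """Write then read a size_mib file in directory; return (write_s, read_s)."""
    path = os.path.join(directory, f"fs-timing-{os.getpid()}.bin")
    try:
        start = time.perf_counter()
        with open(path, "wb") as f:
            for _ in range(size_mib):
                f.write(BLOCK)
            f.flush()
            os.fsync(f.fileno())  # count time to reach the file system, not just the cache
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(len(BLOCK)):  # read back in the same 1 MiB blocks
                pass
        read_s = time.perf_counter() - start
        return write_s, read_s
    finally:
        if os.path.exists(path):
            os.remove(path)  # leave no test files behind


if __name__ == "__main__":
    # Hypothetical mount points for the two file systems under test.
    for mount in ("/mnt/efs", "/mnt/fsx"):
        if not os.path.isdir(mount):
            print(f"{mount}: not mounted, skipping")
            continue
        w, r = time_write_read(mount, 1024)  # 1 GiB, roughly the large-file case
        print(f"{mount}: write {w:.1f}s, read {r:.1f}s")
```

Running the same call with smaller sizes (for example 100 MiB) covers the small-file end of the comparison.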

Conclusion and further work

Looking at the results, we did not find any R or RStudio operation that does not work with FSx. All basic operations worked correctly, with performance comparable to EFS.

It is very important to note that the setup we used for this testing was minimal and might not reflect the architecture (EC2 instance types, FSx throughput, etc.) you are considering. We strongly encourage you to perform your own benchmarking and acceptance testing. You can use fsbench, the same tool we used in this effort.
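As part of your own acceptance testing, you may want to repeat the package installation comparison. One way is to time install.packages() with the library directory placed on the file system under test, for example inside an FSx-backed home directory. The sketch below is independent of fsbench. It assumes R and Rscript are on PATH and that a CRAN mirror is reachable. The repository URL default and any library path you pass are placeholders.

```python
import os
import subprocess
import time

PACKAGES = ("BH", "MASS")  # packages from the installation comparison


def install_command(lib_dir: str, repo: str = "https://cloud.r-project.org") -> list[str]:
    """Build an Rscript call that installs PACKAGES into lib_dir and verifies them.

    Assumes lib_dir contains no quote characters (it is pasted into R code).
    """
    pkgs = ", ".join(f'"{p}"' for p in PACKAGES)
    expr = (
        f'install.packages(c({pkgs}), lib = "{lib_dir}", repos = "{repo}"); '
        # install.packages() often reports failures only as warnings, so check explicitly.
        f'stopifnot(all(c({pkgs}) %in% rownames(installed.packages(lib.loc = "{lib_dir}"))))'
    )
    return ["Rscript", "-e", expr]


def time_install(lib_dir: str) -> float:
    """Install into lib_dir (created if needed) and return wall-clock seconds."""
    os.makedirs(lib_dir, exist_ok=True)
    start = time.perf_counter()
    subprocess.run(install_command(lib_dir), check=True)  # raises if Rscript exits non-zero
    return time.perf_counter() - start
```

To compare file systems, call time_install() once with a library directory on FSx and once on EFS. Remove the library directories between runs so each measurement starts from a clean install.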