Deploying Data for Content on RStudio Connect
Apps and reports on RStudio Connect usually rely on one or more data
sets. Depending on your organization's resources and the use case,
it might make sense to store that data in a database, in a flat file such as a
CSV, or behind some other kind of interface.
This article gives an overview of how to update data for an app or report on RStudio Connect.
Data in the App Bundle
If your data lives inside your app directory and is only updated as often as the app code, you can upload your data with your code and access it with a relative file path.
In this configuration, your code would access the data at the same relative path regardless of whether the code is being developed locally or is deployed to RStudio Connect.
The main limitation to including the data in the RStudio Connect bundle is that you must deploy everything (data and code) at the same time.
If you update the data more often than the app code, use a different method.
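As a minimal sketch of the bundled-data pattern described above, the helper below reads a CSV through a relative path. The file name data/sales.csv and the function name load_bundled_data are hypothetical, chosen only for illustration.

```r
# Read a data file that ships inside the app bundle.
# "data/sales.csv" is an assumed file name, relative to the app directory.
load_bundled_data <- function(path = file.path("data", "sales.csv")) {
  if (!file.exists(path)) {
    stop("Bundled data file not found: ", path)
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Usage, identical in local development and on RStudio Connect:
# sales <- load_bundled_data()
```

Because the path is relative to the app directory, the same call should resolve in both environments as long as the data folder is deployed with the code.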
Data Outside the App Bundle
If your data is updated more frequently than your app, the data can live in a database, in a pin on RStudio Connect, in a separate directory on RStudio Connect, or be accessed using other means.
Apps, reports, and APIs on RStudio Connect can pull data from a live connection to a database.
If you're publishing an app or API, you might want all of the data pulled in at startup, or you might want a live connection that pulls data as input is received from the user.[1] Either of these patterns can work.
If you are pulling data directly from a database into a Shiny app, you'll want to consider how you will protect your credentials. We have recommendations for these topics and more at db.rstudio.com.
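One common way to keep credentials out of app code is to read them from environment variables. The sketch below assumes that approach. The variable names (DB_DSN, DB_USER, DB_PASSWORD), the helper names, and the use of an ODBC data source are all illustrative assumptions, not recommendations taken from this article.

```r
# Collect database credentials from environment variables.
# The variable names below are hypothetical.
read_db_credentials <- function() {
  vars <- c(dsn = "DB_DSN", uid = "DB_USER", pwd = "DB_PASSWORD")
  values <- Sys.getenv(vars, unset = NA)
  names(values) <- names(vars)

  # Fail early with a clear message if anything is missing.
  missing_vars <- vars[is.na(values)]
  if (length(missing_vars) > 0) {
    stop("Missing environment variables: ", paste(missing_vars, collapse = ", "))
  }
  as.list(values)
}

# Open a connection with those credentials (assumes the DBI and odbc
# packages and a configured ODBC data source).
connect_to_db <- function(creds = read_db_credentials()) {
  DBI::dbConnect(odbc::odbc(), dsn = creds$dsn, uid = creds$uid, pwd = creds$pwd)
}
```

Calling connect_to_db() once at startup gives the "pull everything up front" pattern. Calling it inside the code that responds to user input gives the live-connection pattern.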
Pins is an R package that allows for easy storage and retrieval of data, models, and other R objects. Pins can be a good choice when you don't have write access to a database or when the data you're trying to save is something like a model that won't fit nicely into most databases.
You can create a pin with the pins::pin_write command and retrieve it with
pins::pin_read. A major benefit of pins is that your code
won't have to change at all when you deploy: the read and write
commands will work both in the IDE during development and on RStudio
Connect.
The pins page has more details on how to use pins.
Here is an example of how to use pins with either a Shiny app or Plumber API.
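The write and read calls can be sketched as follows. To keep the example runnable anywhere it uses pins::board_temp(), a throwaway local board. On RStudio Connect the board would instead come from the Connect board function in pins (board_connect() in recent releases, board_rsconnect() in older ones), so check which one your installed version provides. The pin name mpg_model is hypothetical.

```r
library(pins)

# Temporary local board used as a stand-in for a Connect board.
board <- board_temp()

# A model is the kind of object that does not fit neatly in most databases.
model <- lm(mpg ~ wt, data = mtcars)
pin_write(board, model, name = "mpg_model", type = "rds")

# Later, possibly in a different piece of content, read it back.
restored <- pin_read(board, "mpg_model")
prediction <- predict(restored, newdata = data.frame(wt = 3))
```

Only the line that creates the board depends on where the pin is stored. The pin_write and pin_read calls stay the same.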
Directory on RStudio Connect
Content on RStudio Connect can use the server's file system to store data. This option is usually a last resort because it requires SSH access to the server for setup, and often requires code changes between the IDE and RStudio Connect. For very large files, it may be the only option.
Content running on RStudio Connect is sandboxed within the server, so you must use an absolute path to access data, and you must manually ensure that the relevant directory has the proper read/write permissions.[2]
One potential difficulty in using data from persistent storage is that
the data path will probably change between the RStudio IDE and RStudio
Connect, unless the directory is in the same location on both computers.
You can use the config package
to have different paths in the development and deployed environments.
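One way to switch paths between environments, assuming the config package, is sketched below. The key name data_dir and both directory paths are hypothetical. To my understanding RStudio Connect sets the R_CONFIG_ACTIVE environment variable to rsconnect for deployed content, which would select the second block, but this is worth verifying against your server.

```r
# Write an illustrative config.yml; in a real project this file would sit
# next to the app code. Keys and paths here are made up for the example.
cfg_file <- file.path(tempdir(), "config.yml")
writeLines(c(
  "default:",
  "  data_dir: \"~/projects/my-app/data\"",
  "rsconnect:",
  "  data_dir: \"/mnt/shared/my-app/data\""
), cfg_file)

# Look up the data directory for the active configuration.
get_data_dir <- function(file = cfg_file,
                         config = Sys.getenv("R_CONFIG_ACTIVE", "default")) {
  config::get("data_dir", config = config, file = file)
}

# Usage: build the full path to a (hypothetical) file in that directory.
# sales_path <- file.path(get_data_dir(), "sales.csv")
```

With this layout the app code calls get_data_dir() everywhere, and only the YAML file knows that the two environments differ.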
Other Methods for Accessing Data
Data could be stored in a separate system accessible through an underlying API (Application Programming Interface). In this case authentication/access is handled on a system by system basis.
Developers can interact directly with an API using packages such as httr and jsonlite in R. To make some APIs more accessible, the interface is bundled as a package for developers to use. Some examples include Microsoft365R (see our additional writeup here), spotifyr, rtweet, googlesheets4, rdrop2, aws.s3, and many more. For up-to-date information on best practices for using a package, see its documentation.
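A direct call with httr and jsonlite might look like the sketch below. The bearer-token header and the API_TOKEN environment variable are assumptions made for illustration. Real APIs differ in how they authenticate, so treat this as a starting shape and not a recipe for any specific service.

```r
library(httr)
library(jsonlite)

# Turn a JSON response body into an R object (a data frame when the
# payload is an array of records).
parse_json_body <- function(json_text) {
  fromJSON(json_text, simplifyDataFrame = TRUE)
}

# Fetch records from an API endpoint.
# `url` is supplied by the caller; API_TOKEN is a hypothetical variable name.
fetch_records <- function(url, token = Sys.getenv("API_TOKEN")) {
  resp <- GET(url, add_headers(Authorization = paste("Bearer", token)))
  stop_for_status(resp)  # raise an R error on HTTP 4xx/5xx
  parse_json_body(content(resp, as = "text", encoding = "UTF-8"))
}
```

Keeping the parsing step separate from the request makes it possible to test the data handling without network access.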
Developers can also build and deploy their own APIs for handling integrations, for example using Plumber in R or Flask in Python, that can then be deployed and accessed on Connect. We've included a Plumber example in our documentation here and a Flask example here.
Special Considerations for Shiny Apps
Pull Data on a Scheduler
Shiny apps work entirely on a "pull" model, so Shiny will need to check if the data is updated, as opposed to the new data "pushing" itself into the app.
If you want to write a general data-pulling function, shiny::reactivePoll
allows you to periodically check a resource for changes and run an
arbitrary function if it has changed. You can also use shiny::invalidateLater
to invalidate a reactive on a schedule.
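The two scheduling patterns described above can be sketched with Shiny's reactivePoll and invalidateLater, which match those descriptions. The CSV file, its path, and the one-minute interval are placeholders. In practice the cheap check might be a "last updated" query against a database.

```r
library(shiny)

# Stand-in data source: a small CSV written to a temporary location.
data_path <- file.path(tempdir(), "latest.csv")
write.csv(data.frame(x = 1:3), data_path, row.names = FALSE)

ui <- fluidPage(
  textOutput("checked_at"),
  tableOutput("tbl")
)

server <- function(input, output, session) {
  # Every 60 seconds run the cheap check; reload only if its value changed.
  latest <- reactivePoll(
    intervalMillis = 60 * 1000,
    session = session,
    checkFunc = function() file.mtime(data_path),  # cheap: modification time
    valueFunc = function() read.csv(data_path)     # costly: full reload
  )

  # Alternative: invalidate a reactive on a fixed schedule, changed or not.
  checked_at <- reactive({
    invalidateLater(60 * 1000, session)
    format(Sys.time(), "%H:%M:%S")
  })

  output$tbl <- renderTable(latest())
  output$checked_at <- renderText(paste("Last scheduled refresh:", checked_at()))
}

app <- shinyApp(ui, server)
```

The split between checkFunc and valueFunc is what keeps polling cheap: the expensive read happens only when the check returns a new value.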
There are also several useful functions that are specially designed to
schedule data refreshes. If you're using a pin, pins::pin_reactive_read
allows you to check for updates to a pin on a schedule. Similarly,
shiny::reactiveFileReader
allows you to check for updates to a file on persistent storage.
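A sketch of both purpose-built helpers follows, assuming pins::pin_reactive_read for the pin and shiny::reactiveFileReader for the file. The temporary board, pin name, file path, and five-second interval are illustrative stand-ins for a Connect board and a directory on the server.

```r
library(shiny)
library(pins)

# Stand-ins so the sketch runs locally: a temporary board and a temp file.
board <- board_temp()
pin_write(board, head(iris), name = "latest_iris", type = "rds")

data_path <- file.path(tempdir(), "shared.csv")
write.csv(head(mtcars), data_path, row.names = FALSE)

ui <- fluidPage(
  tableOutput("from_pin"),
  tableOutput("from_file")
)

server <- function(input, output, session) {
  # Re-read the pin whenever a new version appears (checked every 5 seconds).
  pin_data <- pin_reactive_read(board, "latest_iris", interval = 5000)

  # Re-read the file whenever its modification time changes.
  file_data <- reactiveFileReader(
    intervalMillis = 5000,
    session = session,
    filePath = data_path,
    readFunc = read.csv
  )

  output$from_pin <- renderTable(pin_data())
  output$from_file <- renderTable(file_data())
}

app <- shinyApp(ui, server)
```

Both helpers return reactives, so downstream outputs update on their own once the pin or file changes.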
Speedy Data Loading
Regardless of how and where you load data, faster is always better. Some tips to speed up load times for Shiny apps:
Do less! Reducing the amount of data transmitted is the best way to reduce load times. Often, pre-aggregating data will prove sufficient for quick data loads.
Use Shiny scoping and RStudio Connect settings. Content outside a Shiny server block (i.e. in global) will only be loaded once per process on RStudio Connect. Setting RStudio Connect's Min processes to 1 or greater will ensure that data is loaded before the first user connects.
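Both tips can be combined in one small sketch: aggregate once, outside the server function, so each R process pays the loading cost a single time. The mtcars data set stands in for a larger source, and the object names are hypothetical.

```r
library(shiny)

# Global scope: this runs once per R process, not once per user session.
# Pre-aggregating here means each session works with a three-row table
# instead of the full data set.
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

ui <- fluidPage(
  selectInput("cyl", "Cylinders", choices = mpg_by_cyl$cyl),
  tableOutput("summary")
)

server <- function(input, output, session) {
  # Per-session work is limited to filtering the already-small table.
  output$summary <- renderTable({
    mpg_by_cyl[mpg_by_cyl$cyl == input$cyl, ]
  })
}

app <- shinyApp(ui, server)
```

If the content is configured to keep at least one process running, the aggregation above would already be done when the first user arrives.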
The user who runs the content will need access. By default this is the rstudio-connect user. You can check which user a piece of content will run as on the Access tab for that content in RStudio Connect, under Who runs this content on the server.
Reactive content only runs inside Shiny server blocks. If you want to pre-load data and have the data respond to user input or reactive polling, you can load the data in the global context and then update it using the global assignment operator (<<-).