Updating Data in a Shiny App on RStudio Connect

Shiny apps often serve as interfaces that let users slice, dice, view, visualize, and upload data. The data can be stored in a variety of ways, including in a database or in CSV, RDS, or Arrow files.

Many Shiny apps are developed using local data files that are bundled with the app code when it’s sent to RStudio Connect. This can be a good architecture for infrequently updated data. However, this architecture turns out to be very brittle when the data is updated on a scheduled basis, because every data update requires redeploying the whole app.

In general, it’s a good idea to separate the data from the app code if the data is frequently updated.

How do I update the data?

For apps that only consume data, the most common pattern is to schedule an R Markdown document or Jupyter Notebook on RStudio Connect to refresh the data. The scheduled document should update only the data. It should not redeploy the entire app.
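The main job of the scheduled document is to write fresh data to wherever the app reads it. Below is a minimal sketch of an R chunk that does this. It assumes the app reads a single RDS file; the source URL and destination path in the usage comments are hypothetical. Because the data is written to a temporary file and then renamed, a running app sees either the old file or the new one. A rename within a single filesystem is atomic on POSIX systems.

```r
# A minimal sketch of the code chunk in a scheduled R Markdown document.
# Assumption: the Shiny app reads one RDS file at a (hypothetical) shared path.

publish_data <- function(data, out_path) {
  # Write next to the target first, then rename over it, so readers never
  # see a half-written file (rename is atomic within one POSIX filesystem)
  tmp_path <- paste0(out_path, ".tmp")
  saveRDS(data, tmp_path)
  if (!file.rename(tmp_path, out_path)) {
    stop("Could not move ", tmp_path, " to ", out_path, call. = FALSE)
  }
  invisible(out_path)
}

# Usage in the scheduled document (hypothetical source and destination):
# fresh <- read.csv("https://example.com/export/latest.csv")
# publish_data(fresh, "/data/myapp/latest.rds")
```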

If your app also lets users upload data, consider calling a Plumber API from within your Shiny app to update your data.
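Here is a minimal sketch of that pattern, under several assumptions. Uploads are appended to a CSV file on persistent storage. Each upload must have the hypothetical columns id and value. The Shiny app calls the API with the httr package, authenticating with an RStudio Connect API key stored in a hypothetical CONNECT_API_KEY environment variable. The URL and environment variable names are placeholders.

```r
# plumber.R: a minimal sketch of an upload endpoint.
# Hypothetical: the UPLOAD_DATA_FILE variable, the default path, and the schema.
upload_path <- Sys.getenv("UPLOAD_DATA_FILE", unset = "/data/myapp/uploads.csv")

append_upload <- function(new_rows, path = upload_path) {
  required <- c("id", "value")
  missing_cols <- setdiff(required, names(new_rows))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  has_file <- file.exists(path)
  # Write the header only when the file is first created; append afterwards
  utils::write.table(new_rows[required], path, sep = ",", row.names = FALSE,
                     col.names = !has_file, append = has_file)
  nrow(new_rows)
}

#* Append uploaded rows to the shared data file
#* @post /upload
function(req) {
  rows <- jsonlite::fromJSON(req$postBody)
  list(rows_written = append_upload(as.data.frame(rows)))
}

# In the Shiny app: send a data frame to the API (hypothetical URL)
send_upload <- function(df, url = "https://connect.example.com/myapp-api/upload") {
  resp <- httr::POST(
    url,
    body = df,
    encode = "json",
    httr::add_headers(Authorization = paste("Key", Sys.getenv("CONNECT_API_KEY")))
  )
  httr::stop_for_status(resp)
  httr::content(resp)
}
```

Keeping the write logic in the API means the Shiny app never needs write access to the data store itself.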

Where should the data live?

Bundle the Data with the App Code

You can add data or other files to the deployment bundle when you deploy your app. This is a good option for reasonably small data files that are seldom updated. If you’re finding that your data files are getting large or that you’re frequently updating the data but not the app code, another strategy will probably work better.

Database

Databases are a great option for storing the data for your Shiny app.

If you want to configure your Shiny app to connect to a database, two of the top priorities are deciding how you will establish the connection and how you will protect your credentials. We have recommendations for these topics and more at db.rstudio.com.
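One common way to keep credentials out of the app code and the deployment bundle is to read them from environment variables. On RStudio Connect, these can be set in the content’s settings. The sketch below assumes hypothetical variable names, a PostgreSQL database, and the DBI and RPostgres packages. It fails fast with a clear message if any credential is missing.

```r
# A minimal sketch: database credentials come from environment variables
# (hypothetical names), never from the app code itself.
db_config <- function() {
  vars <- c(host = "DB_HOST", dbname = "DB_NAME",
            user = "DB_USER", password = "DB_PASSWORD")
  vals <- Sys.getenv(vars, unset = "", names = FALSE)
  names(vals) <- names(vars)
  if (any(vals == "")) {
    stop("Missing environment variables: ",
         paste(vars[vals == ""], collapse = ", "), call. = FALSE)
  }
  as.list(vals)
}

connect_db <- function(cfg = db_config()) {
  # Assumes PostgreSQL and the RPostgres driver; swap in your own backend
  DBI::dbConnect(
    RPostgres::Postgres(),
    host = cfg$host, dbname = cfg$dbname,
    user = cfg$user, password = cfg$password
  )
}
```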

If you find that just pulling data from the database and processing it in the Shiny app is too slow, you may want to consider adopting a design pattern for using big data from R.

Pins

Pins is an R package that allows for easy storage and retrieval of data, models, and other R objects. Pins can be a good choice when you don’t have write access to a database or when the data you’re trying to save is something like a model that won’t fit nicely into most databases.

Pins is easy to use from both the development environment and the deployed environment. You can create a pin with pins::pin() and retrieve the data with pins::pin_get(). In pins 1.0 and later, these legacy functions are superseded by pins::pin_write() and pins::pin_read(). The pins page has more details on how to use pins.
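A minimal sketch using the pins 1.x API follows. So that the example runs anywhere, it uses board_temp(), a throwaway local board. On RStudio Connect you would instead create the board with pins::board_connect(), which can pick up the server address and API key from the CONNECT_SERVER and CONNECT_API_KEY environment variables. The pin name is hypothetical.

```r
# A minimal sketch with pins >= 1.0; board_temp() stands in for board_connect()
library(pins)

board <- board_temp()

# Development environment or scheduled document: save the data as a pin
pin_write(board, mtcars, name = "mtcars_data", type = "rds")

# Deployed app: read the latest version of the pin back
cars <- pin_read(board, "mtcars_data")
head(cars)
```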

Here is an example of how to use pins with either a Shiny app or Plumber API.

Persistent Storage

Shiny apps on RStudio Connect can use the server’s file system to store data. In general, using a database or a pin is going to be a less fragile workflow than using persistent storage on RStudio Connect. The main reason you might consider persistent storage over a pin or database is that it may be faster.

If you’re using persistent storage, you must manually create the directory tree for the location you want to use and ensure that its permissions are set correctly.[1] Additionally, unless you’re using a directory mounted at the same path in both the development environment and on RStudio Connect, it can be hard to test your code outside of production.
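Because a missing directory or a permissions problem usually shows up only in production, it can help for the app to check its storage location at startup and fail with a clear message. The sketch below assumes a hypothetical APP_STORAGE_DIR environment variable and default path. Note that Sys.info() reports the user the R process runs as, which is the user whose permissions matter here.

```r
# A minimal sketch: fail fast if the storage directory is missing or not
# writable. APP_STORAGE_DIR and the default path are hypothetical.
storage_dir <- Sys.getenv("APP_STORAGE_DIR", unset = "/data/myapp")

check_storage_dir <- function(path = storage_dir) {
  if (!dir.exists(path)) {
    stop("Storage directory does not exist: ", path, call. = FALSE)
  }
  # file.access() returns 0 on success; mode = 2 tests write permission
  if (file.access(path, mode = 2) != 0) {
    stop("Storage directory is not writable by user '",
         Sys.info()[["user"]], "': ", path, call. = FALSE)
  }
  invisible(normalizePath(path))
}
```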

Here are some instructions for configuring a Shiny app to use persistent storage on RStudio Connect.

How does the Shiny app get or update data?

Shiny apps work entirely on a “pull” model. Once the data is updated at the source, your Shiny app needs some way to find out about the change and pull the updated data.

Read From Disk

For apps where the data is bundled and uploaded with the app or lives on persistent storage, you can read the data from disk.

If the data is bundled with the app at deployment time, you can use a relative file path. For example, if your app directory contains a data subdirectory that holds data.csv, you could load the data with a relative file path like read.csv('./data/data.csv').

If your data is stored on persistent storage elsewhere on the RStudio Connect server, you should access the data with an absolute file path.
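One way to use the same code in both cases, and to make persistent-storage apps easier to test outside production, is to take the data directory from an environment variable. The variable falls back to the bundled relative path. DATA_DIR and the file name are hypothetical. On RStudio Connect you would set DATA_DIR to the absolute path on persistent storage.

```r
# A minimal sketch: a hypothetical DATA_DIR variable selects the data location.
# Unset (local development, bundled data): relative "data" directory.
# On RStudio Connect with persistent storage: set it to an absolute path.
data_dir <- Sys.getenv("DATA_DIR", unset = "data")

read_app_data <- function(file = "data.csv", dir = data_dir) {
  utils::read.csv(file.path(dir, file))
}

# Usage in app.R:
# app_data <- read_app_data()
```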

Live Connections

It’s often possible to architect Shiny apps to use live connections to other resources. For example, if your app lets users select a data sample, filter it, and visualize their selection, you could architect the app to either (1) pull all the data it might need at startup and filter it based on user input, or (2) use a live connection to pull only the data it needs based on user input.

If the subset of data that each user request needs is small enough to fetch quickly, the second option has the advantage of ensuring that the data in the Shiny app is always up to date, while also reducing app startup times.

Most live connections go either directly to a database or to a Plumber API that does the data filtering.
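Here is a minimal sketch of a direct database connection, where the app queries only the rows the user asked for. It assumes that a DBI connection named con is created outside the server function. It also assumes a hypothetical sales table with region and amount columns and a select input named region. The query is parameterized, so user input is never pasted into the SQL text. Placeholder syntax varies by backend: SQLite and MariaDB use ?, while PostgreSQL uses $1.

```r
# A minimal sketch of a live connection. Assumes a DBI connection `con`
# created outside the server function and a hypothetical "sales" table.
fetch_selection <- function(con, region) {
  # Parameterized query: the value is sent separately from the SQL text
  DBI::dbGetQuery(
    con,
    "SELECT region, amount FROM sales WHERE region = ?",
    params = list(region)
  )
}

server <- function(input, output, session) {
  # Re-queries the database whenever the user changes input$region
  selection <- shiny::reactive({
    shiny::req(input$region)
    fetch_selection(con, input$region)
  })
  output$table <- shiny::renderTable(selection())
}
```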

If you’re using your Shiny app to do something that takes a while, like saving a big file or running a model, it’s probably a good idea to use the Shiny app as a trigger for a Plumber API. Doing so avoids freezing the Shiny app session while the computation runs. Shiny async is also an option, but it generally makes app code harder to read than handing the work to a Plumber API.

Shiny Data Reactive

If your app is a data consumer and you’re not using a live connection, your app will need to refresh its data itself.

If you want to write a general data-pulling function, shiny::reactivePoll allows you to periodically run a check function against a resource and re-run an arbitrary data-loading function whenever the check’s result changes. You can also use shiny::invalidateLater to invalidate a reactive on a schedule.
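Here is a minimal sketch of shiny::reactivePoll against a database. The check function runs on every poll, so it should be cheap; here it fetches a single timestamp. The expensive full read runs only when that value changes. This sketch assumes a DBI connection named con created outside the server function. It also assumes a hypothetical sales table whose updated_at column changes whenever rows are added or edited.

```r
# A minimal sketch of shiny::reactivePoll(). Assumes a DBI connection `con`
# and a hypothetical "sales" table with an updated_at column.
server <- function(input, output, session) {
  sales <- shiny::reactivePoll(
    intervalMillis = 5 * 60 * 1000,  # poll every 5 minutes
    session = session,
    # Cheap check, run on every poll: only the latest timestamp is fetched
    checkFunc = function() {
      DBI::dbGetQuery(con, "SELECT MAX(updated_at) AS last FROM sales")$last
    },
    # Expensive read: re-run only when checkFunc's value changes
    valueFunc = function() {
      DBI::dbGetQuery(con, "SELECT * FROM sales")
    }
  )
  output$n_rows <- shiny::renderText(nrow(sales()))
}
```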

There are also several useful functions designed specifically to schedule data refreshes. If you’re using a pin, pins::pin_reactive allows you to check for updates to a pin on a schedule (in pins 1.0 and later, the equivalent is pins::pin_reactive_read). Similarly, shiny::reactiveFileReader allows you to check for updates to a file on persistent storage.
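Here is a minimal sketch of shiny::reactiveFileReader. It assumes the data is a CSV at a hypothetical path on persistent storage, overridable through a hypothetical APP_DATA_FILE variable. Shiny polls the file’s modification time and size and calls the read function again only when either changes. A comment notes the pins 1.x analogue; the board and pin name there are placeholders.

```r
# A minimal sketch of shiny::reactiveFileReader(). The path is hypothetical.
data_file <- Sys.getenv("APP_DATA_FILE", unset = "/data/myapp/data.csv")

server <- function(input, output, session) {
  # Checks the file every minute; re-reads it only when it has changed
  app_data <- shiny::reactiveFileReader(
    intervalMillis = 60 * 1000,
    session = session,
    filePath = data_file,
    readFunc = utils::read.csv
  )
  # With pins 1.x, the analogous helper for a pin would be
  # pins::pin_reactive_read(board, "owner/pin-name", interval = 60 * 1000)
  output$n_rows <- shiny::renderText(nrow(app_data()))
}
```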

How can I speed up initial data loading?

The most straightforward way to load data faster is to do less! In general, reducing the amount of data that is transmitted will have the biggest impact on load time. That means that pre-aggregating the data to be loaded is a great first step. For many Shiny apps, this will prove sufficient for loading quickly.
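Pre-aggregation usually belongs in the scheduled job rather than in the app, so that the app loads only the summary it displays. Here is a minimal sketch in base R. The column names are hypothetical, and the input is assumed to have one row per transaction.

```r
# A minimal sketch of pre-aggregation in the scheduled job.
# Hypothetical input: one row per transaction with region, month, and amount.
summarise_sales <- function(raw) {
  # Collapse to one row per region and month before the app ever sees it
  stats::aggregate(amount ~ region + month, data = raw, FUN = sum)
}

# Usage in the scheduled document (hypothetical path):
# saveRDS(summarise_sales(raw_sales), "/data/myapp/sales_summary.rds")
```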

On RStudio Connect, you can also use the Minimum Processes setting to reduce user waiting time. By setting this number to 1 or more, you ensure that there’s always a “warm” R process with the app loaded when users request a session.

Understanding how much this helps requires understanding Shiny’s scoping rules. In general, anything inside the server function runs once for each session, so setting Minimum Processes has no effect on it. Anything that appears before the server function (for example, at the top of app.R or in global.R) runs only once per R process, so having a “prewarmed” process reduces user waiting time.
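The sketch below shows the structure of an app.R that separates the two scopes. The function load_reference_data() is a hypothetical stand-in for an expensive read. Only the top-level load benefits from a process kept warm by Minimum Processes.

```r
# A minimal sketch of Shiny's scoping rules, structured like app.R.
# load_reference_data() is a hypothetical stand-in for an expensive read.
load_reference_data <- function() {
  data.frame(region = c("east", "west"), target = c(100, 80))
}

# Outside the server function: runs once per R process, so a process kept
# warm by Minimum Processes already has this in memory
reference_data <- load_reference_data()

ui <- shiny::fluidPage(
  shiny::tableOutput("targets"),
  shiny::textOutput("started")
)

server <- function(input, output, session) {
  # Inside the server function: runs again for every new session
  session_started <- Sys.time()
  output$targets <- shiny::renderTable(reference_data)
  output$started <- shiny::renderText(
    paste("Session started at", format(session_started))
  )
}

# In app.R, finish with: shiny::shinyApp(ui, server)
```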

If you are prewarming R processes, it’s important to consider how data in the process will be reloaded. Data reactives that check for updated data can only run inside a Shiny server function and therefore only run when there’s an active user session.

If you wish to check for updated data while also loading data in the global context, you’ll need to combine the Shiny data reactive with the global assignment operator (<<-) to assign data back into the global environment.
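Here is a minimal sketch of that combination. The data is loaded once in the global context. While any session is active, a timer checks the file’s modification time, and the helper uses <<- to write a fresh copy back to the top-level variables only when the file has changed. The path and the APP_DATA_FILE variable are hypothetical. Because app_data is an ordinary variable rather than a reactive, outputs that read it need their own timer to pick up changes.

```r
# A minimal sketch: global data refreshed from a session via <<-.
# APP_DATA_FILE and the default path are hypothetical.
data_file <- Sys.getenv("APP_DATA_FILE", unset = "/data/myapp/data.csv")
app_data <- NULL
app_data_mtime <- NULL

refresh_if_changed <- function(path = data_file) {
  mtime <- file.info(path)$mtime
  if (is.null(app_data_mtime) || !identical(mtime, app_data_mtime)) {
    # <<- assigns to the top-level copies shared by every session in the process
    app_data <<- utils::read.csv(path)
    app_data_mtime <<- mtime
  }
  invisible(app_data)
}

# Global context: load once per R process, before any session starts
if (file.exists(data_file)) refresh_if_changed()

server <- function(input, output, session) {
  # Runs only while this session is active: check every 10 minutes
  shiny::observe({
    shiny::invalidateLater(10 * 60 * 1000, session)
    refresh_if_changed()
  })
  # app_data is not reactive, so re-render on a timer to pick up changes
  output$n_rows <- shiny::renderText({
    shiny::invalidateLater(10 * 60 * 1000, session)
    nrow(app_data)
  })
}
```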

[1] The user who runs the content needs access to the directory. To check which user a piece of content runs as, open the content in RStudio Connect, go to its Access tab, and look under “Who runs this content on the server.”