Diffable Data Science: A Demo

Data science is diffable if changes between versions and over time are easy to examine. Diffability makes data science more impactful because it improves confidence that mistakes haven’t been introduced and makes it easier to audit the chain of decisions leading to the current state. It also supports incremental improvement: the current version can be changed in small ways, and each change can be inspected and adopted only if it improves on the status quo.

Code is inherently diffable, and keeping R and Python code under version control with a tool like git makes the differences between versions easy to track. With the git deployment feature of RStudio Connect, diffable code becomes diffable content, so it’s easy to demonstrate what effect new code will have.
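As a minimal sketch of what "diffable code" means in practice, the snippet below compares two versions of a small script using `difflib` from the Python standard library. The script contents and the file labels are invented for illustration; git produces the same style of unified diff when comparing commits or branches.

```python
import difflib

# Hypothetical "old" and "new" versions of a small training script.
# The contents are made up purely to have something to compare.
old_version = [
    "features = ['hour', 'weekday']",
    "model = fit(data, features)",
]
new_version = [
    "features = ['hour', 'weekday', 'temperature']",
    "model = fit(data, features)",
]

# unified_diff yields header lines, a hunk marker, and then one line per
# removed (-), added (+), or unchanged (space-prefixed) source line.
diff_lines = list(
    difflib.unified_diff(
        old_version,
        new_version,
        fromfile="train.py (old)",
        tofile="train.py (new)",
        lineterm="",
    )
)

print("\n".join(diff_lines))
```

The single changed line shows up as one removal and one addition, while the unchanged line appears only as context. That is what makes a small change quick to review.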

For organizations using git, there is a wide variety of strategies for managing code branching and versioning. One popular strategy that works particularly well with RStudio Connect is to maintain a long-running master branch that is in production, a long-running dev branch for testing, and separate feature branches for individual new features.

Git branching, showing adding two features to dev and then merging to master.
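The flow in the figure can be sketched with a toy model in which each branch is just an ordered list of commit messages. This is an illustration of the workflow only, not how git stores history, and the branch and commit names are assumptions made for the example.

```python
# Toy model: each branch is an ordered list of commit messages.
# This illustrates the workflow; it is not git's actual data model.
branches = {"master": ["initial app"]}


def create_branch(name, source):
    """Start a new branch from the current state of `source`."""
    branches[name] = list(branches[source])


def commit(branch, message):
    """Record a change on one branch only."""
    branches[branch].append(message)


def merge(source, target):
    """Bring commits that exist only on `source` into `target`."""
    for message in branches[source]:
        if message not in branches[target]:
            branches[target].append(message)


def unreleased(source="dev", target="master"):
    """Commits on `source` that `target` does not have yet."""
    return [m for m in branches[source] if m not in branches[target]]


create_branch("dev", "master")

# Two features are developed on their own branches and merged into dev.
create_branch("feature-a", "dev")
commit("feature-a", "add feature A")
merge("feature-a", "dev")

create_branch("feature-b", "dev")
commit("feature-b", "add feature B")
merge("feature-b", "dev")

# At this point dev is ahead of master by exactly the two features.
pending = unreleased()
print(pending)

# Merging dev into master releases both features to production.
merge("dev", "master")
remaining = unreleased()
print(remaining)
```

The list returned by `unreleased()` before the final merge is the "diff" between testing and production: everything that would change if dev were promoted.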

With git deployment on RStudio Connect, it’s easy to deploy both the master and dev branches simultaneously and to see the differences between them.

The Bike Prediction App

The Bike Prediction app displays the number of bikes predicted to be at the various docks of Washington DC’s bikeshare program in the near future.

In this app, the user can click on a dock on the map (built using the leaflet package) and see the predicted number of bikes at that station in the near future, displayed in the bottom half of the page.

The production bike prediction app.

The Dev Branch

Someone suggested changing the app’s color to purple, and that change has been made on the dev branch and deployed.

Since the content is deployed to RStudio Connect from git, the repository branch can be linked with the deployment. So in addition to the production app, the dev branch containing a new feature can be deployed and tested as a self-contained application, either on the same RStudio Connect server or on a separate testing instance.

In the screenshot below, both the dev and master branches are deployed to the same RStudio Connect instance, so the difference between the two apps is apparent in the thumbnails on the home page.¹

The RStudio Connect home screen with the prod app in red and the dev app in purple.

While this is a relatively superficial and simple change, more complicated changes, like changing a model, adding new parameters, or adding components to a Shiny app, can similarly be diffed both in code and in deployment using RStudio Connect.
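For a change to a model rather than to a color, the differences that matter are in the outputs. The sketch below assumes the predictions from the production and dev deployments have already been collected into plain dictionaries; how they are retrieved is out of scope here. The dock names and counts are invented for illustration.

```python
# Hypothetical predicted bike counts per dock from two deployments.
# Dock names and numbers are invented for illustration only.
prod_predictions = {"Dock A": 7, "Dock B": 3, "Dock C": 12}
dev_predictions = {"Dock A": 7, "Dock B": 5, "Dock C": 11}


def diff_predictions(old, new):
    """Return {dock: (old_value, new_value)} for docks whose value changed.

    A dock present in only one of the two inputs is reported with None
    on the side where it is missing.
    """
    changed = {}
    for dock in sorted(set(old) | set(new)):
        if old.get(dock) != new.get(dock):
            changed[dock] = (old.get(dock), new.get(dock))
    return changed


changes = diff_predictions(prod_predictions, dev_predictions)
for dock, (old_value, new_value) in changes.items():
    print(f"{dock}: {old_value} -> {new_value}")
```

Only the docks whose predictions differ are reported, so a reviewer can focus on what the new model actually changes rather than rereading every value.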

  1. Note that the thumbnails are not updated automatically with git-backed deployment.
