Usage Tracking

RStudio Connect provides access logs for most types of assets that can be published to the server. These logs can be accessed through the RStudio Connect Server API. Look-up information, such as user and application metadata, is also available through specific endpoints of the same API.

More information about what the RStudio Connect Server API can do, how to access it, and how to interact with it is in the official reference guide: RStudio Connect API Reference. The sections that pertain to tracking content usage are under the Instrumentation section of the same guide: RStudio Connect API Reference - Instrumentation

The code to create the dashboard pictured in this section is available in a GitHub repository here: Connect Usage

This article walks through how to access and wrangle the instrumentation data using R. It aims to provide code generic enough that you can copy and paste it into your own R session and run it successfully.

Setup

In order for the following code to work in your environment, you need two pieces of information unique to your enterprise:

  • RStudio Connect’s server path
  • An RStudio Connect API Key

For this example, the path to the server is stored in a variable called rsc_server:

rsc_server <- "http://my_connect_server:3939/"

To keep the API key out of the R history and out of the Environment pane, it is stored in an environment variable called RSTUDIO_CONNECT_API_KEY, entered through a password prompt:

Sys.setenv("RSTUDIO_CONNECT_API_KEY" = rstudioapi::askForPassword("Enter Connect Token:")) 
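If the environment variable has not been set, Sys.getenv() returns an empty string, and every request goes out as "Key " with no token, which will likely fail with an authentication error rather than a clear message. The sketch below shows one way to fail fast instead. connect_auth_value() is a hypothetical helper name, not part of httr or the Connect API, and it uses base R only.

```r
# Hypothetical helper (base R only): return the Authorization header value
# "Key <token>", stopping early if the environment variable is not set.
connect_auth_value <- function(var = "RSTUDIO_CONNECT_API_KEY") {
  key <- Sys.getenv(var)
  # Sys.getenv() returns "" for an unset variable
  if (!nzchar(key)) {
    stop(var, " is not set; set it with Sys.setenv() before calling the API",
         call. = FALSE)
  }
  paste("Key", key)
}
```

With this helper, add_headers(Authorization = connect_auth_value()) could stand in for the inline paste() call used in the code that follows.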

R can interact with the RStudio Connect Server API through the httr package. The GET() function sends the request, and add_headers() attaches your API key to it as an Authorization header. The response is then parsed into an R list object with the content() function. All of these steps are wrapped in a convenience function so that we can make multiple API calls without rewriting the same code each time.

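library(httr)

# Send a GET request to a Connect Server API endpoint and return the parsed response
rsc_get <- function(endpoint_call){
  rsc_call <- paste0(rsc_server, "__api__/v1/", endpoint_call)
  # Read the key from the environment variable instead of a global R variable
  rsc_auth <- add_headers(Authorization = paste("Key", Sys.getenv("RSTUDIO_CONNECT_API_KEY")))
  resp <- GET(rsc_call, rsc_auth)
  # Stop with an informative error on 4xx/5xx responses (e.g. an invalid key)
  stop_for_status(resp)
  content(resp)
}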
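Because rsc_get() uses paste0() to glue rsc_server and "__api__/v1/" together, it relies on rsc_server ending with a slash: without it, the users call would be sent to http://my_connect_server:3939__api__/v1/users. Below is a minimal base-R sketch of a hypothetical connect_url() helper that produces the same URL either way.

```r
# Hypothetical helper: build a Connect Server API URL so the result is the
# same whether or not `server` ends with "/".
connect_url <- function(server, endpoint) {
  server <- sub("/+$", "", server)      # drop any trailing slashes
  endpoint <- sub("^/+", "", endpoint)  # drop any leading slashes
  paste0(server, "/__api__/v1/", endpoint)
}
```

Both connect_url("http://my_connect_server:3939/", "users") and connect_url("http://my_connect_server:3939", "users") return "http://my_connect_server:3939/__api__/v1/users", so the first line of rsc_get() could call it in place of paste0().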

Get User List

Let’s start by pulling a list of all of the users on the server. To do that, we call the users endpoint of the RStudio Connect Server API, documented here: RStudio Connect API Reference - getUsers.

Using the rsc_get() function created in the previous section, we simply pass “users” as the argument. This returns an R list object with the following items:

  • results - A list object containing the actual requested data
  • current_page - The page number of the results included in this response
  • total - The total number of results across all pages

rsc_users <- rsc_get("users")  
names(rsc_users)
## [1] "results"      "current_page" "total" 

In our case, rsc_users$total returns 65, the total number of users on this server.

rsc_users$total
## [1] 65

These are the fields available within the results sub-list (rsc_users$results):

names(rsc_users$results[[1]])
##  [1] "email"        "username"     "first_name"   "last_name"   
##  [5] "user_role"    "created_time" "updated_time" "active_time" 
##  [9] "confirmed"    "locked"       "guid"      

Convert API response to tibble

Each item in the results list represents one user. We can use purrr to convert the list into a table: the map_dfr() function lets us pick which variables to retain and combines them into a single tibble.

library(purrr)
library(dplyr)

users <- map_dfr(
  rsc_users$results, 
  ~ tibble(
    user_guid = .x$guid, 
    user_name = .x$username, 
    user_role = .x$user_role
    )
  )

glimpse(users)
## Observations: 20
## Variables: 3
## $ user_guid <chr> "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx", "xxxxxxxx-xxx…
## $ user_name <chr> "admin", "alan", "aron", "brenna", "curtis", "hadrian…
## $ user_role <chr> "viewer", "publisher", "viewer", "viewer", "administr…

NOTE: For the purpose of this article, we are masking the GUIDs from the printed results.
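The glimpse() output lists 20 observations even though rsc_users$total is 65: the users endpoint returns results one page at a time, so a single call only retrieves the first page. The getUsers reference documents query parameters for requesting further pages (page_number and page_size, as we understand it; check the names and the maximum page size against the reference for your Connect version). The sketch below is a hypothetical base-R helper that keeps requesting pages until total results have been collected. The fetch function is passed in, so the logic can be tried without a server.

```r
# Hypothetical helper: request pages 1, 2, ... until `total` results are in hand.
# `fetch_page(n)` must return a list shaped like the users response
# (results, current_page, total).
fetch_all_pages <- function(fetch_page, max_pages = 1000) {
  first <- fetch_page(1)
  pages <- list(first$results)
  collected <- length(first$results)
  n <- 1
  while (collected < first$total && n < max_pages) {
    n <- n + 1
    page <- fetch_page(n)
    # An empty page means there is nothing more to fetch
    if (length(page$results) == 0) break
    pages[[n]] <- page$results
    collected <- collected + length(page$results)
  }
  do.call(c, pages)  # one flat list of user records
}

# Usage against the server (query parameter names are an assumption):
# all_users <- fetch_all_pages(function(n) {
#   rsc_get(paste0("users?page_size=100&page_number=", n))
# })
```

The combined list has the same shape as rsc_users$results, so it can be passed to the same map_dfr() call.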

Shiny usage

With these concepts and coding techniques in place, we can move on to retrieving usage history. It is best practice to limit the number of days' worth of data downloaded from the server. To request a time frame relative to the current date, pass the from argument in the API call; when the to argument is omitted, the API returns data up to the present. The code below shows the expected date/time format, assigned to the five_days variable. The endpoint used here is instrumentation/shiny/usage, documented here: RStudio Connect API Reference - Shiny Usage.

five_days <- paste0(Sys.Date() - 5, "T00:00:00-00:00") 
rsc_shiny <- rsc_get(paste0("instrumentation/shiny/usage?from=", five_days))
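The paste0() call above produces a timestamp ending in "-00:00" and places it in the URL without percent-encoding. The paging URL that the server returns (shown in the Pagination section) writes the same start time as 2019-08-23T00%3A00%3A00Z, which suggests the UTC "Z" form with encoded colons is accepted. Encoding also matters if you switch to a positive offset such as "+02:00", because an unencoded "+" in a query string can be decoded as a space. Below is a base-R sketch; days_ago_utc() and shiny_usage_endpoint() are hypothetical helper names.

```r
# Hypothetical helper: RFC 3339 timestamp for midnight UTC on the date
# `days` days before `today`.
days_ago_utc <- function(days, today = Sys.Date()) {
  format(today - days, "%Y-%m-%dT00:00:00Z")
}

# Hypothetical helper: endpoint string with the timestamp percent-encoded
# (URLencode(reserved = TRUE) turns ":" into "%3A" and "+" into "%2B").
shiny_usage_endpoint <- function(from) {
  paste0("instrumentation/shiny/usage?from=", URLencode(from, reserved = TRUE))
}

# Usage with the rsc_get() function defined earlier:
# rsc_shiny <- rsc_get(shiny_usage_endpoint(days_ago_utc(5)))
```

For a run on 2019-08-28, days_ago_utc(5) gives "2019-08-23T00:00:00Z", and shiny_usage_endpoint() encodes it as 2019-08-23T00%3A00%3A00Z, matching the from value in the paging URL.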

The list object returned by this endpoint has two items:

  • results - A list object containing the actual requested data
  • paging - A list with the pagination navigation information

Each entry in the results item contains the following fields:

names(rsc_shiny$results[[1]])
## [1] "content_guid" "user_guid"    "started"      "ended"        "data_version"

Using the same technique as before, we map the desired data elements into a tibble using map_dfr().

shiny_usage <- map_dfr(
  rsc_shiny$results,
  ~ tibble(
      guid = .x$content_guid,
      user = ifelse(is.null(.x$user_guid), "anonymous", .x$user_guid),
      started = .x$started,
      ended = .x$ended,
      ver = .x$data_version
  )
)
glimpse(shiny_usage)
## Observations: 7
## Variables: 5
## $ guid    <chr> "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx", "xxxxxxxx-xxxx-…
## $ user    <chr> "anonymous", "anonymous", "anonymous", "anonymous", "0b…
## $ started <chr> "2019-08-28T18:34:17Z", "2019-08-28T18:59:35Z", "2019-0…
## $ ended   <chr> "2019-08-28T18:34:53Z", "2019-08-28T19:00:23Z", "2019-0…
## $ ver     <int> 1, 1, 1, 1, 1, 1, 1
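The ifelse(is.null(...)) call above replaces a missing user GUID, which here corresponds to an anonymous visitor, with "anonymous". The same idea can be written as a small null-default helper. The base-R sketch below is an alternative to the map_dfr() call, not something the API requires; null_default() and sessions_to_df() are hypothetical names, and treating a null ended value as NA (for example, a session still open when the data was pulled) is an assumption.

```r
# Return `default` when a field is NULL (JSON null or missing from the record).
null_default <- function(x, default) if (is.null(x)) default else x

# Hypothetical base-R alternative to map_dfr(): one row per session record.
sessions_to_df <- function(results) {
  # vapply(..., character(1)) insists on exactly one string per record
  field <- function(name, default) {
    vapply(results, function(r) null_default(r[[name]], default), character(1))
  }
  data.frame(
    guid    = field("content_guid", NA_character_),
    user    = field("user_guid", "anonymous"),
    started = field("started", NA_character_),
    ended   = field("ended", NA_character_),
    stringsAsFactors = FALSE
  )
}
```

Because vapply() requires each value to be a single string, a field with an unexpected nested value raises an error instead of silently changing the column type.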

Pagination

Pagination for the Shiny usage results works differently than it does for the users endpoint: the Shiny usage response includes the full URL of the next page inside its paging item.

rsc_shiny$paging$`next`
## [1] "http://my_connect_server:3939/__api__/v1/instrumentation/shiny/usage?asc_order=true&from=2019-08-23T00%3A00%3A00Z&limit=20&next=105138"
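Because the next URL is a complete request, its query string can also be inspected, for example to log how far the download has progressed. In the URL above it carries the original from value, a limit of 20 records per page, and a next value that appears to act as an opaque position marker; it is safest to pass that value back unchanged rather than compute it. Below is a base-R sketch with a hypothetical parse_query() helper.

```r
# Hypothetical helper: split the query string of a URL into a named list
# of percent-decoded values.
parse_query <- function(url) {
  query <- sub("^[^?]*\\?", "", url)
  pairs <- strsplit(strsplit(query, "&", fixed = TRUE)[[1]], "=", fixed = TRUE)
  keys <- vapply(pairs, function(p) p[1], character(1))
  # Re-join with "=" in case a value itself contained "="
  values <- vapply(pairs, function(p) {
    if (length(p) > 1) URLdecode(paste(p[-1], collapse = "=")) else ""
  }, character(1))
  as.list(setNames(values, keys))
}

# Usage: parse_query(rsc_shiny$paging$`next`)$limit
```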

The value of rsc_shiny$paging$next serves both as the URL of the next request and as the signal that more pages remain: when it is NULL, there are no more pages to download. The rsc_get() function cannot be used here because it builds the URL by concatenating the server path, the API prefix, and an endpoint, while the next URL is already complete. Instead, the GET() and content() functions are called directly. Because this code does not run inside a short-lived function call, read the API key from the environment variable inline rather than storing it in a variable.

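# Keep requesting pages until the response no longer includes a "next" URL
while(!is.null(rsc_shiny$paging$`next`)) {
  # Store the response as `resp` so the rsc_get() function is not overwritten
  resp <- GET(
    rsc_shiny$paging$`next`, 
    add_headers(Authorization = paste("Key", Sys.getenv("RSTUDIO_CONNECT_API_KEY")))
    )
  stop_for_status(resp)
  rsc_shiny <- content(resp)
  # Convert this page's sessions with the same mapping as the first page
  c_shiny <- map_dfr(
    rsc_shiny$results,
    ~ tibble(
      guid = .x$content_guid,
      user = ifelse(is.null(.x$user_guid), "anonymous", .x$user_guid),
      started = .x$started,
      ended = .x$ended,
      ver = .x$data_version
    )
  )
  # Append this page to the sessions collected so far
  shiny_usage <- bind_rows(shiny_usage, c_shiny)
}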
shiny_usage
## # A tibble: 132 x 5
##    guid                      user     started         ended            ver
##    <chr>                     <chr>    <chr>           <chr>          <int>
##  1 xxxxxxxx-xxxx-xxxx-xxxx-… anonymo… 2019-08-24T01:… 2019-08-24T01…     1
##  2 xxxxxxxx-xxxx-xxxx-xxxx-… anonymo… 2019-08-24T01:… 2019-08-24T01…     1
##  3 xxxxxxxx-xxxx-xxxx-xxxx-… anonymo… 2019-08-24T11:… 2019-08-24T11…     1
##  4 xxxxxxxx-xxxx-xxxx-xxxx-… xxxxxxx… 2019-08-24T11:… 2019-08-24T11…     1
##  5 xxxxxxxx-xxxx-xxxx-xxxx-… anonymo… 2019-08-24T12:… 2019-08-24T12…     1
##  6 xxxxxxxx-xxxx-xxxx-xxxx-… anonymo… 2019-08-24T23:… 2019-08-24T23…     1
##  7 xxxxxxxx-xxxx-xxxx-xxxx-… anonymo… 2019-08-24T23:… 2019-08-24T23…     1
##  8 xxxxxxxx-xxxx-xxxx-xxxx-… anonymo… 2019-08-25T00:… 2019-08-25T00…     1
##  9 xxxxxxxx-xxxx-xxxx-xxxx-… anonymo… 2019-08-25T02:… 2019-08-25T02…     1
## 10 xxxxxxxx-xxxx-xxxx-xxxx-… anonymo… 2019-08-25T04:… 2019-08-25T04…     1
## # … with 122 more rows
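The loop above can also be factored into a reusable function. The hypothetical collect_results() below follows paging$next links until the server stops returning one, keeps each page's raw results in a list, and concatenates them once at the end instead of calling bind_rows() on every iteration. The fetch argument is any function that takes a URL and returns a parsed response, so the logic can be checked with fake pages; the max_pages cap is an added safeguard, not something the API requires.

```r
# Hypothetical helper: follow paging$next links until the server stops
# returning one. `fetch(url)` must return a parsed response with
# `results` and `paging`.
collect_results <- function(first_response, fetch, max_pages = 10000) {
  pages <- list(first_response$results)
  resp <- first_response
  n_pages <- 1
  # A NULL `next` (JSON null) marks the last page
  while (!is.null(resp$paging$`next`) && n_pages < max_pages) {
    resp <- fetch(resp$paging$`next`)
    n_pages <- n_pages + 1
    pages[[n_pages]] <- resp$results
  }
  do.call(c, pages)  # concatenate all pages' records once, at the end
}

# Usage against the server, with rsc_shiny still holding the FIRST page
# (that is, before running the while loop above):
# all_sessions <- collect_results(rsc_shiny, function(url) {
#   auth <- httr::add_headers(
#     Authorization = paste("Key", Sys.getenv("RSTUDIO_CONNECT_API_KEY")))
#   httr::content(httr::GET(url, auth))
# })
```

The combined list can then go through the same map_dfr() mapping used for the first page.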

Calculate session length

The shiny_usage table contains a start and end date/time for each session. To find how long each session lasted, we subtract the start time from the end time. The lubridate package simplifies this operation, which breaks down as follows:

  1. Coerce the started and ended fields to date-time objects using as_datetime()
  2. Create a time interval object from the two date-times using the %--% operator
  3. Divide the interval by seconds() to convert it to a number of seconds


library(lubridate)

shiny_usage %>%
  mutate(
    session_length = as_datetime(started) %--% as_datetime(ended) / seconds()
    ) %>%
  select(started, ended, session_length) %>%
  head()
## # A tibble: 6 x 3
##   started              ended                session_length
##   <chr>                <chr>                         <dbl>
## 1 2019-08-23T01:30:10Z 2019-08-23T01:31:04Z             54
## 2 2019-08-23T02:45:52Z 2019-08-23T02:47:36Z            104
## 3 2019-08-23T04:14:04Z 2019-08-23T04:15:14Z             70
## 4 2019-08-23T05:08:51Z 2019-08-23T05:09:13Z             22
## 5 2019-08-23T05:09:06Z 2019-08-23T05:53:51Z           2685
## 6 2019-08-23T07:02:08Z 2019-08-23T07:02:52Z             44
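For comparison, the same calculation can be done in base R with difftime(), which is handy where lubridate is not installed. The sketch below assumes whole-second UTC timestamps in the format shown above (for example 2019-08-23T01:30:10Z); parse_connect_time() and session_seconds() are hypothetical helper names. For the six sessions printed above, it returns the same lengths: 54, 104, 70, 22, 2685 and 44 seconds.

```r
# Hypothetical helper: parse whole-second UTC timestamps such as
# "2019-08-23T01:30:10Z" into POSIXct date-times.
parse_connect_time <- function(x) {
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
}

# Hypothetical helper: session length in seconds (NA if a timestamp is missing).
session_seconds <- function(started, ended) {
  as.numeric(difftime(parse_connect_time(ended), parse_connect_time(started),
                      units = "secs"))
}

# Usage with dplyr:
# shiny_usage %>% mutate(session_length = session_seconds(started, ended))
```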

Further reading

The RStudio Connect: Server API Cookbook is a collection of practical examples for interacting with the RStudio Connect Server API via code.