

googleAnalyticsR v1.0.0 - GA4 API, Automatic Shiny Dashboarding, Improved UI
source link: https://code.markedmondson.me/googleanalyticsr-100-release/

Google Analytics 4 support
There is a video walkthrough over on the original post.
So yeah, basically you can now get GA4 data in R - and GA4 will probably, in a couple of years, be the majority of digital analytics systems on the web.
I have to also thank Hadley for his drive-by issue on GitHub, which pushed me to create a domain specific language (DSL) for googleAnalyticsR’s GA4 functions. The main benefit should be an intuitive UI you don’t need to look up all the time - unlike, say, filter_clause_ga4(), where I wrote the code and still need to look up how it works sometimes!
See a comparison of the same filters:
In the older Reporting API v4 you have to construct the filter object yourself with three different functions:
# reporting v4 filters
mf <- met_filter("bounces", "GREATER_THAN", 0)
mf2 <- met_filter("sessions", "GREATER_THAN", 2)
df <- dim_filter("source", "BEGINS_WITH", "1", not = TRUE)
df2 <- dim_filter("source", "BEGINS_WITH", "a", not = TRUE)
fc2 <- filter_clause_ga4(list(df, df2), operator = "AND")
fc <- filter_clause_ga4(list(mf, mf2), operator = "AND")
ga_v4 <- google_analytics(ga_id,
                          date_range = c("2015-07-30", "2015-10-01"),
                          dimensions = "source",
                          metrics = "sessions",
                          met_filters = fc,
                          dim_filters = fc2)
In GA4 there is a more intuitive interface, using tidyverse-like non-standard evaluation (NSE):
# GA4 filters built with the ga_data_filter() DSL
fc2 <- ga_data_filter(!source %begins% "1" & !source %begins% "a")
fc <- ga_data_filter(bounces > 0 & sessions > 2)
ga4_data <- ga_data(ga4_id,
                    date_range = c("2015-07-30", "2015-10-01"),
                    dimensions = "sessionSource",
                    metrics = "sessions",
                    met_filters = fc,
                    dim_filters = fc2)
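The filter DSL also lets you combine expressions more freely. Below is a rough sketch, not taken from the package docs, that assumes | (OR) is supported alongside the & and ! operators above - see ?ga_data_filter for the full operator list. Dimension and metric filters are still built as separate expressions, since they feed the separate dim_filters and met_filters arguments:
# sketch: combining GA4 filter expressions (| for OR is assumed here;
# check ?ga_data_filter for the supported operators)
social_or_organic <- ga_data_filter(
  sessionSource == c("twitter", "t.co") | sessionMedium == "organic"
)
busy_sessions <- ga_data_filter(sessions > 2 | activeUsers > 10)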
Workflows with GA4
Below is an example workflow using the GA4 APIs - the use case here is to analyse the first and last touch of users. It incorporates some of the new features of the API, such as the filters, creating your own dimensions and ordering the results. Personally I find it much easier to use, so it encourages more ‘flow’ in my analysis of the data.
library(googleAnalyticsR)
# ✓ Setting client.id from /Users/mark/clients/my_client.json
ga_auth()
# ℹ 2021-02-21 16:23:02 > Using email from GARGLE_EMAIL env var: [email protected]
# I've forgotten my GA4 property Id so let's fetch them
al <- ga_account_list("data")
## Auto-refreshing stale OAuth token.
# ah yes this is my website
my_id <- 206670707
# what metrics/dimensions can I fetch again? Include my custom fields
custom_meta <- ga_meta("data", propertyId = my_id)
## ℹ 2021-04-17 13:57:39 > Metadata for propertyId 206670707
## ℹ 2021-04-17 13:57:40 > Caching new meta data
# last year
last_year <- c("2020-01-01","2021-01-01")
# I want to cross-reference first user medium with session medium
ga_data(my_id,
        metrics = "sessions",
        date_range = last_year,
        dimensions = c("firstUserMedium", "sessionMedium"))
## ℹ 2021-04-17 13:57:41 > Downloaded [ 29 ] of total [ 29 ] rows
## # A tibble: 29 x 3
## firstUserMedium sessionMedium sessions
## <chr> <chr> <dbl>
## 1 organic organic 17557
## 2 (none) (none) 5750
## 3 referral referral 3982
## 4 organic referral 1165
## 5 (none) referral 951
## 6 (none) organic 656
## 7 referral organic 362
## 8 email email 246
## 9 organic (not set) 148
## 10 (none) (not set) 127
## # … with 19 more rows
# what about only users from Twitter? First check the sources
ga_data(my_id,
        metrics = NULL, # to only return unique dimensions
        date_range = last_year,
        dimensions = "firstUserSource")
## ℹ 2021-04-17 13:57:43 > Downloaded [ 100 ] of total [ 193 ] rows
## # A tibble: 100 x 1
## firstUserSource
## <chr>
## 1 (direct)
## 2 127.0.0.1:17923
## 3 127.0.0.1:19409
## 4 127.0.0.1:20644
## 5 127.0.0.1:21594
## 6 127.0.0.1:23734
## 7 127.0.0.1:23960
## 8 127.0.0.1:25447
## 9 127.0.0.1:25537
## 10 127.0.0.1:28513
## # … with 90 more rows
# oh dear, lots of IP addresses - let's filter those out
ip_filter <- ga_data_filter(!firstUserSource %begins% "127.0.0.1")
ga_data(
  my_id,
  metrics = NULL,
  date_range = last_year,
  dimensions = "firstUserSource",
  dim_filters = ip_filter, # apply the filter
  limit = -1) # fetch all results
## ℹ 2021-04-17 13:57:44 > Downloaded [ 167 ] of total [ 167 ] rows
## # A tibble: 167 x 1
## firstUserSource
## <chr>
## 1 (direct)
## 2 185.199.111.153:6080
## 3 34.66.89.215
## 4 ActiveCampaign
## 5 ConversionWorksBlog
## 6 Data_Elixir_222
## 7 Iterable
## 8 aboutdata.atlassian.net
## 9 adelachrist.top
## 10 affecheafe
## # … with 157 more rows
# Twitter filter: the list shows 't.co', 'twitter' and 'twitter.com'
twit <- ga_data_filter(firstUserSource == c("twitter", "twitter.com", "t.co"))
# did anyone who first arrived by Twitter later come back via other channels?
ga_data(
  my_id,
  metrics = "sessions",
  date_range = last_year,
  dimensions = c("firstUserSource",
                 lastMediumSource = "sessionMedium/sessionSource"), # dynamic dimension
  dim_filters = twit)
## ℹ 2021-04-17 13:57:45 > Downloaded [ 15 ] of total [ 15 ] rows
## # A tibble: 15 x 3
## firstUserSource lastMediumSource sessions
## <chr> <chr> <dbl>
## 1 t.co referral/t.co 801
## 2 t.co referral/code.markedmondson.me 73
## 3 t.co organic/google 41
## 4 t.co (none)/(direct) 8
## 5 t.co referral/disq.us 7
## 6 twitter social/twitter 7
## 7 t.co referral/github.com 4
## 8 t.co (not set)/(not set) 2
## 9 t.co social/meetedgar.com 2
## 10 t.co referral/alexbirkett.com 1
## 11 t.co referral/linkedin.com 1
## 12 t.co referral/mark.shinyapps.io 1
## 13 t.co referral/markedmondson.me 1
## 14 t.co referral/rweekly.org 1
## 15 twitter.com social/twitter.com 1
# let's take out self-referrals and those coming back by Twitter
me <- ga_data_filter(!sessionSource %contains% "markedmondson.me")
nottwit <- ga_data_filter(!sessionSource == c("twitter", "twitter.com", "t.co"))
# and change the metrics/dimensions
ga_data(
  my_id,
  metrics = c("engagementRate", "activeUsers"),
  date_range = last_year,
  dimensions = c("unifiedScreenName",
                 lastMediumSource = "sessionMedium/sessionSource"), # dynamic dimension
  dim_filters = ga_data_filter(twit & me & nottwit), # combine the filters
  orderBys = ga_data_order(-activeUsers) # order by the activeUsers metric
)
## ℹ 2021-04-17 13:57:46 > Downloaded [ 40 ] of total [ 40 ] rows
## # A tibble: 40 x 4
## unifiedScreenName lastMediumSource engagementRate activeUsers
## <chr> <chr> <dbl> <dbl>
## 1 Mark Edmondson organic/google 1 6
## 2 Welcome to bigQueryR organic/google 0.5 5
## 3 Launch RStudio Server in the Goo… organic/google 0.5 4
## 4 Google Tag Manager Server Side o… organic/google 0.4 4
## 5 Introducing googleCloudRunner - … organic/google 0.4 4
## 6 Shiny on Google Cloud Run - Scal… (none)/(direct) 1 3
## 7 Five Ways to Schedule R scripts … organic/google 0.8 3
## 8 BigQuery Queries organic/google 0 3
## 9 Google Tag Manager Server Side o… (none)/(direct) 1 2
## 10 Turning GA360 BigQuery exports i… organic/google 1 2
## # … with 30 more rows
So for first-time referrals from Twitter, it seems most people later used Google to come back and visit my home page, which is not surprising.
Sending in Measurement Protocol hits and seeing them in real-time
The Measurement Protocol v2 for GA4 is less open than its predecessor and so can't be spammed, since it works with a server key, which made it seem useful to include in the package. Thanks to Martijn van Vreeden for the encouragement to make this feature.
googleAnalyticsR includes functions to call the Measurement Protocol API, but it depends on a smaller library, measurementProtocol (new on CRAN at v0.1.0), dedicated to sending Measurement Protocol hits. This is so you can use the Measurement Protocol without importing all the other dependencies googleAnalyticsR uses, which can be quite heavy as it's focused on usability of the data.
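If all you need is to send hits, you can use the smaller package directly. A minimal sketch, with function and argument names assumed from the measurementProtocol package and placeholder credentials:
library(measurementProtocol)
# illustrative placeholders - use your own measurement ID and API secret
mp_conn <- mp_connection("G-XXXXXXX", api_secret = "MY_SECRET")
mp_send(mp_event("an_event"), mp_cid(), mp_conn, debug_call = TRUE)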
Once you have the measurement ID and stream secret, you can send in events associated with a clientId like so:
# preferably set this in .Renviron - take from the webUI
Sys.setenv(GA_MP_SECRET="MY_SECRET")
# your GA4 settings
my_measurement_id <- "G-43MDXK6CLZ"
my_connection <- ga_mp_connection(my_measurement_id)
a_client_id <- ga_mp_cid()
event <- ga_mp_event("an_event")
ga_mp_send(event, a_client_id, my_connection, debug_call = TRUE)
As the real-time API is also incorporated, you can quickly verify which hits are coming in.
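One way to check from R - a sketch only, which assumes the GA4 propertyId my_id from the workflow above and that eventName and eventCount are available as real-time fields - is ga_data() with realtime = TRUE:
# sketch: poll the GA4 real-time API to confirm the Measurement Protocol
# events are arriving (my_id is the propertyId used earlier)
ga_data(
  my_id,
  metrics = "eventCount",
  dimensions = "eventName",
  realtime = TRUE
)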
This has wide applications for sending analytics hits from R, such as modelling clusters on the GA4 BigQuery exports in R and sending the resulting customer segments back into GA4.
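As a rough sketch of that idea (not a workflow from the package itself - it assumes ga_mp_event() accepts a params list, and uses a hypothetical data frame scored of modelled segments per clientId, reusing the connection object created above):
# hypothetical modelled output: one row per clientId with its cluster label
scored <- data.frame(
  client_id = c("123.456", "789.012"),
  segment   = c("high_value", "casual")
)
# send each user's segment back into GA4 as a Measurement Protocol event
for (i in seq_len(nrow(scored))) {
  seg_event <- ga_mp_event(
    "segment_update",
    params = list(segment = scored$segment[i])
  )
  ga_mp_send(seg_event, scored$client_id[i], my_connection)
}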
It is also compatible with GTM Server Side, which can hold a Measurement Protocol client. Simply change your mp_connection object to point at your GTM endpoint instead:
# custom GTM server-side endpoint
my_custom_connection <- mp_connection(
  my_measurement_id,
  endpoint = "https://gtm.example.com",
  preview_header = "ZW52LTV8OWdPOExNWFkYjA0Njk4NmQ="
)
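A minimal usage sketch, assuming the custom connection object drops in wherever the default connection was used above:
# send the same event as before, but via the GTM server-side endpoint
ga_mp_send(event, a_client_id, my_custom_connection, debug_call = TRUE)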
This holds many possibilities for enriching your data streams.
Tracking usage of googleAnalyticsR
One use case I have implemented for the Measurement Protocol is tracking usage of the googleAnalyticsR library itself. I asked, and got feedback that the digital analytics community using the library is OK with tracking as long as it's opt-in and collects no personal data.
This gave birth to the ga_trackme() function, which lets people opt in to tracking each time the library is loaded via the ga_trackme_event() function. Users can see what is sent by calling ga_trackme_event(debug_call = TRUE):
ℹ 2021-03-05 09:36:46 > MP Request: https://www.google-analytics.com/debug/mp/collect?measurement_id=G-XXX&api_secret=_XXXXXX
{
"client_id": "XXXXXX",
"non_personalized_ads": true,
"events": {
"name": "googleanalyticsr_loaded",
"params": {
"r_version": "R version 3.6.3 (2020-02-29)",
"r_platform": "x86_64-apple-darwin15.6.0 (64-bit)",
"r_locale": "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8",
"r_system": "macOS 10.16",
"package": "googleAnalyticsR 0.8.0.9000"
}
}
}
ℹ 2021-03-05 09:36:46 > Response: 200
And I thought it would be fun for people to have the option of sending their own messages to my GA4 account, so it includes a say_hello argument:
ga_trackme_event(debug_call=TRUE, say_hello = "Hello from Denmark!")
...
{
"client_id": "XXXXX",
"non_personalized_ads": true,
"events": {
"name": "googleanalyticsr_loaded",
"params": {
...
"say_hello": "Hello from Denmark!",
...
}
}
}
I can use the info to see what systems and versions of the library are being used over time.